swh.scheduler.journal_client module
- swh.scheduler.journal_client.DISABLE_ORIGIN_THRESHOLD = 3
Threshold to disable failing origins
- swh.scheduler.journal_client.MAX_NEXT_POSITION_OFFSET = 10
Max next position offset to avoid date computation overflow
- swh.scheduler.journal_client.max_date(*dates: datetime | None) → datetime
Return the latest of the given dates, ignoring any that are None.
At least one date must not be None.
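A minimal sketch of the contract described above, not the library code. The name max_date_sketch is hypothetical, and raising ValueError when every argument is None is an assumption, since the behaviour in that case is not stated here.

```python
from datetime import datetime, timezone
from typing import Optional


def max_date_sketch(*dates: Optional[datetime]) -> datetime:
    """Return the latest non-None date (illustrative stand-in)."""
    known = [d for d in dates if d is not None]
    if not known:
        # Assumption: the "at least one date" requirement is enforced
        # with a ValueError.
        raise ValueError("at least one date must not be None")
    return max(known)


older = datetime(2022, 1, 1, tzinfo=timezone.utc)
newer = datetime(2022, 6, 1, tzinfo=timezone.utc)

# None values are skipped; the most recent remaining date is returned.
latest = max_date_sketch(None, older, newer)
```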
- swh.scheduler.journal_client.from_position_offset_to_days(position_offset: int) → int
Convert a position offset into an interval in days. Note that this does not bound the position_offset input, so client code should limit the date computation to avoid overflow errors.
index in [0:1]: interval 1 day
index in [2:4]: interval 2 days
index in [5:+inf]: interval 4^(index-4) days
- Parameters:
position_offset – The actual position offset for a given visit stats
- Returns:
The offset as an interval in number of days.
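The table above can be sketched as follows. The function name offset_to_days_sketch is hypothetical, rejecting negative offsets is an assumption, and the caller-side bounding with MAX_NEXT_POSITION_OFFSET is one plausible way to apply the documented constant.

```python
# Value copied from the module constant documented above.
MAX_NEXT_POSITION_OFFSET = 10


def offset_to_days_sketch(position_offset: int) -> int:
    """Map a position offset to an interval in days, per the documented table."""
    if position_offset < 0:
        # Assumption: negative offsets are not meaningful.
        raise ValueError("position_offset must be non-negative")
    if position_offset <= 1:
        return 1
    if position_offset <= 4:
        return 2
    return 4 ** (position_offset - 4)


# Intervals for offsets 0 through 7.
intervals = [offset_to_days_sketch(i) for i in range(8)]

# The function itself is unbounded, so the caller clamps the offset first.
bounded = offset_to_days_sketch(min(25, MAX_NEXT_POSITION_OFFSET))
```

With the offset clamped to 10, the longest interval is 4^6 = 4096 days, which stays well within what date arithmetic can represent.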
- swh.scheduler.journal_client.next_visit_queue_position(queue_position_per_visit_type: Dict[str, int], visit_stats: Dict) → int
Compute the next visit queue position for the given visit_stats.
This takes the next_position_offset value of the visit_stats and computes a corresponding interval in “days” (with a random fudge factor in the -/+ 10% range, to avoid scheduling bursts for hosters). It then computes a new position from this visit interval and the visit stats' current position in the queue.
As an implementation detail, if the visit stats do not have a queue position yet, this falls back to the current global position (for the same visit type as the visit stats) to compute the new position in the queue. If there is no global state yet for the visit type, this starts from 0 as the default value.
- Parameters:
queue_position_per_visit_type – The global state of the queue per visit type
visit_stats – The actual visit information to compute the next position for
- Returns:
The actual next visit queue position for that visit stats
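The computation described above could look roughly like this. This is a sketch under stated assumptions, not the library code: the unit of the queue position (seconds), the dictionary keys "visit_type" and "next_visit_queue_position", the helper names, and the injectable rng parameter are all hypothetical. Only next_position_offset and the -/+ 10% fudge factor come from the description.

```python
import random
from typing import Dict, Optional

SECONDS_PER_DAY = 24 * 3600  # assumption: queue positions advance in seconds


def interval_days(position_offset: int) -> int:
    """Documented offset-to-days table: 1, 2, then 4^(offset-4) days."""
    if position_offset <= 1:
        return 1
    if position_offset <= 4:
        return 2
    return 4 ** (position_offset - 4)


def next_position_sketch(
    queue_position_per_visit_type: Dict[str, int],
    visit_stats: Dict,
    rng: Optional[random.Random] = None,
) -> int:
    """Illustrative stand-in for the next queue position computation."""
    rng = rng if rng is not None else random.Random()
    days = interval_days(visit_stats["next_position_offset"])
    # Random fudge factor in the -/+ 10% range to spread out visits.
    fudge = rng.uniform(-0.1, 0.1)
    visit_interval = int(days * SECONDS_PER_DAY * (1 + fudge))
    # Fall back to the global position for the visit type, then to 0.
    current = visit_stats.get("next_visit_queue_position")
    if current is None:
        current = queue_position_per_visit_type.get(visit_stats["visit_type"], 0)
    return current + visit_interval


stats = {"visit_type": "git", "next_position_offset": 2}
position = next_position_sketch({"git": 1000}, stats, random.Random(0))
fresh_position = next_position_sketch({}, stats, random.Random(0))
```

Passing a seeded random.Random makes the result reproducible in tests; without it the result varies within the 10% band around the nominal interval.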
- swh.scheduler.journal_client.get_last_status(incoming_visit_status: Dict, known_visit_stats: Dict) → Tuple[LastVisitStatus, bool | None]
Determine the last_visit_status and eventfulness of an origin according to the received visit_status object and the state of the origin_visit_stats in the database.
Note that by the time this function is called, out-of-order messages have already been discarded, which is why the implementation is rather simple.
- Parameters:
incoming_visit_status – Incoming visit status read out of the journal
known_visit_stats – Visit stats already registered in the backend
- Returns:
A tuple of (LastVisitStatus, Optional[bool]). LastVisitStatus represents the successfulness of the visit. Optional[bool] represents whether the snapshot is fresher than before (True/False) or None if there is no snapshot at all.
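To illustrate the shape of the returned tuple, here is a hedged sketch. Everything named in it is an assumption: LastVisitStatusSketch is a hypothetical stand-in for LastVisitStatus, and the status strings, the "snapshot" and "last_snapshot" keys, and the choice to treat a visit without a snapshot as failed are illustrative guesses, not the documented behaviour.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class LastVisitStatusSketch(Enum):
    """Hypothetical stand-in for LastVisitStatus; member names are assumptions."""

    successful = "successful"
    failed = "failed"
    not_found = "not_found"


def last_status_sketch(
    incoming_visit_status: Dict, known_visit_stats: Dict
) -> Tuple[LastVisitStatusSketch, Optional[bool]]:
    """Return (visit outcome, eventfulness) for an incoming visit status."""
    status = incoming_visit_status["status"]
    if status in ("failed", "not_found"):
        # No snapshot to compare, so eventfulness is undefined (None).
        return LastVisitStatusSketch(status), None
    snapshot = incoming_visit_status.get("snapshot")
    if snapshot is None:
        # Assumption: a visit that produced no snapshot counts as failed.
        return LastVisitStatusSketch.failed, None
    # Eventful when the snapshot differs from the one already recorded.
    eventful = snapshot != known_visit_stats.get("last_snapshot")
    return LastVisitStatusSketch.successful, eventful


known = {"last_snapshot": "snp-1"}
changed = last_status_sketch({"status": "full", "snapshot": "snp-2"}, known)
unchanged = last_status_sketch({"status": "full", "snapshot": "snp-1"}, known)
missing = last_status_sketch({"status": "failed", "snapshot": None}, known)
```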