swh.scheduler.journal_client module

swh.scheduler.journal_client.DISABLE_ORIGIN_THRESHOLD = 3

Threshold to disable failing origins

swh.scheduler.journal_client.MAX_NEXT_POSITION_OFFSET = 10

Max next position offset to avoid date computation overflow
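
To illustrate why the offset needs an upper bound, here is a minimal sketch. It assumes the 4^(index-4) day interval documented for from_position_offset_to_days in this module. The clamp_offset helper and the sample date are hypothetical and exist only for this illustration; the module itself may apply the limit differently.

```python
from datetime import datetime, timedelta, timezone

MAX_NEXT_POSITION_OFFSET = 10  # value documented above


def clamp_offset(offset: int) -> int:
    """Hypothetical helper: keep the offset within the documented maximum."""
    return min(offset, MAX_NEXT_POSITION_OFFSET)


now = datetime(2024, 1, 1, tzinfo=timezone.utc)  # arbitrary sample date

# Unbounded: 4 ** (30 - 4) days is far beyond what timedelta can represent.
try:
    now + timedelta(days=4 ** (30 - 4))
    overflowed = False
except OverflowError:
    overflowed = True

# Bounded: 4 ** (10 - 4) = 4096 days, a little over 11 years.
bounded = now + timedelta(days=4 ** (clamp_offset(30) - 4))
```

With the offset capped at 10, the largest interval stays well inside the range that datetime arithmetic supports.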

swh.scheduler.journal_client.max_date(*dates: Optional[datetime.datetime]) → datetime.datetime

Return the latest of the given (possibly None) dates.

At least one date must not be None.
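
The behaviour described above could be sketched as follows. This is not the module's actual code: the name max_date_sketch is hypothetical, and raising ValueError when every argument is None is an assumption, since the documentation only states that at least one date must be set.

```python
from datetime import datetime, timezone
from typing import Optional


def max_date_sketch(*dates: Optional[datetime]) -> datetime:
    """Return the latest non-None date among the arguments."""
    known = [d for d in dates if d is not None]
    if not known:
        # Assumption: the error type is not specified in the documentation.
        raise ValueError("at least one date must not be None")
    return max(known)


older = datetime(2021, 1, 1, tzinfo=timezone.utc)
newer = datetime(2022, 6, 1, tzinfo=timezone.utc)
latest = max_date_sketch(None, older, newer)  # None entries are ignored
```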

swh.scheduler.journal_client.from_position_offset_to_days(position_offset: int) → int

Convert a position offset to an interval in days. Note that this does not bound the position_offset input, so client code should limit the date computation to avoid overflow errors.

  • index in [0:1]: interval 1 day

  • index in [2:4]: interval 2 days

  • index in [5:+inf]: interval 4^(index-4) days


Parameters: position_offset – The actual position offset for a given visit stats

Returns: The offset as an interval in number of days.
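
The mapping listed above can be written out as a short sketch. The name offset_to_days_sketch is hypothetical, and rejecting negative offsets is an assumption that the documentation does not state.

```python
def offset_to_days_sketch(position_offset: int) -> int:
    """Map a position offset to a visit interval in days."""
    if position_offset < 0:
        # Assumption: negative offsets are treated as invalid input.
        raise ValueError("position_offset must be non-negative")
    if position_offset < 2:   # offsets 0 and 1
        return 1
    if position_offset < 5:   # offsets 2, 3 and 4
        return 2
    return 4 ** (position_offset - 4)  # offsets 5 and up


intervals = [offset_to_days_sketch(i) for i in range(8)]
# → [1, 1, 2, 2, 2, 4, 16, 64]
```

The interval grows by a factor of four per step from offset 5 onwards, which is why callers are expected to cap the offset before doing date arithmetic.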

swh.scheduler.journal_client.next_visit_queue_position(queue_position_per_visit_type: Dict, visit_stats: Dict) → datetime.datetime

Compute the next visit queue position for the given visit_stats.

This takes the next_position_offset value of visit_stats and computes a corresponding interval in days, with a random fudge factor in the ±10% range to avoid bursts of scheduled visits for hosters. It then computes a new position from this visit interval and the visit stats' current position in the queue.

As an implementation detail, if the visit stats does not have a queue position yet, this falls back to the current global position (for the same visit type as the visit stats) to compute the new position in the queue. If there is no global state yet for the visit type, this starts from the value returned by the utcnow function.

Parameters:

  • queue_position_per_visit_type – The global state of the queue per visit type

  • visit_stats – The actual visit information to compute the next position for

Returns: The actual next visit queue position for that visit stats.
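
The computation described above might look roughly like the sketch below. It is an illustration under several assumptions, not the module's code: the names next_position_sketch and offset_to_days_sketch are hypothetical, the dictionary keys "visit_type" and "next_visit_queue_position" are assumed (only next_position_offset is named in the text), and the optional rng parameter is added here purely so the example can be made reproducible.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional


def offset_to_days_sketch(position_offset: int) -> int:
    """Documented mapping: 1 day, then 2 days, then 4^(offset-4) days."""
    if position_offset < 2:
        return 1
    if position_offset < 5:
        return 2
    return 4 ** (position_offset - 4)


def next_position_sketch(
    queue_position_per_visit_type: Dict[str, datetime],
    visit_stats: Dict,
    rng: Optional[random.Random] = None,
) -> datetime:
    rng = rng or random.Random()
    days = offset_to_days_sketch(visit_stats["next_position_offset"])
    # Random fudge factor in the -10% .. +10% range.
    interval = timedelta(days=days * (1 + rng.uniform(-0.1, 0.1)))

    # Prefer the visit stats' own position, then the global position for
    # the visit type, then the current UTC time.
    current = visit_stats.get("next_visit_queue_position")
    if current is None:
        current = queue_position_per_visit_type.get(visit_stats["visit_type"])
    if current is None:
        current = datetime.now(tz=timezone.utc)
    return current + interval


base = datetime(2024, 1, 1, tzinfo=timezone.utc)
stats = {"visit_type": "git", "next_position_offset": 5}
position = next_position_sketch({"git": base}, stats, random.Random(0))
# position falls between 3.6 and 4.4 days after base
```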

swh.scheduler.journal_client.get_last_status(incoming_visit_status: Dict, known_visit_stats: Dict) → Tuple[swh.scheduler.model.LastVisitStatus, Optional[bool]]

Determine the last_visit_status and eventfulness of an origin according to the received visit_status object, and the state of the origin_visit_stats in db.

Note that by the time this function is called, out-of-order messages have already been discarded, which is why the implementation is rather simple.

Parameters:

  • incoming_visit_status – Incoming visit status read out of the journal

  • known_visit_stats – Visit stats already registered in the backend

Returns: A tuple of (LastVisitStatus, Optional[bool]). LastVisitStatus represents whether the visit succeeded. Optional[bool] is True or False depending on whether the snapshot is fresher than before, or None if there is no snapshot at all.
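
One way to read the return value described above is sketched below. Everything here is an assumption made for illustration: LastVisitStatusSketch is a stand-in enum rather than swh.scheduler.model.LastVisitStatus, the "status", "snapshot" and "last_snapshot" keys and the "failed"/"not_found" status strings are guesses, and treating "fresher" as "different from the last known snapshot" is a simplification.

```python
from enum import Enum
from typing import Dict, Optional, Tuple


class LastVisitStatusSketch(Enum):
    """Stand-in for the real LastVisitStatus model (hypothetical values)."""
    successful = "successful"
    failed = "failed"
    not_found = "not_found"


def last_status_sketch(
    incoming_visit_status: Dict, known_visit_stats: Dict
) -> Tuple[LastVisitStatusSketch, Optional[bool]]:
    status = incoming_visit_status["status"]
    if status in ("failed", "not_found"):
        # Unsuccessful visit: there is no snapshot to compare.
        return LastVisitStatusSketch(status), None

    snapshot = incoming_visit_status.get("snapshot")
    if snapshot is None:
        # Assumption: a visit without any snapshot counts as failed.
        return LastVisitStatusSketch.failed, None

    # Eventful when the snapshot differs from the last one recorded.
    eventful = snapshot != known_visit_stats.get("last_snapshot")
    return LastVisitStatusSketch.successful, eventful


result = last_status_sketch(
    {"status": "full", "snapshot": b"\x02"}, {"last_snapshot": b"\x01"}
)
# → (LastVisitStatusSketch.successful, True)
```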

swh.scheduler.journal_client.process_journal_objects(messages: Dict[str, List[Dict]], *, scheduler: swh.scheduler.interface.SchedulerInterface) → None