This module models how often an origin is updated and how long it takes to load it.
For each origin, a commit frequency is chosen deterministically from the hash of its URL, and all origins are assumed to have been created at an arbitrary epoch. The number of commits is then computed as the product of the commit frequency and the time elapsed since that epoch.
The run time of a load task is approximated as proportional to the number of commits since the previous visit of the origin (possibly 0).
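The model described above can be sketched as follows. This is a minimal illustration, not the module's actual code: the constant values and the helper names `commits_since_epoch` and `estimate_run_time` are assumptions, and clamping the result between a minimal and a maximal run time is one plausible reading of the attributes documented on `OriginModel`.

```python
import datetime

# EPOCH is taken from the documentation; the three run-time constants
# below are hypothetical values chosen only for illustration.
EPOCH = datetime.datetime(2015, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)
MIN_RUN_TIME = 0.5          # assumed minimal run time of a visit, in seconds
MAX_RUN_TIME = 7200.0       # assumed maximal run time of a visit, in seconds
PER_COMMIT_RUN_TIME = 0.1   # assumed run time per commit, in seconds


def commits_since_epoch(now: datetime.datetime, seconds_between_commits: float) -> int:
    """Number of commits between EPOCH and `now`, at one commit per interval."""
    elapsed = (now - EPOCH).total_seconds()
    return int(elapsed // seconds_between_commits)


def estimate_run_time(commits_now: int, commits_at_last_visit: int) -> float:
    """Run time grows linearly with the number of new commits, then is capped."""
    new_commits = max(commits_now - commits_at_last_visit, 0)
    run_time = MIN_RUN_TIME + PER_COMMIT_RUN_TIME * new_commits
    return min(run_time, MAX_RUN_TIME)
```

With these assumed constants, an origin with no new commits since its previous visit costs exactly the minimal run time, and a very large backlog is capped at the maximal run time.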
generate_listed_origin(lister_id: uuid.UUID, now: Optional[datetime.datetime] = None) → swh.scheduler.model.ListedOrigin
Returns a new, globally unique origin. Seeds the last_update value according to the OriginModel and the passed timestamp.
lister – instance of the lister that generated this origin
now – time of listing, to emulate last_update (defaults to
OriginModel(type: str, origin: str)
Minimal run time for a visit (retrieved from production data)
Max run time for a visit
Run time per commit
EPOCH = datetime.datetime(2015, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)
The origin of all origins (at least according to Software Heritage)
Returns a pseudo-random ‘average time between two commits’ for this origin, derived deterministically from the origin itself. It is used to estimate the run time of a load task and how far the loading architecture is lagging behind origin updates.
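One way to obtain a value that looks random but is stable across calls is to derive it from a hash of the origin URL. The sketch below is an assumption about how this could be done, not the module's implementation: the function name `seconds_between_commits_for`, the choice of SHA-1, and the `min_seconds` and `max_seconds` bounds are all hypothetical.

```python
import hashlib


def seconds_between_commits_for(
    origin_url: str,
    min_seconds: float = 3600.0,            # hypothetical lower bound: one hour
    max_seconds: float = 365 * 86400.0,     # hypothetical upper bound: one year
) -> float:
    """Map an origin URL to a stable interval within [min_seconds, max_seconds)."""
    digest = hashlib.sha1(origin_url.encode("utf-8")).digest()
    # Turn the first 8 bytes of the digest into a fraction in [0, 1).
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return min_seconds + fraction * (max_seconds - min_seconds)
```

Because the interval depends only on the URL, two simulation runs assign the same update rate to the same origin.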
get_last_update(now: datetime.datetime) → datetime.datetime
Get the last_update value for this origin.
We assume that the origin had its first commit at EPOCH, and that one commit has happened every self.seconds_between_commits() seconds since then. This returns the date of the last commit at or before now.
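The computation described above amounts to rounding now down to a whole number of commit intervals after EPOCH. The following is a minimal sketch under that reading. The standalone function `last_update` and its `seconds_between_commits` parameter are hypothetical stand-ins for the method and for the value returned by self.seconds_between_commits(), and the sketch assumes now is not earlier than EPOCH.

```python
import datetime

EPOCH = datetime.datetime(2015, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)


def last_update(now: datetime.datetime, seconds_between_commits: float) -> datetime.datetime:
    """Date of the most recent commit at or before `now` (assumes now >= EPOCH)."""
    elapsed = (now - EPOCH).total_seconds()
    # Whole number of commit intervals that fit between EPOCH and now.
    commits = int(elapsed // seconds_between_commits)
    return EPOCH + datetime.timedelta(seconds=commits * seconds_between_commits)
```

For example, with one commit every 10 hours, a listing done 25 hours after EPOCH yields a last update 20 hours after EPOCH.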
get_current_snapshot_id(now: datetime.datetime) → bytes
Get the current snapshot for this origin.
To generate a snapshot id, we calculate the number of commits since the EPOCH, and hash it alongside the origin type and url.
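A sketch of this idea is shown below. The hash function (SHA-1), the payload layout, and the function name `snapshot_id_for` are assumptions made for illustration; the documentation only states that the commit count is hashed together with the origin type and URL.

```python
import hashlib


def snapshot_id_for(origin_type: str, origin_url: str, commits: int) -> bytes:
    """Hash the origin type, URL and commit count into a 20-byte identifier."""
    # Hypothetical payload layout: fields separated by spaces.
    payload = f"{origin_type} {origin_url} {commits}".encode("utf-8")
    return hashlib.sha1(payload).digest()
```

The identifier changes only when the commit count changes, so two visits with no commit in between see the same snapshot.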
load_task_characteristics(now: datetime.datetime) → Tuple[float, str, Optional[bytes]]
Returns the (run_time, end_status, snapshot id) of the next origin visit.
lister_process(env: swh.scheduler.simulator.common.Environment, lister_id: uuid.UUID) → Generator[simpy.events.Event, simpy.events.Event, None]
Every hour, generates new origins and updates the last_update field of the origins this process generated in the past.
load_task_process(env: swh.scheduler.simulator.common.Environment, task: swh.scheduler.simulator.common.Task, status_queue: swh.scheduler.simulator.common.Queue) → Iterator[simpy.events.Event]
A loading task. This pushes OriginVisitStatus objects to the status_queue to simulate the visible outcomes of the task.
Uses the load_task_characteristics method to determine its run time.