This module implements a model of how frequently an origin is updated and how long it takes to load it.
For each origin, a commit frequency is chosen deterministically from the hash of its URL, and all origins are assumed to have been created at an arbitrary epoch. The number of commits is then computed as the product of the commit frequency and the time elapsed since that epoch.
The run time of a load task is approximated as proportional to the number of commits since the previous visit of the origin (possibly 0).
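The model described above can be sketched in a few lines. This is a minimal illustration, not the module's actual code: the function names, the use of SHA-1, and the log-uniform spread of commit intervals between one minute and one year are all assumptions made for the example. Only EPOCH and PER_COMMIT_RUN_TIME come from the documented constants.

```python
import hashlib
from datetime import datetime, timezone

# Values taken from the documented OriginModel constants.
EPOCH = datetime(2015, 9, 1, tzinfo=timezone.utc)
PER_COMMIT_RUN_TIME = 0.1  # seconds of run time per commit


def seconds_between_commits(url: str) -> float:
    """Hypothetical mapping from a URL hash to a commit interval.

    Assumption: intervals are spread log-uniformly between one minute
    and one year; the real distribution may differ.
    """
    digest = hashlib.sha1(url.encode()).digest()
    fraction = int.from_bytes(digest[:4], "big") / 2**32  # in [0, 1)
    low, high = 60.0, 365 * 86400.0
    return low * (high / low) ** fraction


def commits_since_epoch(url: str, now: datetime) -> int:
    """Number of commits, assuming one commit per interval since EPOCH."""
    elapsed = (now - EPOCH).total_seconds()
    return int(elapsed // seconds_between_commits(url))


def estimated_run_time(url: str, previous_visit: datetime, now: datetime) -> float:
    """Run time proportional to the commits made since the previous visit."""
    new_commits = commits_since_epoch(url, now) - commits_since_epoch(url, previous_visit)
    return PER_COMMIT_RUN_TIME * new_commits
```

Because the interval depends only on the URL, two calls for the same origin always agree, and an origin visited twice at the same instant gets an estimated run time of 0.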
- swh.scheduler.simulator.origins.generate_listed_origin(lister_id: uuid.UUID, now: Optional[datetime.datetime] = None) swh.scheduler.model.ListedOrigin
Returns a new, globally unique origin. The last_update value is seeded according to the OriginModel and the passed timestamp.
lister – instance of the lister that generated this origin
now – time of listing, to emulate last_update (defaults to
- class swh.scheduler.simulator.origins.OriginModel(type: str, origin: str)
- MIN_RUN_TIME = 0.5
Minimal run time for a visit (retrieved from production data)
- MAX_RUN_TIME = 7200
Max run time for a visit
- PER_COMMIT_RUN_TIME = 0.1
Run time per commit
- EPOCH = datetime.datetime(2015, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)
The origin of all origins (at least according to Software Heritage)
- seconds_between_commits()
Returns a random ‘average time between two commits’ of this origin, used to estimate the run time of a load task, and how much the loading architecture is lagging behind origin updates.
- get_last_update(now: datetime.datetime) datetime.datetime
Get the last_update value for this origin.
We assume that the origin had its first commit at EPOCH, and that one commit happened every self.seconds_between_commits(). This returns the last commit date before or equal to now.
- get_current_snapshot_id(now: datetime.datetime) bytes
Get the current snapshot for this origin.
To generate a snapshot id, we calculate the number of commits since the EPOCH, and hash it alongside the origin type and url.
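The two methods above can be illustrated with a short sketch. It assumes the commit interval is passed in as a plain number of seconds. The helper names, the SHA-1 hash, and the layout of the hashed payload are hypothetical choices for the example, not the module's actual implementation.

```python
import hashlib
from datetime import datetime, timedelta, timezone

EPOCH = datetime(2015, 9, 1, tzinfo=timezone.utc)


def commit_count(now: datetime, interval: float) -> int:
    """Commits since EPOCH, assuming one commit every `interval` seconds."""
    return int((now - EPOCH).total_seconds() // interval)


def last_update(now: datetime, interval: float) -> datetime:
    """Date of the last commit at or before `now`."""
    return EPOCH + timedelta(seconds=commit_count(now, interval) * interval)


def snapshot_id(origin_type: str, url: str, now: datetime, interval: float) -> bytes:
    """Hash the commit count together with the origin type and URL.

    Assumption: the payload format and the hash function are illustrative.
    """
    payload = f"{origin_type} {url} {commit_count(now, interval)}".encode()
    return hashlib.sha1(payload).digest()
```

With this construction the snapshot id stays the same between two commits and changes as soon as the commit count increases, which is what lets a simulated loader detect whether an origin has changed since its previous visit.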
- swh.scheduler.simulator.origins.lister_process(env: swh.scheduler.simulator.common.Environment, lister_id: uuid.UUID) Generator[simpy.events.Event, simpy.events.Event, None]
Every hour, generates new origins and updates the last_update field of the origins this process generated in the past.
- swh.scheduler.simulator.origins.load_task_process(env: swh.scheduler.simulator.common.Environment, task: swh.scheduler.simulator.common.Task, status_queue: swh.scheduler.simulator.common.Queue) Iterator[simpy.events.Event]
A loading task. This pushes OriginVisitStatus objects to the status_queue to simulate the visible outcomes of the task.
Uses the load_task_duration function to determine its run time.
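As a rough illustration of how a load task's duration and visible outcomes might be simulated, the sketch below clamps a per-commit run time between the documented MIN_RUN_TIME and MAX_RUN_TIME and records two statuses in a queue. The clamping rule, the function names, the plain `deque` standing in for the status queue, and the dictionaries standing in for OriginVisitStatus objects are all assumptions; the real load_task_duration function and simpy-based process are not shown on this page.

```python
from collections import deque

# Values taken from the documented OriginModel constants.
MIN_RUN_TIME = 0.5
MAX_RUN_TIME = 7200
PER_COMMIT_RUN_TIME = 0.1


def approx_load_task_duration(new_commits: int) -> float:
    """Hypothetical duration: proportional to commits, clamped to [MIN, MAX]."""
    raw = PER_COMMIT_RUN_TIME * new_commits
    return max(MIN_RUN_TIME, min(MAX_RUN_TIME, raw))


def run_load_task(start: float, new_commits: int, status_queue: deque) -> float:
    """Record a 'created' status at start and a 'full' status at the end.

    Times are plain floats (simulated seconds); returns the end time.
    """
    status_queue.append({"time": start, "status": "created"})
    end = start + approx_load_task_duration(new_commits)
    status_queue.append({"time": end, "status": "full"})
    return end
```

A caller would drain `status_queue` to observe the outcome of each visit, much as a consumer of the simulator's status queue would.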