swh.scheduler.simulator.origins module
This module implements a model of the frequency of updates of an origin and how long it takes to load it.
For each origin, a commit frequency is chosen deterministically from the hash of its URL, and all origins are assumed to have been created at an arbitrary epoch. From these, the number of commits is computed as the product of the commit frequency and the time elapsed since that epoch.
The run time of a load task is then approximated as proportional to the number of commits made since the origin's previous visit (possibly 0).
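The commit-counting arithmetic described above can be sketched as follows. This is a minimal illustration, not the module's code: `commits_since_epoch` and `commits_since_visit` are hypothetical helpers, the period is passed in directly instead of being derived from a URL hash, and the count is rounded down to whole commits.

```python
from datetime import datetime, timedelta, timezone

# Assumed to match OriginModel.EPOCH
EPOCH = datetime(2015, 9, 1, tzinfo=timezone.utc)


def commits_since_epoch(now: datetime, seconds_between_commits: float) -> int:
    """Hypothetical helper: commit frequency (1 / period) times elapsed time, rounded down."""
    elapsed = (now - EPOCH).total_seconds()
    return int(elapsed // seconds_between_commits)


def commits_since_visit(
    last_visit: datetime, now: datetime, seconds_between_commits: float
) -> int:
    """Hypothetical helper: commits made after `last_visit` up to `now` (possibly 0)."""
    return commits_since_epoch(now, seconds_between_commits) - commits_since_epoch(
        last_visit, seconds_between_commits
    )
```

For instance, with one commit per day, an origin visited 10 days after the epoch and again 15 days and 1 hour after it has 5 new commits to load; the load time would then be proportional to that 5.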
- swh.scheduler.simulator.origins.generate_listed_origin(lister_id: UUID, now: datetime | None = None) → ListedOrigin
Return a globally unique new origin, with its last_update value seeded according to the OriginModel and the passed timestamp.
- Parameters:
lister_id – ID of the lister instance that generated this origin
now – time of listing, used to emulate last_update (defaults to datetime.now())
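A rough sketch of what generating a listed origin involves, under stated assumptions: `SketchListedOrigin` is a hypothetical stand-in for the real ListedOrigin model, the URL scheme, the `"git"` visit type and the fixed one-hour commit period are invented for illustration, and the default `now` is taken as timezone-aware UTC so it can be subtracted from the aware epoch.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

EPOCH = datetime(2015, 9, 1, tzinfo=timezone.utc)


@dataclass
class SketchListedOrigin:
    """Hypothetical stand-in for the real ListedOrigin model."""
    lister_id: uuid.UUID
    url: str
    visit_type: str
    last_update: datetime


def sketch_generate_listed_origin(
    lister_id: uuid.UUID,
    now: Optional[datetime] = None,
    seconds_between_commits: float = 3600.0,  # hypothetical fixed commit period
) -> SketchListedOrigin:
    if now is None:
        # Timezone-aware, so it can be compared with the aware EPOCH
        now = datetime.now(timezone.utc)
    # A random UUID in the URL makes the origin globally unique
    url = f"https://example.org/{uuid.uuid4()}"
    # Seed last_update with the last commit at or before `now`
    since_last = (now - EPOCH).total_seconds() % seconds_between_commits
    return SketchListedOrigin(
        lister_id, url, "git", now - timedelta(seconds=since_last)
    )
```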
- class swh.scheduler.simulator.origins.OriginModel(type: str, origin: str)
Bases: object
- MIN_RUN_TIME = 0.5
Minimum run time for a visit (retrieved from production data)
- MAX_RUN_TIME = 7200
Maximum run time for a visit
- PER_COMMIT_RUN_TIME = 0.1
Run time per commit
- EPOCH = datetime.datetime(2015, 9, 1, 0, 0, tzinfo=datetime.timezone.utc)
The origin of all origins (at least according to Software Heritage)
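One plausible way to combine the three run-time constants, offered only as an assumption since this page does not give the formula: a fixed minimum cost, plus the per-commit cost for each new commit, capped at the maximum. `sketch_run_time` is a hypothetical name, and the values are assumed to be seconds.

```python
# Values copied from OriginModel; units assumed to be seconds
MIN_RUN_TIME = 0.5
MAX_RUN_TIME = 7200
PER_COMMIT_RUN_TIME = 0.1


def sketch_run_time(n_commits: float) -> float:
    """Hypothetical: minimum cost plus per-commit cost, capped at MAX_RUN_TIME."""
    return min(MIN_RUN_TIME + PER_COMMIT_RUN_TIME * n_commits, MAX_RUN_TIME)
```

The cap keeps origins with a very long commit backlog from dominating the simulated load, which is the role a "max run time" constant suggests.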
- seconds_between_commits()
Return a pseudo-random but deterministic (derived from the hash of the origin URL) ‘average time between two commits’ for this origin. It is used to estimate the run time of a load task and how far the loading architecture lags behind origin updates.
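A sketch of how a deterministic "random" period can be derived from a URL hash. The specific choices here are assumptions, not taken from this page: MD5 of the URL, the first two bytes read as a bucket number, and a log-uniform spread between 1 second and ten years.

```python
import hashlib

TEN_YEARS = 10 * 365 * 24 * 3600  # assumed upper bound, in seconds


def sketch_seconds_between_commits(origin_url: str) -> float:
    n_bytes = 2
    num_buckets = 2 ** (8 * n_bytes)
    # Same URL -> same hash -> same bucket, so the "random" value is reproducible
    bucket = int.from_bytes(
        hashlib.md5(origin_url.encode()).digest()[:n_bytes], "little"
    )
    # Log-uniform spread: bucket 0 -> 1 second, highest bucket -> just under ten years
    return TEN_YEARS ** (bucket / num_buckets)
```

A log-uniform spread gives as many very active origins (commits every few minutes) as nearly dormant ones, which a linear mapping over the same range would not.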
- get_last_update(now: datetime) → datetime
Get the last_update value for this origin.
We assume that the origin had its first commit at EPOCH, and that a commit happened every self.seconds_between_commits() seconds thereafter. This returns the date of the last commit at or before now.
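Under that assumption, the last commit date follows from a single remainder: the time elapsed since EPOCH modulo the commit period is how long ago the last commit happened. A minimal sketch, with the period passed in explicitly (`sketch_get_last_update` is a hypothetical name):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(2015, 9, 1, tzinfo=timezone.utc)


def sketch_get_last_update(now: datetime, seconds_between_commits: float) -> datetime:
    # Commits happen at EPOCH, EPOCH + p, EPOCH + 2p, ...; the remainder of the
    # elapsed time divided by p is how long ago the latest of them happened.
    _, since_last_commit = divmod(
        (now - EPOCH).total_seconds(), seconds_between_commits
    )
    return now - timedelta(seconds=since_last_commit)
```

With an hourly period, querying at EPOCH + 5 h 30 min returns EPOCH + 5 h, and querying exactly at EPOCH + 5 h returns that same instant ("before or equal to now").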
- swh.scheduler.simulator.origins.lister_process(env: Environment, lister_id: UUID) → Generator[Event, Event, None]
Every hour, generate new origins and update the last_update field of the origins this process generated previously.
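The hourly lister loop can be sketched as a Python generator that yields how long it wants to sleep. This swaps SimPy's Environment for a hand-rolled driver (`run_for`) so the example needs only the standard library; `sketch_lister_process`, the fixed 30-minute commit period and the example URLs are all hypothetical.

```python
import uuid
from datetime import datetime, timedelta, timezone
from typing import Dict, Generator, List

EPOCH = datetime(2015, 9, 1, tzinfo=timezone.utc)
HOUR = 3600.0
COMMIT_PERIOD = 1800.0  # hypothetical: every origin commits every 30 minutes


def last_update(now: datetime) -> datetime:
    """Last commit at or before `now`, assuming commits every COMMIT_PERIOD since EPOCH."""
    since_last = (now - EPOCH).total_seconds() % COMMIT_PERIOD
    return now - timedelta(seconds=since_last)


def sketch_lister_process(
    clock: List[datetime], origins: Dict[str, datetime], new_per_run: int = 2
) -> Generator[float, None, None]:
    """Hypothetical lister: each wake-up refreshes old origins and lists new ones."""
    while True:
        now = clock[0]
        # Update last_update of every origin this process listed in the past
        for url in origins:
            origins[url] = last_update(now)
        # List a few brand-new, globally unique origins
        for _ in range(new_per_run):
            origins[f"https://example.org/{uuid.uuid4()}"] = last_update(now)
        yield HOUR  # ask the driver to resume this process one hour later


def run_for(hours: int) -> Dict[str, datetime]:
    """Minimal stand-in for a SimPy environment: advance the clock by each yielded delay."""
    clock = [EPOCH + timedelta(days=1)]
    origins: Dict[str, datetime] = {}
    process = sketch_lister_process(clock, origins)
    for _ in range(hours):
        delay = next(process)
        clock[0] += timedelta(seconds=delay)
    return origins
```

After three wake-ups, six origins exist and all carry the last_update computed at the third wake-up, since each pass refreshes everything listed before it.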
- swh.scheduler.simulator.origins.load_task_process(env: Environment, task: Task, status_queue: Queue) → Iterator[Event]
A loading task. This pushes OriginVisitStatus objects to the status_queue to simulate the visible outcomes of the task.
Uses the load_task_duration function to determine its run time.
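A stdlib-only sketch of the visible outcomes a load task pushes to its status queue. Assumptions: `SketchVisitStatus` is a hypothetical stand-in for OriginVisitStatus, the run time is passed in rather than computed by load_task_duration, the simulated wait is replaced by plain date arithmetic, and the "ongoing" then "full" status sequence is illustrative.

```python
import queue
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SketchVisitStatus:
    """Hypothetical stand-in for an OriginVisitStatus."""
    origin: str
    date: datetime
    status: str


def sketch_load_task(
    origin: str, start: datetime, run_time: float, status_queue: queue.Queue
) -> datetime:
    """Push the visible outcomes of one simulated visit; return its end date."""
    # The visit becomes visible as soon as it starts
    status_queue.put(SketchVisitStatus(origin, start, "ongoing"))
    # In the simulator this would be a wait of `run_time` simulated seconds
    end = start + timedelta(seconds=run_time)
    # The final status is reported once the run time has elapsed
    status_queue.put(SketchVisitStatus(origin, end, "full"))
    return end
```

Consumers of the queue then see the visit start and finish in order, which is what a scheduler component would observe from a real loader.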