swh.scheduler.model module#

swh.scheduler.model.check_timestamptz(value) None[source]#

Checks the date has a timezone.
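As a rough illustration of what such a check involves, the sketch below rejects naive datetimes. The helper name `require_timezone`, the error message, and the choice to let `None` through are assumptions made for this example, not details taken from the module.

```python
from datetime import datetime, timezone


def require_timezone(value):
    """Raise ValueError if value is a naive datetime (hypothetical helper)."""
    # Assumption: None is accepted so that optional fields can stay unset.
    if value is not None and value.tzinfo is None:
        raise ValueError("date must be a timezone-aware datetime")


# A timezone-aware datetime passes silently.
require_timezone(datetime(2024, 1, 1, tzinfo=timezone.utc))

# A naive datetime is rejected.
try:
    require_timezone(datetime(2024, 1, 1))
except ValueError as exc:
    print(f"rejected: {exc}")
```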

class swh.scheduler.model.BaseSchedulerModel[source]#

Bases: object

Base class for database-backed objects.

These database-backed objects are defined through attrs-based attributes that match the columns of the database 1:1. This is a (very) lightweight ORM.

These attrs-based attributes have metadata specific to the functionality expected from these fields in the database:

  • primary_key: the column is a primary key; it should be filtered out when doing an update of the object

  • auto_primary_key: the column is a primary key whose value is generated by the database. It is left out of inserts, so it must be matched with a database-side default value.

  • auto_now_add: the column is a timestamp that is set to the current time when the object is inserted, and never updated afterwards. This must be matched with a database-side default value.

  • auto_now: the column is a timestamp that is set to the current time when the object is inserted or updated.
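The metadata flags listed above can be pictured with a small stand-in. This sketch uses the standard library's `dataclasses` in place of attrs so that it runs without dependencies; `ExampleRow` and its columns are hypothetical, and the way the two class methods read the flags is an assumption rather than the module's actual code.

```python
from dataclasses import dataclass, field, fields
from typing import Optional, Tuple


@dataclass(frozen=True)
class ExampleRow:
    """Hypothetical table row; not one of the swh.scheduler classes."""

    id: Optional[int] = field(default=None, metadata={"auto_primary_key": True})
    name: str = field(default="", metadata={"primary_key": True})
    created: Optional[str] = field(default=None, metadata={"auto_now_add": True})
    updated: Optional[str] = field(default=None, metadata={"auto_now": True})
    payload: str = ""

    @classmethod
    def primary_key_columns(cls) -> Tuple[str, ...]:
        # Assumption: either primary-key flag marks the column as part of the key.
        return tuple(
            f.name
            for f in fields(cls)
            if f.metadata.get("primary_key") or f.metadata.get("auto_primary_key")
        )

    @classmethod
    def select_columns(cls) -> Tuple[str, ...]:
        # Attributes map 1:1 to columns, so a select reads all of them.
        return tuple(f.name for f in fields(cls))


print(ExampleRow.primary_key_columns())
print(ExampleRow.select_columns())
```

Keeping the flags in field metadata means the column lists are derived from the class definition itself, so adding an attribute does not require touching any SQL-building helper.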

Method generated by attrs for class BaseSchedulerModel.

classmethod primary_key_columns() Tuple[str, ...][source]#

Get the primary key columns for this object type

classmethod select_columns() Tuple[str, ...][source]#

Get all the database columns needed for a select on this object type

classmethod insert_columns_and_metavars() Tuple[Tuple[str, ...], Tuple[str, ...]][source]#
Get the database columns and metavars needed for an insert or update on this object type.

This implements support for the auto_* field metadata attributes.
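One way to picture the handling of the auto_* flags is the sketch below. It again uses `dataclasses` as a stand-in for attrs, and it assumes `%(name)s`-style placeholders and a SQL `now()` call for auto_now columns; the class, the table name, and the placeholder style are all illustrative assumptions.

```python
from dataclasses import dataclass, field, fields
from typing import Optional, Tuple


@dataclass
class ExampleRow:
    """Hypothetical table row used only for this illustration."""

    id: Optional[int] = field(default=None, metadata={"auto_primary_key": True})
    name: str = ""
    created: Optional[str] = field(default=None, metadata={"auto_now_add": True})
    updated: Optional[str] = field(default=None, metadata={"auto_now": True})


def insert_columns_and_metavars(cls) -> Tuple[Tuple[str, ...], Tuple[str, ...]]:
    columns = []
    metavars = []
    for f in fields(cls):
        # Columns filled in by a database-side default are not written by the client.
        if f.metadata.get("auto_primary_key") or f.metadata.get("auto_now_add"):
            continue
        columns.append(f.name)
        if f.metadata.get("auto_now"):
            # Assumption: the database clock supplies the timestamp.
            metavars.append("now()")
        else:
            metavars.append(f"%({f.name})s")
    return tuple(columns), tuple(metavars)


cols, metavars = insert_columns_and_metavars(ExampleRow)
query = f"INSERT INTO example ({', '.join(cols)}) VALUES ({', '.join(metavars)})"
print(query)
```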

evolve(**kwargs) SchedulerModelType[source]#

Alias to call attr.evolve() on this object, returning a new object.

to_dict(**kwargs) Dict[str, Any][source]#

Alias to call attr.asdict() on this object.

to_tuple(**kwargs) Tuple[Any][source]#

Alias to call attr.astuple() on this object.
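The three aliases above wrap attrs functions. For readers unfamiliar with them, the standard library offers close analogues (`dataclasses.replace`, `asdict`, `astuple`), which this sketch uses so that it runs without attrs installed. `ExampleOrigin` is a hypothetical class, not part of the module.

```python
from dataclasses import asdict, astuple, dataclass, replace


@dataclass(frozen=True)
class ExampleOrigin:
    """Hypothetical immutable record."""

    url: str
    visit_type: str
    enabled: bool = True


origin = ExampleOrigin(url="https://example.org/repo.git", visit_type="git")

# Analogue of evolve(): build a modified copy and leave the original untouched.
disabled = replace(origin, enabled=False)

# Analogues of to_dict() and to_tuple().
as_mapping = asdict(disabled)
as_sequence = astuple(origin)
```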

class swh.scheduler.model.Lister(name: str, instance_name: str, id: UUID | None = None, current_state: Dict[str, Any] = NOTHING, created: datetime | None = None, updated: datetime | None = None, last_listing_finished_at: datetime | None = None, first_visits_queue_prefix: str | None = None, first_visits_scheduled_at: datetime | None = None)[source]#

Bases: BaseSchedulerModel

Method generated by attrs for class Lister.

class swh.scheduler.model.ListedOrigin(lister_id: UUID, url: str, visit_type: str, extra_loader_arguments: Dict[str, Any] = NOTHING, last_update: datetime | None = None, is_fork: bool | None = None, forked_from_url: str | None = None, enabled: bool = True, first_seen: datetime | None = None, last_seen: datetime | None = None)[source]#

Bases: BaseSchedulerModel

Basic information about a listed origin, output by a lister

Method generated by attrs for class ListedOrigin.

class swh.scheduler.model.LastVisitStatus(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

successful = 'successful'#
failed = 'failed'#
not_found = 'not_found'#
swh.scheduler.model.convert_last_visit_status(s: None | str | LastVisitStatus) LastVisitStatus | None[source]#
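The signature suggests a converter that normalizes `None`, a string, or an enum member into an optional enum member. A minimal sketch of that pattern follows; `VisitStatus` and `to_visit_status` are stand-in names, and the behavior for unknown strings is an assumption.

```python
from enum import Enum
from typing import Optional, Union


class VisitStatus(Enum):
    """Stand-in enum with the same three values as listed above."""

    successful = "successful"
    failed = "failed"
    not_found = "not_found"


def to_visit_status(s: Union[None, str, VisitStatus]) -> Optional[VisitStatus]:
    if s is None:
        return None
    if isinstance(s, VisitStatus):
        return s
    # Lookup by value; Enum raises ValueError for an unknown string.
    return VisitStatus(s)
```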
class swh.scheduler.model.OriginVisitStats(url: str, visit_type: str, last_successful: datetime | None = None, last_visit: datetime | None = None, last_visit_status: None | str | LastVisitStatus = None, last_scheduled: datetime | None = None, last_snapshot: bytes | None = None, next_visit_queue_position: int | None = None, next_position_offset: int = 4, successive_visits: int = 1)[source]#

Bases: BaseSchedulerModel

Represents an aggregated origin visits view.

Method generated by attrs for class OriginVisitStats.

url#
visit_type#
last_successful#
last_visit#
last_visit_status#
last_scheduled#
last_snapshot#
next_visit_queue_position#
next_position_offset#
successive_visits#
check_last_successful(attribute, value)[source]#
check_last_visit(attribute, value)[source]#
class swh.scheduler.model.SchedulerMetrics(lister_id: UUID, visit_type: str, last_update: datetime | None = None, origins_known: int = 0, origins_enabled: int = 0, origins_never_visited: int = 0, origins_with_pending_changes: int = 0)[source]#

Bases: BaseSchedulerModel

Metrics for the scheduler, aggregated by (lister_id, visit_type)

Method generated by attrs for class SchedulerMetrics.

lister_id#
visit_type#
last_update#
origins_known#

Number of known (enabled or disabled) origins

origins_enabled#

Number of origins that were present in the latest listings

origins_never_visited#

Number of enabled origins that have never been visited (according to the visit cache)

origins_with_pending_changes#

Number of enabled origins with known activity (recorded by a lister) since our last visit

class swh.scheduler.model.TaskType(type: str, description: str, backend_name: str, default_interval: timedelta | None = None, min_interval: timedelta | None = None, max_interval: timedelta | None = None, backoff_factor: float | None = None, max_queue_length: int | None = 1000, num_retries: int | None = None, retry_delay: timedelta | None = None)[source]#

Bases: BaseSchedulerModel

Type of schedulable tasks

Method generated by attrs for class TaskType.

type#

Short identifier for the task type

description#

Human-readable task description

backend_name#

Name of the task in the job-running backend

default_interval#

Default interval for newly scheduled tasks

min_interval#

Minimum interval between two runs of a task

max_interval#

Maximum interval between two runs of a task

backoff_factor#

Adjustment factor for the backoff between two task runs

max_queue_length#

Maximum length of the queue for this type of task; defaults to 1000 if not provided

num_retries#

Default number of retries on transient failures

retry_delay#

Retry delay for the task
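To make the relationship between the interval fields concrete, here is one plausible way a backoff factor and the minimum and maximum bounds could combine. This page does not describe the scheduler's actual adjustment rule, so the function name, the `eventful` parameter, and the direction of the adjustment are all assumptions.

```python
from datetime import timedelta


def next_interval(current, backoff_factor, min_interval, max_interval, eventful):
    """Hypothetical interval adjustment, clamped to [min_interval, max_interval]."""
    # Assumption: a run that found changes shortens the interval,
    # a run that found nothing lengthens it.
    if eventful:
        adjusted = current / backoff_factor
    else:
        adjusted = current * backoff_factor
    return max(min_interval, min(max_interval, adjusted))


shorter = next_interval(
    timedelta(days=4), 2.0, timedelta(days=1), timedelta(days=7), eventful=True
)
longer = next_interval(
    timedelta(days=4), 2.0, timedelta(days=1), timedelta(days=7), eventful=False
)
```

With these inputs the uneventful case would grow to eight days, so the maximum bound of seven days applies.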

class swh.scheduler.model.TaskArguments(args: List[Any] = [], kwargs: Dict[str, Any] = {})[source]#

Bases: BaseSchedulerModel

Method generated by attrs for class TaskArguments.

args#
kwargs#
class swh.scheduler.model.Task(type: str, arguments: TaskArguments, next_run: datetime, status: Literal['next_run_not_scheduled', 'next_run_scheduled', 'completed', 'disabled'] = 'next_run_not_scheduled', policy: Literal['recurring', 'oneshot'] = 'recurring', retries_left: int | None = None, id: int | None = None, current_interval: timedelta | None = None, priority: Literal['high', 'normal', 'low'] | None = None)[source]#

Bases: BaseSchedulerModel

Represents a schedulable task

Method generated by attrs for class Task.

type#

Task type

arguments#

Task arguments passed to the underlying job scheduler

next_run#

The next run of this task should be run on or after that time

status#

Status of the task

policy#

Whether the task is one-shot or recurring

retries_left#

The number of “short delay” retries of the task in case of transient failure

id#

Task identifier (populated by database)

current_interval#

The interval between two runs of this task, taking into account the backoff factor

priority#

Priority of the task, either low, normal or high

class swh.scheduler.model.TaskRun(task: int | None, id: int | None = None, backend_id: str | None = None, scheduled: datetime | None = None, started: datetime | None = None, ended: datetime | None = None, metadata: Dict[str, Any] | None = None, status: Literal['scheduled', 'started', 'eventful', 'uneventful', 'failed', 'permfailed', 'lost'] = 'scheduled')[source]#

Bases: BaseSchedulerModel

Represents the execution of a task sent to the job-running backend

Method generated by attrs for class TaskRun.

task#

Task identifier

id#

Task run identifier (populated by database)

backend_id#

ID of the task run in the job-running backend

scheduled#

Scheduled run time for task

started#

Task starting time

ended#

Task ending time

metadata#

Useful metadata for the given task run. For instance, the worker that took on the job, or the logs for the run

status#

Status of the task run