Command-line interface#

Shared command-line interface#

swh scheduler#

Software Heritage Scheduler tools.

By default, use a local scheduler instance (connected directly to the main scheduler database).

Expected configuration:

swh scheduler [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>#

Configuration file. Takes priority over the SWH_CONFIG_FILENAME environment variable, if set.

-d, --database <database>#

Scheduling database DSN (implies cls is ‘postgresql’)

-u, --url <url>#

Scheduler access URL (implies cls is ‘remote’)

--no-stdout#

Do NOT output logs on the console
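The precedence rules stated by these options can be sketched in Python. This is an illustration, not the actual swh-scheduler code: the function names, the "db" and "url" key names, and the error raised when both options are given are assumptions; only the SWH_CONFIG_FILENAME variable and the two cls values come from the option help above.

```python
import os


def resolve_config_file(cli_config_file=None, environ=None):
    """Return the configuration file path to use, or None.

    A value given with --config-file wins over SWH_CONFIG_FILENAME.
    """
    environ = os.environ if environ is None else environ
    return cli_config_file or environ.get("SWH_CONFIG_FILENAME")


def scheduler_backend(database=None, url=None):
    """Map --database / --url to a backend description.

    The "cls" values come from the option help; the "db" and "url" key
    names and the error on conflicting options are assumptions.
    """
    if database and url:
        raise ValueError("--database and --url cannot be combined (assumed)")
    if database:
        return {"cls": "postgresql", "db": database}
    if url:
        return {"cls": "remote", "url": url}
    return None  # fall back to whatever the configuration file declares
```

Calling resolve_config_file("a.yml", {"SWH_CONFIG_FILENAME": "b.yml"}) returns "a.yml", matching the stated priority.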

Commands

add-forge-now

Manipulate add-forge-now requests.

celery-monitor

Monitoring of Celery

journal-client

Keep the origin visits stats table up…

origin

Manipulate listed origins.

rpc-serve

Starts a swh-scheduler API HTTP server.

schedule-recurrent

Starts the scheduler for recurrent visits.

simulator

Scheduler simulator.

start-listener

Starts a swh-scheduler listener service.

start-runner

Starts a swh-scheduler runner service.

start-runner-first-visits

Starts a swh-scheduler runner service for…

task

Manipulate tasks.

task-type

Manipulate task types.

Scheduler task utilities#

swh scheduler task#

Manipulate tasks.

Expected configuration:

swh scheduler task [OPTIONS] COMMAND [ARGS]...

add#

Schedule one task from arguments.

The first argument is the name of the task type. Flag options (policy, priority) are task configuration. Further options are positional and keyword argument(s) of the task, in YAML format. Keyword args are of the form key=value.

Usage sample:


swh scheduler --database 'service=swh-scheduler' \
    task add list-pypi

swh scheduler --database 'service=swh-scheduler' \
    task add list-debian-distribution --policy=oneshot distribution=stretch

Note: if no priority is given, the task has no priority set, which is treated as the lowest priority level.
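When tasks are added from a script, the argument layout described above can be assembled programmatically. The sketch below is only an illustration: the helper name task_add_argv is hypothetical, and encoding non-string values with json.dumps relies on the assumption that such JSON text is accepted as YAML by the command. Global options such as --database are left out and would go before "task".

```python
import json

POLICIES = ("recurring", "oneshot")
PRIORITIES = ("low", "normal", "high")


def _encode(value):
    """Render a task argument; strings are passed through unchanged."""
    return value if isinstance(value, str) else json.dumps(value)


def task_add_argv(task_type, *args, policy=None, priority=None, **kwargs):
    """Build the argument list for `swh scheduler task add`."""
    if policy is not None and policy not in POLICIES:
        raise ValueError(f"policy must be one of {POLICIES}")
    if priority is not None and priority not in PRIORITIES:
        raise ValueError(f"priority must be one of {PRIORITIES}")
    argv = ["swh", "scheduler", "task", "add", task_type]
    if policy is not None:
        argv.append(f"--policy={policy}")
    if priority is not None:
        argv.append(f"--priority={priority}")
    # Positional task arguments first, then key=value keyword arguments.
    argv.extend(_encode(arg) for arg in args)
    argv.extend(f"{key}={_encode(value)}" for key, value in kwargs.items())
    return argv
```

The resulting list can be handed to subprocess.run as-is, which avoids shell quoting issues for values containing spaces.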

swh scheduler task add [OPTIONS] TASK_TYPE_NAME [OPTIONS]...

Options

-p, --policy <policy>#
Options:

recurring | oneshot

-P, --priority <priority>#
Options:

low | normal | high

-n, --next-run <next_run>#

Arguments

TASK_TYPE_NAME#

Required argument

OPTIONS#

Optional argument(s)

list#

List tasks.

swh scheduler task list [OPTIONS]

Options

-i, --task-id <ID>#

List only tasks whose id is ID.

-t, --task-type <TYPE>#

List only tasks of type TYPE

-l, --limit <limit>#

The maximum number of tasks to fetch.

-s, --status <STATUS>#

List tasks whose status is STATUS.

Options:

next_run_not_scheduled | next_run_scheduled | completed | disabled

-p, --policy <policy>#

List tasks whose policy is POLICY.

Options:

recurring | oneshot

-P, --priority <priority>#

List tasks whose priority is PRIORITY.

Options:

all | low | normal | high

-b, --before <DATETIME>#

Limit to tasks supposed to run before the given date.

-a, --after <DATETIME>#

Limit to tasks supposed to run after the given date.

-r, --list-runs#

Also list past executions of each task.

list-pending#

List tasks with no priority that are going to be run.

You can override the number of tasks to fetch with the --limit flag.

swh scheduler task list-pending [OPTIONS] TASK_TYPES...

Options

-l, --limit <num_tasks>#

The maximum number of tasks to fetch

-b, --before <before>#

List all jobs supposed to run before the given date

Arguments

TASK_TYPES#

Required argument(s)

respawn#

Respawn tasks.

Respawn tasks given by their ids (see the ‘task list’ command to find task ids) at the given date (immediately by default).

For example:


swh scheduler task respawn 1 3 12

swh scheduler task respawn [OPTIONS] TASK_IDS...

Options

-n, --next-run <DATETIME>#

Respawn the selected tasks at this date

Arguments

TASK_IDS#

Required argument(s)

schedule#

Schedule tasks from a CSV input file.

The following columns are expected, and can be set through the -c option:

- type: the type of the task to be scheduled (mandatory)
- args: the arguments passed to the task (JSON list, defaults to an empty list)
- kwargs: the keyword arguments passed to the task (JSON object, defaults to an empty dict)
- next_run: the date at which the task should run (datetime, defaults to now)

The CSV can be read either from a named file, or from stdin (use - as filename).
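A file in this layout can be produced with Python's csv and json modules. This is a minimal sketch, assuming the command parses the input with a standard CSV reader (so quoted fields are understood); the helper name tasks_to_csv and the "load-git" task type in the usage note are hypothetical.

```python
import csv
import io
import json


def tasks_to_csv(tasks, columns=("type", "kwargs", "policy"), delimiter=";"):
    """Serialize task dicts to CSV text, one row per task.

    `columns` must match the --columns options given to the command, in
    the same order. args and kwargs cells are JSON-encoded.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for task in tasks:
        row = []
        for column in columns:
            value = task[column]  # every listed column must be present
            if column in ("args", "kwargs"):
                value = json.dumps(value)
            row.append(value)
        writer.writerow(row)
    return buffer.getvalue()
```

For instance, tasks_to_csv([{"type": "load-git", "kwargs": {"url": "https://example.org/repo.git"}, "policy": "oneshot"}]) yields one row whose second cell is a quoted JSON object; the text can be written to a file or piped to the command on stdin.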

Usage sample:


cat scheduling-task.txt | \
    python3 -m swh.scheduler.cli \
        --database 'service=swh-scheduler-dev' \
        task schedule \
            --columns type --columns kwargs --columns policy \
            --delimiter ';' -

swh scheduler task schedule [OPTIONS] FILE

Options

-c, --columns <columns>#

columns present in the CSV file

Options:

type | args | kwargs | policy | next_run

-d, --delimiter <delimiter>#

Arguments

FILE#

Required argument

schedule_origins#

Schedules tasks for origins that are already known.

The first argument is the name of the task type, further ones are keyword argument(s) of the task in the form key=value, where value is in YAML format.

Usage sample:


swh scheduler --database 'service=swh-scheduler' \
    task schedule_origins index-origin-metadata

swh scheduler task schedule_origins [OPTIONS] TYPE [OPTIONS]...

Options

-b, --batch-size <origin_batch_size>#

Number of origins per task

Default:

10

--page-token <page_token>#

Only schedule tasks for origins whose ID is greater than this page token

Default:

0

--limit <limit>#

Schedule at most this number of tasks

-g, --storage-url <storage_url>#

URL of the (graph) storage API

--dry-run, --no-dry-run#

List only what would be scheduled.

Arguments

TYPE#

Required argument

OPTIONS#

Optional argument(s)
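The effect of --batch-size ("Number of origins per task") can be pictured with a small chunking helper. This only illustrates the grouping arithmetic and is not the command's actual implementation; the function name batch_origins is hypothetical.

```python
from itertools import islice


def batch_origins(origins, batch_size=10):
    """Yield lists of at most `batch_size` origins, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(origins)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

With the default batch size of 10, 25 origins are grouped into three batches of 10, 10 and 5 origins, each of which would correspond to one scheduled task.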

swh scheduler task_type#

Manipulate task types.

Expected configuration:

swh scheduler task_type [OPTIONS] COMMAND [ARGS]...

add#

Create a new task type

swh scheduler task_type add [OPTIONS] TYPE TASK_NAME DESCRIPTION

Options

-i, --default-interval <default_interval>#

Default interval (“90 days” by default)

--min-interval <min_interval>#

Minimum interval (default interval if not set)

-i, --max-interval <max_interval>#

Maximum interval (default interval if not set)

-f, --backoff-factor <backoff_factor>#

Backoff factor

Arguments

TYPE#

Required argument

TASK_NAME#

Required argument

DESCRIPTION#

Required argument

list#

swh scheduler task_type list [OPTIONS]

Options

-v, --verbose#

Verbose mode

-t, --task_type <task_type>#

List task types of given type

-n, --task_name <task_name>#

List task types of given backend task name

register#

Register missing task-type entries in the scheduler.

Entries are derived from the tasks declared by each loaded worker plugin (e.g. lister, loader, …).

swh scheduler task_type register [OPTIONS]

Options

-p, --plugins <plugins>#

Register task types for the given plugins. Defaults to all.

Scheduler server utilities#

swh scheduler runner#

Starts a swh-scheduler runner service.

This process checks for ready-to-run tasks and schedules them.

Expected configuration:

swh scheduler runner [OPTIONS]

Options

-p, --period <period>#

Period (in seconds) at which pending tasks are checked and executed. Set to 0 (default) for a one-shot run.

--task-type <task_type_names>#

Task types to schedule. If not provided, the runner iterates over every task type referenced in the scheduler backend.

--with-priority, --without-priority#

Determine whether to process tasks with a priority or without one. By default, only tasks without any priority are processed.
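The --period semantics (0 means a single pass, any other value means repeated passes) amount to a simple polling loop. The following is a sketch under stated assumptions, not the runner's code: run_once stands in for "check and schedule pending tasks", and the sleep and max_rounds parameters are hypothetical additions that make the loop easy to test.

```python
import time


def run_periodically(run_once, period=0, sleep=time.sleep, max_rounds=None):
    """Call `run_once` once if period is 0, else every `period` seconds.

    Returns the number of rounds executed. `max_rounds` bounds the loop
    for demonstration purposes; a real service would loop until stopped.
    """
    rounds = 0
    while True:
        run_once()
        rounds += 1
        if period == 0:  # one-shot mode
            break
        if max_rounds is not None and rounds >= max_rounds:
            break
        sleep(period)
    return rounds
```

Passing a fake sleep function, as the signature allows, lets the periodic branch be exercised without waiting.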

swh scheduler listener#

Starts a swh-scheduler listener service.

This service listens to task lifecycle events and updates the corresponding workflow status in the database.

Expected configuration:

swh scheduler listener [OPTIONS]

swh scheduler rpc-serve#

Starts a swh-scheduler API HTTP server.

Expected configuration:

swh scheduler rpc-serve [OPTIONS]

Options

--host <host>#

Host on which to run the scheduler server API

--port <port>#

Binding port of the server

--debug, --nodebug#

Indicates if the server should run in debug mode. Defaults to True if log-level is DEBUG, False otherwise.

swh scheduler celery-monitor#

Monitoring of Celery

swh scheduler celery-monitor [OPTIONS] COMMAND [ARGS]...

Options

--timeout <timeout>#

Timeout for celery remote control

--pattern <pattern>#

Celery destination pattern

list-running#

List running tasks on the lister workers

swh scheduler celery-monitor list-running [OPTIONS]

Options

--format <format>#

Output format

Options:

pretty | csv

ping-workers#

Check which workers respond to the celery remote control

swh scheduler celery-monitor ping-workers [OPTIONS]