Command-line interface

Shared command-line interface

swh scheduler

Software Heritage Scheduler tools.

Use a local scheduler instance by default (plugged to the main scheduler db).

swh scheduler [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

-d, --database <database>

Scheduling database DSN (implies cls is ‘local’)

-u, --url <url>

Scheduler’s access URL (implies cls is ‘remote’)

--no-stdout

Do NOT output logs on the console

Commands

celery-monitor

Monitoring of Celery

journal-client

Keep the origin visits stats table up to…

origin

Manipulate listed origins.

rpc-serve

Starts a swh-scheduler API HTTP server.

simulator

Scheduler simulator.

start-listener

Starts a swh-scheduler listener service.

start-runner

Starts a swh-scheduler runner service.

task

Manipulate tasks.

task-type

Manipulate task types.

Scheduler task utilities

swh scheduler task

Manipulate tasks.

swh scheduler task [OPTIONS] COMMAND [ARGS]...

add

Schedule one task from arguments.

The first argument is the name of the task type, further ones are positional and keyword argument(s) of the task, in YAML format. Keyword args are of the form key=value.

Usage sample:

swh-scheduler --database 'service=swh-scheduler' task add list-pypi

swh-scheduler --database 'service=swh-scheduler' task add list-debian-distribution --policy=oneshot distribution=stretch

Note: if no priority is given, the task has no priority set, which is treated as the lowest priority level.

swh scheduler task add [OPTIONS] TYPE [OPTIONS]...

Options

-p, --policy <policy>
Options

recurring | oneshot

-P, --priority <priority>
Options

low | normal | high

-n, --next-run <next_run>

Arguments

TYPE

Required argument

OPTIONS

Optional argument(s)
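As a rough sketch of how an `add` invocation might be assembled from a script, the helper below builds an argument list using only the options documented above (`--policy`, `--priority`) followed by the task type and its `key=value` arguments. The helper name `build_task_add_argv` is hypothetical and not part of swh-scheduler; global options such as `--database` are left out and would go before `task`.

```python
import shlex


def build_task_add_argv(task_type, policy=None, priority=None, **kwargs):
    """Build an argv list for `swh scheduler task add`.

    Illustrative helper only (not part of swh-scheduler).
    """
    argv = ["swh", "scheduler", "task", "add"]
    if policy is not None:
        argv += ["--policy", policy]      # recurring | oneshot
    if priority is not None:
        argv += ["--priority", priority]  # low | normal | high
    argv.append(task_type)
    # Keyword arguments of the task are passed as key=value.
    argv += [f"{key}={value}" for key, value in kwargs.items()]
    return argv


argv = build_task_add_argv(
    "list-debian-distribution", policy="oneshot", distribution="stretch"
)
print(shlex.join(argv))
# prints: swh scheduler task add --policy oneshot list-debian-distribution distribution=stretch
```

The resulting list could be handed to `subprocess.run(argv)` on a machine where the `swh` command is installed.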

archive

Archive task/task_run whose (task_type is ‘oneshot’ and task_status is ‘completed’) or (task_type is ‘recurring’ and task_status is ‘disabled’).

With --dry-run flag set (default), only list those.

swh scheduler task archive [OPTIONS]

Options

-b, --before <before>

Tasks whose end date is before the specified date will be archived. Defaults to the first day of the current month.

-a, --after <after>

Tasks whose end date is after the specified date will be archived. Defaults to the first day of the previous month.

--batch-index <batch_index>

Batch size of tasks to read from db to archive

--bulk-index <bulk_index>

Batch size of tasks to bulk index

--batch-clean <batch_clean>

Batch size of tasks to clean after archival

--dry-run, --no-dry-run

Defaults to listing only what would be archived.

--verbose

Verbose mode

--cleanup, --no-cleanup

Clean up archived tasks (default)

--start-from <start_from>

(Optional) default page to start from.
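The default `--after` and `--before` values described above (first day of the previous month and first day of the current month) can be illustrated with a small calculation. This is only a sketch of the documented defaults; the function name is hypothetical, and whether the command treats the bounds as inclusive is not stated here.

```python
from datetime import date


def default_archive_window(today):
    """Return (after, before) as the first day of the previous
    month and the first day of the current month."""
    before = today.replace(day=1)
    if before.month == 1:
        # January: the previous month is December of the prior year.
        after = before.replace(year=before.year - 1, month=12)
    else:
        after = before.replace(month=before.month - 1)
    return after, before


after, before = default_archive_window(date(2021, 3, 17))
print(after, before)  # prints: 2021-02-01 2021-03-01
```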

list

List tasks.

swh scheduler task list [OPTIONS]

Options

-i, --task-id <ID>

List only tasks whose id is ID.

-t, --task-type <TYPE>

List only tasks of type TYPE

-l, --limit <limit>

The maximum number of tasks to fetch.

-s, --status <STATUS>

List tasks whose status is STATUS.

Options

next_run_not_scheduled | next_run_scheduled | completed | disabled

-p, --policy <policy>

List tasks whose policy is POLICY.

Options

recurring | oneshot

-P, --priority <priority>

List tasks whose priority is PRIORITY.

Options

all | low | normal | high

-b, --before <DATETIME>

Limit to tasks supposed to run before the given date.

-a, --after <DATETIME>

Limit to tasks supposed to run after the given date.

-r, --list-runs

Also list past executions of each task.

list-pending

List tasks with no priority that are going to be run.

You can override the number of tasks to fetch with the --limit flag.

swh scheduler task list-pending [OPTIONS] TASK_TYPES...

Options

-l, --limit <num_tasks>

The maximum number of tasks to fetch

-b, --before <before>

List all jobs supposed to run before the given date

Arguments

TASK_TYPES

Required argument(s)

respawn

Respawn tasks.

Respawn tasks given by their ids (see the ‘task list’ command to find task ids) at the given date (immediately by default).

E.g.:

swh-scheduler task respawn 1 3 12

swh scheduler task respawn [OPTIONS] TASK_IDS...

Options

-n, --next-run <DATETIME>

Respawn the selected tasks at this date

Arguments

TASK_IDS

Required argument(s)

schedule

Schedule tasks from a CSV input file.

The following columns are expected, and can be set through the -c option:

  • type: the type of the task to be scheduled (mandatory)

  • args: the arguments passed to the task (JSON list, defaults to an empty list)

  • kwargs: the keyword arguments passed to the task (JSON object, defaults to an empty dict)

  • next_run: the date at which the task should run (datetime, defaults to now)

The CSV can be read either from a named file, or from stdin (use - as filename).

Usage sample:

cat scheduling-task.txt | python3 -m swh.scheduler.cli --database 'service=swh-scheduler-dev' task schedule --columns type --columns kwargs --columns policy --delimiter ';' -

swh scheduler task schedule [OPTIONS] FILE

Options

-c, --columns <columns>

Columns present in the CSV file

Options

type | args | kwargs | policy | next_run

-d, --delimiter <delimiter>

Arguments

FILE

Required argument
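To make the expected input for `schedule` more concrete, here is a minimal sketch that writes a `;`-delimited CSV with the columns `type`, `kwargs` and `policy`, with `kwargs` encoded as a JSON object as described above. The task types and arguments are placeholders, and the sketch assumes the command reads fields with standard CSV quoting.

```python
import csv
import io
import json

# Placeholder tasks; real task types depend on what is registered.
tasks = [
    {"type": "list-pypi", "kwargs": {}, "policy": "oneshot"},
    {
        "type": "list-debian-distribution",
        "kwargs": {"distribution": "stretch"},
        "policy": "oneshot",
    },
]

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=";", lineterminator="\n")
for task in tasks:
    # Column order must match the --columns options given on the command line.
    writer.writerow([task["type"], json.dumps(task["kwargs"]), task["policy"]])

csv_text = buffer.getvalue()
print(csv_text, end="")
```

The text in `csv_text` could be saved to a file or piped to the command on stdin, passing `--columns type --columns kwargs --columns policy --delimiter ';'` in the same order as the columns written here.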

schedule_origins

Schedules tasks for origins that are already known.

The first argument is the name of the task type, further ones are keyword argument(s) of the task in the form key=value, where value is in YAML format.

Usage sample:

swh-scheduler --database 'service=swh-scheduler' task schedule_origins index-origin-metadata

swh scheduler task schedule_origins [OPTIONS] TYPE [OPTIONS]...

Options

-b, --batch-size <origin_batch_size>

Number of origins per task

Default

10

--page-token <page_token>

Only schedule tasks for origins whose ID is greater

Default

0

--limit <limit>

Limit scheduling to at most this number of tasks

-g, --storage-url <storage_url>

URL of the (graph) storage API

--dry-run, --no-dry-run

List only what would be scheduled.

Arguments

TYPE

Required argument

OPTIONS

Optional argument(s)
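The interaction between `--batch-size` (origins per task, 10 by default) and `--limit` (maximum number of tasks) can be pictured with the sketch below. It is an illustration of the documented options only, not the command's actual implementation; `batch_origins` and the integer origin IDs are hypothetical.

```python
from itertools import islice


def batch_origins(origin_ids, batch_size=10, limit=None):
    """Yield lists of at most `batch_size` origins, one list per task,
    stopping after `limit` tasks when a limit is given."""
    iterator = iter(origin_ids)
    produced = 0
    while limit is None or produced < limit:
        batch = list(islice(iterator, batch_size))
        if not batch:
            break  # no origins left
        yield batch
        produced += 1


batches = list(batch_origins(range(1, 26)))
print([len(batch) for batch in batches])  # prints: [10, 10, 5]
```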

swh scheduler task_type

Manipulate task types.

swh scheduler task_type [OPTIONS] COMMAND [ARGS]...

add

Create a new task type

swh scheduler task_type add [OPTIONS] TYPE TASK_NAME DESCRIPTION

Options

-i, --default-interval <default_interval>

Default interval (“90 days” by default)

--min-interval <min_interval>

Minimum interval (default interval if not set)

-i, --max-interval <max_interval>

Maximal interval (default interval if not set)

-f, --backoff-factor <backoff_factor>

Backoff factor

Arguments

TYPE

Required argument

TASK_NAME

Required argument

DESCRIPTION

Required argument
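The fallback rules for the interval options above can be summarized in a few lines: the default interval is “90 days” unless given, and the minimum and maximum intervals fall back to the default interval when not set. The function below is a hypothetical illustration of those rules, not code from swh-scheduler.

```python
def resolve_intervals(default_interval="90 days", min_interval=None, max_interval=None):
    """Apply the documented fallbacks for task type intervals."""
    return {
        "default_interval": default_interval,
        # Unset bounds fall back to the default interval.
        "min_interval": min_interval if min_interval is not None else default_interval,
        "max_interval": max_interval if max_interval is not None else default_interval,
    }


print(resolve_intervals(min_interval="1 day"))
# prints: {'default_interval': '90 days', 'min_interval': '1 day', 'max_interval': '90 days'}
```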

list

swh scheduler task_type list [OPTIONS]

Options

-v, --verbose

Verbose mode

-t, --task_type <task_type>

List task types of given type

-n, --task_name <task_name>

List task types of given backend task name

register

Register missing task-type entries in the scheduler, according to the tasks declared in each loaded worker plugin (e.g. lister, loader, …).

swh scheduler task_type register [OPTIONS]

Options

-p, --plugins <plugins>

Registers task-types for provided plugins. Defaults to all

Options

all | loader.svn | loader.mercurial | loader.mercurial_from_disk | loader.git | loader.git_disk | loader.archive | loader.cran | loader.debian | loader.deposit | loader.nixguix | loader.npm | loader.pypi | lister.bitbucket | lister.cgit | lister.cran | lister.debian | lister.gitea | lister.github | lister.gitlab | lister.gnu | lister.launchpad | lister.npm | lister.packagist | lister.phabricator | lister.pypi | lister.sourceforge | deposit.worker

Scheduler server utilities

swh scheduler runner

Starts a swh-scheduler runner service.

This process is responsible for checking for ready-to-run tasks and scheduling them.

swh scheduler runner [OPTIONS]

Options

-p, --period <period>

Period (in seconds) at which pending tasks are checked and executed. Set to 0 (default) for a single run.

swh scheduler listener

Starts a swh-scheduler listener service.

This service is responsible for listening to task lifecycle events and updating their workflow status in the database.

swh scheduler listener [OPTIONS]

swh scheduler rpc-serve

Starts a swh-scheduler API HTTP server.

swh scheduler rpc-serve [OPTIONS]

Options

--host <host>

Host on which to run the scheduler server API

--port <port>

Binding port of the server

--debug, --nodebug

Indicates if the server should run in debug mode. Defaults to True if log-level is DEBUG, False otherwise.

swh scheduler celery-monitor

Monitoring of Celery

swh scheduler celery-monitor [OPTIONS] COMMAND [ARGS]...

Options

--timeout <timeout>

Timeout for celery remote control

--pattern <pattern>

Celery destination pattern

list-running

List running tasks on the lister workers

swh scheduler celery-monitor list-running [OPTIONS]

Options

--format <format>

Output format

Options

pretty | csv

ping-workers

Check which workers respond to the celery remote control

swh scheduler celery-monitor ping-workers [OPTIONS]