Command-line interface

Shared command-line interface

swh

Command line interface for Software Heritage.

swh [OPTIONS] COMMAND [ARGS]...

Options

-l, --log-level <log_level>

Log level (defaults to INFO).

Options

NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL

--log-config <log_config>

Python YAML logging configuration file.

--sentry-dsn <sentry_dsn>

DSN of the Sentry instance to report to

auth

Authenticate Software Heritage users with OpenID Connect.

This CLI tool eases the retrieval of a bearer token to authenticate a user querying the Software Heritage Web API.

swh auth [OPTIONS] COMMAND [ARGS]...

Options

--oidc-server-url <oidc_server_url>

URL of the OpenID Connect server (defaults to “https://auth.softwareheritage.org/auth/”)

--realm-name <realm_name>

Name of the OpenID Connect authentication realm (defaults to “SoftwareHeritage”)

--client-id <client_id>

OpenID Connect client identifier in the realm (defaults to “swh-web”)

generate-token

Generate a new bearer token for Web API authentication.

Log in with USERNAME, create a new OpenID Connect session and get a bearer token.

The user will be prompted for their password, and the token will be printed to standard output.

The created OpenID Connect session is an offline one, so the provided token has a much longer expiration time than classical OIDC sessions (usually several dozen days).
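
For instance, a minimal sketch (the username is illustrative; the second command assumes the Web API accepts the token in a standard bearer Authorization header):

$ swh auth generate-token jdoe
Password:
<bearer token printed on standard output>
$ curl -H "Authorization: Bearer <token>" https://archive.softwareheritage.org/api/1/origin/search/python/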

swh auth generate-token [OPTIONS] USERNAME

Arguments

USERNAME

Required argument

login

Alias for ‘generate-token’

swh auth login [OPTIONS] USERNAME

Arguments

USERNAME

Required argument

logout

Alias for ‘revoke-token’

swh auth logout [OPTIONS] TOKEN

Arguments

TOKEN

Required argument

revoke-token

Revoke a bearer token used for Web API authentication.

Use TOKEN to logout from an offline OpenID Connect session.

The token is definitively revoked after this operation.

swh auth revoke-token [OPTIONS] TOKEN

Arguments

TOKEN

Required argument

db

Software Heritage database generic tools.

swh db [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

create

Create a database for the Software Heritage <module>, and potentially execute superuser-level initialization steps.

Example:

swh db create -d swh-test storage

If you want to specify non-default postgresql connection parameters, please provide them using standard environment variables or by means of a properly crafted libpq connection URI. See the psql(1) man page (section ENVIRONMENT) for details.

Note: this command requires a postgresql connection with superuser permissions.

Examples:

PGPORT=5434 swh db create indexer
swh db create -d postgresql://superuser:passwd@pghost:5433/swh-storage storage

swh db create [OPTIONS] MODULE

Options

-d, --db-name <db_name>

Database name.

Default

softwareheritage-dev

-T, --template <template>

Template database from which to build this database.

Default

template1

Arguments

MODULE

Required argument

init

Initialize a database for the Software Heritage <module>.

Example:

swh db init -d swh-test storage

If you want to specify non-default postgresql connection parameters, please provide them using standard environment variables. See the psql(1) man page (section ENVIRONMENT) for details.

Examples:

PGPORT=5434 swh db init indexer
swh db init -d postgresql://user:passwd@pghost:5433/swh-storage storage
swh db init --flavor read_replica -d swh-storage storage

swh db init [OPTIONS] MODULE

Options

-d, --db-name <db_name>

Database name.

Default

softwareheritage-dev

--flavor <flavor>

Database flavor.

Arguments

MODULE

Required argument

init-admin

Execute superuser-level initialization steps (e.g. pg extensions, admin functions, …)

Example:

PGPASSWORD=… swh db init-admin -d swh-test scheduler

If you want to specify non-default postgresql connection parameters, please provide them using standard environment variables or by means of a properly crafted libpq connection URI. See the psql(1) man page (section ENVIRONMENT) for details.

Note: this command requires a postgresql connection with superuser permissions (e.g. postgres, swh-admin, …)

Examples:

PGPORT=5434 swh db init-admin scheduler
swh db init-admin -d postgresql://superuser:passwd@pghost:5433/swh-scheduler scheduler

swh db init-admin [OPTIONS] MODULE

Options

-d, --db-name <db_name>

Database name.

Default

softwareheritage-dev

Arguments

MODULE

Required argument

deposit

Deposit main command

swh deposit [OPTIONS] COMMAND [ARGS]...
admin

Server administration tasks (manipulate user or collections)

swh deposit admin [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Optional extra configuration file.

--platform <platform>

development or production platform

Options

development|production

collection

Manipulate collections.

swh deposit admin collection [OPTIONS] COMMAND [ARGS]...
create
swh deposit admin collection create [OPTIONS]

Options

--name <name>

Required Collection’s name

list

List existing collections.

This entry point is not paginated yet, as there are not many entries.

swh deposit admin collection list [OPTIONS]
deposit

Manipulate deposit.

swh deposit admin deposit [OPTIONS] COMMAND [ARGS]...
reschedule

Reschedule the deposit loading

This will:

  • check that the deposit’s status is a reasonable one (failed or done). This means the checks passed but something went wrong during the loading (failed: the loading failed; done: the loading succeeded but, for some reason such as a bug, it needs to be rescheduled)

  • reset the deposit’s status to ‘verified’ (prior to any loading but after the checks, which are fine) and remove the different archives’ identifiers (swh-id, …)

  • trigger back the loading task through the scheduler

swh deposit admin deposit reschedule [OPTIONS]

Options

--deposit-id <deposit_id>

Required Deposit identifier

user

Manipulate user.

swh deposit admin user [OPTIONS] COMMAND [ARGS]...
create

Create a user with the needed information (password, collection).

If the collection does not exist, it is created alongside the user.

The password is stored encrypted using Django’s utilities.
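
Example (all values are illustrative placeholders):

swh deposit admin user create --username hal --password secret --firstname Jane --lastname Doe --email jane@example.org --collection hal --provider-url https://hal.example.org --domain example.org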

swh deposit admin user create [OPTIONS]

Options

--username <username>

Required User’s name

--password <password>

Required Desired user’s password (plain).

--firstname <firstname>

User’s first name

--lastname <lastname>

User’s last name

--email <email>

User’s email

--collection <collection>

User’s collection

--provider-url <provider_url>

Provider URL

--domain <domain>

The domain

exists

Check if user exists.

swh deposit admin user exists [OPTIONS] USERNAME

Arguments

USERNAME

Required argument

list

List existing users.

This entry point is not paginated yet, as there are not many entries.

swh deposit admin user list [OPTIONS]
status

Deposit’s status
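
Example (credentials and the deposit identifier are placeholders):

swh deposit status --username hal --password secret --deposit-id 123 --format json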

swh deposit status [OPTIONS]

Options

--url <url>

(Optional) Deposit server API endpoint. Defaults to https://deposit.softwareheritage.org/1

--username <username>

Required User’s name

--password <password>

Required User’s associated password

--deposit-id <deposit_id>

Required Deposit identifier.

-f, --format <output_format>

Output format results.

Options

logging|yaml|json

upload

Software Heritage Public Deposit Client

Create/Update deposit through the command line.

More documentation can be found at https://docs.softwareheritage.org/devel/swh-deposit/getting-started.html.
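
Example (credentials and file names are placeholders; the metadata file follows the <archive>.metadata.xml convention described below):

swh deposit upload --username hal --password secret --archive project.tar.gz --metadata project.metadata.xml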

swh deposit upload [OPTIONS]

Options

--username <username>

Required User’s name

--password <password>

Required User’s associated password

--archive <archive>

(Optional) Software archive to deposit

--metadata <metadata>

(Optional) Path to the XML metadata file. If not provided, this will use a file named <archive>.metadata.xml

--archive-deposit, --no-archive-deposit

Deprecated (ignored)

--metadata-deposit, --no-metadata-deposit

Deprecated (ignored)

--collection <collection>

(Optional) User’s collection. If not provided, this will be fetched.

--slug <slug>

(Optional) External system information identifier. If not provided, it will be generated

--partial, --no-partial

(Optional) The deposit will be partial; other deposits will have to take place to finalize it.

--deposit-id <deposit_id>

(Optional) Update an existing partial deposit with its identifier

--swhid <swhid>

(Optional) Update existing completed deposit (status done) with new metadata

--replace, --no-replace

(Optional) Update a deposit by replacing its existing metadata

--url <url>

(Optional) Deposit server API endpoint. Defaults to https://deposit.softwareheritage.org/1

--verbose, --no-verbose

Verbose mode

--name <name>

Software name

--author <author>

Software author(s), this can be repeated as many times as there are authors

-f, --format <output_format>

Output format results.

Options

logging|yaml|json

fs

Software Heritage virtual file system

swh fs [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file (default: ~/.config/swh/global.yml)

clean

Clean on-disk cache(s).

swh fs clean [OPTIONS]
mount

Mount the Software Heritage virtual file system at PATH.

If specified, objects referenced by the given SWHIDs will be prefetched and used to populate the virtual file system (VFS). Otherwise the VFS will be populated on-demand, when accessing its content.

Example:
$ mkdir swhfs
$ swh fs mount swhfs/
$ grep printf swhfs/archive/swh:1:cnt:c839dea9e8e6f0528b468214348fee8669b305b2
printf("Hello, World!");
$
swh fs mount [OPTIONS] PATH [SWHID]...

Options

-f, --foreground, -d, --daemon

whether to run FUSE attached to the console (foreground) or daemonized in the background (default: daemon)

Arguments

PATH

Required argument

[SWHID]...

Optional argument(s)

umount

Unmount a mounted virtual file system.

Note: this is equivalent to fusermount -u PATH, which can be used to unmount any FUSE-based virtual file system. See man fusermount3.

swh fs umount [OPTIONS] PATH

Arguments

PATH

Required argument

identify

Compute the Software Heritage persistent identifier (SWHID) for the given source code object(s).

For more details about SWHIDs, see the SWHID documentation.

Tip: you can pass “-” to identify the content of standard input.

Examples:
$ swh identify fork.c kmod.c sched/deadline.c
swh:1:cnt:2e391c754ae730bd2d8520c2ab497c403220c6e3 fork.c
swh:1:cnt:0277d1216f80ae1adeed84a686ed34c9b2931fc2 kmod.c
swh:1:cnt:57b939c81bce5d06fa587df8915f05affbe22b82 sched/deadline.c
$ swh identify --no-filename /usr/src/linux/kernel/
swh:1:dir:f9f858a48d663b3809c9e2f336412717496202ab
$ swh identify --type snapshot helloworld.git/
swh:1:snp:510aa88bdc517345d258c1fc2babcd0e1f905e93 helloworld.git
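
As noted in the tip above, “-” identifies the content read from standard input; combined with --no-filename, only the SWHID is printed (here the same content as fork.c above):

$ cat fork.c | swh identify --no-filename -
swh:1:cnt:2e391c754ae730bd2d8520c2ab497c403220c6e3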
swh identify [OPTIONS] OBJECTS...

Options

--dereference, --no-dereference

follow (or not) symlinks for OBJECTS passed as arguments (default: follow)

--filename, --no-filename

show/hide file name (default: show)

-t, --type <obj_type>

type of object to identify (default: auto)

Options

auto|content|directory|origin|snapshot

-x, --exclude <PATTERN>

Exclude directories using glob patterns (e.g., ‘*.git’ to exclude all .git directories)

-v, --verify <SWHID>

reference identifier to be compared with the computed one

Arguments

OBJECTS

Required argument(s)

indexer

Software Heritage Indexer tools.

The Indexer is used to mine the content of the archive and extract derived information from archive source code artifacts.

swh indexer [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

journal-client

Listens for new objects from the SWH Journal, and schedules tasks to run relevant indexers (currently, only origin-intrinsic-metadata) on these new objects.
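
A minimal invocation sketch (the broker address, topic prefix, consumer group and scheduler URL are placeholders to adapt to the deployment at hand):

swh indexer journal-client --broker kafka1.example.org:9092 --prefix swh.journal.objects --group-id swh-indexer-journal-client -s http://scheduler.example.org:5008/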

swh indexer journal-client [OPTIONS]

Options

-s, --scheduler-url <scheduler_url>

URL of the scheduler API

--origin-metadata-task-type <origin_metadata_task_type>

Name of the task running the origin metadata indexer.

--broker <brokers>

Kafka broker to connect to.

--prefix <prefix>

Prefix of Kafka topic names to read from.

--group-id <group_id>

Consumer/group id for reading from Kafka.

-m, --stop-after-objects <stop_after_objects>

Maximum number of objects to replay. Default is to run forever.

mapping

Manage Software Heritage Indexer mappings.

swh indexer mapping [OPTIONS] COMMAND [ARGS]...
list

Prints the list of known mappings.

swh indexer mapping list [OPTIONS]
list-terms

Prints the list of known CodeMeta terms, and which mappings support them.

swh indexer mapping list-terms [OPTIONS]

Options

--exclude-mapping <exclude_mapping>

Exclude the given mapping from the output

--concise

Don’t print the list of mappings supporting each term.

translate

Translate a metadata file to CodeMeta using the given mapping.
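
Example sketch (the mapping name and input file are illustrative, assuming the npm mapping and a package.json metadata file):

swh indexer mapping translate npm package.json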

swh indexer mapping translate [OPTIONS] MAPPING_NAME FILE

Arguments

MAPPING_NAME

Required argument

FILE

Required argument

rpc-serve

Starts a Software Heritage Indexer RPC HTTP server.

swh indexer rpc-serve [OPTIONS] CONFIG_PATH

Options

--host <host>

Host to run the server

--port <port>

Binding port of the server

--debug, --nodebug

Indicates if the server should run in debug mode

Arguments

CONFIG_PATH

Required argument

schedule

Manipulate Software Heritage Indexer tasks.

Via SWH Scheduler’s API.

swh indexer schedule [OPTIONS] COMMAND [ARGS]...

Options

-s, --scheduler-url <scheduler_url>

URL of the scheduler API

-i, --indexer-storage-url <indexer_storage_url>

URL of the indexer storage API

-g, --storage-url <storage_url>

URL of the (graph) storage API

--dry-run, --no-dry-run

List only what would be scheduled.

reindex_origin_metadata

Schedules indexing tasks for origins that were already indexed.

swh indexer schedule reindex_origin_metadata [OPTIONS]

Options

-b, --batch-size <origin_batch_size>

Number of origins per task

Default

10

-t, --tool-id <tool_ids>

Restrict search of old metadata to this/these tool ids.

-m, --mapping <mappings>

Mapping(s) that should be re-scheduled (e.g. ‘npm’, ‘gemspec’, ‘maven’)

--task-type <task_type>

Name of the task type to schedule.

Default

index-origin-metadata

lister

Software Heritage Lister tools.

swh lister [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

-d, --db-url <db_url>

SQLAlchemy DB URL; see <http://docs.sqlalchemy.org/en/latest/core/engines.html#database-urls>

db-init

Initialize the database model for given listers.

swh lister db-init [OPTIONS]

Options

-D, --drop-tables

Drop tables before creating the database schema

run

Trigger a full listing run for a particular forge instance. The listing results in “oneshot” tasks in the scheduler db, with a priority defined by the user.
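
Example (the chosen lister and priority are illustrative; see the options below for accepted values):

swh lister run --lister npm --priority high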

swh lister run [OPTIONS] [OPTIONS]...

Options

-l, --lister <lister>

Lister to run

Options

bitbucket|cgit|cran|debian|gitea|github|gitlab|gnu|launchpad|npm|packagist|phabricator|pypi

-p, --priority <priority>

Task priority for the listed repositories to ingest

Options

high|medium|low

Arguments

OPTIONS

Optional argument(s)

loader

Loader cli tools

swh loader [OPTIONS] COMMAND [ARGS]...
list

List supported loaders and optionally their arguments

swh loader list [OPTIONS] [[all|archive|cran|debian|deposit|git|git_disk|mercurial|nixguix|npm|pypi|svn]]

Arguments

TYPE

Optional argument

run

Ingest the origin located at <url> with the loader <type>.
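
Example (the loader type is taken from the list below; the repository URL is a placeholder):

swh loader run git https://forge.example.org/user/repo.git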

swh loader run [OPTIONS] [archive|cran|debian|deposit|git|git_disk|mercurial|nixguix|npm|pypi|svn] URL [OPTIONS]...

Arguments

TYPE

Required argument

URL

Required argument

OPTIONS

Optional argument(s)

objstorage

Software Heritage Objstorage tools.

swh objstorage [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

fsck

Check that the objstorage is not corrupted.

swh objstorage fsck [OPTIONS]
import

Import a local directory into an existing objstorage.

swh objstorage import [OPTIONS] DIRECTORY...

Arguments

DIRECTORY

Required argument(s)

replay

Fill a destination Object Storage using a journal stream.

This is typically used for a mirror configuration, by reading a Journal and retrieving objects from an existing source ObjStorage.

There can be several ‘replayers’ filling a given ObjStorage as long as they use the same group-id. You can use the KAFKA_GROUP_INSTANCE_ID environment variable to use KIP-345 static group membership.

This service retrieves object ids to copy from the ‘content’ topic. It will only copy an object’s content if the object’s description in the kafka message has the status set to visible.

--exclude-sha1-file may be used to exclude some hashes to speed up the replay in case many of the contents are already in the destination objstorage. It must contain a concatenation of all (sha1) hashes, and it must be sorted. This file will not be fully loaded into memory at any given time, so it can be arbitrarily large.

--check-dst sets whether the replayer should check in the destination ObjStorage before copying an object. You can turn that off if you know you’re copying to an empty ObjStorage.
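
A minimal invocation sketch (the configuration and exclusion file names are placeholders; the configuration file is expected to describe the journal client as well as the source and destination objstorages):

swh objstorage -C replayer.yml replay --check-dst --exclude-sha1-file known-contents.sha1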

swh objstorage replay [OPTIONS]

Options

-n, --stop-after-objects <stop_after_objects>

Stop after processing this many objects. Default is to run forever.

--exclude-sha1-file <exclude_sha1_file>

File containing a sorted array of hashes to be excluded.

--check-dst, --no-check-dst

Check whether the destination contains the object before copying.

rpc-serve

Run a standalone objstorage server.

This is not meant to be run on production systems.

swh objstorage rpc-serve [OPTIONS]

Options

--host <IP>

Host ip address to bind the server on

Default

0.0.0.0

-p, --port <PORT>

Binding port of the server

Default

5003

scanner

Software Heritage Scanner tools.

swh scanner [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

YAML configuration file

scan

Scan a source code project to discover files and directories already present in the archive
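
Example (the project path is a placeholder; -x and -f are the options documented below):

swh scanner scan ~/src/myproject -x '*.git' -f json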

swh scanner scan [OPTIONS] ROOT_PATH

Options

-u, --api-url <API_URL>

URL for the API request

-x, --exclude <PATTERN>

Exclude directories using glob patterns (e.g., ‘*.git’ to exclude all .git directories)

-f, --output-format <out_fmt>

The output format

Default

text

Options

text|json|ndjson|sunburst

-i, --interactive

Show the result in a dashboard

Arguments

ROOT_PATH

Required argument

scheduler

Software Heritage Scheduler tools.

Use a local scheduler instance by default (plugged into the main scheduler db).

swh scheduler [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

-d, --database <database>

Scheduling database DSN (implies cls is ‘local’)

-u, --url <url>

Scheduler URL (implies cls is ‘remote’)

--no-stdout

Do NOT output logs on the console

celery-monitor

Monitoring of Celery

swh scheduler celery-monitor [OPTIONS] COMMAND [ARGS]...

Options

--timeout <timeout>

Timeout for celery remote control

--pattern <pattern>

Celery destination pattern

list-running

List running tasks on the lister workers

swh scheduler celery-monitor list-running [OPTIONS]

Options

--format <format>

Output format

Options

pretty|csv

ping-workers

Check which workers respond to the celery remote control

swh scheduler celery-monitor ping-workers [OPTIONS]
rpc-serve

Starts a swh-scheduler API HTTP server.

swh scheduler rpc-serve [OPTIONS]

Options

--host <host>

Host to run the scheduler server api

--port <port>

Binding port of the server

--debug, --nodebug

Indicates if the server should run in debug mode. Defaults to True if log-level is DEBUG, False otherwise.

start-listener

Starts a swh-scheduler listener service.

This service is responsible for listening for task lifecycle events and handling their workflow status in the database.

swh scheduler start-listener [OPTIONS]
start-runner

Starts a swh-scheduler runner service.

This process is responsible for checking for ready-to-run tasks and scheduling them.

swh scheduler start-runner [OPTIONS]

Options

-p, --period <period>

Period (in seconds) at which pending tasks are checked and executed. Set to 0 (default) for a one-shot run.

task

Manipulate tasks.

swh scheduler task [OPTIONS] COMMAND [ARGS]...
add

Schedule one task from arguments.

The first argument is the name of the task type, further ones are positional and keyword argument(s) of the task, in YAML format. Keyword args are of the form key=value.

Usage sample:

swh-scheduler --database 'service=swh-scheduler' task add list-pypi

swh-scheduler --database 'service=swh-scheduler' task add list-debian-distribution --policy=oneshot distribution=stretch

Note: if the priority is not given, the task won’t have the priority set, which is considered the lowest priority level.

swh scheduler task add [OPTIONS] TYPE [OPTIONS]...

Options

-p, --policy <policy>
Options

recurring|oneshot

-P, --priority <priority>
Options

low|normal|high

-n, --next-run <next_run>

Arguments

TYPE

Required argument

OPTIONS

Optional argument(s)

archive

Archive task/task_run whose (task_type is ‘oneshot’ and task_status is ‘completed’) or (task_type is ‘recurring’ and task_status is ‘disabled’).

With the --dry-run flag set (default), only list them.

swh scheduler task archive [OPTIONS]

Options

-b, --before <before>

Tasks whose end date is before this date will be archived. Defaults to the first day of the current month.

-a, --after <after>

Tasks whose end date is after the specified date will be archived. Defaults to the first day of the previous month.

--batch-index <batch_index>

Batch size of tasks to read from db to archive

--bulk-index <bulk_index>

Batch size of tasks to bulk index

--batch-clean <batch_clean>

Batch size of tasks to clean after archival

--dry-run, --no-dry-run

Defaults to listing only what would be archived.

--verbose

Verbose mode

--cleanup, --no-cleanup

Clean up archived tasks (default)

--start-from <start_from>

(Optional) default page to start from.

list

List tasks.

swh scheduler task list [OPTIONS]

Options

-i, --task-id <ID>

List only tasks whose id is ID.

-t, --task-type <TYPE>

List only tasks of type TYPE

-l, --limit <limit>

The maximum number of tasks to fetch.

-s, --status <STATUS>

List tasks whose status is STATUS.

Options

next_run_not_scheduled|next_run_scheduled|completed|disabled

-p, --policy <policy>

List tasks whose policy is POLICY.

Options

recurring|oneshot

-P, --priority <priority>

List tasks whose priority is PRIORITY.

Options

all|low|normal|high

-b, --before <DATETIME>

Limit to tasks supposed to run before the given date.

-a, --after <DATETIME>

Limit to tasks supposed to run after the given date.

-r, --list-runs

Also list past executions of each task.

list-pending

List the tasks that are going to be run.

You can override the number of tasks to fetch

swh scheduler task list-pending [OPTIONS] TASK_TYPES...

Options

-l, --limit <limit>

The maximum number of tasks to fetch

-b, --before <before>

List all jobs supposed to run before the given date

Arguments

TASK_TYPES

Required argument(s)

respawn

Respawn tasks.

Respawn tasks given by their ids (see the ‘task list’ command to find task ids) at the given date (immediately by default).

Eg.

swh-scheduler task respawn 1 3 12

swh scheduler task respawn [OPTIONS] TASK_IDS...

Options

-n, --next-run <DATETIME>

Respawn the selected tasks at this date

Arguments

TASK_IDS

Required argument(s)

schedule

Schedule tasks from a CSV input file.

The following columns are expected, and can be set through the -c option:

  • type: the type of the task to be scheduled (mandatory)

  • args: the arguments passed to the task (JSON list, defaults to an empty list)

  • kwargs: the keyword arguments passed to the task (JSON object, defaults to an empty dict)

  • next_run: the date at which the task should run (datetime, defaults to now)

The CSV can be read either from a named file, or from stdin (use - as filename).

Usage sample:

cat scheduling-task.txt | python3 -m swh.scheduler.cli --database 'service=swh-scheduler-dev' task schedule --columns type --columns kwargs --columns policy --delimiter ';' -
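
As a hedged sketch (the task type, its kwargs and the URL are illustrative), given a file tasks.csv containing the single line

load-git;[];{"url": "https://forge.example.org/user/repo.git"};2021-06-01T00:00:00+00:00

it could be scheduled with:

swh scheduler task schedule --columns type --columns args --columns kwargs --columns next_run --delimiter ';' tasks.csv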

swh scheduler task schedule [OPTIONS] FILE

Options

-c, --columns <columns>

columns present in the CSV file

Options

type|args|kwargs|policy|next_run

-d, --delimiter <delimiter>

Arguments

FILE

Required argument

schedule_origins

Schedules tasks for origins that are already known.

The first argument is the name of the task type, further ones are keyword argument(s) of the task in the form key=value, where value is in YAML format.

Usage sample:

swh-scheduler --database 'service=swh-scheduler' task schedule_origins index-origin-metadata

swh scheduler task schedule_origins [OPTIONS] TYPE [OPTIONS]...

Options

-b, --batch-size <origin_batch_size>

Number of origins per task

Default

10

--page-token <page_token>

Only schedule tasks for origins whose ID is greater than this value

Default

0

--limit <limit>

Limit the tasks scheduling up to this number of tasks

-g, --storage-url <storage_url>

URL of the (graph) storage API

--dry-run, --no-dry-run

List only what would be scheduled.

Arguments

TYPE

Required argument

OPTIONS

Optional argument(s)

task-type

Manipulate task types.

swh scheduler task-type [OPTIONS] COMMAND [ARGS]...
add

Create a new task type
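
Example sketch (the type name, backend task name, description and interval are illustrative):

swh scheduler task-type add load-git swh.loader.git.tasks.UpdateGitRepository "Git repository loader" --default-interval "64 days"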

swh scheduler task-type add [OPTIONS] TYPE TASK_NAME DESCRIPTION

Options

-i, --default-interval <default_interval>

Default interval (“90 days” by default)

--min-interval <min_interval>

Minimum interval (default interval if not set)

-i, --max-interval <max_interval>

Maximum interval (default interval if not set)

-f, --backoff-factor <backoff_factor>

Backoff factor

Arguments

TYPE

Required argument

TASK_NAME

Required argument

DESCRIPTION

Required argument

list
swh scheduler task-type list [OPTIONS]

Options

-v, --verbose

Verbose mode

-t, --task_type <task_type>

List task types of given type

-n, --task_name <task_name>

List task types of given backend task name

register

Register missing task-type entries in the scheduler, according to the tasks declared in each loaded worker plugin (e.g. lister, loader, …).

swh scheduler task-type register [OPTIONS]

Options

-p, --plugins <plugins>

Registers task-types for provided plugins. Defaults to all

Options

all|loader.svn|loader.mercurial|loader.git|loader.git_disk|loader.archive|loader.cran|loader.debian|loader.deposit|loader.nixguix|loader.npm|loader.pypi|lister.bitbucket|lister.cgit|lister.cran|lister.debian|lister.gitea|lister.github|lister.gitlab|lister.gnu|lister.launchpad|lister.npm|lister.packagist|lister.phabricator|lister.pypi|deposit.worker

storage

Software Heritage Storage tools.

swh storage [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

--check-config <check_config>

Check the configuration of the storage at startup for read or write access; if set, override the value present in the configuration file if any. Defaults to ‘read’ for the ‘backfill’ command, and ‘write’ for ‘rpc-server’ and ‘replay’ commands.

Options

no|read|write

backfill

Run the backfiller

The backfiller lists objects from a Storage and produces journal entries from them.

Typically used to rebuild a journal or compensate for missing objects in a journal (e.g. due to a downtime of the latter).

The configuration file requires the following entries (a configuration sketch is given after this list):

  • brokers: a list of kafka endpoints (the journal) in which entries will be added.

  • storage_dbconn: URL to connect to the storage DB.

  • prefix: the prefix of the topics (topics will be <prefix>.<object_type>).

  • client_id: the kafka client ID.
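
A hedged configuration sketch using those entries (broker address, database connection string and client id are placeholders), followed by a dry run over the content object type:

cat >backfiller.yml <<'EOF'
brokers:
  - kafka1.example.org:9092
storage_dbconn: postgresql://user:passwd@pghost:5432/swh-storage
prefix: swh.journal.objects
client_id: swh-backfiller-01
EOF
swh storage -C backfiller.yml backfill --dry-run content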

swh storage backfill [OPTIONS] OBJECT_TYPE

Options

--start-object <start_object>
--end-object <end_object>
--dry-run

Arguments

OBJECT_TYPE

Required argument

replay

Fill a Storage by reading a Journal.

There can be several ‘replayers’ filling a Storage as long as they use the same group-id.

swh storage replay [OPTIONS]

Options

-n, --stop-after-objects <stop_after_objects>

Stop after processing this many objects. Default is to run forever.

rpc-serve

Software Heritage Storage RPC server.

Do NOT use this in a production environment.

swh storage rpc-serve [OPTIONS]

Options

--host <IP>

Host ip address to bind the server on

Default

0.0.0.0

--port <PORT>

Binding port of the server

Default

5002

--debug, --no-debug

Indicates if the server should run in debug mode

vault

Software Heritage Vault tools.

swh vault [OPTIONS] COMMAND [ARGS]...
rpc-serve

Software Heritage Vault RPC server.

swh vault rpc-serve [OPTIONS]

Options

-C, --config-file <CONFIGFILE>

Configuration file.

--no-stdout

Do NOT output logs on the console

--host <IP>

Host ip address to bind the server on

Default

0.0.0.0

--port <PORT>

Binding port of the server

--debug, --no-debug

Indicates if the server should run in debug mode

Database initialization utilities

swh db-init

Initialize a database for the Software Heritage <module>.

Example:

swh db init -d swh-test storage

If you want to specify non-default postgresql connection parameters, please provide them using standard environment variables. See the psql(1) man page (section ENVIRONMENT) for details.

Examples:

PGPORT=5434 swh db init indexer
swh db init -d postgresql://user:passwd@pghost:5433/swh-storage storage
swh db init --flavor read_replica -d swh-storage storage

swh db-init [OPTIONS] MODULE

Options

-d, --db-name <db_name>

Database name.

Default

softwareheritage-dev

--flavor <flavor>

Database flavor.

Arguments

MODULE

Required argument