Command-line interface

Shared command-line interface

swh

Command line interface for Software Heritage.

swh [OPTIONS] COMMAND [ARGS]...

Options

-l, --log-level <log_level>

Log level (defaults to INFO).

Options

NOTSET|DEBUG|INFO|WARNING|ERROR|CRITICAL

--log-config <log_config>

Python YAML logging configuration file.
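
As a rough illustration of what such a file may contain, the sketch below builds an equivalent configuration as a Python dictionary and applies it with logging.config.dictConfig. It assumes the YAML file follows the standard dictConfig schema; the logger name swh.example and the format string are hypothetical and only serve the example.

```python
import logging
import logging.config

# Assumption: the YAML file maps one-to-one onto this dictConfig schema.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "simple": {"format": "%(asctime)s %(name)s %(levelname)s %(message)s"},
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "simple",
            "stream": "ext://sys.stderr",
        },
    },
    "loggers": {
        # Hypothetical logger name, used only for illustration.
        "swh.example": {"level": "DEBUG"},
    },
    "root": {"level": "INFO", "handlers": ["console"]},
}

# Apply the configuration to the running interpreter.
logging.config.dictConfig(LOGGING_CONFIG)
example_logger = logging.getLogger("swh.example")
```

The same structure written as YAML (version, formatters, handlers, loggers, root) would be the kind of file this option could point to.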

--sentry-dsn <sentry_dsn>

DSN of the Sentry instance to report to

db

Software Heritage database generic tools.

swh db [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

create

Create a database for the Software Heritage <module> and potentially execute superuser-level initialization steps.

Example:

swh db create -d swh-test storage

To specify non-default PostgreSQL connection parameters, provide them through the standard environment variables or by means of a properly crafted libpq connection URI. See the psql(1) man page (section ENVIRONMENT) for details.

Note: this command requires a PostgreSQL connection with superuser permissions.

Examples:

PGPORT=5434 swh db create indexer
swh db create -d postgresql://superuser:passwd@pghost:5433/swh-storage storage
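
When the connection parameters come from variables rather than being typed by hand, the libpq connection URI can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name libpq_uri and the sample credentials are hypothetical. It percent-encodes the user, password and database name, which libpq URIs accept, so that characters such as @ or / in a password do not break the URI.

```python
from urllib.parse import quote


def libpq_uri(user, password, host, port, dbname):
    """Build a postgresql:// URI, percent-encoding user, password and dbname."""
    return "postgresql://{}:{}@{}:{}/{}".format(
        quote(user, safe=""),
        quote(password, safe=""),
        host,
        port,
        quote(dbname, safe=""),
    )


# Hypothetical credentials; the password contains characters that need encoding.
uri = libpq_uri("superuser", "p@ss/wd", "pghost", 5433, "swh-storage")
```

The resulting string can then be passed as the value of -d.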

swh db create [OPTIONS] MODULE

Options

-d, --db-name <db_name>

Database name.

Default

softwareheritage-dev

-T, --template <template>

Template database from which to build this database.

Default

template1

Arguments

MODULE

Required argument

init

Initialize a database for the Software Heritage <module>.

Example:

swh db init -d swh-test storage

To specify non-default PostgreSQL connection parameters, provide them through the standard environment variables. See the psql(1) man page (section ENVIRONMENT) for details.

Examples:

PGPORT=5434 swh db init indexer
swh db init -d postgresql://user:passwd@pghost:5433/swh-storage storage
swh db init --flavor read_replica -d swh-storage storage

swh db init [OPTIONS] MODULE

Options

-d, --db-name <db_name>

Database name.

Default

softwareheritage-dev

--flavor <flavor>

Database flavor.

Arguments

MODULE

Required argument

objstorage

Software Heritage Objstorage tools.

swh objstorage [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

fsck

Check that the objstorage is not corrupted.

swh objstorage fsck [OPTIONS]

import

Import a local directory into an existing objstorage.

swh objstorage import [OPTIONS] DIRECTORY...

Arguments

DIRECTORY

Required argument(s)

replay

Fill a destination Object Storage using a journal stream.

This is typically used for a mirror configuration, by reading a Journal and retrieving objects from an existing source ObjStorage.

There can be several ‘replayers’ filling a given ObjStorage as long as they use the same group-id. You can use the KAFKA_GROUP_INSTANCE_ID environment variable to use KIP-345 static group membership.

This service retrieves the ids of the objects to copy from the ‘content’ topic. It only copies an object’s content if the object’s description in the Kafka message has status:visible set.

--exclude-sha1-file may be used to exclude some hashes and so speed up the replay when many of the contents are already in the destination objstorage. The file must contain a concatenation of all (sha1) hashes, and it must be sorted. It is never fully loaded into memory, so it can be arbitrarily large.
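
The sketch below shows one way such a file could be produced and why sorting matters. It assumes that the hashes are stored as raw 20-byte SHA-1 digests (not hexadecimal text); the helper names build_exclude_blob and contains are hypothetical and are not part of the swh tooling. Because the records are fixed-size and sorted, a lookup only needs a binary search over a few records rather than a full read of the file.

```python
import hashlib

HASH_SIZE = 20  # length in bytes of a raw SHA-1 digest (assumed record size)


def build_exclude_blob(digests):
    """Return the de-duplicated digests, sorted and concatenated."""
    for digest in digests:
        if len(digest) != HASH_SIZE:
            raise ValueError("expected a %d-byte digest" % HASH_SIZE)
    return b"".join(sorted(set(digests)))


def contains(blob, digest):
    """Binary-search a sorted concatenation of fixed-size digests."""
    lo, hi = 0, len(blob) // HASH_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        candidate = blob[mid * HASH_SIZE:(mid + 1) * HASH_SIZE]
        if candidate == digest:
            return True
        if candidate < digest:
            lo = mid + 1
        else:
            hi = mid
    return False


# Sample contents used only to obtain a few digests.
digests = [hashlib.sha1(data).digest() for data in (b"foo", b"bar", b"baz")]
blob = build_exclude_blob(digests)
```

Writing blob to a file opened in binary mode ("wb") would give a file in the assumed layout.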

--check-dst sets whether the replayer should check in the destination ObjStorage before copying an object. You can turn that off if you know you’re copying to an empty ObjStorage.
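
The filtering rules described above can be summarised by the following conceptual model. It is not the actual replayer code: plain dictionaries stand in for the source and destination ObjStorage, and the message key sha1 is a hypothetical field name (only the status:visible condition comes from the description above).

```python
def should_copy(message, dst, excluded=frozenset(), check_dst=True):
    """Decide whether one 'content' message leads to a copy."""
    if message.get("status") != "visible":
        return False  # only visible contents are copied
    obj_id = message["sha1"]  # hypothetical field name
    if obj_id in excluded:
        return False  # listed in the exclusion file
    if check_dst and obj_id in dst:
        return False  # already present in the destination
    return True


def replay(messages, src, dst, **kwargs):
    """Copy the selected objects from src to dst; return how many were copied."""
    copied = 0
    for message in messages:
        if should_copy(message, dst, **kwargs):
            dst[message["sha1"]] = src[message["sha1"]]
            copied += 1
    return copied
```

With check_dst=False the membership test on the destination is skipped, which saves one lookup per object when the destination is known to be empty.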

swh objstorage replay [OPTIONS]

Options

-n, --stop-after-objects <stop_after_objects>

Stop after processing this many objects. Default is to run forever.

--exclude-sha1-file <exclude_sha1_file>

File containing a sorted array of hashes to be excluded.

--check-dst, --no-check-dst

Check whether the destination contains the object before copying.

rpc-serve

Run a standalone objstorage server.

This is not meant to be run on production systems.

swh objstorage rpc-serve [OPTIONS]

Options

--host <IP>

Host IP address to bind the server to

Default

0.0.0.0

-p, --port <PORT>

Binding port of the server

Default

5003

storage

Software Heritage Storage tools.

swh storage [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

--check-config <check_config>

Check the configuration of the storage at startup for read or write access; if set, this overrides the value present in the configuration file, if any. Defaults to ‘read’ for the ‘backfill’ command, and ‘write’ for the ‘rpc-serve’ and ‘replay’ commands.

Options

no|read|write

backfill

Run the backfiller.

The backfiller lists objects from a Storage and produces journal entries from them.

It is typically used to rebuild a journal or to compensate for objects missing from a journal (e.g. due to downtime of the latter).

The configuration file requires the following entries:

  • brokers: a list of Kafka endpoints (the journal) in which entries will be added.

  • storage_dbconn: URL to connect to the storage DB.

  • prefix: the prefix of the topics (topics will be <prefix>.<object_type>).

  • client_id: the Kafka client ID.
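
As an illustration of the entries listed above, the sketch below represents a loaded configuration as a Python dictionary, checks that the required entries are present, and derives a topic name from the prefix. All values are hypothetical placeholders, and placing the entries at the top level of the configuration is an assumption; the helpers missing_keys and topic_name are not part of the swh tooling.

```python
REQUIRED_KEYS = ("brokers", "storage_dbconn", "prefix", "client_id")

# Hypothetical values, for illustration only.
config = {
    "brokers": ["kafka1.example.org:9092", "kafka2.example.org:9092"],
    "storage_dbconn": "postgresql://user:passwd@pghost:5433/swh-storage",
    "prefix": "swh.journal.objects",
    "client_id": "swh.storage.backfiller",
}


def missing_keys(cfg):
    """Return the required entries that are absent from cfg, in listing order."""
    return [key for key in REQUIRED_KEYS if key not in cfg]


def topic_name(cfg, object_type):
    """Topics are named <prefix>.<object_type>."""
    return "{}.{}".format(cfg["prefix"], object_type)
```

For example, with the placeholder prefix above, objects of type revision would be written to the topic swh.journal.objects.revision.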

swh storage backfill [OPTIONS] OBJECT_TYPE

Options

--start-object <start_object>
--end-object <end_object>
--dry-run

Arguments

OBJECT_TYPE

Required argument

replay

Fill a Storage by reading a Journal.

There can be several ‘replayers’ filling a Storage as long as they use the same group-id.

swh storage replay [OPTIONS]

Options

-n, --stop-after-objects <stop_after_objects>

Stop after processing this many objects. Default is to run forever.

rpc-serve

Software Heritage Storage RPC server.

Do NOT use this in a production environment.

swh storage rpc-serve [OPTIONS]

Options

--host <IP>

Host IP address to bind the server to

Default

0.0.0.0

--port <PORT>

Binding port of the server

Default

5002

--debug, --no-debug

Indicates if the server should run in debug mode

Database initialization utilities

swh db-init

Initialize a database for the Software Heritage <module>.

Example:

swh db init -d swh-test storage

To specify non-default PostgreSQL connection parameters, provide them through the standard environment variables. See the psql(1) man page (section ENVIRONMENT) for details.

Examples:

PGPORT=5434 swh db init indexer
swh db init -d postgresql://user:passwd@pghost:5433/swh-storage storage
swh db init --flavor read_replica -d swh-storage storage

swh db-init [OPTIONS] MODULE

Options

-d, --db-name <db_name>

Database name.

Default

softwareheritage-dev

--flavor <flavor>

Database flavor.

Arguments

MODULE

Required argument