Command-line interface#

swh storage#

Software Heritage Storage tools.

Usage

swh storage [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>#

Configuration file.

--check-config <check_config>#

Check at startup that the storage configuration allows read or write access. If set, this overrides the value present in the configuration file, if any. Defaults to ‘read’ for the ‘backfill’ command, and ‘write’ for the ‘rpc-serve’ and ‘replay’ commands.

Options:

no | read | write
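A minimal sketch of how the effective check mode could be resolved, assuming the precedence is: the --check-config option, then the value from the configuration file, then the per-command default listed above. The function and table names are hypothetical illustrations, not part of swh.storage.

```python
from typing import Optional

# Hypothetical names: they mirror the documented behaviour of
# --check-config but are not part of the swh.storage API.
VALID_MODES = ("no", "read", "write")
COMMAND_DEFAULTS = {"backfill": "read", "rpc-serve": "write", "replay": "write"}


def resolve_check_config(
    command: str, cli_value: Optional[str], file_value: Optional[str]
) -> str:
    """Return the startup check mode that applies to `command`.

    Assumed precedence: command-line option, then configuration file,
    then the per-command default from the help text.
    """
    for value in (cli_value, file_value):
        if value is not None:
            if value not in VALID_MODES:
                raise ValueError(f"invalid check mode: {value!r}")
            return value
    return COMMAND_DEFAULTS[command]


print(resolve_check_config("backfill", None, None))  # read
print(resolve_check_config("replay", "no", "write"))  # no
```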

backfill#

Run the backfiller

The backfiller lists objects from a Storage and produces journal entries from them.

Typically used to rebuild a journal or to compensate for objects missing from a journal (e.g. due to downtime of the latter).

The configuration file requires the following entries:

  • brokers: a list of Kafka endpoints (the journal) to which entries will be added.

  • storage_dbconn: URL to connect to the storage DB.

  • prefix: the prefix of the topics (topics will be <prefix>.<object_type>).

  • client_id: the Kafka client ID.
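The sketch below shows, as the Python dict a YAML configuration file would load into, what a configuration with these four entries could look like, and how topic names follow from prefix. The broker addresses, database URL, prefix value and client ID are placeholder assumptions; only the key names and the <prefix>.<object_type> scheme come from the text above.

```python
# Hypothetical backfill configuration, written as the dict that the YAML
# configuration file would load into. Every value is a placeholder.
backfill_config = {
    "brokers": ["kafka1.example.org:9092", "kafka2.example.org:9092"],
    "storage_dbconn": "postgresql://swh@db.example.org/softwareheritage",
    "prefix": "swh.journal.objects",
    "client_id": "swh.storage.backfill",
}

REQUIRED_KEYS = ("brokers", "storage_dbconn", "prefix", "client_id")


def missing_keys(config: dict) -> list:
    """List the required entries that are absent from `config`."""
    return [key for key in REQUIRED_KEYS if key not in config]


def topic_name(config: dict, object_type: str) -> str:
    """Build a topic name following the <prefix>.<object_type> scheme."""
    return f"{config['prefix']}.{object_type}"


assert missing_keys(backfill_config) == []
print(topic_name(backfill_config, "revision"))  # swh.journal.objects.revision
```

With such a file saved as, say, backfill.yml (a hypothetical name), a trial run could be started with: swh storage -C backfill.yml backfill --dry-run revision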

Usage

swh storage backfill [OPTIONS] OBJECT_TYPE

Options

--start-object <start_object>#
--end-object <end_object>#
--dry-run#

Arguments

OBJECT_TYPE#

Required argument

blocking#

Configure blocking of origins, preventing them from being archived

These tools require read/write access to the blocking database. An entry must be added to the configuration file as follows:

storage:
  …

blocking_admin:
  cls: postgresql
  db: "service=swh-blocking-admin"
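The db value above is a libpq connection string that names a service instead of spelling out the connection parameters; libpq looks the service up in a service file such as ~/.pg_service.conf (or the file named by the PGSERVICEFILE environment variable). As a hedged sketch, the snippet below renders a matching service-file section with Python's configparser. The host, port, database and user values are hypothetical placeholders, not Software Heritage settings.

```python
import configparser
import io

# Hypothetical connection parameters: all values below are placeholders.
SERVICE_NAME = "swh-blocking-admin"
PARAMS = {
    "host": "db.example.org",
    "port": "5432",
    "dbname": "swh-blocking",
    "user": "swh-blocking-admin",
}


def render_pg_service(name: str, params: dict) -> str:
    """Render one section of a libpq service file (INI format)."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep parameter names exactly as given
    parser[name] = params
    buf = io.StringIO()
    # Write key=value lines without surrounding spaces, as in the
    # examples from the PostgreSQL documentation.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


print(render_pg_service(SERVICE_NAME, PARAMS))
```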

Usage

swh storage blocking [OPTIONS] COMMAND [ARGS]...

clear-request#

Remove all blocking states for the given request

Usage

swh storage blocking clear-request [OPTIONS] SLUG

Options

-m, --message <message>#

an explanation for this change

Arguments

SLUG#

Required argument

history#

Get the history for a request

Usage

swh storage blocking history [OPTIONS] SLUG

Arguments

SLUG#

Required argument

list-requests#

List blocking requests

Usage

swh storage blocking list-requests [OPTIONS]

Options

-a, --include-cleared-requests, --exclude-cleared-requests#

Show requests without any blocking state

new-request#

Create a new request to block objects

SLUG is a human-readable unique identifier for the request. It is an internal identifier that will be used in subsequent commands to address this newly recorded request.

A reason for the request must be specified, either using the -m option or via the provided editor.

Usage

swh storage blocking new-request [OPTIONS] SLUG

Options

-m, --message <REASON>#

why the request was made

Arguments

SLUG#

Required argument

origin-state#

Get the blocking state for a set of Origins

If an origin given in the arguments is not listed in the output, no blocking state is set for it in any request.

Usage

swh storage blocking origin-state [OPTIONS] ORIGIN

Arguments

ORIGIN#

Optional argument(s)

status#

Get the blocking states defined by a request

Usage

swh storage blocking status [OPTIONS] SLUG

Arguments

SLUG#

Required argument

update-objects#

Update the blocking state of given objects

The blocked state of the provided Origins will be updated to NEW_STATE for the request SLUG.

NEW_STATE must be one of “blocked”, “decision-pending” or “non-blocked”.

Origins must be provided one per line, either on standard input or in a file specified with the -f option; a file name of - also means standard input.

An explanation for this change must be added to the request history. It can be given either with the -m option or via the provided editor.
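For scripting around this command, the hypothetical helpers below show one way to prepare its input: checking NEW_STATE against the three documented values and reading one origin per line, with - standing for standard input. They are not part of swh.storage; stripping whitespace and skipping blank lines are assumptions about what the real command tolerates.

```python
import io
import sys
from typing import List, TextIO

# NEW_STATE values documented for "swh storage blocking update-objects".
BLOCKING_STATES = ("blocked", "decision-pending", "non-blocked")


def check_state(state: str) -> str:
    """Reject anything that is not one of the documented NEW_STATE values."""
    if state not in BLOCKING_STATES:
        raise ValueError(f"NEW_STATE must be one of {', '.join(BLOCKING_STATES)}")
    return state


def read_origins(stream: TextIO) -> List[str]:
    """Read one origin per line; skipping blank lines is an assumption."""
    return [line.strip() for line in stream if line.strip()]


def open_input(path: str) -> TextIO:
    """Treat '-' as standard input; the caller closes a real file."""
    return sys.stdin if path == "-" else open(path, encoding="utf-8")


sample = io.StringIO("https://example.org/repo1.git\n\nhttps://example.org/repo2.git\n")
print(check_state("decision-pending"), read_origins(sample))
```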

Usage

swh storage blocking update-objects [OPTIONS] SLUG NEW_STATE

Options

-m, --message <message>#

an explanation for this change

-f, --file <file>#

a file with one Origin per line

Arguments

SLUG#

Required argument

NEW_STATE#

Required argument

cassandra#

Usage

swh storage cassandra [OPTIONS] COMMAND [ARGS]...

init#

Creates a Cassandra keyspace with table definitions suitable for use by swh-storage’s Cassandra backend

Usage

swh storage cassandra init [OPTIONS]

list-migrations#

Lists the migrations defined for swh-storage’s Cassandra backend

Usage

swh storage cassandra list-migrations [OPTIONS]

mark-upgraded#

Marks a migration as run

Exit codes:

  • 0: ok

  • 1: unexpected crash

  • 2: (unassigned)

  • 3: nothing to do

Usage

swh storage cassandra mark-upgraded [OPTIONS]

Options

--migration <migration_ids>#

upgrade#

Applies all pending migrations that can run automatically

Exit codes:

  • 0: migrations applied

  • 1: unexpected crash

  • 2: (unassigned)

  • 3: no migrations to run

  • 4: some required migrations need to be manually applied

  • 5: some optional migrations need to be manually applied

  • 6: some required (and optional) migrations could not be applied because a dependency is missing (only if --migration was passed)

  • 7: some optional migrations could not be applied because a dependency is missing (only if --migration was passed)
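A deployment script may need to decide which of these exit codes should stop it. The sketch below maps the documented codes to descriptions and runs the command through subprocess. Which codes count as acceptable (OK_CODES) is a hypothetical policy chosen for illustration, and the configuration file path is supplied by the caller.

```python
import subprocess
from typing import Tuple

# Exit codes documented for "swh storage cassandra upgrade" (2 is unassigned).
UPGRADE_EXIT_CODES = {
    0: "migrations applied",
    1: "unexpected crash",
    3: "no migrations to run",
    4: "some required migrations need to be manually applied",
    5: "some optional migrations need to be manually applied",
    6: "some required (and optional) migrations have a missing dependency",
    7: "some optional migrations have a missing dependency",
}

# Hypothetical policy: only problems with optional migrations are tolerated.
OK_CODES = {0, 3, 5, 7}


def interpret(returncode: int) -> Tuple[bool, str]:
    """Return (is_ok, description) for an upgrade exit code."""
    description = UPGRADE_EXIT_CODES.get(
        returncode, f"undocumented exit code {returncode}"
    )
    return returncode in OK_CODES, description


def run_upgrade(config_file: str) -> Tuple[bool, str]:
    """Run the upgrade through the CLI and interpret its exit status."""
    result = subprocess.run(
        ["swh", "storage", "--config-file", config_file, "cassandra", "upgrade"]
    )
    return interpret(result.returncode)


print(interpret(4))
```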

Usage

swh storage cassandra upgrade [OPTIONS]

Options

--migration <migration_ids>#

create-object-reference-partitions#

Create object_reference partitions from START to END

Usage

swh storage create-object-reference-partitions [OPTIONS] START END

Arguments

START#

Required argument

END#

Required argument

masking#

Configure masking on archived objects

These tools require read/write access to the masking database. An entry must be added to the configuration file as follows:

storage:
  …

masking_admin:
  cls: postgresql
  db: "service=swh-masking-admin"

Usage

swh storage masking [OPTIONS] COMMAND [ARGS]...

clear-request#

Remove all masking states for the given request

Usage

swh storage masking clear-request [OPTIONS] SLUG

Options

-m, --message <message>#

an explanation for this change

Arguments

SLUG#

Required argument

history#

Get the history for a request

Usage

swh storage masking history [OPTIONS] SLUG

Arguments

SLUG#

Required argument

list-requests#

List masking requests

Usage

swh storage masking list-requests [OPTIONS]

Options

-a, --include-cleared-requests, --exclude-cleared-requests#

Show requests without any masking state

new-request#

Create a new request to mask objects

SLUG is a human-readable unique identifier for the request. It is an internal identifier that will be used in subsequent commands to address this newly recorded request.

A reason for the request must be specified, either using the -m option or via the provided editor.

Usage

swh storage masking new-request [OPTIONS] SLUG

Options

-m, --message <REASON>#

why the request was made

Arguments

SLUG#

Required argument

object-state#

Get the masking state for a set of SWHIDs

If an object given in the arguments is not listed in the output, no masking state is set for it in any request.

Usage

swh storage masking object-state [OPTIONS] SWHID

Arguments

SWHID#

Optional argument(s)

patching#

Tools to manage the patching of objects

Usage

swh storage masking patching [OPTIONS] COMMAND [ARGS]...

set#

Set display names (patching entries)

Usage

swh storage masking patching set [OPTIONS] INPUT

Options

--clear, --keep#

Clear the display names table before inserting new entries

Arguments

INPUT#

Required argument

status#

Get the masking states defined by a request

Usage

swh storage masking status [OPTIONS] SLUG

Arguments

SLUG#

Required argument

update-objects#

Update the state of given objects

The masked state of the provided SWHIDs will be updated to NEW_STATE for the request SLUG.

NEW_STATE must be one of “visible”, “decision-pending” or “restricted”.

SWHIDs must be provided one per line, either on standard input or in a file specified with the -f option; a file name of - also means standard input.

An explanation for this change must be added to the request history. It can be given either with the -m option or via the provided editor.
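Before feeding a list to this command, it can help to validate it locally. The hypothetical helper below checks each line against the core SWHID syntax (swh:1:, an object type among cnt, dir, rev, rel and snp, then 40 lowercase hex digits). Rejecting qualified SWHIDs and ignoring blank lines are assumptions; this page does not say what the command itself accepts.

```python
import io
import re
from typing import List, TextIO

# NEW_STATE values documented for "swh storage masking update-objects".
MASKING_STATES = ("visible", "decision-pending", "restricted")

# Core SWHID syntax only; qualified SWHIDs (";origin=...") are rejected here.
CORE_SWHID = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def read_swhids(stream: TextIO) -> List[str]:
    """Read one SWHID per line, failing on the first malformed entry."""
    swhids = []
    for lineno, line in enumerate(stream, start=1):
        value = line.strip()
        if not value:
            continue  # assumption: blank lines are ignored
        if not CORE_SWHID.fullmatch(value):
            raise ValueError(f"line {lineno}: not a core SWHID: {value!r}")
        swhids.append(value)
    return swhids


sample = io.StringIO("swh:1:cnt:" + "0" * 40 + "\n\nswh:1:dir:" + "a" * 40 + "\n")
print(len(read_swhids(sample)))  # 2
```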

Usage

swh storage masking update-objects [OPTIONS] SLUG NEW_STATE

Options

-m, --message <message>#

an explanation for this change

-f, --file <file>#

a file with one SWHID per line

Arguments

SLUG#

Required argument

NEW_STATE#

Required argument

remove-old-object-reference-partitions#

Remove object_reference partitions for values older than BEFORE

Usage

swh storage remove-old-object-reference-partitions [OPTIONS] BEFORE

Options

--force#

do not ask for confirmation before removing tables

Arguments

BEFORE#

Required argument

replay#

Fill a Storage by reading a Journal.

This is typically used for a mirror configuration: the replayer reads the Software Heritage Kafka journal to retrieve the objects of the main Software Heritage storage and feeds them to a replica storage. Several ‘replayers’ can fill the same Storage, as long as they use the same group-id.

The expected configuration file should have 2 sections:

  • storage: the configuration of the storage in which to add the objects received from the journal.

  • journal_client: the configuration of the access to the Kafka journal.

In addition to these 2 mandatory sections, an optional third section, ‘replayer’, may contain an ‘error_reporter’ entry giving the Redis connection parameters used to report non-recoverable mirroring errors, e.g.:

storage:
  [...]
journal_client:
  [...]
replayer:
  error_reporter:
    host: redis.local
    port: 6379
    db: 1
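The hedged sketch below writes the same three sections as a Python dict and checks which mandatory sections are missing. Only the section names and the error_reporter keys come from the text above. The storage and journal_client contents (an in-memory storage, the kafka client class, broker address, group_id and prefix) are placeholder assumptions. Replayers that share a group_id join the same Kafka consumer group and so split the journal partitions between them.

```python
from typing import Optional

# Hypothetical replay configuration, as the dict the YAML file would load into.
replay_config = {
    "storage": {"cls": "memory"},  # placeholder storage backend
    "journal_client": {
        "cls": "kafka",
        "brokers": ["kafka1.example.org:9092"],  # placeholder
        "group_id": "my-mirror-replayer",  # shared by all replayers of this mirror
        "prefix": "swh.journal.objects",
    },
    "replayer": {"error_reporter": {"host": "redis.local", "port": 6379, "db": 1}},
}

MANDATORY_SECTIONS = ("storage", "journal_client")


def check_replay_config(config: dict) -> list:
    """Return the mandatory sections missing from `config`."""
    return [section for section in MANDATORY_SECTIONS if section not in config]


def error_reporter_params(config: dict) -> Optional[dict]:
    """Return the Redis parameters, or None when no error reporter is set."""
    return config.get("replayer", {}).get("error_reporter")


print(check_replay_config(replay_config), error_reporter_params(replay_config))
```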

Usage

swh storage replay [OPTIONS]

Options

-n, --stop-after-objects <stop_after_objects>#

Stop after processing this many objects. Default is to run forever.

-t, --type <object_types>#

Object types to replay

Options:

origin | origin_visit | origin_visit_status | snapshot | revision | release | directory | content | skipped_content | metadata_authority | metadata_fetcher | raw_extrinsic_metadata | extid

-X, --known-mismatched-hashes <invalid_hashes_file>#

File of SWHIDs of objects that are known to have invalid hashes but still need to be replayed.

rpc-serve#

Software Heritage Storage RPC server.

Do NOT use this in a production environment.

Usage

swh storage rpc-serve [OPTIONS]

Options

--host <IP>#

Host IP address to bind the server on

Default:

'0.0.0.0'

--port <PORT>#

Binding port of the server

Default:

5002

--debug, --no-debug#

Indicates whether the server should run in debug mode