Command-line interface#

swh storage#

Software Heritage Storage tools.

swh storage [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>#

Configuration file.

--check-config <check_config>#

Check the configuration of the storage at startup for read or write access; if set, override the value present in the configuration file if any. Defaults to ‘read’ for the ‘backfill’ command, and ‘write’ for the ‘rpc-serve’ and ‘replay’ commands.

Options:

no | read | write
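
As a minimal sketch of how the global options combine with a subcommand, the Python helper below assembles an argument list without running anything. The helper name `swh_storage_argv` and the file name `config.yml` are hypothetical; only the option names and the `no | read | write` choices come from the text above.

```python
from typing import List, Optional

# Choices listed above for --check-config.
CHECK_CONFIG_CHOICES = ("no", "read", "write")


def swh_storage_argv(
    command: List[str],
    config_file: Optional[str] = None,
    check_config: Optional[str] = None,
) -> List[str]:
    """Build (but do not run) an `swh storage` command line.

    Hypothetical helper: global options are placed before the subcommand,
    following `swh storage [OPTIONS] COMMAND [ARGS]...`.
    """
    argv = ["swh", "storage"]
    if config_file is not None:
        argv += ["--config-file", config_file]
    if check_config is not None:
        if check_config not in CHECK_CONFIG_CHOICES:
            raise ValueError(
                f"--check-config must be one of {CHECK_CONFIG_CHOICES}"
            )
        argv += ["--check-config", check_config]
    return argv + list(command)


if __name__ == "__main__":
    # "config.yml" is a placeholder path.
    print(" ".join(swh_storage_argv(["rpc-serve"], "config.yml", "read")))
```

On a machine where the `swh` command is installed, the returned list could be handed to `subprocess.run(argv, check=True)`.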

backfill#

Run the backfiller

The backfiller lists objects from a Storage and produces journal entries from them.

Typically used to rebuild a journal or to compensate for objects missing from a journal (e.g. due to downtime of the latter).

The configuration file requires the following entries:

  • brokers: a list of kafka endpoints (the journal) in which entries will be added.

  • storage_dbconn: URL to connect to the storage DB.

  • prefix: the prefix of the topics (topics will be <prefix>.<object_type>).

  • client_id: the kafka client ID.

swh storage backfill [OPTIONS] OBJECT_TYPE

Options

--start-object <start_object>#
--end-object <end_object>#
--dry-run#

Arguments

OBJECT_TYPE#

Required argument
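
The required configuration entries and the topic naming rule can be checked with a short Python sketch. It assumes the configuration has already been parsed into a dictionary and that the four entries sit at its top level, which the text above does not state. The function names and all example values are hypothetical.

```python
from typing import Any, Dict, List

# Entries the text above lists as required for the backfiller.
REQUIRED_BACKFILL_ENTRIES = ("brokers", "storage_dbconn", "prefix", "client_id")


def missing_backfill_entries(config: Dict[str, Any]) -> List[str]:
    """Return the required entries absent from an already-parsed config."""
    return [key for key in REQUIRED_BACKFILL_ENTRIES if key not in config]


def topic_name(prefix: str, object_type: str) -> str:
    """Apply the naming rule from the text: <prefix>.<object_type>."""
    return f"{prefix}.{object_type}"


if __name__ == "__main__":
    # All values below are placeholders, not real endpoints.
    example = {
        "brokers": ["kafka.example.org:9092"],
        "storage_dbconn": "postgresql:///example",
        "prefix": "example.prefix",
        "client_id": "example-backfiller",
    }
    print(missing_backfill_entries(example))
    print(topic_name(example["prefix"], "revision"))
```

Running such a check before a real run, together with `--dry-run`, is one way to catch an incomplete configuration early.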

blocking#

Configure blocking of origins, preventing them from being archived

These tools require read/write access to the blocking database. An entry must be added to the configuration file as follows:


storage:
  …

blocking_admin:
  cls: postgresql
  db: "service=swh-blocking-admin"

swh storage blocking [OPTIONS] COMMAND [ARGS]...

clear-request#

Remove all blocking states for the given request

swh storage blocking clear-request [OPTIONS] SLUG

Options

-m, --message <message>#

an explanation for this change

Arguments

SLUG#

Required argument

history#

Get the history for a request

swh storage blocking history [OPTIONS] SLUG

Arguments

SLUG#

Required argument

list-requests#

List blocking requests

swh storage blocking list-requests [OPTIONS]

Options

-a, --include-cleared-requests, --exclude-cleared-requests#

Show requests without any blocking state

new-request#

Create a new request to block objects

SLUG is a human-readable unique identifier for the request. It is an internal identifier that will be used in subsequent commands to address this newly recorded request.

A reason for the request must be specified, either using the -m option or via the provided editor.

swh storage blocking new-request [OPTIONS] SLUG

Options

-m, --message <REASON>#

why the request was made

Arguments

SLUG#

Required argument

origin-state#

Get the blocking state for a set of Origins

If an object given in the arguments is not listed in the output, no blocking state is set for it in any request.

swh storage blocking origin-state [OPTIONS] ORIGIN

Arguments

ORIGIN#

Optional argument(s)

status#

Get the blocking states defined by a request

swh storage blocking status [OPTIONS] SLUG

Arguments

SLUG#

Required argument

update-objects#

Update the blocking state of given objects

The blocked state of the provided Origins will be updated to NEW_STATE for the request SLUG.

NEW_STATE must be one of “blocked”, “decision-pending” or “non_blocked”.

Origins must be provided one per line, either via the standard input or a file specified via the -f option. - is synonymous with the standard input.

An explanation for this change must be added to the request history. It can either be specified by the -m option or via the provided editor.

swh storage blocking update-objects [OPTIONS] SLUG NEW_STATE

Options

-m, --message <message>#

an explanation for this change

-f, --file <file>#

a file with one Origin per line

Arguments

SLUG#

Required argument

NEW_STATE#

Required argument
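
A minimal Python sketch of preparing such an update: it validates NEW_STATE against the three values quoted above and formats the origins one per line for standard input, using `-f -`. The helper name, slug, message and origin URLs are hypothetical, and nothing is executed.

```python
from typing import Iterable, List, Tuple

# States quoted in the text above.
BLOCKING_STATES = ("blocked", "decision-pending", "non_blocked")


def blocking_update_invocation(
    slug: str, new_state: str, message: str, origins: Iterable[str]
) -> Tuple[List[str], str]:
    """Return (argv, stdin_text) for `swh storage blocking update-objects`.

    Hypothetical helper: origins are meant to be sent on standard input,
    one per line, which is why `-f -` is passed.
    """
    if new_state not in BLOCKING_STATES:
        raise ValueError(f"NEW_STATE must be one of {BLOCKING_STATES}")
    # Drop blank lines and surrounding whitespace.
    cleaned = [origin.strip() for origin in origins if origin.strip()]
    argv = [
        "swh", "storage", "blocking", "update-objects",
        "-m", message,
        "-f", "-",
        slug, new_state,
    ]
    stdin_text = "".join(f"{origin}\n" for origin in cleaned)
    return argv, stdin_text


if __name__ == "__main__":
    argv, stdin_text = blocking_update_invocation(
        "example-request",            # placeholder slug
        "decision-pending",
        "under review",               # placeholder message
        ["https://example.org/repo.git", ""],
    )
    print(argv)
    print(stdin_text, end="")
```

The pair could then be passed to `subprocess.run(argv, input=stdin_text, text=True, check=True)` where the `swh` command and a suitable configuration are available.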

create-keyspace#

Creates a Cassandra keyspace with table definitions suitable for use by swh-storage’s Cassandra backend

swh storage create-keyspace [OPTIONS]

create-object-reference-partitions#

Create object_reference partitions from START to END

swh storage create-object-reference-partitions [OPTIONS] START END

Arguments

START#

Required argument

END#

Required argument

masking#

Configure masking on archived objects

These tools require read/write access to the masking database. An entry must be added to the configuration file as follows:


storage:
  …

masking_admin:
  cls: postgresql
  db: "service=swh-masking-admin"

swh storage masking [OPTIONS] COMMAND [ARGS]...

clear-request#

Remove all masking states for the given request

swh storage masking clear-request [OPTIONS] SLUG

Options

-m, --message <message>#

an explanation for this change

Arguments

SLUG#

Required argument

history#

Get the history for a request

swh storage masking history [OPTIONS] SLUG

Arguments

SLUG#

Required argument

list-requests#

List masking requests

swh storage masking list-requests [OPTIONS]

Options

-a, --include-cleared-requests, --exclude-cleared-requests#

Show requests without any masking state

new-request#

Create a new request to mask objects

SLUG is a human-readable unique identifier for the request. It is an internal identifier that will be used in subsequent commands to address this newly recorded request.

A reason for the request must be specified, either using the -m option or via the provided editor.

swh storage masking new-request [OPTIONS] SLUG

Options

-m, --message <REASON>#

why the request was made

Arguments

SLUG#

Required argument

object-state#

Get the masking state for a set of SWHIDs

If an object given in the arguments is not listed in the output, no masking state is set for it in any request.

swh storage masking object-state [OPTIONS] SWHID

Arguments

SWHID#

Optional argument(s)

patching#

Tools to manage the patching of objects

swh storage masking patching [OPTIONS] COMMAND [ARGS]...

set#

Set display names (patching entries)

swh storage masking patching set [OPTIONS] INPUT

Options

--clear, --keep#

Clear the display names table before inserting new entries

Arguments

INPUT#

Required argument

status#

Get the masking states defined by a request

swh storage masking status [OPTIONS] SLUG

Arguments

SLUG#

Required argument

update-objects#

Update the state of given objects

The masked state of the provided SWHIDs will be updated to NEW_STATE for the request SLUG.

NEW_STATE must be one of “visible”, “decision-pending” or “restricted”.

SWHIDs must be provided one per line, either via the standard input or a file specified via the -f option. - is synonymous with the standard input.

An explanation for this change must be added to the request history. It can either be specified by the -m option or via the provided editor.

swh storage masking update-objects [OPTIONS] SLUG NEW_STATE

Options

-m, --message <message>#

an explanation for this change

-f, --file <file>#

a file with one SWHID per line

Arguments

SLUG#

Required argument

NEW_STATE#

Required argument
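
Before handing a list of SWHIDs to this command, it may help to filter out malformed lines. The Python sketch below does that and builds the matching argument list. The regular expression is an assumption: it covers only the core SWHID syntax (`swh:1:<type>:<40 hex digits>` with the five core object types) and no qualifiers; whether the command accepts other forms is not stated here. The helper names and the file name `swhids.txt` are hypothetical, and nothing is executed.

```python
import re
from typing import Iterable, List, Tuple

# States quoted in the text above.
MASKING_STATES = ("visible", "decision-pending", "restricted")

# Assumption: core SWHID syntax only, without qualifiers.
SWHID_RE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$")


def split_swhids(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate well-formed SWHIDs from other non-empty lines."""
    valid: List[str] = []
    rejected: List[str] = []
    for line in lines:
        candidate = line.strip()
        if not candidate:
            continue  # ignore blank lines
        if SWHID_RE.match(candidate):
            valid.append(candidate)
        else:
            rejected.append(candidate)
    return valid, rejected


def masking_update_argv(
    slug: str, new_state: str, message: str, path: str
) -> List[str]:
    """Build (but do not run) `swh storage masking update-objects`."""
    if new_state not in MASKING_STATES:
        raise ValueError(f"NEW_STATE must be one of {MASKING_STATES}")
    return [
        "swh", "storage", "masking", "update-objects",
        "-m", message,
        "-f", path,
        slug, new_state,
    ]


if __name__ == "__main__":
    sample = ["swh:1:cnt:" + "0" * 40, "not-a-swhid"]
    valid, rejected = split_swhids(sample)
    print(valid, rejected)
    # "example-request" and "swhids.txt" are placeholders.
    print(masking_update_argv("example-request", "restricted", "see ticket", "swhids.txt"))
```

The valid entries would be written to the file one per line, while the rejected ones can be reviewed by hand.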

remove-old-object-reference-partitions#

Remove object_reference partitions for values older than BEFORE

swh storage remove-old-object-reference-partitions [OPTIONS] BEFORE

Options

--force#

do not ask for confirmation before removing tables

Arguments

BEFORE#

Required argument

replay#

Fill a Storage by reading a Journal.

This is typically used for a mirror configuration, reading the Software Heritage kafka journal to retrieve objects of the Software Heritage main storage to feed a replication storage. There can be several ‘replayers’ filling a Storage as long as they use the same group-id.

The expected configuration file should have 2 sections:

  • storage: the configuration of the storage to fill.

  • journal_client: the configuration of the journal client used to read the journal.

In addition to these 2 mandatory config sections, a third section, ‘replayer’, may be specified with an ‘error_reporter’ entry giving the Redis connection parameters used to report non-recoverable mirroring errors, e.g.:

storage:
  [...]
journal_client:
  [...]
replayer:
  error_reporter:
    host: redis.local
    port: 6379
    db: 1

swh storage replay [OPTIONS]

Options

-n, --stop-after-objects <stop_after_objects>#

Stop after processing this many objects. Default is to run forever.

-t, --type <object_types>#

Object types to replay

Options:

origin | origin_visit | origin_visit_status | snapshot | revision | release | directory | content | skipped_content | metadata_authority | metadata_fetcher | raw_extrinsic_metadata | extid

-X, --known-mismatched-hashes <invalid_hashes_file>#

File of SWHIDs of objects that are known to have invalid hashes but still need to be replayed.
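
The shape of the replay configuration can be checked with a short Python sketch. It assumes the configuration file has already been parsed into nested dictionaries; the section names come from the example above, while the function names are hypothetical.

```python
from typing import Any, Dict, List, Optional

# Mandatory sections shown in the configuration example above.
REQUIRED_REPLAY_SECTIONS = ("storage", "journal_client")

# Choices listed above for -t/--type.
REPLAY_OBJECT_TYPES = (
    "origin", "origin_visit", "origin_visit_status", "snapshot", "revision",
    "release", "directory", "content", "skipped_content",
    "metadata_authority", "metadata_fetcher", "raw_extrinsic_metadata",
    "extid",
)


def missing_replay_sections(config: Dict[str, Any]) -> List[str]:
    """Return the mandatory sections absent from a parsed config."""
    return [name for name in REQUIRED_REPLAY_SECTIONS if name not in config]


def error_reporter_params(config: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the optional replayer.error_reporter mapping, if present."""
    replayer = config.get("replayer") or {}
    return replayer.get("error_reporter")


def is_replay_object_type(name: str) -> bool:
    """Tell whether `name` is one of the documented -t/--type choices."""
    return name in REPLAY_OBJECT_TYPES


if __name__ == "__main__":
    example = {
        "storage": {},          # elided in the text, left empty here
        "journal_client": {},   # elided in the text, left empty here
        "replayer": {
            "error_reporter": {"host": "redis.local", "port": 6379, "db": 1}
        },
    }
    print(missing_replay_sections(example))
    print(error_reporter_params(example))
    print(is_replay_object_type("snapshot"))
```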

rpc-serve#

Software Heritage Storage RPC server.

Do NOT use this in a production environment.

swh storage rpc-serve [OPTIONS]

Options

--host <IP>#

Host IP address to bind the server on

Default:

'0.0.0.0'

--port <PORT>#

Binding port of the server

Default:

5002

--debug, --no-debug#

Indicates if the server should run in debug mode
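
To confirm that a server started this way is accepting connections, a plain TCP check is enough. The Python sketch below uses only the standard library; the function name is hypothetical, and connecting to `127.0.0.1` assumes the server runs on the same machine (the default `0.0.0.0` is a bind address, not an address to connect to).

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 5002 is the default --port documented above.
    print(is_port_open("127.0.0.1", 5002))
```

This only shows that something is listening on the port; it does not verify that the listener is the storage RPC server.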