Command-line interface

swh storage

Software Heritage Storage tools.

swh storage [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

--check-config <check_config>

Check the configuration of the storage at startup for read or write access; if set, this overrides the value present in the configuration file, if any. Defaults to ‘read’ for the ‘backfill’ command, and to ‘write’ for the ‘rpc-serve’ and ‘replay’ commands.

Options:

no | read | write
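
The precedence described for --check-config can be sketched in a few lines of Python. This is an illustration only, not the tool's implementation: the helper name `effective_check_config` is hypothetical, and the assumption that a value from the configuration file takes precedence over the per-command default is not stated in the text above.

```python
# Values and per-command defaults as listed in the --check-config description.
CHECK_CONFIG_CHOICES = ("no", "read", "write")
CHECK_CONFIG_DEFAULTS = {"backfill": "read", "rpc-serve": "write", "replay": "write"}


def effective_check_config(command, cli_value=None, file_value=None):
    """Hypothetical helper: resolve which check level applies to a command."""
    # The command-line value wins over the configuration file; the file value
    # winning over the per-command default is an assumption of this sketch.
    value = cli_value or file_value or CHECK_CONFIG_DEFAULTS.get(command)
    if value is not None and value not in CHECK_CONFIG_CHOICES:
        raise ValueError(f"invalid check-config value: {value!r}")
    return value
```

For instance, `effective_check_config("replay", cli_value="read", file_value="write")` resolves to `"read"`, since the command-line value overrides the file.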

backfill

Run the backfiller.

The backfiller lists objects from a Storage and produces journal entries from them.

Typically used to rebuild a journal or to compensate for objects missing from a journal (e.g. due to downtime of the latter).

The configuration file requires the following entries:

  • brokers: a list of Kafka endpoints (the journal) to which entries will be added.

  • storage_dbconn: URL to connect to the storage DB.

  • prefix: the prefix of the topics (topics will be <prefix>.<object_type>).

  • client_id: the Kafka client ID.
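
As a rough sketch, the four required entries and the topic naming pattern can be checked in Python before running the backfiller. The helper names are hypothetical, every value in `example` is a placeholder, and treating the entries as top-level keys of the configuration is an assumption.

```python
# Entry names taken from the list above.
REQUIRED_ENTRIES = ("brokers", "storage_dbconn", "prefix", "client_id")


def missing_entries(config):
    """Return the required backfill entries absent from a config mapping."""
    return [key for key in REQUIRED_ENTRIES if key not in config]


def topic_name(prefix, object_type):
    """Build a topic name following the <prefix>.<object_type> pattern."""
    return f"{prefix}.{object_type}"


# Placeholder values only: not real endpoints, URLs, or client IDs.
example = {
    "brokers": ["kafka1.example.org:9092"],
    "storage_dbconn": "<storage DB URL>",
    "prefix": "swh.journal.objects",
    "client_id": "backfiller-example",
}

print(missing_entries(example))                    # []
print(topic_name(example["prefix"], "revision"))   # swh.journal.objects.revision
```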

swh storage backfill [OPTIONS] OBJECT_TYPE

Options

--start-object <start_object>
--end-object <end_object>
--dry-run

Arguments

OBJECT_TYPE

Required argument
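
A dry run is a cautious way to try the backfiller first. The Python sketch below only assembles the command line and runs nothing; `backfill_command` is a hypothetical helper, `backfill.yml` is a placeholder file name, and `revision` is assumed to be an accepted OBJECT_TYPE (the accepted values are not listed here).

```python
import shlex


def backfill_command(object_type, config_file=None, dry_run=True):
    """Assemble an argv list for `swh storage backfill` (nothing is executed)."""
    argv = ["swh", "storage"]
    if config_file:
        # Options of the `swh storage` group go before the subcommand name.
        argv += ["--config-file", config_file]
    argv.append("backfill")
    if dry_run:
        argv.append("--dry-run")
    argv.append(object_type)
    return argv


print(shlex.join(backfill_command("revision", config_file="backfill.yml")))
# swh storage --config-file backfill.yml backfill --dry-run revision
```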

create-object-reference-partitions

Create object_reference partitions from START to END.

swh storage create-object-reference-partitions [OPTIONS] START END

Arguments

START

Required argument

END

Required argument

replay

Fill a Storage by reading a Journal.

This is typically used for a mirror configuration: it reads the Software Heritage Kafka journal to retrieve objects from the Software Heritage main storage and feed them to a replication storage. Several ‘replayers’ can fill the same Storage as long as they use the same group-id.

The expected configuration file should have 2 sections, ‘storage’ and ‘journal_client’.

In addition to these 2 mandatory config sections, a third ‘replayer’ section may be specified, with an ‘error_reporter’ config entry giving the Redis connection parameters used to report non-recoverable mirroring errors, e.g.:

storage:
  [...]
journal_client:
  [...]
replayer:
  error_reporter:
    host: redis.local
    port: 6379
    db: 1

swh storage replay [OPTIONS]
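
The section layout of the replay configuration can be checked with a short Python sketch once the file has been loaded into a mapping. `summarize_replay_config` is a hypothetical helper, and the contents of the `storage` and `journal_client` sections are left empty because they are elided in the example above.

```python
MANDATORY_SECTIONS = ("storage", "journal_client")


def summarize_replay_config(config):
    """Report missing mandatory sections and the optional error reporter."""
    missing = [name for name in MANDATORY_SECTIONS if name not in config]
    # The 'replayer' section and its 'error_reporter' entry are both optional.
    reporter = (config.get("replayer") or {}).get("error_reporter")
    return {"missing_sections": missing, "error_reporter": reporter}


# Mirrors the structure of the YAML example; elided sections are left empty.
config = {
    "storage": {},
    "journal_client": {},
    "replayer": {"error_reporter": {"host": "redis.local", "port": 6379, "db": 1}},
}
print(summarize_replay_config(config))
```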

Options

-n, --stop-after-objects <stop_after_objects>

Stop after processing this many objects. Default is to run forever.

-t, --type <object_types>

Object types to replay.

Options:

origin | origin_visit | origin_visit_status | snapshot | revision | release | directory | content | skipped_content | metadata_authority | metadata_fetcher | raw_extrinsic_metadata | extid

-X, --known-mismatched-hashes <invalid_hashes_file>

File of SWHIDs of objects that are known to have invalid hashes but still need to be replayed.
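
The replay options can be combined as in the following Python sketch, which validates the requested object types against the list above and assembles a command line without running it. `replay_command` is a hypothetical helper, and repeating `--type` once per object type is an assumption suggested by the plural `<object_types>` placeholder.

```python
import shlex

# Object types accepted by -t/--type, as listed above.
REPLAY_OBJECT_TYPES = (
    "origin", "origin_visit", "origin_visit_status", "snapshot", "revision",
    "release", "directory", "content", "skipped_content", "metadata_authority",
    "metadata_fetcher", "raw_extrinsic_metadata", "extid",
)


def replay_command(object_types=(), stop_after=None, config_file=None):
    """Assemble an argv list for `swh storage replay` (nothing is executed)."""
    unknown = [t for t in object_types if t not in REPLAY_OBJECT_TYPES]
    if unknown:
        raise ValueError(f"unknown object types: {unknown}")
    argv = ["swh", "storage"]
    if config_file:
        argv += ["--config-file", config_file]
    argv.append("replay")
    if stop_after is not None:
        argv += ["--stop-after-objects", str(stop_after)]
    for object_type in object_types:
        # Assumption: the option is given once per object type.
        argv += ["--type", object_type]
    return argv


print(shlex.join(replay_command(["origin", "snapshot"], stop_after=1000)))
# swh storage replay --stop-after-objects 1000 --type origin --type snapshot
```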

rpc-serve

Software Heritage Storage RPC server.

Do NOT use this in a production environment.

swh storage rpc-serve [OPTIONS]

Options

--host <IP>

Host IP address to bind the server to

Default:

0.0.0.0

--port <PORT>

Binding port of the server

Default:

5002

--debug, --no-debug

Indicates whether the server should run in debug mode
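
For local testing, the rpc-serve options can be assembled as in this Python sketch, which runs nothing. `rpc_serve_command` is a hypothetical helper; binding to 127.0.0.1 instead of the default 0.0.0.0 is a choice made for this example to keep the development server off external interfaces.

```python
import shlex


def rpc_serve_command(host="0.0.0.0", port=5002, debug=None):
    """Assemble an argv list for `swh storage rpc-serve` (nothing is executed)."""
    argv = ["swh", "storage", "rpc-serve", "--host", host, "--port", str(port)]
    if debug is not None:
        # Pick one side of the --debug / --no-debug flag pair.
        argv.append("--debug" if debug else "--no-debug")
    return argv


print(shlex.join(rpc_serve_command(host="127.0.0.1", debug=True)))
# swh storage rpc-serve --host 127.0.0.1 --port 5002 --debug
```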