Command-line interface

swh storage

Software Heritage Storage tools.

swh storage [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.

--check-config <check_config>

Check the configuration of the storage at startup for read or write access. If set, this overrides any value present in the configuration file. Defaults to ‘read’ for the ‘backfill’ command, and to ‘write’ for the ‘rpc-serve’ and ‘replay’ commands.

Options

no | read | write
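
As the synopsis above shows, these group-level options go before the subcommand name. The following is a minimal sketch of overriding the startup check from the command line. The file name storage.yml is a placeholder, and the `command -v` guard is only there so the snippet does nothing harmful on a machine where swh is not installed.

```shell
# Placeholder path to a configuration file (an assumption, not a default).
CONFIG=storage.yml
# Check for read access only, instead of the 'write' default of rpc-serve.
CHECK=read

if command -v swh >/dev/null 2>&1; then
    # Group options (-C, --check-config) precede the subcommand.
    # Note: rpc-serve keeps running until interrupted.
    swh storage -C "$CONFIG" --check-config "$CHECK" rpc-serve
else
    echo "swh not installed; would run: swh storage -C $CONFIG --check-config $CHECK rpc-serve"
fi
```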

backfill

Run the backfiller

The backfiller lists objects from a Storage and produces journal entries from them.

Typically used to rebuild a journal or to compensate for objects missing from a journal (e.g. due to downtime of the latter).

The configuration file requires the following entries:

  • brokers: a list of Kafka endpoints (the journal) to which entries will be added.

  • storage_dbconn: URL to connect to the storage DB.

  • prefix: the prefix of the topics (topics will be <prefix>.<object_type>).

  • client_id: the Kafka client ID.

swh storage backfill [OPTIONS] OBJECT_TYPE

Options

--start-object <start_object>
--end-object <end_object>
--dry-run

Arguments

OBJECT_TYPE

Required argument
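
The four configuration entries and the synopsis above can be combined as in the sketch below. Every value in the file (host names, database URL, prefix, client ID) is a made-up placeholder, the flat top-level layout of the keys is an assumption, and `revision` is assumed to be an accepted OBJECT_TYPE. The `--dry-run` flag is used so that the sketch stays a trial run.

```shell
# Placeholder file name for the backfiller configuration.
CONFIG=backfill.yml

# Write a configuration containing the four required entries.
# All values are illustrative placeholders, not real endpoints.
cat > "$CONFIG" <<'EOF'
brokers:
  - kafka1.example.org:9092
storage_dbconn: postgresql://swh@db.example.org/swh
prefix: swh.journal.objects
client_id: swh.storage.backfiller
EOF

if command -v swh >/dev/null 2>&1; then
    # With this prefix, revision entries would go to the topic
    # swh.journal.objects.revision (<prefix>.<object_type>).
    swh storage -C "$CONFIG" backfill --dry-run revision
else
    echo "swh not installed; would run: swh storage -C $CONFIG backfill --dry-run revision"
fi
```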

replay

Fill a Storage by reading a Journal.

This is typically used for a mirror configuration: it reads the Software Heritage Kafka journal to retrieve objects from the Software Heritage main storage and feeds them into a replication storage. There can be several ‘replayers’ filling a Storage as long as they use the same group-id.

The expected configuration file should have 2 sections: ‘storage’ and ‘journal_client’.

In addition to these 2 mandatory config sections, a third section, ‘replayer’, may be specified with an ‘error_reporter’ config entry giving the redis connection parameters used to report non-recoverable mirroring errors, e.g.:

storage:
  [...]
journal_client:
  [...]
replayer:
  error_reporter:
    host: redis.local
    port: 6379
    db: 1

swh storage replay [OPTIONS]

Options

-n, --stop-after-objects <stop_after_objects>

Stop after processing this many objects. Default is to run forever.

-t, --type <object_types>

Object types to replay

Options

origin | origin_visit | origin_visit_status | snapshot | revision | release | directory | content | skipped_content | metadata_authority | metadata_fetcher | raw_extrinsic_metadata | extid
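
A bounded replay restricted to a couple of object types might look like the sketch below. The file name replay.yml is a placeholder for a configuration that already contains the ‘storage’ and ‘journal_client’ sections, the object count is arbitrary, and repeating `--type` to select several types is an assumption about how the option accumulates values.

```shell
# Placeholder path; the file must provide the storage and
# journal_client sections described above.
CONFIG=replay.yml
# Arbitrary limit so the replayer stops instead of running forever.
MAX_OBJECTS=1000

if command -v swh >/dev/null 2>&1; then
    # Assumption: --type may be given more than once.
    swh storage -C "$CONFIG" replay \
        --stop-after-objects "$MAX_OBJECTS" \
        --type origin --type snapshot
else
    echo "swh not installed; would replay up to $MAX_OBJECTS objects using $CONFIG"
fi
```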

rpc-serve

Software Heritage Storage RPC server.

Do NOT use this in a production environment.

swh storage rpc-serve [OPTIONS]

Options

--host <IP>

Host IP address to bind the server on

Default

0.0.0.0

--port <PORT>

Binding port of the server

Default

5002

--debug, --no-debug

Indicates if the server should run in debug mode
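
For local development only, the options above could be combined as in this sketch. The configuration file name is a placeholder, and binding to 127.0.0.1 rather than the 0.0.0.0 default is a choice made here to keep the test server off external interfaces.

```shell
# Placeholder path to a storage configuration file.
CONFIG=storage.yml
# Bind to the loopback interface only; 5002 is the documented default port.
BIND_HOST=127.0.0.1
BIND_PORT=5002

if command -v swh >/dev/null 2>&1; then
    # Runs in the foreground until interrupted; not for production use.
    swh storage -C "$CONFIG" rpc-serve \
        --host "$BIND_HOST" --port "$BIND_PORT" --debug
else
    echo "swh not installed; would serve on $BIND_HOST:$BIND_PORT using $CONFIG"
fi
```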