Command-line interface
swh storage
Software Heritage Storage tools.
swh storage [OPTIONS] COMMAND [ARGS]...
Options
- -C, --config-file <config_file>
Configuration file.
- --check-config <check_config>
Check the configuration of the storage at startup for read or write access; if set, this overrides the value present in the configuration file, if any. Defaults to ‘read’ for the ‘backfill’ command, and to ‘write’ for the ‘rpc-serve’ and ‘replay’ commands.
- Options:
no | read | write
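The precedence described for --check-config can be sketched in a few lines of Python. This is an illustration of the rule as worded above, not the swh.storage implementation: the function name, the mapping, and the idea that the configuration file value is a plain string are all assumptions made for the example.

```python
from typing import Optional

# Per-command defaults, taken from the option description above.
COMMAND_DEFAULTS = {"backfill": "read", "rpc-serve": "write", "replay": "write"}
ALLOWED = ("no", "read", "write")


def effective_check_config(
    command: str,
    cli_value: Optional[str] = None,
    file_value: Optional[str] = None,
) -> str:
    """Hypothetical helper: CLI flag first, then config file, then command default."""
    for value in (cli_value, file_value):
        if value is not None:
            if value not in ALLOWED:
                raise ValueError(f"invalid check-config value: {value!r}")
            return value
    return COMMAND_DEFAULTS[command]


# No flag and no config entry: the per-command default applies.
print(effective_check_config("backfill"))
# A config file entry is used when the flag is absent.
print(effective_check_config("replay", file_value="read"))
# The flag, when set, wins over the config file.
print(effective_check_config("replay", cli_value="no", file_value="read"))
```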
backfill
Run the backfiller.
The backfiller lists objects from a Storage and produces journal entries from them.
Typically used to rebuild a journal or compensate for missing objects in a journal (e.g. due to downtime of the latter).
The configuration file requires the following entries:
brokers: a list of Kafka endpoints (the journal) to which entries will be added.
storage_dbconn: URL to connect to the storage DB.
prefix: the prefix of the topics (topics will be <prefix>.<object_type>).
client_id: the Kafka client ID.
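As a rough illustration of the entries listed above, the following Python sketch holds a backfiller configuration as a dictionary, checks that the four documented keys are present, and derives a topic name from the prefix. Every value (host names, database URL, prefix, client ID) is a made-up placeholder, and the helper functions are hypothetical, not part of swh.storage.

```python
from typing import Dict, List

# The four keys come from the list above; all values are placeholders.
backfill_config: Dict[str, object] = {
    "brokers": ["kafka1.example.org:9092"],
    "storage_dbconn": "postgresql://user@db.example.org/softwareheritage",
    "prefix": "swh.journal.objects",
    "client_id": "swh.storage.backfiller",
}

REQUIRED_KEYS = ("brokers", "storage_dbconn", "prefix", "client_id")


def missing_keys(config: Dict[str, object]) -> List[str]:
    """Return the required entries that are absent from the configuration."""
    return [key for key in REQUIRED_KEYS if key not in config]


def topic_name(prefix: str, object_type: str) -> str:
    """Build the topic name following the <prefix>.<object_type> pattern."""
    return f"{prefix}.{object_type}"


print(missing_keys(backfill_config))
print(topic_name(str(backfill_config["prefix"]), "revision"))
```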
swh storage backfill [OPTIONS] OBJECT_TYPE
Options
- --start-object <start_object>
- --end-object <end_object>
- --dry-run
Arguments
- OBJECT_TYPE
Required argument
create-object-reference-partitions
Create object_reference partitions from START to END (both are dates).
swh storage create-object-reference-partitions [OPTIONS] START END
Arguments
- START
Required argument
- END
Required argument
replay
Fill a Storage by reading a Journal.
This is typically used for a mirror configuration, reading the Software Heritage Kafka journal to retrieve objects of the Software Heritage main storage to feed a replication storage. There can be several ‘replayers’ filling a Storage as long as they use the same group-id.
The expected configuration file should have 2 sections:
storage: the configuration of the storage in which to add objects received from the Kafka journal,
journal_client: the configuration of access to the Kafka journal. See the documentation of swh.journal for more details on the possible configuration entries in this section.
https://docs.softwareheritage.org/devel/apidoc/swh.journal.client.html
In addition to these 2 mandatory config sections, a third ‘replayer’ section may be specified, with an ‘error_reporter’ config entry giving the Redis connection parameters used to report non-recoverable mirroring errors, e.g.:
storage:
  [...]
journal_client:
  [...]
replayer:
  error_reporter:
    host: redis.local
    port: 6379
    db: 1
swh storage replay [OPTIONS]
Options
- -n, --stop-after-objects <stop_after_objects>
Stop after processing this many objects. Default is to run forever.
- -t, --type <object_types>
Object types to replay
- Options:
origin | origin_visit | origin_visit_status | snapshot | revision | release | directory | content | skipped_content | metadata_authority | metadata_fetcher | raw_extrinsic_metadata | extid
- -X, --known-mismatched-hashes <invalid_hashes_file>
File of SWHIDs of objects that are known to have invalid hashes but still need to be replayed.
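To show how the replay options fit together, here is a minimal Python sketch that assembles a command line from the options documented above and prints it without running anything. The helper name and the file name replayer.yml are hypothetical, and the sketch assumes that --type may be given several times, which the plural <object_types> suggests but this page does not state explicitly.

```python
import shlex
from typing import List, Optional, Sequence

# Object types accepted by --type, as listed above.
OBJECT_TYPES = (
    "origin", "origin_visit", "origin_visit_status", "snapshot", "revision",
    "release", "directory", "content", "skipped_content",
    "metadata_authority", "metadata_fetcher", "raw_extrinsic_metadata", "extid",
)


def replay_command(
    config_file: str,
    object_types: Sequence[str] = (),
    stop_after: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: build the argv for 'swh storage replay'."""
    # --config-file belongs to the 'swh storage' group, so it precedes 'replay'.
    argv = ["swh", "storage", "--config-file", config_file, "replay"]
    if stop_after is not None:
        argv += ["--stop-after-objects", str(stop_after)]
    for object_type in object_types:
        if object_type not in OBJECT_TYPES:
            raise ValueError(f"unknown object type: {object_type!r}")
        # Assumption: one --type flag per object type.
        argv += ["--type", object_type]
    return argv


cmd = replay_command("replayer.yml", ["revision", "release"], stop_after=1000)
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run once the configuration file exists; leaving out stop_after keeps the default behaviour of running forever.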
rpc-serve
Software Heritage Storage RPC server.
Do NOT use this in a production environment.
swh storage rpc-serve [OPTIONS]
Options
- --host <IP>
Host IP address to bind the server to
- Default:
0.0.0.0
- --port <PORT>
Binding port of the server
- Default:
5002
- --debug, --no-debug
Indicates if the server should run in debug mode