Software Heritage Storage tools.
swh storage [OPTIONS] COMMAND [ARGS]...
- -C, --config-file <config_file>
- --check-config <check_config>
Check the configuration of the storage at startup for read or write access. If set, this overrides any value present in the configuration file. Defaults to ‘read’ for the ‘backfill’ command, and ‘write’ for the ‘rpc-serve’ and ‘replay’ commands.
no | read | write
Run the backfiller
The backfiller lists objects from a Storage and produces journal entries from them.
Typically used to rebuild a journal or to compensate for objects missing from a journal (e.g. due to downtime of the latter).
The configuration file requires the following entries:
brokers: a list of kafka endpoints (the journal) in which entries will be added.
storage_dbconn: URL to connect to the storage DB.
prefix: the prefix of the topics (topics will be <prefix>.<object_type>).
client_id: the kafka client ID.
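As a rough illustration of the entries listed above, the following Python sketch checks that a backfill configuration (represented here as a plain dict) contains the four required keys, and derives a topic name from the prefix. The helper names missing_backfill_keys and topic_name, as well as every example value (broker address, database URL, prefix, client ID), are hypothetical and not part of swh.storage.

```python
# Illustrative only: these helpers are NOT part of swh.storage.
REQUIRED_BACKFILL_KEYS = ("brokers", "storage_dbconn", "prefix", "client_id")


def missing_backfill_keys(config: dict) -> list:
    """Return the required backfill entries that are absent from `config`."""
    return [key for key in REQUIRED_BACKFILL_KEYS if key not in config]


def topic_name(prefix: str, object_type: str) -> str:
    """Build a topic name following the <prefix>.<object_type> pattern."""
    return f"{prefix}.{object_type}"


# All values below are assumptions made up for this sketch.
example_config = {
    "brokers": ["kafka.example.org:9092"],
    "storage_dbconn": "postgresql://user@db.example.org/swh",
    "prefix": "swh.journal.objects",
    "client_id": "backfiller-1",
}

if __name__ == "__main__":
    print(missing_backfill_keys(example_config))
    print(topic_name(example_config["prefix"], "revision"))
```

A check of this kind only confirms that the entries exist; it does not verify that the brokers or the database are reachable.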
swh storage backfill [OPTIONS] OBJECT_TYPE
- --start-object <start_object>
- --end-object <end_object>
Create object_reference partitions from the START date to the END date.
swh storage create-object-reference-partitions [OPTIONS] START END
Fill a Storage by reading a Journal.
This is typically used for a mirror configuration, reading the Software Heritage kafka journal to retrieve objects of the Software Heritage main storage to feed a replication storage. There can be several ‘replayers’ filling a Storage as long as they use the same group-id.
The expected configuration file should have 2 sections:
storage: the configuration of the storage in which to add objects received from the kafka journal,
journal_client: the configuration of access to the kafka journal. See the documentation of swh.journal for more details on the possible configuration entries in this section.
In addition to these 2 mandatory config sections, a third ‘replayer’ section may be specified. Its ‘error_reporter’ entry gives the redis connection parameters used to report non-recoverable mirroring errors, e.g.:
storage:
  [...]
journal_client:
  [...]
replayer:
  error_reporter:
    host: redis.local
    port: 6379
    db: 1
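The configuration layout described above can be sketched in Python as a small check over a dict that mirrors the file: two mandatory sections, plus an optional ‘replayer’ section holding ‘error_reporter’. The function names missing_sections and error_reporter_settings are hypothetical, and the empty ‘storage’ and ‘journal_client’ dicts stand in for the contents elided as [...] in the example; none of this is swh.storage code.

```python
from typing import Optional

# Illustrative only: these helpers are NOT part of swh.storage.
MANDATORY_SECTIONS = ("storage", "journal_client")


def missing_sections(config: dict) -> list:
    """Return the mandatory replay sections that are absent from `config`."""
    return [name for name in MANDATORY_SECTIONS if name not in config]


def error_reporter_settings(config: dict) -> Optional[dict]:
    """Return the redis parameters under replayer.error_reporter, if any."""
    # The 'replayer' section is optional, so tolerate it being absent or empty.
    replayer = config.get("replayer") or {}
    return replayer.get("error_reporter")


example_config = {
    "storage": {},         # contents elided in the original example
    "journal_client": {},  # contents elided in the original example
    "replayer": {
        "error_reporter": {"host": "redis.local", "port": 6379, "db": 1},
    },
}

if __name__ == "__main__":
    print(missing_sections(example_config))
    print(error_reporter_settings(example_config))
```

When the ‘replayer’ section is left out, error_reporter_settings returns None, which matches the optional status of that section.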
swh storage replay [OPTIONS]
- -n, --stop-after-objects <stop_after_objects>
Stop after processing this many objects. Default is to run forever.
- -t, --type <object_types>
Object types to replay
origin | origin_visit | origin_visit_status | snapshot | revision | release | directory | content | skipped_content | metadata_authority | metadata_fetcher | raw_extrinsic_metadata | extid
- -X, --known-mismatched-hashes <invalid_hashes_file>
File of SWHIDs of objects that are known to have invalid hashes but still need to be replayed.
Software Heritage Storage RPC server.
Do NOT use this in a production environment.
swh storage rpc-serve [OPTIONS]
- --host <IP>
Host IP address to bind the server to
- --port <PORT>
Binding port of the server
- --debug, --no-debug
Indicates if the server should run in debug mode