Command-line interface#
swh storage#
Software Heritage Storage tools.
swh storage [OPTIONS] COMMAND [ARGS]...
Options
- -C, --config-file <config_file>#
Configuration file.
- --check-config <check_config>#
Check the configuration of the storage at startup for read or write access; if set, this overrides any value present in the configuration file. Defaults to ‘read’ for the ‘backfill’ command, and ‘write’ for the ‘rpc-serve’ and ‘replay’ commands.
- Options:
no | read | write
backfill#
Run the backfiller
The backfiller lists objects from a Storage and produces journal entries from them.
Typically used to rebuild a journal or to compensate for objects missing from a journal (e.g. due to downtime of the latter).
The configuration file requires the following entries:
brokers: a list of kafka endpoints (the journal) in which entries will be added.
storage_dbconn: URL to connect to the storage DB.
prefix: the prefix of the topics (topics will be <prefix>.<object_type>).
client_id: the kafka client ID.
swh storage backfill [OPTIONS] OBJECT_TYPE
Options
- --start-object <start_object>#
- --end-object <end_object>#
- --dry-run#
Arguments
- OBJECT_TYPE#
Required argument
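A minimal sketch of a backfill invocation, assuming a hypothetical configuration file `backfill.yml` that holds the entries listed above, and assuming `revision` is an accepted OBJECT_TYPE (neither is confirmed here). The wrapper only prints the command, so the sketch can be read or run without a storage or a Kafka journal at hand.

```shell
# Print the command instead of executing it. To run it for real (assumes swh
# is installed and backfill.yml exists), use: swh_cmd() { swh "$@"; }
swh_cmd() { printf 'swh %s\n' "$*"; }

# Hypothetical file name; it must contain brokers, storage_dbconn, prefix and client_id.
CONFIG=backfill.yml

# Backfill one object type, passing the --dry-run flag documented above.
swh_cmd storage -C "$CONFIG" backfill --dry-run revision
```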
blocking#
Configure blocking of origins, preventing them from being archived
These tools require read/write access to the blocking database. An entry must be added to the configuration file as follows:
storage:
  …
blocking_admin:
  cls: postgresql
  db: "service=swh-blocking-admin"
swh storage blocking [OPTIONS] COMMAND [ARGS]...
clear-request#
Remove all blocking states for the given request
swh storage blocking clear-request [OPTIONS] SLUG
Options
- -m, --message <message>#
an explanation for this change
Arguments
- SLUG#
Required argument
history#
Get the history for a request
swh storage blocking history [OPTIONS] SLUG
Arguments
- SLUG#
Required argument
list-requests#
List blocking requests
swh storage blocking list-requests [OPTIONS]
Options
- -a, --include-cleared-requests, --exclude-cleared-requests#
Show requests without any blocking state
new-request#
Create a new request to block objects
SLUG is a human-readable unique identifier for the request. It is an internal identifier that will be used in subsequent commands to address this newly recorded request.
A reason for the request must be specified, either using the -m option or via the provided editor.
swh storage blocking new-request [OPTIONS] SLUG
Options
- -m, --message <REASON>#
why the request was made
Arguments
- SLUG#
Required argument
origin-state#
Get the blocking state for a set of Origins
If an object given in the arguments is not listed in the output, it means no blocking state is set in any requests.
swh storage blocking origin-state [OPTIONS] ORIGIN
Arguments
- ORIGIN#
Optional argument(s)
status#
Get the blocking states defined by a request
swh storage blocking status [OPTIONS] SLUG
Arguments
- SLUG#
Required argument
update-objects#
Update the blocking state of given objects
The blocked state of the provided Origins will be updated to NEW_STATE for the request SLUG.
NEW_STATE must be one of “blocked”, “decision-pending” or “non_blocked”.
Origins must be provided one per line, either via the standard input or a file specified via the -f option. - is synonymous with the standard input.
An explanation for this change must be added to the request history. It can either be specified by the -m option or via the provided editor.
swh storage blocking update-objects [OPTIONS] SLUG NEW_STATE
Options
- -m, --message <message>#
an explanation for this change
- -f, --file <file>#
a file with one Origin per line
Arguments
- SLUG#
Required argument
- NEW_STATE#
Required argument
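The blocking subcommands above combine into a simple workflow: record a request, attach origins to it, then inspect the result. The sketch below assumes a hypothetical slug `example-request`, a hypothetical configuration file `config.yml` and placeholder origin URLs. It prints each command rather than running it, since the real commands need the blocking database to be configured.

```shell
# Print each command instead of executing it. To run for real (assumes swh is
# installed and blocking_admin is configured), use: swh_cmd() { swh "$@"; }
swh_cmd() { printf 'swh %s\n' "$*"; }

CONFIG=config.yml   # hypothetical configuration file

# Hypothetical origins, one per line, as update-objects expects.
printf '%s\n' \
  'https://example.org/user/project-a' \
  'https://example.org/user/project-b' > origins.txt

# 1. Record the request; the slug is reused by every later command.
swh_cmd storage -C "$CONFIG" blocking new-request -m 'Example request' example-request

# 2. Mark the listed origins as blocked for that request.
swh_cmd storage -C "$CONFIG" blocking update-objects \
  -m 'Block origins named in the request' -f origins.txt example-request blocked

# 3. Review the states and the history attached to the request.
swh_cmd storage -C "$CONFIG" blocking status example-request
swh_cmd storage -C "$CONFIG" blocking history example-request
```

Passing `-m` on each call avoids the interactive editor, which is convenient when the commands are scripted.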
create-keyspace#
Creates a Cassandra keyspace with table definitions suitable for use by swh-storage’s Cassandra backend
swh storage create-keyspace [OPTIONS]
create-object-reference-partitions#
Create object_reference partitions covering the dates from START to END
swh storage create-object-reference-partitions [OPTIONS] START END
Arguments
- START#
Required argument
- END#
Required argument
masking#
Configure masking on archived objects
These tools require read/write access to the masking database. An entry must be added to the configuration file as follows:
storage:
  …
masking_admin:
  cls: postgresql
  db: "service=swh-masking-admin"
swh storage masking [OPTIONS] COMMAND [ARGS]...
clear-request#
Remove all masking states for the given request
swh storage masking clear-request [OPTIONS] SLUG
Options
- -m, --message <message>#
an explanation for this change
Arguments
- SLUG#
Required argument
history#
Get the history for a request
swh storage masking history [OPTIONS] SLUG
Arguments
- SLUG#
Required argument
list-requests#
List masking requests
swh storage masking list-requests [OPTIONS]
Options
- -a, --include-cleared-requests, --exclude-cleared-requests#
Show requests without any masking state
new-request#
Create a new request to mask objects
SLUG is a human-readable unique identifier for the request. It is an internal identifier that will be used in subsequent commands to address this newly recorded request.
A reason for the request must be specified, either using the -m option or via the provided editor.
swh storage masking new-request [OPTIONS] SLUG
Options
- -m, --message <REASON>#
why the request was made
Arguments
- SLUG#
Required argument
object-state#
Get the masking state for a set of SWHIDs
If an object given in the arguments is not listed in the output, it means no masking state is set in any requests.
swh storage masking object-state [OPTIONS] SWHID
Arguments
- SWHID#
Optional argument(s)
patching#
Tools to manage the patching of objects
swh storage masking patching [OPTIONS] COMMAND [ARGS]...
set#
Set display names (patching entries)
swh storage masking patching set [OPTIONS] INPUT
Options
- --clear, --keep#
Clear the display names table before inserting new entries
Arguments
- INPUT#
Required argument
status#
Get the masking states defined by a request
swh storage masking status [OPTIONS] SLUG
Arguments
- SLUG#
Required argument
update-objects#
Update the state of given objects
The masked state of the provided SWHIDs will be updated to NEW_STATE for the request SLUG.
NEW_STATE must be one of “visible”, “decision-pending” or “restricted”.
SWHIDs must be provided one per line, either via the standard input or a file specified via the -f option. - is synonymous with the standard input.
An explanation for this change must be added to the request history. It can either be specified by the -m option or via the provided editor.
swh storage masking update-objects [OPTIONS] SLUG NEW_STATE
Options
- -m, --message <message>#
an explanation for this change
- -f, --file <file>#
a file with one SWHID per line
Arguments
- SLUG#
Required argument
- NEW_STATE#
Required argument
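As an illustration of feeding SWHIDs through the standard input, the sketch below passes `-f -` to `update-objects` and then queries one object with `object-state`. The slug `example-request`, the file `config.yml` and the SWHIDs are hypothetical placeholders, and the wrapper prints the commands instead of running them.

```shell
# Print each command instead of executing it. To run for real (assumes swh is
# installed and masking_admin is configured), use: swh_cmd() { swh "$@"; }
swh_cmd() { printf 'swh %s\n' "$*"; }

CONFIG=config.yml   # hypothetical configuration file

# Placeholder SWHIDs, one per line.
printf '%s\n' \
  'swh:1:cnt:0000000000000000000000000000000000000001' \
  'swh:1:rev:0000000000000000000000000000000000000002' > swhids.txt

# "-f -" tells update-objects to read the SWHIDs from the standard input.
swh_cmd storage -C "$CONFIG" masking update-objects \
  -m 'Restrict objects pending review' -f - example-request restricted < swhids.txt

# Check which masking state, if any, applies to a single object.
swh_cmd storage -C "$CONFIG" masking object-state \
  'swh:1:cnt:0000000000000000000000000000000000000001'
```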
remove-old-object-reference-partitions#
Remove object_reference partitions for values older than BEFORE
swh storage remove-old-object-reference-partitions [OPTIONS] BEFORE
Options
- --force#
do not ask for confirmation before removing tables
Arguments
- BEFORE#
Required argument
replay#
Fill a Storage by reading a Journal.
This is typically used for a mirror configuration, reading the Software Heritage kafka journal to retrieve objects of the Software Heritage main storage to feed a replication storage. There can be several ‘replayers’ filling a Storage as long as they use the same group-id.
The expected configuration file should have 2 sections:
storage: the configuration of the storage in which to add objects received from the kafka journal,
journal_client: the configuration of access to the kafka journal. See the documentation of swh.journal for more details on the possible configuration entries in this section.
https://docs.softwareheritage.org/devel/apidoc/swh.journal.client.html
In addition to these 2 mandatory config sections, a third section, ‘replayer’, may be specified with an ‘error_reporter’ config entry giving the Redis connection parameters used to report non-recoverable mirroring errors, e.g.:
storage:
  [...]
journal_client:
  [...]
replayer:
  error_reporter:
    host: redis.local
    port: 6379
    db: 1
swh storage replay [OPTIONS]
Options
- -n, --stop-after-objects <stop_after_objects>#
Stop after processing this many objects. Default is to run forever.
- -t, --type <object_types>#
Object types to replay
- Options:
origin | origin_visit | origin_visit_status | snapshot | revision | release | directory | content | skipped_content | metadata_authority | metadata_fetcher | raw_extrinsic_metadata | extid
- -X, --known-mismatched-hashes <invalid_hashes_file>#
File of SWHIDs of objects that are known to have invalid hashes but still need to be replayed.
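A short sketch of a bounded replay run, useful when trying out a mirror configuration. It assumes a hypothetical file `replayer.yml` containing the storage and journal_client sections described above. The wrapper prints the command rather than running it, because a real run needs access to a Kafka journal.

```shell
# Print the command instead of executing it. To run it for real (assumes swh
# is installed and replayer.yml exists), use: swh_cmd() { swh "$@"; }
swh_cmd() { printf 'swh %s\n' "$*"; }

CONFIG=replayer.yml   # hypothetical configuration file

# Replay only origin objects and stop after 1000 of them
# instead of running forever.
swh_cmd storage -C "$CONFIG" replay -t origin -n 1000
```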
rpc-serve#
Software Heritage Storage RPC server.
Do NOT use this in a production environment.
swh storage rpc-serve [OPTIONS]
Options
- --host <IP>#
Host IP address to bind the server to
- Default:
'0.0.0.0'
- --port <PORT>#
Binding port of the server
- Default:
5002
- --debug, --no-debug#
Indicates if the server should run in debug mode
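For local testing, one might bind the server to the loopback interface instead of the default 0.0.0.0. The sketch below assumes a hypothetical configuration file `config.yml` and prints the command rather than starting a server.

```shell
# Print the command instead of executing it. To run it for real (assumes swh
# is installed and config.yml exists), use: swh_cmd() { swh "$@"; }
swh_cmd() { printf 'swh %s\n' "$*"; }

CONFIG=config.yml   # hypothetical configuration file

# Listen on localhost only, on the default port, with debug mode enabled.
swh_cmd storage -C "$CONFIG" rpc-serve --host 127.0.0.1 --port 5002 --debug
```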