Command-line interface#
swh storage#
Software Heritage Storage tools.
Usage
swh storage [OPTIONS] COMMAND [ARGS]...
Options
- -C, --config-file <config_file>#
Configuration file.
- --check-config <check_config>#
Check the configuration of the storage at startup for read or write access; if set, this overrides any value present in the configuration file. Defaults to ‘read’ for the ‘backfill’ command, and ‘write’ for the ‘rpc-serve’ and ‘replay’ commands.
- Options:
no | read | write
backfill#
Run the backfiller
The backfiller lists objects from a Storage and produces journal entries from them.
Typically used to rebuild a journal or to compensate for objects missing from a journal (e.g. due to downtime of the latter).
The configuration file requires the following entries:
brokers: a list of Kafka endpoints (the journal) to which entries will be added.
storage_dbconn: URL to connect to the storage DB.
prefix: the prefix of the topics (topics will be <prefix>.<object_type>).
client_id: the Kafka client ID.
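The topic naming rule above can be sketched as follows. The helper name `topic_name` and the prefix value are hypothetical and used for illustration only.

```python
def topic_name(prefix: str, object_type: str) -> str:
    """Return the topic for an object type, following <prefix>.<object_type>."""
    return f"{prefix}.{object_type}"


# "my.prefix" is a made-up prefix, not a value taken from this page.
print(topic_name("my.prefix", "revision"))
```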
Usage
swh storage backfill [OPTIONS] OBJECT_TYPE
Options
- --start-object <start_object>#
- --end-object <end_object>#
- --dry-run#
Arguments
- OBJECT_TYPE#
Required argument
blocking#
Configure blocking of origins, preventing them from being archived
These tools require read/write access to the blocking database. An entry must be added to the configuration file as follows:
storage:
    …
blocking_admin:
    cls: postgresql
    db: "service=swh-blocking-admin"
Usage
swh storage blocking [OPTIONS] COMMAND [ARGS]...
clear-request#
Remove all blocking states for the given request
Usage
swh storage blocking clear-request [OPTIONS] SLUG
Options
- -m, --message <message>#
an explanation for this change
Arguments
- SLUG#
Required argument
history#
Get the history for a request
Usage
swh storage blocking history [OPTIONS] SLUG
Arguments
- SLUG#
Required argument
list-requests#
List blocking requests
Usage
swh storage blocking list-requests [OPTIONS]
Options
- -a, --include-cleared-requests, --exclude-cleared-requests#
Show requests without any blocking state
new-request#
Create a new request to block objects
SLUG is a human-readable unique identifier for the request. It is an internal identifier that will be used in subsequent commands to address this newly recorded request.
A reason for the request must be specified, either using the -m option or via the provided editor.
Usage
swh storage blocking new-request [OPTIONS] SLUG
Options
- -m, --message <REASON>#
why the request was made
Arguments
- SLUG#
Required argument
origin-state#
Get the blocking state for a set of Origins
If an origin given in the arguments is not listed in the output, no blocking state is set for it in any request.
Usage
swh storage blocking origin-state [OPTIONS] ORIGIN
Arguments
- ORIGIN#
Optional argument(s)
status#
Get the blocking states defined by a request
Usage
swh storage blocking status [OPTIONS] SLUG
Arguments
- SLUG#
Required argument
update-objects#
Update the blocking state of given objects
The blocked state of the provided Origins will be updated to NEW_STATE for the request SLUG.
NEW_STATE must be one of “blocked”, “decision-pending” or “non-blocked”.
Origins must be provided one per line, either via the standard input or a file specified via the -f option. - is a synonym for the standard input.
An explanation for this change must be added to the request history. It can either be specified by the -m option or via the provided editor.
Usage
swh storage blocking update-objects [OPTIONS] SLUG NEW_STATE
Options
- -m, --message <message>#
an explanation for this change
- -f, --file <file>#
a file with one Origin per line
Arguments
- SLUG#
Required argument
- NEW_STATE#
Required argument
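A minimal sketch of driving update-objects from a script, assuming the origins are fed on the standard input with `-f -`. The helper names (`build_update_command`, `origins_payload`, `run_update`) are hypothetical; only the command line itself comes from this page.

```python
import subprocess

# States accepted by `swh storage blocking update-objects`, as listed above.
BLOCKING_STATES = ("blocked", "decision-pending", "non-blocked")


def build_update_command(slug: str, new_state: str, message: str) -> list[str]:
    """Build the argv for an update that reads origins from standard input."""
    if new_state not in BLOCKING_STATES:
        raise ValueError(f"invalid blocking state: {new_state!r}")
    return [
        "swh", "storage", "blocking", "update-objects",
        "-m", message,
        "-f", "-",  # "-" stands for the standard input
        slug, new_state,
    ]


def origins_payload(origins: list[str]) -> str:
    """Format origins one per line, dropping blank entries."""
    cleaned = [origin.strip() for origin in origins if origin.strip()]
    return "".join(f"{origin}\n" for origin in cleaned)


def run_update(slug: str, new_state: str, message: str, origins: list[str]):
    """Run the command; this needs a configured swh installation."""
    return subprocess.run(
        build_update_command(slug, new_state, message),
        input=origins_payload(origins),
        text=True,
        check=True,
    )
```

Validating NEW_STATE before spawning the process keeps a typo from reaching the blocking database tooling at all.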
cassandra#
Usage
swh storage cassandra [OPTIONS] COMMAND [ARGS]...
init#
Creates a Cassandra keyspace with table definitions suitable for use by swh-storage’s Cassandra backend
Usage
swh storage cassandra init [OPTIONS]
list-migrations#
Lists the migrations of swh-storage’s Cassandra backend
Usage
swh storage cassandra list-migrations [OPTIONS]
mark-upgraded#
Marks a migration as run
Exit codes:
0: ok
1: unexpected crash
2: (unassigned)
3: nothing to do
Usage
swh storage cassandra mark-upgraded [OPTIONS]
Options
- --migration <migration_ids>#
upgrade#
Applies all pending migrations that can run automatically
Exit codes:
0: migrations applied
1: unexpected crash
2: (unassigned)
3: no migrations to run
4: some required migrations need to be manually applied
5: some optional migrations need to be manually applied
6: some required (and optional) migrations could not be applied because a dependency is missing (only if --migration was passed)
7: some optional migrations could not be applied because a dependency is missing (only if --migration was passed)
Usage
swh storage cassandra upgrade [OPTIONS]
Options
- --migration <migration_ids>#
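The exit codes above lend themselves to a small lookup table for deployment scripts. This is a sketch only: the name `describe_upgrade_result` is hypothetical, and the choice of which codes count as non-fatal is a policy assumption, not something this page specifies.

```python
# Exit codes of `swh storage cassandra upgrade`, copied from the list above.
UPGRADE_EXIT_CODES = {
    0: "migrations applied",
    1: "unexpected crash",
    3: "no migrations to run",
    4: "some required migrations need to be manually applied",
    5: "some optional migrations need to be manually applied",
    6: "some required migrations are blocked by a missing dependency",
    7: "some optional migrations are blocked by a missing dependency",
}

# Assumption: only pending *optional* work is tolerated by the caller.
NON_FATAL = {0, 3, 5, 7}


def describe_upgrade_result(returncode: int) -> tuple[bool, str]:
    """Return (acceptable, message) for an exit code of the upgrade command."""
    message = UPGRADE_EXIT_CODES.get(
        returncode, f"unassigned exit code {returncode}"
    )
    return returncode in NON_FATAL, message


# Typical use, left commented out because it needs a configured installation:
# import subprocess
# result = subprocess.run(["swh", "storage", "cassandra", "upgrade"])
# acceptable, message = describe_upgrade_result(result.returncode)
```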
create-object-reference-partitions#
Create object_reference partitions from START to END
Usage
swh storage create-object-reference-partitions [OPTIONS] START END
Arguments
- START#
Required argument
- END#
Required argument
masking#
Configure masking on archived objects
These tools require read/write access to the masking database. An entry must be added to the configuration file as follows:
storage:
    …
masking_admin:
    cls: postgresql
    db: "service=swh-masking-admin"
Usage
swh storage masking [OPTIONS] COMMAND [ARGS]...
clear-request#
Remove all masking states for the given request
Usage
swh storage masking clear-request [OPTIONS] SLUG
Options
- -m, --message <message>#
an explanation for this change
Arguments
- SLUG#
Required argument
history#
Get the history for a request
Usage
swh storage masking history [OPTIONS] SLUG
Arguments
- SLUG#
Required argument
list-requests#
List masking requests
Usage
swh storage masking list-requests [OPTIONS]
Options
- -a, --include-cleared-requests, --exclude-cleared-requests#
Show requests without any masking state
new-request#
Create a new request to mask objects
SLUG is a human-readable unique identifier for the request. It is an internal identifier that will be used in subsequent commands to address this newly recorded request.
A reason for the request must be specified, either using the -m option or via the provided editor.
Usage
swh storage masking new-request [OPTIONS] SLUG
Options
- -m, --message <REASON>#
why the request was made
Arguments
- SLUG#
Required argument
object-state#
Get the masking state for a set of SWHIDs
If an object given in the arguments is not listed in the output, no masking state is set for it in any request.
Usage
swh storage masking object-state [OPTIONS] SWHID
Arguments
- SWHID#
Optional argument(s)
patching#
Tools to manage the patching of objects
Usage
swh storage masking patching [OPTIONS] COMMAND [ARGS]...
set#
Set display names (patching entries)
Usage
swh storage masking patching set [OPTIONS] INPUT
Options
- --clear, --keep#
Clear the display names table before inserting new entries
Arguments
- INPUT#
Required argument
status#
Get the masking states defined by a request
Usage
swh storage masking status [OPTIONS] SLUG
Arguments
- SLUG#
Required argument
update-objects#
Update the state of given objects
The masked state of the provided SWHIDs will be updated to NEW_STATE for the request SLUG.
NEW_STATE must be one of “visible”, “decision-pending” or “restricted”.
SWHIDs must be provided one per line, either via the standard input or a file specified via the -f option. - is a synonym for the standard input.
An explanation for this change must be added to the request history. It can either be specified by the -m option or via the provided editor.
Usage
swh storage masking update-objects [OPTIONS] SLUG NEW_STATE
Options
- -m, --message <message>#
an explanation for this change
- -f, --file <file>#
a file with one SWHID per line
Arguments
- SLUG#
Required argument
- NEW_STATE#
Required argument
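Before feeding a file to update-objects, it can help to screen its lines. The sketch below applies a deliberately loose shape check (scheme, version 1, a three-letter object type, 40 hexadecimal digits). Both the regular expression and the helper name `partition_swhids` are assumptions made for illustration; the command itself remains the authority on which identifiers it accepts.

```python
import re

# States accepted by `swh storage masking update-objects`, as listed above.
MASKING_STATES = ("visible", "decision-pending", "restricted")

# Loose shape check only; qualifiers and type-specific rules are not handled.
SWHID_RE = re.compile(r"^swh:1:[a-z]{3}:[0-9a-f]{40}$")


def partition_swhids(lines):
    """Split input lines into well-formed identifiers and rejected lines."""
    valid, rejected = [], []
    for line in lines:
        candidate = line.strip()
        if not candidate:
            continue  # blank lines are ignored
        if SWHID_RE.match(candidate):
            valid.append(candidate)
        else:
            rejected.append(candidate)
    return valid, rejected
```

Rejected lines can then be reported to the operator instead of being passed on.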
remove-old-object-reference-partitions#
Remove object_reference partitions for values older than BEFORE
Usage
swh storage remove-old-object-reference-partitions [OPTIONS] BEFORE
Options
- --force#
do not ask for confirmation before removing tables
Arguments
- BEFORE#
Required argument
replay#
Fill a Storage by reading a Journal.
This is typically used for a mirror configuration, reading the Software Heritage Kafka journal to retrieve objects of the Software Heritage main storage to feed a replication storage. There can be several ‘replayers’ filling a Storage as long as they use the same group-id.
The expected configuration file should have 2 sections:
storage: the configuration of the storage in which to add objects received from the Kafka journal,
journal_client: the configuration of access to the Kafka journal. See the documentation of swh.journal for more details on the possible configuration entries in this section.
https://docs.softwareheritage.org/devel/apidoc/swh.journal.client.html
In addition to these 2 mandatory config sections, a third section, ‘replayer’, may be specified with an ‘error_reporter’ entry giving the Redis connection parameters used to report non-recoverable mirroring errors, e.g.:
storage:
    [...]
journal_client:
    [...]
replayer:
    error_reporter:
        host: redis.local
        port: 6379
        db: 1
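The layout described above can be checked on an already-parsed configuration mapping. This is a sketch under assumptions: `check_replay_config` is a hypothetical helper, it only looks at the section names mentioned on this page, and it does not validate the contents of any section.

```python
def check_replay_config(config: dict) -> list[str]:
    """Return a list of problems found in a replay configuration mapping."""
    problems = []
    # The two mandatory sections named in the text above.
    for section in ("storage", "journal_client"):
        if section not in config:
            problems.append(f"missing mandatory section: {section}")
    # The optional third section; an empty one is treated as absent.
    replayer = config.get("replayer") or {}
    reporter = replayer.get("error_reporter")
    if reporter is not None and not isinstance(reporter, dict):
        problems.append("replayer.error_reporter must be a mapping")
    return problems


# Section contents are elided here, as in the example above.
example = {
    "storage": {},
    "journal_client": {},
    "replayer": {"error_reporter": {"host": "redis.local", "port": 6379, "db": 1}},
}
print(check_replay_config(example))
```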
Usage
swh storage replay [OPTIONS]
Options
- -n, --stop-after-objects <stop_after_objects>#
Stop after processing this many objects. Default is to run forever.
- -t, --type <object_types>#
Object types to replay
- Options:
origin | origin_visit | origin_visit_status | snapshot | revision | release | directory | content | skipped_content | metadata_authority | metadata_fetcher | raw_extrinsic_metadata | extid
- -X, --known-mismatched-hashes <invalid_hashes_file>#
File of SWHIDs of objects that are known to have invalid hashes but still need to be replayed.
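A sketch of assembling a replay invocation from a script. It assumes that -t may be given several times (suggested by the plural <object_types>, but not stated on this page); `build_replay_command` is a hypothetical helper name.

```python
# Object types accepted by -t, copied from the list above.
REPLAY_OBJECT_TYPES = (
    "origin", "origin_visit", "origin_visit_status", "snapshot", "revision",
    "release", "directory", "content", "skipped_content",
    "metadata_authority", "metadata_fetcher", "raw_extrinsic_metadata", "extid",
)


def build_replay_command(object_types=(), stop_after=None, config_file=None):
    """Build the argv for `swh storage replay`."""
    unknown = [t for t in object_types if t not in REPLAY_OBJECT_TYPES]
    if unknown:
        raise ValueError(f"unknown object types: {unknown}")
    argv = ["swh", "storage"]
    if config_file is not None:
        # -C is an option of `swh storage`, so it precedes the subcommand.
        argv += ["-C", config_file]
    argv.append("replay")
    if stop_after is not None:
        argv += ["-n", str(stop_after)]
    for object_type in object_types:
        argv += ["-t", object_type]  # assumption: -t is repeatable
    return argv


print(build_replay_command(["origin", "snapshot"], stop_after=1000))
```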
rpc-serve#
Software Heritage Storage RPC server.
Do NOT use this in a production environment.
Usage
swh storage rpc-serve [OPTIONS]
Options
- --host <IP>#
Host IP address to bind the server to
- Default:
'0.0.0.0'
- --port <PORT>#
Binding port of the server
- Default:
5002
- --debug, --no-debug#
Indicates if the server should run in debug mode