Command-line interface#
swh scrubber#
Main command group of the datastore scrubber.
Expected config format:
scrubber:
    cls: postgresql
    db: "service=..."    # libpq DSN

# for storage checkers + origin locator only:
storage:
    cls: postgresql     # cannot be remote for checkers, as they need direct
                        # access to the pg DB
    db: "service=..."   # libpq DSN
    objstorage:
        cls: memory

# for journal checkers only:
journal:
    # see https://docs.softwareheritage.org/devel/apidoc/swh.journal.client.html
    # for the full list of options
    sasl.mechanism: SCRAM-SHA-512
    security.protocol: SASL_SSL
    sasl.username: ...
    sasl.password: ...
    group_id: ...
    privileged: True
    message.max.bytes: 524288000
    brokers:
      - "broker1.journal.softwareheritage.org:9093"
      - "broker2.journal.softwareheritage.org:9093"
      - "broker3.journal.softwareheritage.org:9093"
      - "broker4.journal.softwareheritage.org:9093"
      - "broker5.journal.softwareheritage.org:9093"
    object_types: [directory, revision, snapshot, release]
    auto_offset_reset: earliest
swh scrubber [OPTIONS] COMMAND [ARGS]...
Options
- -C, --config-file <config_file>#
Configuration file.
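The comments in the expected config format say which top-level sections are only needed for some tasks. A minimal sketch of a pre-flight check built on those comments follows; the task names (`storage-checker`, `journal-checker`, `origin-locator`) and the helper `missing_sections` are hypothetical and not part of the scrubber, and the config is assumed to be already loaded into a dictionary.

```python
# Top-level sections per task, following the comments in the expected config
# format. Task names are hypothetical labels chosen for this sketch.
REQUIRED_SECTIONS = {
    "storage-checker": {"scrubber", "storage"},
    "journal-checker": {"scrubber", "journal"},
    "origin-locator": {"scrubber", "storage"},
}


def missing_sections(config: dict, task: str) -> list:
    """Return the sorted top-level sections that `task` needs but `config` lacks."""
    return sorted(REQUIRED_SECTIONS[task] - set(config))


if __name__ == "__main__":
    config = {"scrubber": {"cls": "postgresql", "db": "service=..."}}
    # Only the scrubber section is present, so a journal checker lacks "journal".
    print(missing_sections(config, "journal-checker"))
```

Running such a check before starting a checker gives an early, readable error instead of a failure deep inside the command.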
check#
Group of commands which read from data stores and report errors.
swh scrubber check [OPTIONS] COMMAND [ARGS]...
init#
Initialise a scrubber check configuration for the datastore defined in the configuration file and the given object type.
A checker configuration consists simply of:
backend: the datastore type being scrubbed (storage, objstorage or journal),
object-type: the type of object being checked,
nb-partitions: the number of partitions the hash space is divided into; must be a power of 2,
name: a unique name for easier reference,
check-hashes: flag (defaults to True) to select the hash validation step for this scrubbing configuration,
check-references: flag (defaults to True for storage and False for the journal backend) to select the reference validation step for this scrubbing configuration.
swh scrubber check init [OPTIONS] {storage|journal|objstorage}
Options
- --object-type <object_type>#
- Options:
snapshot | revision | release | directory | content
- --nb-partitions <nb_partitions>#
- --name <name>#
- --check-hashes, --no-check-hashes#
- --check-references, --no-check-references#
Arguments
- BACKEND#
Required argument
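The nb-partitions setting of check init must be a power of 2. One way to see why this is convenient: with 2^k partitions, each partition can correspond to a fixed k-bit prefix of the object hash. The sketch below illustrates that idea only; `is_power_of_two` and `partition_of` are hypothetical helpers, and the scrubber's actual partitioning scheme may differ.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is 1, 2, 4, 8, ... (exactly one bit set)."""
    return n > 0 and n & (n - 1) == 0


def partition_of(object_id: bytes, nb_partitions: int) -> int:
    """Map a hash to a partition index using its leading bits.

    Illustrative assumption: partitions are equal-sized, contiguous ranges
    of the hash space, ordered by the big-endian value of the hash.
    """
    if not is_power_of_two(nb_partitions):
        raise ValueError("nb_partitions must be a power of 2")
    prefix_bits = nb_partitions.bit_length() - 1  # log2(nb_partitions)
    total_bits = 8 * len(object_id)
    return int.from_bytes(object_id, "big") >> (total_bits - prefix_bits)


if __name__ == "__main__":
    lowest = bytes(20)                  # 20-byte hash made of zero bytes
    highest = bytes.fromhex("ff" * 20)  # 20-byte hash made of 0xff bytes
    print(partition_of(lowest, 4096), partition_of(highest, 4096))
```

With this layout the lowest hash falls in the first partition and the highest hash in the last one, and every partition covers the same number of possible hashes.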
list#
List the known configurations
swh scrubber check list [OPTIONS]
run#
Run the scrubber checker configured as name and report corrupt objects to the scrubber DB.
This runs a single thread; parallelism is achieved by running this command multiple times.
This command references an existing scrubbing configuration (either by name or by id); the configuration holds the object type, the number of partitions and the storage configuration that this scrubbing session will check.
swh scrubber check run [OPTIONS] [NAME]
Options
- --config-id <config_id>#
Config ID (if the config name is not given as an argument)
- --use-journal#
Flag only relevant when running an object storage scrubber: if set, content IDs are consumed from a Kafka topic of the SWH journal instead of being read from a storage.
- --limit <limit>#
Arguments
- NAME#
Optional argument
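Since check run is single-threaded and parallelism comes from running the command several times, a small launcher can start several identical workers. The following is a sketch under assumptions: the helper names, the configuration name and the config file path are hypothetical, and only the documented -C option and NAME argument are used.

```python
import subprocess


def run_command(name: str, config_file: str) -> list:
    """Build the argv for one `swh scrubber check run` worker."""
    # -C belongs to the top-level `swh scrubber` group, so it precedes `check`.
    return ["swh", "scrubber", "-C", config_file, "check", "run", name]


def start_workers(name, config_file, nb_workers, popen=subprocess.Popen):
    """Start nb_workers identical worker processes and return their handles.

    `popen` is injectable so the function can be exercised without the
    `swh` executable being installed.
    """
    cmd = run_command(name, config_file)
    return [popen(cmd) for _ in range(nb_workers)]


if __name__ == "__main__":
    # Hypothetical configuration name and file path.
    workers = start_workers("my-check", "scrubber.yml", nb_workers=4)
    for worker in workers:
        worker.wait()  # block until every worker has exited
```

Each worker is an independent process running the same command against the same check configuration.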
running#
List partitions being checked for the check session <name>
swh scrubber check running [OPTIONS] [NAME]
Options
- --config-id <config_id>#
Arguments
- NAME#
Optional argument
stalled#
List the stuck partitions for a given config
swh scrubber check stalled [OPTIONS] [NAME]
Options
- --config-id <config_id>#
- --for <delay>#
Delay after which a partition is considered stuck, in seconds or ‘auto’
- --reset#
Reset the stalled partition so it can be grabbed by a scrubber worker
Arguments
- NAME#
Optional argument
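As an illustration of how the --for and --reset options combine, the sketch below builds a check stalled invocation. The helper `stalled_command` is hypothetical; it always passes --for explicitly because the default value is not stated here, and rejecting negative delays is a choice of this sketch rather than documented behaviour.

```python
def stalled_command(name, config_file, delay="auto", reset=False):
    """Build the argv for `swh scrubber check stalled`.

    `delay` is either the string "auto" or a number of seconds.
    """
    if delay != "auto":
        delay = int(delay)
        if delay < 0:
            raise ValueError("delay must be 'auto' or a non-negative number of seconds")
    cmd = ["swh", "scrubber", "-C", config_file, "check", "stalled", "--for", str(delay)]
    if reset:
        # Release the stalled partitions so a worker can grab them again.
        cmd.append("--reset")
    cmd.append(name)
    return cmd


if __name__ == "__main__":
    # Hypothetical configuration name and file path.
    print(" ".join(stalled_command("my-check", "scrubber.yml", delay=3600)))
```

Listing stalled partitions first, without --reset, and resetting only after reviewing the list is a cautious way to use the command.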
stats#
Display statistics for the check session <name>
swh scrubber check stats [OPTIONS] [NAME]
Options
- --config-id <config_id>#
- -j, --json#
Arguments
- NAME#
Optional argument
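The --json flag makes the statistics easy to consume from a script. A minimal sketch follows, assuming the command prints a single JSON document on standard output; the layout of that document is not described here, so the sketch makes no assumption about its keys. The helper names, configuration name and file path are hypothetical.

```python
import json
import subprocess


def stats_command(name: str, config_file: str) -> list:
    """Build the argv for `swh scrubber check stats --json`."""
    return ["swh", "scrubber", "-C", config_file, "check", "stats", "--json", name]


def fetch_stats(name, config_file, run=subprocess.run):
    """Run the stats command and decode its standard output as JSON.

    `run` is injectable so the function can be exercised without the
    `swh` executable being installed.
    """
    result = run(
        stats_command(name, config_file),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    # Hypothetical configuration name and file path.
    stats = fetch_stats("my-check", "scrubber.yml")
    print(json.dumps(stats, indent=2))
```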
fix#
For each known corrupt object reported in the scrubber DB, look up origins that may contain this object and record them, so they can be used later for recovery.
swh scrubber fix [OPTIONS]
Options
- --start-object <start_object>#
- --end-object <end_object>#
locate#
For each known corrupt object reported in the scrubber DB, look up origins that may contain this object and record them, so they can be used later for recovery.
swh scrubber locate [OPTIONS]
Options
- --start-object <start_object>#
- --end-object <end_object>#