Command-line interface#
swh scrubber#
Main command group of the datastore scrubber.
Expected config format:
scrubber:
  cls: postgresql
  db: "service=..."  # libpq DSN
# for storage checkers + origin locator only:
storage:
  cls: postgresql  # cannot be remote for checkers, as they need direct
                   # access to the pg DB
  db: "service=..."  # libpq DSN
  objstorage:
    cls: memory
# for journal checkers only:
journal:
  # see https://docs.softwareheritage.org/devel/apidoc/swh.journal.client.html
  # for the full list of options
  sasl.mechanism: SCRAM-SHA-512
  security.protocol: SASL_SSL
  sasl.username: ...
  sasl.password: ...
  group_id: ...
  privileged: True
  message.max.bytes: 524288000
  brokers:
    - "broker1.journal.softwareheritage.org:9093"
    - "broker2.journal.softwareheritage.org:9093"
    - "broker3.journal.softwareheritage.org:9093"
    - "broker4.journal.softwareheritage.org:9093"
    - "broker5.journal.softwareheritage.org:9093"
  object_types: [directory, revision, snapshot, release]
  auto_offset_reset: earliest
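The section requirements in the config format above can be sketched as a small pre-flight check. This is only an illustration: `check_scrubber_config` is a hypothetical helper, not part of swh-scrubber, and it assumes the YAML file has already been loaded into a plain dictionary.

```python
from typing import Any


def check_scrubber_config(config: dict[str, Any], backend: str) -> list[str]:
    """Return a list of problems; an empty list means the sections needed
    for the given backend ("storage" or "journal") are present.

    Hypothetical helper: it only mirrors the comments of the expected
    config format and is not how the scrubber validates its configuration.
    """
    problems = []
    if "scrubber" not in config:
        problems.append("missing 'scrubber' section")
    if backend == "storage":
        storage = config.get("storage")
        if storage is None:
            problems.append("missing 'storage' section")
        elif storage.get("cls") != "postgresql":
            # Checkers need direct access to the PostgreSQL database.
            problems.append("storage 'cls' must be 'postgresql' for checkers")
    elif backend == "journal":
        if "journal" not in config:
            problems.append("missing 'journal' section")
    else:
        problems.append(f"unknown backend: {backend!r}")
    return problems


# Example: a storage section that is not a direct PostgreSQL connection.
config = {
    "scrubber": {"cls": "postgresql", "db": "service=..."},
    "storage": {"cls": "remote"},
}
print(check_scrubber_config(config, "storage"))
```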
swh scrubber [OPTIONS] COMMAND [ARGS]...
Options
- -C, --config-file <config_file>#
Configuration file.
check#
Group of commands which read from data stores and report errors.
swh scrubber check [OPTIONS] COMMAND [ARGS]...
init#
Initialise a scrubber check configuration for the datastore defined in the configuration file and given object_type.
A checker configuration consists simply of:
backend: the datastore type being scrubbed (storage or journal),
object-type: the type of object being checked,
nb-partitions: the number of partitions the hash space is divided into; must be a power of 2,
name: a unique name for easier reference,
check-hashes: flag (defaults to True) to select the hash validation step for this scrubbing configuration,
check-references: flag (defaults to True for the storage backend and False for the journal backend) to select the reference validation step for this scrubbing configuration.
swh scrubber check init [OPTIONS] {storage|journal}
Options
- --object-type <object_type>#
- Options:
snapshot | revision | release | directory
- --nb-partitions <nb_partitions>#
- --name <name>#
- --check-hashes, --no-check-hashes#
- --check-references, --no-check-references#
Arguments
- BACKEND#
Required argument
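The fields and defaults of a checker configuration listed for `init` can be modelled as a small data class. This is a sketch under stated assumptions: `CheckConfig` is a hypothetical class name, and the validation simply restates the rules from the description (power-of-2 partition count, backend-dependent default for reference checking); the scrubber's own data model may differ.

```python
from dataclasses import dataclass
from typing import Optional

OBJECT_TYPES = ("snapshot", "revision", "release", "directory")


@dataclass
class CheckConfig:
    """Hypothetical in-memory model of a scrubber check configuration."""

    backend: str
    object_type: str
    nb_partitions: int
    name: str
    check_hashes: bool = True
    # None means "use the backend-dependent default".
    check_references: Optional[bool] = None

    def __post_init__(self) -> None:
        if self.backend not in ("storage", "journal"):
            raise ValueError(f"unknown backend: {self.backend!r}")
        if self.object_type not in OBJECT_TYPES:
            raise ValueError(f"unknown object type: {self.object_type!r}")
        n = self.nb_partitions
        # A power of 2 has exactly one bit set, so n & (n - 1) is zero.
        if n < 1 or (n & (n - 1)) != 0:
            raise ValueError("nb_partitions must be a power of 2")
        if self.check_references is None:
            # Defaults to True for storage, False for the journal.
            self.check_references = self.backend == "storage"


cfg = CheckConfig("journal", "directory", 4096, "example-config")
print(cfg.check_hashes, cfg.check_references)
```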
journal#
Reads a complete Kafka journal, and reports corrupt objects to the scrubber DB.
swh scrubber check journal [OPTIONS] [NAME]
Options
- --config-id <config_id>#
Config ID (if config name is not given as argument)
Arguments
- NAME#
Optional argument
list#
List the known configurations
swh scrubber check list [OPTIONS]
running#
List partitions being checked for the check session <name>
swh scrubber check running [OPTIONS] [NAME]
Options
- --config-id <config_id>#
Arguments
- NAME#
Optional argument
stalled#
List the stuck partitions for a given config
swh scrubber check stalled [OPTIONS] [NAME]
Options
- --config-id <config_id>#
- --for <delay>#
Delay for a partition to be considered as stuck; in seconds or ‘auto’
- --reset#
Reset the stalled partition so it can be grabbed by a scrubber worker
Arguments
- NAME#
Optional argument
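The idea behind a delay given in seconds can be illustrated as follows: a partition whose check started more than the delay ago and has not finished is treated as stuck. This is a minimal sketch, assuming the start time of each unfinished partition is known; `stalled_partitions` is a hypothetical function, and the ‘auto’ mode is not modelled because its rule is not described here.

```python
from datetime import datetime, timedelta, timezone


def stalled_partitions(
    started: dict[int, datetime], now: datetime, delay_seconds: float
) -> list[int]:
    """Return the ids of partitions that have been running for longer than
    delay_seconds.

    `started` maps a partition id to the start time of a check that has
    not finished yet (an assumption of this sketch).
    """
    limit = timedelta(seconds=delay_seconds)
    return sorted(pid for pid, start in started.items() if now - start > limit)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
started = {
    3: now - timedelta(hours=2),     # running for two hours
    7: now - timedelta(minutes=5),   # running for five minutes
}
print(stalled_partitions(started, now, delay_seconds=3600))
```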
stats#
Display statistics for the check session <name>
swh scrubber check stats [OPTIONS] [NAME]
Options
- --config-id <config_id>#
- -j, --json#
Arguments
- NAME#
Optional argument
storage#
Reads a swh-storage instance, and reports corrupt objects to the scrubber DB.
This runs a single thread; parallelism is achieved by running this command multiple times.
This command references an existing scrubbing configuration (either by name or by id); the configuration holds the object type, number of partitions and the storage configuration this scrubbing session will check on.
All objects of type object_type are ordered, and split into the given number of partitions.
Then, this process will check all partitions. The status of the ongoing check session is stored in the database, so the number of concurrent workers can be dynamically adjusted.
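One way to picture how an ordered hash space splits into a power-of-2 number of partitions is to use the leading bits of the object identifier as the partition index. The sketch below assumes this scheme for illustration only; `partition_of` is a hypothetical function, and the partitioning actually used by the scrubber and storage may be implemented differently.

```python
def partition_of(object_id: bytes, nb_partitions: int) -> int:
    """Map an object id to a partition index in [0, nb_partitions).

    Illustrative scheme: with nb_partitions = 2**k, the k most significant
    bits of the id select the partition, so ids that are adjacent in sort
    order fall into the same or neighbouring partitions.
    """
    if nb_partitions < 1 or (nb_partitions & (nb_partitions - 1)) != 0:
        raise ValueError("nb_partitions must be a power of 2")
    bits = nb_partitions.bit_length() - 1  # k such that 2**k == nb_partitions
    if bits == 0:
        return 0
    value = int.from_bytes(object_id, "big")
    # Keep only the top `bits` bits of the identifier.
    return value >> (len(object_id) * 8 - bits)


# A 20-byte (SHA1-sized) id starting with hex "abc", with 4096 partitions.
example_id = bytes.fromhex("abc" + "0" * 37)
print(partition_of(example_id, 4096))
```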
swh scrubber check storage [OPTIONS] [NAME]
Options
- --config-id <config_id>#
Config ID (if config name is not given as argument)
- --limit <limit>#
Arguments
- NAME#
Optional argument
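Since each invocation runs a single thread, several workers are obtained by starting the same command several times. The sketch below only builds the command lines, using the options documented on this page; `worker_commands` is a hypothetical helper, and the file name and configuration name in the example are placeholders.

```python
def worker_commands(
    config_file: str, name: str, nb_workers: int
) -> list[list[str]]:
    """Build one `swh scrubber check storage` command line per worker.

    All workers reference the same check configuration by name; the shared
    scrubber database keeps track of which partitions are being checked.
    """
    base = ["swh", "scrubber", "-C", config_file, "check", "storage", name]
    return [list(base) for _ in range(nb_workers)]


# Placeholder file and configuration names, for illustration only.
commands = worker_commands("scrubber.yml", "example-config", 3)
for argv in commands:
    print(" ".join(argv))
```

Each list could then be passed to `subprocess.Popen` to start the workers; more can be started or stopped later, as the session status lives in the database.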
fix#
For each known corrupt object reported in the scrubber DB, looks up origins that may contain this object, and records them; so they can be used later for recovery.
swh scrubber fix [OPTIONS]
Options
- --start-object <start_object>#
- --end-object <end_object>#
locate#
For each known corrupt object reported in the scrubber DB, looks up origins that may contain this object, and records them; so they can be used later for recovery.
swh scrubber locate [OPTIONS]
Options
- --start-object <start_object>#
- --end-object <end_object>#