Command-line interface

swh scrubber

Main command group of the datastore scrubber.

Expected config format:

scrubber:
    cls: postgresql
    db: "service=..."    # libpq DSN

# for storage checkers + origin locator only:
storage:
    cls: postgresql     # cannot be remote for checkers, as they need direct
                        # access to the pg DB
    db: "service=..."   # libpq DSN
    objstorage:
        cls: memory

# for journal checkers only:
journal:
    # see https://docs.softwareheritage.org/devel/apidoc/swh.journal.client.html
    # for the full list of options
    sasl.mechanism: SCRAM-SHA-512
    security.protocol: SASL_SSL
    sasl.username: ...
    sasl.password: ...
    group_id: ...
    privileged: True
    message.max.bytes: 524288000
    brokers:
      - "broker1.journal.softwareheritage.org:9093"
      - "broker2.journal.softwareheritage.org:9093"
      - "broker3.journal.softwareheritage.org:9093"
      - "broker4.journal.softwareheritage.org:9093"
      - "broker5.journal.softwareheritage.org:9093"
    object_types: [directory, revision, snapshot, release]
    auto_offset_reset: earliest

swh scrubber [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

Configuration file.
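Which sections a command reads depends on the backend it checks, as the comments in the config format above indicate. The following is a minimal sketch of a pre-flight check on an already-parsed config (for example, the dictionary returned by a YAML loader). `config_problems` is a hypothetical helper, not part of swh-scrubber, and it assumes the remote storage class is spelled `remote`, as in other swh.storage configurations.

```python
from typing import Any


# Hypothetical pre-flight check, not part of swh-scrubber: returns a list of
# problems found in an already-parsed config for the given checker backend.
def config_problems(config: dict[str, Any], backend: str) -> list[str]:
    problems: list[str] = []

    # Every command needs the scrubber DB section.
    scrubber = config.get("scrubber") or {}
    for key in ("cls", "db"):
        if key not in scrubber:
            problems.append(f"scrubber.{key} is missing")

    if backend == "storage":
        storage = config.get("storage")
        if not storage:
            problems.append("storage section is missing")
        elif storage.get("cls") == "remote":
            # Assumption: "remote" is the class name of a remote storage.
            # Storage checkers need direct access to the PostgreSQL DB.
            problems.append("storage.cls cannot be 'remote' for checkers")
    elif backend == "journal":
        journal = config.get("journal") or {}
        if not journal.get("brokers"):
            problems.append("journal.brokers is missing")

    return problems


cfg = {
    "scrubber": {"cls": "postgresql", "db": "service=..."},
    "storage": {"cls": "remote"},
}
print(config_problems(cfg, "storage"))
```

It only inspects the few keys shown in the config format, not the full option set of each component.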

check

Group of commands that read from data stores and report errors.

swh scrubber check [OPTIONS] COMMAND [ARGS]...

init

Initialise a scrubber check configuration for the datastore defined in the configuration file and the given object type.

A checker configuration consists of:

  • backend: the datastore type being scrubbed (storage, objstorage or journal),

  • object-type: the type of object being checked,

  • nb-partitions: the number of partitions the hash space is divided into; must be a power of 2,

  • name: a unique name for easier reference,

  • check-hashes: flag (defaults to True) to select the hash validation step for this scrubbing configuration,

  • check-references: flag (defaults to True for the storage backend and False for the journal backend) to select the reference validation step for this scrubbing configuration.
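Requiring a power of 2 lets each partition correspond to a fixed bit prefix of the object hash, so all partitions have the same size. The sketch below assumes partitions are equal, contiguous ranges of the hash space selected by the leading bits of the object ID; `partition_of` is hypothetical and is not claimed to be how the scrubber numbers its partitions.

```python
def partition_of(obj_id: bytes, nb_partitions: int) -> int:
    """Return the partition of obj_id.

    Assumption (illustrative only): partitions are equal, contiguous ranges
    of the hash space, selected by the leading bits of the ID.
    """
    # A power of 2 has exactly one bit set.
    if nb_partitions <= 0 or nb_partitions & (nb_partitions - 1):
        raise ValueError("nb_partitions must be a power of 2")
    prefix_bits = nb_partitions.bit_length() - 1  # log2(nb_partitions)
    total_bits = len(obj_id) * 8
    if prefix_bits > total_bits:
        raise ValueError("too many partitions for this hash length")
    # Keep only the leading prefix_bits bits of the ID.
    return int.from_bytes(obj_id, "big") >> (total_bits - prefix_bits)


sha1_git = bytes.fromhex("c0" + "00" * 19)  # hypothetical 20-byte object ID
print(partition_of(sha1_git, 4))  # prints 3
```

With `nb_partitions=4`, for instance, the two leading bits of the ID pick the partition, so IDs whose first byte is between `0x00` and `0x3f` fall in partition 0.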

swh scrubber check init [OPTIONS] {storage|journal|objstorage}

Options

--object-type <object_type>
Options:

snapshot | revision | release | directory | content

--nb-partitions <nb_partitions>
--name <name>
--check-hashes, --no-check-hashes
--check-references, --no-check-references

Arguments

BACKEND

Required argument
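A checker configuration can also be created from a script. This sketch only assembles and validates the argument list for `swh scrubber check init` from the options documented above; `init_command` and the file name `scrubber.yml` are hypothetical.

```python
from typing import Optional

BACKENDS = ("storage", "journal", "objstorage")
OBJECT_TYPES = ("snapshot", "revision", "release", "directory", "content")


def init_command(
    backend: str,
    object_type: str,
    nb_partitions: int,
    name: str,
    config_file: str,
    check_hashes: Optional[bool] = None,
    check_references: Optional[bool] = None,
) -> list[str]:
    """Hypothetical helper: build the argv for `swh scrubber check init`."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    if object_type not in OBJECT_TYPES:
        raise ValueError(f"unknown object type: {object_type}")
    if nb_partitions <= 0 or nb_partitions & (nb_partitions - 1):
        raise ValueError("nb_partitions must be a power of 2")

    argv = ["swh", "scrubber", "-C", config_file, "check", "init",
            "--object-type", object_type,
            "--nb-partitions", str(nb_partitions),
            "--name", name]
    # Leave the boolean flags out unless the caller sets them,
    # so the documented defaults apply.
    if check_hashes is not None:
        argv.append("--check-hashes" if check_hashes else "--no-check-hashes")
    if check_references is not None:
        argv.append("--check-references" if check_references else "--no-check-references")
    argv.append(backend)  # BACKEND comes last, as in the usage line
    return argv


print(init_command("storage", "directory", 64, "directory-64", "scrubber.yml"))
```

Passing the result to `subprocess.run(..., check=True)` would then execute it.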

list

List the known configurations.

swh scrubber check list [OPTIONS]

run

Run the scrubber checker configured as NAME and report corrupt objects to the scrubber DB.

This runs a single thread; parallelism is achieved by running this command multiple times.

This command references an existing scrubbing configuration (either by name or by ID); the configuration holds the object type, the number of partitions and the storage configuration that this scrubbing session checks.

swh scrubber check run [OPTIONS] [NAME]

Options

--config-id <config_id>

Config ID (if the config name is not given as an argument).

--use-journal

Only relevant when running an object storage scrubber: if set, content IDs are consumed from a Kafka topic of the SWH journal instead of being fetched from a storage.

--limit <limit>

Arguments

NAME

Optional argument
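Since each `run` invocation is single-threaded, a simple way to scale out is to start several identical workers. This sketch assumes, as the `--reset` option of `check stalled` suggests, that concurrent workers claim distinct partitions from the scrubber DB; `run_commands` and `launch` are hypothetical helpers.

```python
import subprocess
from typing import Optional


def run_commands(
    nb_workers: int,
    config_file: str,
    name: Optional[str] = None,
    config_id: Optional[int] = None,
) -> list[list[str]]:
    """Hypothetical helper: one `swh scrubber check run` argv per worker."""
    # The configuration is referenced either by NAME or by --config-id.
    if (name is None) == (config_id is None):
        raise ValueError("give exactly one of name or config_id")
    selector = [name] if name is not None else ["--config-id", str(config_id)]
    base = ["swh", "scrubber", "-C", config_file, "check", "run"]
    return [base + selector for _ in range(nb_workers)]


def launch(commands: list[list[str]]) -> list[int]:
    """Start all workers concurrently, then wait for each and return exit codes."""
    processes = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in processes]
```

For example, `launch(run_commands(8, "scrubber.yml", name="directory-64"))` would start eight workers on the same configuration.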

running

List the partitions being checked for the check session <name>.

swh scrubber check running [OPTIONS] [NAME]

Options

--config-id <config_id>

Arguments

NAME

Optional argument

stalled

List the stuck partitions for a given config

swh scrubber check stalled [OPTIONS] [NAME]

Options

--config-id <config_id>
--for <delay>

Delay after which a partition is considered stuck, in seconds or ‘auto’.

--reset

Reset the stalled partition so it can be grabbed by a scrubber worker.

Arguments

NAME

Optional argument
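The sketch below illustrates the `--for` semantics on hypothetical data: a partition counts as stuck once it has been claimed, and not finished, for longer than the delay. How the scrubber resolves `auto` is not described here, so `parse_for` simply returns `None` for it; both helpers are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_for(value: str) -> Optional[int]:
    """Parse a --for value: None for 'auto' (left to the scrubber), else seconds."""
    if value == "auto":
        return None
    seconds = int(value)
    if seconds < 0:
        raise ValueError("delay must be non-negative")
    return seconds


def stuck(claimed_at: dict[int, datetime], now: datetime, delay_s: int) -> list[int]:
    """Partitions claimed (and not yet finished) for longer than delay_s seconds."""
    limit = timedelta(seconds=delay_s)
    return sorted(pid for pid, started in claimed_at.items() if now - started > limit)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
claimed = {  # hypothetical partition id -> claim time
    0: now - timedelta(hours=2),
    1: now - timedelta(minutes=5),
    2: now - timedelta(hours=1, seconds=1),
}
print(stuck(claimed, now, parse_for("3600")))  # prints [0, 2]
```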

stats

Display statistics for the check session <name>

swh scrubber check stats [OPTIONS] [NAME]

Options

--config-id <config_id>
-j, --json

Arguments

NAME

Optional argument
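With `--json`, the statistics can be consumed from a script. The layout of the JSON document is not described here, so this sketch just parses and returns it, assuming the command writes only that document to stdout. `stats_command` and `fetch_stats` are hypothetical, and treating NAME and `--config-id` as mutually exclusive is this helper's own rule, based on the `--config-id` description under `run`.

```python
import json
import subprocess
from typing import Any, Optional


def stats_command(
    config_file: str, name: Optional[str] = None, config_id: Optional[int] = None
) -> list[str]:
    """Hypothetical helper: argv for `swh scrubber check stats --json`."""
    if (name is None) == (config_id is None):
        raise ValueError("give exactly one of name or config_id")
    argv = ["swh", "scrubber", "-C", config_file, "check", "stats", "--json"]
    argv += [name] if name is not None else ["--config-id", str(config_id)]
    return argv


def fetch_stats(
    config_file: str, name: Optional[str] = None, config_id: Optional[int] = None
) -> Any:
    """Run the command and parse its JSON output (its structure is not assumed)."""
    result = subprocess.run(
        stats_command(config_file, name, config_id),
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```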

fix

For each known corrupt object reported in the scrubber DB, looks up origins that may contain this object, and records them, so they can be used later for recovery.

swh scrubber fix [OPTIONS]

Options

--start-object <start_object>
--end-object <end_object>

locate

For each known corrupt object reported in the scrubber DB, looks up origins that may contain this object, and records them, so they can be used later for recovery.

swh scrubber locate [OPTIONS]

Options

--start-object <start_object>
--end-object <end_object>