Command-line interface#

swh scrubber#

Main command group of the datastore scrubber.

Expected config format:

scrubber:
    cls: postgresql
    db: "service=..."    # libpq DSN

# for storage checkers + origin locator only:
storage:
    cls: postgresql     # cannot be remote for checkers, as they need direct
                        # access to the pg DB
    db: "service=..."   # libpq DSN
    objstorage:
        cls: memory

# for journal checkers only:
journal:
    # see https://docs.softwareheritage.org/devel/apidoc/swh.journal.client.html
    # for the full list of options
    sasl.mechanism: SCRAM-SHA-512
    security.protocol: SASL_SSL
    sasl.username: ...
    sasl.password: ...
    group_id: ...
    privileged: True
    message.max.bytes: 524288000
    brokers:
      - "broker1.journal.softwareheritage.org:9093"
      - "broker2.journal.softwareheritage.org:9093"
      - "broker3.journal.softwareheritage.org:9093"
      - "broker4.journal.softwareheritage.org:9093"
      - "broker5.journal.softwareheritage.org:9093"
    object_types: [directory, revision, snapshot, release]
    auto_offset_reset: earliest

swh scrubber [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>#

Configuration file.
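
The comments in the expected config format say which sections each kind of command needs. As a minimal sketch, the check below takes an already-parsed configuration (a plain dict) and reports the sections that are missing. The command labels `storage`, `journal` and `locate` used as keys, and the helper name `missing_sections`, are hypothetical and not part of swh-scrubber.

```python
# Sections needed per kind of command, following the comments in the
# expected config format. The keys are illustrative labels only.
REQUIRED_SECTIONS = {
    "storage": ("scrubber", "storage"),   # storage checkers
    "journal": ("scrubber", "journal"),   # journal checkers
    "locate": ("scrubber", "storage"),    # origin locator
}


def missing_sections(config, command):
    """Return the top-level sections that `command` needs but `config` lacks."""
    return [name for name in REQUIRED_SECTIONS[command] if name not in config]


# A configuration with only the scrubber DB is not enough for a journal checker.
config = {"scrubber": {"cls": "postgresql", "db": "service=..."}}
print(missing_sections(config, "journal"))
```

Running such a check before starting a worker gives a clearer failure than a missing-key error deep inside the command.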

check#

Group of commands which read from data stores and report errors.

swh scrubber check [OPTIONS] COMMAND [ARGS]...

init#

Initialise a scrubber check configuration for the datastore defined in the configuration file and the given object type.

A checker configuration simply consists of:

  • backend: the datastore type being scrubbed (storage or journal),

  • object-type: the type of object being checked,

  • nb-partitions: the number of partitions the hash space is divided into; must be a power of 2,

  • name: a unique name for easier reference,

  • check-hashes: flag (defaults to True) to select the hash validation step for this scrubbing configuration,

  • check-references: flag (defaults to True for the storage backend and False for the journal backend) to select the reference validation step for this scrubbing configuration.

swh scrubber check init [OPTIONS] {storage|journal}

Options

--object-type <object_type>#
Options:

snapshot | revision | release | directory

--nb-partitions <nb_partitions>#
--name <name>#
--check-hashes, --no-check-hashes#
--check-references, --no-check-references#

Arguments

BACKEND#

Required argument
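
As an illustration of the options above, the following sketch builds an argument list for `swh scrubber check init` and rejects a partition count that is not a power of 2 before the command is run. The helper name `build_init_command`, the configuration file name `scrubber.yml` and the configuration name `snapshot-4096` are hypothetical; only the flags listed on this page are used.

```python
import shlex

BACKENDS = ("storage", "journal")
OBJECT_TYPES = ("snapshot", "revision", "release", "directory")


def build_init_command(backend, object_type, nb_partitions, name,
                       config_file="scrubber.yml"):
    """Build the argv list for `swh scrubber check init` (hypothetical helper)."""
    if backend not in BACKENDS:
        raise ValueError(f"unknown backend: {backend}")
    if object_type not in OBJECT_TYPES:
        raise ValueError(f"unknown object type: {object_type}")
    # A power of 2 has exactly one bit set.
    if nb_partitions <= 0 or nb_partitions & (nb_partitions - 1):
        raise ValueError("nb_partitions must be a power of 2")
    return [
        "swh", "scrubber", "-C", config_file,
        "check", "init",
        "--object-type", object_type,
        "--nb-partitions", str(nb_partitions),
        "--name", name,
        backend,
    ]


# Print the command line instead of running it.
print(shlex.join(build_init_command("storage", "snapshot", 4096, "snapshot-4096")))
```

The resulting list can be passed to `subprocess.run` as is, which avoids shell quoting issues with configuration names.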

journal#

Reads a complete Kafka journal, and reports corrupt objects to the scrubber DB.

swh scrubber check journal [OPTIONS] [NAME]

Options

--config-id <config_id>#

Config ID (if config name is not given as argument)

Arguments

NAME#

Optional argument

list#

List the known configurations

swh scrubber check list [OPTIONS]

running#

List partitions being checked for the check session <name>

swh scrubber check running [OPTIONS] [NAME]

Options

--config-id <config_id>#

Arguments

NAME#

Optional argument

stalled#

List the stuck partitions for a given config

swh scrubber check stalled [OPTIONS] [NAME]

Options

--config-id <config_id>#
--for <delay>#

Delay for a partition to be considered as stuck; in seconds or ‘auto’

--reset#

Reset the stalled partition so it can be grabbed by a scrubber worker

Arguments

NAME#

Optional argument

stats#

Display statistics for the check session <name>

swh scrubber check stats [OPTIONS] [NAME]

Options

--config-id <config_id>#
-j, --json#

Arguments

NAME#

Optional argument
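
A script can collect the statistics of a check session through the `--json` flag. The sketch below assumes that, with `--json`, the command prints a single JSON document on standard output; this page does not describe the output format, so treat that as an assumption. The helper names and the `scrubber.yml` file name are hypothetical.

```python
import json
import subprocess


def stats_command(name, config_file="scrubber.yml"):
    """Build the argv list for `swh scrubber check stats --json NAME`."""
    return ["swh", "scrubber", "-C", config_file, "check", "stats", "--json", name]


def fetch_stats(name, config_file="scrubber.yml"):
    """Run the stats command and decode its output.

    Assumption: standard output holds exactly one JSON document.
    """
    result = subprocess.run(
        stats_command(name, config_file),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(result.stdout)
```

Calling `fetch_stats("snapshot-4096")` would then return the decoded statistics for a configuration of that (hypothetical) name.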

storage#

Reads a swh-storage instance, and reports corrupt objects to the scrubber DB.

This runs a single thread; parallelism is achieved by running this command multiple times.

This command references an existing scrubbing configuration (either by name or by id); the configuration holds the object type, the number of partitions and the storage configuration that this scrubbing session will check.

All objects of type object_type are ordered, and split into the given number of partitions.
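
To make the partitioning concrete, here is a minimal sketch of one way to split an ordered hash space into a power-of-2 number of partitions: the partition index is taken from the leading bits of the object identifier. This is an illustration under the assumption of fixed-size binary identifiers (such as 20-byte SHA-1 digests); the function `partition_of` is hypothetical and the scrubber's actual partitioning code may differ.

```python
import hashlib


def partition_of(object_id, nb_partitions):
    """Return the index of the partition that `object_id` falls into.

    Assumes identifiers are ordered as big-endian integers and that
    nb_partitions is a power of 2 (hypothetical helper).
    """
    if nb_partitions <= 0 or nb_partitions & (nb_partitions - 1):
        raise ValueError("nb_partitions must be a power of 2")
    bits = nb_partitions.bit_length() - 1       # log2(nb_partitions)
    value = int.from_bytes(object_id, "big")
    # Keep only the `bits` most significant bits of the identifier.
    return value >> (len(object_id) * 8 - bits)


# Example identifier: a 20-byte SHA-1 digest of arbitrary data.
object_id = hashlib.sha1(b"example").digest()
print(partition_of(object_id, 4096))
```

With this scheme, each partition covers a contiguous range of identifiers, so a worker can process one partition as a single ordered range.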

Then, this process will check all partitions. The status of the ongoing check session is stored in the database, so the number of concurrent workers can be dynamically adjusted.

swh scrubber check storage [OPTIONS] [NAME]

Options

--config-id <config_id>#

Config ID (if config name is not given as argument)

--limit <limit>#

Arguments

NAME#

Optional argument
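
Since each invocation runs a single thread, one way to get parallelism is to start several processes against the same configuration. The sketch below does so with `subprocess.Popen`; the helper names and the `scrubber.yml` file name are hypothetical, and the worker count is left to the caller.

```python
import subprocess


def storage_worker_command(name, config_file="scrubber.yml"):
    """Build the argv list for one `swh scrubber check storage NAME` worker."""
    return ["swh", "scrubber", "-C", config_file, "check", "storage", name]


def launch_workers(name, count, config_file="scrubber.yml"):
    """Start `count` independent workers and return their process handles."""
    command = storage_worker_command(name, config_file)
    return [subprocess.Popen(command) for _ in range(count)]


def wait_for_workers(workers):
    """Block until every worker exits and return their exit codes."""
    return [worker.wait() for worker in workers]
```

For instance, `wait_for_workers(launch_workers("snapshot-4096", 4))` would run four workers on a configuration of that (hypothetical) name. Because the status of the check session is stored in the database, more workers can be started later, or some stopped, without coordinating through this script.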

fix#

For each known corrupt object reported in the scrubber DB, looks up origins that may contain this object and records them, so they can be used later for recovery.

swh scrubber fix [OPTIONS]

Options

--start-object <start_object>#
--end-object <end_object>#

locate#

For each known corrupt object reported in the scrubber DB, looks up origins that may contain this object and records them, so they can be used later for recovery.

swh scrubber locate [OPTIONS]

Options

--start-object <start_object>#
--end-object <end_object>#