swh.scrubber.storage_checker module

Reads all objects in a swh-storage instance and recomputes their checksums.

swh.scrubber.storage_checker.postgresql_storage_db(storage)
class swh.scrubber.storage_checker.StorageChecker(db: ScrubberDb, storage: StorageInterface, object_type: str, nb_partitions: int, start_partition_id: int, end_partition_id: int)

Bases: object

Reads a chunk of a swh-storage database, recomputes checksums, and reports errors in a separate database.

db: ScrubberDb
storage: StorageInterface
object_type: str

Type of object to check: one of directory, revision, release, or snapshot.

nb_partitions: int

Number of partitions to split the whole set of objects into. Must be a power of 2.

start_partition_id: int

First partition id to check (inclusive). Must be in the range [0, nb_partitions).

end_partition_id: int

Last partition id to check (exclusive). Must be in the range (start_partition_id, nb_partitions].
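The three constraints above can be read as a small validation routine. The sketch below is illustrative only: validate_partition_range is a hypothetical helper, not part of swh.scrubber, and it simply encodes the documented rules (power-of-two partition count, inclusive start, exclusive end).

```python
def validate_partition_range(
    nb_partitions: int, start_partition_id: int, end_partition_id: int
) -> range:
    """Hypothetical helper: check the documented constraints and return
    the partition ids that would be visited."""
    # A positive power of two has exactly one bit set.
    if nb_partitions <= 0 or nb_partitions & (nb_partitions - 1) != 0:
        raise ValueError("nb_partitions must be a power of 2")
    # start_partition_id must lie in [0, nb_partitions).
    if not 0 <= start_partition_id < nb_partitions:
        raise ValueError("start_partition_id out of range")
    # end_partition_id must lie in (start_partition_id, nb_partitions].
    if not start_partition_id < end_partition_id <= nb_partitions:
        raise ValueError("end_partition_id out of range")
    # Half-open interval: start is included, end is excluded.
    return range(start_partition_id, end_partition_id)
```

Because the end bound is exclusive, passing end_partition_id equal to nb_partitions covers every partition from the start through the last one.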

datastore_info() → Datastore

Returns a Datastore instance representing the swh-storage instance being checked.

statsd() → Statsd
run() → None

Checks all objects of object_type in every partition from start_partition_id (inclusive) to end_partition_id (exclusive).

check_object_hashes(objects: Iterable[Union[Revision, Release, Snapshot, Directory, Content]])

Recomputes the hashes of the given objects and reports any mismatches.
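The general idea of a hash check can be sketched as follows. This is a simplified illustration, not the scrubber's implementation: StoredObject and find_hash_mismatches are hypothetical names, and a plain SHA-1 over a byte payload stands in for whatever identifier computation the real model objects use.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class StoredObject:
    """Hypothetical stand-in: a recorded id and the bytes it should be derived from."""

    id: bytes
    payload: bytes


def find_hash_mismatches(objects: Iterable[StoredObject]) -> List[StoredObject]:
    """Return the objects whose recomputed digest differs from the recorded id."""
    corrupt = []
    for obj in objects:
        # Assumption: the id is the SHA-1 digest of the payload.
        recomputed = hashlib.sha1(obj.payload).digest()
        if recomputed != obj.id:
            corrupt.append(obj)
    return corrupt
```

In this sketch the mismatching objects are returned to the caller; a checker like the one documented here would instead record them in its separate error database.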

check_object_references(objects: Iterable[Union[Revision, Release, Snapshot, Directory, Content]])

Checks that all objects referenced by these objects exist.
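A reference check amounts to collecting the ids each object points to and looking for those that are absent. The sketch below shows that idea with plain dictionaries and sets; find_missing_references and its string ids are hypothetical and do not reflect how the scrubber queries swh-storage.

```python
from typing import Dict, Iterable, Set


def find_missing_references(
    references: Dict[str, Iterable[str]], known_ids: Set[str]
) -> Dict[str, Set[str]]:
    """Map each referencing object id to the referenced ids that are not known.

    Objects whose references all exist are left out of the result.
    """
    missing: Dict[str, Set[str]] = {}
    for obj_id, targets in references.items():
        absent = set(targets) - known_ids
        if absent:
            missing[obj_id] = absent
    return missing
```

For example, a revision pointing to a directory and a parent revision would be reported only if at least one of the two is missing from known_ids.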