swh.scrubber.objstorage_checker module#

swh.scrubber.objstorage_checker.get_objstorage_datastore(objstorage_config)[source]#
class swh.scrubber.objstorage_checker.ObjectStorageCheckerProtocol(*args, **kwargs)[source]#

Bases: Protocol

db: ScrubberDb#
objstorage: ObjStorageInterface#
property config: ConfigEntry#
property statsd: Statsd#
class swh.scrubber.objstorage_checker.ContentCheckerMixin(*args, **kwargs)[source]#

Bases: ObjectStorageCheckerProtocol

Mixin class implementing content checks used by object storage checkers.

check_content(content: Content) None[source]#

Checks if a content exists in an object storage (if check_references is set to True in checker config) or if a content is corrupted in an object storage (if check_hashes is set to True in checker config).

class swh.scrubber.objstorage_checker.ObjectStorageCheckerFromStoragePartition(db: ScrubberDb, config_id: int, storage: StorageInterface, objstorage: ObjStorageInterface | None = None, limit: int = 0)[source]#

Bases: BasePartitionChecker, ContentCheckerMixin

A partition based checker to detect missing and corrupted contents in an object storage.

It iterates on content objects referenced in a storage instance, check they are available in a given object storage instance (if check_references is set to True in checker config) then retrieve their bytes from it in order to recompute checksums and detect corruptions (if check_hashes is set to True in checker config).

check_partition(object_type: ObjectType, partition_id: int) None[source]#

Abstract method that derived classes must implement to check objects in partition.

class swh.scrubber.objstorage_checker.ObjectStorageCheckerFromJournal(db: ScrubberDb, config_id: int, journal_client_config: Dict[str, Any], objstorage: ObjStorageInterface)[source]#

Bases: BaseChecker, ContentCheckerMixin

A journal based checker to detect missing and corrupted contents in an object storage.

It iterates on content objects referenced in a kafka topic, check they are available in a given object storage instance then retrieve their bytes from it in order to recompute checksums and detect corruptions.

run() None[source]#

Run the checker processing, derived classes must implement this method.

process_kafka_messages(all_messages: Dict[str, List[bytes]])[source]#