swh.scrubber.fixer module

Reads all known corrupt objects from the swh-scrubber database, and tries to recover them.

Currently, only recovery from Git origins is implemented.

swh.scrubber.fixer.get_object_from_clone(clone_path: Path, swhid: CoreSWHID) → None | bytes | ShaFile

Reads the original object matching the given swhid from the given clone if it exists, and returns a Dulwich object if possible, or the raw manifest otherwise.
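To illustrate what a "raw manifest" looks like, the sketch below reads a loose Git object directly from a clone's object directory using only the standard library. This is not the module's implementation, which goes through Dulwich; the helper name read_loose_object is hypothetical, and objects stored in packfiles are not handled.

```python
import zlib
from pathlib import Path
from typing import Optional


def read_loose_object(clone_path: Path, hex_id: str) -> Optional[bytes]:
    """Return the raw manifest (header + body) of a loose Git object.

    Hypothetical helper for illustration only: returns None when the
    object is not stored as a loose file in the clone.
    """
    # Non-bare clones keep their data under ".git"; bare clones do not.
    git_dir = clone_path / ".git"
    if not git_dir.is_dir():
        git_dir = clone_path
    # Loose objects live at objects/<first 2 hex chars>/<remaining 38>.
    object_path = git_dir / "objects" / hex_id[:2] / hex_id[2:]
    if not object_path.is_file():
        return None
    # The file is the zlib-compressed manifest: b"<type> <size>\x00<body>".
    return zlib.decompress(object_path.read_bytes())
```

The returned bytes start with a header such as b"blob 5\x00", followed by the object body.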

swh.scrubber.fixer.get_fixed_object_from_clone(clone_path: Path, corrupt_object: CorruptObject) → FixedObject | None

Reads the original object matching the corrupt_object from the given clone if it exists, and returns a FixedObject instance ready to be inserted in the database.

class swh.scrubber.fixer.Fixer(db: ScrubberDb, start_object: CoreSWHID = CoreSWHID.from_string('swh:1:cnt:0000000000000000000000000000000000000000'), end_object: CoreSWHID = CoreSWHID.from_string('swh:1:snp:ffffffffffffffffffffffffffffffffffffffff'))

Bases: object

Reads a chunk of corrupt objects in the swh-scrubber database, tries to recover them through various means (brute-forcing fields and re-downloading from the origin), recomputes checksums, and writes them back to the swh-scrubber database if successful.
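The checksum step can be pictured with a minimal sketch, assuming the object is hashed the way Git hashes its objects (SHA-1 over a "type size" header, a NUL byte, then the body). The function names git_object_id and is_recovered are hypothetical and are not part of swh.scrubber; the actual class may compute identifiers differently.

```python
import hashlib


def git_object_id(object_type: str, body: bytes) -> str:
    """Compute a Git-style object id: SHA-1 of the header, a NUL byte, and the body."""
    header = f"{object_type} {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def is_recovered(object_type: str, candidate_body: bytes, expected_hex_id: str) -> bool:
    """Accept a candidate only if its recomputed id matches the expected id."""
    return git_object_id(object_type, candidate_body) == expected_hex_id
```

Under this assumption, a candidate whose recomputed id differs from the expected one is rejected rather than written back.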

db: ScrubberDb

Database to read from and write to.

start_object: CoreSWHID = CoreSWHID.from_string('swh:1:cnt:0000000000000000000000000000000000000000')

Minimum SWHID to check (in alphabetical order).

end_object: CoreSWHID = CoreSWHID.from_string('swh:1:snp:ffffffffffffffffffffffffffffffffffffffff')

Maximum SWHID to check (in alphabetical order).
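As a rough illustration of what "alphabetical order" implies for these bounds, the sketch below compares SWHIDs as plain strings, so object types sort as cnt, dir, rel, rev, snp before the hash is considered. The helper in_range is hypothetical, and treating both bounds as inclusive is an assumption rather than documented behavior.

```python
# Default bounds copied from the class signature, as plain strings.
DEFAULT_START = "swh:1:cnt:" + "0" * 40
DEFAULT_END = "swh:1:snp:" + "f" * 40


def in_range(swhid: str, start: str = DEFAULT_START, end: str = DEFAULT_END) -> bool:
    """Return True if swhid sorts between start and end (inclusive, by assumption)."""
    return start <= swhid <= end


# Restricting a run to directories only, using hypothetical narrower bounds:
dir_start = "swh:1:dir:" + "0" * 40
dir_end = "swh:1:dir:" + "f" * 40
print(in_range("swh:1:dir:" + "a" * 40, dir_start, dir_end))
print(in_range("swh:1:rev:" + "a" * 40, dir_start, dir_end))
```

With the default bounds every core SWHID with a lowercase hexadecimal hash is in range; narrower bounds let several runs each take a disjoint slice.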

run()

recover_objects_from_origin(origin_url)

Clones an origin, and cherry-picks original objects that are known to be corrupt in the database.

recover_corrupt_object(corrupt_object: CorruptObject, cur: cursor, clone_path: Path) → None