swh.scrubber.fixer module
Reads all known corrupt objects from the swh-scrubber database, and tries to recover them.
Currently, only recovery from Git origins is implemented.
- swh.scrubber.fixer.get_object_from_clone(clone_path: Path, swhid: CoreSWHID) → None | bytes | ShaFile
Reads the original object matching the given swhid from the given clone if it exists, and returns a Dulwich object if possible, or the raw manifest.
- swh.scrubber.fixer.get_fixed_object_from_clone(clone_path: Path, corrupt_object: CorruptObject) → FixedObject | None
Reads the original object matching the corrupt_object from the given clone if it exists, and returns a FixedObject instance ready to be inserted in the database.
- class swh.scrubber.fixer.Fixer(db: ScrubberDb, start_object: CoreSWHID = CoreSWHID.from_string('swh:1:cnt:0000000000000000000000000000000000000000'), end_object: CoreSWHID = CoreSWHID.from_string('swh:1:snp:ffffffffffffffffffffffffffffffffffffffff'))
Bases: object
Reads a chunk of corrupt objects in the swh-scrubber database, tries to recover them through various means (brute-forcing fields and re-downloading from the origin), recomputes checksums, and writes them back to the swh-scrubber database if successful.
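To illustrate what recomputing a checksum involves, here is a minimal sketch for the simplest case, a content object, whose SWHID carries the same SHA-1 that Git computes for a blob. The helper names `git_blob_sha1` and `matches_swhid` are hypothetical and are not part of swh.scrubber; other object types use their own manifest formats, and the Fixer's actual implementation may differ.

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Return the Git blob hash of `data` (hypothetical helper).

    Git hashes the header "blob <size>\\0" followed by the raw bytes.
    """
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def matches_swhid(swhid: str, data: bytes) -> bool:
    """Check that `data` hashes to the identifier in a content SWHID.

    Only handles 'swh:1:cnt:<hex>' strings; this is an illustration,
    not the check performed by swh.scrubber.
    """
    scheme, version, object_type, object_id = swhid.split(":")
    if (scheme, version, object_type) != ("swh", "1", "cnt"):
        raise ValueError(f"not a content SWHID: {swhid}")
    return git_blob_sha1(data) == object_id


if __name__ == "__main__":
    # SWHID of the empty content (the well-known Git empty-blob hash).
    empty = "swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
    print(matches_swhid(empty, b""))
```

A recovered object would only be worth writing back if a check of this kind succeeds, which is why the description above lists checksum recomputation before the write.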
- db: ScrubberDb
Database to read from and write to.
- start_object: CoreSWHID = CoreSWHID.from_string('swh:1:cnt:0000000000000000000000000000000000000000')
Minimum SWHID to check (in alphabetical order).
- end_object: CoreSWHID = CoreSWHID.from_string('swh:1:snp:ffffffffffffffffffffffffffffffffffffffff')
Maximum SWHID to check (in alphabetical order).
- recover_objects_from_origin(origin_url)
Clones an origin, and cherry-picks original objects that are known to be corrupt in the database.
- recover_corrupt_object(corrupt_object: CorruptObject, cur: cursor, clone_path: Path) → None
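
The start_object and end_object bounds of Fixer are described as being in alphabetical order. The sketch below shows why the default bounds cover every core object type, assuming the comparison is made on the string form of the SWHID. That assumption, and the `in_range` helper, are illustrative only; the Fixer may perform the comparison differently (for example in the database).

```python
# Default bounds, copied from the Fixer signature, as plain strings.
DEFAULT_START = "swh:1:cnt:" + "0" * 40
DEFAULT_END = "swh:1:snp:" + "f" * 40


def in_range(swhid: str, start: str = DEFAULT_START, end: str = DEFAULT_END) -> bool:
    """Return True if `swhid` sorts between `start` and `end`, inclusive.

    Hypothetical helper: compares SWHIDs as strings, so the object type
    ('cnt' < 'dir' < 'rel' < 'rev' < 'snp') is compared before the hash.
    """
    return start <= swhid <= end


if __name__ == "__main__":
    some_hash = "94a9ed024d3859793618152ea559a168bbcbb5e2"
    for object_type in ("cnt", "dir", "rel", "rev", "snp"):
        # Every core object type falls within the default bounds.
        print(object_type, in_range(f"swh:1:{object_type}:{some_hash}"))
```

Narrowing the bounds, for instance to `swh:1:rev:000…` through `swh:1:rev:fff…`, would restrict such a check to revisions only, which is one way separate workers could each take a disjoint chunk.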