swh.scrubber.db module#
- class swh.scrubber.db.Datastore(package: str, cls: str, instance: str)[source]#
Bases:
object
Represents a datastore being scrubbed; eg. swh-storage or swh-journal.
- class swh.scrubber.db.CorruptObject(id: swh.model.swhids.CoreSWHID, datastore: swh.scrubber.db.Datastore, first_occurrence: datetime.datetime, object_: bytes)[source]#
Bases:
object
- class swh.scrubber.db.MissingObject(id: swh.model.swhids.CoreSWHID, datastore: swh.scrubber.db.Datastore, first_occurrence: datetime.datetime)[source]#
Bases:
object
- class swh.scrubber.db.MissingObjectReference(missing_id: swh.model.swhids.CoreSWHID, reference_id: swh.model.swhids.CoreSWHID, datastore: swh.scrubber.db.Datastore, first_occurrence: datetime.datetime)[source]#
Bases:
object
- class swh.scrubber.db.FixedObject(id: swh.model.swhids.CoreSWHID, object_: bytes, method: str, recovery_date: Union[datetime.datetime, NoneType] = None)[source]#
Bases:
object
- class swh.scrubber.db.ScrubberDb(conn: connection, pool: Optional[AbstractConnectionPool] = None)[source]#
Bases:
BaseDb
create a DB proxy
- Parameters:
conn – psycopg2 connection to the SWH DB
pool – psycopg2 pool of connections
- current_version = 5#
- datastore_get_or_add(datastore: Datastore) int [source]#
Creates a datastore if it does not exist, and returns its id.
- checked_partition_upsert(datastore: Datastore, object_type: ObjectType, partition_id: int, nb_partitions: int, date: datetime) None [source]#
Records in the database the given partition was last checked at the given date.
- checked_partition_get_last_date(datastore: Datastore, object_type: ObjectType, partition_id: int, nb_partitions: int) Optional[datetime] [source]#
Returns the last date the given partition was checked in the given datastore, or
None
if it was never checked.Currently, this matches partition id and number exactly, with no regard for partitions that contain or are contained by it.
- checked_partition_iter(datastore: Datastore) Iterator[Tuple[ObjectType, int, int, datetime]] [source]#
Yields tuples of
(partition_id, nb_partitions, last_date)
- corrupt_object_iter() Iterator[CorruptObject] [source]#
Yields all records in the ‘corrupt_object’ table.
- corrupt_object_get(start_id: CoreSWHID, end_id: CoreSWHID, limit: int = 100) List[CorruptObject] [source]#
Yields a page of records in the ‘corrupt_object’ table, ordered by id.
- Parameters:
start_id – Only return objects after this id
end_id – Only return objects before this id
in_origin – An origin URL. If provided, only returns objects that may be found in the given origin
- corrupt_object_grab_by_id(cur: cursor, start_id: CoreSWHID, end_id: CoreSWHID, limit: int = 100) List[CorruptObject] [source]#
Returns a page of records in the ‘corrupt_object’ table for a fixer, ordered by id
These records are not already fixed (ie. do not have a corresponding entry in the ‘fixed_object’ table), and they are selected with an exclusive update lock.
- Parameters:
start_id – Only return objects after this id
end_id – Only return objects before this id
- corrupt_object_grab_by_origin(cur: cursor, origin_url: str, start_id: Optional[CoreSWHID] = None, end_id: Optional[CoreSWHID] = None, limit: int = 100) List[CorruptObject] [source]#
Returns a page of records in the ‘corrupt_object’ table for a fixer, ordered by id
These records are not already fixed (ie. do not have a corresponding entry in the ‘fixed_object’ table), and they are selected with an exclusive update lock.
- Parameters:
origin_url – only returns objects that may be found in the given origin
- missing_object_add(id: CoreSWHID, reference_ids: Iterable[CoreSWHID], datastore: Datastore) None [source]#
Adds a “hole” to the inventory, ie. an object missing from a datastore that is referenced by an other object of the same datastore.
If the missing object is already known to be missing by the scrubber database, this only records the reference (which can be useful to locate an origin to recover the object from). If that reference is already known too, this is a noop.
- Parameters:
id – SWHID of the missing object (the hole)
reference_id – SWHID of the object referencing the missing object
datastore – representation of the swh-storage/swh-journal/… instance containing this hole
- missing_object_iter() Iterator[MissingObject] [source]#
Yields all records in the ‘missing_object’ table.
- missing_object_reference_iter(missing_id: CoreSWHID) Iterator[MissingObjectReference] [source]#
Yields all records in the ‘missing_object_reference’ table.
- object_origin_get(after: str = '', limit: int = 1000) List[str] [source]#
Returns origins with non-fixed corrupt objects, ordered by URL.
- Parameters:
after – if given, only returns origins with an URL after this value
- fixed_object_add(cur: cursor, fixed_objects: List[FixedObject]) None [source]#
- fixed_object_iter() Iterator[FixedObject] [source]#