swh.storage.backfill module
Storage backfiller.
The backfiller's goal is to send part or all of the objects from a storage back to the journal topics.
The current implementation consists of the JournalBackfiller class.
It simply reads the objects from the storage and sends every object identifier back to the journal.
- swh.storage.backfill.directory_converter(db: BaseDb, directory_d: Dict[str, Any]) → Directory
Convert a directory from its flat representation to a swh model compatible object.
- swh.storage.backfill.raw_extrinsic_metadata_converter(db: BaseDb, metadata: Dict[str, Any]) → RawExtrinsicMetadata
Convert a raw extrinsic metadata object from its flat representation to a swh model compatible object.
- swh.storage.backfill.extid_converter(db: BaseDb, extid: Dict[str, Any]) → ExtID
Convert an extid from its flat representation to a swh model compatible object.
- swh.storage.backfill.revision_converter(db: BaseDb, revision_d: Dict[str, Any]) → Revision
Convert a revision from its flat representation to a swh model compatible object.
- swh.storage.backfill.release_converter(db: BaseDb, release_d: Dict[str, Any]) → Release
Convert a release from its flat representation to a swh model compatible object.
- swh.storage.backfill.snapshot_converter(db: BaseDb, snapshot_d: Dict[str, Any]) → Snapshot
Convert a snapshot from its flat representation to a swh model compatible object.
- swh.storage.backfill.object_to_offset(object_id, numbits)
Compute the index of the range containing object_id, when dividing the object id space into 2^numbits ranges.
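The range-index computation described above can be illustrated with a minimal sketch. It is not the module's implementation: the name `range_index`, the `id_bits` parameter, and the assumption that object ids are hex strings of at most 160 bits (SHA-1 sized) are all hypothetical choices made for this example.

```python
def range_index(object_id: str, numbits: int, id_bits: int = 160) -> int:
    """Return the index of the range holding object_id.

    Assumptions (not taken from the module): object_id is a hex string,
    ids are id_bits wide, and short ids are right-padded with zeros.
    """
    hex_digits = id_bits // 4
    # Normalise the id to a fixed width so the shift below is well defined.
    padded = object_id[:hex_digits].ljust(hex_digits, "0")
    # Keeping only the top numbits bits gives the index of the range.
    return int(padded, 16) >> (id_bits - numbits)


# Usage: with numbits=8 the id space is cut into 256 ranges,
# and the index is simply the first byte of the id.
first_byte_index = range_index("ff" * 20, 8)
```

Because only the most significant bits are kept, ids that share a prefix of `numbits` bits always land in the same range.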
- swh.storage.backfill.byte_ranges(numbits: int, start_object: str | None = None, end_object: str | None = None) → Iterator[Tuple[bytes | None, bytes | None]]
- Generate start/end pairs of bytes spanning numbits bits and constrained by optional start_object and end_object.
- Parameters:
numbits – Number of bits in which we divide input space
start_object – Hex object id contained in the first range returned
end_object – Hex object id contained in the last range returned
- Yields:
2^numbits pairs of bytes
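As an illustration of how such start/end pairs might be produced, here is a minimal sketch covering the whole space only (it ignores `start_object` and `end_object`). The name `sketch_byte_ranges` is hypothetical, and using `None` for the open lower and upper bounds is an assumption inferred from the `bytes | None` return annotation, not confirmed behaviour.

```python
from typing import Iterator, Optional, Tuple


def sketch_byte_ranges(
    numbits: int,
) -> Iterator[Tuple[Optional[bytes], Optional[bytes]]]:
    """Yield 2**numbits contiguous (start, end) byte pairs.

    Assumption: None marks an unbounded end of the first and last range.
    """
    length = (numbits + 7) // 8      # bytes needed to hold numbits bits
    shift = length * 8 - numbits     # left-align the bits within those bytes
    total = 1 << numbits

    def bound(i: int) -> bytes:
        return (i << shift).to_bytes(length, "big")

    for i in range(total):
        start = None if i == 0 else bound(i)
        end = None if i == total - 1 else bound(i + 1)
        yield start, end


# Usage: two bits split the space into four ranges.
four_ranges = list(sketch_byte_ranges(2))
```

Each pair's end equals the next pair's start, so the ranges tile the space without gaps or overlap.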
- swh.storage.backfill.raw_extrinsic_metadata_target_ranges(start_object: str | None = None, end_object: str | None = None) → Iterator[Tuple[str | None, str | None]]
Generate ranges of values for the target attribute of raw_extrinsic_metadata objects.
This generates one range for all values before the first SWHID (which would correspond to raw origin URLs), then a number of hex-based ranges for each known type of SWHID (2**12 ranges for directories, 2**8 ranges for all other types). Finally, it generates one extra range for values above all possible SWHIDs.
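The layout described above (one leading range, hex-based ranges per SWHID type, one trailing range) can be sketched as follows. This is only an illustration under stated assumptions: the name `sketch_target_ranges`, the list of type tags, and the exact boundary strings (including the `"swh:2:"` upper sentinel) are hypothetical and may differ from what the module actually yields.

```python
from typing import Iterator, List, Optional, Sequence, Tuple

# Assumption: an illustrative set of SWHID type tags, not the module's list.
ASSUMED_TYPE_TAGS = ("cnt", "dir", "rel", "rev", "snp")


def sketch_target_ranges(
    type_tags: Sequence[str] = ASSUMED_TYPE_TAGS,
) -> Iterator[Tuple[Optional[str], Optional[str]]]:
    tags = sorted(type_tags)
    bounds: List[str] = ["swh:1:"]  # everything below this is not a SWHID
    for pos, tag in enumerate(tags):
        # 3 hex digits give 2**12 ranges (directories), 2 digits give 2**8.
        digits = 3 if tag == "dir" else 2
        prefix = f"swh:1:{tag}:"
        bounds.extend(
            prefix + format(i, f"0{digits}x") for i in range(1, 16**digits)
        )
        # Close this type's last range at the next type's prefix, or at a
        # sentinel that sorts after every "swh:1:" string (assumption).
        if pos + 1 < len(tags):
            bounds.append(f"swh:1:{tags[pos + 1]}:")
        else:
            bounds.append("swh:2:")
    yield None, bounds[0]              # values before the first SWHID
    yield from zip(bounds, bounds[1:])  # hex-based ranges, type by type
    yield bounds[-1], None             # values above all SWHIDs


ranges = list(sketch_target_ranges())
```

With the five assumed tags this yields 1 + 2**12 + 4 * 2**8 + 1 = 5122 ranges, matching the structure given in the description.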
- swh.storage.backfill.integer_ranges(start: str, end: str, block_size: int = 1000) → Iterator[Tuple[int | None, int | None]]
- swh.storage.backfill.fetch(db, obj_type, start, end)
Fetch all obj_type’s identifiers from db.
This opens one connection, streams the objects and, when done, closes the connection.
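The open, stream, close pattern described above can be sketched with a generator. This example deliberately swaps in the standard library's `sqlite3` as a stand-in for the storage database; the function name `stream_rows`, the query, and the table layout are hypothetical and say nothing about how `fetch` is actually written.

```python
import sqlite3
from contextlib import closing
from typing import Any, Iterator, Sequence, Tuple


def stream_rows(
    path: str, query: str, params: Sequence[Any] = ()
) -> Iterator[Tuple[Any, ...]]:
    """Open one connection, yield rows one at a time, then close it."""
    # closing() guarantees the connection is closed once the generator is
    # exhausted or garbage-collected, even if the consumer stops early.
    with closing(sqlite3.connect(path)) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(query, params)
            for row in cur:  # rows are pulled lazily from the cursor
                yield row
```

A caller would typically pass range bounds such as a start and end id as query parameters, so that each call streams only one slice of the table.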
- class swh.storage.backfill.JournalBackfiller(config=None)
Bases: object
Class in charge of reading the storage’s objects and sending them back to the journal’s topics.
This is designed to be run periodically.
- property db