swh.journal.backfill module

Module defining journal backfiller classes.

The goal of these backfillers is to send part or all of the objects in the storage back to the journal topics.

At the moment, a first naive implementation is the JournalBackfiller. It simply reads the objects from the storage and sends every object identifier back to the journal.

swh.journal.backfill.directory_converter(db, directory)

Convert directory from the flat representation to swh model compatible objects.

swh.journal.backfill.revision_converter(db, revision)

Convert revision from the flat representation to swh model compatible objects.

swh.journal.backfill.release_converter(db, release)

Convert release from the flat representation to swh model compatible objects.

swh.journal.backfill.snapshot_converter(db, snapshot)

Convert snapshot from the flat representation to swh model compatible objects.

swh.journal.backfill.origin_visit_converter(db, origin_visit)

swh.journal.backfill.object_to_offset(object_id, numbits)

Compute the index of the range containing object_id when the identifier space is divided into 2^numbits ranges.

Parameters
  • object_id (str) – The hex representation of object_id

  • numbits (int) – Number of bits in which we divide input space

Returns

The index of the range containing object id
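The offset computation described above amounts to keeping the leading numbits bits of the identifier. The following is a minimal sketch of that idea, assuming object_id is a hex string that may be shorter than a full identifier (it is right-padded with zeros); it is an illustration, not necessarily how swh.journal implements the function.

```python
def object_to_offset(object_id: str, numbits: int) -> int:
    """Sketch: index of the 2**numbits-way range containing object_id."""
    # Whole bytes needed to hold numbits bits (rounded up).
    length = (numbits + 7) // 8
    # Bits to drop from the last byte when numbits is not a multiple of 8.
    shift_bits = length * 8 - numbits
    # Keep only the needed hex digits, right-padding short ids with zeros.
    prefix = object_id[: length * 2].ljust(length * 2, "0")
    value = int.from_bytes(bytes.fromhex(prefix), byteorder="big")
    return value >> shift_bits
```

For example, with numbits=12 an identifier starting with abc1 falls in range 0xabc (2748), and with numbits=8 any identifier starting with ff falls in the last range, 255.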

swh.journal.backfill.byte_ranges(numbits, start_object=None, end_object=None)

Generate start/end pairs of bytes spanning numbits bits, constrained by the optional start_object and end_object.

Parameters
  • numbits (int) – Number of bits in which we divide input space

  • start_object (str) – Hex object id contained in the first range returned

  • end_object (str) – Hex object id contained in the last range returned

Yields

2^numbits pairs of bytes
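One possible way to enumerate these ranges is sketched below. It is an illustration, not the module's code: byte_ranges_sketch and its helper _offset are hypothetical names, and using None for the open lower bound of the first range and the open upper bound of the last range is an assumption of this sketch. As the parameter descriptions state, the range containing end_object is included.

```python
from typing import Iterator, Optional, Tuple


def _offset(object_id: str, numbits: int) -> int:
    # Range index = leading `numbits` bits of the (zero-padded) hex id.
    length = (numbits + 7) // 8
    prefix = object_id[: length * 2].ljust(length * 2, "0")
    return int.from_bytes(bytes.fromhex(prefix), "big") >> (length * 8 - numbits)


def byte_ranges_sketch(
    numbits: int,
    start_object: Optional[str] = None,
    end_object: Optional[str] = None,
) -> Iterator[Tuple[Optional[bytes], Optional[bytes]]]:
    """Yield (start, end) byte bounds of the 2**numbits ranges, optionally restricted."""
    length = (numbits + 7) // 8          # bytes needed per bound
    shift_bits = length * 8 - numbits    # padding bits in the last byte
    last = (1 << numbits) - 1
    first_index = 0 if start_object is None else _offset(start_object, numbits)
    last_index = last if end_object is None else _offset(end_object, numbits)

    def to_bytes(index: int) -> bytes:
        # Left-align the range index in `length` big-endian bytes.
        return (index << shift_bits).to_bytes(length, "big")

    for index in range(first_index, last_index + 1):
        # None marks an open bound: no lower bound for the first range,
        # no upper bound for the last one.
        start = None if index == 0 else to_bytes(index)
        end = None if index == last else to_bytes(index + 1)
        yield start, end
```

With numbits=2 the sketch yields four ranges: (None, b'\x40'), (b'\x40', b'\x80'), (b'\x80', b'\xc0') and (b'\xc0', None). Passing start_object="80" and end_object="bf" narrows this to the single range (b'\x80', b'\xc0').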

swh.journal.backfill.integer_ranges(start, end, block_size=1000)

swh.journal.backfill.compute_query(obj_type, start, end)

swh.journal.backfill.fetch(db, obj_type, start, end)

Fetch all obj_type’s identifiers from db.

This opens one connection, streams the objects and, when done, closes the connection.

Parameters
  • db (BaseDb) – Db connection object

  • obj_type (str) – Object type

  • start (Union[bytes, Tuple]) – Range start identifier

  • end (Union[bytes, Tuple]) – Range end identifier

Raises

ValueError if obj_type is not supported

Yields

Objects in the given range
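The one-connection streaming pattern described above can be sketched with the standard-library sqlite3 module standing in for BaseDb. The TABLES mapping, its table and column names, and fetch_sketch itself are hypothetical; the sketch only handles single-column bytes keys (the documented function also accepts tuple bounds) and, by assumption, treats start as inclusive and end as exclusive.

```python
import sqlite3
from typing import Iterator, Optional, Tuple

# Hypothetical mapping: object type -> (table name, key column).
TABLES = {"content": ("content", "sha1"), "revision": ("revision", "id")}


def fetch_sketch(db_path: str, obj_type: str, start: Optional[bytes],
                 end: Optional[bytes]) -> Iterator[Tuple]:
    """Stream rows of obj_type with start <= key < end over one connection."""
    if obj_type not in TABLES:
        raise ValueError(f"Unsupported object type: {obj_type}")
    table, key = TABLES[obj_type]
    clauses, args = [], []
    if start is not None:
        clauses.append(f"{key} >= ?")
        args.append(start)
    if end is not None:
        clauses.append(f"{key} < ?")
        args.append(end)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    # Table and column names come from the fixed mapping, never from user input.
    query = f"SELECT * FROM {table}{where} ORDER BY {key}"
    conn = sqlite3.connect(db_path)  # one connection per call
    try:
        yield from conn.execute(query, args)  # rows are produced as iterated
    finally:
        conn.close()  # runs when iteration ends or the generator is closed
```

Because fetch_sketch is a generator, the ValueError for an unsupported type surfaces on the first iteration rather than at call time.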

class swh.journal.backfill.JournalBackfiller(config=None)

Bases: object

Class in charge of reading the storage’s objects and sending them back to the journal’s topics.

This is designed to be run periodically.

check_config(config)

parse_arguments(object_type, start_object, end_object)

Parse arguments

Raises
  • ValueError for unsupported object type

  • ValueError if object ids are not parseable

Returns

Parsed start and end object ids
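A validation step with the behavior listed above could look like the following sketch. It assumes object ids are passed as hex strings (possibly prefixes) and uses a hypothetical SUPPORTED_TYPES set; parse_arguments_sketch is an illustrative name, and the real method may accept other input forms.

```python
import re
from typing import Optional, Tuple

# Hypothetical set of supported object types, for illustration only.
SUPPORTED_TYPES = {"content", "directory", "revision", "release", "snapshot"}
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def parse_arguments_sketch(object_type: str, start_object: Optional[str],
                           end_object: Optional[str]
                           ) -> Tuple[Optional[str], Optional[str]]:
    """Validate the object type and normalize the optional hex bounds."""
    if object_type not in SUPPORTED_TYPES:
        raise ValueError(f"Object type {object_type!r} is not supported")
    parsed = []
    for name, value in (("start_object", start_object),
                        ("end_object", end_object)):
        if value is None:
            parsed.append(None)  # an omitted bound stays open
        elif HEX_RE.fullmatch(value):
            parsed.append(value.lower())
        else:
            raise ValueError(f"{name}={value!r} is not a hex object id")
    return parsed[0], parsed[1]
```

Validating both bounds up front means a typo in end_object is reported before any range is read from the storage.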

run(object_type, start_object, end_object, dry_run=False)

Read the storage’s subscribed object types and send them to the journal’s reading topic.
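The overall loop behind run can be pictured as follows: iterate over identifier ranges, fetch each range, and write every object to the journal unless dry_run is set. This is a hedged sketch; run_sketch is hypothetical, and fetch and send are injected callables standing in for the storage query and the journal writer, whose real interfaces are not documented here.

```python
from typing import Callable, Iterable, Optional, Tuple

Range = Tuple[Optional[bytes], Optional[bytes]]
FetchFn = Callable[[str, Optional[bytes], Optional[bytes]], Iterable[dict]]
SendFn = Callable[[str, dict], None]


def run_sketch(object_type: str, ranges: Iterable[Range], fetch: FetchFn,
               send: SendFn, dry_run: bool = False) -> int:
    """Backfill object_type range by range; return the number of objects read."""
    count = 0
    for start, end in ranges:
        for obj in fetch(object_type, start, end):
            count += 1
            # In dry-run mode objects are read and counted but never sent.
            if not dry_run:
                send(object_type, obj)
    return count
```

Injecting fetch and send keeps the loop independent of the database and the journal client, so a dry run exercises exactly the same reading path as a real backfill.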