swh.graph.swhid module

class swh.graph.swhid.SwhidType(value)[source]

Bases: enum.Enum

types of existing SWHIDs, used to serialize ExtendedSWHID type as a (char) integer

Note that the order of the members matters, because it drives the binary search in SWHID-indexed maps. The integer values matter too, for compatibility with the Java layer.

content = 0
directory = 1
origin = 2
release = 3
revision = 4
snapshot = 5
classmethod from_extended_object_type(object_type: swh.model.identifiers.ExtendedObjectType) → swh.graph.swhid.SwhidType[source]
to_extended_object_type() → swh.model.identifiers.ExtendedObjectType[source]
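
As an illustration of why both the integer values and their order matter, here is a minimal sketch using a stand-in enum. `SwhidTypeSketch` is a hypothetical name that only mirrors the values listed above; it is not the real `SwhidType` class.

```python
from enum import Enum


class SwhidTypeSketch(Enum):
    """Hypothetical stand-in mirroring the documented SwhidType values."""

    content = 0
    directory = 1
    origin = 2
    release = 3
    revision = 4
    snapshot = 5


# Each value fits in a single unsigned byte, which is how the type is serialized.
type_byte = bytes([SwhidTypeSketch.revision.value])

# Sorting by integer value yields the order that SWHID-indexed maps rely on.
names = [t.name for t in sorted(SwhidTypeSketch, key=lambda t: t.value)]
```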
swh.graph.swhid.str_to_bytes(swhid_str: str) → bytes[source]

Convert a SWHID to a byte sequence

SWHIDs are represented in binary as 22-byte sequences, laid out as follows:

  • 1 byte for the namespace version represented as a C unsigned char

  • 1 byte for the object type, as the int value of SwhidType enums, represented as a C unsigned char

  • 20 bytes for the SHA1 digest as a byte sequence

Parameters

swhid – persistent identifier

Returns

byte sequence representation of swhid

Return type

bytes
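
The layout above can be sketched with the standard `struct` module. This is not the library's implementation: `str_to_bytes_sketch` and `TYPE_TAG_TO_INT` are hypothetical names, and the mapping from three-letter type tags to integers is an assumption based on the `SwhidType` values listed earlier.

```python
import struct

# Assumption: textual type tags map to the SwhidType integer values like this.
TYPE_TAG_TO_INT = {"cnt": 0, "dir": 1, "ori": 2, "rel": 3, "rev": 4, "snp": 5}


def str_to_bytes_sketch(swhid_str: str) -> bytes:
    """Pack a textual SWHID into 1 + 1 + 20 = 22 bytes."""
    scheme, version, type_tag, hex_digest = swhid_str.split(":")
    if scheme != "swh":
        raise ValueError(f"not a SWHID: {swhid_str!r}")
    return struct.pack(
        "BB20s",                     # two unsigned chars, then 20 raw bytes
        int(version),                # namespace version
        TYPE_TAG_TO_INT[type_tag],   # object type as an integer
        bytes.fromhex(hex_digest),   # SHA1 digest
    )


raw = str_to_bytes_sketch("swh:1:rev:" + "ab" * 20)
```

In this sketch `raw` is 22 bytes long: the version byte, the type byte for a revision, then the digest.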

swh.graph.swhid.bytes_to_str(bytes: bytes) → str[source]

Inverse function of str_to_bytes()

See str_to_bytes() for a description of the binary SWHID format.

Parameters

bytes – byte sequence representation of swhid

Returns

persistent identifier

Return type

str
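
A minimal sketch of the inverse conversion, again with the standard `struct` module. `bytes_to_str_sketch` and `INT_TO_TYPE_TAG` are hypothetical names, and the integer-to-tag mapping is an assumption derived from the `SwhidType` values; the real function may differ.

```python
import struct

# Assumption: integer type values map back to three-letter tags like this.
INT_TO_TYPE_TAG = {0: "cnt", 1: "dir", 2: "ori", 3: "rel", 4: "rev", 5: "snp"}


def bytes_to_str_sketch(raw: bytes) -> str:
    """Unpack a 22-byte binary SWHID into its textual form."""
    version, type_int, digest = struct.unpack("BB20s", raw)
    return f"swh:{version}:{INT_TO_TYPE_TAG[type_int]}:{digest.hex()}"


# Version 1, type 1 (directory), digest bytes 0x00..0x13.
raw = bytes([1, 1]) + bytes(range(20))
text = bytes_to_str_sketch(raw)
```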

class swh.graph.swhid.SwhidToNodeMap(fname: str, mode: str = 'rb', length: Optional[int] = None)[source]

Bases: swh.graph.swhid._OnDiskMap, collections.abc.MutableMapping

memory mapped map from SWHIDs to a continuous range 0..N of (8-byte long) integers

This is the converse mapping of NodeToSwhidMap.

The on-disk serialization format is a sequence of fixed length (30 bytes) records with the following fields:

  • SWHID (22 bytes): binary SWHID representation as per str_to_bytes()

  • long (8 bytes): big endian long integer

The records are sorted lexicographically by SWHID type and checksum, where type is the integer value of SwhidType. SWHID lookup in the map is performed via binary search. Hence a huge map with, say, 11 billion entries will require about 34 disk seeks in the worst case (log2 of 11 × 10^9 ≈ 33.4).
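
The lookup described above can be sketched as a binary search over fixed-size records in any bytes-like buffer (an `mmap` object would work the same way). `lookup` is a hypothetical helper, not the class's actual method, and the sketch assumes all keys share the same version byte so that comparing the full 22-byte keys matches the documented ordering. It also estimates the worst-case number of probes for 11 billion records.

```python
import math
import struct

RECORD_BIN_FMT = ">BB20sq"
RECORD_SIZE = struct.calcsize(RECORD_BIN_FMT)  # 22-byte SWHID + 8-byte integer
KEY_SIZE = 22


def lookup(buf, key: bytes):
    """Return the integer stored for a 22-byte binary SWHID, or None."""
    lo, hi = 0, len(buf) // RECORD_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        offset = mid * RECORD_SIZE
        record_key = bytes(buf[offset:offset + KEY_SIZE])
        if record_key < key:
            lo = mid + 1
        elif record_key > key:
            hi = mid
        else:
            # The value is a big-endian 8-byte integer right after the key.
            return struct.unpack_from(">q", buf, offset + KEY_SIZE)[0]
    return None


# Build a tiny sorted map in memory: (version, type) prefix plus a dummy digest.
keys = sorted(bytes([1, t]) + bytes([d]) * 20 for t, d in [(4, 0x10), (0, 0xFF), (1, 0x7F)])
buf = b"".join(k + struct.pack(">q", i) for i, k in enumerate(keys))

# Worst-case number of halving steps for 11 billion records.
probes = math.ceil(math.log2(11_000_000_000))
```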

Note that, because records have a fixed size and must stay sorted, it is not possible to create these maps by random writing. Hence, __setitem__ can be used only to update the value associated with an existing key, not to add a missing item. To create an entire map from scratch, write it sequentially using the class method write_record() (or, at your own risk, by hand via the mmap mm).

RECORD_BIN_FMT = '>BB20sq'
RECORD_SIZE = 30
classmethod write_record(f: BinaryIO, swhid: str, int: int) → None[source]

write a logical record to a file-like object

Parameters
  • f – file-like object to write the record to

  • swhid – textual SWHID

  • int – SWHID integer identifier
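
A minimal sketch of writing such records sequentially, using the documented `RECORD_BIN_FMT`. `write_record_sketch`, `sort_key`, and `TYPE_TAG_TO_INT` are hypothetical names, the tag-to-integer mapping is an assumption, and assigning identifiers by enumeration is for illustration only.

```python
import io
import struct

RECORD_BIN_FMT = ">BB20sq"

# Assumption: textual type tags map to the SwhidType integer values like this.
TYPE_TAG_TO_INT = {"cnt": 0, "dir": 1, "ori": 2, "rel": 3, "rev": 4, "snp": 5}


def write_record_sketch(f, swhid: str, node_id: int) -> None:
    """Append one 30-byte record (binary SWHID + big-endian integer) to f."""
    _, version, type_tag, hex_digest = swhid.split(":")
    f.write(struct.pack(
        RECORD_BIN_FMT,
        int(version),
        TYPE_TAG_TO_INT[type_tag],
        bytes.fromhex(hex_digest),
        node_id,
    ))


def sort_key(swhid: str):
    """Order by integer type value, then by digest."""
    _, _, type_tag, hex_digest = swhid.split(":")
    return (TYPE_TAG_TO_INT[type_tag], hex_digest)


swhids = ["swh:1:rev:" + "00" * 20, "swh:1:cnt:" + "ff" * 20, "swh:1:dir:" + "11" * 20]

out = io.BytesIO()
# Records must be written in sorted order, since they cannot be inserted later.
for node_id, swhid in enumerate(sorted(swhids, key=sort_key)):
    write_record_sketch(out, swhid, node_id)
data = out.getvalue()
```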

iter_prefix(prefix: str)[source]
iter_type(swhid_type: str) → Iterator[Tuple[str, int]][source]
class swh.graph.swhid.NodeToSwhidMap(fname: str, mode: str = 'rb', length: Optional[int] = None)[source]

Bases: swh.graph.swhid._OnDiskMap, collections.abc.MutableMapping

memory mapped map from a continuous range of 0..N (8-byte long) integers to SWHIDs

This is the converse mapping of SwhidToNodeMap.

The on-disk serialization format is a sequence of fixed length records (22 bytes), each being the binary representation of a SWHID as per str_to_bytes().

The records are sorted by long integer, so that integer lookup is possible via fixed-offset seek.
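
Fixed-offset lookup can be sketched as follows: record number n starts at byte n × 22, so a single seek is enough. `read_record` is a hypothetical helper operating on any seekable binary file; it is not the class's actual method.

```python
import io
import struct

RECORD_BIN_FMT = "BB20s"
RECORD_SIZE = struct.calcsize(RECORD_BIN_FMT)  # 22 bytes per SWHID


def read_record(f, node_id: int) -> bytes:
    """Return the 22-byte binary SWHID stored for node_id."""
    f.seek(node_id * RECORD_SIZE)  # no search needed: the offset is computed
    raw = f.read(RECORD_SIZE)
    if len(raw) != RECORD_SIZE:
        raise IndexError(node_id)
    return raw


# Three dummy records: version 1, types 0, 4 and 5, with made-up digests.
records = [bytes([1, t]) + bytes([t]) * 20 for t in (0, 4, 5)]
f = io.BytesIO(b"".join(records))

version, type_int, digest = struct.unpack(RECORD_BIN_FMT, read_record(f, 1))
```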

RECORD_BIN_FMT = 'BB20s'
RECORD_SIZE = 22
classmethod write_record(f: BinaryIO, swhid: str) → None[source]

write a SWHID to a file-like object

Parameters
  • f – file-like object to write the record to

  • swhid – textual SWHID