swh.graph.pid module
-
class swh.graph.pid.PidType(value)
Bases: enum.Enum
Types of existing PIDs, used to serialize the PID type as a (char) integer.
Note that the order of the members matters, because it drives the binary search in PID-indexed maps. The integer values matter too, for compatibility with the Java layer.
-
content = 0
-
directory = 1
-
origin = 2
-
release = 3
-
revision = 4
-
snapshot = 5
-
-
swh.graph.pid.str_to_bytes(pid_str: str) → bytes
Convert a PID to a byte sequence.
The binary format represents a PID as a 22-byte sequence, laid out as follows:
- 1 byte for the namespace version, represented as a C unsigned char
- 1 byte for the object type, as the int value of the PidType enum, represented as a C unsigned char
- 20 bytes for the SHA1 digest, as a byte sequence
- Parameters
pid_str – persistent identifier
- Returns
byte sequence representation of pid
- Return type
bytes
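The 22-byte layout described above can be sketched with the standard `struct` module. This is a minimal illustration, not the module's actual implementation: it assumes the textual PID has the form `swh:<version>:<tag>:<40 hex digits>`, and the tag-to-type table `TAG_TO_TYPE` and the function name `sketch_str_to_bytes` are hypothetical names introduced here.

```python
import struct
from enum import Enum


class PidType(Enum):
    """Mirror of the PidType values listed in this page."""
    content = 0
    directory = 1
    origin = 2
    release = 3
    revision = 4
    snapshot = 5


# Assumption: three-letter tags used in the textual PID form.
TAG_TO_TYPE = {
    "cnt": PidType.content,
    "dir": PidType.directory,
    "ori": PidType.origin,
    "rel": PidType.release,
    "rev": PidType.revision,
    "snp": PidType.snapshot,
}

# Two unsigned chars followed by a 20-byte string: 22 bytes in total.
PID_BIN_FMT = "BB20s"


def sketch_str_to_bytes(pid_str: str) -> bytes:
    """Pack a textual PID into version byte, type byte and SHA1 digest."""
    scheme, version, tag, hex_digest = pid_str.split(":")
    if scheme != "swh":
        raise ValueError(f"unexpected scheme: {scheme!r}")
    digest = bytes.fromhex(hex_digest)
    if len(digest) != 20:
        raise ValueError("SHA1 digest must be 20 bytes long")
    return struct.pack(PID_BIN_FMT, int(version), TAG_TO_TYPE[tag].value, digest)
```

Since every field is either one byte wide or a raw byte string, the format has no padding and no byte-order concerns, which is why the packed result is exactly 22 bytes.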
-
swh.graph.pid.bytes_to_str(bytes: bytes) → str
Inverse function of str_to_bytes().
See str_to_bytes() for a description of the binary PID format.
- Parameters
bytes – byte sequence representation of pid
- Returns
persistent identifier
- Return type
str
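The inverse direction can be sketched the same way, by unpacking the three fields and reassembling the textual form. As before, this is only an illustration under assumptions: the `swh:<version>:<tag>:<hex digest>` textual form, the `TYPE_TAGS` table and the function name `sketch_bytes_to_str` are not taken from this page.

```python
import struct

# Assumption: textual tags, indexed by the PidType integer value (0..5).
TYPE_TAGS = ["cnt", "dir", "ori", "rel", "rev", "snp"]

# Same 22-byte layout as used for serialization.
PID_BIN_FMT = "BB20s"


def sketch_bytes_to_str(raw: bytes) -> str:
    """Unpack version, type and digest, then rebuild the textual PID."""
    version, type_value, digest = struct.unpack(PID_BIN_FMT, raw)
    return f"swh:{version}:{TYPE_TAGS[type_value]}:{digest.hex()}"
```

`struct.unpack` raises `struct.error` when the input is not exactly 22 bytes long, so malformed input is rejected before any field is interpreted.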
-
class swh.graph.pid.PidToNodeMap(fname: str, mode: str = 'rb', length: Optional[int] = None)
Bases: swh.graph.pid._OnDiskMap, collections.abc.MutableMapping
Memory-mapped map from SWHIDs to a contiguous range 0..N of (8-byte long) integers.
This is the converse mapping of NodeToPidMap.
The on-disk serialization format is a sequence of fixed-length (30-byte) records with the following fields:
- PID (22 bytes): binary PID representation as per str_to_bytes()
- long (8 bytes): big-endian long integer
The records are sorted lexicographically by PID type and checksum, where type is the integer value of PidType. PID lookup in the map is performed via binary search. Hence a huge map with, say, 11 billion entries will require ~30 disk seeks (about log2 of the number of entries).
Note that, because records have a fixed size and are kept sorted, it is not possible to create these maps by random writing. Hence, __setitem__ can be used only to update the value associated with an existing key, not to add a missing item. To create an entire map from scratch, write it sequentially, using the static method write_record() (or, at your own risk, by hand via the mmap mm).
-
RECORD_BIN_FMT = '>BB20sq'
-
RECORD_SIZE = 30
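The record layout and the binary-search lookup described above can be sketched over any bytes-like buffer (an `mmap` object supports the same slicing). This is a simplified illustration, not the class's code: `build_records` and `lookup` are hypothetical helpers, and sorting by the full 22-byte key is assumed to match "PID type and checksum" order, which holds as long as all records share the same version byte.

```python
import struct

RECORD_BIN_FMT = ">BB20sq"  # version, type, SHA1 digest, big-endian 8-byte integer
RECORD_SIZE = struct.calcsize(RECORD_BIN_FMT)  # 30 bytes
PID_SIZE = 22


def build_records(pairs):
    """Serialize (pid_bytes, node_id) pairs sequentially, sorted by PID bytes."""
    out = bytearray()
    for pid, node_id in sorted(pairs):
        out += struct.pack(RECORD_BIN_FMT, pid[0], pid[1], pid[2:], node_id)
    return bytes(out)


def lookup(buf, pid):
    """Binary search for a 22-byte PID; return its node id or raise KeyError."""
    lo, hi = 0, len(buf) // RECORD_SIZE
    while lo < hi:
        mid = (lo + hi) // 2
        offset = mid * RECORD_SIZE
        key = bytes(buf[offset:offset + PID_SIZE])
        if key == pid:
            # The 8-byte integer sits right after the 22-byte PID.
            return struct.unpack_from(">q", buf, offset + PID_SIZE)[0]
        if key < pid:
            lo = mid + 1
        else:
            hi = mid
    raise KeyError(pid)
```

Each loop iteration touches one record, so the number of probes grows with log2 of the record count; for 11 billion records that is at most 34, in line with the ~30 seeks quoted above. The sketch also shows why records cannot be added in place: inserting a key would require shifting every later record to keep the file sorted.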
-
class swh.graph.pid.NodeToPidMap(fname: str, mode: str = 'rb', length: Optional[int] = None)
Bases: swh.graph.pid._OnDiskMap, collections.abc.MutableMapping
Memory-mapped map from a contiguous range 0..N of (8-byte long) integers to SWHIDs.
This is the converse mapping of PidToNodeMap.
The on-disk serialization format is a sequence of fixed-length records (22 bytes), each being the binary representation of a PID as per str_to_bytes().
The records are sorted by long integer, so that integer lookup is possible via fixed-offset seek.
-
RECORD_BIN_FMT = 'BB20s'
-
RECORD_SIZE = 22
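Because every record is 22 bytes long and record i belongs to integer i, a lookup reduces to computing an offset. The following sketch illustrates this on a plain bytes-like buffer; `node_to_pid_bytes` is a hypothetical helper introduced for illustration, not a method of the class.

```python
import struct

RECORD_BIN_FMT = "BB20s"  # version, type, SHA1 digest
RECORD_SIZE = struct.calcsize(RECORD_BIN_FMT)  # 22 bytes


def node_to_pid_bytes(buf, node_id: int) -> bytes:
    """Return the 22-byte binary PID stored at position node_id."""
    if not 0 <= node_id < len(buf) // RECORD_SIZE:
        raise KeyError(node_id)
    offset = node_id * RECORD_SIZE  # fixed-offset seek, no search needed
    return bytes(buf[offset:offset + RECORD_SIZE])
```

Unlike the PID-to-node direction, this lookup costs a single read regardless of map size, since the integer key is implied by the record's position and does not need to be stored.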
-