swh.model.swhids module#

Classes to represent SWH persistent IDentifiers.

CoreSWHID represents a SWHID with no qualifier, and QualifiedSWHID represents a SWHID that may have qualifiers. ExtendedSWHID extends the definition of SWHID to other object types, and is used internally in Software Heritage; it does not support qualifiers.

class swh.model.swhids.ObjectType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

Possible object types of a QualifiedSWHID or CoreSWHID.

The value of each variant is what is used in the SWHID’s string representation.

SNAPSHOT = 'snp'#
REVISION = 'rev'#
RELEASE = 'rel'#
DIRECTORY = 'dir'#
CONTENT = 'cnt'#
class swh.model.swhids.ExtendedObjectType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)[source]#

Bases: Enum

Possible object types of an ExtendedSWHID.

The variants are a superset of ObjectType’s.

SNAPSHOT = 'snp'#
REVISION = 'rev'#
RELEASE = 'rel'#
DIRECTORY = 'dir'#
CONTENT = 'cnt'#
ORIGIN = 'ori'#
RAW_EXTRINSIC_METADATA = 'emd'#
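The superset relationship can be checked mechanically. The following standalone sketch mirrors the two enums with the standard library's enum module and compares their values. The class names ObjectTypeSketch and ExtendedObjectTypeSketch and the helper values_of are hypothetical stand-ins for illustration, not the classes defined in swh.model.swhids.

```python
from enum import Enum


class ObjectTypeSketch(Enum):
    """Hypothetical stand-in mirroring the documented ObjectType values."""

    SNAPSHOT = "snp"
    REVISION = "rev"
    RELEASE = "rel"
    DIRECTORY = "dir"
    CONTENT = "cnt"


class ExtendedObjectTypeSketch(Enum):
    """Hypothetical stand-in mirroring the documented ExtendedObjectType values."""

    SNAPSHOT = "snp"
    REVISION = "rev"
    RELEASE = "rel"
    DIRECTORY = "dir"
    CONTENT = "cnt"
    ORIGIN = "ori"
    RAW_EXTRINSIC_METADATA = "emd"


def values_of(enum_cls):
    """Return the set of string values used in the SWHID representation."""
    return {member.value for member in enum_cls}


core_values = values_of(ObjectTypeSketch)
extended_values = values_of(ExtendedObjectTypeSketch)

# Every core value also appears among the extended values.
is_superset = extended_values >= core_values

# The values that only exist in the extended enum.
extended_only = sorted(extended_values - core_values)

# Enum members can be looked up from the short string used in a SWHID.
content_type = ObjectTypeSketch("cnt")
```

Looking up a member by value, as in the last line, is how a three-letter code taken from a SWHID string can be mapped back to an enum variant.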
class swh.model.swhids.CoreSWHID(*, namespace: str = 'swh', scheme_version: int = 1, object_id: bytes, object_type)[source]#

Bases: _BaseSWHID[ObjectType]

Dataclass holding the relevant info associated to a SoftWare Heritage persistent IDentifier (SWHID).

Unlike QualifiedSWHID, it is restricted to core SWHIDs, i.e. SWHIDs with no qualifiers.

Raises:

swh.model.exceptions.ValidationError – In case of invalid object type or id

To get the raw SWHID string from an instance of this class, use the str() function:

>>> swhid = CoreSWHID(
...     object_type=ObjectType.CONTENT,
...     object_id=bytes.fromhex('8ff44f081d43176474b267de5451f2c2e88089d0'),
... )
>>> str(swhid)
'swh:1:cnt:8ff44f081d43176474b267de5451f2c2e88089d0'

And vice-versa with CoreSWHID.from_string():

>>> swhid == CoreSWHID.from_string(
...     "swh:1:cnt:8ff44f081d43176474b267de5451f2c2e88089d0"
... )
True
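To make the string layout in the examples above concrete, here is a minimal standalone sketch that splits a core SWHID string into its object type and object id, and joins them back. It is not the implementation behind CoreSWHID.from_string(): the names parse_core_swhid and format_core_swhid are hypothetical, the pattern only covers the swh:1: form shown above, and it raises ValueError rather than swh.model.exceptions.ValidationError.

```python
import re
from typing import Tuple

# Pattern for the form shown above: swh:1:<type>:<40 lowercase hex digits>.
# The type codes are the documented ObjectType values.
_CORE_SWHID_RE = re.compile(r"swh:1:(snp|rev|rel|dir|cnt):([0-9a-f]{40})")


def parse_core_swhid(text: str) -> Tuple[str, bytes]:
    """Split a core SWHID string into (type code, raw object id)."""
    match = _CORE_SWHID_RE.fullmatch(text)
    if match is None:
        # Assumption: the real class raises ValidationError instead.
        raise ValueError(f"not a core SWHID: {text!r}")
    object_type, hex_id = match.groups()
    return object_type, bytes.fromhex(hex_id)


def format_core_swhid(object_type: str, object_id: bytes) -> str:
    """Inverse of parse_core_swhid for well-formed inputs."""
    return f"swh:1:{object_type}:{object_id.hex()}"


object_type, object_id = parse_core_swhid(
    "swh:1:cnt:8ff44f081d43176474b267de5451f2c2e88089d0"
)
round_tripped = format_core_swhid(object_type, object_id)
```

Using fullmatch rather than match rejects trailing text such as qualifiers, which is consistent with a core SWHID carrying none.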

Method generated by attrs for class CoreSWHID.

object_type: _TObjectType#

the type of object the identifier points to

to_extended() ExtendedSWHID[source]#

Converts this CoreSWHID into an ExtendedSWHID.

As ExtendedSWHID is a superset of CoreSWHID, this is lossless.

to_qualified() QualifiedSWHID[source]#

Converts this CoreSWHID into a QualifiedSWHID.

As QualifiedSWHID is a superset of CoreSWHID, this is lossless.

class swh.model.swhids.QualifiedSWHID(*, namespace: str = 'swh', scheme_version: int = 1, object_id: bytes, object_type, origin: str | None = None, visit: str | CoreSWHID | None = None, anchor: str | CoreSWHID | None = None, path: str | bytes | None = None, lines: str | Tuple[int, int | None] | None = None)[source]#

Bases: _BaseSWHID[ObjectType]

Dataclass holding the relevant info associated to a SoftWare Heritage persistent IDentifier (SWHID), optionally with qualifiers.

Raises:

swh.model.exceptions.ValidationError – In case of invalid object type or id

To get the raw SWHID string from an instance of this class, use the str() function:

>>> swhid = QualifiedSWHID(
...     object_type=ObjectType.CONTENT,
...     object_id=bytes.fromhex('8ff44f081d43176474b267de5451f2c2e88089d0'),
...     lines=(5, 10),
... )
>>> str(swhid)
'swh:1:cnt:8ff44f081d43176474b267de5451f2c2e88089d0;lines=5-10'

And vice-versa with QualifiedSWHID.from_string():

>>> swhid == QualifiedSWHID.from_string(
...     "swh:1:cnt:8ff44f081d43176474b267de5451f2c2e88089d0;lines=5-10"
... )
True

Method generated by attrs for class QualifiedSWHID.

object_type: _TObjectType#

the type of object the identifier points to

origin#

the software origin where an object has been found or observed in the wild, as a URI

visit#

the core identifier of a snapshot corresponding to a specific visit of a repository containing the designated object

anchor#

a designated node in the Merkle DAG relative to which a path to the object is specified, as the core identifier of a directory, a revision, a release, or a snapshot

path#

the absolute file path, from the root directory associated to the anchor node, to the object; when the anchor denotes a directory or a revision, and almost always when it’s a release, the root directory is uniquely determined; when the anchor denotes a snapshot, the root directory is the one pointed to by HEAD (possibly indirectly), and undefined if such a reference is missing

Lines#

alias of Tuple[int, int | None]

lines#

line number(s) of interest, usually within a content object

Type:

lines
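The Lines alias allows the second element to be None, which suggests that a single line and a line range share one representation. The sketch below shows one way such a value could be converted to and from the 5-10 text form seen in the examples. The helpers parse_lines and format_lines are hypothetical and are not part of swh.model.swhids.

```python
from typing import Optional, Tuple

# Same shape as the documented alias: (first line, optional last line).
Lines = Tuple[int, Optional[int]]


def parse_lines(text: str) -> Lines:
    """Parse '5-10' into (5, 10) and '5' into (5, None)."""
    start, separator, end = text.partition("-")
    # When there is no '-', partition leaves separator empty: single line.
    return (int(start), int(end) if separator else None)


def format_lines(lines: Lines) -> str:
    """Render a Lines tuple back into its text form."""
    start, end = lines
    return str(start) if end is None else f"{start}-{end}"


line_range = parse_lines("5-10")
single_line = parse_lines("5")
```

Malformed input such as "5-" or "a-b" makes int() raise ValueError in this sketch; how the real class reports such input is not covered here.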

check_visit(attribute, value)[source]#
check_anchor(attribute, value)[source]#
to_dict() Dict[str, str | bytes | CoreSWHID | Lines | None][source]#

Returns a dictionary version of this QualifiedSWHID, for JSON serialization

qualifiers() Dict[str, str][source]#

Returns URL-escaped qualifiers of this SWHID, for use in serialization
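Escaping matters because qualifiers are appended to the core identifier with ";" as a separator, so a literal ";" inside a value would be ambiguous. The following standalone sketch illustrates the idea with urllib.parse.quote. It is an assumption-laden illustration, not the escaping rule used by qualifiers(): the function append_qualifiers and the choice of safe characters are hypothetical.

```python
from typing import Dict
from urllib.parse import quote


def append_qualifiers(core: str, qualifiers: Dict[str, str]) -> str:
    """Join a core SWHID string and its qualifiers with ';' separators."""
    parts = [core]
    for key, value in qualifiers.items():
        # Assumption: keep '/' and ':' readable, percent-encode the rest,
        # including ';' which would otherwise end the qualifier early.
        parts.append(f"{key}={quote(value, safe='/:')}")
    return ";".join(parts)


core = "swh:1:cnt:8ff44f081d43176474b267de5451f2c2e88089d0"

with_lines = append_qualifiers(core, {"lines": "5-10"})

# A value containing ';' is escaped so it cannot be mistaken for a separator.
with_origin = append_qualifiers(core, {"origin": "https://example.org/a;b"})
```

In this sketch the order of qualifiers follows the insertion order of the dictionary; any ordering the real serializer applies is not modelled.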

classmethod from_string(s: str) QualifiedSWHID[source]#
class swh.model.swhids.ExtendedSWHID(*, namespace: str = 'swh', scheme_version: int = 1, object_id: bytes, object_type)[source]#

Bases: _BaseSWHID[ExtendedObjectType]

Dataclass holding the relevant info associated to a SoftWare Heritage persistent IDentifier (SWHID).

It extends CoreSWHID by allowing non-standard object types, and should only be used internally within Software Heritage.

Raises:

swh.model.exceptions.ValidationError – In case of invalid object type or id

To get the raw SWHID string from an instance of this class, use the str() function:

>>> swhid = ExtendedSWHID(
...     object_type=ExtendedObjectType.CONTENT,
...     object_id=bytes.fromhex('8ff44f081d43176474b267de5451f2c2e88089d0'),
... )
>>> str(swhid)
'swh:1:cnt:8ff44f081d43176474b267de5451f2c2e88089d0'

And vice-versa with ExtendedSWHID.from_string():

>>> swhid == ExtendedSWHID.from_string(
...     "swh:1:cnt:8ff44f081d43176474b267de5451f2c2e88089d0"
... )
True

Method generated by attrs for class ExtendedSWHID.

object_type: _TObjectType#

the type of object the identifier points to