swh.model.model module

exception swh.model.model.MissingData

Bases: Exception

Raised by Content.with_data when it has no way of fetching the data (but not when fetching the data fails).

swh.model.model.freeze_optional_dict(d: Union[None, Dict[KT, VT], swh.model.collections.ImmutableDict[KT, VT]]) → Optional[swh.model.collections.ImmutableDict[KT, VT]]

swh.model.model.dictify(value)

Helper function used by BaseModel.to_dict().

class swh.model.model.BaseModel

Bases: object

Base class for SWH model classes.

Provides serialization to, and deserialization from, Python dictionaries that are suitable for JSON/msgpack-like formats.

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.
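The to_dict()/from_dict() round trip can be illustrated with a small stdlib-only sketch. It uses dataclasses instead of attrs, and `SketchModel`, `Point` and `Segment` are hypothetical classes; the real BaseModel wraps attr.asdict and lets subclasses customize individual fields, so this only shows the general idea of recursive (de)serialization.

```python
import dataclasses
from typing import Any, Dict


@dataclasses.dataclass(frozen=True)
class SketchModel:
    """Hypothetical stand-in for BaseModel, built on dataclasses rather than attrs."""

    def to_dict(self) -> Dict[str, Any]:
        # dataclasses.asdict recurses into nested dataclass instances,
        # much like attr.asdict does for attrs classes.
        return dataclasses.asdict(self)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]):
        # Rebuild nested objects for fields whose type is itself a SketchModel.
        kwargs = {}
        for field in dataclasses.fields(cls):
            value = d[field.name]
            if isinstance(field.type, type) and issubclass(field.type, SketchModel):
                value = field.type.from_dict(value)
            kwargs[field.name] = value
        return cls(**kwargs)


@dataclasses.dataclass(frozen=True)
class Point(SketchModel):
    x: int
    y: int


@dataclasses.dataclass(frozen=True)
class Segment(SketchModel):
    start: Point
    end: Point


seg = Segment(start=Point(0, 0), end=Point(3, 4))
as_dict = seg.to_dict()  # {'start': {'x': 0, 'y': 0}, 'end': {'x': 3, 'y': 4}}
assert Segment.from_dict(as_dict) == seg
```

Keeping the dict form made only of plain values (ints, bytes, strings, nested dicts) is what makes it suitable for JSON/msgpack-like encoders.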

anonymize() → Optional[ModelType]

Returns an anonymized version of the object, if needed.

If the object model does not need/support anonymization, returns None.

class swh.model.model.HashableObject

Bases: object

Mixin to automatically compute object identifier hash when the associated model is instantiated.

abstract static compute_hash(object_dict)

Derived model classes must implement this to compute the object hash from its dict representation.
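A minimal sketch of this mixin pattern, assuming dataclasses and a toy hash: when `id` is left empty, it is filled in at instantiation time by calling the subclass's compute_hash on the object's dict representation. `HashableSketch`, `ToyBlob` and the SHA-1 over the payload are hypothetical; the real model classes compute Software Heritage intrinsic identifiers.

```python
import abc
import dataclasses
import hashlib
from typing import Any, Dict


class HashableSketch(abc.ABC):
    """Hypothetical mixin: fills in `id` from compute_hash() on instantiation."""

    @staticmethod
    @abc.abstractmethod
    def compute_hash(object_dict: Dict[str, Any]) -> bytes:
        """Subclasses compute their identifier from a dict representation."""

    def __post_init__(self) -> None:
        if not self.id:
            # The dataclass is frozen, so bypass __setattr__ to store the id.
            object.__setattr__(self, "id", self.compute_hash(dataclasses.asdict(self)))


@dataclasses.dataclass(frozen=True)
class ToyBlob(HashableSketch):
    data: bytes
    id: bytes = b""

    @staticmethod
    def compute_hash(object_dict: Dict[str, Any]) -> bytes:
        # Toy scheme: hash only the payload, never the (possibly empty) id.
        return hashlib.sha1(object_dict["data"]).digest()


blob = ToyBlob(data=b"hello")
assert blob.id == hashlib.sha1(b"hello").digest()
# An explicitly provided id is kept as-is.
assert ToyBlob(data=b"hello", id=b"x" * 20).id == b"x" * 20
```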

class swh.model.model.Person(fullname: bytes, name: Optional[bytes], email: Optional[bytes])

Bases: swh.model.model.BaseModel

Represents the author/committer of a revision or release.

object_type: typing_extensions.Final = 'person'

classmethod from_fullname(fullname: bytes)

Returns a Person object by guessing the name and email from the fullname, which is expected to be in the name <email> format.

The fullname is left unchanged.
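The name <email> splitting can be sketched as follows. This is an illustrative reimplementation under stated assumptions (the name is the stripped text before the first `<`, the email is the text up to the last `>`, and empty parts become None), not the library's exact code; `split_fullname` is a hypothetical helper.

```python
from typing import Optional, Tuple


def split_fullname(fullname: bytes) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Hypothetical helper: guess (name, email) from b"name <email>"."""
    open_bracket = fullname.find(b"<")
    if open_bracket == -1:
        # No email part: the whole fullname is taken as the name.
        return (fullname or None, None)
    name = fullname[:open_bracket].strip()
    rest = fullname[open_bracket + 1:]
    close_bracket = rest.rfind(b">")
    email = rest if close_bracket == -1 else rest[:close_bracket]
    # Empty components are reported as None rather than b"".
    return (name or None, email or None)
```

Whatever the split yields, the original bytes are still kept as the fullname, which is why malformed inputs lose no information.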

anonymize() → swh.model.model.Person

Returns an anonymized version of the Person object.

Anonymization produces a Person whose fullname is a hash of the original fullname, with the name and email unset.
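A minimal sketch of this anonymization, assuming SHA-256 as the hash function (the text above only says "a hash"); `PersonSketch` is a hypothetical stand-in for Person built on dataclasses.

```python
import dataclasses
import hashlib
from typing import Optional


@dataclasses.dataclass(frozen=True)
class PersonSketch:
    """Hypothetical stand-in for swh.model.model.Person."""
    fullname: bytes
    name: Optional[bytes]
    email: Optional[bytes]

    def anonymize(self) -> "PersonSketch":
        # Assumption: SHA-256 is the hash function; name and email are dropped.
        return PersonSketch(
            fullname=hashlib.sha256(self.fullname).digest(),
            name=None,
            email=None,
        )


alice = PersonSketch(b"Alice <alice@example.org>", b"Alice", b"alice@example.org")
anon = alice.anonymize()
assert anon.name is None and anon.email is None
assert anon == alice.anonymize()  # deterministic: same input, same pseudonym
```

Because the hash is deterministic, two objects by the same person still share the same anonymized fullname, so they remain linkable without exposing the identity.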

class swh.model.model.Timestamp(seconds: int, microseconds: int)

Bases: swh.model.model.BaseModel

Represents a naive timestamp from a VCS.

object_type: typing_extensions.Final = 'timestamp'

check_seconds(attribute, value)

Checks that seconds fit in a 64-bit signed integer.

check_microseconds(attribute, value)

Checks that microseconds are non-negative and < 1000000.
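Both checks can be sketched as plain range tests. `TimestampSketch` is a hypothetical dataclass stand-in; the real class attaches these checks as attrs validators rather than running them in `__post_init__`.

```python
from dataclasses import dataclass

INT64_MIN, INT64_MAX = -(2 ** 63), 2 ** 63 - 1


@dataclass(frozen=True)
class TimestampSketch:
    """Hypothetical stand-in for Timestamp, validating like the checks above."""
    seconds: int
    microseconds: int

    def __post_init__(self) -> None:
        # check_seconds: must fit in a 64-bit signed integer.
        if not INT64_MIN <= self.seconds <= INT64_MAX:
            raise ValueError("seconds must fit in a 64-bit signed integer")
        # check_microseconds: must lie in [0, 1000000).
        if not 0 <= self.microseconds < 1_000_000:
            raise ValueError("microseconds must be in [0, 1000000)")
```

Negative seconds are accepted (dates before 1970), while the sub-second part always stays non-negative.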

class swh.model.model.TimestampWithTimezone(timestamp: swh.model.model.Timestamp, offset: int, negative_utc: bool)

Bases: swh.model.model.BaseModel

Represents a TZ-aware timestamp from a VCS.

object_type: typing_extensions.Final = 'timestamp_with_timezone'

check_offset(attribute, value)

Checks that the offset is a 16-bit signed integer (in theory, it should always be between -14 and +14 hours).

check_negative_utc(attribute, value)

classmethod from_dict(obj: Union[Dict, datetime.datetime, int])

Builds a TimestampWithTimezone from any of the formats accepted by swh.model.normalize_timestamp().

classmethod from_datetime(dt: datetime.datetime)

classmethod from_iso8601(s)

Builds a TimestampWithTimezone from an ISO 8601-formatted string.
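The decomposition that from_datetime and from_iso8601 perform can be sketched with the standard library alone. Assumptions: the offset is expressed in minutes, the parser is datetime.fromisoformat (the library may use a different ISO 8601 parser), and negative_utc is not modelled. `split_aware_datetime` and `split_iso8601` are hypothetical helpers returning `(seconds, microseconds, offset_minutes)`.

```python
import datetime
from typing import Tuple

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def split_aware_datetime(dt: datetime.datetime) -> Tuple[int, int, int]:
    """Hypothetical helper: (seconds, microseconds, offset_minutes) of an aware datetime."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    # timedelta arithmetic is exact, unlike the float-based dt.timestamp().
    delta = dt - EPOCH
    seconds = delta.days * 86400 + delta.seconds
    offset_minutes = int(offset.total_seconds() // 60)
    return seconds, delta.microseconds, offset_minutes


def split_iso8601(s: str) -> Tuple[int, int, int]:
    # datetime.fromisoformat accepts offsets such as "+02:00".
    return split_aware_datetime(datetime.datetime.fromisoformat(s))
```

timedelta normalizes its microseconds into [0, 1000000), so instants before 1970 come out as a negative seconds value plus a non-negative microseconds value, which matches the Timestamp checks.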

class swh.model.model.Origin(url: str)

Bases: swh.model.model.BaseModel

Represents a software source: a VCS and a URL.

object_type: typing_extensions.Final = 'origin'

class swh.model.model.OriginVisit(origin: str, date: datetime.datetime, type: str, visit: Optional[int] = None)

Bases: swh.model.model.BaseModel

Represents an origin visit with a given type at a given point in time, by a SWH loader.

object_type: typing_extensions.Final = 'origin_visit'
visit

Should not be set before calling origin_visit_add().

to_dict()

Serializes the date as a string and omits the visit id if it is None.
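The two special cases can be sketched as follows, assuming the date string is produced by isoformat(). `OriginVisitSketch` is a hypothetical stand-in, not the library class.

```python
import dataclasses
import datetime
from typing import Any, Dict, Optional


@dataclasses.dataclass(frozen=True)
class OriginVisitSketch:
    """Hypothetical stand-in for OriginVisit."""
    origin: str
    date: datetime.datetime
    type: str
    visit: Optional[int] = None

    def to_dict(self) -> Dict[str, Any]:
        d = dataclasses.asdict(self)
        # Assumption: the date is rendered with isoformat().
        d["date"] = self.date.isoformat()
        if d["visit"] is None:
            # The visit id has not been assigned yet, so leave it out.
            del d["visit"]
        return d
```

Omitting the key, instead of emitting `"visit": None`, lets the storage side assign the id without having to overwrite an explicit null.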

class swh.model.model.OriginVisitStatus(origin: str, visit: int, date: datetime.datetime, status: str, snapshot: Optional[bytes], metadata=None)

Bases: swh.model.model.BaseModel

Represents a visit update of an origin at a given point in time.

object_type: typing_extensions.Final = 'origin_visit_status'

class swh.model.model.TargetType(value)

Bases: enum.Enum

The type of content pointed to by a snapshot branch. Usually a revision or an alias.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
ALIAS = 'alias'

class swh.model.model.ObjectType(value)

Bases: enum.Enum

The type of content pointed to by a release. Usually a revision.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'

class swh.model.model.SnapshotBranch(target: bytes, target_type: swh.model.model.TargetType)

Bases: swh.model.model.BaseModel

Represents one of the branches of a snapshot.

object_type: typing_extensions.Final = 'snapshot_branch'

check_target(attribute, value)

If the target type is not an alias, checks that the target is a valid sha1_git.
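A sketch of this check, assuming that "valid sha1_git" means a 20-byte binary identifier and that a missing target is rejected for non-alias branches. `check_branch_target` is a hypothetical function; the enum mirrors the TargetType values listed above.

```python
import enum
from typing import Optional


class TargetType(enum.Enum):
    # Mirrors the values listed for swh.model.model.TargetType.
    CONTENT = "content"
    DIRECTORY = "directory"
    REVISION = "revision"
    RELEASE = "release"
    SNAPSHOT = "snapshot"
    ALIAS = "alias"


def check_branch_target(target: Optional[bytes], target_type: TargetType) -> None:
    """Hypothetical validator: non-alias targets must be 20-byte sha1_git ids."""
    if target_type is TargetType.ALIAS:
        # Aliases point to another branch name, not to an object id.
        return
    if target is None or len(target) != 20:
        raise ValueError("target must be a 20-byte sha1_git")
```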

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

class swh.model.model.Snapshot(branches, id: bytes = b'')

Bases: swh.model.model.BaseModel, swh.model.model.HashableObject

Represents the full state of an origin at a given point in time.

object_type: typing_extensions.Final = 'snapshot'

static compute_hash(object_dict)

Computes the object hash (intrinsic identifier) of the snapshot from its dict representation.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

class swh.model.model.Release(name: bytes, message: Optional[bytes], target: Optional[bytes], target_type: swh.model.model.ObjectType, synthetic: bool, author: Optional[swh.model.model.Person] = None, date: Optional[swh.model.model.TimestampWithTimezone] = None, metadata=None, id: bytes = b'')

Bases: swh.model.model.BaseModel, swh.model.model.HashableObject

object_type: typing_extensions.Final = 'release'

static compute_hash(object_dict)

Computes the object hash (intrinsic identifier) of the release from its dict representation.

check_author(attribute, value)

If the author is None, checks that the date is None too.
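The author/date constraint amounts to a single conditional. `ReleaseSketch` is a hypothetical, trimmed-down stand-in in which plain bytes and ints replace Person and TimestampWithTimezone.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ReleaseSketch:
    """Hypothetical, trimmed-down stand-in for Release (only author and date)."""
    name: bytes
    author: Optional[bytes] = None  # stands in for a Person
    date: Optional[int] = None      # stands in for a TimestampWithTimezone

    def __post_init__(self) -> None:
        # Mirrors check_author: a release without an author cannot have a date.
        if self.author is None and self.date is not None:
            raise ValueError("release date must be None when author is None")
```

The converse is not enforced: a release may have an author but no date.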

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

anonymize() → swh.model.model.Release

Returns an anonymized version of the Release object.

Anonymization consists of replacing the author with an anonymized Person object.

class swh.model.model.RevisionType(value)

Bases: enum.Enum

The type of a revision, i.e. the kind of source (version control system or package format) it was loaded from.

GIT = 'git'
TAR = 'tar'
DSC = 'dsc'
SUBVERSION = 'svn'
MERCURIAL = 'hg'

swh.model.model.tuplify_extra_headers(value: Iterable) → Tuple

class swh.model.model.Revision(message: Optional[bytes], author: swh.model.model.Person, committer: swh.model.model.Person, date: Optional[swh.model.model.TimestampWithTimezone], committer_date: Optional[swh.model.model.TimestampWithTimezone], type: swh.model.model.RevisionType, directory: bytes, synthetic: bool, metadata=None, parents: Tuple[bytes, ...] = (), id: bytes = b'', extra_headers=())

Bases: swh.model.model.BaseModel, swh.model.model.HashableObject

object_type: typing_extensions.Final = 'revision'

static compute_hash(object_dict)

Computes the object hash (intrinsic identifier) of the revision from its dict representation.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

anonymize() → swh.model.model.Revision

Returns an anonymized version of the Revision object.

Anonymization consists of replacing the author and committer with anonymized Person objects.
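Since only the two Person fields change, the operation maps naturally onto a copy-with-changes helper. This sketch uses dataclasses.replace (the attrs counterpart is attr.evolve) and assumes SHA-256 for the hashed fullname; `PersonStub` and `RevisionSketch` are hypothetical.

```python
import dataclasses
import hashlib
from typing import Optional


@dataclasses.dataclass(frozen=True)
class PersonStub:
    fullname: bytes
    name: Optional[bytes] = None
    email: Optional[bytes] = None

    def anonymize(self) -> "PersonStub":
        # Assumption: SHA-256 of the fullname, with name and email unset.
        return PersonStub(fullname=hashlib.sha256(self.fullname).digest())


@dataclasses.dataclass(frozen=True)
class RevisionSketch:
    message: bytes
    author: PersonStub
    committer: PersonStub

    def anonymize(self) -> "RevisionSketch":
        # Only the two Person fields change; everything else is copied.
        return dataclasses.replace(
            self, author=self.author.anonymize(), committer=self.committer.anonymize()
        )
```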

class swh.model.model.DirectoryEntry(name: bytes, type: str, target: bytes, perms: int)

Bases: swh.model.model.BaseModel

object_type: typing_extensions.Final = 'directory_entry'
perms

Usually one of the values of swh.model.from_disk.DentryPerms.
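For reference, the octal file modes used in Git tree entries are shown below as an IntEnum. Whether swh.model.from_disk.DentryPerms uses exactly these member names and values, and whether DirectoryEntry.type uses the strings "file", "dir" and "rev", are assumptions here; `GitModeSketch` and `entry_type` are hypothetical.

```python
import enum


class GitModeSketch(enum.IntEnum):
    """Git tree entry modes (octal), which perms values usually follow."""
    content = 0o100644             # regular file
    executable_content = 0o100755  # executable file
    symlink = 0o120000             # symbolic link
    directory = 0o040000           # subdirectory (tree)
    revision = 0o160000            # submodule (commit)


def entry_type(perms: int) -> str:
    """Hypothetical helper: map perms to a DirectoryEntry-style type string."""
    mode = GitModeSketch(perms)  # raises ValueError for unknown modes
    if mode is GitModeSketch.directory:
        return "dir"
    if mode is GitModeSketch.revision:
        return "rev"
    return "file"
```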

class swh.model.model.Directory(entries: Tuple[swh.model.model.DirectoryEntry, ...], id: bytes = b'')

Bases: swh.model.model.BaseModel, swh.model.model.HashableObject

object_type: typing_extensions.Final = 'directory'

static compute_hash(object_dict)

Computes the object hash (intrinsic identifier) of the directory from its dict representation.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

class swh.model.model.BaseContent(status: str)

Bases: swh.model.model.BaseModel

classmethod from_dict(d, use_subclass=True)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

get_hash(hash_name)

hashes() → Dict[str, bytes]

Returns a dictionary {hash_name: hash_value}.

class swh.model.model.Content(sha1: bytes, sha1_git: bytes, sha256: bytes, blake2s256: bytes, length: int, status: str = 'visible', data: Optional[bytes] = None, ctime: Optional[datetime.datetime] = None)

Bases: swh.model.model.BaseContent

object_type: typing_extensions.Final = 'content'

check_length(attribute, value)

Checks that the length is non-negative.

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_data(data, status='visible', ctime=None) → swh.model.model.Content

Generate a Content from a given data byte string.

This populates the Content with the hashes and length for the data passed as argument, as well as the data itself.
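The fields that from_data populates can be sketched with hashlib. The hash names match the Content constructor's fields, and computing sha1_git as Git's blob hash (SHA-1 over a `blob <length>\0` header followed by the data) is the standard Git scheme; `content_hashes` itself is a hypothetical helper, not the library's hashing code.

```python
import hashlib
from typing import Dict


def content_hashes(data: bytes) -> Dict[str, object]:
    """Hypothetical helper: the hashes and length that from_data populates."""
    # sha1_git is Git's blob id: SHA-1 over a "blob <length>\0" header plus data.
    git_header = b"blob %d\x00" % len(data)
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha1_git": hashlib.sha1(git_header + data).digest(),
        "sha256": hashlib.sha256(data).digest(),
        "blake2s256": hashlib.blake2s(data, digest_size=32).digest(),
        "length": len(data),
    }
```

For the empty byte string, sha1_git is Git's well-known empty-blob id, e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.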

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

with_data() → swh.model.model.Content

Loads the data attribute, so that it is guaranteed not to be None after this call.

On Content itself this call is almost a no-op, but subclasses may override this method to lazy-load the data (e.g. from disk or from an objstorage).
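The division of labour between with_data and MissingData can be sketched as follows. `ContentSketch`, `LazyContentSketch` and the dict-based `store` are hypothetical; the store stands in for disk or an objstorage. MissingData is raised only when there is no way to fetch the data, while a failed fetch surfaces as a different error (here KeyError).

```python
import dataclasses
from typing import Dict, Optional


class MissingData(Exception):
    """Stand-in for swh.model.model.MissingData."""


@dataclasses.dataclass(frozen=True)
class ContentSketch:
    sha1: bytes
    data: Optional[bytes] = None

    def with_data(self) -> "ContentSketch":
        if self.data is None:
            # The base class has nowhere to fetch the data from.
            raise MissingData("Content data is None")
        return self


@dataclasses.dataclass(frozen=True)
class LazyContentSketch(ContentSketch):
    # Hypothetical storage mapping sha1 -> bytes.
    store: Dict[bytes, bytes] = dataclasses.field(default_factory=dict, compare=False)

    def with_data(self) -> "LazyContentSketch":
        if self.data is not None:
            return self
        # A failed lookup raises KeyError, not MissingData: fetching was possible but failed.
        return dataclasses.replace(self, data=self.store[self.sha1])
```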

class swh.model.model.SkippedContent(sha1: Optional[bytes], sha1_git: Optional[bytes], sha256: Optional[bytes], blake2s256: Optional[bytes], length: Optional[int], status: str, reason: Optional[str] = None, origin: Optional[str] = None, ctime: Optional[datetime.datetime] = None)

Bases: swh.model.model.BaseContent

object_type: typing_extensions.Final = 'skipped_content'

check_reason(attribute, value)

Checks that the reason is set (not None), as required for absent contents.

check_length(attribute, value)

Checks that the length is non-negative or -1.
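Both SkippedContent checks can be sketched together. `SkippedContentSketch` is a hypothetical stand-in with the hash fields omitted; it reads check_reason as "a reason must always be given", which is an interpretation rather than a confirmed behaviour.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SkippedContentSketch:
    """Hypothetical stand-in for SkippedContent (hashes omitted)."""
    length: Optional[int]
    status: str
    reason: Optional[str] = None

    def __post_init__(self) -> None:
        # check_length: non-negative values and -1 are accepted, other negatives are not.
        if self.length is not None and self.length < -1:
            raise ValueError("length must be non-negative or -1")
        # check_reason (as interpreted here): a skipped content must say why.
        if self.reason is None:
            raise ValueError("a reason must be provided")
```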

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_data(data: bytes, reason: str, ctime: Optional[datetime.datetime] = None) → swh.model.model.SkippedContent

Generate a SkippedContent from a given data byte string.

This populates the SkippedContent with the hashes and length for the data passed as argument.

You can use attr.evolve on such a generated content to nullify some of its attributes, e.g. for tests.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

class swh.model.model.MetadataAuthorityType(value)

Bases: enum.Enum

The kind of entity acting as a metadata authority.

DEPOSIT = 'deposit'
FORGE = 'forge'
REGISTRY = 'registry'

class swh.model.model.MetadataAuthority(type: swh.model.model.MetadataAuthorityType, url: str, metadata=None)

Bases: swh.model.model.BaseModel

Represents an entity that provides metadata about an origin or software artifact.

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

class swh.model.model.MetadataFetcher(name: str, version: str, metadata=None)

Bases: swh.model.model.BaseModel

Represents a software component used to fetch metadata from a metadata authority, and ingest them into the Software Heritage archive.

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

class swh.model.model.MetadataTargetType(value)

Bases: enum.Enum

The type of object that extrinsic metadata refer to.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
ORIGIN = 'origin'

class swh.model.model.RawExtrinsicMetadata(type: swh.model.model.MetadataTargetType, id: Union[str, swh.model.identifiers.SWHID], discovery_date: datetime.datetime, authority: swh.model.model.MetadataAuthority, fetcher: swh.model.model.MetadataFetcher, format: str, metadata: bytes, origin: Optional[str] = None, visit: Optional[int] = None, snapshot: Optional[swh.model.identifiers.SWHID] = None, release: Optional[swh.model.identifiers.SWHID] = None, revision: Optional[swh.model.identifiers.SWHID] = None, path: Optional[bytes] = None, directory: Optional[swh.model.identifiers.SWHID] = None)

Bases: swh.model.model.BaseModel

id

A URL if type is MetadataTargetType.ORIGIN, else a core SWHID.
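A sketch of how the id could be validated against the target type. It only handles string ids (the real field also accepts SWHID objects), checks the `swh:1:<tag>:<40 hex digits>` core SWHID syntax, and treats any non-SWHID string as an acceptable origin URL; `SWHID_TAGS`, `CORE_SWHID` and `check_metadata_target` are hypothetical names.

```python
import enum
import re
from typing import Dict


class MetadataTargetType(enum.Enum):
    # Mirrors the values listed for swh.model.model.MetadataTargetType.
    CONTENT = "content"
    DIRECTORY = "directory"
    REVISION = "revision"
    RELEASE = "release"
    SNAPSHOT = "snapshot"
    ORIGIN = "origin"


# Core SWHID object-type tags for each non-origin target type.
SWHID_TAGS: Dict[MetadataTargetType, str] = {
    MetadataTargetType.CONTENT: "cnt",
    MetadataTargetType.DIRECTORY: "dir",
    MetadataTargetType.REVISION: "rev",
    MetadataTargetType.RELEASE: "rel",
    MetadataTargetType.SNAPSHOT: "snp",
}
CORE_SWHID = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}")


def check_metadata_target(type_: MetadataTargetType, id_: str) -> None:
    """Hypothetical check: origins take a URL, other targets a matching core SWHID."""
    match = CORE_SWHID.fullmatch(id_)
    if type_ is MetadataTargetType.ORIGIN:
        if match is not None:
            raise ValueError("origin targets must be URLs, not SWHIDs")
        return
    if match is None or match.group(1) != SWHID_TAGS[type_]:
        raise ValueError(f"expected a core SWHID of type {SWHID_TAGS[type_]}")
```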

check_id(attribute, value)
check_origin(attribute, value)
check_visit(attribute, value)
check_snapshot(attribute, value)
check_release(attribute, value)
check_revision(attribute, value)
check_path(attribute, value)
check_directory(attribute, value)

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.