swh.model.model module

exception swh.model.model.MissingData[source]

Bases: Exception

Raised by Content.with_data when it has no way of fetching the data (but not when fetching the data fails).

swh.model.model.KeyType

The type returned by BaseModel.unique_key().

alias of Union[Dict[str, str], Dict[str, bytes], bytes]

swh.model.model.freeze_optional_dict(d: Union[None, Dict[KT, VT], swh.model.collections.ImmutableDict[KT, VT]]) → Optional[swh.model.collections.ImmutableDict[KT, VT]][source]
swh.model.model.dictify(value)[source]

Helper function used by BaseModel.to_dict()

class swh.model.model.BaseModel[source]

Bases: object

Base class for SWH model classes.

Provides serialization/deserialization to/from Python dictionaries that are suitable for JSON/msgpack-like formats.

to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

anonymize() → Optional[ModelType][source]

Returns an anonymized version of the object, if needed.

If the object model does not need/support anonymization, returns None.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object that can be used for deduplication.
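A minimal sketch of the dictionary round trip and key-based deduplication described for BaseModel. ToyOrigin is a hypothetical stand-in built on the standard-library dataclasses module (the real classes are built on attr), and the shape of its unique key is an assumption made for illustration.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class ToyOrigin:
    """Hypothetical stand-in for a model class; not swh.model.model.Origin."""

    url: str

    def to_dict(self) -> Dict[str, Any]:
        # Serialize every field to a plain dictionary.
        return asdict(self)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "ToyOrigin":
        # Rebuild the object from its dictionary form.
        return cls(**d)

    def unique_key(self) -> Dict[str, str]:
        # Assumed key shape: the fields that identify the object.
        return {"url": self.url}


origin = ToyOrigin(url="https://example.org/repo.git")
round_tripped = ToyOrigin.from_dict(origin.to_dict())

# Deduplicate a batch of objects by their unique key.
batch = [origin, round_tripped, ToyOrigin(url="https://example.org/other.git")]
deduplicated = {tuple(sorted(o.unique_key().items())): o for o in batch}
```

The dictionary key is turned into a sorted tuple only because dictionaries are not hashable; a bytes key could be used directly.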

class swh.model.model.HashableObject[source]

Bases: object

Mixin to automatically compute object identifier hash when the associated model is instantiated.

abstract compute_hash() → bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.
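The "compute the id when it is empty" behaviour can be sketched with the standard library. ToyHashable and ToyBlob are hypothetical names, and the SHA-1 of the payload is only a placeholder hash; the real identifier computation of each model class is not shown in this reference.

```python
import hashlib
from abc import ABC, abstractmethod
from dataclasses import dataclass


class ToyHashable(ABC):
    """Hypothetical mixin: fills in `id` at initialization when it is empty."""

    @abstractmethod
    def compute_hash(self) -> bytes:
        ...

    def __post_init__(self) -> None:
        if not self.id:
            # Bypass the frozen-dataclass guard to set the computed id once.
            object.__setattr__(self, "id", self.compute_hash())


@dataclass(frozen=True)
class ToyBlob(ToyHashable):
    payload: bytes
    id: bytes = b""

    def compute_hash(self) -> bytes:
        # Placeholder hash, chosen for illustration only.
        return hashlib.sha1(self.payload).digest()


computed = ToyBlob(payload=b"abc")
explicit = ToyBlob(payload=b"abc", id=b"\x01" * 20)
```

An id passed explicitly is kept as-is; only an empty value triggers compute_hash().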

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes][source]
class swh.model.model.Person(fullname: bytes, name: Optional[bytes], email: Optional[bytes])[source]

Bases: swh.model.model.BaseModel

Represents the author/committer of a revision or release.

object_type: typing_extensions.Final = 'person'
classmethod from_fullname(fullname: bytes)[source]

Returns a Person object by guessing the name and email from the fullname, which is expected to be in the name <email> format.

The fullname is left unchanged.
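One way such guessing could work is sketched below. guess_name_and_email is a hypothetical helper, and its handling of edge cases (missing or unterminated brackets) is an assumption, not necessarily what from_fullname does.

```python
from typing import Optional, Tuple


def guess_name_and_email(fullname: bytes) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Hypothetical helper: split b"name <email>" into (name, email)."""
    start = fullname.find(b"<")
    if start == -1:
        # No bracket at all: treat the whole string as the name.
        return (fullname or None, None)
    end = fullname.rfind(b">")
    if end < start:
        # Unterminated bracket: take everything after "<" as the email.
        end = len(fullname)
    name = fullname[:start].strip()
    email = fullname[start + 1 : end]
    # Empty parts are reported as None rather than b"".
    return (name or None, email or None)


name, email = guess_name_and_email(b"Jane Doe <jane@example.org>")
```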

anonymize() → swh.model.model.Person[source]

Returns an anonymized version of the Person object.

The anonymized object is simply a Person whose fullname is hashed and whose name and email are unset.
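A sketch of this kind of anonymization, using a hypothetical ToyPerson class. The choice of SHA-256 is an assumption; this reference does not say which hash function is applied to the fullname.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyPerson:
    """Hypothetical stand-in; not swh.model.model.Person."""

    fullname: bytes
    name: Optional[bytes]
    email: Optional[bytes]

    def anonymize(self) -> "ToyPerson":
        # Assumption: the fullname is replaced by its SHA-256 digest.
        digest = hashlib.sha256(self.fullname).digest()
        return ToyPerson(fullname=digest, name=None, email=None)


person = ToyPerson(b"Jane Doe <jane@example.org>", b"Jane Doe", b"jane@example.org")
anonymized = person.anonymize()
```

Because the digest is deterministic, two objects with the same fullname anonymize to the same value, so authorship can still be grouped without exposing the name or email.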

class swh.model.model.Timestamp(seconds: int, microseconds: int)[source]

Bases: swh.model.model.BaseModel

Represents a naive timestamp from a VCS.

object_type: typing_extensions.Final = 'timestamp'
check_seconds(attribute, value)[source]

Checks that seconds fit in a 64-bit signed integer.

check_microseconds(attribute, value)[source]

Checks that microseconds are non-negative and < 1000000.
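The two range checks can be sketched as follows. ToyTimestamp is a hypothetical class that validates in __post_init__; raising ValueError and treating zero microseconds as valid are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyTimestamp:
    """Hypothetical stand-in; not swh.model.model.Timestamp."""

    seconds: int
    microseconds: int

    def __post_init__(self) -> None:
        # seconds must be representable as a 64-bit signed integer.
        if not (-(2**63) <= self.seconds < 2**63):
            raise ValueError("seconds must fit in a 64-bit signed integer")
        # microseconds must be a valid sub-second fraction.
        if not (0 <= self.microseconds < 10**6):
            raise ValueError("microseconds must be in [0, 1000000)")


valid = ToyTimestamp(seconds=1234567890, microseconds=999999)
```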

class swh.model.model.TimestampWithTimezone(timestamp: swh.model.model.Timestamp, offset: int, negative_utc: bool)[source]

Bases: swh.model.model.BaseModel

Represents a TZ-aware timestamp from a VCS.

object_type: typing_extensions.Final = 'timestamp_with_timezone'
check_offset(attribute, value)[source]

Checks the offset is a 16-bit signed integer (in theory, it should always be between -14 and +14 hours).

check_negative_utc(attribute, value)[source]
classmethod from_dict(obj: Union[Dict, datetime.datetime, int])[source]

Builds a TimestampWithTimezone from any of the formats accepted by swh.model.normalize_timestamp().

classmethod from_datetime(dt: datetime.datetime)[source]
classmethod from_iso8601(s)[source]

Builds a TimestampWithTimezone from an ISO8601-formatted string.
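The conversion from an ISO 8601 string to a timestamp plus offset might look like the sketch below. split_iso8601 is a hypothetical helper; expressing the offset in minutes and rejecting strings without a UTC offset are both assumptions, since this reference does not state either.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def split_iso8601(s: str) -> Tuple[int, int, int]:
    """Hypothetical helper: return (seconds, microseconds, offset_minutes)."""
    dt = datetime.fromisoformat(s)
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("a timezone-aware datetime is required")
    # timedelta normalizes so that 0 <= microseconds < 1000000,
    # even for dates before the epoch.
    delta = dt - EPOCH
    seconds = delta.days * 86400 + delta.seconds
    # Assumption: the offset is stored as a whole number of minutes.
    return seconds, delta.microseconds, offset // timedelta(minutes=1)


seconds, microseconds, offset_minutes = split_iso8601("1970-01-01T00:00:00-05:00")
```

Using integer arithmetic on the timedelta avoids the rounding that dt.timestamp() can introduce for large values.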

class swh.model.model.Origin(url: str)[source]

Bases: swh.model.model.BaseModel

Represents a software source: a VCS and a URL.

object_type: typing_extensions.Final = 'origin'
unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object that can be used for deduplication.

class swh.model.model.OriginVisit(origin: str, date: datetime.datetime, type: str, visit: Optional[int] = None)[source]

Bases: swh.model.model.BaseModel

Represents an origin visit with a given type at a given point in time, by a SWH loader.

object_type: typing_extensions.Final = 'origin_visit'
type

Should not be set before calling ‘origin_visit_add()’.

check_date(attribute, value)[source]

Checks the date has a timezone.
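A timezone check of this kind can be written with the standard library alone. check_has_timezone is a hypothetical function, and raising ValueError is an assumption.

```python
import datetime


def check_has_timezone(value: datetime.datetime) -> None:
    """Hypothetical validator: reject naive datetimes."""
    # A datetime is aware only if tzinfo is set and yields a UTC offset.
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError("date must be a timezone-aware datetime")


aware = datetime.datetime(2020, 1, 1, tzinfo=datetime.timezone.utc)
naive = datetime.datetime(2020, 1, 1)
check_has_timezone(aware)  # passes silently
```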

to_dict()[source]

Serializes the date as a string and omits the visit id if it is None.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object that can be used for deduplication.

class swh.model.model.OriginVisitStatus(origin: str, visit: int, date: datetime.datetime, status: str, snapshot: Optional[bytes], metadata=None)[source]

Bases: swh.model.model.BaseModel

Represents a visit update of an origin at a given point in time.

object_type: typing_extensions.Final = 'origin_visit_status'
check_date(attribute, value)[source]

Checks the date has a timezone.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object that can be used for deduplication.

class swh.model.model.TargetType(value)[source]

Bases: enum.Enum

The type of content pointed to by a snapshot branch. Usually a revision or an alias.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
ALIAS = 'alias'
class swh.model.model.ObjectType(value)[source]

Bases: enum.Enum

The type of content pointed to by a release. Usually a revision.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
class swh.model.model.SnapshotBranch(target: bytes, target_type: swh.model.model.TargetType)[source]

Bases: swh.model.model.BaseModel

Represents one of the branches of a snapshot.

object_type: typing_extensions.Final = 'snapshot_branch'
check_target(attribute, value)[source]

If the target type is not an alias, checks the target is a valid sha1_git.
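One reading of this check is sketched below: alias branches are skipped, and any other target must look like a sha1_git. ToyTargetType and check_branch_target are hypothetical, and two assumptions are made: an alias target is a branch name rather than a hash, and "valid sha1_git" means a 20-byte value.

```python
from enum import Enum


class ToyTargetType(Enum):
    """Hypothetical subset of the target types listed above."""

    REVISION = "revision"
    ALIAS = "alias"


def check_branch_target(target: bytes, target_type: ToyTargetType) -> None:
    if target_type is ToyTargetType.ALIAS:
        # Assumption: an alias points to another branch by name, not by hash.
        return
    # Assumption: a sha1_git is the raw 20-byte SHA-1 digest.
    if len(target) != 20:
        raise ValueError("wrong length for sha1_git: %d" % len(target))


check_branch_target(b"\x00" * 20, ToyTargetType.REVISION)
check_branch_target(b"refs/heads/main", ToyTargetType.ALIAS)
```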

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

class swh.model.model.Snapshot(branches, id: bytes = b'')[source]

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

Represents the full state of an origin at a given point in time.

object_type: typing_extensions.Final = 'snapshot'
compute_hash() → bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

class swh.model.model.Release(name: bytes, message: Optional[bytes], target: Optional[bytes], target_type: swh.model.model.ObjectType, synthetic: bool, author: Optional[swh.model.model.Person] = None, date: Optional[swh.model.model.TimestampWithTimezone] = None, metadata=None, id: bytes = b'')[source]

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

object_type: typing_extensions.Final = 'release'
compute_hash() → bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.

check_author(attribute, value)[source]

If the author is None, checks the date is None too.

to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

anonymize() → swh.model.model.Release[source]

Returns an anonymized version of the Release object.

Anonymization consists in replacing the author with an anonymized Person object.

class swh.model.model.RevisionType(value)[source]

Bases: enum.Enum

An enumeration.

GIT = 'git'
TAR = 'tar'
DSC = 'dsc'
SUBVERSION = 'svn'
MERCURIAL = 'hg'
swh.model.model.tuplify_extra_headers(value: Iterable)[source]
class swh.model.model.Revision(message: Optional[bytes], author: swh.model.model.Person, committer: swh.model.model.Person, date: Optional[swh.model.model.TimestampWithTimezone], committer_date: Optional[swh.model.model.TimestampWithTimezone], type: swh.model.model.RevisionType, directory: bytes, synthetic: bool, metadata=None, parents: Tuple[bytes, ...] = (), id: bytes = b'', extra_headers=())[source]

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

object_type: typing_extensions.Final = 'revision'
compute_hash() → bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

anonymize() → swh.model.model.Revision[source]

Returns an anonymized version of the Revision object.

Anonymization consists in replacing the author and committer with an anonymized Person object.

class swh.model.model.DirectoryEntry(name: bytes, type: str, target: bytes, perms: int)[source]

Bases: swh.model.model.BaseModel

object_type: typing_extensions.Final = 'directory_entry'
perms

Usually one of the values of swh.model.from_disk.DentryPerms.

class swh.model.model.Directory(entries: Tuple[swh.model.model.DirectoryEntry, ...], id: bytes = b'')[source]

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

object_type: typing_extensions.Final = 'directory'
compute_hash() → bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

class swh.model.model.BaseContent(status: str)[source]

Bases: swh.model.model.BaseModel

classmethod from_dict(d, use_subclass=True)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

get_hash(hash_name)[source]
hashes() → Dict[str, bytes][source]

Returns a dictionary {hash_name: hash_value}.

class swh.model.model.Content(sha1: bytes, sha1_git: bytes, sha256: bytes, blake2s256: bytes, length: int, status: str = 'visible', data: Optional[bytes] = None, ctime: Optional[datetime.datetime] = None)[source]

Bases: swh.model.model.BaseContent

object_type: typing_extensions.Final = 'content'
check_length(attribute, value)[source]

Checks the length is non-negative.

check_ctime(attribute, value)[source]

Checks the ctime has a timezone.

to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_data(data, status='visible', ctime=None) → swh.model.model.Content[source]

Generate a Content from a given data byte string.

This populates the Content with the hashes and length for the data passed as argument, as well as the data itself.
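The hashes named in the Content signature can be computed with hashlib, as in the sketch below. compute_content_hashes is a hypothetical helper; it assumes that sha1_git follows the Git blob convention (SHA-1 over a "blob <length>\0" header followed by the data) and that blake2s256 is BLAKE2s with a 32-byte digest.

```python
import hashlib
from typing import Any, Dict


def compute_content_hashes(data: bytes) -> Dict[str, Any]:
    """Hypothetical helper: hashes and length for a byte string."""
    # Assumption: sha1_git is the Git blob hash of the data.
    git_header = b"blob %d\x00" % len(data)
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha1_git": hashlib.sha1(git_header + data).digest(),
        "sha256": hashlib.sha256(data).digest(),
        "blake2s256": hashlib.blake2s(data, digest_size=32).digest(),
        "length": len(data),
        "data": data,
    }


fields = compute_content_hashes(b"hello\n")
```

With these assumptions the sha1_git value matches what `git hash-object` reports for a file with the same bytes.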

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

with_data() → swh.model.model.Content[source]

Loads the data attribute, so that it is guaranteed not to be None after this call.

This call is almost a no-op, but subclasses may override this method to lazy-load data (e.g. from disk or objstorage).
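The lazy-loading pattern and its interaction with MissingData can be sketched as follows. ToyContent, ToyLazyContent, ToyMissingData and the loader callable are all hypothetical; how a real subclass locates its data is not covered by this reference.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


class ToyMissingData(Exception):
    """Hypothetical counterpart of MissingData: there is no way to fetch the data."""


@dataclass(frozen=True)
class ToyContent:
    sha1: bytes
    data: Optional[bytes] = None

    def with_data(self) -> "ToyContent":
        # Base behaviour: nothing to load, so either the data is there or it is not.
        if self.data is None:
            raise ToyMissingData("content data is None")
        return self


@dataclass(frozen=True)
class ToyLazyContent(ToyContent):
    # Hypothetical hook that fetches the bytes for a given sha1.
    loader: Optional[Callable[[bytes], bytes]] = None

    def with_data(self) -> "ToyLazyContent":
        if self.data is not None:
            return self
        if self.loader is None:
            raise ToyMissingData("no loader configured")
        # Errors raised by the loader itself propagate unchanged.
        return replace(self, data=self.loader(self.sha1))


lazy = ToyLazyContent(sha1=b"k", loader=lambda sha1: b"payload")
loaded = lazy.with_data()
```

Returning a new object instead of mutating the existing one keeps the model immutable, which is why the method returns a Content rather than None.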

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object that can be used for deduplication.

class swh.model.model.SkippedContent(sha1: Optional[bytes], sha1_git: Optional[bytes], sha256: Optional[bytes], blake2s256: Optional[bytes], length: Optional[int], status: str, reason: Optional[str] = None, origin: Optional[str] = None, ctime: Optional[datetime.datetime] = None)[source]

Bases: swh.model.model.BaseContent

object_type: typing_extensions.Final = 'skipped_content'
check_reason(attribute, value)[source]

Checks the reason is full if status != absent.

check_length(attribute, value)[source]

Checks the length is positive or -1.

check_ctime(attribute, value)[source]

Checks the ctime has a timezone.

to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_data(data: bytes, reason: str, ctime: Optional[datetime.datetime] = None) → swh.model.model.SkippedContent[source]

Generate a SkippedContent from a given data byte string.

This populates the SkippedContent with the hashes and length for the data passed as argument.

You can use attr.evolve on such a generated content to nullify some of its attributes, e.g. for tests.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object that can be used for deduplication.

class swh.model.model.MetadataAuthorityType(value)[source]

Bases: enum.Enum

An enumeration.

DEPOSIT_CLIENT = 'deposit_client'
FORGE = 'forge'
REGISTRY = 'registry'
class swh.model.model.MetadataAuthority(type: swh.model.model.MetadataAuthorityType, url: str, metadata=None)[source]

Bases: swh.model.model.BaseModel

Represents an entity that provides metadata about an origin or software artifact.

object_type: typing_extensions.Final = 'metadata_authority'
to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object that can be used for deduplication.

class swh.model.model.MetadataFetcher(name: str, version: str, metadata=None)[source]

Bases: swh.model.model.BaseModel

Represents a software component used to fetch metadata from a metadata authority, and ingest them into the Software Heritage archive.

object_type: typing_extensions.Final = 'metadata_fetcher'
to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object that can be used for deduplication.

class swh.model.model.MetadataTargetType(value)[source]

Bases: enum.Enum

The type of object extrinsic metadata refer to.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
ORIGIN = 'origin'
class swh.model.model.RawExtrinsicMetadata(type: swh.model.model.MetadataTargetType, target: Union[str, swh.model.identifiers.SWHID], discovery_date: datetime.datetime, authority: swh.model.model.MetadataAuthority, fetcher: swh.model.model.MetadataFetcher, format: str, metadata: bytes, origin: Optional[str] = None, visit: Optional[int] = None, snapshot: Optional[swh.model.identifiers.SWHID] = None, release: Optional[swh.model.identifiers.SWHID] = None, revision: Optional[swh.model.identifiers.SWHID] = None, path: Optional[bytes] = None, directory: Optional[swh.model.identifiers.SWHID] = None)[source]

Bases: swh.model.model.BaseModel

object_type: typing_extensions.Final = 'raw_extrinsic_metadata'
target

URL if type=MetadataTargetType.ORIGIN, else core SWHID

check_target(attribute, value)[source]
check_discovery_date(attribute, value)[source]

Checks the discovery_date has a timezone.

check_origin(attribute, value)[source]
check_visit(attribute, value)[source]
check_snapshot(attribute, value)[source]
check_release(attribute, value)[source]
check_revision(attribute, value)[source]
check_path(attribute, value)[source]
check_directory(attribute, value)[source]
to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object that can be used for deduplication.