swh.model.model module

exception swh.model.model.MissingData

Bases: Exception

Raised by Content.with_data when it has no way of fetching the data (but not when fetching the data fails).

swh.model.model.KeyType

The type returned by BaseModel.unique_key().

alias of Union[Dict[str, str], Dict[str, bytes], bytes]

swh.model.model.freeze_optional_dict(d: Union[None, Dict[KT, VT], swh.model.collections.ImmutableDict[KT, VT]]) → Optional[swh.model.collections.ImmutableDict[KT, VT]]
swh.model.model.dictify(value)

Helper function used by BaseModel.to_dict().

class swh.model.model.BaseModel

Bases: object

Base class for SWH model classes.

Provides serialization/deserialization to/from Python dictionaries, that are suitable for JSON/msgpack-like formats.

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.
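The to_dict()/from_dict() pair is meant to round-trip: serialize a tree of objects into nested dictionaries, then rebuild an equal object tree from them. The sketch below shows that contract. It uses the standard-library dataclasses module in place of attrs, and the PersonSketch and ReleaseSketch classes are hypothetical simplifications, not the real Person and Release.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class PersonSketch:
    """Hypothetical stand-in for swh.model.model.Person."""
    fullname: bytes
    name: Optional[bytes]
    email: Optional[bytes]


@dataclass(frozen=True)
class ReleaseSketch:
    """Hypothetical, heavily simplified stand-in for swh.model.model.Release."""
    name: bytes
    author: Optional[PersonSketch]

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into nested dataclasses, much like attr.asdict
        return asdict(self)

    @classmethod
    def from_dict(cls, d: Dict[str, Any]) -> "ReleaseSketch":
        # Rebuild the nested object first, then the outer one
        author = d.get("author")
        return cls(
            name=d["name"],
            author=PersonSketch(**author) if author is not None else None,
        )


release = ReleaseSketch(b"v1.0", PersonSketch(b"Jane <jane@example.org>", b"Jane", b"jane@example.org"))
assert ReleaseSketch.from_dict(release.to_dict()) == release
```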

anonymize() → Optional[ModelType]

Returns an anonymized version of the object, if needed.

If the object model does not need/support anonymization, returns None.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes]

Returns a unique key for this object, that can be used for deduplication.
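unique_key() returns a KeyType, and a KeyType may be a dict. Dicts are not hashable, so they cannot go straight into a set. The sketch below shows one way a caller might deduplicate objects by their unique key. It converts dict keys to a sorted tuple of items. The helpers hashable_key and deduplicate are hypothetical and are not part of swh.model.

```python
from typing import Callable, Dict, Hashable, Iterable, List, Union

KeyType = Union[Dict[str, str], Dict[str, bytes], bytes]


def hashable_key(key: KeyType) -> Hashable:
    """Hypothetical helper: make a KeyType usable as a set member."""
    if isinstance(key, dict):
        # Dict keys are unique strings, so sorting never compares the values
        return tuple(sorted(key.items()))
    return key


def deduplicate(objects: Iterable, key: Callable = lambda o: o.unique_key()) -> List:
    """Hypothetical helper: keep the first object seen for each unique key."""
    seen = set()
    result = []
    for obj in objects:
        k = hashable_key(key(obj))
        if k not in seen:
            seen.add(k)
            result.append(obj)
    return result
```

In practice the default key would call each model object's own unique_key() method.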

class swh.model.model.HashableObject

Bases: object

Mixin that automatically computes the object identifier hash when the associated model is instantiated.

abstract compute_hash() → bytes

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.
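The mixin pattern described above fills in id at construction time only when it is empty. The sketch below shows that pattern with a frozen dataclass and a __post_init__ hook. HashableSketch and NoteSketch are hypothetical. Hashing the raw text with SHA1 is only for illustration, because real SWH objects hash a canonical serialization of their fields.

```python
import hashlib
from dataclasses import dataclass


class HashableSketch:
    """Hypothetical mixin: compute ``id`` when it is left empty."""

    def compute_hash(self) -> bytes:
        raise NotImplementedError

    def __post_init__(self) -> None:
        if not self.id:
            # Frozen dataclasses forbid normal assignment, so bypass it here
            object.__setattr__(self, "id", self.compute_hash())


@dataclass(frozen=True)
class NoteSketch(HashableSketch):
    text: bytes
    id: bytes = b""

    def compute_hash(self) -> bytes:
        # Illustrative only: not how any real SWH object is hashed
        return hashlib.sha1(self.text).digest()
```

An explicitly provided, non-empty id is kept as-is, which mirrors the "only if the id attribute is set to an empty value" rule.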

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes]
class swh.model.model.Person(fullname: bytes, name: Optional[bytes], email: Optional[bytes])

Bases: swh.model.model.BaseModel

Represents the author/committer of a revision or release.

object_type: typing_extensions.Final = 'person'
fullname
name
email
classmethod from_fullname(fullname: bytes)

Returns a Person object, by guessing the name and email from the fullname, in the name <email> format.

The fullname is left unchanged.
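Guessing a name and an email from a "name <email>" fullname amounts to splitting on the first "<" and the last ">". The sketch below implements that heuristic as a hypothetical helper, split_fullname. The real from_fullname may treat edge cases differently, for example surrounding whitespace or a missing closing bracket.

```python
from typing import Optional, Tuple


def split_fullname(fullname: bytes) -> Tuple[Optional[bytes], Optional[bytes]]:
    """Hypothetical helper: guess (name, email) from a ``name <email>`` fullname."""
    open_bracket = fullname.find(b"<")
    if open_bracket == -1:
        # No email part at all: the whole string is the name
        return (fullname or None, None)
    name = fullname[:open_bracket].strip() or None
    raw_email = fullname[open_bracket + 1:]
    close_bracket = raw_email.rfind(b">")
    # Tolerate a missing closing bracket by keeping everything after "<"
    email = raw_email[:close_bracket] if close_bracket != -1 else raw_email
    return (name, email or None)
```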

anonymize() → swh.model.model.Person

Returns an anonymized version of the Person object.

Anonymization is simply a Person whose fullname is the hash of the original fullname, with name and email unset.
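The sketch below shows the anonymization described above on a hypothetical PersonTuple stand-in. It assumes SHA-256 as the hash function; check the actual implementation before relying on a specific algorithm. The output is deterministic, so two occurrences of the same fullname still anonymize to the same Person.

```python
import hashlib
from typing import NamedTuple, Optional


class PersonTuple(NamedTuple):
    """Hypothetical stand-in for swh.model.model.Person."""
    fullname: bytes
    name: Optional[bytes]
    email: Optional[bytes]


def anonymize(person: PersonTuple) -> PersonTuple:
    # Assumption: SHA-256 is the hash applied to the fullname
    return PersonTuple(
        fullname=hashlib.sha256(person.fullname).digest(),
        name=None,
        email=None,
    )
```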

class swh.model.model.Timestamp(seconds: int, microseconds: int)

Bases: swh.model.model.BaseModel

Represents a naive timestamp from a VCS.

object_type: typing_extensions.Final = 'timestamp'
seconds
microseconds
check_seconds(attribute, value)

Checks that seconds fit in a 64-bit signed integer.

check_microseconds(attribute, value)

Checks that microseconds are non-negative and < 1000000.
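The two validators above reduce to simple range checks. The sketch below gathers them into a single hypothetical function, check_timestamp. In the real class they are separate attrs validators that receive (attribute, value) arguments.

```python
def check_timestamp(seconds: int, microseconds: int) -> None:
    """Hypothetical helper mirroring Timestamp's two validators."""
    # seconds must fit in a signed 64-bit integer
    if not (-(2**63) <= seconds < 2**63):
        raise ValueError("seconds does not fit in a signed 64-bit integer")
    # microseconds must lie in [0, 1000000)
    if not (0 <= microseconds < 1_000_000):
        raise ValueError("microseconds must be in [0, 1000000)")
```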

class swh.model.model.TimestampWithTimezone(timestamp: swh.model.model.Timestamp, offset: int, negative_utc: bool)

Bases: swh.model.model.BaseModel

Represents a TZ-aware timestamp from a VCS.

object_type: typing_extensions.Final = 'timestamp_with_timezone'
timestamp
offset
negative_utc
check_offset(attribute, value)

Checks that the offset (in minutes) is a 16-bit signed integer (in practice, it should always be between -14 and +14 hours).
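The sketch below shows how an offset stored in minutes maps to the "+HHMM" notation used by VCSs such as git, together with the 16-bit range check. It assumes the offset is in minutes. It also assumes that negative_utc marks the "-0000" case, which a bare integer offset of 0 cannot express, and that negative_utc is only valid when the offset is 0. The format_offset function is hypothetical.

```python
def format_offset(offset: int, negative_utc: bool = False) -> str:
    """Hypothetical helper: render an offset in minutes as ``+HHMM``/``-HHMM``."""
    # Same bound as check_offset: a signed 16-bit integer
    if not (-(2**15) <= offset < 2**15):
        raise ValueError("offset does not fit in a signed 16-bit integer")
    # Assumption: negative_utc only makes sense together with a zero offset
    if negative_utc and offset != 0:
        raise ValueError("negative_utc requires offset == 0")
    sign = "-" if offset < 0 or negative_utc else "+"
    hours, minutes = divmod(abs(offset), 60)
    return f"{sign}{hours:02d}{minutes:02d}"
```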

check_negative_utc(attribute, value)
classmethod from_dict(obj: Union[Dict, datetime.datetime, int])

Builds a TimestampWithTimezone from any of the formats accepted by swh.model.normalize_timestamp().

classmethod from_datetime(dt: datetime.datetime)
classmethod from_iso8601(s)

Builds a TimestampWithTimezone from an ISO8601-formatted string.
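Building a TimestampWithTimezone from a timezone-aware datetime means splitting it into epoch seconds, microseconds, and an offset in minutes. The sketch below does that split as a hypothetical helper, decompose. It floors the seconds so that microseconds always stays in [0, 1000000), as the Timestamp validators require. It uses datetime.fromisoformat for the ISO 8601 case, and that function only accepts a subset of ISO 8601 (for example, no "Z" suffix before Python 3.11). The real methods may handle more formats.

```python
import datetime
from typing import Tuple

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def decompose(dt: datetime.datetime) -> Tuple[int, int, int]:
    """Hypothetical helper: tz-aware datetime -> (seconds, microseconds, offset_minutes)."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    offset_minutes = int(offset.total_seconds()) // 60
    delta = dt - EPOCH
    # timedelta normalizes days/seconds/microseconds, so this floors toward -inf
    seconds = delta.days * 86400 + delta.seconds
    return seconds, delta.microseconds, offset_minutes


dt = datetime.datetime.fromisoformat("2020-01-01T12:00:00.500000+02:00")
print(decompose(dt))
```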

class swh.model.model.Origin(url: str)

Bases: swh.model.model.BaseModel

Represents a software source: a VCS and a URL.

object_type: typing_extensions.Final = 'origin'
url
unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes]

Returns a unique key for this object, that can be used for deduplication.

swhid() → swh.model.identifiers.ExtendedSWHID

Returns a SWHID representing this origin.
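An origin's extended SWHID uses the "ori" object type, and its hash part is the SHA1 of the origin URL. The sketch below builds the string form directly with hashlib. origin_swhid is a hypothetical helper that returns a plain string rather than an ExtendedSWHID object, and it assumes UTF-8 encoding of the URL.

```python
import hashlib


def origin_swhid(url: str) -> str:
    """Hypothetical helper: string form of an origin's extended SWHID."""
    # Assumption: the hash is SHA1 over the UTF-8 encoded URL
    return "swh:1:ori:" + hashlib.sha1(url.encode("utf-8")).hexdigest()


print(origin_swhid("https://github.com/example/repo"))
```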

class swh.model.model.OriginVisit(origin: str, date: datetime.datetime, type: str, visit: Optional[int] = None)

Bases: swh.model.model.BaseModel

Represents an origin visit with a given type at a given point in time, by a SWH loader.

object_type: typing_extensions.Final = 'origin_visit'
origin
date
type
visit

Should not be set before calling ‘origin_visit_add()’.
check_date(attribute, value)

Checks the date has a timezone.
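"Has a timezone" means the datetime is timezone-aware: its tzinfo is set and returns a UTC offset. The sketch below expresses that test as a hypothetical helper, check_aware. The same condition applies to the other date validators in this module that check for a timezone.

```python
import datetime


def check_aware(value: datetime.datetime) -> None:
    """Hypothetical helper: reject naive datetimes."""
    # Per the datetime docs, an object is aware if tzinfo.utcoffset() is not None
    if value.tzinfo is None or value.tzinfo.utcoffset(value) is None:
        raise ValueError("date must be a timezone-aware datetime")
```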

to_dict()

Serializes the date as a string and omits the visit id if it is None.
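The sketch below illustrates only the second half of the behavior described above: the visit key is dropped from the output when it is None, so consumers can tell "not yet assigned" apart from an explicit value. It does not reproduce the date handling. OriginVisitSketch is a hypothetical dataclass stand-in.

```python
import datetime
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class OriginVisitSketch:
    """Hypothetical stand-in for swh.model.model.OriginVisit."""
    origin: str
    date: datetime.datetime
    type: str
    visit: Optional[int] = None

    def to_dict(self) -> Dict[str, Any]:
        d = asdict(self)
        if d["visit"] is None:
            # Leave the key out entirely rather than serializing None
            del d["visit"]
        return d
```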

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes]

Returns a unique key for this object, that can be used for deduplication.

class swh.model.model.OriginVisitStatus(origin: str, visit: int, date: datetime.datetime, status: str, snapshot: Optional[bytes], type: Optional[str] = None, metadata=None)

Bases: swh.model.model.BaseModel

Represents a visit update of an origin at a given point in time.

object_type: typing_extensions.Final = 'origin_visit_status'
origin
visit
date
status
snapshot
type
metadata
check_date(attribute, value)

Checks the date has a timezone.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes]

Returns a unique key for this object, that can be used for deduplication.

class swh.model.model.TargetType(value)

Bases: enum.Enum

The type of content pointed to by a snapshot branch. Usually a revision or an alias.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
ALIAS = 'alias'
class swh.model.model.ObjectType(value)

Bases: enum.Enum

The type of content pointed to by a release. Usually a revision.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
class swh.model.model.SnapshotBranch(target: bytes, target_type: swh.model.model.TargetType)

Bases: swh.model.model.BaseModel

Represents one of the branches of a snapshot.

object_type: typing_extensions.Final = 'snapshot_branch'
target
target_type
check_target(attribute, value)

If the target type is not an alias, checks that the target is a valid sha1_git (20 bytes).
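Alias branches point at another branch name, not at an object, so only non-alias targets need to look like a 20-byte sha1_git. The sketch below encodes that rule. TargetKind is a hypothetical, trimmed-down mirror of TargetType, and check_branch_target is a hypothetical helper.

```python
import enum


class TargetKind(enum.Enum):
    """Hypothetical, trimmed-down mirror of TargetType."""
    REVISION = "revision"
    ALIAS = "alias"


def check_branch_target(target: bytes, target_type: TargetKind) -> None:
    # Aliases hold a branch name, which can be of any length
    if target_type is not TargetKind.ALIAS and len(target) != 20:
        raise ValueError(f"wrong length for sha1_git: {len(target)}")
```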

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

class swh.model.model.Snapshot(branches, id: bytes = b'')

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

Represents the full state of an origin at a given point in time.

object_type: typing_extensions.Final = 'snapshot'
branches
id
compute_hash() → bytes

Computes the hash of this object, used as its id.

This method is called by the object initialization if the id attribute is set to an empty value.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

swhid() → swh.model.identifiers.CoreSWHID

Returns a SWHID representing this object.

class swh.model.model.Release(name: bytes, message: Optional[bytes], target: Optional[bytes], target_type: swh.model.model.ObjectType, synthetic: bool, author: Optional[swh.model.model.Person] = None, date: Optional[swh.model.model.TimestampWithTimezone] = None, metadata=None, id: bytes = b'')

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

object_type: typing_extensions.Final = 'release'
name
message
target
target_type
synthetic
author
date
metadata
id
compute_hash() → bytes

Computes the hash of this object, used as its id.

This method is called by the object initialization if the id attribute is set to an empty value.

check_author(attribute, value)

If the author is None, checks the date is None too.
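The author/date rule is a one-way implication: a release may have an author without a date, but never a date without an author. The sketch below states that rule as a hypothetical standalone helper, check_author_and_date. The real check is an attrs validator on the author attribute.

```python
from typing import Optional


def check_author_and_date(author: Optional[object], date: Optional[object]) -> None:
    """Hypothetical helper: a release without an author must not carry a date."""
    if author is None and date is not None:
        raise ValueError("date must be None if author is None")
```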

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

swhid() → swh.model.identifiers.CoreSWHID

Returns a SWHID representing this object.

anonymize() → swh.model.model.Release

Returns an anonymized version of the Release object.

Anonymization consists of replacing the author with an anonymized Person object.

class swh.model.model.RevisionType(value)

Bases: enum.Enum

An enumeration.

GIT = 'git'
TAR = 'tar'
DSC = 'dsc'
SUBVERSION = 'svn'
MERCURIAL = 'hg'
swh.model.model.tuplify_extra_headers(value: Iterable)
class swh.model.model.Revision(message: Optional[bytes], author: swh.model.model.Person, committer: swh.model.model.Person, date: Optional[swh.model.model.TimestampWithTimezone], committer_date: Optional[swh.model.model.TimestampWithTimezone], type: swh.model.model.RevisionType, directory: bytes, synthetic: bool, metadata=None, parents: Tuple[bytes, ...] = (), id: bytes = b'', extra_headers=())

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

object_type: typing_extensions.Final = 'revision'
message
author
committer
date
committer_date
type
directory
synthetic
metadata
parents
id
extra_headers
compute_hash() → bytes

Computes the hash of this object, used as its id.

This method is called by the object initialization if the id attribute is set to an empty value.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

swhid() → swh.model.identifiers.CoreSWHID

Returns a SWHID representing this object.

anonymize() → swh.model.model.Revision

Returns an anonymized version of the Revision object.

Anonymization consists of replacing the author and committer with an anonymized Person object.

class swh.model.model.DirectoryEntry(name: bytes, type: str, target: bytes, perms: int)

Bases: swh.model.model.BaseModel

object_type: typing_extensions.Final = 'directory_entry'
name
type
target
perms

Usually one of the values of swh.model.from_disk.DentryPerms.
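Directory entry permissions follow git tree modes, written in octal. The sketch below lists the usual git modes and maps them to an entry type. GitMode and entry_type are hypothetical. It assumes that DentryPerms mirrors these git modes and that DirectoryEntry.type takes the values "file", "dir" and "rev", with symlinks stored as file contents. Verify both assumptions against swh.model.from_disk before relying on them.

```python
import enum


class GitMode(enum.IntEnum):
    """Hypothetical mirror of the usual git tree entry modes."""
    content = 0o100644
    executable_content = 0o100755
    symlink = 0o120000
    directory = 0o040000
    revision = 0o160000  # a submodule commit


def entry_type(perms: int) -> str:
    """Hypothetical helper: map a mode to an assumed DirectoryEntry.type value."""
    mode = GitMode(perms)  # raises ValueError for unknown modes
    if mode is GitMode.directory:
        return "dir"
    if mode is GitMode.revision:
        return "rev"
    # Regular files, executables and symlinks are all stored as contents
    return "file"
```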

class swh.model.model.Directory(entries: Tuple[swh.model.model.DirectoryEntry, ...], id: bytes = b'')

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

object_type: typing_extensions.Final = 'directory'
entries
id
compute_hash() → bytes

Computes the hash of this object, used as its id.

This method is called by the object initialization if the id attribute is set to an empty value.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

swhid() → swh.model.identifiers.CoreSWHID

Returns a SWHID representing this object.

class swh.model.model.BaseContent(status: str)

Bases: swh.model.model.BaseModel

status
classmethod from_dict(d, use_subclass=True)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

get_hash(hash_name)
hashes() → Dict[str, bytes]

Returns a dictionary {hash_name: hash_value}.

class swh.model.model.Content(sha1: bytes, sha1_git: bytes, sha256: bytes, blake2s256: bytes, length: int, status: str = 'visible', data: Optional[bytes] = None, ctime: Optional[datetime.datetime] = None)

Bases: swh.model.model.BaseContent

object_type: typing_extensions.Final = 'content'
sha1
sha1_git
sha256
blake2s256
length
status
data
ctime
check_length(attribute, value)

Checks that the length is non-negative.

check_ctime(attribute, value)

Checks the ctime has a timezone.

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_data(data, status='visible', ctime=None) → swh.model.model.Content

Generate a Content from a given data byte string.

This populates the Content with the hashes and length for the data passed as argument, as well as the data itself.
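The four hash fields of a Content can be computed from the raw bytes with hashlib. sha1_git is git's blob hash, which is SHA1 over a "blob <length>\0" header followed by the data. The sketch below computes all four as a hypothetical helper, content_hashes. It assumes blake2s256 means BLAKE2s with a 32-byte digest, which is hashlib's default for blake2s.

```python
import hashlib
from typing import Dict


def content_hashes(data: bytes) -> Dict[str, bytes]:
    """Hypothetical helper: the hashes a Content built from ``data`` would carry."""
    git_header = b"blob %d\x00" % len(data)  # git blob object header
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha1_git": hashlib.sha1(git_header + data).digest(),
        "sha256": hashlib.sha256(data).digest(),
        "blake2s256": hashlib.blake2s(data, digest_size=32).digest(),
    }
```

For the empty byte string, sha1_git is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the well-known hash of git's empty blob.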

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

with_data() → swh.model.model.Content

Loads the data attribute, so that it is guaranteed not to be None after this call.

This call is almost a no-op, but subclasses may override this method to lazy-load data (e.g. from disk or an objstorage).
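The sketch below shows the with_data contract together with MissingData. The base class has nowhere to load data from, so it either returns itself or raises. A subclass overrides the method to lazy-load the data. All names are hypothetical, and the in-memory dict stands in for a real backend such as an objstorage.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


class MissingDataSketch(Exception):
    """Hypothetical stand-in for swh.model.model.MissingData."""


@dataclass(frozen=True)
class ContentSketch:
    sha1: bytes
    data: Optional[bytes] = None

    def with_data(self) -> "ContentSketch":
        # No way to fetch anything here: return self or fail
        if self.data is None:
            raise MissingDataSketch("Content data is None.")
        return self


@dataclass(frozen=True)
class StoreBackedContent(ContentSketch):
    # Hypothetical subclass that lazy-loads from an in-memory store
    store: Optional[Dict[bytes, bytes]] = None

    def with_data(self) -> "ContentSketch":
        if self.data is None and self.store is not None and self.sha1 in self.store:
            return replace(self, data=self.store[self.sha1])
        return super().with_data()
```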

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes]

Returns a unique key for this object, that can be used for deduplication.

swhid() → swh.model.identifiers.CoreSWHID

Returns a SWHID representing this object.

class swh.model.model.SkippedContent(sha1: Optional[bytes], sha1_git: Optional[bytes], sha256: Optional[bytes], blake2s256: Optional[bytes], length: Optional[int], status: str, reason: Optional[str] = None, origin: Optional[str] = None, ctime: Optional[datetime.datetime] = None)

Bases: swh.model.model.BaseContent

object_type: typing_extensions.Final = 'skipped_content'
sha1
sha1_git
sha256
blake2s256
length
status
reason
origin
ctime
check_reason(attribute, value)

Checks that the reason is set (not None) if status != absent.

check_length(attribute, value)

Checks that the length is non-negative or -1.
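A skipped content's length is Optional, and -1 is accepted in addition to real lengths, presumably as an "unknown length" sentinel. The sketch below expresses the length rule as a hypothetical helper, check_skipped_length. It treats None as allowed because the attribute is typed Optional[int], which is an assumption about the real validator.

```python
from typing import Optional


def check_skipped_length(length: Optional[int]) -> None:
    """Hypothetical helper: accept None, -1, or any non-negative length."""
    if length is not None and length < -1:
        raise ValueError("length must be non-negative or -1")
```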

check_ctime(attribute, value)

Checks the ctime has a timezone.

to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_data(data: bytes, reason: str, ctime: Optional[datetime.datetime] = None) → swh.model.model.SkippedContent

Generate a SkippedContent from a given data byte string.

This populates the SkippedContent with the hashes and length for the data passed as argument.

You can use attr.evolve on such a generated content to nullify some of its attributes, e.g. for tests.
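attr.evolve returns a copy of a frozen object with some attributes replaced and leaves the original untouched. The sketch below shows the same test-fixture pattern with dataclasses.replace, its standard-library counterpart. The objects are frozen, so the copy is the only way to null out a field. SkippedSketch is a hypothetical, simplified stand-in for SkippedContent.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SkippedSketch:
    """Hypothetical, simplified stand-in for SkippedContent."""
    sha1: Optional[bytes]
    length: Optional[int]
    reason: str

    @classmethod
    def from_data(cls, data: bytes, reason: str) -> "SkippedSketch":
        return cls(sha1=hashlib.sha1(data).digest(), length=len(data), reason=reason)


full = SkippedSketch.from_data(b"some large blob", reason="Content too large")
# Copy with one attribute nullified; ``full`` is left unchanged
partial = replace(full, sha1=None)
```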

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes]

Returns a unique key for this object, that can be used for deduplication.

class swh.model.model.MetadataAuthorityType(value)

Bases: enum.Enum

An enumeration.

DEPOSIT_CLIENT = 'deposit_client'
FORGE = 'forge'
REGISTRY = 'registry'
class swh.model.model.MetadataAuthority(type: swh.model.model.MetadataAuthorityType, url: str, metadata=None)

Bases: swh.model.model.BaseModel

Represents an entity that provides metadata about an origin or software artifact.

object_type: typing_extensions.Final = 'metadata_authority'
type
url
metadata
to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes]

Returns a unique key for this object, that can be used for deduplication.

class swh.model.model.MetadataFetcher(name: str, version: str, metadata=None)

Bases: swh.model.model.BaseModel

Represents a software component used to fetch metadata from a metadata authority, and ingest them into the Software Heritage archive.

object_type: typing_extensions.Final = 'metadata_fetcher'
name
version
metadata
to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

unique_key() → Union[Dict[str, str], Dict[str, bytes], bytes]

Returns a unique key for this object, that can be used for deduplication.

class swh.model.model.RawExtrinsicMetadata(target: swh.model.identifiers.ExtendedSWHID, discovery_date: datetime.datetime, authority: swh.model.model.MetadataAuthority, fetcher: swh.model.model.MetadataFetcher, format: str, metadata: bytes, origin: Optional[str] = None, visit: Optional[int] = None, snapshot: Optional[swh.model.identifiers.CoreSWHID] = None, release: Optional[swh.model.identifiers.CoreSWHID] = None, revision: Optional[swh.model.identifiers.CoreSWHID] = None, path: Optional[bytes] = None, directory: Optional[swh.model.identifiers.CoreSWHID] = None, id: bytes = b'')

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

object_type: typing_extensions.Final = 'raw_extrinsic_metadata'
target
discovery_date
authority
fetcher
format
metadata
origin
visit
snapshot
release
revision
path
directory
id
compute_hash() → bytes

Computes the hash of this object, used as its id.

This method is called by the object initialization if the id attribute is set to an empty value.

check_discovery_date(attribute, value)

Checks the discovery_date has a timezone.

check_origin(attribute, value)
check_visit(attribute, value)
check_snapshot(attribute, value)
check_release(attribute, value)
check_revision(attribute, value)
check_path(attribute, value)
check_directory(attribute, value)
to_dict()

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.