swh.model.model module

exception swh.model.model.MissingData[source]

Bases: Exception

Raised by Content.with_data when it has no way of fetching the data (but not when fetching the data fails).

swh.model.model.KeyType

The type returned by BaseModel.unique_key().

alias of Union[Dict[str, str], Dict[str, bytes], bytes]

swh.model.model.freeze_optional_dict(d: Union[None, Dict[swh.model.model.KT, swh.model.model.VT], swh.model.collections.ImmutableDict[swh.model.model.KT, swh.model.model.VT]]) Optional[swh.model.collections.ImmutableDict[swh.model.model.KT, swh.model.model.VT]][source]
swh.model.model.dictify(value)[source]

Helper function used by BaseModel.to_dict()

class swh.model.model.BaseModel[source]

Bases: object

Base class for SWH model classes.

Provides serialization/deserialization to/from Python dictionaries, that are suitable for JSON/msgpack-like formats.

to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

anonymize() Optional[swh.model.model.ModelType][source]

Returns an anonymized version of the object, if needed.

If the object model does not need/support anonymization, returns None.

unique_key() Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object, that can be used for deduplication.

class swh.model.model.HashableObject[source]

Bases: object

Mixin to automatically compute object identifier hash when the associated model is instantiated.

abstract compute_hash() bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.

unique_key() Union[Dict[str, str], Dict[str, bytes], bytes][source]
class swh.model.model.Person(fullname: bytes, name: Optional[bytes], email: Optional[bytes])[source]

Bases: swh.model.model.BaseModel

Represents the author/committer of a revision or release.

Method generated by attrs for class Person.

object_type: typing_extensions.Final = 'person'
fullname
name
email
classmethod from_fullname(fullname: bytes)[source]

Returns a Person object, by guessing the name and email from the fullname, in the name <email> format.

The fullname is left unchanged.

anonymize() swh.model.model.Person[source]

Returns an anonymized version of the Person object.

Anonymization is simply a Person which fullname is the hashed, with unset name or email.

class swh.model.model.Timestamp(seconds: int, microseconds: int)[source]

Bases: swh.model.model.BaseModel

Represents a naive timestamp from a VCS.

Method generated by attrs for class Timestamp.

object_type: typing_extensions.Final = 'timestamp'
seconds
microseconds
check_seconds(attribute, value)[source]

Check that seconds fit in a 64-bits signed integer.

check_microseconds(attribute, value)[source]

Checks that microseconds are positive and < 1000000.

class swh.model.model.TimestampWithTimezone(timestamp: swh.model.model.Timestamp, offset: int, negative_utc: bool)[source]

Bases: swh.model.model.BaseModel

Represents a TZ-aware timestamp from a VCS.

Method generated by attrs for class TimestampWithTimezone.

object_type: typing_extensions.Final = 'timestamp_with_timezone'
timestamp
offset
negative_utc
check_offset(attribute, value)[source]

Checks the offset is a 16-bits signed integer (in theory, it should always be between -14 and +14 hours).

check_negative_utc(attribute, value)[source]
classmethod from_dict(obj: Union[Dict, datetime.datetime, int])[source]

Builds a TimestampWithTimezone from any of the formats accepted by swh.model.normalize_timestamp().

classmethod from_datetime(dt: datetime.datetime)[source]
to_datetime() datetime.datetime[source]

Convert to a datetime (with a timezone set to the recorded fixed UTC offset)

Beware that this conversion can be lossy: the negative_utc flag is not taken into consideration (since it cannot be represented in a datetime). Also note that it may fail due to type overflow.

classmethod from_iso8601(s)[source]

Builds a TimestampWithTimezone from an ISO8601-formatted string.

class swh.model.model.Origin(url: str)[source]

Bases: swh.model.model.BaseModel

Represents a software source: a VCS and an URL.

Method generated by attrs for class Origin.

object_type: typing_extensions.Final = 'origin'
url
unique_key() Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object, that can be used for deduplication.

swhid() swh.model.identifiers.ExtendedSWHID[source]

Returns a SWHID representing this origin.

class swh.model.model.OriginVisit(origin: str, date: datetime.datetime, type: str, visit: Optional[int] = None)[source]

Bases: swh.model.model.BaseModel

Represents an origin visit with a given type at a given point in time, by a SWH loader.

Method generated by attrs for class OriginVisit.

object_type: typing_extensions.Final = 'origin_visit'
origin
date
type

Should not be set before calling ‘origin_visit_add()’.

visit
check_date(attribute, value)[source]

Checks the date has a timezone.

to_dict()[source]

Serializes the date as a string and omits the visit id if it is None.

unique_key() Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object, that can be used for deduplication.

class swh.model.model.OriginVisitStatus(origin: str, visit: int, date: datetime.datetime, status: str, snapshot: Optional[bytes], type: Optional[str] = None, metadata: Union[None, Dict[swh.model.model.KT, swh.model.model.VT], swh.model.collections.ImmutableDict[swh.model.model.KT, swh.model.model.VT]] = None)[source]

Bases: swh.model.model.BaseModel

Represents a visit update of an origin at a given point in time.

Method generated by attrs for class OriginVisitStatus.

object_type: typing_extensions.Final = 'origin_visit_status'
origin
visit
date
status
snapshot
type
metadata
check_date(attribute, value)[source]

Checks the date has a timezone.

unique_key() Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object, that can be used for deduplication.

class swh.model.model.TargetType(value)[source]

Bases: enum.Enum

The type of content pointed to by a snapshot branch. Usually a revision or an alias.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
ALIAS = 'alias'
class swh.model.model.ObjectType(value)[source]

Bases: enum.Enum

The type of content pointed to by a release. Usually a revision

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
class swh.model.model.SnapshotBranch(target: bytes, target_type: swh.model.model.TargetType)[source]

Bases: swh.model.model.BaseModel

Represents one of the branches of a snapshot.

Method generated by attrs for class SnapshotBranch.

object_type: typing_extensions.Final = 'snapshot_branch'
target
target_type
check_target(attribute, value)[source]

Checks the target type is not an alias, checks the target is a valid sha1_git.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

class swh.model.model.Snapshot(branches: Union[None, Dict[swh.model.model.KT, swh.model.model.VT], swh.model.collections.ImmutableDict[swh.model.model.KT, swh.model.model.VT]], id: bytes = b'')[source]

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

Represents the full state of an origin at a given point in time.

Method generated by attrs for class Snapshot.

object_type: typing_extensions.Final = 'snapshot'
branches
id
compute_hash() bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

swhid() swh.model.identifiers.CoreSWHID[source]

Returns a SWHID representing this object.

class swh.model.model.Release(name: bytes, message: Optional[bytes], target: Optional[bytes], target_type: swh.model.model.ObjectType, synthetic: bool, author: Optional[swh.model.model.Person] = None, date: Optional[swh.model.model.TimestampWithTimezone] = None, metadata: Union[None, Dict[swh.model.model.KT, swh.model.model.VT], swh.model.collections.ImmutableDict[swh.model.model.KT, swh.model.model.VT]] = None, id: bytes = b'')[source]

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

Method generated by attrs for class Release.

object_type: typing_extensions.Final = 'release'
name
message
target
target_type
synthetic
author
date
metadata
id
compute_hash() bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.

check_author(attribute, value)[source]

If the author is None, checks the date is None too.

to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

swhid() swh.model.identifiers.CoreSWHID[source]

Returns a SWHID representing this object.

anonymize() swh.model.model.Release[source]

Returns an anonymized version of the Release object.

Anonymization consists in replacing the author with an anonymized Person object.

class swh.model.model.RevisionType(value)[source]

Bases: enum.Enum

An enumeration.

GIT = 'git'
TAR = 'tar'
DSC = 'dsc'
SUBVERSION = 'svn'
MERCURIAL = 'hg'
CVS = 'cvs'
swh.model.model.tuplify_extra_headers(value: Iterable)[source]
class swh.model.model.Revision(message: Optional[bytes], author: swh.model.model.Person, committer: swh.model.model.Person, date: Optional[swh.model.model.TimestampWithTimezone], committer_date: Optional[swh.model.model.TimestampWithTimezone], type: swh.model.model.RevisionType, directory: bytes, synthetic: bool, metadata: Union[None, Dict[swh.model.model.KT, swh.model.model.VT], swh.model.collections.ImmutableDict[swh.model.model.KT, swh.model.model.VT]] = None, parents: Tuple[bytes, ...] = (), id: bytes = b'', extra_headers: Iterable = ())[source]

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

Method generated by attrs for class Revision.

object_type: typing_extensions.Final = 'revision'
message
author
committer
date
committer_date
type
directory
synthetic
metadata
parents
id
extra_headers
compute_hash() bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

swhid() swh.model.identifiers.CoreSWHID[source]

Returns a SWHID representing this object.

anonymize() swh.model.model.Revision[source]

Returns an anonymized version of the Revision object.

Anonymization consists in replacing the author and committer with an anonymized Person object.

class swh.model.model.DirectoryEntry(name: bytes, type: str, target: bytes, perms: int)[source]

Bases: swh.model.model.BaseModel

Method generated by attrs for class DirectoryEntry.

object_type: typing_extensions.Final = 'directory_entry'
name
type
target
perms

Usually one of the values of swh.model.from_disk.DentryPerms.

class swh.model.model.Directory(entries: Tuple[swh.model.model.DirectoryEntry, ...], id: bytes = b'')[source]

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

Method generated by attrs for class Directory.

object_type: typing_extensions.Final = 'directory'
entries
id
compute_hash() bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

swhid() swh.model.identifiers.CoreSWHID[source]

Returns a SWHID representing this object.

class swh.model.model.BaseContent(status: str)[source]

Bases: swh.model.model.BaseModel

Method generated by attrs for class BaseContent.

status
classmethod from_dict(d, use_subclass=True)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

get_hash(hash_name)[source]
hashes() Dict[str, bytes][source]

Returns a dictionary {hash_name: hash_value}

class swh.model.model.Content(sha1: bytes, sha1_git: bytes, sha256: bytes, blake2s256: bytes, length: int, status: str = 'visible', data: Optional[bytes] = None, ctime: Optional[datetime.datetime] = None)[source]

Bases: swh.model.model.BaseContent

Method generated by attrs for class Content.

object_type: typing_extensions.Final = 'content'
sha1
sha1_git
sha256
blake2s256
length
status
data
ctime
check_length(attribute, value)[source]

Checks the length is positive.

check_ctime(attribute, value)[source]

Checks the ctime has a timezone.

to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_data(data, status='visible', ctime=None) swh.model.model.Content[source]

Generate a Content from a given data byte string.

This populates the Content with the hashes and length for the data passed as argument, as well as the data itself.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

with_data() swh.model.model.Content[source]

Loads the data attribute; meaning that it is guaranteed not to be None after this call.

This call is almost a no-op, but subclasses may overload this method to lazy-load data (eg. from disk or objstorage).

unique_key() Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object, that can be used for deduplication.

swhid() swh.model.identifiers.CoreSWHID[source]

Returns a SWHID representing this object.

class swh.model.model.SkippedContent(sha1: Optional[bytes], sha1_git: Optional[bytes], sha256: Optional[bytes], blake2s256: Optional[bytes], length: Optional[int], status: str, reason: Optional[str] = None, origin: Optional[str] = None, ctime: Optional[datetime.datetime] = None)[source]

Bases: swh.model.model.BaseContent

Method generated by attrs for class SkippedContent.

object_type: typing_extensions.Final = 'skipped_content'
sha1
sha1_git
sha256
blake2s256
length
status
reason
origin
ctime
check_reason(attribute, value)[source]

Checks the reason is full if status != absent.

check_length(attribute, value)[source]

Checks the length is positive or -1.

check_ctime(attribute, value)[source]

Checks the ctime has a timezone.

to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_data(data: bytes, reason: str, ctime: Optional[datetime.datetime] = None) swh.model.model.SkippedContent[source]

Generate a SkippedContent from a given data byte string.

This populates the SkippedContent with the hashes and length for the data passed as argument.

You can use attr.evolve on such a generated content to nullify some of its attributes, e.g. for tests.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

unique_key() Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object, that can be used for deduplication.

class swh.model.model.MetadataAuthorityType(value)[source]

Bases: enum.Enum

An enumeration.

DEPOSIT_CLIENT = 'deposit_client'
FORGE = 'forge'
REGISTRY = 'registry'
class swh.model.model.MetadataAuthority(type: swh.model.model.MetadataAuthorityType, url: str, metadata: Union[None, Dict[swh.model.model.KT, swh.model.model.VT], swh.model.collections.ImmutableDict[swh.model.model.KT, swh.model.model.VT]] = None)[source]

Bases: swh.model.model.BaseModel

Represents an entity that provides metadata about an origin or software artifact.

Method generated by attrs for class MetadataAuthority.

object_type: typing_extensions.Final = 'metadata_authority'
type
url
metadata
to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

unique_key() Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object, that can be used for deduplication.

class swh.model.model.MetadataFetcher(name: str, version: str, metadata: Union[None, Dict[swh.model.model.KT, swh.model.model.VT], swh.model.collections.ImmutableDict[swh.model.model.KT, swh.model.model.VT]] = None)[source]

Bases: swh.model.model.BaseModel

Represents a software component used to fetch metadata from a metadata authority, and ingest them into the Software Heritage archive.

Method generated by attrs for class MetadataFetcher.

object_type: typing_extensions.Final = 'metadata_fetcher'
name
version
metadata
to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

unique_key() Union[Dict[str, str], Dict[str, bytes], bytes][source]

Returns a unique key for this object, that can be used for deduplication.

swh.model.model.normalize_discovery_date(value: Any) datetime.datetime[source]
class swh.model.model.RawExtrinsicMetadata(target: swh.model.identifiers.ExtendedSWHID, discovery_date: Any, authority: swh.model.model.MetadataAuthority, fetcher: swh.model.model.MetadataFetcher, format: str, metadata: bytes, origin: Optional[str] = None, visit: Optional[int] = None, snapshot: Optional[swh.model.identifiers.CoreSWHID] = None, release: Optional[swh.model.identifiers.CoreSWHID] = None, revision: Optional[swh.model.identifiers.CoreSWHID] = None, path: Optional[bytes] = None, directory: Optional[swh.model.identifiers.CoreSWHID] = None, id: bytes = b'')[source]

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

Method generated by attrs for class RawExtrinsicMetadata.

object_type: typing_extensions.Final = 'raw_extrinsic_metadata'
target
discovery_date
authority
fetcher
format
metadata
origin
visit
snapshot
release
revision
path
directory
id
compute_hash() bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.

check_origin(attribute, value)[source]
check_visit(attribute, value)[source]
check_snapshot(attribute, value)[source]
check_release(attribute, value)[source]
check_revision(attribute, value)[source]
check_path(attribute, value)[source]
check_directory(attribute, value)[source]
to_dict()[source]

Wrapper of attr.asdict that can be overridden by subclasses that have special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

swhid() swh.model.identifiers.ExtendedSWHID[source]

Returns a SWHID representing this RawExtrinsicMetadata object.

class swh.model.model.ExtID(extid_type: str, extid: bytes, target: swh.model.identifiers.CoreSWHID, extid_version: int = 0, id: bytes = b'')[source]

Bases: swh.model.model.HashableObject, swh.model.model.BaseModel

Method generated by attrs for class ExtID.

object_type: typing_extensions.Final = 'extid'
extid_type
extid
target
extid_version
id
classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

compute_hash() bytes[source]

Derived model classes must implement this to compute the object hash.

This method is called by the object initialization if the id attribute is set to an empty value.