swh.model package

Submodules

swh.model.cli module

class swh.model.cli.PidParamType[source]

Bases: click.types.ParamType

name = 'persistent identifier'
convert(value, param, ctx)[source]

Converts the value. This is not invoked for values that are None (the missing value).

__module__ = 'swh.model.cli'
swh.model.cli.pid_of_file(path)[source]
swh.model.cli.pid_of_dir(path)[source]
swh.model.cli.identify_object(obj_type, follow_symlinks, obj)[source]

swh.model.exceptions module

exception swh.model.exceptions.ValidationError(message, code=None, params=None)[source]

Bases: Exception

An error while validating data.

__init__(message, code=None, params=None)[source]

The message argument can be a single error, a list of errors, or a dictionary that maps field names to lists of errors. What we define as an “error” can be either a simple string or an instance of ValidationError with its message attribute set, and what we define as list or dictionary can be an actual list or dict or an instance of ValidationError with its error_list or error_dict attribute set.

message_dict
messages
update_error_dict(error_dict)[source]
__iter__()[source]
__str__()[source]

Return str(self).

__module__ = 'swh.model.exceptions'
__repr__()[source]

Return repr(self).

__weakref__

list of weak references to the object (if defined)

swh.model.from_disk module

class swh.model.from_disk.DentryPerms[source]

Bases: enum.IntEnum

Admissible permissions for directory entries.

content = 33188

Content

executable_content = 33261

Executable content (e.g. executable script)

symlink = 40960

Symbolic link

directory = 16384

Directory

revision = 57344

Revision (e.g. submodule)

__module__ = 'swh.model.from_disk'
swh.model.from_disk.mode_to_perms(mode)[source]

Convert a file mode to a permission compatible with Software Heritage directory entries

Parameters:mode (int) – a file mode as returned by os.stat() in os.stat_result.st_mode
Returns:
one of the following values:
  • DentryPerms.content: plain file
  • DentryPerms.executable_content: executable file
  • DentryPerms.symlink: symbolic link
  • DentryPerms.directory: directory
Return type:DentryPerms
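The mapping above can be sketched with the standard `stat` module. `DentryPermsSketch` and `mode_to_perms_sketch` are hypothetical stand-ins using the permission values listed for DentryPerms; treating any execute bit (user, group or other) as "executable" is an assumption of this sketch, not a statement about the library.

```python
import enum
import stat


class DentryPermsSketch(enum.IntEnum):
    """Hypothetical stand-in for swh.model.from_disk.DentryPerms."""
    content = 0o100644             # 33188
    executable_content = 0o100755  # 33261
    symlink = 0o120000             # 40960
    directory = 0o040000           # 16384
    revision = 0o160000            # 57344


def mode_to_perms_sketch(mode: int) -> DentryPermsSketch:
    """Map an os.stat_result.st_mode to a directory-entry permission (sketch)."""
    if stat.S_ISLNK(mode):
        return DentryPermsSketch.symlink
    if stat.S_ISDIR(mode):
        return DentryPermsSketch.directory
    # Assumption: any execute bit marks a regular file as executable
    if mode & 0o111:
        return DentryPermsSketch.executable_content
    return DentryPermsSketch.content
```

For files on disk you would pass `os.lstat(path).st_mode`, so that symbolic links are classified as links rather than followed.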
class swh.model.from_disk.Content(data=None)[source]

Bases: swh.model.merkle.MerkleLeaf

Representation of a Software Heritage content as a node in a Merkle tree.

The current Merkle hash for the Content nodes is the sha1_git, which makes it consistent with what Directory uses for its own hash computation.

__slots__ = []
type = 'content'
classmethod from_bytes(*, mode, data)[source]

Convert data (raw bytes) to a Software Heritage content entry

Parameters:
  • mode (int) – a file mode (passed to mode_to_perms())
  • data (bytes) – raw contents of the file

Convert a symbolic link to a Software Heritage content entry

classmethod from_file(*, path, data=False, save_path=False)[source]

Compute the Software Heritage content entry corresponding to an on-disk file.

The returned dictionary contains keys useful for both:
  • loading the content in the archive (hashes, length)
  • using the content as a directory entry in a directory

Parameters:
  • path (bytes) – path to the file for which we’re computing the content entry
  • data (bool) – add the file data to the entry
  • save_path (bool) – add the file path to the entry
__repr__()[source]

Return repr(self).

compute_hash()[source]

Compute the hash of the current node.

The hash should depend on the data of the node, as well as on hashes of the children nodes.

__abstractmethods__ = frozenset()
__module__ = 'swh.model.from_disk'
swh.model.from_disk.accept_all_directories(dirname, entries)[source]

Default filter for Directory.from_disk() accepting all directories

Parameters:
  • dirname (bytes) – directory name
  • entries (list) – directory entries
swh.model.from_disk.ignore_empty_directories(dirname, entries)[source]

Filter for Directory.from_disk() ignoring empty directories

Parameters:
  • dirname (bytes) – directory name
  • entries (list) – directory entries
Returns:

True if the directory is not empty, False if it is empty

swh.model.from_disk.ignore_named_directories(names, *, case_sensitive=True)[source]

Filter for Directory.from_disk() ignoring directories whose name is one of names.

Parameters:
  • names (list of bytes) – names to ignore
  • case_sensitive (bool) – whether to do the filtering in a case-sensitive way
Returns:

a directory filter for Directory.from_disk()
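The filter protocol used by these functions (a callable taking dirname and entries and returning True to keep the directory) can be sketched with plain closures. The `_sketch` functions below are hypothetical re-implementations for illustration; the real filters may differ in detail, for instance in exactly what dirname contains.

```python
from typing import Callable, Iterable, List

# A directory filter: (dirname, entries) -> True to keep the directory
DirFilter = Callable[[bytes, List], bool]


def ignore_empty_directories_sketch(dirname: bytes, entries: List) -> bool:
    """Keep a directory only if it has at least one entry."""
    return bool(entries)


def ignore_named_directories_sketch(names: Iterable[bytes], *,
                                    case_sensitive: bool = True) -> DirFilter:
    """Build a filter rejecting directories whose name is in `names`."""
    if case_sensitive:
        ignored = set(names)
    else:
        ignored = {name.lower() for name in names}

    def named_filter(dirname: bytes, entries: List) -> bool:
        key = dirname if case_sensitive else dirname.lower()
        return key not in ignored

    return named_filter
```

Returning a closure keeps the two-argument signature that from_disk() expects while letting the caller configure the ignored names.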

class swh.model.from_disk.Directory(data=None)[source]

Bases: swh.model.merkle.MerkleNode

Representation of a Software Heritage directory as a node in a Merkle Tree.

This class can be used to generate, from an on-disk directory, all the objects that need to be sent to the Software Heritage archive.

The from_disk() constructor allows you to generate the data structure from a directory on disk. The resulting Directory can then be manipulated as a dictionary, using the path as key.

The collect() method is used to retrieve all the objects that need to be added to the Software Heritage archive since the last collection, by class (contents and directories).

When using the dict-like methods to update the contents of the directory, the affected levels of hierarchy are reset and can be collected again using the same method. This enables the efficient collection of updated nodes, for instance when the client is applying diffs.

__slots__ = ['__entries']
type = 'directory'
classmethod from_disk(*, path, data=False, save_path=False, dir_filter=<function accept_all_directories>)[source]

Compute the Software Heritage objects for a given directory tree

Parameters:
  • path (bytes) – the directory to traverse
  • data (bool) – whether to add the data to the content objects
  • save_path (bool) – whether to add the path to the content objects
  • dir_filter (function) – a filter to ignore some directories by name or contents. Takes two arguments: dirname and entries, and returns True if the directory should be added, False if the directory should be ignored.
__init__(data=None)[source]

Initialize self. See help(type(self)) for accurate signature.

invalidate_hash()[source]

Invalidate the cached hash of the current node.

static child_to_directory_entry(name, child)[source]
get_data(**kwargs)[source]

Retrieve and format the collected data for the current node, for use by collect().

Can be overridden, for instance when you want the collected data to contain information about the child nodes.

Parameters:kwargs – allow subclasses to alter behaviour depending on how collect() is called.
Returns:data formatted for collect()
entries
_Directory__entries
__abstractmethods__ = frozenset()
__module__ = 'swh.model.from_disk'
compute_hash()[source]

Compute the hash of the current node.

The hash should depend on the data of the node, as well as on hashes of the children nodes.

__getitem__(key)[source]

x.__getitem__(y) <==> x[y]

__setitem__(key, value)[source]

Add a child, invalidating the current hash

__delitem__(key)[source]

Remove a child, invalidating the current hash

__repr__()[source]

Return repr(self).

swh.model.hashutil module

Module in charge of hashing function definitions. This is the base module used to compute swh’s hashes.

Only a subset of hashing algorithms is supported, as defined in the ALGORITHMS set. Requesting an algorithm that is not in that set raises a ValueError explaining the error.

This module defines a MultiHash class to ease computing the Software Heritage hashing algorithms. It computes hashes from a file object, a path, or raw data, through an interface similar to the one provided by the standard hashlib module.

Basic usage examples:

  • file object: MultiHash.from_file(file_object, hash_names=DEFAULT_ALGORITHMS, length=length).digest()
  • path (filepath): MultiHash.from_path(b'foo').hexdigest()
  • data (bytes): MultiHash.from_data(b'foo').bytehexdigest()

“Complex” usage, defining a swh hashlib instance first:

  • To compute length, integrate the length to the set of algorithms to compute, for example:

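    import os
    from swh.model.hashutil import DEFAULT_ALGORITHMS, HASH_BLOCK_SIZE, MultiHash
    h = MultiHash(hash_names={'length'} | DEFAULT_ALGORITHMS,
                  length=os.path.getsize(filepath))  # sha1_git needs the length
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(HASH_BLOCK_SIZE), b''):  # every block
            h.update(chunk)
    hashes = h.digest()  # returns a dict of {hash_algo_name: hash_in_bytes}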
    
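  • Write data to disk while computing the hashes (from a stream), for example:

    # r: a streaming response (e.g. requests with stream=True); length: its size
    h = MultiHash(length=length)  # length is required for sha1_git
    with open(filepath, 'wb') as f:
        for chunk in r.iter_content():
            h.update(chunk)  # hash the chunk...
            f.write(chunk)   # ...and write it in the same pass
    hashes = h.hexdigest()  # returns a dict of {hash_algo_name: hash_in_hex}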
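To illustrate what computing several hashes in a single pass means, here is a minimal, hypothetical sketch built only on hashlib; it is not the MultiHash implementation. It assumes, as the `_new_hash()` documentation in this module states, that sha1_git needs the total length up front so that the git blob header can be fed before the data.

```python
import hashlib
from typing import Dict, Iterable, Optional


class MultiHashSketch:
    """Feed each chunk once to several hash objects (hypothetical sketch)."""

    def __init__(self, names: Iterable[str] = ('sha1', 'sha256', 'sha1_git'),
                 length: Optional[int] = None):
        self.state = {}
        for name in names:
            if name == 'sha1_git':
                if length is None:
                    raise ValueError('sha1_git needs the total length')
                h = hashlib.sha1()
                # git blob header; the digest is only valid if `length` is exact
                h.update(b'blob %d\x00' % length)
                self.state[name] = h
            else:
                self.state[name] = hashlib.new(name)

    def update(self, chunk: bytes) -> None:
        for h in self.state.values():
            h.update(chunk)

    def hexdigest(self) -> Dict[str, str]:
        return {name: h.hexdigest() for name, h in self.state.items()}
```

Usage: create `MultiHashSketch(length=len(data))`, call `update()` for each chunk, then read `hexdigest()['sha1_git']`.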
    
swh.model.hashutil.ALGORITHMS = {'blake2b512', 'blake2s256', 'sha1', 'sha1_git', 'sha256'}

Hashing algorithms supported by this module

swh.model.hashutil.DEFAULT_ALGORITHMS = {'blake2s256', 'sha1', 'sha1_git', 'sha256'}

Algorithms computed by default when calling the functions from this module.

Subset of ALGORITHMS.

swh.model.hashutil.HASH_BLOCK_SIZE = 32768

Block size for streaming hash computations made in this module

class swh.model.hashutil.MultiHash(hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'}, length=None)[source]

Bases: object

Compute several hashes of the same data in a single pass.

Parameters:
  • hash_names (set) – Set of hash algorithms (+ optionally length) to compute hashes (cf. DEFAULT_ALGORITHMS)
  • length (int) – Total length of the data to be hashed (required for git-specific algorithms such as sha1_git)

If 'length' is included in hash_names, the total length of the data is also computed and returned.

__init__(hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'}, length=None)[source]

Initialize self. See help(type(self)) for accurate signature.

classmethod from_state(state, track_length)[source]
classmethod from_file(fobj, hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'}, length=None)[source]
classmethod from_path(path, hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'})[source]
classmethod from_data(data, hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'})[source]
update(chunk)[source]
digest()[source]
hexdigest()[source]
bytehexdigest()[source]
copy()[source]
__module__ = 'swh.model.hashutil'
__weakref__

list of weak references to the object (if defined)

swh.model.hashutil._new_blake2_hash(algo)[source]

Return a function that initializes a blake2 hash.

swh.model.hashutil._new_hashlib_hash(algo)[source]

Initialize a digest object from hashlib.

Handle the swh-specific names for the blake2-related algorithms

swh.model.hashutil._new_git_hash(base_algo, git_type, length)[source]

Initialize a digest object (as returned by python’s hashlib) for the requested algorithm, and feed it with the header for a git object of the given type and length.

The header for hashing a git object consists of:
  • The type of the object (encoded in ASCII)
  • One ASCII space ( )
  • The length of the object (decimal encoded in ASCII)
  • One NUL byte
Parameters:
  • base_algo (str from ALGORITHMS) – a hashlib-supported algorithm
  • git_type – the type of the git object (expected to be one of ‘blob’, ‘commit’, ‘tag’, ‘tree’)
  • length – the length of the git object you’re encoding
Returns:

a hashutil.hash object
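The header layout above can be sketched directly with hashlib. `git_object_hash` is a hypothetical helper, not part of swh.model; the test values are git's well-known hashes for the empty blob, the empty tree and the blob `hello world\n`.

```python
import hashlib


def git_object_hash(git_type: str, data: bytes, base_algo: str = 'sha1') -> bytes:
    """Hash `data` as a git object of type `git_type` (hypothetical helper)."""
    # Header: type, one ASCII space, decimal length, one NUL byte
    header = b'%s %d\x00' % (git_type.encode('ascii'), len(data))
    h = hashlib.new(base_algo)
    h.update(header)
    h.update(data)
    return h.digest()
```

For example, `git_object_hash('blob', b'').hex()` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the same value `git hash-object` reports for an empty file.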

swh.model.hashutil._new_hash(algo, length=None)[source]

Initialize a digest object (as returned by python’s hashlib) for the requested algorithm. See the constant ALGORITHMS for the list of supported algorithms. If a git-specific hashing algorithm is requested (e.g., “sha1_git”), the hashing object will be pre-fed with the needed header; for this to work, length must be given.

Parameters:
  • algo (str) – a hashing algorithm (one of ALGORITHMS)
  • length (int) – the length of the hashed payload (needed for git-specific algorithms)
Returns:

a hashutil.hash object

Raises:
  • ValueError – if algo is unknown, or if length is missing for a git-specific hash.
swh.model.hashutil.hash_git_data(data, git_type, base_algo='sha1')[source]

Hash the given data as a git object of type git_type.

Parameters:
  • data – a bytes object
  • git_type – the git object type
  • base_algo – the base hashing algorithm used (default: sha1)

Returns: the bytes digest of the git object

Raises:ValueError if the git_type is unexpected.
swh.model.hashutil.hash_to_hex(hash)[source]

Converts a hash (in hex or bytes form) to its hexadecimal ascii form

Parameters:hash (str or bytes) – a bytes hash or a str containing the hexadecimal form of the hash
Returns:the hexadecimal form of the hash
Return type:str
swh.model.hashutil.hash_to_bytehex(hash)[source]

Converts a hash to its hexadecimal bytes representation

Parameters:hash (bytes) – a bytes hash
Returns:the hexadecimal form of the hash, as bytes
Return type:bytes
swh.model.hashutil.hash_to_bytes(hash)[source]

Converts a hash (in hex or bytes form) to its raw bytes form

Parameters:hash (str or bytes) – a bytes hash or a str containing the hexadecimal form of the hash
Returns:the bytes form of the hash
Return type:bytes
swh.model.hashutil.bytehex_to_hash(hash)[source]

Converts a hexadecimal bytes representation of a hash to that hash

Parameters:hash (bytes) – a bytes containing the hexadecimal form of the hash encoded in ascii
Returns:the bytes form of the hash
Return type:bytes
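The four conversions above amount to hexadecimal encoding and decoding. The hypothetical `_sketch` functions below show the expected behaviour with `bytes.hex()`, `bytes.fromhex()` and binascii; they are an illustration, not the module's implementation.

```python
import binascii
from typing import Union


def hash_to_hex_sketch(value: Union[str, bytes]) -> str:
    """Raw bytes hash (or an already-hex str) -> hexadecimal str."""
    return value if isinstance(value, str) else value.hex()


def hash_to_bytes_sketch(value: Union[str, bytes]) -> bytes:
    """Hexadecimal str (or already-raw bytes) -> raw bytes."""
    return value if isinstance(value, bytes) else bytes.fromhex(value)


def hash_to_bytehex_sketch(value: bytes) -> bytes:
    """Raw bytes -> ASCII hexadecimal, as bytes."""
    return binascii.hexlify(value)


def bytehex_to_hash_sketch(value: bytes) -> bytes:
    """ASCII hexadecimal, as bytes -> raw bytes."""
    return binascii.unhexlify(value)
```

Accepting both forms in hash_to_hex and hash_to_bytes lets callers normalize identifiers without checking their type first.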

swh.model.hypothesis_strategies module

swh.model.hypothesis_strategies.pgsql_text()[source]
swh.model.hypothesis_strategies.sha1_git()[source]
swh.model.hypothesis_strategies.sha1()[source]
swh.model.hypothesis_strategies.urls()[source]
swh.model.hypothesis_strategies.persons()[source]
swh.model.hypothesis_strategies.timestamps()[source]
swh.model.hypothesis_strategies.timestamps_with_timezone()[source]
swh.model.hypothesis_strategies.origins()[source]
swh.model.hypothesis_strategies.origin_visits()[source]
swh.model.hypothesis_strategies.releases()[source]
swh.model.hypothesis_strategies.revision_metadata()[source]
swh.model.hypothesis_strategies.revisions()[source]
swh.model.hypothesis_strategies.directory_entries()[source]
swh.model.hypothesis_strategies.directories()[source]
swh.model.hypothesis_strategies.contents()[source]
swh.model.hypothesis_strategies.branch_names()[source]
swh.model.hypothesis_strategies.branch_targets_object()[source]
swh.model.hypothesis_strategies.branch_targets_alias()[source]
swh.model.hypothesis_strategies.branch_targets(*, only_objects=False)[source]
swh.model.hypothesis_strategies.snapshots(*, min_size=0, max_size=100, only_objects=False)[source]
swh.model.hypothesis_strategies.objects()[source]
swh.model.hypothesis_strategies.object_dicts()[source]

swh.model.identifiers module

swh.model.identifiers.identifier_to_bytes(identifier)[source]

Convert a text identifier to bytes.

Parameters:identifier – an identifier, either a 40-char hexadecimal string or a bytes object of length 20
Returns:The length 20 bytestring corresponding to the given identifier
Raises:ValueError – if the identifier is of an unexpected type or length.
swh.model.identifiers.identifier_to_str(identifier)[source]

Convert an identifier to a hexadecimal string.

Parameters:identifier – an identifier, either a 40-char hexadecimal string or a bytes object of length 20
Returns:The length 40 string corresponding to the given identifier, hex encoded
Raises:ValueError – if the identifier is of an unexpected type or length.
swh.model.identifiers.content_identifier(content)[source]

Return the intrinsic identifier for a content.

A content’s identifier consists of the sha1, sha1_git and sha256 checksums of its data.

Parameters:content – a content conforming to the Software Heritage schema
Returns:A dictionary with all the hashes for the data
Raises:KeyError – if the content doesn’t have a data member.
swh.model.identifiers._sort_key(entry)[source]

The sorting key for tree entries

swh.model.identifiers._perms_to_bytes[source]

Convert the perms value to its bytes representation

swh.model.identifiers.escape_newlines(snippet)[source]

Escape the newlines present in snippet according to git rules.

New lines in git manifests are escaped by indenting the next line by one space.
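The escaping rule above fits in one line: every newline is followed by a single space, so continuation lines are indented. `escape_newlines_sketch` is a hypothetical re-implementation for illustration.

```python
def escape_newlines_sketch(snippet: bytes) -> bytes:
    """Indent every continuation line by one space (git header rule)."""
    return snippet.replace(b'\n', b'\n ')
```

A single-line snippet is returned unchanged.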

swh.model.identifiers.directory_identifier(directory)[source]

Return the intrinsic identifier for a directory.

A directory’s identifier is the tree sha1 à la git of a directory listing, using the following algorithm, which is equivalent to the git algorithm for trees:

  1. Entries of the directory are sorted using the name (or the name with ‘/’ appended for directory entries) as key, in bytes order.
  2. For each entry of the directory, the following bytes are output:
    • the octal representation of the permissions for the entry (stored in the ‘perms’ member), which is a representation of the entry type:
      • b'100644' (int 33188) for files
      • b'100755' (int 33261) for executable files
      • b'120000' (int 40960) for symbolic links
      • b'40000' (int 16384) for directories
      • b'160000' (int 57344) for references to revisions
    • an ascii space (b' ')
    • the entry’s name (as raw bytes), stored in the ‘name’ member
    • a null byte (b'\x00')
    • the 20 byte long identifier of the object pointed at by the entry, stored in the ‘target’ member:
      • for files or executable files: their blob sha1_git
      • for symbolic links: the blob sha1_git of a file containing the link destination
      • for directories: their intrinsic identifier
      • for revisions: their intrinsic identifier

(Note that there is no separator between entries)
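A minimal sketch of the algorithm above, assuming entries are dicts with 'name' (bytes), 'perms' (int), 'type' ('file', 'dir' or 'rev') and 'target' (20 bytes). `tree_manifest` and `directory_id` are hypothetical helpers, and the 'type' key is an assumption used only to decide the sort key.

```python
import hashlib
from typing import Dict, List


def tree_manifest(entries: List[Dict]) -> bytes:
    """Serialize directory entries as a git tree body (hypothetical helper)."""
    def sort_key(entry: Dict) -> bytes:
        # directories sort as if their name ended with '/'
        if entry['type'] == 'dir':
            return entry['name'] + b'/'
        return entry['name']

    parts = []
    for entry in sorted(entries, key=sort_key):
        parts.append(b'%o' % entry['perms'])  # octal, no leading zero
        parts.append(b' ' + entry['name'] + b'\x00')
        parts.append(entry['target'])  # raw 20-byte identifier, no separator
    return b''.join(parts)


def directory_id(entries: List[Dict]) -> str:
    """sha1 of the manifest, salted with the git 'tree' header."""
    manifest = tree_manifest(entries)
    return hashlib.sha1(b'tree %d\x00' % len(manifest) + manifest).hexdigest()
```

The '/' suffix matters: a file named `foo.c` sorts before a directory named `foo`, because '.' (0x2e) is smaller than '/' (0x2f).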

swh.model.identifiers.format_date(date)[source]

Convert a date object into a UTC timestamp encoded as ascii bytes.

Git stores timestamps as an integer number of seconds since the UNIX epoch.

However, Software Heritage stores timestamps as an integer number of microseconds (postgres type “timestamp with time zone”).

Therefore, we print timestamps with no microseconds as integers, and timestamps with microseconds as floating point values. We elide the trailing zeroes from microsecond values, to “future-proof” our representation if we ever need more precision in timestamps.
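The rule above can be sketched as follows. `format_date_sketch` is hypothetical and takes the seconds and microseconds as two integers rather than a date object (an assumption made for brevity).

```python
def format_date_sketch(seconds: int, microseconds: int = 0) -> bytes:
    """Render a timestamp as ASCII bytes, eliding trailing zero microseconds."""
    if not microseconds:
        return str(seconds).encode('ascii')  # integral timestamps stay integers
    # six-digit fractional part, then strip the trailing zeroes
    return ('%d.%06d' % (seconds, microseconds)).rstrip('0').encode('ascii')
```

So 500000 microseconds prints as `.5` and 120 microseconds as `.00012`.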

swh.model.identifiers.format_offset[source]

Convert an integer number of minutes into an offset representation.

The offset representation is [+-]hhmm where:

  • hh is the number of hours;
  • mm is the number of minutes.

A null offset is represented as +0000.
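A minimal sketch of the offset formatting. The `negative_utc` flag mirrors the key of the same name returned by normalize_timestamp(); its presence here, and returning bytes, are assumptions of this sketch.

```python
def format_offset_sketch(offset: int, negative_utc: bool = False) -> bytes:
    """Minutes relative to UTC -> b'[+-]hhmm' (hypothetical sketch)."""
    sign = '-' if offset < 0 or (offset == 0 and negative_utc) else '+'
    hours, minutes = divmod(abs(offset), 60)
    return ('%s%02d%02d' % (sign, hours, minutes)).encode('ascii')
```

Working on `abs(offset)` keeps divmod from rounding toward negative infinity for western offsets such as -330 minutes.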

swh.model.identifiers.normalize_timestamp(time_representation)[source]

Normalize a time representation for processing by Software Heritage

This function supports a numeric timestamp (representing a number of seconds since the UNIX epoch, 1970-01-01 at 00:00 UTC), a datetime.datetime object (with timezone information), or a normalized Software Heritage time representation (idempotency).

Parameters:time_representation – the representation of a timestamp
Returns:a normalized dictionary with three keys:
  • timestamp: a dict with two optional keys:
    • seconds: the integral number of seconds since the UNIX epoch
    • microseconds: the integral number of microseconds
  • offset: the timezone offset as a number of minutes relative to UTC
  • negative_utc: a boolean representing whether the offset is -0000 when offset = 0.
Return type:dict
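A hedged sketch of the normalization for two of the supported inputs: an integer number of seconds and a timezone-aware datetime.datetime. Dicts are assumed to be already normalized and returned as-is; float timestamps are not handled. `normalize_timestamp_sketch` is hypothetical.

```python
import datetime
from typing import Dict, Union

EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def normalize_timestamp_sketch(
        value: Union[int, datetime.datetime, Dict]) -> Dict:
    """Return {'timestamp': {...}, 'offset': minutes, 'negative_utc': bool}."""
    if isinstance(value, dict):
        return value  # assumed already normalized (idempotency)
    if isinstance(value, datetime.datetime):
        if value.tzinfo is None:
            raise ValueError('datetime must carry timezone information')
        delta = value - EPOCH  # aware datetimes subtract across timezones
        offset = int(value.utcoffset().total_seconds()) // 60
        return {'timestamp': {'seconds': delta.days * 86400 + delta.seconds,
                              'microseconds': delta.microseconds},
                'offset': offset, 'negative_utc': False}
    if isinstance(value, int):
        return {'timestamp': {'seconds': value, 'microseconds': 0},
                'offset': 0, 'negative_utc': False}
    raise TypeError('unsupported time representation: %r' % (value,))
```

Subtracting the epoch, rather than calling `timestamp()`, keeps the seconds and microseconds as exact integers.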
swh.model.identifiers.format_author(author)[source]

Format the specification of an author.

An author is either a byte string (passed unchanged), or a dict with three keys, fullname, name and email.

If the fullname exists, return it; if it doesn’t, we construct a fullname using the following heuristics: if the name value is None, we return the email in angle brackets, else, we return the name, a space, and the email in angle brackets.

swh.model.identifiers.format_author_line(header, author, date_offset)[source]

Format an author line according to git standards.

An author line has three components:

  • a header, describing the type of author (author, committer, tagger)
  • a name and email, which is an arbitrary bytestring
  • optionally, a timestamp with UTC offset specification

The author line is formatted thus:

`header` `name and email`[ `timestamp` `utc_offset`]

The timestamp is encoded as a (decimal) number of seconds since the UNIX epoch (1970-01-01 at 00:00 UTC). As an extension to the git format, we support fractional timestamps, using a dot as the separator for the decimal part.

The utc offset is a number of minutes encoded as ‘[+-]HHMM’. Note that some tools can pass a negative offset corresponding to the UTC timezone (‘-0000’), which is valid and is encoded as such.

For convenience, this function returns the whole line with its trailing newline.

Parameters:
  • header – the header of the author line (one of ‘author’, ‘committer’, ‘tagger’)
  • author – an author specification (dict with two bytes values: name and email, or byte value)
  • date_offset – a normalized date/time representation as returned by normalize_timestamp().
Returns:

the newline-terminated byte string containing the author line
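A minimal sketch combining the format_author() heuristics with the author line layout. Both helpers are hypothetical; `date` is simplified to an assumed (seconds, offset_in_minutes) pair, so fractional timestamps and the -0000 case are not covered.

```python
from typing import Dict, Optional, Tuple, Union

Author = Union[bytes, Dict[str, Optional[bytes]]]


def format_author_sketch(author: Author) -> bytes:
    """Apply the fullname / name / email heuristics."""
    if isinstance(author, bytes):
        return author  # byte strings are passed unchanged
    if author.get('fullname'):
        return author['fullname']
    if author.get('name') is None:
        return b'<' + author['email'] + b'>'
    return author['name'] + b' <' + author['email'] + b'>'


def author_line_sketch(header: bytes, author: Author,
                       date: Optional[Tuple[int, int]] = None) -> bytes:
    """Build one newline-terminated author/committer/tagger line."""
    parts = [header, b' ', format_author_sketch(author)]
    if date is not None:
        seconds, offset = date  # hypothetical (seconds, offset_minutes) pair
        sign = b'-' if offset < 0 else b'+'
        hours, minutes = divmod(abs(offset), 60)
        parts.append(b' %d %s%02d%02d' % (seconds, sign, hours, minutes))
    parts.append(b'\n')
    return b''.join(parts)
```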

swh.model.identifiers.revision_identifier(revision)[source]

Return the intrinsic identifier for a revision.

The fields used for the revision identifier computation are:

  • directory
  • parents
  • author
  • author_date
  • committer
  • committer_date
  • metadata -> extra_headers
  • message

A revision’s identifier is the ‘git’-checksum of a commit manifest constructed as follows (newlines are a single ASCII newline character):

tree <directory identifier>
[for each parent in parents]
parent <parent identifier>
[end for each parents]
author <author> <author_date>
committer <committer> <committer_date>
[for each key, value in extra_headers]
<key> <encoded value>
[end for each extra_headers]

<message>

The directory identifier is the ascii representation of its hexadecimal encoding.

Author and committer are formatted with the format_author() function. Dates are formatted with the format_offset() function.

Extra headers are an ordered list of [key, value] pairs. Keys are strings and get encoded to utf-8 for identifier computation. Values are either byte strings, unicode strings (that get encoded to utf-8), or integers (that get encoded to their utf-8 decimal representation).

Multiline extra header values are escaped by indenting the continuation lines with one ascii space.

If the message is None, the manifest ends with the last header. Else, the message is appended to the headers after an empty line.

The checksum of the full manifest is computed using the ‘commit’ git object type.
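The manifest template above can be assembled as in the following hypothetical sketch. For brevity it assumes the author and committer lines are already formatted (newline-terminated) and that extra header keys and values are bytes; the documented support for str and int values is left out.

```python
import hashlib
from typing import List, Optional, Sequence, Tuple


def revision_manifest_sketch(directory_hex: str, parents_hex: List[str],
                             author_line: bytes, committer_line: bytes,
                             extra_headers: Sequence[Tuple[bytes, bytes]] = (),
                             message: Optional[bytes] = None) -> bytes:
    """Assemble a git-style commit manifest (hypothetical sketch)."""
    lines = [b'tree ' + directory_hex.encode('ascii') + b'\n']
    for parent in parents_hex:
        lines.append(b'parent ' + parent.encode('ascii') + b'\n')
    lines.append(author_line)
    lines.append(committer_line)
    for key, value in extra_headers:
        # continuation lines of multiline values are indented by one space
        lines.append(key + b' ' + value.replace(b'\n', b'\n ') + b'\n')
    if message is not None:
        lines.append(b'\n' + message)  # empty line, then the message
    return b''.join(lines)


def revision_id_sketch(manifest: bytes) -> str:
    """sha1 of the manifest, salted with the git 'commit' header."""
    return hashlib.sha1(b'commit %d\x00' % len(manifest) + manifest).hexdigest()
```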

swh.model.identifiers.target_type_to_git(target_type)[source]

Convert a software heritage target type to a git object type

swh.model.identifiers.release_identifier(release)[source]

Return the intrinsic identifier for a release.

swh.model.identifiers.snapshot_identifier(snapshot, *, ignore_unresolved=False)[source]

Return the intrinsic identifier for a snapshot.

Snapshots are a set of named branches, which are pointers to objects at any level of the Software Heritage DAG.

As well as pointing to other objects in the Software Heritage DAG, branches can also be aliases, in which case their target is the name of another branch in the same snapshot, or dangling, in which case the target is unknown (and represented by the None value).

A snapshot identifier is a salted sha1 (using the git hashing algorithm with the snapshot object type) of a manifest following the algorithm:

  1. Branches are sorted using the name as key, in bytes order.
  2. For each branch, the following bytes are output:
    • the type of the branch target:
      • content, directory, revision, release or snapshot for the corresponding entries in the DAG;
      • alias for branches referencing another branch;
      • dangling for dangling branches
    • an ascii space (\x20)
    • the branch name (as raw bytes)
    • a null byte (\x00)
    • the length of the target identifier, as an ascii-encoded decimal number (20 for current intrinsic identifiers, 0 for dangling branches, the length of the target branch name for branch aliases)
    • a colon (:)
    • the identifier of the target object pointed at by the branch, stored in the ‘target’ member:
      • for contents: their sha1_git
      • for directories, revisions, releases or snapshots: their intrinsic identifier
      • for branch aliases, the name of the target branch (as raw bytes)
      • for dangling branches, the empty string

Note that, akin to directory manifests, there is no separator between entries. Because of symbolic branches, identifiers are of arbitrary length but are length-encoded to avoid ambiguity.

Parameters:
  • snapshot (dict) – the snapshot of which to compute the identifier. A single entry is needed, 'branches', which is itself a dict mapping each branch to its target
  • ignore_unresolved (bool) – if True, ignore unresolved branch aliases.
Returns:

the intrinsic identifier for snapshot

Return type:

str
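The snapshot manifest algorithm can be sketched with two hypothetical helpers. Branch values are assumed to be either None (dangling) or a dict with 'target_type' (str) and 'target' (bytes) keys; the ignore_unresolved option is not modelled.

```python
import hashlib
from typing import Dict, Optional


def snapshot_manifest_sketch(branches: Dict[bytes, Optional[Dict]]) -> bytes:
    """Serialize snapshot branches, length-prefixing each target (sketch)."""
    parts = []
    for name in sorted(branches):  # bytes order
        target = branches[name]
        if target is None:
            target_type, target_id = b'dangling', b''
        else:
            target_type = target['target_type'].encode('ascii')
            target_id = target['target']
        parts.append(target_type + b' ' + name + b'\x00'
                     + str(len(target_id)).encode('ascii') + b':' + target_id)
    return b''.join(parts)


def snapshot_id_sketch(branches: Dict[bytes, Optional[Dict]]) -> str:
    """sha1 of the manifest, salted with a 'snapshot' git-style header."""
    manifest = snapshot_manifest_sketch(branches)
    return hashlib.sha1(b'snapshot %d\x00' % len(manifest) + manifest).hexdigest()
```

The length prefix is what makes alias targets of arbitrary length unambiguous without a separator between entries.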

swh.model.identifiers.origin_identifier(origin)[source]

Return the intrinsic identifier for an origin.

class swh.model.identifiers.PersistentId[source]

Bases: swh.model.identifiers.PersistentId

Named tuple holding the relevant info associated to a Software Heritage persistent identifier.

Parameters:
  • namespace (str) – the namespace of the identifier, defaults to ‘swh’
  • scheme_version (int) – the scheme version of the identifier, defaults to 1
  • object_type (str) – the type of object the identifier points to, either ‘content’, ‘directory’, ‘release’, ‘revision’ or ‘snapshot’
  • object_id (dict/bytes/str) – object’s dict representation or object identifier
  • metadata (dict) – optional dict filled with metadata related to pointed object
Raises:

swh.model.exceptions.ValidationError – In case of invalid object type or id

Once created, it contains the following attributes:

namespace

the namespace of the identifier

Type:str
scheme_version

the scheme version of the identifier

Type:int
object_type

the type of object the identifier points to

Type:str
object_id

hexadecimal representation of the object hash

Type:str
metadata

metadata related to the pointed object

Type:dict

To get the raw persistent identifier string from an instance of this named tuple, use the str() function:

from swh.model.identifiers import PersistentId

pid = PersistentId(
    object_type='content',
    object_id='8ff44f081d43176474b267de5451f2c2e88089d0'
)
pid_str = str(pid)
# 'swh:1:cnt:8ff44f081d43176474b267de5451f2c2e88089d0'
__slots__ = ()
static __new__(cls, namespace='swh', scheme_version=1, object_type='', object_id='', metadata={})[source]

Create new instance of PersistentId(namespace, scheme_version, object_type, object_id, metadata)

__str__()[source]

Return str(self).

__module__ = 'swh.model.identifiers'
swh.model.identifiers.persistent_identifier(object_type, object_id, scheme_version=1, metadata={})[source]
Compute a persistent identifier (stable over time) as per the documentation.
Documentation:
https://docs.softwareheritage.org/devel/swh-model/persistent-identifiers.html
Parameters:
  • object_type (str) – object’s type, either ‘content’, ‘directory’, ‘release’, ‘revision’ or ‘snapshot’
  • object_id (dict/bytes/str) – object’s dict representation or object identifier
  • scheme_version (int) – persistent identifier scheme version, defaults to 1
  • metadata (dict) – metadata related to the pointed object
Raises:
Returns:

the persistent identifier

Return type:

str

swh.model.identifiers.parse_persistent_identifier(persistent_id)[source]

Parse a persistent identifier following the Software Heritage scheme.

Parameters:

persistent_id (str) – A persistent identifier

Raises:

swh.model.exceptions.ValidationError – in case of:

  • missing mandatory values (4)
  • invalid namespace supplied
  • invalid version supplied
  • invalid type supplied
  • missing hash
  • invalid hash identifier supplied
Returns:

a named tuple holding the parsing result

Return type:

PersistentId
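A hypothetical sketch of the checks listed above, for identifiers of the form `swh:1:<type>:<40 hex digits>` optionally followed by `;key=value` metadata. It raises ValueError instead of ValidationError and returns its own `ParsedPid` named tuple; the type-code table and the lowercase-hex requirement are assumptions of the sketch.

```python
from typing import Dict, NamedTuple

# Short type codes, as in 'swh:1:cnt:...'
TYPE_CODES = {'cnt': 'content', 'dir': 'directory', 'rel': 'release',
              'rev': 'revision', 'snp': 'snapshot'}


class ParsedPid(NamedTuple):
    namespace: str
    scheme_version: int
    object_type: str
    object_id: str
    metadata: Dict[str, str]


def parse_pid_sketch(pid: str) -> ParsedPid:
    """Split and validate a persistent identifier (hypothetical sketch)."""
    core, *qualifiers = pid.split(';')
    parts = core.split(':')
    if len(parts) != 4:
        raise ValueError('expected 4 colon-separated values')
    namespace, version, type_code, object_id = parts
    if namespace != 'swh':
        raise ValueError('invalid namespace %r' % namespace)
    if version != '1':
        raise ValueError('invalid version %r' % version)
    if type_code not in TYPE_CODES:
        raise ValueError('invalid type %r' % type_code)
    if len(object_id) != 40 or any(c not in '0123456789abcdef' for c in object_id):
        raise ValueError('invalid hash %r' % object_id)
    metadata = dict(q.split('=', 1) for q in qualifiers if q)
    return ParsedPid(namespace, int(version), TYPE_CODES[type_code],
                     object_id, metadata)
```

Splitting on ';' before ':' keeps colons inside metadata values (such as URLs) from breaking the core parse.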

swh.model.merkle module

Merkle tree data structure

swh.model.merkle.deep_update(left, right)[source]

Recursively update the left mapping with deeply nested values from the right mapping.

This function is useful to merge the results of several calls to MerkleNode.collect().

Parameters:
  • left – a mapping (modified by the update operation)
  • right – a mapping
Returns:

the left mapping, updated with nested values from the right mapping

Example

>>> a = {
...     'key1': {
...         'key2': {
...              'key3': 'value1/2/3',
...         },
...     },
... }
>>> deep_update(a, {
...     'key1': {
...         'key2': {
...              'key4': 'value1/2/4',
...         },
...     },
... }) == {
...     'key1': {
...         'key2': {
...             'key3': 'value1/2/3',
...             'key4': 'value1/2/4',
...         },
...     },
... }
True
>>> deep_update(a, {
...     'key1': {
...         'key2': {
...              'key3': 'newvalue1/2/3',
...         },
...     },
... }) == {
...     'key1': {
...         'key2': {
...             'key3': 'newvalue1/2/3',
...             'key4': 'value1/2/4',
...         },
...     },
... }
True
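The behaviour shown in the example can be sketched as a short recursion. `deep_update_sketch` is hypothetical and assumes that wherever `right` holds a mapping, `left` holds either nothing or a mapping at the same key.

```python
from collections.abc import Mapping


def deep_update_sketch(left: dict, right: Mapping) -> dict:
    """Recursively merge `right` into `left` in place and return `left`."""
    for key, rvalue in right.items():
        if isinstance(rvalue, Mapping):
            # descend, creating an empty level in `left` when it is missing
            left[key] = deep_update_sketch(left.get(key, {}), rvalue)
        else:
            left[key] = rvalue  # leaf values from `right` win
    return left
```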
class swh.model.merkle.MerkleNode(data=None)[source]

Bases: dict

Representation of a node in a Merkle Tree.

A (generalized) Merkle Tree is a tree in which every node is labeled with a hash of its own data and the hash of its children.

In pseudocode:

node.hash = hash(node.data
                 + sum(child.hash for child in node.children))

This class efficiently implements the Merkle Tree data structure on top of a Python dict, minimizing hash computations and new data collections when updating nodes.

Node data is stored in the data attribute, while (named) children are stored as items of the underlying dictionary.

Addition, update and removal of objects are instrumented to automatically invalidate the hashes of the current node as well as its registered parents. They also reset the collection status of the affected nodes, so that the updated objects can be collected again.

The collection of updated data from the tree is implemented through the collect() function and associated helpers.
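A minimal, hypothetical sketch of the caching and invalidation scheme described above, independent of swh.model: `SimpleMerkleNode` keeps named children as dict items, hashes its own data followed by its children's hashes in name order (SHA-1 and this serialization are arbitrary choices here), and propagates invalidation to registered parents. It omits collect() and the abstract compute_hash() hook.

```python
import hashlib
from typing import List, Optional


class SimpleMerkleNode(dict):
    """Hypothetical minimal Merkle node: named children live in the dict."""

    def __init__(self, data: bytes = b''):
        super().__init__()
        self.data = data
        self.parents: List['SimpleMerkleNode'] = []
        self._hash: Optional[bytes] = None

    def invalidate_hash(self) -> None:
        # Drop the cached hash here and in every registered ancestor
        self._hash = None
        for parent in self.parents:
            parent.invalidate_hash()

    def __setitem__(self, name: str, child: 'SimpleMerkleNode') -> None:
        if name in self:
            del self[name]  # unregister the child being replaced
        super().__setitem__(name, child)
        child.parents.append(self)
        self.invalidate_hash()

    def __delitem__(self, name: str) -> None:
        child = self[name]
        # compare by identity: dict subclasses compare equal by content
        child.parents = [p for p in child.parents if p is not self]
        super().__delitem__(name)
        self.invalidate_hash()

    @property
    def hash(self) -> bytes:
        # Recompute lazily: own data, then children's hashes in name order
        if self._hash is None:
            h = hashlib.sha1(self.data)
            for name in sorted(self):
                h.update(name.encode('utf-8') + b'\x00' + self[name].hash)
            self._hash = h.digest()
        return self._hash
```

Only the path from a modified node up to the root loses its cached hash; untouched sibling subtrees keep theirs, which is what makes repeated updates cheap.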

data

data associated to the current node

Type:dict
parents

known parents of the current node

Type:list
collected

whether the current node has been collected

Type:bool
__slots__ = ['parents', 'data', '__hash', 'collected']
type = None

Type of the current node (used as a classifier for collect())

__init__(data=None)[source]

Initialize self. See help(type(self)) for accurate signature.

parents
data
collected
invalidate_hash()[source]

Invalidate the cached hash of the current node.

update_hash(*, force=False)[source]

Recursively compute the hash of the current node.

Parameters:force (bool) – invalidate the cache and force the computation for this node and all children.
hash

The hash of the current node, as calculated by compute_hash().

compute_hash()[source]

Compute the hash of the current node.

The hash should depend on the data of the node, as well as on hashes of the children nodes.

__setitem__(name, new_child)[source]

Add a child, invalidating the current hash

__delitem__(name)[source]

Remove a child, invalidating the current hash

update(new_children)[source]

Add several named children from a dictionary

get_data(**kwargs)[source]

Retrieve and format the collected data for the current node, for use by collect().

Can be overridden, for instance when you want the collected data to contain information about the child nodes.

Parameters:kwargs – allow subclasses to alter behaviour depending on how collect() is called.
Returns:data formatted for collect()
collect_node(**kwargs)[source]

Collect the data for the current node, for use by collect().

Parameters:kwargs – passed as-is to get_data().
Returns:A dict compatible with collect().
collect(**kwargs)[source]

Collect the data for all nodes in the subtree rooted at self.

The data is deduplicated by type and by hash.

Parameters:kwargs – passed as-is to get_data().
Returns:A dict with the following structure:
{
  'typeA': {
    node1.hash: node1.get_data(),
    node2.hash: node2.get_data(),
  },
  'typeB': {
    node3.hash: node3.get_data(),
    ...
  },
  ...
}
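The deduplication by type and by hash that collect() performs can be illustrated with a tiny standalone tree. Everything below is hypothetical: the Node class, its children list, and the free functions collect() and reset_collect() only mimic the documented behaviour (mark each node as collected, key its data by hash inside a per-type dict, and recurse into the children), and are not the library's code.

```python
import hashlib


class Node:
    """Hypothetical minimal node, used only to illustrate collect()."""

    def __init__(self, type_, data, children=()):
        self.type = type_
        self.data = data
        self.children = list(children)
        self.collected = False

    @property
    def hash(self):
        # Not cached here, for brevity: data followed by children's hashes.
        h = hashlib.sha1(self.data)
        for child in self.children:
            h.update(child.hash)
        return h.digest()

    def get_data(self):
        return {"data": self.data}


def collect(node, result=None):
    """Return {type: {hash: get_data()}} for the not-yet-collected nodes."""
    if result is None:
        result = {}
    if not node.collected:
        node.collected = True
        # Keying by hash deduplicates identical nodes within each type.
        result.setdefault(node.type, {})[node.hash] = node.get_data()
    for child in node.children:
        collect(child, result)
    return result


def reset_collect(node):
    """Unmark every node in the subtree so collect() can run again."""
    node.collected = False
    for child in node.children:
        reset_collect(child)
```

Two identical "content" leaves under one "directory" node produce a single "content" entry; a second collect() returns an empty dict until reset_collect() is called.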
reset_collect()[source]

Recursively unmark collected nodes in the subtree rooted at self.

This lets the caller use collect() again.

_MerkleNode__hash
__abstractmethods__ = frozenset({'compute_hash'})
__module__ = 'swh.model.merkle'
_abc_cache = <_weakrefset.WeakSet object>
_abc_negative_cache = <_weakrefset.WeakSet object>
_abc_negative_cache_version = 111
_abc_registry = <_weakrefset.WeakSet object>
class swh.model.merkle.MerkleLeaf(data=None)[source]

Bases: swh.model.merkle.MerkleNode

A leaf of a Merkle tree.

A Merkle leaf is simply a Merkle node with children disabled.

__slots__ = []
__setitem__(name, child)[source]

Disabled for leaves: a Merkle leaf cannot have children.

__getitem__(name)[source]

Disabled for leaves: a Merkle leaf has no children to look up.

__delitem__(name)[source]

Disabled for leaves: a Merkle leaf has no children to remove.

__abstractmethods__ = frozenset({'compute_hash'})
__module__ = 'swh.model.merkle'
_abc_cache = <_weakrefset.WeakSet object>
_abc_negative_cache = <_weakrefset.WeakSet object>
_abc_negative_cache_version = 111
_abc_registry = <_weakrefset.WeakSet object>
update(new_children)[source]

Children update operation. Disabled for leaves.

swh.model.model module

class swh.model.model.BaseModel[source]

Bases: object

Base class for SWH model classes.

Provides serialization/deserialization to/from Python dictionaries, that are suitable for JSON/msgpack-like formats.

to_dict()[source]

Wrapper around attr.asdict that can be overridden by subclasses that need special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

__module__ = 'swh.model.model'
__weakref__

list of weak references to the object (if defined)
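The to_dict()/from_dict() round trip that BaseModel provides can be approximated with the standard dataclasses module. This is a sketch under stated assumptions: the library itself uses attrs, and the two dataclasses below are hypothetical stand-ins that only mimic the field layout of Timestamp and TimestampWithTimezone; the recursive from_dict() rebuilds nested objects by inspecting each field's declared type.

```python
from dataclasses import asdict, dataclass, fields, is_dataclass


# Stand-ins declared with dataclasses instead of attrs; these are NOT the
# swh.model.model classes, only shapes that mimic two of them.
@dataclass
class Timestamp:
    seconds: int
    microseconds: int


@dataclass
class TimestampWithTimezone:
    timestamp: Timestamp
    offset: int
    negative_utc: bool


def to_dict(obj):
    # Like attr.asdict: nested dataclasses become nested dicts.
    return asdict(obj)


def from_dict(cls, d):
    # Rebuild nested objects by looking at each field's declared type.
    kwargs = {}
    for f in fields(cls):
        value = d[f.name]
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)
```

The resulting plain dicts contain only ints, bools and nested dicts, which is what makes them suitable for JSON or msgpack-like encoders.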

class swh.model.model.Person(name: bytes, email: bytes, fullname: bytes)[source]

Bases: swh.model.model.BaseModel

Represents the author/committer of a revision or release.

__attrs_attrs__ = (Attribute(name='name', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='email', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='fullname', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(name: bytes, email: bytes, fullname: bytes) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.Timestamp(seconds: int, microseconds: int)[source]

Bases: swh.model.model.BaseModel

Represents a naive timestamp from a VCS.

check_seconds(attribute, value)[source]

Checks that seconds fit in a 64-bit signed integer.

check_microseconds(attribute, value)[source]

Checks that microseconds are non-negative and less than 1000000.
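The two range checks above translate directly into plain comparisons. The functions below are a simplified sketch: the real validators are attrs validator methods taking (self, attribute, value), whereas these hypothetical versions take only the value and raise ValueError, which is an assumption about the exception type.

```python
def check_seconds(value: int) -> None:
    """Reject values outside the 64-bit signed range [-2**63, 2**63 - 1]."""
    if not (-2**63 <= value < 2**63):
        raise ValueError("seconds does not fit in a 64-bit signed integer")


def check_microseconds(value: int) -> None:
    """Reject values outside [0, 1000000)."""
    if not (0 <= value < 10**6):
        raise ValueError("microseconds must be in [0, 1000000)")
```

Since Python integers are unbounded, the 64-bit limit has to be checked explicitly; it matters only when the value is later stored in a fixed-width column or encoding.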

__attrs_attrs__ = (Attribute(name='seconds', default=NOTHING, validator=<function Timestamp.check_seconds>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'int'>, converter=None, kw_only=False), Attribute(name='microseconds', default=NOTHING, validator=<function Timestamp.check_microseconds>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'int'>, converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(seconds: int, microseconds: int) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.TimestampWithTimezone(timestamp: swh.model.model.Timestamp, offset: int, negative_utc: bool)[source]

Bases: swh.model.model.BaseModel

Represents a TZ-aware timestamp from a VCS.

check_offset(attribute, value)[source]

Checks that the offset fits in a 16-bit signed integer (in theory, it should always be between -14 and +14 hours).

classmethod from_dict(d)[source]

Builds a TimestampWithTimezone from any of the formats accepted by swh.model.identifiers.normalize_timestamp().
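The offset check and its interpretation can be sketched with the datetime module. This assumes the offset is expressed in minutes east of UTC (so 120 means +02:00); that unit, and the helper name to_aware_datetime(), are assumptions for illustration and not part of the documented API. The sketch ignores negative_utc.

```python
from datetime import datetime, timedelta, timezone


def check_offset(offset: int) -> None:
    """Reject offsets outside the 16-bit signed range [-2**15, 2**15 - 1]."""
    if not (-2**15 <= offset < 2**15):
        raise ValueError("offset does not fit in a 16-bit signed integer")


def to_aware_datetime(seconds: int, microseconds: int, offset: int) -> datetime:
    """Hypothetical helper; assumes offset is in minutes east of UTC."""
    check_offset(offset)
    tz = timezone(timedelta(minutes=offset))
    return datetime.fromtimestamp(seconds, tz) + timedelta(microseconds=microseconds)
```

Note that datetime.timezone only accepts offsets strictly between -24 and +24 hours, which is much narrower than the 16-bit range, so an offset can pass check_offset() and still be rejected by the conversion.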

__attrs_attrs__ = (Attribute(name='timestamp', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'swh.model.model.Timestamp'>, converter=None, kw_only=False), Attribute(name='offset', default=NOTHING, validator=<function TimestampWithTimezone.check_offset>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'int'>, converter=None, kw_only=False), Attribute(name='negative_utc', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bool'>, converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(timestamp: swh.model.model.Timestamp, offset: int, negative_utc: bool) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.Origin(type: str, url: str)[source]

Bases: swh.model.model.BaseModel

Represents a software source: a VCS and a URL.

__attrs_attrs__ = (Attribute(name='type', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'str'>, converter=None, kw_only=False), Attribute(name='url', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'str'>, converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(type: str, url: str) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.OriginVisit(origin: swh.model.model.Origin, date: datetime.datetime, visit: Optional[int])[source]

Bases: swh.model.model.BaseModel

Represents a visit of an origin at a given point in time, by a SWH loader.

visit = None

Should not be set before calling ‘origin_visit_add()’.

to_dict()[source]

Serializes the date as a string and omits the visit id if it is None.

classmethod from_dict(d)[source]

Parses the date from a string, and accepts missing visit ids.
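The serialization rules for OriginVisit (date as a string, visit id omitted when it is None) can be sketched as follows. The class name OriginVisitSketch, the use of a plain dict for the origin, and the choice of ISO 8601 via datetime.isoformat() are all assumptions; the documentation only states that the date is serialized as a string.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class OriginVisitSketch:
    origin: dict                  # simplified: {"type": ..., "url": ...}
    date: datetime
    visit: Optional[int] = None   # assigned later by the storage

    def to_dict(self) -> dict:
        d = {"origin": dict(self.origin), "date": self.date.isoformat()}
        if self.visit is not None:  # omit the visit id while it is unset
            d["visit"] = self.visit
        return d

    @classmethod
    def from_dict(cls, d: dict) -> "OriginVisitSketch":
        return cls(
            origin=dict(d["origin"]),
            date=datetime.fromisoformat(d["date"]),
            visit=d.get("visit"),  # a missing visit id becomes None
        )
```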

__attrs_attrs__ = (Attribute(name='origin', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'swh.model.model.Origin'>, converter=None, kw_only=False), Attribute(name='date', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'datetime.datetime'>, converter=None, kw_only=False), Attribute(name='visit', default=NOTHING, validator=<optional validator for _AndValidator(_validators=[]) or None>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.Union[int, NoneType], converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(origin: swh.model.model.Origin, date: datetime.datetime, visit: Optional[int]) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.TargetType[source]

Bases: enum.Enum

The type of content pointed to by a snapshot branch. Usually a revision or an alias.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
ALIAS = 'alias'
__module__ = 'swh.model.model'
class swh.model.model.ObjectType[source]

Bases: enum.Enum

The type of content pointed to by a release. Usually a revision.

CONTENT = 'content'
DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
__module__ = 'swh.model.model'
class swh.model.model.SnapshotBranch(target: bytes, target_type: swh.model.model.TargetType)[source]

Bases: swh.model.model.BaseModel

Represents one of the branches of a snapshot.

check_target(attribute, value)[source]

If the target type is not an alias, checks that the target is a valid sha1_git (20 bytes long).
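A standalone sketch of this rule follows. It redeclares TargetType with the values listed below and assumes that alias targets hold another branch name rather than an identifier, which is why only non-alias targets are length-checked; the function signature is simplified compared with the attrs validator.

```python
from enum import Enum


class TargetType(Enum):
    # Same values as swh.model.model.TargetType, redeclared for the sketch.
    CONTENT = "content"
    DIRECTORY = "directory"
    REVISION = "revision"
    RELEASE = "release"
    SNAPSHOT = "snapshot"
    ALIAS = "alias"


def check_target(target: bytes, target_type: TargetType) -> None:
    # Non-alias targets must be raw 20-byte sha1_git identifiers;
    # alias targets (assumed to be branch names) are left unchecked.
    if target_type != TargetType.ALIAS and len(target) != 20:
        raise ValueError("Wrong length for bytes identifier: %d" % len(target))
```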

to_dict()[source]

Wrapper around attr.asdict that can be overridden by subclasses that need special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

__attrs_attrs__ = (Attribute(name='target', default=NOTHING, validator=<function SnapshotBranch.check_target>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='target_type', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<enum 'TargetType'>, converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(target: bytes, target_type: swh.model.model.TargetType) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.Snapshot(id: bytes, branches: Dict[bytes, Optional[swh.model.model.SnapshotBranch]])[source]

Bases: swh.model.model.BaseModel

Represents the full state of an origin at a given point in time.

to_dict()[source]

Wrapper around attr.asdict that can be overridden by subclasses that need special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

__attrs_attrs__ = (Attribute(name='id', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='branches', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.Dict[bytes, typing.Union[swh.model.model.SnapshotBranch, NoneType]], converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(id: bytes, branches: Dict[bytes, Optional[swh.model.model.SnapshotBranch]]) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.Release(id: bytes, name: bytes, message: bytes, target: Optional[bytes], target_type: swh.model.model.ObjectType, synthetic: bool, author: Optional[swh.model.model.Person] = None, date: Optional[swh.model.model.TimestampWithTimezone] = None, metadata: Optional[Dict[str, object]] = None)[source]

Bases: swh.model.model.BaseModel

check_author(attribute, value)[source]

If the author is None, checks the date is None too.

to_dict()[source]

Wrapper around attr.asdict that can be overridden by subclasses that need special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

__attrs_attrs__ = (Attribute(name='id', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='name', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='message', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='target', default=NOTHING, validator=<optional validator for _AndValidator(_validators=[]) or None>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.Union[bytes, NoneType], converter=None, kw_only=False), Attribute(name='target_type', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<enum 'ObjectType'>, converter=None, kw_only=False), Attribute(name='synthetic', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bool'>, converter=None, kw_only=False), Attribute(name='author', default=None, validator=_AndValidator(_validators=(<optional validator for _AndValidator(_validators=[]) or None>, <function Release.check_author>)), repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.Union[swh.model.model.Person, NoneType], converter=None, kw_only=False), Attribute(name='date', default=None, validator=<optional validator for _AndValidator(_validators=[]) or None>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.Union[swh.model.model.TimestampWithTimezone, NoneType], converter=None, kw_only=False), Attribute(name='metadata', default=None, validator=<optional validator for _AndValidator(_validators=[]) or None>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), 
type=typing.Union[typing.Dict[str, object], NoneType], converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(id: bytes, name: bytes, message: bytes, target: Optional[bytes], target_type: swh.model.model.ObjectType, synthetic: bool, author: Optional[swh.model.model.Person] = None, date: Optional[swh.model.model.TimestampWithTimezone] = None, metadata: Optional[Dict[str, object]] = None) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.RevisionType[source]

Bases: enum.Enum

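The type of a revision, i.e. the kind of source it comes from: Git, tarball, Debian source package (dsc), Subversion or Mercurial.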

GIT = 'git'
TAR = 'tar'
DSC = 'dsc'
SUBVERSION = 'svn'
MERCURIAL = 'hg'
__module__ = 'swh.model.model'
class swh.model.model.Revision(id: bytes, message: bytes, author: swh.model.model.Person, committer: swh.model.model.Person, date: swh.model.model.TimestampWithTimezone, committer_date: swh.model.model.TimestampWithTimezone, type: swh.model.model.RevisionType, directory: bytes, synthetic: bool, metadata: Optional[Dict[str, object]] = None, parents: List[bytes] = NOTHING)[source]

Bases: swh.model.model.BaseModel

to_dict()[source]

Wrapper around attr.asdict that can be overridden by subclasses that need special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

__attrs_attrs__ = (Attribute(name='id', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='message', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='author', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'swh.model.model.Person'>, converter=None, kw_only=False), Attribute(name='committer', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'swh.model.model.Person'>, converter=None, kw_only=False), Attribute(name='date', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'swh.model.model.TimestampWithTimezone'>, converter=None, kw_only=False), Attribute(name='committer_date', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'swh.model.model.TimestampWithTimezone'>, converter=None, kw_only=False), Attribute(name='type', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<enum 'RevisionType'>, converter=None, kw_only=False), Attribute(name='directory', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='synthetic', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bool'>, converter=None, kw_only=False), Attribute(name='metadata', default=None, validator=<optional validator for _AndValidator(_validators=[]) or None>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.Union[typing.Dict[str, object], 
NoneType], converter=None, kw_only=False), Attribute(name='parents', default=Factory(factory=<class 'list'>, takes_self=False), validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.List[bytes], converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(id: bytes, message: bytes, author: swh.model.model.Person, committer: swh.model.model.Person, date: swh.model.model.TimestampWithTimezone, committer_date: swh.model.model.TimestampWithTimezone, type: swh.model.model.RevisionType, directory: bytes, synthetic: bool, metadata: Optional[Dict[str, object]] = None, parents: List[bytes] = NOTHING) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.DirectoryEntry(name: bytes, type: str, target: bytes, perms: int)[source]

Bases: swh.model.model.BaseModel

perms = None

Usually one of the values of swh.model.from_disk.DentryPerms.
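The DentryPerms values are Git-style file modes, which become readable once printed in octal. The sketch below classifies them with the standard stat helpers; the symlink value is not shown in this documentation and is assumed to be Git's 0o120000, and describe() is a hypothetical helper.

```python
import stat

# DentryPerms values as listed in swh.model.from_disk; the symlink value is
# not shown in this documentation and is assumed to be Git's 0o120000.
DENTRY_PERMS = {
    "content": 33188,             # 0o100644
    "executable_content": 33261,  # 0o100755
    "symlink": 0o120000,          # 40960 (assumed)
    "directory": 16384,           # 0o040000
    "revision": 57344,            # 0o160000 (Git "gitlink")
}


def describe(perms: int) -> str:
    """Classify a directory-entry mode using the standard stat helpers."""
    if stat.S_ISLNK(perms):
        return "symlink"
    if stat.S_ISDIR(perms):
        return "directory"
    if stat.S_ISREG(perms):
        return "executable file" if perms & stat.S_IXUSR else "file"
    # 0o160000 is not a POSIX file type; Git uses it for submodule commits.
    return "other"


for name, value in DENTRY_PERMS.items():
    print(f"{name:<20} {value:>6} {oct(value):>9} {describe(value)}")
```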

__attrs_attrs__ = (Attribute(name='name', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='type', default=NOTHING, validator=<in_ validator with options ['file', 'dir', 'rev']>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'str'>, converter=None, kw_only=False), Attribute(name='target', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='perms', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'int'>, converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(name: bytes, type: str, target: bytes, perms: int) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.Directory(id: bytes, entries: List[swh.model.model.DirectoryEntry])[source]

Bases: swh.model.model.BaseModel

to_dict()[source]

Wrapper around attr.asdict that can be overridden by subclasses that need special handling of some of the fields.

classmethod from_dict(d)[source]

Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.

__attrs_attrs__ = (Attribute(name='id', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='entries', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.List[swh.model.model.DirectoryEntry], converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(id: bytes, entries: List[swh.model.model.DirectoryEntry]) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

class swh.model.model.Content(sha1: bytes, sha1_git: bytes, sha256: bytes, blake2s256: bytes, length: int, status: str, reason: Optional[str] = None, data: Optional[bytes] = None)[source]

Bases: swh.model.model.BaseModel

check_length(attribute, value)[source]

Checks the length is positive.

__attrs_attrs__ = (Attribute(name='sha1', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='sha1_git', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='sha256', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='blake2s256', default=NOTHING, validator=None, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'bytes'>, converter=None, kw_only=False), Attribute(name='length', default=NOTHING, validator=<function Content.check_length>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'int'>, converter=None, kw_only=False), Attribute(name='status', default=NOTHING, validator=<in_ validator with options ['visible', 'absent', 'hidden']>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=<class 'str'>, converter=None, kw_only=False), Attribute(name='reason', default=None, validator=_AndValidator(_validators=(<optional validator for _AndValidator(_validators=[]) or None>, <function Content.check_reason>)), repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.Union[str, NoneType], converter=None, kw_only=False), Attribute(name='data', default=None, validator=<optional validator for _AndValidator(_validators=[]) or None>, repr=True, cmp=True, hash=None, init=True, metadata=mappingproxy({}), type=typing.Union[bytes, NoneType], converter=None, kw_only=False))
__eq__(other)

Return self==value.

__ge__(other)

Automatically created by attrs.

__gt__(other)

Automatically created by attrs.

__hash__ = None
__init__(sha1: bytes, sha1_git: bytes, sha256: bytes, blake2s256: bytes, length: int, status: str, reason: Optional[str] = None, data: Optional[bytes] = None) → None

Initialize self. See help(type(self)) for accurate signature.

__le__(other)

Automatically created by attrs.

__lt__(other)

Automatically created by attrs.

__module__ = 'swh.model.model'
__ne__(other)

Check equality and either forward a NotImplemented or return the result negated.

__repr__()

Automatically created by attrs.

check_reason(attribute, value)[source]

Checks that a reason is provided if and only if the status is 'absent'.
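The "if and only if" rule above amounts to two symmetric checks. This is a simplified sketch that takes the status and reason directly instead of an attrs (self, attribute, value) triple; raising ValueError is an assumption.

```python
from typing import Optional


def check_reason(status: str, reason: Optional[str]) -> None:
    # A reason must be given exactly when the content is absent.
    if status == "absent" and reason is None:
        raise ValueError("Must provide a reason if content is absent.")
    if status != "absent" and reason is not None:
        raise ValueError("Must not provide a reason if content is not absent.")
```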

to_dict()[source]

Wrapper around attr.asdict that can be overridden by subclasses that need special handling of some of the fields.
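The four checksum fields of Content can be computed with hashlib. This is a plain standard-library sketch, not the package's own hashing helpers; it assumes sha1_git is Git's blob hash, i.e. SHA-1 over a "blob <length>\0" header followed by the data, and that blake2s256 is BLAKE2s with a 32-byte digest.

```python
import hashlib


def content_hashes(data: bytes) -> dict:
    """Compute the checksums named by the Content fields (sketch)."""
    # Git hashes a blob as sha1(b"blob <length>\0" + data).
    git_header = b"blob %d\x00" % len(data)
    return {
        "sha1": hashlib.sha1(data).digest(),
        "sha1_git": hashlib.sha1(git_header + data).digest(),
        "sha256": hashlib.sha256(data).digest(),
        "blake2s256": hashlib.blake2s(data, digest_size=32).digest(),
        "length": len(data),
    }
```

For the empty file, sha1_git comes out as e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the identifier Git itself assigns to an empty blob.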

swh.model.toposort module

swh.model.toposort.toposort(revision_log)[source]

Perform a topological sort on a revision log graph.

Complexity: O(N) (linear in the length of the revision log)

Parameters:revision_log – Revision log as returned by swh.storage.Storage.revision_log().
Yields:The revision log sorted by a topological order
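A linear-time topological sort of this kind can be written with Kahn's algorithm, which yields every parent before its children. This sketch assumes each revision in the log is a dict with 'id' and 'parents' keys and ignores parents that are not present in the log; it illustrates the technique and is not necessarily how swh.model.toposort is implemented.

```python
from collections import deque
from typing import Dict, Iterable, Iterator, List


def toposort(revision_log: Iterable[dict]) -> Iterator[dict]:
    """Yield revisions so that every parent comes before its children.

    Runs in O(N + E): one pass to build the graph, one to drain the queue.
    """
    revisions = list(revision_log)
    known = {rev["id"] for rev in revisions}
    children: Dict[bytes, List[dict]] = {rev["id"]: [] for rev in revisions}
    pending: Dict[bytes, int] = {}  # number of not-yet-yielded parents
    for rev in revisions:
        in_log = [p for p in rev["parents"] if p in known]
        pending[rev["id"]] = len(in_log)
        for parent in in_log:
            children[parent].append(rev)

    # Start from revisions whose parents are all outside the log (roots).
    queue = deque(rev for rev in revisions if pending[rev["id"]] == 0)
    while queue:
        rev = queue.popleft()
        yield rev
        for child in children[rev["id"]]:
            pending[child["id"]] -= 1
            if pending[child["id"]] == 0:
                queue.append(child)
```

Since each revision usually has one or two parents, O(N + E) is effectively linear in the length of the log, matching the complexity stated above.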

swh.model.validators module

swh.model.validators.validate_content(content)[source]

Validate that a content has the correct schema.

Parameters:content (dict) – the content to validate.

Module contents