swh.model.from_disk module¶
-
class
swh.model.from_disk.
DiskBackedContent
(sha1: bytes, sha1_git: bytes, sha256: bytes, blake2s256: bytes, length: int, status: str = 'visible', ctime: Optional[datetime.datetime] = None, path: Optional[bytes] = None)[source]¶ Bases:
swh.model.model.BaseContent
Content-like class, which allows lazy-loading data from the disk.
-
object_type
: typing_extensions.Final = 'content_file'¶
-
sha1
¶
-
sha1_git
¶
-
sha256
¶
-
blake2s256
¶
-
length
¶
-
status
¶
-
ctime
¶
-
path
¶
-
classmethod
from_dict
(d)[source]¶ Takes a dictionary representing a tree of SWH objects, and recursively builds the corresponding objects.
-
with_data
() → swh.model.model.Content[source]¶
-
-
class
swh.model.from_disk.
DentryPerms
(value)[source]¶ Bases:
enum.IntEnum
Admissible permissions for directory entries.
-
content
= 33188¶ Content
-
executable_content
= 33261¶ Executable content (e.g. executable script)
-
symlink
= 40960¶ Symbolic link
-
directory
= 16384¶ Directory
-
revision
= 57344¶ Revision (e.g. submodule)
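The decimal values above are easier to recognise in octal, where they line up with the file-type constants of Python's standard stat module. The following is a minimal sketch using a local copy of the enum (DentryPermsSketch is a hypothetical stand-in, so the snippet runs without swh.model installed):

```python
import enum
import stat


class DentryPermsSketch(enum.IntEnum):
    """Local copy of the documented values, for illustration only."""

    content = 0o100644
    executable_content = 0o100755
    symlink = 0o120000
    directory = 0o040000
    revision = 0o160000


# Print each member in decimal (as documented) and in octal.
for perm in DentryPermsSketch:
    print(f"{perm.name:20} {int(perm):6d} {int(perm):#o}")

# The file-type bits agree with the helpers in the standard stat module.
print(stat.S_ISREG(DentryPermsSketch.content))    # True
print(stat.S_ISLNK(DentryPermsSketch.symlink))    # True
print(stat.S_ISDIR(DentryPermsSketch.directory))  # True
```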
-
-
swh.model.from_disk.
mode_to_perms
(mode)[source]¶ Convert a file mode to a permission compatible with Software Heritage directory entries
- Parameters
mode (int) – a file mode as returned by os.stat() in os.stat_result.st_mode
- Returns
one of the following values:
- DentryPerms.content: plain file
- DentryPerms.executable_content: executable file
- DentryPerms.symlink: symbolic link
- DentryPerms.directory: directory
- Return type
DentryPerms
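The mapping described above can be sketched with the standard stat module. This is an illustrative reimplementation, not the library's code: mode_to_perms_sketch is a hypothetical name, and treating any execute bit as "executable" is an assumption.

```python
import os
import stat


def mode_to_perms_sketch(mode: int) -> int:
    """Map an st_mode value to one of the four documented permission values."""
    if stat.S_ISLNK(mode):
        return 0o120000  # DentryPerms.symlink (40960)
    if stat.S_ISDIR(mode):
        return 0o040000  # DentryPerms.directory (16384)
    # Assumption: any execute bit marks the file as executable content.
    if mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH):
        return 0o100755  # DentryPerms.executable_content (33261)
    return 0o100644  # DentryPerms.content (33188)


# os.lstat() does not follow symbolic links, so a link is reported as a link.
print(oct(mode_to_perms_sketch(os.lstat(".").st_mode)))
```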
-
class
swh.model.from_disk.
Content
(data=None)[source]¶ Bases:
swh.model.merkle.MerkleLeaf
Representation of a Software Heritage content as a node in a Merkle tree.
The current Merkle hash for the Content nodes is the sha1_git, which makes it consistent with what Directory uses for its own hash computation.
-
object_type
: typing_extensions.Final = 'content'¶
-
classmethod
from_bytes
(*, mode, data)[source]¶ Convert data (raw bytes) to a Software Heritage content entry
- Parameters
mode (int) – a file mode (passed to mode_to_perms())
data (bytes) – raw contents of the file
-
classmethod
from_symlink
(*, path, mode)[source]¶ Convert a symbolic link to a Software Heritage content entry
-
classmethod
from_file
(*, path, max_content_length=None)[source]¶ Compute the Software Heritage content entry corresponding to an on-disk file.
The returned dictionary contains keys useful for both:
- loading the content in the archive (hashes, length)
- using the content as a directory entry in a directory
- Parameters
save_path (bool) – add the file path to the entry
max_content_length (Optional[int]) – if given, all contents larger than this will be skipped.
-
compute_hash
()[source]¶ Compute the hash of the current node.
The hash should depend on the data of the node, as well as on hashes of the children nodes.
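For a Content node, the class description states that the Merkle hash is the sha1_git. As a minimal sketch, assuming sha1_git is the hash git computes for a blob (SHA-1 over a "blob <length>" header, a NUL byte, then the data), it can be reproduced with hashlib alone. sha1_git_sketch is a hypothetical helper, not part of swh.model:

```python
import hashlib


def sha1_git_sketch(data: bytes) -> bytes:
    """Hash raw bytes the way git hashes a blob: header, NUL byte, then data."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).digest()


# The empty blob has a well-known identifier in git.
print(sha1_git_sketch(b"").hex())
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```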
-
to_model
() → swh.model.model.BaseContent[source]¶ Builds a model.BaseContent object based on this leaf.
-
-
swh.model.from_disk.
accept_all_directories
(dirpath: str, dirname: str, entries: Iterable[Any]) → bool[source]¶ Default filter for Directory.from_disk() accepting all directories
- Parameters
dirname (bytes) – directory name
entries (list) – directory entries
-
swh.model.from_disk.
ignore_empty_directories
(dirpath: str, dirname: str, entries: Iterable[Any]) → bool[source]¶ Filter for directory_to_objects() ignoring empty directories
- Parameters
dirname (bytes) – directory name
entries (list) – directory entries
- Returns
True if the directory is not empty, False if it is empty
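Any callable with the same three-argument signature can serve as a directory filter. The sketch below shows two such callables; both names are hypothetical, and passing bytes for dirpath and dirname is an assumption based on the parameter descriptions rather than the type annotations.

```python
from typing import Any, Iterable


def non_empty_directories_sketch(
    dirpath: bytes, dirname: bytes, entries: Iterable[Any]
) -> bool:
    """Return True when the directory has at least one entry."""
    # any() stops at the first entry, so this also works for lazy iterables.
    return any(True for _ in entries)


def ignore_hidden_directories_sketch(
    dirpath: bytes, dirname: bytes, entries: Iterable[Any]
) -> bool:
    """Return False for directories whose name starts with a dot."""
    return not dirname.startswith(b".")


print(non_empty_directories_sketch(b"/tmp/x", b"x", []))                  # False
print(ignore_hidden_directories_sketch(b"/tmp/.cache", b".cache", [1]))  # False
```

A filter of this shape would be handed to Directory.from_disk() through its dir_filter parameter.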
-
swh.model.from_disk.
ignore_named_directories
(names, *, case_sensitive=True)[source]¶ Filter for directory_to_objects() to ignore directories whose name is one of names.
- Parameters
names (list of bytes) – names to ignore
case_sensitive (bool) – whether to do the filtering in a case sensitive way
- Returns
a directory filter for directory_to_objects()
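Because this function returns a filter rather than acting as one, it is naturally written as a closure factory. The following is a sketch of that pattern under the documented signature; it is an illustrative reimplementation with a hypothetical name, not the library's source.

```python
from typing import Any, Callable, Iterable, List


def ignore_named_directories_sketch(
    names: List[bytes], *, case_sensitive: bool = True
) -> Callable[[bytes, bytes, Iterable[Any]], bool]:
    """Build a directory filter that rejects the given directory names."""
    if not case_sensitive:
        names = [name.lower() for name in names]
    ignored = frozenset(names)

    def named_filter(dirpath: bytes, dirname: bytes, entries: Iterable[Any]) -> bool:
        # Normalise the candidate the same way the ignored names were.
        candidate = dirname if case_sensitive else dirname.lower()
        return candidate not in ignored

    return named_filter


skip_vcs = ignore_named_directories_sketch([b".git", b".hg"], case_sensitive=False)
print(skip_vcs(b"/src/.GIT", b".GIT", []))  # False
print(skip_vcs(b"/src/lib", b"lib", []))    # True
```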
-
swh.model.from_disk.
extract_regex_objs
(root_path: bytes, patterns: Iterable[bytes]) → Iterator[Pattern[bytes]][source]¶ Generates a regex object for each pattern given in input and checks if the path is a subdirectory or relative to the root path.
- Parameters
root_path (bytes) – path to the root directory
patterns – patterns to match
-
swh.model.from_disk.
ignore_directories_patterns
(root_path: bytes, patterns: Iterable[bytes])[source]¶ Filter for directory_to_objects() to ignore directories matching certain patterns.
- Parameters
root_path (bytes) – path of the root directory
patterns (list of bytes) – patterns to ignore
- Returns
a directory filter for directory_to_objects()
-
swh.model.from_disk.
iter_directory
(directory) → Tuple[List[swh.model.model.Content], List[swh.model.model.SkippedContent], List[swh.model.model.Directory]][source]¶ Return the directory listing from a disk-memory directory instance.
- Raises
TypeError – in case an unexpected object type is listed.
- Returns
A tuple of three lists, in this order: contents, skipped contents and directories.
-
class
swh.model.from_disk.
Directory
(data=None)[source]¶ Bases:
swh.model.merkle.MerkleNode
Representation of a Software Heritage directory as a node in a Merkle Tree.
This class can be used to generate, from an on-disk directory, all the objects that need to be sent to the Software Heritage archive.
The from_disk() constructor allows you to generate the data structure from a directory on disk. The resulting Directory can then be manipulated as a dictionary, using the path as key.
The collect() method is used to retrieve all the objects that need to be added to the Software Heritage archive since the last collection, by class (contents and directories).
When using the dict-like methods to update the contents of the directory, the affected levels of hierarchy are reset and can be collected again using the same method. This enables the efficient collection of updated nodes, for instance when the client is applying diffs.
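The reset-and-collect behaviour described above can be illustrated with a toy Merkle node. This is a simplified sketch, not the swh.model.merkle implementation: ToyNode, its attributes and its hashing scheme are all hypothetical, and only the idea (an update resets the node and its ancestors, so a later collection returns just the affected levels) is taken from the text.

```python
import hashlib
from typing import Dict, List, Optional


class ToyNode:
    """Toy Merkle node: a dict of named children plus a cached hash."""

    def __init__(self, payload: bytes = b"") -> None:
        self.payload = payload
        self.children: Dict[bytes, "ToyNode"] = {}
        self.parent: Optional["ToyNode"] = None
        self._hash: Optional[bytes] = None
        self.collected = False

    def __setitem__(self, name: bytes, child: "ToyNode") -> None:
        child.parent = self
        self.children[name] = child
        self._invalidate()

    def __getitem__(self, name: bytes) -> "ToyNode":
        return self.children[name]

    def _invalidate(self) -> None:
        # Reset this node and every ancestor; siblings keep their state.
        node: Optional["ToyNode"] = self
        while node is not None:
            node._hash = None
            node.collected = False
            node = node.parent

    @property
    def hash(self) -> bytes:
        # The hash covers the node's own payload and its children's hashes.
        if self._hash is None:
            h = hashlib.sha1(self.payload)
            for name in sorted(self.children):
                h.update(name + self.children[name].hash)
            self._hash = h.digest()
        return self._hash

    def collect(self) -> List["ToyNode"]:
        """Return the nodes not yet collected, then mark them as collected."""
        if self.collected:
            return []
        found = [self]
        for name in sorted(self.children):
            found.extend(self.children[name].collect())
        self.collected = True
        return found


root = ToyNode()
root[b"src"] = ToyNode()
root[b"src"][b"a.c"] = ToyNode(b"int main;")
root[b"docs"] = ToyNode()
print(len(root.collect()))  # 4: root, docs, src, src/a.c
root[b"src"][b"b.c"] = ToyNode(b"void f;")
print(len(root.collect()))  # 3: root, src, src/b.c (docs and a.c are unchanged)
```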
-
object_type
: typing_extensions.Final = 'directory'¶
-
classmethod
from_disk
(*, path, dir_filter=<function accept_all_directories>, max_content_length=None)[source]¶ Compute the Software Heritage objects for a given directory tree
- Parameters
path (bytes) – the directory to traverse
data (bool) – whether to add the data to the content objects
save_path (bool) – whether to add the path to the content objects
dir_filter (function) – a filter to ignore some directories by name or contents. Takes the same arguments as accept_all_directories() (dirpath, dirname and entries), and returns True if the directory should be added, False if the directory should be ignored.
max_content_length (Optional[int]) – if given, all contents larger than this will be skipped.
-
get_data
(**kwargs)[source]¶ Retrieve and format the collected data for the current node, for use by collect().
Can be overridden, for instance when you want the collected data to contain information about the child nodes.
- Parameters
kwargs – allow subclasses to alter behaviour depending on how collect() is called.
- Returns
data formatted for collect()
-
property
entries
¶ Child nodes, sorted by name in the same way directory_identifier does.
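As a sketch of what such an ordering can look like, assume directory_identifier follows git's tree ordering, in which a directory name is compared as if it ended with a slash. That assumption is not stated in this page, and entry_sort_key is a hypothetical helper:

```python
from typing import Tuple


def entry_sort_key(entry: Tuple[bytes, bool]) -> bytes:
    """Sort key for (name, is_directory) pairs, git-tree style."""
    name, is_directory = entry
    # Assumption: directories compare as though their name ended with b"/".
    return name + b"/" if is_directory else name


entries = [(b"foo.c", False), (b"foo", True), (b"foo-bar", False)]
print([name for name, _ in sorted(entries, key=entry_sort_key)])
# [b'foo-bar', b'foo.c', b'foo']
```

A plain sort on the names alone would put b"foo" first, which is why the key matters when the ordering feeds into a hash.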
-
compute_hash
()[source]¶ Compute the hash of the current node.
The hash should depend on the data of the node, as well as on hashes of the children nodes.
-
to_model
() → swh.model.model.Directory[source]¶ Builds a model.Directory object based on this node, ignoring its children.
-