swh.scanner.data module

class swh.scanner.data.MerkleNodeInfo

Bases: dict

Store additional information about Merkle DAG nodes, using their SWHIDs as keys.
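Because MerkleNodeInfo subclasses dict, its role can be illustrated with a small stand-in. This is a minimal sketch, not the module's code. It assumes keys are SWHID strings (the real class may expect SWHID objects) and values are per-node attribute dicts. The checks in __setitem__ and the "known" attribute are illustrative assumptions.

```python
class MerkleNodeInfo(dict):
    """Sketch of a SWHID-keyed store of per-node attribute dicts."""

    def __setitem__(self, key, value):
        # Assumed check: keys are core SWHID strings such as "swh:1:cnt:<hex>".
        if not isinstance(key, str) or not key.startswith("swh:1:"):
            raise ValueError(f"key must be a SWHID string, got {key!r}")
        # Assumed check: each value is a dict of node attributes.
        if not isinstance(value, dict):
            raise ValueError(f"value must be a dict, got {type(value).__name__}")
        super().__setitem__(key, value)


# Usage: item assignment goes through the checks above.
info = MerkleNodeInfo()
swhid = "swh:1:cnt:" + "0" * 40
info[swhid] = {"known": True}
print(info[swhid]["known"])  # True
```

Only item assignment is validated here. dict.update() and the dict constructor do not call __setitem__, so a stricter version would need to override those as well.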

swh.scanner.data.init_merkle_node_info(source_tree: swh.model.from_disk.Directory, data: swh.scanner.data.MerkleNodeInfo, info: set)

Populate the MerkleNodeInfo with the SWHIDs of the given source tree and the attributes that will be stored.
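The population step can be sketched as a walk over every node of the tree. This is a hypothetical reimplementation for illustration only. Node and its iter_tree() method are stand-ins for swh.model.from_disk objects, and initializing each attribute to None is an assumption about how entries start out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Set


@dataclass
class Node:
    """Hypothetical stand-in for a swh.model.from_disk Content or Directory."""

    swhid: str
    object_type: str  # "content" or "directory"
    children: Dict[bytes, "Node"] = field(default_factory=dict)

    def iter_tree(self) -> Iterator["Node"]:
        # Pre-order walk: this node first, then every descendant.
        yield self
        for child in self.children.values():
            yield from child.iter_tree()


def init_merkle_node_info(source_tree: Node, data: dict, info: Set[str]) -> None:
    """Sketch: one entry per node, each requested attribute initialized to None."""
    if not info:
        raise ValueError("at least one attribute name is required")
    for node in source_tree.iter_tree():
        # A fresh dict per node, so nodes never share mutable state.
        data[node.swhid] = {attr: None for attr in info}


readme = Node("swh:1:cnt:" + "1" * 40, "content")
root = Node("swh:1:dir:" + "2" * 40, "directory", {b"README": readme})
data: dict = {}
init_merkle_node_info(root, data, {"known"})
print(data[readme.swhid])  # {'known': None}
```

Building a fresh dict per node matters: reusing one template dict for every SWHID would make an update to one node visible on all of them.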

async swh.scanner.data.add_origin(source_tree: swh.model.from_disk.Directory, data: swh.scanner.data.MerkleNodeInfo, client: swh.scanner.client.Client)

Store in data the origin information of the software artifacts in source_tree, as retrieved from the Software Heritage graph service.
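One plausible way to fill in origins is a breadth-first walk that stops descending as soon as a node's origin is found. This sketch is an assumption throughout: Node, FakeClient, its get_origin() method and the "origin" attribute are hypothetical, and the real function may query the graph service differently.

```python
import asyncio
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a swh.model.from_disk Content or Directory."""

    swhid: str
    object_type: str  # "content" or "directory"
    children: Dict[bytes, "Node"] = field(default_factory=dict)

    def iter_tree(self) -> Iterator["Node"]:
        yield self
        for child in self.children.values():
            yield from child.iter_tree()


class FakeClient:
    """Hypothetical client; get_origin() is an assumed method name."""

    def __init__(self, origins: Dict[str, str]):
        self.origins = origins
        self.queried: List[str] = []  # SWHIDs looked up, in order

    async def get_origin(self, swhid: str) -> Optional[str]:
        self.queried.append(swhid)
        return self.origins.get(swhid)


async def add_origin(source_tree: Node, data: dict, client: FakeClient) -> None:
    """Sketch: breadth-first lookup that stops descending once an origin is found."""
    queue = deque([source_tree])
    while queue:
        node = queue.popleft()
        origin = await client.get_origin(node.swhid)
        if origin is not None:
            # Assumption: everything below a node with a known origin shares it.
            for sub in node.iter_tree():
                data[sub.swhid]["origin"] = origin
        else:
            queue.extend(node.children.values())


f1 = Node("swh:1:cnt:" + "1" * 40, "content")
f2 = Node("swh:1:cnt:" + "2" * 40, "content")
lib = Node("swh:1:dir:" + "3" * 40, "directory", {b"f1.c": f1})
root = Node("swh:1:dir:" + "4" * 40, "directory", {b"lib": lib, b"main.c": f2})
data = {n.swhid: {"origin": None} for n in root.iter_tree()}
client = FakeClient({lib.swhid: "https://example.org/lib.git"})
asyncio.run(add_origin(root, data, client))
```

Pruning at the first matched directory means a single lookup covers that whole subtree, which in this example leaves f1 unqueried.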

swh.scanner.data.get_directory_data(root_path: str, source_tree: swh.model.from_disk.Directory, nodes_data: swh.scanner.data.MerkleNodeInfo, directory_data: Dict = {}) → Dict[pathlib.Path, dict]

Get content information for each directory inside source_tree.

Returns:
A dictionary mapping each directory path to information about the contents of that directory.
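A recursive traversal that records content counts for each subdirectory can be sketched as follows. Everything here is hypothetical: Node stands in for swh.model.from_disk objects, count_contents is an illustrative helper, the extra prefix parameter builds relative paths from child names rather than from root_path, and values are (total, known) tuples even though the documented value type is dict.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Optional, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a swh.model.from_disk Content or Directory."""

    swhid: str
    object_type: str  # "content" or "directory"
    children: Dict[bytes, "Node"] = field(default_factory=dict)


def count_contents(node: Node, nodes_data: dict) -> Tuple[int, int]:
    # Hypothetical helper: (total, known) over the node's direct contents.
    contents = [c for c in node.children.values() if c.object_type == "content"]
    known = sum(1 for c in contents if nodes_data[c.swhid]["known"])
    return len(contents), known


def get_directory_data(
    source_tree: Node,
    nodes_data: dict,
    directory_data: Optional[Dict[Path, Tuple[int, int]]] = None,
    prefix: Path = Path("."),
) -> Dict[Path, Tuple[int, int]]:
    """Sketch: map each subdirectory's relative path to its content counts."""
    if directory_data is None:
        directory_data = {}  # fresh dict per top-level call
    for name, child in source_tree.children.items():
        if child.object_type != "directory":
            continue
        rel_path = prefix / name.decode()
        directory_data[rel_path] = count_contents(child, nodes_data)
        # Recurse so nested directories are recorded too.
        get_directory_data(child, nodes_data, directory_data, rel_path)
    return directory_data


c1 = Node("swh:1:cnt:" + "1" * 40, "content")
c2 = Node("swh:1:cnt:" + "2" * 40, "content")
sub = Node("swh:1:dir:" + "3" * 40, "directory", {b"b.c": c2})
src = Node("swh:1:dir:" + "4" * 40, "directory", {b"a.c": c1, b"sub": sub})
root = Node("swh:1:dir:" + "5" * 40, "directory", {b"src": src})
nodes_data = {c1.swhid: {"known": True}, c2.swhid: {"known": False}}
result = get_directory_data(root, nodes_data)
```

The sketch defaults directory_data to None instead of {}. A mutable default is created once, when the function is defined, so if a function fills in and returns that default, entries from one call carry over into the next.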

swh.scanner.data.directory_content(node: swh.model.from_disk.Directory, nodes_data: swh.scanner.data.MerkleNodeInfo) → Tuple[int, int]

Count the contents inside the given directory and how many of them are known.

Returns:
A tuple with the total number of contents inside the directory and the number of those contents that are known.
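Counting known contents reduces to a filter and a sum over a directory's children. In this minimal sketch, Node is a hypothetical stand-in for swh.model.from_disk objects, only direct children are counted, and a boolean "known" attribute in nodes_data is assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a swh.model.from_disk Content or Directory."""

    swhid: str
    object_type: str  # "content" or "directory"
    children: Dict[bytes, "Node"] = field(default_factory=dict)


def directory_content(node: Node, nodes_data: dict) -> Tuple[int, int]:
    """Sketch: (total, known) counts over the directory's direct contents."""
    contents = [c for c in node.children.values() if c.object_type == "content"]
    # Assumption: each node's entry carries a boolean "known" attribute.
    known = sum(1 for c in contents if nodes_data[c.swhid].get("known"))
    return len(contents), known


known_file = Node("swh:1:cnt:" + "1" * 40, "content")
unknown_file = Node("swh:1:cnt:" + "2" * 40, "content")
subdir = Node("swh:1:dir:" + "3" * 40, "directory")
d = Node(
    "swh:1:dir:" + "4" * 40,
    "directory",
    {b"a.c": known_file, b"b.c": unknown_file, b"sub": subdir},
)
nodes_data = {known_file.swhid: {"known": True}, unknown_file.swhid: {"known": False}}
print(directory_content(d, nodes_data))  # (2, 1)
```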

swh.scanner.data.has_dirs(node: swh.model.from_disk.Directory) → bool

Check whether the given directory contains any subdirectories.
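The check can short-circuit on the first subdirectory it finds. Node is a hypothetical stand-in for swh.model.from_disk objects in this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical stand-in for a swh.model.from_disk Content or Directory."""

    swhid: str
    object_type: str  # "content" or "directory"
    children: Dict[bytes, "Node"] = field(default_factory=dict)


def has_dirs(node: Node) -> bool:
    """Sketch: True if any direct child is a directory."""
    # any() stops at the first directory it finds.
    return any(child.object_type == "directory" for child in node.children.values())


leaf = Node("swh:1:dir:" + "1" * 40, "directory")
parent = Node("swh:1:dir:" + "2" * 40, "directory", {b"leaf": leaf})
print(has_dirs(parent), has_dirs(leaf))  # True False
```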

swh.scanner.data.get_content_from(node_path: bytes, source_tree: swh.model.from_disk.Directory, nodes_data: swh.scanner.data.MerkleNodeInfo) → Dict[bytes, dict]

Get information about the contents of the directory node found at node_path.
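A lookup of a directory by path, followed by a collection of its contents, might look like this. The path convention (slash-separated bytes relative to the root, with b"" for the root itself), keying results by file name, and the added "swhid" entry are all assumptions of this sketch; Node is a hypothetical stand-in for swh.model.from_disk objects.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Node:
    """Hypothetical stand-in for a swh.model.from_disk Content or Directory."""

    swhid: str
    object_type: str  # "content" or "directory"
    children: Dict[bytes, "Node"] = field(default_factory=dict)


def get_content_from(
    node_path: bytes, source_tree: Node, nodes_data: dict
) -> Dict[bytes, dict]:
    """Sketch: map each content name in the directory at node_path to its info."""
    directory = source_tree
    # Assumed convention: b"" is the root, b"a/b" is a nested directory.
    for part in filter(None, node_path.split(b"/")):
        directory = directory.children[part]
    files_data: Dict[bytes, dict] = {}
    for name, child in directory.children.items():
        if child.object_type == "content":
            # Copy the stored attributes and add the SWHID for convenience.
            files_data[name] = {**nodes_data[child.swhid], "swhid": child.swhid}
    return files_data


readme = Node("swh:1:cnt:" + "1" * 40, "content")
util = Node("swh:1:cnt:" + "2" * 40, "content")
lib = Node("swh:1:dir:" + "3" * 40, "directory", {b"util.c": util})
root = Node("swh:1:dir:" + "4" * 40, "directory", {b"README": readme, b"lib": lib})
nodes_data = {readme.swhid: {"known": True}, util.swhid: {"known": False}}
print(sorted(get_content_from(b"lib", root, nodes_data)))  # [b'util.c']
```

Each entry is a copy, so adding the "swhid" key does not alter nodes_data.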