swh.scanner.data module

class swh.scanner.data.MerkleNodeInfo

Bases: dict

Store additional information about Merkle DAG nodes, using SWHIDs as keys.
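As a rough illustration of the idea, the sketch below keys a dict subclass by SWHID strings. The class name `NodeInfoSketch` and the regex-based key check are assumptions made for this example and are not taken from the library; only the `swh:1:<type>:<40 hex digits>` identifier shape comes from the SWHID format.

```python
import re

# Core SWHID shape: swh:1:<object type>:<40 lowercase hex digits>
SWHID_RE = re.compile(r"^swh:1:(cnt|dir|rev|rel|snp):[0-9a-f]{40}$")


class NodeInfoSketch(dict):
    """Hypothetical stand-in: a dict that only accepts SWHID-shaped keys."""

    def __setitem__(self, key, value):
        if not isinstance(key, str) or not SWHID_RE.match(key):
            raise ValueError(f"not a SWHID: {key!r}")
        if not isinstance(value, dict):
            raise ValueError("values must be dicts of node attributes")
        super().__setitem__(key, value)


info = NodeInfoSketch()
info["swh:1:cnt:" + "0" * 40] = {"known": None}
```

Validating on assignment keeps malformed identifiers out of the mapping early, at the cost of a small check on every write.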

swh.scanner.data.init_merkle_node_info(source_tree: Directory, data: MerkleNodeInfo, info: set)

Populate the MerkleNodeInfo with the SWHIDs of the given source tree and the attributes that will be stored.
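The population step might look roughly like the following sketch. It assumes the tree has already been flattened into a list of SWHID strings and uses a plain dict in place of `MerkleNodeInfo`; the function name `init_node_info_sketch` and the `"known"` and `"origin"` attribute names are hypothetical.

```python
def init_node_info_sketch(swhids, data, info):
    """Give every SWHID an entry whose requested attributes start as None."""
    if not info:
        raise ValueError("at least one attribute name is required")
    template = {name: None for name in info}
    for swhid in swhids:
        data[swhid] = dict(template)  # fresh copy per node, no shared state
    return data


swhids = ["swh:1:dir:" + "a" * 40, "swh:1:cnt:" + "b" * 40]
data = init_node_info_sketch(swhids, {}, {"known", "origin"})
```

Copying the template for each node matters: sharing one dict would make an update to one node's attributes visible on every other node.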

async swh.scanner.data.add_origin(source_tree: Directory, data: MerkleNodeInfo, client: Client)

Store origin information about software artifacts retrieved from the Software Heritage graph service.
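Because the function is a coroutine, origin lookups can in principle run concurrently. The sketch below shows that pattern with `asyncio.gather`. `FakeClient`, its `get_origin` method, and the example URL are all hypothetical stand-ins and do not reflect the real `Client` API.

```python
import asyncio


class FakeClient:
    """Hypothetical client; stands in for a real graph-service client."""

    async def get_origin(self, swhid):
        await asyncio.sleep(0)  # simulate a network round trip
        return f"https://example.org/{swhid[-4:]}"


async def add_origin_sketch(swhids, data, client):
    """Fetch origins concurrently and record them under each SWHID."""
    origins = await asyncio.gather(*(client.get_origin(s) for s in swhids))
    for swhid, origin in zip(swhids, origins):
        data.setdefault(swhid, {})["origin"] = origin
    return data


swhids = ["swh:1:dir:" + "a" * 40, "swh:1:dir:" + "b" * 40]
result = asyncio.run(add_origin_sketch(swhids, {}, FakeClient()))
```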

swh.scanner.data.get_directory_data(root_path: str, source_tree: Directory, nodes_data: MerkleNodeInfo, directory_data: Dict = {}) → Dict[Path, dict]

Get content information for each directory inside source_tree.


Returns: A dictionary mapping each directory path to information about that directory's contents.
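The shape of such a mapping can be sketched with a nested dict standing in for the source tree. Everything here is an assumption for illustration: the function name, the `"files"` payload, and the tree model (nested dicts are subdirectories, other values are files). The sketch uses a `None` default for its accumulator, a common way to avoid sharing state between calls.

```python
from pathlib import Path


def directory_data_sketch(root, tree, out=None):
    """Map every directory path to the names of the files directly inside it."""
    out = {} if out is None else out  # None default: no state shared across calls
    files = sorted(k for k, v in tree.items() if not isinstance(v, dict))
    out[Path(root)] = {"files": files}
    for name, child in tree.items():
        if isinstance(child, dict):  # a nested dict models a subdirectory
            directory_data_sketch(Path(root) / name, child, out)
    return out


tree = {"README": "r", "src": {"a.py": "a", "lib": {"b.py": "b"}}}
result = directory_data_sketch("/project", tree)
```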

swh.scanner.data.directory_content(node: Directory, nodes_data: MerkleNodeInfo) → Tuple[int, int]

Count known contents inside the given directory.


Returns: A tuple with the total number of contents inside the directory and the number of known contents.
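A minimal sketch of this counting, assuming a directory's children are given as SWHID strings and that each node's info dict carries a boolean `"known"` flag (both are assumptions for this example, as is the function name):

```python
def directory_content_sketch(children, nodes_data):
    """Return (total contents, known contents) for one directory's children."""
    # Only content objects (files) are counted; subdirectories are skipped.
    contents = [swhid for swhid in children if swhid.startswith("swh:1:cnt:")]
    known = sum(1 for swhid in contents if nodes_data[swhid]["known"])
    return len(contents), known


cnt_a = "swh:1:cnt:" + "a" * 40
cnt_b = "swh:1:cnt:" + "b" * 40
sub_dir = "swh:1:dir:" + "c" * 40
nodes_data = {
    cnt_a: {"known": True},
    cnt_b: {"known": False},
    sub_dir: {"known": True},
}
counts = directory_content_sketch([cnt_a, cnt_b, sub_dir], nodes_data)
```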

swh.scanner.data.has_dirs(node: Directory) → bool

Check if the given directory has other directories inside.

swh.scanner.data.get_content_from(node_path: bytes, source_tree: Directory, nodes_data: MerkleNodeInfo) → Dict[bytes, dict]

Get content information from the given directory node.

swh.scanner.data.get_git_ignore_patterns(cwd: Path | None)

swh.scanner.data.get_hg_ignore_patterns(cwd: Path | None)

swh.scanner.data.get_svn_ignore_patterns(cwd: Path | None)

swh.scanner.data.vcs_detected(folder_path: str) → bool

swh.scanner.data.get_vcs_ignore_patterns(cwd: Path | None = None) → List[bytes]

Return a list of all patterns to ignore according to the VCS used for the project being scanned, if any.
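The general pattern of detecting a VCS and collecting its ignore patterns as bytes can be sketched as follows. This is not the library's method: the sketch detects a working copy by its marker directory and, for Git only, reads a top-level `.gitignore`, whereas a real implementation may query the VCS itself. All function names here are hypothetical.

```python
import tempfile
from pathlib import Path

# Marker directories commonly used to recognise a working copy
VCS_MARKERS = {".git": "git", ".hg": "hg", ".svn": "svn"}


def detect_vcs_sketch(folder):
    """Return the name of the first VCS whose marker directory exists, else None."""
    for marker, name in VCS_MARKERS.items():
        if (Path(folder) / marker).is_dir():
            return name
    return None


def ignore_patterns_sketch(folder):
    """Return ignore patterns as bytes; only a top-level .gitignore is read here."""
    if detect_vcs_sketch(folder) != "git":
        return []
    ignore_file = Path(folder) / ".gitignore"
    if not ignore_file.is_file():
        return []
    lines = ignore_file.read_text().splitlines()
    # Skip blank lines and comments, then encode each pattern as bytes
    return [l.strip().encode() for l in lines if l.strip() and not l.startswith("#")]


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".git").mkdir()
    (Path(tmp) / ".gitignore").write_text("# build output\n*.pyc\nbuild/\n")
    patterns = ignore_patterns_sketch(tmp)
    no_vcs = ignore_patterns_sketch(Path(tmp) / "missing")
```

Returning an empty list when no VCS is found lets callers treat "no VCS" and "no ignore rules" the same way.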