swh.scanner.scanner module

async swh.scanner.scanner.swhids_discovery(swhids: List[str], session: aiohttp.client.ClientSession, api_url: str) → Dict[str, Dict[str, bool]]

API Request to get information about the SoftWare Heritage persistent IDentifiers (SWHIDs) given in input.

Parameters

  • swhids – a list of SWHIDs

  • api_url – url for the API request

Returns

A dictionary keyed by the SWHID searched, where value['known'] = True if the SWHID is found and value['known'] = False if the SWHID is not found
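The shape of the returned dictionary can be illustrated with a small helper that splits a result into known and unknown SWHIDs. This is only a sketch: the helper name split_by_known and the sample identifiers (placeholder hashes) are hypothetical and not part of swh.scanner; only the value['known'] layout comes from the description above.

```python
from typing import Dict, List, Tuple


def split_by_known(
    result: Dict[str, Dict[str, bool]]
) -> Tuple[List[str], List[str]]:
    """Split a discovery result into (known, unknown) SWHID lists.

    Hypothetical helper, not part of swh.scanner.
    """
    known = [swhid for swhid, value in result.items() if value["known"]]
    unknown = [swhid for swhid, value in result.items() if not value["known"]]
    return known, unknown


# Hand-written sample with the documented shape (placeholder hashes,
# not real archive objects).
sample = {
    "swh:1:cnt:" + "a" * 40: {"known": True},
    "swh:1:dir:" + "b" * 40: {"known": False},
}
known, unknown = split_by_known(sample)
```

No network access is needed here, because the sample dictionary stands in for what a real API request would return.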

swh.scanner.scanner.directory_filter(path_name: Union[str, bytes], exclude_patterns: Iterable[Pattern[bytes]]) → bool

Check whether path_name matches any of the patterns given in input.

It is also used as a dir_filter function when generating the directory object from swh.model.from_disk.

Returns

False if the directory has to be ignored, True otherwise
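A minimal sketch of this kind of filter follows. It assumes that the exclusion patterns start out as shell-style globs and are compiled into bytes regular expressions with the standard fnmatch and re modules. The names compile_patterns and keep_directory are hypothetical and the actual swh.scanner implementation may differ.

```python
import fnmatch
import re
from typing import Iterable, List, Pattern, Union


def compile_patterns(globs: Iterable[str]) -> List[Pattern[bytes]]:
    """Turn shell-style globs into compiled bytes patterns (assumed format)."""
    return [re.compile(fnmatch.translate(g).encode()) for g in globs]


def keep_directory(
    path_name: Union[str, bytes], exclude_patterns: Iterable[Pattern[bytes]]
) -> bool:
    """Return False if path_name matches an exclusion pattern, True otherwise."""
    if isinstance(path_name, str):
        # Bytes patterns can only be matched against bytes.
        path_name = path_name.encode()
    return not any(p.match(path_name) for p in exclude_patterns)


patterns = compile_patterns(["*.git", "*/node_modules"])
```

With these patterns, keep_directory(b"project/.git", patterns) is False, while keep_directory("project/src", patterns) is True.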

swh.scanner.scanner.get_subpaths(path: pathlib.Path, exclude_patterns: Iterable[Pattern[bytes]]) → Iterator[Tuple[pathlib.Path, str]]

Find the SoftWare Heritage persistent IDentifier (SWHID) of the directories and files under a given path.


Parameters

path – the root path

Yields

pairs of (path, the corresponding SWHID)
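To make the (path, SWHID) pairs concrete, the sketch below computes content SWHIDs for the files under a root using only the standard library. It relies on the fact that a content SWHID is `swh:1:cnt:` followed by the Git blob SHA-1 of the file data. Directory SWHIDs and exclusion patterns are deliberately left out, and the function names are hypothetical, so this is a simplified illustration and not the module's own traversal.

```python
import hashlib
import os
from pathlib import Path
from typing import Iterator, Tuple


def content_swhid(data: bytes) -> str:
    """Compute the content SWHID (Git blob SHA-1) of raw file data."""
    header = b"blob %d\0" % len(data)
    return "swh:1:cnt:" + hashlib.sha1(header + data).hexdigest()


def iter_file_swhids(root: Path) -> Iterator[Tuple[Path, str]]:
    """Yield (path, SWHID) pairs for regular files only (sketch)."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            file_path = Path(dirpath) / name
            yield file_path, content_swhid(file_path.read_bytes())
```

For an empty file, content_swhid(b"") gives swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the same hash Git assigns to an empty blob.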

async swh.scanner.scanner.parse_path(path: pathlib.Path, session: aiohttp.client.ClientSession, api_url: str, exclude_patterns: Iterable[Pattern[bytes]]) → Iterator[Tuple[str, str, bool]]

Check whether the subpaths of the given path are present in the archive.

Parameters

  • path – the source path

  • api_url – url for the API request

Returns

a map containing tuples with: a subpath of the given path, the SWHID of the subpath and the result of the API call
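The way subpaths, SWHIDs and API results combine into (subpath, SWHID, known) tuples can be sketched as follows. Everything here is an assumption made for illustration: check_paths and fake_discover are hypothetical names, and fake_discover replaces the real API request with a stub so that the example runs offline.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, List, Tuple

# Type of a discovery coroutine: list of SWHIDs -> {swhid: {"known": bool}}
Discover = Callable[[List[str]], Awaitable[Dict[str, Dict[str, bool]]]]


async def check_paths(
    pairs: Iterable[Tuple[str, str]], discover: Discover
) -> List[Tuple[str, str, bool]]:
    """Attach the 'known' flag to each (subpath, SWHID) pair (sketch)."""
    pairs = list(pairs)
    result = await discover([swhid for _path, swhid in pairs])
    return [(path, swhid, result[swhid]["known"]) for path, swhid in pairs]


async def fake_discover(swhids: List[str]) -> Dict[str, Dict[str, bool]]:
    # Stand-in for the API call: pretend only content objects are archived.
    return {s: {"known": s.startswith("swh:1:cnt:")} for s in swhids}


# Placeholder hashes, not real archive objects.
pairs = [
    ("src", "swh:1:dir:" + "b" * 40),
    ("src/main.py", "swh:1:cnt:" + "a" * 40),
]
checked = asyncio.run(check_paths(pairs, fake_discover))
```

Passing the discovery coroutine as a parameter keeps the sketch testable without a session or a reachable API URL.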

async swh.scanner.scanner.run(config: Dict[str, Any], root: str, source_tree: swh.scanner.model.Tree, exclude_patterns: Iterable[Pattern[bytes]]) → None

Start scanning from the given root.

It fills the source tree with the paths discovered.

Parameters

  • root – the root path to scan

  • api_url – url for the API request

swh.scanner.scanner.scan(config: Dict[str, Any], root_path: str, exclude_patterns: Iterable[str], out_fmt: str, interactive: bool)

Scan a source code project to discover files and directories already present in the archive.