swh.scanner.policy module

swh.scanner.policy.source_size(source_tree: Directory)

Return the size of a source tree, measured as the number of nodes it contains.
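As an illustration of counting nodes in a tree, here is a minimal sketch. `ToyNode` and `toy_source_size` are hypothetical names, not part of `swh.scanner` or `swh.model`, and counting the root as one node is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyNode:
    """Hypothetical stand-in for a Merkle tree node (not the swh.model class)."""

    name: str
    children: Dict[str, "ToyNode"] = field(default_factory=dict)


def toy_source_size(node: ToyNode) -> int:
    """Count the node itself plus every node beneath it (assumed convention)."""
    return 1 + sum(toy_source_size(child) for child in node.children.values())


tree = ToyNode(
    "project",
    {
        "README": ToyNode("README"),
        "src": ToyNode(
            "src",
            {"main.py": ToyNode("main.py"), "util.py": ToyNode("util.py")},
        ),
    },
)

print(toy_source_size(tree))  # 5: project, README, src, main.py, util.py
```

A file with no children has size 1 under this convention, so an empty project directory also has size 1.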

class swh.scanner.policy.Policy(source_tree: Directory, data: MerkleNodeInfo)

Bases: object

source_tree: Directory

Representation of a source code project directory in the Merkle tree.

data: MerkleNodeInfo

Information about the contents and directories of the Merkle tree.

abstract run(client: WebAPIClient, update_info: Callable[[Any], None] | None = None)

Scan a source code project

class swh.scanner.policy.WebAPIConnection(contents: List[Content], skipped_contents: List[SkippedContent], directories: List[Directory], client: WebAPIClient)

Bases: ArchiveDiscoveryInterface

Use the web APIs to query the archive

content_missing(contents: List[bytes]) -> List[bytes]

List content missing from the archive by sha1

skipped_content_missing(skipped_contents: List[bytes]) -> Iterable[bytes]

List skipped content missing from the archive by sha1

directory_missing(directories: List[bytes]) -> Iterable[bytes]

List directories missing from the archive by sha1
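The three `*_missing` methods share one shape: given a list of hashes, return those the archive does not hold. The sketch below imitates that shape with an in-memory set instead of the web API. `ToyArchiveConnection` and its `archived` set are hypothetical and only mirror the method names and signatures listed above; how the real class talks to `WebAPIClient` is not shown here.

```python
from typing import Iterable, List, Set


class ToyArchiveConnection:
    """Hypothetical in-memory stand-in; the real class queries the web API."""

    def __init__(self, archived: Set[bytes]) -> None:
        # Hashes the pretend archive already holds.
        self.archived = archived

    def _missing(self, hashes: List[bytes]) -> List[bytes]:
        # Preserve input order; keep only hashes the archive does not hold.
        return [h for h in hashes if h not in self.archived]

    def content_missing(self, contents: List[bytes]) -> List[bytes]:
        return self._missing(contents)

    def skipped_content_missing(
        self, skipped_contents: List[bytes]
    ) -> Iterable[bytes]:
        return self._missing(skipped_contents)

    def directory_missing(self, directories: List[bytes]) -> Iterable[bytes]:
        return self._missing(directories)


conn = ToyArchiveConnection(archived={b"aa", b"bb"})
missing = conn.content_missing([b"aa", b"cc", b"bb", b"dd"])
print(missing)  # [b'cc', b'dd']
```

Returning the missing subset, not the known one, lets a caller treat every hash absent from the result as already archived.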

class swh.scanner.policy.RandomDirSamplingPriority(source_tree: Directory, data: MerkleNodeInfo)

Bases: Policy

Check the Merkle tree by querying directories in random order. If a directory is unknown to the archive, mark all of its ancestors as unknown; otherwise, mark all of its descendants as known. Finally, check all remaining file contents.

run(client: WebAPIClient, update_info: Callable[[Any], None] | None = None)

Scan a source code project
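The sampling strategy described for RandomDirSamplingPriority can be sketched on a toy tree. Everything here is hypothetical: `ToyNode`, `link`, `walk`, `scan`, and the `is_archived` oracle stand in for the real Merkle tree and archive queries. The sketch queries one directory at a time and assumes, as the description implies, that a known directory has only known descendants.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterator, List, Optional


@dataclass
class ToyNode:
    name: str
    is_dir: bool
    children: List["ToyNode"] = field(default_factory=list)
    parent: Optional["ToyNode"] = None
    known: Optional[bool] = None  # None means "not decided yet"


def link(node: ToyNode) -> ToyNode:
    """Fill in parent pointers so ancestors can be reached from any node."""
    for child in node.children:
        child.parent = node
        link(child)
    return node


def walk(node: ToyNode) -> Iterator[ToyNode]:
    """Yield the node and all of its descendants."""
    yield node
    for child in node.children:
        yield from walk(child)


def scan(
    root: ToyNode, is_archived: Callable[[ToyNode], bool], rng: random.Random
) -> Dict[str, Optional[bool]]:
    directories = [n for n in walk(root) if n.is_dir]
    rng.shuffle(directories)  # visit directories in random order
    for directory in directories:
        if directory.known is not None:
            continue  # already decided through an ancestor or descendant
        if is_archived(directory):
            for node in walk(directory):  # known: so are all descendants
                node.known = True
        else:
            node: Optional[ToyNode] = directory
            while node is not None:  # unknown: so are all ancestors
                node.known = False
                node = node.parent
    for node in walk(root):  # finally, check the remaining file contents
        if not node.is_dir and node.known is None:
            node.known = is_archived(node)
    return {n.name: n.known for n in walk(root)}


root = link(ToyNode("project", True, [
    ToyNode("vendor", True, [ToyNode("lib.c", False)]),
    ToyNode("src", True, [ToyNode("main.c", False), ToyNode("new.c", False)]),
]))
archived_names = {"vendor", "lib.c", "main.c"}
result = scan(root, lambda n: n.name in archived_names, random.Random(0))
print(result)
```

In this toy tree the final labels do not depend on the random order; the order only changes how many queries are needed. For example, finding `vendor` known settles `lib.c` without a separate lookup.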