swh.scanner.policy module
- swh.scanner.policy.source_size(source_tree: Directory)
Return the size of a source tree, measured as the number of nodes it contains.
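As a rough illustration of what "number of nodes" means, the sketch below counts nodes in a toy tree. `ToyNode` and `toy_source_size` are hypothetical names, not part of `swh.scanner` or `swh.model`, and counting the root itself is an assumption made here.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ToyNode:
    """Hypothetical stand-in for a Merkle tree node; not the swh.model class."""

    name: str
    children: Dict[str, "ToyNode"] = field(default_factory=dict)


def toy_source_size(node: ToyNode) -> int:
    """Count the node itself plus every node below it."""
    return 1 + sum(toy_source_size(child) for child in node.children.values())


root = ToyNode(
    "root",
    {
        "src": ToyNode("src", {"main.py": ToyNode("main.py")}),
        "README": ToyNode("README"),
    },
)
size = toy_source_size(root)  # root, src, main.py and README are all counted
```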
- class swh.scanner.policy.Policy(source_tree: Directory, data: MerkleNodeInfo)
Bases: object
- data: MerkleNodeInfo
Information about the contents and directories of the Merkle tree.
- class swh.scanner.policy.LazyBFS(source_tree: Directory, data: MerkleNodeInfo)
Bases: Policy
Traverse the Merkle tree breadth-first. Descend only into directories that the archive does not know; for a known directory, mark all of its downstream contents as known.
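The lazy breadth-first idea can be sketched on a toy tree as follows. This is a minimal sketch, not the class's actual code: `TREE`, `ARCHIVED`, `descendants` and `lazy_bfs` are hypothetical, and a set membership test stands in for the archive lookup.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy tree: directory id -> child ids. Ids that are not keys
# are file contents.
TREE: Dict[str, List[str]] = {
    "root": ["lib", "app", "README"],
    "lib": ["lib/a.c", "lib/b.c"],
    "app": ["app/main.c"],
}
# Hypothetical archive state: ids the archive already holds.
ARCHIVED: Set[str] = {"lib", "lib/a.c", "lib/b.c", "README"}


def descendants(node: str) -> List[str]:
    """Return every node strictly below `node`."""
    out: List[str] = []
    for child in TREE.get(node, []):
        out.append(child)
        out.extend(descendants(child))
    return out


def lazy_bfs(root: str) -> Tuple[Dict[str, bool], List[str]]:
    known: Dict[str, bool] = {}
    queried: List[str] = []  # nodes that needed an archive lookup
    level = [root]
    while level:
        next_level: List[str] = []
        for node in level:
            queried.append(node)
            known[node] = node in ARCHIVED  # stands in for the lookup
            if node in TREE:
                if known[node]:
                    # Known directory: everything below is known too.
                    for sub in descendants(node):
                        known[sub] = True
                else:
                    # Unknown directory: its children must be checked.
                    next_level.extend(TREE[node])
        level = next_level
    return known, queried


known, queried = lazy_bfs("root")
```

In this toy run only five of the seven nodes are looked up, because the two files under the known directory `lib` are marked without a query.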
- class swh.scanner.policy.GreedyBFS(source_tree: Directory, data: MerkleNodeInfo)
Bases: Policy
Query graph nodes in chunks (to make the most of the Web API rate limit) and mark the downstream contents of known directories as known.
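A minimal sketch of chunked querying, under stated assumptions: `CHUNK_SIZE` is a made-up value (the real limit would come from the Web API client), each chunk counts as one batched request, and `TREE`, `ARCHIVED` and the helper names are hypothetical.

```python
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

# Hypothetical toy tree: directory id -> child ids (other ids are files).
TREE: Dict[str, List[str]] = {
    "root": ["lib", "app", "README"],
    "lib": ["lib/a.c", "lib/b.c"],
    "app": ["app/main.c"],
}
ARCHIVED: Set[str] = {"lib", "lib/a.c", "lib/b.c", "README"}
CHUNK_SIZE = 3  # assumption for the example only


def chunks(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def bfs_order(root: str) -> List[str]:
    order, queue = [], [root]
    while queue:
        node = queue.pop(0)
        order.append(node)
        queue.extend(TREE.get(node, []))
    return order


def descendants(node: str) -> List[str]:
    return bfs_order(node)[1:]


def greedy_bfs(root: str) -> Tuple[Dict[str, Optional[bool]], int]:
    order = bfs_order(root)
    known: Dict[str, Optional[bool]] = {n: None for n in order}
    requests = 0
    for chunk in chunks(order, CHUNK_SIZE):
        # Skip nodes already resolved through a known ancestor.
        pending = [n for n in chunk if known[n] is None]
        if not pending:
            continue
        requests += 1  # one batched lookup per chunk
        for node in pending:
            known[node] = node in ARCHIVED
        for node in pending:
            if node in TREE and known[node]:
                for sub in descendants(node):
                    known[sub] = True
    return known, requests


known, requests = greedy_bfs("root")
```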
- class swh.scanner.policy.FilePriority(source_tree: Directory, data: MerkleNodeInfo)
Bases: Policy
Query all file contents in the Merkle tree, and mark every upstream directory of an unknown file content as unknown. Then check the directories whose status is still undetermined, and mark all sub-directories of known directories as known.
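The two phases described above might look like this on a toy tree. It is a sketch under assumptions, not the class's code: `TREE`, `ARCHIVED`, `walk` and `file_priority` are hypothetical, and `None` stands for a status that has not been determined yet.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy tree: directory id -> child ids (other ids are files).
TREE: Dict[str, List[str]] = {
    "root": ["lib", "app"],
    "lib": ["lib/core", "lib/a.c"],
    "lib/core": ["lib/core/x.c"],
    "app": ["app/main.c"],
}
ARCHIVED: Set[str] = {"lib", "lib/core", "lib/a.c", "lib/core/x.c"}


def walk(node: str) -> List[str]:
    """Pre-order traversal starting at `node` (the node itself included)."""
    out = [node]
    for child in TREE.get(node, []):
        out.extend(walk(child))
    return out


def file_priority(root: str) -> Tuple[Dict[str, Optional[bool]], int]:
    parent = {c: d for d, children in TREE.items() for c in children}
    nodes = walk(root)
    known: Dict[str, Optional[bool]] = {n: None for n in nodes}

    # Phase 1: look up every file; an unknown file makes all of its
    # upstream directories unknown.
    for node in nodes:
        if node not in TREE:
            known[node] = node in ARCHIVED
            if not known[node]:
                up = parent.get(node)
                while up is not None:
                    known[up] = False
                    up = parent.get(up)

    # Phase 2: look up directories that are still undetermined; a known
    # directory settles its undetermined sub-directories as known.
    dir_lookups = 0
    for node in nodes:
        if node in TREE and known[node] is None:
            dir_lookups += 1
            known[node] = node in ARCHIVED
            if known[node]:
                for sub in walk(node):
                    if sub in TREE and known[sub] is None:
                        known[sub] = True
    return known, dir_lookups


known, dir_lookups = file_priority("root")
```

Here a single directory lookup (for `lib`) is enough in phase 2: `root` and `app` were already ruled out by the unknown file, and `lib/core` is settled through `lib`.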
- class swh.scanner.policy.DirectoryPriority(source_tree: Directory, data: MerkleNodeInfo)
Bases: Policy
Query every directory in the Merkle tree that has at least one file content. If a directory is unknown, mark all of its upstream directories as unknown; otherwise mark all of its downstream contents as known. Finally, check the status of empty directories and of all remaining file contents.
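A possible reading of this policy on a toy tree is sketched below. Several points are assumptions: "downstream contents" is taken to mean a directory's direct file children, directories are visited deepest first, and `TREE`, `ARCHIVED`, `walk` and `directory_priority` are hypothetical names.

```python
from typing import Dict, List, Optional, Set

# Hypothetical toy tree: directory id -> child ids (other ids are files).
TREE: Dict[str, List[str]] = {
    "root": ["lib", "app", "empty"],
    "lib": ["lib/a.c"],
    "app": ["app/main.c"],
    "empty": [],
}
ARCHIVED: Set[str] = {"lib", "lib/a.c", "empty"}


def walk(node: str) -> List[str]:
    """Pre-order traversal starting at `node` (the node itself included)."""
    out = [node]
    for child in TREE.get(node, []):
        out.extend(walk(child))
    return out


def has_files(directory: str) -> bool:
    """True if the directory has at least one direct file child."""
    return any(child not in TREE for child in TREE[directory])


def directory_priority(root: str) -> Dict[str, Optional[bool]]:
    parent = {c: d for d, children in TREE.items() for c in children}
    nodes = walk(root)
    known: Dict[str, Optional[bool]] = {n: None for n in nodes}

    # Step 1: directories holding files, deepest first (assumed order).
    candidates = [n for n in nodes if n in TREE and has_files(n)]
    for directory in reversed(candidates):
        if known[directory] is not None:
            continue
        known[directory] = directory in ARCHIVED  # stands in for a lookup
        if known[directory]:
            for child in TREE[directory]:
                if child not in TREE:
                    known[child] = True
        else:
            up = parent.get(directory)
            while up is not None:
                known[up] = False
                up = parent.get(up)

    # Steps 2 and 3: remaining directories, then remaining files.
    for node in nodes:
        if known[node] is None:
            known[node] = node in ARCHIVED
    return known


known = directory_priority("root")
```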
- class swh.scanner.policy.WebAPIConnection(contents: List[Content], skipped_contents: List[SkippedContent], directories: List[Directory], client: Client)
Bases: ArchiveDiscoveryInterface
Use the Web API to query the archive.
- async content_missing(contents: List[bytes]) → List[bytes]
List the contents missing from the archive, identified by sha1.
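The shape of the call (a list of hashes in, the missing subset out, awaited as a coroutine) can be illustrated with an in-memory stand-in. `FakeConnection` and `ARCHIVED` are hypothetical; the real class would go through its Web API client rather than a local set, and the exact hash flavour it expects is not confirmed here.

```python
import asyncio
import hashlib
from typing import List, Set


def sha1(data: bytes) -> bytes:
    """Return the 20-byte SHA-1 digest of `data`."""
    return hashlib.sha1(data).digest()


# Hypothetical archive state, keyed by digest.
ARCHIVED: Set[bytes] = {sha1(b"already archived")}


class FakeConnection:
    """In-memory stand-in with the same method shape as documented above."""

    async def content_missing(self, contents: List[bytes]) -> List[bytes]:
        # Keep only the digests the (fake) archive does not hold.
        return [digest for digest in contents if digest not in ARCHIVED]


hashes = [sha1(b"already archived"), sha1(b"brand new file")]
missing = asyncio.run(FakeConnection().content_missing(hashes))
```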
- class swh.scanner.policy.RandomDirSamplingPriority(source_tree: Directory, data: MerkleNodeInfo)
Bases: Policy
Check the Merkle tree by querying random directories. For an unknown directory, mark all of its ancestors as unknown; otherwise mark all of its descendants as known. Finally, check all the remaining file contents.
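The sampling loop can be sketched as follows, assuming one directory is drawn per round. The actual sampling strategy may differ, and `TREE`, `ARCHIVED`, `walk` and `random_dir_sampling` are hypothetical names used only for this example.

```python
import random
from typing import Dict, List, Optional, Set

# Hypothetical toy tree: directory id -> child ids (other ids are files).
TREE: Dict[str, List[str]] = {
    "root": ["lib", "app"],
    "lib": ["lib/core", "lib/a.c"],
    "lib/core": ["lib/core/x.c"],
    "app": ["app/main.c"],
}
ARCHIVED: Set[str] = {"lib", "lib/core", "lib/a.c", "lib/core/x.c"}


def walk(node: str) -> List[str]:
    """Pre-order traversal starting at `node` (the node itself included)."""
    out = [node]
    for child in TREE.get(node, []):
        out.extend(walk(child))
    return out


def random_dir_sampling(root: str, seed: int = 0) -> Dict[str, Optional[bool]]:
    rng = random.Random(seed)  # seeded so the example is repeatable
    parent = {c: d for d, children in TREE.items() for c in children}
    nodes = walk(root)
    known: Dict[str, Optional[bool]] = {n: None for n in nodes}

    undecided = [n for n in nodes if n in TREE]
    while undecided:
        directory = rng.choice(undecided)
        known[directory] = directory in ARCHIVED  # stands in for a lookup
        if known[directory]:
            # Known directory: all descendants are known.
            for sub in walk(directory):
                known[sub] = True
        else:
            # Unknown directory: all ancestors are unknown.
            up = parent.get(directory)
            while up is not None:
                known[up] = False
                up = parent.get(up)
        undecided = [n for n in undecided if known[n] is None]

    # Finally, look up the file contents that are still undetermined.
    for node in nodes:
        if known[node] is None:
            known[node] = node in ARCHIVED
    return known


known = random_dir_sampling("root")
```

As long as the toy archive is consistent (a known directory implies known descendants), the final statuses do not depend on the sampling order; only the number of lookups does.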