swh.scanner.policy module#

swh.scanner.policy.source_size(source_tree: Directory)[source]#

Return the size of a source tree, measured as the number of nodes it contains.
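As a rough illustration of what "number of nodes" means, the sketch below counts the nodes of a toy tree. It does not use swh.model: the nested-dict layout and the `count_nodes` helper are assumptions made for this example only. Counting the root directory as a node is also an assumption.

```python
from typing import Dict, Union

# Hypothetical stand-in for a source tree: a directory is a dict mapping
# entry names to sub-directories (dict) or file contents (bytes).
Tree = Dict[str, Union["Tree", bytes]]


def count_nodes(tree: Tree) -> int:
    """Count every directory and file node, including the root directory."""
    total = 1  # the directory itself
    for child in tree.values():
        if isinstance(child, dict):
            total += count_nodes(child)  # a sub-directory and what it holds
        else:
            total += 1  # a file content
    return total


project: Tree = {
    "README": b"hello",
    "src": {"main.py": b"print('hi')", "pkg": {}},
}
print(count_nodes(project))
```

Here the count covers the root, `README`, `src`, `main.py` and the empty `pkg` directory.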

class swh.scanner.policy.Policy(source_tree: Directory, data: MerkleNodeInfo)[source]#

Bases: object

source_tree: Directory#

Representation of a source code project directory in the Merkle tree.

data: MerkleNodeInfo#

Information about the contents and directories of the Merkle tree.

abstract async run(client: Client)[source]#

Scan a source code project

class swh.scanner.policy.LazyBFS(source_tree: Directory, data: MerkleNodeInfo)[source]#

Bases: Policy

Read nodes in the Merkle tree in breadth-first (BFS) order. Look up only the directories that are unknown; when a directory is known, mark all of its downstream contents as known.
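The traversal described above can be sketched on a toy tree. This is a minimal illustration under stated assumptions, not the scanner's implementation: `Node`, `is_known` and `lazy_bfs` are hypothetical names, the archive is faked with a set of identifiers, and nodes are looked up one at a time rather than in batches.

```python
import asyncio
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Set, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for a Merkle tree node (not the swh.model class)."""

    ident: str
    children: List["Node"] = field(default_factory=list)  # empty for files

    def descendants(self) -> Iterator["Node"]:
        """Yield every node below this one."""
        for child in self.children:
            yield child
            yield from child.descendants()


async def is_known(ident: str, archive: Set[str]) -> bool:
    """Fake lookup; a real policy would ask the Web API through its client."""
    await asyncio.sleep(0)
    return ident in archive


async def lazy_bfs(root: Node, archive: Set[str]) -> Tuple[Dict[str, bool], List[str]]:
    """Return the known/unknown status of each node and the list of lookups."""
    status: Dict[str, bool] = {}
    queried: List[str] = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        queried.append(node.ident)
        status[node.ident] = await is_known(node.ident, archive)
        if status[node.ident]:
            # Known node: everything below it is known, no further lookups.
            for sub in node.descendants():
                status[sub.ident] = True
        else:
            # Unknown node: its children have to be looked up in turn.
            queue.extend(node.children)
    return status, queried


tree = Node("root", [
    Node("a", [Node("a1"), Node("a2")]),
    Node("b", [Node("b1"), Node("b2")]),
])
status, queried = asyncio.run(lazy_bfs(tree, {"a", "a1", "a2", "b1"}))
print(queried)
```

In this run `a1` and `a2` are never looked up, because their parent `a` is already known.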

async run(client: Client)[source]#

Scan a source code project

data: MerkleNodeInfo#

Information about the contents and directories of the Merkle tree.

source_tree: Directory#

Representation of a source code project directory in the Merkle tree.

class swh.scanner.policy.GreedyBFS(source_tree: Directory, data: MerkleNodeInfo)[source]#

Bases: Policy

Query graph nodes in chunks (to make the most of the Web API rate limit) and mark the downstream contents of known directories as known.

async run(client: Client)[source]#

Scan a source code project

async get_nodes_chunks(client: Client, ssize: int)[source]#

Query chunks of QUERY_LIMIT nodes at a time in order to fill the Web API rate limit. If the source tree contains fewer than QUERY_LIMIT nodes, all of them are queried at once.
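The chunking itself can be sketched as follows. The value given to `QUERY_LIMIT` here and the `chunks` helper are assumptions for illustration only; the actual method also takes a client and performs the queries, which this sketch leaves out.

```python
from typing import Iterator, List, Sequence

QUERY_LIMIT = 1000  # assumed value, used only for this sketch


def chunks(ids: Sequence[str], size: int = QUERY_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of at most ``size`` identifiers."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


large = [f"node-{i}" for i in range(2500)]
small = [f"node-{i}" for i in range(300)]

print([len(c) for c in chunks(large)])  # several chunks, the last one partial
print([len(c) for c in chunks(small)])  # fewer than QUERY_LIMIT: one chunk
```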

data: MerkleNodeInfo#

Information about the contents and directories of the Merkle tree.

source_tree: Directory#

Representation of a source code project directory in the Merkle tree.

class swh.scanner.policy.FilePriority(source_tree: Directory, data: MerkleNodeInfo)[source]#

Bases: Policy

Check the Merkle tree by querying all the file contents; when a file content is unknown, mark all of its upstream directories as unknown. Finally, check all the directories whose status is still unknown and mark all the sub-directories of known directories as known.
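The two passes can be sketched on a toy project. This is a simplified, synchronous illustration under stated assumptions: nodes are identified by their path instead of a hash, the archive is a plain set, and `ancestors` and `file_priority` are hypothetical helpers rather than scanner functions.

```python
from typing import Dict, List, Set


def ancestors(path: str) -> List[str]:
    """Return every directory enclosing ``path``, from the root down."""
    parts = path.split("/")[:-1]
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def file_priority(files: Set[str], dirs: Set[str], archive: Set[str]) -> Dict[str, bool]:
    status: Dict[str, bool] = {}
    # Pass 1: look up every file content.
    for path in sorted(files):
        status[path] = path in archive
        if not status[path]:
            # An unknown file makes all of its enclosing directories unknown.
            for directory in ancestors(path):
                status[directory] = False
    # Pass 2: look up the directories that have no status yet, parents first
    # (a parent path is always shorter than the paths below it).
    for directory in sorted(dirs, key=lambda p: (len(p), p)):
        if directory in status:
            continue
        status[directory] = directory in archive
        if status[directory]:
            # A known directory implies that its sub-directories are known.
            for sub in dirs:
                if sub.startswith(directory + "/"):
                    status[sub] = True
    return status


files = {"proj/a/x.py", "proj/b/c/y.py", "proj/b/z.py"}
dirs = {"proj", "proj/a", "proj/b", "proj/b/c"}
archive = {"proj/b", "proj/b/c", "proj/b/c/y.py", "proj/b/z.py"}
result = file_priority(files, dirs, archive)
for path in sorted(result):
    print(path, result[path])
```

In this example only `proj/b` needs a directory lookup: `proj` and `proj/a` were already marked unknown because of `x.py`, and `proj/b/c` is marked known through its parent.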

async run(client: Client)[source]#

Scan a source code project

data: MerkleNodeInfo#

Information about the contents and directories of the Merkle tree.

source_tree: Directory#

Representation of a source code project directory in the Merkle tree.

class swh.scanner.policy.DirectoryPriority(source_tree: Directory, data: MerkleNodeInfo)[source]#

Bases: Policy

Check the Merkle tree by querying all the directories that have at least one file content. When such a directory is unknown, mark all of its upstream directories as unknown; otherwise, mark all of its downstream contents as known. Finally, check the status of empty directories and of all the remaining file contents.
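A simplified, synchronous sketch of this strategy follows. It rests on several assumptions: nodes are identified by their path instead of a hash, the archive is a plain set, `ancestors` and `directory_priority` are hypothetical helpers, and "downstream contents" is read as every file and sub-directory below a known directory.

```python
from typing import Dict, List, Set


def ancestors(path: str) -> List[str]:
    """Return every directory enclosing ``path``, from the root down."""
    parts = path.split("/")[:-1]
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def directory_priority(files: Set[str], dirs: Set[str], archive: Set[str]) -> Dict[str, bool]:
    status: Dict[str, bool] = {}
    # Directories that directly hold at least one file content.
    with_files = {path.rsplit("/", 1)[0] for path in files}
    # Pass 1: look up those directories, parents before children.
    for directory in sorted(with_files, key=lambda p: (len(p), p)):
        if directory in status:
            continue
        status[directory] = directory in archive
        if status[directory]:
            # Known directory: everything below it is known as well.
            for node in files | dirs:
                if node.startswith(directory + "/"):
                    status[node] = True
        else:
            # Unknown directory: every enclosing directory is unknown too.
            for parent in ancestors(directory):
                status[parent] = False
    # Pass 2: look up what is left (directories without files, leftover files).
    for node in sorted(files | dirs):
        if node not in status:
            status[node] = node in archive
    return status


files = {"proj/a/x.py", "proj/b/y.py"}
dirs = {"proj", "proj/a", "proj/b", "proj/empty"}
archive = {"proj/b", "proj/b/y.py", "proj/empty"}
result = directory_priority(files, dirs, archive)
for path in sorted(result):
    print(path, result[path])
```

Here `proj/b/y.py` is never looked up on its own, while `proj/a/x.py` and `proj/empty` are only resolved in the final pass.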

async run(client: Client)[source]#

Scan a source code project

has_contents(directory: Directory)[source]#

Check whether the given directory has contents.

data: MerkleNodeInfo#

Information about the contents and directories of the Merkle tree.

source_tree: Directory#

Representation of a source code project directory in the Merkle tree.

get_contents(dir_: Directory)[source]#

Get all the contents of a given directory

class swh.scanner.policy.QueryAll(source_tree: Directory, data: MerkleNodeInfo)[source]#

Bases: Policy

Check the status of every node in the Merkle tree.

data: MerkleNodeInfo#

Information about the contents and directories of the Merkle tree.

source_tree: Directory#

Representation of a source code project directory in the Merkle tree.

async run(client: Client)[source]#

Scan a source code project