swh.scanner.policy module

swh.scanner.policy.source_size(source_tree: swh.model.from_disk.Directory)[source]

Return the size of a source tree as the number of nodes it contains.
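A minimal sketch of what counting nodes can look like. The Node class below is a hypothetical stand-in, not the swh.model.from_disk.Directory class, and the sketch assumes the root itself counts as a node.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    """Hypothetical stand-in for a Merkle tree node (not the swh.model class)."""

    name: str
    children: List["Node"] = field(default_factory=list)

    def iter_tree(self) -> Iterator["Node"]:
        # Yield this node first, then every descendant, depth-first.
        yield self
        for child in self.children:
            yield from child.iter_tree()


def source_size(source_tree: Node) -> int:
    # Count every node reachable from the root (assumption: root included).
    return sum(1 for _ in source_tree.iter_tree())


root = Node("project", [Node("README"), Node("src", [Node("main.py")])])
size = source_size(root)  # project, README, src, main.py
```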

class swh.scanner.policy.Policy(source_tree: swh.model.from_disk.Directory, data: swh.scanner.data.MerkleNodeInfo)[source]

Bases: object

source_tree: swh.model.from_disk.Directory

Representation of a source code project directory in the Merkle tree.

data: swh.scanner.data.MerkleNodeInfo

Information about the contents and directories of the Merkle tree.

abstract async run(client: swh.scanner.client.Client)[source]

Scan a source code project

class swh.scanner.policy.LazyBFS(source_tree: swh.model.from_disk.Directory, data: swh.scanner.data.MerkleNodeInfo)[source]

Bases: swh.scanner.policy.Policy

Traverse the nodes of the Merkle tree breadth-first (BFS). Only unknown directories are explored further; when a directory is known, all of its downstream contents are set to known without being queried.
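The traversal described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: Node, FakeClient, mark_downstream_known and lazy_bfs are hypothetical names, and the fake client returns a plain boolean per identifier rather than the real Web API payload.

```python
import asyncio
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """Hypothetical stand-in for a Merkle tree node."""

    swhid: str
    is_dir: bool = False
    children: List["Node"] = field(default_factory=list)


class FakeClient:
    """Hypothetical stand-in for the Web API client; records what was asked."""

    def __init__(self, archive: Set[str]) -> None:
        self.archive = archive
        self.queried: List[str] = []

    async def known(self, swhids: List[str]) -> Dict[str, bool]:
        self.queried.extend(swhids)
        return {swhid: swhid in self.archive for swhid in swhids}


def mark_downstream_known(node: Node, status: Dict[str, bool]) -> None:
    # Everything below a known directory is known too, so no query is needed.
    stack = list(node.children)
    while stack:
        current = stack.pop()
        status[current.swhid] = True
        stack.extend(current.children)


async def lazy_bfs(root: Node, client: FakeClient) -> Dict[str, bool]:
    status: Dict[str, bool] = {}
    queue = deque([root])
    while queue:
        # Query one whole BFS level per request.
        level = list(queue)
        queue.clear()
        result = await client.known([node.swhid for node in level])
        for node in level:
            status[node.swhid] = result[node.swhid]
            if not node.is_dir:
                continue
            if status[node.swhid]:
                mark_downstream_known(node, status)
            else:
                # Unknown directory: its children must be checked next.
                queue.extend(node.children)
    return status


root = Node("root", True, [
    Node("a.txt"),
    Node("lib", True, [Node("x.py"), Node("y.py")]),
    Node("new", True, [Node("z.py")]),
])
client = FakeClient({"a.txt", "lib", "x.py", "y.py"})
status = asyncio.run(lazy_bfs(root, client))
```

In this example the files under lib are marked known without ever being sent to the client, which is the saving the lazy strategy aims for.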

async run(client: swh.scanner.client.Client)[source]

Scan a source code project

data: swh.scanner.data.MerkleNodeInfo

Information about the contents and directories of the Merkle tree.

source_tree: swh.model.from_disk.Directory

Representation of a source code project directory in the Merkle tree.

class swh.scanner.policy.GreedyBFS(source_tree: swh.model.from_disk.Directory, data: swh.scanner.data.MerkleNodeInfo)[source]

Bases: swh.scanner.policy.Policy

Query graph nodes in chunks (to make the most of the Web API rate limit) and set the downstream contents of known directories to known.

async run(client: swh.scanner.client.Client)[source]

Scan a source code project

async get_nodes_chunks(client: swh.scanner.client.Client, ssize: int)[source]

Query chunks of QUERY_LIMIT nodes at once in order to make full use of the Web API rate limit. If the source code contains fewer than QUERY_LIMIT nodes, all of them are queried in a single chunk.
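A minimal sketch of the chunking idea, independent of the scanner itself. The chunks helper is a hypothetical name, and the QUERY_LIMIT value used here is deliberately tiny for illustration; it is not the value the library uses.

```python
from itertools import islice
from typing import Iterable, Iterator, List

QUERY_LIMIT = 3  # illustrative value only, chosen small for the example


def chunks(items: Iterable[str], size: int) -> Iterator[List[str]]:
    # Consume the iterable lazily, yielding lists of at most `size` items.
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


swhids = [f"node-{i}" for i in range(7)]
batches = list(chunks(swhids, QUERY_LIMIT))    # full chunks plus a remainder
small = list(chunks(swhids[:2], QUERY_LIMIT))  # fewer nodes than the limit
```

Each batch would then be sent to the client in one request; a tree smaller than the limit yields a single batch.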

data: swh.scanner.data.MerkleNodeInfo

Information about the contents and directories of the Merkle tree.

source_tree: swh.model.from_disk.Directory

Representation of a source code project directory in the Merkle tree.

class swh.scanner.policy.FilePriority(source_tree: swh.model.from_disk.Directory, data: swh.scanner.data.MerkleNodeInfo)[source]

Bases: swh.scanner.policy.Policy

Check the Merkle tree by querying all the file contents first; whenever a file content is unknown, set all of its upstream directories to unknown. Finally, check every directory whose status is still undetermined, and set all the sub-directories of known directories to known.
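The two phases described above can be sketched as follows. This is a synchronous simplification under stated assumptions: Node and file_priority are hypothetical names, a set of identifiers stands in for the archive, and a status of None means "not determined yet".

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a Merkle tree node with a parent link."""

    swhid: str
    is_dir: bool = False
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self

    def iter_tree(self) -> Iterator["Node"]:
        yield self
        for child in self.children:
            yield from child.iter_tree()


def file_priority(root: Node, archive: Set[str]) -> Dict[str, Optional[bool]]:
    status: Dict[str, Optional[bool]] = {n.swhid: None for n in root.iter_tree()}

    # Phase 1: look up every file; an unknown file makes all ancestors unknown.
    for node in root.iter_tree():
        if node.is_dir:
            continue
        status[node.swhid] = node.swhid in archive
        if not status[node.swhid]:
            parent = node.parent
            while parent is not None:
                status[parent.swhid] = False
                parent = parent.parent

    # Phase 2: look up directories that are still undetermined; a known
    # directory makes its undetermined sub-directories known as well.
    for node in root.iter_tree():
        if node.is_dir and status[node.swhid] is None:
            status[node.swhid] = node.swhid in archive
            if status[node.swhid]:
                for sub in node.iter_tree():
                    if sub.is_dir and status[sub.swhid] is None:
                        status[sub.swhid] = True
    return status


root = Node("root", True, [
    Node("docs", True, [
        Node("guide.md"),
        Node("img", True, [Node("logo.png")]),
    ]),
    Node("new.py"),
])
status = file_priority(root, {"guide.md", "logo.png", "docs", "img"})
```

Here the unknown file new.py settles the status of root without a lookup, and the known directory docs settles img.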

async run(client: swh.scanner.client.Client)[source]

Scan a source code project

data: swh.scanner.data.MerkleNodeInfo

Information about the contents and directories of the Merkle tree.

source_tree: swh.model.from_disk.Directory

Representation of a source code project directory in the Merkle tree.

class swh.scanner.policy.DirectoryPriority(source_tree: swh.model.from_disk.Directory, data: swh.scanner.data.MerkleNodeInfo)[source]

Bases: swh.scanner.policy.Policy

Check the Merkle tree by querying all the directories that have at least one file content. If such a directory is unknown, set all of its upstream directories to unknown; otherwise, set all of its downstream contents to known. Finally, check the status of empty directories and of all the remaining file contents.
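A sketch of the strategy described above, under stated assumptions: Node and directory_priority are hypothetical names, the code is synchronous, a set of identifiers stands in for the archive, directories are visited deepest first, and "contents" is taken to mean the files that are direct children of a directory.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Set


@dataclass
class Node:
    """Hypothetical stand-in for a Merkle tree node with a parent link."""

    swhid: str
    is_dir: bool = False
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def __post_init__(self) -> None:
        for child in self.children:
            child.parent = self

    def iter_tree(self) -> Iterator["Node"]:
        yield self
        for child in self.children:
            yield from child.iter_tree()


def get_contents(dir_: Node) -> List[Node]:
    # Assumption: only files that are direct children count as contents.
    return [child for child in dir_.children if not child.is_dir]


def has_contents(directory: Node) -> bool:
    return bool(get_contents(directory))


def directory_priority(root: Node, archive: Set[str]) -> Dict[str, Optional[bool]]:
    status: Dict[str, Optional[bool]] = {n.swhid: None for n in root.iter_tree()}

    # Step 1: directories holding at least one file, deepest first.
    dirs = [n for n in root.iter_tree() if n.is_dir and has_contents(n)]
    for dir_ in reversed(dirs):
        if status[dir_.swhid] is not None:
            continue  # already settled by a deeper unknown directory
        status[dir_.swhid] = dir_.swhid in archive
        if status[dir_.swhid]:
            for cnt in get_contents(dir_):
                status[cnt.swhid] = True
        else:
            parent = dir_.parent
            while parent is not None:
                status[parent.swhid] = False
                parent = parent.parent

    # Step 2: everything still undetermined (directories without files,
    # then the remaining file contents) is looked up directly.
    for node in root.iter_tree():
        if status[node.swhid] is None:
            status[node.swhid] = node.swhid in archive
    return status


root = Node("root", True, [
    Node("LICENSE"),
    Node("src", True, [Node("main.py")]),
    Node("new", True, [Node("draft.py")]),
    Node("empty", True),
])
status = directory_priority(root, {"LICENSE", "src", "main.py", "empty"})
```

In this example the unknown directory new settles root without a lookup, while LICENSE and draft.py are only resolved in the final pass.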

async run(client: swh.scanner.client.Client)[source]

Scan a source code project

has_contents(directory: swh.model.from_disk.Directory)[source]

Check whether the given directory has contents.

data: swh.scanner.data.MerkleNodeInfo

Information about the contents and directories of the Merkle tree.

source_tree: swh.model.from_disk.Directory

Representation of a source code project directory in the Merkle tree.

get_contents(dir_: swh.model.from_disk.Directory)[source]

Get all the contents of a given directory

class swh.scanner.policy.QueryAll(source_tree: swh.model.from_disk.Directory, data: swh.scanner.data.MerkleNodeInfo)[source]

Bases: swh.scanner.policy.Policy

Check the status of every node in the Merkle tree.

data: swh.scanner.data.MerkleNodeInfo

Information about the contents and directories of the Merkle tree.

source_tree: swh.model.from_disk.Directory

Representation of a source code project directory in the Merkle tree.

async run(client: swh.scanner.client.Client)[source]

Scan a source code project