swh.graph.luigi.provenance module
Luigi tasks to help compute the provenance of content blobs
This module contains Luigi tasks driving the computation of the Provenance index.
- class swh.graph.luigi.provenance.ListProvenanceNodes(*args, **kwargs)
Bases: luigi.Task
Lists all nodes reachable from releases and ‘head revisions’.
- local_export_path: luigi.parameter.PathParameter
- local_graph_path: luigi.parameter.PathParameter
- graph_name: luigi.parameter.Parameter
- provenance_dir: luigi.parameter.PathParameter
- provenance_node_filter: luigi.parameter.Parameter
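To make "reachable" concrete, here is a minimal sketch of the traversal. It assumes a hypothetical in-memory adjacency dict with shortened node names, instead of the compressed graph and real SWHIDs the task actually reads; the reachable_nodes helper is illustrative only and is not part of swh.graph.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def reachable_nodes(graph: Dict[str, List[str]], roots: Iterable[str]) -> Set[str]:
    """Breadth-first search: every node reachable from ``roots`` (roots included)."""
    seen: Set[str] = set()
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        if node in seen:
            continue
        seen.add(node)
        # Follow outgoing edges (release -> revision -> directory -> content).
        queue.extend(graph.get(node, []))
    return seen


# Hypothetical toy graph; node names are shortened, not real SWHIDs.
graph = {
    "rel:1": ["rev:2"],
    "rev:2": ["rev:1", "dir:root2"],
    "rev:1": ["dir:root1"],
    "dir:root1": ["cnt:a"],
    "dir:root2": ["cnt:a", "dir:sub"],
    "dir:sub": ["cnt:b"],
    "rev:old": ["dir:old"],  # not reachable from the chosen roots
}
# Roots: releases plus head revisions (here, a single release).
nodes = reachable_nodes(graph, roots=["rel:1"])
```

Nodes that hang only off revisions which are neither released nor heads (rev:old here) are left out.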
- class swh.graph.luigi.provenance.ComputeEarliestTimestamps(*args, **kwargs)
Bases: luigi.Task
Creates an array storing, for each directory/content SWHID, the author date of the first revision/release that contains it.
- local_export_path: luigi.parameter.PathParameter
- local_graph_path: luigi.parameter.PathParameter
- graph_name: luigi.parameter.Parameter
- provenance_dir: luigi.parameter.PathParameter
- provenance_node_filter: luigi.parameter.Parameter
- property resources: returns the value of self.max_ram_mb.
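The following is a minimal sketch of what this task computes, under stated assumptions: the directory tree and revisions are hypothetical in-memory dicts, author dates are plain integers, and a dict stands in for the array. It naively walks each revision's tree, which revisits shared subtrees and would not scale to the full graph; it only illustrates the "minimum over all containing revisions" semantics.

```python
from typing import Dict, List, Set, Tuple


def subtree(tree: Dict[str, List[str]], root: str) -> Set[str]:
    """All directories/contents reachable from ``root`` (root included)."""
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(tree.get(node, []))
    return seen


def earliest_timestamps(
    tree: Dict[str, List[str]], revisions: Dict[str, Tuple[str, int]]
) -> Dict[str, int]:
    """Map each directory/content to the smallest author date of a revision containing it."""
    earliest: Dict[str, int] = {}
    for root_dir, date in revisions.values():
        for node in subtree(tree, root_dir):
            if node not in earliest or date < earliest[node]:
                earliest[node] = date
    return earliest


# Hypothetical toy data: revision -> (root directory, author date as an integer).
tree = {"dir:root1": ["cnt:a"], "dir:root2": ["cnt:a", "dir:sub"], "dir:sub": ["cnt:b"]}
revisions = {"rev:1": ("dir:root1", 100), "rev:2": ("dir:root2", 200)}
ts = earliest_timestamps(tree, revisions)
```

With this toy data, cnt:a keeps 100 because rev:1 (dated 100) already contains it, even though rev:2 contains it too.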
- class swh.graph.luigi.provenance.ListDirectoryMaxLeafTimestamp(*args, **kwargs)
Bases: luigi.Task
Creates a file listing every directory/content SWHID, along with the author date and SWHID of the first revision/release it occurs in.
- local_export_path: luigi.parameter.PathParameter
- local_graph_path: luigi.parameter.PathParameter
- graph_name: luigi.parameter.Parameter
- provenance_dir: luigi.parameter.PathParameter
- provenance_node_filter: luigi.parameter.Parameter
- property resources: returns the value of self.max_ram_mb.
- requires() -> Dict[str, Task]: returns LocalGraph and ComputeEarliestTimestamps instances.
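The task name suggests that a directory's timestamp is the maximum, over all contents below it, of their earliest timestamps, which matches the "max leaf date" idea used by swh-provenance. Treating that as an assumption, a memoized recursive sketch over hypothetical dicts could look like this. It omits the first revision/release SWHIDs that the real output also carries, and max_leaf_timestamps is a hypothetical helper.

```python
from typing import Dict, List, Optional


def max_leaf_timestamps(
    tree: Dict[str, List[str]], earliest: Dict[str, int]
) -> Dict[str, Optional[int]]:
    """For each directory, the latest earliest-timestamp among the contents below it."""
    memo: Dict[str, Optional[int]] = {}

    def visit(directory: str) -> Optional[int]:
        if directory in memo:  # directories are shared across revisions: memoize
            return memo[directory]
        best: Optional[int] = None
        for child in tree.get(directory, []):
            # Contents are leaves; subdirectories are resolved recursively.
            t = earliest[child] if child.startswith("cnt:") else visit(child)
            if t is not None and (best is None or t > best):
                best = t
        memo[directory] = best  # None for a directory with no content below it
        return best

    for directory in tree:
        visit(directory)
    return memo


tree = {"dir:root1": ["cnt:a"], "dir:root2": ["cnt:a", "dir:sub"], "dir:sub": ["cnt:b"]}
earliest = {"cnt:a": 100, "cnt:b": 200}  # e.g. earliest timestamps of each content
max_leaf = max_leaf_timestamps(tree, earliest)
```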
- class swh.graph.luigi.provenance.ComputeDirectoryFrontier(*args, **kwargs)
Bases: luigi.Task
Creates a file that contains the “directory frontier” as defined by swh-provenance.
In short, a frontier directory is a directory that directly contains at least one file (content), not only subdirectories, and that appears as a non-root directory in a revision newer than the directory's timestamp as computed by ListDirectoryMaxLeafTimestamp.
- local_export_path: luigi.parameter.PathParameter
- local_graph_path: luigi.parameter.PathParameter
- graph_name: luigi.parameter.Parameter
- provenance_dir: luigi.parameter.PathParameter
- provenance_node_filter: luigi.parameter.Parameter
- max_ram_mb: luigi.parameter.IntParameter
- property resources: returns the value of self.max_ram_mb.
- requires() -> Dict[str, Task]: returns LocalGraph and ListDirectoryMaxLeafTimestamp instances.
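The frontier definition can be read as a per-revision check. Below is a minimal sketch of that reading, assuming hypothetical dicts for the directory tree, the max-leaf timestamps, and revision dates (integers). It examines every non-root directory of each revision; whether the real task stops descending below a frontier directory is not stated here, so treat that detail as an open assumption. directory_frontier is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple


def directory_frontier(
    tree: Dict[str, List[str]],
    max_leaf: Dict[str, int],
    revisions: Dict[str, Tuple[str, int]],
) -> Set[str]:
    """Directories satisfying the frontier condition in at least one revision."""
    frontier: Set[str] = set()
    for root_dir, date in revisions.values():
        # Start from the root's subdirectories: the root itself never qualifies.
        stack = [c for c in tree.get(root_dir, []) if c.startswith("dir:")]
        seen: Set[str] = set()
        while stack:
            directory = stack.pop()
            if directory in seen:
                continue
            seen.add(directory)
            children = tree.get(directory, [])
            has_file = any(c.startswith("cnt:") for c in children)
            # Frontier: directly holds a file and the revision is newer than its timestamp.
            if has_file and date > max_leaf[directory]:
                frontier.add(directory)
            stack.extend(c for c in children if c.startswith("dir:"))
    return frontier


tree = {
    "dir:root1": ["dir:lib"],
    "dir:lib": ["cnt:a"],
    "dir:root3": ["dir:lib", "dir:new"],
    "dir:new": ["cnt:c"],
}
max_leaf = {"dir:root1": 100, "dir:lib": 100, "dir:root3": 300, "dir:new": 300}
revisions = {"rev:1": ("dir:root1", 100), "rev:3": ("dir:root3", 300)}
frontier = directory_frontier(tree, max_leaf, revisions)
```

Here dir:lib is a frontier because rev:3 (dated 300) contains it as a subdirectory while its contents were already known at 100; relative to rev:1, whose date equals that timestamp, it is not.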
- class swh.graph.luigi.provenance.ListFrontierDirectoriesInRevisions(*args, **kwargs)
Bases: luigi.Task
Creates a file that lists, for each “frontier directory” (as defined by swh-provenance), the revisions it occurs in.
While a directory is a frontier only relative to a given revision, the produced file lists all revisions a directory occurs in, for every directory that is a frontier relative to at least one revision.
- local_export_path: luigi.parameter.PathParameter
- local_graph_path: luigi.parameter.PathParameter
- graph_name: luigi.parameter.Parameter
- provenance_dir: luigi.parameter.PathParameter
- provenance_node_filter: luigi.parameter.Parameter
- max_ram_mb: luigi.parameter.IntParameter
- property resources: returns the value of self.max_ram_mb.
- requires() -> Dict[str, Task]: returns LocalGraph and ComputeDirectoryFrontier instances.
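As a minimal sketch of the distinction drawn above, assuming hypothetical in-memory inputs, the pairs can be listed by intersecting each revision's subtree with the frontier set. Assume, as a hypothetical, that dir:lib was found to be a frontier relative to rev:3 only; frontier_directories_in_revisions is an illustrative helper.

```python
from typing import Dict, List, Set, Tuple


def subtree(tree: Dict[str, List[str]], root: str) -> Set[str]:
    """All directories/contents reachable from ``root`` (root included)."""
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(tree.get(node, []))
    return seen


def frontier_directories_in_revisions(
    tree: Dict[str, List[str]],
    frontier: Set[str],
    revisions: Dict[str, Tuple[str, int]],
) -> List[Tuple[str, str]]:
    """(frontier directory, revision) for every revision the directory occurs in."""
    pairs: List[Tuple[str, str]] = []
    for rev in sorted(revisions):
        root_dir, _date = revisions[rev]
        # Keep every frontier directory found anywhere in this revision's tree.
        for directory in sorted(subtree(tree, root_dir) & frontier):
            pairs.append((directory, rev))
    return pairs


tree = {
    "dir:root1": ["dir:lib"],
    "dir:lib": ["cnt:a"],
    "dir:root3": ["dir:lib", "dir:new"],
    "dir:new": ["cnt:c"],
}
frontier = {"dir:lib"}
revisions = {"rev:1": ("dir:root1", 100), "rev:3": ("dir:root3", 300)}
pairs = frontier_directories_in_revisions(tree, frontier, revisions)
```

Both revisions are listed for dir:lib even though it is a frontier only relative to rev:3, which is the behaviour the description calls out.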
- class swh.graph.luigi.provenance.ListContentsInRevisionsWithoutFrontier(*args, **kwargs)
Bases: luigi.Task
Creates a file that contains the list of (file, revision) pairs where the file is reachable from the revision without going through any “directory frontier” as defined by swh-provenance.
In short, a frontier directory is a directory that directly contains at least one file (content), not only subdirectories, and that appears as a non-root directory in a revision newer than the directory's timestamp as computed by ListDirectoryMaxLeafTimestamp.
- local_export_path: luigi.parameter.PathParameter
- local_graph_path: luigi.parameter.PathParameter
- graph_name: luigi.parameter.Parameter
- provenance_dir: luigi.parameter.PathParameter
- provenance_node_filter: luigi.parameter.Parameter
- max_ram_mb: luigi.parameter.IntParameter
- property resources: returns the value of self.max_ram_mb.
- requires() -> Dict[str, Task]: returns LocalGraph and ListDirectoryMaxLeafTimestamp instances.
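A minimal sketch of this traversal, assuming hypothetical in-memory inputs: a depth-first walk from each revision's root directory that records contents and refuses to descend into any directory from the frontier set. Always traversing the root directory itself is an assumption of this sketch, and contents_without_frontier is a hypothetical helper.

```python
from typing import Dict, List, Set, Tuple


def contents_without_frontier(
    tree: Dict[str, List[str]],
    frontier: Set[str],
    revisions: Dict[str, Tuple[str, int]],
) -> List[Tuple[str, str]]:
    """(content, revision) pairs reachable without entering a frontier directory."""
    pairs: Set[Tuple[str, str]] = set()  # a set: a content may be reached via several paths
    for rev, (root_dir, _date) in revisions.items():
        stack, seen = [root_dir], set()
        while stack:
            directory = stack.pop()
            if directory in seen:
                continue
            seen.add(directory)
            for child in tree.get(directory, []):
                if child.startswith("cnt:"):
                    pairs.add((child, rev))
                elif child not in frontier:  # stop at frontier directories
                    stack.append(child)
    return sorted(pairs)


tree = {
    "dir:root1": ["dir:lib"],
    "dir:lib": ["cnt:a"],
    "dir:root3": ["dir:lib", "dir:new", "cnt:readme"],
    "dir:new": ["cnt:c"],
}
frontier = {"dir:lib"}  # hypothetical frontier set
revisions = {"rev:1": ("dir:root1", 100), "rev:3": ("dir:root3", 300)}
pairs = contents_without_frontier(tree, frontier, revisions)
```

Contents under dir:lib are absent from the result; presumably they are accounted for through ListContentsInFrontierDirectories instead.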
- class swh.graph.luigi.provenance.ListContentsInFrontierDirectories(*args, **kwargs)
Bases: luigi.Task
Enumerates all contents in all directories returned by ComputeDirectoryFrontier.
- local_export_path: luigi.parameter.PathParameter
- local_graph_path: luigi.parameter.PathParameter
- graph_name: luigi.parameter.Parameter
- provenance_dir: luigi.parameter.PathParameter
- provenance_node_filter: luigi.parameter.Parameter
- max_ram_mb: luigi.parameter.IntParameter
- property resources: returns the value of self.max_ram_mb.
- requires() -> Dict[str, Task]: returns LocalGraph and ComputeDirectoryFrontier instances.
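A minimal sketch, assuming hypothetical in-memory inputs and assuming that "all contents" means every content reachable below a frontier directory, not just its direct children. As a further assumption about how the outputs fit together, it joins the result with (directory, revision) pairs of the kind ListFrontierDirectoriesInRevisions produces, which yields (content, revision) pairs. Both helper names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def subtree(tree: Dict[str, List[str]], root: str) -> Set[str]:
    """All directories/contents reachable from ``root`` (root included)."""
    stack, seen = [root], set()
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(tree.get(node, []))
    return seen


def contents_in_frontier_directories(
    tree: Dict[str, List[str]], frontier: Set[str]
) -> List[Tuple[str, str]]:
    """(content, frontier directory) for every content anywhere below each frontier directory."""
    pairs: Set[Tuple[str, str]] = set()
    for directory in frontier:
        for node in subtree(tree, directory):
            if node.startswith("cnt:"):
                pairs.add((node, directory))
    return sorted(pairs)


tree = {"dir:lib": ["cnt:a", "dir:util"], "dir:util": ["cnt:u"]}
frontier = {"dir:lib"}
in_frontier = contents_in_frontier_directories(tree, frontier)

# Hypothetical (directory, revision) pairs; joining on the directory gives (content, revision).
frontier_in_revisions = [("dir:lib", "rev:1"), ("dir:lib", "rev:3")]
content_in_revisions = sorted(
    (content, rev)
    for content, directory in in_frontier
    for frontier_dir, rev in frontier_in_revisions
    if directory == frontier_dir
)
```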
- class swh.graph.luigi.provenance.RunProvenance(*args, **kwargs)
Bases: luigi.WrapperTask
(Transitively) depends on all provenance tasks.
- local_export_path: luigi.parameter.PathParameter
- local_graph_path: luigi.parameter.PathParameter
- graph_name: luigi.parameter.Parameter
- provenance_dir: luigi.parameter.PathParameter
- provenance_node_filter: luigi.parameter.Parameter
- max_ram_mb: luigi.parameter.IntParameter
- requires(): returns ListContentsInFrontierDirectories and ListContentsInRevisionsWithoutFrontier instances.
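The requires() methods documented in this module form a dependency graph. As a small sketch, it can be ordered with the standard library's graphlib (Python 3.9+). The dict below transcribes only the dependencies documented on this page (ListProvenanceNodes and ComputeEarliestTimestamps list none here), so it may be incomplete relative to the actual code.

```python
from graphlib import TopologicalSorter

# Dependencies transcribed from the documented requires() entries (task -> prerequisites).
DOCUMENTED_REQUIRES = {
    "ListDirectoryMaxLeafTimestamp": {"LocalGraph", "ComputeEarliestTimestamps"},
    "ComputeDirectoryFrontier": {"LocalGraph", "ListDirectoryMaxLeafTimestamp"},
    "ListFrontierDirectoriesInRevisions": {"LocalGraph", "ComputeDirectoryFrontier"},
    "ListContentsInRevisionsWithoutFrontier": {"LocalGraph", "ListDirectoryMaxLeafTimestamp"},
    "ListContentsInFrontierDirectories": {"LocalGraph", "ComputeDirectoryFrontier"},
    "RunProvenance": {
        "ListContentsInFrontierDirectories",
        "ListContentsInRevisionsWithoutFrontier",
    },
}

# static_order() yields every task after all of its prerequisites.
order = list(TopologicalSorter(DOCUMENTED_REQUIRES).static_order())
```

Luigi's scheduler resolves this ordering itself when RunProvenance is built; the sketch only makes the documented ordering explicit.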