swh.graph.luigi.topology module

class swh.graph.luigi.topology.TopoSort(*args, **kwargs)

Bases: Task

Creates a file that contains all SWHIDs in topological order from a compressed graph.

local_graph_path = <luigi.parameter.PathParameter object>
topological_order_dir = <luigi.parameter.PathParameter object>
graph_name = <luigi.parameter.Parameter object>
object_types = <luigi.parameter.Parameter object>
direction = <luigi.parameter.ChoiceParameter object>
algorithm = <luigi.parameter.ChoiceParameter object>

property resources

Return the estimated RAM use of this task.

requires() → List[Task]

Returns an instance of LocalGraph.

output() → Target

.csv.zst file that contains the topological order.

run() → None

Runs the ‘toposort’ command from tools/topology and compresses its output.
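To illustrate what "topological order" means for the file this task produces, here is a minimal sketch using Kahn's algorithm on a toy graph. It is not the implementation behind the ‘toposort’ command: the function name `topological_order`, the node labels, and the dict-based graph representation are all assumptions made for the example.

```python
from collections import deque


def topological_order(successors):
    """Return the nodes of a DAG so that every node precedes its successors.

    `successors` maps each node to the list of nodes it points to.
    Illustrative only: this is plain Kahn's algorithm, not the swh-graph tool.
    """
    # Count incoming edges for every node.
    in_degree = {node: 0 for node in successors}
    for targets in successors.values():
        for target in targets:
            in_degree[target] = in_degree.get(target, 0) + 1

    # Nodes without incoming edges can be emitted immediately.
    ready = deque(sorted(node for node, deg in in_degree.items() if deg == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in successors.get(node, ()):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                ready.append(target)

    if len(order) != len(in_degree):
        raise ValueError("graph contains a cycle")
    return order


# Hypothetical toy graph: one revision, one directory, two contents.
toy_graph = {
    "rev1": ["dir1"],
    "dir1": ["cnt1", "cnt2"],
    "cnt1": [],
    "cnt2": [],
}
order = topological_order(toy_graph)
```

In the real task, the direction parameter presumably decides which way edges are followed; the sketch only follows them forward.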

class swh.graph.luigi.topology.CountPaths(*args, **kwargs)

Bases: Task

Creates a file that lists:

  • the number of paths that end at each node and start from any leaf, and

  • the number of paths that end at each node and start from any other node.

Singleton paths are not counted.
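The two counts above can be computed in a single pass over the nodes in topological order. The sketch below is one way to do it, not the ‘count_paths’ implementation: it assumes that a "leaf" is a node with no predecessor in the traversal direction, and the function name `count_paths` and the dict-based input are hypothetical.

```python
def count_paths(order, predecessors):
    """Count non-singleton paths ending at each node of a DAG.

    `order` lists the nodes in topological order (predecessors first).
    `predecessors` maps each node to the list of its direct predecessors.
    Returns {node: (paths_from_leaves, paths_from_all_nodes)}.
    """
    # Intermediate counts that still include the singleton path [node].
    from_leaves = {}
    from_all = {}
    result = {}
    for node in order:
        preds = predecessors.get(node, [])
        if not preds:
            # A leaf: its only path from a leaf is the singleton path.
            from_leaves[node] = 1
        else:
            # Every path from a leaf reaches the node through a predecessor.
            from_leaves[node] = sum(from_leaves[p] for p in preds)
        # Paths from any node: the singleton, plus extensions of the paths
        # ending at each predecessor.
        from_all[node] = 1 + sum(from_all[p] for p in preds)

        # Drop singleton paths from the reported counts.
        leaf_count = from_leaves[node] if preds else 0
        result[node] = (leaf_count, from_all[node] - 1)
    return result


# Hypothetical diamond-shaped graph: a -> b, a -> c, b -> d, c -> d.
diamond = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
counts = count_paths(["a", "b", "c", "d"], diamond)
```

For node `d`, the four non-singleton paths are b→d, c→d, a→b→d, and a→c→d, of which two start at the leaf `a`. Processing nodes in topological order guarantees that each predecessor's counts are final before they are read, which is why the task depends on TopoSort.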

local_graph_path = <luigi.parameter.PathParameter object>
topological_order_dir = <luigi.parameter.PathParameter object>
graph_name = <luigi.parameter.Parameter object>
object_types = <luigi.parameter.Parameter object>
direction = <luigi.parameter.ChoiceParameter object>

property resources

Return the estimated RAM use of this task.

requires() → Dict[str, Task]

Returns an instance of LocalGraph and one of TopoSort.

output() → Target

.csv.zst file that contains the counts.

nb_lines()

run() → None

Runs the ‘count_paths’ command from tools/topology and compresses its output.

class swh.graph.luigi.topology.PathCountsParquetToS3(*args, **kwargs)

Bases: _ParquetToS3ToAthenaTask

Reads the CSV from CountPaths, converts it to ORC, uploads the ORC to S3, and creates an Athena table for it.

topological_order_dir = <luigi.parameter.PathParameter object>
object_types = <luigi.parameter.Parameter object>
direction = <luigi.parameter.ChoiceParameter object>
dataset_name = <luigi.parameter.Parameter object>
s3_athena_output_location = <swh.dataset.luigi.S3PathParameter object>

requires() → CountPaths

Returns the corresponding CountPaths instance.