swh.graph.luigi.topology module
- class swh.graph.luigi.topology.TopoSort(*args, **kwargs)
Bases: Task
Creates a file that contains all SWHIDs in topological order from a compressed graph.
- local_graph_path = <luigi.parameter.PathParameter object>
- topological_order_dir = <luigi.parameter.PathParameter object>
- graph_name = <luigi.parameter.Parameter object>
- object_types = <luigi.parameter.Parameter object>
- direction = <luigi.parameter.ChoiceParameter object>
- algorithm = <luigi.parameter.ChoiceParameter object>
- property resources
Return the estimated RAM use of this task.
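As an illustration of what "topological order" means here, the following is a minimal in-memory sketch using Kahn's algorithm. It is not the swh-graph implementation: the `topological_order` function, the `toy_graph` dictionary and its node names are hypothetical, and the real task works on a compressed graph and writes SWHIDs to a file rather than returning a list.

```python
from collections import deque


def topological_order(successors):
    """Return the nodes so that each node comes after all of its predecessors.

    `successors` maps a node to the list of nodes it points to (hypothetical
    in-memory stand-in for the compressed graph).
    """
    # Collect every node, including those that only appear as a target.
    nodes = set(successors)
    for targets in successors.values():
        nodes.update(targets)

    # Count incoming arcs for each node.
    indegree = {node: 0 for node in nodes}
    for targets in successors.values():
        for target in targets:
            indegree[target] += 1

    # Start from nodes without predecessors (sorted for a deterministic result).
    ready = deque(sorted(node for node in nodes if indegree[node] == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in successors.get(node, ()):
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)

    if len(order) != len(nodes):
        raise ValueError("graph contains a cycle")
    return order


# Toy graph with made-up node names standing in for SWHIDs.
toy_graph = {"rev1": ["rev2", "rev3"], "rev2": ["rev4"], "rev3": ["rev4"], "rev4": []}
order = topological_order(toy_graph)
```

In this sketch, a node is emitted only once all of its predecessors have been emitted, which is the property a consumer of the ordering would rely on.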
- class swh.graph.luigi.topology.CountPaths(*args, **kwargs)
Bases: Task
Creates a file that lists, for each node:
the number of paths leading to that node that start from a leaf, and
the number of paths leading to that node that start from any other node.
Singleton paths are not counted.
- local_graph_path = <luigi.parameter.PathParameter object>
- topological_order_dir = <luigi.parameter.PathParameter object>
- graph_name = <luigi.parameter.Parameter object>
- object_types = <luigi.parameter.Parameter object>
- direction = <luigi.parameter.ChoiceParameter object>
- property resources
Return the estimated RAM use of this task.
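The two counts described for CountPaths can be sketched on a small in-memory DAG as follows. This is an illustration under stated assumptions, not the task's actual code: the `count_paths` function and the `toy_predecessors` mapping are hypothetical, a "leaf" is taken to be a node with no predecessors in the chosen direction, and "all other nodes" is read as any node other than the target itself (leaves included). Paths of length zero are excluded, matching "singleton paths are not counted".

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def count_paths(predecessors):
    """Return (paths_from_leaves, paths_from_all) for every node of a DAG.

    `predecessors` maps a node to the list of nodes pointing to it
    (hypothetical in-memory stand-in for the compressed graph).
    """
    # static_order() yields each node after all of its predecessors.
    order = list(TopologicalSorter(predecessors).static_order())

    paths_from_leaves = {}
    paths_from_all = {}
    for node in order:
        preds = predecessors.get(node, ())
        # Every path ending at a predecessor extends to this node; the arc
        # from the predecessor itself adds one more path, which counts as a
        # leaf path only if that predecessor has no predecessors.
        paths_from_leaves[node] = sum(
            paths_from_leaves[p] + (0 if predecessors.get(p) else 1) for p in preds
        )
        paths_from_all[node] = sum(paths_from_all[p] + 1 for p in preds)
    return paths_from_leaves, paths_from_all


# Toy diamond-shaped graph: a -> b -> d and a -> c -> d.
toy_predecessors = {"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]}
from_leaves, from_all = count_paths(toy_predecessors)
```

Processing nodes in topological order is what makes a single pass sufficient: by the time a node is visited, the counts of all its predecessors are final.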
- class swh.graph.luigi.topology.PathCountsParquetToS3(*args, **kwargs)
Bases: _ParquetToS3ToAthenaTask
Reads the CSV from CountPaths, converts it to ORC, uploads the ORC to S3, and creates an Athena table for it.
- topological_order_dir = <luigi.parameter.PathParameter object>
- object_types = <luigi.parameter.Parameter object>
- direction = <luigi.parameter.ChoiceParameter object>
- dataset_name = <luigi.parameter.Parameter object>
- s3_athena_output_location = <swh.dataset.luigi.S3PathParameter object>
- requires() → CountPaths
Returns the corresponding CountPaths instance.