swh.graph.luigi package
Submodules
- swh.graph.luigi.compressed_graph module
- Luigi tasks for compression
ObjectTypesParameter
ExtractNodes
ExtractLabels
NodeStats
EdgeStats
LabelStats
Mph
Bv
BvEf
BfsRoots
Bfs
PermuteAndSimplifyBfs
BfsEf
BfsDcf
Llp
PermuteLlp
Offsets
Ef
ComposeOrders
Transpose
TransposeOffsets
TransposeEf
Maps
ExtractPersons
PersonsStats
MphPersons
NodeProperties
PthashLabels
LabelsOrder
FclLabels
EdgeLabels
EdgeLabelsTranspose
EdgeLabelsEf
EdgeLabelsTransposeEf
Stats
CompressGraph
UploadGraphToS3
DownloadGraphFromS3
LocalGraph
- swh.graph.luigi.subdataset module
SelectTopGithubOrigins
ListSwhidsForSubdataset
CreateSubdatasetOnAthena
CreateSubdatasetOnAthena.local_export_path
CreateSubdatasetOnAthena.s3_parent_export_path
CreateSubdatasetOnAthena.s3_export_path
CreateSubdatasetOnAthena.s3_athena_output_location
CreateSubdatasetOnAthena.athena_db_name
CreateSubdatasetOnAthena.athena_parent_db_name
CreateSubdatasetOnAthena.object_types
CreateSubdatasetOnAthena.requires()
CreateSubdatasetOnAthena.output()
CreateSubdatasetOnAthena.run()
- swh.graph.luigi.topology module
- Luigi tasks to analyze, and produce datasets related to, graph topology
TopoSort
ComputeGenerations
UploadGenerationsToS3
UploadGenerationsToS3.local_graph_path
UploadGenerationsToS3.topological_order_dir
UploadGenerationsToS3.dataset_name
UploadGenerationsToS3.graph_name
UploadGenerationsToS3.object_types
UploadGenerationsToS3.direction
UploadGenerationsToS3.requires()
UploadGenerationsToS3.output()
UploadGenerationsToS3.run()
CountPaths
PathCountsParquetToS3
- swh.graph.luigi.utils module
Module contents
Luigi tasks
This package contains Luigi tasks. These come in two kinds:
- in swh.graph.luigi.compressed_graph: an alternative to the ‘swh graph compress’ CLI that can be composed with other tasks, such as swh-export’s
- in other submodules: tasks driving the creation of specific datasets that are generated using the compressed graph
The overall directory structure is:
base_dir/
    <date>[_<flavor>]/
        edges/
            ...
        orc/
            ...
        compressed/
            graph.graph
            graph.mph
            ...
            meta/
                export.json
                compression.json
        datasets/
            contribution_graph.csv.zst
        topology/
            topological_order_dfs.csv.zst
And optionally:
sensitive_base_dir/
    <date>[_<flavor>]/
        persons_sha256_to_name.csv.zst
        datasets/
            contribution_graph.deanonymized.csv.zst
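
As an illustration of the layout above, the following sketch builds the expected paths with `pathlib`. It is not part of swh.graph: the helper names `export_dir_name` and `layout_paths`, the dictionary keys, and the sample date and flavor are hypothetical. It also assumes that `meta/` sits inside `compressed/` and that `topology/` is a sibling of `datasets/`.

```python
from pathlib import Path
from typing import Dict, Optional


def export_dir_name(date: str, flavor: Optional[str] = None) -> str:
    """Return the `<date>[_<flavor>]` directory name (hypothetical helper)."""
    return f"{date}_{flavor}" if flavor else date


def layout_paths(
    base_dir: str,
    date: str,
    flavor: Optional[str] = None,
    sensitive_base_dir: Optional[str] = None,
) -> Dict[str, Path]:
    """Map illustrative keys to the paths named in the directory structure."""
    root = Path(base_dir) / export_dir_name(date, flavor)
    compressed = root / "compressed"
    paths = {
        "edges": root / "edges",
        "orc": root / "orc",
        "graph": compressed / "graph.graph",
        "mph": compressed / "graph.mph",
        # Assumption: meta/ is nested under compressed/.
        "export_meta": compressed / "meta" / "export.json",
        "compression_meta": compressed / "meta" / "compression.json",
        "contribution_graph": root / "datasets" / "contribution_graph.csv.zst",
        # Assumption: topology/ is a sibling of datasets/.
        "topological_order": root / "topology" / "topological_order_dfs.csv.zst",
    }
    if sensitive_base_dir is not None:
        # The sensitive tree is optional and mirrors the same <date>[_<flavor>] name.
        sensitive = Path(sensitive_base_dir) / export_dir_name(date, flavor)
        paths["persons"] = sensitive / "persons_sha256_to_name.csv.zst"
        paths["contribution_graph_deanonymized"] = (
            sensitive / "datasets" / "contribution_graph.deanonymized.csv.zst"
        )
    return paths


if __name__ == "__main__":
    # Sample values only; real dates and flavors depend on the export being used.
    for key, path in layout_paths("/srv/graph", "2022-12-07", "example").items():
        print(f"{key}: {path}")
```

Keeping the path construction in one place like this makes it easier to check that a task's outputs land where downstream tasks expect them.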