swh.graph.luigi package

Submodules

Module contents

Luigi tasks

This package contains Luigi tasks. These come in two kinds:

  • in swh.graph.luigi.compressed_graph: an alternative to the ‘swh graph compress’ CLI that can be composed with other tasks, such as those from swh-dataset

  • in other submodules: tasks driving the creation of specific datasets that are generated using the compressed graph

The overall directory structure is:

base_dir/
    <date>[_<flavor>]/
        edges/
            ...
        orc/
            ...
        compressed/
            graph.graph
            graph.mph
            ...
            meta/
                export.json
                compression.json
        datasets/
            contribution_graph.csv.zst
        topology/
            topological_order_dfs.csv.zst

And optionally:

sensitive_base_dir/
    <date>[_<flavor>]/
        persons_sha256_to_name.csv.zst
        datasets/
            contribution_graph.deanonymized.csv.zst
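
The layout above can be expressed as a few path helpers. This is a minimal sketch, not part of swh.graph: the function names are hypothetical, and the base directory, date, and flavor values used with them are placeholders. Only the directory and file names come from the listing.

```python
from pathlib import PurePosixPath
from typing import Optional


def export_dir(base_dir: str, date: str, flavor: Optional[str] = None) -> PurePosixPath:
    """Return base_dir/<date>[_<flavor>], the root of one export."""
    # The flavor suffix is optional, hence the brackets in the listing.
    name = date if flavor is None else f"{date}_{flavor}"
    return PurePosixPath(base_dir) / name


def compressed_graph_basename(
    base_dir: str, date: str, flavor: Optional[str] = None
) -> PurePosixPath:
    """Return the common prefix of graph.graph, graph.mph, and so on."""
    return export_dir(base_dir, date, flavor) / "compressed" / "graph"


def compression_metadata(
    base_dir: str, date: str, flavor: Optional[str] = None
) -> PurePosixPath:
    """Return the path of compressed/meta/compression.json."""
    return export_dir(base_dir, date, flavor) / "compressed" / "meta" / "compression.json"
```

For example, `compression_metadata("/srv/graph", "2022-12-07")` points at `compressed/meta/compression.json` under the `2022-12-07` export directory.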

class swh.graph.luigi.RunExportCompressUpload(*args, **kwargs)

Bases: Task

Runs the dataset export and the graph compression, then generates datasets using the compressed graph.

requires() → List[Task]

Returns instances of swh.dataset.luigi.RunExportAll and swh.graph.luigi.compressed_graph.UploadGraphToS3, which recursively depend on the whole export and compression pipeline.
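
To illustrate what "recursively depend" means here, the sketch below walks a requires() chain using plain Python stand-ins rather than real Luigi classes. Every class name is an assumption made for illustration: the two intermediate steps are hypothetical, and the actual tasks in swh.dataset and swh.graph have more parameters and more dependencies.

```python
from typing import List, Set


class Task:
    """Stand-in for luigi.Task, reduced to the requires() hook."""

    def requires(self) -> List["Task"]:
        return []


class HypotheticalExportStep(Task):
    """Placeholder for some step of the export pipeline."""


class HypotheticalCompressionStep(Task):
    """Placeholder for some step of the compression pipeline."""

    def requires(self) -> List[Task]:
        return [HypotheticalExportStep()]


class RunExportAllStandIn(Task):
    def requires(self) -> List[Task]:
        return [HypotheticalExportStep()]


class UploadGraphToS3StandIn(Task):
    def requires(self) -> List[Task]:
        return [HypotheticalCompressionStep()]


class RunExportCompressUploadStandIn(Task):
    # Only two direct requirements are declared; the rest is reached through them.
    def requires(self) -> List[Task]:
        return [RunExportAllStandIn(), UploadGraphToS3StandIn()]


def transitive_requirements(task: Task) -> Set[str]:
    """Collect the class names of every task reachable through requires()."""
    names: Set[str] = set()
    stack = list(task.requires())
    while stack:
        dep = stack.pop()
        name = type(dep).__name__
        if name not in names:
            names.add(name)
            stack.extend(dep.requires())
    return names
```

Calling `transitive_requirements(RunExportCompressUploadStandIn())` returns the two direct requirements plus both hypothetical steps, which is why listing only two tasks in requires() is enough to schedule the whole chain.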

complete() → bool

Returns True if the task has at least one output and all of its outputs exist. Returns False otherwise, including when the task has no outputs.

Subclasses may override this method with custom logic.
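
The default rule and a possible override can be sketched without Luigi, using file paths in place of Luigi targets. Both function names are hypothetical, and the "non-empty file" criterion is only an example of custom logic, not something this class is documented to do.

```python
from pathlib import Path
from typing import List


def default_complete(outputs: List[Path]) -> bool:
    """Default rule: a task with no outputs is never complete;
    otherwise every output must exist."""
    if not outputs:
        return False
    return all(path.exists() for path in outputs)


def complete_non_empty(outputs: List[Path]) -> bool:
    """Example of custom logic: additionally require non-empty files."""
    return default_complete(outputs) and all(
        path.stat().st_size > 0 for path in outputs
    )
```

A stricter check like `complete_non_empty` reports an empty output file as incomplete, whereas the default rule only looks at whether the file exists.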