swh.graph.luigi.subdataset module
- class swh.graph.luigi.subdataset.SelectTopGithubOrigins(*args, **kwargs)
Bases: Task
Writes a list of origins selected from popular GitHub repositories.
- local_export_path = <luigi.parameter.PathParameter object>
- num_origins = <luigi.parameter.IntParameter object>
- query = <luigi.parameter.Parameter object>
- class swh.graph.luigi.subdataset.ListSwhidsForSubdataset(*args, **kwargs)
Bases: Task
Lists all SWHIDs reachable from a set of origins.
- select_task = <luigi.parameter.ChoiceParameter object>
- local_export_path = <luigi.parameter.PathParameter object>
- grpc_api = <luigi.parameter.Parameter object>
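As an illustration of what "reachable from a set of origins" means, here is a minimal sketch of a breadth-first traversal over a toy in-memory graph. It is not the task's implementation: the real task takes a grpc_api parameter and presumably queries the graph service, whereas the function name reachable_swhids, the TOY_GRAPH dictionary, and the shortened identifiers below are all hypothetical and chosen only for this example.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Set


def reachable_swhids(
    successors: Dict[str, List[str]], origins: Iterable[str]
) -> Set[str]:
    """Return every node reachable from ``origins``, the origins included."""
    seen: Set[str] = set()
    queue: Deque[str] = deque()
    # Seed the traversal with the selected origins.
    for origin in origins:
        if origin not in seen:
            seen.add(origin)
            queue.append(origin)
    # Standard breadth-first search: visit each node once.
    while queue:
        node = queue.popleft()
        for child in successors.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical toy graph. The identifiers are shortened placeholders,
# not valid SWHIDs.
TOY_GRAPH: Dict[str, List[str]] = {
    "ori:a": ["snp:1"],
    "snp:1": ["rev:1"],
    "rev:1": ["dir:1"],
    "dir:1": ["cnt:1", "cnt:2"],
    "ori:b": ["snp:2"],
    "snp:2": ["rev:1"],
    "ori:c": ["snp:3"],
    "snp:3": [],
}

selected = reachable_swhids(TOY_GRAPH, ["ori:a"])
```

Objects shared between origins (here rev:1 and everything below it) appear only once in the result, and objects that belong only to unselected origins (snp:2, snp:3) are left out.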
- class swh.graph.luigi.subdataset.CreateSubdatasetOnAthena(*args, **kwargs)
Bases: Task
Generates an ORC export from an existing ORC export, filtering out SWHIDs not in the given list.
- local_export_path = <luigi.parameter.PathParameter object>
- s3_parent_export_path = <swh.dataset.luigi.S3PathParameter object>
- s3_export_path = <swh.dataset.luigi.S3PathParameter object>
- s3_athena_output_location = <swh.dataset.luigi.S3PathParameter object>
- athena_db_name = <luigi.parameter.Parameter object>
- athena_parent_db_name = <luigi.parameter.Parameter object>
- object_types = <luigi.parameter.EnumListParameter object>
- requires() → Dict[str, Task]
Returns an instance of ListSwhidsForSubdataset and one of CreateAthena
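The filtering step that CreateSubdatasetOnAthena describes (keep only the objects whose SWHID is in the given list) can be sketched in plain Python as below. This is an assumption-laden illustration, not the task's code: the real task works on ORC tables through Athena, while the row layout, the "swhid" column name, and the helper filter_rows are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set

Row = Dict[str, str]


def filter_rows(
    rows: Iterable[Row], keep: Set[str], swhid_column: str = "swhid"
) -> Iterator[Row]:
    """Yield only the rows whose SWHID is in ``keep``, preserving order."""
    for row in rows:
        if row[swhid_column] in keep:
            yield row


# Hypothetical parent export: one dict per row, with placeholder identifiers.
parent_rows = [
    {"swhid": "rev:1", "message": "initial commit"},
    {"swhid": "rev:2", "message": "unrelated project"},
    {"swhid": "rev:3", "message": "fix typo"},
]

# Hypothetical list of SWHIDs selected for the subdataset.
swhids_to_keep = {"rev:1", "rev:3"}

subdataset_rows = list(filter_rows(parent_rows, swhids_to_keep))
```

Holding the SWHIDs in a set keeps each membership test constant-time on average, which matters once the list is large.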