swh.datasets.luigi.impact module

Luigi tasks to measure institutional impact

This module contains Luigi tasks that compute the impact of an institution across all origins.

class swh.datasets.luigi.impact.ComputeRawImpact(*args, **kwargs)

Bases: Task

Creates a file that lists all origins that contain revrels from a given set of persons, as well as the number of revrels and the first/latest timestamps for each origin.
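The per-origin summary described above can be illustrated with a minimal, standard-library sketch. It is not the task's actual implementation, which delegates to a Java class on the compressed graph. The `OriginImpact` record, the `summarize_impact` helper, and the `(origin, person, timestamp)` row layout are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class OriginImpact(NamedTuple):
    """Per-origin summary (hypothetical record layout, not the real file format)."""

    revrel_count: int
    first_timestamp: int
    latest_timestamp: int


def summarize_impact(
    revrels: Iterable[Tuple[str, str, int]],
    persons: Set[str],
) -> Dict[str, OriginImpact]:
    """Aggregate (origin, person, timestamp) rows authored by the given persons."""
    timestamps = defaultdict(list)
    for origin, person, timestamp in revrels:
        # Only revrels from the selected set of persons count towards impact.
        if person in persons:
            timestamps[origin].append(timestamp)
    # Origins without any matching revrel never appear in the result.
    return {
        origin: OriginImpact(len(ts), min(ts), max(ts))
        for origin, ts in timestamps.items()
    }


# Toy input: three revrels on origin "a" by selected persons, one on "b" by someone else.
rows = [
    ("https://example.org/a", "alice", 1_600_000_000),
    ("https://example.org/a", "bob", 1_500_000_000),
    ("https://example.org/a", "alice", 1_700_000_000),
    ("https://example.org/b", "carol", 1_650_000_000),
]
impact = summarize_impact(rows, persons={"alice", "bob"})
```

In this toy input, only the first origin is reported, because the second one has no revrel from the selected persons.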

local_graph_path = <luigi.parameter.PathParameter object>
graph_name = <luigi.parameter.Parameter object>
persons_path = <luigi.parameter.PathParameter object>
raw_impact_path = <luigi.parameter.PathParameter object>
output_emails = <luigi.parameter.BoolParameter object>
include_ranges = <luigi.parameter.Parameter object>
exclude_ranges = <luigi.parameter.Parameter object>
requires() -> Dict[str, Task]

Returns instances of swh.graph.luigi.compressed_graph.LocalGraph and swh.graph.libs.luigi.topology.ComputeGenerations.

output() -> List[Target]

A .csv.zst file that contains the origin_id<->contributor_id map and the list of origins.

run() -> None

Runs org.softwareheritage.graph.utils.ListOriginContributors and compresses its output.

class swh.datasets.luigi.impact.ComputeIndexedImpact(*args, **kwargs)

Bases: Task

Removes forks from ComputeRawImpact’s output, unless they contain more revrels (or older/newer ones) than the upstream origin.
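One possible reading of this rule is sketched below in plain Python. The `OriginImpact` record, the `forked_from` mapping, and both helper functions are hypothetical. How the real task identifies forks, and how the rule relates to each `fork_filter` choice, is not stated here. Keeping a fork whose upstream has no recorded impact is also an assumption.

```python
from typing import Dict, NamedTuple, Optional


class OriginImpact(NamedTuple):
    """Per-origin summary (hypothetical record layout)."""

    revrel_count: int
    first_timestamp: int
    latest_timestamp: int


def keep_fork(fork: OriginImpact, upstream: Optional[OriginImpact]) -> bool:
    """Return True if the fork adds something the upstream origin lacks."""
    if upstream is None:
        # Assumption: with no upstream data to compare against, keep the fork.
        return True
    return (
        fork.revrel_count > upstream.revrel_count
        or fork.first_timestamp < upstream.first_timestamp
        or fork.latest_timestamp > upstream.latest_timestamp
    )


def filter_forks(
    impact: Dict[str, OriginImpact],
    forked_from: Dict[str, str],
) -> Dict[str, OriginImpact]:
    """Drop forks that bring no more (or older/newer) revrels than their upstream."""
    kept = {}
    for origin, stats in impact.items():
        upstream_url = forked_from.get(origin)
        # Origins that are not forks are always kept.
        if upstream_url is None or keep_fork(stats, impact.get(upstream_url)):
            kept[origin] = stats
    return kept


impact = {
    "upstream": OriginImpact(10, 100, 200),
    "fork-same": OriginImpact(10, 100, 200),
    "fork-newer": OriginImpact(10, 100, 300),
    "fork-more": OriginImpact(12, 100, 200),
}
forked_from = {
    "fork-same": "upstream",
    "fork-newer": "upstream",
    "fork-more": "upstream",
}
kept = filter_forks(impact, forked_from)
```

Here `fork-same` is removed because it duplicates its upstream exactly, while `fork-newer` and `fork-more` survive because each contributes a newer timestamp or a higher revrel count.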

indexer_storage_url = <luigi.parameter.Parameter object>
swh_scheduler_url = <luigi.parameter.Parameter object>
FORK_FILTERS = ['all', 'none', 'without-upstream-contribution', 'with-original-content']
local_graph_path = <luigi.parameter.PathParameter object>
graph_name = <luigi.parameter.Parameter object>
persons_path = <luigi.parameter.PathParameter object>
raw_impact_path = <luigi.parameter.PathParameter object>
output_emails = <luigi.parameter.BoolParameter object>
indexed_impact_path = <luigi.parameter.PathParameter object>
fork_filter = <luigi.parameter.ChoiceParameter object>
requires() -> Dict[str, Task]

Returns instances of swh.graph.luigi.compressed_graph.LocalGraph and swh.graph.libs.luigi.topology.ComputeGenerations.

output() -> List[Target]

A .csv.zst file that contains the origin_id<->contributor_id map and the list of origins.

run() -> None

Runs org.softwareheritage.graph.utils.ListOriginContributors and compresses its output.