swh.loader.metadata.base module

Base module for all metadata fetchers, which the Git loader calls to retrieve metadata from forges about the origins being loaded.

exception swh.loader.metadata.base.InvalidOrigin

Bases: Exception

class swh.loader.metadata.base.BaseMetadataFetcher(origin: Origin, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None, lister_name: str, lister_instance_name: str)

Bases: object

The base class for Software Heritage metadata fetchers.

Fetchers are hooks used by loaders to retrieve extrinsic metadata from forges before archiving repositories.

Each fetcher handles a specific type of forge (not VCS); each fetcher class generally matches a lister class, as they use the same APIs.

  • origin – the origin to retrieve metadata from

  • credentials – This is the same format as for swh.lister.pattern.Lister: dictionary of credentials for all fetchers. The first level identifies the fetcher’s name, the second level the lister instance. The final level is a list of dicts containing the expected credentials for the given instance of that fetcher.

  • session – optional HTTP session to use to send HTTP requests
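The three-level layout of credentials described above can be sketched as a plain nested dictionary. This is only an illustration: the fetcher name, the instance name, and the `username` and `token` keys are hypothetical placeholders, and `credentials_for` is a helper written for this example, not part of the module.

```python
from typing import Dict, List, Optional

# Type alias mirroring the `credentials` annotation in the constructor signature.
CredentialsType = Optional[Dict[str, Dict[str, List[Dict[str, str]]]]]

# Hypothetical values: names and keys below are illustrative only.
credentials: CredentialsType = {
    "github": {  # first level: fetcher name
        "github": [  # second level: lister instance name
            # final level: list of credential dicts for that instance
            {"username": "alice", "token": "example-token-1"},
            {"username": "bob", "token": "example-token-2"},
        ],
    },
}


def credentials_for(
    credentials: CredentialsType, fetcher_name: str, instance_name: str
) -> List[Dict[str, str]]:
    """Return the credential dicts for one fetcher/instance pair, or [] if none."""
    if credentials is None:
        return []
    return credentials.get(fetcher_name, {}).get(instance_name, [])


github_credentials = credentials_for(credentials, "github", "github")
```

Returning an empty list when nothing is configured lets a caller fall back to unauthenticated requests without special-casing `None`.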


The config-friendly name of this fetcher, used to retrieve the first level of credentials.


Set of forge types this metadata fetcher supports. The type names are the same as the names used by listers themselves.

Generally, fetchers match listers one-to-one, in which case this is the set {FETCHER_NAME}.
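As a rough sketch of the one-to-one case, a fetcher class might declare its name and its supported forge types as class attributes. The classes below are stand-ins that do not import or subclass the real `BaseMetadataFetcher`; the attribute name `SUPPORTED_LISTERS`, the forge name `exampleforge`, and the helper `fetchers_for` are assumptions made for illustration.

```python
from typing import List, Set, Type


class FetcherSketch:
    """Stand-in for the base class; NOT the real BaseMetadataFetcher."""

    FETCHER_NAME: str
    SUPPORTED_LISTERS: Set[str]  # assumed attribute name


class ExampleForgeFetcher(FetcherSketch):
    # Config-friendly name, also the first-level key into the credentials dict.
    FETCHER_NAME = "exampleforge"
    # One-to-one case: the only supported forge type is the fetcher's own name.
    SUPPORTED_LISTERS = {FETCHER_NAME}


def fetchers_for(
    lister_name: str, fetcher_classes: List[Type[FetcherSketch]]
) -> List[Type[FetcherSketch]]:
    """Select the fetcher classes that declare support for a given forge type."""
    return [cls for cls in fetcher_classes if lister_name in cls.SUPPORTED_LISTERS]


selected = fetchers_for("exampleforge", [ExampleForgeFetcher])
```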

session() → Session

metadata_authority() → MetadataAuthority

Return information about the metadata authority that issued the metadata extracted from the given origin.

get_origin_metadata() → List[RawExtrinsicMetadata]

Return a list of metadata objects for the given origin.

get_parent_origins() → List[Origin]

If the given origin is a “forge fork” (i.e., created with the “Fork” button of GitHub-like forges), return the list of origins it was forked from, closest parent first.
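The "closest parent first" ordering can be illustrated with a small standalone sketch. It assumes the returned list is the chain of ancestors, works on plain URL strings rather than `Origin` objects, and reads from a hypothetical in-memory fork graph instead of a forge API; none of these names come from the module itself.

```python
from typing import Dict, List, Optional

# Hypothetical fork graph: each origin URL maps to the URL it was forked from,
# or None if it is not a fork.
FORKED_FROM: Dict[str, Optional[str]] = {
    "https://example.org/c/repo": "https://example.org/b/repo",
    "https://example.org/b/repo": "https://example.org/a/repo",
    "https://example.org/a/repo": None,
}


def parent_origin_urls(url: str, forked_from: Dict[str, Optional[str]]) -> List[str]:
    """Walk up the fork chain and return ancestor URLs, closest parent first."""
    parents: List[str] = []
    seen = {url}  # guards against cycles in malformed data
    current = forked_from.get(url)
    while current is not None and current not in seen:
        parents.append(current)
        seen.add(current)
        current = forked_from.get(current)
    return parents


parents_of_c = parent_origin_urls("https://example.org/c/repo", FORKED_FROM)
```

An origin that is not a fork yields an empty list in this sketch.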