swh.lister.nixguix.lister module#

NixGuix lister definition.

This lists artifacts out of the Guix or Nixpkgs manifests.

Artifacts can be of types:

  • upstream git repository (NixOS/nixpkgs, Guix)

  • VCS repositories (svn, git, hg, …)

  • unique file

  • unique tarball

exception swh.lister.nixguix.lister.ArtifactNatureUndetected[source]#

Bases: ValueError

Raised when a remote artifact’s nature (tarball, file) cannot be detected.

exception swh.lister.nixguix.lister.ArtifactNatureMistyped[source]#

Bases: ValueError

Raised when a remote artifact is neither a tarball nor a file.

Errors of this type are probably due to a misconfiguration in the manifest generation that mistyped a vcs repository.

exception swh.lister.nixguix.lister.ArtifactWithoutExtension[source]#

Bases: ValueError

Raised when an artifact’s nature cannot be determined from its name.

class swh.lister.nixguix.lister.ChecksumsComputation(value)[source]#

Bases: Enum

The possible ways to compute the checksums of artifacts listed out of the manifest.

STANDARD = 'standard'#

Standard checksums (e.g. sha1, sha256, …) on the tarball or file.

NAR = 'nar'#

The hash is computed over the NAR archive dump of the output (e.g. an uncompressed directory).

swh.lister.nixguix.lister.MAPPING_CHECKSUMS_COMPUTATION = {'flat': ChecksumsComputation.STANDARD, 'recursive': ChecksumsComputation.NAR}#

Mapping between the outputHashMode from the manifest and how to compute checksums.
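As a minimal sketch of how such a mapping can be consulted, the snippet below redefines the enum and the constant locally so it runs without swh installed. The helper `computation_for`, the shape of the manifest entry, and the fallback to `'flat'` when the key is missing are assumptions of this example, not part of the module.

```python
from enum import Enum
from typing import Any, Dict


class ChecksumsComputation(Enum):
    """Local stand-in mirroring the enum documented above."""

    STANDARD = "standard"
    NAR = "nar"


# Same content as the documented module-level constant.
MAPPING_CHECKSUMS_COMPUTATION = {
    "flat": ChecksumsComputation.STANDARD,
    "recursive": ChecksumsComputation.NAR,
}


def computation_for(entry: Dict[str, Any]) -> ChecksumsComputation:
    """Hypothetical helper: pick the computation mode for one manifest entry.

    Assumes the mode is stored under an "outputHashMode" key and defaults
    to "flat" when the key is absent (an assumption of this sketch).
    """
    mode = entry.get("outputHashMode", "flat")
    return MAPPING_CHECKSUMS_COMPUTATION[mode]
```

An unknown mode raises `KeyError` here; whether the lister itself tolerates unknown modes is not stated in this page.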

class swh.lister.nixguix.lister.Artifact(origin: str, visit_type: str, fallback_urls: List[str], checksums: Dict[str, str], checksums_computation: ChecksumsComputation)[source]#

Bases: object

Metadata information on Remote Artifact with url (tarball or file).

origin: str#

Canonical url to retrieve the tarball artifact.

visit_type: str#

Either ‘tar’ or ‘file’

fallback_urls: List[str]#

List of urls to retrieve the tarball artifact if the canonical url no longer works.

checksums: Dict[str, str]#

Integrity hash converted into a checksum dict.

checksums_computation: ChecksumsComputation#

Checksums computation mode to provide to loaders (e.g. nar, standard, …)
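To illustrate what "integrity hash converted into a checksum dict" can look like for the checksums field, here is a hedged sketch. It assumes the integrity value uses the `<algo>-<base64 digest>` layout (as in Subresource Integrity strings); the helper name `integrity_to_checksums` is hypothetical and this page does not confirm the exact conversion the lister performs.

```python
import base64
import hashlib
from typing import Dict


def integrity_to_checksums(integrity: str) -> Dict[str, str]:
    """Hypothetical helper: turn "<algo>-<base64 digest>" into {algo: hex digest}."""
    algo, _, digest_b64 = integrity.partition("-")
    if not algo or not digest_b64:
        raise ValueError(f"Unexpected integrity format: {integrity!r}")
    return {algo: base64.b64decode(digest_b64).hex()}


# Build a sample integrity string from known content, then convert it back.
digest = hashlib.sha256(b"hello").digest()
integrity = "sha256-" + base64.b64encode(digest).decode()
checksums = integrity_to_checksums(integrity)
```

The resulting dict maps the algorithm name to a hexadecimal digest, which matches the `Dict[str, str]` type of the field.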

class swh.lister.nixguix.lister.VCS(origin: str, type: str, ref: Optional[str] = None)[source]#

Bases: object

Metadata information on VCS.

origin: str#

Origin url of the vcs

type: str#

Type of (d)vcs, e.g. svn, git, hg, …

ref: Optional[str] = None#

Reference: either an svn commit id, a git commit, …

class swh.lister.nixguix.lister.ArtifactType(value)[source]#

Bases: Enum

The possible artifact types listed out of the manifest.

ARTIFACT = 'artifact'#

VCS = 'vcs'#

swh.lister.nixguix.lister.url_endswith(urlparsed, extensions: List[str], raise_when_no_extension: bool = True) → bool[source]#

Determine whether urlparsed ends with one of the extensions passed as parameter.

This also accounts for the edge case of a filename with only a version as name (so no extension at the end).

Raises: ArtifactWithoutExtension in case no extension is available and raise_when_no_extension is True (the default)
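The behaviour described for url_endswith can be sketched as follows. This is not the module's implementation: the function name `url_endswith_sketch`, the `VERSION_ONLY` pattern, and the local exception class are stand-ins introduced for this example, and the real function may treat more cases.

```python
import re
from pathlib import PurePosixPath
from typing import List
from urllib.parse import urlparse

# Assumption: a "version only" filename is dot-separated digits, e.g. "0.1.5".
VERSION_ONLY = re.compile(r"^\d+(\.\d+)*$")


class ArtifactWithoutExtension(ValueError):
    """Local stand-in for the exception documented above."""


def url_endswith_sketch(
    urlparsed, extensions: List[str], raise_when_no_extension: bool = True
) -> bool:
    """Check whether the last path component ends with one of the extensions."""
    name = PurePosixPath(urlparsed.path).name
    suffix = PurePosixPath(name).suffix
    # "0.1.5" has a pathlib suffix (".5") but carries no real extension.
    has_extension = bool(suffix) and not VERSION_ONLY.match(name)
    if not has_extension:
        if raise_when_no_extension:
            raise ArtifactWithoutExtension(f"No extension in {name!r}")
        return False
    return any(name.endswith(f".{ext}") for ext in extensions)


tarball_url = urlparse("https://example.org/pkg/foo-1.0.tar.gz")
version_url = urlparse("https://example.org/pkg/0.1.5")
```

With `raise_when_no_extension=False`, the version-only url simply yields `False` instead of raising.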

swh.lister.nixguix.lister.is_tarball(urls: List[str], request: Optional[Any] = None) → Tuple[bool, str][source]#

Determine whether a list of files actually are tarballs or simple files.

When this cannot be determined from the url alone and request is provided, this executes an HTTP HEAD query on the url to find out. If request is not provided, this raises an ArtifactNatureUndetected exception.


Parameters: urls – names of the remote files whose extension needs to be checked.

Raises:

  • ArtifactNatureUndetected when the artifact’s nature cannot be detected out of its url

  • ArtifactNatureMistyped when the artifact is neither a tarball nor a file. It’s up to the caller to do what’s right with it.

Returns: A tuple (bool, url). The boolean represents whether the url is an archive or not. The second element is the actual url once the HEAD request is issued, as a fallback when the urls alone do not reveal whether they are tarballs.
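The decision flow of is_tarball (extension first, HEAD query as a fallback) can be sketched as below. Everything here is illustrative: `is_tarball_sketch`, the `TARBALL_EXTENSIONS` tuple, and the fake request object are assumptions of this example; the sketch only inspects the first url and does not model ArtifactNatureMistyped.

```python
from pathlib import PurePosixPath
from typing import Any, List, Optional, Tuple
from urllib.parse import urlparse

# Assumption: a short, non-exhaustive list of tarball extensions.
TARBALL_EXTENSIONS = ("tar", "tar.gz", "tgz", "tar.bz2", "tbz2", "tar.xz", "zip")


class ArtifactNatureUndetected(ValueError):
    """Local stand-in for the exception documented above."""


def _has_extension(url: str) -> bool:
    return PurePosixPath(urlparse(url).path).suffix != ""


def _looks_like_tarball(url: str) -> bool:
    name = PurePosixPath(urlparse(url).path).name
    return name.endswith(tuple(f".{ext}" for ext in TARBALL_EXTENSIONS))


def is_tarball_sketch(urls: List[str], request: Optional[Any] = None) -> Tuple[bool, str]:
    url = urls[0]  # simplification: only the first url is inspected
    if _has_extension(url):
        return _looks_like_tarball(url), url
    if request is None:
        raise ArtifactNatureUndetected(f"Cannot detect the nature of {url}")
    # Fallback: HEAD query, then retry the check on the Location header.
    response = request.head(url)
    location = response.headers.get("Location")
    if location and _has_extension(location):
        return _looks_like_tarball(location), location
    raise ArtifactNatureUndetected(f"Cannot detect the nature of {url}")


class FakeResponse:
    """Test double standing in for an HTTP response."""

    headers = {"Location": "https://mirror.example.org/foo-1.0.tar.gz"}


class FakeRequest:
    """Test double exposing the only method the sketch calls."""

    def head(self, url: str) -> FakeResponse:
        return FakeResponse()


result = is_tarball_sketch(["https://example.org/download/latest"], FakeRequest())
```

Passing a session-like object as `request` keeps the function testable without network access, which is why a test double is enough here.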

class swh.lister.nixguix.lister.NixGuixLister(scheduler, url: str, origin_upstream: str, instance: Optional[str] = None, credentials: Optional[Dict[str, Dict[str, List[Dict[str, str]]]]] = None, max_origins_per_page: Optional[int] = None, max_pages: Optional[int] = None, enable_origins: bool = True, canonicalize: bool = True, extensions_to_ignore: List[str] = [], **kwargs: Any)[source]#

Bases: StatelessLister[Tuple[ArtifactType, Union[Artifact, VCS]]]

List Guix or Nix sources out of a public json manifest.

This lister can output:

  • unique tarball (.tar.gz, .tbz2, …)

  • vcs repositories (e.g. git, hg, svn)

  • unique file (.lisp, .py, …)

Note that no last_update is available in either manifest.

For url-type artifacts, this tries to determine the artifact’s nature, tarball or file. It first tries to infer it from the “url” extension. When there is no extension, it falls back to querying (HEAD) the url to retrieve the origin out of the Location response header, and then checks the extension again.

Optionally, when the extensions_to_ignore parameter is provided, it extends the default extensions to ignore (DEFAULT_EXTENSIONS_TO_IGNORE) with those passed. This can be used to drop further binary files detected in the wild.
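A minimal sketch of extending the default ignore list with caller-supplied extensions follows. The values given to `DEFAULT_EXTENSIONS_TO_IGNORE` are placeholders chosen for illustration, not the module's real defaults, and `effective_extensions_to_ignore` is a hypothetical helper.

```python
from typing import List, Optional

# Placeholder values for illustration only; not the module's real defaults.
DEFAULT_EXTENSIONS_TO_IGNORE = ["exe", "iso"]


def effective_extensions_to_ignore(extra: Optional[List[str]] = None) -> List[str]:
    """Return the defaults followed by any extra extensions, without duplicates."""
    merged = list(DEFAULT_EXTENSIONS_TO_IGNORE)  # copy, so the defaults stay untouched
    for ext in extra or []:
        if ext not in merged:
            merged.append(ext)
    return merged
```

Copying the defaults before extending them avoids mutating a shared module-level list across lister instances.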

LISTER_NAME: str = 'nixguix'#

build_artifact(artifact_url: str, artifact_type: str, artifact_ref: Optional[str] = None) → Optional[Tuple[ArtifactType, VCS]][source]#

Build a canonicalized vcs artifact when possible.

get_pages() → Iterator[Tuple[ArtifactType, Union[Artifact, VCS]]][source]#

Yield one page per “typed” origin referenced in the manifest.

vcs_to_listed_origin(artifact: VCS) → Iterator[ListedOrigin][source]#

Given a vcs repository, yield a ListedOrigin.

artifact_to_listed_origin(artifact: Artifact) → Iterator[ListedOrigin][source]#

Given an artifact (tarball, file), yield one ListedOrigin.

get_origins_from_page(artifact_tuple: Tuple[ArtifactType, Union[Artifact, VCS]]) → Iterator[ListedOrigin][source]#

Given an artifact tuple (type, artifact), yield a ListedOrigin.
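The dispatch implied by the (type, artifact) tuple can be sketched as below. The classes are trimmed local stand-ins, plain dicts stand in for ListedOrigin, and the function names are hypothetical; the actual methods may set more fields on the origins they yield.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Dict, Iterator, Optional, Tuple, Union


class ArtifactType(Enum):
    ARTIFACT = "artifact"
    VCS = "vcs"


@dataclass
class Artifact:
    """Trimmed stand-in: only the fields this sketch needs."""

    origin: str
    visit_type: str


@dataclass
class VCS:
    origin: str
    type: str
    ref: Optional[str] = None


def vcs_to_origin(vcs: VCS) -> Iterator[Dict[str, Any]]:
    # A plain dict stands in for ListedOrigin in this sketch.
    yield {"url": vcs.origin, "visit_type": vcs.type}


def artifact_to_origin(artifact: Artifact) -> Iterator[Dict[str, Any]]:
    yield {"url": artifact.origin, "visit_type": artifact.visit_type}


def origins_from_page(
    page: Tuple[ArtifactType, Union[Artifact, VCS]]
) -> Iterator[Dict[str, Any]]:
    """Route one page to the converter matching its artifact type."""
    artifact_type, artifact = page
    if artifact_type is ArtifactType.VCS:
        yield from vcs_to_origin(artifact)
    elif artifact_type is ArtifactType.ARTIFACT:
        yield from artifact_to_origin(artifact)


pages = [
    (ArtifactType.VCS, VCS(origin="https://example.org/repo.git", type="git")),
    (
        ArtifactType.ARTIFACT,
        Artifact(origin="https://example.org/foo-1.0.tar.gz", visit_type="tar"),
    ),
]
origins = [origin for page in pages for origin in origins_from_page(page)]
```

Tagging each page with its type keeps the page iterator simple and lets the conversion step choose the right origin builder.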