get_versions() → Sequence[str]
Return the list of all published package versions.
Returns: Sequence of published versions
get_package_info(version: str) → Generator[Tuple[str, Mapping[str, Any]], None, None]
Given a release version of a package, retrieve the package information associated with that version.
version – Package version
Returns: (branch name, package metadata)
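A minimal sketch of how a subclass might implement get_versions() and get_package_info(), assuming a hypothetical in-memory registry (RELEASES) in place of a real package index. The class name, the registry layout, the example.org URLs and the "releases/<version>" branch naming are illustrative assumptions, not the library's API.

```python
from typing import Any, Dict, Iterator, Mapping, Sequence, Tuple

# Hypothetical registry standing in for a remote package index.
RELEASES: Dict[str, Dict[str, Any]] = {
    "1.0.0": {"url": "https://example.org/pkg-1.0.0.tar.gz",
              "filename": "pkg-1.0.0.tar.gz"},
    "1.1.0": {"url": "https://example.org/pkg-1.1.0.tar.gz",
              "filename": "pkg-1.1.0.tar.gz"},
}


class ExampleLoader:
    """Illustrative stand-in; it does not inherit from the real loader."""

    def get_versions(self) -> Sequence[str]:
        # Every published version, in registry order.
        return list(RELEASES)

    def get_package_info(
        self, version: str
    ) -> Iterator[Tuple[str, Mapping[str, Any]]]:
        # One (branch name, package metadata) pair per artifact;
        # this sketch assumes a single artifact per version.
        yield f"releases/{version}", RELEASES[version]
```

Because get_package_info() is a generator, a version with several artifacts could yield several pairs without changing the caller.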
build_revision(a_metadata: Dict, uncompressed_path: str, directory: bytes) → Optional[swh.model.model.Revision]
Build the revision from the archive metadata (extrinsic artifact metadata) and the intrinsic metadata.
a_metadata – Artifact metadata
uncompressed_path – Artifact uncompressed path on disk
Returns: SWH revision object, or None
get_default_version() → str
Retrieve the latest release version if any.
last_snapshot() → Optional[swh.model.model.Snapshot]
Retrieve the last snapshot out of the last visit.
known_artifacts(snapshot: Optional[swh.model.model.Snapshot]) → Dict[bytes, swh.model.model.BaseModel]
Retrieve the known releases/artifacts for the origin.
snapshot – Snapshot for the visit
Returns: Dict mapping each revision id (bytes) to its metadata dict.
resolve_revision_from(known_artifacts: Dict, artifact_metadata: Dict) → Optional[bytes]
Resolve the revision from the known artifacts and an artifact metadata dict.
If the artifact has already been downloaded, this returns the existing revision targeting that uncompressed artifact directory. Otherwise, this returns None.
known_artifacts – Known artifacts for the origin (revision id to metadata)
artifact_metadata – Information dict
Returns: None or revision identifier
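The lookup described above can be sketched as follows. This is only an illustration: matching artifacts on a "sha256" key is an assumption made for the example, and a real loader may compare different fields of the metadata.

```python
from typing import Any, Dict, Mapping, Optional


def resolve_revision_from(
    known_artifacts: Dict[bytes, Mapping[str, Any]],
    artifact_metadata: Mapping[str, Any],
) -> Optional[bytes]:
    """Return the id of a known revision built from the same artifact."""
    wanted = artifact_metadata.get("sha256")  # hypothetical matching key
    if wanted is None:
        return None  # nothing to compare against
    for revision_id, known in known_artifacts.items():
        if known.get("sha256") == wanted:
            return revision_id  # artifact already ingested
    return None  # unknown artifact: the caller has to download it
```

Returning None rather than raising lets the caller treat "not seen before" as the ordinary case and proceed to the download step.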
download_package(p_info: Mapping[str, Any], tmpdir: str) → List[Tuple[str, Mapping]]
Download artifacts for a specific package. All downloads happen in the tmpdir folder.
The default implementation expects the package info to describe one artifact per package.
Most implementations have one artifact per package, but some have multiple artifacts per package (debian), and some have none because the package itself is the artifact (gnu).
p_info – Information on the package artifacts to download (url, filename, etc…)
tmpdir – Location where the artifacts are downloaded
Returns: List of (path, computed hashes)
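A minimal sketch of the one-artifact-per-package case, using only the standard library. The "url" and "filename" keys of p_info and the shape of the hashes dict ("length", "sha256") are assumptions made for the example; the real implementation may compute other hashes or use a different HTTP client.

```python
import hashlib
import os
import urllib.request
from typing import Any, Dict, List, Mapping, Tuple


def download(url: str, dest_dir: str, filename: str) -> Tuple[str, Dict[str, Any]]:
    """Stream url into dest_dir/filename, hashing the bytes as they arrive."""
    path = os.path.join(dest_dir, filename)
    sha256 = hashlib.sha256()
    length = 0
    with urllib.request.urlopen(url) as response, open(path, "wb") as out:
        # Read fixed-size chunks until the response is exhausted.
        for chunk in iter(lambda: response.read(64 * 1024), b""):
            out.write(chunk)
            sha256.update(chunk)
            length += len(chunk)
    return path, {"length": length, "sha256": sha256.hexdigest()}


def download_package(
    p_info: Mapping[str, Any], tmpdir: str
) -> List[Tuple[str, Mapping[str, Any]]]:
    # One artifact per package: a single (path, computed hashes) pair.
    return [download(p_info["url"], tmpdir, p_info["filename"])]
```

Hashing while streaming avoids reading the artifact a second time, and the computed values can be compared with any checksums published by the upstream index.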
uncompress(dl_artifacts: List[Tuple[str, Mapping[str, Any]]], dest: str) → str
Uncompress the artifact(s) into the destination folder dest.
Some implementations (debian) may also need the p_info dict for additional information.
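A generic extraction step could look like the sketch below, which relies on shutil.unpack_archive from the standard library. The "src" subdirectory name is an assumption made for the example, and the sketch does not cover format-specific cases such as Debian source packages.

```python
import os
import shutil
from typing import Any, List, Mapping, Tuple


def uncompress(
    dl_artifacts: List[Tuple[str, Mapping[str, Any]]], dest: str
) -> str:
    """Unpack every downloaded artifact under dest and return that path."""
    uncompressed_path = os.path.join(dest, "src")  # hypothetical layout
    for artifact_path, _hashes in dl_artifacts:
        # The archive format is inferred from the file extension
        # (.tar.gz, .zip, ...); an unknown extension raises an error.
        shutil.unpack_archive(artifact_path, extract_dir=uncompressed_path)
    return uncompressed_path
```

The function returns the directory that the later steps walk to build Software Heritage objects.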
extra_branches() → Dict[bytes, Mapping[str, Any]]
Return an extra dict of branches that are used to update the set of branches.
load() → Dict
Load the contents associated with a specific origin.
for each package version of the origin
1. Fetch the files for one package version. By default, this can be implemented as a simple HTTP request. Loaders with more specific requirements can override this, e.g.: the PyPI loader checks the integrity of the downloaded files; the Debian loader has to download and check several files for one package version.
2. Extract the downloaded files. By default, this would be a universal archive/tarball extraction. Loaders for specific formats can override this method (for instance, the Debian loader uses dpkg-source -x).
3. Convert the extracted directory to a set of Software Heritage objects, using swh.model.from_disk.
4. Extract the metadata from the unpacked directories. This would only be applicable for “smart” loaders like npm (parsing the package.json), PyPI (parsing the PKG-INFO file) or Debian (parsing debian/changelog and debian/control). On “minimal-metadata” sources such as the GNU archive, the lister should provide the minimal set of metadata needed to populate the revision/release objects (authors, dates) as an argument to the task.
5. Generate the revision/release objects for the given version, from the data generated at steps 3 and 4.
end for each
6. Generate and load the snapshot for the visit. Using the revisions/releases collected at step 5 and the branch information from step 0, generate a snapshot and load it into the Software Heritage archive.
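The overall control flow can be sketched as a loop over the methods documented on this page. This is an illustration under assumptions, not the actual implementation: ingest() is a hypothetical helper standing in for the conversion, metadata extraction and revision-building steps, and the returned dict with a "branches" key is invented for the example.

```python
from typing import Any, Dict


def load_sketch(loader: Any, tmpdir: str) -> Dict[str, Any]:
    """Walk every package version and collect one branch per artifact."""
    known = loader.known_artifacts(loader.last_snapshot())
    branches: Dict[bytes, Any] = {}
    for version in loader.get_versions():
        for branch_name, p_info in loader.get_package_info(version):
            # Reuse the revision if this artifact was already ingested.
            revision_id = loader.resolve_revision_from(known, p_info)
            if revision_id is None:
                dl_artifacts = loader.download_package(p_info, tmpdir)
                path = loader.uncompress(dl_artifacts, dest=tmpdir)
                # Hypothetical helper: convert the directory, extract
                # the metadata and build the revision.
                revision_id = loader.ingest(p_info, path)
            branches[branch_name.encode()] = revision_id
    # Extra branches are merged in before the snapshot is generated.
    branches.update(loader.extra_branches())
    return {"branches": branches}  # assumed result shape
```

Checking resolve_revision_from() before downloading is what keeps repeated visits cheap: only artifacts not seen in the last snapshot are fetched and unpacked.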