swh.loader.package.deposit.loader module

class swh.loader.package.deposit.loader.DepositLoader(url: str, deposit_id: str)[source]

Bases: swh.loader.package.loader.PackageLoader

Load a deposit origin’s artifact releases into the swh archive.

visit_type = 'deposit'
get_versions() → Sequence[str][source]

Return the list of all published package versions.


Returns: Sequence of published versions

get_package_info(version: str) → Generator[Tuple[str, Mapping[str, Any]], None, None][source]

Given a release version of a package, retrieve the associated package information for that version.

Parameters: version – Package version

Returns: (branch name, package metadata)
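
The generator contract above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory artifact table; the function name `get_package_info_sketch`, the `releases/<version>` branch naming, and the metadata keys are illustrative and are not taken from the loader's code.

```python
from typing import Any, Dict, Generator, Mapping, Tuple

# Hypothetical artifact table: version -> artifact metadata (illustrative only).
ARTIFACTS: Dict[str, Dict[str, Any]] = {
    "1.0": {"filename": "archive-1.0.zip"},
    "2.0": {"filename": "archive-2.0.zip"},
}


def get_package_info_sketch(
    version: str,
) -> Generator[Tuple[str, Mapping[str, Any]], None, None]:
    """Yield (branch name, package metadata) pairs for one version."""
    metadata = ARTIFACTS.get(version)
    if metadata is None:
        return  # unknown version: the generator yields nothing
    # The branch naming scheme here is an assumption made for this sketch.
    yield f"releases/{version}", metadata


pairs = list(get_package_info_sketch("1.0"))
```

Returning a generator rather than a list lets a loader yield several artifacts for one version without changing the caller.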

download_package(p_info: Mapping[str, Any], tmpdir: str) → List[Tuple[str, Mapping]][source]

Override to allow use of the dedicated deposit client

build_revision(a_metadata: Dict, uncompressed_path: str, directory: bytes) → Optional[swh.model.model.Revision][source]

Build the revision from the archive metadata (extrinsic artifact metadata) and the intrinsic metadata.

Parameters:

  • a_metadata – Artifact metadata

  • uncompressed_path – Artifact uncompressed path on disk

Returns: SWH data dict

load() → Dict[source]

Load the contents associated with a specific origin.

For each package version of the origin:

  1. Fetch the files for one package version. By default, this can be implemented as a simple HTTP request. Loaders with more specific requirements can override this, e.g. the PyPI loader checks the integrity of the downloaded files; the Debian loader has to download and check several files for one package version.

  2. Extract the downloaded files. By default, this would be a universal archive/tarball extraction.

    Loaders for specific formats can override this method (for instance, the Debian loader uses dpkg-source -x).

  3. Convert the extracted directory to a set of Software Heritage objects, using swh.model.from_disk.

  4. Extract the metadata from the unpacked directories. This would only be applicable for “smart” loaders like npm (parsing the package.json), PyPI (parsing the PKG-INFO file) or Debian (parsing debian/changelog and debian/control).

    On “minimal-metadata” sources such as the GNU archive, the lister should provide the minimal set of metadata needed to populate the revision/release objects (authors, dates) as an argument to the task.

  5. Generate the revision/release objects for the given version, from the data generated at steps 3 and 4.

End for each.

  6. Generate and load the snapshot for the visit.

    Using the revisions/releases collected at step 5 and the branch information from step 0, generate a snapshot and load it into the Software Heritage archive.


See prior fixme
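
The per-version loop of load() can be sketched as follows. This is a rough illustration under stated assumptions, not the loader's implementation: `load_sketch`, its callable parameters, and the shape of the returned dict are all hypothetical stand-ins for the fetch, extract, convert, and metadata steps listed in the description.

```python
from typing import Any, Callable, Dict, Iterable, Mapping


def load_sketch(
    versions: Iterable[str],
    fetch: Callable[[str], str],
    extract: Callable[[str], str],
    to_objects: Callable[[str], bytes],
    read_metadata: Callable[[str], Mapping[str, Any]],
) -> Dict[str, Any]:
    """Run the documented steps with caller-supplied (hypothetical) callables."""
    branches: Dict[str, Dict[str, Any]] = {}
    for version in versions:
        archive_path = fetch(version)  # step 1: fetch the files
        directory_path = extract(archive_path)  # step 2: extract them
        directory_id = to_objects(directory_path)  # step 3: convert to objects
        metadata = read_metadata(directory_path)  # step 4: extract metadata
        # Step 5: build a revision-like record from the results of steps 3 and 4.
        branches[version] = {"directory": directory_id, "metadata": dict(metadata)}
    # Final step: assemble a snapshot-like mapping of branch name to record.
    return {"branches": branches}


# Usage with trivial stand-ins for each step (no network or disk access).
result = load_sketch(
    ["1.0"],
    fetch=lambda v: f"/tmp/pkg-{v}.tar.gz",
    extract=lambda p: p + ".d",
    to_objects=lambda d: d.encode(),
    read_metadata=lambda d: {"author": "unknown"},
)
```

Passing each step as a callable mirrors the way the description lets specific loaders override individual steps while the loop itself stays the same.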

class swh.loader.package.deposit.loader.ApiClient(url, auth: Optional[Mapping[str, str]])[source]

Bases: object

Private deposit API client

do(method: str, url: str, *args, **kwargs)[source]

Internal method to deal with requests, possibly with basic HTTP authentication.

Parameters: method (str) – supported HTTP methods, as in get/post/put

Returns: The request’s execution output
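
A minimal sketch of dispatching on the method name and attaching optional basic HTTP credentials is shown here, using only the standard library. The helper `build_request`, the `username`/`password` keys of the `auth` mapping, and the example URL are assumptions made for illustration; the actual client may be built on a different HTTP library and use different keys.

```python
import base64
from typing import Mapping, Optional
from urllib.request import Request

# The methods named in the parameter description above.
SUPPORTED_METHODS = ("get", "post", "put")


def build_request(
    method: str, url: str, auth: Optional[Mapping[str, str]] = None
) -> Request:
    """Build (but do not send) a request, with optional basic authentication."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"Unsupported method {method!r}")
    request = Request(url, method=method.upper())
    if auth is not None:
        # Assumed mapping keys; basic auth is base64("username:password").
        credentials = f"{auth['username']}:{auth['password']}".encode()
        token = base64.b64encode(credentials).decode()
        request.add_header("Authorization", "Basic " + token)
    return request


# Hypothetical URL and credentials, for illustration only.
req = build_request(
    "get", "https://example.org/deposit/1/", {"username": "user", "password": "pass"}
)
```

The sketch stops short of sending the request, so it can be inspected without network access.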

archive_get(deposit_id: Union[int, str], tmpdir: str, filename: str) → Tuple[str, Dict][source]

Retrieve the deposit’s archive artifact locally.

metadata_url(deposit_id: Union[int, str]) → str[source]
metadata_get(deposit_id: Union[int, str]) → Dict[str, Any][source]

Retrieve the deposit’s metadata artifact as JSON.

status_update(deposit_id: Union[int, str], status: str, revision_id: Optional[str] = None, directory_id: Optional[str] = None, snapshot_id: Optional[str] = None, origin_url: Optional[str] = None)[source]

Update the deposit’s information, including its status and the persistent identifiers resulting from the loading.
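
One plausible way to turn the optional arguments of status_update into a request body is sketched below. The helper `build_status_payload` and the choice to omit unset fields are assumptions; the payload keys simply reuse the parameter names from the signature and may not match what the deposit server expects.

```python
from typing import Dict, Optional


def build_status_payload(
    status: str,
    revision_id: Optional[str] = None,
    directory_id: Optional[str] = None,
    snapshot_id: Optional[str] = None,
    origin_url: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the status and whichever identifiers were provided."""
    payload = {"status": status}
    optional_fields = {
        "revision_id": revision_id,
        "directory_id": directory_id,
        "snapshot_id": snapshot_id,
        "origin_url": origin_url,
    }
    # Assumption: identifiers left as None are omitted rather than sent as null.
    payload.update({k: v for k, v in optional_fields.items() if v is not None})
    return payload


# Hypothetical status value and origin URL, for illustration only.
payload = build_status_payload("done", origin_url="https://example.org/origin")
```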