swh.loader.package.deposit.loader module

class swh.loader.package.deposit.loader.DepositPackageInfo(url: str, filename: str, raw_info: Dict[str, Any], author_date: datetime.datetime, commit_date: datetime.datetime, client: str, id: int, collection: str, author: swh.model.model.Person, committer: swh.model.model.Person, revision_parents: Tuple[bytes, ...], *, directory_extrinsic_metadata: List[swh.loader.package.loader.RawExtrinsicMetadataCore] = [])

Bases: swh.loader.package.loader.BasePackageInfo

Method generated by attrs for class DepositPackageInfo.


author_date: datetime.datetime

dateCreated if any, deposit completed_date otherwise

commit_date: datetime.datetime

datePublished if any, deposit completed_date otherwise

id: int

Internal ID of the deposit in the deposit DB

collection: str

The collection in the deposit; see SWORD specification.

revision_parents: Tuple[bytes, ...]

Revisions created from previous deposits, which will be used as parents of the revision created for this deposit.
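The fallback described for the two date fields can be sketched as follows. This is only an illustration, not the loader's own code: the helper `pick_date` is hypothetical, and the assumption that the dates arrive as ISO 8601 strings under the keys `dateCreated` and `datePublished` is ours.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def pick_date(metadata: Dict[str, Any], key: str, completed_date: datetime) -> datetime:
    """Return metadata[key] parsed as a datetime, else the deposit completion date.

    Hypothetical helper: the key names and the ISO 8601 string format are assumptions.
    """
    raw: Optional[str] = metadata.get(key)
    if raw:
        return datetime.fromisoformat(raw)
    return completed_date


completed = datetime(2021, 3, 1, 12, 0, tzinfo=timezone.utc)
metadata = {"dateCreated": "2020-01-15T08:30:00+00:00"}  # no datePublished given

author_date = pick_date(metadata, "dateCreated", completed)    # taken from the metadata
commit_date = pick_date(metadata, "datePublished", completed)  # falls back to completed
```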

classmethod from_metadata(metadata: Dict[str, Any], url: str, filename: str) → swh.loader.package.deposit.loader.DepositPackageInfo

Returns a unique intrinsic identifier of this package info, or None if this package info is not ‘deduplicatable’ (meaning that we will always load it, instead of checking the ExtID storage to see if we already did)

class swh.loader.package.deposit.loader.DepositLoader(storage: swh.storage.interface.StorageInterface, url: str, deposit_id: str, deposit_client: swh.loader.package.deposit.loader.ApiClient, max_content_size: Optional[int] = None, default_filename: str = 'archive.tar')

Bases: swh.loader.package.loader.PackageLoader[swh.loader.package.deposit.loader.DepositPackageInfo]

Load a deposited artifact into swh archive.


Parameters:

  • url – Origin URL to associate the artifacts/metadata with

  • deposit_id – Deposit identity

  • deposit_client – Deposit API client

visit_type: Optional[str] = 'deposit'
classmethod from_configfile(**kwargs: Any)

Instantiate a loader from the configuration loaded from the SWH_CONFIG_FILENAME envvar, with potential extra keyword arguments if their value is not None.


Parameters:

kwargs – kwargs passed to the loader instantiation
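The rule that extra keyword arguments are applied only when their value is not None can be sketched with plain dictionaries. The function `merge_config` and the sample configuration are hypothetical, and letting keyword arguments override file values is an assumption; the real method reads its configuration from the file named by SWH_CONFIG_FILENAME.

```python
from typing import Any, Dict


def merge_config(config: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Overlay keyword arguments on a config mapping, skipping None values."""
    merged = dict(config)  # copy, so the loaded configuration is left untouched
    merged.update({k: v for k, v in kwargs.items() if v is not None})
    return merged


# Stand-in for the configuration loaded from the file (made-up values).
file_config = {"max_content_size": 1024, "default_filename": "archive.tar"}

params = merge_config(
    file_config,
    default_filename="archive.zip",  # not None: overrides the file value
    max_content_size=None,           # None: the file value is kept
)
```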


Return the list of all published package versions.


Raises:

swh.loader.exception.NotFound – error when failing to read the published package versions.

Returns:

Sequence of published versions


For package loaders that get extrinsic metadata, returns the authority the metadata are coming from.


Returns a MetadataFetcher instance representing this package loader, which is used for adding provenance information to extracted extrinsic metadata, if any.

get_package_info(version: str) → Iterator[Tuple[str, swh.loader.package.deposit.loader.DepositPackageInfo]]

Given a release version of a package, retrieve the associated package information for that version.


Parameters:

version – Package version

Returns:

(branch name, package metadata)

download_package(p_info: swh.loader.package.deposit.loader.DepositPackageInfo, tmpdir: str) → List[Tuple[str, Mapping]]

Override to allow use of the dedicated deposit client

build_revision(p_info: swh.loader.package.deposit.loader.DepositPackageInfo, uncompressed_path: str, directory: bytes) → Optional[swh.model.model.Revision]

Build the revision from the archive metadata (extrinsic artifact metadata) and the intrinsic metadata.

Parameters:

  • p_info – Package information

  • uncompressed_path – Artifact uncompressed path on disk

Returns:

Revision object


Returns metadata items, used by build_extrinsic_origin_metadata.


Returns metadata from the deposit server


Load for a specific origin the associated contents.

  1. Get the list of versions in an origin.

  2. Get the snapshot from the previous run of the loader, and filter out versions that were already loaded, if their extids match.

Then, for each remaining version in the origin:

  3. Fetch the files for one package version. By default, this can be implemented as a simple HTTP request. Loaders with more specific requirements can override this, e.g. the PyPI loader checks the integrity of the downloaded files; the Debian loader has to download and check several files for one package version.

  4. Extract the downloaded files. By default, this would be a universal archive/tarball extraction.

    Loaders for specific formats can override this method (for instance, the Debian loader uses dpkg-source -x).

  5. Convert the extracted directory to a set of Software Heritage objects, using swh.model.from_disk.

  6. Extract the metadata from the unpacked directories. This would only be applicable for “smart” loaders like npm (parsing the package.json), PyPI (parsing the PKG-INFO file) or Debian (parsing debian/changelog and debian/control).

    On “minimal-metadata” sources such as the GNU archive, the lister should provide the minimal set of metadata needed to populate the revision/release objects (authors, dates) as an argument to the task.

  7. Generate the revision/release objects for the given version, from the data generated at steps 5 and 6.

end for each

  8. Generate and load the snapshot for the visit.

    Using the revisions/releases collected at step 7 and the branch information from step 2, generate a snapshot and load it into the Software Heritage archive.
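The overall flow can be sketched as a small, self-contained loop. This is a toy model under stated assumptions, not the loader's implementation: `load_versions`, `fake_process` and the object identifiers are hypothetical, and branches are keyed by version purely for illustration.

```python
from typing import Callable, Dict, List


def load_versions(
    versions: List[str],
    previous: Dict[str, str],
    process: Callable[[str], str],
) -> Dict[str, str]:
    """Return a version -> object id mapping from which a snapshot could be built.

    `previous` plays the role of the branch information kept from the last run.
    """
    branches: Dict[str, str] = {}
    for version in versions:
        if version in previous:
            # Already loaded by an earlier run: reuse the known object.
            branches[version] = previous[version]
            continue
        # Fetch, extract, convert and build the object for a new version.
        branches[version] = process(version)
    return branches


def fake_process(version: str) -> str:
    """Stand-in for the per-version work; returns a made-up object id."""
    return f"rev-{version}"


snapshot = load_versions(["1.0", "1.1", "2.0"], {"1.0": "rev-old-1.0"}, fake_process)
```

Only versions missing from the previous run go through the expensive per-version work; everything else is carried over unchanged.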

finalize_visit(status_visit: str, **kwargs) → Dict[str, Any]

Finalize the visit:

  • flush any unflushed data to storage

  • update origin visit’s status

  • return the task’s status

visit_date: datetime.datetime

See prior fixme

class swh.loader.package.deposit.loader.ApiClient(url, auth: Optional[Mapping[str, str]])

Bases: object

Private Deposit Api client

do(method: str, url: str, *args, **kwargs)

Internal method to deal with requests, possibly with basic HTTP authentication.



Parameters:

method (str) – supported HTTP methods, as in get/post/put

Returns:

The request’s execution output
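One way such a dispatching method could look is sketched below. Everything here is an assumption made for illustration: `FakeSession` and `TinyClient` are hypothetical stand-ins that record calls instead of sending them, and the `username`/`password` keys of the auth mapping are not confirmed by this page.

```python
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

Call = Tuple[str, str, Dict[str, Any]]


class FakeSession:
    """Stand-in for an HTTP session; returns what it was asked to send."""

    def get(self, url: str, **kwargs: Any) -> Call:
        return ("GET", url, kwargs)

    def post(self, url: str, **kwargs: Any) -> Call:
        return ("POST", url, kwargs)

    def put(self, url: str, **kwargs: Any) -> Call:
        return ("PUT", url, kwargs)


class TinyClient:
    """Hypothetical client that maps method names to session callables."""

    def __init__(self, session: FakeSession, auth: Optional[Mapping[str, str]] = None) -> None:
        # Assumed auth layout: {"username": ..., "password": ...}
        self.auth = (auth["username"], auth["password"]) if auth else None
        self._methods: Dict[str, Callable[..., Call]] = {
            "get": session.get,
            "post": session.post,
            "put": session.put,
        }

    def do(self, method: str, url: str, **kwargs: Any) -> Call:
        if method not in self._methods:
            raise ValueError(f"unsupported method: {method}")
        if self.auth is not None:
            kwargs["auth"] = self.auth  # attach credentials only when configured
        return self._methods[method](url, **kwargs)


client = TinyClient(FakeSession(), auth={"username": "user", "password": "secret"})
result = client.do("put", "https://deposit.example.org/1/update/", json={"status": "done"})
```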

archive_get(deposit_id: Union[int, str], tmpdir: str, filename: str) → Tuple[str, Dict]

Retrieve deposit’s archive artifact locally

metadata_url(deposit_id: Union[int, str]) → str

metadata_get(deposit_id: Union[int, str]) → Dict[str, Any]

Retrieve deposit’s metadata artifact as JSON

status_update(deposit_id: Union[int, str], status: str, revision_id: Optional[str] = None, directory_id: Optional[str] = None, snapshot_id: Optional[str] = None, origin_url: Optional[str] = None)

Update the deposit’s information, including its status and the persistent identifiers resulting from the loading.
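Since every identifier is optional, a caller-side view of the update could build a payload that carries only the values that are set. The sketch below is an assumption about how such a payload might be assembled: `build_status_payload` is hypothetical, and neither the payload keys nor the sample identifiers and status value are confirmed by this page.

```python
from typing import Dict, Optional


def build_status_payload(
    status: str,
    revision_id: Optional[str] = None,
    directory_id: Optional[str] = None,
    snapshot_id: Optional[str] = None,
    origin_url: Optional[str] = None,
) -> Dict[str, str]:
    """Collect the status and whichever identifiers were provided."""
    payload: Dict[str, str] = {"status": status}
    optional = {
        "revision_id": revision_id,
        "directory_id": directory_id,
        "snapshot_id": snapshot_id,
        "origin_url": origin_url,
    }
    # Leave out anything that was not supplied.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


payload = build_status_payload(
    "done",  # made-up status value
    snapshot_id="abc123",  # made-up identifier
    origin_url="https://example.org/project",
)
```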