swh.loader.package.deposit.loader module
- class swh.loader.package.deposit.loader.DepositPackageInfo(url: str, version: str, filename: str, author_date: datetime, commit_date: datetime, client: str, id: int, collection: str, author: Person, committer: Person, release_notes: str | None, *, directory_extrinsic_metadata: List[RawExtrinsicMetadataCore] = [], checksums: Dict[str, str] = {})
Bases:
BasePackageInfo
Method generated by attrs for class DepositPackageInfo.
- author_date
codemeta:dateCreated if any, deposit completed_date otherwise
- commit_date
codemeta:datePublished if any, deposit completed_date otherwise
- id
Internal ID of the deposit in the deposit DB
- collection
The collection in the deposit; see SWORD specification.
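The fallback described for author_date and commit_date can be sketched as follows. This is an illustration only: the helper name pick_date, the metadata dictionary layout, and the sample dates are assumptions made for the example and are not taken from the loader's code.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional


def pick_date(
    metadata: Mapping[str, str], key: str, completed_date: datetime
) -> datetime:
    """Return the date stored under `key` if present, else `completed_date`.

    Hypothetical helper: the real loader may parse and normalize dates
    differently.
    """
    raw: Optional[str] = metadata.get(key)
    if raw:
        return datetime.fromisoformat(raw)
    return completed_date


# Sample values, invented for the example.
completed = datetime(2021, 1, 2, tzinfo=timezone.utc)
meta = {"codemeta:dateCreated": "2020-05-04T00:00:00+00:00"}

# dateCreated is present, so it wins over the deposit completed_date.
author_date = pick_date(meta, "codemeta:dateCreated", completed)
# datePublished is absent, so the deposit completed_date is used.
commit_date = pick_date(meta, "codemeta:datePublished", completed)
```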
- class swh.loader.package.deposit.loader.DepositLoader(storage: StorageInterface, url: str, deposit_id: str, deposit_client: ApiClient, default_filename: str = 'archive.tar', **kwargs: Any)
Bases:
PackageLoader[DepositPackageInfo]
Load a deposited artifact into the swh archive.
Constructor
- Parameters:
url – Origin url to associate the artifacts/metadata to
deposit_id – Deposit identity
deposit_client – Deposit api client
- classmethod from_configfile(**kwargs: Any)
Instantiate a loader from the configuration loaded from the SWH_CONFIG_FILENAME envvar, with potential extra keyword arguments if their value is not None.
- Parameters:
kwargs – kwargs passed to the loader instantiation
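The "extra keyword arguments if their value is not None" behaviour can be pictured with a minimal sketch. The function merge_config and the sample configuration are hypothetical; the sketch assumes that non-None keyword arguments are layered over the configuration read from the file, which the description above suggests but does not spell out.

```python
from typing import Any, Dict


def merge_config(config: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Overlay keyword arguments whose value is not None on a base config.

    Hypothetical helper, not the loader's implementation.
    """
    merged = dict(config)  # copy, so the caller's dict is left untouched
    merged.update({k: v for k, v in kwargs.items() if v is not None})
    return merged


# Stand-in for what would be read from the SWH_CONFIG_FILENAME file.
base = {"url": "https://example.org/origin", "default_filename": "archive.tar"}

# default_filename=None is ignored; deposit_id is added.
merged = merge_config(base, default_filename=None, deposit_id="42")
```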
- get_versions() → Sequence[str]
Return the list of all published package versions.
- Raises:
swh.loader.exception.NotFound – when failing to read the published package versions.
- Returns:
Sequence of published versions
- get_metadata_authority() → MetadataAuthority
For package loaders that get extrinsic metadata, returns the authority the metadata are coming from.
- get_metadata_fetcher() → MetadataFetcher
Returns a MetadataFetcher instance representing this package loader, which is used to add provenance information to extracted extrinsic metadata, if any.
- get_package_info(version: str) → Iterator[Tuple[str, DepositPackageInfo]]
Given a release version of a package, retrieve the associated package information for that version.
- Parameters:
version – Package version
- Returns:
(branch name, package metadata)
- download_package(p_info: DepositPackageInfo, tmpdir: str) → List[Tuple[str, Mapping]]
Override to allow use of the dedicated deposit client
- build_release(p_info: DepositPackageInfo, uncompressed_path: str, directory: bytes) → Release | None
Build the release from the archive metadata (extrinsic artifact metadata) and the intrinsic metadata.
- Parameters:
p_info – Package information
uncompressed_path – Artifact uncompressed path on disk
- get_extrinsic_origin_metadata() → List[RawExtrinsicMetadataCore]
Returns metadata items, used by build_extrinsic_origin_metadata.
- load() → Dict
Load the contents associated with a specific origin.
1. Get the list of versions in an origin.
2. Get the snapshot from the previous run of the loader, and filter out versions that were already loaded, if their extids match.
Then, for each remaining version in the origin:
3. Fetch the files for one package version. By default, this can be implemented as a simple HTTP request. Loaders with more specific requirements can override this, e.g. the PyPI loader checks the integrity of the downloaded files; the Debian loader has to download and check several files for one package version.
4. Extract the downloaded files. By default, this would be a universal archive/tarball extraction. Loaders for specific formats can override this method (for instance, the Debian loader uses dpkg-source -x).
5. Convert the extracted directory to a set of Software Heritage objects, using swh.model.from_disk.
6. Extract the metadata from the unpacked directories. This would only be applicable for “smart” loaders like npm (parsing the package.json), PyPI (parsing the PKG-INFO file) or Debian (parsing debian/changelog and debian/control). On “minimal-metadata” sources such as the GNU archive, the lister should provide the minimal set of metadata needed to populate the revision/release objects (authors, dates) as an argument to the task.
7. Generate the revision/release objects for the given version, from the data generated at steps 3 and 4.
end for each
8. Generate and load the snapshot for the visit. Using the revisions/releases collected at step 7 and the branch information from step 2, generate a snapshot and load it into the Software Heritage archive.
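The overall control flow of the steps above can be sketched in a few lines. This is a deliberately simplified illustration, not the loader's code: load_sketch, the process_version callback, the "releases/<version>" branch naming, and the plain dictionary standing in for a snapshot are all assumptions, and already-loaded versions are matched by branch name here rather than by extid.

```python
from typing import Callable, Dict, Iterable


def load_sketch(
    versions: Iterable[str],
    previous: Dict[str, str],
    process_version: Callable[[str], str],
) -> Dict[str, str]:
    """Return a branch name -> release id mapping standing in for a snapshot."""
    branches: Dict[str, str] = {}
    for version in versions:  # the list of versions of the origin
        branch = f"releases/{version}"  # hypothetical branch naming
        if branch in previous:
            # Already loaded by a previous visit: reuse the known release.
            branches[branch] = previous[branch]
            continue
        # Fetch, extract, convert, read metadata, and build the release;
        # all of that is hidden behind the callback in this sketch.
        branches[branch] = process_version(version)
    return branches  # what the final snapshot would be generated from


snapshot = load_sketch(
    ["1.0", "1.1", "2.0"],
    {"releases/1.0": "old-release"},
    lambda version: f"release-for-{version}",
)
```

Hiding the per-version work behind a callback keeps the sketch focused on the filtering and snapshot assembly; a real loader splits that work across overridable methods such as download_package and build_release.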
- class swh.loader.package.deposit.loader.ApiClient(url, auth: Mapping[str, str] | None)
Bases:
object
Private Deposit Api client
- do(method: str, url: str, *args, **kwargs)
Internal method to deal with requests, possibly with basic HTTP authentication.
- Parameters:
method (str) – supported http methods as in get/post/put
- Returns:
The request’s execution output
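As a rough picture of what "requests, possibly with basic HTTP authentication" involves, the sketch below builds (but does not send) a request using only the standard library. It is not ApiClient's implementation: the function build_request, the "username"/"password" keys of the auth mapping, and the example URL are assumptions made for illustration.

```python
import base64
from typing import Mapping, Optional
from urllib.parse import urljoin
from urllib.request import Request


def build_request(
    base_url: str, auth: Optional[Mapping[str, str]], method: str, path: str
) -> Request:
    """Prepare an HTTP request, adding a Basic auth header when auth is given."""
    if method.lower() not in {"get", "post", "put"}:
        raise ValueError(f"Unsupported HTTP method: {method}")
    request = Request(urljoin(base_url, path), method=method.upper())
    if auth is not None:
        # Basic authentication: base64("username:password")
        credentials = f"{auth['username']}:{auth['password']}".encode()
        token = base64.b64encode(credentials).decode()
        request.add_header("Authorization", f"Basic {token}")
    return request


# Hypothetical server URL and credentials; nothing is sent over the network.
request = build_request(
    "https://deposit.example.org/1/private/",
    {"username": "user", "password": "pass"},
    "get",
    "42/meta/",
)
```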
- archive_get(deposit_id: int | str, tmpdir: str, filename: str) → Tuple[str, Dict]
Retrieve deposit’s archive artifact locally
- metadata_get(deposit_id: int | str) → Dict[str, Any]
Retrieve deposit’s metadata artifact as json
- status_update(deposit_id: int | str, status: str, errors: List[str] | None = None, release_id: str | None = None, directory_id: str | None = None, snapshot_id: str | None = None, origin_url: str | None = None)
Update the deposit’s information, including its status and the persistent identifiers resulting from the loading.