swh.loader.package.deposit.loader module

swh.loader.package.deposit.loader.now() → datetime

swh.loader.package.deposit.loader.build_branch_name(version: str) → str

Build a branch name from a version number.

There is no “branch name” concept in a deposit, so we artificially create a name by prefixing the slugified version number of the repository with deposit/. This could lead to duplicate branch names. If you need a unique branch name, use the generate_branch_name method of the loader, which keeps track of the branch names previously issued.

Parameters:

version – a version number

Returns:

A branch name
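The naming rule above can be sketched as follows. This is a minimal illustration, not the module's implementation: the `slugify` helper here is a hypothetical stand-in (lowercase, drop punctuation, collapse whitespace and hyphens), and the real slugifier may differ in detail.

```python
import re


def slugify(value: str) -> str:
    """Hypothetical stand-in for the slugifier used by the loader."""
    # Lowercase, then drop anything that is not a word character, space or hyphen.
    value = re.sub(r"[^\w\s-]", "", value.lower())
    # Collapse runs of whitespace/hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def build_branch_name(version: str) -> str:
    """Prefix the slugified version with "deposit/"."""
    return f"deposit/{slugify(version)}"


print(build_branch_name("ABC"))    # deposit/abc
print(build_branch_name("a$b$c"))  # deposit/abc
```

Both inputs map to the same name, which is the duplicate-name situation the description warns about.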

class swh.loader.package.deposit.loader.DepositPackageInfo(url: str, version: str, filename: str, author_date: datetime, commit_date: datetime, client: str, id: int, collection: str, author: Person, committer: Person, release_notes: str | None, *, directory_extrinsic_metadata: List[RawExtrinsicMetadataCore] = [], checksums: Dict[str, str] = {})

Bases: BasePackageInfo

Method generated by attrs for class DepositPackageInfo.

author_date

codemeta:dateCreated if any, deposit completed_date otherwise

commit_date

codemeta:datePublished if any, deposit completed_date otherwise
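The fallback rule for these two dates can be illustrated with a small sketch. The flat metadata mapping, the key names, and the ISO 8601 string format are assumptions made for the example; the actual metadata layout returned by the deposit server may differ.

```python
from datetime import datetime, timezone
from typing import Any, Mapping


def pick_date(
    metadata: Mapping[str, Any], key: str, completed_date: datetime
) -> datetime:
    """Return the codemeta date stored under `key`, else the completed date."""
    value = metadata.get(key)  # hypothetical flat layout with ISO 8601 strings
    if value is None:
        return completed_date
    return datetime.fromisoformat(value)


completed = datetime(2021, 1, 1, tzinfo=timezone.utc)
# No dateCreated present: fall back to the deposit's completed_date.
author_date = pick_date({}, "codemeta:dateCreated", completed)
```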

id

Internal ID of the deposit in the deposit database.

collection

The collection in the deposit; see the SWORD specification.

classmethod from_metadata(metadata: Dict[str, Any], url: str, filename: str, version: str) → DepositPackageInfo

extid() → None

Returns a unique intrinsic identifier of this package info, or None if this package info is not ‘deduplicatable’ (meaning that we will always load it, instead of checking the ExtID storage to see if we already did).

class swh.loader.package.deposit.loader.DepositLoader(storage: StorageInterface, url: str, deposit_id: str, deposit_client: ApiClient, default_filename: str = 'archive.tar', **kwargs: Any)

Bases: PackageLoader[DepositPackageInfo]

Load a deposited artifact into the Software Heritage archive.

Constructor

Parameters:
  • url – Origin URL to associate the artifacts/metadata with

  • deposit_id – Deposit identifier

  • deposit_client – Deposit API client

visit_type: str = 'deposit'

classmethod from_configfile(**kwargs: Any)

Instantiate a loader from the configuration loaded from the file named by the SWH_CONFIG_FILENAME environment variable. Extra keyword arguments are passed along only if their value is not None.

Parameters:

kwargs – kwargs passed to the loader instantiation
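The "only if not None" rule can be sketched as a dictionary merge. This is an illustration under assumptions: `merge_loader_kwargs` is a hypothetical helper, the configuration is represented as an already-parsed dict, and letting explicit keyword arguments take precedence over the file is an assumption of the sketch.

```python
from typing import Any, Dict


def merge_loader_kwargs(config: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
    """Combine file configuration with keyword arguments that are not None."""
    # Keyword arguments explicitly set to None are ignored.
    extra = {key: value for key, value in kwargs.items() if value is not None}
    # Assumption: explicit keyword arguments win over the configuration file.
    return {**config, **extra}


config = {"url": "https://example.org/origin", "default_filename": "archive.tar"}
merged = merge_loader_kwargs(config, deposit_id="42", default_filename=None)
print(merged["default_filename"])  # archive.tar
```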

get_versions() → Sequence[str]

A list of versions from the list of releases.

Returns:

A list of versions

get_default_version() → str

The default version is the latest release.

Returns:

A version number

generate_branch_name(version: str) → str

Generate a unique branch name from a version number.

Previously generated branch names are stored in the _branch_names property. If version leads to a non-unique branch name for this deposit, we add a /n suffix to the branch name, where n is a number.

Example

loader.generate_branch_name("ABC")    # returns "deposit/abc"
loader.generate_branch_name("abc")    # returns "deposit/abc/1"
loader.generate_branch_name("a$b$c")  # returns "deposit/abc/2"
loader.generate_branch_name("def")    # returns "deposit/def"

Parameters:

version – a version number

Returns:

A unique branch name
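The suffixing behaviour described above could be implemented along these lines. This is a sketch, not the loader's code: `UniqueBranchNamer` and its `slugify` helper are hypothetical names, and only the `_branch_names` attribute and the `/n` suffix come from the description.

```python
import re
from typing import Set


def slugify(value: str) -> str:
    """Hypothetical stand-in for the slugifier used by the loader."""
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


class UniqueBranchNamer:
    """Hands out branch names and remembers the ones already issued."""

    def __init__(self) -> None:
        self._branch_names: Set[str] = set()

    def generate_branch_name(self, version: str) -> str:
        base = f"deposit/{slugify(version)}"
        name, n = base, 0
        # Append /1, /2, ... until the name has not been issued before.
        while name in self._branch_names:
            n += 1
            name = f"{base}/{n}"
        self._branch_names.add(name)
        return name


namer = UniqueBranchNamer()
for version in ("ABC", "abc", "a$b$c", "def"):
    print(namer.generate_branch_name(version))
```

With this sketch the four calls yield deposit/abc, deposit/abc/1, deposit/abc/2 and deposit/def, matching the example given for the method.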

get_default_branch_name() → str

The branch name of the default version.

Returns:

A branch name

get_metadata_authority() → MetadataAuthority

For package loaders that get extrinsic metadata, returns the authority the metadata are coming from.

get_metadata_fetcher() → MetadataFetcher

Returns a MetadataFetcher instance representing this package loader, which is used for adding provenance information to extracted extrinsic metadata, if any.

get_package_info(version: str) → Iterator[Tuple[str, DepositPackageInfo]]

Get package info

First we look for the version matching the branch name, then we fetch metadata from the deposit server and build DepositPackageInfo with it.

Parameters:

version – a branch name.

Yields:

Package infos.

download_package(p_info: DepositPackageInfo, tmpdir: str) → List[Tuple[str, Mapping]]

Override to allow use of the dedicated deposit client

build_release(p_info: DepositPackageInfo, uncompressed_path: str, directory: bytes) → Release | None

Build the release from the archive metadata (extrinsic artifact metadata) and the intrinsic metadata.

Parameters:
  • p_info – Package information

  • uncompressed_path – Artifact uncompressed path on disk

get_extrinsic_origin_metadata() → List[RawExtrinsicMetadataCore]

Returns metadata items, used by build_extrinsic_origin_metadata.

load() → Dict

Load for a specific origin the associated contents.

  1. Get the list of versions in an origin.

  2. Get the snapshot from the previous run of the loader, and filter out versions that were already loaded, if their extids match.

Then, for each remaining version in the origin:

  3. Fetch the files for one package version. By default, this can be implemented as a simple HTTP request. Loaders with more specific requirements can override this, e.g. the PyPI loader checks the integrity of the downloaded files; the Debian loader has to download and check several files for one package version.

  4. Extract the downloaded files. By default, this would be a universal archive/tarball extraction.

    Loaders for specific formats can override this method (for instance, the Debian loader uses dpkg-source -x).

  5. Convert the extracted directory to a set of Software Heritage objects, using swh.model.from_disk.

  6. Extract the metadata from the unpacked directories. This would only be applicable for “smart” loaders like npm (parsing the package.json), PyPI (parsing the PKG-INFO file) or Debian (parsing debian/changelog and debian/control).

    On “minimal-metadata” sources such as the GNU archive, the lister should provide the minimal set of metadata needed to populate the revision/release objects (authors, dates) as an argument to the task.

  7. Generate the revision/release objects for the given version, from the data generated at steps 5 and 6.

end for each

  8. Generate and load the snapshot for the visit.

    Using the revisions/releases collected at step 7 and the branch information from step 2, generate a snapshot and load it into the Software Heritage archive.
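The overall control flow of the steps above can be sketched as a plain loop. Everything here is a simplified stand-in: `load_sketch`, its callables, the `releases/` branch prefix, and the dict used in place of a snapshot are hypothetical, and the real loader works with storage objects rather than strings.

```python
from typing import Callable, Dict, Iterable, Optional


def load_sketch(
    versions: Iterable[str],
    known_releases: Dict[str, str],
    extid_of: Callable[[str], Optional[str]],
    process_version: Callable[[str], str],
) -> Dict[str, str]:
    """Return a branch name -> release id mapping (stand-in for a snapshot).

    known_releases maps extids seen in the previous snapshot to release ids.
    process_version stands for fetch, extract, convert and build release.
    """
    branches: Dict[str, str] = {}
    for version in versions:
        extid = extid_of(version)
        if extid is not None and extid in known_releases:
            # Already loaded during a previous visit: reuse the release.
            release_id = known_releases[extid]
        else:
            # A None extid means "not deduplicatable": always load.
            release_id = process_version(version)
        branches[f"releases/{version}"] = release_id
    return branches


snapshot = load_sketch(
    versions=["1", "2"],
    known_releases={"extid-1": "release-from-previous-visit"},
    extid_of=lambda version: f"extid-{version}",
    process_version=lambda version: f"new-release-{version}",
)
```

Only version "2" is processed in this example; version "1" is carried over from the previous visit.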

finalize_visit(status_visit: str, snapshot: Snapshot | None, errors: List[str] | None = None, **kwargs) → Dict[str, Any]

Finalize the visit:

  • flush any unflushed data to storage

  • update origin visit’s status

  • return the task’s status

swh.loader.package.deposit.loader.parse_author(author) → Person

See prior fixme

class swh.loader.package.deposit.loader.ApiClient(url, auth: Mapping[str, str] | None)

Bases: object

Private deposit API client.

do(method: str, url: str, *args, **kwargs)

Internal method to deal with requests, possibly with basic HTTP authentication.

Parameters:

method (str) – a supported HTTP method, such as get, post, or put

Returns:

The request’s execution output
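A dispatch method of this kind might look like the sketch below. It is not the real ApiClient: `SketchClient` is a hypothetical class, the `username`/`password` keys of the auth mapping and the ValueError on unsupported methods are assumptions, and `session` is any object exposing `get`, `post` and `put` in the style of `requests.Session`.

```python
from typing import Any, Mapping, Optional


class SketchClient:
    """Illustrative only; not the real ApiClient."""

    def __init__(
        self, base_url: str, session: Any, auth: Optional[Mapping[str, str]] = None
    ) -> None:
        self.base_url = base_url.rstrip("/")
        self.session = session
        # Assumed key names for the basic-auth credentials.
        self.auth = (auth["username"], auth["password"]) if auth else None

    def do(self, method: str, url: str, *args: Any, **kwargs: Any) -> Any:
        if method not in ("get", "post", "put"):
            raise ValueError(f"Unsupported method {method!r}")
        # Look up session.get / session.post / session.put by name.
        call = getattr(self.session, method)
        if self.auth is not None:
            kwargs["auth"] = self.auth
        return call(f"{self.base_url}{url}", *args, **kwargs)
```

Passing the session in makes the sketch easy to exercise with a fake object instead of a live server.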

archive_get(deposit_id: int | str, tmpdir: str, filename: str) → Tuple[str, Dict]

Retrieve the deposit’s archive artifact locally.

metadata_get(deposit_id: int | str) → Dict[str, Any]

Retrieve the deposit’s metadata artifact as JSON.

The result of this API call is cached.

Parameters:

deposit_id – a deposit id

Returns:

A dict of metadata

Raises:

ValueError – something went wrong with the metadata API.
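The caching behaviour can be illustrated with a per-client dictionary keyed by deposit id. This is a sketch under assumptions: `CachedMetadata` and its `fetch` callable are hypothetical, and the real client may cache differently (for example with a decorator).

```python
from typing import Any, Callable, Dict, Union


class CachedMetadata:
    """Caches one metadata dict per deposit id."""

    def __init__(self, fetch: Callable[[str], Dict[str, Any]]) -> None:
        self._fetch = fetch  # hypothetical callable performing the API request
        self._cache: Dict[str, Dict[str, Any]] = {}

    def metadata_get(self, deposit_id: Union[int, str]) -> Dict[str, Any]:
        key = str(deposit_id)  # 1 and "1" share the same cache entry
        if key not in self._cache:
            # An exception raised by fetch propagates and nothing is cached.
            self._cache[key] = self._fetch(key)
        return self._cache[key]
```

Repeated calls for the same deposit then cost a single request.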

releases_get(deposit_id: int | str) → List[Dict[str, Any]]

Retrieve the list of releases related to this deposit.

The result of this API call is cached.

Parameters:

deposit_id – a deposit id

Returns:

A list of deposits

Raises:

ValueError – something went wrong with the releases API.

status_update(deposit_id: int | str, status: str, errors: List[str] | None = None, release_id: str | None = None, directory_id: str | None = None, snapshot_id: str | None = None, origin_url: str | None = None)

Update the deposit’s information, including its status and the persistent identifiers resulting from the loading.