swh.loader.package.deposit.loader module#
- swh.loader.package.deposit.loader.build_branch_name(version: str) str [source]#
Build a branch name from a version number.
There is no “branch name” concept in a deposit, so we artificially create a name by prefixing the slugified version number of the repository with deposit/. This could lead to duplicate branch names; if you need a unique branch name, use the generate_branch_name method of the loader, as it keeps track of the branch names previously issued.
- Parameters:
version – a version number
- Returns:
A branch name
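The naming rule described above can be sketched as follows. This is a minimal illustration, not the module's implementation: the function name `build_branch_name_sketch` and the slug rule (lowercase, then drop every character other than letters, digits, dots, and hyphens) are assumptions chosen to match the examples documented for generate_branch_name.

```python
import re


def build_branch_name_sketch(version: str) -> str:
    """Prefix a slugified version with "deposit/" (illustrative only)."""
    # Hypothetical slug rule: lowercase the version, then drop every
    # character that is not a letter, a digit, a dot, or a hyphen.
    slug = re.sub(r"[^a-z0-9.-]", "", version.lower())
    return f"deposit/{slug}"


print(build_branch_name_sketch("ABC"))    # deposit/abc
print(build_branch_name_sketch("a$b$c"))  # deposit/abc
```

Both calls yield the same name, which is the kind of collision the description warns about.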
- class swh.loader.package.deposit.loader.DepositPackageInfo(url: str, version: str, filename: str, author_date: datetime, commit_date: datetime, client: str, id: int, collection: str, author: Person, committer: Person, release_notes: str | None, *, directory_extrinsic_metadata: List[RawExtrinsicMetadataCore] = [], checksums: Dict[str, str] = {})[source]#
Bases:
BasePackageInfo
Method generated by attrs for class DepositPackageInfo.
- author_date#
codemeta: dateCreated if any, deposit completed_date otherwise
- commit_date#
codemeta: datePublished if any, deposit completed_date otherwise
- id#
Internal ID of the deposit in the deposit DB
- collection#
The collection in the deposit; see SWORD specification.
- class swh.loader.package.deposit.loader.DepositLoader(storage: StorageInterface, url: str, deposit_id: str, deposit_client: ApiClient, default_filename: str = 'archive.tar', **kwargs: Any)[source]#
Bases:
PackageLoader[DepositPackageInfo]
Load a deposited artifact into the swh archive.
Constructor
- Parameters:
url – Origin url to associate the artifacts/metadata to
deposit_id – Deposit identity
deposit_client – Deposit api client
- classmethod from_configfile(**kwargs: Any)[source]#
Instantiate a loader from the configuration loaded from the SWH_CONFIG_FILENAME envvar, with potential extra keyword arguments if their value is not None.
- Parameters:
kwargs – kwargs passed to the loader instantiation
- get_versions() Sequence[str] [source]#
A list of versions from the list of releases.
- Returns:
A list of versions
- get_default_version() str [source]#
The default version is the latest release.
- Returns:
A version number
- generate_branch_name(version: str) str [source]#
Generate a unique branch name from a version number.
Previously generated branch names are stored in the _branch_names property. If version leads to a non-unique branch name for this deposit, we add a /n suffix to the branch name, where n is a number.
Example
loader.generate_branch_name("ABC") # returns "deposit/abc"
loader.generate_branch_name("abc") # returns "deposit/abc/1"
loader.generate_branch_name("a$b$c") # returns "deposit/abc/2"
loader.generate_branch_name("def") # returns "deposit/def"
- Parameters:
version – a version number
- Returns:
A unique branch name
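One way to obtain this behavior is to remember every name already issued and append an increasing counter on collision. The sketch below is an assumption-laden illustration, not the loader's code: the class `BranchNamer`, its `_issued` set, and the slug rule are hypothetical, picked so that the output matches the documented example.

```python
import re
from typing import Set


class BranchNamer:
    """Hypothetical helper that issues unique "deposit/..." branch names."""

    def __init__(self) -> None:
        # Names handed out so far (stands in for the loader's bookkeeping).
        self._issued: Set[str] = set()

    def generate_branch_name(self, version: str) -> str:
        # Assumed slug rule: lowercase, keep letters, digits, dots, hyphens.
        slug = re.sub(r"[^a-z0-9.-]", "", version.lower())
        base = f"deposit/{slug}"
        name, n = base, 0
        # On collision, try base/1, base/2, ... until the name is unused.
        while name in self._issued:
            n += 1
            name = f"{base}/{n}"
        self._issued.add(name)
        return name


namer = BranchNamer()
print(namer.generate_branch_name("ABC"))    # deposit/abc
print(namer.generate_branch_name("abc"))    # deposit/abc/1
print(namer.generate_branch_name("a$b$c"))  # deposit/abc/2
print(namer.generate_branch_name("def"))    # deposit/def
```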
- get_default_branch_name() str [source]#
The branch name of the default version.
- Returns:
A branch name
- get_metadata_authority() MetadataAuthority [source]#
For package loaders that get extrinsic metadata, returns the authority the metadata are coming from.
- get_metadata_fetcher() MetadataFetcher [source]#
Returns a MetadataFetcher instance representing this package loader, which is used for adding provenance information to extracted extrinsic metadata, if any.
- get_package_info(version: str) Iterator[Tuple[str, DepositPackageInfo]] [source]#
Get package info
First we look for the version matching the branch name, then we fetch metadata from the deposit server and build DepositPackageInfo with it.
- Parameters:
version – a branch name.
- Yields:
Package infos.
- download_package(p_info: DepositPackageInfo, tmpdir: str) List[Tuple[str, Mapping]] [source]#
Override to allow use of the dedicated deposit client
- build_release(p_info: DepositPackageInfo, uncompressed_path: str, directory: bytes) Release | None [source]#
Build the release from the archive metadata (extrinsic artifact metadata) and the intrinsic metadata.
- Parameters:
p_info – Package information
uncompressed_path – Artifact uncompressed path on disk
- get_extrinsic_origin_metadata() List[RawExtrinsicMetadataCore] [source]#
Returns metadata items, used by build_extrinsic_origin_metadata.
- load() Dict [source]#
Load for a specific origin the associated contents.
1. Get the list of versions in an origin.
2. Get the snapshot from the previous run of the loader, and filter out versions that were already loaded, if their extids match.
Then, for each remaining version in the origin:
3. Fetch the files for one package version. By default, this can be implemented as a simple HTTP request. Loaders with more specific requirements can override this, e.g.: the PyPI loader checks the integrity of the downloaded files; the Debian loader has to download and check several files for one package version.
4. Extract the downloaded files. By default, this would be a universal archive/tarball extraction. Loaders for specific formats can override this method (for instance, the Debian loader uses dpkg-source -x).
5. Convert the extracted directory to a set of Software Heritage objects, using swh.model.from_disk.
6. Extract the metadata from the unpacked directories. This would only be applicable for “smart” loaders like npm (parsing the package.json), PyPI (parsing the PKG-INFO file) or Debian (parsing debian/changelog and debian/control). On “minimal-metadata” sources such as the GNU archive, the lister should provide the minimal set of metadata needed to populate the revision/release objects (authors, dates) as an argument to the task.
7. Generate the revision/release objects for the given version, from the data generated at steps 3 and 4.
end for each
8. Generate and load the snapshot for the visit. Using the revisions/releases collected at step 7, and the branch information from step 2, generate a snapshot and load it into the Software Heritage archive.
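The overall control flow of this procedure can be sketched with plain callables. Everything here is hypothetical scaffolding: `load_sketch` and its parameters are not part of the loader's API, already-loaded versions are matched by name rather than by extid, and the extraction, conversion, and metadata steps are collapsed into two stand-in functions.

```python
from typing import Callable, Dict, Iterable, Set


def load_sketch(
    versions: Iterable[str],
    already_loaded: Set[str],
    fetch: Callable[[str], bytes],
    extract: Callable[[bytes], str],
    build_release: Callable[[str, str], str],
) -> Dict[str, str]:
    """Return a toy snapshot mapping branch names to release objects."""
    releases: Dict[str, str] = {}
    for version in versions:
        # Skip versions a previous visit already loaded (simplified check).
        if version in already_loaded:
            continue
        archive = fetch(version)      # fetch the files for this version
        directory = extract(archive)  # extract and convert to objects
        # Build the release from the extracted data and its metadata.
        releases[version] = build_release(version, directory)
    # Generate the snapshot: one branch per newly built release.
    return {f"deposit/{v}": r for v, r in releases.items()}


snapshot = load_sketch(
    versions=["1.0", "2.0"],
    already_loaded={"1.0"},
    fetch=lambda v: f"archive-{v}".encode(),
    extract=lambda data: data.decode() + "/extracted",
    build_release=lambda v, d: f"release({v}, {d})",
)
print(snapshot)  # {'deposit/2.0': 'release(2.0, archive-2.0/extracted)'}
```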
- class swh.loader.package.deposit.loader.ApiClient(url, auth: Mapping[str, str] | None)[source]#
Bases:
object
Private Deposit Api client
- do(method: str, url: str, *args, **kwargs)[source]#
Internal method to deal with requests, possibly with basic http authentication.
- Parameters:
method (str) – supported http methods as in get/post/put
- Returns:
The request’s execution output
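To illustrate what "possibly with basic http authentication" might involve, the sketch below prepares a request with the standard library without sending it. It is not the client's implementation: the helper `prepare_request`, the `username`/`password` keys of the auth mapping, the way the relative URL is joined to the base URL, and the example URLs are all assumptions.

```python
import base64
from typing import Dict, Mapping, Optional
from urllib.parse import urljoin
from urllib.request import Request


def prepare_request(
    base_url: str, auth: Optional[Mapping[str, str]], method: str, url: str
) -> Request:
    """Build (but do not send) a request, adding Basic auth if configured."""
    headers: Dict[str, str] = {}
    if auth:
        # Assumed mapping layout: {"username": ..., "password": ...}.
        credentials = f"{auth['username']}:{auth['password']}".encode()
        headers["Authorization"] = "Basic " + base64.b64encode(credentials).decode()
    # Join the relative path onto the API base URL.
    full_url = urljoin(base_url.rstrip("/") + "/", url.lstrip("/"))
    return Request(full_url, method=method.upper(), headers=headers)


req = prepare_request(
    "https://deposit.example.org/1/private",  # hypothetical base URL
    {"username": "user", "password": "pass"},
    "get",
    "/42/meta/",  # hypothetical endpoint path
)
print(req.get_method(), req.full_url)
```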
- archive_get(deposit_id: int | str, tmpdir: str, filename: str) Tuple[str, Dict] [source]#
Retrieve deposit’s archive artifact locally
- metadata_get(deposit_id: int | str) Dict[str, Any] [source]#
Retrieve deposit’s metadata artifact as json
The result of this API call is cached.
- Parameters:
deposit_id – a deposit id
- Returns:
A dict of metadata
- Raises:
ValueError – something went wrong with the metadata API.
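The caching and error behavior described here can be sketched as follows. The class `CachedMetadataClient` and its injected `fetch` callable are hypothetical; the real client talks to the deposit server, whereas this sketch uses a stand-in function so that it runs on its own.

```python
from typing import Any, Callable, Dict, Tuple, Union


class CachedMetadataClient:
    """Hypothetical client that caches metadata lookups per deposit id."""

    def __init__(self, fetch: Callable[[str], Tuple[bool, Dict[str, Any]]]) -> None:
        self._fetch = fetch  # stand-in for the HTTP call; returns (ok, payload)
        self._cache: Dict[str, Dict[str, Any]] = {}

    def metadata_get(self, deposit_id: Union[int, str]) -> Dict[str, Any]:
        key = str(deposit_id)  # treat 42 and "42" as the same deposit
        if key not in self._cache:
            ok, payload = self._fetch(key)
            if not ok:
                # Failed lookups are reported and not cached.
                raise ValueError(f"Problem when retrieving metadata for deposit {key}")
            self._cache[key] = payload
        return self._cache[key]


calls = []


def fake_fetch(key: str) -> Tuple[bool, Dict[str, Any]]:
    calls.append(key)
    return (key != "0", {"deposit_id": key})


client = CachedMetadataClient(fake_fetch)
client.metadata_get(42)
client.metadata_get("42")  # served from the cache, no second fetch
print(len(calls))  # 1
```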
- releases_get(deposit_id: int | str) List[Dict[str, Any]] [source]#
Retrieve the list of releases related to this deposit.
The result of this API call is cached.
- Parameters:
deposit_id – a deposit id
- Returns:
A list of deposits
- Raises:
ValueError – something went wrong with the releases API.
- status_update(deposit_id: int | str, status: str, errors: List[str] | None = None, release_id: str | None = None, directory_id: str | None = None, snapshot_id: str | None = None, origin_url: str | None = None)[source]#
Update the deposit’s information, including its status and the persistent identifiers resulting from the loading.