swh.loader.pypi package

Submodules

swh.loader.pypi.client module

swh.loader.pypi.client._to_dict(pkginfo)[source]

Given a pkginfo parsed structure, convert it to a dict.

Parameters:pkginfo (UnpackedSDist) – The sdist parsed structure
Returns:parsed structure as a dict
swh.loader.pypi.client._project_pkginfo(dir_path)[source]
Given an uncompressed path holding the pkginfo file, returns a pkginfo parsed structure as a dict.

The release artifact contains one folder at its root. For example:

$ tar tvf zprint-0.0.6.tar.gz
drwxr-xr-x root/root 0 2018-08-22 11:01 zprint-0.0.6/
…

Parameters:dir_path (str) – Path to the uncompressed directory representing a release artifact from pypi.
Returns:the pkginfo parsed structure as a dict if any or None if none was present.
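
As an illustration of the lookup described above, here is a minimal sketch that finds the single root folder and parses its PKG-INFO file. It is not the module's implementation: the function name `project_pkginfo` is hypothetical, and the standard library's email parser is swapped in for the UnpackedSDist structure, on the assumption that PKG-INFO uses RFC 822 style headers.

```python
import os
from email.parser import Parser


def project_pkginfo(dir_path):
    """Return the PKG-INFO headers found under dir_path as a dict, or None.

    Hypothetical stand-in for _project_pkginfo, using only the stdlib.
    """
    # The artifact is expected to hold exactly one root folder.
    entries = os.listdir(dir_path)
    if len(entries) != 1:
        return None
    pkginfo_path = os.path.join(dir_path, entries[0], "PKG-INFO")
    if not os.path.isfile(pkginfo_path):
        return None
    with open(pkginfo_path, encoding="utf-8") as f:
        message = Parser().parse(f, headersonly=True)
    # Repeated headers (such as Classifier) keep only their last value here.
    return {key.lower().replace("-", "_"): value for key, value in message.items()}
```

Calling `project_pkginfo` on a directory that holds `zprint-0.0.6/PKG-INFO` would return the header fields keyed by lowercase name; a directory without that file yields None.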
class swh.loader.pypi.client.PyPIClient(base_url='https://pypi.org/pypi', temp_directory=None, cache=False, cache_dir=None)[source]

Bases: object

PyPI client in charge of communicating with the PyPI server.

Parameters:
  • base_url (str) – PyPI instance’s base url
  • temp_directory (str) – Path to the temporary disk location used for uncompressing the release artifacts
  • cache (bool) – Use an internal cache to keep the archives on disk. Default is not to use it.
  • cache_dir (str) – cache’s disk location (relevant only with cache to True)
The last 2 parameters (cache and cache_dir) are not for production use.
__init__(base_url='https://pypi.org/pypi', temp_directory=None, cache=False, cache_dir=None)[source]


_save_response(response, project=None)[source]

Log the response from a server request to a cache dir.

Parameters:
  • response (Response) – full server response
  • cache_dir (str) – system path for cache dir
Returns:

nothing

_save_raw(filepath)[source]

In cache mode, back up the file at filepath to self.cache_raw_dir.

Parameters:filepath (str) – Path of the file to save
_get_raw(filepath)[source]

In cache mode, we try to retrieve the cached file.

_get(url, project=None)[source]

Send a GET query to the url.

Parameters:url (str) – Url to query
Raises:ValueError if the query fails
Returns:Response as dict if ok
info(project_url, project=None)[source]

Given a project metadata url, retrieve the raw JSON response.

Parameters:project_url (str) – Project’s PyPI metadata url from which to retrieve information
Returns:Main project information as dict.
release(project, release)[source]
Given a project and a release name, retrieve the raw information
for said project’s release.
Parameters:
  • project (str) – Project’s name
  • release (dict) – Release information
Returns:

Release information as dict

prepare_release_artifacts(project, version, release_artifacts)[source]
For a given project’s release version, fetch and prepare the
associated release artifacts.
Parameters:
  • project (str) – PyPI Project
  • version (str) – Release version
  • release_artifacts ([dict]) – List of source distribution release artifacts
Yields:

tuple (artifact, filepath, uncompressed_path, pkginfo) where

  • artifact (dict): release artifact’s associated info
  • release (dict): release information
  • filepath (str): Local artifact’s path
  • uncompressed_archive_path (str): uncompressed archive path
  • pkginfo (dict): package information or None if none found
prepare_release_artifact(project, release, artifact)[source]
For a given release project, fetch and prepare the associated
artifact.

This:

  • fetches the artifact
  • checks that the size and hashes match
  • uncompresses the artifact locally
  • computes the swh hashes
  • returns the associated information for the artifact

Parameters:
  • project (str) – Project’s name
  • release (dict) – Release information
  • artifact (dict) – Release artifact information
Returns:

tuple (artifact, filepath, uncompressed_path, pkginfo) where

  • release (dict): Release information (name, message)
  • artifact (dict): release artifact’s information
  • filepath (str): Local artifact’s path
  • uncompressed_archive_path (str): uncompressed archive path
  • pkginfo (dict): package information or None if none found
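
The size and hash check in the steps above could look roughly like the following sketch. The function name `check_artifact` is hypothetical, and the artifact dict layout (`size`, `digests.sha256`) is assumed to follow the PyPI JSON API; the client's actual checks may differ.

```python
import hashlib
import os


def check_artifact(filepath, artifact):
    """Raise ValueError if the file does not match the artifact's size or sha256.

    Hypothetical helper; `artifact` is assumed to carry the PyPI JSON fields
    'size' and 'digests' -> 'sha256'.
    """
    expected_size = artifact["size"]
    actual_size = os.path.getsize(filepath)
    if actual_size != expected_size:
        raise ValueError(f"size mismatch: {actual_size} != {expected_size}")

    # Hash the file in chunks so large archives are not loaded in memory.
    h = hashlib.sha256()
    with open(filepath, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    if h.hexdigest() != artifact["digests"]["sha256"]:
        raise ValueError("sha256 mismatch")
    return {"length": actual_size, "sha256": h.hexdigest()}
```

Checking the size first is a cheap way to reject a truncated download before paying for the hash computation.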


class swh.loader.pypi.client.PyPIProject(client, project, project_metadata_url, data=None)[source]

Bases: object

PyPI project representation

This allows extracting information for a given project:

  • either its latest information (from the latest release)
  • or the information for a given release version
  • uncompress the associated fetched release artifacts

This also fetches and uncompresses the associated release artifacts.

__init__(client, project, project_metadata_url, data=None)[source]


_data(release_name=None)[source]

Fetch data per release and cache it. Returns the cached data if already fetched.

info(release_name=None)[source]

Compute release information for provided release (or latest one).

_filter_release_artifacts(version, releases, known_artifacts=None)[source]

Filter out the sdist (source distribution) release artifacts that are already known.

There can be multiple ‘package_type’ values (sdist, bdist_egg, bdist_wheel, bdist_rpm, bdist_msi, bdist_wininst, …). We are only interested in source distributions (sdist); the other bdist* types are binary distributions.

Parameters:
  • version (str) – Release name or version
  • releases (dict/[dict]) – Full release object (or a list of them)
  • known_artifacts ([tuple]) – List of known releases (tuple filename, sha256)
Yields:

an unknown release artifact
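
A minimal sketch of this filtering, under stated assumptions: the function name is hypothetical, and each artifact dict is assumed to follow the PyPI JSON API, where the package type is stored under the `packagetype` key alongside `filename` and `digests`.

```python
def filter_release_artifacts(artifacts, known_artifacts=None):
    """Yield sdist artifacts whose (filename, sha256) pair is not yet known.

    Hypothetical helper; key names are assumed from the PyPI JSON API.
    """
    known = set(known_artifacts or [])
    for artifact in artifacts:
        # Skip binary distributions (bdist_wheel, bdist_egg, ...).
        if artifact["packagetype"] != "sdist":
            continue
        key = (artifact["filename"], artifact["digests"]["sha256"])
        if key in known:
            continue
        yield artifact
```

Keying on both the filename and the sha256 means a re-uploaded file with the same name but different content would still be treated as new.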

_cleanup_release_artifacts(archive_path, directory_path)[source]

Clean up intermediary files that no longer need to be present.

all_release_artifacts()[source]

Generate a mapping of releases to their artifacts

default_release()[source]

Return the version number of the default release, as would be installed by pip install


download_new_releases(known_artifacts)[source]

Fetch metadata/data per release (if new release artifact detected)

For each new release artifact, this:

  • downloads and uncompresses the release artifacts
  • yields the (release info, author info, release, dir_path)
  • cleans up the intermediary fetched artifact files

Parameters:

known_artifacts (tuple) – artifact name, artifact sha256 hash

Yields:

tuple (version, release_info, release, uncompressed_path) where

  • project_info (dict): release’s associated version info
  • author (dict): Author information for the release
  • artifact (dict): Release artifact information
  • release (dict): release metadata
  • uncompressed_path (str): Path to uncompressed artifact
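
The uncompress, yield, then clean up sequence can be expressed as a generator with a try/finally block. The sketch below is an assumption about the pattern, not the project's code: `uncompressed_artifacts` is a hypothetical name, the archives are assumed to be already downloaded tarballs, and the download step is left out.

```python
import os
import shutil
import tarfile
import tempfile


def uncompressed_artifacts(archive_paths, temp_directory=None):
    """Yield (archive_path, directory_path), cleaning up after each item.

    Hypothetical sketch: archives are assumed to be local, trusted tarballs.
    """
    for archive_path in archive_paths:
        directory_path = tempfile.mkdtemp(dir=temp_directory)
        try:
            # extractall is only safe on trusted archives.
            with tarfile.open(archive_path) as tar:
                tar.extractall(directory_path)
            yield archive_path, directory_path
        finally:
            # Runs when the consumer asks for the next item or closes
            # the generator, so intermediary files never accumulate.
            shutil.rmtree(directory_path, ignore_errors=True)
            if os.path.exists(archive_path):
                os.remove(archive_path)
```

With this shape, the consumer only ever sees one uncompressed artifact on disk at a time, which bounds the temporary disk usage.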

swh.loader.pypi.converters module

swh.loader.pypi.converters.info(data)[source]
Given a dict of a PyPI project information, returns a project
subset.
Parameters:data (dict) – Representing either artifact information or release information.
Returns:A dict subset of project information.
swh.loader.pypi.converters.author(data)[source]
Given a dict of project/release artifact information (coming from
PyPI), returns an author subset.
Parameters:data (dict) – Representing either artifact information or release information.
Returns:swh-model dict representing a person.
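
For illustration, an author conversion might look like the sketch below. The input keys `author` and `author_email` come from the PyPI JSON API; the output layout (`fullname`, `name`, `email` as bytes) is an assumption about the swh-model person dict, and the function is a hypothetical stand-in rather than the converter's actual code.

```python
def author(data):
    """Build a person dict from PyPI project or release information.

    Hypothetical sketch; the returned field names and bytes encoding are
    assumptions about the swh-model person representation.
    """
    name = data.get("author") or ""
    email = data.get("author_email") or ""
    if name and email:
        fullname = f"{name} <{email}>"
    else:
        fullname = name or email
    return {
        "fullname": fullname.encode("utf-8"),
        "name": name.encode("utf-8"),
        "email": email.encode("utf-8"),
    }
```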

swh.loader.pypi.loader module

class swh.loader.pypi.loader.PyPILoader(client=None)[source]

Bases: swh.loader.core.loader.BufferedLoader

CONFIG_BASE_FILENAME = 'loader/pypi'
ADDITIONAL_CONFIG = {'cache': ('bool', False), 'cache_dir': ('str', ''), 'debug': ('bool', False), 'temp_directory': ('str', '/tmp/swh.loader.pypi/')}
__init__(client=None)[source]


pre_cleanup()[source]

To avoid filling up the disk when other workers were killed mid-run (for example, OOM killed), we try to clean up dangling files.

cleanup()[source]

Clean up temporary disk use

prepare_origin_visit(project_name, project_url, project_metadata_url=None)[source]

Prepare the origin visit information

Parameters:
  • project_name (str) – Project’s simple name
  • project_url (str) – Project’s main url
  • project_metadata_url (str) – Project’s metadata url
_known_artifacts(last_snapshot)[source]

Retrieve the known releases/artifacts for the origin_id.

Parameters:last_snapshot (dict) – Last snapshot for the visit
Returns:list of (filename, sha256) tuples.
_last_snapshot()[source]

Retrieve the last snapshot

prepare(project_name, project_url, project_metadata_url=None)[source]
Keep reference to the origin url (project) and the
project metadata url
Parameters:
  • project_name (str) – Project’s simple name
  • project_url (str) – Project’s main url
  • project_metadata_url (str) – Project’s metadata url
_prepare_state()[source]

Initialize internal state (snapshot, contents, directories, etc…)

This is called from the prepare method.

fetch_data()[source]
Called once per release artifact version (can be many for one release).

For each call, this will:

  • retrieve a release artifact (associated to a release version)
  • uncompress it and compute the necessary information
  • compute the swh objects

Returns:True as long as data to fetch exists
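
The "one artifact per call, True while data remains" contract can be sketched with an iterator that is advanced on each call. The `ArtifactFetcher` class and its attributes are hypothetical and only illustrate the calling pattern; the loader's real state handling is not shown in this reference.

```python
class ArtifactFetcher:
    """Hypothetical illustration of a fetch_data-style method."""

    def __init__(self, artifacts):
        # Any iterable works, including a lazy generator of artifacts.
        self._artifacts = iter(artifacts)
        self.current = None

    def fetch_data(self):
        """Fetch the next artifact; return False once none is left."""
        try:
            self.current = next(self._artifacts)
        except StopIteration:
            self.current = None
            return False
        return True


# Usage: process artifacts one at a time until fetch_data reports the end.
fetcher = ArtifactFetcher(["zprint-0.0.5.tar.gz", "zprint-0.0.6.tar.gz"])
processed = []
while fetcher.fetch_data():
    processed.append(fetcher.current)
```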
target_from_artifact(filename, sha256)[source]
generate_and_load_snapshot()[source]
store_data()[source]

(override) This sends collected objects to storage.

load_status()[source]

Detailed loading status.

Defaults to logging an eventful load.

Returns:a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.
visit_status()[source]

Detailed visit status.

Defaults to logging a full visit.


swh.loader.pypi.tasks module

Module contents