swh.loader.core.utils module

swh.loader.core.utils.clean_dangling_folders(dirpath: str, pattern_check: str, log=None) → None

Clean up potential dangling temporary working folders rooted at dirpath. Those folders must match a dedicated pattern and must not belong to a live pid.

Parameters:
  • dirpath – Path to check for dangling files

  • pattern_check – A dedicated pattern to check against first-level directories (e.g. swh.loader.mercurial., swh.loader.svn.)

  • log (Logger) – Optional logger
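
The cleanup rule described above (match a pattern, skip folders owned by a live pid) can be illustrated with a minimal sketch. This is not the library's implementation: the function names are hypothetical, the sketch assumes folder names of the form `<pattern_check><pid>`, and the liveness check relies on POSIX `os.kill` semantics.

```python
import os
import shutil
from typing import List


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this pid exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 only checks for existence
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def sketch_clean_dangling_folders(dirpath: str, pattern_check: str) -> List[str]:
    """Remove first-level folders matching pattern_check whose pid is dead.

    Assumption of this sketch: folders are named '<pattern_check><pid>'.
    Returns the names of the folders that were removed.
    """
    removed: List[str] = []
    if not os.path.isdir(dirpath):
        return removed
    for name in sorted(os.listdir(dirpath)):
        path = os.path.join(dirpath, name)
        if not os.path.isdir(path) or not name.startswith(pattern_check):
            continue  # ignore anything that does not match the pattern
        pid_part = name[len(pattern_check):]
        if not pid_part.isdecimal():
            continue  # unknown naming scheme: leave it alone
        if pid_is_alive(int(pid_part)):
            continue  # still owned by a running process
        shutil.rmtree(path)
        removed.append(name)
    return removed
```

Folders that do not follow the assumed naming scheme are left untouched, so an unexpected directory is never deleted by accident.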

exception swh.loader.core.utils.CloneTimeout

Bases: Exception

exception swh.loader.core.utils.CloneFailure

Bases: Exception

swh.loader.core.utils.clone_with_timeout(src: str, dest: str, clone_func: Callable[[], None], timeout: float) → None

Clone a repository, giving up after a timeout.

Parameters:
  • src – clone source

  • dest – clone destination

  • clone_func – callable that does the actual cloning

  • timeout – timeout in seconds
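
One way to bound a zero-argument callable by a timeout is to run it in a child process and stop that process once the deadline passes. The sketch below illustrates that idea under stated assumptions: it is not the library's code, the names prefixed with `Sketch`/`sketch_` are hypothetical stand-ins, and it uses the `fork` start method, so it targets POSIX systems.

```python
import multiprocessing
import time
from typing import Callable


class SketchCloneTimeout(Exception):
    """Hypothetical stand-in for CloneTimeout."""


class SketchCloneFailure(Exception):
    """Hypothetical stand-in for CloneFailure."""


def sketch_clone_with_timeout(
    src: str, dest: str, clone_func: Callable[[], None], timeout: float
) -> None:
    """Run clone_func in a child process and wait at most `timeout` seconds.

    src and dest are only used to build error messages in this sketch.
    """
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX platform
    process = ctx.Process(target=clone_func)
    process.start()
    process.join(timeout)
    if process.is_alive():
        # Deadline passed: stop the child and report a timeout.
        process.terminate()
        process.join()
        raise SketchCloneTimeout(
            f"Clone of {src} to {dest} did not finish within {timeout} seconds"
        )
    if process.exitcode != 0:
        # The child exited on its own but reported an error.
        raise SketchCloneFailure(
            f"Clone of {src} to {dest} failed with exit code {process.exitcode}"
        )


if __name__ == "__main__":
    # A callable that finishes well within the timeout.
    sketch_clone_with_timeout("src", "dest", lambda: time.sleep(0.1), timeout=5.0)
```

A separate process is used rather than a thread because a process can be terminated from the outside, whereas a running Python thread cannot.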

swh.loader.core.utils.parse_visit_date(visit_date: datetime | str | None) → datetime | None

Convert a visit date given as None, a string, or a datetime into either None or a datetime.
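
The three input cases can be sketched as follows. This is an illustration rather than the library's code: the function name is hypothetical, and parsing strings with `datetime.fromisoformat` (ISO 8601 only) is an assumption of this sketch; the actual function may accept other string formats.

```python
from datetime import datetime
from typing import Optional, Union


def sketch_parse_visit_date(
    visit_date: Union[datetime, str, None]
) -> Optional[datetime]:
    """Normalize a visit date to None or a datetime."""
    if visit_date is None:
        return None  # nothing to convert
    if isinstance(visit_date, datetime):
        return visit_date  # already in the target type
    if isinstance(visit_date, str):
        # Assumption: the string is in ISO 8601 format.
        return datetime.fromisoformat(visit_date)
    raise ValueError(f"Invalid visit date {visit_date!r}")
```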

swh.loader.core.utils.compute_hashes(filepath: str, hash_names: List[str] = ['sha256']) → Dict[str, str]

Compute a dict of checksums (hash name to hex digest) for the file at filepath.
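
A minimal sketch of computing several checksums in a single pass over a file, using only the standard library. The function name is hypothetical and `hashlib` is this sketch's choice; the actual function may rely on other helpers and may support a different set of algorithm names.

```python
import hashlib
from typing import Dict, List, Optional


def sketch_compute_hashes(
    filepath: str, hash_names: Optional[List[str]] = None
) -> Dict[str, str]:
    """Return {hash name: hex digest} for the file at filepath."""
    names = ["sha256"] if hash_names is None else hash_names
    hashers = {name: hashlib.new(name) for name in names}
    with open(filepath, "rb") as f:
        # Read in fixed-size chunks so large files are never fully in memory.
        for chunk in iter(lambda: f.read(64 * 1024), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

The sketch uses `None` as the default for `hash_names` instead of a list literal, which avoids sharing one mutable default object between calls.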

swh.loader.core.utils.get_url_body(url: str, session: Session | None = None, **extra_params) → bytes

Basic HTTP client to retrieve information on a software package, typically JSON metadata from a REST API.

Parameters:

url (str) – An HTTP URL

Raises:

NotFound if the query fails (for whatever reason: 404, …)

Returns:

The associated response’s information
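
The contract above (return the response body as bytes, raise a dedicated exception when the query fails) can be sketched with the standard library. Everything here is illustrative: `SketchNotFound` and `sketch_get_url_body` are hypothetical names, the sketch uses `urllib` instead of a `Session` object, and the `opener` parameter exists only in this sketch so the HTTP call can be replaced in tests.

```python
import urllib.error
import urllib.request
from typing import Callable


class SketchNotFound(ValueError):
    """Hypothetical stand-in for the NotFound exception named above."""


def sketch_get_url_body(
    url: str, opener: Callable = urllib.request.urlopen, **extra_params
) -> bytes:
    """Fetch url and return the raw response body.

    extra_params are forwarded to the opener (for urlopen, e.g. timeout=10).
    """
    try:
        with opener(url, **extra_params) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        # Any HTTP error status (404, 500, ...) becomes one exception type.
        raise SketchNotFound(
            f"Failed to query {url!r}: HTTP status {exc.code}"
        ) from exc
```

When the body is JSON metadata, the caller would typically decode it with `json.loads` on the returned bytes.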

swh.loader.core.utils.download(url: str, dest: str, hashes: Dict = {}, filename: str | None = None, auth: Tuple[str, str] | None = None, extra_request_headers: Dict[str, str] | None = None, timeout: int = 120, session: Session | None = None) → Tuple[str, Dict]

Download a remote file from url, and compute swh hashes on it.

Parameters:
  • url – Artifact URI to fetch and hash

  • dest – Directory to write the archive to

  • hashes – Dict of expected hashes for the artifact to download, keyed by hash algorithm; values are expected to be hex strings. The supported algorithms are defined in the swh.model.hashutil.ALGORITHMS set.

  • auth – Optional (login, password) tuple for services requiring HTTP authentication (e.g. deposit)

  • extra_request_headers – Optional dict holding extra HTTP headers to be sent with the request

  • timeout – Read/connection timeout in seconds, so the connection does not hang indefinitely

Raises:
Returns:

Tuple of (downloaded file path, hashes of the downloaded file)
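
The "write, hash, and compare against expected hashes" part of such a download can be sketched without any network access by feeding the function an iterable of byte chunks. This is a hedged illustration, not the library's code: the function name is hypothetical, always computing sha256 and raising `ValueError` on a mismatch are choices of this sketch, and `hashlib` stands in for whatever hashing helpers the real function uses.

```python
import hashlib
import os
from typing import Dict, Iterable, Tuple


def sketch_store_and_check(
    chunks: Iterable[bytes], dest: str, filename: str, hashes: Dict[str, str]
) -> Tuple[str, Dict[str, str]]:
    """Write chunks to dest/filename, hashing them as they are written.

    hashes maps algorithm names to expected hex digests; a mismatch raises
    ValueError (this sketch's choice of exception).
    """
    filepath = os.path.join(dest, filename)
    algos = sorted(set(hashes) | {"sha256"})  # sha256 is always computed here
    hashers = {algo: hashlib.new(algo) for algo in algos}
    with open(filepath, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
            for hasher in hashers.values():
                hasher.update(chunk)  # hash while writing: one pass only
    computed = {algo: hasher.hexdigest() for algo, hasher in hashers.items()}
    for algo, expected in hashes.items():
        if computed[algo] != expected:
            raise ValueError(
                f"{algo} mismatch for {filepath}: "
                f"expected {expected}, got {computed[algo]}"
            )
    return filepath, computed
```

In a real download, `chunks` would come from the streamed HTTP response body, so the file is hashed without being read a second time.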

swh.loader.core.utils.release_name(version: str, filename: str | None = None) → str

swh.loader.core.utils.cached_method(f: Callable[[TSelf], TReturn]) → Callable[[TSelf], TReturn]