swh.loader.mercurial.loader module

This module contains a SWH loader for ingesting repository data from Mercurial version 2 bundle files.

exception swh.loader.mercurial.loader.CommandErrorWrapper(err: Optional[bytes])[source]

Bases: Exception

This exception is raised in place of a ‘CommandError’ exception (raised by the underlying hglib library).

This is needed because billiard.Queue serializes the queued object and, as CommandError doesn’t have a constructor without parameters, deserialization fails.
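
The failure mode described above can be reproduced with the standard pickle module alone. The sketch below is illustrative and assumes the queue relies on pickle-style serialization; StrictError and ErrorWrapper are hypothetical stand-ins, not the hglib or swh classes.

```python
import pickle


class StrictError(Exception):
    """Hypothetical exception whose constructor has only required parameters."""

    def __init__(self, ret, out, err):
        # Nothing is forwarded to Exception, so self.args stays empty and
        # unpickling later calls StrictError() with no arguments.
        super().__init__()
        self.ret = ret
        self.out = out
        self.err = err


class ErrorWrapper(Exception):
    """Hypothetical wrapper that forwards its single argument to Exception."""

    def __init__(self, err):
        # self.args == (err,), so unpickling can call ErrorWrapper(err) again.
        super().__init__(err)
        self.err = err


def survives_pickle(exc):
    """Return True if the exception can be serialized and rebuilt."""
    try:
        pickle.loads(pickle.dumps(exc))
        return True
    except TypeError:
        # Raised when the constructor is called without its required arguments.
        return False
```

Wrapping the original error in an exception that round-trips cleanly lets the error message cross the process boundary intact.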

exception swh.loader.mercurial.loader.CloneTimeoutError[source]

Bases: Exception

class swh.loader.mercurial.loader.HgBundle20Loader(storage: swh.storage.interface.StorageInterface, url: str, visit_date: Optional[datetime.datetime] = None, directory: Optional[str] = None, logging_class='swh.loader.mercurial.Bundle20Loader', bundle_filename: Optional[str] = 'HG20_none_bundle', reduce_effort: bool = False, temp_directory: str = '/tmp', cache1_size: int = 838860800, cache2_size: int = 838860800, clone_timeout_seconds: int = 7200, save_data_path: Optional[str] = None, max_content_size: Optional[int] = None)[source]

Bases: swh.loader.core.loader.DVCSLoader

Mercurial loader able to deal with a remote or local repository.

visit_type: Optional[str] = 'hg'
visit_date: Optional[datetime.datetime]
pre_cleanup()[source]

Clean up potential dangling files from prior runs (e.g. OOM killed tasks)

cleanup()[source]

Clean temporary working directory

get_heads(repo)[source]

Read the closed branches heads (branches, bookmarks) and return a dict whose keys are branch names (bytes) and whose values are tuples of (pointer nature (bytes), Mercurial’s node id (bytes)). These need conversion to swh-ids, which is taken care of in get_revisions.
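
As a rough illustration of the mapping described above, the sketch below spells out the dict shape and checks it. The sample entry, the b"branch" pointer nature and the assumption that node ids are stored as raw 20-byte SHA-1 digests are all hypothetical; is_valid_heads is not part of the loader.

```python
from typing import Dict, Tuple

# Shape described above: branch name -> (pointer nature, Mercurial node id).
Heads = Dict[bytes, Tuple[bytes, bytes]]

# Assumption: node ids are raw SHA-1 digests, i.e. 20 bytes.
NODE_ID_LENGTH = 20


def is_valid_heads(heads) -> bool:
    """Check that a mapping follows the (hypothetical) shape sketched here."""
    for name, value in heads.items():
        if not isinstance(name, bytes):
            return False
        if not (isinstance(value, tuple) and len(value) == 2):
            return False
        nature, node_id = value
        if not isinstance(nature, bytes):
            return False
        if not (isinstance(node_id, bytes) and len(node_id) == NODE_ID_LENGTH):
            return False
    return True


# Hypothetical sample data, for illustration only.
example_heads: Heads = {
    b"default": (b"branch", bytes.fromhex("aa" * NODE_ID_LENGTH)),
}
```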

prepare_origin_visit() → None[source]

First step executed by the loader to prepare origin and visit references. Set/update self.origin, and optionally self.origin_url, self.visit_date.

static clone_with_timeout(log, origin, destination, timeout)[source]

prepare()[source]

Prepare the necessary steps to load an actual remote or local repository.

To load a local repository, set the optional directory parameter to the path of an existing local folder.

To load a remote repository, set the optional directory parameter to None.

Parameters
  • origin_url (str) – Origin url to load

  • visit_date (str/datetime) – Date of the visit

  • directory (str/None) – The local directory to load
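
The local-versus-remote rule above can be sketched as a small decision function. This is a minimal illustration, not the loader's code: choose_source and its return values are hypothetical names introduced here.

```python
import os
from typing import Optional, Tuple


def choose_source(origin_url: str, directory: Optional[str]) -> Tuple[str, str]:
    """Hypothetical helper mirroring the rule above.

    directory is None  -> load the remote repository at origin_url
    directory is a path -> load the local repository in that folder
    """
    if directory is None:
        return ("remote", origin_url)
    if not os.path.isdir(directory):
        # The text asks for a real local folder, so reject anything else.
        raise ValueError(f"not an existing directory: {directory!r}")
    return ("local", directory)
```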

has_contents()[source]

Checks whether we need to load contents

has_directories()[source]

Checks whether we need to load directories

has_revisions()[source]

Checks whether we need to load revisions

has_releases()[source]

Checks whether we need to load releases

fetch_data()[source]

Fetch the data from the data source.

get_contents() → Iterable[swh.model.model.BaseContent][source]

Get the contents that need to be loaded.

load_directories()[source]

This is where the work is done to convert manifest deltas from the repository bundle into SWH directories.

get_directories() → Iterable[swh.model.model.Directory][source]

Compute directories to load

get_revisions() → Iterable[swh.model.model.Revision][source]

Compute revisions to load

get_releases() → Iterable[swh.model.model.Release][source]

Get the releases that need to be loaded.

get_snapshot() → swh.model.model.Snapshot[source]

Get the snapshot that needs to be loaded.

store_data() → None[source]

Store fetched data in the database.

Should call the maybe_load_xyz() methods, which handle the bundles sent to storage, rather than sending data directly.

get_fetch_history_result()[source]

Return the data to store in fetch_history.

load_status()[source]

Detailed loading status.

Defaults to logging an eventful load.

Returns: a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.
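
One plausible way to build such a dictionary is to compare the snapshot from the previous visit with the one just computed. The sketch below is an assumption for illustration: the "status" key, the "uneventful" value and the load_status_sketch function are not confirmed by this page.

```python
from typing import Dict, Optional


def load_status_sketch(
    last_snapshot_id: Optional[bytes], new_snapshot_id: bytes
) -> Dict[str, str]:
    """Hypothetical status computation based on snapshot identifiers."""
    # Default to an eventful load, as the description above states.
    status = "eventful"
    # Assumption: an unchanged snapshot means nothing new was loaded.
    if last_snapshot_id is not None and last_snapshot_id == new_snapshot_id:
        status = "uneventful"
    return {"status": status}
```

A scheduler receiving this result could, for instance, visit an origin less often after repeated unchanged loads.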

origin: Optional[swh.model.model.Origin]
origin_metadata: Dict[str, Any]
loaded_snapshot_id: Optional[bytes]
class swh.loader.mercurial.loader.HgArchiveBundle20Loader(storage: swh.storage.interface.StorageInterface, url: str, visit_date: Optional[datetime.datetime] = None, archive_path=None, temp_directory: str = '/tmp', max_content_size: Optional[int] = None)[source]

Bases: swh.loader.mercurial.loader.HgBundle20Loader

Mercurial loader for repositories wrapped within archives.

prepare()[source]

Prepare the necessary steps to load an actual remote or local repository.

To load a local repository, set the optional directory parameter to the path of an existing local folder.

To load a remote repository, set the optional directory parameter to None.

Parameters
  • origin_url (str) – Origin url to load

  • visit_date (str/datetime) – Date of the visit

  • directory (str/None) – The local directory to load

visit_date: Optional[datetime.datetime]
origin: Optional[swh.model.model.Origin]
origin_metadata: Dict[str, Any]
loaded_snapshot_id: Optional[bytes]
heads: Dict[bytes, Any]
releases: Dict[bytes, Any]
last_snapshot_id: Optional[bytes]