swh.loader.mercurial.loader module#

Loaders for ingesting Mercurial repositories, either from a local directory or a remote URL (see swh.loader.mercurial.loader.HgLoader) or from an archive (see swh.loader.mercurial.loader.HgArchiveLoader).

exception swh.loader.mercurial.loader.CorruptedRevision(hg_nodeid: HgNodeId)[source]#

Bases: ValueError

Raised when a revision is corrupted.

class swh.loader.mercurial.loader.HgDirectory(data=None)[source]#

Bases: Directory

A more practical directory.

  • creates missing parent directories

  • removes empty directories

get(path: bytes, default: T | None = None) Content | HgDirectory | T | None[source]#

Return the entry stored at path if it exists in the directory, else default.
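The two behaviours listed for this class (creating missing parent directories, removing empty directories) can be illustrated with a minimal sketch. The `Tree` class below is a hypothetical stand-in written for this example; it is not the swh.model `Directory` API and only mirrors the behaviour described above, plus a `get` with a default.

```python
from typing import Dict, Optional, Union


class Tree:
    """Hypothetical in-memory directory tree (not the swh.model class)."""

    def __init__(self) -> None:
        self.entries: Dict[bytes, Union[bytes, "Tree"]] = {}

    def add(self, path: bytes, data: bytes) -> None:
        """Store data at path, creating missing parent directories."""
        *parents, name = path.split(b"/")
        node = self
        for part in parents:
            child = node.entries.get(part)
            if not isinstance(child, Tree):
                child = Tree()
                node.entries[part] = child
            node = child
        node.entries[name] = data

    def remove(self, path: bytes) -> None:
        """Delete the entry at path, pruning directories left empty."""
        head, _, rest = path.partition(b"/")
        if not rest:
            del self.entries[head]
            return
        child = self.entries[head]
        if not isinstance(child, Tree):
            raise KeyError(path)
        child.remove(rest)
        if not child.entries:
            del self.entries[head]

    def get(self, path: bytes, default: Optional[object] = None):
        """Return the entry at path, or default if it does not exist."""
        node: Union[bytes, "Tree"] = self
        for part in path.split(b"/"):
            if not isinstance(node, Tree) or part not in node.entries:
                return default
            node = node.entries[part]
        return node
```

Adding `b"a/b/c.txt"` to an empty tree creates `a` and `a/b` on the way; removing that file again leaves the tree empty, because both parents are pruned.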

class swh.loader.mercurial.loader.HgLoader(storage: StorageInterface, url: str, directory: str | None = None, visit_date: datetime | None = None, temp_directory: str = '/tmp', clone_timeout_seconds: int = 7200, content_cache_size: int = 10000, **kwargs: Any)[source]#

Bases: BaseLoader

Load a Mercurial repository, either from a local directory or by cloning it from its URL.

Mercurial’s branching model is more complete than Git’s; it allows for multiple heads per branch, closed heads and bookmarks. The following mapping is used to represent the branching state of a Mercurial project in a given snapshot:

  • HEAD (optional) either the node pointed by the @ bookmark or the tip of the default branch

  • branch-tip/<branch-name> (required) the first head of the branch, sorted by nodeid if there are multiple heads.

  • bookmarks/<bookmark_name> (optional) holds the bookmarks mapping if any

  • branch-heads/<branch_name>/0..n (optional) for any branch with multiple open heads, list all open heads

  • branch-closed-heads/<branch_name>/0..n (optional) for any branch with at least one closed head, list all closed heads

  • tags/<tag-name> (optional) record tags

The format is not ambiguous regardless of branch name since we know it ends with a /<index>, as long as we have a stable sorting of the heads (we sort by nodeid). There may be some overlap between the refs, but it’s simpler not to try to figure out de-duplication. However, to reduce the redundancy between snapshot branches in the most common case, when a branch has a single open head, it will only be referenced as branch-tip/<branch-name>. The branch-heads/ hierarchy only appears when a branch has multiple open heads, which we consistently sort by increasing nodeid. The branch-closed-heads/ hierarchy is also sorted by increasing nodeid.
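The naming scheme above can be sketched as a small function. This is an illustration only, not the loader's implementation: the function name, its arguments, and the choice to map names directly to raw node ids are assumptions made for the example. It also assumes that branch-tip is taken from the open heads, and it skips the tip tag.

```python
from typing import Dict, List


def snapshot_branch_names(
    open_heads: Dict[bytes, List[bytes]],
    closed_heads: Dict[bytes, List[bytes]],
    bookmarks: Dict[bytes, bytes],
    tags: Dict[bytes, bytes],
) -> Dict[bytes, bytes]:
    """Map snapshot branch names to node ids (hypothetical helper)."""
    refs: Dict[bytes, bytes] = {}

    for branch, heads in open_heads.items():
        heads = sorted(heads)  # stable ordering by increasing nodeid
        if not heads:
            continue
        refs[b"branch-tip/" + branch] = heads[0]
        # Only branches with several open heads get a branch-heads/ listing.
        if len(heads) > 1:
            for index, node in enumerate(heads):
                refs[b"branch-heads/%s/%d" % (branch, index)] = node

    for branch, heads in closed_heads.items():
        for index, node in enumerate(sorted(heads)):
            refs[b"branch-closed-heads/%s/%d" % (branch, index)] = node

    for name, node in bookmarks.items():
        refs[b"bookmarks/" + name] = node

    for name, node in tags.items():
        if name == b"tip":  # assumption: the tip tag is skipped
            continue
        refs[b"tags/" + name] = node

    # HEAD: the @ bookmark if present, else the tip of the default branch.
    head = bookmarks.get(b"@", refs.get(b"branch-tip/default"))
    if head is not None:
        refs[b"HEAD"] = head
    return refs
```

With a `default` branch holding two open heads and a `stable` branch holding one, only `default` gets `branch-heads/default/0` and `branch-heads/default/1`, while `stable` is referenced solely as `branch-tip/stable`.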

Initialize the loader.

Parameters:
  • url – url of the repository.

  • directory – directory of the local repository.

  • logging_class – class of the loader logger.

  • visit_date – visit date of the repository

  • config – loader configuration

CONFIG_BASE_FILENAME = 'loader/mercurial'#
visit_type: str = 'hg'#
pre_cleanup() None[source]#

First step executed by the loader: check for dangling data to clean up. This step should do its best to avoid raising errors.

cleanup() None[source]#

Last step executed by the loader.

prepare() None[source]#

Second step executed by the loader to prepare some state needed by the loader.

fetch_data() bool[source]#

Fetch the data from the source the loader is currently loading.

Returns:

a value that is interpreted as a boolean. If True, fetch_data needs to be called again to complete loading.
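The way these steps fit together can be sketched with a toy driver. The `ToyLoader` class and `run` function below are hypothetical and written for this example only. The step order follows the method descriptions on this page; calling store_data after each fetch_data is an assumption about the driver, not something this page states.

```python
from typing import List


class ToyLoader:
    """Hypothetical loader that fetches pre-defined batches of items."""

    def __init__(self, batches: List[List[int]]) -> None:
        self.pending = list(batches)
        self.current: List[int] = []
        self.stored: List[int] = []
        self.steps: List[str] = []

    def pre_cleanup(self) -> None:
        self.steps.append("pre_cleanup")

    def prepare(self) -> None:
        self.steps.append("prepare")

    def fetch_data(self) -> bool:
        self.steps.append("fetch_data")
        self.current = self.pending.pop(0) if self.pending else []
        # A true value means fetch_data must be called again.
        return bool(self.pending)

    def store_data(self) -> None:
        self.steps.append("store_data")
        self.stored.extend(self.current)

    def cleanup(self) -> None:
        self.steps.append("cleanup")


def run(loader: ToyLoader) -> None:
    """Drive the loader steps in order (assumed driver loop)."""
    loader.pre_cleanup()
    try:
        loader.prepare()
        while True:
            more = loader.fetch_data()
            loader.store_data()
            if not more:
                break
    finally:
        loader.cleanup()  # last step, even if a previous one failed
```

Running `run(ToyLoader([[1, 2], [3]]))` calls fetch_data twice, because the first call returns a true value while a batch is still pending.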

get_hg_revs_to_load() Iterator[int][source]#

Yield hg revision numbers to load.

store_data()[source]#

Store fetched data in the database.

load_status() Dict[str, str][source]#

Detailed loading status.

Defaults to logging an eventful load.

Returns:

a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.

visit_status() str[source]#

Allow overriding the visit status in case of a partial load.

get_revision_id_from_hg_nodeid(hg_nodeid: HgNodeId) bytes[source]#

Return the git sha1 of a revision given its hg nodeid.

Parameters:

hg_nodeid – the hg nodeid of the revision.

Returns:

the sha1_git of the revision.

get_revision_parents(rev_ctx: basectx) Tuple[bytes, ...][source]#

Return the git sha1 of the parent revisions.

Parameters:

rev_ctx – the hg revision context.

Returns:

the sha1_git of the parent revisions.

store_revision(rev_ctx: basectx) None[source]#

Store a revision given its hg revision context.

Parameters:

rev_ctx – the hg revision context.

Returns:

the sha1_git of the stored revision.

store_release(name: bytes, target: bytes) bytes[source]#

Store a release given its name and its target.

A release corresponds to a user-defined tag in Mercurial. The Mercurial API has a tip tag that must be ignored.

Parameters:
  • name – name of the release.

  • target – sha1_git of the target revision.

Returns:

the sha1_git of the stored release.

store_content(rev_ctx: basectx, file_path: bytes) Content[source]#

Store the content of a file given its hg revision context and file path.

Content is a mix of file content at a given revision and its permissions found in the changeset’s manifest.

Parameters:
  • rev_ctx – the hg revision context.

  • file_path – the hg path of the content.

Returns:

the stored content.

store_directories(rev_ctx: basectx) bytes[source]#

Store the directories of a revision given its hg revision context.

Mercurial has no directory objects as Git does. A Git-like tree must be built from file paths to obtain each directory hash.

Parameters:

rev_ctx – the hg revision context.

Returns:

the sha1_git of the top level directory.
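The idea of building a tree from flat file paths and hashing it bottom-up can be sketched as follows. This is a simplified illustration: `build_tree` and `hash_tree` are hypothetical helpers, and the serialization hashed here is made up for the example. It is not the Git tree object format and does not produce real sha1_git identifiers.

```python
import hashlib
from typing import Dict, Optional


def build_tree(files: Dict[bytes, bytes]) -> dict:
    """Turn a flat {path: data} mapping into nested dictionaries."""
    root: dict = {}
    for path, data in files.items():
        *parents, name = path.split(b"/")
        node = root
        for part in parents:
            node = node.setdefault(part, {})
        node[name] = data
    return root


def hash_tree(
    tree: dict, prefix: bytes = b"", out: Optional[Dict[bytes, str]] = None
) -> Dict[bytes, str]:
    """Return {directory path: hex digest}; b"" is the top-level directory."""
    if out is None:
        out = {}
    hasher = hashlib.sha1()
    for name in sorted(tree):
        entry = tree[name]
        if isinstance(entry, dict):
            # Hash sub-directories first so their digest feeds the parent.
            child_path = prefix + b"/" + name if prefix else name
            digest = hash_tree(entry, child_path, out)[child_path]
            kind = b"dir"
        else:
            digest = hashlib.sha1(entry).hexdigest()
            kind = b"file"
        # Simplified entry encoding (assumption, not the Git format).
        hasher.update(kind + b" " + name + b"\0" + digest.encode() + b"\n")
    out[prefix] = hasher.hexdigest()
    return out
```

Because each directory digest covers the digests of its children, changing one file changes the hash of every directory on the path up to the root, which is why the whole tree has to be built before the top-level hash is known.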

class swh.loader.mercurial.loader.HgArchiveLoader(storage: StorageInterface, url: str, visit_date: datetime | None = None, archive_path: str | None = None, temp_directory: str = '/tmp', **kwargs: Any)[source]#

Bases: HgLoader

Mercurial loader for repositories wrapped within tarballs.

Initialize the loader.

Parameters:
  • url – url of the repository.

  • directory – directory of the local repository.

  • logging_class – class of the loader logger.

  • visit_date – visit date of the repository

  • config – loader configuration

prepare()[source]#

Extract the archive instead of cloning.