swh.loader.mercurial.from_disk module

exception swh.loader.mercurial.from_disk.CorruptedRevision(hg_nodeid: HgNodeId)[source]

Bases: ValueError

Raised when a revision is corrupted.

class swh.loader.mercurial.from_disk.HgDirectory(data=None)[source]

Bases: swh.model.from_disk.Directory

A more practical directory.

  • creates missing parent directories

  • removes empty directories

get(path: bytes, default: Optional[swh.loader.mercurial.from_disk.T] = None)Optional[Union[swh.model.from_disk.Content, swh.loader.mercurial.from_disk.HgDirectory, swh.loader.mercurial.from_disk.T]][source]

Return the value for key if key is in the dictionary, else default.
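
The two behaviours listed above (creating missing parents, removing empty directories) can be illustrated with a toy nested mapping. This is a minimal sketch only: `ToyDirectory` and its `set`/`delete` methods are hypothetical names invented for illustration and are not the `HgDirectory` implementation, which builds on swh.model.from_disk.Directory.

```python
from typing import Dict, Optional, Union

Node = Union[bytes, "ToyDirectory"]


class ToyDirectory:
    """Hypothetical stand-in for a directory keyed by slash-separated byte paths."""

    def __init__(self) -> None:
        self.entries: Dict[bytes, Node] = {}

    def set(self, path: bytes, value: bytes) -> None:
        head, _, rest = path.partition(b"/")
        if not rest:
            self.entries[head] = value
            return
        child = self.entries.get(head)
        if not isinstance(child, ToyDirectory):
            # Create the missing parent directory on the fly.
            child = ToyDirectory()
            self.entries[head] = child
        child.set(rest, value)

    def get(self, path: bytes, default: Optional[Node] = None) -> Optional[Node]:
        head, _, rest = path.partition(b"/")
        child = self.entries.get(head)
        if child is None:
            return default
        if not rest:
            return child
        if isinstance(child, ToyDirectory):
            return child.get(rest, default)
        return default

    def delete(self, path: bytes) -> None:
        head, _, rest = path.partition(b"/")
        if not rest:
            self.entries.pop(head, None)
            return
        child = self.entries.get(head)
        if isinstance(child, ToyDirectory):
            child.delete(rest)
            if not child.entries:
                # Prune the directory once its last entry is gone.
                del self.entries[head]
```

Setting `b"a/b/c.txt"` on an empty `ToyDirectory` creates `a` and `a/b`; deleting that same path afterwards removes both, leaving the root empty.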

class swh.loader.mercurial.from_disk.HgLoaderFromDisk(storage: swh.storage.interface.StorageInterface, url: str, directory: Optional[str] = None, logging_class: str = 'swh.loader.mercurial.LoaderFromDisk', visit_date: Optional[datetime.datetime] = None, temp_directory: str = '/tmp', clone_timeout_seconds: int = 7200, content_cache_size: int = 10000, max_content_size: Optional[int] = None)[source]

Bases: swh.loader.core.loader.BaseLoader

Load a mercurial repository from a local repository.

Mercurial’s branching model is more complete than Git’s; it allows for multiple heads per branch, closed heads and bookmarks. The following mapping is used to represent the branching state of a Mercurial project in a given snapshot:

  • HEAD (optional) either the node pointed to by the @ bookmark or the tip of the default branch

  • branch-tip/<branch-name> (required) the first head of the branch, sorted by nodeid if there are multiple heads.

  • bookmarks/<bookmark_name> (optional) holds the bookmarks mapping if any

  • branch-heads/<branch_name>/0..n (optional) for any branch with multiple open heads, list all open heads

  • branch-closed-heads/<branch_name>/0..n (optional) for any branch with at least one closed head, list all closed heads

  • tags/<tag-name> (optional) record tags

The format is not ambiguous regardless of branch name since we know it ends with a /<index>, as long as we have a stable sorting of the heads (we sort by nodeid). There may be some overlap between the refs, but it’s simpler not to try to figure out de-duplication. However, to reduce the redundancy between snapshot branches in the most common case, when a branch has a single open head, it will only be referenced as branch-tip/<branch-name>. The branch-heads/ hierarchy only appears when a branch has multiple open heads, which we consistently sort by increasing nodeid. The branch-closed-heads/ hierarchy is also sorted by increasing nodeid.
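
The mapping above can be sketched as a small function. This is an illustration under stated assumptions, not the loader's code: `snapshot_branch_names` is a hypothetical helper, node ids are plain bytes, HEAD is stored as a node id rather than an alias, the tip of the default branch is taken to be `branch-tip/default`, and `branch-tip/` is only emitted for branches with at least one open head.

```python
from typing import Dict, List


def snapshot_branch_names(
    open_heads: Dict[bytes, List[bytes]],
    closed_heads: Dict[bytes, List[bytes]],
    bookmarks: Dict[bytes, bytes],
    tags: Dict[bytes, bytes],
) -> Dict[bytes, bytes]:
    """Hypothetical helper: map ref names to node ids following the scheme above."""
    refs: Dict[bytes, bytes] = {}

    for branch, heads in open_heads.items():
        ordered = sorted(heads)  # stable order: increasing nodeid
        if not ordered:
            continue
        refs[b"branch-tip/" + branch] = ordered[0]
        if len(ordered) > 1:
            # Only branches with several open heads get a branch-heads/ hierarchy.
            for index, node in enumerate(ordered):
                refs[b"branch-heads/%s/%d" % (branch, index)] = node

    for branch, heads in closed_heads.items():
        for index, node in enumerate(sorted(heads)):
            refs[b"branch-closed-heads/%s/%d" % (branch, index)] = node

    for name, node in bookmarks.items():
        refs[b"bookmarks/" + name] = node
    for name, node in tags.items():
        refs[b"tags/" + name] = node

    # HEAD: the @ bookmark if present, else the tip of the default branch.
    if b"@" in bookmarks:
        refs[b"HEAD"] = bookmarks[b"@"]
    elif b"branch-tip/default" in refs:
        refs[b"HEAD"] = refs[b"branch-tip/default"]
    return refs
```

Because the index is always the last path component, a branch whose name itself contains `/` still yields an unambiguous ref name.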

Initialize the loader.

Parameters
  • url – url of the repository.

  • directory – directory of the local repository.

  • logging_class – class of the loader logger.

  • visit_date – visit date of the repository

  • config – loader configuration

CONFIG_BASE_FILENAME = 'loader/mercurial'
visit_type: Optional[str] = 'hg'
visit_date: Optional[datetime.datetime]
pre_cleanup()None[source]

As a first step, try to check for dangling data to clean up. This step should do its best to avoid raising errors.

cleanup()None[source]

Last step executed by the loader.

prepare_origin_visit()None[source]

First step executed by the loader to prepare origin and visit references. Set/update self.origin, and optionally self.origin_url, self.visit_date.

prepare()None[source]

Second step executed by the loader to prepare some state needed by the loader.

fetch_data()bool[source]

Fetch the data from the source the loader is currently loading.

Returns

a value that is interpreted as a boolean. If True, fetch_data needs to be called again to complete loading.

get_hg_revs_to_load()Union[mercurial.smartset.filteredset, mercurial.smartset._spanset][source]

Return the hg revision numbers to load.

store_data()[source]

Store fetched data in the database.
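
The boolean returned by fetch_data suggests a loop in which fetching and storing alternate until no data remains. The following is a minimal sketch of such a driver, assuming that each fetch is followed by one store; `ToyLoader` and `run` are hypothetical names, and the real orchestration lives in the base loader class.

```python
from typing import List


class ToyLoader:
    """Hypothetical loader that hands out pre-computed batches one at a time."""

    def __init__(self, batches: List[List[int]]) -> None:
        self._pending = list(batches)
        self._current: List[int] = []
        self.stored: List[int] = []

    def fetch_data(self) -> bool:
        # Take the next batch; report whether another call is needed.
        self._current = self._pending.pop(0) if self._pending else []
        return bool(self._pending)

    def store_data(self) -> None:
        # Persist whatever the last fetch produced.
        self.stored.extend(self._current)
        self._current = []


def run(loader: ToyLoader) -> None:
    """Alternate fetch and store until fetch_data reports completion."""
    while True:
        more_to_fetch = loader.fetch_data()
        loader.store_data()
        if not more_to_fetch:
            break
```

Storing after every fetch keeps the amount of data held in memory bounded by one batch.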

load_status()Dict[str, str][source]

Detailed loading status.

Defaults to logging an eventful load.

Returns

a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.

visit_status()str[source]

Allow overriding the visit status in case of partial load

get_revision_id_from_hg_nodeid(hg_nodeid: HgNodeId)bytes[source]

Return the git sha1 of a revision given its hg nodeid.

Parameters

hg_nodeid – the hg nodeid of the revision.

Returns

the sha1_git of the revision.

get_revision_parents(rev_ctx: mercurial.context.basectx)Tuple[bytes, ...][source]

Return the git sha1 of the parent revisions.

Parameters

rev_ctx – the hg revision context.

Returns

the sha1_git of the parent revisions.

store_revision(rev_ctx: mercurial.context.basectx)None[source]

Store a revision given its hg revision context.

Parameters

rev_ctx – the hg revision context.

Returns

the sha1_git of the stored revision.

store_release(name: bytes, target: bytes)bytes[source]

Store a release given its name and its target.

A release corresponds to a user-defined tag in Mercurial. The Mercurial API has a tip tag that must be ignored.

Parameters
  • name – name of the release.

  • target – sha1_git of the target revision.

Returns

the sha1_git of the stored release.

store_content(rev_ctx: mercurial.context.basectx, file_path: bytes)swh.model.from_disk.Content[source]

Store the content of a file at a given revision, identified by the hg revision context and the file path.

Content is a mix of file content at a given revision and its permissions found in the changeset’s manifest.

Parameters
  • rev_ctx – the hg revision context.

  • file_path – the hg path of the content.

Returns

the stored content, as a swh.model.from_disk.Content object.

store_directories(rev_ctx: mercurial.context.basectx)bytes[source]

Store the directories of a revision given its hg revision context.

Mercurial has no directory objects as Git does. A Git-like tree must be built from the file paths to obtain each directory hash.

Parameters

rev_ctx – the hg revision context.

Returns

the sha1_git of the top level directory.
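
To illustrate how a Git-like tree can be built from a flat list of file paths, here is a standard-library sketch that computes Git-compatible blob and tree identifiers. It is not the loader's code: `git_blob_id` and `git_tree_id` are hypothetical helpers, every file is assumed to be a regular non-executable file (mode 100644), and symlinks and executable bits are ignored.

```python
import hashlib
from typing import Dict, List, Tuple


def git_blob_id(data: bytes) -> bytes:
    """Git object id of a blob: sha1 over a 'blob <size>' header plus the data."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).digest()


def git_tree_id(files: Dict[bytes, bytes]) -> bytes:
    """Git object id of the tree holding `files` (slash-separated path -> content)."""
    blobs: Dict[bytes, bytes] = {}
    subdirs: Dict[bytes, Dict[bytes, bytes]] = {}
    for path, data in files.items():
        head, sep, rest = path.partition(b"/")
        if sep:
            # The file lives in a subdirectory: group it under its first component.
            subdirs.setdefault(head, {})[rest] = data
        else:
            blobs[head] = data

    # Each entry: (sort key, mode, name, object id).
    entries: List[Tuple[bytes, bytes, bytes, bytes]] = []
    for name, data in blobs.items():
        entries.append((name, b"100644", name, git_blob_id(data)))
    for name, sub in subdirs.items():
        # Git orders directories as if their name ended with a slash.
        entries.append((name + b"/", b"40000", name, git_tree_id(sub)))
    entries.sort(key=lambda entry: entry[0])

    body = b"".join(
        mode + b" " + name + b"\x00" + oid for _, mode, name, oid in entries
    )
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).digest()
```

Calling `git_tree_id({b"src/main.py": b"...", b"README": b"..."})` hashes the `src` subtree first and then the root, so every intermediate directory obtains its own identifier along the way.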

origin: Optional[swh.model.model.Origin]
origin_metadata: Dict[str, Any]
loaded_snapshot_id: Optional[bytes]
class swh.loader.mercurial.from_disk.HgArchiveLoaderFromDisk(storage: swh.storage.interface.StorageInterface, url: str, visit_date: Optional[datetime.datetime] = None, archive_path: Optional[str] = None, temp_directory: str = '/tmp', max_content_size: Optional[int] = None)[source]

Bases: swh.loader.mercurial.from_disk.HgLoaderFromDisk

Mercurial loader for repositories wrapped within tarballs.

Initialize the loader.

Parameters
  • url – url of the repository.

  • directory – directory of the local repository.

  • logging_class – class of the loader logger.

  • visit_date – visit date of the repository

  • config – loader configuration

prepare()[source]

Extract the archive instead of cloning.

visit_date: Optional[datetime.datetime]
origin: Optional[swh.model.model.Origin]
origin_metadata: Dict[str, Any]
loaded_snapshot_id: Optional[bytes]