swh.loader.mercurial.loader module
Loaders for ingesting Mercurial repositories either from a local or remote repository (see swh.loader.mercurial.loader.HgLoader) or from an archive (see swh.loader.mercurial.loader.HgArchiveLoader).
- exception swh.loader.mercurial.loader.CorruptedRevision(hg_nodeid: HgNodeId)
Bases: ValueError
Raised when a revision is corrupted.
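A minimal sketch of how an exception with this signature can carry the offending nodeid to its caller. It assumes HgNodeId is a NewType over the raw nodeid bytes, and the skip-and-continue policy in the hypothetical load_revisions helper is an illustration, not the loader's documented behaviour.

```python
from typing import Iterable, List, NewType, Tuple

# Assumption: HgNodeId is modelled here as a NewType over the raw nodeid bytes.
HgNodeId = NewType("HgNodeId", bytes)


class CorruptedRevision(ValueError):
    """Sketch mirroring the documented signature: keeps the offending nodeid."""

    def __init__(self, hg_nodeid: HgNodeId) -> None:
        super().__init__(hg_nodeid.hex())
        self.hg_nodeid = hg_nodeid


def load_revisions(
    revisions: Iterable[Tuple[HgNodeId, bool]],
) -> Tuple[List[HgNodeId], List[HgNodeId]]:
    """Hypothetical caller: skip corrupted revisions instead of aborting."""
    loaded: List[HgNodeId] = []
    skipped: List[HgNodeId] = []
    for nodeid, is_intact in revisions:
        try:
            if not is_intact:
                raise CorruptedRevision(nodeid)
            loaded.append(nodeid)
        except CorruptedRevision as exc:
            # The exception object tells the caller which revision failed.
            skipped.append(exc.hg_nodeid)
    return loaded, skipped
```

Subclassing ValueError means callers that already handle ValueError keep working, while callers that care can catch CorruptedRevision specifically.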
- class swh.loader.mercurial.loader.HgDirectory(data=None)
Bases: Directory
A more practical directory: it creates missing parent directories and removes empty directories.
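The two behaviours above can be illustrated with a small standalone tree. This is a sketch under assumptions: SketchDirectory is a hypothetical class that stores files as raw bytes under b"/"-separated paths, not the real HgDirectory, which builds on the Directory class from swh.model.

```python
from typing import Dict, Union


class SketchDirectory:
    """Hypothetical stand-in for HgDirectory: a tree addressed by b"/"-separated paths.

    Files are stored as raw bytes; directories are nested SketchDirectory objects.
    """

    def __init__(self) -> None:
        self.entries: Dict[bytes, Union["SketchDirectory", bytes]] = {}

    def __setitem__(self, path: bytes, value: bytes) -> None:
        head, _, rest = path.partition(b"/")
        if not rest:
            self.entries[head] = value
            return
        child = self.entries.get(head)
        if not isinstance(child, SketchDirectory):
            # Create the missing parent directory (replacing a file of the same name).
            child = SketchDirectory()
            self.entries[head] = child
        child[rest] = value

    def __getitem__(self, path: bytes) -> Union["SketchDirectory", bytes]:
        head, _, rest = path.partition(b"/")
        child = self.entries[head]
        if not rest:
            return child
        if not isinstance(child, SketchDirectory):
            raise KeyError(path)
        return child[rest]

    def __delitem__(self, path: bytes) -> None:
        head, _, rest = path.partition(b"/")
        if not rest:
            del self.entries[head]
            return
        child = self.entries[head]
        if not isinstance(child, SketchDirectory):
            raise KeyError(path)
        del child[rest]
        # Remove the parent directory once it has become empty.
        if not child.entries:
            del self.entries[head]
```

Setting d[b"a/b/c.txt"] on an empty tree creates a and a/b on the way down; deleting that file afterwards prunes both directories again if nothing else lives in them.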
- class swh.loader.mercurial.loader.HgLoader(storage: StorageInterface, url: str, directory: str | None = None, visit_date: datetime | None = None, temp_directory: str = '/tmp', clone_timeout_seconds: int = 7200, content_cache_size: int = 10000, **kwargs: Any)
Bases: BaseLoader
Load a Mercurial repository from a local repository.
Mercurial’s branching model is more complete than Git’s; it allows for multiple heads per branch, closed heads and bookmarks. The following mapping is used to represent the branching state of a Mercurial project in a given snapshot:
HEAD (optional): either the node pointed to by the @ bookmark or the tip of the default branch
branch-tip/<branch-name> (required): the first head of the branch, sorted by nodeid if there are multiple heads
bookmarks/<bookmark-name> (optional): one entry per bookmark, if any
branch-heads/<branch-name>/0..n (optional): for any branch with multiple open heads, all open heads
branch-closed-heads/<branch-name>/0..n (optional): for any branch with at least one closed head, all closed heads
tags/<tag-name> (optional): one entry per tag
The format is unambiguous regardless of the branch name, because every indexed entry ends with /<index>, as long as the heads are sorted in a stable order (we sort by nodeid). Some refs may overlap, but it is simpler not to attempt de-duplication. However, to reduce redundancy between snapshot branches in the most common case, a branch with a single open head is only referenced as branch-tip/<branch-name>. The branch-heads/ hierarchy only appears when a branch has multiple open heads. Both branch-heads/ and branch-closed-heads/ are sorted by increasing nodeid.
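The mapping above can be sketched as a pure function over a hypothetical in-memory description of the repository. Several assumptions apply: the input shapes are invented for illustration; branch-tip/ is taken as the first open head in nodeid order, falling back to the first closed head when every head is closed; targets are left as hg nodeids, whereas a real snapshot points at archived objects; and HEAD is resolved directly, whereas a snapshot would more likely store it as an alias.

```python
from typing import Dict, List, Tuple

# A head is (nodeid, is_closed); nodeids are raw bytes, as in Mercurial.
Head = Tuple[bytes, bool]


def snapshot_branch_names(
    branches: Dict[bytes, List[Head]],
    bookmarks: Dict[bytes, bytes],
    tags: Dict[bytes, bytes],
) -> Dict[bytes, bytes]:
    """Map snapshot branch names to target nodeids following the layout above.

    Assumption: every branch has at least one head; branch-tip/ uses the first
    open head in nodeid order, else the first closed head.
    """
    result: Dict[bytes, bytes] = {}
    for name, heads in branches.items():
        open_heads = sorted(node for node, closed in heads if not closed)
        closed_heads = sorted(node for node, closed in heads if closed)
        result[b"branch-tip/" + name] = (open_heads or closed_heads)[0]
        # branch-heads/ only appears when the branch has several open heads.
        if len(open_heads) > 1:
            for i, node in enumerate(open_heads):
                result[b"branch-heads/%s/%d" % (name, i)] = node
        for i, node in enumerate(closed_heads):
            result[b"branch-closed-heads/%s/%d" % (name, i)] = node
    for name, node in bookmarks.items():
        result[b"bookmarks/" + name] = node
    for name, node in tags.items():
        result[b"tags/" + name] = node
    # HEAD: the @ bookmark if present, otherwise the tip of the default branch.
    if b"@" in bookmarks:
        result[b"HEAD"] = bookmarks[b"@"]
    elif b"default" in branches:
        result[b"HEAD"] = result[b"branch-tip/default"]
    return result
```

Sorting by nodeid is what makes the /<index> suffixes reproducible across visits: the same set of heads always yields the same numbering.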
Initialize the loader.
- Parameters:
url – URL of the repository.
directory – directory of the local repository.
logging_class – class of the loader logger.
visit_date – visit date of the repository.
config – loader configuration.
- CONFIG_BASE_FILENAME = 'loader/mercurial'
- pre_cleanup() → None
As a first step, try to check for dangling data to clean up. This should do its best to avoid raising errors.
- prepare() → None
Second step executed by the loader, to prepare the state it needs.
- fetch_data() → bool
Fetch the data from the source the loader is currently loading.
- Returns:
a value that is interpreted as a boolean. If True, fetch_data needs to be called again to complete loading.
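The steps above (pre_cleanup, prepare, then repeated fetch_data calls) suggest a driving loop like the one below. This is a sketch under assumptions: ToyLoader and run are hypothetical, and the store_data step that persists each fetched batch is assumed here because it is not documented on this page.

```python
from typing import List


class ToyLoader:
    """Hypothetical loader that fetches its items in fixed-size batches."""

    def __init__(self, items: List[int], batch_size: int = 2) -> None:
        self.pending = list(items)
        self.batch_size = batch_size
        self.batch: List[int] = []
        self.stored: List[int] = []
        self.log: List[str] = []

    def pre_cleanup(self) -> None:
        self.log.append("pre_cleanup")

    def prepare(self) -> None:
        self.log.append("prepare")

    def fetch_data(self) -> bool:
        self.batch = self.pending[: self.batch_size]
        self.pending = self.pending[self.batch_size :]
        # True means "call fetch_data again"; False once nothing is left.
        return bool(self.pending)

    def store_data(self) -> None:
        # Assumed step: persist whatever the last fetch_data call produced.
        self.stored.extend(self.batch)


def run(loader: ToyLoader) -> None:
    loader.pre_cleanup()
    loader.prepare()
    while True:
        more = loader.fetch_data()
        loader.store_data()
        if not more:
            break
```

Storing after every fetch keeps memory bounded when a repository is loaded in several rounds.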
- load_status() → Dict[str, str]
Detailed loading status.
Defaults to logging an eventful load.
- Returns:
a dictionary that is eventually passed back as the task’s result to the scheduler, allowing tuning of the task recurrence mechanism.
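One plausible way to fill such a status dictionary is to compare the snapshot produced by this visit with the previous one. This is a minimal sketch: the load_status function below and its snapshot-comparison rule are assumptions for illustration; only the "eventful" default is stated above.

```python
from typing import Dict, Optional


def load_status(
    previous_snapshot_id: Optional[bytes], new_snapshot_id: bytes
) -> Dict[str, str]:
    """Hypothetical: report "uneventful" when nothing changed since the last visit."""
    if previous_snapshot_id == new_snapshot_id:
        return {"status": "uneventful"}
    return {"status": "eventful"}
```

A scheduler can then visit repositories that keep returning "uneventful" less often.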
- get_revision_id_from_hg_nodeid(hg_nodeid: HgNodeId) → bytes
Return the git sha1 of a revision given its hg nodeid.
- Parameters:
hg_nodeid – the hg nodeid of the revision.
- Returns:
the sha1_git of the revision.
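A lookup like this is typically backed by a map filled as revisions are stored. The sketch below assumes a hypothetical RevisionIdCache with an optional fallback callable (for example, a query for revisions archived by an earlier visit); neither the class nor the fallback is part of the documented API.

```python
from typing import Callable, Dict, Optional


class RevisionIdCache:
    """Hypothetical hg nodeid -> sha1_git map with an optional fallback lookup."""

    def __init__(
        self, fallback: Optional[Callable[[bytes], Optional[bytes]]] = None
    ) -> None:
        self._ids: Dict[bytes, bytes] = {}
        self._fallback = fallback

    def record(self, hg_nodeid: bytes, sha1_git: bytes) -> None:
        # Called once a revision has been stored.
        self._ids[hg_nodeid] = sha1_git

    def get_revision_id_from_hg_nodeid(self, hg_nodeid: bytes) -> bytes:
        found = self._ids.get(hg_nodeid)
        if found is None and self._fallback is not None:
            found = self._fallback(hg_nodeid)
            if found is not None:
                self._ids[hg_nodeid] = found  # remember for later lookups
        if found is None:
            raise KeyError(hg_nodeid.hex())
        return found
```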
- get_revision_parents(rev_ctx: basectx) → Tuple[bytes, ...]
Return the sha1_git of the parent revisions.
- Parameters:
rev_ctx – the revision context.
- Returns:
the sha1_git of the parent revisions.
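Mercurial always records two parent slots and fills unused ones with the null revision, whose nodeid is twenty zero bytes. A minimal sketch of turning those slots into a parent tuple, assuming a hypothetical to_sha1_git callable that plays the role of get_revision_id_from_hg_nodeid:

```python
from typing import Callable, Iterable, Tuple

NULL_NODEID = b"\x00" * 20  # Mercurial's null revision id


def revision_parents(
    parent_nodeids: Iterable[bytes], to_sha1_git: Callable[[bytes], bytes]
) -> Tuple[bytes, ...]:
    """Drop null parents and translate the remaining ones, preserving order."""
    return tuple(to_sha1_git(p) for p in parent_nodeids if p != NULL_NODEID)
```

A root revision thus gets no parents, an ordinary commit one, and a merge two, in p1, p2 order.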
- store_revision(rev_ctx: basectx) → None
Store a revision given its revision context.
- Parameters:
rev_ctx – the revision context.
- store_release(name: bytes, target: bytes) → bytes
Store a release given its name and its target.
A release corresponds to a user-defined tag in Mercurial. The Mercurial API has a tip tag that must be ignored.
- Parameters:
name – name of the release.
target – sha1_git of the target revision.
- Returns:
the sha1_git of the stored release.
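Because Mercurial reports the implicit tip tag alongside user-defined tags, the caller has to filter it out before creating releases. A minimal sketch, assuming a hypothetical store_user_tags helper and a store_release callable with the documented (name, target) → bytes shape:

```python
from typing import Callable, Dict


def store_user_tags(
    tags: Dict[bytes, bytes], store_release: Callable[[bytes, bytes], bytes]
) -> Dict[bytes, bytes]:
    """Create one release per user-defined tag, skipping the implicit b"tip"."""
    released: Dict[bytes, bytes] = {}
    for name, target in sorted(tags.items()):
        if name == b"tip":
            continue  # tip always moves to the newest revision; not a user tag
        released[name] = store_release(name, target)
    return released
```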
- store_content(rev_ctx: basectx, file_path: bytes) → Content
Store the content of a file given its revision context and file path.
Content is a mix of file content at a given revision and its permissions found in the changeset’s manifest.
- Parameters:
rev_ctx – the revision context.
file_path – the hg path of the content.
- Returns:
the stored content.
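The manifest flags are what carry the permissions: Mercurial marks executables with b"x" and symlinks with b"l". The sketch below maps them to Git's standard file modes and computes the Git blob hash (sha1 over b"blob <size>\0" followed by the data). The content_entry helper is hypothetical, and the exact mapping the loader applies is an assumption here.

```python
import hashlib
from typing import Tuple

# Git's standard file modes.
REGULAR = 0o100644
EXECUTABLE = 0o100755
SYMLINK = 0o120000


def content_entry(data: bytes, flags: bytes) -> Tuple[bytes, int]:
    """Return (sha1_git, mode) for file data and its Mercurial manifest flags."""
    if b"l" in flags:
        mode = SYMLINK  # for symlinks, data is the link target
    elif b"x" in flags:
        mode = EXECUTABLE
    else:
        mode = REGULAR
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).digest(), mode
```

The hash depends only on the bytes, so the mode has to be stored separately, in the parent directory entry.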
- store_directories(rev_ctx: basectx) → bytes
Store the directories of a revision given its revision context.
Mercurial has no directory objects as Git does. A Git-like tree must be built from file paths to obtain each directory hash.
- Parameters:
rev_ctx – the revision context.
- Returns:
the sha1_git of the top level directory.
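Rebuilding a Git-like tree from a flat list of paths can be sketched in two passes: nest the paths, then hash bottom-up. The sketch assumes hypothetical helpers (build_tree, tree_id, git_hash) and uses Git's tree encoding: entries "<octal mode> <name>\0<20-byte id>", sorted by name with directory names compared as if they ended in "/".

```python
import hashlib
from typing import Dict, Tuple

DIR_MODE = 0o40000  # Git's mode for subtrees


def git_hash(kind: bytes, data: bytes) -> bytes:
    return hashlib.sha1(b"%s %d\0" % (kind, len(data)) + data).digest()


def build_tree(files: Dict[bytes, Tuple[bytes, int]]) -> dict:
    """Nest flat b"/"-separated paths into dicts; leaves are (blob_id, mode)."""
    root: dict = {}
    for path, leaf in files.items():
        *parents, name = path.split(b"/")
        node = root
        for part in parents:
            node = node.setdefault(part, {})  # implicit parent directory
        node[name] = leaf
    return root


def tree_id(tree: dict) -> bytes:
    """Hash a nested tree bottom-up, the way Git hashes tree objects."""
    entries = []
    for name, child in tree.items():
        if isinstance(child, dict):
            entries.append((name + b"/", name, DIR_MODE, tree_id(child)))
        else:
            blob_id, mode = child
            entries.append((name, name, mode, blob_id))
    entries.sort(key=lambda e: e[0])
    body = b"".join(b"%o %s\0" % (mode, name) + oid for _, name, mode, oid in entries)
    return git_hash(b"tree", body)
```

The identifier returned for the root dict plays the role of the top-level directory sha1_git.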
- class swh.loader.mercurial.loader.HgArchiveLoader(storage: StorageInterface, url: str, visit_date: datetime | None = None, archive_path: str | None = None, temp_directory: str = '/tmp', **kwargs: Any)
Bases: HgLoader
Mercurial loader for repositories wrapped within tarballs.
Initialize the loader.
- Parameters:
url – URL of the repository.
directory – directory of the local repository.
logging_class – class of the loader logger.
visit_date – visit date of the repository.
config – loader configuration.
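Before the inherited loading steps can run, the tarball has to be unpacked into a working directory. A minimal sketch, assuming a hypothetical extract_repository helper and an archive layout with exactly one top-level directory containing .hg/; the temp_directory default mirrors the signature above.

```python
import os
import tarfile
import tempfile


def extract_repository(archive_path: str, temp_directory: str = "/tmp") -> str:
    """Hypothetical helper: unpack a tarball and return the repository inside it.

    Assumes the archive holds exactly one top-level directory with a .hg/ folder.
    """
    work_dir = tempfile.mkdtemp(prefix="hg-archive-", dir=temp_directory)
    with tarfile.open(archive_path) as tar:  # mode "r" auto-detects compression
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe members.
            tar.extractall(work_dir, filter="data")
        else:
            tar.extractall(work_dir)
    entries = os.listdir(work_dir)
    if len(entries) != 1:
        raise ValueError(f"expected one top-level directory, found {entries!r}")
    repo_dir = os.path.join(work_dir, entries[0])
    if not os.path.isdir(os.path.join(repo_dir, ".hg")):
        raise ValueError(f"{repo_dir} is not a Mercurial repository")
    return repo_dir
```

The caller owns the returned working directory and should remove it (for example with shutil.rmtree) once loading is done.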