swh.vault.cookers.git_bare module
This cooker creates tarballs containing a bare .git directory, which can be unpacked and cloned like any git repository.
It works in three steps:
1. Writes objects one by one in .git/objects/
2. Calls git repack to pack all these objects into git packfiles.
3. Creates a tarball of the resulting repository
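The last of these steps can be illustrated with the standard library alone. This is a minimal sketch, not the cooker's actual code: `archive_repository` and its parameters are hypothetical names, and no compression is assumed because the text above does not say whether the real tarball is compressed.

```python
import tarfile


def archive_repository(git_dir: str, tar_path: str, arcname: str) -> None:
    """Archive the directory `git_dir` into `tar_path`.

    `arcname` is the top-level directory name inside the tarball
    (hypothetical parameter, e.g. "repo.git").
    """
    # Mode "w" writes an uncompressed tar; tar.add() recurses into directories.
    with tarfile.open(tar_path, "w") as tar:
        tar.add(git_dir, arcname=arcname)
```

Once unpacked, a directory archived this way can be passed to `git clone` as a local path, provided it is a valid bare repository.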
It keeps a set of all written (or about-to-be-written) object hashes in memory to avoid downloading and writing the same objects twice.
The first step is the most complex. When swh-graph is available, this roughly does the following:
1. Find all the revisions and releases in the induced subgraph, and add them to the todo-lists
2. Grab a batch from the (release/revision/directory/content) todo-lists, and load them. Add the directory and content objects they reference to the todo-lists
3. If any todo-list is not empty, goto 1
When swh-graph is not available, steps 1 and 2 are merged, because revisions need to be loaded in order to compute the subgraph.
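The todo-list loop and the in-memory set of seen hashes can be sketched as follows. This is a simplified illustration under stated assumptions, not the cooker's code: `ARCHIVE` is a hypothetical in-memory stand-in for the archive, `cook` is a hypothetical name, and a single todo-list replaces the per-type lists.

```python
from typing import Dict, List, Set

# Hypothetical in-memory "archive": object id -> ids of the objects it references.
# The real cooker fetches objects from the Software Heritage archive instead.
ARCHIVE: Dict[str, List[str]] = {
    "rev2": ["rev1", "dir2"],
    "rev1": ["dir1"],
    "dir2": ["dir1", "cnt2"],
    "dir1": ["cnt1"],
    "cnt1": [],
    "cnt2": [],
}


def cook(roots: List[str], batch_size: int = 2) -> List[str]:
    """Return the object ids in the order they would be written, each once."""
    seen: Set[str] = set(roots)  # written or about-to-be-written ids
    todo: List[str] = list(roots)
    written: List[str] = []
    while todo:
        # Grab a batch from the todo-list and "load" it.
        batch, todo = todo[:batch_size], todo[batch_size:]
        for obj_id in batch:
            written.append(obj_id)
            for child in ARCHIVE[obj_id]:
                if child not in seen:  # never queue the same object twice
                    seen.add(child)
                    todo.append(child)
    return written
```

Adding an id to `seen` at the moment it is queued, rather than when it is written, is what keeps `dir1` (referenced by both `rev1` and `dir2`) from being fetched twice.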
- class swh.vault.cookers.git_bare.RootObjectType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
- DIRECTORY = 'directory'
- REVISION = 'revision'
- RELEASE = 'release'
- SNAPSHOT = 'snapshot'
- swh.vault.cookers.git_bare.assert_never(value: NoReturn, msg) → NoReturn
mypy makes sure this function is never called, through exhaustive checking of value in the parent function.
See https://mypy.readthedocs.io/en/latest/literal_types.html#exhaustive-checks for details.
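The exhaustive check can be illustrated as follows. This is a minimal sketch: the enum and `assert_never` mirror the signatures listed on this page, while `describe` and its return strings are hypothetical and exist only to show the pattern.

```python
from enum import Enum
from typing import NoReturn


class RootObjectType(Enum):
    DIRECTORY = "directory"
    REVISION = "revision"
    RELEASE = "release"
    SNAPSHOT = "snapshot"


def assert_never(value: NoReturn, msg: str) -> NoReturn:
    # Only reachable at runtime if the caller missed a case.
    raise AssertionError(msg)


def describe(obj_type: RootObjectType) -> str:
    # Hypothetical caller, for illustration only.
    if obj_type is RootObjectType.DIRECTORY:
        return "a single tree"
    elif obj_type is RootObjectType.REVISION:
        return "a commit and its history"
    elif obj_type is RootObjectType.RELEASE:
        return "a tag and what it points to"
    elif obj_type is RootObjectType.SNAPSHOT:
        return "all branches of a repository"
    else:
        # mypy narrows obj_type to NoReturn here; removing a branch above
        # turns this call into a type error.
        assert_never(obj_type, f"Unexpected root object type: {obj_type}")
```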
- class swh.vault.cookers.git_bare.GitBareCooker(*args, **kwargs)
Bases: BaseVaultCooker
Initialize the cooker.
The type of the object represented by the id depends on the concrete class. Very likely, each type of bundle will have its own cooker class.
- Parameters:
swhid – id of the object to be cooked into a bundle.
backend – the vault backend (swh.vault.backend.VaultBackend).
- SUPPORTED_OBJECT_TYPES: ClassVar[Set[swh.model.swhids.ObjectType]] = {ObjectType.DIRECTORY, ObjectType.RELEASE, ObjectType.REVISION, ObjectType.SNAPSHOT}
- use_fsck = True
- obj_type: RootObjectType
- prepare_bundle() → None
Main entry point. Initializes the state, creates the bundle, and sends it to the backend.
- git_fsck() → None
Runs git-fsck and ignores expected errors (e.g. because of missing objects).
- write_refs(snapshot=None) → None
Writes all files in .git/refs/.
For non-snapshot objects, this is only master.
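For illustration, a loose git ref is a small text file holding the hexadecimal id of its target followed by a newline. The sketch below is not the cooker's implementation: the standalone `write_refs` function, its `git_dir` and `refs` parameters, and the choice of `refs/heads/master` as the full ref name are assumptions based on the usual git layout.

```python
import os
from typing import Dict


def write_refs(git_dir: str, refs: Dict[str, bytes]) -> None:
    """Write each ref as a file containing the target's hex id and a newline.

    `refs` maps full ref names (e.g. "refs/heads/master") to binary object ids.
    """
    for name, target_id in refs.items():
        path = os.path.join(git_dir, *name.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fd:
            fd.write(target_id.hex() + "\n")
```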
- object_exists(obj_id: bytes) → bool
Returns whether the object identified by the given obj_id was already written to a file in .git/objects/.
This function ignores objects contained in a git pack.
- write_object(obj_id: bytes, obj: bytes) → bool
Writes a git object on disk.
Returns whether it was already written.
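The two methods above can be sketched with git's loose-object layout, where an object is zlib-compressed and stored at `objects/<first 2 hex digits>/<remaining 38 hex digits>`. This is a hedged illustration rather than the cooker's code: the standalone functions, the `git_dir` parameter, and the `loose_object_path` helper are hypothetical, and `obj` is assumed to already include the git header (`<type> <size>\0`).

```python
import os
import zlib


def loose_object_path(git_dir: str, obj_id: bytes) -> str:
    # Hypothetical helper: maps a binary id to its loose-object file path.
    hex_id = obj_id.hex()
    return os.path.join(git_dir, "objects", hex_id[:2], hex_id[2:])


def object_exists(git_dir: str, obj_id: bytes) -> bool:
    # Only looks at loose objects; objects inside packfiles are not seen.
    return os.path.exists(loose_object_path(git_dir, obj_id))


def write_object(git_dir: str, obj_id: bytes, obj: bytes) -> bool:
    """Write a loose object; return True if it was already on disk."""
    if object_exists(git_dir, obj_id):
        return True
    path = loose_object_path(git_dir, obj_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as fd:
        fd.write(zlib.compress(obj))
    return False
```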
- push_subgraph(obj_type: RootObjectType, obj_id) → None
Adds the graph induced by the given obj_id, without recursing through directories, to the todo-lists.
If swh-graph is not available, this immediately loads revisions, as they need to be fetched in order to compute the subgraph, and fetching them immediately avoids duplicate fetches.
- load_objects() → None
Repeatedly loads objects in the todo-lists, until all lists are empty.
- push_revision_subgraph(obj_id: bytes) → None
Fetches the graph of revisions induced by the given obj_id and adds them to self._rev_stack.
If swh-graph is not available, this requires fetching the revisions themselves, so they are directly loaded instead.
- push_snapshot_subgraph(obj_id: bytes) → None
Fetches a snapshot and all its children, excluding directories and contents, and pushes them to the todo-lists.
Also loads revisions if swh-graph is not available, see push_revision_subgraph().
- load_revisions(obj_ids: List[bytes]) → None
Given a list of revision ids, loads these revisions and their directories, but not their parent revisions (i.e. this is not recursive).
- load_releases(obj_ids: List[bytes]) → List[Release]
Loads release objects, and returns them.