swh.vault.cookers.git_bare module

This cooker creates tarballs containing a bare .git directory that can be unpacked and cloned like any git repository.

It works in three steps:

  1. Writes objects one by one in .git/objects/.

  2. Calls git repack to pack all these objects into git packfiles.

  3. Creates a tarball of the resulting repository.

It keeps a set of all written (or about-to-be-written) object hashes in memory to avoid downloading and writing the same objects twice.
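The deduplication idea can be sketched as follows. This is a minimal illustration, not the cooker's code: `write_once`, `fetch` and `store` are hypothetical names, and a plain dict stands in for the files under .git/objects/.

```python
from typing import Callable, Dict, Iterable, Set


def write_once(
    obj_ids: Iterable[bytes],
    fetch: Callable[[bytes], bytes],
    seen: Set[bytes],
    store: Dict[bytes, bytes],
) -> int:
    """Fetch and store each id at most once; return the number of fetches."""
    fetched = 0
    for obj_id in obj_ids:
        if obj_id in seen:
            continue  # already written, or about to be written
        seen.add(obj_id)  # mark before fetching so later duplicates are skipped
        store[obj_id] = fetch(obj_id)  # `fetch` is a hypothetical download callback
        fetched += 1
    return fetched
```

Marking an id as seen before it is fetched is what lets the set cover "about-to-be-written" objects as well as written ones.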

The first step is the most complex. When swh-graph is available, this roughly does the following:

  1. Finds all the revisions and releases in the induced subgraph, and adds them to the todo-lists.

  2. Grabs a batch from the (release/revision/directory/content) todo-lists and loads the objects in it, adding the directory and content objects they reference to the todo-lists.

  3. If any todo-list is not empty, goes back to step 1.

When swh-graph is not available, steps 1 and 2 are merged, because revisions need to be loaded in order to compute the subgraph.
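The batch loop over the todo-lists can be sketched roughly as below. Everything here is an assumption made for illustration: the archive is a hypothetical in-memory dict, `BATCH_SIZE` is arbitrary, and the revisions and releases found up front are simply passed in as `roots`.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory archive: id -> (object type, ids it references).
Archive = Dict[str, Tuple[str, List[str]]]

BATCH_SIZE = 2  # illustrative value, not the cooker's real batch size


def load_all(archive: Archive, roots: List[str]) -> List[str]:
    """Drain per-type todo-lists in batches; return ids in load order."""
    todo: Dict[str, List[str]] = {
        "release": [], "revision": [], "directory": [], "content": [],
    }
    seen: Set[str] = set()
    loaded: List[str] = []

    def push(obj_id: str) -> None:
        # Queue an object once, on the todo-list matching its type.
        if obj_id not in seen:
            seen.add(obj_id)
            todo[archive[obj_id][0]].append(obj_id)

    for root in roots:
        push(root)

    while any(todo.values()):  # repeat until every todo-list is empty
        for stack in todo.values():
            # Take one batch off the front of this todo-list.
            batch, stack[:] = stack[:BATCH_SIZE], stack[BATCH_SIZE:]
            for obj_id in batch:
                loaded.append(obj_id)  # stands in for "load and write"
                for child in archive[obj_id][1]:
                    push(child)
    return loaded
```

Because `push` consults `seen`, a content shared by two directories is queued only once.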

class swh.vault.cookers.git_bare.RootObjectType(value)

Bases: Enum

Enumeration of the object types that can be the root of a bundle.

DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'
swh.vault.cookers.git_bare.assert_never(value: NoReturn, msg) → NoReturn

mypy makes sure this function is never called, through exhaustive checking of value in the parent function.

See https://mypy.readthedocs.io/en/latest/literal_types.html#exhaustive-checks for details.
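A short sketch of the exhaustive-check pattern, under stated assumptions: the body of `assert_never` below is a stand-in (the real one is not shown here), and `describe` and its return strings are hypothetical.

```python
from enum import Enum
from typing import NoReturn


class RootObjectType(Enum):
    DIRECTORY = "directory"
    REVISION = "revision"
    RELEASE = "release"
    SNAPSHOT = "snapshot"


def assert_never(value: NoReturn, msg: str) -> NoReturn:
    # Stand-in body: only reachable at runtime if a case was missed.
    raise AssertionError(f"{msg}: {value!r}")


def describe(obj_type: RootObjectType) -> str:
    """Hypothetical caller that handles every enum member."""
    if obj_type is RootObjectType.DIRECTORY:
        return "directory root"
    elif obj_type is RootObjectType.REVISION:
        return "revision root"
    elif obj_type is RootObjectType.RELEASE:
        return "release root"
    elif obj_type is RootObjectType.SNAPSHOT:
        return "snapshot root"
    else:
        # mypy has narrowed obj_type to NoReturn here; removing a branch
        # above turns this call into a type error.
        assert_never(obj_type, "unexpected root object type")
```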

class swh.vault.cookers.git_bare.GitBareCooker(*args, **kwargs)

Bases: BaseVaultCooker

Initialize the cooker.

The type of the object represented by the id depends on the concrete class. Very likely, each type of bundle will have its own cooker class.

Parameters:

  • swhid – id of the object to be cooked into a bundle.

  • backend – the vault backend (swh.vault.backend.VaultBackend).

BUNDLE_TYPE: ClassVar[str] = 'git_bare'
SUPPORTED_OBJECT_TYPES: ClassVar[Set[swh.model.swhids.ObjectType]] = {<ObjectType.RELEASE: 'rel'>, <ObjectType.SNAPSHOT: 'snp'>, <ObjectType.DIRECTORY: 'dir'>, <ObjectType.REVISION: 'rev'>}
use_fsck = True
obj_type: RootObjectType
check_exists() → bool

Returns whether the root object is present in the archive.

prepare_bundle() → None

Main entry point. Initializes the state, creates the bundle, and sends it to the backend.

init_git() → None

Creates an empty .git directory.

create_object_dirs() → None

Creates all possible subdirectories of .git/objects/
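Git stores a loose object under a subdirectory named after the first two hex digits of its id, so "all possible subdirectories" amounts to 256 of them, 00 through ff. A minimal sketch, with `make_fanout_dirs` as a hypothetical helper name:

```python
import os
import tempfile


def make_fanout_dirs(git_dir: str) -> None:
    """Create objects/00 ... objects/ff under git_dir (hypothetical helper)."""
    for first_byte in range(256):
        os.makedirs(
            os.path.join(git_dir, "objects", f"{first_byte:02x}"), exist_ok=True
        )


# Usage: build the directories in a scratch location and list them.
with tempfile.TemporaryDirectory() as scratch:
    make_fanout_dirs(scratch)
    fanout = sorted(os.listdir(os.path.join(scratch, "objects")))
```

Creating the directories up front means later writes never need to check whether the parent directory exists.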

repack() → None

Moves all objects from .git/objects/ to a packfile.

git_fsck() → None

Runs git-fsck and ignores expected errors (e.g. because of missing objects).

write_refs(snapshot=None) → None

Writes all files in .git/refs/.

For non-snapshot objects, this is only master.


Creates the final .tar file.
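Packing a directory into an uncompressed tar can be sketched with the standard tarfile module. This is only an illustration: `make_tarball`, the `example.git` archive name and the scratch layout are assumptions, not the cooker's actual naming.

```python
import os
import tarfile
import tempfile


def make_tarball(git_dir: str, tar_path: str, arcname: str) -> None:
    """Archive git_dir into an uncompressed tar, rooted at arcname."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(git_dir, arcname=arcname, recursive=True)


# Usage: archive a tiny stand-in repository and list the tar members.
with tempfile.TemporaryDirectory() as scratch:
    git_dir = os.path.join(scratch, "work", ".git")
    os.makedirs(git_dir)
    with open(os.path.join(git_dir, "HEAD"), "w") as f:
        f.write("ref: refs/heads/master\n")
    tar_path = os.path.join(scratch, "bundle.tar")
    make_tarball(git_dir, tar_path, arcname="example.git")
    with tarfile.open(tar_path) as tar:
        names = sorted(tar.getnames())
```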

object_exists(obj_id: bytes) → bool

Returns whether the object identified by the given obj_id was already written to a file in .git/objects/.

This function ignores objects contained in a git pack.

write_object(obj_id: bytes, obj: bytes) → bool

Writes a git object on disk.

Returns whether it was already written.
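A loose git object is the zlib-compressed object (header plus payload) stored at objects/<first two hex digits>/<remaining digits>. The sketch below follows that layout; `write_loose_object` and `blob_object` are hypothetical helpers, and it assumes `obj` already includes the git header.

```python
import hashlib
import os
import tempfile
import zlib
from typing import Tuple


def write_loose_object(git_dir: str, obj_id: bytes, obj: bytes) -> bool:
    """Write obj as a loose object; return True if it was already there."""
    hex_id = obj_id.hex()
    directory = os.path.join(git_dir, "objects", hex_id[:2])
    path = os.path.join(directory, hex_id[2:])
    if os.path.exists(path):
        return True
    os.makedirs(directory, exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(obj))
    return False


def blob_object(payload: bytes) -> Tuple[bytes, bytes]:
    """Build a git blob (header + payload) and its SHA-1 id."""
    obj = b"blob %d\x00" % len(payload) + payload
    return hashlib.sha1(obj).digest(), obj


# Usage: writing the same (empty) blob twice only touches the disk once.
with tempfile.TemporaryDirectory() as scratch:
    empty_id, empty_obj = blob_object(b"")
    first = write_loose_object(scratch, empty_id, empty_obj)
    second = write_loose_object(scratch, empty_id, empty_obj)
```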

push_subgraph(obj_type: RootObjectType, obj_id) → None

Adds the graph induced by the given obj_id to the todo-lists, without recursing through directories.

If swh-graph is not available, this immediately loads revisions, as they need to be fetched in order to compute the subgraph, and fetching them immediately avoids duplicate fetches.

load_objects() → None

Repeatedly loads objects in the todo-lists, until all lists are empty.

push_revision_subgraph(obj_id: bytes) → None

Fetches the graph of revisions induced by the given obj_id and adds them to self._rev_stack.

If swh-graph is not available, this requires fetching the revisions themselves, so they are directly loaded instead.

push_snapshot_subgraph(obj_id: bytes) → None

Fetches a snapshot and all its children, excluding directories and contents, and pushes them to the todo-lists.

Also loads revisions if swh-graph is not available; see push_revision_subgraph().

load_revisions(obj_ids: List[bytes]) → None

Given a list of revision ids, loads these revisions and their directories, but not their parent revisions (i.e. this is not recursive).

write_revision_node(revision: Revision) → bool

Writes a revision object to disk.

load_releases(obj_ids: List[bytes]) → List[Release]

Loads release objects, and returns them.

push_releases_subgraphs(obj_ids: List[bytes]) → None

Given a list of release ids, loads these releases and adds their targets to the list of objects to visit.

write_release_node(release: Release) → bool

Writes a release object to disk.

load_directories(obj_ids: List[bytes]) → None
load_directory(obj_id: bytes, raw_manifest: Optional[bytes]) → None
load_contents(obj_ids: List[bytes]) → None
write_content(obj_id: bytes, content: bytes) → None