swh.vault.cookers.git_bare module

This cooker creates tarballs containing a bare .git directory, which can be unpacked and cloned like any git repository.

It works in three steps:

  1. Writes objects one by one to .git/objects/.

  2. Calls git repack to pack all these objects into git packfiles.

  3. Creates a tarball of the resulting repository.
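The first and last of these steps can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the cooker's actual code: `write_loose_object` and `make_tarball` are hypothetical helpers, and the repack step appears only as a comment because it needs the git command-line tool.

```python
import hashlib
import tarfile
import tempfile
import zlib
from pathlib import Path


def write_loose_object(git_dir: Path, obj_type: str, payload: bytes) -> str:
    """Write one loose git object; return its hex SHA-1 (hypothetical helper)."""
    # A git object is "<type> <size>\0<payload>", identified by its SHA-1.
    raw = f"{obj_type} {len(payload)}".encode() + b"\0" + payload
    hex_id = hashlib.sha1(raw).hexdigest()
    path = git_dir / "objects" / hex_id[:2] / hex_id[2:]
    path.parent.mkdir(parents=True, exist_ok=True)
    # Loose objects are stored zlib-compressed.
    path.write_bytes(zlib.compress(raw))
    return hex_id


def make_tarball(git_dir: Path, tar_path: Path) -> None:
    """Archive the whole directory as a plain tarball (hypothetical helper)."""
    with tarfile.open(tar_path, "w") as tar:
        tar.add(git_dir, arcname=git_dir.name)


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / "repo.git"
    blob_id = write_loose_object(git_dir, "blob", b"")  # step 1
    # Step 2 would run `git repack` here (omitted: requires the git CLI).
    tar_path = Path(tmp) / "repo.git.tar"
    make_tarball(git_dir, tar_path)  # step 3
    with tarfile.open(tar_path) as tar:
        members = tar.getnames()
```

Here `blob_id` is the id of the empty blob, and `members` lists the archived paths, including the loose object file under `repo.git/objects/`.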

It keeps a set of all written (or about-to-be-written) object hashes in memory to avoid downloading and writing the same objects twice.

The first step is the most complex. When swh-graph is available, this roughly does the following:

  1. Find all the revisions and releases in the induced subgraph, and add them to the todo-lists.

  2. Grab a batch from the (release/revision/directory/content) todo-lists and load the objects in it. Add the directory and content objects they reference to the todo-lists.

  3. If any todo-list is not empty, go back to step 2.

When swh-graph is not available, steps 1 and 2 are merged, because revisions need to be loaded in order to compute the subgraph.
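The todo-list loop, combined with the in-memory set of already-seen hashes, follows a standard worklist pattern. The sketch below is a simplified illustration on a toy graph, not the cooker's code: the `GRAPH` dictionary, the `load_all` function, and the single combined todo-list are assumptions made for brevity (the cooker keeps one list per object type).

```python
from typing import Dict, List, Set

# Toy object graph: each object id maps to the ids it references.
GRAPH: Dict[str, List[str]] = {
    "rev2": ["rev1", "dir2"],
    "rev1": ["dir1"],
    "dir2": ["cnt1", "cnt2"],
    "dir1": ["cnt1"],
    "cnt1": [],
    "cnt2": [],
}


def load_all(roots: List[str], batch_size: int = 2) -> List[str]:
    """Visit every object reachable from the roots exactly once."""
    seen: Set[str] = set(roots)  # written or about-to-be-written ids
    todo: List[str] = list(roots)
    loaded: List[str] = []
    while todo:
        # Grab a batch from the todo-list.
        batch, todo = todo[:batch_size], todo[batch_size:]
        for obj_id in batch:
            loaded.append(obj_id)  # stand-in for "fetch and write the object"
            for child in GRAPH[obj_id]:
                if child not in seen:  # skip ids already queued or written
                    seen.add(child)
                    todo.append(child)
    return loaded


order = load_all(["rev2"])
```

Marking an id as seen when it is queued, rather than when it is written, is what keeps `cnt1` from being loaded twice even though two directories reference it.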

class swh.vault.cookers.git_bare.RootObjectType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

DIRECTORY = 'directory'
REVISION = 'revision'
RELEASE = 'release'
SNAPSHOT = 'snapshot'

swh.vault.cookers.git_bare.assert_never(value: NoReturn, msg) → NoReturn

mypy makes sure this function is never called, through exhaustive checking of value in the parent function.

See https://mypy.readthedocs.io/en/latest/literal_types.html#exhaustive-checks for details.
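The exhaustiveness-check pattern can be illustrated as follows. This is a sketch, not the module's code: the `Color` enum and `describe` function are hypothetical, and the body of `assert_never` here is a local stand-in that only shares the documented signature.

```python
from enum import Enum
from typing import NoReturn


class Color(Enum):  # hypothetical enum, for illustration only
    RED = "red"
    GREEN = "green"


def assert_never(value: NoReturn, msg: str) -> NoReturn:
    """Runtime fallback; a type checker should prove this is unreachable."""
    raise AssertionError(f"{msg}: {value!r}")


def describe(color: Color) -> str:
    if color is Color.RED:
        return "warm"
    elif color is Color.GREEN:
        return "cool"
    else:
        # If a member is added to Color without a branch above, mypy
        # reports a type error on this call instead of letting it pass.
        assert_never(color, "Unexpected color")
```

After both branches, mypy narrows `color` to an empty type, which is the only thing compatible with a `NoReturn` parameter.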

class swh.vault.cookers.git_bare.GitBareCooker(*args, **kwargs)

Bases: BaseVaultCooker

Initialize the cooker.

The type of the object represented by the id depends on the concrete class. Very likely, each type of bundle will have its own cooker class.

Parameters:
  • swhid – id of the object to be cooked into a bundle.

  • backend – the vault backend (swh.vault.backend.VaultBackend).

BUNDLE_TYPE: ClassVar[str] = 'git_bare'
SUPPORTED_OBJECT_TYPES: ClassVar[Set[swh.model.swhids.ObjectType]] = {ObjectType.DIRECTORY, ObjectType.RELEASE, ObjectType.REVISION, ObjectType.SNAPSHOT}
use_fsck = True
obj_type: RootObjectType
check_exists() → bool

Returns whether the root object is present in the archive.

prepare_bundle() → None

Main entry point. Initializes the state, creates the bundle, and sends it to the backend.

init_git() → None

Creates an empty .git directory.

create_object_dirs() → None

Creates all possible subdirectories of .git/objects/
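Git stores each loose object under a subdirectory named after the first two hexadecimal digits of its id, so "all possible subdirectories" presumably means the 256 directories `00` through `ff`. The sketch below rests on that assumption and is a hypothetical standalone function, not the method's actual implementation.

```python
import tempfile
from pathlib import Path


def create_object_dirs(git_dir: Path) -> None:
    """Create objects/00 ... objects/ff (hypothetical standalone version)."""
    for byte in range(256):
        (git_dir / "objects" / f"{byte:02x}").mkdir(parents=True, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp:
    create_object_dirs(Path(tmp))
    created = sorted(p.name for p in (Path(tmp) / "objects").iterdir())
```

Creating the directories up front means that writing an object later never has to check whether its parent directory exists.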

repack() → None

Moves all objects from .git/objects/ to a packfile.

git_fsck() → None

Runs git-fsck and ignores expected errors (e.g. because of missing objects).

write_refs(snapshot=None) → None

Writes all files in .git/refs/.

For non-snapshot objects, this is only master.
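For the non-snapshot case, a loose git ref is just a text file holding the hexadecimal id of its target followed by a newline. The sketch below assumes that "master" means `refs/heads/master`. `write_master_ref` is a hypothetical helper, and the target id is only an example value.

```python
import tempfile
from pathlib import Path


def write_master_ref(git_dir: Path, target_hex: str) -> Path:
    """Write refs/heads/master pointing at target_hex (hypothetical helper)."""
    ref_path = git_dir / "refs" / "heads" / "master"
    ref_path.parent.mkdir(parents=True, exist_ok=True)
    # A loose ref file contains the 40-character hex id and a newline.
    ref_path.write_text(target_hex + "\n")
    return ref_path


with tempfile.TemporaryDirectory() as tmp:
    example_id = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"  # example value only
    ref_file = write_master_ref(Path(tmp), example_id)
    ref_content = ref_file.read_text()
    ref_relpath = ref_file.relative_to(tmp).as_posix()
```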

write_archive()

Creates the final .tar file.

object_exists(obj_id: bytes) → bool

Returns whether the object identified by the given obj_id was already written to a file in .git/objects/.

This function ignores objects contained in a git pack.
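A check of this kind can be sketched as a simple path lookup, which also shows why packed objects go unnoticed: only the loose-object location is inspected. The functions below are hypothetical standalone versions that take the repository directory as an explicit argument. They are not the method's actual implementation.

```python
import tempfile
from pathlib import Path


def loose_object_path(git_dir: Path, obj_id: bytes) -> Path:
    """Map a binary object id to its loose-object path (hypothetical helper)."""
    hex_id = obj_id.hex()
    return git_dir / "objects" / hex_id[:2] / hex_id[2:]


def object_exists(git_dir: Path, obj_id: bytes) -> bool:
    """Return True only if a loose file exists; packfiles are not searched."""
    return loose_object_path(git_dir, obj_id).exists()


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp)
    obj_id = bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
    before = object_exists(git_dir, obj_id)
    path = loose_object_path(git_dir, obj_id)
    path.parent.mkdir(parents=True)
    path.write_bytes(b"")  # placeholder file; real objects are zlib-compressed
    after = object_exists(git_dir, obj_id)
```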

write_object(obj_id: bytes, obj: bytes) → bool

Writes a git object on disk.

Returns whether it was already written.

push_subgraph(obj_type: RootObjectType, obj_id) → None

Adds the graph induced by the given obj_id to the todo-lists, without recursing through directories.

If swh-graph is not available, this immediately loads revisions, as they need to be fetched in order to compute the subgraph, and fetching them immediately avoids duplicate fetches.

load_objects() → None

Repeatedly loads objects in the todo-lists, until all lists are empty.

push_revision_subgraph(obj_id: bytes) → None

Fetches the graph of revisions induced by the given obj_id and adds them to self._rev_stack.

If swh-graph is not available, this requires fetching the revisions themselves, so they are directly loaded instead.

push_snapshot_subgraph(obj_id: bytes) → None

Fetches a snapshot and all its children, excluding directories and contents, and pushes them to the todo-lists.

Also loads revisions if swh-graph is not available, see push_revision_subgraph().

load_revisions(obj_ids: List[bytes]) → None

Given a list of revision ids, loads these revisions and their directories, but not their parent revisions (i.e. this is not recursive).

write_revision_node(revision: Revision) → bool

Writes a revision object to disk.

load_releases(obj_ids: List[bytes]) → List[Release]

Loads release objects, and returns them.

push_releases_subgraphs(obj_ids: List[bytes]) → None

Given a list of release ids, loads these releases and adds their targets to the list of objects to visit.

write_release_node(release: Release) → bool

Writes a release object to disk.

load_directories(obj_ids: List[bytes]) → None
load_directory(obj_id: bytes, raw_manifest: bytes | None) → None
load_content(obj_id: bytes) → None
write_content(obj_id: bytes, content: bytes) → None