swh.loader.git.dumb module

class swh.loader.git.dumb.BytesWriter(*args, **kwargs)

Bases: Protocol

write(data: bytes)
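Because BytesWriter is a typing Protocol, any object with a write(data: bytes) method satisfies it structurally, without inheriting from it. The sketch below restates the protocol locally and passes an io.BytesIO to a hypothetical helper, copy_chunks, which is not part of swh.loader.git.

```python
import io
from typing import Iterable, Protocol


class BytesWriter(Protocol):
    # Structural type: anything with a write(bytes) method matches.
    def write(self, data: bytes):
        ...


def copy_chunks(chunks: Iterable[bytes], writer: BytesWriter) -> int:
    """Hypothetical helper: write each chunk to writer and return the byte count."""
    total = 0
    for chunk in chunks:
        writer.write(chunk)
        total += len(chunk)
    return total


# io.BytesIO has a compatible write() method, so it is accepted as a BytesWriter.
buffer = io.BytesIO()
written = copy_chunks([b"PACK", b"\x00\x00\x00\x02"], buffer)
```

The same helper would accept a binary file opened with open(path, "wb") or a tempfile.SpooledTemporaryFile, since both expose write().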

swh.loader.git.dumb.requests_kwargs(kwargs: Dict[str, Any]) → Dict[str, Any]

Injects a User-Agent header into the keyword arguments passed to requests.
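A minimal sketch of what such header injection can look like, assuming the function returns a new dict rather than mutating its argument. The USER_AGENT value and the name requests_kwargs_sketch are hypothetical; the actual User-Agent string used by swh.loader.git, and whether it overrides a caller-supplied one, are not documented here.

```python
import copy
from typing import Any, Dict

# Hypothetical User-Agent value, for illustration only.
USER_AGENT = "example-loader/0.0"


def requests_kwargs_sketch(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    # Deep-copy so neither the caller's dict nor its nested "headers" dict is mutated.
    result = copy.deepcopy(kwargs)
    # Keep any headers the caller already set and add (or replace) User-Agent.
    headers = result.setdefault("headers", {})
    headers["User-Agent"] = USER_AGENT
    return result


# The result can be splatted into a requests call, e.g. requests.get(url, **result).
result = requests_kwargs_sketch({"timeout": 10, "verify": False})
```

Copying matters here because the functions below use a mutable {} as the default for requests_extra_kwargs; mutating that shared default in place would leak headers across calls.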

swh.loader.git.dumb.check_protocol(repo_url: str, requests_extra_kwargs: Dict[str, Any] = {}) → bool

Checks if a git repository can be cloned using the dumb protocol.

Parameters:
  • repo_url – Base URL of a git repository

  • requests_extra_kwargs – extra keyword arguments to be passed to requests, e.g. timeout, verify.

Returns:

Whether the dumb protocol is supported.
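The Git HTTP protocol documentation describes how a client tells the two transports apart: it requests info/refs?service=git-upload-pack, and a smart server answers with the content type application/x-git-upload-pack-advertisement, while a dumb server simply serves the static info/refs file. The sketch below isolates that decision with the network request left out; the helper names are hypothetical and this is not presented as the module's own implementation.

```python
from typing import Optional
from urllib.parse import urljoin

SMART_CONTENT_TYPE = "application/x-git-upload-pack-advertisement"


def info_refs_url(repo_url: str) -> str:
    # A trailing slash makes urljoin append to the path instead of replacing its last segment.
    return urljoin(repo_url.rstrip("/") + "/", "info/refs?service=git-upload-pack")


def is_dumb_response(status_code: int, content_type: Optional[str]) -> bool:
    # An unsuccessful response means the repository cannot be read over HTTP at all.
    if status_code != 200:
        return False
    # Any successful response that is not a smart advertisement means dumb HTTP.
    return content_type is None or not content_type.startswith(SMART_CONTENT_TYPE)
```

A full check would fetch info_refs_url(repo_url) with requests, forwarding requests_extra_kwargs such as timeout or verify, and pass the response's status code and Content-Type header to is_dumb_response.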

class swh.loader.git.dumb.GitObjectsFetcher(repo_url: str, base_repo: RepoRepresentation, pack_size_limit: int, requests_extra_kwargs: Dict[str, Any] = {})

Bases: object

Git objects fetcher using the dumb HTTP protocol.

Fetches a set of git objects for a repository, based on the state in which Software Heritage has already archived it, and provides iterators over them.

Parameters:
  • repo_url – Base URL of a git repository

  • base_repo – State of the repository as archived by Software Heritage

  • requests_extra_kwargs – extra keyword arguments to be passed to requests, e.g. timeout, verify.
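Over the dumb protocol the server only serves repository files statically, so fetching objects comes down to building paths from Git's documented repository layout: loose objects live at objects/<first two hex digits>/<remaining 38>, and objects/info/packs lists pack files as lines of the form "P pack-<sha>.pack". The helpers below are hypothetical sketches of that layout; how GitObjectsFetcher chooses between loose objects and packs, and how pack_size_limit enters that choice, is not described in this reference.

```python
from typing import List, Tuple


def loose_object_path(sha_hex: str) -> str:
    # Loose objects are stored at objects/<2 hex digits>/<remaining 38 hex digits>.
    if len(sha_hex) != 40:
        raise ValueError("expected a 40-character hex SHA-1")
    return f"objects/{sha_hex[:2]}/{sha_hex[2:]}"


def pack_names(packs_file: str) -> List[str]:
    # objects/info/packs lists one pack per line as "P pack-<sha>.pack".
    return [line[2:].strip() for line in packs_file.splitlines() if line.startswith("P ")]


def pack_paths(name: str) -> Tuple[str, str]:
    # Each pack file in objects/pack/ has a matching .idx index file.
    if not name.endswith(".pack"):
        raise ValueError("expected a .pack file name")
    base = name[: -len(".pack")]
    return f"objects/pack/{name}", f"objects/pack/{base}.idx"
```

Each path is relative to repo_url, so a fetcher would join it onto the base URL before issuing the HTTP request.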

fetch_object_ids() → None

Fetches identifiers of git objects to load into the archive.
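A natural first step for this kind of incremental fetch is reading the remote's dumb-protocol info/refs file, whose lines have the form "<sha1><TAB><refname>", and discarding ref targets that are already archived. The sketch below shows only that step under those assumptions: the archived set is a hypothetical stand-in for what base_repo provides, and a complete implementation would also have to walk the commits, trees and blobs reachable from the remaining targets.

```python
from typing import Dict, Set


def parse_info_refs(text: str) -> Dict[str, str]:
    # Each non-empty line of a dumb-protocol info/refs file is "<sha1>\t<refname>".
    refs: Dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        sha, name = line.split("\t", 1)
        refs[name] = sha
    return refs


def new_targets(refs: Dict[str, str], archived: Set[str]) -> Set[str]:
    # Hypothetical filter: only ref targets not yet archived need fetching.
    return {sha for sha in refs.values() if sha not in archived}
```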

iter_objects(object_type: bytes) → Iterable[ShaFile]

Returns a generator over the fetched git objects of the given type.

Parameters:

object_type – Git object type, one of b"blob", b"commit", b"tag" or b"tree"

Returns:

A generator fetching git objects on the fly.
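Since iter_objects returns a generator, objects can be produced one at a time instead of being held in memory all together. The sketch below assumes the fetched identifiers are already grouped by type; FetchedObject is a hypothetical stand-in for dulwich's ShaFile, and iter_objects_sketch is not the module's code. Validation is done in a plain wrapper function, because a check placed inside a generator body would only run when the caller first advances it.

```python
from typing import Dict, Iterator, List, NamedTuple

VALID_TYPES = (b"blob", b"commit", b"tag", b"tree")


class FetchedObject(NamedTuple):
    # Hypothetical stand-in for dulwich's ShaFile: a type name and an object id.
    type_name: bytes
    sha: str


def _generate(shas: List[str], object_type: bytes) -> Iterator[FetchedObject]:
    for sha in shas:
        # A real fetcher would download and parse the object here, so nothing
        # is fetched until the caller advances the generator.
        yield FetchedObject(object_type, sha)


def iter_objects_sketch(
    objects_by_type: Dict[bytes, List[str]], object_type: bytes
) -> Iterator[FetchedObject]:
    # Validate eagerly: this plain function raises at call time.
    if object_type not in VALID_TYPES:
        raise ValueError(f"unknown git object type: {object_type!r}")
    return _generate(objects_by_type.get(object_type, []), object_type)
```

A loader would typically call this once per type, for example iterating over blobs before trees and commits, and store each object as it is yielded.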