swh.loader.git.dumb module

class swh.loader.git.dumb.DumbHttpGitClient(base_url: str)

Bases: dulwich.client.Urllib3HttpGitClient

Simple wrapper around dulwich.client.HttpGitClient.

Create a new GitClient instance.

Parameters:

  • thin_packs – Whether or not thin packs should be retrieved

  • report_activity – Optional callback for reporting transport activity

  • include_tags – Send annotated tags when sending the objects they point to

get(url: str) → urllib3.response.HTTPResponse

swh.loader.git.dumb.check_protocol(repo_url: str) → bool

Checks whether a git repository can be cloned using the dumb protocol.

Parameters:

  • repo_url – Base URL of a git repository

Returns: Whether the dumb protocol is supported.
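
As a rough illustration of what such a check can involve, the sketch below classifies the response to a GET request on <repo_url>/info/refs?service=git-upload-pack. It assumes, following the Git HTTP protocol documentation, that smart servers answer with an application/x-git-* content type while dumb servers simply serve a static file. The helper name looks_like_dumb_http is hypothetical, and this is not necessarily how check_protocol is implemented.

```python
from typing import Optional


def looks_like_dumb_http(status_code: int, content_type: Optional[str]) -> bool:
    """Hypothetical helper: guess whether an info/refs response came from a dumb server.

    Assumption: the caller has already requested
    <repo_url>/info/refs?service=git-upload-pack and passes in the
    response status code and Content-Type header (None if absent).
    """
    # Anything other than a successful (or cached) response means the
    # repository cannot be read over plain HTTP at all.
    if status_code not in (200, 304):
        return False
    # Smart servers advertise "application/x-git-upload-pack-advertisement";
    # a missing or different content type suggests a static file.
    return content_type is None or not content_type.startswith("application/x-git-")
```

Taking the status code and header as plain arguments keeps the sketch free of network access, so it can be exercised without a live server.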

class swh.loader.git.dumb.GitObjectsFetcher(repo_url: str, base_repo: RepoRepresentation)

Bases: object

Git objects fetcher using dumb HTTP protocol.

Fetches a set of git objects for a repository, based on its archival state in Software Heritage, and provides iterators over them.

Parameters:

  • repo_url – Base URL of a git repository

  • base_repo – State of the repository as archived by Software Heritage

fetch_object_ids() → None

Fetches identifiers of git objects to load into the archive.
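
For context, a dumb HTTP server exposes the targets of its references in a plain-text info/refs file, one "<sha1><TAB><refname>" entry per line. The sketch below shows how those starting identifiers could be read from such a payload. The function name parse_info_refs is hypothetical, and the snippet only illustrates the file format; it does not reproduce what fetch_object_ids does internally.

```python
from typing import Dict


def parse_info_refs(payload: bytes) -> Dict[bytes, bytes]:
    """Hypothetical helper: map reference names to hex object ids.

    Assumption: payload is the body of <repo_url>/info/refs as served
    by a dumb HTTP server, i.e. lines of b"<40-hex-sha>\t<refname>".
    """
    refs: Dict[bytes, bytes] = {}
    for line in payload.splitlines():
        sha, sep, name = line.partition(b"\t")
        if not sep or len(sha) != 40:
            continue  # skip blank or malformed lines
        refs[name] = sha
    return refs
```

The object ids collected this way would be the roots from which commits, trees, blobs and tags are then discovered.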

iter_objects(object_type: bytes) → Iterable[dulwich.objects.ShaFile]

Returns a generator over the fetched git objects of the given type.

Parameters:

  • object_type – Git object type, one of b"blob", b"commit", b"tag" or b"tree"

Returns: A generator fetching git objects on the fly.
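
To make the byte-string object types concrete, here is a minimal sketch of how a single loose object served over the dumb protocol can be located and decoded. It relies only on the standard git loose-object layout (zlib-compressed "<type> <size>\0<content>" stored under objects/<first two hex digits>/<remaining digits>). The function names are hypothetical, packed objects are out of scope, and nothing here is claimed to match the module's own code.

```python
import zlib
from typing import Tuple


def loose_object_path(hex_sha: str) -> str:
    """Hypothetical helper: path of a loose object relative to the repository URL."""
    return f"objects/{hex_sha[:2]}/{hex_sha[2:]}"


def decode_loose_object(compressed: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical helper: return (object_type, content) for a loose object."""
    raw = zlib.decompress(compressed)
    # The header and the content are separated by the first NUL byte.
    header, _, body = raw.partition(b"\x00")
    obj_type, size = header.split(b" ")
    if int(size) != len(body):
        raise ValueError("declared size does not match content length")
    return obj_type, body
```

A caller could compare the returned type against b"blob", b"commit", b"tag" or b"tree" to group objects the way iter_objects does by its object_type argument.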