swh.lister.cgit.lister module
- class swh.lister.cgit.lister.CGitLister(scheduler: SchedulerInterface, url: str | None = None, instance: str | None = None, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, base_git_url: str | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True)
Bases: StatelessLister[List[Dict[str, Any]]]
Lister class for CGit repositories.
This lister will retrieve the list of published git repositories by parsing the HTML page(s) of the index retrieved at url.
The lister currently defines two listing behaviors:
If base_git_url is provided, each origin URL is computed by combining the base git URL with the repository link found on the main listing page (resulting in fewer HTTP queries than the second behavior below). This is expected to be the main deployed behavior.
Otherwise (with no base_git_url), one extra HTTP query is made for each listed git repository, at the URL found on the main listing page, to gather the published “Clone” URLs to use as the origin URL for that repository. If several “Clone” URLs are provided, the http/https one is preferred, if any; otherwise the lister falls back to the first one.
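The clone URL preference described for the second behavior can be sketched as follows. This is only an illustration of the stated rule, not the lister's actual code: the function name pick_clone_url is hypothetical, and the example URLs are made up.

```python
from typing import List, Optional
from urllib.parse import urlparse


def pick_clone_url(clone_urls: List[str]) -> Optional[str]:
    """Hypothetical helper: choose the origin URL among published "Clone" URLs.

    Prefers the first http/https URL; otherwise falls back to the first URL.
    Returns None when no clone URL was published at all (an assumption made
    for this sketch).
    """
    if not clone_urls:
        return None
    for candidate in clone_urls:
        # urlparse exposes the scheme ("git", "ssh", "https", ...) of the URL
        if urlparse(candidate).scheme in ("http", "https"):
            return candidate
    return clone_urls[0]


if __name__ == "__main__":
    urls = ["git://git.example.org/project.git", "https://git.example.org/project.git"]
    print(pick_clone_url(urls))
```

Scanning in listing order keeps the result deterministic when a repository page publishes several http/https URLs.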
- Parameters:
url – (Optional) Root URL of the CGit instance, i.e. url of the index of published git repositories on this instance. Defaults to https://instance if unset.
instance – Name of cgit instance. Defaults to url’s network location if unset.
base_git_url – Optional base git URL from which the origin URLs are computed.
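The parameter defaults and the base_git_url computation can be illustrated with a minimal sketch. The helper names resolve_defaults and compute_origin_url are hypothetical, the error raised when both url and instance are missing is an assumption, and the simple path concatenation is one plausible reading of "computed out of the base git url", not necessarily what the lister does internally.

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def resolve_defaults(url: Optional[str], instance: Optional[str]) -> Tuple[str, str]:
    """Hypothetical helper mirroring the documented defaults for url and instance."""
    if url is None and instance is None:
        # Assumption: at least one of the two must be given
        raise ValueError("either url or instance must be provided")
    if url is None:
        # url defaults to https://instance
        url = f"https://{instance}"
    if instance is None:
        # instance defaults to the network location of url
        instance = urlparse(url).netloc
    return url, instance


def compute_origin_url(base_git_url: str, repo_path: str) -> str:
    """Hypothetical helper: join the base git URL and a listed repository path."""
    # Plain string joining also works for schemes such as git://
    return base_git_url.rstrip("/") + "/" + repo_path.lstrip("/")


if __name__ == "__main__":
    print(resolve_defaults("https://git.example.org/cgit/", None))
    print(compute_origin_url("https://git.example.org/git", "/project/repo.git"))
```

With base_git_url set, no per-repository page needs to be fetched, which is why this path issues fewer HTTP queries.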