swh.lister.gitlab.lister module

class swh.lister.gitlab.lister.GitLabListerState(last_seen_next_link: Optional[str] = None)[source]

Bases: object

State of the GitLabLister

last_seen_next_link – the last “next” link header seen (not visited yet) during an incremental pass

class swh.lister.gitlab.lister.PageResult(repositories: Optional[Tuple[Dict[str, Any], ...]] = None, next_page: Optional[str] = None)[source]

Bases: object

Result of a query to a GitLab project API page.

repositories: Optional[Tuple[Dict[str, Any], ...]] = None
next_page: Optional[str] = None

class swh.lister.gitlab.lister.GitLabLister(scheduler, url: str, name: Optional[str] = 'gitlab', instance: Optional[str] = None, credentials: Optional[Dict[str, Dict[str, List[Dict[str, str]]]]] = None, incremental: bool = False)[source]

Bases: swh.lister.pattern.Lister[swh.lister.gitlab.lister.GitLabListerState, swh.lister.gitlab.lister.PageResult]

List origins for a GitLab instance.

In incremental mode, the lister lists repositories starting from the last_seen_next_link stored in the scheduler backend. Note that, per the signature above, incremental defaults to False, in which case a full listing is performed.

Parameters:

  • scheduler – a scheduler instance

  • url – the API v4 URL of the GitLab instance to visit (e.g. https://gitlab.com/api/v4/)

  • instance – a specific instance name (e.g. gitlab, tor, git-kernel, …); the URL’s network location is used if not provided

  • incremental – whether incremental listing is activated
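The fallback described for the instance parameter can be sketched with the standard library. This is only an illustration, not the lister's actual code: the helper name default_instance_name and the gitlab.example.org URL are hypothetical.

```python
from typing import Optional
from urllib.parse import urlparse


def default_instance_name(url: str, instance: Optional[str] = None) -> str:
    """Hypothetical helper: prefer the explicit name, else the URL's netloc."""
    if instance is not None:
        return instance
    # urlparse("https://gitlab.com/api/v4/").netloc is the host part of the URL
    return urlparse(url).netloc


print(default_instance_name("https://gitlab.com/api/v4/"))
print(default_instance_name("https://gitlab.example.org/api/v4/", instance="tor"))
```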

state_from_dict(d: Dict[str, Any]) swh.lister.gitlab.lister.GitLabListerState[source]

Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.

state_to_dict(state: swh.lister.gitlab.lister.GitLabListerState) Dict[str, Any][source]

Convert the StateType for this lister to its serialization as dict for storage in the scheduler.

Values must be JSON-compatible as that’s what the backend database expects.
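A minimal sketch of this round trip, assuming the state is a plain dataclass with the single field shown in the GitLabListerState signature. SketchState and the two free functions are stand-ins written for illustration; the real methods may differ.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchState:
    """Hypothetical stand-in for GitLabListerState."""

    last_seen_next_link: Optional[str] = None


def state_from_dict(d: Dict[str, Any]) -> SketchState:
    # Rebuild the concrete state from the stored dict
    return SketchState(**d)


def state_to_dict(state: SketchState) -> Dict[str, Any]:
    # asdict() yields only str/None values here, so the result is JSON-compatible
    return asdict(state)


state = SketchState(last_seen_next_link="https://gitlab.com/api/v4/projects?id_after=42")
stored = json.dumps(state_to_dict(state))       # what a backend could persist
restored = state_from_dict(json.loads(stored))  # what the lister would reload
print(restored == state)
```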

get_page_result(url: str) swh.lister.gitlab.lister.PageResult[source]

page_url(id_after: Optional[int] = None) str[source]

get_pages() Iterator[swh.lister.gitlab.lister.PageResult][source]

Retrieve a list of pages of listed results. This is the main loop of the lister.

Returns: an iterator of raw pages fetched from the platform currently being listed.
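The main loop can be pictured as following next_page links until none is left. The sketch below assumes that shape; SketchPage, iter_pages, and the in-memory FAKE_API (which replaces real HTTP requests) are all hypothetical names, not part of swh.lister.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, Optional, Tuple


@dataclass
class SketchPage:
    """Hypothetical stand-in for PageResult."""

    repositories: Optional[Tuple[Dict[str, Any], ...]] = None
    next_page: Optional[str] = None


def iter_pages(first_url: str, fetch: Callable[[str], SketchPage]) -> Iterator[SketchPage]:
    """Yield pages one by one, following each page's next_page link."""
    next_url: Optional[str] = first_url
    while next_url:
        page = fetch(next_url)
        yield page
        next_url = page.next_page  # None on the last page ends the loop


# In-memory stand-in for the remote API, keyed by (fake) page URL
FAKE_API = {
    "page-1": SketchPage(({"id": 1}, {"id": 2}), "page-2"),
    "page-2": SketchPage(({"id": 3},), None),
}

pages = list(iter_pages("page-1", FAKE_API.__getitem__))
print(len(pages))
```

Passing the fetch function as an argument keeps the loop testable without network access.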

get_origins_from_page(page_result: swh.lister.gitlab.lister.PageResult) Iterator[swh.scheduler.model.ListedOrigin][source]

Extract a list of model.ListedOrigin from a raw page of results.

Parameters:

  • page_result – a single page of results

Returns: an iterator over the origins present on the given page of results
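As a rough sketch of this extraction step: each repository dict on a page is mapped to one origin. SketchOrigin is a simplified, hypothetical stand-in for model.ListedOrigin, and the choice of the http_url_to_repo field (present in GitLab's projects API responses) as the origin URL is an assumption, not something this page confirms.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Optional, Tuple


@dataclass
class SketchOrigin:
    """Hypothetical, simplified stand-in for swh.scheduler.model.ListedOrigin."""

    url: str
    visit_type: str


def origins_from_page(
    repositories: Optional[Tuple[Dict[str, Any], ...]]
) -> Iterator[SketchOrigin]:
    # A page may carry no repositories at all, hence the `or ()`
    for repo in repositories or ():
        # Assumption: the clone URL is taken from "http_url_to_repo"
        yield SketchOrigin(url=repo["http_url_to_repo"], visit_type="git")


page_repositories = (
    {"id": 1, "http_url_to_repo": "https://gitlab.example.org/a/b.git"},
    {"id": 2, "http_url_to_repo": "https://gitlab.example.org/c/d.git"},
)
for origin in origins_from_page(page_repositories):
    print(origin.url)
```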

commit_page(page_result: swh.lister.gitlab.lister.PageResult) None[source]

Update currently stored state using the latest listed “next” page if relevant.

Relevance is determined by the next_page link, whose ‘page’ id must be strictly greater than the currently stored one.

Note: this is a no-op in full listing mode.
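The relevance check can be sketched as a comparison of the ids carried by two links. The example assumes the id is carried in an id_after query parameter, a guess based on the page_url signature; the helper names parse_id_after and should_update and the example URLs are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def parse_id_after(url: Optional[str]) -> Optional[int]:
    """Extract the (assumed) id_after query parameter from a link, if any."""
    if not url:
        return None
    values = parse_qs(urlparse(url).query).get("id_after")
    return int(values[0]) if values else None


def should_update(stored_link: Optional[str], next_link: Optional[str]) -> bool:
    """Return True only if next_link points strictly further than stored_link."""
    new_id = parse_id_after(next_link)
    if new_id is None:
        return False  # nothing usable to store
    old_id = parse_id_after(stored_link)
    return old_id is None or new_id > old_id


base = "https://gitlab.example.org/api/v4/projects?id_after="
print(should_update(base + "40", base + "60"))
print(should_update(base + "40", base + "40"))
```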

finalize() None[source]

Finalize the lister state when relevant (see commit_page for details).

Note: this is a no-op in full listing mode.