swh.lister.gitlab.lister module

class swh.lister.gitlab.lister.GitLabListerState(last_seen_next_link: str | None = None)

Bases: object

State of the GitLabLister

last_seen_next_link: the last "next" link header seen (and not visited yet) during an incremental pass

class swh.lister.gitlab.lister.PageResult(repositories: Tuple[Dict[str, Any], ...] | None = None, next_page: str | None = None)

Bases: object

Result of a query to one page of the GitLab projects API.

repositories: Tuple[Dict[str, Any], ...] | None = None

next_page: str | None = None

class swh.lister.gitlab.lister.GitLabLister(scheduler, url: str | None = None, name: str | None = 'gitlab', instance: str | None = None, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, incremental: bool = False, ignored_project_prefixes: List[str] | None = None)

Bases: Lister[GitLabListerState, PageResult]

List origins for a GitLab instance.

In incremental mode, the lister lists repositories starting from the last_seen_next_link stored in the scheduler backend. Incremental mode is disabled by default (incremental=False), in which case the lister does a full listing.

Parameters:
  • scheduler – a scheduler instance

  • url – the api v4 url of the gitlab instance to visit (e.g. api/v4/)

  • instance – a specific instance name (e.g. gitlab, tor, git-kernel, …), url network location will be used if not provided

  • incremental – defines if incremental listing is activated or not

  • ignored_project_prefixes – List of prefixes of project paths to ignore
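The effect of ignored_project_prefixes can be illustrated with a small standalone sketch. It is not the lister's own code: the helper name filter_ignored is hypothetical, and it assumes each project dict carries the path_with_namespace field returned by the GitLab projects API.

```python
from typing import Any, Dict, Iterable, Iterator, List, Optional


def filter_ignored(
    repositories: Iterable[Dict[str, Any]],
    ignored_project_prefixes: Optional[List[str]] = None,
) -> Iterator[Dict[str, Any]]:
    """Hypothetical helper: skip projects whose path starts with an ignored prefix."""
    # str.startswith accepts a tuple of prefixes; an empty tuple never matches.
    prefixes = tuple(ignored_project_prefixes or ())
    for repo in repositories:
        # Assumption: "path_with_namespace" holds the project path, e.g. "group/project".
        if repo["path_with_namespace"].startswith(prefixes):
            continue
        yield repo


repos = [
    {"path_with_namespace": "forks/foo"},
    {"path_with_namespace": "team/bar"},
]
kept = list(filter_ignored(repos, ["forks/"]))
```

With the sample data above, only the team/bar project is kept; passing None as the prefix list keeps every project.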

build_url(instance: str) → str

Build the GitLab API URL.
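As a rough illustration of how an instance name and an API URL relate, the sketch below builds a v4 API URL from a host name and derives an instance name from a URL's network location. Both function names are hypothetical, and the https scheme and the trailing slash are assumptions; the real build_url may differ.

```python
from urllib.parse import urlparse


def build_api_url(instance: str) -> str:
    """Hypothetical sketch: API v4 base URL for a GitLab host name."""
    # Assumption: the instance is reachable over https at its host name.
    return f"https://{instance}/api/v4/"


def instance_from_url(url: str) -> str:
    """Hypothetical sketch: fall back to the URL's network location as instance name."""
    return urlparse(url).netloc


api_url = build_api_url("gitlab.example.org")
instance = instance_from_url(api_url)
```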

state_from_dict(d: Dict[str, Any]) → GitLabListerState

Convert the state stored in the scheduler backend (as a dict), to the concrete StateType for this lister.

state_to_dict(state: GitLabListerState) → Dict[str, Any]

Convert the StateType for this lister to its serialization as dict for storage in the scheduler.

Values must be JSON-compatible as that’s what the backend database expects.
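The round trip between the state object and its dict form can be sketched as follows. ListerState is a hypothetical stand-in that mirrors the single documented field; using dataclasses.asdict is an assumption about one convenient way to do this, not a statement about the actual implementation. The example URL is made up.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ListerState:
    """Hypothetical stand-in mirroring the documented GitLabListerState field."""

    last_seen_next_link: Optional[str] = None


def state_to_dict(state: ListerState) -> Dict[str, Any]:
    # The only field is a str or None, so the resulting dict is JSON-compatible.
    return asdict(state)


def state_from_dict(d: Dict[str, Any]) -> ListerState:
    return ListerState(**d)


state = ListerState(
    last_seen_next_link="https://gitlab.example.org/api/v4/projects?id_after=42"
)
# Simulate storage in a JSON-based backend and reading it back.
stored = json.loads(json.dumps(state_to_dict(state)))
restored = state_from_dict(stored)
```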

get_page_result(url: str) → PageResult

page_url(id_after: int | None = None) → str

get_pages() → Iterator[PageResult]

Retrieve a list of pages of listed results. This is the main loop of the lister.

Returns:

an iterator of raw pages fetched from the platform currently being listed.
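The general shape of such a main loop, following each page's next_page link until none is left, might look like the sketch below. Page is a hypothetical stand-in for PageResult, and the fetch callable and the in-memory FAKE_API mapping replace real HTTP requests; none of these names come from the library.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, Optional, Tuple


@dataclass
class Page:
    """Hypothetical stand-in for PageResult."""

    repositories: Optional[Tuple[Dict[str, Any], ...]] = None
    next_page: Optional[str] = None


def iter_pages(first_url: str, fetch: Callable[[str], Page]) -> Iterator[Page]:
    """Yield pages one by one, following next_page links until exhausted."""
    next_url: Optional[str] = first_url
    while next_url:
        page = fetch(next_url)
        yield page
        next_url = page.next_page


# Fake two-page API keyed by URL, used instead of network access.
FAKE_API = {
    "page-1": Page(repositories=({"id": 1},), next_page="page-2"),
    "page-2": Page(repositories=({"id": 2},), next_page=None),
}
pages = list(iter_pages("page-1", FAKE_API.__getitem__))
```

Because iter_pages is a generator, each page is fetched only when the caller asks for it, which lets a consumer stop early (for example after a maximum number of pages).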

get_origins_from_page(page_result: PageResult) → Iterator[ListedOrigin]

Extract a list of model.ListedOrigin from a raw page of results.

Parameters:

page_result – a single page of results

Returns:

an iterator for the origins present on the given page of results
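A simplified sketch of turning one page of projects into origins is shown below. Origin is a hypothetical, reduced stand-in for ListedOrigin, and the choice of the http_url_to_repo and last_activity_at fields (both present in GitLab project payloads) is an assumption about which fields are used. The sample timestamp uses an explicit UTC offset so that datetime.fromisoformat parses it on older Python versions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Iterator, Optional, Tuple


@dataclass
class Origin:
    """Hypothetical, simplified stand-in for ListedOrigin."""

    url: str
    visit_type: str
    last_update: Optional[datetime]


def origins_from_page(
    repositories: Optional[Tuple[Dict[str, Any], ...]],
) -> Iterator[Origin]:
    """Yield one Origin per project dict; an empty or missing page yields nothing."""
    for repo in repositories or ():
        yield Origin(
            url=repo["http_url_to_repo"],  # assumed clone URL field
            visit_type="git",
            last_update=datetime.fromisoformat(repo["last_activity_at"]),
        )


page = (
    {
        "http_url_to_repo": "https://gitlab.example.org/team/bar.git",
        "last_activity_at": "2024-01-02T03:04:05+00:00",
    },
)
origins = list(origins_from_page(page))
```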

commit_page(page_result: PageResult) → None

Update the currently stored state using the latest listed “next” page, if relevant.

Relevancy is determined by the next_page link, whose ‘page’ id must be strictly greater than the currently stored one.

Note: this is a no-op in full listing mode.
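The relevancy check described above can be sketched as a comparison of ids extracted from two links. This is an illustration under assumptions: the id is taken from an id_after query parameter (matching the id_after argument of page_url), and both helper names are hypothetical.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def parse_id_after(url: Optional[str]) -> Optional[int]:
    """Hypothetical helper: read the id_after query parameter from a link, if any."""
    if not url:
        return None
    values = parse_qs(urlparse(url).query).get("id_after")
    return int(values[0]) if values else None


def newer_link(stored: Optional[str], candidate: Optional[str]) -> Optional[str]:
    """Keep the candidate only if its id is strictly greater than the stored one."""
    stored_id = parse_id_after(stored)
    candidate_id = parse_id_after(candidate)
    if candidate_id is None:
        return stored
    if stored_id is None or candidate_id > stored_id:
        return candidate
    return stored


link_10 = "https://gitlab.example.org/api/v4/projects?id_after=10"
link_25 = "https://gitlab.example.org/api/v4/projects?id_after=25"
updated = newer_link(link_10, link_25)    # candidate is newer: state moves forward
unchanged = newer_link(link_25, link_10)  # candidate is older: state is kept
```

Comparing parsed integers rather than raw strings avoids ordering mistakes such as "9" sorting after "10".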

finalize() → None

Finalize the lister state when relevant (see commit_page() for details).

Note: this is a no-op in full listing mode.