swh.lister.gitlab.lister module#

class swh.lister.gitlab.lister.GitLabListerState(last_seen_next_link: str | None = None)[source]#

Bases: object

State of the GitLabLister

last_seen_next_link: str | None = None#

Last link header (not visited yet) during an incremental pass

class swh.lister.gitlab.lister.PageResult(repositories: Tuple[Dict[str, Any], ...] | None = None, next_page: str | None = None)[source]#

Bases: object

Result from a query to a gitlab project api page.

repositories: Tuple[Dict[str, Any], ...] | None = None#

next_page: str | None = None#

class swh.lister.gitlab.lister.GitLabLister(scheduler, url: str | None = None, name: str | None = 'gitlab', instance: str | None = None, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, incremental: bool = False, ignored_project_prefixes: List[str] | None = None)[source]#

Bases: Lister[GitLabListerState, PageResult]

List origins for a gitlab instance.

In incremental mode (incremental=True), the lister lists repositories starting from the last_seen_next_link stored in the scheduler backend. Otherwise (the default, incremental=False), it performs a full listing.

Parameters:

scheduler – a scheduler instance
url – the api v4 url of the gitlab instance to visit (e.g. api/v4/)
instance – a specific instance name (e.g. gitlab, tor, git-kernel, …); if not provided, the network location of url is used
incremental – defines if incremental listing is activated or not
ignored_project_prefixes – List of prefixes of project paths to ignore

build_url(instance: str) → str[source]#

Build gitlab api url.

state_from_dict(d: Dict[str, Any]) → GitLabListerState[source]#

Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.

state_to_dict(state: GitLabListerState) → Dict[str, Any][source]#

Convert the StateType for this lister to its serialization as dict for storage in the scheduler.

Values must be JSON-compatible as that’s what the backend database expects.
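The round trip described by state_from_dict and state_to_dict can be sketched as follows. This is a minimal illustration, not the lister's own code: it assumes the state is a plain dataclass, uses a local stand-in class rather than importing swh.lister, and the link value is hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class GitLabListerState:
    """Local stand-in mirroring the documented state class (assumption)."""

    last_seen_next_link: Optional[str] = None


def state_to_dict(state: GitLabListerState) -> Dict[str, Any]:
    # The only field is a str or None, so the dict is JSON-compatible.
    return asdict(state)


def state_from_dict(d: Dict[str, Any]) -> GitLabListerState:
    # Rebuild the concrete state from the stored dict.
    return GitLabListerState(**d)


# Hypothetical link, for illustration only.
state = GitLabListerState(
    last_seen_next_link="https://gitlab.example.org/api/v4/projects?id_after=42"
)
stored = json.dumps(state_to_dict(state))  # what a JSON-backed store could hold
restored = state_from_dict(json.loads(stored))
```

Passing the dict through json.dumps and json.loads is a cheap way to check that every value survives JSON serialization.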

get_page_result(url: str) → PageResult[source]#

page_url(id_after: int | None = None) → str[source]#
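No description is given for page_url, so the following is only a guess at the kind of URL it might build. It assumes the projects endpoint of the GitLab REST API with keyset pagination, where id_after asks for projects whose id is greater than the given value. The helper name build_page_url, the base URL, and the exact set of query parameters are illustrative assumptions.

```python
from typing import Optional
from urllib.parse import urlencode


def build_page_url(api_url: str, id_after: Optional[int] = None) -> str:
    """Hypothetical helper: build a projects listing URL under api_url."""
    # Keyset pagination ordered by ascending id (assumed parameter set).
    parameters = {"pagination": "keyset", "order_by": "id", "sort": "asc"}
    if id_after is not None:
        # Resume after the given project id.
        parameters["id_after"] = str(id_after)
    return f"{api_url}projects?{urlencode(parameters)}"


first_page = build_page_url("https://gitlab.example.org/api/v4/")
resumed_page = build_page_url("https://gitlab.example.org/api/v4/", id_after=42)
```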

get_pages() → Iterator[PageResult][source]#

Retrieve a list of pages of listed results. This is the main loop of the lister.

Returns: an iterator of raw pages fetched from the platform currently being listed.
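A main loop of this kind can be sketched as a generator that follows next_page links until none is left. This is a simplified illustration under stated assumptions: PageResult is a local stand-in, iter_pages and the fetch callable are hypothetical names, and the pages come from a canned dictionary instead of HTTP requests.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, Optional, Tuple


@dataclass
class PageResult:
    """Local stand-in mirroring the documented PageResult (assumption)."""

    repositories: Optional[Tuple[Dict[str, Any], ...]] = None
    next_page: Optional[str] = None


def iter_pages(
    fetch: Callable[[str], PageResult], first_url: str
) -> Iterator[PageResult]:
    """Yield pages one by one, following each page's next_page link."""
    next_url: Optional[str] = first_url
    while next_url:
        page = fetch(next_url)
        yield page
        # Stop when the platform reports no further page.
        next_url = page.next_page


# Canned pages keyed by made-up URLs, replacing real network calls.
FAKE_PAGES = {
    "page-1": PageResult(repositories=({"id": 1},), next_page="page-2"),
    "page-2": PageResult(repositories=({"id": 2},), next_page=None),
}

pages = list(iter_pages(FAKE_PAGES.__getitem__, "page-1"))
```

In an incremental pass, first_url would presumably be the stored last_seen_next_link when one exists, and the first page URL otherwise.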

get_origins_from_page(page_result: PageResult) → Iterator[ListedOrigin][source]#

Extract a list of model.ListedOrigin from a raw page of results.

Parameters: page_result – a single page of results
Returns: an iterator for the origins present on the given page of results
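The extraction step, combined with the ignored_project_prefixes option, might look like the sketch below. It assumes each repository dict carries the path_with_namespace and http_url_to_repo fields returned by the GitLab projects API. The Origin class is a hypothetical stand-in for ListedOrigin, and the sample data is made up.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Optional, Tuple


@dataclass
class Origin:
    """Hypothetical, reduced stand-in for model.ListedOrigin."""

    url: str
    visit_type: str = "git"


def origins_from_repositories(
    repositories: Optional[Tuple[Dict[str, Any], ...]],
    ignored_prefixes: List[str],
) -> Iterator[Origin]:
    """Yield one origin per repository whose path is not ignored."""
    for repo in repositories or ():
        path = repo["path_with_namespace"]
        # Skip projects whose path starts with any ignored prefix.
        if any(path.startswith(prefix) for prefix in ignored_prefixes):
            continue
        yield Origin(url=repo["http_url_to_repo"])


# Made-up sample page content.
sample = (
    {"path_with_namespace": "tools/lister",
     "http_url_to_repo": "https://gitlab.example.org/tools/lister.git"},
    {"path_with_namespace": "mirrors/big-repo",
     "http_url_to_repo": "https://gitlab.example.org/mirrors/big-repo.git"},
)
origins = list(origins_from_repositories(sample, ignored_prefixes=["mirrors/"]))
```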

commit_page(page_result: PageResult) → None[source]#

Update currently stored state using the latest listed “next” page if relevant.

Relevancy is determined by the next_page link, whose ‘page’ id must be strictly greater than the currently stored one.

Note: this is a no-op in full listing mode
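The relevancy check can be sketched as a pure function. The sketch assumes the ‘page’ id travels in the id_after query parameter of the link, in line with the page_url(id_after) signature. The helper names parse_id_after and should_update are hypothetical, as are the URLs.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def parse_id_after(url: Optional[str]) -> Optional[int]:
    """Return the id_after query parameter of url as an int, if present."""
    if not url:
        return None
    values = parse_qs(urlparse(url).query).get("id_after")
    return int(values[0]) if values else None


def should_update(
    stored_link: Optional[str], next_link: Optional[str], incremental: bool
) -> bool:
    """Decide whether next_link should replace the stored link."""
    if not incremental or not next_link:
        return False  # no-op in full listing mode or on the last page
    if stored_link is None:
        return True  # nothing stored yet
    stored_id = parse_id_after(stored_link)
    next_id = parse_id_after(next_link)
    # Only move forward: the new id must be strictly greater.
    return stored_id is not None and next_id is not None and next_id > stored_id


BASE = "https://gitlab.example.org/api/v4/projects"  # hypothetical instance
advance = should_update(f"{BASE}?id_after=20", f"{BASE}?id_after=40", True)
same = should_update(f"{BASE}?id_after=40", f"{BASE}?id_after=40", True)
```

Comparing the ids as integers rather than as strings avoids ordering mistakes such as "100" sorting before "20".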

finalize() → None[source]#

Finalize the lister state when relevant (see commit_page for details).

Note: this is a no-op in full listing mode
