swh.lister.gitlab.lister module
- class swh.lister.gitlab.lister.GitLabListerState(last_seen_next_link: str | None = None)
Bases: object
State of the GitLabLister
- class swh.lister.gitlab.lister.PageResult(repositories: Tuple[Dict[str, Any], ...] | None = None, next_page: str | None = None)
Bases: object
Result from a query to a gitlab project api page.
- class swh.lister.gitlab.lister.GitLabLister(scheduler, url: str | None = None, name: str | None = 'gitlab', instance: str | None = None, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, incremental: bool = False, ignored_project_prefixes: List[str] | None = None)
Bases: Lister[GitLabListerState, PageResult]
List origins for a gitlab instance.
In incremental mode (incremental=True), the lister lists all repositories, starting from the last_seen_next_link stored in the scheduler backend. The signature above defaults to incremental=False, which corresponds to full listing mode.
- Parameters:
scheduler – a scheduler instance
url – the api v4 url of the gitlab instance to visit (e.g. api/v4/)
instance – a specific instance name (e.g. gitlab, tor, git-kernel, …), url network location will be used if not provided
incremental – defines if incremental listing is activated or not
ignored_project_prefixes – List of prefixes of project paths to ignore
- state_from_dict(d: Dict[str, Any]) → GitLabListerState
Convert the state stored in the scheduler backend (as a dict), to the concrete StateType for this lister.
- state_to_dict(state: GitLabListerState) → Dict[str, Any]
Convert the StateType for this lister to its serialization as dict for storage in the scheduler.
Values must be JSON-compatible as that’s what the backend database expects.
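The round trip between the state object and its dict form can be sketched as follows. This is a minimal illustration, not the library's code: the `GitLabListerState` dataclass below is a local stand-in mirroring the documented field, the two functions are assumed equivalents of the documented methods, and the example URL is hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class GitLabListerState:
    """Local stand-in for the documented state class (assumption)."""

    last_seen_next_link: Optional[str] = None


def state_from_dict(d: Dict[str, Any]) -> GitLabListerState:
    # Rebuild the concrete state from the dict kept by the backend.
    return GitLabListerState(**d)


def state_to_dict(state: GitLabListerState) -> Dict[str, Any]:
    # Only JSON-compatible values (here: str or None) end up in the dict.
    return asdict(state)


# Hypothetical link, used only to exercise the round trip.
state = GitLabListerState(
    last_seen_next_link="https://gitlab.example.org/api/v4/projects?page=3"
)
stored = json.dumps(state_to_dict(state))  # what a JSON backend would hold
restored = state_from_dict(json.loads(stored))
```

Serializing through `json.dumps` here is a quick way to check that every value in the dict is JSON-compatible.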
- get_page_result(url: str) → PageResult
- get_pages() → Iterator[PageResult]
Retrieve a list of pages of listed results. This is the main loop of the lister.
- Returns:
an iterator of raw pages fetched from the platform currently being listed.
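A main loop of this kind can be sketched as a generator that follows each page's `next_page` link until none is left. The sketch below is an assumption about the general shape, not the lister's implementation: `PageResult` is a local stand-in, `iter_pages` and `FAKE_API` are hypothetical names, and the fetch step is injected so no network access is needed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Iterator, Optional, Tuple


@dataclass
class PageResult:
    """Local stand-in mirroring the documented PageResult fields."""

    repositories: Optional[Tuple[Dict[str, Any], ...]] = None
    next_page: Optional[str] = None


def iter_pages(
    first_url: str,
    fetch: Callable[[str], PageResult],
    max_pages: Optional[int] = None,
) -> Iterator[PageResult]:
    """Yield pages, following next_page links (hypothetical helper)."""
    url: Optional[str] = first_url
    fetched = 0
    while url is not None:
        page = fetch(url)
        fetched += 1
        yield page
        # Stop early when an optional page budget is exhausted.
        if max_pages is not None and fetched >= max_pages:
            break
        url = page.next_page


# In-memory stand-in for the remote API, keyed by fake page URLs.
FAKE_API = {
    "page-1": PageResult(({"id": 1},), "page-2"),
    "page-2": PageResult(({"id": 2},), "page-3"),
    "page-3": PageResult(({"id": 3},), None),
}

all_pages = list(iter_pages("page-1", FAKE_API.__getitem__))
capped_pages = list(iter_pages("page-1", FAKE_API.__getitem__, max_pages=2))
```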
- get_origins_from_page(page_result: PageResult) → Iterator[ListedOrigin]
Extract a list of model.ListedOrigin from a raw page of results.
- Parameters:
page_result – a single page of results
- Returns:
an iterator for the origins present on the given page of results
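As a rough illustration of turning a raw page into origins while honouring `ignored_project_prefixes`, the sketch below yields plain clone URLs instead of `ListedOrigin` objects. It assumes each repository dict carries the `path_with_namespace` and `http_url_to_repo` fields of the GitLab projects API; `origin_urls` and the sample data are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Optional, Tuple


def origin_urls(
    repositories: Optional[Tuple[Dict[str, Any], ...]],
    ignored_project_prefixes: Optional[List[str]] = None,
) -> Iterator[str]:
    """Yield clone URLs, skipping ignored project paths (hypothetical helper)."""
    prefixes = tuple(ignored_project_prefixes or ())
    for repo in repositories or ():
        # str.startswith accepts a tuple and matches any of its entries.
        if prefixes and repo["path_with_namespace"].startswith(prefixes):
            continue
        yield repo["http_url_to_repo"]


# Made-up page content, shaped like entries of a projects listing.
page = (
    {
        "path_with_namespace": "team/tool",
        "http_url_to_repo": "https://gitlab.example.org/team/tool.git",
    },
    {
        "path_with_namespace": "mirrors/big-repo",
        "http_url_to_repo": "https://gitlab.example.org/mirrors/big-repo.git",
    },
)

kept = list(origin_urls(page, ignored_project_prefixes=["mirrors/"]))
everything = list(origin_urls(page))
```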
- commit_page(page_result: PageResult) → None
Update currently stored state using the latest listed “next” page if relevant.
Relevance is determined by the next_page link, whose ‘page’ id must be strictly greater than the currently stored one.
Note: this is a no-op in full listing mode.
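The relevance check can be sketched as a comparison of the page ids embedded in two links. This is an assumption-laden illustration: `page_id` and `is_newer` are hypothetical helpers, the id is assumed to live in a `page` query parameter as the description suggests (the real parameter name may differ), and the URLs are made up.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def page_id(link: Optional[str], param: str = "page") -> Optional[int]:
    """Extract the integer page id from a link's query string, if any."""
    if link is None:
        return None
    values = parse_qs(urlparse(link).query).get(param)
    return int(values[0]) if values else None


def is_newer(stored_link: Optional[str], candidate_link: Optional[str]) -> bool:
    """True when the candidate link is strictly ahead of the stored one."""
    candidate = page_id(candidate_link)
    if candidate is None:
        return False  # nothing usable to store
    stored = page_id(stored_link)
    return stored is None or candidate > stored


BASE = "https://gitlab.example.org/api/v4/projects"  # hypothetical instance
first_run = is_newer(None, f"{BASE}?page=2")
same_page = is_newer(f"{BASE}?page=3", f"{BASE}?page=3")
later_page = is_newer(f"{BASE}?page=3", f"{BASE}?page=4")
no_next = is_newer(f"{BASE}?page=3", None)
```

In incremental mode, a caller would overwrite the stored link only when `is_newer` returns true; in full listing mode the check would be skipped entirely.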