swh.lister.bitbucket.lister module

class swh.lister.bitbucket.lister.BitbucketLister(scheduler: SchedulerInterface, url: str | None = None, instance: str | None = None, page_size: int = 1000, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, **kwargs)

Bases: Lister[StateType, List[Dict[str, Any]]], ABC

Commonalities between Bitbucket instance types

LISTER_NAME: str = 'bitbucket'
CLOUD_INSTANCES = (None, '', 'bitbucket', 'bitbucket.org')
CLONES = ['https', 'http', 'ssh']
URL_PARAMS: Dict[str, Any]
THIS_PAGE: str
LEN_PAGE: str
NEXT_PAGE: str
SCM: str
api_url: str
set_credentials(username: str | None, password: str | None) → None

Set basic authentication headers with given credentials.
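As a rough illustration of what setting basic authentication headers involves, the sketch below builds an HTTP Basic `Authorization` header by hand. `basic_auth_header` is a hypothetical helper and not part of `swh.lister`; the actual method may well delegate to the HTTP session's own authentication support instead.

```python
import base64
from typing import Dict


def basic_auth_header(username: str, password: str) -> Dict[str, str]:
    """Build an HTTP Basic Authorization header (hypothetical helper)."""
    # Basic auth is "username:password", base64-encoded, prefixed with "Basic ".
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder credentials, for illustration only.
headers = basic_auth_header("user", "secret")
```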

get_pages() → Iterator[List[Dict[str, Any]]]

Retrieve a list of pages of listed results. This is the main loop of the lister.

Returns:

an iterator of raw pages fetched from the platform currently being listed.
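The general shape of such a pagination loop can be sketched as follows. This is a simplified stand-in, not the lister's code: `iter_pages`, `FAKE_API` and the `"values"`/`"next"` keys are assumptions chosen for the example, and the real loop presumably also deals with HTTP errors and page limits.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional


def iter_pages(
    fetch: Callable[[Any], Dict[str, Any]],
    initial_page: Any,
    next_page: Callable[[Dict[str, Any]], Optional[Any]],
) -> Iterator[List[Dict[str, Any]]]:
    """Yield the repositories of each response until no next page is reported."""
    page: Optional[Any] = initial_page
    while page is not None:
        body = fetch(page)       # one API request per page
        yield body["values"]     # raw page handed to the caller
        page = next_page(body)   # None ends the loop


# In-memory stand-in for the remote API, keyed by page token (hypothetical data).
FAKE_API = {
    0: {"values": [{"slug": "a"}, {"slug": "b"}], "next": 2},
    2: {"values": [{"slug": "c"}], "next": None},
}

pages = list(iter_pages(FAKE_API.__getitem__, 0, lambda body: body["next"]))
```

Passing the fetch and next-page steps as callables mirrors how the abstract `initial_page()` and `next_page()` hooks let each subclass plug in its own pagination scheme.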

get_origins_from_page(page: List[Dict[str, Any]]) → Iterator[ListedOrigin]

Convert a page of Bitbucket repositories into ListedOrigin objects.
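One step in such a conversion is choosing a clone URL for each repository. The sketch below assumes that `CLONES` expresses a preference order and that each repository carries a `links.clone` list of `{"name", "href"}` entries, as the Bitbucket REST APIs return; `pick_clone_url` is a hypothetical helper, and the actual method may select URLs differently.

```python
from typing import Any, Dict, Optional, Sequence

# Same values as BitbucketLister.CLONES; treating them as a preference
# order is an assumption made for this example.
CLONES = ["https", "http", "ssh"]


def pick_clone_url(
    repo: Dict[str, Any], preferred: Sequence[str] = CLONES
) -> Optional[str]:
    """Return the first available clone URL in order of preference."""
    links = {
        link["name"]: link["href"]
        for link in repo.get("links", {}).get("clone", [])
    }
    for name in preferred:
        if name in links:
            return links[name]
    return None  # no usable clone link on this repository


# Hypothetical repository entry.
example_repo = {
    "links": {
        "clone": [
            {"name": "ssh", "href": "git@bitbucket.org:user/repo.git"},
            {"name": "https", "href": "https://bitbucket.org/user/repo.git"},
        ]
    }
}
origin_url = pick_clone_url(example_repo)
```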

abstract page_url(page: int | None = None) → str | None

Optionally return the URL for a specific page if appropriate.

abstract initial_page() → Any

Return the initial page.

abstract next_page(body: Dict[str, Any]) → Any

Return the next page from the current page.

abstract error_url_params(body: Dict[str, Any], page: Any) → Dict[str, Any]

Return the URL params to use on error.

abstract get_last_update(repo: Dict[str, Any])

Optionally return the date a repo last changed.

abstract get_enabled(repo: Dict[str, Any]) → bool

Return whether or not the repo should be downloaded.

classmethod from_config(scheduler: Dict[str, Any], instance: str | None = None, url: str | None = None, incremental: bool = True, skip_mro: bool = False, **config: Any)

Instantiate a lister from a configuration dict.

This is basically a backwards-compatibility shim for the CLI.

Parameters:
  • scheduler – instantiation config for the scheduler

  • config – the configuration dict for the lister, with the following keys:

      - credentials (optional): credentials list for the scheduler
      - any other kwargs passed to the lister

Returns:

the instantiated lister

class swh.lister.bitbucket.lister.BitbucketServerLister(scheduler: SchedulerInterface, url: str | None = None, instance: str | None = None, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, **kwargs)

Bases: BitbucketLister, StatelessLister[List[Dict[str, Any]]]

List origins from Bitbucket Server and Data Center instances using the REST API.

https://docs.atlassian.com/bitbucket-server/rest/7.0.0/bitbucket-rest.html#idp392

https://developer.atlassian.com/server/bitbucket/rest/v1000/api-group-repository/#api-api-latest-repos-get

URL_PARAMS: Dict[str, Any] = {}
THIS_PAGE: str = 'start'
LEN_PAGE: str = 'limit'
NEXT_PAGE: str = 'nextPageStart'
SCM: str = 'scmId'
page_url(page: int | None = None) → str | None

Optionally return the URL for a specific page if appropriate.

initial_page() → int

Return the initial page.

next_page(body: Dict[str, Any]) → Any

Return the next page from the current page.
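With offset-based pagination of this kind (`start`, `limit`, `nextPageStart`), finding the next page can be sketched as below. The `isLastPage` key is taken from the Bitbucket Server paged-response format rather than from this page, and `server_next_page` is a hypothetical stand-in, so treat it as an assumption about how the method might behave.

```python
from typing import Any, Dict, Optional


def server_next_page(body: Dict[str, Any]) -> Optional[int]:
    """Return the offset of the next page, or None when listing is done."""
    # Assumption: a missing "isLastPage" flag is treated as the last page.
    if body.get("isLastPage", True):
        return None
    return body.get("nextPageStart")


# Hypothetical response bodies.
middle = {"start": 0, "limit": 2, "isLastPage": False, "nextPageStart": 2, "values": []}
last = {"start": 2, "limit": 2, "isLastPage": True, "values": []}

next_offset = server_next_page(middle)
end_marker = server_next_page(last)
```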

error_url_params(body: Dict[str, Any], page: int)

Return the URL params to use on error.

get_last_update(repo: Dict[str, Any])

Optionally return the date a repo last changed.

get_enabled(repo: Dict[str, Any]) → bool

Return whether or not the repo should be downloaded.

class swh.lister.bitbucket.lister.BitbucketCloudListerState(last_repo_cdate: datetime | None = None)

Bases: object

State of Bitbucket Cloud lister

last_repo_cdate: datetime | None = None

Creation date and time of the last listed repository during an incremental pass

class swh.lister.bitbucket.lister.BitbucketCloudLister(scheduler: SchedulerInterface, url: str | None = None, instance: str | None = None, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, incremental: bool = True, **kwargs)

Bases: BitbucketLister, Lister[BitbucketCloudListerState, List[Dict[str, Any]]]

List origins from Bitbucket Cloud using the REST API.

https://developer.atlassian.com/cloud/bitbucket/rest/api-group-repositories/#api-repositories-get

Bitbucket Cloud API has the following rate-limit configuration:

  • 60 requests per hour for anonymous users

  • 1000 requests per hour for authenticated users

The lister works in anonymous mode by default, but Bitbucket account credentials can be provided to perform authenticated requests.
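To get a feel for what these limits mean in practice, the following back-of-the-envelope calculation estimates how long a listing takes at one request per page. The repository count and page length are made-up inputs, and `hours_to_list` is a hypothetical helper; only the two hourly limits come from the text above.

```python
import math

ANONYMOUS_LIMIT = 60        # requests per hour, anonymous users
AUTHENTICATED_LIMIT = 1000  # requests per hour, authenticated users


def hours_to_list(repo_count: int, page_len: int, requests_per_hour: int) -> float:
    """Estimate the hours needed to list repo_count repositories."""
    requests_needed = math.ceil(repo_count / page_len)  # one request per page
    return requests_needed / requests_per_hour


# Hypothetical workload: 600,000 repositories at 100 per page = 6,000 requests.
anonymous_hours = hours_to_list(600_000, 100, ANONYMOUS_LIMIT)
authenticated_hours = hours_to_list(600_000, 100, AUTHENTICATED_LIMIT)
```

Under these assumed numbers, an anonymous run needs about 100 hours against roughly 6 hours for an authenticated one.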

URL_PARAMS: Dict[str, Any] = {'fields': 'next,values.is_private,values.scm,values.links.clone,values.updated_on,values.created_on,'}
THIS_PAGE: str = 'after'
LEN_PAGE: str = 'pagelen'
NEXT_PAGE: str = 'next'
SCM: str = 'scm'
state_from_dict(d: Dict[str, Any]) → BitbucketCloudListerState

Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.

state_to_dict(state: BitbucketCloudListerState) → Dict[str, Any]

Convert the StateType for this lister to its serialization as a dict for storage in the scheduler.

Values must be JSON-compatible as that’s what the backend database expects.
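A minimal round trip between such a state object and a JSON-compatible dict could look like the sketch below. `CloudState` is a stand-in for `BitbucketCloudListerState`, and storing the datetime as an ISO 8601 string is an assumption; the lister's real serialization format may differ.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class CloudState:
    """Stand-in for BitbucketCloudListerState (illustration only)."""

    last_repo_cdate: Optional[datetime] = None


def state_to_dict(state: CloudState) -> Dict[str, Any]:
    # datetime objects are not JSON-serializable, so store an ISO 8601 string.
    d: Dict[str, Any] = {}
    if state.last_repo_cdate is not None:
        d["last_repo_cdate"] = state.last_repo_cdate.isoformat()
    return d


def state_from_dict(d: Dict[str, Any]) -> CloudState:
    raw = d.get("last_repo_cdate")
    return CloudState(datetime.fromisoformat(raw) if raw else None)


original = CloudState(datetime(2011, 9, 3, 12, 33, 16, tzinfo=timezone.utc))
stored = json.dumps(state_to_dict(original))   # what the backend would hold
restored = state_from_dict(json.loads(stored))
```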

page_url(page: int | None = None) → str | None

Optionally return the URL for a specific page if appropriate.

initial_page() → str

Return the initial page.

next_page(body: Dict[str, Any]) → str | None

Return the next page from the current page.
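Given that `NEXT_PAGE` is `'next'` and `THIS_PAGE` is `'after'`, one plausible way to derive the next page token is to read the `after` query parameter out of the `next` URL in the response. This is a sketch under that assumption; `cloud_next_page` and the example URL are illustrative, not the lister's code.

```python
from typing import Any, Dict, Optional
from urllib.parse import parse_qs, urlparse


def cloud_next_page(body: Dict[str, Any], param: str = "after") -> Optional[str]:
    """Extract the page token from the response's "next" URL, if any."""
    next_url = body.get("next")
    if not next_url:
        return None  # no "next" link: this was the last page
    values = parse_qs(urlparse(next_url).query).get(param)
    return values[0] if values else None


# Hypothetical response body with a percent-encoded timestamp token.
body = {
    "next": "https://api.bitbucket.org/2.0/repositories"
    "?pagelen=100&after=2011-09-03T12%3A33%3A16.028393%2B00%3A00"
}
token = cloud_next_page(body)
```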

error_url_params(body: Dict[str, Any], page: str) → Dict[str, Any]

Return the URL params to use on error.

get_last_update(repo: Dict[str, Any])

Optionally return the date a repo last changed.

get_enabled(repo: Dict[str, Any]) → bool

Return whether or not the repo should be downloaded.

commit_page(page: List[Dict[str, Any]]) → None

Update the currently stored state using the latest listed page.
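Since the state tracks the creation date of the last listed repository, updating it from a page might amount to keeping the most recent `created_on` value seen so far. The sketch below is an assumption about that logic: `advance_cdate` is a hypothetical helper, and the real method may simply look at the final entry of the page.

```python
from datetime import datetime
from typing import Any, Dict, List, Optional


def advance_cdate(
    current: Optional[datetime], page: List[Dict[str, Any]]
) -> Optional[datetime]:
    """Return the newest creation date among the stored one and the page's."""
    dates = [
        datetime.fromisoformat(repo["created_on"])
        for repo in page
        if repo.get("created_on")
    ]
    if not dates:
        return current  # nothing to learn from this page
    newest = max(dates)
    # Never move the marker backwards.
    return newest if current is None or newest > current else current


# Hypothetical page with two repositories.
page = [
    {"created_on": "2011-09-03T12:33:16.028393+00:00"},
    {"created_on": "2011-09-04T08:00:00.000000+00:00"},
]
marker = advance_cdate(None, page)
```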

finalize() → None

Custom hook to finalize the lister state before returning from the main loop.

This method must set the updated attribute if the lister has done some work.

If relevant, this method can use get_state_from_scheduler() to merge the current lister state with the one from the scheduler backend, reducing the risk of race conditions if we’re running concurrent listings.

This method is called in a finally block, which means it will also run when the lister fails.