swh.lister.pypi.lister module

class swh.lister.pypi.lister.PyPIListerState(last_serial: int | None = None)

Bases: object

State of the PyPI lister.

last_serial: int | None = None

Last serial seen when visiting the PyPI instance.

swh.lister.pypi.lister.pypi_url(package_name: str) → str

Build the PyPI URL for a package from its name.
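As a minimal sketch of what this likely amounts to, assuming the URL is produced by filling the PACKAGE_URL template documented on this page: the function name `pypi_url_sketch` is hypothetical and is not part of swh.lister.

```python
# Template copied from the PACKAGE_URL class attribute documented on this page.
PACKAGE_URL = "https://pypi.org/project/{package_name}/"


def pypi_url_sketch(package_name: str) -> str:
    """Hypothetical stand-in for pypi_url: fill the template with the name."""
    return PACKAGE_URL.format(package_name=package_name)


url = pypi_url_sketch("requests")
print(url)
```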

class swh.lister.pypi.lister.PyPILister(scheduler: SchedulerInterface, url: str = 'https://pypi.org/pypi', instance: str = 'pypi', credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True)

Bases: Lister[PyPIListerState, List[Tuple[str, datetime]]]

List origins from PyPI.

LISTER_NAME: str = 'pypi'

INSTANCE = 'pypi'

PACKAGE_LIST_URL = 'https://pypi.org/pypi'

PACKAGE_URL = 'https://pypi.org/project/{package_name}/'

state_from_dict(d: Dict[str, Any]) → PyPIListerState

Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.

state_to_dict(state: PyPIListerState) → Dict[str, Any]

Convert the StateType for this lister to its serialization as a dict for storage in the scheduler.

Values must be JSON-compatible as that’s what the backend database expects.
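The round trip between the two conversion methods can be sketched as follows. This is an illustration only: `StateSketch` is a hypothetical stand-in that mirrors the single `last_serial` field of PyPIListerState, and the use of `dataclasses.asdict` is an assumption about how such a conversion could be written, not a statement about the actual implementation.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class StateSketch:
    """Hypothetical stand-in mirroring PyPIListerState's single field."""

    last_serial: Optional[int] = None


def state_to_dict(state: StateSketch) -> Dict[str, Any]:
    # An int or None serializes cleanly to JSON, as the backend requires.
    return asdict(state)


def state_from_dict(d: Dict[str, Any]) -> StateSketch:
    # An empty dict (no previous visit) falls back to the default of None.
    return StateSketch(**d)


state = StateSketch(last_serial=12345)
stored = json.dumps(state_to_dict(state))  # what a JSON backend would hold
restored = state_from_dict(json.loads(stored))
```

Keeping the state to plain integers and None is what makes the JSON round trip lossless here.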

get_pages() → Iterator[List[Tuple[str, datetime]]]

Iterate over changelog events per package, determine the max release date for that package, and use that max release date as last_update. When the execution is done, this also sets the self.last_processed_serial attribute so that the state of the lister can be finalized for the next visit.

Yields:

List of Tuple of (package-name, max release-date)
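The aggregation described above can be sketched without any network access. The event tuple shape used here, (package name, version, Unix timestamp, action, serial), is an assumption made for illustration, as are the helper name `summarize_events` and the sample data; the real lister may filter or page events differently.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed event shape: (package name, version, unix timestamp, action, serial)
Event = Tuple[str, str, int, str, int]


def summarize_events(
    events: Iterable[Event],
) -> Tuple[List[Tuple[str, datetime]], Optional[int]]:
    """Return one (name, max release date) pair per package and the max serial."""
    latest: Dict[str, datetime] = {}
    max_serial: Optional[int] = None
    for name, _version, timestamp, _action, serial in events:
        when = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        # Keep only the most recent date seen for each package.
        if name not in latest or when > latest[name]:
            latest[name] = when
        # Track the highest serial so the next visit can resume from it.
        if max_serial is None or serial > max_serial:
            max_serial = serial
    return list(latest.items()), max_serial


events = [
    ("foo", "1.0", 1_600_000_000, "new release", 10),
    ("bar", "0.1", 1_600_000_100, "new release", 11),
    ("foo", "1.1", 1_600_000_200, "new release", 12),
]
page, last_processed_serial = summarize_events(events)
```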

get_origins_from_page(packages: List[Tuple[str, datetime]]) → Iterator[ListedOrigin]

Convert a page of PyPI packages, given as (package name, last update) tuples, into ListedOrigins.
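A rough sketch of this conversion, under stated assumptions: `OriginSketch` is a hypothetical, reduced stand-in for ListedOrigin (the real class carries more fields), the `"pypi"` visit type is assumed, and the URL is built from the PACKAGE_URL template documented on this page.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterator, List, Optional, Tuple

PACKAGE_URL = "https://pypi.org/project/{package_name}/"


@dataclass
class OriginSketch:
    """Hypothetical, reduced stand-in for ListedOrigin."""

    url: str
    visit_type: str
    last_update: Optional[datetime]


def origins_from_page_sketch(
    packages: List[Tuple[str, datetime]],
) -> Iterator[OriginSketch]:
    # One origin per (package name, last update) pair in the page.
    for package_name, last_update in packages:
        yield OriginSketch(
            url=PACKAGE_URL.format(package_name=package_name),
            visit_type="pypi",  # assumed visit type
            last_update=last_update,
        )


page = [("requests", datetime(2023, 5, 22, tzinfo=timezone.utc))]
origins = list(origins_from_page_sketch(page))
```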

finalize()

Finalize the visit state by updating it with the new last_serial, if updates actually happened.
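One way to read "if updates actually happened" is sketched below. The comparison rule (only move the serial forward, and only when a serial was processed) is an assumption, and `StateSketch` and `finalize_sketch` are hypothetical names used for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateSketch:
    """Hypothetical stand-in mirroring PyPIListerState's single field."""

    last_serial: Optional[int] = None


def finalize_sketch(state: StateSketch, last_processed_serial: Optional[int]) -> bool:
    """Update state in place; return True only if the serial advanced."""
    if last_processed_serial is None:
        return False  # nothing was processed during this visit
    if state.last_serial is not None and last_processed_serial <= state.last_serial:
        return False  # no newer events than the previous visit
    state.last_serial = last_processed_serial
    return True


state = StateSketch(last_serial=10)
updated = finalize_sketch(state, 12)
unchanged = finalize_sketch(state, 12)
```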