swh.lister.pypi.lister module

class swh.lister.pypi.lister.PyPIListerState(last_serial: Optional[int] = None)

Bases: object

State of PyPI lister

last_serial: Optional[int] = None

Last serial seen when visiting the PyPI instance.

swh.lister.pypi.lister.pypi_url(package_name: str) → str

Build the PyPI project URL for a package name.
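A minimal sketch of what this helper plausibly does, assuming it simply fills in the PACKAGE_URL template that PyPILister exposes. Whether the real function normalizes the package name first is not stated here, so this sketch passes the name through unchanged.

```python
# Template copied from PyPILister.PACKAGE_URL.
PACKAGE_URL = "https://pypi.org/project/{package_name}/"


def pypi_url(package_name: str) -> str:
    """Build the PyPI project URL for a package name (no normalization)."""
    return PACKAGE_URL.format(package_name=package_name)
```

For example, pypi_url("swh.lister") gives "https://pypi.org/project/swh.lister/".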

class swh.lister.pypi.lister.PyPILister(scheduler: swh.scheduler.interface.SchedulerInterface, credentials: Optional[Dict[str, Dict[str, List[Dict[str, str]]]]] = None)

Bases: swh.lister.pattern.Lister[swh.lister.pypi.lister.PyPIListerState, List[Tuple[str, datetime.datetime]]]

List origins from PyPI.

LISTER_NAME: str = 'pypi'
INSTANCE = 'pypi'
PACKAGE_LIST_URL = 'https://pypi.org/pypi'
PACKAGE_URL = 'https://pypi.org/project/{package_name}/'

state_from_dict(d: Dict[str, Any]) → swh.lister.pypi.lister.PyPIListerState

Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.

state_to_dict(state: swh.lister.pypi.lister.PyPIListerState) → Dict[str, Any]

Convert the StateType for this lister to its serialization as dict for storage in the scheduler.

Values must be JSON-compatible, as that is what the backend database expects.
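Because PyPIListerState carries a single optional integer, the round trip between the state and its dict form can be sketched with the standard dataclasses module. This is an illustrative assumption, not necessarily how the lister implements state_from_dict and state_to_dict; it only shows that the serialized form stays JSON-compatible.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class PyPIListerState:
    """Lister state: the last PyPI serial seen (None before the first visit)."""

    last_serial: Optional[int] = None


def state_to_dict(state: PyPIListerState) -> Dict[str, Any]:
    # asdict() returns plain ints/None here, which are JSON-compatible.
    return asdict(state)


def state_from_dict(d: Dict[str, Any]) -> PyPIListerState:
    # Keys missing from the stored dict fall back to the dataclass defaults.
    return PyPIListerState(**d)
```

On a first visit the stored dict may be empty, in which case state_from_dict({}) yields a state whose last_serial is None.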

get_pages() → Iterator[List[Tuple[str, datetime.datetime]]]

Iterate over changelog events per package, determine the maximum release date for each package, and use that date as last_update. When the execution is done, this also sets the self.last_processed_serial attribute so the lister state can be finalized for the next visit.

Yields

Lists of (package name, max release date) tuples.
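The per-package aggregation described for get_pages can be sketched as a pure function over changelog events. The event layout (name, version, unix timestamp, action, serial) and the helper names max_release_dates and paginate are assumptions made for illustration; fetching events from PyPI and filtering by action type are left out.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Assumed event shape: (package_name, version, unix_timestamp, action, serial).
Event = Tuple[str, str, int, str, int]


def max_release_dates(
    events: Iterable[Event],
) -> Tuple[List[Tuple[str, datetime]], Optional[int]]:
    """Keep the latest event date per package and the highest serial seen."""
    latest: Dict[str, datetime] = {}
    last_serial: Optional[int] = None
    for name, _version, timestamp, _action, serial in events:
        date = datetime.fromtimestamp(timestamp, tz=timezone.utc)
        # Keep only the most recent date for each package.
        if name not in latest or date > latest[name]:
            latest[name] = date
        # Track the highest serial so the state can be finalized later.
        if last_serial is None or serial > last_serial:
            last_serial = serial
    # Packages keep their first-seen order (dicts preserve insertion order).
    return list(latest.items()), last_serial


def paginate(
    packages: List[Tuple[str, datetime]], page_size: int
) -> Iterator[List[Tuple[str, datetime]]]:
    """Yield the (package name, max release date) pairs in fixed-size pages."""
    for start in range(0, len(packages), page_size):
        yield packages[start : start + page_size]
```

In this sketch, the returned serial plays the role that self.last_processed_serial plays in the lister: it is what the finalize step would persist.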

get_origins_from_page(packages: List[Tuple[str, datetime.datetime]]) → Iterator[swh.scheduler.model.ListedOrigin]

Convert a page of PyPI packages into ListedOrigin objects, one per package.
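A self-contained sketch of the conversion step. ListedOrigin lives in swh.scheduler, so OriginRecord below is a hypothetical stand-in carrying only a URL, a visit type and a last-update date; the "pypi" visit type is also an assumption. Each origin URL is built from the PACKAGE_URL template listed on PyPILister.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterator, List, Tuple

# Template copied from PyPILister.PACKAGE_URL.
PACKAGE_URL = "https://pypi.org/project/{package_name}/"


@dataclass(frozen=True)
class OriginRecord:
    """Hypothetical stand-in for swh.scheduler.model.ListedOrigin."""

    url: str
    visit_type: str
    last_update: datetime


def origins_from_page(
    packages: List[Tuple[str, datetime]],
) -> Iterator[OriginRecord]:
    """Yield one origin per (package name, max release date) pair."""
    for name, last_update in packages:
        yield OriginRecord(
            url=PACKAGE_URL.format(package_name=name),
            visit_type="pypi",  # assumed visit type
            last_update=last_update,
        )
```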

finalize()

Finalize the visit by updating the state with the new last_serial, if any updates actually happened.
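The finalize logic can be sketched as a small function that only touches the state when a serial was actually processed. The rule that the serial must move forward, and the boolean "state changed" return value, are assumptions for illustration; finalize_state is a hypothetical name, not part of the lister.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PyPIListerState:
    last_serial: Optional[int] = None


def finalize_state(
    state: PyPIListerState, last_processed_serial: Optional[int]
) -> bool:
    """Update last_serial in place; return True if the state changed."""
    if last_processed_serial is None:
        return False  # nothing was processed during this visit
    if state.last_serial is None or last_processed_serial > state.last_serial:
        state.last_serial = last_processed_serial
        return True
    return False  # assumed: never move the serial backwards
```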