swh.lister.packagist.lister module

exception swh.lister.packagist.lister.NotModifiedSinceLastVisit

Bases: ValueError

Exception raised when a package has seen no change since the last visit.

class swh.lister.packagist.lister.PackagistListerState(last_listing_date: datetime | None = None)

Bases: object

State of the Packagist lister.

last_listing_date: datetime | None = None

Date when the Packagist lister was last executed.

class swh.lister.packagist.lister.PackagistLister(scheduler: SchedulerInterface, url: str = 'https://packagist.org/packages/list.json', instance: str = 'packagist', credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, record_batch_size: int = 1000)

Bases: Lister[PackagistListerState, List[str]]

List all Packagist projects and send the associated origins to the scheduler.

The lister queries the Packagist API, whose documentation can be found at https://packagist.org/apidoc.

For each package, metadata are retrieved from Packagist API endpoints whose responses are served from static files, which keeps the requests cheap on the Packagist side (no dynamic queries). Furthermore, subsequent listings send the “If-Modified-Since” HTTP header so that only package metadata updated since the previous listing operation are retrieved. This saves bandwidth and returns only origins that might have newly released versions.
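As an illustration of the conditional request described above, here is a minimal sketch of how an “If-Modified-Since” header could be derived from the date of the previous listing. The helper name conditional_headers is hypothetical and is not part of the lister; the actual implementation may build its headers differently.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Optional


def conditional_headers(last_listing_date: Optional[datetime]) -> Dict[str, str]:
    """Build request headers for a first or an incremental listing.

    Hypothetical helper, written for illustration only.
    """
    if last_listing_date is None:
        # First listing: no condition, every package is fetched.
        return {}
    # HTTP dates are expressed in GMT, so convert before formatting.
    # Note: a naive datetime would be interpreted as local time here.
    as_utc = last_listing_date.astimezone(timezone.utc)
    return {"If-Modified-Since": format_datetime(as_utc, usegmt=True)}


headers = conditional_headers(datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc))
print(headers)  # {'If-Modified-Since': 'Mon, 15 Jan 2024 12:30:00 GMT'}
```

A server that honours the header answers 304 (Not Modified) when the resource has not changed since that date, which is the case signalled by NotModifiedSinceLastVisit.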

LISTER_NAME: str = 'Packagist'
INSTANCE = 'packagist'
PACKAGIST_PACKAGES_LIST_URL = 'https://packagist.org/packages/list.json'
PACKAGIST_PACKAGE_URL_FORMATS = ['https://repo.packagist.org/p2/{package_name}.json', 'https://repo.packagist.org/p2/{package_name}~dev.json', 'https://repo.packagist.org/p/{package_name}.json', 'https://repo.packagist.org/packages/{package_name}.json']
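The URL formats above are ordinary str.format templates with a single package_name placeholder. The sketch below expands them for one package; the helper candidate_urls and the example package name are illustrative assumptions, and this page does not state in which order, or under which conditions, the lister queries each URL.

```python
from typing import List

# Copied from the class attribute documented above.
PACKAGIST_PACKAGE_URL_FORMATS = [
    "https://repo.packagist.org/p2/{package_name}.json",
    "https://repo.packagist.org/p2/{package_name}~dev.json",
    "https://repo.packagist.org/p/{package_name}.json",
    "https://repo.packagist.org/packages/{package_name}.json",
]


def candidate_urls(package_name: str) -> List[str]:
    """Expand every URL template for one package (hypothetical helper)."""
    return [fmt.format(package_name=package_name) for fmt in PACKAGIST_PACKAGE_URL_FORMATS]


# "monolog/monolog" is only an example of a vendor/package style name.
urls = candidate_urls("monolog/monolog")
for url in urls:
    print(url)
```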
state_from_dict(d: Dict[str, Any]) → PackagistListerState

Convert the state stored in the scheduler backend (as a dict), to the concrete StateType for this lister.

state_to_dict(state: PackagistListerState) → Dict[str, Any]

Convert the StateType for this lister to its serialization as dict for storage in the scheduler.

Values must be JSON-compatible as that’s what the backend database expects.
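To make the JSON-compatibility requirement concrete, here is a minimal sketch of a round trip between the state object and its dict form. It uses standalone functions and a stand-in dataclass rather than the real classes, and it assumes the date is stored as an ISO 8601 string; the actual lister may use a different representation.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass
class PackagistListerState:
    """Stand-in mirroring the documented state class."""
    last_listing_date: Optional[datetime] = None


def state_to_dict(state: PackagistListerState) -> Dict[str, Any]:
    # datetime objects are not JSON-serializable, so store a string instead.
    # (ISO 8601 is an assumption made for this sketch.)
    d: Dict[str, Any] = {}
    if state.last_listing_date is not None:
        d["last_listing_date"] = state.last_listing_date.isoformat()
    return d


def state_from_dict(d: Dict[str, Any]) -> PackagistListerState:
    raw = d.get("last_listing_date")
    parsed = datetime.fromisoformat(raw) if raw else None
    return PackagistListerState(last_listing_date=parsed)


state = PackagistListerState(datetime(2024, 1, 15, 12, 30, tzinfo=timezone.utc))
serialized = json.dumps(state_to_dict(state))  # succeeds: only strings are stored
restored = state_from_dict(json.loads(serialized))
```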

api_request(url: str) → Dict

Execute an API request against the Packagist server.

Raises:

NotModifiedSinceLastVisit – if the URL returns a 304 response.

Returns:

The JSON result in case of a 200 response, an empty response otherwise.
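The three outcomes documented above (200, 304, anything else) can be sketched as a pure function over a status code and a response body. The function interpret_response is hypothetical and performs no network access; it only illustrates the documented contract, not the lister's actual code.

```python
import json
from typing import Any, Dict


class NotModifiedSinceLastVisit(ValueError):
    """Stand-in for the exception documented at the top of this module."""


def interpret_response(status_code: int, body: str) -> Dict[str, Any]:
    """Map an HTTP status code and body to the documented result."""
    if status_code == 304:
        # The package metadata did not change since the last visit.
        raise NotModifiedSinceLastVisit("no change since last visit")
    if status_code == 200:
        return json.loads(body)
    # Any other status yields an empty result.
    return {}


print(interpret_response(200, '{"packages": {}}'))  # {'packages': {}}
print(interpret_response(404, ""))                  # {}
```

Because the exception derives from ValueError, callers can catch it specifically to skip unchanged packages while still letting other errors propagate.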

get_pages() → Iterator[List[str]]

Retrieve the list of unique package names, shuffle it, and split it into pages of package names.
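The deduplicate, shuffle, and paginate steps can be sketched as follows. The function paginate, its page_size parameter, and the optional rng argument are assumptions made for illustration; the real method takes no arguments and obtains the package names from the Packagist list endpoint.

```python
import random
from typing import Iterable, Iterator, List, Optional


def paginate(
    package_names: Iterable[str],
    page_size: int,
    rng: Optional[random.Random] = None,
) -> Iterator[List[str]]:
    """Yield pages of unique package names in random order."""
    # dict.fromkeys removes duplicates while keeping a deterministic order.
    unique = list(dict.fromkeys(package_names))
    # Shuffle in place; a seeded Random can be passed for reproducibility.
    shuffler = rng if rng is not None else random
    shuffler.shuffle(unique)
    for start in range(0, len(unique), page_size):
        yield unique[start:start + page_size]


names = ["a/one", "b/two", "a/one", "c/three", "d/four", "e/five"]
pages = list(paginate(names, page_size=2, rng=random.Random(0)))
print(len(pages))  # 3
```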

get_origins_from_page(page: List[str]) → Iterator[ListedOrigin]

Iterate over the Packagist projects in the page and yield ListedOrigin instances.

finalize() → None

Custom hook to finalize the lister state before returning from the main loop.

This method must set the updated attribute if the lister has done some work.

If relevant, this method can use get_state_from_scheduler() to merge the current lister state with the one from the scheduler backend, reducing the risk of race conditions if we’re running concurrent listings.

This method is called in a finally block, which means it will also run when the lister fails.
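The consequence of calling finalize from a finally block can be shown with a toy class. ToyLister and its attributes are hypothetical and much simpler than the real Lister base class; the sketch only demonstrates that the hook runs even when the main loop raises.

```python
class ToyLister:
    """Toy stand-in showing the try/finally call pattern."""

    def __init__(self, fail_after: int) -> None:
        self.fail_after = fail_after
        self.pages_done = 0
        self.updated = False
        self.finalized = False

    def run(self) -> None:
        try:
            for _ in range(3):
                if self.pages_done == self.fail_after:
                    raise RuntimeError("simulated listing failure")
                self.pages_done += 1
        finally:
            # Runs on success and on failure alike.
            self.finalize()

    def finalize(self) -> None:
        self.finalized = True
        # Only report an update if some work was actually done.
        self.updated = self.pages_done > 0


lister = ToyLister(fail_after=1)
try:
    lister.run()
except RuntimeError:
    pass
print(lister.finalized, lister.updated)  # True True
```

Since the hook also runs after a failure, it should only record progress that was really made, as the pages_done check does here.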