swh.lister.npm.lister module

class swh.lister.npm.lister.NpmListerState(last_seq: int | None = None)

Bases: object

State of the npm lister.

last_seq: int | None = None
class swh.lister.npm.lister.NpmLister(scheduler: SchedulerInterface, url: str = 'https://replicate.npmjs.com/_all_docs', instance: str = 'npm', page_size: int = 1000, incremental: bool = False, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True)

Bases: Lister[NpmListerState, List[Dict[str, Any]]]

List all packages hosted on the npm registry.

The lister relies on the npm replication API, which is backed by a CouchDB database (https://docs.couchdb.org/en/stable/api/database/).

Parameters:
  • scheduler – a scheduler instance

  • page_size – number of package entries to return per page when querying the npm API

  • incremental – whether to use incremental listing. If true, only packages that are new or modified since the last incremental listing operation are returned; otherwise all packages are listed in lexicographical order
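As a rough illustration of the two listing modes, the sketch below builds the query parameters for one page request. `limit`, `include_docs`, `startkey` and `since` are standard CouchDB query parameters. How `NpmLister` actually combines them is an assumption here, and `build_params` is a hypothetical helper, not part of swh.lister.

```python
from typing import Any, Dict


def build_params(last_package_id: str, page_size: int, incremental: bool) -> Dict[str, Any]:
    """Hypothetical helper: query parameters for a single page request."""
    params: Dict[str, Any] = {"limit": page_size, "include_docs": "true"}
    if incremental:
        # Assumed: the _changes feed resumes after a given update sequence.
        params["since"] = last_package_id
    else:
        # Assumed: _all_docs returns documents sorted by key from startkey on.
        params["startkey"] = last_package_id
    return params


full_params = build_params('"left-pad"', 1000, incremental=False)
incremental_params = build_params("42", 1000, incremental=True)
```

In this sketch the only difference between the modes is the cursor parameter: a document key for a full listing and an update sequence for an incremental one.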

LISTER_NAME: str = 'npm'
INSTANCE = 'npm'
API_BASE_URL = 'https://replicate.npmjs.com'
API_INCREMENTAL_LISTING_URL = 'https://replicate.npmjs.com/_changes'
API_FULL_LISTING_URL = 'https://replicate.npmjs.com/_all_docs'
PACKAGE_URL_TEMPLATE = 'https://www.npmjs.com/package/{package_name}'
state_from_dict(d: Dict[str, Any]) NpmListerState

Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.

state_to_dict(state: NpmListerState) Dict[str, Any]

Convert the StateType for this lister to its serialization as a dict for storage in the scheduler.

Values must be JSON-compatible as that’s what the backend database expects.
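The round trip between the two conversions can be sketched as follows. The dataclass below is a stand-in that mirrors the documented `NpmListerState` signature, and both function bodies are assumptions about one straightforward way to implement the conversion, not the library's code.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class NpmListerState:
    """Stand-in mirroring the documented class: one optional field."""

    last_seq: Optional[int] = None


def state_from_dict(d: Dict[str, Any]) -> NpmListerState:
    # An empty dict (no state stored yet) yields the default state.
    return NpmListerState(**d)


def state_to_dict(state: NpmListerState) -> Dict[str, Any]:
    # Only JSON-compatible values (int or None) end up in the dict.
    return {"last_seq": state.last_seq}


state = NpmListerState(last_seq=42)
stored = json.dumps(state_to_dict(state))  # what a JSON backend would hold
restored = state_from_dict(json.loads(stored))
```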

request_params(last_package_id: str) Dict[str, Any]
get_pages() Iterator[List[Dict[str, Any]]]

Iterate over the pages of listed results. This is the main loop of the lister.

Returns:

an iterator of raw pages fetched from the platform currently being listed.
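A minimal sketch of such a page loop is shown below, using a common key-based pagination pattern: the last row of a full page becomes the start key of the next request and is held back so that it is not yielded twice. Whether `NpmLister` paginates exactly this way is not stated here; `iter_pages` and `fake_fetch` are hypothetical names, and the in-memory list stands in for the remote API.

```python
from typing import Any, Callable, Dict, Iterator, List

Row = Dict[str, Any]


def iter_pages(fetch: Callable[[str, int], List[Row]], page_size: int) -> Iterator[List[Row]]:
    """Yield pages until the source returns fewer than page_size rows.

    fetch(startkey, limit) is assumed to return rows sorted by id,
    starting at startkey (inclusive).
    """
    if page_size < 2:
        raise ValueError("page_size must be at least 2 to make progress")
    startkey = ""
    while True:
        rows = fetch(startkey, page_size)
        if len(rows) < page_size:
            if rows:
                yield rows
            return
        # The last row is fetched again as the first row of the next page,
        # so it is held back here.
        yield rows[:-1]
        startkey = rows[-1]["id"]


NAMES = ["a", "b", "c", "d", "e"]


def fake_fetch(startkey: str, limit: int) -> List[Row]:
    # In-memory stand-in for an HTTP request to the listing endpoint.
    return [{"id": name} for name in NAMES if name >= startkey][:limit]


pages = [[row["id"] for row in page] for page in iter_pages(fake_fetch, page_size=3)]
```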

get_origins_from_page(page: List[Dict[str, Any]]) Iterator[ListedOrigin]

Convert a page of npm packages into an iterator of ListedOrigin objects.
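The conversion can be pictured with the sketch below, which fills `PACKAGE_URL_TEMPLATE` for each row of a page. `Origin` is a hypothetical stand-in for `ListedOrigin` reduced to two fields, and the assumption that each row carries the package name in its `id` field comes from the usual CouchDB row layout, not from this page.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List

PACKAGE_URL_TEMPLATE = "https://www.npmjs.com/package/{package_name}"


@dataclass
class Origin:
    """Hypothetical stand-in for ListedOrigin, reduced to two fields."""

    url: str
    visit_type: str


def origins_from_page(page: List[Dict[str, Any]]) -> Iterator[Origin]:
    for row in page:
        # Assumption: the row id is the package name.
        yield Origin(
            url=PACKAGE_URL_TEMPLATE.format(package_name=row["id"]),
            visit_type="npm",
        )


page = [{"id": "left-pad"}, {"id": "@babel/core"}]
urls = [origin.url for origin in origins_from_page(page)]
```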

commit_page(page: List[Dict[str, Any]])

Update the currently stored state using the latest listed page.

finalize()

Custom hook to finalize the lister state before returning from the main loop.

This method must set the updated attribute if the lister has done some work.

If relevant, this method can use get_state_from_scheduler() to merge the current lister state with the one from the scheduler backend, reducing the risk of race conditions if we’re running concurrent listings.

This method is called in a finally block, which means it will also run when the lister fails.
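The interplay between the main loop, the finally block and the state merge can be sketched as follows. `SketchLister` is a hypothetical skeleton, not the swh.lister base class, and keeping the larger of the two sequence numbers is one plausible merge strategy assumed for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class State:
    last_seq: Optional[int] = None


def merge_last_seq(local: Optional[int], stored: Optional[int]) -> Optional[int]:
    """Keep the larger of two sequence numbers, ignoring missing ones."""
    known = [seq for seq in (local, stored) if seq is not None]
    return max(known) if known else None


class SketchLister:
    """Hypothetical lister skeleton used only for this example."""

    def __init__(self, stored_state: State) -> None:
        self.stored_state = stored_state  # what the scheduler backend holds
        self.state = State(stored_state.last_seq)
        self.updated = False

    def run(self, seqs: Iterable[int]) -> None:
        try:
            for seq in seqs:
                if seq < 0:
                    raise ValueError("bad page")
                self.state.last_seq = seq  # commit_page-like step
        finally:
            self.finalize()  # runs on success and on failure

    def finalize(self) -> None:
        merged = merge_last_seq(self.state.last_seq, self.stored_state.last_seq)
        self.state.last_seq = merged
        self.updated = merged != self.stored_state.last_seq


lister = SketchLister(State(last_seq=10))
try:
    lister.run([11, 12, -1])  # the third page fails
except ValueError:
    pass
```

Even though the third page raises, the progress made on the first two pages is kept: after the run, `lister.state.last_seq` is 12 and `lister.updated` is true.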