- class swh.lister.debian.lister.DebianListerState(package_versions: Dict[str, Set[str]] = <factory>)
State of debian lister
- class swh.lister.debian.lister.DebianLister(scheduler: SchedulerInterface, distribution: str = 'Debian', mirror_url: str = 'http://deb.debian.org/debian/', suites: List[str] = ['stretch', 'buster', 'bullseye'], components: List[str] = ['main', 'contrib', 'non-free'], credentials: Optional[Dict[str, Dict[str, List[Dict[str, str]]]]] = None, max_origins_per_page: Optional[int] = None, max_pages: Optional[int] = None, enable_origins: bool = True)
List source packages for a given debian or derivative distribution.
The lister will create a snapshot for each package name from all its available versions.
If a package snapshot differs from the one recorded by the last listing operation, it will be sent to the scheduler, which will create a loading task to archive the newly found source code.
scheduler – instance of SchedulerInterface
distribution – identifier of listed distribution (e.g. Debian, Ubuntu)
mirror_url – debian package archives mirror URL
suites – list of distribution suites to process
components – list of package components to process
- state_from_dict(d: Dict[str, Any]) -> DebianListerState
Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.
- state_to_dict(state: DebianListerState) -> Dict[str, Any]
Convert the StateType for this lister to its serialization as a dict for storage in the scheduler.
Values must be JSON-compatible as that’s what the backend database expects.
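The round trip between the two methods above can be sketched as follows. This is only an illustration, not the lister's actual code: ListerState is a hypothetical stand-in for DebianListerState, and storing each set as a sorted list is an assumption made here to satisfy the JSON-compatibility requirement.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class ListerState:
    """Hypothetical stand-in for DebianListerState."""

    package_versions: Dict[str, Set[str]] = field(default_factory=dict)


def state_to_dict(state: ListerState) -> Dict[str, Any]:
    # Sets are not JSON-serializable, so each one is stored as a sorted list
    # (assumption: the real lister may serialize differently).
    return {
        "package_versions": {
            name: sorted(versions)
            for name, versions in state.package_versions.items()
        }
    }


def state_from_dict(d: Dict[str, Any]) -> ListerState:
    # Rebuild the sets from the stored lists; an empty dict gives an empty state.
    return ListerState(
        package_versions={
            name: set(versions)
            for name, versions in d.get("package_versions", {}).items()
        }
    )


state = ListerState({"hello": {"2.10-2", "2.10-3"}})
serialized = json.dumps(state_to_dict(state))  # would fail if sets were kept
restored = state_from_dict(json.loads(serialized))
```

Converting sets to lists on the way out and back to sets on the way in keeps membership checks cheap in memory while leaving the stored form plain JSON.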
- debian_index_urls(suite: str, component: str) -> Iterator[Tuple[str, str]]
Return an iterator over the possible Sources file URLs, since multiple compression formats can be used.
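As a rough sketch of what such an iterator might yield, the function below builds one candidate URL per compression format using the standard Debian archive layout (dists/SUITE/COMPONENT/source/Sources.EXT). The function name index_urls, the order of the formats, and the meaning of the tuple as (URL, compression) are assumptions made for illustration, not taken from the lister's source.

```python
from typing import Iterator, Tuple


def index_urls(mirror_url: str, suite: str, component: str) -> Iterator[Tuple[str, str]]:
    """Yield (url, compression) candidates for a suite/component Sources index."""
    base = f"{mirror_url.rstrip('/')}/dists/{suite}/{component}/source/Sources"
    # Assumed order of preference; a caller would try each URL until one exists.
    for compression in ("xz", "bz2", "gz"):
        yield f"{base}.{compression}", compression


candidates = list(index_urls("http://deb.debian.org/debian/", "bullseye", "main"))
```

A caller can then request each candidate in turn and stop at the first one the mirror actually serves.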
- page_request(suite: str, component: str) -> Iterator[Sources]
Return the parsed package Sources file for a given Debian suite and component.
- get_pages() -> Iterator[Iterator[Sources]]
Return an iterator over parsed Debian package Sources files, one per combination of Debian suite and component.
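The "one page per combination" rule can be pictured with itertools.product. The sketch below only enumerates the (suite, component) pairs using the default values from the class signature; the helper name iter_page_keys and the iteration order are assumptions, and fetching and parsing each Sources file is left out.

```python
from itertools import product
from typing import Iterator, List, Tuple


def iter_page_keys(suites: List[str], components: List[str]) -> Iterator[Tuple[str, str]]:
    """Yield one (suite, component) pair per page to be requested."""
    # Assumed order: every component of a suite before moving to the next suite.
    return product(suites, components)


page_keys = list(
    iter_page_keys(
        ["stretch", "buster", "bullseye"],
        ["main", "contrib", "non-free"],
    )
)
```

With the default three suites and three components, this gives nine pages for a full listing.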
- get_origins_from_page(page: Iterator[Sources]) -> Iterator[ListedOrigin]
Convert a page of debian package sources into an iterator of ListedOrigin.
Note that the returned origins correspond only to packages listed for the first time, so that the origins counter in the statistics returned by the lister's run method stays accurate.
Packages already listed in another page but with different versions are cached by this method, and the updated ListedOrigin objects are sent to the scheduler later, in the commit_page method.
This matters because several Debian suites can be processed: the same package names can then appear in two different package source pages with only their versions differing, and counting each occurrence would inflate the lister statistics.
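The counting rule described above can be sketched with a shared cache of versions per package name. This is a simplified model, not the lister's implementation: pages are reduced to hypothetical (name, version) pairs, new_packages is an illustrative helper, and the later step of sending updated origins to the scheduler is omitted.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple


def new_packages(
    page: Iterable[Tuple[str, str]], seen: Dict[str, Set[str]]
) -> Iterator[str]:
    """Yield names listed for the first time; record every version in `seen`."""
    for name, version in page:
        first_time = name not in seen
        # Every version is cached, whether or not the name is new.
        seen.setdefault(name, set()).add(version)
        if first_time:
            yield name


seen: Dict[str, Set[str]] = {}
page_one = [("hello", "2.10-1"), ("bash", "5.0-4")]   # e.g. an older suite
page_two = [("hello", "2.10-2"), ("zsh", "5.8-6")]    # e.g. a newer suite

first = list(new_packages(page_one, seen))
second = list(new_packages(page_two, seen))
```

Here hello is counted once even though it appears in both pages, while both of its versions remain available in the cache for a later update.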
Custom hook to finalize the lister state before returning from the main loop.
This method must set the updated attribute if the lister has done some work.
If relevant, this method can use get_state_from_scheduler() to merge the current lister state with the one from the scheduler backend, reducing the risk of race conditions when listings run concurrently.
This method is called in a finally block, which means it will also run when the lister fails.