swh.lister.debian.lister module

class swh.lister.debian.lister.DebianListerState(package_versions: Dict[str, Set[str]] = <factory>)

Bases: object

State of the Debian lister.

package_versions: Dict[str, Set[str]]

Dictionary mapping a package name to all the versions found during the last listing.
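The <factory> default in the signature suggests that package_versions is a dataclass field built with a default factory, so each state object gets its own dictionary rather than sharing one mutable default. A minimal reconstruction, assuming a plain dataclass (the real class may differ in details); the package name and version strings are illustrative:

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class DebianListerState:
    """State of the Debian lister (sketch)."""

    # default_factory gives every instance a fresh dict instead of one shared
    # mutable default; Sphinx renders such a default as "= <factory>".
    package_versions: Dict[str, Set[str]] = field(default_factory=dict)


state = DebianListerState()
state.package_versions["hello"] = {"2.10-2", "2.10-3"}
```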

class swh.lister.debian.lister.DebianLister(scheduler: SchedulerInterface, url: str = 'http://deb.debian.org/debian/', instance: str = 'Debian', suites: List[str] = ['stretch', 'buster', 'bullseye'], components: List[str] = ['main', 'contrib', 'non-free'], credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True)

Bases: Lister[DebianListerState, Iterator[Sources]]

List source packages for a given Debian or derivative distribution.

The lister creates a snapshot for each package name from all of its available versions.

If a package snapshot differs from the one produced by the last listing operation, it is sent to the scheduler, which creates a loading task to archive the newly found source code.

Parameters:

  • scheduler – instance of SchedulerInterface

  • instance – identifier of the listed distribution (e.g. Debian, Ubuntu)

  • url – Debian package archive mirror URL

  • suites – list of distribution suites to process

  • components – list of package components to process
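The class description states that a package is only sent to the scheduler when its snapshot differs from the last listing. Modelling a snapshot as the set of version strings kept in DebianListerState.package_versions (an assumption; the lister's actual comparison may involve more than versions), the check reduces to a set comparison. The helper name packages_to_schedule is hypothetical:

```python
from typing import Dict, Set


def packages_to_schedule(
    previous: Dict[str, Set[str]], current: Dict[str, Set[str]]
) -> Set[str]:
    """Return package names whose version set changed since the last listing.

    Hypothetical helper: packages absent from `previous` count as changed,
    since they have never been sent to the scheduler.
    """
    return {
        name
        for name, versions in current.items()
        if previous.get(name) != versions
    }


previous = {"hello": {"2.10-2"}, "zlib": {"1:1.2.11.dfsg-2"}}
current = {
    "hello": {"2.10-2"},                              # unchanged
    "zlib": {"1:1.2.11.dfsg-2", "1:1.2.13.dfsg-1"},   # gained a version
    "curl": {"7.88.1-10"},                            # never listed before
}
changed = packages_to_schedule(previous, current)
```

Here only zlib and curl would trigger a new loading task; the unchanged hello package is skipped.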

LISTER_NAME: str = 'debian'
MIRROR_URL = 'http://deb.debian.org/debian/'
INSTANCE = 'Debian'

state_from_dict(d: Dict[str, Any]) → DebianListerState

Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.

state_to_dict(state: DebianListerState) → Dict[str, Any]

Convert the StateType for this lister to its serialization as dict for storage in the scheduler.

Values must be JSON-compatible as that’s what the backend database expects.
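Because the backend expects JSON-compatible values and JSON has no set type, the version sets in DebianListerState have to be converted to lists and back. A sketch of that round trip, written as module-level functions named after the methods documented above; the sorting and the handling of a missing "package_versions" key are assumptions of this sketch:

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class DebianListerState:
    package_versions: Dict[str, Set[str]] = field(default_factory=dict)


def state_to_dict(state: DebianListerState) -> Dict[str, Any]:
    # Sets are not JSON-serializable: store each one as a sorted list so the
    # stored document is deterministic.
    return {
        "package_versions": {
            name: sorted(versions)
            for name, versions in state.package_versions.items()
        }
    }


def state_from_dict(d: Dict[str, Any]) -> DebianListerState:
    # Rebuild the sets; a missing key yields an empty state.
    return DebianListerState(
        package_versions={
            name: set(versions)
            for name, versions in d.get("package_versions", {}).items()
        }
    )


original = DebianListerState({"hello": {"2.10-3", "2.10-2"}})
stored = json.dumps(state_to_dict(original))  # what the backend would persist
restored = state_from_dict(json.loads(stored))
```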

debian_index_urls(suite: str, component: str) → Iterator[Tuple[str, str]]

Return an iterator over the possible Sources file URLs, since multiple compression formats can be used.
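A sketch of how the candidate URLs could be generated, assuming the standard Debian archive layout dists/<suite>/<component>/source/Sources and trying xz, bz2 and gzip before the uncompressed file. The extension order, the exact paths probed, and reading the returned tuple as (URL, compression extension) are assumptions, not taken from the lister's source; the mirror URL is passed explicitly here instead of being read from the lister:

```python
from typing import Iterator, Tuple
from urllib.parse import urljoin

# Assumed preference order (xz usually gives the smallest download);
# the uncompressed file is tried last.
COMPRESSION_EXTS = ("xz", "bz2", "gz")


def debian_index_urls(
    mirror_url: str, suite: str, component: str
) -> Iterator[Tuple[str, str]]:
    """Yield (url, compression extension) pairs for a Sources index."""
    # urljoin keeps the mirror path because mirror_url ends with a slash.
    base = urljoin(mirror_url, f"dists/{suite}/{component}/source/Sources")
    for ext in COMPRESSION_EXTS:
        yield f"{base}.{ext}", ext
    yield base, ""  # "" marks the uncompressed file


urls = list(debian_index_urls("http://deb.debian.org/debian/", "bullseye", "main"))
```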

page_request(suite: str, component: str) → Iterator[Sources]

Return the parsed package Sources file for a given Debian suite and component.
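The request can be sketched as: try each candidate URL in turn, decompress the first file that is available, and split it into deb822 paragraphs, one per source package. The Sources type in the signature likely refers to python-debian's deb822 Sources class; the minimal stdlib parser below is only a stand-in for it, and the injected fetch callable (returning None for a missing file) is a hypothetical substitute for the lister's HTTP layer. The sample index content is illustrative:

```python
import bz2
import gzip
import lzma
from typing import Callable, Dict, Iterable, Iterator, List, Optional, Tuple

# Maps a compression extension to a function returning the raw bytes.
DECOMPRESSORS: Dict[str, Callable[[bytes], bytes]] = {
    "xz": lzma.decompress,
    "bz2": bz2.decompress,
    "gz": gzip.decompress,
    "": lambda data: data,
}


def parse_deb822(text: str) -> Iterator[Dict[str, str]]:
    """Minimal deb822 parser: blank lines end a paragraph, and lines
    starting with whitespace continue the previous field."""
    para: Dict[str, str] = {}
    key: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            if para:
                yield para
            para, key = {}, None
        elif line[0] in " \t" and key is not None:
            para[key] += "\n" + line.strip()
        else:
            name, _, value = line.partition(":")
            key = name.strip()
            para[key] = value.strip()
    if para:
        yield para


def page_request(
    candidates: Iterable[Tuple[str, str]],
    fetch: Callable[[str], Optional[bytes]],
) -> List[Dict[str, str]]:
    """Return the parsed paragraphs of the first Sources file found."""
    for url, ext in candidates:
        data = fetch(url)
        if data is None:
            continue  # not published in this format, try the next one
        return list(parse_deb822(DECOMPRESSORS[ext](data).decode("utf-8")))
    return []  # no variant available (this sketch's choice)


sample = (
    b"Package: hello\nVersion: 2.10-2\nFiles:\n abc123 1183 hello_2.10-2.dsc\n"
    b"\nPackage: zlib\nVersion: 1:1.2.11.dfsg-2\n"
)
mirror = {"http://mirror/Sources.gz": gzip.compress(sample)}
pages = page_request(
    [("http://mirror/Sources.xz", "xz"), ("http://mirror/Sources.gz", "gz")],
    mirror.get,
)
```

In the usage lines, the xz variant is missing from the fake mirror, so the gzip file is fetched and parsed instead.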

get_pages() → Iterator[Iterator[Sources]]

Return an iterator over parsed Debian package Sources files, one per combination of Debian suite and component.
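Iterating over every combination of suite and component amounts to a Cartesian product. A sketch with the request function injected as a parameter (a hypothetical simplification: the documented method takes no arguments and uses the lister's own suites, components and page_request); the iteration order is an assumption:

```python
from itertools import product
from typing import Callable, Iterator, List, TypeVar

Page = TypeVar("Page")


def get_pages(
    suites: List[str],
    components: List[str],
    page_request: Callable[[str, str], Page],
) -> Iterator[Page]:
    """Lazily yield one parsed page per (suite, component) pair."""
    # product() varies the last iterable fastest: all components of the
    # first suite come before any component of the next suite.
    for suite, component in product(suites, components):
        yield page_request(suite, component)


pages = list(get_pages(["stretch", "buster"], ["main", "contrib"], lambda s, c: f"{s}/{c}"))
```

With the default suites and components from the constructor signature, this yields 3 × 3 = 9 pages.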

origin_url_for_package(package_name: str) → str

Return the origin URL for the given package.
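One plausible scheme builds a deb:// URL from the distribution instance name and the package name. Treat the exact format below as an assumption to check against the lister's source; the instance is passed explicitly here instead of being read from the lister:

```python
def origin_url_for_package(instance: str, package_name: str) -> str:
    """Build an origin URL for a source package (assumed URL format)."""
    # Assumed layout: deb://<instance>/packages/<package name>
    return f"deb://{instance}/packages/{package_name}"


url = origin_url_for_package("Debian", "hello")
```

Keying the URL on the package name only, not on the version, fits the class description: one origin per package name, whose snapshot gathers all of its versions.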

get_origins_from_page(page: Iterator[Sources]) → Iterator[ListedOrigin]

Convert a page of Debian package sources into an iterator of ListedOrigin objects.

Note that the returned origins only correspond to packages listed for the first time, so that the origin counter in the statistics returned by the lister's run method stays accurate.

Packages already listed in another page, but with different versions, are cached by this method, and the updated ListedOrigin objects are sent to the scheduler later, in the commit_page method.

This is needed because, when multiple Debian suites are processed, the same package names can appear in two different package Sources pages with only their versions differing; without this caching, those origins would be counted multiple times in the lister statistics.
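The bookkeeping described above can be sketched with a small hypothetical helper. It receives each page as (package name, version) pairs, a simplification of the parsed Sources paragraphs, and returns only the names seen for the first time. Names whose version set grows on a later page are collected in updated, standing in for the origins that commit_page would send later:

```python
from typing import Dict, Iterable, List, Set, Tuple


class PageDeduplicator:
    """Hypothetical helper mirroring the behaviour described for
    get_origins_from_page; not part of swh.lister."""

    def __init__(self) -> None:
        # package name -> every version seen so far in this listing run
        self.versions: Dict[str, Set[str]] = {}
        # packages returned on an earlier page that gained new versions
        self.updated: Set[str] = set()

    def origins_from_page(self, page: Iterable[Tuple[str, str]]) -> List[str]:
        new: List[str] = []
        new_names: Set[str] = set()
        for name, version in page:
            versions = self.versions.setdefault(name, set())
            if not versions:
                # First sighting: returned once, so counted once in the stats.
                new.append(name)
                new_names.add(name)
            elif version not in versions and name not in new_names:
                # Known from an earlier page: cache the change, do not recount.
                self.updated.add(name)
            versions.add(version)
        return new


dedup = PageDeduplicator()
first = dedup.origins_from_page([("hello", "2.10-1"), ("zlib", "1.2.11")])
second = dedup.origins_from_page([("hello", "2.10-2"), ("curl", "7.88.1")])
```

Here hello is counted once even though it appears on both pages; its second version only marks it as updated.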


Custom hook to finalize the lister state before returning from the main loop.

This method must set the updated attribute when the lister has done some work.

If relevant, this method can use get_state_from_scheduler to merge the current lister state with the one stored in the scheduler backend, reducing the risk of race conditions when several listings run concurrently.

This method is called in a finally block, which means it will also run when the lister fails.
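Because the hook is called from a finally block, it runs whether the listing loop completes or raises. A hypothetical skeleton (not the swh Lister API; the class name MiniLister, the sent list, and the hook name finalize are illustrative) showing that control flow and the rule that updated is set only when some work was done:

```python
from typing import Iterable, Iterator, List


class MiniLister:
    """Hypothetical skeleton of a listing loop; not the swh Lister API."""

    def __init__(self, pages: Iterable[List[str]]) -> None:
        self.pages = pages
        self.sent: List[str] = []  # origins handed to the scheduler
        self.updated = False
        self.finalized = False

    def finalize(self) -> None:
        # Hypothetical name for the hook described above: record whether
        # any work was done before the loop exits.
        self.updated = len(self.sent) > 0
        self.finalized = True

    def run(self) -> int:
        try:
            for page in self.pages:
                self.sent.extend(page)
        finally:
            # Executed on normal completion and when an exception escapes.
            self.finalize()
        return len(self.sent)


def failing_pages() -> Iterator[List[str]]:
    yield ["a"]
    raise RuntimeError("mirror unreachable")  # simulated mid-listing failure


ok = MiniLister([["a"], ["b"]])
ok.run()

broken = MiniLister(failing_pages())
try:
    broken.run()
except RuntimeError:
    pass  # the error still propagates, but the hook has already run
```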