swh.lister.rpm.lister module

swh.lister.rpm.lister.RPMPageType

Each page is a list of packages for a given (release, component) pair from a Red Hat based distribution.

alias of Tuple[str, str, Repo] | None

class swh.lister.rpm.lister.RPMSourceData

Bases: TypedDict

Dictionary holding relevant data for listing RPM source packages.

See the content of the lister config directory for examples of RPM source data for well-known Red Hat based distributions.
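As an illustration, the sketch below mirrors RPMSourceData with a local TypedDict and fills it with hypothetical values. The base URL, release numbers, component names and the `$component` placeholder in the template are assumptions made for the example, not values taken from the lister config directory.

```python
from typing import List, TypedDict


class RPMSourceData(TypedDict):
    """Local stand-in mirroring the documented fields."""

    base_url: str
    releases: List[str]
    components: List[str]
    index_url_templates: List[str]


# Hypothetical values for illustration only: the URL, releases, components
# and template layout are assumptions, not copied from the lister config.
example_src_data: RPMSourceData = {
    "base_url": "https://rpm.example.org/distro/",
    "releases": ["38", "39"],
    "components": ["Everything", "Server"],
    "index_url_templates": ["$base_url/releases/$release/$component/source/tree/"],
}

# The lister's rpm_src_data parameter takes a list of such dictionaries.
rpm_src_data: List[RPMSourceData] = [example_src_data]
```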

base_url: str

Base URL of an RPM repository

releases: List[str]

List of release identifiers for a Red Hat based distribution

components: List[str]

List of components for a Red Hat based distribution

index_url_templates: List[str]

List of URL templates used to discover source package metadata. The following variables can be substituted in them: base_url, release and component (see string.Template for details about the format). Each generated URL must target a directory containing a sub-directory named repodata, which holds the package metadata, in order to be successfully processed by the lister.
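The template expansion can be sketched with string.Template, assuming the component is substituted under the name component (matching the component parameter of repo_request). The helpers build_index_urls and repodata_url are hypothetical, and the example URL is made up.

```python
from string import Template
from typing import List


def build_index_urls(
    index_url_templates: List[str], base_url: str, release: str, component: str
) -> List[str]:
    """Expand every template for one (release, component) pair (a sketch)."""
    urls = []
    for template in index_url_templates:
        # Strip a trailing slash so "$base_url/..." does not produce "//".
        url = Template(template).substitute(
            base_url=base_url.rstrip("/"), release=release, component=component
        )
        urls.append(url)
    return urls


def repodata_url(index_url: str) -> str:
    """URL of the repodata sub-directory expected under a generated URL."""
    return index_url.rstrip("/") + "/repodata/"


urls = build_index_urls(
    ["$base_url/releases/$release/$component/source/tree/"],
    "https://rpm.example.org/distro/",
    "39",
    "Everything",
)
```

Template.substitute raises KeyError when a template references a variable that is not passed, so a template with an unsupported placeholder fails early instead of producing a malformed URL.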

class swh.lister.rpm.lister.RPMListerState(package_versions: Dict[str, Set[str]] = <factory>)

Bases: object

State of RPM lister

package_versions: Dict[str, Set[str]]

Dictionary mapping a package name to all the versions found during the last listing
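A minimal sketch of this state as a dataclass, assuming the `<factory>` shown in the signature corresponds to field(default_factory=dict). The package name and version strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class RPMListerState:
    """Local stand-in mirroring the documented state class."""

    # default_factory gives each instance its own empty dict; dataclasses
    # reject a plain mutable default such as "= {}" with a ValueError.
    package_versions: Dict[str, Set[str]] = field(default_factory=dict)


state = RPMListerState()
# Hypothetical package name and versions, for illustration only.
state.package_versions.setdefault("bash", set()).update({"5.2.15-3", "5.2.21-1"})
```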

class swh.lister.rpm.lister.RPMLister(scheduler: SchedulerInterface, url: str, instance: str, rpm_src_data: List[RPMSourceData], incremental: bool = False, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None)

Bases: Lister[RPMListerState, Tuple[str, str, Repo] | None]

List source packages for a Red Hat based Linux distribution.

The lister creates a snapshot for each package from all its available versions.

In incremental mode, only packages whose snapshot has changed since the last listing are sent to the scheduler, which then creates loading tasks to archive the newly found source code.
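The incremental filtering can be sketched as follows, under the simplifying assumptions that a package's snapshot is determined by its set of versions and that the whole listing is available at once (the lister itself works page by page). packages_to_schedule is a hypothetical helper, not part of the lister API.

```python
from typing import Dict, Iterable, Set, Tuple


def packages_to_schedule(
    listed: Iterable[Tuple[str, str]],
    previous: Dict[str, Set[str]],
    incremental: bool,
) -> Dict[str, Set[str]]:
    """Group (name, version) pairs per package and keep those to send."""
    current: Dict[str, Set[str]] = {}
    for name, version in listed:
        current.setdefault(name, set()).add(version)
    if not incremental:
        # Full relisting: every package is sent to the scheduler.
        return current
    # Incremental: keep new packages (no previous entry) and packages whose
    # version set differs from the one recorded during the last listing.
    return {
        name: versions
        for name, versions in current.items()
        if versions != previous.get(name)
    }


listed = [("bash", "5.2.15-3"), ("bash", "5.2.21-1"), ("zsh", "5.9-5")]
previous = {"bash": {"5.2.15-3"}, "zsh": {"5.9-5"}}
```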

Parameters:
  • scheduler – instance of SchedulerInterface

  • url – Red Hat based distribution info URL

  • instance – name of Red Hat based distribution

  • rpm_src_data – list of dictionaries holding data required to list RPM source packages, see examples in the config directory.

  • incremental – if True, only packages with new versions are sent to the scheduler when relisting

LISTER_NAME: str = 'rpm'

state_from_dict(d: Dict[str, Any]) → RPMListerState

Convert the state stored in the scheduler backend (as a dict) to the concrete StateType of this lister.

state_to_dict(state: RPMListerState) → Dict[str, Any]

Convert the StateType for this lister to its serialization as dict for storage in the scheduler.

Values must be JSON-compatible as that’s what the backend database expects.
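Because package_versions holds sets, which JSON cannot represent, a plausible serialization converts them to lists and back. This is a sketch of one possible approach, not the lister's actual code; sorting the lists is an assumption made so that the stored state is deterministic.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class RPMListerState:
    package_versions: Dict[str, Set[str]] = field(default_factory=dict)


def state_to_dict(state: RPMListerState) -> Dict[str, Any]:
    # Sets are not JSON-serializable, so store sorted lists instead.
    return {
        "package_versions": {
            name: sorted(versions)
            for name, versions in state.package_versions.items()
        }
    }


def state_from_dict(d: Dict[str, Any]) -> RPMListerState:
    # Turn the stored lists back into sets; an empty dict yields a fresh state.
    return RPMListerState(
        package_versions={
            name: set(versions)
            for name, versions in d.get("package_versions", {}).items()
        }
    )


state = RPMListerState({"bash": {"5.2.21-1", "5.2.15-3"}})
stored = json.loads(json.dumps(state_to_dict(state)))  # simulate a DB round trip
```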

repo_request(index_url_template: Template, base_url: str, release: str, component: str) → Tuple[str, str, Repo] | None

Return parsed packages for a given distribution release and component.

get_pages() → Iterator[Tuple[str, str, Repo] | None]

Return an iterator over parsed RPM packages, one page per (release, component) pair.
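The paging scheme can be sketched as a generator over the Cartesian product of releases and components. Here fetch_repo is a hypothetical callable standing in for the actual metadata download and parsing (in the lister a page holds a Repo object), and the component placeholder name is assumed. When several index URL templates are configured, this sketch yields one page per template for each pair, which is also an assumption.

```python
from itertools import product
from string import Template
from typing import Callable, Iterator, List, Optional, Tuple, TypeVar

# Generic placeholder for the parsed repository metadata (Repo in the lister).
RepoT = TypeVar("RepoT")


def iter_pages(
    base_url: str,
    releases: List[str],
    components: List[str],
    index_url_templates: List[str],
    fetch_repo: Callable[[str], Optional[RepoT]],
) -> Iterator[Optional[Tuple[str, str, RepoT]]]:
    """Yield a page for each (release, component) pair and URL template."""
    for release, component in product(releases, components):
        for template in index_url_templates:
            index_url = Template(template).substitute(
                base_url=base_url.rstrip("/"), release=release, component=component
            )
            repo = fetch_repo(index_url)
            # None mirrors the "| None" in RPMPageType: nothing usable was fetched.
            yield None if repo is None else (release, component, repo)


# Hypothetical fetcher: a dict lookup instead of a real HTTP request.
fake_repos = {"https://rpm.example.org/d/39/Everything/": "repo-a"}
pages = list(
    iter_pages(
        "https://rpm.example.org/d/",
        ["39"],
        ["Everything", "Server"],
        ["$base_url/$release/$component/"],
        fake_repos.get,
    )
)
```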

origin_url_for_package(package_name: str) → str

Return the origin URL for the given package.

get_origins_from_page(page: Tuple[str, str, Repo] | None) → Iterator[ListedOrigin]

Convert a page of RPM package sources into an iterator of ListedOrigin.
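The conversion from a page to origins can be sketched as grouping packages by name and building one origin per package. OriginSketch is a hypothetical stand-in for ListedOrigin (not its real constructor), packages are modelled as (name, version) tuples rather than Repo entries, and the rpm://instance/packages/name URL scheme is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, Optional, Tuple

Package = Tuple[str, str]  # hypothetical (name, version) stand-in for Repo entries
Page = Optional[Tuple[str, str, Iterable[Package]]]


@dataclass
class OriginSketch:
    """Hypothetical stand-in for ListedOrigin, not its real signature."""

    url: str
    versions: Dict[str, Dict[str, str]] = field(default_factory=dict)


def origin_url_for_package(instance: str, package_name: str) -> str:
    # Assumed URL scheme, for illustration only.
    return f"rpm://{instance}/packages/{package_name}"


def origins_from_page(instance: str, page: Page) -> Iterator[OriginSketch]:
    if page is None:  # nothing could be fetched for this pair
        return
    release, component, packages = page
    origins: Dict[str, OriginSketch] = {}
    for name, version in packages:
        if name not in origins:
            origins[name] = OriginSketch(url=origin_url_for_package(instance, name))
        # Key versions by release and component so that identical version
        # strings coming from different pages do not overwrite each other.
        origins[name].versions[f"{release}/{component}/{version}"] = {
            "name": name,
            "version": version,
        }
    yield from origins.values()


page = ("39", "Everything", [("bash", "5.2.21-1"), ("bash", "5.2.26-1"), ("zsh", "5.9-5")])
origins = list(origins_from_page("example-distro", page))
```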

finalize()

Custom hook to finalize the lister state before returning from the main loop.

This method must set the updated attribute if the lister has done some work.

If relevant, this method can use get_state_from_scheduler() to merge the current lister state with the one from the scheduler backend, reducing the risk of race conditions when concurrent listings are running.

This method is called in a finally block, which means it will also run when the lister fails.
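One way finalize could merge the lister state with the one stored in the scheduler and set updated is sketched below. Taking the union of version sets is an assumed merge policy; FinalizeSketch and merge_package_versions are hypothetical, and scheduler_versions stands in for the state returned by get_state_from_scheduler().

```python
from typing import Dict, Set


def merge_package_versions(
    current: Dict[str, Set[str]], from_scheduler: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    """Union the version sets of both states, package by package."""
    merged = {name: set(versions) for name, versions in from_scheduler.items()}
    for name, versions in current.items():
        merged.setdefault(name, set()).update(versions)
    return merged


class FinalizeSketch:
    """Hypothetical lister skeleton illustrating the finalize() contract."""

    def __init__(
        self,
        package_versions: Dict[str, Set[str]],
        scheduler_versions: Dict[str, Set[str]],
    ) -> None:
        self.package_versions = package_versions
        # Stands in for the state fetched via get_state_from_scheduler().
        self.scheduler_versions = scheduler_versions
        self.updated = False

    def finalize(self) -> None:
        merged = merge_package_versions(self.package_versions, self.scheduler_versions)
        # Only flag an update when this run changed the stored state.
        if merged != self.scheduler_versions:
            self.package_versions = merged
            self.updated = True
```

A union never drops a version recorded by a concurrent run, at the cost of keeping versions that may since have disappeared upstream.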