swh.lister.bioconductor.lister module

class swh.lister.bioconductor.lister.BioconductorListerState(package_versions: Dict[str, Set[str]] = <factory>)

Bases: object

State of the Bioconductor lister.

package_versions: Dict[str, Set[str]]

Dictionary mapping each package name to all the versions found during the last listing.
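The "<factory>" default in the signature above is how Python renders a dataclass field declared with default_factory, so each state instance starts with its own empty dictionary. The minimal sketch below mirrors that shape with a hypothetical ListerStateSketch class; it illustrates the data structure and is not the lister's actual source.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class ListerStateSketch:
    """Hypothetical stand-in mirroring BioconductorListerState's single field."""

    # package name -> every version seen during the last listing;
    # default_factory gives each instance its own empty dict
    package_versions: Dict[str, Set[str]] = field(default_factory=dict)


# Illustrative package name and version only
state = ListerStateSketch()
state.package_versions.setdefault("somePackage", set()).add("1.0.0")
```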

class swh.lister.bioconductor.lister.BioconductorLister(scheduler: SchedulerInterface, url: str = 'https://www.bioconductor.org', instance: str = 'bioconductor', credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, releases: List[str] | None = None, categories: List[str] | None = None, incremental: bool = False, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, record_batch_size: int = 1000)

Bases: Lister[BioconductorListerState, Tuple[str, str, Dict[str, Any]] | None]

List origins from Bioconductor, a collection of open source software for bioinformatics based on the R statistical programming language.

LISTER_NAME: str = 'bioconductor'
VISIT_TYPE = 'bioconductor'
INSTANCE = 'bioconductor'
BIOCONDUCTOR_HOMEPAGE = 'https://www.bioconductor.org'
state_from_dict(d: Dict[str, Any]) → BioconductorListerState

Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.

state_to_dict(state: BioconductorListerState) → Dict[str, Any]

Convert the StateType for this lister to its serialization as a dict for storage in the scheduler.

Values must be JSON-compatible, as that is what the backend database expects.
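Because package_versions holds sets, which the standard json module cannot serialize, some conversion is needed to meet the JSON-compatibility requirement. The sketch below assumes each set is stored as a sorted list and rebuilt on load; the function names and this encoding are hypothetical, not taken from the lister.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class ListerStateSketch:
    # Hypothetical stand-in for BioconductorListerState
    package_versions: Dict[str, Set[str]] = field(default_factory=dict)


def state_to_dict_sketch(state: ListerStateSketch) -> Dict[str, Any]:
    # json cannot encode sets, so store each one as a sorted list
    return {
        "package_versions": {
            name: sorted(versions)
            for name, versions in state.package_versions.items()
        }
    }


def state_from_dict_sketch(d: Dict[str, Any]) -> ListerStateSketch:
    # Rebuild the sets; a dict without the key yields an empty state
    return ListerStateSketch(
        package_versions={
            name: set(versions)
            for name, versions in d.get("package_versions", {}).items()
        }
    )


original = ListerStateSketch({"pkgA": {"1.2.0", "1.0.0"}})
stored = json.dumps(state_to_dict_sketch(original))  # plain JSON text
restored = state_from_dict_sketch(json.loads(stored))
```

Sorting the lists keeps the stored JSON deterministic, which makes successive states easy to compare.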

origin_url_for_package(package_name: str) → str
get_pages() → Iterator[Tuple[str, str, Dict[str, Any]] | None]

Return an iterator over the pages to list. Each page corresponds to one (release, category) pair.
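A minimal sketch of this page iteration, assuming one page is produced for every combination of the configured releases and categories and that each page carries the fetched package metadata as its third element. The fetch_packages callback is hypothetical, and the sketch does not model the None pages that the real return type allows.

```python
from itertools import product
from typing import Any, Callable, Dict, Iterator, List, Tuple

Page = Tuple[str, str, Dict[str, Any]]


def get_pages_sketch(
    releases: List[str],
    categories: List[str],
    fetch_packages: Callable[[str, str], Dict[str, Any]],
) -> Iterator[Page]:
    # One page per (release, category) combination, releases in the outer loop
    for release, category in product(releases, categories):
        # fetch_packages is a hypothetical callback that downloads and parses
        # the package metadata for this release and category
        yield (release, category, fetch_packages(release, category))


# Example values only; any release/category strings work with this sketch
pages = list(get_pages_sketch(["3.17", "3.18"], ["bioc", "workflows"], lambda r, c: {}))
```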

fetch_versions() → List[str]
parse_packages(text: str) → Dict[str, Any]

Parse packages.json and PACKAGES files.
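PACKAGES files use R's Debian Control File (DCF) format: records separated by blank lines, "Field: value" lines, and indented continuation lines. The sketch below parses that format into a list of dictionaries. It is an illustrative stand-in (the name parse_dcf_sketch is hypothetical), it does not handle packages.json, and its output need not match the shape of the Dict[str, Any] that parse_packages returns.

```python
from typing import Dict, List, Optional


def parse_dcf_sketch(text: str) -> List[Dict[str, str]]:
    """Parse DCF text, the format used by CRAN/Bioconductor PACKAGES files."""
    records: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    last_key: Optional[str] = None
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record
            if current:
                records.append(current)
            current, last_key = {}, None
        elif line[0] in " \t" and last_key is not None:
            # Indented line: continuation of the previous field's value
            current[last_key] += " " + line.strip()
        else:
            key, _, value = line.partition(":")
            last_key = key.strip()
            current[last_key] = value.strip()
    if current:
        records.append(current)
    return records


# Hypothetical package entries
sample = """Package: pkgA
Version: 1.0.0
Depends: R (>= 4.0),
    methods

Package: pkgB
Version: 2.1.3
"""
records = parse_dcf_sketch(sample)
```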

get_origins_from_page(page: Tuple[str, str, Dict[str, Any]] | None) → Iterator[ListedOrigin]

Convert a page of Bioconductor PACKAGES/packages.json metadata into ListedOrigin objects.
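The conversion can be sketched as yielding one origin per package in the page. Everything here is an assumption: OriginSketch is a hypothetical stand-in for ListedOrigin, the page dictionary is assumed to map package names to metadata carrying a "Version" key, and the URL scheme is a guess at what origin_url_for_package produces.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, Tuple

# Value of BIOCONDUCTOR_HOMEPAGE documented above
HOMEPAGE = "https://www.bioconductor.org"


@dataclass(frozen=True)
class OriginSketch:
    """Hypothetical stand-in for ListedOrigin."""

    url: str
    visit_type: str
    release: str
    version: str


def get_origins_from_page_sketch(
    page: Tuple[str, str, Dict[str, Any]],
) -> Iterator[OriginSketch]:
    # The category is unpacked but unused in this simplified sketch
    release, _category, packages = page
    # Assumed page shape: package name -> metadata dict with a "Version" key
    for name, metadata in sorted(packages.items()):
        yield OriginSketch(
            # Assumed URL scheme, standing in for origin_url_for_package()
            url=f"{HOMEPAGE}/packages/{name}",
            visit_type="bioconductor",  # VISIT_TYPE documented above
            release=release,
            version=metadata["Version"],
        )
```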

finalize() → None

Custom hook to finalize the lister state before returning from the main loop.

This method must set updated if the lister has done some work.

If relevant, this method can use get_state_from_scheduler() to merge the current lister state with the one from the scheduler backend, reducing the risk of race conditions when listings run concurrently.

This method is called in a finally block, which means it will also run when the lister fails.
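One way to merge the current state with the scheduler's copy is a per-package union of version sets, so versions recorded by a concurrent listing are not lost. The sketch below is hypothetical (the function names and the rule used to decide the updated flag are assumptions); it only illustrates the merge described above.

```python
from typing import Dict, Set, Tuple

Versions = Dict[str, Set[str]]


def merge_package_versions(current: Versions, from_scheduler: Versions) -> Versions:
    # Start from a copy of the scheduler's state so neither input is mutated
    merged: Versions = {name: set(vs) for name, vs in from_scheduler.items()}
    # Per-package union keeps versions recorded by a concurrent listing
    for name, versions in current.items():
        merged.setdefault(name, set()).update(versions)
    return merged


def finalize_sketch(current: Versions, from_scheduler: Versions) -> Tuple[Versions, bool]:
    merged = merge_package_versions(current, from_scheduler)
    # Hypothetical rule: report an update only if this run added to the stored state
    updated = merged != from_scheduler
    return merged, updated
```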