swh.lister.bioconductor.lister module
- class swh.lister.bioconductor.lister.BioconductorListerState(package_versions: Dict[str, Set[str]] = <factory>)
Bases: object
State of the Bioconductor lister
- class swh.lister.bioconductor.lister.BioconductorLister(scheduler: SchedulerInterface, url: str = 'https://www.bioconductor.org', instance: str = 'bioconductor', credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, releases: List[str] | None = None, categories: List[str] | None = None, incremental: bool = False, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, record_batch_size: int = 1000)
Bases: Lister[BioconductorListerState, Tuple[str, str, Dict[str, Any]] | None]
List origins from Bioconductor, a collection of open source software for bioinformatics based on the R statistical programming language.
- VISIT_TYPE = 'bioconductor'
- INSTANCE = 'bioconductor'
- BIOCONDUCTOR_HOMEPAGE = 'https://www.bioconductor.org'
- state_from_dict(d: Dict[str, Any]) -> BioconductorListerState
Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.
- state_to_dict(state: BioconductorListerState) -> Dict[str, Any]
Convert the StateType for this lister to its serialization as dict for storage in the scheduler.
Values must be JSON-compatible as that’s what the backend database expects.
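The round trip between these two methods can be sketched as follows. This is a minimal illustration, not the lister's actual code: the State dataclass and the to_dict/from_dict helpers are hypothetical stand-ins, the package name and versions are example values, and the sketch assumes that the sets in package_versions are stored as lists because JSON has no set type.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class State:
    """Hypothetical stand-in for BioconductorListerState."""

    package_versions: Dict[str, Set[str]] = field(default_factory=dict)


def to_dict(state: State) -> Dict[str, Any]:
    # Sets are not JSON-serializable, so store each one as a sorted list.
    return {
        "package_versions": {
            name: sorted(versions)
            for name, versions in state.package_versions.items()
        }
    }


def from_dict(d: Dict[str, Any]) -> State:
    # Rebuild the sets; a missing key yields an empty state.
    return State(
        package_versions={
            name: set(versions)
            for name, versions in d.get("package_versions", {}).items()
        }
    )


state = State({"annotate": {"1.78.0", "1.80.0"}})
serialized = to_dict(state)
json.dumps(serialized)  # raises TypeError if a value is not JSON-compatible
restored = from_dict(serialized)
```

Sorting the versions before storing them keeps the serialized form deterministic, which makes stored states easier to compare.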
- get_pages() -> Iterator[Tuple[str, str, Dict[str, Any]] | None]
Return an iterator over pages. Each page corresponds to one (release, category) pair.
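The pairing of releases and categories can be pictured with a short sketch. It is an assumption-laden illustration rather than the lister's implementation: iter_release_categories is a hypothetical helper, the release and category names are example values, and the sketch leaves out the metadata dict (and the possible None) that the real page type carries.

```python
from itertools import product
from typing import Iterator, List, Tuple


def iter_release_categories(
    releases: List[str], categories: List[str]
) -> Iterator[Tuple[str, str]]:
    # One page per (release, category) combination, releases varying slowest.
    yield from product(releases, categories)


pages = list(iter_release_categories(["3.17", "3.18"], ["bioc", "workflows"]))
```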
- get_origins_from_page(page: Tuple[str, str, Dict[str, Any]] | None) -> Iterator[ListedOrigin]
Convert a page of BioconductorLister PACKAGES/packages.json metadata into an iterator of ListedOrigin objects.
- finalize() -> None
Custom hook to finalize the lister state before returning from the main loop.
This method must set updated if the lister has done some work.
If relevant, this method can use get_state_from_scheduler() to merge the current lister state with the one from the scheduler backend, reducing the risk of race conditions if we’re running concurrent listings.
This method is called in a finally block, which means it will also run when the lister fails.
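One way to picture the state merge mentioned above is a per-package union of version sets. The sketch below is only an assumption about what such a merge could look like: merge_package_versions is a hypothetical helper, the package names and versions are example values, and the source does not say how the Bioconductor lister actually combines the two states.

```python
from typing import Dict, Set


def merge_package_versions(
    local: Dict[str, Set[str]], remote: Dict[str, Set[str]]
) -> Dict[str, Set[str]]:
    # Copy the remote sets first so that neither input is mutated.
    merged = {name: set(versions) for name, versions in remote.items()}
    # Union in the local versions so neither listing's work is lost.
    for name, versions in local.items():
        merged.setdefault(name, set()).update(versions)
    return merged


local = {"annotate": {"1.80.0"}, "limma": {"3.58.1"}}
remote = {"annotate": {"1.78.0"}}
merged = merge_package_versions(local, remote)
```

A union is a natural fit here because recording a version that was already seen is harmless, whereas overwriting one state with the other could drop versions recorded by a concurrent listing.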