swh.lister.sourceforge.lister module
- class swh.lister.sourceforge.lister.VcsNames(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: Enum
Used to filter SourceForge tool names for valid VCS types.
- CVS = 'cvs'
- GIT = 'git'
- SUBVERSION = 'svn'
- MERCURIAL = 'hg'
- BAZAAR = 'bzr'
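As a minimal sketch of how such an enum can filter tool names, the snippet below redefines the enum locally with the values listed above so that it runs on its own. The `filter_vcs` helper and the sample tool names are hypothetical and are not part of the module.

```python
from enum import Enum
from typing import Iterable, List


class VcsNames(Enum):
    """Local copy of the documented enum, so the sketch is self-contained."""

    CVS = "cvs"
    GIT = "git"
    SUBVERSION = "svn"
    MERCURIAL = "hg"
    BAZAAR = "bzr"


# Set of the raw string values, for cheap membership tests.
VCS_VALUES = {vcs.value for vcs in VcsNames}


def filter_vcs(tool_names: Iterable[str]) -> List[VcsNames]:
    """Keep only the tool names that are valid VCS types (hypothetical helper)."""
    return [VcsNames(name) for name in tool_names if name in VCS_VALUES]


# Hypothetical tool names for one project; only "git" and "svn" are VCS tools.
vcs_tools = filter_vcs(["wiki", "git", "tickets", "svn", "discussion"])
```

Looking up by value (`VcsNames("git")`) raises `ValueError` for unknown names, which is why the sketch checks membership first.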
- class swh.lister.sourceforge.lister.SourceForgeListerEntry(vcs: swh.lister.sourceforge.lister.VcsNames, url: str, last_modified: datetime.date)
Bases: object
- class swh.lister.sourceforge.lister.SourceForgeListerState(subsitemap_last_modified: Dict[str, datetime.date] = <factory>, empty_projects: Dict[str, datetime.date] = <factory>)
Bases: object
Current state of the SourceForge lister in incremental runs
- class swh.lister.sourceforge.lister.SourceForgeLister(scheduler: SchedulerInterface, url: str = 'https://sourceforge.net', instance: str = 'main', incremental: bool = False, credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True)
Bases: Lister[SourceForgeListerState, List[SourceForgeListerEntry]]
List origins from the “SourceForge” forge.
- SOURCEFORGE_URL = 'https://sourceforge.net'
- INSTANCE = 'main'
- state_from_dict(d: Dict[str, Dict[str, Any]]) → SourceForgeListerState
Convert the state stored in the scheduler backend (as a dict) to the concrete StateType for this lister.
- state_to_dict(state: SourceForgeListerState) → Dict[str, Any]
Convert the StateType for this lister to its serialization as a dict for storage in the scheduler.
Values must be JSON-compatible as that’s what the backend database expects.
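The round trip between the two methods can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the `State` dataclass is a local stand-in with the two fields from the `SourceForgeListerState` signature, and the choice of ISO 8601 strings for dates and of the dict keys is assumed.

```python
import json
from dataclasses import dataclass, field
from datetime import date
from typing import Any, Dict


@dataclass
class State:
    """Stand-in for SourceForgeListerState (same two fields as documented)."""

    subsitemap_last_modified: Dict[str, date] = field(default_factory=dict)
    empty_projects: Dict[str, date] = field(default_factory=dict)


def state_to_dict(state: State) -> Dict[str, Any]:
    # Assumption: dates are stored as ISO 8601 strings, since datetime.date
    # is not JSON-serializable as-is.
    return {
        "subsitemap_last_modified": {
            k: v.isoformat() for k, v in state.subsitemap_last_modified.items()
        },
        "empty_projects": {
            k: v.isoformat() for k, v in state.empty_projects.items()
        },
    }


def state_from_dict(d: Dict[str, Dict[str, Any]]) -> State:
    # Missing keys fall back to empty dicts, as on a first run.
    return State(
        subsitemap_last_modified={
            k: date.fromisoformat(v)
            for k, v in d.get("subsitemap_last_modified", {}).items()
        },
        empty_projects={
            k: date.fromisoformat(v)
            for k, v in d.get("empty_projects", {}).items()
        },
    )


# Hypothetical state; the keys are illustrative only.
original = State(
    subsitemap_last_modified={"sitemap-0.xml": date(2021, 3, 5)},
    empty_projects={"some-project": date(2020, 1, 1)},
)
# Passing through json.dumps/json.loads mimics storage in a JSON backend.
restored = state_from_dict(json.loads(json.dumps(state_to_dict(original))))
```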
- get_pages() → Iterator[List[SourceForgeListerEntry]]
SourceForge has a main XML sitemap that lists its sharded sitemaps for all projects. Each XML sub-sitemap lists project pages, which are not unique per project: a project can have a wiki, a home, a git, an svn, etc. For each unique project, we query an API endpoint that lists (among other things) the tools associated with that project, some of which are the VCS in use. Subprojects are considered separate projects. Lastly, we use the list of VCS in use to build the predictable clone URL for each of them.
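The deduplication step described above can be sketched on an inline sub-sitemap, with no network access. Everything here is an assumption made for illustration: the sample XML, the `PROJECT_RE` URL pattern, the `unique_projects` and `clone_url` helpers, and the placeholder clone URL template on `example.org` do not come from the module.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date
from typing import Dict

# Standard namespace of the sitemap protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical sub-sitemap: project "foo" appears twice (home page and wiki).
SUB_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://sourceforge.net/projects/foo/</loc><lastmod>2021-03-01</lastmod></url>
  <url><loc>https://sourceforge.net/projects/foo/wiki/</loc><lastmod>2021-03-05</lastmod></url>
  <url><loc>https://sourceforge.net/projects/bar/</loc><lastmod>2020-01-01</lastmod></url>
</urlset>
"""

# Assumed URL shape for project pages; the real pattern may differ.
PROJECT_RE = re.compile(r"^https://sourceforge\.net/projects/(?P<name>[^/]+)/")

# Placeholder template only; the actual clone URL format is not given here.
CLONE_URL_TEMPLATE = "https://{vcs}.example.org/p/{project}/{mount_point}"


def unique_projects(xml_text: str) -> Dict[str, date]:
    """Map each project name to the most recent lastmod among its pages."""
    projects: Dict[str, date] = {}
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", namespaces=NS)
        match = PROJECT_RE.match(loc)
        if match is None or lastmod is None:
            continue  # not a project page, or no modification date
        modified = date.fromisoformat(lastmod)
        name = match["name"]
        if name not in projects or modified > projects[name]:
            projects[name] = modified
    return projects


def clone_url(vcs: str, project: str, mount_point: str) -> str:
    """Fill the placeholder template for one VCS tool of one project."""
    return CLONE_URL_TEMPLATE.format(
        vcs=vcs, project=project, mount_point=mount_point
    )


projects = unique_projects(SUB_SITEMAP)
```

In an incremental run, the per-project dates computed this way could be compared with the stored state to skip projects that have not changed; that comparison is left out of the sketch.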
- get_origins_from_page(page: List[SourceForgeListerEntry]) → Iterator[ListedOrigin]
Extract a list of model.ListedOrigin from a raw page of results.
- Parameters:
page – a single page of results
- Returns:
an iterator for the origins present on the given page of results
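A rough sketch of this conversion is shown below. `Entry` mirrors the fields of `SourceForgeListerEntry`, while `OriginSketch` is a hypothetical stand-in for `model.ListedOrigin`; its field names, the use of the enum value as visit type, and the conversion of the date to a UTC midnight datetime are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, datetime, time, timezone
from enum import Enum
from typing import Iterator, List


class VcsNames(Enum):
    """Reduced local copy of the documented enum."""

    GIT = "git"
    SUBVERSION = "svn"


@dataclass
class Entry:
    """Stand-in with the same fields as SourceForgeListerEntry."""

    vcs: VcsNames
    url: str
    last_modified: date


@dataclass
class OriginSketch:
    """Hypothetical stand-in for model.ListedOrigin."""

    url: str
    visit_type: str
    last_update: datetime


def origins_from_page(page: List[Entry]) -> Iterator[OriginSketch]:
    """Yield one origin per entry of the page, lazily."""
    for entry in page:
        yield OriginSketch(
            url=entry.url,
            # Assumption: the visit type is the enum's string value.
            visit_type=entry.vcs.value,
            # Assumption: a date becomes midnight UTC of that day.
            last_update=datetime.combine(
                entry.last_modified, time.min, tzinfo=timezone.utc
            ),
        )


# Hypothetical page with a single entry.
page = [Entry(VcsNames.GIT, "https://git.example.org/p/foo/code", date(2021, 3, 5))]
origins = list(origins_from_page(page))
```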