swh.lister.rubygems.lister module
- class swh.lister.rubygems.lister.RubyGemsLister(scheduler: SchedulerInterface, url: str = 'https://s3-us-west-2.amazonaws.com/rubygems-dumps', instance: str = 'rubygems', credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True)
Bases: StatelessLister[Dict[str, Any]]
Lister for RubyGems.org, the Ruby community’s gem hosting service.
Instead of querying the rubygems.org Web API, it uses gem data from the daily PostgreSQL database dump of rubygems. This makes it possible to gather all relevant information about a gem and its release artifacts (version number, download URL, checksums, release date) efficiently, without flooding the rubygems Web API with numerous HTTP requests (more than 187000 gems were available as of 07/10/2022).
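To illustrate the dump-based approach, here is a minimal sketch of how the most recent dump could be picked from an S3 bucket listing. It is not the lister's actual code: the function name `latest_dump_url`, the sample listing, and the dump keys in it are hypothetical, and the sketch assumes that dump keys embed a zero-padded timestamp so that lexicographic order matches chronological order.

```python
import xml.etree.ElementTree as ET

# Namespace used by S3 ListBucketResult documents.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

BASE_URL = "https://s3-us-west-2.amazonaws.com/rubygems-dumps"

# Hypothetical listing, shaped like an S3 bucket listing response.
# The keys below are made up for illustration.
SAMPLE_LISTING = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Contents>
    <Key>production/public_postgresql/2022.10.06.21.00.58/public_postgresql.tar</Key>
  </Contents>
  <Contents>
    <Key>production/public_postgresql/2022.10.07.21.00.58/public_postgresql.tar</Key>
  </Contents>
</ListBucketResult>"""


def latest_dump_url(listing_xml: str, base_url: str = BASE_URL) -> str:
    """Return the URL of the dump whose key sorts last in the listing."""
    root = ET.fromstring(listing_xml)
    keys = [el.text for el in root.findall("s3:Contents/s3:Key", S3_NS)]
    if not keys:
        raise ValueError("no dump found in bucket listing")
    # Assumption: zero-padded timestamps in keys sort chronologically.
    return f"{base_url}/{max(keys)}"


print(latest_dump_url(SAMPLE_LISTING))
```

A single listing request followed by one archive download replaces what would otherwise be at least one Web API request per gem.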
- VISIT_TYPE = 'rubygems'
- INSTANCE = 'rubygems'
- RUBY_GEMS_POSTGRES_DUMP_BASE_URL = 'https://s3-us-west-2.amazonaws.com/rubygems-dumps'
- RUBY_GEMS_POSTGRES_DUMP_LIST_URL = 'https://s3-us-west-2.amazonaws.com/rubygems-dumps?prefix=production/public_postgresql'
- RUBY_GEM_DOWNLOAD_URL_PATTERN = 'https://rubygems.org/downloads/{gem}-{version}.gem'
- RUBY_GEM_ORIGIN_URL_PATTERN = 'https://rubygems.org/gems/{gem}'
- RUBY_GEM_EXTRINSIC_METADATA_URL_PATTERN = 'https://rubygems.org/api/v2/rubygems/{gem}/versions/{version}.json'
- DB_NAME = 'rubygems'
- DUMP_SQL_PATH = 'public_postgresql/databases/PostgreSQL.sql.gz'
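The URL pattern attributes above are ordinary Python format strings with `{gem}` and `{version}` placeholders. The following sketch shows how they expand for one gem release. The pattern values are copied from the attributes listed above into local constants so the snippet runs without `swh.lister` installed; the helper name `gem_urls` and the example gem and version are hypothetical, chosen only for illustration.

```python
from typing import Dict

# Values copied from the class attributes documented above.
RUBY_GEM_DOWNLOAD_URL_PATTERN = "https://rubygems.org/downloads/{gem}-{version}.gem"
RUBY_GEM_ORIGIN_URL_PATTERN = "https://rubygems.org/gems/{gem}"
RUBY_GEM_EXTRINSIC_METADATA_URL_PATTERN = (
    "https://rubygems.org/api/v2/rubygems/{gem}/versions/{version}.json"
)


def gem_urls(gem: str, version: str) -> Dict[str, str]:
    """Expand the three URL patterns for a given gem name and version."""
    return {
        "origin": RUBY_GEM_ORIGIN_URL_PATTERN.format(gem=gem),
        "download": RUBY_GEM_DOWNLOAD_URL_PATTERN.format(gem=gem, version=version),
        "extrinsic_metadata": RUBY_GEM_EXTRINSIC_METADATA_URL_PATTERN.format(
            gem=gem, version=version
        ),
    }


# Example gem name and version, used only to show the expansion.
for name, url in gem_urls("rails", "7.0.4").items():
    print(f"{name}: {url}")
```

The origin URL depends only on the gem name, while the download and extrinsic metadata URLs are specific to one release.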