swh.lister.rubygems package#
Submodules#
- swh.lister.rubygems.lister module
  - RubyGemsLister
    - RubyGemsLister.LISTER_NAME
    - RubyGemsLister.VISIT_TYPE
    - RubyGemsLister.INSTANCE
    - RubyGemsLister.RUBY_GEMS_POSTGRES_DUMP_BASE_URL
    - RubyGemsLister.RUBY_GEMS_POSTGRES_DUMP_LIST_URL
    - RubyGemsLister.RUBY_GEM_DOWNLOAD_URL_PATTERN
    - RubyGemsLister.RUBY_GEM_ORIGIN_URL_PATTERN
    - RubyGemsLister.RUBY_GEM_EXTRINSIC_METADATA_URL_PATTERN
    - RubyGemsLister.DB_NAME
    - RubyGemsLister.DUMP_SQL_PATH
    - RubyGemsLister.get_latest_dump_file()
    - RubyGemsLister.create_rubygems_db()
    - RubyGemsLister.populate_rubygems_db()
    - RubyGemsLister.get_pages()
    - RubyGemsLister.get_origins_from_page()
- swh.lister.rubygems.tasks module
Module contents#
RubyGems lister#
The RubyGems lister lists origins from RubyGems.org, the Ruby community’s gem hosting service.
As of July 2025, RubyGems.org lists 186,003 package names.
Origins retrieving strategy#
To list all available gems and retrieve the relevant data about each of them efficiently, the lister uses the daily PostgreSQL database dump published by RubyGems.
All gems are listed by executing the following query:
SELECT id, name FROM rubygems
The relevant listing data are then retrieved by executing the following query for each gem, which skips yanked versions:
SELECT built_at, full_name, number, sha256, size
FROM versions
WHERE rubygem_id = <gem_id> AND yanked_at IS NULL
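The two queries above can be sketched as a small Python loop. This is an illustration only, not the lister's actual code: it uses an in-memory sqlite3 database as a stand-in for the restored PostgreSQL dump, the table contents are made-up fixtures, and the helper name iter_gem_pages is hypothetical.

```python
import sqlite3

# Stand-in for the restored RubyGems dump. The lister queries PostgreSQL;
# sqlite3 is used here only so the sketch runs without a database server.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE rubygems (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE versions (
        rubygem_id INTEGER, built_at TEXT, full_name TEXT,
        number TEXT, sha256 TEXT, size INTEGER, yanked_at TEXT
    );
    -- Made-up fixture rows: one gem with one live and one yanked version.
    INSERT INTO rubygems VALUES (1, 'example-gem');
    INSERT INTO versions VALUES
        (1, '2024-01-01', 'example-gem-0.1.0', '0.1.0', 'aaa', 1024, NULL),
        (1, '2024-02-01', 'example-gem-0.2.0', '0.2.0', 'bbb', 2048, '2024-02-02');
    """
)


def iter_gem_pages(conn):
    """Yield one dict per gem, holding only its non-yanked versions."""
    gems = conn.execute("SELECT id, name FROM rubygems").fetchall()
    for gem_id, name in gems:
        rows = conn.execute(
            "SELECT built_at, full_name, number, sha256, size "
            "FROM versions WHERE rubygem_id = ? AND yanked_at IS NULL",
            (gem_id,),  # bound parameter instead of string interpolation
        ).fetchall()
        yield {"name": name, "versions": rows}


pages = list(iter_gem_pages(conn))
```

Passing the gem id as a bound parameter rather than formatting it into the SQL string keeps the per-gem query safe and lets the database reuse the statement.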
Page listing#
Each page returns listing information about one gem. The gem's origin URL is built from the following pattern:
https://rubygems.org/gems/{gem_name}
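As a minimal sketch, the pattern above can be filled in with Python's str.format. The constant and function names below are illustrative assumptions and are not taken from the lister's source.

```python
# Pattern copied from the documentation above; the names are hypothetical.
ORIGIN_URL_PATTERN = "https://rubygems.org/gems/{gem_name}"


def origin_url(gem_name: str) -> str:
    """Build the origin URL for a gem from its name."""
    return ORIGIN_URL_PATTERN.format(gem_name=gem_name)


url = origin_url("rails")
```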
Origins from page#
The lister yields one listed origin per page.
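The one-origin-per-page behaviour can be sketched as a generator. The SketchOrigin dataclass is a hypothetical stand-in for the scheduler's origin model, the page layout (a dict with a "name" key) is an assumption, and the visit type value "rubygems" is assumed rather than confirmed by this page.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator


@dataclass
class SketchOrigin:
    """Hypothetical stand-in for a listed origin record."""

    url: str
    visit_type: str


def origins_from_page(page: Dict[str, Any]) -> Iterator[SketchOrigin]:
    """Yield exactly one origin for the gem described by `page`."""
    yield SketchOrigin(
        url=f"https://rubygems.org/gems/{page['name']}",
        visit_type="rubygems",  # assumed value, see the note above
    )


origins = list(origins_from_page({"name": "rails", "versions": []}))
```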
Running tests#
Activate the virtualenv and run the following command from within the swh-lister directory:
pytest -s -vv --log-cli-level=DEBUG swh/lister/rubygems/tests
Testing with Docker#
Change directory to swh/docker, then launch the Docker environment:
docker compose up -d
Then schedule a RubyGems listing task:
docker compose exec swh-scheduler swh scheduler task add -p oneshot list-rubygems
You can follow the lister's execution by displaying the logs of the swh-lister service:
docker compose logs -f swh-lister