swh.lister.rubygems package#
Submodules#
- swh.lister.rubygems.lister module
RubyGemsLister
RubyGemsLister.LISTER_NAME
RubyGemsLister.VISIT_TYPE
RubyGemsLister.INSTANCE
RubyGemsLister.RUBY_GEMS_POSTGRES_DUMP_BASE_URL
RubyGemsLister.RUBY_GEMS_POSTGRES_DUMP_LIST_URL
RubyGemsLister.RUBY_GEM_DOWNLOAD_URL_PATTERN
RubyGemsLister.RUBY_GEM_ORIGIN_URL_PATTERN
RubyGemsLister.RUBY_GEM_EXTRINSIC_METADATA_URL_PATTERN
RubyGemsLister.DB_NAME
RubyGemsLister.DUMP_SQL_PATH
RubyGemsLister.get_latest_dump_file()
RubyGemsLister.create_rubygems_db()
RubyGemsLister.populate_rubygems_db()
RubyGemsLister.get_pages()
RubyGemsLister.get_origins_from_page()
- swh.lister.rubygems.tasks module
Module contents#
RubyGems lister#
The RubyGems lister lists origins from RubyGems.org, the Ruby community’s gem hosting service.
As of September 2022, RubyGems.org lists 173384 package names.
Origins retrieving strategy#
To get the list of all package names, the lister calls an HTTP endpoint that returns the gem names as plain text.
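As a rough illustration of this step, the sketch below turns a plain-text listing into package names. It assumes one gem name per line, possibly preceded by a `---` header line; that layout, the `parse_gem_names` helper, and the sample data are hypothetical and not taken from the lister's code.

```python
from typing import Iterator


def parse_gem_names(listing: str) -> Iterator[str]:
    """Yield package names from a plain-text listing (hypothetical helper).

    Assumed format: one gem name per line, with an optional "---" header.
    """
    for line in listing.splitlines():
        name = line.strip()
        # Skip blank lines and the assumed "---" header line.
        if not name or name == "---":
            continue
        yield name


# Made-up sample data for illustration only.
sample = "---\nrails\nrake\n\nnokogiri\n"
names = list(parse_gem_names(sample))
print(names)
```

Using a generator keeps memory use flat even when the listing contains a very large number of names.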
Page listing#
Each page corresponds to one package name and maps to an origin URL built from the following pattern:
https://rubygems.org/gems/{pkgname}
Origins from page#
The lister yields one origin URL per page.
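A minimal sketch of that mapping, assuming each page is simply a package name string. The `origin_url` and `origins_from_pages` helpers are hypothetical and return plain strings; they only illustrate the URL pattern and are not the lister's actual `get_origins_from_page()` implementation.

```python
from typing import Iterable, Iterator

# URL pattern quoted from the "Page listing" section.
ORIGIN_URL_PATTERN = "https://rubygems.org/gems/{pkgname}"


def origin_url(pkgname: str) -> str:
    """Build the origin URL for one package name (hypothetical helper)."""
    return ORIGIN_URL_PATTERN.format(pkgname=pkgname)


def origins_from_pages(pages: Iterable[str]) -> Iterator[str]:
    """Yield exactly one origin URL per page (assumed: page == package name)."""
    for pkgname in pages:
        yield origin_url(pkgname)


urls = list(origins_from_pages(["rails", "rake"]))
print(urls)
```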
Running tests#
Activate the virtualenv and run the following from within the swh-lister directory:
pytest -s -vv --log-cli-level=DEBUG swh/lister/rubygems/tests
Testing with Docker#
Change directory to swh/docker, then launch the Docker environment:
docker compose up -d
Then schedule a RubyGems listing task:
docker compose exec swh-scheduler swh scheduler task add -p oneshot list-rubygems
You can follow the lister's execution by displaying the logs of the swh-lister service:
docker compose logs -f swh-lister