swh.lister.cpan package#


Module contents#

Cpan lister#

The Cpan lister lists origins from cpan.org, the Comprehensive Perl Archive Network, which provides search features via metacpan.org.

As of September 2022, cpan.org lists 43675 package names.

Origins retrieving strategy#

To get the list of all package names and their associated release artifacts, we first call an HTTP API endpoint that returns an initial set of results along with a _scroll_id. That _scroll_id is then used to scroll through the remaining pages via the search endpoint.
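The scrolling strategy described above can be sketched as follows. This is a minimal illustration, not the lister's actual implementation: the `iter_pages` function, the `fetch` callable, and the assumed response layout (an Elasticsearch-style body with `_scroll_id` and `hits.hits`) are hypothetical names and assumptions introduced for this example. The HTTP call is injected as a callable so the loop can be exercised without network access.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Page = List[Dict[str, Any]]
# Hypothetical transport: fetch(None) performs the first request, and
# fetch(scroll_id) asks the search endpoint for the next page.
Fetch = Callable[[Optional[str]], Dict[str, Any]]


def iter_pages(fetch: Fetch) -> Iterator[Page]:
    """Yield pages of raw results until the API returns an empty page.

    Assumes each decoded response looks like
    {"_scroll_id": "...", "hits": {"hits": [...]}}.
    """
    response = fetch(None)  # first call opens the scroll
    while True:
        hits = response.get("hits", {}).get("hits", [])
        if not hits:
            break  # an empty page means the scroll is exhausted
        yield hits
        scroll_id = response.get("_scroll_id")
        if scroll_id is None:
            break  # no way to continue without a scroll id
        response = fetch(scroll_id)


if __name__ == "__main__":
    # Canned responses standing in for the real API (made-up data).
    canned = {
        None: {"_scroll_id": "s1", "hits": {"hits": [{"_id": "a"}, {"_id": "b"}]}},
        "s1": {"_scroll_id": "s2", "hits": {"hits": [{"_id": "c"}]}},
        "s2": {"_scroll_id": "s3", "hits": {"hits": []}},
    }
    for page in iter_pages(lambda scroll_id: canned[scroll_id]):
        print(len(page), "results")
```

In a real setting, `fetch` would wrap an HTTP client call to the initial endpoint when given `None` and to the scroll endpoint otherwise. Keeping the transport separate from the loop makes the stop condition easy to test.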

Page listing#

Each page yields a list of results, which are the raw data from the API response.

Origins from page#

The origin URL is the HTML page corresponding to a package name on metacpan.org, following this pattern:


Running tests#

Activate the virtualenv and run the following from within the swh-lister directory:

pytest -s -vv --log-cli-level=DEBUG swh/lister/cpan/tests

Testing with Docker#

Change directory to swh/docker, then launch the Docker environment:

docker compose up -d

Then schedule a Cpan listing task:

docker compose exec swh-scheduler swh scheduler task add -p oneshot list-cpan

You can follow the lister's execution by displaying the logs of the swh-lister service:

docker compose logs -f swh-lister