Source code for swh.lister.pubdev

# Copyright (C) 2022  The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information

""" lister

The Pubdev lister lists origins from ``_, the `Dart`_ and `Flutter`_ package registry.

The registry provides an `http api`_ from which the lister retrieves package names.

As of August 2022, ``_ lists 33535 package names.

Origins retrieving strategy
---------------------------

To get a list of all package names, we call the `` endpoint.
There is no other way to discover packages (no archive index, no database dump, no DVCS repository).

Origins from page
-----------------

The lister yields all origin URLs from a single page.
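
As a rough illustration of the step above, the following sketch turns one
listing page into origin URLs. It is not the lister's actual code: the
``packages`` key, the ``ORIGIN_URL_TEMPLATE`` constant and the
``origins_from_page`` helper are assumptions made for this example.

```python
from typing import Any, Dict, Iterator

# Assumed origin URL pattern; the text above does not spell it out.
ORIGIN_URL_TEMPLATE = "https://pub.dev/packages/{pkgname}"


def origins_from_page(page: Dict[str, Any]) -> Iterator[str]:
    # Hypothetical helper: yield one origin URL per package name in a page.
    # Assumed response shape: {"packages": ["name1", "name2", ...]}
    for pkgname in page.get("packages", []):
        yield ORIGIN_URL_TEMPLATE.format(pkgname=pkgname)


# Usage with a hand-written page instead of a real HTTP response.
sample_page = {"packages": ["http", "path"]}
origin_urls = list(origins_from_page(sample_page))
```

Working on an already-decoded page keeps the sketch independent of any HTTP
client, so it can be exercised without network access.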

Getting last update date for each package
-----------------------------------------

Before sending a listed pubdev origin to the scheduler, we query the
`{pkgname}` endpoint to get the last update date
of the package (the date of its latest release). This lets Software Heritage create
a new loading task for a package only if it has new releases since the last visit.
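
The idea behind that check can be sketched as follows. This is only an
illustration, not the scheduler's implementation: ``parse_last_update`` and
``needs_new_visit`` are hypothetical helpers, and the ISO 8601 timestamp
format with a trailing ``Z`` is an assumption about what the registry returns.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_last_update(published: str) -> datetime:
    # Hypothetical helper: parse a timestamp such as
    # "2022-08-01T12:00:00.000000Z" (assumed format).
    # Older Python versions reject a trailing "Z" in fromisoformat(),
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(published.replace("Z", "+00:00"))


def needs_new_visit(last_update: datetime, last_visit: Optional[datetime]) -> bool:
    # A package that was never visited always needs a loading task.
    if last_visit is None:
        return True
    # Otherwise, load again only if a release appeared after the last visit.
    return last_update > last_visit


# Usage with made-up dates.
last_update = parse_last_update("2022-08-01T12:00:00.000000Z")
visited_before_release = datetime(2022, 7, 1, tzinfo=timezone.utc)
visited_after_release = datetime(2022, 9, 1, tzinfo=timezone.utc)
```

Both datetimes are timezone-aware, so the comparison does not raise a
``TypeError`` from mixing naive and aware values.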

Running tests
-------------

Activate the virtualenv and run the following from within the swh-lister directory::

   pytest -s -vv --log-cli-level=DEBUG swh/lister/pubdev/tests

Testing with Docker
-------------------

Change directory to swh/docker, then launch the Docker environment::

   docker-compose up -d

Then schedule a pubdev listing task::

   docker compose exec swh-scheduler swh scheduler task add -p oneshot list-pubdev

You can follow the lister's execution by displaying the logs of the swh-lister service::

   docker compose logs -f swh-lister

.. _Dart:
.. _Flutter:
.. _http api:

"""


def register():
    # Import inside the function so the lister class is only loaded
    # when register() is actually called.
    from .lister import PubDevLister

    return {
        "lister": PubDevLister,
        "task_modules": ["%s.tasks" % __name__],
    }