Source code for swh.lister.hackage

# Copyright (C) 2022  The Software Heritage developers
# See the AUTHORS file at the top-level directory of this distribution
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information


"""
Hackage lister
==============

The Hackage lister lists origins from `hackage.haskell.org`_, the `Haskell`_ Package
Repository.

The registry provides an `http api`_ from which the lister retrieves package names
and builds origin URLs.

As of August 2022, `hackage.haskell.org`_ lists 15536 package names.

Origins retrieving strategy
---------------------------

To get a list of all package names, the lister makes a POST call to the
``https://hackage.haskell.org/packages/search`` endpoint with parameters given as
JSON data.

Default params::

    {
        "page": 0,
        "sortColumn": "default",
        "sortDirection": "ascending",
        "searchQuery": "(deprecated:any)",
    }

The page size is 50. The lister makes as many HTTP API calls as needed to get
all results.
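
The pagination loop described above can be sketched as follows. This is a minimal
illustration rather than the lister's actual code: ``iter_pages`` and ``fetch_page``
are hypothetical names, and the stop condition (a page shorter than the page size)
is an assumption, since this text does not say how the API signals the last page.

```python
from typing import Callable, Dict, Iterator, List

PAGE_SIZE = 50  # page size stated in the text

DEFAULT_PARAMS = {
    "page": 0,
    "sortColumn": "default",
    "sortDirection": "ascending",
    "searchQuery": "(deprecated:any)",
}


def iter_pages(fetch_page: Callable[[Dict], List[Dict]]) -> Iterator[List[Dict]]:
    # fetch_page is a hypothetical callable: it receives the JSON params
    # of one request and returns the list of entries for that page.
    params = dict(DEFAULT_PARAMS)
    while True:
        entries = fetch_page(dict(params))
        if entries:
            yield entries
        # Assumption: a page with fewer than PAGE_SIZE entries is the last one.
        if len(entries) < PAGE_SIZE:
            break
        params["page"] += 1
```

Passing the fetch function in as an argument keeps the loop independent of any
HTTP client, so it can be exercised with an in-memory list of records.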

In incremental mode, the search query is expanded with a ``lastUpload`` greater than
``state.last_listing_date`` filter, so the API returns all package names that are new
or updated since the last run.
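
A rough sketch of how the incremental query could be assembled is shown here.
``build_search_params`` is a hypothetical helper, and the exact filter syntax
appended to ``searchQuery`` is an assumption made for illustration; check it
against the `http api`_ before relying on it.

```python
from datetime import date
from typing import Dict, Optional


def build_search_params(
    last_listing_date: Optional[date] = None, page: int = 0
) -> Dict:
    # Full listing: only the default query is sent.
    query = "(deprecated:any)"
    if last_listing_date is not None:
        # Assumed filter syntax, for illustration only. The comparison is
        # inclusive here (also an assumption) so that uploads made on the
        # listing date itself are not skipped.
        query += "(lastUpload >= %s)" % last_listing_date.isoformat()
    return {
        "page": page,
        "sortColumn": "default",
        "sortDirection": "ascending",
        "searchQuery": query,
    }
```

With no date the function returns the default params unchanged, so the same
helper would serve both full and incremental runs.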

Page listing
------------

The result is paginated; each page is 50 records long.

Entry data set example::

    {
        "description": "3D model parsers",
        "downloads": 6,
        "lastUpload": "2014-11-08T03:55:23.879047Z",
        "maintainers": [{"display": "capsjac", "uri": "/user/capsjac"}],
        "name": {"display": "3dmodels", "uri": "/package/3dmodels"},
        "tags": [
            {"display": "graphics", "uri": "/packages/tag/graphics"},
            {"display": "lgpl", "uri": "/packages/tag/lgpl"},
            {"display": "library", "uri": "/packages/tag/library"},
        ],
        "votes": 1.5,
    }

Origins from page
-----------------

The lister yields up to 50 origin URLs per page.
Each ListedOrigin has a ``last_update`` date set.
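
As an illustration, one page of entries could be turned into origins like this.
The ``Origin`` dataclass is a hypothetical stand-in for ``ListedOrigin``, and
building the URL from the site root plus the entry's ``name.uri`` is an
assumption; this text does not spell out the URL pattern.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterator, List

BASE_URL = "https://hackage.haskell.org"


@dataclass
class Origin:
    # Hypothetical stand-in for ListedOrigin, reduced to two fields.
    url: str
    last_update: datetime


def origins_from_page(page: List[Dict]) -> Iterator[Origin]:
    for entry in page:
        # Assumption: origin URL = site root + the entry's package URI.
        url = BASE_URL + entry["name"]["uri"]
        # datetime.fromisoformat() only accepts a trailing "Z" from
        # Python 3.11 on, so it is rewritten as an explicit UTC offset.
        last_update = datetime.fromisoformat(
            entry["lastUpload"].replace("Z", "+00:00")
        )
        yield Origin(url=url, last_update=last_update)
```

Applied to the entry shown in the previous section, this would produce a
timezone-aware ``last_update`` in UTC.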

Running tests
-------------

Activate the virtualenv and run from within the swh-lister directory::

   pytest -s -vv --log-cli-level=DEBUG swh/lister/hackage/tests

Testing with Docker
-------------------

Change directory to swh/docker, then launch the Docker environment::

   docker compose up -d

Then schedule a Hackage listing task::

   docker compose exec swh-scheduler swh scheduler task add -p oneshot list-hackage

You can follow the lister execution by displaying the logs of the swh-lister service::

   docker compose logs -f swh-lister

.. _hackage.haskell.org: https://hackage.haskell.org/
.. _Haskell: https://haskell.org/
.. _http api: https://hackage.haskell.org/api
"""


def register():
    from .lister import HackageLister

    return {
        "lister": HackageLister,
        "task_modules": ["%s.tasks" % __name__],
    }