Source code for swh.lister.aur

# Copyright (C) 2022 the Software Heritage developers
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information


"""
AUR (Arch User Repository) lister
=================================

The AUR lister list origins from `aur.archlinux.org`_, the Arch User Repository.
For each package, there is a git repository, we use the git url as origin and the
snapshot url as the artifact for the loader to download.

Each git repository consist of a directory (for which name corresponds to the package name),
and at least two files, .SRCINFO and PKGBUILD which are recipes for building the package.

Each package has a version, the latest one. There isn't any archives of previous versions,
so the lister will always list one version per package.

As of August 2022 `aur.archlinux.org`_ list 84438 packages. Please note that this amount
is the total of `regular`_ and `split`_ packages.
We will archive `regular`  and `split` packages but only their `pkgbase` because that is
the only one that actually has source code.
The packages amount is 78554 after removing the split ones.

Origins retrieving strategy
---------------------------

An rpc api exists but it is recommended to save bandwidth so it's not used. See
`New AUR Metadata Archives`_ for more on this topic.

To get an index of all AUR existing packages we download a `packages-meta-v1.json.gz`_
which contains a json file listing all existing packages definitions.

Each entry describes the latest released version of a package. The origin url
for a package is built using `pkgbase` and corresponds to a git repository.

Note that we list only standard package (when pkgbase equal pkgname), not the ones
belonging to split packages.

It takes only a couple of minutes to download the 7 MB index archive and parses its
content.

Page listing
------------

Each page is related to one package. As its not possible to get all previous
versions, it will always returns one line.

Each page corresponds to a package with a `version`, an `url` for a Git
repository, a `project_url` which represents the upstream project url and
a canonical `snapshot_url` from which a tar.gz archive of the package can
be downloaded.

The data schema for each line is:

* **pkgname**: Package name
* **version**: Package version
* **url**: Git repository url for a package
* **snapshot_url**: Package download url
* **project_url**: Upstream project url if any
* **last_modified**: Iso8601 last update date

Origins from page
-----------------

The lister yields one origin per page.
The origin url corresponds to the git url of a package, for example ``https://aur.archlinux.org/{package}.git``.

Additionally we add some data set to "extra_loader_arguments":

* **artifacts**: Represent data about the Aur package snapshot to download,
  following :ref:`original-artifacts-json specification <extrinsic-metadata-original-artifacts-json>`
* **aur_metadata**: To store all other interesting attributes that do not belongs to artifacts.

Origin data example::

    {
        "visit_type": "aur",
        "url": "https://aur.archlinux.org/hg-evolve.git",
        "extra_loader_arguments": {
            "artifacts": [
                {
                    "filename": "hg-evolve.tar.gz",
                    "url": "https://aur.archlinux.org/cgit/aur.git/snapshot/hg-evolve.tar.gz",  # noqa: B950
                    "version": "10.5.1-1",
                }
            ],
            "aur_metadata": [
                {
                    "version": "10.5.1-1",
                    "project_url": "https://www.mercurial-scm.org/doc/evolution/",
                    "last_update": "2022-04-27T20:02:56+00:00",
                    "pkgname": "hg-evolve",
                }
            ],
        },

Running tests
-------------

Activate the virtualenv and run from within swh-lister directory::

   pytest -s -vv --log-cli-level=DEBUG swh/lister/aur/tests

Testing with Docker
-------------------

Change directory to swh/docker then launch the docker environment::

   docker compose up -d

Then schedule an aur listing task::

   docker compose exec swh-scheduler swh scheduler task add -p oneshot list-aur

You can follow lister execution by displaying logs of swh-lister service::

   docker compose logs -f swh-lister

.. _aur.archlinux.org: https://aur.archlinux.org
.. _New AUR Metadata Archives: https://lists.archlinux.org/pipermail/aur-general/2021-November/036659.html
.. _packages-meta-v1.json.gz: https://aur.archlinux.org/packages-meta-v1.json.gz
.. _regular: https://wiki.archlinux.org/title/PKGBUILD#Package_name
.. _split: https://man.archlinux.org/man/PKGBUILD.5#PACKAGE_SPLITTING
"""


[docs] def register(): from .lister import AurLister return { "lister": AurLister, "task_modules": ["%s.tasks" % __name__], }