# Copyright (C) 2022 the Software Heritage developers
# License: GNU General Public License version 3, or any later version
# See top-level LICENSE file for more information

AUR (Arch User Repository) lister
=================================

The AUR lister lists origins from ``_, the Arch User Repository.
Each package has its own Git repository: the lister uses the Git URL as the origin and the
snapshot URL as the artifact for the loader to download.

Each Git repository consists of a directory, named after the package, containing at least
two files, ``.SRCINFO`` and ``PKGBUILD``, which are the recipes for building the package.
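
As an illustration of that layout, the following hypothetical helper (not part of the lister) checks whether a package repository contains both recipe files. It is a minimal sketch that assumes the repository has already been cloned into a local directory.

```python
from pathlib import Path

# Recipe files that, per the description above, every AUR package repository contains.
REQUIRED_RECIPES = (".SRCINFO", "PKGBUILD")


def missing_recipes(repo_dir: Path) -> list[str]:
    """Return the names of required recipe files absent from ``repo_dir``."""
    return [name for name in REQUIRED_RECIPES if not (repo_dir / name).is_file()]


def has_build_recipes(repo_dir: Path) -> bool:
    """Return True when ``repo_dir`` holds both ``.SRCINFO`` and ``PKGBUILD``."""
    return not missing_recipes(repo_dir)
```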

Each package has a single version, the latest one. Archives of previous versions are not
available, so the lister always lists exactly one version per package.

As of August 2022, ``_ lists 84438 packages. Note that this count is the total of
`regular`_ and `split`_ packages.
Both regular and split packages are archived, but only through their `pkgbase`, because
it is the only one that actually holds the source code.
After removing the split packages, 78554 packages remain.

Origins retrieving strategy
---------------------------

An RPC API exists, but to save bandwidth it is recommended not to use it, so the lister
does not. See `New AUR Metadata Archives`_ for more on this topic.

To get an index of all existing AUR packages, the lister downloads `packages-meta-v1.json.gz`_,
a gzip-compressed JSON file listing the definitions of all existing packages.
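
A minimal sketch of the parsing step, assuming the archive has already been downloaded into memory and that its decompressed content is a JSON array of package objects. The name ``parse_index`` is hypothetical and not the lister's actual API.

```python
import gzip
import json
from typing import Any


def parse_index(archive: bytes) -> list[dict[str, Any]]:
    """Decompress a gzip archive and decode the JSON array of packages it holds."""
    # json.loads accepts bytes directly and detects the UTF encoding itself.
    entries = json.loads(gzip.decompress(archive))
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of package definitions")
    return entries
```

Decoding the whole array at once is reasonable here because the index is only a few megabytes.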

Each entry describes the latest released version of a package. The origin URL
for a package is built from `pkgbase` and corresponds to a Git repository.

Note that only standard packages (where `pkgbase` equals `pkgname`) are listed, not the ones
belonging to split packages.
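
The filter can be sketched as a generator over the parsed index. The ``Name`` and ``PackageBase`` keys are assumptions about the index format, and ``standard_packages`` is a hypothetical name.

```python
from collections.abc import Iterable, Iterator
from typing import Any


def standard_packages(entries: Iterable[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield only the entries whose package name equals their package base.

    Entries whose ``Name`` differs from their ``PackageBase`` belong to a
    split package and are skipped.
    """
    for entry in entries:
        if entry["Name"] == entry["PackageBase"]:
            yield entry
```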

It takes only a couple of minutes to download the 7 MB index archive and parse its content.

Page listing
------------

Each page relates to one package. As it is not possible to get previous
versions, a page always contains exactly one line.
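
A sketch of that paging scheme, where each page is a one-element list. The ``Page`` alias and ``get_pages`` function are illustrative names, not necessarily those used by the lister.

```python
from collections.abc import Iterable, Iterator
from typing import Any

# A page is a list of package lines; for the AUR it always holds exactly one.
Page = list[dict[str, Any]]


def get_pages(entries: Iterable[dict[str, Any]]) -> Iterator[Page]:
    """Yield one single-line page per package entry."""
    for entry in entries:
        yield [entry]
```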

Each page corresponds to a package with a `version`, a `url` for a Git
repository, a `project_url` which represents the upstream project URL, and
a canonical `snapshot_url` from which a tar.gz archive of the package can
be downloaded.

The data schema for each line is:

* **pkgname**: Package name
* **version**: Package version
* **url**: Git repository URL for the package
* **snapshot_url**: Package download URL
* **project_url**: Upstream project URL, if any
* **last_modified**: Last update date, in ISO 8601 format
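
The schema can be expressed as a ``TypedDict``, together with a hypothetical conversion from one index entry. This sketch assumes that index entries use the keys ``Name``, ``PackageBase``, ``Version``, ``URL`` and ``LastModified`` (a Unix timestamp), and it builds URLs from a caller-supplied ``base_url``; the ``snapshot/{pkgbase}.tar.gz`` path is a placeholder, not the real AUR snapshot location.

```python
from datetime import datetime, timezone
from typing import Any, Optional, TypedDict


class AurPackage(TypedDict):
    """One page line, following the schema listed above."""

    pkgname: str
    version: str
    url: str
    snapshot_url: str
    project_url: Optional[str]
    last_modified: str


def to_page_line(entry: dict[str, Any], base_url: str) -> AurPackage:
    """Build a page line from one index entry (key names are assumptions)."""
    pkgbase = entry["PackageBase"]
    return AurPackage(
        pkgname=entry["Name"],
        version=entry["Version"],
        # Origin URL: the package Git repository, ``{package}.git``.
        url=f"{base_url}/{pkgbase}.git",
        # Hypothetical snapshot path, for illustration only.
        snapshot_url=f"{base_url}/snapshot/{pkgbase}.tar.gz",
        project_url=entry.get("URL"),
        # Convert the Unix timestamp to an ISO 8601 string in UTC.
        last_modified=datetime.fromtimestamp(
            entry["LastModified"], tz=timezone.utc
        ).isoformat(),
    )
```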

Origins from page
-----------------

The lister yields one origin per page.
The origin URL corresponds to the Git URL of a package, for example ``{package}.git``.

Additionally, the lister sets the following data in ``extra_loader_arguments``:

* **artifacts**: Data about the AUR package snapshot to download,
  following :ref:`original-artifacts-json specification <extrinsic-metadata-original-artifacts-json>`
* **aur_metadata**: All other relevant attributes that do not belong to artifacts.
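
A minimal sketch of how these arguments could be assembled from one page line, mirroring the origin data example of this section. It returns a plain dictionary rather than whatever record type the scheduler actually expects, and the artifact filename rule ``{pkgname}.tar.gz`` is an assumption based on that example.

```python
from typing import Any


def to_origin(page_line: dict[str, Any]) -> dict[str, Any]:
    """Build an origin entry, with its loader arguments, from one page line."""
    # Assumed naming rule for the downloaded snapshot archive.
    filename = f"{page_line['pkgname']}.tar.gz"
    return {
        "visit_type": "aur",
        "url": page_line["url"],
        "extra_loader_arguments": {
            "artifacts": [
                {
                    "filename": filename,
                    "url": page_line["snapshot_url"],
                    "version": page_line["version"],
                }
            ],
            # Everything useful that is not part of the artifact itself.
            "aur_metadata": [
                {
                    "version": page_line["version"],
                    "project_url": page_line["project_url"],
                    "last_update": page_line["last_modified"],
                    "pkgname": page_line["pkgname"],
                }
            ],
        },
    }
```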

Origin data example::

    {
        "visit_type": "aur",
        "url": "",
        "extra_loader_arguments": {
            "artifacts": [
                {
                    "filename": "hg-evolve.tar.gz",
                    "url": "",
                    "version": "10.5.1-1",
                }
            ],
            "aur_metadata": [
                {
                    "version": "10.5.1-1",
                    "project_url": "",
                    "last_update": "2022-04-27T20:02:56+00:00",
                    "pkgname": "hg-evolve",
                }
            ],
        },
    }

Running tests
-------------

Activate the virtualenv and run from within the swh-lister directory::

   pytest -s -vv --log-cli-level=DEBUG swh/lister/aur/tests

Testing with Docker
-------------------

Change directory to swh/docker, then launch the Docker environment::

   docker compose up -d

Then schedule an AUR listing task::

   docker compose exec swh-scheduler swh scheduler task add -p oneshot list-aur

You can follow the lister execution by displaying the logs of the swh-lister service::

   docker compose logs -f swh-lister

def register():
    from .lister import AurLister

    return {
        "lister": AurLister,
        "task_modules": ["%s.tasks" % __name__],
    }