swh.lister.aur package#

Submodules#

Module contents#

AUR (Arch User Repository) lister#

The AUR lister lists origins from aur.archlinux.org, the Arch User Repository. Each package has its own git repository; we use the git url as the origin and the snapshot url as the artifact for the loader to download.

Each git repository consists of a directory (whose name corresponds to the package name) and at least two files, .SRCINFO and PKGBUILD, which are recipes for building the package.

Each package has a single version, the latest one. There are no archives of previous versions, so the lister always lists one version per package.

As of August 2022, aur.archlinux.org lists 84438 packages. Note that this count includes both regular and split packages. We archive regular and split packages, but only through their pkgbase, because that is the only one that actually has source code. After removing the split packages, the count is 78554.

Origins retrieving strategy#

An RPC API exists, but it is not used because saving bandwidth is recommended. See New AUR Metadata Archives for more on this topic.

To get an index of all existing AUR packages, we download packages-meta-v1.json.gz, a gzip-compressed JSON file listing all existing package definitions.

Each entry describes the latest released version of a package. The origin url for a package is built using pkgbase and corresponds to a git repository.

Note that we list only standard packages (those whose pkgbase equals pkgname), not the ones belonging to split packages.

It takes only a couple of minutes to download the 7 MB index archive and parse its content.
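The strategy above can be illustrated with a minimal sketch. It is not the lister's actual implementation: the archive URL, the function names, and the "Name" and "PackageBase" JSON keys are assumptions made for the example.

```python
import gzip
import json
from urllib.request import urlopen

# Assumed location of the metadata archive; the text above only names the file.
INDEX_URL = "https://aur.archlinux.org/packages-meta-v1.json.gz"


def fetch_index(url: str = INDEX_URL) -> bytes:
    """Download the compressed index (requires network access)."""
    with urlopen(url) as response:
        return response.read()


def parse_index(raw: bytes) -> list:
    """Decompress the gzip archive and decode the JSON array it contains."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def regular_packages(entries: list) -> list:
    """Keep entries whose package base equals the package name.

    "PackageBase" and "Name" are assumed key names for pkgbase and pkgname.
    """
    return [e for e in entries if e.get("PackageBase") == e.get("Name")]
```

Under these assumptions, `regular_packages(parse_index(fetch_index()))` would return the standard packages only. Keeping the download separate from the parsing makes the parsing step testable without network access.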

Page listing#

Each page relates to one package. As it is not possible to get previous versions, a page always returns one line.

Each page corresponds to a package with a version, a url for a Git repository, a project_url which represents the upstream project url, and a canonical snapshot_url from which a tar.gz archive of the package can be downloaded.

The data schema for each line is:

  • pkgname: Package name

  • version: Package version

  • url: Git repository url for a package

  • snapshot_url: Package download url

  • project_url: Upstream project url if any

  • last_modified: ISO 8601 last update date
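As a rough illustration of the schema above, the following sketch builds one such line from an index entry. The input keys ("Name", "PackageBase", "Version", "URL", "LastModified"), the assumption that "LastModified" is a Unix timestamp, and the `build_page` helper are all hypothetical; only the output field names and the URL patterns come from this document.

```python
from datetime import datetime, timezone

BASE_URL = "https://aur.archlinux.org"


def build_page(entry: dict) -> dict:
    """Map one index entry (assumed key names) to the page schema."""
    pkgname = entry["Name"]
    # Assumption: LastModified is a Unix timestamp in seconds.
    last_modified = datetime.fromtimestamp(
        entry["LastModified"], tz=timezone.utc
    ).isoformat()
    return {
        "pkgname": pkgname,
        "version": entry["Version"],
        # The git url is built from pkgbase, as described earlier.
        "url": f"{BASE_URL}/{entry['PackageBase']}.git",
        "snapshot_url": f"{BASE_URL}/cgit/aur.git/snapshot/{pkgname}.tar.gz",
        # Upstream project url, if any.
        "project_url": entry.get("URL"),
        "last_modified": last_modified,
    }
```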

Origins from page#

The lister yields one origin per page. The origin url corresponds to the git url of a package, for example https://aur.archlinux.org/{package}.git.

Additionally, some data is set in “extra_loader_arguments”:

  • artifacts: Represents data about the AUR package snapshot to download, following the original-artifacts-json specification

  • aur_metadata: Stores all other interesting attributes that do not belong to artifacts.

Origin data example:

{
    "visit_type": "aur",
    "url": "https://aur.archlinux.org/hg-evolve.git",
    "extra_loader_arguments": {
        "artifacts": [
            {
                "filename": "hg-evolve.tar.gz",
                "url": "https://aur.archlinux.org/cgit/aur.git/snapshot/hg-evolve.tar.gz",
                "version": "10.5.1-1",
            }
        ],
        "aur_metadata": [
            {
                "version": "10.5.1-1",
                "project_url": "https://www.mercurial-scm.org/doc/evolution/",
                "last_update": "2022-04-27T20:02:56+00:00",
                "pkgname": "hg-evolve",
            }
        ],
    },
}
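A record of this shape could be assembled from a page line as in the sketch below. The `build_origin` helper is hypothetical, and mapping the page's last_modified field to the last_update key is an assumption based on the example above; the real lister may build its origins differently.

```python
def build_origin(page: dict) -> dict:
    """Build an origin record from one page line (hypothetical helper)."""
    snapshot_url = page["snapshot_url"]
    return {
        "visit_type": "aur",
        "url": page["url"],
        "extra_loader_arguments": {
            "artifacts": [
                {
                    # The file name is the last path segment of the snapshot url.
                    "filename": snapshot_url.rsplit("/", 1)[-1],
                    "url": snapshot_url,
                    "version": page["version"],
                }
            ],
            "aur_metadata": [
                {
                    "version": page["version"],
                    "project_url": page["project_url"],
                    # Assumption: last_update carries the page's last_modified value.
                    "last_update": page["last_modified"],
                    "pkgname": page["pkgname"],
                }
            ],
        },
    }
```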

Running tests#

Activate the virtualenv and run the following from within the swh-lister directory:

pytest -s -vv --log-cli-level=DEBUG swh/lister/aur/tests

Testing with Docker#

Change directory to swh/docker then launch the docker environment:

docker compose up -d

Then schedule an AUR listing task:

docker compose exec swh-scheduler swh scheduler task add -p oneshot list-aur

You can follow the lister's execution by displaying the logs of the swh-lister service:

docker compose logs -f swh-lister
swh.lister.aur.register()