swh.lister.aur package
Submodules
- swh.lister.aur.lister module
AurLister
AurLister.LISTER_NAME
AurLister.VISIT_TYPE
AurLister.INSTANCE
AurLister.BASE_URL
AurLister.DEFAULT_PACKAGES_INDEX_URL
AurLister.PACKAGE_VCS_URL_PATTERN
AurLister.PACKAGE_SNAPSHOT_URL_PATTERN
AurLister.ORIGIN_URL_PATTERN
AurLister.download_packages_index()
AurLister.get_pages()
AurLister.get_origins_from_page()
- swh.lister.aur.tasks module
Module contents
AUR (Arch User Repository) lister
The AUR lister lists origins from aur.archlinux.org, the Arch User Repository. Each package has a Git repository; we use the Git URL as the origin and the snapshot URL as the artifact for the loader to download.
Each Git repository consists of a directory (whose name corresponds to the package name) and at least two files, .SRCINFO and PKGBUILD, which are recipes for building the package.
Each package has a single version, the latest one. There are no archives of previous versions, so the lister always lists one version per package.
As of August 2022, aur.archlinux.org lists 84438 packages. Note that this count covers both regular and split packages. We archive regular and split packages, but only their pkgbase, because that is the only one that actually has source code. After removing the split packages, 78554 packages remain.
Origins retrieving strategy
An RPC API exists, but it is not used because saving bandwidth is recommended. See New AUR Metadata Archives for more on this topic.
To get an index of all existing AUR packages, we download packages-meta-v1.json.gz, a gzip-compressed JSON file listing all existing package definitions.
Each entry describes the latest released version of a package. The origin URL for a package is built using pkgbase and corresponds to a Git repository.
Note that we list only standard packages (where pkgbase equals pkgname), not the ones belonging to split packages.
It takes only a couple of minutes to download the 7 MB index archive and parse its content.
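The parsing and filtering step described above can be sketched as follows. This is a minimal illustration, not the lister's actual code: the helper names are hypothetical, and the entry keys `Name` and `PackageBase` are assumed to match the fields of the AUR metadata archive. A small synthetic index stands in for the real download.

```python
import gzip
import json
from typing import Any, Dict, List


def parse_packages_index(raw: bytes) -> List[Dict[str, Any]]:
    """Decompress a gzipped JSON index and return its list of entries."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def keep_standard_packages(entries: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep only entries whose package base equals the package name."""
    # Assumption: the index exposes these two values as "PackageBase" and "Name".
    return [e for e in entries if e["PackageBase"] == e["Name"]]


# Synthetic stand-in for the content of packages-meta-v1.json.gz.
sample_entries = [
    {"Name": "hg-evolve", "PackageBase": "hg-evolve", "Version": "10.5.1-1"},
    {"Name": "foo-docs", "PackageBase": "foo", "Version": "1.0-1"},  # split package
]
raw_index = gzip.compress(json.dumps(sample_entries).encode("utf-8"))

standard_packages = keep_standard_packages(parse_packages_index(raw_index))
```

Here only the `hg-evolve` entry is kept, since `foo-docs` belongs to the split package base `foo`.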
Page listing
Each page is related to one package. As it is not possible to get previous versions, a page always returns one line.
Each page corresponds to a package with a version, a url for a Git repository, a project_url, which represents the upstream project URL, and a canonical snapshot_url from which a tar.gz archive of the package can be downloaded.
The data schema for each line is:
pkgname: Package name
version: Package version
url: Git repository url for a package
snapshot_url: Package download url
project_url: Upstream project url if any
last_modified: ISO 8601 last update date
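As an illustration of the schema above, the following sketch builds one page line from an index entry. The function name is hypothetical, the URL patterns are inferred from the example URLs in this document, and the input keys (`Name`, `Version`, `URL`, `LastModified` as a Unix timestamp) are assumptions about the index format rather than a confirmed description of it.

```python
from datetime import datetime, timezone
from typing import Any, Dict

# Patterns inferred from the example URLs in this document (assumption).
VCS_URL_PATTERN = "https://aur.archlinux.org/{pkgname}.git"
SNAPSHOT_URL_PATTERN = (
    "https://aur.archlinux.org/cgit/aur.git/snapshot/{pkgname}.tar.gz"
)


def entry_to_page(entry: Dict[str, Any]) -> Dict[str, Any]:
    """Map one index entry to a page line following the schema above."""
    pkgname = entry["Name"]
    # Assumption: "LastModified" is a Unix timestamp expressed in UTC.
    last_modified = datetime.fromtimestamp(
        entry["LastModified"], tz=timezone.utc
    ).isoformat()
    return {
        "pkgname": pkgname,
        "version": entry["Version"],
        "url": VCS_URL_PATTERN.format(pkgname=pkgname),
        "snapshot_url": SNAPSHOT_URL_PATTERN.format(pkgname=pkgname),
        "project_url": entry.get("URL"),  # may be absent for some packages
        "last_modified": last_modified,
    }


page = entry_to_page(
    {
        "Name": "hg-evolve",
        "Version": "10.5.1-1",
        "URL": "https://www.mercurial-scm.org/doc/evolution/",
        "LastModified": 1651089776,
    }
)
```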
Origins from page
The lister yields one origin per page.
The origin URL corresponds to the Git URL of a package, for example https://aur.archlinux.org/{package}.git.
Additionally, we add some data to “extra_loader_arguments”:
artifacts: Represents data about the AUR package snapshot to download, following the original-artifacts-json specification
aur_metadata: Stores all other interesting attributes that do not belong to artifacts.
Origin data example:
{
    "visit_type": "aur",
    "url": "https://aur.archlinux.org/hg-evolve.git",
    "extra_loader_arguments": {
        "artifacts": [
            {
                "filename": "hg-evolve.tar.gz",
                "url": "https://aur.archlinux.org/cgit/aur.git/snapshot/hg-evolve.tar.gz",  # noqa: B950
                "version": "10.5.1-1",
            }
        ],
        "aur_metadata": [
            {
                "version": "10.5.1-1",
                "project_url": "https://www.mercurial-scm.org/doc/evolution/",
                "last_update": "2022-04-27T20:02:56+00:00",
                "pkgname": "hg-evolve",
            }
        ],
    },
}
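To make the mapping from a page line to the origin data concrete, here is a minimal sketch. The function name is hypothetical and the result is a plain dict rather than the model object the lister actually yields; deriving the artifact filename from the last path segment of the snapshot URL is also an assumption.

```python
from typing import Any, Dict


def page_to_origin(page: Dict[str, Any]) -> Dict[str, Any]:
    """Build origin data, shaped like the example above, from one page line."""
    # Assumption: the artifact filename is the last path segment of snapshot_url.
    filename = page["snapshot_url"].rsplit("/", 1)[-1]
    return {
        "visit_type": "aur",
        "url": page["url"],
        "extra_loader_arguments": {
            "artifacts": [
                {
                    "filename": filename,
                    "url": page["snapshot_url"],
                    "version": page["version"],
                }
            ],
            "aur_metadata": [
                {
                    "version": page["version"],
                    "project_url": page["project_url"],
                    # The page key is last_modified; the example stores it
                    # under last_update.
                    "last_update": page["last_modified"],
                    "pkgname": page["pkgname"],
                }
            ],
        },
    }


sample_page = {
    "pkgname": "hg-evolve",
    "version": "10.5.1-1",
    "url": "https://aur.archlinux.org/hg-evolve.git",
    "snapshot_url": "https://aur.archlinux.org/cgit/aur.git/snapshot/hg-evolve.tar.gz",
    "project_url": "https://www.mercurial-scm.org/doc/evolution/",
    "last_modified": "2022-04-27T20:02:56+00:00",
}
origin = page_to_origin(sample_page)
```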
Running tests
Activate the virtualenv and run the following from within the swh-lister directory:
pytest -s -vv --log-cli-level=DEBUG swh/lister/aur/tests
Testing with Docker
Change directory to swh/docker, then launch the Docker environment:
docker compose up -d
Then schedule an AUR listing task:
docker compose exec swh-scheduler swh scheduler task add -p oneshot list-aur
You can follow the lister execution by displaying the logs of the swh-lister service:
docker compose logs -f swh-lister