swh.lister.arch.lister module

class swh.lister.arch.lister.ArchLister(scheduler: SchedulerInterface, url: str = 'https://archlinux.org', instance: str = 'arch', credentials: Dict[str, Dict[str, List[Dict[str, str]]]] | None = None, max_origins_per_page: int | None = None, max_pages: int | None = None, enable_origins: bool = True, flavours: Dict[str, Any] = {'arm': {'archs': ['armv7h', 'aarch64'], 'base_api_url': '', 'base_archive_url': '', 'base_info_url': 'https://archlinuxarm.org', 'base_mirror_url': 'https://uk.mirror.archlinuxarm.org', 'repos': ['core', 'extra', 'community']}, 'official': {'archs': ['x86_64'], 'base_api_url': 'https://archlinux.org', 'base_archive_url': 'https://archive.archlinux.org', 'base_info_url': 'https://archlinux.org', 'base_mirror_url': '', 'repos': ['core', 'extra', 'community']}})

Bases: StatelessLister[List[Dict[str, Any]]]

List Arch Linux origins from the ‘core’, ‘extra’, and ‘community’ repositories.

For ‘official’ Arch Linux, it downloads core.tar.gz, extra.tar.gz and community.tar.gz from https://archive.archlinux.org/repos/last/, extracts them to a temporary directory, and then walks through each ‘desc’ file.

Each ‘desc’ file describes the latest released version of a package and is used to build an origin URL from which artifact metadata is scraped.
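A minimal sketch of the walking step, assuming the archive has already been extracted and that each package has its own subdirectory holding a ‘desc’ file (the usual layout of Arch repository databases). The helper name find_desc_files is hypothetical and not part of the lister's API.

```python
from pathlib import Path
from typing import List


def find_desc_files(extracted_dir: Path) -> List[Path]:
    """Return every 'desc' file under an extracted repo archive, sorted by path.

    Hypothetical helper: assumes one '<pkgname>-<pkgver>-<pkgrel>/desc'
    entry per package, as found in Arch repository databases.
    """
    return sorted(p for p in extracted_dir.rglob("desc") if p.is_file())
```

Sorting the paths keeps the traversal order deterministic, which makes repeated listings easier to compare.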

For ‘arm’ Arch Linux, it follows the same discovery process of parsing ‘desc’ files. The main difference is that existing versions of an arm package cannot be retrieved, because https://archlinuxarm.org has no ‘archive’ website or API.
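The default flavours value in the class signature reflects this difference: ‘arm’ has an empty base_archive_url. The sketch below copies that default and enumerates every (flavour, arch, repo) combination, flagging which ones could have their version history scraped. Treating a non-empty base_archive_url as the deciding criterion is an assumption of this sketch, and iter_flavour_repos is a hypothetical name.

```python
from typing import Any, Dict, Iterator, Tuple

# Copy of the default 'flavours' argument from the ArchLister signature.
FLAVOURS: Dict[str, Any] = {
    "arm": {
        "archs": ["armv7h", "aarch64"],
        "base_api_url": "",
        "base_archive_url": "",
        "base_info_url": "https://archlinuxarm.org",
        "base_mirror_url": "https://uk.mirror.archlinuxarm.org",
        "repos": ["core", "extra", "community"],
    },
    "official": {
        "archs": ["x86_64"],
        "base_api_url": "https://archlinux.org",
        "base_archive_url": "https://archive.archlinux.org",
        "base_info_url": "https://archlinux.org",
        "base_mirror_url": "",
        "repos": ["core", "extra", "community"],
    },
}


def iter_flavour_repos(
    flavours: Dict[str, Any],
) -> Iterator[Tuple[str, str, str, bool]]:
    """Yield (flavour, arch, repo, has_version_history) for every combination.

    Assumption: a flavour without a base_archive_url (like 'arm') has no
    archive from which older package versions could be scraped.
    """
    for name, flavour in flavours.items():
        has_history = bool(flavour["base_archive_url"])
        for arch in flavour["archs"]:
            for repo in flavour["repos"]:
                yield name, arch, repo, has_history
```

With the defaults this yields nine combinations: six for ‘arm’ (two archs, three repos) and three for ‘official’.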

LISTER_NAME: str = 'arch'
VISIT_TYPE = 'arch'
INSTANCE = 'arch'
BASE_URL = 'https://archlinux.org'
ARCH_PACKAGE_URL_PATTERN = '{base_url}/packages/{repo}/{arch}/{pkgname}'
ARCH_PACKAGE_VERSIONS_URL_PATTERN = '{base_url}/packages/{pkgname[0]}/{pkgname}'
ARCH_PACKAGE_DOWNLOAD_URL_PATTERN = '{base_url}/packages/{pkgname[0]}/{pkgname}/{filename}'
ARCH_API_URL_PATTERN = '{base_url}/packages/{repo}/{arch}/{pkgname}/json'
ARM_PACKAGE_URL_PATTERN = '{base_url}/packages/{arch}/{pkgname}'
ARM_PACKAGE_DOWNLOAD_URL_PATTERN = '{base_url}/{arch}/{repo}/{filename}'
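The patterns above are plain str.format templates. The illustration below fills three of them with the example values used elsewhere on this page; the base URLs are taken from the default flavours, and the arm example is purely illustrative. Note that {pkgname[0]} indexes into the string, so it expands to the package name's first letter.

```python
# URL patterns copied from the ArchLister class attributes.
ARCH_PACKAGE_VERSIONS_URL_PATTERN = "{base_url}/packages/{pkgname[0]}/{pkgname}"
ARCH_API_URL_PATTERN = "{base_url}/packages/{repo}/{arch}/{pkgname}/json"
ARM_PACKAGE_URL_PATTERN = "{base_url}/packages/{arch}/{pkgname}"

# '{pkgname[0]}' is an index lookup inside the format field: 'dialog'[0] == 'd'.
versions_url = ARCH_PACKAGE_VERSIONS_URL_PATTERN.format(
    base_url="https://archive.archlinux.org", pkgname="dialog"
)
api_url = ARCH_API_URL_PATTERN.format(
    base_url="https://archlinux.org", repo="core", arch="x86_64", pkgname="dialog"
)
arm_url = ARM_PACKAGE_URL_PATTERN.format(
    base_url="https://archlinuxarm.org", arch="aarch64", pkgname="dialog"
)

print(versions_url)  # https://archive.archlinux.org/packages/d/dialog
print(api_url)       # https://archlinux.org/packages/core/x86_64/dialog/json
print(arm_url)       # https://archlinuxarm.org/packages/aarch64/dialog
```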
scrap_package_versions(name: str, repo: str, base_url: str) → List[Dict[str, Any]]

Given a package ‘name’ and ‘repo’, make an HTTP request to the origin URL and parse the response to get artifact data for each version of the package. This method is suitable only for ‘official’ Arch Linux, not ‘arm’.

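Parameters:

  • name – Package name

  • repo – The repository the package belongs to (one of the flavour's repos, e.g. ‘core’)

  • base_url – Base URL substituted into ARCH_PACKAGE_VERSIONS_URL_PATTERN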


Returns:

A list of dicts, one per package version, for example:


    [{"url": "https://archive.archlinux.org/packages/d/dialog/dialog-1:1.3_20190211-1-x86_64.pkg.tar.xz",
      "arch": "x86_64",
      "repo": "core",
      "name": "dialog",
      "version": "1:1.3_20190211-1",
      "filename": "dialog-1:1.3_20190211-1-x86_64.pkg.tar.xz",
      "last_modified": "2019-02-13T08:36:00"}]
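The format of the page this method parses is not documented here. The sketch below turns one entry of a versions listing into a dict shaped like the example above. It assumes each entry exposes a file name such as dialog-1:1.3_20190211-1-x86_64.pkg.tar.xz and a timestamp such as 13-Feb-2019 08:36; that listing format, and the helper name version_entry, are hypothetical.

```python
from datetime import datetime
from typing import Any, Dict, Optional

# Copied from the ArchLister class attributes.
ARCH_PACKAGE_DOWNLOAD_URL_PATTERN = "{base_url}/packages/{pkgname[0]}/{pkgname}/{filename}"


def version_entry(
    filename: str, last_modified: str, repo: str, base_url: str
) -> Optional[Dict[str, Any]]:
    """Build a version dict from one listing entry; return None for non-packages.

    Assumes file names follow '<name>-<pkgver>-<pkgrel>-<arch>.pkg.tar.<ext>'
    and timestamps look like '13-Feb-2019 08:36' (hypothetical listing format).
    """
    if ".pkg.tar." not in filename or filename.endswith(".sig"):
        return None  # skip signature files and unrelated links
    stem = filename.split(".pkg.tar.")[0]
    # Split from the right: package names may themselves contain dashes.
    name, pkgver, pkgrel, arch = stem.rsplit("-", 3)
    return {
        "url": ARCH_PACKAGE_DOWNLOAD_URL_PATTERN.format(
            base_url=base_url, pkgname=name, filename=filename
        ),
        "arch": arch,
        "repo": repo,
        "name": name,
        "version": f"{pkgver}-{pkgrel}",
        "filename": filename,
        "last_modified": datetime.strptime(
            last_modified, "%d-%b-%Y %H:%M"
        ).isoformat(),
    }
```

Splitting from the right matters for names such as lib32-glibc, which contain dashes of their own.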

get_repo_archive(url: str, destination_path: Path) → Path

Given a URL and a destination path, retrieve and extract a .tar.gz archive that contains a ‘desc’ file for each package. Each .tar.gz archive corresponds to an Arch Linux repo (‘core’, ‘extra’, ‘community’).

Parameters:

  • url – URL of the .tar.gz archive to download

  • destination_path – The path on disk where the archive is extracted


Returns:

The directory Path the archive has been extracted to.
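A minimal sketch of the extraction half of this method, using the standard tarfile module. The HTTP download is omitted, and naming the output directory after the archive minus its .tar.gz suffix is an assumption; extract_repo_archive is a hypothetical name.

```python
import tarfile
from pathlib import Path


def extract_repo_archive(archive_path: Path) -> Path:
    """Extract a downloaded repo .tar.gz next to itself and return the directory.

    Hypothetical helper: the download step is omitted, and the output
    directory name (archive name without '.tar.gz') is an assumption.
    """
    extract_to = archive_path.with_name(archive_path.name.split(".tar.gz")[0])
    extract_to.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Python versions with extraction filters: reject unsafe members
            # such as absolute paths or links pointing outside extract_to.
            tar.extractall(path=extract_to, filter="data")
        else:
            tar.extractall(path=extract_to)
    return extract_to
```

The "data" filter is used when available because the archive comes from the network, so member paths should not be trusted blindly.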

parse_desc_file(path: Path, repo: str, base_url: str, dl_url_fmt: str) → Dict[str, Any]

Extract package information from a ‘desc’ file. There are subtle differences between parsing ‘official’ and ‘arm’ ‘desc’ files.

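Parameters:

  • path – A path to a ‘desc’ file on disk

  • repo – The repo the package belongs to

  • base_url – Base URL substituted into dl_url_fmt

  • dl_url_fmt – Format string used to build the package download URL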


Returns:

A dict of package metadata, for example:


{'api_url': 'https://archlinux.org/packages/core/x86_64/dialog/json',
 'arch': 'x86_64',
 'base': 'dialog',
 'builddate': '1650081535',
 'csize': '203028',
 'desc': 'A tool to display dialog boxes from shell scripts',
 'filename': 'dialog-1:1.3_20220414-1-x86_64.pkg.tar.zst',
 'isize': '483988',
 'license': 'LGPL2.1',
 'md5sum': '06407c0cb11c50d7bf83d600f2e8107c',
 'name': 'dialog',
 'packager': 'Evangelos Foutras <foutrelis@archlinux.org>',
 'pgpsig': 'pgpsig content xxx',
 'project_url': 'https://invisible-island.net/dialog/',
 'provides': 'libdialog.so=15-64',
 'repo': 'core',
 'sha256sum': 'ef8c8971f591de7db0f455970ef5d81d5aced1ddf139f963f16f6730b1851fa7',
 'url': 'https://archive.archlinux.org/packages/.all/dialog-1:1.3_20220414-1-x86_64.pkg.tar.zst',
 'version': '1:1.3_20220414-1'}
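A ‘desc’ file is a sequence of blank-line-separated blocks, each starting with a %KEY% header followed by one or more value lines. The sketch below parses that text into a dict with lowercase keys. Mapping %URL% to project_url mirrors the example above, while joining multi-line values with spaces is an assumption; parse_desc_text is a hypothetical name, and the download URL and API URL fields are left out.

```python
from typing import Dict


def parse_desc_text(text: str) -> Dict[str, str]:
    """Parse the '%KEY%' / value blocks of an Arch 'desc' file into a dict.

    Keys are lowercased; '%URL%' becomes 'project_url' (as in the example
    above) so that 'url' stays free for the download URL. Multi-line values
    are joined with spaces, which is an assumption of this sketch.
    """
    data: Dict[str, str] = {}
    normalized = text.replace("\r\n", "\n").strip()
    for block in normalized.split("\n\n"):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines or not (lines[0].startswith("%") and lines[0].endswith("%")):
            continue  # ignore anything that is not a '%KEY%' block
        key = lines[0].strip("%").lower()
        data["project_url" if key == "url" else key] = " ".join(lines[1:])
    return data
```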

get_pages() → Iterator[List[Dict[str, Any]]]

Yield an iterator of pages, sorted by name in ascending order.

Each page is a list of packages belonging to one flavour (‘official’, ‘arm’) and one repo (‘core’, ‘extra’, ‘community’).
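A minimal sketch of the paging scheme, assuming the packages have already been parsed and grouped under a (flavour, repo) key. The iter_pages helper and the groups mapping are hypothetical stand-ins for the lister's download, extract and parse steps.

```python
from typing import Any, Dict, Iterator, List, Tuple

Package = Dict[str, Any]


def iter_pages(
    groups: Dict[Tuple[str, str], List[Package]],
) -> Iterator[List[Package]]:
    """Yield one page per (flavour, repo) group, with packages sorted by name.

    Hypothetical helper: 'groups' stands in for the packages already parsed
    from each repo's 'desc' files.
    """
    for key in sorted(groups):
        yield sorted(groups[key], key=lambda pkg: pkg["name"])
```

Yielding one page per group keeps memory bounded to a single repo at a time, while sorting makes page contents stable across runs.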

get_origins_from_page(page: List[Dict[str, Any]]) → Iterator[ListedOrigin]

Iterate over the packages of an Arch page and yield ListedOrigin instances.
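A rough sketch of the page-to-origin conversion. Because ListedOrigin comes from the Software Heritage scheduler, this sketch uses a hypothetical OriginSketch dataclass that keeps only a few fields. Building the origin URL from ARCH_PACKAGE_URL_PATTERN with the flavour's base_info_url, and deriving last_update from the ‘builddate’ Unix timestamp shown in the parse_desc_file example, are both assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Iterator, List

# Copied from the ArchLister class attributes.
ARCH_PACKAGE_URL_PATTERN = "{base_url}/packages/{repo}/{arch}/{pkgname}"


@dataclass(frozen=True)
class OriginSketch:
    """Hypothetical stand-in for ListedOrigin, keeping only a few fields."""

    url: str
    visit_type: str
    last_update: datetime


def origins_from_page(
    page: List[Dict[str, Any]], base_info_url: str
) -> Iterator[OriginSketch]:
    """Yield one origin per package dict in the page (assumed URL scheme)."""
    for pkg in page:
        yield OriginSketch(
            url=ARCH_PACKAGE_URL_PATTERN.format(
                base_url=base_info_url,
                repo=pkg["repo"],
                arch=pkg["arch"],
                pkgname=pkg["name"],
            ),
            visit_type="arch",  # matches the VISIT_TYPE class attribute
            # 'builddate' is a Unix timestamp string in the 'desc' metadata.
            last_update=datetime.fromtimestamp(int(pkg["builddate"]), tz=timezone.utc),
        )
```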