Software Heritage - Development Documentation
Getting started
Run your own Software Heritage → deploy a local copy of the Software Heritage software stack in less than 5 minutes, or
Developer setup → get a working development setup that allows you to hack on the Software Heritage software stack
Architecture
Software Architecture → get a glimpse of the Software Heritage software architecture
Mirroring → learn what a Software Heritage mirror is and how to set one up
Data Model and Specifications
SoftWare Heritage persistent IDentifiers (SWHIDs) → specifications of the SoftWare Heritage persistent IDentifiers (SWHIDs)
Data model → documentation of the main Software Heritage archive data model
Software Heritage Journal — Specifications → documentation of the Kafka journal of the Software Heritage archive
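As a small illustration of how intrinsic identifiers work, the sketch below computes the core SWHID of a file's raw contents. It relies on the fact that content identifiers use the same hash as Git blobs. This is a standard-library-only sketch, not the swh.model API. The function name `content_swhid` is hypothetical, and other object types (directories, revisions, releases, snapshots) are hashed over different manifests that are not covered here.

```python
import hashlib


def content_swhid(data: bytes) -> str:
    """Return the core SWHID of a file's raw contents (a "cnt" object).

    Hypothetical helper for illustration. The object id is the Git blob
    hash: SHA-1 over the header "blob <length>" plus a NUL byte, followed
    by the raw bytes.
    """
    header = b"blob " + str(len(data)).encode("ascii") + b"\x00"
    object_id = hashlib.sha1(header + data).hexdigest()
    # Core SWHID syntax: swh:<scheme version>:<object type>:<object id>
    return f"swh:1:cnt:{object_id}"


if __name__ == "__main__":
    # Same hash that `git hash-object` reports for a file containing "hello world\n"
    print(content_swhid(b"hello world\n"))
    # swh:1:cnt:3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the identifier depends only on the bytes, anyone can recompute it locally and check it against the archive without trusting a central registry.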
Components
Here is a brief overview of the most relevant software components in the Software Heritage stack. Each component name is linked to the development documentation of the corresponding Python module.
- swh.core
low-level utilities and helpers used by almost all other modules in the stack
- swh.dataset
public datasets and periodic data dumps of the archive released by Software Heritage
- swh.deposit
push-based deposit of software artifacts to the archive
- swh.docs
developer documentation (used to generate the documentation you are reading)
- swh.fuse
virtual file system to browse the Software Heritage archive, based on FUSE
- swh.graph
fast, compressed, in-memory representation of the archive, with tooling to generate and query it
- swh.indexer
tools and workers used to crawl the content of the archive and extract derived information from any artifact stored in it
- swh.journal
persistent logger of changes to the archive, with publish-subscribe support
- swh.lister
collection of listers for all sorts of source code hosting and distribution places (forges, distributions, package managers, etc.)
- swh.loader-core
low-level loading utilities and helpers used by all other loaders
- swh.loader-git
loader for Git repositories
- swh.loader-mercurial
loader for Mercurial repositories
- swh.loader-svn
loader for Subversion repositories
- swh.model
implementation of the Data model to archive source code artifacts
- swh.objstorage
content-addressable object storage
- swh.objstorage.replayer
object storage replication tool
- swh.scanner
source code scanner to analyze code bases and compare them with source code artifacts archived by Software Heritage
- swh.scheduler
task manager for asynchronous/delayed tasks, used for both recurrent activities (e.g., listing a forge, loading new content from a Git repository) and one-off activities (e.g., loading a specific version of a source package)
- swh.search
search engine for the archive
- swh.storage
abstraction layer over the archive, providing access to all stored source code artifacts as well as their metadata
- swh.vault
implementation of the vault service, allowing users to retrieve parts of the archive as self-contained bundles (e.g., individual releases, entire repository snapshots, etc.)
- swh.web
web application(s) to browse the archive, for both interactive (HTML UI) and mechanized (REST API) use
- swh.web.client
Python client for swh.web
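To make "content-addressable object storage" concrete, here is a toy in-memory store in which each object's key is a hash of its bytes. Identical files are therefore stored only once, and corruption can be detected on read by re-hashing. This is a hypothetical sketch, not the swh.objstorage interface: the class name, its methods, and the use of plain SHA-1 as the key are assumptions made for illustration. A real object storage also needs persistent backends and replication.

```python
import hashlib
from typing import Dict


class InMemoryContentStore:
    """Toy content-addressable store: objects are keyed by a hash of their bytes.

    Hypothetical illustration only; not the swh.objstorage API.
    """

    def __init__(self) -> None:
        self._objects: Dict[bytes, bytes] = {}

    @staticmethod
    def object_id(content: bytes) -> bytes:
        # The key is derived from the content itself (plain SHA-1, an assumption here).
        return hashlib.sha1(content).digest()

    def add(self, content: bytes) -> bytes:
        obj_id = self.object_id(content)
        # Identical content always maps to the same key, so it is stored once.
        self._objects.setdefault(obj_id, content)
        return obj_id

    def get(self, obj_id: bytes) -> bytes:
        content = self._objects[obj_id]  # raises KeyError for unknown ids
        # Integrity check: re-hash on read and compare with the requested key.
        if self.object_id(content) != obj_id:
            raise ValueError("stored object does not match its id")
        return content

    def __contains__(self, obj_id: bytes) -> bool:
        return obj_id in self._objects

    def __len__(self) -> int:
        return len(self._objects)


if __name__ == "__main__":
    store = InMemoryContentStore()
    first = store.add(b"int main(void) { return 0; }\n")
    second = store.add(b"int main(void) { return 0; }\n")
    print(first == second, len(store))  # True 1
```

Deriving the key from the content is what lets an archive deduplicate the same file across many repositories and origins.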
Dependencies
The dependency relationships among the various modules are depicted below.
Dependencies among top-level Python modules.
Archive
Archive ChangeLog → notable changes to the archive over time