An instance of the Software Heritage data store.
A component dedicated to replicating an archive and ensuring that there are enough copies of each element to guarantee resiliency.
Archival Resource Key (ARK) is a Uniform Resource Locator (URL) that is a multi-purpose persistent identifier for information objects of any type.
software artifact
An artifact is one of many kinds of tangible by-products produced during the development of software.
A (specific version of a) file stored in the archive, identified by its cryptographic hashes (SHA1, “git-like” SHA1, SHA256) and its size. Also known as: blob. Note: it is incorrect to refer to Contents as “files”, because files are usually considered to be named, whereas Contents are nameless. It is only in the context of specific directories that contents acquire (local) names.
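As an illustration, the identifiers of a content can be computed with Python's standard `hashlib` module. This is a minimal sketch, not the archive's own code: the function name `content_identifiers` and the dictionary keys are hypothetical, and it assumes that the "git-like" SHA1 follows Git's blob convention of hashing a `blob <length>\0` header followed by the raw bytes.

```python
import hashlib


def content_identifiers(data: bytes) -> dict:
    """Return the hashes and size that together identify a content (sketch)."""
    # Assumption: the "git-like" SHA1 prefixes the data with Git's blob header.
    git_header = b"blob %d\0" % len(data)
    return {
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha1_git": hashlib.sha1(git_header + data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "length": len(data),
    }


# Only the bytes are hashed; no file name is involved.
ids = content_identifiers(b"hello\n")
empty_ids = content_identifiers(b"")
```

Because only the bytes enter the computation, two files with different names but identical bytes yield the same identifiers, which is why contents are nameless.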
A set of named pointers to contents (file entries), directories (directory entries) and revisions (revision entries). All entries are associated with the local name of the entry (i.e., a relative path without any path separator) and permission metadata (e.g., chmod value or equivalent).
A Digital Object Identifier or DOI is a persistent identifier or handle used to uniquely identify objects, standardized by the International Organization for Standardization (ISO).
extrinsic metadata
Non-executable information obtained outside source code artifacts, such as from a forge API. See also: intrinsic metadata.
The journal is the persistent logger of the Software Heritage architecture, in charge of logging changes to the archive, with publish-subscribe support.
A lister is a component of the Software Heritage architecture that is in charge of enumerating the software origins (e.g., VCS, packages, etc.) available at a source code distribution place.
A loader is a component of the Software Heritage architecture responsible for reading a source code origin (typically a git repository) and importing or updating its content in the archive (i.e., adding new file contents to the object storage and the repository structure to the storage database).
cryptographic hash
A fixed-size “summary” of a stream of bytes that is easy to compute, and hard to reverse. See the Wikipedia article on cryptographic hash functions. Also known as: checksum, digest.
A component of the Software Heritage architecture dedicated to producing metadata linked to the known blobs in the archive.
intrinsic metadata
Non-executable information extracted from code artifacts, such as license headers, debian/control, or package.json. See also: extrinsic metadata.
object store
object storage
Content-addressable object storage. It is the place where the actual object blobs are stored.
software origin
data source
A location from which a coherent set of sources has been obtained, like a git repository, a directory containing tarballs, etc.
An entity referenced by a revision as either the author or the committer of the corresponding change. A person is associated with a full name and/or an email address.
A revision that has been marked as noteworthy with a specific name (e.g., a version number), together with associated development metadata (e.g., author, timestamp, etc.).
A point-in-time snapshot of the content of a directory, together with associated development metadata (e.g., author, timestamp, log message, etc.).
The component of the Software Heritage architecture dedicated to managing and prioritizing its many tasks.
The state of all visible branches during a specific visit of an origin.
storage database
The main database of the Software Heritage platform, in which all the elements of the data model except the content are stored as a Merkle DAG.
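To make the Merkle DAG idea concrete: each node's identifier is a hash computed from its children's identifiers, so a change in any leaf propagates up to the root, while unchanged subtrees keep their identifiers and can be shared. The sketch below is an illustration only. The helper names `content_id` and `directory_id` and the byte layout fed to the hash are hypothetical simplifications, not the serialization actually used by the storage database.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Identifier of a leaf node: a hash of the raw bytes."""
    return hashlib.sha1(data).hexdigest()


def directory_id(entries: dict) -> str:
    """Identifier of an inner node, derived from its children's names and ids."""
    digest = hashlib.sha1()
    for name in sorted(entries):  # sort so the result does not depend on insertion order
        digest.update(name.encode("utf-8") + b"\0" + entries[name].encode("ascii") + b"\n")
    return digest.hexdigest()


readme = content_id(b"hello\n")
src_v1 = directory_id({"main.py": content_id(b"print(1)\n")})
src_v2 = directory_id({"main.py": content_id(b"print(2)\n")})

# Both roots reference the same README node; only the "src" subtree differs.
root_v1 = directory_id({"README": readme, "src": src_v1})
root_v2 = directory_id({"README": readme, "src": src_v2})
```

Here `root_v1` and `root_v2` differ because one leaf changed, yet both point to the same `readme` identifier, which is what allows identical objects to be stored only once.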
type of origin
Information about the kind of hosting, e.g., whether it is a forge, a collection of repositories, a homepage publishing tarballs, or a one-shot source code repository. For all kinds of repositories, please specify which VCS is in use (Git, SVN, CVS, etc.).
vault service
User-facing service that allows retrieving parts of the archive as self-contained bundles (e.g., individual releases, entire repository snapshots, etc.).
The passage of Software Heritage on a given origin, to retrieve all source code and metadata available there at the time. A visit object stores the state of all visible branches (if any) available at the origin at visit time; each of them points to a revision object in the archive. Future visits of the same origin will create new visit objects, without removing previous ones.
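The append-only behaviour of visits can be sketched as a small data structure. This is a hypothetical model for illustration: the class names `Visit` and `VisitLog`, their fields, and the placeholder revision identifiers are assumptions, not the actual Software Heritage schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Visit:
    origin: str
    date: datetime
    branches: dict  # branch name -> revision identifier (placeholder strings here)


@dataclass
class VisitLog:
    visits: list = field(default_factory=list)

    def record(self, origin: str, branches: dict) -> Visit:
        """Store the state of all visible branches seen during one visit."""
        visit = Visit(origin, datetime.now(timezone.utc), dict(branches))
        self.visits.append(visit)  # earlier visits are never removed
        return visit

    def history(self, origin: str) -> list:
        """Return every visit recorded for an origin, oldest first."""
        return [v for v in self.visits if v.origin == origin]


log = VisitLog()
url = "https://example.org/repo.git"  # hypothetical origin
first = log.record(url, {"refs/heads/main": "rev-1"})
second = log.record(url, {"refs/heads/main": "rev-2", "refs/heads/dev": "rev-3"})
```

After the second visit, `log.history(url)` still contains the first visit with its original branch state, so the evolution of an origin can be followed over time.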