swh-graph glossary#

Graphs#

backward graph

Transposed version of the “normal” graph that goes opposite to Software Heritage’s Merkle DAG: contents point to directories, directories point to their parent directories and to revisions, revisions point to their child revisions and to snapshots, and snapshots to origins.

forward graph

The “normal” graph that follows Software Heritage’s Merkle DAG: origins point to snapshots, snapshots to revisions and releases, revisions to their parent revisions and to directories, directories to contents.

full graph

A graph with tens of billions of nodes and about a trillion arcs, with all nodes in the Software Heritage archive.

history-and-hosting graph

An independently compressed subset of the full graph, that only contains origins, snapshots, releases, revisions; and the few directory and content, but does not have their properties (content length, or whether they are skipped).

Node ids in the history-and-hosting graph are contiguous

Not to be confused with the “history-and-hosting layer”, which is a subset of the full graph with mostly the same nodes.

symmetric graph

Union of a forward and a corresponding backward graph. Rarely used in practice, as it breaks the (very useful) property that Software Heritage graphs are DAGs.

Labels and properties#

label

Extra information on some arcs. In the forward graph, ori->snp arcs have labels that each contain a visit date and a visit date; snp->rev and snp->rel arcs have labels that each contain a branch name identifier; and dir->rev, dir->dir, and dir->cnt have labels that each contain a file name identifier and a permission.

property

Extra information on some nodes. Origins have a URL, releases and revisions have many (message, name, author identifier, …), and contents have a length.

Subgraphs#

../../_images/graph_layers.svg
filesystem layer

A subset of the full graph, that only contains directory and content nodes.

It is the complement of the “history-and-hosting layer”.

history-and-hosting layer

A subset of the full graph, that only contains origins, snapshots, releases, and revisions nodes.

It is the complement of the “filesystem layer”.

Not to be confused with the “history-and-hosting graph”, which is a subset of the full graph with mostly the same nodes.

Relations between nodes#

../../_images/node_relations.svg
ancestor

A node A is an ancestor of N_0 in a graph G if there is a chain of nodes N_1, …, N_n such that for all i, N_i is a predecessor of N_{i-1} in G. In other words, ancestors are the transitive closure of predecessors. In the forward graph, all non-orphan nodes have at least one origin as ancestor.

descendant

A node D is a descendant of N_0 in a graph G if there is a chain of nodes N_1, …, N_n such that for all i, N_i is a successor of N_{i-1} in G. In other words, descendents are the transitive closure of successors. In the forward graph, all non-empty nodes have at least one content as descendant.

predecessor

A node P is a predecessor of node N in a graph G if G contains the arc P -> N. In a forward graph, origins are successors of snapshots, revisions are predecessors of their parent revisions, directories are predecessor of the subdirectories and contents they contain.

successor

A node S is a successor of node N in a graph G if G contains the arc N -> S. In a forward graph, snapshots are successors of origins, revisions are successors of their child revisions, directories are successors of their parent directory, and contents are successors of directories.

“Special” nodes#

empty content

The content which contains zero bytes. Its SWHID is e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.

empty directory

The directory which does not have any successor in the forward graph (and does not have any predecessor in the backward graph). Its SWHID is swh:1:dir:4b825dc642cb6eb9a060e54bf8d69288fbee4904

merge

A revision with more than one parent revision (ie. more than one successor of type revision, in the forward graph). Also called an “octopus merge” when there are more than two.

root directory of a revision

The directory that is the successor of a revision in the forward graph. It is guaranteed to be unique.

root revision

A revision with no parent revision (ie. no successor of type revision, in the forward graph). rev1 in the images above.