swh-graph glossary#
Graphs#
- backward graph
Transposed version of the “normal” graph that goes opposite to Software Heritage’s Merkle DAG: contents point to directories, directories point to their parent directories and to revisions, revisions point to their child revisions and to snapshots, and snapshots to origins.
- forward graph
The “normal” graph that follows Software Heritage’s Merkle DAG: origins point to snapshots, snapshots to revisions and releases, revisions to their parent revisions and to directories, directories to contents.
- full graph
A graph with tens of billions of nodes and about a trillion arcs, with all nodes in the Software Heritage archive.
- history-and-hosting graph
An independently compressed subset of the full graph, that only contains origins, snapshots, releases, revisions; and the few directory and content, but does not have their properties (content length, or whether they are skipped).
Node ids in the history-and-hosting graph are contiguous
Not to be confused with the “history-and-hosting layer”, which is a subset of the full graph with mostly the same nodes.
- symmetric graph
Union of a forward and a corresponding backward graph. Rarely used in practice, as it breaks the (very useful) property that Software Heritage graphs are DAGs.
Labels and properties#
- label
Extra information on some arcs. In the forward graph,
ori->snparcs have labels that each contain a visit date and a visit date;snp->revandsnp->relarcs have labels that each contain a branch name identifier; anddir->rev,dir->dir, anddir->cnthave labels that each contain a file name identifier and a permission.- property
Extra information on some nodes. Origins have a URL, releases and revisions have many (message, name, author identifier, …), and contents have a length.
Subgraphs#
- filesystem layer
A subset of the full graph, that only contains directory and content nodes.
It is the complement of the “history-and-hosting layer”.
- history-and-hosting layer
A subset of the full graph, that only contains origins, snapshots, releases, and revisions nodes.
It is the complement of the “filesystem layer”.
Not to be confused with the “history-and-hosting graph”, which is a subset of the full graph with mostly the same nodes.
Relations between nodes#
- ancestor
A node
Ais an ancestor ofN_0in a graphGif there is a chain of nodesN_1, …,N_nsuch that for all i,N_iis a predecessor ofN_{i-1}inG. In other words, ancestors are the transitive closure of predecessors. In the forward graph, all non-orphan nodes have at least one origin as ancestor.- descendant
A node
Dis a descendant ofN_0in a graphGif there is a chain of nodesN_1, …,N_nsuch that for all i,N_iis a successor ofN_{i-1}inG. In other words, descendents are the transitive closure of successors. In the forward graph, all non-empty nodes have at least one content as descendant.- predecessor
A node
Pis a predecessor of nodeNin a graphGifGcontains the arcP -> N. In a forward graph, origins are successors of snapshots, revisions are predecessors of their parent revisions, directories are predecessor of the subdirectories and contents they contain.- successor
A node
Sis a successor of nodeNin a graphGifGcontains the arcN -> S. In a forward graph, snapshots are successors of origins, revisions are successors of their child revisions, directories are successors of their parent directory, and contents are successors of directories.
“Special” nodes#
- empty content
The content which contains zero bytes. Its SWHID is
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.- empty directory
The directory which does not have any successor in the forward graph (and does not have any predecessor in the backward graph). Its SWHID is
swh:1:dir:4b825dc642cb6eb9a060e54bf8d69288fbee4904- merge
A revision with more than one parent revision (ie. more than one successor of type revision, in the forward graph). Also called an “octopus merge” when there are more than two.
- root directory of a revision
The directory that is the successor of a revision in the forward graph. It is guaranteed to be unique.
- root revision
A revision with no parent revision (ie. no successor of type revision, in the forward graph).
rev1in the images above.