.. _swh-graph-libs-toposort: Topological order ================= The `swh_graph_topology crate `_ supports topologically sorting nodes, and writing both a topological order and the generation number of each node (aka. depth of the sub-DAG of nodes reachable from that node). Both can be either forward (ori->snp->rel->rev->dir->cnt) or backward (cnt->dir->rev->rel->snp->ori). |swh| provides `precomputed versions of this dataset `__. Definitions ----------- In a forward topological order, each node is present after any of its ancestors and before any of its descendants. For example, a snapshot node is after any origin that contains it, and before any revision it contains; and a revision is after all of its child revisions (which are its predecessors/ancestors in the forward graph) and before all of its parent revisions (which are its successors/descendants in the forward graph). Forward generation numbers, or forward depths, are defined as the maximum length of a path from a root (typically an origin) to a particular node. An origin's forward depth is always 0, a snapshot's is always 1, a release's is almost always 2 (except for releases pointed by other releases), and a revision's is 3 if it does not have any child revision (which are predecessors/ancestors in the forward graph). The largest generation number in a complete graph (or subgraph with no dangling edges) is always a content object (though it can technically be the empty directory too). Symmetrically, in a backward topological order, each node is present after any of its descendants and before any of its ancestors. And backward generation numbers, or backward depths, are the maximum length of a path from a particular node to a leaf (typically a content in the full graph, or a root revision in the history-and-hosting graph). A content's backward generation number is always 0, and so is the empty directory's. The largest generation number in a complete graph is almost always an origin, though it can also be an orphan node, ie. a node present in no origin. File formats ------------ Depths are a plain array of ``u32``, which can be used as a map from node ids (consecutive integers from 0 to ``graph.num_nodes()``) to each node's depth. Generations are written to a compact format using two `bitstreams `_: one for node ids and one for offsets. Integers in the bitstreams are `gamma `_-encoded. They are written using this algorithm: * Write ``0`` to the offsets bitstream, gamma-encoded (for backward compatibility) * for each generation: * Initialize ``previous_node`` and ``num_written_bits`` to 0 * For each ``node`` in the generation: * Write ``node - previous_node`` to the nodes bitstream, gamma-encoded * Increment ``num_written_bits`` by the length of the gamma-encoding of ``node - previous_node`` * Write ``num_written_bits`` to the offset bitstream, gamma-encoded * Write ``0``, gamma-encoded, to mark the end of the file Computing the cumulative-sum of the offset bitstream allows computing the start and end of an arbitrary generation in the nodes bitstream. Given the start and end of a generation in the nodes bitstream, the generation is the computed cumulative sum of each gamma-encoded difference in the generation. Note: We could actually write ``node - previous_node - 1`` instead of ``node - previous_node`` in the nodes bitstream to save bits (eg. if ``previous_node + 1 = node``, the gamma-encoding would take only 1 bit instead of 4), but don't for backward compatibility. Implementation -------------- This is a BFS-like traversal implemented in :file:`topology/src/bin/generations.rs`. Generations are written to a bitstream while traversing, and an array with the depths is populated, then written at the end. Writing and reading are implemented in :file:`topology/src/generations.rs` and available in `the public API `__