Graph Querying HTTP API#
The Graph Querying API is a high-level HTTP API intended to run common, relatively simple traversal queries on the compressed graph.
The client/server architecture allows it to only load the graph in memory once then serve multiple different requests. However, it is limited in expressivity; more complex or resource-intensive queries should rather use the Low-level Rust API to run them as standalone programs.
Terminology#
This API uses the following notions:
Node: a node in the Software Heritage graph, represented by a SWHID.
Node type: the 3-letter specifier from the node SWHID (
cnt
,dir
,rel
,rev
,snp
,ori
), or*
for all node types.Edge type: a pair
src:dst
wheresrc
anddst
are either node types, or*
to denote all node types.Edge restrictions: a textual specification of which edges can be followed during graph traversal. Either
*
to denote that all edges can be followed or a comma separated list of edge types to allow following only those edges.Note that when traversing the backward (i.e., transposed) graph, edge types are reversed too. So, for instance,
ori:snp
makes sense when traversing the forward graph, but useless (due to lack of matching edges in the graph) when traversing the backward graph; converselysnp:ori
is useful when traversing the backward graph, but not in the forward one. For the same reasondir:dir
allows following edges from parent directories to sub-directories when traversing the forward graph, but the same restriction allows following edges from sub-directories to parent directories.Node restrictions: a textual specification of which type of nodes can be returned after a request. Either
*
to denote that all types of nodes can be returned or a comma separated list of node types to allow returning only those node types.
Examples#
swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2
the SWHID of a node of type content containing the full text of the GPL3 license.swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35
the SWHID of a node of type revision corresponding to the commit in Linux that merged the ‘x86/urgent’ branch on 31 December 2017."dir:dir,dir:cnt"
node types allowing edges from directories to directories nodes, or directories to contents nodes."rev:rev,dir:*"
node types allowing edges from revisions to revisions nodes, or from directories nodes."*:rel"
node types allowing all edges to releases."cnt,snp"
accepted node types returned in the query results.
Endpoints#
Leaves#
- GET /graph/leaves/:src#
Performs a graph traversal and returns the leaves of the subgraph rooted at the specified source node.
- Parameters:
src (string) – source node specified as a SWHID
- Query Parameters:
edges (string) – edges types the traversal can follow; default to
"*"
direction (string) – direction in which graph edges will be followed; can be either
forward
orbackward
, default toforward
max_edges (integer) – how many edges can be traversed during the visit; default to 0 (not restricted)
return_types (string) – only return the nodes matching this type; default to
"*"
max_matching_nodes (integer) – how many results to return before stopping; default to 0 (not restricted)
- Status Codes:
200 OK – success
400 Bad Request – invalid query string provided
404 Not Found – starting node cannot be found
Example:
GET /graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323 HTTP/1.1 Content-Type: text/plain Transfer-Encoding: chunked
HTTP/1.1 200 OK swh:1:cnt:540faad6b1e02e2db4f349a4845192db521ff2bd swh:1:cnt:630585fc6d34e5e121139e2aee0a64e83dc9aae6 swh:1:cnt:f8634ced669f0a9155c8cab1b2621d57d778215e swh:1:cnt:ba6daa801ad3ea587904b1abe9161dceedb2e0bd ...
Neighbors#
- GET /graph/neighbors/:src#
Returns node direct neighbors (linked with exactly one edge) in the graph.
- Parameters:
src (string) – source node specified as a SWHID
- Query Parameters:
edges (string) – edges types allowed to be listed as neighbors; default to
"*"
direction (string) – direction in which graph edges will be followed; can be either
forward
orbackward
, default toforward
max_edges (integer) – how many edges can be traversed during the visit; default to 0 (not restricted)
return_types (string) – only return the nodes matching this type; default to
"*"
- Status Codes:
200 OK – success
400 Bad Request – invalid query string provided
404 Not Found – starting node cannot be found
Example:
GET /graph/neighbors/swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35 HTTP/1.1 Content-Type: text/plain Transfer-Encoding: chunked
HTTP/1.1 200 OK swh:1:rev:a31e58e129f73ab5b04016330b13ed51fde7a961 swh:1:dir:b5d2aa0746b70300ebbca82a8132af386cc5986d swh:1:rev:52c90f2d32bfa7d6eccd66a56c44ace1f78fbadd ...
Walk#
Visit#
- GET /graph/visit/nodes/:src#
- GET /graph/visit/edges/:src#
Performs a graph traversal and returns explored nodes, edges or paths (in the order of the traversal).
- Parameters:
src (string) – starting node specified as a SWHID
- Query Parameters:
edges (string) – edges types the traversal can follow; default to
"*"
max_edges (integer) – how many edges can be traversed during the visit; default to 0 (not restricted)
return_types (string) – only return the nodes matching this type; default to
"*"
max_matching_nodes (integer) – how many nodes to return/visit before stopping; default to 0 (not restricted)
- Status Codes:
200 OK – success
400 Bad Request – invalid query string provided
404 Not Found – starting node cannot be found
Example:
GET /graph/visit/nodes/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc HTTP/1.1 Content-Type: text/plain Transfer-Encoding: chunked
HTTP/1.1 200 OK swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:cfab784723a6c2d33468c9ed8a566fd5e2abd8c9 swh:1:rev:53e5df0e7a6b7bd4919074c081a173655c0da164 swh:1:rev:f85647f14b8243532283eff3e08f4ee96c35945f swh:1:rev:fe5f9ef854715fc59b9ec22f9878f11498cfcdbf swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb swh:1:cnt:c8cece50beae7a954f4ea27e3ae7bf941dc6d0c0 swh:1:dir:a358d0cf89821227d4c00b0ced5e0a8b3756b5db swh:1:cnt:cc407b7e24dd300d2e1a77d8f04af89b3f962a51 swh:1:cnt:701bd0a63e11b3390a547ce8515d28c6bab8a201 ...
Example:
GET /graph/visit/edges/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc HTTP/1.1 Content-Type: text/plain Transfer-Encoding: chunked
HTTP/1.1 200 OK swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:61f92a7db95f5a6d1fcb94d2b897ed3797584d7b swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:00e81c89c29ff3e58745fdaf7abb68daa1389e85 swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:7596fdc31c9aa00aed281ccb026a74cabf2383bb swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:ec7a2341ac3d9d8b571bbdfb90a089d4e54dea56 swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:1c5b5eac61eda2454034a43eb124ab490885ef3a swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:4dfa88ca55e04e8afe05e8543ddddee32dde7236 swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:d56ae79e43ff1b37534370911c8a78ec7f38d437 swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:19ba5d6203a040a39ecc4a77b165d3f097c1e662 swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:9c56102eefea23c95405533e1de23da4b873ecc4 swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc swh:1:rev:3f54e816b46c2e179cd164e17fea93b3013a9db4 ...
Counting results#
The following method variants, with trailing /count added, behave like their already discussed counterparts but, instead of returning results, return the amount of results that would have been returned:
- GET /graph/leaves/count/:src#
Return the amount of
GET /graph/leaves/:src
results
- GET /graph/neighbors/count/:src#
Return the amount of
GET /graph/neighbors/:src
results
- GET /graph/visit/nodes/count/:src#
Return the amount of
GET /graph/visit/nodes/:src
results
Stats#
- GET /graph/stats#
Returns statistics on the compressed graph.
- Status Codes:
200 OK – success
Example
GET /graph/stats HTTP/1.1 Content-Type: application/json
HTTP/1.1 200 OK { "counts": { "nodes": 16222788, "edges": 9907464 }, "ratios": { "compression": 0.367, "bits_per_node": 5.846, "bits_per_edge": 9.573, "avg_locality": 270.369 }, "indegree": { "min": 0, "max": 12382, "avg": 0.6107127825377487 }, "outdegree": { "min": 0, "max": 1, "avg": 0.6107127825377487 } }
Use-case examples#
This section showcases how to leverage the endpoints of the HTTP API described above for some common use-cases.
Browsing#
The following use cases require traversing the forward graph.
ls: given a directory node, list (non recursively) all linked nodes of type directory and content
Endpoint:
/graph/neighbors/:DIR_ID?edges=dir:cnt,dir:dir
git log: given a revision node, recursively list all linked nodes of type revision
Endpoint:
/graph/visit/nodes/:REV_ID?edges=rev:rev
Vault#
The following use cases require traversing the forward graph.
tarball (same as ls -R above)
git bundle: given a node, recursively list all linked nodes of any kind
Endpoint:
/graph/visit/nodes/:NODE_ID?edges=*
Provenance#
The following use cases require traversing the backward (transposed) graph.
complete commit provenance: given a content or directory node, return all commits whose directory (recursively) contains it
Endpoint:
/graph/leaves/:NODE_ID?direction=backward&edges=dir:dir,cnt:dir,dir:rev
complete origin provenance: given a content, directory, or commit node, return all origins that have at least one snapshot that (recursively) contains it
Endpoint:
/graph/leaves/:NODE_ID?direction=backward&edges=*
Provenance statistics#
The following use cases require traversing the backward (transposed) graph.
content popularity across commits: count the number of commits (or commit popularity) that link to a directory that (recursively) includes a given content.
Endpoint:
/graph/count/leaves/:NODE_ID?direction=backward&edges=cnt:dir,dir:dir,dir:rev
commit popularity across origins: count the number of origins (or origin popularity) that have a snapshot that (recursively) includes a given commit.
Endpoint:
/graph/count/leaves/:NODE_ID?direction=backward&edges=*
The following use cases require traversing the forward graph.
revision size (as n. of contents) distribution: the number of contents that are (recursively) reachable from a given revision.
Endpoint:
/graph/count/leaves/:NODE_ID?edges=*
origin size (as n. of revisions) distribution: count the number of revisions that are (recursively) reachable from a given origin.
Endpoint:
/graph/count/leaves/:NODE_ID?edges=ori:snp,snp:rel,snp:rev,rel:rev,rev:rev