Graph REST API

Terminology

This API uses the following notions:

  • Node: a node in the Software Heritage graph, represented by a persistent identifier (abbreviated as SWH PID, or simply PID).

  • Node type: the 3-letter specifier from the node PID (cnt, dir, rel, rev, snp, ori), or * for all node types.

  • Edge type: a pair src:dst where src and dst are either node types, or * to denote all node types.

  • Edge restrictions: a textual specification of which edges can be followed during graph traversal. Either * to denote that all edges can be followed, or a comma-separated list of edge types to allow following only those edges.

    Note that when traversing the backward (i.e., transposed) graph, edge types are reversed too. For instance, ori:snp makes sense when traversing the forward graph, but is useless (no edges in the graph match it) when traversing the backward graph; conversely, snp:ori is useful when traversing the backward graph, but not the forward one. For the same reason, dir:dir allows following edges from parent directories to sub-directories when traversing the forward graph, while the same restriction allows following edges from sub-directories to parent directories when traversing the backward graph.
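
The matching rule described above can be sketched in a few lines of Python. This is only a client-side illustration, not the server's implementation: the function names (parse_edge_restrictions, traversed_edge, edge_allowed) are hypothetical, and the sketch assumes that a restriction is matched against the edge as it is traversed, so a forward edge ori -> snp is seen as snp -> ori in the backward graph.

```python
NODE_TYPES = {"cnt", "dir", "rel", "rev", "snp", "ori"}


def parse_edge_restrictions(spec):
    """Turn "*" or "src:dst,src:dst" into a list of (src, dst) patterns."""
    if spec == "*":
        return [("*", "*")]
    allowed = NODE_TYPES | {"*"}
    patterns = []
    for item in spec.split(","):
        src, sep, dst = item.strip().partition(":")
        if not sep or src not in allowed or dst not in allowed:
            raise ValueError(f"invalid edge type: {item!r}")
        patterns.append((src, dst))
    return patterns


def traversed_edge(forward_edge, direction):
    """Return the edge as seen by the traversal (flipped when going backward)."""
    src, dst = forward_edge
    return (dst, src) if direction == "backward" else (src, dst)


def edge_allowed(spec, forward_edge, direction="forward"):
    """Check whether an edge of the forward graph may be followed."""
    src, dst = traversed_edge(forward_edge, direction)
    return any(
        p_src in ("*", src) and p_dst in ("*", dst)
        for p_src, p_dst in parse_edge_restrictions(spec)
    )
```

For instance, edge_allowed("ori:snp", ("ori", "snp"), "backward") is false under this reading, while edge_allowed("snp:ori", ("ori", "snp"), "backward") is true.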

Examples

  • swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2: the PID of a node of type content, containing the full text of the GPL3 license.

  • swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35: the PID of a node of type revision, corresponding to the commit in Linux that merged the ‘x86/urgent’ branch on 31 December 2017.

  • "dir:dir,dir:cnt": edge restrictions allowing edges from directory nodes to directory nodes, or from directory nodes to content nodes.

  • "rev:rev,dir:*": edge restrictions allowing edges from revision nodes to revision nodes, or from directory nodes to nodes of any type.

  • "*:rel": edge restrictions allowing all edges that lead to release nodes.
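
As an illustration of the PID examples above, a PID can be split into its parts with a small parser. The layout assumed here (the literal prefix swh, a numeric version, a 3-letter node type, and a 40-digit lowercase hexadecimal identifier) is inferred from the examples in this section only; the names Pid and parse_pid are hypothetical.

```python
import re
from typing import NamedTuple

# Layout inferred from the examples above: swh:<version>:<type>:<40 hex digits>
PID_RE = re.compile(r"swh:(\d+):(cnt|dir|rel|rev|snp|ori):([0-9a-f]{40})")


class Pid(NamedTuple):
    version: int
    node_type: str
    object_id: str


def parse_pid(text):
    """Split a PID string into version, node type and object identifier."""
    match = PID_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid PID: {text!r}")
    return Pid(int(match.group(1)), match.group(2), match.group(3))
```

For example, parse_pid("swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2").node_type evaluates to "cnt".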

Leaves

GET /graph/leaves/:src

Performs a graph traversal and returns the leaves of the subgraph rooted at the specified source node.

Parameters
  • src (string) – source node specified as a SWH PID

Query Parameters
  • edges (string) – edge types the traversal can follow; defaults to "*"

  • direction (string) – direction in which graph edges will be followed; either forward or backward, defaults to forward

Status Codes

Example:

GET /graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323 HTTP/1.1

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

swh:1:cnt:540faad6b1e02e2db4f349a4845192db521ff2bd
swh:1:cnt:630585fc6d34e5e121139e2aee0a64e83dc9aae6
swh:1:cnt:f8634ced669f0a9155c8cab1b2621d57d778215e
swh:1:cnt:ba6daa801ad3ea587904b1abe9161dceedb2e0bd
...
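
A minimal client-side sketch of this call, using only the Python standard library, could look as follows. The base URL is a placeholder assumption (this document does not say where the service is hosted), and traversal_url, parse_pid_lines and fetch_leaves are hypothetical helper names. The sketch assumes the text/plain body contains one PID per line, as in the example above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5009"  # hypothetical host and port, adjust as needed


def traversal_url(endpoint, src, edges="*", direction="forward", base_url=BASE_URL):
    """Build the URL of a traversal endpoint such as "leaves" or "neighbors"."""
    if direction not in ("forward", "backward"):
        raise ValueError("direction must be 'forward' or 'backward'")
    # Keep "*", ":" and "," unescaped so the query stays readable.
    query = urlencode({"edges": edges, "direction": direction}, safe="*:,")
    return f"{base_url}/graph/{endpoint}/{src}?{query}"


def parse_pid_lines(body):
    """Split a text/plain response body into a list of PIDs, one per line."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def fetch_leaves(src, **params):
    """Call the leaves endpoint; requires a reachable server at BASE_URL."""
    with urlopen(traversal_url("leaves", src, **params)) as response:
        return parse_pid_lines(response.read().decode("utf-8"))
```

Only fetch_leaves touches the network; the other two helpers can be used and tested offline.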

Neighbors

GET /graph/neighbors/:src

Returns the direct neighbors of a node, i.e., the nodes linked to it by exactly one edge in the graph.

Parameters
  • src (string) – source node specified as a SWH PID

Query Parameters
  • edges (string) – edge types allowed when listing neighbors; defaults to "*"

  • direction (string) – direction in which graph edges will be followed; either forward or backward, defaults to forward

Status Codes

Example:

GET /graph/neighbors/swh:1:rev:f39d7d78b70e0f39facb1e4fab77ad3df5c52a35 HTTP/1.1

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

swh:1:rev:a31e58e129f73ab5b04016330b13ed51fde7a961
swh:1:dir:b5d2aa0746b70300ebbca82a8132af386cc5986d
swh:1:rev:52c90f2d32bfa7d6eccd66a56c44ace1f78fbadd
...
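
Since neighbors of different types come back in a single list, a client may want to group them by node type. The helper below is a hypothetical sketch of that post-processing step; it relies only on the PID layout shown in the examples (node type as the third colon-separated field).

```python
from collections import defaultdict


def group_by_node_type(pids):
    """Group a list of PIDs by their 3-letter node type."""
    groups = defaultdict(list)
    for pid in pids:
        node_type = pid.split(":")[2]  # swh:<version>:<type>:<id>
        groups[node_type].append(pid)
    return dict(groups)


neighbors = [
    "swh:1:rev:a31e58e129f73ab5b04016330b13ed51fde7a961",
    "swh:1:dir:b5d2aa0746b70300ebbca82a8132af386cc5986d",
    "swh:1:rev:52c90f2d32bfa7d6eccd66a56c44ace1f78fbadd",
]
grouped = group_by_node_type(neighbors)
```

With the three neighbors listed in the example above, grouped contains two revisions under "rev" and one directory under "dir".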

Walk

GET /graph/randomwalk/:src/:dst

Performs a random traversal of the graph from source to destination, picking one random successor node at each hop. The returned path includes the final destination node.

Parameters
  • src (string) – starting node specified as a SWH PID

  • dst (string) – destination node, either as a node PID or a node type. The traversal will stop at the first node encountered matching the desired destination.

Query Parameters
  • edges (string) – edge types the traversal can follow; defaults to "*"

  • direction (string) – direction in which graph edges will be followed; either forward or backward, defaults to forward

  • limit (int) – limit the number of nodes returned. Use a positive number to get the first N results, or a negative number to get the last N results starting from the tail; defaults to 0, meaning no limit.

Status Codes

Example:

GET /graph/randomwalk/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2/ori?direction=backward HTTP/1.1

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2
swh:1:dir:8de8a8823a0780524529c94464ee6ef60b98e2ed
swh:1:dir:7146ea6cbd5ffbfec58cc8df5e0552da45e69cb7
swh:1:rev:b12563e00026b48b817fd3532fc3df2db2a0f460
swh:1:rev:13e8ebe80fb878bade776131e738d5772aa0ad1b
swh:1:rev:cb39b849f167c70c1f86d4356f02d1285d49ee13
...
swh:1:rev:ff70949f336593d6c59b18e4989edf24d7f0f254
swh:1:snp:a511810642b7795e725033febdd82075064ed863
swh:1:ori:98aa0e71f5c789b12673717a97f6e9fa20aa1161

Limit example:

GET /graph/randomwalk/swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2/ori?direction=backward&limit=-2 HTTP/1.1

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

swh:1:ori:98aa0e71f5c789b12673717a97f6e9fa20aa1161
swh:1:snp:a511810642b7795e725033febdd82075064ed863
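
The effect of limit can be mimicked on a plain Python list. The sketch below is one reading of the two examples above, not the server's implementation: it assumes that a negative limit yields the last N nodes listed from the tail backwards, which is the order shown in the limit example. The function name apply_limit is hypothetical.

```python
def apply_limit(nodes, limit=0):
    """Mimic the limit parameter on an already computed walk."""
    if limit == 0:
        return list(nodes)  # no limit
    if limit > 0:
        return list(nodes[:limit])  # first N nodes
    # Last N nodes, listed starting from the tail (assumed from the example).
    return list(reversed(nodes[limit:]))


walk = [
    "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2",
    "swh:1:rev:ff70949f336593d6c59b18e4989edf24d7f0f254",
    "swh:1:snp:a511810642b7795e725033febdd82075064ed863",
    "swh:1:ori:98aa0e71f5c789b12673717a97f6e9fa20aa1161",
]
tail = apply_limit(walk, -2)
```

Here tail holds the origin first and the snapshot second, matching the response of the limit example.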

Visit

GET /graph/visit/nodes/:src
GET /graph/visit/paths/:src

Performs a graph traversal and returns explored nodes or paths (in the order of the traversal).

Parameters
  • src (string) – starting node specified as a SWH PID

Query Parameters
  • edges (string) – edge types the traversal can follow; defaults to "*"

  • direction (string) – direction in which graph edges will be followed; either forward or backward, defaults to forward

Status Codes

Example:

GET /graph/visit/nodes/swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc HTTP/1.1

HTTP/1.1 200 OK
Content-Type: text/plain
Transfer-Encoding: chunked

swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc
swh:1:rev:cfab784723a6c2d33468c9ed8a566fd5e2abd8c9
swh:1:rev:53e5df0e7a6b7bd4919074c081a173655c0da164
swh:1:rev:f85647f14b8243532283eff3e08f4ee96c35945f
swh:1:rev:fe5f9ef854715fc59b9ec22f9878f11498cfcdbf
swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb
swh:1:cnt:c8cece50beae7a954f4ea27e3ae7bf941dc6d0c0
swh:1:dir:a358d0cf89821227d4c00b0ced5e0a8b3756b5db
swh:1:cnt:cc407b7e24dd300d2e1a77d8f04af89b3f962a51
swh:1:cnt:701bd0a63e11b3390a547ce8515d28c6bab8a201
...

Example:

GET /graph/visit/paths/swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/x-ndjson
Transfer-Encoding: chunked

["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:cnt:acfb7cabd63b368a03a9df87670ece1488c8bce0"]
["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:cnt:2a0837708151d76edf28fdbb90dc3eabc676cff3"]
["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:cnt:eaf025ad54b94b2fdda26af75594cfae3491ec75"]
...
["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", "swh:1:dir:2ebd4b96fa5665ff74f2b27ae41aecdc43af4463", "swh:1:cnt:1d3b6575fb7bf2a147d228e78ffd77ea193c3639"]
...
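
The application/x-ndjson response shown above carries one JSON array per line, each array being a path from the starting node. A minimal sketch of parsing such a body with the Python standard library follows; parse_paths is a hypothetical helper name.

```python
import json


def parse_paths(body):
    """Parse an NDJSON body into a list of paths (each a list of PIDs)."""
    paths = []
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        paths.append(json.loads(line))
    return paths


body = (
    '["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", '
    '"swh:1:cnt:acfb7cabd63b368a03a9df87670ece1488c8bce0"]\n'
    '["swh:1:dir:644dd466d8ad527ea3a609bfd588a3244e6dafcb", '
    '"swh:1:dir:2ebd4b96fa5665ff74f2b27ae41aecdc43af4463", '
    '"swh:1:cnt:1d3b6575fb7bf2a147d228e78ffd77ea193c3639"]\n'
)
paths = parse_paths(body)
```

Each element of paths starts with the source node, and its last element is the node reached by that path.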

Counting results

The following method variants, with /count added to the path before :src, behave like their already discussed counterparts but, instead of returning the results, return the number of results that would have been returned:

GET /graph/leaves/count/:src

Returns the number of GET /graph/leaves/:src results.

GET /graph/neighbors/count/:src

Returns the number of GET /graph/neighbors/:src results.

GET /graph/visit/nodes/count/:src

Returns the number of GET /graph/visit/nodes/:src results.
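
As a small illustration of the naming pattern above, the path of a counting variant can be derived from the name of the method it counts. The helper count_path is hypothetical, and it only accepts the three methods listed in this section, since no other counting variants are documented here.

```python
# Methods for which a counting variant is listed in this section.
COUNTABLE = ("leaves", "neighbors", "visit/nodes")


def count_path(method, src):
    """Return the path of the counting variant of a traversal method."""
    if method not in COUNTABLE:
        raise ValueError(f"no counting variant documented for {method!r}")
    return f"/graph/{method}/count/{src}"


path = count_path("visit/nodes", "swh:1:snp:40f9f177b8ab0b7b3d70ee14bbc8b214e2b2dcfc")
```

The resulting path places count between the method name and the source PID, as in the three signatures above.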

Stats

GET /graph/stats

Returns statistics on the compressed graph.

Status Codes

Example:

GET /graph/stats HTTP/1.1

HTTP/1.1 200 OK
Content-Type: application/json

{
    "counts": {
        "nodes": 16222788,
        "edges": 9907464
    },
    "ratios": {
        "compression": 0.367,
        "bits_per_node": 5.846,
        "bits_per_edge": 9.573,
        "avg_locality": 270.369
    },
    "indegree": {
        "min": 0,
        "max": 12382,
        "avg": 0.6107127825377487
    },
    "outdegree": {
        "min": 0,
        "max": 1,
        "avg": 0.6107127825377487
    }
}
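
A client can load this JSON document and cross-check it. The sketch below, using a trimmed copy of the example response, verifies one property that holds for any directed graph: the average in-degree and the average out-degree both equal the number of edges divided by the number of nodes. The helper name average_degree is hypothetical.

```python
import json

# Trimmed copy of the example response above.
SAMPLE = """
{
    "counts": {"nodes": 16222788, "edges": 9907464},
    "indegree": {"min": 0, "max": 12382, "avg": 0.6107127825377487},
    "outdegree": {"min": 0, "max": 1, "avg": 0.6107127825377487}
}
"""


def average_degree(stats):
    """Average degree implied by the node and edge counts."""
    counts = stats["counts"]
    return counts["edges"] / counts["nodes"]


stats = json.loads(SAMPLE)
expected = average_degree(stats)
consistent = all(
    abs(stats[key]["avg"] - expected) < 1e-6 for key in ("indegree", "outdegree")
)
```

A false value of consistent would suggest that the counts and degree statistics come from different graphs or that the response was altered.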