This quick tutorial shows how to compress and browse a graph using swh.graph.

It does not cover the technical details behind the graph compression techniques (refer to Graph compression).


In order to run the swh.graph tool, you will need Python (>= 3.7) and a Java JRE. You do not need the JDK if you install the package from PyPI, but you may want it if you plan to hack on the code or install the package from this git repository. To compress a graph, you will also need the zstd compression tools.

It is highly recommended to install this package in a virtualenv.

On a Debian stable (buster) system:

$ sudo apt install python3-virtualenv default-jre zstd
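If you provision machines from a script, you can check these prerequisites before installing. The following is a minimal Python sketch, not part of swh.graph: the helper name `check_dependencies` is hypothetical, and it assumes the JRE and the zstd tools are detected through the `java` and `zstd` executables on `PATH`.

```python
import shutil
import sys

# Executables assumed to represent the prerequisites listed above:
# "java" for the JRE and "zstd" for the compression tools.
REQUIRED_TOOLS = ("java", "zstd")


def check_dependencies(tools=REQUIRED_TOOLS, min_python=(3, 7)):
    """Hypothetical helper: return a list of problems (empty means OK)."""
    problems = []
    # Compare (major, minor) of the running interpreter with the minimum.
    if sys.version_info[:2] < tuple(min_python):
        problems.append(
            f"Python >= {min_python[0]}.{min_python[1]} required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    return problems
```

Running `print(check_dependencies())` with the system `python3` should print an empty list once the `apt install` command above has succeeded.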


Create a virtualenv and activate it:

~/tmp$ mkdir swh-graph-tests
~/tmp$ cd swh-graph-tests
~/t/swh-graph-tests$ virtualenv swhenv
~/t/swh-graph-tests$ . swhenv/bin/activate

Install the swh.graph python package:

(swhenv) ~/t/swh-graph-tests$ pip install swh.graph
(swhenv) ~/t/swh-graph-tests$ swh graph --help
Usage: swh graph [OPTIONS] COMMAND [ARGS]...

  Software Heritage graph tools.

Options:
  -C, --config-file FILE  YAML configuration file
  -h, --help              Show this message and exit.

Commands:
  api-client  client for the graph RPC service
  cachemount  Cache the mmapped files of the compressed graph in a tmpfs.
  compress    Compress a graph using WebGraph Input: a pair of files...
  map         Manage swh-graph on-disk maps
  rpc-serve   run the graph RPC service


Existing datasets

You can directly use compressed graph datasets provided by Software Heritage. Here is a small but realistic dataset (3.1 GB):

(swhenv) ~/t/swh-graph-tests$ curl -O https://annex.softwareheritage.org/public/dataset/graph/latest/popular-3k-python/python3kcompress.tar
(swhenv) ~/t/swh-graph-tests$ tar xvf python3kcompress.tar
(swhenv) ~/t/swh-graph-tests$ touch python3kcompress/*.obl # fix the mtime of cached offset files to allow faster loading
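If you unpack datasets from a provisioning script rather than an interactive shell, the `touch` step can be done in Python as well. This is a minimal sketch using only the standard library; `touch_offset_files` is a hypothetical name, and it assumes the `*.obl` files sit directly in the dataset directory, as they do in `python3kcompress/` above.

```python
import os
from pathlib import Path


def touch_offset_files(graph_dir):
    """Hypothetical helper: set the mtime of every *.obl file to "now".

    Python equivalent of `touch <graph_dir>/*.obl`; returns the touched
    paths, sorted by name. Other files are left untouched.
    """
    touched = []
    for path in sorted(Path(graph_dir).glob("*.obl")):
        # os.utime(path, None) sets both atime and mtime to the current
        # time, like touch(1) on an existing file.
        os.utime(path, None)
        touched.append(path)
    return touched
```

For the dataset above, you would call `touch_offset_files("python3kcompress")` right after extracting the archive.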

Note: not for the faint of heart, but the full dataset is available at:

Own datasets

A graph is described by both its adjacency list and the set of its node identifiers, in plain text format. An example of such a graph can be found in the swh/graph/tests/dataset/ folder.
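To make the idea of "adjacency list plus node identifiers" concrete, here is a small Python sketch that turns such plain-text data into integer node IDs and successor lists. The input layout it expects (one identifier per line for nodes, one whitespace-separated `src dst` pair per line for edges) is an assumption for illustration, not necessarily the exact format of the files in swh/graph/tests/dataset/, and the sorted numbering it uses is not how swh.graph itself numbers nodes.

```python
from collections import defaultdict


def load_plain_graph(node_lines, edge_lines):
    """Hypothetical helper: build (node_to_id, adjacency) from text lines.

    ASSUMED format: one node identifier per line in node_lines, and one
    "src dst" pair per line in edge_lines. Blank lines are skipped.
    """
    # Dense numbering in sorted order (a simplification for this sketch).
    nodes = sorted({line.strip() for line in node_lines if line.strip()})
    node_to_id = {ident: i for i, ident in enumerate(nodes)}

    successors = defaultdict(list)
    for line in edge_lines:
        if not line.strip():
            continue
        # Raises ValueError if a line does not hold exactly two fields,
        # and KeyError if an edge refers to an undeclared node.
        src, dst = line.split()
        successors[node_to_id[src]].append(node_to_id[dst])

    # One entry per node, with sorted successor lists (possibly empty).
    adjacency = {i: sorted(successors.get(i, [])) for i in range(len(nodes))}
    return node_to_id, adjacency
```

For instance, with nodes `a`, `b`, `c` and edges `a b`, `a c`, `b c`, node `a` gets ID 0 and its successor list is `[1, 2]`.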

You can compress the example graph on the command line like this:

(swhenv) ~/t/swh-graph-tests$ swh graph compress --graph swh/graph/tests/dataset/example --outdir output/


(swhenv) ~/t/swh-graph-tests$ ls output/
 example-bv.properties  example.mph             example.obl      example.outdegree   example.swhid2node.bin    example-transposed.offsets
 example.graph          example.node2swhid.bin  example.offsets  example.properties  example-transposed.graph  example-transposed.properties
 example.indegree       example.node2type.map   example.order    example.stats       example-transposed.obl
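After a compression run, a quick way to notice an incomplete output directory is to compare it with the listing above. This is a minimal sketch; `missing_outputs` is a hypothetical helper, and its suffix list is copied from this listing, so it may not match the files produced by other swh.graph versions.

```python
from pathlib import Path

# File suffixes copied from the `ls output/` listing above; other
# swh.graph versions may produce a different set of files.
EXPECTED_SUFFIXES = (
    "-bv.properties", ".graph", ".indegree", ".mph", ".node2swhid.bin",
    ".node2type.map", ".obl", ".offsets", ".order", ".outdegree",
    ".properties", ".stats", ".swhid2node.bin",
    "-transposed.graph", "-transposed.obl", "-transposed.offsets",
    "-transposed.properties",
)


def missing_outputs(outdir, basename="example"):
    """Hypothetical helper: list expected files absent from outdir."""
    out = Path(outdir)
    return [
        basename + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (out / (basename + suffix)).is_file()
    ]
```

For the run above, `missing_outputs("output/")` should return an empty list.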

API server

To start a swh.graph API server on a compressed graph dataset, run:

(swhenv) ~/t/swh-graph-tests$ swh graph rpc-serve -g output/example
Loading graph output/example ...
Graph loaded.
======== Running on ========
(Press CTRL+C to quit)
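Loading a large compressed graph can take a while, so a script that starts `rpc-serve` in the background may want to wait until the server accepts connections before sending queries. This is a minimal sketch with the standard `socket` module; `wait_for_port` is a hypothetical helper, port 5009 is taken from the `http :5009/...` examples in this guide, and a successful TCP connection only shows that something is listening on that port.

```python
import socket
import time


def wait_for_port(host="localhost", port=5009, timeout=60.0, interval=0.5):
    """Hypothetical helper: poll until (host, port) accepts TCP connections.

    Returns True as soon as a connection succeeds, or False once
    `timeout` seconds have elapsed without one.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # The connection is closed immediately; we only test reachability.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Covers "connection refused" and connection timeouts.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A calling script would typically start the server with `subprocess.Popen([...])` and then check `wait_for_port()` before issuing any request.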

From there you can query the compressed graph through the server's HTTP endpoint, for example with httpie (sudo apt install httpie) from another terminal:

~/tmp$ http :5009/graph/visit/nodes/swh:1:rel:0000000000000000000000000000000000000010
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Tue, 15 Sep 2020 08:33:25 GMT
Server: Python/3.8 aiohttp/3.6.2
Transfer-Encoding: chunked
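The same query can be sent from Python with the standard library, which is handy when the results feed another script. This is a minimal sketch: the helper names are hypothetical, the base URL assumes the server runs on localhost with the port used above, and the parser assumes the `text/plain` body lists one SWHID per line, which the response headers above do not show.

```python
import urllib.request

# ASSUMED base URL: local server on the port used in the examples above.
BASE_URL = "http://localhost:5009/graph"


def endpoint_url(endpoint, swhid, base_url=BASE_URL):
    """Build e.g. <base>/visit/nodes/<swhid> or <base>/leaves/<swhid>."""
    return f"{base_url.rstrip('/')}/{endpoint.strip('/')}/{swhid}"


def parse_swhid_lines(body):
    """ASSUMPTION: the text/plain body holds one SWHID per line."""
    return [line.strip() for line in body.splitlines() if line.strip()]


def query(endpoint, swhid, base_url=BASE_URL, timeout=30):
    """Send a GET request and return the SWHIDs found in the response."""
    url = endpoint_url(endpoint, swhid, base_url)
    # urlopen transparently handles the chunked transfer encoding.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_swhid_lines(resp.read().decode("utf-8"))
```

With the server above running, `query("visit/nodes", "swh:1:rel:0000000000000000000000000000000000000010")` mirrors the httpie command.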


Serving the existing python3kcompress dataset works the same way:

(swhenv) ~/t/swh-graph-tests$ swh graph rpc-serve -g python3kcompress/python3k
Loading graph python3kcompress/python3k ...
Graph loaded.
======== Running on ========
(Press CTRL+C to quit)

~/tmp$ http :5009/graph/leaves/swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Tue, 15 Sep 2020 08:35:19 GMT
Server: Python/3.8 aiohttp/3.6.2
Transfer-Encoding: chunked
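Both endpoints shown here take a SWHID as the last path component, so catching a mistyped identifier before the request saves a round trip. The sketch below is a hypothetical helper that follows the core SWHID syntax (`swh:1:<type>:<40 hex digits>`) and deliberately accepts only the five core object types; extend the pattern if your graph contains other node types.

```python
import re

# Core SWHID syntax: "swh:1:<object type>:<40 lowercase hex digits>".
# Only the five core object types are accepted here.
SWHID_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


def parse_swhid(text):
    """Hypothetical helper: return (object_type, hex_digest) or raise ValueError."""
    # fullmatch rejects trailing garbage that a plain match would allow.
    match = SWHID_RE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid core SWHID: {text!r}")
    return match.group(1), match.group(2)
```

For the directory queried above, `parse_swhid("swh:1:dir:432d1b21c1256f7408a07c577b6974bbdbcc1323")` returns `("dir", "432d1b21c1256f7408a07c577b6974bbdbcc1323")`.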


See the documentation of the API for more details.