Memory & Performance tuning
This page discusses various considerations related to memory usage and
performance tuning when using the swh-graph library to load large compressed
graphs.
JVM options
In production, we tend to use very large servers with enough RAM to hold the entire graph. In these setups, the default JVM options are often suboptimal. We recommend starting the JVM with the following options, which tend to significantly improve performance:
java \
    -ea \
    -server \
    -XX:PretenureSizeThreshold=512M \
    -XX:MaxNewSize=4G \
    -XX:+UseLargePages \
    -XX:+UseTransparentHugePages \
    -XX:+UseNUMA \
    -XX:+UseTLAB \
    -XX:+ResizeTLAB
These options are documented in the java(1) manual page and in the Oracle
documentation.
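To confirm that a running JVM actually received these options, one can inspect its startup arguments. The following is a minimal sketch using the standard java.lang.management API. The class name JvmFlagCheck and the helper missingFlags are hypothetical names chosen for illustration and are not part of swh-graph. Only the -XX options listed above are checked, and the comparison is purely textual, so an equivalent spelling such as 512m instead of 512M would be reported as missing.

```java
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of swh-graph.
final class JvmFlagCheck {
    // The -XX options recommended on this page.
    static final List<String> RECOMMENDED = Arrays.asList(
        "-XX:PretenureSizeThreshold=512M",
        "-XX:MaxNewSize=4G",
        "-XX:+UseLargePages",
        "-XX:+UseTransparentHugePages",
        "-XX:+UseNUMA",
        "-XX:+UseTLAB",
        "-XX:+ResizeTLAB");

    // Returns the recommended options that do not appear verbatim in jvmArgs.
    static List<String> missingFlags(List<String> jvmArgs) {
        List<String> missing = new ArrayList<>();
        for (String flag : RECOMMENDED) {
            if (!jvmArgs.contains(flag)) {
                missing.add(flag);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Options passed to this JVM, excluding the arguments of the main class.
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Missing recommended options: " + missingFlags(jvmArgs));
    }
}
```

Running this class with the same command line as the real workload gives a quick sanity check that wrapper scripts did not drop any option.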
Temporary directory
Many of the graph algorithms (either for compression or traversal) tend to offload some of their run-time memory to disk. For instance, the BFS algorithm in the LAW library uses a temporary directory to write its queue of nodes to visit.
Because these can be quite large and sometimes overflow the default /tmp
partition, we advise always specifying a path to a local temporary directory
with enough space to accommodate the needs of the Java programs. This can be
done using the -Djava.io.tmpdir parameter on the Java CLI:
java -Djava.io.tmpdir=/srv/softwareheritage/ssd/tmp
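Before starting a long job, it can help to check how much space the configured temporary directory actually offers. The sketch below uses only the standard java.nio.file API. The class name TmpDirCheck, its methods, and the 100 GiB threshold are assumptions made for illustration; the space really needed depends on the graph and the algorithm.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of swh-graph.
final class TmpDirCheck {
    // Bytes available to this JVM on the file store that holds dir.
    static long usableBytes(Path dir) {
        try {
            return Files.getFileStore(dir).getUsableSpace();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // True if dir sits on a file store with at least requiredBytes usable.
    static boolean hasRoom(Path dir, long requiredBytes) {
        return usableBytes(dir) >= requiredBytes;
    }

    public static void main(String[] args) {
        // Reflects -Djava.io.tmpdir if it was given on the command line.
        Path tmp = Paths.get(System.getProperty("java.io.tmpdir"));
        long required = 100L * 1024 * 1024 * 1024; // assumed threshold: 100 GiB
        System.out.println(tmp + ": " + usableBytes(tmp) + " bytes usable, enough="
            + hasRoom(tmp, required));
    }
}
```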
Memory mapping vs Direct loading
The main dial you can use to manage your memory usage is the choice between memory-mapping and direct-loading the graph data. The different loading modes available when loading the graph are documented in Using the Java API.
Loading in mapped mode will not load any extra data in RAM, but will instead
use the mmap(2) syscall to map the graph file located on disk into the
virtual address space. The Linux kernel is then free to cache the file,
either partially or in its entirety, depending on the available memory.
In our experiments, memory-mapping a small graph from an SSD only incurs a relatively small slowdown (about 15-20%). However, when the graph is too big to fit in RAM, the kernel has to constantly evict cached pages to make room for newly accessed sections, which incurs a very large performance penalty. A full traversal of a large graph that usually takes about 20 hours when loaded in main memory could take more than a year when mapped from a hard drive!
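The difference between the two approaches can be illustrated with plain Java NIO, independently of swh-graph. The following is a generic sketch, not the swh-graph loading API: the class LoadModes and its methods are hypothetical. Direct loading copies the file contents into the Java heap, while mapping only reserves virtual address space and lets the kernel page data in on access.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Generic illustration; hypothetical names, not the swh-graph API.
final class LoadModes {
    // Direct loading: the whole file is read into a heap array up front.
    static byte[] loadDirect(Path file) {
        try {
            return Files.readAllBytes(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Memory mapping: the file is mapped read-only into the address space.
    // Pages are fetched lazily by the kernel when the buffer is accessed.
    // The mapping stays valid after the channel is closed.
    static MappedByteBuffer loadMapped(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path file = Paths.get(args[0]);
        byte[] direct = loadDirect(file);
        MappedByteBuffer mapped = loadMapped(file);
        System.out.println("direct: " + direct.length + " bytes on heap");
        System.out.println("mapped: " + mapped.remaining() + " bytes addressable");
    }
}
```

Both methods in this sketch are limited to files smaller than 2 GiB, because a Java array and a single mapped buffer are indexed with an int. Larger files would have to be split across several arrays or mappings.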
When deciding what to direct-load and what to memory-map, here are a few rules of thumb:
If you don’t need random access to the graph edges, you can consider using the “offline” loading mode. The offsets won’t be loaded, which saves dozens of gigabytes of RAM.
If you only need to query some specific nodes or run trivial traversals, memory-mapping the graph from a HDD should be a reasonable solution that doesn’t take an inordinate amount of time. It might be bad for your disks, though.
If you are constrained in available RAM, memory-mapping the graph from an SSD offers reasonable performance for reasonably complex algorithms.
If you have a heavy workload (e.g. running a full traversal of the entire graph) and you can afford the RAM, direct loading will be orders of magnitude faster than all the above options.