Command-line interface

swh graph

Software Heritage graph tools.

swh graph [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>

YAML configuration file

api-client

Client for the graph REST service.

swh graph api-client [OPTIONS]

Options

--host <host>

Graph server host

--port <port>

Graph server port

cachemount

Cache the mmapped files of the compressed graph in a tmpfs.

This command creates a new directory at the path given by CACHE that has the same structure as the compressed graph basename, except it copies the files that require mmap access (*.graph) but uses symlinks from the source for all the other files (.map, .bin, …).

The command outputs the path to the memory cache directory (particularly useful when relying on the default value).

swh graph cachemount [OPTIONS]

Options

-g, --graph <GRAPH>

Required compressed graph basename

-c, --cache <CACHE>

Memory cache path (defaults to /dev/shm/swh-graph/default)
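
The copy-versus-symlink layout described for cachemount can be sketched in a few lines of Python. This is only an illustration, not the tool's actual code: the helper name build_cache is hypothetical, and it assumes that the files belonging to a graph are exactly those named <basename>.<extension> in the same directory.

```python
import shutil
import tempfile
from pathlib import Path


def build_cache(graph_basename: str, cache_dir: str) -> Path:
    """Mirror the files of a compressed graph into cache_dir.

    Hypothetical helper: *.graph files are copied, all other files
    sharing the basename are symlinked back to the source.
    """
    source = Path(graph_basename)
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    # Assumption: graph files are named "<basename>.<something>".
    prefix = source.name + "."
    for src in sorted(source.parent.iterdir()):
        if not (src.is_file() and src.name.startswith(prefix)):
            continue
        dst = cache / src.name
        if dst.is_symlink() or dst.exists():
            dst.unlink()  # replace stale entries from a previous run
        if src.suffix == ".graph":
            shutil.copy2(src, dst)  # real copy, so mmap hits the cache
        else:
            dst.symlink_to(src.resolve())  # cheap pointer to the source
    return cache


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src_dir = Path(tmp) / "compressed"
        src_dir.mkdir()
        for name in ("g.graph", "g.map", "g.bin"):
            (src_dir / name).write_bytes(b"demo")
        cache = build_cache(str(src_dir / "g"), str(Path(tmp) / "cache"))
        for entry in sorted(cache.iterdir()):
            kind = "symlink" if entry.is_symlink() else "copy"
            print(entry.name, kind)
```

Pointing cache_dir at a tmpfs location such as the default under /dev/shm is what keeps the copied *.graph files in memory; the sketch itself works on any directory.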

compress

Compress a graph using WebGraph

Input: a pair of files g.nodes.csv.gz, g.edges.csv.gz

Output: a directory containing a WebGraph compressed graph

Compression steps are: (1) mph, (2) bv, (3) bv_obl, (4) bfs, (5) permute, (6) permute_obl, (7) stats, (8) transpose, (9) transpose_obl, (10) maps, (11) clean_tmp. Compression steps can be selected by name or number using --steps, separating them with commas; step ranges (e.g., 3-9, 6-, etc.) are also supported.

swh graph compress [OPTIONS]

Options

-g, --graph <GRAPH>

Required input graph basename

-o, --outdir <DIR>

Required directory where to store compressed graph

-s, --steps <STEPS>

run only these compression steps (default: all steps)
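
To make the --steps syntax concrete, here is a minimal sketch of how a selection string could be expanded into an ordered list of step names. The function parse_steps is hypothetical and is not taken from swh-graph; in particular, treating a range with no lower bound (such as -4) as starting at step 1 is an assumption.

```python
from typing import List

# Step names in the order listed for the compress command.
STEPS = [
    "mph", "bv", "bv_obl", "bfs", "permute", "permute_obl",
    "stats", "transpose", "transpose_obl", "maps", "clean_tmp",
]


def parse_steps(spec: str) -> List[str]:
    """Expand a comma-separated step selection into ordered step names."""
    if not spec.strip():
        return list(STEPS)  # empty selection: run all steps
    selected = set()
    for item in spec.split(","):
        item = item.strip().lower()
        if item in STEPS:
            selected.add(STEPS.index(item) + 1)  # selection by name
        elif "-" in item:
            low, high = item.split("-", 1)
            first = int(low) if low else 1  # assumption: "-4" means 1-4
            last = int(high) if high else len(STEPS)  # "6-" means 6-11
            selected.update(range(first, last + 1))
        else:
            selected.add(int(item))  # selection by number
    invalid = sorted(n for n in selected if not 1 <= n <= len(STEPS))
    if invalid:
        raise ValueError(f"unknown step number(s): {invalid}")
    return [STEPS[n - 1] for n in sorted(selected)]


if __name__ == "__main__":
    print(parse_steps("mph,3-4,10-"))
    # ['mph', 'bv_obl', 'bfs', 'maps', 'clean_tmp']
```

The sketch always returns steps in pipeline order regardless of the order they were typed in, which seems the safest reading since later steps consume the output of earlier ones.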

map

Manage swh-graph on-disk maps

swh graph map [OPTIONS] COMMAND [ARGS]...

dump

Dump a binary PID<->node map to textual format.

swh graph map dump [OPTIONS] FILENAME

Options

-t, --type <map_type>

Required type of map to dump

Options

pid2node|node2pid

Arguments

FILENAME

Required argument

lookup

Lookup identifiers using on-disk maps.

Depending on the identifier type, look up either a PID in a PID->node map (returning the node integer identifier) or, vice versa, a node integer identifier in a node->PID map (returning the PID). The behavior is chosen from the syntax of each given identifier.

Identifiers can be passed either directly on the command line or on standard input, separated by blanks. Logical lines (as returned by readline()) in stdin are preserved in stdout.

swh graph map lookup [OPTIONS] [IDENTIFIERS]...

Options

-g, --graph <GRAPH>

Required compressed graph basename

Arguments

IDENTIFIERS

Optional argument(s)
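
The syntax-based dispatch and the line-preserving behavior of lookup can be illustrated with a small sketch. Everything here is an assumption made for illustration: the names classify and lookup_line are hypothetical, plain dictionaries stand in for the on-disk maps, and PIDs are assumed to look like swh:1:<type>:<40 hex digits>.

```python
import io
import re
from typing import Dict

# Assumed PID syntax: swh:1:<object type>:<40 lowercase hex digits>.
PID_RE = re.compile(r"swh:1:(cnt|dir|rel|rev|snp):[0-9a-f]{40}")
NODE_RE = re.compile(r"[0-9]+")


def classify(identifier: str) -> str:
    """Return 'pid' or 'node' based only on the identifier's syntax."""
    if PID_RE.fullmatch(identifier):
        return "pid"
    if NODE_RE.fullmatch(identifier):
        return "node"
    raise ValueError(f"unrecognized identifier: {identifier!r}")


def lookup_line(line: str, pid2node: Dict[str, int],
                node2pid: Dict[int, str]) -> str:
    """Translate every identifier on one input line, keeping them together."""
    results = []
    for identifier in line.split():
        if classify(identifier) == "pid":
            results.append(str(pid2node[identifier]))  # PID -> node id
        else:
            results.append(node2pid[int(identifier)])  # node id -> PID
    return " ".join(results)


if __name__ == "__main__":
    pid = "swh:1:rev:" + "0" * 40
    pid2node = {pid: 42}  # stand-ins for the on-disk maps
    node2pid = {42: pid}
    stdin = io.StringIO(f"{pid} 42\n42\n")
    for line in stdin:
        # One output line per input line, as described for lookup.
        print(lookup_line(line, pid2node, node2pid))
```

Because each input line is translated as a unit, a caller can pair results with requests by line number without tracking individual identifiers.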

restore

Restore a binary PID<->node map from textual format.

swh graph map restore [OPTIONS] FILENAME

Options

-t, --type <map_type>

Required type of map to restore

Options

pid2node|node2pid

-l, --length <length>

map size in number of logical records (required for node2pid maps)

Arguments

FILENAME

Required argument

write

Write a map to disk sequentially.

Read from stdin a textual PID->node mapping (for pid2node) or a simple sequence of PIDs (for node2pid), and write it to disk in the requested binary map format.

Note that no sorting is applied, so the input must already be sorted as required by the chosen map type (by PID for pid2node, by int for node2pid).

swh graph map write [OPTIONS] FILENAME

Options

-t, --type <map_type>

Required type of map to write

Options

pid2node|node2pid

Arguments

FILENAME

Required argument
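
Since write applies no sorting, the input has to be ordered beforehand. The sketch below shows one way to prepare both kinds of input from a list of (PID, node) pairs. It rests on several assumptions: the helper names are hypothetical, the textual pid2node format is taken to be one whitespace-separated "PID node" pair per line, and "sorted by PID" is taken to mean plain string order.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, int]  # (PID, node integer identifier)


def pid2node_lines(pairs: Iterable[Pair]) -> List[str]:
    """Lines for a pid2node map: 'PID node', ordered by PID.

    Assumption: plain string order matches the order the map expects.
    """
    return [f"{pid} {node}" for pid, node in sorted(pairs)]


def node2pid_lines(pairs: Iterable[Pair]) -> List[str]:
    """Lines for a node2pid map: PIDs only, ordered by node identifier.

    The node identifier is not written out; it is implied by the
    position of each PID in the sequence.
    """
    ordered = sorted(pairs, key=lambda pair: pair[1])
    return [pid for pid, _node in ordered]


if __name__ == "__main__":
    pairs = [
        ("swh:1:rev:" + "b" * 40, 0),
        ("swh:1:cnt:" + "a" * 40, 1),
    ]
    print("\n".join(pid2node_lines(pairs)))
    print("\n".join(node2pid_lines(pairs)))
```

The resulting lines would then be piped into swh graph map write with the matching --type value.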

rpc-serve

Run the graph REST service.

swh graph rpc-serve [OPTIONS]

Options

-h, --host <IP>

host IP address to bind the server on

Default

0.0.0.0

-p, --port <PORT>

port to bind the server on

Default

5009

-g, --graph <GRAPH>

Required compressed graph basename