swh.graph.webgraph module
WebGraph driver
class swh.graph.webgraph.CompressionStep(value)
Bases: enum.Enum
Enumeration of the steps of the graph compression pipeline.
MPH = 1
BV = 2
BV_OBL = 3
BFS = 4
PERMUTE = 5
PERMUTE_OBL = 6
STATS = 7
TRANSPOSE = 8
TRANSPOSE_OBL = 9
MAPS = 10
CLEAN_TMP = 11
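As an illustration of how a subset of these steps might be selected, here is a minimal sketch. It uses a local stand-in enum that mirrors the members and values listed above, so it runs without swh.graph installed. The helpers `parse_steps` and `in_value_order` are hypothetical and not part of the module, and sorting by numeric value is only assumed to reflect the pipeline's run order; the documentation does not state this.

```python
from enum import Enum
from typing import Iterable, List, Set


class CompressionStep(Enum):
    """Local stand-in mirroring the documented members and values."""

    MPH = 1
    BV = 2
    BV_OBL = 3
    BFS = 4
    PERMUTE = 5
    PERMUTE_OBL = 6
    STATS = 7
    TRANSPOSE = 8
    TRANSPOSE_OBL = 9
    MAPS = 10
    CLEAN_TMP = 11


def parse_steps(names: Iterable[str]) -> Set[CompressionStep]:
    """Hypothetical helper: map case-insensitive names to enum members.

    Raises KeyError if a name does not match any member.
    """
    return {CompressionStep[name.strip().upper()] for name in names}


def in_value_order(steps: Set[CompressionStep]) -> List[CompressionStep]:
    """Sort by numeric value (assumed, not documented, to match run order)."""
    return sorted(steps, key=lambda step: step.value)


selected = parse_steps(["bv", "mph", "stats"])
print([step.name for step in in_value_order(selected)])
```

With the real module, the same lookups (`CompressionStep["BV"]`, `CompressionStep(2)`) should work, since they are standard `enum.Enum` behaviour.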
swh.graph.webgraph.compress(graph_name: str, in_dir: pathlib.Path, out_dir: pathlib.Path, steps: Set[swh.graph.webgraph.CompressionStep] = {<CompressionStep.MPH: 1>, <CompressionStep.BV: 2>, <CompressionStep.BV_OBL: 3>, <CompressionStep.BFS: 4>, <CompressionStep.PERMUTE: 5>, <CompressionStep.PERMUTE_OBL: 6>, <CompressionStep.STATS: 7>, <CompressionStep.TRANSPOSE: 8>, <CompressionStep.TRANSPOSE_OBL: 9>, <CompressionStep.MAPS: 10>, <CompressionStep.CLEAN_TMP: 11>}, conf: Dict[str, str] = {})
Graph compression pipeline driver, from nodes/edges files to a compressed on-disk representation.
Parameters
graph_name – graph base name, relative to in_dir
in_dir – input directory, where the uncompressed graph can be found
out_dir – output directory, where the compressed graph will be stored
steps – compression steps to run (default: all steps)
conf – compression configuration, supporting the following keys (all are optional, so an empty configuration is valid and is the default):
  batch_size: batch size for WebGraph transformations; defaults to 1 billion
  classpath: Java classpath; defaults to the swh-graph JAR only
  java: command used to run the Java VM; defaults to “java”
  java_tool_options: value for the JAVA_TOOL_OPTIONS environment variable; defaults to various settings for high-memory machines
  logback: path to a logback.xml configuration file; if not provided, a temporary one will be created and used
  max_ram: maximum RAM to use for compression; defaults to available virtual memory
  tmp_dir: temporary directory; defaults to the “tmp” subdirectory of out_dir
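The following sketch shows one way a call to compress might be assembled from the signature and configuration keys documented above. Since conf is typed as Dict[str, str], the hypothetical helper `make_conf` converts values to strings; its rejection of unknown keys is an addition of this example, not documented behaviour of compress. The graph name, directories, and the choice to run only the MPH and BV steps are illustrative assumptions, and `run_compression` is a hypothetical wrapper that requires swh.graph to be installed.

```python
from pathlib import Path
from typing import Dict

# Configuration keys listed in the documentation above.
KNOWN_CONF_KEYS = {
    "batch_size",
    "classpath",
    "java",
    "java_tool_options",
    "logback",
    "max_ram",
    "tmp_dir",
}


def make_conf(**overrides: object) -> Dict[str, str]:
    """Hypothetical helper: build a Dict[str, str] from keyword overrides.

    Rejecting unknown keys is a choice of this sketch; whether compress()
    itself validates keys is not stated in the documentation.
    """
    unknown = set(overrides) - KNOWN_CONF_KEYS
    if unknown:
        raise ValueError(f"unknown configuration keys: {sorted(unknown)}")
    return {key: str(value) for key, value in overrides.items()}


def run_compression(
    graph_name: str, in_dir: Path, out_dir: Path, conf: Dict[str, str]
) -> None:
    """Hypothetical wrapper around the documented compress() signature."""
    # Imported lazily so the sketch can be loaded without swh.graph installed.
    from swh.graph.webgraph import CompressionStep, compress

    compress(
        graph_name,
        in_dir,
        out_dir,
        # Illustrative subset; omit `steps` to run all steps (the default).
        steps={CompressionStep.MPH, CompressionStep.BV},
        conf=conf,
    )


# Smaller batch size than the documented default of 1 billion (illustrative).
conf = make_conf(batch_size=1_000_000, java="java", tmp_dir="/tmp/swh-graph")
print(conf)
```

A call such as `run_compression("example", Path("in"), Path("out"), conf)` would then invoke the pipeline; passing an empty dict keeps every documented default.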