Command-line interface#

swh indexer#

Software Heritage Indexer tools.

The Indexer is used to mine the content of the archive and extract derived information from archive source code artifacts.

swh indexer [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>#

Configuration file.

journal-client#

Listens for new objects from the SWH Journal and runs the indexer whose name is passed as the argument.

Passing ‘*’ as indexer name runs all indexers.

swh indexer journal-client [OPTIONS] {origin_intrinsic_metadata|extrinsic_metadata|content_mimetype|content_fossology_license|*}

Options

--broker <brokers>#

Kafka broker to connect to.

--prefix <prefix>#

Prefix of Kafka topic names to read from.

--group-id <group_id>#

Consumer/group id for reading from Kafka.

-m, --stop-after-objects <stop_after_objects>#

Maximum number of objects to replay. Default is to run forever.

-b, --batch-size <batch_size>#

Batch size. Default is 200.

Arguments

INDEXER#

Required argument
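As an illustration of how the options and the INDEXER argument above fit together, here is a minimal Python sketch that assembles a journal-client command line without running it. The helper name `journal_client_argv`, the broker address, the topic prefix, and the group id are hypothetical placeholders, not values taken from this documentation; only the flag names and the indexer names come from the reference above.

```python
import shlex

# Indexer names accepted by `swh indexer journal-client`, per the usage line above.
INDEXERS = {
    "origin_intrinsic_metadata",
    "extrinsic_metadata",
    "content_mimetype",
    "content_fossology_license",
    "*",
}


def journal_client_argv(indexer, broker, prefix=None, group_id=None,
                        stop_after_objects=None, batch_size=None):
    """Build the argv list for a journal-client run (hypothetical helper)."""
    if indexer not in INDEXERS:
        raise ValueError(f"unknown indexer: {indexer!r}")
    argv = ["swh", "indexer", "journal-client", "--broker", broker]
    if prefix is not None:
        argv += ["--prefix", prefix]
    if group_id is not None:
        argv += ["--group-id", group_id]
    if stop_after_objects is not None:
        argv += ["--stop-after-objects", str(stop_after_objects)]
    if batch_size is not None:
        argv += ["--batch-size", str(batch_size)]
    argv.append(indexer)
    return argv


# Placeholder values: replace them with the settings of your own deployment.
argv = journal_client_argv(
    "*",
    broker="kafka.example.org:9092",
    prefix="swh.journal.objects",
    group_id="example-indexer-group",
    stop_after_objects=1000,
)
# shlex.join quotes the '*' so that a shell does not expand it as a glob.
print(shlex.join(argv))
```

When typing the command by hand, the same caveat applies: quote `*` so the shell passes it through unchanged.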

mapping#

Manage Software Heritage Indexer mappings.

swh indexer mapping [OPTIONS] COMMAND [ARGS]...

list#

Prints the list of known mappings.

swh indexer mapping list [OPTIONS]

list-terms#

Prints the list of known CodeMeta terms, and which mappings support them.

swh indexer mapping list-terms [OPTIONS]

Options

--exclude-mapping <exclude_mapping>#

Exclude the given mapping from the output.

--concise#

Don’t print the list of mappings supporting each term.

translate#

Translates FILE from the format handled by the mapping MAPPING_NAME to the CodeMeta format.

swh indexer mapping translate [OPTIONS] MAPPING_NAME FILE

Arguments

MAPPING_NAME#

Required argument

FILE#

Required argument
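The two positional arguments are passed in the order shown in the usage line: mapping name first, then the file. A small Python sketch of building such an invocation follows. The helper `translate_argv`, the mapping name `NpmMapping`, and the file `package.json` are assumptions used for illustration; run `swh indexer mapping list` to see the mapping names your installation actually knows.

```python
import shlex


def translate_argv(mapping_name, file_path):
    """Build the argv list for `swh indexer mapping translate` (hypothetical helper)."""
    if not mapping_name or not file_path:
        raise ValueError("both MAPPING_NAME and FILE are required")
    return ["swh", "indexer", "mapping", "translate", mapping_name, file_path]


# "NpmMapping" and "package.json" are placeholder values, not documented defaults.
argv = translate_argv("NpmMapping", "package.json")
print(shlex.join(argv))
# prints: swh indexer mapping translate NpmMapping package.json
```

The resulting list can be handed to `subprocess.run(argv, check=True)` on a machine where the `swh` command is installed.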

rpc-serve#

Starts a Software Heritage Indexer RPC HTTP server.

swh indexer rpc-serve [OPTIONS] CONFIG_PATH

Options

--host <host>#

Host on which to run the server.

--port <port>#

Port to which the server binds.

--debug, --nodebug#

Indicates whether the server should run in debug mode.

Arguments

CONFIG_PATH#

Required argument
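To show how the `--debug`/`--nodebug` pair and the required CONFIG_PATH combine, the following Python sketch assembles an rpc-serve command line without launching anything. The helper `rpc_serve_argv`, the host, the port number, and the configuration path are hypothetical placeholders chosen for the example, not documented defaults.

```python
import shlex


def rpc_serve_argv(config_path, host=None, port=None, debug=None):
    """Build the argv list for `swh indexer rpc-serve` (hypothetical helper).

    debug=True adds --debug, debug=False adds --nodebug, and debug=None
    leaves the choice to the command's own default.
    """
    argv = ["swh", "indexer", "rpc-serve"]
    if host is not None:
        argv += ["--host", host]
    if port is not None:
        argv += ["--port", str(port)]
    if debug is not None:
        argv.append("--debug" if debug else "--nodebug")
    argv.append(config_path)
    return argv


# Placeholder values: adjust host, port, and path to your own setup.
argv = rpc_serve_argv("indexer.yml", host="127.0.0.1", port=8080, debug=False)
print(shlex.join(argv))
# prints: swh indexer rpc-serve --host 127.0.0.1 --port 8080 --nodebug indexer.yml
```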