Command-line interface#

swh indexer#

Software Heritage Indexer tools.

The Indexer is used to mine the content of the archive and extract derived information from archive source code artifacts.

swh indexer [OPTIONS] COMMAND [ARGS]...

Options

-C, --config-file <config_file>#

Configuration file.

journal-client#

Listens for new objects from the SWH Journal, and either:

  • runs the indexer with the name passed as argument, if any

  • schedules tasks to run relevant indexers (currently, only origin_intrinsic_metadata) on these new objects otherwise.

Passing ‘*’ as indexer name runs all indexers.

swh indexer journal-client [OPTIONS] [[origin_intrinsic_metadata|extrinsic_metadata|content_mimetype|content_fossology_license|*]]

Options

-s, --scheduler-url <scheduler_url>#

URL of the scheduler API

--origin-metadata-task-type <origin_metadata_task_type>#

Name of the task running the origin metadata indexer.

--broker <brokers>#

Kafka broker to connect to.

--prefix <prefix>#

Prefix of Kafka topic names to read from.

--group-id <group_id>#

Consumer/group id for reading from Kafka.

-m, --stop-after-objects <stop_after_objects>#

Maximum number of objects to replay. Default is to run forever.

-b, --batch-size <batch_size>#

Batch size. Default is 200.

Arguments

INDEXER#

Optional argument
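As an illustration of how the options above fit together, here is a minimal sketch that assembles a `journal-client` command line in Python. The helper name `journal_client_argv` is hypothetical, as are the configuration path, broker address and group id; only the flags and the INDEXER argument come from the reference above.

```python
import shlex


def journal_client_argv(indexer=None, *, config_file=None, broker=None,
                        prefix=None, group_id=None, batch_size=None):
    """Build an argv list for `swh indexer journal-client` (hypothetical helper)."""
    argv = ["swh", "indexer"]
    # -C/--config-file is an option of the `swh indexer` group, so it is
    # placed before the subcommand name.
    if config_file is not None:
        argv += ["--config-file", config_file]
    argv.append("journal-client")
    if broker is not None:
        argv += ["--broker", broker]
    if prefix is not None:
        argv += ["--prefix", prefix]
    if group_id is not None:
        argv += ["--group-id", group_id]
    if batch_size is not None:
        argv += ["--batch-size", str(batch_size)]
    # INDEXER is optional; "*" selects all indexers.
    if indexer is not None:
        argv.append(indexer)
    return argv


# Placeholder values, not defaults of the tool.
argv = journal_client_argv(
    "*",
    config_file="indexer.yml",
    broker="kafka.example.org:9092",
    group_id="example-indexer",
    batch_size=100,
)
# shlex.join quotes "*" so that a shell does not expand it as a glob.
print(shlex.join(argv))
```

When typing the command directly in a shell, the `*` argument likewise needs quoting, otherwise the shell may replace it with file names from the current directory.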

mapping#

Manage Software Heritage Indexer mappings.

swh indexer mapping [OPTIONS] COMMAND [ARGS]...

list#

Prints the list of known mappings.

swh indexer mapping list [OPTIONS]

list-terms#

Prints the list of known CodeMeta terms, and which mappings support them.

swh indexer mapping list-terms [OPTIONS]

Options

--exclude-mapping <exclude_mapping>#

Exclude the given mapping from the output

--concise#

Don’t print the list of mappings supporting each term.

translate#

Translates FILE from the format handled by the MAPPING_NAME mapping to the CodeMeta format.

swh indexer mapping translate [OPTIONS] MAPPING_NAME FILE

Arguments

MAPPING_NAME#

Required argument

FILE#

Required argument
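A short sketch of how `translate` might be driven from a script. The helper `translate_argv` is hypothetical, and `npm` with `package.json` is only an assumed pairing of mapping name and input file; the names actually available can be printed with `swh indexer mapping list`.

```python
import shlex
from pathlib import Path


def translate_argv(mapping_name, file):
    """Build an argv list for `swh indexer mapping translate` (hypothetical helper)."""
    # Both positional arguments are required, in this order.
    return ["swh", "indexer", "mapping", "translate", mapping_name, str(file)]


# Assumed example inputs: an npm manifest in the current directory.
cmd = translate_argv("npm", Path("package.json"))
print(shlex.join(cmd))

# To actually run it (requires the swh CLI to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```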

rpc-serve#

Starts a Software Heritage Indexer RPC HTTP server.

swh indexer rpc-serve [OPTIONS] CONFIG_PATH

Options

--host <host>#

Host on which to run the server

--port <port>#

Binding port of the server

--debug, --nodebug#

Indicates if the server should run in debug mode

Arguments

CONFIG_PATH#

Required argument
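The following sketch shows one way to assemble an `rpc-serve` invocation, including the `--debug`/`--nodebug` flag pair. The helper name `rpc_serve_argv`, the host, the port and the configuration path are all placeholders chosen for illustration, not defaults of the tool.

```python
import shlex


def rpc_serve_argv(config_path, *, host=None, port=None, debug=None):
    """Build an argv list for `swh indexer rpc-serve` (hypothetical helper)."""
    argv = ["swh", "indexer", "rpc-serve"]
    if host is not None:
        argv += ["--host", host]
    if port is not None:
        argv += ["--port", str(port)]
    # --debug/--nodebug is a boolean flag pair; passing neither leaves
    # the server's own default in effect.
    if debug is not None:
        argv.append("--debug" if debug else "--nodebug")
    # CONFIG_PATH is the single required positional argument.
    argv.append(config_path)
    return argv


# Placeholder values for a local test server.
cmd = rpc_serve_argv("indexer-storage.yml", host="127.0.0.1", port=8080, debug=True)
print(shlex.join(cmd))
```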

schedule#

Manipulate Software Heritage Indexer tasks via the SWH Scheduler’s API.

swh indexer schedule [OPTIONS] COMMAND [ARGS]...

Options

-s, --scheduler-url <scheduler_url>#

URL of the scheduler API

-i, --indexer-storage-url <indexer_storage_url>#

URL of the indexer storage API

-g, --storage-url <storage_url>#

URL of the (graph) storage API

--dry-run, --no-dry-run#

Only list what would be scheduled, without scheduling it.

reindex_origin_metadata#

Schedules indexing tasks for origins that were already indexed.

swh indexer schedule reindex_origin_metadata [OPTIONS]

Options

-b, --batch-size <origin_batch_size>#

Number of origins per task

Default:

10

-t, --tool-id <tool_ids>#

Restrict the search for old metadata to the given tool id(s).

-m, --mapping <mappings>#

Mapping(s) that should be re-scheduled (e.g. ‘npm’, ‘gemspec’, ‘maven’)

--task-type <task_type>#

Name of the task type to schedule.

Default:

index-origin-metadata
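Because `schedule` is a command group, its own options (such as `--scheduler-url` and `--dry-run`) go before the subcommand name, while the options of `reindex_origin_metadata` go after it. The sketch below illustrates that ordering; the helper `reindex_argv` and the scheduler URL are hypothetical, and a single mapping is passed to avoid assuming how multiple values are supplied.

```python
import shlex


def reindex_argv(*, scheduler_url=None, dry_run=None,
                 batch_size=None, mapping=None, task_type=None):
    """Build an argv list for `swh indexer schedule reindex_origin_metadata`
    (hypothetical helper)."""
    argv = ["swh", "indexer", "schedule"]
    # Options of the `schedule` group come before the subcommand.
    if scheduler_url is not None:
        argv += ["--scheduler-url", scheduler_url]
    if dry_run is not None:
        argv.append("--dry-run" if dry_run else "--no-dry-run")
    argv.append("reindex_origin_metadata")
    # Options of the subcommand itself come after its name.
    if batch_size is not None:
        argv += ["--batch-size", str(batch_size)]
    if mapping is not None:
        argv += ["--mapping", mapping]
    if task_type is not None:
        argv += ["--task-type", task_type]
    return argv


# Placeholder scheduler URL; --dry-run only lists what would be scheduled.
cmd = reindex_argv(
    scheduler_url="http://scheduler.example.org:5008/",
    dry_run=True,
    batch_size=10,
    mapping="npm",
)
print(shlex.join(cmd))
```

Starting with `--dry-run` is a cautious way to check which origins would be picked up before scheduling real tasks.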