Using the gRPC API#

The gRPC API is the core API used to query the provenance database remotely. It uses the gRPC framework to provide high-performance provenance answers with server streaming.

Quickstart#

Building the server#

Get Rust >= 1.79, eg. with rustup.

Run:

RUSTFLAGS="-C target-cpu=native" cargo install --locked https://gitlab.softwareheritage.org/swh/devel/swh-provenance.git

Or:

git clone https://gitlab.softwareheritage.org/swh/devel/swh-provenance.git
cd swh-provenance
cargo build --release

Getting a provenance database#

pip3 install awscli
aws s3 cp --no-sign-request --recursive s3://softwareheritage/derived_datasets/2024-12-06/provenance/all/ provenance-2024-12-06/

You also need a local graph. Either use swh graph download to download a full graph, or get only the minimal set of required files with:

aws s3 cp --no-sign-request s3://softwareheritage/graph/2024-12-06/compressed/graph.pthash graph-2024-12-06/
aws s3 cp --no-sign-request s3://softwareheritage/graph/2024-12-06/compressed/graph.pthash.order graph-2024-12-06/
aws s3 cp --no-sign-request s3://softwareheritage/graph/2024-12-06/compressed/graph.node2swhid.bin graph-2024-12-06/
aws s3 cp --no-sign-request s3://softwareheritage/graph/2024-12-06/compressed/graph.node2type.bin graph-2024-12-06/

Starting the server#

Before the first start, you need to build database indexes:

$ swh-provenance-index --database file:///provenance-2024-12-06/ --indexes provenance-2024-12-06-indexes/

Or, if you installed from Git:

$ cargo run --release --bin swh-provenance-index -- --database file:///provenance-2024-12-06/ --indexes provenance-2024-12-06-indexes/

The gRPC server is automatically started on port 50091 when the HTTP server is started with:

$ swh-provenance-grpc-serve --graph graph-2024-12-06/ --database file:///provenance-2024-12-06/ --indexes provenance-2024-12-06-indexes/

Or, if you installed from Git:

$ cargo run --release --bin swh-graph-grpc-serve -- --graph graph-2024-12-06/ --database file:///provenance-2024-12-06/ --indexes provenance-2024-12-06-indexes/

Running queries#

The gRPC command line tool can be an easy way to query the gRPC API from the command line. It is invoked with the grpc_cli command. Of course, it is also possible to use a generated RPC client in any programming language supported by gRPC.

All RPC methods are defined in the service swh.provenance.ProvenanceService. The available endpoints can be listed with ls:

$ rpc_cli ls localhost:50141 swh.provenance.ProvenanceService
WhereIsOne
WhereAreOne

A RPC method can be called with the call subcommand.:

$ grpc_cli call localhost:50141 swh.provenance.ProvenanceService.WhereIsOne "swhid: 'swh:1:cnt:27766b99cdcab4e9b68501c3b50f1712e016c945'"
swhid: "swh:1:cnt:27766b99cdcab4e9b68501c3b50f1712e016c945"
anchor: "swh:1:rev:1564a9e70426251655286156957f8d710f0db278"