Software Heritage - Web client

Client for Software Heritage Web applications, via their APIs.

Sample usage

from swh.web.client.client import WebAPIClient
cli = WebAPIClient()

# retrieve any archived object via its SWHID
cli.get('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')

# same, but for specific object types
cli.revision('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')

# get() always retrieves entire objects, following pagination
# WARNING: this might *not* be what you want for large objects
cli.get('swh:1:snp:6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a')

# type-specific methods support explicit iteration through pages
next(cli.snapshot('swh:1:snp:cabcc7d7bf639bbe1cc3b41989e1806618dd5764'))
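
The identifiers passed to the client above are SWHIDs. Before sending a request, it can be useful to check that a string is well formed. The following is a minimal sketch, not part of swh-web-client: the helper name parse_swhid is hypothetical, and it only handles core identifiers (version 1, one of the five object types, 40 hexadecimal characters), not identifiers carrying qualifiers.

```python
import re
from typing import Tuple

# Core SWHID layout: swh:1:<object type>:<40 lowercase hex characters>.
# Qualifiers (";origin=...", etc.) are deliberately not handled in this sketch.
SWHID_RE = re.compile(r"swh:1:(cnt|dir|rev|rel|snp):([0-9a-f]{40})")


def parse_swhid(swhid: str) -> Tuple[str, str]:
    """Return (object_type, hex_hash); raise ValueError if the string is malformed.

    Hypothetical helper for illustration, not an API of swh-web-client.
    """
    match = SWHID_RE.fullmatch(swhid)
    if match is None:
        raise ValueError(f"not a valid core SWHID: {swhid!r}")
    return match.group(1), match.group(2)
```

Calling parse_swhid on the revision identifier used above yields the object type 'rev' together with its hash, which can help decide which type-specific method to call.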

Authentication

If you have a user account registered with the Software Heritage Identity Provider, you can authenticate requests made to the Web APIs with an OpenID Connect bearer token. Depending on your permissions, authenticated requests can notably lift API rate limits.

Installing swh-web-client provides a dedicated CLI tool for obtaining this token:

$ swh auth
Usage: swh auth [OPTIONS] COMMAND [ARGS]...

  Software Heritage Authentication tools.

  This CLI eases the retrieval of a bearer token to authenticate a user
  querying Software Heritage Web APIs.

Options:
  --oidc-server-url TEXT  URL of OpenID Connect server (default to
                          "https://auth.softwareheritage.org/auth/")

  --realm-name TEXT       Name of the OpenID Connect authentication realm
                          (default to "SoftwareHeritage")

  --client-id TEXT        OpenID Connect client identifier in the realm
                          (default to "swh-web")

  -h, --help              Show this message and exit.

Commands:
  generate-token  Generate a new bearer token for a Web API authentication.
  revoke-token    Revoke a bearer token used for a Web API authentication.

To obtain a token, run the generate-token subcommand of the CLI tool with your username as argument. You will be prompted for your password; if authentication succeeds, a new OpenID Connect offline session is created and the token is printed to standard output.

$ swh auth --client-id swh-web generate-token <username>
Password:
eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJmNjMzMD...

To authenticate, send the token value in the request headers when querying the Web API. Assuming the token is stored in a TOKEN environment variable, an authenticated call with curl looks like this:

$ curl -H "Authorization: Bearer ${TOKEN}" https://archive.softwareheritage.org/api/1/<endpoint>
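
The same header can be set from Python using only the standard library. This is a sketch under stated assumptions: build_request and fetch are hypothetical helper names, the endpoint is left as a parameter just as in the curl command, and the token is read from the same TOKEN environment variable.

```python
import os
import urllib.request

API_ROOT = "https://archive.softwareheritage.org/api/1/"


def build_request(endpoint: str, token: str) -> urllib.request.Request:
    """Prepare a GET request carrying the bearer token; nothing is sent yet."""
    return urllib.request.Request(
        API_ROOT + endpoint.lstrip("/"),
        headers={"Authorization": f"Bearer {token}"},
    )


def fetch(endpoint: str) -> bytes:
    """Send the authenticated request and return the raw response body."""
    token = os.environ["TOKEN"]  # same variable as in the curl example
    with urllib.request.urlopen(build_request(endpoint, token)) as response:
        return response.read()
```

Keeping request construction separate from sending makes it possible to inspect the headers without touching the network.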

If you use the swh.web.client.client.WebAPIClient class, pass the token to its constructor to enable authentication:

from swh.web.client.client import WebAPIClient

TOKEN = '.......'  # Use "swh auth generate-token" command to get it

client = WebAPIClient(bearer_token=TOKEN)

# All requests to the Web API will be authenticated
resp = client.get('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')
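
Hard-coding the token in source files makes it easy to leak. One possible alternative, sketched here, reads it from the TOKEN environment variable used in the curl example. The helpers load_token and make_client are hypothetical names, not part of swh-web-client; only WebAPIClient and its bearer_token argument come from the snippet above.

```python
import os


def load_token(var_name: str = "TOKEN") -> str:
    """Read the bearer token from the environment, failing loudly if it is unset."""
    token = os.environ.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"{var_name} is not set; run 'swh auth generate-token' to obtain a token"
        )
    return token


def make_client():
    """Build an authenticated client (requires swh-web-client to be installed)."""
    # Imported here so that load_token() stays usable without the package.
    from swh.web.client.client import WebAPIClient

    return WebAPIClient(bearer_token=load_token())
```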

A token can also be revoked, which prevents any further Web API authentication with it. Use the revoke-token subcommand of the CLI tool to do so:

$ swh auth --client-id swh-web revoke-token $REFRESH_TOKEN
Token successfully revoked.