Software Heritage - Web client

Client for Software Heritage Web applications, via their APIs.

Sample usage

from swh.web.client import WebAPIClient
cli = WebAPIClient()

# retrieve any archived object via its PID
cli.get('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')

# same, but for specific object types
cli.revision('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')

# get() always retrieves entire objects, following pagination
# WARNING: this might *not* be what you want for large objects
cli.get('swh:1:snp:6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a')

# type-specific methods support explicit iteration through pages
next(cli.snapshot('swh:1:snp:cabcc7d7bf639bbe1cc3b41989e1806618dd5764'))
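When only the first few pages of a large object are needed, the page iterator can be consumed lazily instead of calling get(). The following is a minimal sketch, assuming that type-specific methods such as snapshot() return an ordinary Python iterator of pages, as the next() call above suggests. The helper first_pages and the generator fake_snapshot_pages are hypothetical names introduced for illustration; the fake generator stands in for a real client call so the example runs offline.

```python
from itertools import islice


def first_pages(pages, limit):
    """Return at most `limit` pages from an iterator, leaving the rest unfetched."""
    return list(islice(pages, limit))


def fake_snapshot_pages():
    """Hypothetical stand-in for cli.snapshot(...): yields three made-up pages."""
    for number in range(3):
        yield {"page": number}


# With a real client this would read (assumption: snapshot() returns an iterator):
#   pages = cli.snapshot('swh:1:snp:cabcc7d7bf639bbe1cc3b41989e1806618dd5764')
pages = fake_snapshot_pages()
print(first_pages(pages, 2))
```

Because islice stops pulling from the iterator once the limit is reached, no further pages are requested from the underlying source.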

Authentication

If you have a user account registered with the Software Heritage Identity Provider, you can authenticate requests made to the Web APIs using an OpenID Connect bearer token. Depending on your permissions, authenticated requests can notably lift API rate limiting.

Installing swh-web-client provides a dedicated CLI tool for obtaining this token:

$ swh auth
Usage: swh auth [OPTIONS] COMMAND [ARGS]...

  Authenticate Software Heritage users with OpenID Connect.

  This CLI tool eases the retrieval of bearer tokens to authenticate a user
  querying the Software Heritage Web API.

Options:
  --oidc-server-url TEXT  URL of OpenID Connect server (default to
                          "https://auth.softwareheritage.org/auth/")
  --realm-name TEXT       Name of the OpenID Connect authentication realm
                          (default to "SoftwareHeritage")
  --client-id TEXT        OpenID Connect client identifier in the realm
                          (default to "swh-web")
  -h, --help              Show this message and exit.

Commands:
  login    Login and create new offline OpenID Connect session.
  logout   Logout from an offline OpenID Connect session.

To get your tokens, run the login subcommand of that CLI tool, passing your username as argument. You will be prompted for your password; if authentication succeeds, a new OpenID Connect session is created and the tokens are printed to standard output.

$ swh auth login <username>
Password:
eyJhbGciOiJIUzI1NiIsInR5cCIgOiAiSldUIiwia2lkIiA6ICJmNjMzMD...

To authenticate yourself, send that token value in the request headers when querying the Web API. Assuming you have stored the token in a TOKEN environment variable, you can perform an authenticated call with curl as follows:

$ curl -H "Authorization: Bearer ${TOKEN}" https://archive.softwareheritage.org/api/1/<endpoint>
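The same header can be set from Python using only the standard library. This is a minimal sketch, not part of swh-web-client: the function name build_authenticated_request and the endpoint path used in the usage comment are hypothetical, and the request is only constructed, not sent.

```python
import urllib.request

API_ROOT = "https://archive.softwareheritage.org/api/1/"


def build_authenticated_request(endpoint, token):
    """Build a GET request for `endpoint` carrying the bearer token header."""
    url = API_ROOT + endpoint.lstrip("/")
    headers = {"Authorization": f"Bearer {token}"}
    return urllib.request.Request(url, headers=headers)


# Hypothetical usage, reading the token from the TOKEN environment variable:
#   import os
#   request = build_authenticated_request("example/endpoint/", os.environ["TOKEN"])
#   with urllib.request.urlopen(request) as response:
#       body = response.read()
```

Keeping request construction separate from sending makes it possible to inspect the URL and headers before any network traffic occurs.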

If you use the swh.web.client.client.WebAPIClient class, you can enable authentication as follows:

from swh.web.client import WebAPIClient

TOKEN = '.......'  # Use "swh auth login" command to get it

client = WebAPIClient(bearer_token=TOKEN)

# All requests to the Web API will be authenticated
resp = client.get('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')
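To avoid hardcoding the token in source files, it can be read from the environment and passed on only when present. The sketch below assumes the TOKEN environment variable mentioned earlier and the bearer_token keyword argument shown above; the helper name auth_kwargs is hypothetical. When no token is set it returns an empty dict, so the client would be created unauthenticated, as in the first sample.

```python
import os


def auth_kwargs(environ=os.environ, variable="TOKEN"):
    """Return WebAPIClient keyword arguments based on an optional token variable."""
    token = environ.get(variable, "").strip()
    # Only pass bearer_token when a non-empty token is actually available
    return {"bearer_token": token} if token else {}


# Intended usage (requires swh-web-client to be installed):
#   from swh.web.client import WebAPIClient
#   client = WebAPIClient(**auth_kwargs())
print(sorted(auth_kwargs({"TOKEN": "example-token"})))
```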

It is also possible to log out from the authenticated OpenID Connect session, which permanently revokes the token.

$ swh auth logout $REFRESH_TOKEN
Successfully logged out from OpenID Connect session