swh.web.client.client module

Python client for the Software Heritage Web API

Light wrapper around requests for the archive API, taking care of data conversions and pagination.

from swh.web.client import WebAPIClient
cli = WebAPIClient()

# retrieve any archived object via its PID
cli.get('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')

# same, but for specific object types
cli.revision('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')

# get() always retrieves entire objects, following pagination
# WARNING: this might *not* be what you want for large objects
cli.get('swh:1:snp:6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a')

# type-specific methods support explicit iteration through pages
next(cli.snapshot('swh:1:snp:cabcc7d7bf639bbe1cc3b41989e1806618dd5764'))
swh.web.client.client.typify(data: Any, obj_type: str) → Any

Convert API responses to Pythonic types where appropriate

The following conversions are performed:

  • identifiers are converted from strings to PersistentId instances

  • timestamps are converted from strings to datetime.datetime objects
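The two conversions listed above can be illustrated with a minimal, standard-library-only sketch. It is not the module's actual implementation: `ParsedId`, `parse_id`, `typify_sketch`, and the field names `id` and `date` are hypothetical stand-ins chosen for the example, and identifier qualifiers are ignored.

```python
from datetime import datetime
from typing import Any, Dict, NamedTuple


class ParsedId(NamedTuple):
    """Hypothetical stand-in for PersistentId (swh:<version>:<type>:<hash>)."""

    scheme_version: int
    object_type: str
    object_id: str


def parse_id(text: str) -> ParsedId:
    """Split an identifier string into its four colon-separated parts."""
    namespace, version, object_type, object_id = text.split(":")
    if namespace != "swh":
        raise ValueError(f"not a Software Heritage identifier: {text!r}")
    return ParsedId(int(version), object_type, object_id)


def typify_sketch(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of data with 'id' and 'date' converted to richer types.

    The field names are assumptions made for this example only.
    """
    out = dict(data)
    if "id" in out:
        out["id"] = parse_id(out["id"])
    if "date" in out:
        # ISO 8601 string with an explicit UTC offset -> aware datetime
        out["date"] = datetime.fromisoformat(out["date"])
    return out
```

Fields that are neither identifiers nor timestamps pass through unchanged.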

class swh.web.client.client.WebAPIClient(api_url='https://archive.softwareheritage.org/api/1', auth_url='https://auth.softwareheritage.org/auth/')

Bases: object

Client for the Software Heritage archive Web API; see https://archive.softwareheritage.org/api/

get(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → Any

Retrieve information about an object of any kind

Dispatcher method over the more specific methods content(), directory(), etc.

Note that this method buffers the entire output when the result is long and iterable (e.g., for snapshot()). Use the iter() method to stream results instead.

iter(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → Generator[Dict[str, Any], None, None]

Stream over the information about an object of any kind

Streaming variant of get()
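The difference between the buffering get() and the streaming iter() can be sketched with stand-in getters. Everything below is hypothetical: `fake_revision`, `fake_snapshot`, `iter_sketch`, and `get_sketch` are illustrative names, and merging partial snapshots with `dict.update` is an assumption about how buffering might work, not a statement about the client's internals.

```python
from typing import Any, Dict, Iterator


def fake_revision(pid: str) -> Dict[str, Any]:
    """Stand-in for a single-object getter such as revision()."""
    return {"id": pid, "message": "example"}


def fake_snapshot(pid: str) -> Iterator[Dict[str, Any]]:
    """Stand-in for snapshot(): yields one partial snapshot per page."""
    yield {"refs/heads/main": {"target_type": "revision"}}
    yield {"refs/tags/v1.0": {"target_type": "release"}}


def iter_sketch(pid: str) -> Iterator[Dict[str, Any]]:
    """Streaming dispatcher: yield results one at a time, without buffering."""
    object_type = pid.split(":")[2]
    if object_type == "snp":
        yield from fake_snapshot(pid)
    elif object_type == "rev":
        yield fake_revision(pid)
    else:
        raise ValueError(f"unsupported object type: {object_type}")


def get_sketch(pid: str) -> Dict[str, Any]:
    """Buffering dispatcher: consume every page before returning."""
    if pid.split(":")[2] == "snp":
        merged: Dict[str, Any] = {}
        for partial in iter_sketch(pid):
            merged.update(partial)  # the whole snapshot ends up in memory
        return merged
    return next(iter_sketch(pid))
```

With a very large snapshot, `get_sketch` holds every branch in memory at once, while a caller of `iter_sketch` can stop after the first page.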

content(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → Dict[str, Any]

Retrieve information about a content object

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.get()

Raises

requests.HTTPError – if HTTP request fails

directory(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → List[Dict[str, Any]]

Retrieve information about a directory object

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.get()

Raises

requests.HTTPError – if HTTP request fails

revision(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → Dict[str, Any]

Retrieve information about a revision object

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.get()

Raises

requests.HTTPError – if HTTP request fails

release(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → Dict[str, Any]

Retrieve information about a release object

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.get()

Raises

requests.HTTPError – if HTTP request fails

snapshot(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → Generator[Dict[str, Any], None, None]

Retrieve information about a snapshot object

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.get()

Returns

an iterator over partial snapshots (dictionaries mapping branch names to information about where they point to), each containing a subset of available branches

Raises

requests.HTTPError – if HTTP request fails
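As an illustration of how an iterator over partial snapshots might follow pagination, here is a minimal sketch that runs against in-memory pages instead of the network. The names `FAKE_PAGES`, `fake_fetch`, and `iter_partial_snapshots` are hypothetical, and the "each page carries a pointer to the next one" scheme is an assumption; the source does not describe the paging mechanism.

```python
from typing import Any, Callable, Dict, Iterator, Optional, Tuple

# A page is (branches on this page, key of the next page or None).
Page = Tuple[Dict[str, Any], Optional[str]]

# In-memory stand-in for paginated API responses (hypothetical data).
FAKE_PAGES: Dict[str, Page] = {
    "page-1": ({"refs/heads/main": {"target_type": "revision"}}, "page-2"),
    "page-2": ({"refs/tags/v1.0": {"target_type": "release"}}, None),
}


def fake_fetch(url: str) -> Page:
    """Stand-in for one HTTP request; returns the page stored under url."""
    return FAKE_PAGES[url]


def iter_partial_snapshots(
    fetch: Callable[[str], Page], first_url: str
) -> Iterator[Dict[str, Any]]:
    """Yield one partial snapshot per page, fetching pages lazily."""
    url: Optional[str] = first_url
    while url is not None:
        branches, url = fetch(url)
        yield branches
```

Because the generator is lazy, calling `next()` once triggers exactly one fetch; later pages are requested only if the caller keeps iterating.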

visits(origin: str, per_page: Optional[int] = None, last_visit: Optional[int] = None, **req_args) → Generator[Dict[str, Any], None, None]

List visits of an origin

Parameters
  • origin – the URL of a software origin

  • per_page – the number of visits to list per page

  • last_visit – visit to start listing from

  • req_args – extra keyword arguments for requests.get()

Returns

an iterator over visits of the origin

Raises

requests.HTTPError – if HTTP request fails
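One way the optional per_page and last_visit arguments could be turned into query parameters is sketched below. The helper `visit_query_params` is hypothetical, and the assumption that the parameters are sent under the same names is not confirmed by the source; the resulting dictionary has the shape accepted by the `params` argument of requests.get().

```python
from typing import Dict, Optional


def visit_query_params(
    per_page: Optional[int] = None, last_visit: Optional[int] = None
) -> Dict[str, int]:
    """Build query parameters, omitting any option left at None."""
    params: Dict[str, int] = {}
    if per_page is not None:
        params["per_page"] = per_page
    if last_visit is not None:
        params["last_visit"] = last_visit
    return params
```

Leaving both arguments at None yields an empty dictionary, so the server-side defaults would apply.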

content_exists(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → bool

Check if a content object exists in the archive

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.head()

Raises

requests.HTTPError – if HTTP request fails

directory_exists(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → bool

Check if a directory object exists in the archive

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.head()

Raises

requests.HTTPError – if HTTP request fails

revision_exists(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → bool

Check if a revision object exists in the archive

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.head()

Raises

requests.HTTPError – if HTTP request fails

release_exists(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → bool

Check if a release object exists in the archive

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.head()

Raises

requests.HTTPError – if HTTP request fails

snapshot_exists(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → bool

Check if a snapshot object exists in the archive

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.head()

Raises

requests.HTTPError – if HTTP request fails
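The existence checks above return a bool and are documented as issuing HEAD requests. One plausible way to map a HEAD response status to that result is sketched here. This mapping is an assumption, not the client's documented behaviour, and `HTTPErrorSketch` is a hypothetical stand-in for requests.HTTPError that keeps the example free of third-party imports.

```python
from http import HTTPStatus


class HTTPErrorSketch(Exception):
    """Hypothetical stand-in for requests.HTTPError."""


def exists_from_status(status_code: int) -> bool:
    """Interpret the status code of a HEAD request as an existence check."""
    if status_code == HTTPStatus.OK:
        return True  # the object is in the archive
    if status_code == HTTPStatus.NOT_FOUND:
        return False  # the object is not in the archive
    # Anything else (server error, rate limiting, ...) is a request failure.
    raise HTTPErrorSketch(f"unexpected HTTP status {status_code}")
```

A HEAD request transfers headers only, which makes it a cheap way to test for presence without downloading the object.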

content_raw(pid: Union[swh.model.identifiers.PersistentId, str], **req_args) → Generator[bytes, None, None]

Iterate over the raw content of a content object

Parameters
  • pid – object identifier

  • req_args – extra keyword arguments for requests.get()

Raises

requests.HTTPError – if HTTP request fails
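Since content_raw() yields chunks of bytes, a caller can write them out incrementally rather than holding the whole object in memory. The helper below is a hypothetical sketch (`save_chunks` is not part of the client) that works with any iterable of bytes, including the generator returned by content_raw().

```python
from typing import BinaryIO, Iterable


def save_chunks(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Write each chunk to out as it arrives; return the total byte count."""
    total = 0
    for chunk in chunks:
        out.write(chunk)
        total += len(chunk)
    return total
```

Assuming `cli` is a WebAPIClient and `pid` identifies a content object, usage might look like `with open("blob.bin", "wb") as f: save_chunks(cli.content_raw(pid), f)`.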

authenticate(refresh_token: str)

Authenticate API requests using an OpenID Connect bearer token

Parameters

refresh_token – A refresh token retrieved using the swh auth login command (see Authentication section in main documentation)

Raises

swh.web.client.auth.AuthenticationError – if authentication fails
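Bearer tokens are conventionally sent in the HTTP `Authorization` header. The sketch below shows only that final step. How the refresh token is exchanged for an access token is not covered by this page, and `auth_headers` is a hypothetical helper, not part of the client.

```python
from typing import Dict, Optional


def auth_headers(access_token: Optional[str]) -> Dict[str, str]:
    """Return headers for an authenticated request, or none if anonymous."""
    if access_token is None:
        return {}
    return {"Authorization": f"Bearer {access_token}"}
```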