swh.web.client.client module
Python client for the Software Heritage Web API
Light wrapper around requests for the archive API, taking care of data conversions and pagination.
from swh.web.client.client import WebAPIClient
cli = WebAPIClient()
# retrieve any archived object via its SWHID
cli.get('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')
# same, but for specific object types
cli.revision('swh:1:rev:aafb16d69fd30ff58afdd69036a26047f3aebdc6')
# get() always retrieves entire objects, following pagination
# WARNING: this might *not* be what you want for large objects
cli.get('swh:1:snp:6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a')
# type-specific methods support explicit iteration through pages
next(cli.snapshot('swh:1:snp:cabcc7d7bf639bbe1cc3b41989e1806618dd5764'))
-
swh.web.client.client.typify_json(data: Any, obj_type: str) → Any
Type API responses using pythonic types where appropriate
The following conversions are performed:
identifiers are converted from strings to SWHID instances
timestamps are converted from strings to datetime.datetime objects
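The two conversions can be illustrated with a minimal, standard-library-only sketch. It is not the library's implementation: `SimpleSWHID`, `parse_swhid`, `typify_sketch`, and the `swhid` and `date` keys are hypothetical names chosen for illustration, whereas the real function returns SWHID instances.

```python
from datetime import datetime
from typing import Any, Dict, NamedTuple


class SimpleSWHID(NamedTuple):
    """Hypothetical stand-in for the library's SWHID class."""

    scheme_version: int
    object_type: str  # e.g. "cnt", "dir", "rev", "rel", "snp"
    object_id: str  # hexadecimal object hash


def parse_swhid(text: str) -> SimpleSWHID:
    """Split a core SWHID string such as 'swh:1:rev:<hex>' into its parts."""
    scheme, version, object_type, object_id = text.split(":")
    if scheme != "swh":
        raise ValueError(f"not a SWHID: {text!r}")
    return SimpleSWHID(int(version), object_type, object_id)


def typify_sketch(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of data with the assumed 'swhid' and 'date' keys typed."""
    typed = dict(data)
    if isinstance(typed.get("swhid"), str):
        typed["swhid"] = parse_swhid(typed["swhid"])
    if isinstance(typed.get("date"), str):
        # ISO 8601 string with a UTC offset -> timezone-aware datetime
        typed["date"] = datetime.fromisoformat(typed["date"])
    return typed
```

The input dictionary is copied rather than modified, so the raw JSON stays available to the caller.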
-
class swh.web.client.client.WebAPIClient(api_url: str = 'https://archive.softwareheritage.org/api/1', bearer_token: Optional[str] = None)
Bases: object
Client for the Software Heritage archive Web API, see https://archive.softwareheritage.org/api/
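One way to supply the constructor arguments is to read them from the environment, so that tokens stay out of source code. The following is a sketch under stated assumptions: `client_kwargs`, `DEFAULT_API_URL`, and the variable names `SWH_API_URL` and `SWH_TOKEN` are hypothetical and not defined by the library.

```python
import os
from typing import Dict, Mapping

# Default taken from the constructor signature documented above
DEFAULT_API_URL = "https://archive.softwareheritage.org/api/1"


def client_kwargs(environ: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Build WebAPIClient keyword arguments from environment variables.

    SWH_API_URL and SWH_TOKEN are hypothetical variable names.
    """
    kwargs = {"api_url": environ.get("SWH_API_URL", DEFAULT_API_URL)}
    token = environ.get("SWH_TOKEN")
    if token:
        # Only pass bearer_token when one is set, keeping the default of None
        kwargs["bearer_token"] = token
    return kwargs


# Intended use (requires the swh.web.client package):
#   from swh.web.client.client import WebAPIClient
#   cli = WebAPIClient(**client_kwargs())
```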
-
get(swhid: Union[swh.model.identifiers.CoreSWHID, str], typify: bool = True, **req_args) → Any
Retrieve information about an object of any kind
Dispatcher method over the more specific methods content(), directory(), etc.
Note that this method buffers the entire output when the output is long and iterable (e.g., for snapshot()); see the iter() method for streaming.
-
iter(swhid: Union[swh.model.identifiers.CoreSWHID, str], typify: bool = True, **req_args) → Iterator[Dict[str, Any]]
Stream over the information about an object of any kind
Streaming variant of get()
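The difference between get() and iter() matters mostly for large objects. As a sketch, the hypothetical helper `first_items` below takes only the first few items from iter() and leaves the rest of the stream unconsumed. The `client` argument is assumed to be a WebAPIClient, or any object with a compatible iter() method.

```python
from itertools import islice
from typing import Any, Dict, List


def first_items(client: Any, swhid: str, n: int) -> List[Dict[str, Any]]:
    """Return at most n items from client.iter(swhid) without draining it."""
    # islice stops pulling from the iterator once n items have been produced
    return list(islice(client.iter(swhid), n))


# Intended use (requires the swh.web.client package and network access):
#   from swh.web.client.client import WebAPIClient
#   first_items(WebAPIClient(), 'swh:1:snp:6a3a2cf0b2b90ce7ae1cf0a221ed68035b686f5a', 5)
```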
-
content(swhid: Union[swh.model.identifiers.CoreSWHID, str], typify: bool = True, **req_args) → Dict[str, Any]
Retrieve information about a content object
- Parameters
swhid – object persistent identifier
typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True)
req_args – extra keyword arguments for requests.get()
- Raises
requests.HTTPError – if HTTP request fails
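The typify and req_args parameters are shared by content(), directory(), revision(), and release(). A minimal sketch of how they combine, using content() as the example: `fetch_content_info` and `timeout_s` are hypothetical names, while `timeout` is a standard keyword argument of requests.get().

```python
from typing import Any, Dict


def fetch_content_info(client: Any, swhid: str, timeout_s: float = 10.0) -> Dict[str, Any]:
    """Fetch raw JSON metadata for a content object with a request timeout."""
    # typify=False keeps plain JSON types; timeout is forwarded to requests.get()
    return client.content(swhid, typify=False, timeout=timeout_s)
```

Callers that need to handle a failed request would wrap the call in a try block catching requests.HTTPError, as listed under Raises.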
-
directory(swhid: Union[swh.model.identifiers.CoreSWHID, str], typify: bool = True, **req_args) → List[Dict[str, Any]]
Retrieve information about a directory object
- Parameters
swhid – object persistent identifier
typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True)
req_args – extra keyword arguments for requests.get()
- Raises
requests.HTTPError – if HTTP request fails
-
revision(swhid: Union[swh.model.identifiers.CoreSWHID, str], typify: bool = True, **req_args) → Dict[str, Any]
Retrieve information about a revision object
- Parameters
swhid – object persistent identifier
typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True)
req_args – extra keyword arguments for requests.get()
- Raises
requests.HTTPError – if HTTP request fails
-
release(swhid: Union[swh.model.identifiers.CoreSWHID, str], typify: bool = True, **req_args) → Dict[str, Any]
Retrieve information about a release object
- Parameters
swhid – object persistent identifier
typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True)
req_args – extra keyword arguments for requests.get()
- Raises
requests.HTTPError – if HTTP request fails
-
snapshot(swhid: Union[swh.model.identifiers.CoreSWHID, str], typify: bool = True, **req_args) → Iterator[Dict[str, Any]]
Retrieve information about a snapshot object
- Parameters
swhid – object persistent identifier
typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True)
req_args – extra keyword arguments for requests.get()
- Returns
an iterator over partial snapshots (dictionaries mapping branch names to information about where they point to), each containing a subset of available branches
- Raises
requests.HTTPError – if HTTP request fails
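Because each item yielded by snapshot() is a partial snapshot, callers that want the full branch mapping have to merge the pages themselves. A minimal sketch, where `collect_branches` is a hypothetical helper and `client` is assumed to be a WebAPIClient or a compatible object:

```python
from typing import Any, Dict


def collect_branches(client: Any, swhid: str) -> Dict[Any, Any]:
    """Merge all partial snapshots of a snapshot object into one dictionary."""
    branches: Dict[Any, Any] = {}
    for partial in client.snapshot(swhid):
        # Each page maps a subset of branch names to their targets
        branches.update(partial)
    return branches
```

For snapshots with very many branches, processing each partial snapshot inside the loop avoids holding the full mapping in memory.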
-
visits(origin: str, per_page: Optional[int] = None, last_visit: Optional[int] = None, typify: bool = True, **req_args) → Iterator[Dict[str, Any]]
List visits of an origin
- Parameters
origin – the URL of a software origin
per_page – the number of visits to list per page
last_visit – visit to start listing from
typify – if True, convert return value to pythonic types wherever possible, otherwise return raw JSON types (default: True)
req_args – extra keyword arguments for requests.get()
- Returns
an iterator over visits of the origin
- Raises
requests.HTTPError – if HTTP request fails
-
known(swhids: Iterator[Union[swh.model.identifiers.CoreSWHID, str]], **req_args) → Dict[swh.model.identifiers.CoreSWHID, Dict[Any, Any]]
Verify the presence in the archive of several objects at once
- Parameters
swhids – SWHIDs of the objects to verify
- Returns
a dictionary mapping object SWHIDs to archive information about them; each information dictionary includes a “known” key associated with a boolean value that is true if and only if the object is known to the archive
- Raises
requests.HTTPError – if HTTP request fails
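A common use of known() is to find which objects still need to be archived. The sketch below filters on the “known” key described under Returns. `missing_swhids` is a hypothetical helper, and it assumes that converting the returned keys with str() gives the SWHID in string form.

```python
from typing import Any, Iterable, List


def missing_swhids(client: Any, swhids: Iterable[str]) -> List[str]:
    """Return the SWHIDs, as strings, that the archive does not know."""
    # known() is documented as taking an iterator, so wrap the iterable
    results = client.known(iter(swhids))
    return [str(swhid) for swhid, info in results.items() if not info["known"]]
```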
-
content_exists(swhid: Union[swh.model.identifiers.CoreSWHID, str], **req_args) → bool
Check if a content object exists in the archive
- Parameters
swhid – object persistent identifier
req_args – extra keyword arguments for requests.head()
- Raises
requests.HTTPError – if HTTP request fails
-
directory_exists(swhid: Union[swh.model.identifiers.CoreSWHID, str], **req_args) → bool
Check if a directory object exists in the archive
- Parameters
swhid – object persistent identifier
req_args – extra keyword arguments for requests.head()
- Raises
requests.HTTPError – if HTTP request fails
-
revision_exists(swhid: Union[swh.model.identifiers.CoreSWHID, str], **req_args) → bool
Check if a revision object exists in the archive
- Parameters
swhid – object persistent identifier
req_args – extra keyword arguments for requests.head()
- Raises
requests.HTTPError – if HTTP request fails
-
release_exists(swhid: Union[swh.model.identifiers.CoreSWHID, str], **req_args) → bool
Check if a release object exists in the archive
- Parameters
swhid – object persistent identifier
req_args – extra keyword arguments for requests.head()
- Raises
requests.HTTPError – if HTTP request fails
-
snapshot_exists(swhid: Union[swh.model.identifiers.CoreSWHID, str], **req_args) → bool
Check if a snapshot object exists in the archive
- Parameters
swhid – object persistent identifier
req_args – extra keyword arguments for requests.head()
- Raises
requests.HTTPError – if HTTP request fails
-
origin_exists(origin: str, **req_args) → bool
Check if an origin object exists in the archive
- Parameters
origin – the URL of a software origin
req_args – extra keyword arguments for requests.head()
- Raises
requests.HTTPError – if HTTP request fails
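The per-type existence checks can be wrapped in a small dispatcher keyed on the object type embedded in the SWHID string (cnt, dir, rev, rel, snp). This is only a sketch: `object_exists` and `_EXISTS_METHODS` are hypothetical names, and origins are left out because they are identified by URL rather than by SWHID.

```python
from typing import Any

# SWHID object type code -> name of the matching WebAPIClient method
_EXISTS_METHODS = {
    "cnt": "content_exists",
    "dir": "directory_exists",
    "rev": "revision_exists",
    "rel": "release_exists",
    "snp": "snapshot_exists",
}


def object_exists(client: Any, swhid: str, **req_args: Any) -> bool:
    """Call the *_exists method matching the object type of a SWHID string."""
    # A core SWHID has the form swh:<version>:<type>:<hash>
    object_type = swhid.split(":")[2]
    method = getattr(client, _EXISTS_METHODS[object_type])
    return method(swhid, **req_args)
```

An unsupported type code raises KeyError, which keeps malformed identifiers from being silently reported as missing.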
-
content_raw(swhid: Union[swh.model.identifiers.CoreSWHID, str], **req_args) → Iterator[bytes]
Iterate over the raw content of a content object
- Parameters
swhid – object persistent identifier
req_args – extra keyword arguments for requests.get()
- Raises
requests.HTTPError – if HTTP request fails
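Since content_raw() yields bytes chunks, a file can be saved without loading it fully into memory. A minimal sketch, where `save_content` is a hypothetical helper and `out` is any binary file-like object opened by the caller:

```python
from typing import Any, BinaryIO


def save_content(client: Any, swhid: str, out: BinaryIO) -> int:
    """Write the raw bytes of a content object to out; return the byte count."""
    written = 0
    for chunk in client.content_raw(swhid):
        out.write(chunk)
        written += len(chunk)
    return written


# Intended use (requires the swh.web.client package and network access):
#   with open("blob.bin", "wb") as f:
#       save_content(cli, "swh:1:cnt:<hash>", f)
```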
-
origin_search(query: str, limit: Optional[int] = None, with_visit: bool = False, **req_args) → Iterator[Dict[str, Any]]
List origin search results
- Parameters
query – search keywords
limit – the maximum number of found origins to return
with_visit – if true, only return origins with at least one visit
- Returns
an iterator over search results
- Raises
requests.HTTPError – if HTTP request fails
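As an illustration of the limit and with_visit parameters, the sketch below collects the URLs of matching origins that have been visited at least once. `search_origin_urls` is a hypothetical helper, and the presence of a "url" key in each search result is an assumption that is not stated on this page.

```python
from typing import Any, List


def search_origin_urls(client: Any, query: str, limit: int = 10) -> List[str]:
    """Return the URLs of at most `limit` visited origins matching the query."""
    results = client.origin_search(query, limit=limit, with_visit=True)
    # Assumption: every search result dictionary carries a "url" entry
    return [result["url"] for result in results]
```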
-