swh.objstorage.api.client module

class swh.objstorage.api.client.RemoteObjStorage(url, api_exception=None, timeout=None, chunk_size=4096, reraise_exceptions=None, **kwargs)

Bases: RPCClient

Proxy to a remote object storage.

This class connects to an object storage server over HTTP.

url

The URL of the server to connect to. Must end with a ‘/’.

Type:

string
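Since the URL must end with a slash, a caller may want to normalise it before building the client. The following is a minimal sketch: `normalize_url` and `make_client` are hypothetical helper names, not part of swh.objstorage, and only the `url` keyword from the class signature above is used.

```python
def normalize_url(url: str) -> str:
    """Return the URL with exactly one trailing '/'."""
    return url.rstrip("/") + "/"


def make_client(url: str):
    """Build a RemoteObjStorage for the given server URL (hypothetical helper)."""
    # Imported lazily so normalize_url stays usable without swh.objstorage installed.
    from swh.objstorage.api.client import RemoteObjStorage

    return RemoteObjStorage(url=normalize_url(url))
```

For instance, `normalize_url("http://localhost:5003")` yields `"http://localhost:5003/"`; the host and port here are placeholders.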

session

The session used to send requests.

api_exception

alias of ObjStorageAPIError

reraise_exceptions: ClassVar[List[Type[Exception]]] = [<class 'swh.objstorage.exc.ObjNotFoundError'>, <class 'swh.objstorage.exc.Error'>, <class 'swh.objstorage.exc.ObjCorruptedError'>]

On server errors, if any of the exception classes in this list has the same name as the error name, then the exception will be instantiated and raised instead of a generic RemoteException.
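The name-based matching described above can be pictured with the following sketch. It is an illustration only, not the actual RPCClient code: the exception classes are local stand-ins that merely share their names with the real ones, and `exception_for` is a hypothetical function.

```python
class RemoteException(Exception):
    """Stand-in for the generic exception raised when no class name matches."""


class ObjNotFoundError(Exception):
    """Stand-in sharing its name with swh.objstorage.exc.ObjNotFoundError."""


class ObjCorruptedError(Exception):
    """Stand-in sharing its name with swh.objstorage.exc.ObjCorruptedError."""


RERAISE_EXCEPTIONS = [ObjNotFoundError, ObjCorruptedError]


def exception_for(error_name: str, *args) -> Exception:
    """Pick the exception to raise for an error name reported by the server."""
    for exc_class in RERAISE_EXCEPTIONS:
        # Matching is done on the class name only, as described above.
        if exc_class.__name__ == error_name:
            return exc_class(*args)
    # No class with that name: fall back to the generic exception.
    return RemoteException(error_name, *args)
```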

backend_class

alias of ObjStorageInterface

restore(content: bytes, obj_id: bytes | CompositeObjId) -> None

list_content(last_obj_id: bytes | CompositeObjId | None = None, limit: int | None = 10000) -> Iterator[CompositeObjId]

add(content: bytes, obj_id: bytes | CompositeObjId, check_presence: bool = True) -> None

Add a new object to the object storage.

Parameters:
  • content – object’s raw content to add to the storage.

  • obj_id – either a dict of checksums, or a single checksum of the content bytes computed with the ID_HASH_ALGO algorithm. It is trusted to match the content.

  • check_presence (bool) – indicate if the presence of the content should be verified before adding the file.

Returns:

the id (bytes) of the object in the storage.
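Because the object id is trusted rather than recomputed, the caller is responsible for deriving it from the content. A hedged sketch of building a dict of checksums with the standard library follows. The key names (`sha1`, `sha1_git`, `sha256`, `blake2s256`) are an assumption about what CompositeObjId contains, and `composite_obj_id` and `store` are hypothetical helpers.

```python
import hashlib


def composite_obj_id(content: bytes) -> dict:
    """Compute a dict of checksums for content (key names are assumed)."""
    # Git hashes a blob as sha1 over a "blob <size>\0" header plus the content.
    git_header = b"blob %d\x00" % len(content)
    return {
        "sha1": hashlib.sha1(content).digest(),
        "sha1_git": hashlib.sha1(git_header + content).digest(),
        "sha256": hashlib.sha256(content).digest(),
        "blake2s256": hashlib.blake2s(content, digest_size=32).digest(),
    }


def store(client, content: bytes) -> dict:
    """Add content through any object exposing the add() method documented above."""
    obj_id = composite_obj_id(content)
    client.add(content, obj_id=obj_id, check_presence=True)
    return obj_id
```

Here `client` would typically be a RemoteObjStorage instance, but any object with a compatible `add()` works, which keeps the helper easy to test.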

add_batch(contents: Mapping[bytes, bytes] | Iterable[Tuple[bytes | CompositeObjId, bytes]], check_presence: bool = True) -> Dict

Add a batch of new objects to the object storage.

Parameters:

contents – either a mapping from ID_HASH_ALGO checksums to object contents, or a list of pairs of hash dicts and object contents

Returns:

a summary of the objects added to the storage (number of objects, number of bytes)
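For large inputs it can be convenient to send several moderate batches rather than one huge request. The sketch below builds the mapping form of `contents` in fixed-size chunks. It assumes ID_HASH_ALGO is SHA-1, which this page does not state, and `sha1_mapping`, `batches` and `add_in_batches` are hypothetical helpers.

```python
import hashlib
from itertools import islice
from typing import Dict, Iterable, Iterator, List


def sha1_mapping(blobs: Iterable[bytes]) -> Dict[bytes, bytes]:
    """Map each blob's checksum to the blob (assumes ID_HASH_ALGO is SHA-1)."""
    # Identical blobs share a checksum and therefore collapse into one entry.
    return {hashlib.sha1(blob).digest(): blob for blob in blobs}


def batches(blobs: Iterable[bytes], size: int) -> Iterator[Dict[bytes, bytes]]:
    """Yield checksum-to-content mappings built from at most `size` blobs each."""
    iterator = iter(blobs)
    while chunk := list(islice(iterator, size)):
        yield sha1_mapping(chunk)


def add_in_batches(client, blobs: Iterable[bytes], size: int = 1000) -> List[dict]:
    """Call add_batch() once per chunk and collect the returned summaries."""
    return [client.add_batch(batch) for batch in batches(blobs, size)]
```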

check(obj_id: bytes | CompositeObjId) -> None

Perform an integrity check for a given object.

Verify that the file object is in place and that the content matches the object id.

Parameters:

obj_id – object identifier.

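Raises:

ObjNotFoundError – if the requested object is missing.

ObjCorruptedError – if the requested object is corrupted.

check_config(*, check_write)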

Check whether the object storage is properly configured.

Parameters:
  • check_write (bool) – if True, check if writes to the object storage can succeed.

Returns:

True if the configuration check succeeded; an exception is raised if it did not.

delete(obj_id: bytes | CompositeObjId)

Delete an object.

Parameters:

obj_id – object identifier.

Raises:

ObjNotFoundError – if the requested object is missing.

download_url(obj_id: bytes | CompositeObjId, content_disposition: str | None = None, expiry: timedelta | None = None) -> str | None

Get a direct download link for the object if the objstorage backend supports such a feature.

Some objstorage backends, typically cloud-based ones like Azure or S3, can provide a direct download link for a stored object.

Parameters:
  • obj_id – object identifier

  • content_disposition – set the Content-Disposition header for the generated URL response, if the objstorage backend supports it

  • expiry – the duration after which the URL expires, if the objstorage backend supports it; if not provided, the URL expires 24 hours after its creation

Returns:

Direct download URL for the object, or None if the objstorage backend does not support such a feature.
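Since the method may return None, callers usually need a fallback. A minimal sketch of one possible pattern: ask for a direct link first and fetch the bytes through `get()` otherwise. `locate` is a hypothetical helper and the one-hour expiry is an arbitrary example value.

```python
from datetime import timedelta


def locate(client, obj_id, expiry: timedelta = timedelta(hours=1)):
    """Return ("url", link) when a direct link exists, else ("bytes", content)."""
    url = client.download_url(obj_id, expiry=expiry)
    if url is not None:
        return ("url", url)
    # The backend offers no direct links: read the content via the storage API.
    return ("bytes", client.get(obj_id))
```

Returning a tagged pair lets the caller decide whether to redirect a user to the link or stream the bytes itself.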

get(obj_id: bytes | CompositeObjId) -> bytes

Retrieve the content of a given object.

Parameters:

obj_id – object id.

Returns:

the content of the requested object as bytes.

Raises:

ObjNotFoundError – if the requested object is missing.
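A caller that treats a missing object as a normal outcome can wrap the call, as in this sketch. `get_or_none` is a hypothetical helper, and the fallback class definition exists only so the snippet runs without swh.objstorage installed.

```python
try:
    from swh.objstorage.exc import ObjNotFoundError
except ImportError:
    class ObjNotFoundError(Exception):
        """Local stand-in used when swh.objstorage is not installed."""


def get_or_none(client, obj_id):
    """Return the object's content, or None when the storage reports it missing."""
    try:
        return client.get(obj_id)
    except ObjNotFoundError:
        return None
```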

get_batch(obj_ids: Iterable[bytes | CompositeObjId]) -> Iterator[bytes | None]

Retrieve objects’ raw content in bulk from storage.

Note: This function has a default implementation in ObjStorage that is suitable for most cases.

For object storages that need to make as few requests as possible (e.g. remote object storages), this method can be overridden to perform a more efficient operation.

Parameters:

obj_ids – list of object ids.

Returns:

the resulting contents, with None for each content that could not be retrieved. No exception is raised, because a failure for one content does not cancel the whole request.
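Because failures are reported as None rather than raised, the caller has to pair results with the ids it asked for. A minimal sketch follows. It assumes results come back in the same order as the requested ids and that the ids are plain bytes (hashable); `fetch_many` is a hypothetical helper.

```python
from typing import Dict, Iterable, List, Tuple


def fetch_many(client, obj_ids: Iterable[bytes]) -> Tuple[Dict[bytes, bytes], List[bytes]]:
    """Split a bulk retrieval into found contents and ids that could not be read."""
    ids = list(obj_ids)  # materialise so the ids can be zipped with the results
    found: Dict[bytes, bytes] = {}
    missing: List[bytes] = []
    # Assumption: get_batch() yields one result per requested id, in request order.
    for obj_id, content in zip(ids, client.get_batch(ids)):
        if content is None:
            missing.append(obj_id)
        else:
            found[obj_id] = content
    return found, missing
```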