swh.objstorage.multiplexer package

Module contents:

class swh.objstorage.multiplexer.MultiplexerObjStorage(storages, **kwargs)

Bases: swh.objstorage.objstorage.ObjStorage

Implementation of ObjStorage that distributes between multiple storages.

The multiplexer object storage allows an input to be demultiplexed among multiple storages, each of which may or may not accept it on its own (see the .filter package).

As the ids can be different, no pre-computed ids should be submitted. Also, there is no guarantee that the returned ids can be used directly with the storages that the multiplexer manages.

Use case examples follow.

Example 1:

storage_v1 = filter.read_only(PathSlicingObjStorage('/dir1',
                                                    '0:2/2:4/4:6'))
storage_v2 = PathSlicingObjStorage('/dir2', '0:1/0:5')
storage = MultiplexerObjStorage([storage_v1, storage_v2])

When using ‘storage’, all new contents will be added only to the v2 storage, while contents will be retrievable from both.

Example 2:

storage_v1 = filter.id_regex(
    PathSlicingObjStorage('/dir1', '0:2/2:4/4:6'),
    r'[^012].*'
)
storage_v2 = filter.id_regex(
    PathSlicingObjStorage('/dir2', '0:1/0:5'),
    r'[012].*'
)
storage = MultiplexerObjStorage([storage_v1, storage_v2])

When using this storage, contents with a sha1 starting with 0, 1 or 2 will be redirected (read AND write) to storage_v2, while the others will be redirected to storage_v1. If a content starting with 0, 1 or 2 is present in storage_v1, it will be ignored anyway.
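The routing described for Example 2 can be sketched without any storage backend. This is only an illustration: the FILTERS table and the accepting_storages helper are hypothetical names, not part of swh.objstorage, and the sketch assumes each pattern is matched against the start of the hexadecimal id.

```python
import re

# Hypothetical stand-ins for the two id filters of Example 2 (not the swh API).
# Assumption: each pattern is anchored at the start of the hexadecimal id.
FILTERS = {
    "storage_v1": re.compile(r"[^012].*"),
    "storage_v2": re.compile(r"[012].*"),
}


def accepting_storages(hex_id):
    """Return the names of the storages whose pattern accepts hex_id."""
    return [name for name, pattern in FILTERS.items() if pattern.match(hex_id)]
```

Because the two patterns are complementary on the first character, every id is accepted by exactly one of the two storages in this sketch.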

wrap_call(threads, call, *args, **kwargs)
get_read_threads(obj_id=None)
get_write_threads(obj_id=None)
check_config(*, check_write)

Check whether the object storage is properly configured.

Parameters
  • check_write (bool) – if True, check if writes to the object storage can succeed.

Returns

True if the configuration check worked; an exception is raised if it didn’t.

add(content, obj_id=None, check_presence=True)

Add a new object to the object storage.

If the adding step works in all the storages that accept this content, this is a success. Otherwise, the whole adding step is an error, even if it succeeds in some of the storages.

Parameters
  • content – content of the object to be added to the storage.

  • obj_id – checksum of [bytes] using [ID_HASH_ALGO] algorithm. When given, obj_id will be trusted to match the bytes. If missing, obj_id will be computed on the fly.

  • check_presence – indicate if the presence of the content should be verified before adding the file.

Returns

an id of the object in the storage. As the write storages are always readable as well, any id will be valid to retrieve a content.
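The success rule for add can be illustrated with in-memory stand-ins. This is a minimal sketch under stated assumptions, not the library's implementation: MemoryStorage and multiplexed_add are hypothetical names, the id is assumed to be the raw sha1 digest of the content, and a failing storage is simulated with an exception.

```python
import hashlib


class MemoryStorage:
    """Hypothetical in-memory storage used only for this illustration."""

    def __init__(self, accepts=lambda obj_id: True, fail=False):
        self.objects = {}
        self.accepts = accepts  # predicate playing the role of a filter
        self.fail = fail        # when True, every write raises

    def add(self, content, obj_id):
        if self.fail:
            raise OSError("simulated write failure")
        self.objects[obj_id] = content
        return obj_id


def multiplexed_add(storages, content):
    """Write content to every storage that accepts its id.

    Any exception from one storage propagates, so the call as a whole
    fails even if earlier storages already stored the content.
    """
    obj_id = hashlib.sha1(content).digest()  # assumption: sha1 ids
    for storage in storages:
        if storage.accepts(obj_id):
            storage.add(content, obj_id)
    return obj_id
```

Note that in this sketch a partial write is not rolled back: a storage that succeeded before the failure keeps its copy, which matches the "even if it succeeds in some of the storages" wording.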

add_batch(contents, check_presence=True) → Dict

Add a batch of new objects to the object storage.

restore(content, obj_id=None)
get(obj_id)
check(obj_id)
delete(obj_id)
get_random(batch_size)

class swh.objstorage.multiplexer.StripingObjStorage(storages, **kwargs)

Bases: swh.objstorage.multiplexer.multiplexer_objstorage.MultiplexerObjStorage

Stripes objects across multiple objstorages

This objstorage implementation writes objects to objstorages in a predictable way: it interprets the last 8 bytes of the object identifier as an integer and takes it modulo the number of object storages passed, which yields an (almost) even distribution.

Objects are read from each storage in turn until a read succeeds.
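The striping rule and the read fallback can be sketched as follows. This is a hedged illustration, not the library's code: storage_index and read_in_turn are hypothetical names, the big-endian byte order is an assumption (the text above only says "the last 8 bytes"), and plain dicts stand in for the storages.

```python
MOD_BYTES = 8  # number of trailing id bytes used for striping


def storage_index(obj_id: bytes, num_storages: int) -> int:
    """Pick the write target from the last MOD_BYTES bytes of the id.

    Assumption: the bytes are read as a big-endian unsigned integer.
    """
    tail = obj_id[-MOD_BYTES:]
    return int.from_bytes(tail, "big") % num_storages


def read_in_turn(storages, obj_id):
    """Try each storage in order and return the first hit.

    Storages are modelled as dicts; a miss is a KeyError.
    """
    for storage in storages:
        try:
            return storage[obj_id]
        except KeyError:
            continue
    raise KeyError(obj_id)
```

Reading from every storage rather than only the computed one means an object remains reachable even if it was written under a different number of storages, at the cost of extra lookups on a miss.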

MOD_BYTES = 8
get_storage_index(obj_id)
get_write_threads(obj_id)
get_read_threads(obj_id=None)
add_batch(contents, check_presence=True) → Dict

Add a batch of new objects to the object storage.