swh.objstorage.backends.pathslicing module
-
class swh.objstorage.backends.pathslicing.PathSlicingObjStorage(root, slicing, compression='gzip', **kwargs)
Bases: swh.objstorage.objstorage.ObjStorage
Implementation of the ObjStorage API based on the hash of the content.
On disk, an object storage is a directory tree containing files named after their object IDs. An object ID is the checksum of the object's content, computed with the algorithm named by the ID_HASH_ALGO constant (see swh.model.hashutil for its meaning).
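As a rough illustration, an object ID could be computed with Python's hashlib as below. This sketch assumes that ID_HASH_ALGO is "sha1" and that the ID is the plain digest of the raw bytes (consistent with the SHA1 example in this section); compute_obj_id is a hypothetical helper, not part of swh.objstorage.

```python
import hashlib

# Assumption: ID_HASH_ALGO is "sha1" and the object ID is the plain digest
# of the raw content bytes.
ID_HASH_ALGO = "sha1"


def compute_obj_id(content: bytes) -> bytes:
    """Return the binary object ID of `content` (20 bytes for SHA1)."""
    return hashlib.new(ID_HASH_ALGO, content).digest()


obj_id = compute_obj_id(b"some file content")
# On disk, the file name is the hexadecimal form of the ID.
print(obj_id.hex())
```

Because the ID depends only on the bytes, storing the same content twice always targets the same file.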
To avoid directories that contain too many files, the object storage uses a given slicing: each slice (a start:end range of the hexadecimal object ID) corresponds to one level of subdirectory, named after that part of the ID.
For instance, a file with SHA1 34973274ccef6ab4dfaaf86599792fa9c3fe4689 would be stored at the following paths, depending on the slicing:
0:2/2:4/4:6 → 34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689
0:1/0:5/ → 3/34973/34973274ccef6ab4dfaaf86599792fa9c3fe4689
By default (compression='gzip'), the files in the storage are stored gzip-compressed.
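The slicing rule illustrated by the two example paths can be sketched in a few lines of Python. sliced_path is a hypothetical helper written for this page; it only reproduces the mapping from a hexadecimal ID and a slicing string to a relative path, assuming each non-empty slicing component has the form start:end.

```python
def sliced_path(hex_id: str, slicing: str) -> str:
    """Return the relative storage path of `hex_id` for a slicing spec.

    Each "/"-separated component "start:end" adds one directory level named
    hex_id[start:end]; empty components (e.g. from a trailing "/") are skipped.
    """
    dirs = []
    for component in slicing.split("/"):
        if not component:
            continue
        start, end = (int(x) for x in component.split(":"))
        dirs.append(hex_id[start:end])
    # The file itself is always named after the full ID.
    return "/".join(dirs + [hex_id])


oid = "34973274ccef6ab4dfaaf86599792fa9c3fe4689"
print(sliced_path(oid, "0:2/2:4/4:6"))
print(sliced_path(oid, "0:1/0:5/"))
```

Slices may overlap (as in 0:1/0:5), since each level is simply a substring of the ID.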
-
root
Path to the root directory of the storage on disk.
- Type
string
-
bounds
List of tuples indicating the beginning and the end of each subdirectory level in a content's path.
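A minimal sketch of how a slicing string could be turned into the (start, end) tuples described by bounds, and how those tuples select the subdirectory names for one ID. parse_bounds and directory_names are hypothetical names; the real class may store and apply its bounds differently.

```python
from typing import List, Tuple


def parse_bounds(slicing: str) -> List[Tuple[int, int]]:
    """Parse a slicing spec such as "0:2/2:4/4:6" into (start, end) tuples."""
    bounds = []
    for component in slicing.split("/"):
        if component:  # ignore empty parts, e.g. from a trailing "/"
            start, end = component.split(":")
            bounds.append((int(start), int(end)))
    return bounds


def directory_names(hex_id: str, bounds: List[Tuple[int, int]]) -> List[str]:
    """Names of the nested subdirectories holding `hex_id`, outermost first."""
    return [hex_id[start:end] for start, end in bounds]


bounds = parse_bounds("0:2/2:4/4:6")
print(bounds)
print(directory_names("34973274ccef6ab4dfaaf86599792fa9c3fe4689", bounds))
```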
-
add(content, obj_id=None, check_presence=True)
Add a new object to the object storage.
- Parameters
content (bytes) – the object’s raw content to add to the storage.
obj_id (bytes) – checksum of content computed with the ID_HASH_ALGO algorithm. When given, obj_id is trusted to match content; if missing, it is computed on the fly.
check_presence (bool) – whether to check if the content is already present before adding the file.
- Returns
the id (bytes) of the object in the storage.
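The following is a simplified, standalone sketch of what add() does, written against plain os and gzip calls rather than the real class. Assumptions not confirmed by this page: ID_HASH_ALGO is "sha1", bounds is a list of (start, end) tuples, and object_path and the module-level add function are hypothetical. Writing to a temporary file and renaming it with os.replace is a choice made in this sketch so that a crash never leaves a truncated object under its final name.

```python
import gzip
import hashlib
import os
import tempfile
from typing import Optional, Sequence, Tuple

Bounds = Sequence[Tuple[int, int]]


def object_path(root: str, bounds: Bounds, hex_id: str) -> str:
    """Full path of an object: root, one directory per bound, then the ID."""
    return os.path.join(root, *(hex_id[s:e] for s, e in bounds), hex_id)


def add(root: str, bounds: Bounds, content: bytes,
        obj_id: Optional[bytes] = None, check_presence: bool = True) -> bytes:
    if obj_id is None:
        # Assumption: ID_HASH_ALGO is "sha1".
        obj_id = hashlib.sha1(content).digest()
    path = object_path(root, bounds, obj_id.hex())
    if check_presence and os.path.exists(path):
        return obj_id  # content-addressed: already stored, nothing to do
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    # Write the gzip stream to a temporary file in the same directory, then
    # rename it into place so readers never see a half-written object.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        with gzip.open(tmp_path, "wb") as f:
            f.write(content)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return obj_id


root = tempfile.mkdtemp()
bounds = [(0, 2), (2, 4), (4, 6)]
oid = add(root, bounds, b"hello world")
print(oid.hex())
```

Calling add() again with the same bytes returns the same id without rewriting the file when check_presence is true.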
-
get(obj_id)
Retrieve the content of a given object.
- Parameters
obj_id (bytes) – object id.
- Returns
the content of the requested object as bytes.
- Raises
ObjNotFoundError – if the requested object is missing.
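A hedged sketch of get(): locate the file from the ID and the bounds, then decompress it. ObjNotFoundError here is a local stand-in class named after the documented exception, object_path is a hypothetical helper, and gzip storage is assumed, matching the default compression.

```python
import gzip
import os
import tempfile
from typing import Sequence, Tuple

Bounds = Sequence[Tuple[int, int]]


class ObjNotFoundError(KeyError):
    """Local stand-in for swh.objstorage's ObjNotFoundError."""


def object_path(root: str, bounds: Bounds, hex_id: str) -> str:
    return os.path.join(root, *(hex_id[s:e] for s, e in bounds), hex_id)


def get(root: str, bounds: Bounds, obj_id: bytes) -> bytes:
    path = object_path(root, bounds, obj_id.hex())
    try:
        with gzip.open(path, "rb") as f:
            return f.read()  # decompressed raw content
    except FileNotFoundError:
        raise ObjNotFoundError(obj_id) from None


# Usage: store one object by hand, then read it back.
# (get() does not verify the checksum, so any bytes will do here.)
root = tempfile.mkdtemp()
bounds = [(0, 2), (2, 4), (4, 6)]
oid = bytes.fromhex("34973274ccef6ab4dfaaf86599792fa9c3fe4689")
path = object_path(root, bounds, oid.hex())
os.makedirs(os.path.dirname(path))
with gzip.open(path, "wb") as f:
    f.write(b"stored bytes")
print(get(root, bounds, oid))
```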
-
check(obj_id)
Perform an integrity check for a given object.
Verify that the object file is in place and that its content matches the object id.
- Parameters
obj_id (bytes) – object identifier.
- Raises
ObjNotFoundError – if the requested object is missing.
Error – if the requested object is corrupted.
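An integrity check along these lines could be written as follows: the object must exist, its gzip stream must decode, and the SHA1 of the decompressed bytes must equal the ID. The exception classes are local stand-ins for the ObjNotFoundError and Error listed above, object_path is hypothetical, and "sha1" is an assumption about ID_HASH_ALGO.

```python
import gzip
import hashlib
import os
import tempfile
import zlib
from typing import Sequence, Tuple

Bounds = Sequence[Tuple[int, int]]


class ObjNotFoundError(KeyError):
    """Local stand-in for the missing-object exception."""


class Error(Exception):
    """Local stand-in for the corruption error."""


def object_path(root: str, bounds: Bounds, hex_id: str) -> str:
    return os.path.join(root, *(hex_id[s:e] for s, e in bounds), hex_id)


def check(root: str, bounds: Bounds, obj_id: bytes) -> None:
    hex_id = obj_id.hex()
    path = object_path(root, bounds, hex_id)
    if not os.path.isfile(path):
        raise ObjNotFoundError(obj_id)
    digest = hashlib.sha1()  # assumption: ID_HASH_ALGO is "sha1"
    try:
        with gzip.open(path, "rb") as f:
            # Hash the decompressed content incrementally.
            for chunk in iter(lambda: f.read(65536), b""):
                digest.update(chunk)
    except (OSError, EOFError, zlib.error) as exc:  # undecodable gzip data
        raise Error(f"object {hex_id} is corrupted: {exc}") from exc
    if digest.digest() != obj_id:
        raise Error(f"object {hex_id} is corrupted: checksum mismatch")


root = tempfile.mkdtemp()
bounds = [(0, 2), (2, 4), (4, 6)]
good_id = hashlib.sha1(b"intact").digest()
path = object_path(root, bounds, good_id.hex())
os.makedirs(os.path.dirname(path))
with gzip.open(path, "wb") as f:
    f.write(b"intact")
check(root, bounds, good_id)  # passes silently
```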
-
delete(obj_id)
Delete an object.
- Parameters
obj_id (bytes) – object identifier.
- Raises
ObjNotFoundError – if the requested object is missing.
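A minimal sketch of delete(), translating a missing file into the documented ObjNotFoundError (here a local stand-in class). object_path is hypothetical; this version leaves any now-empty slice directories in place, which may or may not match the real backend.

```python
import os
import tempfile
from typing import Sequence, Tuple

Bounds = Sequence[Tuple[int, int]]


class ObjNotFoundError(KeyError):
    """Local stand-in for swh.objstorage's ObjNotFoundError."""


def object_path(root: str, bounds: Bounds, hex_id: str) -> str:
    return os.path.join(root, *(hex_id[s:e] for s, e in bounds), hex_id)


def delete(root: str, bounds: Bounds, obj_id: bytes) -> None:
    path = object_path(root, bounds, obj_id.hex())
    try:
        os.remove(path)
    except FileNotFoundError:
        raise ObjNotFoundError(obj_id) from None


# Usage: create an (empty) object file, then delete it.
root = tempfile.mkdtemp()
bounds = [(0, 2), (2, 4), (4, 6)]
oid = bytes.fromhex("34973274ccef6ab4dfaaf86599792fa9c3fe4689")
path = object_path(root, bounds, oid.hex())
os.makedirs(os.path.dirname(path))
open(path, "wb").close()
delete(root, bounds, oid)
print(os.path.exists(path))
```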
-
get_random(batch_size)
Get random ids of existing contents.
This method is used to pick random ids on which to perform content integrity verifications.
- Parameters
batch_size (int) – number of ids to return.
- Yields
ids (bytes) of contents present in the current object storage.
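One naive way to produce such random ids is to walk the whole tree and sample from the file names, as sketched below. It assumes object files are named by 40-character hexadecimal SHA1 ids and ignores anything else. Scanning every file costs time proportional to the size of the store, so a real implementation would likely sample more cheaply (for example, from a randomly chosen subdirectory).

```python
import os
import random
import string
import tempfile
from typing import Iterator

HEX_DIGITS = set(string.hexdigits)


def _is_object_name(name: str) -> bool:
    # Assumption: object files are named by a 40-character hex SHA1.
    return len(name) == 40 and set(name) <= HEX_DIGITS


def get_random(root: str, batch_size: int) -> Iterator[bytes]:
    """Yield up to batch_size distinct ids chosen uniformly from the store."""
    ids = [
        bytes.fromhex(name)
        for _dirpath, _dirnames, filenames in os.walk(root)
        for name in filenames
        if _is_object_name(name)
    ]
    yield from random.sample(ids, min(batch_size, len(ids)))


# Usage: create five empty object files under a one-level slicing.
root = tempfile.mkdtemp()
stored = set()
for _ in range(5):
    oid = os.urandom(20)
    directory = os.path.join(root, oid.hex()[:2])
    os.makedirs(directory, exist_ok=True)
    open(os.path.join(directory, oid.hex()), "wb").close()
    stored.add(oid)
sample = list(get_random(root, 3))
print([oid.hex() for oid in sample])
```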
-
add_stream(content_iter, obj_id, check_presence=True)
Add a new object to the object storage using streaming.
This function is identical to add() except it takes a generator that yields the chunked content instead of the whole content at once.
- Parameters
content_iter – generator yielding chunks (bytes) of the object’s raw content to add to the storage.
obj_id (bytes) – object identifier.
check_presence (bool) – whether to check if the content is already present before adding the file.
- Returns
the id (bytes) of the object in the storage.
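A streaming variant differs from add() mainly in passing each chunk to the compressor as it arrives, so the whole object never has to sit in memory. In this hypothetical sketch, as in the documented signature, obj_id is mandatory because the content is not available up front to hash; object_path, the temporary-file-then-rename strategy, and gzip compression are assumptions.

```python
import gzip
import hashlib
import os
import tempfile
from typing import Iterable, Sequence, Tuple

Bounds = Sequence[Tuple[int, int]]


def object_path(root: str, bounds: Bounds, hex_id: str) -> str:
    return os.path.join(root, *(hex_id[s:e] for s, e in bounds), hex_id)


def add_stream(root: str, bounds: Bounds, content_iter: Iterable[bytes],
               obj_id: bytes, check_presence: bool = True) -> bytes:
    path = object_path(root, bounds, obj_id.hex())
    if check_presence and os.path.exists(path):
        return obj_id  # already stored; the iterator is not consumed
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        with gzip.open(tmp_path, "wb") as f:
            for chunk in content_iter:  # only one chunk in memory at a time
                f.write(chunk)
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise
    return obj_id


def chunks():
    yield b"hello "
    yield b"streaming "
    yield b"world"


root = tempfile.mkdtemp()
bounds = [(0, 2), (2, 4), (4, 6)]
oid = hashlib.sha1(b"hello streaming world").digest()  # assumption: sha1 ids
add_stream(root, bounds, chunks(), oid)
```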
-
get_stream(obj_id, chunk_size=2097152)
Retrieve the content of a given object as a chunked iterator.
- Parameters
obj_id (bytes) – object id.
chunk_size (int) – maximum size, in bytes, of each chunk (default 2097152, i.e. 2 MiB).
- Returns
an iterator yielding the content of the requested object as chunks of bytes.
- Raises
ObjNotFoundError – if the requested object is missing.
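get_stream() can be sketched as a generator that opens the gzip file and reads at most chunk_size decompressed bytes at a time. ObjNotFoundError and object_path are local, hypothetical stand-ins, and gzip storage is assumed. Because this sketch is a generator, a missing object is only reported when iteration starts, not when get_stream() is called.

```python
import gzip
import os
import tempfile
from typing import Iterator, Sequence, Tuple

Bounds = Sequence[Tuple[int, int]]


class ObjNotFoundError(KeyError):
    """Local stand-in for swh.objstorage's ObjNotFoundError."""


def object_path(root: str, bounds: Bounds, hex_id: str) -> str:
    return os.path.join(root, *(hex_id[s:e] for s, e in bounds), hex_id)


def get_stream(root: str, bounds: Bounds, obj_id: bytes,
               chunk_size: int = 2097152) -> Iterator[bytes]:
    path = object_path(root, bounds, obj_id.hex())
    try:
        f = gzip.open(path, "rb")
    except FileNotFoundError:
        raise ObjNotFoundError(obj_id) from None
    with f:
        while True:
            chunk = f.read(chunk_size)  # at most chunk_size decompressed bytes
            if not chunk:
                break
            yield chunk


# Usage: store ten bytes, then stream them back in 4-byte chunks.
root = tempfile.mkdtemp()
bounds = [(0, 2), (2, 4), (4, 6)]
oid = bytes.fromhex("34973274ccef6ab4dfaaf86599792fa9c3fe4689")
path = object_path(root, bounds, oid.hex())
os.makedirs(os.path.dirname(path))
content = b"0123456789"
with gzip.open(path, "wb") as out:
    out.write(content)
chunks = list(get_stream(root, bounds, oid, chunk_size=4))
print(chunks)
```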