- class swh.objstorage.backends.pathslicing.PathSlicingObjStorage(root, slicing, compression='gzip', **kwargs)
Implementation of the ObjStorage API based on the hash of the content.
On disk, an object storage is a directory tree containing files named after their object IDs. An object ID is a checksum of its content, depending on the value of the ID_HASH_ALGO constant (see swh.model.hashutil for its meaning).
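As a minimal sketch of how an object ID is derived from content, the snippet below assumes that ID_HASH_ALGO is "sha1", which matches the SHA1 example used in this section. The `compute_obj_id` helper is hypothetical and is not part of swh.objstorage.

```python
import hashlib

# Assumption: ID_HASH_ALGO is "sha1", matching the SHA1 example in this section.
ID_HASH_ALGO = "sha1"


def compute_obj_id(content: bytes) -> bytes:
    """Hypothetical helper: return the raw (binary) checksum of `content`."""
    return hashlib.new(ID_HASH_ALGO, content).digest()


obj_id = compute_obj_id(b"hello")
# The on-disk file name is the hexadecimal form of the binary id.
print(obj_id.hex())
```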
To avoid directories that contain too many files, the object storage applies a slicing to the hash: each slice selects a range of hexadecimal digits of the object ID, and the digits in that range name one level of subdirectory.
For instance, a file with SHA1 34973274ccef6ab4dfaaf86599792fa9c3fe4689 is stored at the following paths, depending on the slicing:
0:2/2:4/4:6 : 34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689
0:1/0:5/ : 3/34973/34973274ccef6ab4dfaaf86599792fa9c3fe4689
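The path computation in the two examples above can be sketched as follows. The helper names `parse_slicing` and `object_path` are hypothetical; the sketch only assumes what the examples show: slices are `start:end` ranges over the hex id separated by `/`, empty components (such as the trailing `/` in `0:1/0:5/`) are ignored, and the full hex id is the file name.

```python
from typing import List, Tuple


def parse_slicing(slicing: str) -> List[Tuple[int, int]]:
    """Turn a spec such as '0:2/2:4/4:6' into [(0, 2), (2, 4), (4, 6)].

    Empty components, e.g. from a trailing '/', are skipped.
    """
    bounds = []
    for part in slicing.split("/"):
        if not part:
            continue
        start, end = part.split(":")
        bounds.append((int(start), int(end)))
    return bounds


def object_path(hex_id: str, slicing: str) -> str:
    """Relative path of an object: one directory per slice, then the full hex id."""
    dirs = [hex_id[start:end] for start, end in parse_slicing(slicing)]
    return "/".join(dirs + [hex_id])


hex_id = "34973274ccef6ab4dfaaf86599792fa9c3fe4689"
print(object_path(hex_id, "0:2/2:4/4:6"))
print(object_path(hex_id, "0:1/0:5/"))
```

Note that slices may overlap (`0:1` and `0:5` both start at 0); each slice is taken independently from the hex id.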
By default, the files in the storage are stored gzip-compressed (see the compression parameter).
Attributes: root – path to the root directory of the storage on the disk. The slicing is kept as a list of tuples that indicate the beginning and the end of each subdirectory level within a content's hash.
Create an object to access a hash-slicing based object storage.
root (string) – path to the root directory of the storage on the disk.
slicing (string) – string that specifies how the hash of the content is sliced to determine the path where it is stored (for example 0:2/2:4/4:6).
- check_config(*, check_write)
Check whether this object storage is properly configured.
- add(content, obj_id=None, check_presence=True)
Add a new object to the object storage.
content (bytes) – object’s raw content to add in storage.
obj_id (bytes) – checksum of content computed with the ID_HASH_ALGO algorithm. When given, obj_id is trusted to match the content; if missing, it is computed on the fly.
check_presence (bool) – whether to check if the content is already present before adding the file.
Returns the id (bytes) of the object in the storage.
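The behaviour described for add() can be sketched as a standalone function. This is not the library's implementation: it assumes SHA1 as ID_HASH_ALGO and gzip compression, takes the root and the parsed slicing bounds as explicit arguments instead of reading them from the object, and writes through a temporary file that is renamed into place, which is one plausible way to keep readers from seeing half-written objects.

```python
import gzip
import hashlib
import os
import tempfile
from typing import List, Optional, Tuple


def add(root: str, bounds: List[Tuple[int, int]], content: bytes,
        obj_id: Optional[bytes] = None, check_presence: bool = True) -> bytes:
    """Hypothetical sketch of PathSlicingObjStorage.add()."""
    if obj_id is None:
        # Assumption: ID_HASH_ALGO is SHA1.
        obj_id = hashlib.sha1(content).digest()
    hex_id = obj_id.hex()
    path = os.path.join(root, *[hex_id[s:e] for s, e in bounds], hex_id)
    if check_presence and os.path.exists(path):
        return obj_id  # already stored, nothing to write
    directory = os.path.dirname(path)
    os.makedirs(directory, exist_ok=True)
    # Write to a temporary file in the same directory, then rename it into
    # place so the final path only ever holds a complete gzip stream.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as raw, gzip.GzipFile(fileobj=raw, mode="wb") as gz:
            gz.write(content)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return obj_id
```

Creating the temporary file in the destination directory keeps the final `os.replace` on a single filesystem, where the rename is atomic on POSIX systems.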
Retrieve the content of a given object.
Perform an integrity check for a given object.
Verify that the file object is in place and that the content matches the object id.
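The integrity check described above can be sketched as: the file must exist, must decompress cleanly, and its decompressed bytes must hash back to the id. The exception classes here are hypothetical stand-ins (the real library defines its own), and SHA1 plus gzip are assumed.

```python
import gzip
import hashlib
import os


class MissingObject(Exception):
    """Hypothetical: no file exists for the requested id."""


class CorruptObject(Exception):
    """Hypothetical: the stored data is unreadable or does not hash to the id."""


def check(path: str, obj_id: bytes) -> None:
    """Hypothetical sketch of an integrity check on one stored object."""
    if not os.path.exists(path):
        raise MissingObject(obj_id.hex())
    try:
        with gzip.open(path, "rb") as f:
            content = f.read()
    except (OSError, EOFError) as exc:
        # Invalid or truncated gzip data (gzip.BadGzipFile subclasses OSError).
        raise CorruptObject(obj_id.hex()) from exc
    # Assumption: ID_HASH_ALGO is SHA1.
    if hashlib.sha1(content).digest() != obj_id:
        raise CorruptObject(obj_id.hex())
```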
Delete an object.
Get random ids of existing contents.
This method is used to pick random contents on which to perform integrity verifications.
batch_size (int) – number of ids to return.
Returns an iterable of ids (bytes) of contents present in the current object storage.
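A deliberately simple sketch of get_random(): collect every file name under the root that parses as a hex id, then sample without replacement. This is an assumption about the behaviour, not the library's approach; walking the whole tree is fine for small stores but becomes expensive for large ones, where sampling random subdirectories first would scale better.

```python
import os
import random
from typing import Iterator, List


def get_random(root: str, batch_size: int) -> Iterator[bytes]:
    """Hypothetical sketch: yield up to batch_size distinct random object ids."""
    ids: List[bytes] = []
    for _dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                ids.append(bytes.fromhex(name))
            except ValueError:
                continue  # skip temporary or unrelated files
    yield from random.sample(ids, min(batch_size, len(ids)))
```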
- add_stream(content_iter, obj_id, check_presence=True)
Add a new object to the object storage using streaming.
This function is identical to add() except that it takes a generator that yields the content in chunks instead of the whole content at once.
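A sketch of the streaming variant, under the same assumptions as a plain add (gzip compression, root and parsed bounds passed explicitly). Unlike add(), obj_id is a required argument here; a plausible reason is that the destination path depends on the id, and a generator can only be consumed once. This sketch trusts obj_id as given and does not verify it; the `.tmp` suffix is a hypothetical naming choice.

```python
import gzip
import os
from typing import Iterable, List, Tuple


def add_stream(root: str, bounds: List[Tuple[int, int]],
               content_iter: Iterable[bytes], obj_id: bytes,
               check_presence: bool = True) -> bytes:
    """Hypothetical sketch of PathSlicingObjStorage.add_stream()."""
    hex_id = obj_id.hex()
    path = os.path.join(root, *[hex_id[s:e] for s, e in bounds], hex_id)
    if check_presence and os.path.exists(path):
        return obj_id
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = path + ".tmp"  # hypothetical temporary name
    with gzip.open(tmp, "wb") as gz:
        # Only the current chunk is held in memory, never the whole content.
        for chunk in content_iter:
            gz.write(chunk)
    os.replace(tmp, path)
    return obj_id
```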
- get_stream(obj_id, chunk_size=2097152)
Retrieve the content of a given object as a chunked iterator.
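Reading back as a chunked iterator can be sketched by decompressing the stored file incrementally. The default chunk_size of 2097152 bytes is 2 MiB. The function takes a file path directly, which is a simplification of the real method that takes obj_id; gzip compression is assumed.

```python
import gzip
from typing import Iterator


def get_stream(path: str, chunk_size: int = 2097152) -> Iterator[bytes]:
    """Hypothetical sketch: yield decompressed content in chunks of at most chunk_size bytes."""
    with gzip.open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            yield chunk
```

Because decompression happens chunk by chunk, memory use stays bounded by chunk_size rather than by the object's size.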
- list_content(last_obj_id=None, limit=10000)
Generate known object ids.
Yields: obj_id (bytes) – the known object ids.
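A sketch of list_content() for paging through the storage. The semantics of the two parameters are assumptions, since they are not documented here: ids are produced in ascending order, last_obj_id is treated as an exclusive starting point, and limit caps how many ids are produced (None meaning no cap). The sketch also scans the whole tree, which a real implementation would likely avoid.

```python
import os
from typing import Iterator, Optional


def _as_obj_id(name: str) -> Optional[bytes]:
    """Return the binary id for a hex file name, or None for other files."""
    try:
        return bytes.fromhex(name)
    except ValueError:
        return None


def list_content(root: str, last_obj_id: Optional[bytes] = None,
                 limit: Optional[int] = 10000) -> Iterator[bytes]:
    """Hypothetical sketch: yield stored ids in ascending order after last_obj_id."""
    ids = sorted(
        obj_id
        for _dirpath, _dirnames, filenames in os.walk(root)
        for obj_id in map(_as_obj_id, filenames)
        if obj_id is not None
    )
    if last_obj_id is not None:
        # Assumption: last_obj_id itself is excluded from the results.
        ids = [i for i in ids if i > last_obj_id]
    yield from ids[:limit]  # ids[:None] returns every id
```

Passing the last id of one page as last_obj_id for the next call lets a caller walk the storage page by page.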
- iter_from(obj_id, n_leaf=False)