swh.objstorage.backends.pathslicing module

class swh.objstorage.backends.pathslicing.PathSlicer(root: str, slicing: str)

Bases: object

Helper class to compute a path based on a hash.

Used to compute a directory path from the object hash according to a given slicing. Each slice corresponds to one directory level, named after the matching substring of the object's hash.

For instance, a file with SHA1 34973274ccef6ab4dfaaf86599792fa9c3fe4689 will have one of the following computed paths, depending on the slicing:

  • 0:2/2:4/4:6 : 34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689

  • 0:1/0:5/ : 3/34973/34973274ccef6ab4dfaaf86599792fa9c3fe4689

Args:

  • root (str) – path to the root directory of the storage on the disk.

  • slicing (str) – the slicing configuration.
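The slicing examples above can be reproduced with a few lines of plain Python. This is a minimal sketch, not the library's implementation: the helper names parse_slicing and slice_hash are hypothetical, and the sketch assumes that each "start:stop" element behaves like a Python slice applied to the hexadecimal hash and that empty elements (such as the one after a trailing "/") are ignored.

```python
from typing import List


def parse_slicing(slicing: str) -> List[slice]:
    """Turn a spec such as "0:2/2:4/4:6" into a list of slice objects."""
    bounds = []
    for part in slicing.split("/"):
        if not part:  # tolerate a trailing "/" as in "0:1/0:5/"
            continue
        start, _, stop = part.partition(":")
        bounds.append(
            slice(int(start) if start else None, int(stop) if stop else None)
        )
    return bounds


def slice_hash(hex_obj_id: str, slicing: str) -> List[str]:
    """Return the directory components followed by the full hash."""
    return [hex_obj_id[b] for b in parse_slicing(slicing)] + [hex_obj_id]


sha1 = "34973274ccef6ab4dfaaf86599792fa9c3fe4689"
print("/".join(slice_hash(sha1, "0:2/2:4/4:6")))
print("/".join(slice_hash(sha1, "0:1/0:5/")))
```

The two printed lines correspond to the two bullet examples given for this class.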

check_config()

Check that the slicing configuration is valid.

Raises:

ValueError – if the slicing configuration is invalid.
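The documentation does not spell out what makes a slicing invalid. As an illustration only, the sketch below rejects malformed elements and bounds that fall outside the hash. The function name check_slicing, the 40-character SHA1 length, and the exact rules (including the rejection of negative bounds) are assumptions of this sketch, not the library's documented behaviour.

```python
HEX_LENGTH = 40  # length of a SHA1 hex digest; an assumption of this sketch


def check_slicing(slicing: str, hex_length: int = HEX_LENGTH) -> None:
    """Raise ValueError if an element is malformed or exceeds the hash length."""
    for part in slicing.split("/"):
        if not part:  # ignore empty elements such as a trailing "/"
            continue
        pieces = part.split(":")
        if len(pieces) != 2:
            raise ValueError(f"malformed slice {part!r} in {slicing!r}")
        try:
            bounds = [int(p) for p in pieces if p]
        except ValueError:
            raise ValueError(f"non-integer bound in {part!r}") from None
        if any(b < 0 or b > hex_length for b in bounds):
            raise ValueError(
                f"slice {part!r} is outside a {hex_length}-character hash"
            )


check_slicing("0:2/2:4/4:6")  # valid: returns None without raising
```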

get_directory(hex_obj_id: str) -> str

Compute the storage directory of an object.

See also: PathSlicer::get_path

Parameters:

hex_obj_id (str) – object id as a hexadecimal string.

Returns:

Absolute path (including root) to the directory that contains the given object id.

get_path(hex_obj_id: str) -> str

Compute the full path to an object in the current storage.

See also: PathSlicer::get_directory

Parameters:

hex_obj_id (str) – object id as a hexadecimal string.

Returns:

Absolute path (including root) to the object corresponding to the given object id.

get_slices(hex_obj_id: str) -> List[str]

Compute the path elements for the given hash.

Parameters:

hex_obj_id (str) – object id as a hexadecimal string.

Returns:

The path elements of the object with the given id, relative to the storage root, as a list of strings.
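The three methods above are closely related: the list of path elements, joined under the root, gives the object path, and the parent of that path is the object's directory. The sketch below shows that relationship with os.path only. The function names object_path and object_directory and the root /srv/objects are hypothetical, and the slices list is written out by hand for the 0:2/2:4/4:6 slicing rather than obtained from the library.

```python
import os
from typing import List


def object_path(root: str, slices: List[str]) -> str:
    """Join the root and the path elements into the full object path."""
    return os.path.join(root, *slices)


def object_directory(root: str, slices: List[str]) -> str:
    """Return the directory that contains the object."""
    return os.path.dirname(object_path(root, slices))


sha1 = "34973274ccef6ab4dfaaf86599792fa9c3fe4689"
slices = ["34", "97", "32", sha1]  # elements for the 0:2/2:4/4:6 slicing

path = object_path("/srv/objects", slices)
directory = object_directory("/srv/objects", slices)
print(path)
print(directory)
```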

class swh.objstorage.backends.pathslicing.PathSlicingObjStorage(root, slicing, compression='gzip', **kwargs)

Bases: ObjStorage

Implementation of the ObjStorage API based on the hash of the content.

On disk, an object storage is a directory tree containing files named after their object IDs. An object ID is a checksum of the object's content, computed with the algorithm given by the ID_HASH_ALGO constant (see swh.model.hashutil for its meaning).

To avoid directories that contain too many files, the object storage has a given slicing. Each slice corresponds to one directory level, named after the matching substring of the object's hash.

For instance, a file with SHA1 34973274ccef6ab4dfaaf86599792fa9c3fe4689 will be stored at the following locations, depending on the slicing of the object storage:

  • 0:2/2:4/4:6 : 34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689

  • 0:1/0:5/ : 3/34973/34973274ccef6ab4dfaaf86599792fa9c3fe4689

By default, the files in the storage are stored gzip-compressed (see the compression argument in the class signature).

Parameters:
  • root (str) – path to the root directory of the storage on the disk.

  • slicing (str) – string that indicates the slicing to perform on the hash of the content to know the path where it should be stored (see the documentation of the PathSlicer class).
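To make the on-disk layout described above concrete, here is a toy round trip that writes a gzip-compressed file under a sliced path and reads it back. It is a sketch under stated assumptions, not the backend's code: the names sliced_path, toy_add and toy_get are hypothetical, the slicing is hard-coded to 0:2/2:4/4:6, and the object id is taken to be the plain SHA1 of the content.

```python
import gzip
import hashlib
import os
import tempfile


def sliced_path(root: str, hex_obj_id: str) -> str:
    """Path for the hard-coded 0:2/2:4/4:6 slicing (assumption of this sketch)."""
    return os.path.join(
        root, hex_obj_id[0:2], hex_obj_id[2:4], hex_obj_id[4:6], hex_obj_id
    )


def toy_add(root: str, content: bytes) -> str:
    """Store content gzip-compressed and return its hexadecimal SHA1."""
    hex_obj_id = hashlib.sha1(content).hexdigest()
    path = sliced_path(root, hex_obj_id)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with gzip.open(path, "wb") as f:
        f.write(content)
    return hex_obj_id


def toy_get(root: str, hex_obj_id: str) -> bytes:
    """Read back and decompress the content stored under the given id."""
    with gzip.open(sliced_path(root, hex_obj_id), "rb") as f:
        return f.read()


with tempfile.TemporaryDirectory() as root:
    obj_id = toy_add(root, b"hello world")
    stored_at = os.path.relpath(sliced_path(root, obj_id), root)
    roundtrip = toy_get(root, obj_id)

print(stored_at)
```

The real backend is used through the add and get methods listed below; this sketch only mirrors the directory layout and the compression step.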

PRIMARY_HASH: Literal['sha1'] = 'sha1'

check_config(*, check_write)

Check whether this object storage is properly configured.

add(content: bytes, obj_id: bytes | CompositeObjId, check_presence: bool = True) -> None
get(obj_id: bytes | CompositeObjId) -> bytes
check(obj_id: bytes | CompositeObjId) -> None
delete(obj_id: bytes | CompositeObjId)
chunk_writer(obj_id)
list_content(last_obj_id: bytes | CompositeObjId | None = None, limit: int | None = 10000) -> Iterator[CompositeObjId]
iter_from(obj_id, n_leaf=False)