swh.objstorage.backends.pathslicing module

swh.objstorage.backends.pathslicing.is_valid_filename(filename: str, algo: Literal['sha1', 'sha256'] = 'sha1')

Checks that the file name is a valid hexdigest for the given algo.
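As a rough illustration of what such a check involves, the sketch below tests the length and the character set of a name. The function `looks_like_hexdigest` and the `HEXDIGEST_LENGTH` table are hypothetical stand-ins written for this example; they are not the module's implementation, and details such as whether uppercase digits are accepted are assumptions.

```python
import string

# Hex digest lengths for the two algorithms named in the signature above.
HEXDIGEST_LENGTH = {"sha1": 40, "sha256": 64}


def looks_like_hexdigest(filename: str, algo: str = "sha1") -> bool:
    """Hypothetical stand-in for is_valid_filename, for illustration only."""
    expected = HEXDIGEST_LENGTH[algo]
    # Assumption: both lowercase and uppercase hex digits are accepted here.
    return len(filename) == expected and all(
        c in string.hexdigits for c in filename
    )
```

A 40-character hexadecimal string passes for `sha1`, while a truncated or non-hexadecimal name does not.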

class swh.objstorage.backends.pathslicing.PathSlicer(root: str, slicing: str)

Bases: object

Helper class to compute a path based on a hash.

Used to compute a directory path based on the object hash according to a given slicing. Each slice corresponds to a directory that is named after that part of the hash of the object's content.

For instance a file with SHA1 34973274ccef6ab4dfaaf86599792fa9c3fe4689 will have the following computed path:

  • 0:2/2:4/4:6 : 34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689

  • 0:1/0:5/ : 3/34973/34973274ccef6ab4dfaaf86599792fa9c3fe4689
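The mapping from a slicing string to a path can be sketched in a few lines. The helpers `parse_slicing` and `path_components` below are hypothetical names introduced for this example, not part of the module; the sketch assumes each slash-separated element is a Python-style `start:stop` slice applied to the hex digest.

```python
from typing import List


def parse_slicing(slicing: str) -> List[slice]:
    """Turn a spec such as "0:2/2:4/4:6" into a list of slice objects."""
    bounds = []
    for part in slicing.strip("/").split("/"):
        if not part:
            continue  # tolerate an empty spec or a trailing slash
        start, stop = part.split(":")
        bounds.append(
            slice(int(start) if start else None, int(stop) if stop else None)
        )
    return bounds


def path_components(hex_obj_id: str, slicing: str) -> List[str]:
    """Directory names derived from the hash, followed by the full hash."""
    return [hex_obj_id[b] for b in parse_slicing(slicing)] + [hex_obj_id]


sha1 = "34973274ccef6ab4dfaaf86599792fa9c3fe4689"
print("/".join(path_components(sha1, "0:2/2:4/4:6")))
print("/".join(path_components(sha1, "0:1/0:5/")))
```

The two printed lines reproduce the relative paths listed above for each slicing.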

Parameters:
  • root (str) – path to the root directory of the storage on the disk.

  • slicing (str) – the slicing configuration.

check_config()

Check the slicing configuration is valid.

Raises:

ValueError – if the slicing configuration is invalid.
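The page does not say which conditions make a slicing invalid. One plausible check, shown here purely as an assumption, is that no slice bound may exceed the length of the hex digest. The function `validate_slicing` is a hypothetical helper, not the method's actual code.

```python
def validate_slicing(slicing: str, hexdigest_length: int = 40) -> None:
    """Raise ValueError if the spec is malformed or reaches past the digest.

    Assumed rule for illustration: every element is "start:stop" and no
    bound is larger than the hex digest length (40 for SHA1).
    """
    for part in slicing.strip("/").split("/"):
        if not part:
            continue
        start, sep, stop = part.partition(":")
        if not sep:
            raise ValueError(f"slice {part!r} is not of the form start:stop")
        for bound in (start, stop):
            # int() itself raises ValueError on non-numeric bounds.
            if bound and int(bound) > hexdigest_length:
                raise ValueError(
                    f"slice {part!r} exceeds digest length {hexdigest_length}"
                )
```

With this rule, `"0:2/2:4/4:6"` passes silently, while `"0:2/38:42"` raises `ValueError` for a SHA1 digest.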

get_directory(hex_obj_id: str) → str

Compute the storage directory of an object.

See also: PathSlicer::get_path

Parameters:

hex_obj_id – object id as hexlified string.

Returns:

Absolute path (including root) to the directory that contains the given object id.

get_path(hex_obj_id: str) → str

Compute the full path to an object in the current storage.

See also: PathSlicer::get_directory

Parameters:

hex_obj_id (str) – object id as hexlified string.

Returns:

Absolute path (including root) to the object corresponding to the given object id.
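The difference between the directory and the full path of an object is only the final file name. The sketch below makes that explicit; `directory_for` and `path_for` are hypothetical helpers, the root `/srv/objects` is an invented example value, and the slice directory names are passed in already computed.

```python
import os
from typing import List


def directory_for(root: str, slices: List[str]) -> str:
    """Join the root with the slice directory names only."""
    return os.path.join(root, *slices)


def path_for(root: str, slices: List[str], hex_obj_id: str) -> str:
    """Same directory, with the full hex id appended as the file name."""
    return os.path.join(directory_for(root, slices), hex_obj_id)


hex_id = "34973274ccef6ab4dfaaf86599792fa9c3fe4689"
slices = ["34", "97", "32"]  # what a 0:2/2:4/4:6 slicing would produce
directory = directory_for("/srv/objects", slices)  # hypothetical root
full_path = path_for("/srv/objects", slices, hex_id)
```

Under these assumptions, `os.path.dirname(full_path)` equals `directory`, and `os.path.basename(full_path)` is the hex object id.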

get_slices(hex_obj_id: str) → List[str]

Compute the path elements for the given hash.

Parameters:

hex_obj_id (str) – object id as hexlified string.

Returns:

Relative path to the actual object corresponding to the given id as a list.

class swh.objstorage.backends.pathslicing.PathSlicingObjStorage(*, root: str = '', compression: Literal['bz2', 'lzma', 'gzip', 'zlib', 'none'] = 'gzip', slicing: str = '', **kwargs)

Bases: ObjStorage

Implementation of the ObjStorage API based on the hash of the content.

On disk, an object storage is a directory tree containing files named after their object IDs. An object ID is a checksum of its content, depending on the value of the ID_HASH_ALGO constant (see swh.model.hashutil for its meaning).

To avoid directories that contain too many files, the object storage has a given slicing. Each slice corresponds to a directory that is named after that part of the hash of the object's content.

For instance, a file with SHA1 34973274ccef6ab4dfaaf86599792fa9c3fe4689 will be stored at the following paths, depending on the slicing:

  • 0:2/2:4/4:6 : 34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689

  • 0:1/0:5/ : 3/34973/34973274ccef6ab4dfaaf86599792fa9c3fe4689

By default, the files in the storage are gzip-compressed; the compression constructor argument selects another format ('bz2', 'lzma', 'zlib') or 'none'.

Parameters:
  • root (str) – path to the root directory of the storage on the disk.

  • slicing (str) – string that indicates the slicing to perform on the hash of the content to know the path where it should be stored (see the documentation of the PathSlicer class).
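To make the on-disk layout concrete, here is a minimal sketch that writes one object under a temporary root, using a fixed 0:2/2:4/4:6 slicing and gzip compression. The `store` function is a hypothetical helper for this example; the real backend may differ in important ways (atomic writes, permissions, other compression formats).

```python
import gzip
import hashlib
import os
import tempfile


def store(root: str, content: bytes) -> str:
    """Write content gzip-compressed under root, sliced as 0:2/2:4/4:6."""
    hex_id = hashlib.sha1(content).hexdigest()
    directory = os.path.join(root, hex_id[0:2], hex_id[2:4], hex_id[4:6])
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, hex_id)
    with gzip.open(path, "wb") as f:
        f.write(content)
    return path


with tempfile.TemporaryDirectory() as root:
    path = store(root, b"hello world")
    relative = os.path.relpath(path, root)
    # Reading the file back through gzip returns the original bytes.
    with gzip.open(path, "rb") as f:
        restored = f.read()
```

Here `relative` consists of three two-character directories followed by the full SHA1 hex digest as the file name.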

PRIMARY_HASH: Literal['sha1', 'sha256'] = 'sha1'#
name: str = 'pathslicing'#

Default objstorage name; can be overridden at instantiation time by passing a ‘name’ argument to the constructor.

check_config(*, check_write)

Check whether this object storage is properly configured.

add(content: bytes, obj_id: ObjId, check_presence: bool = True) → None

Add a new object to the object storage.

Parameters:
  • content – object’s raw content to add in storage.

  • obj_id – either a dict of checksums, or a single checksum of content computed with the ID_HASH_ALGO algorithm. It is trusted to match the content.

  • check_presence (bool) – indicate if the presence of the content should be verified before adding the file.

Returns:

the id (bytes) of the object in the storage.

get(obj_id: ObjId) → bytes

Retrieve the content of a given object.

Parameters:

obj_id – object id.

Returns:

the content of the requested object as bytes.

Raises:

ObjNotFoundError – if the requested object is missing.

delete(obj_id: ObjId)

Delete an object.

Parameters:

obj_id – object identifier.

Raises:

ObjNotFoundError – if the requested object is missing.
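The contract of add, get and delete described above can be illustrated with a toy dict-backed store. Everything in this sketch is a stand-in: `InMemoryStore` is hypothetical, `ObjNotFoundError` is redefined locally rather than imported from the library, and object ids are plain bytes for simplicity.

```python
from typing import Dict


class ObjNotFoundError(Exception):
    """Local stand-in for the library's exception of the same name."""


class InMemoryStore:
    """Toy dict-backed store mirroring the add/get/delete contract."""

    def __init__(self) -> None:
        self._objects: Dict[bytes, bytes] = {}

    def add(
        self, content: bytes, obj_id: bytes, check_presence: bool = True
    ) -> None:
        # With check_presence, an id that is already stored is left untouched.
        if check_presence and obj_id in self._objects:
            return
        self._objects[obj_id] = content

    def get(self, obj_id: bytes) -> bytes:
        try:
            return self._objects[obj_id]
        except KeyError:
            raise ObjNotFoundError(obj_id) from None

    def delete(self, obj_id: bytes) -> None:
        if obj_id not in self._objects:
            raise ObjNotFoundError(obj_id)
        del self._objects[obj_id]
```

Callers that cannot be sure an object exists would wrap `get` and `delete` in a `try`/`except ObjNotFoundError` block.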

list_content(last_obj_id: ObjId | None = None, limit: int | None = 10000) → Iterator[ObjId]

Generates known object ids.

Parameters:
  • last_obj_id – object id to iterate from (excluded).

  • limit (int) – max number of object ids to generate. If unset (None), generate all objects (behavior might not be guaranteed for all backends).

Generates:

obj_id: object ids.
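The combination of an exclusive last_obj_id and a limit lends itself to paging. The sketch below shows the idea over a plain sorted list; `list_ids` is a hypothetical function, and the assumption that ids are produced in sorted order is made for this example only.

```python
from bisect import bisect_right
from itertools import islice
from typing import Iterator, List, Optional


def list_ids(
    sorted_ids: List[bytes],
    last_obj_id: Optional[bytes] = None,
    limit: Optional[int] = 10000,
) -> Iterator[bytes]:
    """Yield ids strictly after last_obj_id, at most limit of them."""
    # bisect_right lands just past last_obj_id, so that id is excluded.
    start = 0 if last_obj_id is None else bisect_right(sorted_ids, last_obj_id)
    # islice with a stop of None yields everything that remains.
    yield from islice(sorted_ids[start:], limit)


ids = [bytes([i]) for i in range(5)]
first_page = list(list_ids(ids, limit=2))
second_page = list(list_ids(ids, last_obj_id=first_page[-1], limit=2))
everything = list(list_ids(ids, limit=None))
```

Passing the last id of one page as last_obj_id for the next call walks the whole collection without repeating an id.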

iter_from(obj_id, n_leaf=False)