swh.objstorage.backends.pathslicing module
- swh.objstorage.backends.pathslicing.is_valid_filename(filename: str, algo: Literal['sha1', 'sha256'] = 'sha1')
Checks that the file points to a valid hexdigest for the given algo.
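As a rough illustration of what such a check involves, the sketch below tests that a name is a hexadecimal string of the length expected for the algorithm. The function name `looks_like_hexdigest` is hypothetical, and accepting uppercase hex digits is an assumption; the actual `is_valid_filename` may be stricter or differ in detail.

```python
import string

# Hex digest lengths for the two algorithms named in the signature above.
_DIGEST_HEX_LENGTH = {"sha1": 40, "sha256": 64}


def looks_like_hexdigest(filename: str, algo: str = "sha1") -> bool:
    """Hypothetical stand-in, not the library function.

    Returns True if `filename` has the hex length of an `algo` digest
    and contains only hexadecimal characters (either case: an assumption).
    """
    expected = _DIGEST_HEX_LENGTH[algo]
    return len(filename) == expected and all(
        c in string.hexdigits for c in filename
    )
```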
- class swh.objstorage.backends.pathslicing.PathSlicer(root: str, slicing: str)
Bases:
object
Helper class to compute a path based on a hash.
Used to compute a directory path based on the object hash according to a given slicing. Each slice of the configuration corresponds to a directory level, named after the matching characters of the object's hash.
For instance, a file with SHA1 34973274ccef6ab4dfaaf86599792fa9c3fe4689 has the following computed paths, depending on the slicing:
0:2/2:4/4:6 : 34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689
0:1/0:5/ : 3/34973/34973274ccef6ab4dfaaf86599792fa9c3fe4689
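The two examples above can be reproduced with a few lines of plain Python. This is a minimal sketch that assumes each `start:stop` element behaves like a Python slice of the hex digest; `parse_slicing`, `sliced_path` and the root `/srv/objects` are hypothetical names used for illustration, not part of the module.

```python
import os


def parse_slicing(slicing: str) -> list:
    """Turn a spec such as '0:2/2:4/4:6' into a list of slice objects."""
    bounds = []
    for part in slicing.strip("/").split("/"):
        if not part:
            continue  # tolerate an empty spec or doubled separators
        start, _, stop = part.partition(":")
        bounds.append(
            slice(int(start) if start else None, int(stop) if stop else None)
        )
    return bounds


def sliced_path(root: str, slicing: str, hex_obj_id: str) -> str:
    """Join root, one directory per slice, and the full hex id as file name."""
    directories = [hex_obj_id[s] for s in parse_slicing(slicing)]
    return os.path.join(root, *directories, hex_obj_id)


hex_id = "34973274ccef6ab4dfaaf86599792fa9c3fe4689"
print(sliced_path("/srv/objects", "0:2/2:4/4:6", hex_id))
print(sliced_path("/srv/objects", "0:1/0:5/", hex_id))
```

Note that slices may overlap, as in `0:1/0:5/`, where both directory names start at the first character of the hash.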
- Args:
root (str): path to the root directory of the storage on the disk.
slicing (str): the slicing configuration.
- check_config()
Check the slicing configuration is valid.
- Raises:
ValueError – if the slicing configuration is invalid.
- get_directory(hex_obj_id: str) → str
Compute the storage directory of an object.
See also: PathSlicer::get_path
- Parameters:
hex_obj_id – object id as hexlified string.
- Returns:
Absolute path (including root) to the directory that contains the given object id.
- class swh.objstorage.backends.pathslicing.PathSlicingObjStorage(*, root: str = '', compression: Literal['bz2', 'lzma', 'gzip', 'zlib', 'none'] = 'gzip', slicing: str = '', **kwargs)
Bases:
ObjStorage
Implementation of the ObjStorage API based on the hash of the content.
On disk, an object storage is a directory tree containing files named after their object IDs. An object ID is a checksum of the object's content, computed with the algorithm given by the ID_HASH_ALGO constant (see swh.model.hashutil for its meaning).
To avoid directories that contain too many files, the object storage has a given slicing. Each slice of the configuration corresponds to a directory level, named after the matching characters of the object's hash.
For instance, a file with SHA1 34973274ccef6ab4dfaaf86599792fa9c3fe4689 is stored at the following paths, depending on the slicing:
0:2/2:4/4:6 : 34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689
0:1/0:5/ : 3/34973/34973274ccef6ab4dfaaf86599792fa9c3fe4689
By default, the files in the storage are gzip-compressed; the compression argument of the constructor selects another format ('bz2', 'lzma', 'zlib') or 'none'.
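To make the on-disk layout concrete, here is a minimal sketch that writes a gzip-compressed object under a `0:2/2:4/4:6` slicing and reads it back, using only the standard library. The helpers `store_gzipped` and `load_gzipped` are hypothetical, and SHA1 is assumed as the ID algorithm; the real backend's file handling (permissions, atomic writes, other compression formats) is not shown here.

```python
import gzip
import hashlib
import os
import tempfile


def store_gzipped(root: str, content: bytes) -> str:
    """Write `content` gzip-compressed under root/xx/yy/zz/<sha1 hex>."""
    hex_id = hashlib.sha1(content).hexdigest()  # assumed ID algorithm
    directory = os.path.join(root, hex_id[0:2], hex_id[2:4], hex_id[4:6])
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, hex_id)
    with open(path, "wb") as f:
        f.write(gzip.compress(content))
    return path


def load_gzipped(path: str) -> bytes:
    """Read a file written by store_gzipped and return the raw content."""
    with open(path, "rb") as f:
        return gzip.decompress(f.read())


root = tempfile.mkdtemp()  # throwaway storage root for the demonstration
stored_at = store_gzipped(root, b"hello world")
print(os.path.relpath(stored_at, root))
print(load_gzipped(stored_at))
```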
- Parameters:
- name: str = 'pathslicing'
Default objstorage name; can be overridden at instantiation time by passing a 'name' argument to the constructor.
- add(content: bytes, obj_id: ObjId, check_presence: bool = True) → None
Add a new object to the object storage.
- Parameters:
content – object’s raw content to add in storage.
obj_id – either a dict of checksums, or a single checksum of the content bytes computed with the ID_HASH_ALGO algorithm. It is trusted to match the content.
check_presence (bool) – indicates whether the presence of the content should be verified before adding the file.
- Returns:
the id (bytes) of the object in the storage.
- get(obj_id: ObjId) → bytes
Retrieve the content of a given object.
- Parameters:
obj_id – object id.
- Returns:
the content of the requested object as bytes.
- Raises:
ObjNotFoundError – if the requested object is missing.
- delete(obj_id: ObjId)
Delete an object.
- Parameters:
obj_id – object identifier.
- Raises:
ObjNotFoundError – if the requested object is missing.
- list_content(last_obj_id: ObjId | None = None, limit: int | None = 10000) → Iterator[ObjId]
Generates known object ids.
- Parameters:
last_obj_id – object id from which to start iterating (excluded).
limit (int) – max number of object ids to generate. If unset (None), generate all objects (behavior might not be guaranteed for all backends).
- Generates:
obj_id: object ids.
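The `last_obj_id` and `limit` parameters lend themselves to page-by-page iteration. The sketch below mimics that contract on an in-memory collection; `iter_ids` is a hypothetical function, and it assumes ids are plain `bytes` yielded in sorted order, which the documentation above does not state for the real backend.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional


def iter_ids(
    known_ids: Iterable[bytes],
    last_obj_id: Optional[bytes] = None,
    limit: Optional[int] = 10000,
) -> Iterator[bytes]:
    """Yield ids in sorted order, strictly after last_obj_id, at most `limit`."""
    ordered = sorted(known_ids)
    if last_obj_id is not None:
        # The starting id itself is excluded.
        ordered = [obj_id for obj_id in ordered if obj_id > last_obj_id]
    # islice with a stop of None yields everything.
    return islice(ordered, limit)


ids = [b"c", b"a", b"d", b"b"]
page = list(iter_ids(ids, limit=2))
while page:
    print(page)
    # Resume after the last id of the previous page.
    page = list(iter_ids(ids, last_obj_id=page[-1], limit=2))
```

Passing the last id of each page as `last_obj_id` for the next call avoids yielding any id twice, because the starting id is excluded.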