swh.objstorage.backends.pathslicing module
- class swh.objstorage.backends.pathslicing.PathSlicer(root: str, slicing: str)
Bases: object
Helper class to compute a path based on a hash.
Used to compute a directory path from the object hash according to a given slicing. Each slice corresponds to one directory level, named after the matching substring of the hash of the object's content.
For instance, a file with SHA1 34973274ccef6ab4dfaaf86599792fa9c3fe4689 will have the following computed paths:
0:2/2:4/4:6 : 34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689
0:1/0:5/ : 3/34973/34973274ccef6ab4dfaaf86599792fa9c3fe4689
- Args:
root (str): path to the root directory of the storage on the disk.
slicing (str): the slicing configuration.
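The slicing scheme described above can be sketched in a few lines of standalone Python. This is an illustration only, not the actual PathSlicer code: the helper names `parse_slicing` and `sliced_path` are hypothetical, and the sketch assumes each `start:stop` pair is applied as an ordinary Python slice on the hex string.

```python
import os
from typing import List


def parse_slicing(slicing: str) -> List[slice]:
    """Turn a string such as '0:2/2:4/4:6' into a list of slice objects.

    Hypothetical helper: a trailing '/' (as in '0:1/0:5/') is ignored.
    """
    bounds = []
    for part in slicing.strip("/").split("/"):
        start, _, stop = part.partition(":")
        bounds.append(
            slice(int(start) if start else None, int(stop) if stop else None)
        )
    return bounds


def sliced_path(root: str, slicing: str, hex_obj_id: str) -> str:
    """Join root, one directory per slice, and the full hex id as file name."""
    directories = [hex_obj_id[s] for s in parse_slicing(slicing)]
    return os.path.join(root, *directories, hex_obj_id)
```

With a hypothetical root of `/srv/objects` on a POSIX system, the slicing `0:2/2:4/4:6` applied to the SHA1 above yields `/srv/objects/34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689`. Dropping the final component gives the containing directory.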
- check_config()
Check that the slicing configuration is valid.
- Raises:
ValueError – if the slicing configuration is invalid.
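The documentation does not spell out which conditions make a slicing invalid. As a rough sketch, a validator might reject malformed `start:stop` pairs and bounds that fall outside the hex digest. The function name `check_slicing`, its `hash_hex_length` parameter, and the exact rules below are assumptions made for illustration, not the behaviour of check_config.

```python
def check_slicing(slicing: str, hash_hex_length: int = 40) -> None:
    """Raise ValueError if a slice is malformed, empty, or out of range.

    Assumed rules (not taken from the library): every part must look like
    'start:stop' with non-negative integers, start < stop, and stop must not
    exceed the length of the hex digest (40 characters for SHA1).
    """
    for part in slicing.strip("/").split("/"):
        start, sep, stop = part.partition(":")
        if not sep or not start.isdigit() or not stop.isdigit():
            raise ValueError(f"malformed slice {part!r} in {slicing!r}")
        if not int(start) < int(stop) <= hash_hex_length:
            raise ValueError(
                f"slice {part!r} is empty or exceeds {hash_hex_length} characters"
            )
```

Under these assumed rules, both example slicings `0:2/2:4/4:6` and `0:1/0:5/` pass, while something like `0:50` is rejected for a 40-character SHA1 digest.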
- get_directory(hex_obj_id: str) -> str
Compute the storage directory of an object.
See also: PathSlicer::get_path
- Parameters:
hex_obj_id – object id as hexlified string.
- Returns:
Absolute path (including root) to the directory that contains the given object id.
- class swh.objstorage.backends.pathslicing.PathSlicingObjStorage(root, slicing, compression='gzip', **kwargs)
Bases: ObjStorage
Implementation of the ObjStorage API based on the hash of the content.
On disk, an object storage is a directory tree containing files named after their object IDs. An object ID is a checksum of the object's content, computed with the algorithm selected by the ID_HASH_ALGO constant (see swh.model.hashutil for its meaning).
To avoid directories that contain too many files, the object storage has a given slicing. Each slice corresponds to one directory level, named after the matching substring of the hash of the object's content.
For instance, a file with SHA1 34973274ccef6ab4dfaaf86599792fa9c3fe4689 will be stored at the following paths, depending on the slicing of the object storage:
0:2/2:4/4:6 : 34/97/32/34973274ccef6ab4dfaaf86599792fa9c3fe4689
0:1/0:5/ : 3/34973/34973274ccef6ab4dfaaf86599792fa9c3fe4689
The files in the storage are stored in gzip-compressed format (the default value of the compression parameter).
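To make the on-disk layout concrete, the following standalone sketch writes and reads a gzip-compressed file under a `0:2/2:4/4:6` sliced directory using only the standard library. It does not use the PathSlicingObjStorage class, and the helper names `write_object`, `read_object`, and `roundtrip_demo` are hypothetical. The real backend may differ in details such as atomic writes or file permissions.

```python
import gzip
import hashlib
import os
import tempfile


def write_object(root: str, hex_obj_id: str, content: bytes) -> str:
    """Store content gzip-compressed under root/xx/yy/zz/<hex_obj_id>."""
    # Hard-coded 0:2/2:4/4:6 slicing, chosen to match the first example above.
    directory = os.path.join(
        root, hex_obj_id[0:2], hex_obj_id[2:4], hex_obj_id[4:6]
    )
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, hex_obj_id)
    with gzip.open(path, "wb") as f:
        f.write(content)
    return path


def read_object(path: str) -> bytes:
    """Return the decompressed content of a stored object."""
    with gzip.open(path, "rb") as f:
        return f.read()


def roundtrip_demo(content: bytes) -> bytes:
    """Write content to a temporary storage root and read it back."""
    with tempfile.TemporaryDirectory() as root:
        obj_id = hashlib.sha1(content).hexdigest()
        return read_object(write_object(root, obj_id, content))
```

Calling `roundtrip_demo(b"some content")` should return the original bytes unchanged, since the file name is derived from the SHA1 of the content and the payload is only compressed, not altered.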
- Parameters:
- PRIMARY_HASH: typing_extensions.Literal[sha1] = 'sha1'
- add(content: bytes, obj_id: Union[bytes, CompositeObjId], check_presence: bool = True) -> None
- delete(obj_id: Union[bytes, CompositeObjId])
- list_content(last_obj_id: Optional[Union[bytes, CompositeObjId]] = None, limit: Optional[int] = 10000) -> Iterator[CompositeObjId]