swh.model.hashutil module
Module in charge of hashing function definitions. This is the base module used to compute swh’s hashes.
Only a subset of hashing algorithms is supported, as defined in the ALGORITHMS set. Passing any algorithm outside that set raises a ValueError describing the problem.
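The validation rule above can be pictured with a small sketch. The helper name `check_algorithms` is hypothetical and not part of swh.model; only the contents of the set are copied from the ALGORITHMS entry documented on this page, and the error message wording is an assumption.

```python
# Values copied from the ALGORITHMS entry documented on this page.
ALGORITHMS = {
    "blake2b512", "blake2s256", "md5", "sha1", "sha1_git", "sha256", "sha512",
}


def check_algorithms(hash_names):
    """Hypothetical helper: reject any name that is not in ALGORITHMS."""
    unknown = set(hash_names) - ALGORITHMS
    if unknown:
        # The message format is illustrative, not the library's own.
        raise ValueError(
            "Unexpected hashing algorithm(s) %s, expected one of %s"
            % (", ".join(sorted(unknown)), ", ".join(sorted(ALGORITHMS)))
        )
    return set(hash_names)


accepted = check_algorithms(["sha1", "sha256"])
```

Calling `check_algorithms(["sha3"])` in this sketch raises a ValueError that names the offending algorithm.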
This module defines a MultiHash class to ease the computation of the Software Heritage hashing algorithms. It computes hashes from a file object, a path, or raw data through an interface similar to the one the standard hashlib module provides.
Basic usage examples:
- file object: MultiHash.from_file(file_object, hash_names=DEFAULT_ALGORITHMS).digest()
- path (filepath): MultiHash.from_path(b'foo').hexdigest()
- data (bytes): MultiHash.from_data(b'foo').bytehexdigest()
“Complex” usage, defining a swh hashlib instance first:
- To compute the length, add 'length' to the set of algorithms to compute, for example:

    from swh.model.hashutil import DEFAULT_ALGORITHMS, HASH_BLOCK_SIZE, MultiHash

    h = MultiHash(hash_names={'length'} | DEFAULT_ALGORITHMS)
    with open(filepath, 'rb') as f:  # filepath: path of the file to hash
        # feed the file block by block until end of file
        for block in iter(lambda: f.read(HASH_BLOCK_SIZE), b''):
            h.update(block)
    hashes = h.digest()  # returns a dict of {hash_algo_name: hash_in_bytes}

- To write data while computing its hashes (from a stream), for example:

    from swh.model.hashutil import MultiHash

    h = MultiHash(length=length)  # length: total number of bytes expected
    with open(filepath, 'wb') as f:
        for chunk in r.iter_content():  # r: a stream of some sort
            h.update(chunk)
            f.write(chunk)
    hashes = h.hexdigest()  # returns a dict of {hash_algo_name: hash_in_hex}
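For readers without swh.model at hand, the write-while-hashing pattern above can be approximated with the standard library alone. This is a minimal sketch under stated assumptions: `copy_and_hash` is a hypothetical helper, it only handles algorithms that hashlib knows directly (so not sha1_git), and it is not how MultiHash is implemented.

```python
import hashlib
import io


def copy_and_hash(chunks, out, hash_names=("sha1", "sha256")):
    """Write every chunk to `out` while feeding it to each hash object.

    Hypothetical stdlib-only helper; returns a dict of
    {hash_algo_name: hash_in_hex} plus a 'length' entry.
    """
    hashers = {name: hashlib.new(name) for name in hash_names}
    length = 0
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)
        out.write(chunk)
        length += len(chunk)
    result = {name: h.hexdigest() for name, h in hashers.items()}
    result["length"] = length
    return result


# An in-memory buffer stands in for the output file.
buffer = io.BytesIO()
hashes = copy_and_hash([b"fo", b"o"], buffer)
```

Each chunk is read once and used twice, so the data never has to be re-read from disk to obtain its hashes.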
- swh.model.hashutil.ALGORITHMS = {'blake2b512', 'blake2s256', 'md5', 'sha1', 'sha1_git', 'sha256', 'sha512'}
Hashing algorithms supported by this module
- swh.model.hashutil.DEFAULT_ALGORITHMS_LIST: Tuple[Literal['sha1', 'sha256', 'sha1_git', 'blake2s256'], ...] = ('sha1', 'sha256', 'sha1_git', 'blake2s256')
Algorithms computed when identifying Content objects (as a tuple, with order).
- swh.model.hashutil.DEFAULT_ALGORITHMS: FrozenSet[Literal['sha1', 'sha256', 'sha1_git', 'blake2s256']] = frozenset({'blake2s256', 'sha1', 'sha1_git', 'sha256'})
Algorithms computed by default when calling the functions from this module.
Subset of ALGORITHMS.
- swh.model.hashutil.HASH_BLOCK_SIZE = 32768
Block size for streaming hash computations made in this module
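A block size like this is typically used to hash a stream without loading it whole into memory. The sketch below shows that pattern with hashlib only; `sha256_of_stream` is a hypothetical helper, and the constant is simply redefined here with the value documented above.

```python
import hashlib
import io

HASH_BLOCK_SIZE = 32768  # same value as documented above


def sha256_of_stream(fobj, block_size=HASH_BLOCK_SIZE):
    """Hash a binary file object block by block (hypothetical helper)."""
    h = hashlib.sha256()
    # iter() with a sentinel stops at the first empty read, i.e. end of stream.
    for block in iter(lambda: fobj.read(block_size), b""):
        h.update(block)
    return h.digest()


data = b"x" * (HASH_BLOCK_SIZE * 2 + 1)  # spans three blocks
digest = sha256_of_stream(io.BytesIO(data))
```

The result is identical to hashing the whole buffer in one call, since hash objects accumulate state across `update()` calls.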
- class swh.model.hashutil.MultiHash(hash_names=frozenset({'blake2s256', 'sha1', 'sha1_git', 'sha256'}), length=None)
Bases: object
Hashutil class to support multiple hash computations.
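- Parameters:
hash_names – set of hash algorithms to compute, optionally including 'length' (default: DEFAULT_ALGORITHMS)
length – total length of the data that will be fed to the hashes
If 'length' is passed as one of the algorithms, the length is also computed and returned.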
- classmethod from_file(fobj, hash_names=frozenset({'blake2s256', 'sha1', 'sha1_git', 'sha256'}), length=None)
- classmethod from_path(path, hash_names=frozenset({'blake2s256', 'sha1', 'sha1_git', 'sha256'}))
- swh.model.hashutil.git_object_header(git_type: str, length: int) → bytes
Returns the header for a git object of the given type and length.
- The header of a git object consists of:
The type of the object (encoded in ASCII)
One ASCII space (0x20)
The length of the object (decimal encoded in ASCII)
One NUL byte
- Parameters:
git_type – the type of the git object (supposedly one of 'blob', 'commit', 'tag', 'tree')
length – the length of the git object you’re encoding
- Returns:
the header, as a bytes object
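The header layout listed above (type, space, decimal length, NUL byte) can be sketched in one line. The function name `sketch_git_object_header` is hypothetical; it is an illustrative re-implementation of the described layout, not the library's code.

```python
def sketch_git_object_header(git_type: str, length: int) -> bytes:
    """Build '<type> <length>\\0' as bytes, following the layout listed above."""
    return git_type.encode("ascii") + b" " + str(length).encode("ascii") + b"\x00"


header = sketch_git_object_header("blob", 3)  # b'blob 3\x00'
```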
- swh.model.hashutil.hash_git_data(data, git_type, base_algo='sha1')
Hash the given data as a git object of type git_type.
- Parameters:
data – a bytes object
git_type – the git object type
base_algo – the base hashing algorithm used (default: sha1)
- Returns:
a dict mapping each algorithm to a bytes digest
- Raises:
ValueError – if the git_type is unexpected.
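Hashing data "as a git object" amounts to hashing the git object header followed by the data itself. The sketch below illustrates this with hashlib. It rests on several assumptions: `sketch_hash_git_data` and `GIT_TYPES` are hypothetical names, the set of accepted types is limited to the four named in the git_object_header entry, and the sketch returns a single bytes digest for the chosen base_algo, which is a choice of the sketch and not a statement about the library's return type.

```python
import hashlib

# Assumption: only the four types named in the git_object_header entry.
GIT_TYPES = {"blob", "commit", "tag", "tree"}


def sketch_hash_git_data(data: bytes, git_type: str, base_algo: str = "sha1") -> bytes:
    """Hash `data` prefixed with its git object header (illustrative only)."""
    if git_type not in GIT_TYPES:
        raise ValueError("Unexpected git object type %r" % git_type)
    h = hashlib.new(base_algo)
    # Header layout: '<type> <length>\0'
    h.update(("%s %d" % (git_type, len(data))).encode("ascii") + b"\x00")
    h.update(data)
    return h.digest()


empty_blob = sketch_hash_git_data(b"", "blob").hex()
```

For an empty blob this yields `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the identifier git itself assigns to empty files.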
- swh.model.hashutil.hash_to_hex(hash: str | bytes) → str
Converts a hash (in hex or bytes form) to its hexadecimal ASCII form.
- swh.model.hashutil.hash_to_bytehex(hash: bytes) → bytes
Converts a hash to its hexadecimal bytes representation.
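The two conversions above differ only in the type of their result. A minimal sketch with the standard binascii module, assuming that a str input is already in hexadecimal form and can be returned unchanged; both function names are hypothetical stand-ins, not the library's code.

```python
import binascii


def sketch_hash_to_hex(hash):
    """Return the hexadecimal form as str; str input is assumed to be hex already."""
    if isinstance(hash, str):
        return hash
    return binascii.hexlify(hash).decode("ascii")


def sketch_hash_to_bytehex(hash: bytes) -> bytes:
    """Return the hexadecimal form as bytes."""
    return binascii.hexlify(hash)


as_str = sketch_hash_to_hex(b"\x01\xff")        # '01ff'
as_bytes = sketch_hash_to_bytehex(b"\x01\xff")  # b'01ff'
```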