swh.model.hashutil module
Module in charge of hashing function definitions. This is the base module used to compute swh’s hashes.
Only a subset of hashing algorithms is supported, as defined in the ALGORITHMS set. Passing an algorithm that is not in that set raises a ValueError describing the problem.
This module defines a MultiHash class to ease the computation of Software Heritage hashes. It computes hashes from a file object, a path, or in-memory data through an interface similar to that of the standard hashlib module.
Basic usage examples:
- file object: MultiHash.from_file(file_object, hash_names=DEFAULT_ALGORITHMS).digest()
- path (filepath): MultiHash.from_path(b'foo').hexdigest()
- data (bytes): MultiHash.from_data(b'foo').bytehexdigest()
“Complex” usage, defining a swh hashlib instance first:
To also compute the length, add 'length' to the set of algorithms to compute, for example:
h = MultiHash(hash_names={'length'}.union(DEFAULT_ALGORITHMS))
with open(filepath, 'rb') as f:
    # feed the file block by block until it is exhausted
    for chunk in iter(lambda: f.read(HASH_BLOCK_SIZE), b''):
        h.update(chunk)
hashes = h.digest()  # returns a dict of {hash_algo_name: hash_in_bytes}
To write data to disk while computing its hashes (from a stream), for example:
h = MultiHash(length=length)
with open(filepath, 'wb') as f:
    for chunk in r.iter_content():  # r is a stream of some sort
        h.update(chunk)
        f.write(chunk)
hashes = h.hexdigest()  # returns a dict of {hash_algo_name: hash_in_hex}
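The single-pass idea behind the two examples above can be sketched with the standard library alone. This is a minimal illustration, not the MultiHash implementation: the function name `sketch_multi_hash` is hypothetical, and treating sha1_git as a SHA-1 seeded with a git blob header is an assumption about how that algorithm is defined.

```python
import hashlib
import io

BLOCK_SIZE = 32768  # same value as HASH_BLOCK_SIZE in this module


def sketch_multi_hash(fileobj, length):
    """Hash a binary file object with several algorithms in one pass.

    Hypothetical helper for illustration only; not part of swh.model.
    """
    states = {
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
        "blake2s256": hashlib.blake2s(digest_size=32),
        # Assumption: sha1_git is a SHA-1 over a git blob header followed
        # by the data, so the total length must be known up front.
        "sha1_git": hashlib.sha1(b"blob %d\x00" % length),
    }
    seen = 0
    # Read fixed-size blocks until read() returns an empty bytes object.
    for chunk in iter(lambda: fileobj.read(BLOCK_SIZE), b""):
        seen += len(chunk)
        for state in states.values():
            state.update(chunk)
    digests = {name: state.hexdigest() for name, state in states.items()}
    digests["length"] = seen  # number of bytes actually read
    return digests


result = sketch_multi_hash(io.BytesIO(b"foo"), length=3)
```

Each chunk is read once and fed to every hash state, which is what makes it practical to write the same chunk to disk in the same loop.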
- swh.model.hashutil.ALGORITHMS = {'blake2b512', 'blake2s256', 'md5', 'sha1', 'sha1_git', 'sha256', 'sha512'}
Hashing algorithms supported by this module
- swh.model.hashutil.DEFAULT_ALGORITHMS = {'blake2s256', 'sha1', 'sha1_git', 'sha256'}
Algorithms computed by default when calling the functions from this module. Subset of ALGORITHMS.
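The relationship between the two sets, and the ValueError raised for unsupported names, can be sketched as follows. The helper `check_algorithms` and the wording of its error message are hypothetical; only the set contents come from this page.

```python
ALGORITHMS = {"blake2b512", "blake2s256", "md5", "sha1", "sha1_git", "sha256", "sha512"}
DEFAULT_ALGORITHMS = {"blake2s256", "sha1", "sha1_git", "sha256"}


def check_algorithms(hash_names):
    """Return the requested names, or raise ValueError for unsupported ones.

    Hypothetical helper; 'length' is accepted in addition to ALGORITHMS.
    """
    unknown = set(hash_names) - ALGORITHMS - {"length"}
    if unknown:
        raise ValueError(
            "Unexpected hashing algorithm(s) %s, expected one of %s"
            % (", ".join(sorted(unknown)), ", ".join(sorted(ALGORITHMS)))
        )
    return set(hash_names)


accepted = check_algorithms(DEFAULT_ALGORITHMS | {"length"})

try:
    check_algorithms({"sha1", "crc32"})
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```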
- swh.model.hashutil.HASH_BLOCK_SIZE = 32768
Block size for streaming hash computations made in this module
- class swh.model.hashutil.MultiHash(hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'}, length=None)
Bases: object
Hashutil class to support the computation of multiple hashes.
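- Parameters:
hash_names – set of hash algorithms to compute, optionally including 'length' (default: DEFAULT_ALGORITHMS)
length – total length of the data to be hashed, if known in advance
If 'length' is provided as one of the algorithms, the length is also computed and returned.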
- swh.model.hashutil.git_object_header(git_type: str, length: int) -> bytes
Returns the header for a git object of the given type and length.
- The header of a git object consists of:
The type of the object (encoded in ASCII)
One ASCII space ( )
The length of the object (decimal encoded in ASCII)
One NUL byte
- Parameters:
git_type – the type of the git object (supposedly one of 'blob', 'commit', 'tag', 'tree')
length – the length of the git object you’re encoding
- Returns:
the header of the git object, as bytes
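The four-part header layout described above can be sketched in a few lines. The function name `sketch_git_object_header` is hypothetical and performs no validation of the object type.

```python
def sketch_git_object_header(git_type: str, length: int) -> bytes:
    """Build '<type> <decimal length>' followed by one NUL byte, in ASCII."""
    return ("%s %d\0" % (git_type, length)).encode("ascii")


header = sketch_git_object_header("blob", 11)
```

For an 11-byte blob this yields the 8-byte header `b'blob 11\x00'`.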
- swh.model.hashutil.hash_git_data(data, git_type, base_algo='sha1')
Hash the given data as a git object of type git_type.
- Parameters:
data – a bytes object
git_type – the git object type
base_algo – the base hashing algorithm used (default: sha1)
- Returns:
the digest of the git object computed with base_algo, as bytes
- Raises:
ValueError – if the git_type is unexpected.
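Hashing data as a git object amounts to hashing the object header followed by the data. The sketch below uses only hashlib; the name `sketch_hash_git_data`, the set of accepted types (taken from the git_object_header parameter description), and the choice to return raw bytes are assumptions made for illustration.

```python
import hashlib

# Assumption: the accepted types are the four listed for git_object_header.
SKETCH_GIT_TYPES = {"blob", "commit", "tag", "tree"}


def sketch_hash_git_data(data: bytes, git_type: str, base_algo: str = "sha1") -> bytes:
    """Hash header + data with base_algo and return the raw digest."""
    if git_type not in SKETCH_GIT_TYPES:
        raise ValueError(
            "Unexpected git object type %s, expected one of %s"
            % (git_type, ", ".join(sorted(SKETCH_GIT_TYPES)))
        )
    h = hashlib.new(base_algo)
    # Header: ASCII type, one space, decimal length, one NUL byte.
    h.update(b"%s %d\x00" % (git_type.encode("ascii"), len(data)))
    h.update(data)
    return h.digest()


empty_blob_id = sketch_hash_git_data(b"", "blob").hex()
```

With the default SHA-1, an empty blob hashes to the identifier git itself reports for an empty file, `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`.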
- swh.model.hashutil.hash_to_hex(hash: str | bytes) -> str
Converts a hash (in hex or bytes form) to its hexadecimal ascii form
- swh.model.hashutil.hash_to_bytehex(hash: bytes) -> bytes
Converts a hash to its hexadecimal bytes representation
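The two conversions differ only in the type of the result. A minimal sketch using the standard binascii module follows; the `sketch_` names are hypothetical, and passing an already hexadecimal str through unchanged is an assumption based on the "in hex or bytes form" wording.

```python
import binascii


def sketch_hash_to_hex(value):
    """Return the hexadecimal form as str; a str input is returned as-is."""
    if isinstance(value, str):
        return value
    return binascii.hexlify(value).decode("ascii")


def sketch_hash_to_bytehex(value: bytes) -> bytes:
    """Return the hexadecimal form as bytes."""
    return binascii.hexlify(value)


digest = bytes.fromhex("e69de29b")
as_text = sketch_hash_to_hex(digest)
as_bytes = sketch_hash_to_bytehex(digest)
```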