swh.model.hashutil module
Module in charge of hashing function definitions. This is the base module used to compute swh's hashes.
Only a subset of hashing algorithms is supported, as defined in the ALGORITHMS set. Passing any algorithm that is not in that set raises a ValueError describing the problem.
This module defines a MultiHash class to ease the computation of Software Heritage hashes. It computes hashes from a file object, a path, or raw data, using an interface similar to that of the standard hashlib module.
Basic usage examples:
- file object: MultiHash.from_file(file_object, hash_names=DEFAULT_ALGORITHMS).digest()
- path (filepath): MultiHash.from_path(b'foo').hexdigest()
- data (bytes): MultiHash.from_data(b'foo').bytehexdigest()
“Complex” usage, defining a swh hashlib instance first:
To compute the length as well, add 'length' to the set of algorithms to compute, for example:
    h = MultiHash(hash_names=set({'length'}).union(DEFAULT_ALGORITHMS))
    with open(filepath, 'rb') as f:
        h.update(f.read(HASH_BLOCK_SIZE))
    hashes = h.digest()  # returns a dict of {hash_algo_name: hash_in_bytes}
To write data to a file while computing its hashes (from a stream), for example:
    h = MultiHash(length=length)
    with open(filepath, 'wb') as f:
        for chunk in r.iter_content():  # r is a stream of some sort
            h.update(chunk)
            f.write(chunk)
    hashes = h.hexdigest()  # returns a dict of {hash_algo_name: hash_in_hex}
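The write-while-hashing pattern can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the module's implementation: `tee_and_hash` is a hypothetical helper, only algorithms that hashlib provides directly are used (sha1_git is left out because it needs a git header), and an in-memory buffer stands in for the output file.

```python
import hashlib
import io


def tee_and_hash(chunks, sink):
    """Write every chunk to `sink` while feeding it to several hashers.

    Hypothetical helper for illustration; returns a dict of hex digests
    plus the total number of bytes seen under the 'length' key.
    """
    hashers = {
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
        "blake2s256": hashlib.blake2s(digest_size=32),
    }
    length = 0
    for chunk in chunks:
        for h in hashers.values():
            h.update(chunk)  # each hasher sees the same bytes, in order
        sink.write(chunk)
        length += len(chunk)
    result = {name: h.hexdigest() for name, h in hashers.items()}
    result["length"] = length
    return result


sink = io.BytesIO()  # stands in for a file opened in 'wb' mode
hashes = tee_and_hash([b"foo", b"bar"], sink)
```

Each chunk is read once and handed to every hasher, so the data never has to be held in memory as a whole.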
- swh.model.hashutil.ALGORITHMS = {'blake2b512', 'blake2s256', 'sha1', 'sha1_git', 'sha256'}
  Hashing algorithms supported by this module.
- swh.model.hashutil.DEFAULT_ALGORITHMS = {'blake2s256', 'sha1', 'sha1_git', 'sha256'}
  Algorithms computed by default when calling the functions from this module. Subset of ALGORITHMS.
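The relationship between the two sets, and the ValueError raised for unsupported names, can be illustrated as follows. This is a sketch only: `check_hash_names` is a hypothetical validator, the error message is invented for the example, and accepting 'length' next to the hash names is an assumption based on the usage shown in the module description.

```python
# Sets copied from the values documented above.
ALGORITHMS = {"blake2b512", "blake2s256", "sha1", "sha1_git", "sha256"}
DEFAULT_ALGORITHMS = {"blake2s256", "sha1", "sha1_git", "sha256"}


def check_hash_names(hash_names):
    """Hypothetical validator: reject names outside ALGORITHMS.

    'length' is accepted too, since it may be requested alongside hashes.
    """
    unknown = set(hash_names) - ALGORITHMS - {"length"}
    if unknown:
        raise ValueError(
            "Unexpected hashing algorithm(s) %s, expected one of %s"
            % (sorted(unknown), sorted(ALGORITHMS))
        )
    return set(hash_names)


requested = check_hash_names(DEFAULT_ALGORITHMS | {"length"})
```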
- swh.model.hashutil.HASH_BLOCK_SIZE = 32768
  Block size for streaming hash computations made in this module.
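A block size like this is typically used to read a file object in fixed-size pieces instead of loading it whole. The sketch below shows that loop with hashlib; `sha256_by_blocks` is a hypothetical helper, and the loop is an assumption about how such a constant is used, not a copy of the module's code.

```python
import hashlib
import io

HASH_BLOCK_SIZE = 32768  # value documented above


def sha256_by_blocks(fobj, block_size=HASH_BLOCK_SIZE):
    """Hypothetical helper: hash a binary file object block by block."""
    h = hashlib.sha256()
    while True:
        block = fobj.read(block_size)
        if not block:  # an empty read means end of stream
            break
        h.update(block)
    return h.hexdigest()


data = b"x" * (2 * HASH_BLOCK_SIZE + 123)  # spans three blocks
digest = sha256_by_blocks(io.BytesIO(data))
```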
- class swh.model.hashutil.MultiHash(hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'}, length=None)
  Bases: object
  Hashutil class to support the computation of multiple hashes.
  - Parameters
    hash_names (set) – Set of hash algorithms (optionally including 'length') to compute (cf. DEFAULT_ALGORITHMS)
    length (int) – Total length of the chunks to read
  If 'length' is provided as an algorithm, the length is also computed and returned.
- swh.model.hashutil.hash_git_data(data, git_type, base_algo='sha1')
  Hash the given data as a git object of type git_type.
  - Parameters
    data – a bytes object
    git_type – the git object type
    base_algo – the base hashing algorithm used (default: sha1)
  Returns: a dict mapping each algorithm to a bytes digest
  - Raises
    ValueError – if the git_type is unexpected.
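Hashing data "as a git object" means hashing a short header (object type, a space, the decimal length, and a NUL byte) followed by the data. The sketch below shows that framing with hashlib. `git_object_sha1` is a hypothetical helper, the set of accepted types is an assumption limited to git's four standard object types, and it returns a single bytes digest for simplicity.

```python
import hashlib


def git_object_sha1(data, git_type):
    """Hypothetical helper: SHA-1 of `data` framed as a git object."""
    # Assumed set of accepted types: git's four standard object types.
    if git_type not in {"blob", "tree", "commit", "tag"}:
        raise ValueError("Unexpected git object type %r" % git_type)
    # Git header: "<type> <length>" followed by a NUL byte.
    header = ("%s %d" % (git_type, len(data))).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).digest()


empty_blob = git_object_sha1(b"", "blob").hex()
# empty_blob is the well-known id of git's empty blob:
# e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

Because the header contains the data length, the length has to be known before hashing starts, which is consistent with MultiHash accepting a length argument up front.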
- swh.model.hashutil.hash_to_hex(hash)
  Converts a hash (in hex or bytes form) to its hexadecimal ascii form.
  - Parameters
    hash (str or bytes) – a bytes hash or a str containing the hexadecimal form of the hash
  - Returns
    the hexadecimal form of the hash
  - Return type
    str
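The documented behaviour (accept either form, always return a str) can be sketched in a few lines. `to_hex` is a hypothetical stand-in written for illustration, not the function's actual source.

```python
def to_hex(value):
    """Hypothetical stand-in: return the hexadecimal form of a hash as str."""
    if isinstance(value, str):
        return value  # already hexadecimal text, returned unchanged
    return value.hex()  # bytes -> lowercase hexadecimal str


from_bytes = to_hex(b"\x01\xab")
from_text = to_hex("01ab")
```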
- swh.model.hashutil.hash_to_bytehex(hash)
  Converts a hash to its hexadecimal bytes representation.
  - Parameters
    hash (bytes) – a bytes hash
  - Returns
    the hexadecimal form of the hash, as bytes
  - Return type
    bytes
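The difference from the str-returning conversion is only the output type: the hexadecimal digits come back as bytes. A minimal sketch using the standard binascii module, with `to_bytehex` as a hypothetical stand-in rather than the function's actual source:

```python
import binascii


def to_bytehex(value):
    """Hypothetical stand-in: hexadecimal form of a bytes hash, as bytes."""
    return binascii.hexlify(value)


bytehex = to_bytehex(b"\x01\xab")
```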