swh.model.hashutil module

Module in charge of hashing function definitions. This is the base module used to compute swh's hashes.

Only a subset of hashing algorithms is supported, as defined in the ALGORITHMS set. Passing an algorithm that is not in that set raises a ValueError describing the problem.
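The check described above can be sketched as follows. This is only an illustration: check_algorithms is a hypothetical helper, the set is a local copy of the ALGORITHMS value shown on this page, and the pseudo-algorithm 'length' accepted by MultiHash is left out.

```python
# Local copy of the set documented on this page; the module itself exposes
# it as swh.model.hashutil.ALGORITHMS.
ALGORITHMS = {
    "blake2b512", "blake2s256", "md5", "sha1", "sha1_git", "sha256", "sha512",
}


def check_algorithms(hash_names):
    """Hypothetical helper: reject any name that is not in ALGORITHMS."""
    unknown = set(hash_names) - ALGORITHMS
    if unknown:
        raise ValueError(
            "Unexpected hashing algorithm(s) %s, expected one of %s"
            % (", ".join(sorted(unknown)), ", ".join(sorted(ALGORITHMS)))
        )
    return set(hash_names)


accepted = check_algorithms({"sha1", "sha256"})
```

Calling check_algorithms({'sha1', 'crc32'}) raises a ValueError whose message names 'crc32'.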

This module defines a MultiHash class to simplify computing the Software Heritage hashes. It computes hashes from a file object, a path, or raw data, through an interface similar to that of the standard hashlib module.

Basic usage examples:

  • file object: MultiHash.from_file(file_object, hash_names=DEFAULT_ALGORITHMS).digest()

  • path (filepath): MultiHash.from_path(b'foo').hexdigest()

  • data (bytes): MultiHash.from_data(b'foo').bytehexdigest()

“Complex” usage, defining a swh hashlib instance first:

  • To also compute the length, add 'length' to the set of algorithms to compute, for example:

    h = MultiHash(hash_names={'length'} | DEFAULT_ALGORITHMS)
    with open(filepath, 'rb') as f:
        # read block by block so the whole file is hashed, not only the first block
        for chunk in iter(lambda: f.read(HASH_BLOCK_SIZE), b''):
            h.update(chunk)
    hashes = h.digest()  # returns a dict of {hash_algo_name: hash_in_bytes}
    
  • To write data to disk while hashing it (from a stream), for example:

    h = MultiHash(length=length)
    with open(filepath, 'wb') as f:
        for chunk in r.iter_content():  # r is a stream of some sort
            h.update(chunk)             # feed the chunk to every hash
            f.write(chunk)              # and write the same chunk to disk
    hashes = h.hexdigest()  # returns a dict of {hash_algo_name: hash_in_hex}
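To illustrate what a single-pass, multi-algorithm computation involves, here is a minimal sketch built only on the standard hashlib module. It is not the MultiHash implementation: multi_hash and BLOCK_SIZE are hypothetical names, and the git-style hash is assumed to be a SHA-1 over a blob header followed by the data. It also shows why a git-style hash needs the total length before the first chunk: the header, which contains the length, comes first.

```python
import hashlib
import io

BLOCK_SIZE = 32768  # same value as HASH_BLOCK_SIZE on this page


def multi_hash(fobj, length):
    """Hash a binary file object with several algorithms in one pass.

    Hypothetical stand-in for MultiHash, using hashlib only.
    """
    hashers = {
        "sha1": hashlib.sha1(),
        "sha256": hashlib.sha256(),
        "blake2s256": hashlib.blake2s(digest_size=32),
        "sha1_git": hashlib.sha1(),
    }
    # The git-style hash covers a "blob <length>\0" header before the data,
    # so the total length has to be known up front.
    hashers["sha1_git"].update(b"blob %d\0" % length)

    seen = 0
    for chunk in iter(lambda: fobj.read(BLOCK_SIZE), b""):
        seen += len(chunk)
        for hasher in hashers.values():
            hasher.update(chunk)  # every algorithm sees the same chunk

    result = {name: hasher.hexdigest() for name, hasher in hashers.items()}
    result["length"] = seen  # number of bytes actually read
    return result


data = b"foo"
hashes = multi_hash(io.BytesIO(data), length=len(data))
```

Each chunk is read once and fed to every hasher, so the cost of reading the input does not grow with the number of algorithms.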
    
swh.model.hashutil.ALGORITHMS = {'blake2b512', 'blake2s256', 'md5', 'sha1', 'sha1_git', 'sha256', 'sha512'}

Hashing algorithms supported by this module

swh.model.hashutil.DEFAULT_ALGORITHMS = {'blake2s256', 'sha1', 'sha1_git', 'sha256'}

Algorithms computed by default when calling the functions from this module.

Subset of ALGORITHMS.

swh.model.hashutil.HASH_BLOCK_SIZE = 32768

Block size for streaming hash computations made in this module
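As an illustration of block-wise streaming, the sketch below reads a binary file object in blocks of that size. iter_blocks is a hypothetical helper written for this example, not a function of this module, and the constant is a local copy of the documented value.

```python
import io

HASH_BLOCK_SIZE = 32768  # local copy of the value documented above


def iter_blocks(fobj, block_size=HASH_BLOCK_SIZE):
    """Yield successive blocks from a binary file object until EOF."""
    while True:
        block = fobj.read(block_size)
        if not block:  # an empty read means end of stream
            break
        yield block


# 70000 bytes do not fit in two blocks, so a third, shorter block is produced.
block_sizes = [len(block) for block in iter_blocks(io.BytesIO(b"x" * 70000))]
```

Only one block is held in memory at a time, which keeps memory use bounded regardless of the input size.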

class swh.model.hashutil.MultiHash(hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'}, length=None)

Bases: object

Hashutil class supporting the computation of multiple hashes.

Parameters:
  • hash_names (set) – Set of hash algorithms to compute, optionally including 'length' (cf. DEFAULT_ALGORITHMS)

  • length (int) – Total length of the chunks to read

If 'length' is provided as an algorithm, the length is also computed and returned.

classmethod from_state(state, track_length)
classmethod from_file(fobj, hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'}, length=None)
classmethod from_path(path, hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'})
classmethod from_data(data, hash_names={'blake2s256', 'sha1', 'sha1_git', 'sha256'})
update(chunk)
digest()
hexdigest()
bytehexdigest()
copy()
swh.model.hashutil.git_object_header(git_type: str, length: int) → bytes

Returns the header for a git object of the given type and length.

The header of a git object consists of:
  • The type of the object (encoded in ASCII)

  • One ASCII space ( )

  • The length of the object (decimal encoded in ASCII)

  • One NUL byte

Parameters:
  • git_type – the type of the git object (expected to be one of 'blob', 'commit', 'tag', 'tree')

  • length – the length of the git object you're encoding

Returns:

the header of the git object, as bytes
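The header layout listed above can be reproduced with a few lines of standard Python. This is a sketch of the format only; make_git_header is a hypothetical name, not the function documented here.

```python
def make_git_header(git_type: str, length: int) -> bytes:
    """Build a git object header: type, space, decimal length, NUL byte."""
    return (
        git_type.encode("ascii")       # the type of the object
        + b" "                         # one ASCII space
        + str(length).encode("ascii")  # the length, decimal encoded in ASCII
        + b"\0"                        # one NUL byte
    )


header = make_git_header("blob", 3)
```

For a 3-byte blob, the header is the 7 bytes b'blob 3\x00'.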

swh.model.hashutil.hash_git_data(data, git_type, base_algo='sha1')

Hash the given data as a git object of type git_type.

Parameters:
  • data – a bytes object

  • git_type – the git object type

  • base_algo – the base hashing algorithm used (default: sha1)

Returns: a dict mapping each algorithm to a bytes digest

Raises:

ValueError if the git_type is unexpected.
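Hashing data as a git object amounts to hashing the object header followed by the data. The sketch below shows this with hashlib; git_hash and GIT_TYPES are hypothetical names, the set of accepted types is assumed from the types listed on this page, and for simplicity the sketch returns the raw digest of the single base algorithm.

```python
import hashlib

# Assumption: the accepted types are the four git object types named on this page.
GIT_TYPES = {"blob", "commit", "tag", "tree"}


def git_hash(data: bytes, git_type: str, base_algo: str = "sha1") -> bytes:
    """Hash header + data the way git identifies an object (sketch)."""
    if git_type not in GIT_TYPES:
        raise ValueError("Unexpected git object type %r" % git_type)
    hasher = hashlib.new(base_algo)
    # Header first: "<type> <length>\0", then the object content.
    hasher.update(b"%s %d\0" % (git_type.encode("ascii"), len(data)))
    hasher.update(data)
    return hasher.digest()


empty_blob_id = git_hash(b"", "blob").hex()
```

With an empty blob this yields e69de29bb2d1d6434b8b29ae775ad8c2e48c5391, the identifier git itself assigns to an empty file.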

swh.model.hashutil.hash_to_hex(hash: str | bytes) → str

Converts a hash (in hex or bytes form) to its hexadecimal ASCII form

Parameters:

hash (str or bytes) – a bytes hash or a str containing the hexadecimal form of the hash

Returns:

the hexadecimal form of the hash

Return type:

str

swh.model.hashutil.hash_to_bytehex(hash: bytes) → bytes

Converts a hash to its hexadecimal bytes representation

Parameters:

hash (bytes) – a bytes hash

Returns:

the hexadecimal form of the hash, as bytes

Return type:

bytes

swh.model.hashutil.hash_to_bytes(hash: str | bytes) → bytes

Converts a hash (in hex or bytes form) to its raw bytes form

Parameters:

hash (str or bytes) – a bytes hash or a str containing the hexadecimal form of the hash

Returns:

the bytes form of the hash

Return type:

bytes
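Both hash_to_hex and hash_to_bytes accept either representation and normalise it. A minimal sketch of that behaviour, using only built-in bytes methods, could look like this. to_hex and to_bytes are hypothetical names, and passing a value through unchanged when it is already in the target form is an assumption based on the descriptions above.

```python
def to_hex(value) -> str:
    """Accept raw bytes or a hex str; always return a hex str."""
    if isinstance(value, str):
        return value  # already in hexadecimal form
    return value.hex()


def to_bytes(value) -> bytes:
    """Accept raw bytes or a hex str; always return raw bytes."""
    if isinstance(value, bytes):
        return value  # already in raw form
    return bytes.fromhex(value)


raw = bytes.fromhex("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391")
round_trip = to_bytes(to_hex(raw))
```

Accepting both forms lets callers normalise a hash without first checking which representation they were given.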

swh.model.hashutil.bytehex_to_hash(hex: bytes) → bytes

Converts a hexadecimal bytes representation of a hash to that hash

Parameters:

hex (bytes) – a bytes object containing the hexadecimal form of the hash, encoded in ASCII

Returns:

the bytes form of the hash

Return type:

bytes
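The "bytehex" form is the hexadecimal text of a hash stored as ASCII bytes rather than as a str. The round trip between the raw and bytehex forms can be sketched with the standard binascii module. to_bytehex and from_bytehex are hypothetical names used for illustration only.

```python
import binascii


def to_bytehex(raw: bytes) -> bytes:
    """Raw hash bytes -> hexadecimal digits encoded as ASCII bytes."""
    return binascii.hexlify(raw)


def from_bytehex(hex_bytes: bytes) -> bytes:
    """Hexadecimal digits as ASCII bytes -> raw hash bytes."""
    return binascii.unhexlify(hex_bytes)


raw = b"\x01\xab\xff"
encoded = to_bytehex(raw)
decoded = from_bytehex(encoded)
```

The encoded form is twice as long as the raw hash, since each byte becomes two hexadecimal digits.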