swh.model.merkle module

Merkle tree data structure

swh.model.merkle.deep_update(left, right)[source]

Recursively update the left mapping with deeply nested values from the right mapping.

This function is useful to merge the results of several calls to MerkleNode.collect().

Parameters
  • left – a mapping (modified by the update operation)

  • right – a mapping

Returns

the left mapping, updated with nested values from the right mapping

Example

>>> a = {
...     'key1': {
...         'key2': {
...              'key3': 'value1/2/3',
...         },
...     },
... }
>>> deep_update(a, {
...     'key1': {
...         'key2': {
...              'key4': 'value1/2/4',
...         },
...     },
... }) == {
...     'key1': {
...         'key2': {
...             'key3': 'value1/2/3',
...             'key4': 'value1/2/4',
...         },
...     },
... }
True
>>> deep_update(a, {
...     'key1': {
...         'key2': {
...              'key3': 'newvalue1/2/3',
...         },
...     },
... }) == {
...     'key1': {
...         'key2': {
...             'key3': 'newvalue1/2/3',
...             'key4': 'value1/2/4',
...         },
...     },
... }
True
class swh.model.merkle.MerkleNode(data=None)[source]

Bases: dict

Representation of a node in a Merkle Tree.

A (generalized) Merkle Tree is a tree in which every node is labeled with a hash of its own data and the hash of its children.

In pseudocode:

node.hash = hash(node.data
                 + sum(child.hash for child in node.children))

This class efficiently implements the Merkle Tree data structure on top of a Python dict, minimizing hash computations and new data collections when updating nodes.

Node data is stored in the data attribute, while (named) children are stored as items of the underlying dictionary.

Addition, update and removal of objects are instrumented to automatically invalidate the hashes of the current node as well as its registered parents; It also resets the collection status of the objects so the updated objects can be collected.

The collection of updated data from the tree is implemented through the collect() function and associated helpers.

data

data associated to the current node

Type

dict

parents

known parents of the current node

Type

list

collected

whether the current node has been collected

Type

bool

parents
data
collected
invalidate_hash()[source]

Invalidate the cached hash of the current node.

update_hash(*, force=False)[source]

Recursively compute the hash of the current node.

Parameters

force (bool) – invalidate the cache and force the computation for this node and all children.

property hash

The hash of the current node, as calculated by compute_hash().

abstract compute_hash()[source]

Compute the hash of the current node.

The hash should depend on the data of the node, as well as on hashes of the children nodes.

update(new_children)[source]

Add several named children from a dictionary

get_data(**kwargs)[source]

Retrieve and format the collected data for the current node, for use by collect().

Can be overridden, for instance when you want the collected data to contain information about the child nodes.

Parameters

kwargs – allow subclasses to alter behaviour depending on how collect() is called.

Returns

data formatted for collect()

collect_node(**kwargs)[source]

Collect the data for the current node, for use by collect().

Parameters

kwargs – passed as-is to get_data().

Returns

A dict compatible with collect().

collect(**kwargs)[source]

Collect the data for all nodes in the subtree rooted at self.

The data is deduplicated by type and by hash.

Parameters

kwargs – passed as-is to get_data().

Returns

A dict with the following structure:

{
  'typeA': {
    node1.hash: node1.get_data(),
    node2.hash: node2.get_data(),
  },
  'typeB': {
    node3.hash: node3.get_data(),
    ...
  },
  ...
}

reset_collect()[source]

Recursively unmark collected nodes in the subtree rooted at self.

This lets the caller use collect() again.

iter_tree() → Iterator[swh.model.merkle.MerkleNode][source]

Yields all children nodes, recursively. Common nodes are deduplicated.

class swh.model.merkle.MerkleLeaf(data=None)[source]

Bases: swh.model.merkle.MerkleNode

A leaf to a Merkle tree.

A Merkle leaf is simply a Merkle node with children disabled.

update(new_children)[source]

Children update operation. Disabled for leaves.