swh.model.merkle module¶
Merkle tree data structure
-
swh.model.merkle.
deep_update
(left, right)[source]¶ Recursively update the left mapping with deeply nested values from the right mapping.
This function is useful to merge the results of several calls to
MerkleNode.collect()
.- Parameters
left – a mapping (modified by the update operation)
right – a mapping
- Returns
the left mapping, updated with nested values from the right mapping
Example
>>> a = { ... 'key1': { ... 'key2': { ... 'key3': 'value1/2/3', ... }, ... }, ... } >>> deep_update(a, { ... 'key1': { ... 'key2': { ... 'key4': 'value1/2/4', ... }, ... }, ... }) == { ... 'key1': { ... 'key2': { ... 'key3': 'value1/2/3', ... 'key4': 'value1/2/4', ... }, ... }, ... } True >>> deep_update(a, { ... 'key1': { ... 'key2': { ... 'key3': 'newvalue1/2/3', ... }, ... }, ... }) == { ... 'key1': { ... 'key2': { ... 'key3': 'newvalue1/2/3', ... 'key4': 'value1/2/4', ... }, ... }, ... } True
-
class
swh.model.merkle.
MerkleNode
(data=None)[source]¶ Bases:
dict
Representation of a node in a Merkle Tree.
A (generalized) Merkle Tree is a tree in which every node is labeled with a hash of its own data and the hash of its children.
In pseudocode:
node.hash = hash(node.data + sum(child.hash for child in node.children))
This class efficiently implements the Merkle Tree data structure on top of a Python
dict
, minimizing hash computations and new data collections when updating nodes.Node data is stored in the
data
attribute, while (named) children are stored as items of the underlying dictionary.Addition, update and removal of objects are instrumented to automatically invalidate the hashes of the current node as well as its registered parents; It also resets the collection status of the objects so the updated objects can be collected.
The collection of updated data from the tree is implemented through the
collect()
function and associated helpers.-
data
¶ data associated to the current node
- Type
dict
-
parents
¶ known parents of the current node
- Type
list
-
collected
¶ whether the current node has been collected
- Type
bool
-
parents
¶
-
data
¶
-
collected
¶
-
update_hash
(*, force=False)[source]¶ Recursively compute the hash of the current node.
- Parameters
force (bool) – invalidate the cache and force the computation for this node and all children.
-
property
hash
¶ The hash of the current node, as calculated by
compute_hash()
.
-
abstract
compute_hash
()[source]¶ Compute the hash of the current node.
The hash should depend on the data of the node, as well as on hashes of the children nodes.
-
get_data
(**kwargs)[source]¶ Retrieve and format the collected data for the current node, for use by
collect()
.Can be overridden, for instance when you want the collected data to contain information about the child nodes.
-
collect_node
(**kwargs)[source]¶ Collect the data for the current node, for use by
collect()
.- Parameters
kwargs – passed as-is to
get_data()
.- Returns
A
dict
compatible withcollect()
.
-
collect
(**kwargs)[source]¶ Collect the data for all nodes in the subtree rooted at self.
The data is deduplicated by type and by hash.
- Parameters
kwargs – passed as-is to
get_data()
.- Returns
A
dict
with the following structure:{ 'typeA': { node1.hash: node1.get_data(), node2.hash: node2.get_data(), }, 'typeB': { node3.hash: node3.get_data(), ... }, ... }
-
reset_collect
()[source]¶ Recursively unmark collected nodes in the subtree rooted at self.
This lets the caller use
collect()
again.
-
iter_tree
() → Iterator[swh.model.merkle.MerkleNode][source]¶ Yields all children nodes, recursively. Common nodes are deduplicated.
-
-
class
swh.model.merkle.
MerkleLeaf
(data=None)[source]¶ Bases:
swh.model.merkle.MerkleNode
A leaf to a Merkle tree.
A Merkle leaf is simply a Merkle node with children disabled.