swh.model.merkle module#

Merkle tree data structure

class swh.model.merkle.MerkleNode(data=None)[source]#

Bases: dict

Representation of a node in a Merkle Tree.

A (generalized) Merkle Tree is a tree in which every node is labeled with a hash of its own data and the hash of its children.

In pseudocode:

node.hash = hash(node.data
                 + sum(child.hash for child in node.children))

This class efficiently implements the Merkle Tree data structure on top of a Python dict, minimizing hash computations and new data collections when updating nodes.

Node data is stored in the data attribute, while (named) children are stored as items of the underlying dictionary.

Addition, update and removal of objects are instrumented to automatically invalidate the hashes of the current node as well as its registered parents; It also resets the collection status of the objects so the updated objects can be collected.

The collection of updated data from the tree is implemented through the collect() function and associated helpers.

parents: List#

known parents of the current node

data: Dict#

data associated to the current node

collected: bool#

whether the current node has been collected

invalidate_hash()[source]#

Invalidate the cached hash of the current node.

update_hash(*, force=False) Any[source]#

Recursively compute the hash of the current node.

Parameters:

force (bool) – invalidate the cache and force the computation for this node and all children.

property hash: Any#

The hash of the current node, as calculated by compute_hash().

abstract compute_hash() Any[source]#

Compute the hash of the current node.

The hash should depend on the data of the node, as well as on hashes of the children nodes.

update(new_children)[source]#

Add several named children from a dictionary

get_data(**kwargs)[source]#

Retrieve and format the collected data for the current node, for use by collect().

Can be overridden, for instance when you want the collected data to contain information about the child nodes.

Parameters:

kwargs – allow subclasses to alter behaviour depending on how collect() is called.

Returns:

data formatted for collect()

collect_node() Set[MerkleNode][source]#

Collect the current node if it has not been yet, for use by collect().

collect() Set[MerkleNode][source]#

Collect the added and modified nodes in the subtree rooted at self since the last collect operation.

Returns:

A set of collected nodes

reset_collect()[source]#

Recursively unmark collected nodes in the subtree rooted at self.

This lets the caller use collect() again.

iter_tree(dedup=True) Iterator[MerkleNode][source]#

Yields all children nodes, recursively. Common nodes are deduplicated by default (deduplication can be turned off setting the given argument ‘dedup’ to False).

class swh.model.merkle.MerkleLeaf(data=None)[source]#

Bases: MerkleNode

A leaf to a Merkle tree.

A Merkle leaf is simply a Merkle node with children disabled.

update(new_children)[source]#

Children update operation. Disabled for leaves.