swh.storage.algos.dir_iterators module

class swh.storage.algos.dir_iterators.DirectoryIterator(storage: StorageInterface, dir_id: bytes | None, base_path: bytes = b'')

Bases: object

Helper class used to iterate over a directory tree in depth-first order, with some additional features:

  • sibling nodes are iterated in lexicographic order by name

  • the visit of a sub-directory node can be skipped, which is more efficient when comparing two trees (there is no need to go deeper when two directories have the same hash)

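Parameters:
  • storage – instance of swh storage

  • dir_id – identifier of the directory to iterate on

  • base_path – optional base path, used when the traversal starts from a sub-directory

restart() → None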

Restart the iteration at the beginning.

top() → List[Dict[str, Any]] | None
Returns:

The top frame of the main directories stack

current() → Dict[str, Any] | None
Returns:

The current visited directory entry, i.e. the top element from the top frame

Return type:

dict

current_hash() → bytes
Returns:

The hash value of the currently visited directory entry

current_perms() → int
Returns:

The permissions value of the currently visited directory entry

current_path() → bytes | None
Returns:

The absolute path from the root directory of the currently visited directory entry

current_is_dir() → bool
Returns:

Whether the currently visited directory entry is a directory

next() → Dict[str, Any] | None

Advance the tree iteration by dropping the currently visited directory entry from the top frame. If the top frame ends up empty, the operation is applied recursively, removing all empty frames as the tree is climbed up towards its root.

Returns:

The description of the newly visited directory entry

step() → Dict[str, Any] | None

Advance the tree iteration like the next operation, with one difference: if the currently visited element is a sub-directory, a new frame representing its content is pushed to the main stack.

Returns:

The description of the newly visited directory entry

drop() → None

Drop the currently visited element from the top frame. If the frame ends up empty, the operation is applied recursively.
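
The interplay of next, step and drop described above can be modelled with a small stand-alone sketch. It does not use swh.storage: ToyDirIterator, its nested-dict tree format and the visited_paths helper are hypothetical names introduced only for illustration, and the real class may differ in its details. Each frame holds the entries of one directory in reverse lexicographic order, so the current entry is the last element of the top frame.

```python
from typing import Any, Dict, FrozenSet, List, Optional


class ToyDirIterator:
    """Toy frame-stack walk over a nested dict (not the swh implementation).

    A dict value stands for a sub-directory, any other value for a file.
    """

    def __init__(self, tree: Dict[str, Any]) -> None:
        self.tree = tree
        self.frames: List[List[Dict[str, Any]]] = []
        self.restart()

    def _push_frame(self, children: Dict[str, Any], base: str) -> None:
        # Reverse order: the lexicographically smallest name ends up last,
        # i.e. on top of the frame.
        names = sorted(children, reverse=True)
        self.frames.append(
            [{"name": n, "path": base + n, "node": children[n]} for n in names]
        )

    def restart(self) -> None:
        self.frames = []
        if self.tree:
            self._push_frame(self.tree, "")

    def current(self) -> Optional[Dict[str, Any]]:
        return self.frames[-1][-1] if self.frames else None

    def drop(self) -> None:
        # Remove the current entry; if its frame becomes empty, remove the
        # frame and drop the parent entry too (recursive application).
        if not self.frames:
            return
        self.frames[-1].pop()
        if not self.frames[-1]:
            self.frames.pop()
            self.drop()

    def next(self) -> Optional[Dict[str, Any]]:
        # Move on without descending into the current entry.
        self.drop()
        return self.current()

    def step(self) -> Optional[Dict[str, Any]]:
        # Descend into a non-empty sub-directory, otherwise behave like next().
        entry = self.current()
        if entry is None:
            return None
        if isinstance(entry["node"], dict) and entry["node"]:
            self._push_frame(entry["node"], entry["path"] + "/")
        else:
            self.drop()
        return self.current()


def visited_paths(tree: Dict[str, Any], skip: FrozenSet[str] = frozenset()) -> List[str]:
    """Collect visited paths, using next() instead of step() on skipped paths."""
    it = ToyDirIterator(tree)
    paths = []
    entry = it.current()
    while entry is not None:
        paths.append(entry["path"])
        entry = it.next() if entry["path"] in skip else it.step()
    return paths


tree = {"b": {"y": b"2", "x": b"1"}, "a": b"0", "c": b"3"}
print(visited_paths(tree))
print(visited_paths(tree, skip=frozenset({"b"})))
```

Calling next on a directory entry leaves its whole subtree unvisited, which is the shortcut the class description mentions for directories with identical hashes.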

swh.storage.algos.dir_iterators.dir_iterator(storage: StorageInterface, dir_id: bytes) → DirectoryIterator

Return an iterator for recursively visiting a directory and its sub-directories. The associated paths are visited in lexicographic depth-first search order.

Parameters:
  • storage – an instance of a swh storage

  • dir_id – a directory identifier

Returns:

an iterator returning, at each iteration step, a dict describing a directory entry. A path field is added to that dict to store the absolute path of the entry.
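
The visiting order and the added path field can be illustrated without a storage backend. The sketch below is a minimal stand-in, not the library function: toy_dir_iterator, the nested-dict tree and the name and type keys are assumptions made for the example; only the path field is taken from the description above.

```python
from typing import Any, Dict, Iterator


def toy_dir_iterator(tree: Dict[bytes, Any], base: bytes = b"") -> Iterator[Dict[str, Any]]:
    """Yield one dict per entry, in lexicographic depth-first order.

    `tree` maps entry names to either a nested dict (sub-directory) or any
    other value (file). This stands in for data fetched from a storage.
    """
    for name in sorted(tree):
        node = tree[name]
        is_dir = isinstance(node, dict)
        # The "path" field holds the path of the entry from the root.
        yield {
            "name": name,
            "type": "dir" if is_dir else "file",
            "path": base + name,
        }
        if is_dir:
            # Visit the content of a sub-directory right after the entry itself.
            yield from toy_dir_iterator(node, base + name + b"/")


example_tree = {
    b"src": {b"main.py": b"h1", b"lib": {b"util.py": b"h2"}},
    b"README": b"h0",
}
example_paths = [entry["path"] for entry in toy_dir_iterator(example_tree)]
print(example_paths)
```

Names are compared as bytes, so an uppercase name such as README sorts before a lowercase one such as src.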

class swh.storage.algos.dir_iterators.Remaining(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

Enum to represent the current state when iterating on both directory trees at the same time.

NoMoreFiles = 0
OnlyToFilesRemain = 1
OnlyFromFilesRemain = 2
BothHaveFiles = 3
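
The four values above line up with a two-bit encoding: 1 when entries remain on the to side, 2 when entries remain on the from side, 3 when both do. The sketch below mirrors the members in a stand-alone enum to show this; RemainingSketch and remaining_state are hypothetical names, and whether the library derives the state this way is an assumption.

```python
from enum import Enum


class RemainingSketch(Enum):
    """Stand-alone copy of the member values listed above."""

    NoMoreFiles = 0
    OnlyToFilesRemain = 1
    OnlyFromFilesRemain = 2
    BothHaveFiles = 3


def remaining_state(from_has_entry: bool, to_has_entry: bool) -> RemainingSketch:
    # The "from" side contributes the value 2, the "to" side the value 1.
    return RemainingSketch((int(from_has_entry) << 1) | int(to_has_entry))


print(remaining_state(True, False))
```
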

class swh.storage.algos.dir_iterators.DoubleDirectoryIterator(storage: StorageInterface, dir_from: bytes | None, dir_to: bytes)

Bases: object

Helper class to traverse two directory trees at the same time and compare their contents to detect changes between them.

Parameters:
  • storage – instance of swh storage

  • dir_from – hash identifier of the from directory

  • dir_to – hash identifier of the to directory

restart() → None

Restart the double iteration at the beginning.

next_from() → None

Apply the next operation on the from iterator.

next_to() → None

Apply the next operation on the to iterator.

next_both() → None

Apply the next operation on both iterators.

step_from() → None

Apply the step operation on the from iterator.

step_to() → None

Apply the step operation on the to iterator.

step_both() → None

Apply the step operation on both iterators.

remaining() → Remaining
Returns:

the current state of the double iteration

Return type:

Remaining
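
One way a caller could drive such a double iteration is a merge-style walk over two name-sorted listings, advancing one side or both depending on which entries remain. The sketch below shows the idea on a single directory level with plain dicts; toy_diff and the change labels are hypothetical, and the mapping of its branches to the Remaining states and to next_from, next_to and next_both is an assumption, not a description of the library's own diff code.

```python
from typing import Dict, Iterator, Tuple


def toy_diff(from_dir: Dict[str, bytes], to_dir: Dict[str, bytes]) -> Iterator[Tuple[str, str]]:
    """Yield (change, name) pairs for two flat name -> hash mappings."""
    from_names = sorted(from_dir)
    to_names = sorted(to_dir)
    i = j = 0
    while i < len(from_names) or j < len(to_names):
        if j >= len(to_names):
            # Only "from" entries remain: they were all removed.
            yield ("removed", from_names[i])
            i += 1
        elif i >= len(from_names):
            # Only "to" entries remain: they were all added.
            yield ("added", to_names[j])
            j += 1
        elif from_names[i] < to_names[j]:
            # Both sides have entries; the "from" name is missing on the "to" side.
            yield ("removed", from_names[i])
            i += 1
        elif from_names[i] > to_names[j]:
            # Both sides have entries; the "to" name is new.
            yield ("added", to_names[j])
            j += 1
        else:
            # Same name on both sides: compare hashes, then advance both.
            if from_dir[from_names[i]] != to_dir[to_names[j]]:
                yield ("modified", from_names[i])
            i += 1
            j += 1


changes = list(
    toy_diff(
        {"a": b"1", "b": b"2", "d": b"4"},
        {"b": b"9", "c": b"3", "d": b"4"},
    )
)
print(changes)
```

Because both listings are sorted by name, each entry is looked at once and unchanged entries produce no output.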

compare() → Dict[str, Any]

Compare the currently iterated directory entries in both iterators and return the comparison status.

Returns:

The status of the comparison, with the following bool values:

  • same_hash: indicates if the two entries have the same hash

  • same_perms: indicates if the two entries have the same permissions

  • both_are_dirs: indicates if both entries are directories

  • both_are_files: indicates if both entries are regular files

  • file_and_dir: indicates if one of the entries is a directory and the other a regular file

  • from_is_empty_dir: indicates if the from entry is the empty directory

  • to_is_empty_dir: indicates if the to entry is the empty directory

Return type:

dict
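
The flags listed for compare can be computed from two entry dicts as in the sketch below. It is an illustration only: toy_compare is a hypothetical helper, the type, target and perms keys are assumed entry fields, any entry that is not a directory is treated as a file, and the empty directory is assumed to be identified by git's empty-tree SHA-1.

```python
import hashlib
from typing import Any, Dict

# Assumption: the empty directory hash is the SHA-1 of an empty git tree object.
EMPTY_DIR_HASH = hashlib.sha1(b"tree 0\0").digest()


def toy_compare(from_entry: Dict[str, Any], to_entry: Dict[str, Any]) -> Dict[str, bool]:
    """Return the comparison flags for two directory entries."""
    from_is_dir = from_entry["type"] == "dir"
    to_is_dir = to_entry["type"] == "dir"
    return {
        "same_hash": from_entry["target"] == to_entry["target"],
        "same_perms": from_entry["perms"] == to_entry["perms"],
        "both_are_dirs": from_is_dir and to_is_dir,
        "both_are_files": not from_is_dir and not to_is_dir,
        # Exactly one of the two entries is a directory.
        "file_and_dir": from_is_dir != to_is_dir,
        "from_is_empty_dir": from_is_dir and from_entry["target"] == EMPTY_DIR_HASH,
        "to_is_empty_dir": to_is_dir and to_entry["target"] == EMPTY_DIR_HASH,
    }


empty_dir = {"type": "dir", "target": EMPTY_DIR_HASH, "perms": 0o040000}
some_file = {"type": "file", "target": b"\x01" * 20, "perms": 0o100644}
status = toy_compare(empty_dir, some_file)
print(status)
```

A caller comparing two trees would typically check same_hash first: when it is true for two directories, their subtrees are identical and need not be visited.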