swh.storage.algos.dir_iterators module

class swh.storage.algos.dir_iterators.DirectoryIterator(storage, dir_id, base_path=b'')

Bases: object

Helper class used to iterate over a directory tree in depth-first order, with some additional features:

  • sibling nodes are iterated in lexicographic order by name

  • it is possible to skip visiting sub-directory nodes for efficiency when comparing two trees (there is no need to go deeper if two directories have the same hash)
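Both features can be illustrated with a simplified, recursive sketch. It is not the swh implementation, which keeps an explicit stack of frames instead of recursing; the in-memory `storage` dict, the entry keys `name`, `type` and `target`, and the `skip_dir` callback are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical stand-in for a storage: directory id -> list of entries.
Entry = Dict[str, Any]
Storage = Dict[bytes, List[Entry]]


def dfs_entries(
    storage: Storage,
    dir_id: bytes,
    base_path: bytes = b"",
    skip_dir: Optional[Callable[[Entry], bool]] = None,
) -> Iterator[Entry]:
    """Yield entries depth-first, visiting siblings in lexicographic order.

    A sub-directory for which skip_dir(entry) is true is yielded, but its
    content is not visited (e.g. because its hash matches the other tree).
    """
    for entry in sorted(storage.get(dir_id, []), key=lambda e: e["name"]):
        path = base_path + b"/" + entry["name"] if base_path else entry["name"]
        yield dict(entry, path=path)  # copy of the entry with a "path" field
        if entry["type"] == "dir" and not (skip_dir and skip_dir(entry)):
            yield from dfs_entries(storage, entry["target"], path, skip_dir)


storage: Storage = {
    b"root": [
        {"name": b"src", "type": "dir", "target": b"d1"},
        {"name": b"README", "type": "file", "target": b"f1"},
    ],
    b"d1": [{"name": b"main.c", "type": "file", "target": b"f2"}],
}

print([e["path"] for e in dfs_entries(storage, b"root")])
# [b'README', b'src', b'src/main.c']
print([e["path"] for e in dfs_entries(storage, b"root", skip_dir=lambda e: e["target"] == b"d1")])
# [b'README', b'src']
```

Pruning only removes the content of the skipped directory; the directory entry itself is still reported, which is what a tree comparison needs when two sub-directories share a hash.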

restart()

Restart the iteration at the beginning.

top()
Returns

The top frame of the main directories stack

Return type

list

current()
Returns

The current visited directory entry, i.e. the top element from the top frame

Return type

dict

current_hash()
Returns

The hash value of the currently visited directory entry

Return type

bytes

current_perms()
Returns

The permissions value of the currently visited directory entry

Return type

int

current_path()
Returns

The absolute path from the root directory of the currently visited directory entry

Return type

str

current_is_dir()
Returns

Whether the currently visited directory entry is a directory

Return type

bool
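The accessors above can be sketched as plain functions over a stack of frames, assuming each frame lists one directory's entries in reverse lexicographic order so that the current entry is the last element of the top frame. The entry keys (`name`, `type`, `target`, `perms`, `path`) and the free-function form are hypothetical stand-ins, not the swh API.

```python
from typing import Any, Dict, List, Optional

Entry = Dict[str, Any]
Frames = List[List[Entry]]


def top(frames: Frames) -> Optional[List[Entry]]:
    # The top frame is the last list pushed onto the stack.
    return frames[-1] if frames else None


def current(frames: Frames) -> Optional[Entry]:
    # Frames are reverse-sorted, so the current entry is the last element.
    frame = top(frames)
    return frame[-1] if frame else None


def current_hash(frames: Frames) -> bytes:
    return current(frames)["target"]


def current_perms(frames: Frames) -> int:
    return current(frames)["perms"]


def current_path(frames: Frames) -> bytes:
    return current(frames)["path"]


def current_is_dir(frames: Frames) -> bool:
    return current(frames)["type"] == "dir"


root_entries = [
    {"name": b"src", "type": "dir", "target": b"\x01" * 20, "perms": 0o040000, "path": b"src"},
    {"name": b"README", "type": "file", "target": b"\x02" * 20, "perms": 0o100644, "path": b"README"},
]
frames: Frames = [sorted(root_entries, key=lambda e: e["name"], reverse=True)]

print(current_path(frames), current_is_dir(frames), oct(current_perms(frames)))
# b'README' False 0o100644
```

Keeping each frame in reverse order means advancing is a cheap `list.pop()` from the end rather than a removal from the front.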

next()

Advance the tree iteration by dropping the currently visited directory entry from the top frame. If the top frame ends up empty, the operation is applied recursively, removing every empty frame as the iteration climbs back up towards the root.

Returns

The description of the newly visited directory entry

Return type

dict

step()

Advance the tree iteration like next(), except that if the currently visited element is a sub-directory, a new frame representing its content is pushed onto the main stack.

Returns

The description of the newly visited directory entry

Return type

dict

drop()

Drop the currently visited element from the top frame. If the frame ends up empty, the operation is applied recursively.
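To make the difference between next(), step() and drop() concrete, here is a minimal, hypothetical re-implementation on top of an in-memory dict standing in for the storage. The class name `MiniDirIterator`, the dict layout, and the order of "drop, then push" inside step() are assumptions, not a copy of the swh code.

```python
from typing import Any, Dict, List, Optional

Entry = Dict[str, Any]


class MiniDirIterator:
    """Hypothetical frame-stack iterator mirroring next(), step() and drop()."""

    def __init__(self, storage: Dict[bytes, List[Entry]], dir_id: bytes) -> None:
        self.storage = storage
        self.frames: List[List[Entry]] = []
        self._push_frame(dir_id, b"")

    def _push_frame(self, dir_id: bytes, base_path: bytes) -> None:
        # Reverse-sorted, so the smallest name is at the end of the frame.
        entries = sorted(self.storage.get(dir_id, []), key=lambda e: e["name"], reverse=True)
        frame = [
            dict(e, path=base_path + b"/" + e["name"] if base_path else e["name"])
            for e in entries
        ]
        if frame:  # an empty directory pushes no frame
            self.frames.append(frame)

    def current(self) -> Optional[Entry]:
        return self.frames[-1][-1] if self.frames else None

    def drop(self) -> None:
        # Remove the current entry, then pop every frame left empty.
        if self.frames:
            self.frames[-1].pop()
        while self.frames and not self.frames[-1]:
            self.frames.pop()

    def next(self) -> Optional[Entry]:
        # Move on without visiting the content of the current entry.
        self.drop()
        return self.current()

    def step(self) -> Optional[Entry]:
        # Like next(), but descend into the current entry if it is a directory.
        entry = self.current()
        if entry is None:
            return None
        self.drop()
        if entry["type"] == "dir":
            self._push_frame(entry["target"], entry["path"])
        return self.current()


storage = {
    b"root": [
        {"name": b"src", "type": "dir", "target": b"d1"},
        {"name": b"README", "type": "file", "target": b"f1"},
        {"name": b"tests", "type": "dir", "target": b"d2"},
    ],
    b"d1": [{"name": b"main.c", "type": "file", "target": b"f2"}],
    b"d2": [{"name": b"t.py", "type": "file", "target": b"f3"}],
}


def walk(advance: str) -> List[bytes]:
    # Collect visited paths, advancing with either "step" or "next".
    it = MiniDirIterator(storage, b"root")
    paths = []
    entry = it.current()
    while entry is not None:
        paths.append(entry["path"])
        entry = getattr(it, advance)()
    return paths


print(walk("step"))  # [b'README', b'src', b'src/main.c', b'tests', b'tests/t.py']
print(walk("next"))  # [b'README', b'src', b'tests']
```

Walking with next() only visits the top-level entries, because the content of sub-directories is never pushed onto the stack; this is what allows a caller to skip a sub-tree whose hash is already known.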

swh.storage.algos.dir_iterators.dir_iterator(storage, dir_id)

Return an iterator for recursively visiting a directory and its sub-directories. The associated paths are visited in lexicographic depth-first search order.

Parameters
  • storage (swh.storage.Storage) – an instance of a swh storage

  • dir_id (bytes) – a directory identifier

Returns

an iterator returning a dict at each iteration step describing a directory entry. A ‘path’ field is added to that dict to store the absolute path of the entry.

Return type

swh.storage.algos.dir_iterators.DirectoryIterator

class swh.storage.algos.dir_iterators.Remaining(value)

Bases: enum.Enum

Enum representing the current state when iterating over both directory trees at the same time.

NoMoreFiles = 0
OnlyToFilesRemain = 1
OnlyFromFilesRemain = 2
BothHaveFiles = 3
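The enum values happen to line up with a two-bit mask (bit 1: the from side still has entries, bit 0: the to side does). The hypothetical helper `remaining_state` below uses that observation to derive the state from two booleans; whether swh itself relies on this layout is not stated here.

```python
from enum import Enum


class Remaining(Enum):
    # Same member names and values as documented above.
    NoMoreFiles = 0
    OnlyToFilesRemain = 1
    OnlyFromFilesRemain = 2
    BothHaveFiles = 3


def remaining_state(from_has_entries: bool, to_has_entries: bool) -> Remaining:
    """Hypothetical helper: bit 1 is the from side, bit 0 is the to side."""
    return Remaining((int(from_has_entries) << 1) | int(to_has_entries))


print(remaining_state(True, False))  # Remaining.OnlyFromFilesRemain
```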
class swh.storage.algos.dir_iterators.DoubleDirectoryIterator(storage, dir_from, dir_to)

Bases: object

Helper class to traverse two directory trees at the same time and compare their contents to detect changes between them.
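The core idea of a simultaneous traversal can be sketched as a merge of two name-sorted entry lists, descending only into sub-directories whose hashes differ. This recursive function is an illustrative assumption rather than the DoubleDirectoryIterator code: `diff_trees`, the in-memory `storage` dict and the change labels are invented for the example, and permission-only changes are ignored.

```python
from typing import Any, Dict, List, Tuple

Entry = Dict[str, Any]
Storage = Dict[bytes, List[Entry]]


def _join(base: bytes, name: bytes) -> bytes:
    return base + b"/" + name if base else name


def diff_trees(storage: Storage, dir_from: bytes, dir_to: bytes, base: bytes = b"") -> List[Tuple[str, bytes]]:
    """Walk two directories in lockstep and report added/deleted/modified paths."""
    changes: List[Tuple[str, bytes]] = []
    from_entries = sorted(storage.get(dir_from, []), key=lambda e: e["name"])
    to_entries = sorted(storage.get(dir_to, []), key=lambda e: e["name"])
    i = j = 0
    while i < len(from_entries) or j < len(to_entries):
        f = from_entries[i] if i < len(from_entries) else None
        t = to_entries[j] if j < len(to_entries) else None
        if t is None or (f is not None and f["name"] < t["name"]):
            changes.append(("delete", _join(base, f["name"])))  # only in "from"
            i += 1
        elif f is None or t["name"] < f["name"]:
            changes.append(("add", _join(base, t["name"])))  # only in "to"
            j += 1
        else:
            path = _join(base, f["name"])
            if f["target"] != t["target"]:
                if f["type"] == "dir" and t["type"] == "dir":
                    # Descend only when the sub-directory hashes differ.
                    changes.extend(diff_trees(storage, f["target"], t["target"], path))
                else:
                    changes.append(("modify", path))
            i += 1
            j += 1
    return changes


storage: Storage = {
    b"old": [
        {"name": b"README", "type": "file", "target": b"r1"},
        {"name": b"src", "type": "dir", "target": b"s1"},
        {"name": b"docs", "type": "dir", "target": b"d1"},
    ],
    b"new": [
        {"name": b"README", "type": "file", "target": b"r2"},
        {"name": b"src", "type": "dir", "target": b"s2"},
        {"name": b"docs", "type": "dir", "target": b"d1"},
        {"name": b"setup.py", "type": "file", "target": b"p1"},
    ],
    b"s1": [{"name": b"a.c", "type": "file", "target": b"a1"}],
    b"s2": [
        {"name": b"a.c", "type": "file", "target": b"a1"},
        {"name": b"b.c", "type": "file", "target": b"b1"},
    ],
}

print(diff_trees(storage, b"old", b"new"))
```

On this data the result is `[('modify', b'README'), ('add', b'setup.py'), ('add', b'src/b.c')]`; `docs` is never read because both sides point at the same hash.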

restart()

Restart the double iteration at the beginning.

next_from()

Apply the next operation on the from iterator.

next_to()

Apply the next operation on the to iterator.

next_both()

Apply the next operation on both iterators.

step_from()

Apply the step operation on the from iterator.

step_to()

Apply the step operation on the to iterator.

step_both()

Apply the step operation on both iterators.

remaining()
Returns

the current state of the double iteration

Return type

Remaining

compare()

Compare the currently iterated directory entries of both iterators and return the comparison status.

Returns

The status of the comparison with the following bool values:
  • same_hash: indicates if the two entries have the same hash

  • same_perms: indicates if the two entries have the same permissions

  • both_are_dirs: indicates if the two entries are directories

  • both_are_files: indicates if the two entries are regular files

  • file_and_dir: indicates if one of the entries is a directory and the other a regular file

  • from_is_empty_dir: indicates if the from entry is the empty directory

  • to_is_empty_dir: indicates if the to entry is the empty directory

Return type

dict
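A minimal sketch of how these flags could be computed from two directory entries. The helper name `compare_entries`, the entry keys, and the way emptiness is detected (comparing against the well-known Git empty-tree identifier) are assumptions; the documentation above does not specify them.

```python
from typing import Any, Dict

# Assumption: the empty directory is identified by the Git empty-tree id.
EMPTY_DIR_ID = bytes.fromhex("4b825dc642cb6eb9a060e54bf8d69288fbee4904")


def compare_entries(e_from: Dict[str, Any], e_to: Dict[str, Any]) -> Dict[str, bool]:
    """Hypothetical helper computing the documented comparison flags."""
    from_dir, to_dir = e_from["type"] == "dir", e_to["type"] == "dir"
    from_file, to_file = e_from["type"] == "file", e_to["type"] == "file"
    return {
        "same_hash": e_from["target"] == e_to["target"],
        "same_perms": e_from["perms"] == e_to["perms"],
        "both_are_dirs": from_dir and to_dir,
        "both_are_files": from_file and to_file,
        "file_and_dir": (from_dir and to_file) or (from_file and to_dir),
        "from_is_empty_dir": from_dir and e_from["target"] == EMPTY_DIR_ID,
        "to_is_empty_dir": to_dir and e_to["target"] == EMPTY_DIR_ID,
    }


old = {"type": "dir", "target": EMPTY_DIR_ID, "perms": 0o040000}
new = {"type": "file", "target": b"\x01" * 20, "perms": 0o100644}
print(compare_entries(old, new))
```

Here an empty directory was replaced by a regular file, so `file_and_dir` and `from_is_empty_dir` are true while the hash and permission flags are false.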