swh.storage.algos.diff module
- swh.storage.algos.diff.diff_directories(storage: StorageInterface, from_dir: bytes | None, to_dir: bytes, track_renaming: bool = False) → List[Dict[str, Any]]
Compute the differential between two directories, i.e. the list of file changes (insertion / deletion / modification / renaming) between them.
- Parameters:
storage – an instance of a swh storage, either local or remote (a local storage is recommended for best performance)
from_dir – the swh identifier of the directory to compare from
to_dir – the swh identifier of the directory to compare to
track_renaming – whether to track file renames
- Returns:
A list of dicts representing the changes between the two directories. Each dict contains the following entries:
type: a string describing the type of change (insert / delete / modify / rename)
from: a dict containing the directory entry metadata in the from directory (None in case of an insertion)
from_path: a bytes string corresponding to the absolute path of the from directory entry (None in case of an insertion)
to: a dict containing the directory entry metadata in the to directory (None in case of a deletion)
to_path: a bytes string corresponding to the absolute path of the to directory entry (None in case of a deletion)
The returned list is sorted in lexicographic depth-first order according to the value of the to_path field.
Warning
The algorithm used to track file renames is quite naive: it compares hashes between deleted and inserted files, and may fail to detect some renames in edge cases.
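As an illustration of the return format described above, here is a minimal sketch that groups a list of change dicts by their type entry. The helper name summarize_changes and the sample_changes data are hypothetical: the sample is hand-written to mirror the documented keys, it is not real output, and the directory entry metadata is replaced by empty dicts.

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple

Change = Dict[str, Any]
PathPair = Tuple[Optional[bytes], Optional[bytes]]


def summarize_changes(changes: List[Change]) -> Dict[str, List[PathPair]]:
    """Group changes by their 'type' entry (hypothetical helper).

    Each change is recorded as a (from_path, to_path) pair; either side
    is None for an insertion or a deletion, as documented above.
    """
    by_type: Dict[str, List[PathPair]] = defaultdict(list)
    for change in changes:
        by_type[change["type"]].append((change["from_path"], change["to_path"]))
    return dict(by_type)


# Hand-written sample mirroring the documented keys; the 'from' and 'to'
# metadata dicts are left empty because their content is not shown here.
sample_changes: List[Change] = [
    {"type": "delete", "from": {}, "from_path": b"old/notes.txt",
     "to": None, "to_path": None},
    {"type": "insert", "from": None, "from_path": None,
     "to": {}, "to_path": b"README"},
    {"type": "modify", "from": {}, "from_path": b"src/main.c",
     "to": {}, "to_path": b"src/main.c"},
    {"type": "rename", "from": {}, "from_path": b"src/util.c",
     "to": {}, "to_path": b"src/utils.c"},
]

summary = summarize_changes(sample_changes)
```

With real data, the list passed to summarize_changes would be the value returned by diff_directories.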
- swh.storage.algos.diff.diff_revisions(storage: StorageInterface, from_rev: bytes | None, to_rev: bytes, track_renaming: bool = False) → List[Dict[str, Any]]
Compute the differential between two revisions, i.e. the list of file changes between the two associated directories.
- Parameters:
storage – an instance of a swh storage, either local or remote (a local storage is recommended for best performance)
from_rev – the identifier of the revision to compare from
to_rev – the identifier of the revision to compare to
track_renaming – whether to track file renames
- Returns:
A list of dicts describing the introduced file changes (see swh.storage.algos.diff.diff_directories()).
Warning
The algorithm used to track file renames is quite naive: it compares hashes between deleted and inserted files, and may fail to detect some renames in edge cases.
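To make the warning concrete, here is a simplified sketch of hash-based rename detection. It is not the library's implementation: the function pair_renames, its input shape (a mapping from path to content hash) and the placeholder hash strings are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

PathHashes = Dict[bytes, str]


def pair_renames(
    deleted: PathHashes, inserted: PathHashes
) -> Tuple[List[Tuple[bytes, bytes]], PathHashes, PathHashes]:
    """Pair deleted and inserted files that share a content hash.

    Returns (renames, remaining_deleted, remaining_inserted).
    Hypothetical helper, written only to illustrate the idea.
    """
    # Index inserted paths by content hash.
    inserted_by_hash: Dict[str, List[bytes]] = {}
    for path, content_hash in inserted.items():
        inserted_by_hash.setdefault(content_hash, []).append(path)

    renames: List[Tuple[bytes, bytes]] = []
    remaining_deleted: PathHashes = {}
    for path, content_hash in deleted.items():
        candidates = inserted_by_hash.get(content_hash)
        if candidates:
            # Same content disappeared here and appeared elsewhere.
            renames.append((path, candidates.pop(0)))
        else:
            remaining_deleted[path] = content_hash

    # Inserted paths that were not consumed by a rename stay insertions.
    remaining_inserted = {
        path: content_hash
        for content_hash, paths in inserted_by_hash.items()
        for path in paths
    }
    return renames, remaining_deleted, remaining_inserted


# b"a.txt" was moved unchanged; b"old.c" was moved and edited, so its hash
# differs from b"new.c" and the pair is not recognised as a rename.
renames, still_deleted, still_inserted = pair_renames(
    {b"a.txt": "h1", b"old.c": "h2"},
    {b"b.txt": "h1", b"new.c": "h3"},
)
```

The second pair shows one kind of edge case: a file that is renamed and modified in the same change has a different hash on each side, so an approach based only on hash equality reports it as a deletion plus an insertion.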
- swh.storage.algos.diff.diff_revision(storage: StorageInterface, revision: bytes, track_renaming: bool = False) → List[Dict[str, Any]]
Compute the differential between a revision and its first parent. If the revision has no parents, the directory to compare from is considered empty. In other words, this computes the file changes introduced in a specific revision.
- Parameters:
storage – an instance of a swh storage, either local or remote (a local storage is recommended for best performance)
revision – the identifier of the revision from which to compute the introduced changes
track_renaming – whether to track file renames
- Returns:
A list of dicts describing the introduced file changes (see swh.storage.algos.diff.diff_directories()).
Warning
The algorithm used to track file renames is quite naive: it compares hashes between deleted and inserted files, and may fail to detect some renames in edge cases.
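A possible way to call diff_revision from a hexadecimal revision identifier is sketched below. The helpers hex_to_id and changes_introduced_by are hypothetical, and the 20-byte length check assumes the revision identifier is a raw SHA1 digest, which this page does not state. The storage argument is expected to be an already configured StorageInterface instance; how to obtain one is outside the scope of this sketch.

```python
from typing import Any, Dict, List


def hex_to_id(hex_id: str) -> bytes:
    """Convert a hexadecimal identifier to raw bytes (hypothetical helper).

    Assumption: identifiers are 20-byte SHA1 digests.
    """
    raw = bytes.fromhex(hex_id)
    if len(raw) != 20:
        raise ValueError(f"expected 40 hex characters, got {len(hex_id)}")
    return raw


def changes_introduced_by(
    storage: Any, hex_revision_id: str, track_renaming: bool = False
) -> List[Dict[str, Any]]:
    """Return the changes a revision introduces relative to its first parent."""
    # Imported lazily so this sketch can be loaded without swh.storage installed.
    from swh.storage.algos.diff import diff_revision

    return diff_revision(
        storage, hex_to_id(hex_revision_id), track_renaming=track_renaming
    )
```

The raw bytes conversion is kept separate from the call so that malformed identifiers are rejected before any storage access takes place.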