swh.storage.algos.diff module

swh.storage.algos.diff.diff_directories(storage: StorageInterface, from_dir: bytes | None, to_dir: bytes, track_renaming: bool = False) → List[Dict[str, Any]]

Compute the differential between two directories, i.e. the list of file changes (insertion / deletion / modification / renaming) between them.

Parameters:
  • storage – an swh storage instance, either local or remote; a local storage is recommended for best performance

  • from_dir – the swh identifier of the directory to compare from

  • to_dir – the swh identifier of the directory to compare to

  • track_renaming – whether to track file renames

Returns:

A list of dicts representing the changes between the two directories. Each dict contains the following entries:

  • type: a string describing the type of change (insert / delete / modify / rename)

  • from: a dict containing the directory entry metadata on the from side (None in case of an insertion)

  • from_path: a bytes string holding the absolute path of the entry on the from side (None in case of an insertion)

  • to: a dict containing the directory entry metadata on the to side (None in case of a deletion)

  • to_path: a bytes string holding the absolute path of the entry on the to side (None in case of a deletion)

The returned list is sorted in lexicographic depth-first order according to the value of the to_path field.
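As a minimal sketch of how a caller might consume this return value, the snippet below tallies changes by type and renders one line per change. The sample list is hand-written for illustration and is not real storage output. The helpers `describe_change` and `summarize_changes` are hypothetical, and the `name` key inside the `from` / `to` metadata dicts is a placeholder assumption rather than a documented field.

```python
from collections import Counter
from typing import Any, Dict, List, Optional


def _show(path: Optional[bytes]) -> str:
    """Decode a bytes path for display; None means that side is absent."""
    return "-" if path is None else path.decode("utf-8", errors="replace")


def describe_change(change: Dict[str, Any]) -> str:
    """Render one change dict as '<type>: <from_path> -> <to_path>'."""
    return f"{change['type']}: {_show(change['from_path'])} -> {_show(change['to_path'])}"


def summarize_changes(changes: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count changes per type (insert / delete / modify / rename)."""
    return dict(Counter(change["type"] for change in changes))


# Hand-written sample in the documented shape; metadata dicts are placeholders.
sample_changes = [
    {"type": "insert", "from": None, "from_path": None,
     "to": {"name": b"new.txt"}, "to_path": b"docs/new.txt"},
    {"type": "delete", "from": {"name": b"old.txt"}, "from_path": b"old.txt",
     "to": None, "to_path": None},
    {"type": "modify", "from": {"name": b"main.py"}, "from_path": b"src/main.py",
     "to": {"name": b"main.py"}, "to_path": b"src/main.py"},
]

for change in sample_changes:
    print(describe_change(change))
print(summarize_changes(sample_changes))
```

Because paths are bytes, the sketch decodes them only for display and keeps the original values untouched.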

Warning

The algorithm used to track file renames is naive: it compares hashes between deleted and inserted files, and may fail to detect renames in some edge cases.
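To make the limitation concrete, here is an illustrative sketch of hash-based rename pairing. It is not the module's actual implementation: the function `pair_renames` is hypothetical, and the `target` key used to hold an entry's hash is an assumption made for the example. A deletion and an insertion with equal hashes are merged into one rename; a file that was moved and edited in the same change has a different hash and stays a separate delete and insert.

```python
from typing import Any, Dict, List


def pair_renames(changes: List[Dict[str, Any]], hash_key: str = "target") -> List[Dict[str, Any]]:
    """Merge delete/insert pairs with equal hashes into 'rename' changes."""
    # Index deletions by the hash of the deleted entry.
    deleted_by_hash: Dict[bytes, List[Dict[str, Any]]] = {}
    for change in changes:
        if change["type"] == "delete":
            deleted_by_hash.setdefault(change["from"][hash_key], []).append(change)

    result: List[Dict[str, Any]] = []
    consumed = set()  # ids of deletions that were turned into renames
    for change in changes:
        if change["type"] == "insert":
            candidates = deleted_by_hash.get(change["to"][hash_key], [])
            if candidates:
                deletion = candidates.pop(0)
                consumed.add(id(deletion))
                result.append({
                    "type": "rename",
                    "from": deletion["from"], "from_path": deletion["from_path"],
                    "to": change["to"], "to_path": change["to_path"],
                })
                continue
        result.append(change)
    # Drop the deletions that were absorbed into a rename.
    return [c for c in result if id(c) not in consumed]


# Hypothetical input: a.txt moved unchanged to b.txt; c.txt moved AND edited to d.txt.
raw_changes = [
    {"type": "delete", "from": {"target": b"h1"}, "from_path": b"a.txt", "to": None, "to_path": None},
    {"type": "insert", "from": None, "from_path": None, "to": {"target": b"h1"}, "to_path": b"b.txt"},
    {"type": "delete", "from": {"target": b"h2"}, "from_path": b"c.txt", "to": None, "to_path": None},
    {"type": "insert", "from": None, "from_path": None, "to": {"target": b"h3"}, "to_path": b"d.txt"},
]
paired = pair_renames(raw_changes)
print([c["type"] for c in paired])
```

In this sample the first pair collapses into a rename, while the edited file remains a delete followed by an insert, which is the kind of edge case the warning refers to.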

swh.storage.algos.diff.diff_revisions(storage: StorageInterface, from_rev: bytes | None, to_rev: bytes, track_renaming: bool = False) → List[Dict[str, Any]]

Compute the differential between two revisions, i.e. the list of file changes between the two associated directories.

Parameters:
  • storage – an swh storage instance, either local or remote; a local storage is recommended for best performance

  • from_rev – the identifier of the revision to compare from

  • to_rev – the identifier of the revision to compare to

  • track_renaming – whether to track file renames

Returns:

A list of dicts describing the introduced file changes (see swh.storage.algos.diff.diff_directories()).

Warning

The algorithm used to track file renames is naive: it compares hashes between deleted and inserted files, and may fail to detect renames in some edge cases.

swh.storage.algos.diff.diff_revision(storage: StorageInterface, revision: bytes, track_renaming: bool = False) → List[Dict[str, Any]]

Compute the differential between a revision and its first parent, i.e. the file changes introduced by that revision. If the revision has no parents, the directory to compare from is considered empty.

Parameters:
  • storage – an swh storage instance, either local or remote; a local storage is recommended for best performance

  • revision – the identifier of the revision from which to compute the introduced changes

  • track_renaming – whether to track file renames

Returns:

A list of dicts describing the introduced file changes (see swh.storage.algos.diff.diff_directories()).
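The first-parent rule described for this function can be sketched as follows. This is an illustration under stated assumptions, not the module's code: `pick_from_revision` and `diff_against_first_parent` are hypothetical helpers, and `diff_fn` stands in for any two-revision diff callable (for instance a wrapper around diff_revisions with the storage already bound).

```python
from typing import Any, Callable, Optional, Sequence


def pick_from_revision(parents: Sequence[bytes]) -> Optional[bytes]:
    """Return the first parent, or None when the revision has no parents."""
    return parents[0] if parents else None


def diff_against_first_parent(
    revision: bytes,
    parents: Sequence[bytes],
    diff_fn: Callable[[Optional[bytes], bytes], Any],
) -> Any:
    """Diff a revision against its first parent (or against nothing for a root)."""
    return diff_fn(pick_from_revision(parents), revision)


def echo_diff(from_rev: Optional[bytes], to_rev: bytes) -> tuple:
    """Stub diff callable that only echoes its arguments, to show what gets compared."""
    return (from_rev, to_rev)


print(diff_against_first_parent(b"merge", [b"p1", b"p2"], echo_diff))
print(diff_against_first_parent(b"root", [], echo_diff))
```

For a merge revision only the first parent is used as the comparison base; for a root revision the base is None, matching the "considered empty" case.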

Warning

The algorithm used to track file renames is naive: it compares hashes between deleted and inserted files, and may fail to detect renames in some edge cases.