swh.indexer.rehash module
class swh.indexer.rehash.RecomputeChecksums
Bases: object
Class in charge of (re)computing the hashes of contents.
The hashes to compute are defined by two configuration options:
- compute_checksums ([str])
list of hash algorithms that the swh.model.hashutil.MultiHash.from_data function should be able to handle. For variable-length checksums, the desired checksum length should also be provided, in the format <algorithm’s name>:<variable-length>, e.g. blake2:512.
- recompute_checksums (bool)
whether hashes listed in compute_checksums that already exist for a content should also be recomputed. Defaults to False.
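The way the two options interact can be sketched as follows. This is a minimal, hypothetical illustration, not the module's own code: parse_checksum_spec and checksums_to_compute are names invented here, and the sketch assumes a content dictionary stores each hash under the same name used in compute_checksums.

```python
from typing import Any, Dict, List, Optional, Tuple


def parse_checksum_spec(spec: str) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: split a compute_checksums entry into (name, length).

    Fixed-length algorithms are given by name alone ("sha256");
    variable-length ones carry a length suffix ("blake2:512").
    """
    name, sep, length = spec.partition(":")
    if not sep:
        return name, None  # no ":" means a fixed-length algorithm
    if not length.isdigit():
        raise ValueError(f"invalid checksum length in {spec!r}")
    return name, int(length)


def checksums_to_compute(
    content: Dict[str, Any], compute_checksums: List[str], recompute_checksums: bool
) -> List[str]:
    """Hypothetical helper: pick which configured checksums to compute for one content."""
    if recompute_checksums:
        # Recompute everything listed, even hashes the content already has.
        return list(compute_checksums)
    # Otherwise only fill in hashes that are missing or empty.
    return [h for h in compute_checksums if not content.get(h)]
```

With recompute_checksums left at its default of False, a content that already carries a sha256 only gets the algorithms it is missing.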
get_new_contents_metadata(all_contents: List[Dict[str, Any]]) → Generator[Tuple[Dict[str, Any], List[Any]], Any, None]
Retrieve raw contents and compute new checksums on them. Unknown or corrupted contents are skipped.
- Parameters
all_contents – list of contents, as dictionaries containing the necessary primary keys
- Yields
tuple – tuple of (content to update, list of checksums computed)
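A simplified, self-contained sketch of what get_new_contents_metadata is described as doing. Every detail here is an assumption rather than the real method: a plain dict keyed by sha1 stands in for the object storage, Python's hashlib replaces swh.model.hashutil.MultiHash (so algorithm names must be hashlib names such as sha256, and swh-specific ones like sha1_git are not handled), and "corrupted" is taken to mean data whose SHA-1 no longer matches the content's sha1 key.

```python
import hashlib
from typing import Any, Dict, Iterator, List, Mapping, Tuple


def get_new_contents_metadata(
    all_contents: List[Dict[str, Any]],
    objstorage: Mapping[bytes, bytes],  # hypothetical stand-in: sha1 -> raw bytes
    compute_checksums: List[str],  # assumed to be hashlib algorithm names
    recompute_checksums: bool = False,
) -> Iterator[Tuple[Dict[str, Any], List[str]]]:
    for content in all_contents:
        if recompute_checksums:
            to_compute = list(compute_checksums)
        else:
            # Only compute hashes the content does not already have.
            to_compute = [h for h in compute_checksums if not content.get(h)]
        if not to_compute:
            continue  # nothing to do for this content
        raw = objstorage.get(content["sha1"])
        if raw is None:
            continue  # unknown content: absent from the object storage
        if hashlib.sha1(raw).digest() != content["sha1"]:
            continue  # corrupted content (assumed check): data does not match its sha1
        updated = dict(content)  # copy so the caller's dict is left untouched
        for name in to_compute:
            updated[name] = hashlib.new(name, raw).digest()
        yield updated, to_compute
```

Because it is a generator, contents are read and hashed lazily, one at a time, as the caller consumes the results.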
run(contents: List[Dict[str, Any]]) → Dict
Given a list of contents:
- (re)compute a given set of checksums on the contents available in our object storage
- update those contents with the new metadata
- Parameters
contents – contents as dictionaries with the necessary keys. The keys present in each dictionary should be the ones defined in the ‘primary_key’ option.
- Returns
A summary dict with keys ‘status’, the task’s status, and ‘count’, the number of updated contents.
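The two steps of run and its summary dict can be sketched as below. This is a hypothetical outline, not the real method: it takes the (content, checksums) pairs that get_new_contents_metadata yields plus a content_update callback standing in for the storage update, skips any batching a real implementation would likely do, and assumes ‘eventful’ and ‘uneventful’ as the status values.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple


def run(
    new_metadata: Iterable[Tuple[Dict[str, Any], List[str]]],
    # Hypothetical callback standing in for the storage update call.
    content_update: Callable[[List[Dict[str, Any]], List[str]], None],
) -> Dict[str, Any]:
    # Group contents by the exact set of recomputed keys, so each
    # update call writes the same columns for every content in it.
    groups: Dict[Tuple[str, ...], List[Dict[str, Any]]] = defaultdict(list)
    for content, keys in new_metadata:
        groups[tuple(keys)].append(content)

    status, count = "uneventful", 0  # assumed status values
    for keys, contents in groups.items():
        try:
            content_update(contents, list(keys))
        except Exception:
            continue  # a failed group is skipped; the others still count
        count += len(contents)
        status = "eventful"
    return {"status": status, "count": count}
```

Grouping by the tuple of checksum names means contents that received different sets of new hashes are updated in separate calls, so each call only writes the columns that were actually recomputed.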