Class in charge of (re)computing content’s hashes.
Hashes to compute are defined by two configuration options:
- compute_checksums ([str])
list of hash algorithms that the swh.model.hashutil.MultiHash.from_data function should be able to deal with. For variable-length checksums, a desired checksum length should also be provided, in the format <algorithm’s name>:<variable-length>, e.g. blake2:512
- recompute_checksums (bool)
a boolean indicating that hashes specified in compute_checksums should also be recomputed when they already exist. Defaults to False.
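As an illustration of the two options above, here is a minimal sketch of a configuration and of how an `<algorithm’s name>:<variable-length>` entry could be split. The helper `parse_checksum_spec` and the algorithm list are hypothetical and chosen for illustration only; only `blake2:512` comes from the text, and this is not how the class itself is known to parse its configuration.

```python
from typing import List, Optional, Tuple

# Illustrative configuration; the option names come from the text above,
# the algorithm list is an assumption.
config = {
    "compute_checksums": ["sha1", "sha256", "blake2:512"],
    "recompute_checksums": False,
}


def parse_checksum_spec(spec: str) -> Tuple[str, Optional[int]]:
    """Split '<name>:<length>' into (name, length).

    Hypothetical helper: the length is None when no ':<length>' suffix is given.
    """
    name, sep, length = spec.partition(":")
    return name, int(length) if sep else None


parsed: List[Tuple[str, Optional[int]]] = [
    parse_checksum_spec(spec) for spec in config["compute_checksums"]
]
```

Fixed-length algorithms carry no suffix, so only variable-length ones such as `blake2` need the extra length field.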
get_new_contents_metadata(all_contents: List[Dict[str, Any]]) → Generator[Tuple[Dict[str, Any], List[Any]], Any, None]
Retrieve raw contents and compute new checksums on them. Unknown or corrupted contents are skipped.
all_contents – list of contents, each a dictionary with the necessary primary keys
Yields tuples of (content to update, list of checksums computed).
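The behaviour described for get_new_contents_metadata can be sketched with the standard library alone. This is not the class's actual implementation: the in-memory `raw_store` dict stands in for the object storage, `sha1` is assumed to be the primary key, "corrupted" is assumed to mean that the stored bytes no longer match that key, and skipping contents with nothing left to compute is also an assumption.

```python
import hashlib
from typing import Any, Dict, Iterator, List, Tuple


def new_contents_metadata(
    all_contents: List[Dict[str, Any]],
    raw_store: Dict[bytes, bytes],  # hypothetical stand-in for the object storage
    algorithms: List[str],
    recompute: bool = False,
) -> Iterator[Tuple[Dict[str, Any], List[str]]]:
    for content in all_contents:
        data = raw_store.get(content["sha1"])
        if data is None:
            continue  # unknown content: skipped
        if hashlib.sha1(data).digest() != content["sha1"]:
            continue  # corrupted content (assumed meaning): skipped
        # Without recompute, only hashes missing from the content are computed.
        to_compute = [a for a in algorithms if recompute or a not in content]
        if not to_compute:
            continue
        updated = dict(content)
        for algo in to_compute:
            updated[algo] = hashlib.new(algo, data).digest()
        yield updated, to_compute
```

Each yielded pair mirrors the documented shape: the content dictionary to update, followed by the list of checksums that were computed for it.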
run(contents: List[Dict[str, Any]]) → Dict
Given a list of contents:
- (re)compute a given set of checksums on contents available in our object storage
- update those contents with the new metadata
contents – contents as dictionaries with the necessary keys. The keys present in each dictionary should be the ones defined in the ‘primary_key’ option.
Returns a summary dict with keys ‘status’ (the task’s status) and ‘count’ (the number of updated contents).
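To make the shape of the summary concrete, here is a hedged sketch of a run-like function. The function name `run_sketch`, the in-memory `raw_store`, the `sha1` primary key, and the status strings `"eventful"` and `"uneventful"` are all assumptions for illustration; the source only states that the dict has a ‘status’ and a ‘count’ key.

```python
import hashlib
from typing import Any, Dict, List


def run_sketch(
    contents: List[Dict[str, Any]],
    raw_store: Dict[bytes, bytes],  # hypothetical stand-in for the object storage
    algorithms: List[str],
) -> Dict[str, Any]:
    count = 0
    for content in contents:
        data = raw_store.get(content["sha1"])
        if data is None:
            continue  # content not available in storage: nothing to update
        for algo in algorithms:
            content[algo] = hashlib.new(algo, data).digest()
        count += 1
    # Status values are assumed placeholders, not confirmed by the text.
    status = "eventful" if count > 0 else "uneventful"
    return {"status": status, "count": count}
```

A caller can then check `summary["count"]` to see how many contents were actually updated.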