swh.storage.proxies.tenacious module

class swh.storage.proxies.tenacious.RateQueue(size: int, max_errors: int)

Bases: object

add_ok(n_ok: int = 1) → None
add_error(n_error: int = 1) → None
limit_reached() → bool
class swh.storage.proxies.tenacious.TenaciousProxyStorage(storage, error_rate_limit: Optional[Dict[str, int]] = None, retries: int = 3)

Bases: object

Storage proxy that has a tenacious insertion behavior.

When an xxx_add method is called, the call is first attempted as is against the backend storage. If it fails, the list of objects to insert is split into progressively smaller pieces until the erroneous objects have been identified, so that all valid objects are guaranteed to be inserted.
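The splitting strategy described above can be sketched as follows. This is a minimal illustration, not the proxy's actual code: `tenacious_add` and `fake_add` are hypothetical names, the objects are plain integers, and the batch is assumed to be split in halves (the text above only says "pieces"). Retries are not modeled here.

```python
from typing import Any, Callable, List, Sequence, Tuple


def tenacious_add(
    add: Callable[[Sequence[Any]], None], objects: Sequence[Any]
) -> Tuple[List[Any], List[Any]]:
    """Insert objects, splitting failed batches to isolate the bad ones.

    `add` is a hypothetical backend call that raises an exception when
    any object of the batch is invalid. Returns (inserted, rejected).
    """
    inserted: List[Any] = []
    rejected: List[Any] = []
    pending = [list(objects)]  # batches still to try, in original order
    while pending:
        batch = pending.pop(0)
        if not batch:
            continue
        try:
            add(batch)
        except Exception:
            if len(batch) == 1:
                # A single failing object cannot be split any further.
                rejected.extend(batch)
            else:
                # Assumption: split in two halves and try each in turn.
                mid = len(batch) // 2
                pending[:0] = [batch[:mid], batch[mid:]]
        else:
            inserted.extend(batch)
    return inserted, rejected


def fake_add(batch: Sequence[int]) -> None:
    """Stand-in backend: negative numbers play the corrupted objects."""
    if any(obj < 0 for obj in batch):
        raise ValueError("invalid object in batch")


inserted, rejected = tenacious_add(fake_add, [1, 2, -3, 4, 5, -6, 7])
```

With this stand-in backend, every non-negative value ends up in `inserted` and the two negative values end up in `rejected`.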

It also provides an error-rate limit feature: if more than n errors occurred during the insertion of the last p (window_size) objects, the proxy stops accepting any insertion.
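One way to picture the error-rate limit is a fixed-size window over the most recent insertion outcomes. The sketch below is an assumption-laden illustration: `SlidingErrorWindow` is a hypothetical class that only mirrors the method names of RateQueue, and the strict "more than" comparison is taken from the sentence above rather than from the actual implementation.

```python
from collections import deque


class SlidingErrorWindow:
    """Track the outcomes of the last `size` insertions (hypothetical)."""

    def __init__(self, size: int, max_errors: int) -> None:
        self.max_errors = max_errors
        # Oldest outcomes fall off automatically once `size` is reached.
        self._outcomes: deque = deque(maxlen=size)  # True marks an error

    def add_ok(self, n_ok: int = 1) -> None:
        self._outcomes.extend([False] * n_ok)

    def add_error(self, n_error: int = 1) -> None:
        self._outcomes.extend([True] * n_error)

    def limit_reached(self) -> bool:
        # Assumption: the limit trips on strictly more than max_errors.
        return sum(self._outcomes) > self.max_errors


window = SlidingErrorWindow(size=5, max_errors=1)
window.add_error(2)
tripped = window.limit_reached()
window.add_ok(4)  # pushes the oldest error out of the 5-slot window
recovered = not window.limit_reached()
```

Here the window trips after two errors, then recovers once enough successful insertions have pushed one of the errors out of the window.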

The number of insertion retries for a single object can be specified via the ‘retries’ parameter.
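A per-object retry loop could look like the sketch below. The helper name `add_single_with_retries` and the `flaky_add` backend are hypothetical, and the sketch assumes that `retries` counts the total number of attempts for one object, which the documentation does not spell out.

```python
from typing import Any, Callable, List, Sequence


def add_single_with_retries(
    add: Callable[[Sequence[Any]], None], obj: Any, retries: int = 3
) -> bool:
    """Try to insert one object; return False once all attempts failed."""
    for _attempt in range(retries):
        try:
            add([obj])
        except Exception:
            continue  # transient or permanent failure: try again
        return True
    return False


calls: List[Any] = []


def flaky_add(batch: Sequence[Any]) -> None:
    """Stand-in backend that fails on its first two calls only."""
    calls.append(list(batch))
    if len(calls) <= 2:
        raise RuntimeError("transient failure")


succeeded = add_single_with_retries(flaky_add, "obj", retries=3)
```

In this run the third attempt succeeds; with `retries=2` the same object would have been reported as failed.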

This proxy is mainly intended to be used in a replayer configuration (aka a mirror stack), where insertion errors are mostly unexpected (which explains the low default errors/window_size ratio).

Conversely, it should not be used in a loader configuration, as it may drop objects without stopping the loader, which leads to holes in the graph.

Deployments using this proxy should carefully monitor their logs to check that every failure is expected (because the failed object is corrupted) rather than caused by transient errors or issues with the storage backend.

Sample configuration use case for tenacious storage:

  cls: tenacious
  storage:
    cls: remote
    args: http://storage.internal.staging.swh.network:5002/
  error_rate_limit:
    errors: 10
    window_size: 1000

tenacious_methods: Dict[str, str] = {'content_add': 'content', 'content_add_metadata': 'content', 'directory_add': 'directory', 'extid_add': 'extid', 'origin_add': 'origin', 'release_add': 'release', 'revision_add': 'revision', 'skipped_content_add': 'skipped_content', 'snapshot_add': 'snapshot'}