swh.storage.algos.origin module

swh.storage.algos.origin.iter_origins(storage, origin_from=1, origin_to=None, batch_size=10000)

Iterates over all origins in the storage.

Parameters
  • storage – the storage object used for queries.

  • batch_size – number of origins per query

Yields

dict – the origin dictionary with the keys:

  • type: origin’s type

  • url: origin’s url
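As a rough illustration of the batching that batch_size controls, the sketch below pages through a hypothetical in-memory storage. The FakeStorage class, its get_range method, and the iter_origins_sketch helper are assumptions made for this example; they are not part of swh.storage, and the real function may query the backend differently. Treating origin_to as an exclusive bound is also an assumption.

```python
from typing import Dict, Iterator, List, Optional


class FakeStorage:
    """Hypothetical in-memory stand-in; not the swh.storage API."""

    def __init__(self, origins: List[Dict[str, str]]) -> None:
        self._origins = origins
        self.queries = 0  # number of range queries issued so far

    def get_range(self, start: int, count: int) -> List[Dict[str, str]]:
        """Return up to `count` origins, starting at 1-based position `start`."""
        self.queries += 1
        return self._origins[start - 1 : start - 1 + count]


def iter_origins_sketch(
    storage: FakeStorage,
    origin_from: int = 1,
    origin_to: Optional[int] = None,
    batch_size: int = 10000,
) -> Iterator[Dict[str, str]]:
    """Yield origins one batch at a time (origin_to is exclusive here, by assumption)."""
    start = origin_from
    while origin_to is None or start < origin_to:
        # Never ask for more than batch_size origins in a single query.
        count = batch_size if origin_to is None else min(batch_size, origin_to - start)
        batch = storage.get_range(start, count)
        if not batch:
            break  # the storage has no more origins
        yield from batch
        start += len(batch)


storage = FakeStorage(
    [{"type": "git", "url": f"https://example.org/repo{i}"} for i in range(1, 6)]
)
origins = list(iter_origins_sketch(storage, batch_size=2))
```

With five origins and batch_size=2, this sketch issues four queries: three that return data and a final empty one that ends the iteration. Each yielded item is a dict with the type and url keys described above.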

swh.storage.algos.origin.origin_get_latest_visit_status(storage, origin_url: str, type: Optional[str] = None, allowed_statuses: Optional[Iterable[str]] = None, require_snapshot: bool = False) → Optional[Tuple[swh.model.model.OriginVisit, swh.model.model.OriginVisitStatus]]

Get the latest visit of an origin, together with its status. Optionally, any combination of the following criteria can be provided: visit type, allowed statuses, and whether the visit must have a snapshot.

If no visit matches the criteria, returns None. Otherwise, returns a tuple (origin visit, origin visit status).

Parameters
  • storage – A storage backend

  • origin_url – origin URL

  • type – Optional visit type to filter on (e.g. git, tar, dsc, svn, hg, npm, pypi, …)

  • allowed_statuses – visit statuses to consider when looking for the latest visit. For instance, allowed_statuses=['full'] will only consider visits that have successfully run to completion.

  • require_snapshot – If True, only a visit with a snapshot will be returned.

Returns

A tuple of (visit, visit_status) model objects if the visit and the visit status exist and match the search criteria; None otherwise.
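The filtering rules described above can be sketched on plain in-memory data. The Visit and VisitStatus dataclasses and the latest_visit_status_sketch helper are hypothetical stand-ins invented for this example; they are not the swh.model classes or the library's implementation. The sketch also assumes one status per visit and that a higher visit number means a later visit.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass
class Visit:  # hypothetical stand-in for OriginVisit
    origin: str
    visit: int
    type: str


@dataclass
class VisitStatus:  # hypothetical stand-in for OriginVisitStatus
    visit: int
    status: str
    snapshot: Optional[str]


def latest_visit_status_sketch(
    history: List[Tuple[Visit, VisitStatus]],
    origin_url: str,
    type: Optional[str] = None,
    allowed_statuses: Optional[Iterable[str]] = None,
    require_snapshot: bool = False,
) -> Optional[Tuple[Visit, VisitStatus]]:
    """Return the most recent (visit, status) pair matching every given criterion."""
    allowed = set(allowed_statuses) if allowed_statuses is not None else None
    # Walk from the highest visit number down (assumption: higher means later).
    for visit, status in sorted(history, key=lambda pair: pair[0].visit, reverse=True):
        if visit.origin != origin_url:
            continue
        if type is not None and visit.type != type:
            continue
        if allowed is not None and status.status not in allowed:
            continue
        if require_snapshot and status.snapshot is None:
            continue
        return visit, status
    return None  # no visit matched the criteria


url = "https://example.org/repo"
history = [
    (Visit(url, 1, "git"), VisitStatus(1, "full", "snapshot-1")),
    (Visit(url, 2, "git"), VisitStatus(2, "partial", None)),
]

latest = latest_visit_status_sketch(history, url)
full_only = latest_visit_status_sketch(history, url, allowed_statuses=["full"])
with_snapshot = latest_visit_status_sketch(history, url, require_snapshot=True)
missing = latest_visit_status_sketch(history, url, type="svn")
```

Because the result is either None or a two-element tuple, callers should check for None before unpacking, for example `if latest is not None: visit, status = latest`.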