swh.provenance.api.client module

class swh.provenance.api.client.RemoteProvenance(url: str, timeout: None | Tuple[float, float] | List[float] | float = None, chunk_size: int = 4096, max_retries: int = 3, pool_connections: int = 20, pool_maxsize: int = 100, adapter_kwargs: Dict[str, Any] | None = None, api_exception: Type[Exception] | None = None, reraise_exceptions: List[Type[Exception]] | None = None, **kwargs)

Bases: RPCClient

Proxy to a remote provenance API

backend_class

alias of ProvenanceInterface

reraise_exceptions: List[Type[Exception]] = [<class 'swh.provenance.exc.ProvenanceException'>]

On server errors, if any of the exception classes in this list has the same name as the error name, then the exception will be instantiated and raised instead of a generic RemoteException.
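The name-matching rule can be sketched in a few lines. This is only an illustration of the behaviour described above, not the client's actual code: `exception_for` is a hypothetical helper, and both exception classes are local stand-ins defined here so that the snippet runs without the library installed.

```python
from typing import List, Tuple, Type


class RemoteException(Exception):
    """Stand-in for the generic exception used when no class name matches."""


class ProvenanceException(Exception):
    """Stand-in for swh.provenance.exc.ProvenanceException."""


def exception_for(
    error_name: str,
    error_args: Tuple,
    reraise_exceptions: List[Type[Exception]],
) -> Exception:
    """Hypothetical helper: pick the exception to raise for a server error."""
    for exc_class in reraise_exceptions:
        # Matching is done on the class name only, as the text describes.
        if exc_class.__name__ == error_name:
            return exc_class(*error_args)
    # No class with that name is listed: fall back to the generic exception.
    return RemoteException(error_name, *error_args)


matched = exception_for("ProvenanceException", ("backend down",), [ProvenanceException])
unmatched = exception_for("KeyError", ("missing",), [ProvenanceException])
```

Here `matched` is a `ProvenanceException`, while `unmatched` falls back to the generic stand-in because `KeyError` is not in the list.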

extra_type_decoders: Dict[str, Callable] = {'core_swhid': <bound method _BaseSWHID.from_string of <class 'swh.model.swhids.CoreSWHID'>>, 'extended_swhid': <bound method _BaseSWHID.from_string of <class 'swh.model.swhids.ExtendedSWHID'>>, 'qualified_swhid': <bound method QualifiedSWHID.from_string of <class 'swh.model.swhids.QualifiedSWHID'>>}

Value of extra_decoders passed to json_loads or msgpack_loads to be able to deserialize more object types.

extra_type_encoders: List[Tuple[type, str, Callable]] = [(<class 'swh.model.swhids.CoreSWHID'>, 'core_swhid', <class 'str'>), (<class 'swh.model.swhids.ExtendedSWHID'>, 'extended_swhid', <class 'str'>), (<class 'swh.model.swhids.QualifiedSWHID'>, 'qualified_swhid', <class 'str'>)]

Value of extra_encoders passed to json_dumps or msgpack_dumps to be able to serialize more object types.
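To illustrate how a decoder mapping and an encoder list of this shape can work together, here is a minimal round-trip sketch built on the standard `json` module. Several things are assumptions made for the example: `FakeSWHID` is a stand-in for the real SWHID classes, `dumps` and `loads` are hypothetical helpers, and the `{"tag": ..., "data": ...}` envelope is invented here and is not claimed to be the wire format used by the library.

```python
import json
from typing import Any, Callable, Dict, List, Tuple


class FakeSWHID:
    """Stand-in for a SWHID class: str() encodes it, from_string() decodes it."""

    def __init__(self, text: str) -> None:
        self.text = text

    @classmethod
    def from_string(cls, text: str) -> "FakeSWHID":
        return cls(text)

    def __str__(self) -> str:
        return self.text

    def __eq__(self, other: object) -> bool:
        return isinstance(other, FakeSWHID) and other.text == self.text


# Same shapes as extra_type_encoders and extra_type_decoders above.
ENCODERS: List[Tuple[type, str, Callable]] = [(FakeSWHID, "core_swhid", str)]
DECODERS: Dict[str, Callable] = {"core_swhid": FakeSWHID.from_string}


def dumps(obj: Any) -> str:
    def default(value: Any) -> Dict[str, Any]:
        # Called by json only for values it cannot serialize natively.
        for type_, name, encode in ENCODERS:
            if isinstance(value, type_):
                return {"tag": name, "data": encode(value)}
        raise TypeError(f"cannot serialize {type(value).__name__}")

    return json.dumps(obj, default=default)


def loads(text: str) -> Any:
    def object_hook(mapping: Dict[str, Any]) -> Any:
        # Rebuild tagged values; leave ordinary dictionaries untouched.
        if set(mapping) == {"tag", "data"} and mapping["tag"] in DECODERS:
            return DECODERS[mapping["tag"]](mapping["data"])
        return mapping

    return json.loads(text, object_hook=object_hook)


original = {"swhid": FakeSWHID("swh:1:cnt:" + "0" * 40)}
restored = loads(dumps(original))
```

The type name (`'core_swhid'` in this sketch) is what links an encoder entry to its decoder, which is why the same names appear in both attributes.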

check_config() → bool

Check that the storage is configured and ready to go.

whereare(*, swhids: List[CoreSWHID]) → List[QualifiedSWHID | None]

Given a list of SWHIDs, return a list of provenance info.

See whereis documentation for details on the provenance info.
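A batch lookup might be consumed as in the sketch below. It assumes that results come back one per input and in input order, which the return type suggests but this page does not state. `StubProvenance` and `locate_all` are hypothetical names, and plain strings stand in for `CoreSWHID` and `QualifiedSWHID` objects so that the example runs without a server.

```python
from typing import Dict, List, Optional


class StubProvenance:
    """Hypothetical in-memory stand-in; not the real client."""

    def __init__(self, known: Dict[str, str]) -> None:
        self._known = known

    def whereare(self, *, swhids: List[str]) -> List[Optional[str]]:
        # Assumption: one result per input, in input order.
        return [self._known.get(swhid) for swhid in swhids]


def locate_all(client, swhids: List[str]) -> Dict[str, Optional[str]]:
    """Pair each requested SWHID with its provenance info, or None."""
    results = client.whereare(swhids=swhids)
    if len(results) != len(swhids):
        raise ValueError("expected one result per requested SWHID")
    return dict(zip(swhids, results))


KNOWN = "swh:1:cnt:" + "a" * 40
UNKNOWN = "swh:1:cnt:" + "b" * 40
client = StubProvenance({KNOWN: KNOWN + ";origin=https://example.org/repo"})
located = locate_all(client, [KNOWN, UNKNOWN])
```

Note that `swhids` is keyword-only in the signature, so it has to be passed by name.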

whereis(*, swhid: CoreSWHID) → QualifiedSWHID | None

Given a SWHID, return a QualifiedSWHID with some provenance info:

  • the release or revision containing that content or directory

  • the URL of the origin containing that content or directory

This can also be called for a revision, release or snapshot to retrieve origin URL information, if any. When called with a revision, the anchor will be an associated release, if any.

If no anchor could be found, this function returns None.
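Because the result may be None, callers need to handle the "nothing found" case explicitly. The sketch below shows one way to do that. `StubProvenance` and `describe` are hypothetical, and strings stand in for the real SWHID objects; only the keyword-only `swhid` parameter and the optional return value are taken from the signature above.

```python
from typing import Dict, Optional


class StubProvenance:
    """Hypothetical in-memory stand-in; not the real client."""

    def __init__(self, known: Dict[str, str]) -> None:
        self._known = known

    def whereis(self, *, swhid: str) -> Optional[str]:
        # Mirrors the documented contract: None when nothing is known.
        return self._known.get(swhid)


def describe(client, swhid: str) -> str:
    """Turn a lookup result into a short human-readable line."""
    result = client.whereis(swhid=swhid)
    if result is None:
        return f"{swhid}: no anchor found"
    return f"{swhid}: found at {result}"


KNOWN = "swh:1:cnt:" + "a" * 40
UNKNOWN = "swh:1:cnt:" + "b" * 40
ANCHOR = "swh:1:rev:" + "c" * 40
client = StubProvenance({KNOWN: f"{KNOWN};anchor={ANCHOR}"})

found_line = describe(client, KNOWN)
missing_line = describe(client, UNKNOWN)

# The parameter is keyword-only, so a positional call is rejected.
try:
    client.whereis(KNOWN)
    positional_rejected = False
except TypeError:
    positional_rejected = True
```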

Note: The quality of the result is not guaranteed whatsoever. Since the definition of “best” likely varies from one usage to the next, this API will evolve in the future when this notion gets better defined.

For example, if we are looking for provenance information to detect prior art, we search for the first appearance of a content. The “best answer” is then the oldest occurrence, which is tricky to determine because we cannot fully trust revision dates. On the other hand, if we want to know which libraries are used and at which version, to detect CVEs or outdated dependencies, the best answer is the most recent release/revision in the authoritative origin relevant to a content. Finding the authoritative origin is a challenge in itself.
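The two notions of “best” can be contrasted with a small sketch. It does not describe how the API picks its answer. The `Candidate` record, its fields and both selection functions are hypothetical, and the sketch assumes the dates are trustworthy and the authoritative origin is already known, which the text above points out are the hard parts.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one place where a content was found."""

    anchor: str
    origin: str
    committed: date


def best_for_prior_art(candidates: List[Candidate]) -> Optional[Candidate]:
    # Prior art: the oldest occurrence wins, whatever its origin.
    return min(candidates, key=lambda c: c.committed, default=None)


def best_for_dependency_audit(
    candidates: List[Candidate], authoritative_origin: str
) -> Optional[Candidate]:
    # Dependency audit: the most recent occurrence in the authoritative origin.
    in_origin = [c for c in candidates if c.origin == authoritative_origin]
    return max(in_origin, key=lambda c: c.committed, default=None)


UPSTREAM = "https://example.org/upstream/lib"
FORK = "https://example.org/fork/lib"
candidates = [
    Candidate("rel-1.0", UPSTREAM, date(2015, 3, 1)),
    Candidate("rel-2.0", UPSTREAM, date(2021, 6, 15)),
    Candidate("import", FORK, date(2012, 1, 10)),
]

prior_art = best_for_prior_art(candidates)
audit = best_for_dependency_audit(candidates, UPSTREAM)
```

With the same candidates, the prior-art query selects the 2012 occurrence in the fork, while the dependency audit selects the 2021 upstream release. This is why a single “best” answer cannot serve both uses.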