swh.search.api.client module
- class swh.search.api.client.RemoteSearch(url: str, timeout: None | Tuple[float, float] | List[float] | float = None, chunk_size: int = 4096, max_retries: int = 3, pool_connections: int = 20, pool_maxsize: int = 100, adapter_kwargs: Dict[str, Any] | None = None, api_exception: Type[Exception] | None = None, reraise_exceptions: List[Type[Exception]] | None = None, enable_requests_retry: bool | None = None, **kwargs)
Bases: RPCClient
Proxy to a remote search API
- backend_class
alias of SearchInterface
- reraise_exceptions: List[Type[Exception]] = [<class 'swh.search.exc.SearchException'>, <class 'swh.search.exc.SearchQuerySyntaxError'>]
On server errors, if any of the exception classes in this list has the same name as the error name, then the exception will be instantiated and raised instead of a generic RemoteException.
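The name-based lookup can be sketched as follows. This is a minimal illustration under stated assumptions, not the RPCClient implementation: `RemoteException`, `resolve_exception` and the two local exception classes are hypothetical stand-ins, and the server error is assumed to arrive as a plain class name plus a message.

```python
from typing import List, Type


class RemoteException(Exception):
    """Hypothetical stand-in for the generic error used when no name matches."""


# Hypothetical local classes mirroring the names in reraise_exceptions.
class SearchException(Exception):
    pass


class SearchQuerySyntaxError(SearchException):
    pass


def resolve_exception(
    error_name: str, message: str, reraise_exceptions: List[Type[Exception]]
) -> Exception:
    """Instantiate the listed class whose __name__ equals error_name, else a generic error."""
    for exc_class in reraise_exceptions:
        if exc_class.__name__ == error_name:
            return exc_class(message)
    return RemoteException(f"{error_name}: {message}")


try:
    raise resolve_exception(
        "SearchQuerySyntaxError", "Invalid query", [SearchException, SearchQuerySyntaxError]
    )
except SearchQuerySyntaxError as exc:
    print(f"query rejected: {exc}")
```

Matching on the class name rather than on the class object is what lets the client re-raise a typed exception even though the server only reports the error by name.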
- check()
Dedicated method to execute some specific check per implementation.
- origin_delete(url: str) → bool
Remove the documents associated with the given origin URL.
- Returns:
True if the document was removed, False if it could not be found.
- origin_get(url: str) → Dict[str, Any] | None
Returns the full document associated with the given origin URL, or None if the origin is unknown.
- origin_search(*, query: str = '', url_pattern: str | None = None, metadata_pattern: str | None = None, with_visit: bool = False, visit_types: List[str] | None = None, min_nb_visits: int = 0, min_last_visit_date: str = '', min_last_eventful_visit_date: str = '', min_last_revision_date: str = '', min_last_release_date: str = '', min_date_created: str = '', min_date_modified: str = '', min_date_published: str = '', programming_languages: List[str] | None = None, licenses: List[str] | None = None, keywords: List[str] | None = None, fork_weight: float | None = 0.5, sort_by: List[str] | None = None, page_token: str | None = None, limit: int = 50) → PagedResult[OriginDict, str]
Searches for origins matching the given query, URL pattern, metadata pattern and filters.
- Parameters:
query – Find origins according to the query, written in the swh-search query language syntax; if empty, return all origins
url_pattern – Part of the URL to search for; if empty and no filter parameters are used, return all origins
metadata_pattern – Keywords to look for (across all the fields of “jsonld”)
with_visit – Whether origins with no visits are to be filtered out
visit_types – Only origins having any of the provided visit types (e.g. git, svn, pypi) will be returned
min_nb_visits – Filter origins that have a number of visits >= the provided value
min_last_visit_date – Filter origins that have last_visit_date on or after the provided date (ISO format)
min_last_eventful_visit_date – Filter origins that have last_eventful_visit_date (eventful = snapshot_id changed) on or after the provided date (ISO format)
min_last_revision_date – Filter origins that have last_revision_date on or after the provided date (ISO format)
min_last_release_date – Filter origins that have last_release_date on or after the provided date (ISO format)
min_date_created – Filter origins that have date_created from jsonld on or after the provided date
min_date_modified – Filter origins that have date_modified from jsonld on or after the provided date
min_date_published – Filter origins that have date_published from jsonld on or after the provided date
programming_languages – Filter origins with programming languages present in the given list (based on intrinsic_metadata)
licenses – Filter origins with licenses present in the given list (based on intrinsic_metadata)
keywords – Filter origins having description/keywords (extracted from intrinsic_metadata) that match the given values
fork_weight – Multiplicative factor to apply to all origins known to be forks (<1 penalizes them, >1 boosts them)
sort_by – Sort results based on a list of fields from SORT_BY_OPTIONS (nb_visits, last_visit_date, last_eventful_visit_date, last_revision_date, last_release_date). A field prefixed with “-” is sorted in descending order; otherwise it is sorted in ascending order.
page_token – Opaque value used for pagination
limit – Maximum number of results to return
- Returns:
PagedResult of origin dicts matching the search criteria. If next_page_token is None, there is no more data to retrieve.
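Because results are paginated, collecting every match means re-issuing the same query with the page_token taken from the previous page until next_page_token is None. The sketch below assumes that PagedResult exposes the page contents as `results` next to `next_page_token`; `Page`, `iter_all_origins` and `fake_origin_search` are hypothetical helpers so that the loop can run without a search server.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in exposing the two PagedResult attributes used below."""

    results: List[Dict[str, Any]]
    next_page_token: Optional[str]


def iter_all_origins(
    search: Callable[..., Any], **criteria: Any
) -> Iterator[Dict[str, Any]]:
    """Yield every matching origin dict, following next_page_token until it is None."""
    page_token: Optional[str] = None
    while True:
        page = search(page_token=page_token, **criteria)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            break


def fake_origin_search(
    *, page_token: Optional[str] = None, limit: int = 50, **_criteria: Any
) -> Page:
    """Hypothetical in-memory search over five origins; tokens are string offsets."""
    origins = [{"url": f"https://example.org/repo{i}"} for i in range(5)]
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    next_token = str(end) if end < len(origins) else None
    return Page(results=origins[start:end], next_page_token=next_token)


criteria = {
    "url_pattern": "repo",
    # Most-visited origins first: the "-" prefix requests descending order.
    "sort_by": ["-nb_visits"],
    # Date filters take ISO 8601 strings.
    "min_last_visit_date": datetime(2023, 1, 1, tzinfo=timezone.utc).isoformat(),
    "limit": 2,
}
urls = [origin["url"] for origin in iter_all_origins(fake_origin_search, **criteria)]
```

Against a live client, the same helper would be called as `iter_all_origins(client.origin_search, url_pattern="repo")`, where `client` is a RemoteSearch instance. Since origin_search is keyword-only, the helper passes every argument, including page_token, by keyword.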
- origin_update(documents: Iterable[OriginDict]) → None
Persist documents to the search backend.
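For unit tests that should not depend on a running server, a dictionary keyed by origin URL can mimic the documented contracts of origin_update, origin_get and origin_delete. `FakeSearch` is a hypothetical test double, not part of swh.search: it performs no indexing or querying, and it assumes that updating an already indexed URL simply replaces the stored document (the real backend may merge fields instead).

```python
from typing import Any, Dict, Iterable, Optional


class FakeSearch:
    """Hypothetical dict-backed test double keyed by origin URL."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def origin_update(self, documents: Iterable[Dict[str, Any]]) -> None:
        # Assumption: a later document for the same URL replaces the earlier one.
        for doc in documents:
            self._docs[doc["url"]] = dict(doc)

    def origin_get(self, url: str) -> Optional[Dict[str, Any]]:
        # Unknown origins yield None, matching the Dict[str, Any] | None return type.
        doc = self._docs.get(url)
        return dict(doc) if doc is not None else None

    def origin_delete(self, url: str) -> bool:
        # True if a document was removed, False if the URL could not be found.
        return self._docs.pop(url, None) is not None


search = FakeSearch()
search.origin_update([{"url": "https://example.org/project"}])
```

Returning copies from origin_get keeps callers from mutating the stored documents, which would otherwise make tests depend on call order.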