swh.search.api.client module

class swh.search.api.client.RemoteSearch(url, api_exception=None, timeout=None, chunk_size=4096, reraise_exceptions=None, **kwargs)

Bases: RPCClient

Proxy to a remote search API


backend_class

alias of SearchInterface

reraise_exceptions: ClassVar[List[Type[Exception]]] = [<class 'swh.search.exc.SearchException'>, <class 'swh.search.exc.SearchQuerySyntaxError'>]

On server errors, if any of the exception classes in this list has the same name as the error name, then the exception will be instantiated and raised instead of a generic RemoteException.
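The name-matching rule described above can be sketched as follows. This is only an illustration of the lookup, not the RPC client's actual code: `RemoteException`, the two search exceptions, and the `build_exception` helper are local stand-ins defined here for the example.

```python
from typing import List, Type


class RemoteException(Exception):
    """Hypothetical generic fallback, standing in for the RPC layer's own class."""


class SearchException(Exception):
    """Local stand-in for swh.search.exc.SearchException."""


class SearchQuerySyntaxError(Exception):
    """Local stand-in for swh.search.exc.SearchQuerySyntaxError."""


# Mirrors the shape of the reraise_exceptions class attribute.
RERAISE_EXCEPTIONS: List[Type[Exception]] = [SearchException, SearchQuerySyntaxError]


def build_exception(error_name: str, message: str) -> Exception:
    """Pick the exception to raise for a server error called `error_name`.

    Hypothetical helper: a class whose name equals the error name is
    instantiated; otherwise the generic RemoteException is used.
    """
    for exc_class in RERAISE_EXCEPTIONS:
        if exc_class.__name__ == error_name:
            return exc_class(message)
    return RemoteException(f"{error_name}: {message}")


specific = build_exception("SearchQuerySyntaxError", "unexpected token")
generic = build_exception("KeyError", "missing field")
```

With this rule, client code can catch `SearchQuerySyntaxError` directly instead of inspecting a generic remote error.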


check()

Dedicated method to execute some specific check per implementation.

flush() → None

Blocks until all previous calls to _update() are completely applied.

origin_delete(url: str) → bool

Remove the documents associated with the given origin URL.


Returns:

True if the document was removed, False if it could not be found.

origin_get(url: str) → Dict[str, Any] | None

Returns the full document associated with the given origin URL, or None if the origin is unknown.

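origin_search(*, query, url_pattern, metadata_pattern, with_visit, visit_types, min_nb_visits, min_last_visit_date, min_last_eventful_visit_date, min_last_revision_date, min_last_release_date, min_date_created, min_date_modified, min_date_published, programming_languages, licenses, keywords, fork_weight, sort_by, page_token, limit) → PagedResult

Searches for origins matching the url_pattern.

Parameters: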

  • query – Find origins matching a query written in the swh-search query language syntax; if empty, return all origins

  • url_pattern – Part of the URL to search for; if empty and no filter parameters are used, return all origins

  • metadata_pattern – Keywords to look for (across all the fields of “jsonld”)

  • with_visit – Whether origins with no visits are to be filtered out

  • visit_types – Only origins having any of the provided visit types (e.g. git, svn, pypi) will be returned

  • min_nb_visits – Filter origins that have number of visits >= the provided value

  • min_last_visit_date – Filter origins that have last_visit_date on or after the provided date (ISO format)

  • min_last_eventful_visit_date – Filter origins that have last_eventful_visit_date (eventful = snapshot_id changed) on or after the provided date (ISO format)

  • min_last_revision_date – Filter origins that have last_revision_date on or after the provided date (ISO format)

  • min_last_release_date – Filter origins that have last_release_date on or after the provided date (ISO format)

  • min_date_created – Filter origins that have date_created from jsonld on or after the provided date

  • min_date_modified – Filter origins that have date_modified from jsonld on or after the provided date

  • min_date_published – Filter origins that have date_published from jsonld on or after the provided date

  • programming_languages – Filter origins with programming languages present in the given list (based on intrinsic_metadata)

  • licenses – Filter origins with licenses present in the given list (based on intrinsic_metadata)

  • keywords – Filter origins having description/keywords (extracted from intrinsic_metadata) that match the given values

  • fork_weight – Multiplicative factor to apply to all origins known to be forks (<1 penalizes them, >1 boosts them)

  • sort_by – Sort results based on a list of fields from SORT_BY_OPTIONS (nb_visits, last_visit_date, last_eventful_visit_date, last_revision_date, last_release_date). A field prefixed with “-” is sorted in descending order; otherwise it is sorted in ascending order.

  • page_token – Opaque value used for pagination

  • limit – Maximum number of results to return
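The “-” prefix convention for sort_by can be illustrated with a small parser. This is a sketch of the convention only: `parse_sort_by` is a hypothetical helper, the field list is copied from the parameter description, and rejecting unknown fields with ValueError is an assumption made for the example.

```python
from typing import List, Tuple

# Field names taken from the sort_by description.
SORT_BY_OPTIONS = [
    "nb_visits",
    "last_visit_date",
    "last_eventful_visit_date",
    "last_revision_date",
    "last_release_date",
]


def parse_sort_by(sort_by: List[str]) -> List[Tuple[str, str]]:
    """Turn entries such as "-nb_visits" into (field, order) pairs."""
    parsed = []
    for entry in sort_by:
        descending = entry.startswith("-")
        field = entry[1:] if descending else entry
        if field not in SORT_BY_OPTIONS:
            # Assumption for this sketch: unknown fields are an error.
            raise ValueError(f"unsupported sort field: {field}")
        parsed.append((field, "desc" if descending else "asc"))
    return parsed


order = parse_sort_by(["-nb_visits", "last_visit_date"])
```

Here the most visited origins come first, and ties are ordered by ascending last_visit_date.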


Returns:

PagedResult of origin dicts matching the search criteria. If next_page_token is None, there is no more data to retrieve.
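A typical way to consume paged results is to keep passing next_page_token back as page_token until it is None. The sketch below runs against an in-memory stand-in rather than a real server: `Page`, `FakeSearch`, and `iter_origins` are hypothetical names, and the integer-offset token is an assumption of the fake (real tokens are opaque).

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterator, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for PagedResult."""
    results: List[Dict[str, Any]]
    next_page_token: Optional[str]


class FakeSearch:
    """In-memory stand-in exposing only the parts of origin_search used here."""

    def __init__(self, urls: List[str]) -> None:
        self._urls = sorted(urls)

    def origin_search(self, *, url_pattern: str = "",
                      page_token: Optional[str] = None,
                      limit: int = 50) -> Page:
        matches = [u for u in self._urls if url_pattern in u]
        # The fake encodes an offset in the token; callers must not rely on this.
        start = int(page_token) if page_token is not None else 0
        end = start + limit
        next_token = str(end) if end < len(matches) else None
        return Page([{"url": u} for u in matches[start:end]], next_token)


def iter_origins(search: Any, **criteria: Any) -> Iterator[Dict[str, Any]]:
    """Yield every matching origin, following next_page_token until it is None."""
    page_token = None
    while True:
        page = search.origin_search(page_token=page_token, **criteria)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            break


search = FakeSearch(
    [f"https://example.org/repo{i}" for i in range(5)]
    + ["https://example.com/other"]
)
found = [o["url"] for o in iter_origins(search, url_pattern="example.org", limit=2)]
```

The loop treats the token as an opaque value, so the same `iter_origins` helper should work with any object whose origin_search follows the contract described above.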

origin_update(documents: Iterable[OriginDict]) → None

Persist documents to the search backend.
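The interplay between origin_update, flush, origin_get, and origin_delete can be sketched with an in-memory stub. `InMemorySearchStub` is a hypothetical class that mirrors only the method names and return types documented here; holding updates in a pending buffer until flush() is an assumption made to show why a caller might need flush().

```python
from typing import Any, Dict, Iterable, List, Optional


class InMemorySearchStub:
    """Hypothetical stand-in for a search backend, for illustration only."""

    def __init__(self) -> None:
        self._pending: List[Dict[str, Any]] = []
        self._documents: Dict[str, Dict[str, Any]] = {}

    def origin_update(self, documents: Iterable[Dict[str, Any]]) -> None:
        # Assumption: updates are buffered and become visible after flush().
        self._pending.extend(documents)

    def flush(self) -> None:
        # Apply every buffered update, merging fields per origin URL.
        for doc in self._pending:
            self._documents.setdefault(doc["url"], {}).update(doc)
        self._pending.clear()

    def origin_get(self, url: str) -> Optional[Dict[str, Any]]:
        return self._documents.get(url)

    def origin_delete(self, url: str) -> bool:
        # True if a document was removed, False if none was found.
        return self._documents.pop(url, None) is not None


url = "https://example.org/repo"
search = InMemorySearchStub()
search.origin_update([{"url": url, "visit_types": ["git"]}])
before_flush = search.origin_get(url)
search.flush()
after_flush = search.origin_get(url)
deleted_first = search.origin_delete(url)
deleted_again = search.origin_delete(url)
```

In this stub the document is only readable after flush(), and a second origin_delete on the same URL reports False because nothing is left to remove.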

visit_types_count() → Counter

Returns origin counts per visit type (git, hg, svn, …).
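Assuming the Counter in the signature is collections.Counter, the result can be queried like any other counter. The counts below are made-up values used only to show the access patterns.

```python
from collections import Counter

# Made-up counts standing in for the value returned by visit_types_count().
counts = Counter({"git": 120, "svn": 7, "hg": 3})

top_two = counts.most_common(2)   # visit types with the most origins
total = sum(counts.values())      # total across all visit types
missing = counts["pypi"]          # absent keys count as zero
```

Because missing keys read as zero, callers do not need to check whether a visit type is present before looking it up.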