swh.search.elasticsearch module

class swh.search.elasticsearch.ElasticSearch(hosts: List[str])[source]

Bases: object

check()[source]
deinitialize() → None[source]

Removes all indices from the Elasticsearch backend

initialize() → None[source]

Declare Elasticsearch indices and mappings

flush() → None[source]

Blocks until all previous calls to _update() are completely applied.

origin_update(documents: Iterable[dict]) → None[source]
origin_dump() → Iterator[swh.model.model.Origin][source]

Returns all content in Elasticsearch’s index. Not exposed publicly, but useful for tests.
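The methods above suggest a typical write path: declare the indices, push documents, wait for them to be applied, then read them back. A minimal sketch of that sequence follows, written against any object exposing these four methods. The `InMemorySearch` stand-in and the `{"url": ...}` document shape are assumptions made for illustration only; they are not part of swh.search, and the real `origin_dump()` yields `swh.model.model.Origin` objects rather than dicts.

```python
from typing import Dict, Iterable, Iterator, List


class InMemorySearch:
    """Hypothetical stand-in that mimics the method names documented above."""

    def __init__(self) -> None:
        self._pending: List[dict] = []      # updates queued but not yet applied
        self._index: Dict[str, dict] = {}   # applied documents, keyed by URL

    def initialize(self) -> None:
        # Start from an empty "index".
        self._index = {}
        self._pending = []

    def origin_update(self, documents: Iterable[dict]) -> None:
        # Updates are only queued here; they become visible after flush().
        self._pending.extend(documents)

    def flush(self) -> None:
        # Apply every queued update, merging documents that share a URL.
        for doc in self._pending:
            self._index.setdefault(doc["url"], {}).update(doc)
        self._pending.clear()

    def origin_dump(self) -> Iterator[dict]:
        yield from self._index.values()


def index_and_dump(search, documents: Iterable[dict]) -> List[dict]:
    """Run initialize -> origin_update -> flush, then return the dumped content."""
    search.initialize()
    search.origin_update(documents)
    search.flush()  # without this, the dump may miss recent updates
    return list(search.origin_dump())


docs = [{"url": "https://example.org/a"}, {"url": "https://example.org/b"}]
dumped = index_and_dump(InMemorySearch(), docs)
```

Calling `flush()` before `origin_dump()` is the point of the sketch: in a test, it keeps the read from racing against writes that have not been applied yet.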

Searches for origins matching the url_pattern.

Parameters
  • url_pattern (str) – Part of the URL to search for

  • with_visit (bool) – Whether origins with no visit are to be filtered out

  • page_token (str) – Opaque value used for pagination.

  • count (int) – Number of results to return.

Returns

a dictionary with keys:

  • next_page_token: opaque value used for fetching more results. None if there are no more results.

  • results: list of dictionaries, each with the key:

      • url: URL of a matching origin
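The pagination contract described above (pass `page_token` back until `next_page_token` is None) can be sketched as a small generator. The method name `origin_search` is an assumption, since the signature is not shown in this section, and `PagedStub` is a hypothetical stand-in that only reproduces the documented return shape; its integer-offset tokens are an implementation detail that callers should treat as opaque.

```python
from typing import Iterator, List, Optional


class PagedStub:
    """Hypothetical stand-in returning the dictionary shape described above."""

    def __init__(self, urls: List[str]) -> None:
        self._urls = urls

    def origin_search(self, url_pattern: str, with_visit: bool = False,
                      page_token: Optional[str] = None, count: int = 50) -> dict:
        # with_visit is accepted but ignored: the stub stores no visit data.
        matches = [u for u in self._urls if url_pattern in u]
        start = int(page_token) if page_token is not None else 0
        end = start + count
        return {
            "next_page_token": str(end) if end < len(matches) else None,
            "results": [{"url": u} for u in matches[start:end]],
        }


def iter_origin_urls(search, url_pattern: str, count: int = 2) -> Iterator[str]:
    """Yield every matching URL, following next_page_token until it is None."""
    page_token = None
    while True:
        page = search.origin_search(
            url_pattern=url_pattern, page_token=page_token, count=count
        )
        for result in page["results"]:
            yield result["url"]
        page_token = page["next_page_token"]
        if page_token is None:
            break


stub = PagedStub(
    [f"https://example.org/repo{i}" for i in range(5)] + ["https://other.test/x"]
)
urls = list(iter_origin_urls(stub, "example.org"))
```

The loop never inspects the token; it only passes it back unchanged, which is what keeps the caller independent of how the backend encodes its position.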