swh.indexer.storage.in_memory module

swh.indexer.storage.in_memory.check_id_types(data: List[Dict[str, Any]])

Checks that every element of the list has an ‘id’ whose type is ‘bytes’.
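As an illustration of the check described above, here is a minimal sketch. The function name `check_id_types_sketch` and the use of `TypeError` are assumptions made for this example; the exception raised by the real function is not stated in this page.

```python
from typing import Any, Dict, List


def check_id_types_sketch(data: List[Dict[str, Any]]) -> None:
    """Hypothetical stand-in for check_id_types, for illustration only."""
    for item in data:
        value = item.get("id")
        # Reject missing ids as well as ids of any type other than bytes.
        if not isinstance(value, bytes):
            raise TypeError(
                f"expected 'id' of type bytes, got {type(value).__name__}"
            )
```

A list such as `[{"id": b"\x01"}]` passes silently, while `[{"id": "abc"}]` is rejected.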

class swh.indexer.storage.in_memory.SubStorage(tools)

Bases: object

Implements common missing/get/add logic for each indexer type.

missing(ids)

List data missing from storage.

Parameters

ids (iterable) –

dictionaries with keys:

  • id (bytes): sha1 identifier

  • indexer_configuration_id (int): tool used to compute the results

Yields

missing sha1s
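The lookup described above can be sketched as follows. This is not the module's implementation: `missing_sketch` and the `stored` set of `(id, indexer_configuration_id)` pairs are hypothetical, chosen only to show that an entry counts as missing when no result exists for that particular id and tool combination.

```python
from typing import Any, Dict, Iterable, Iterator, Set, Tuple


def missing_sketch(
    stored: Set[Tuple[bytes, int]],
    entries: Iterable[Dict[str, Any]],
) -> Iterator[bytes]:
    """Yield the sha1 of every entry with no stored result (illustrative)."""
    for entry in entries:
        key = (entry["id"], entry["indexer_configuration_id"])
        # Assumption: presence is tracked per (sha1, tool) pair.
        if key not in stored:
            yield entry["id"]
```

Under this assumption, the same sha1 can be present for one tool and missing for another.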

get(ids)

Retrieve data per id.

Parameters

ids (iterable) – sha1 checksums

Yields

dict

dictionaries with the following keys:

  • id (bytes)

  • tool (dict): tool used to compute metadata

  • arbitrary data (as provided to add)
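To make the shape of the yielded dictionaries concrete, here is a small sketch. The nested `rows` mapping, the `tools` mapping, and the `mimetype` field used in the usage note are all assumptions for illustration; only the `id` and `tool` keys come from the description above.

```python
from typing import Any, Dict, Iterable, Iterator


def get_sketch(
    rows: Dict[bytes, Dict[int, Dict[str, Any]]],
    tools: Dict[int, Dict[str, Any]],
    ids: Iterable[bytes],
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per (id, tool) result found (illustrative only)."""
    for id_ in ids:
        # Ids with no stored result are skipped silently.
        for tool_id, payload in rows.get(id_, {}).items():
            # Merge the identifier, the tool description and the stored data.
            yield {"id": id_, "tool": tools[tool_id], **payload}
```

With `rows = {b"a": {1: {"mimetype": "text/plain"}}}` and a matching tool entry, requesting `[b"a", b"b"]` yields a single dictionary for `b"a"` and nothing for `b"b"`.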

get_all()
get_range(start, end, indexer_configuration_id, limit)

Retrieve data within range [start, end] bound by limit.

Parameters
  • start (bytes) – Start of the identifier range (expected to be smaller than end)

  • end (bytes) – End of the identifier range (expected to be larger than start)

  • indexer_configuration_id (int) – The tool used to index data

  • limit (int) – Maximum number of results to return

Raises

IndexerStorageArgumentException if limit is None

Returns

a dict with keys:

  • ids [bytes]: iterable of content ids within the range.

  • next (Optional[bytes]): the sha1 at which the next range starts, if any

Return type

dict
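The `ids`/`next` pair lends itself to page-by-page iteration. The sketch below illustrates this under several assumptions: `get_range_sketch` and `iter_range` are hypothetical helpers, the identifiers are held in a pre-sorted list, the `indexer_configuration_id` filter is left out for brevity, and `ValueError` stands in for `IndexerStorageArgumentException`.

```python
from typing import Any, Dict, Iterator, List, Optional


def get_range_sketch(
    sorted_ids: List[bytes], start: bytes, end: bytes, limit: Optional[int]
) -> Dict[str, Any]:
    """Return one page of ids in [start, end] (illustrative only)."""
    if limit is None:
        raise ValueError("limit should not be None")
    in_range = [i for i in sorted_ids if start <= i <= end]
    # 'next' is the first id that did not fit in this page, if there is one.
    next_id = in_range[limit] if len(in_range) > limit else None
    return {"ids": in_range[:limit], "next": next_id}


def iter_range(
    sorted_ids: List[bytes], start: bytes, end: bytes, limit: int
) -> Iterator[bytes]:
    """Follow 'next' until the range is exhausted."""
    cursor: Optional[bytes] = start
    while cursor is not None:
        page = get_range_sketch(sorted_ids, cursor, end, limit)
        yield from page["ids"]
        cursor = page["next"]
```

A caller restarts the query from `next` until it comes back as `None`, which keeps each request bounded by `limit`.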

add(data: List[Dict], conflict_update: bool) → int

Add data not present in storage.

Parameters
  • data (iterable) –

    dictionaries with keys:

    • id: sha1

    • indexer_configuration_id: tool used to compute the results

    • arbitrary data

  • conflict_update (bool) – Whether to overwrite existing entries (True) or skip duplicates (False)
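The effect of `conflict_update` can be sketched with a plain dictionary. `add_sketch` and its `store` argument are hypothetical, and the meaning of the returned integer (the number of entries actually written) is an assumption, since this page only gives the return type.

```python
from typing import Any, Dict, List, Tuple

Key = Tuple[bytes, int]


def add_sketch(
    store: Dict[Key, Dict[str, Any]],
    data: List[Dict[str, Any]],
    conflict_update: bool,
) -> int:
    """Insert entries keyed by (id, indexer_configuration_id) (illustrative)."""
    written = 0
    for item in data:
        payload = dict(item)  # copy so the caller's dict is left untouched
        key = (payload.pop("id"), payload.pop("indexer_configuration_id"))
        if key in store and not conflict_update:
            continue  # duplicate: keep the existing entry
        store[key] = payload
        written += 1
    return written
```

Adding the same `(id, indexer_configuration_id)` pair twice with `conflict_update=False` leaves the first value in place; with `conflict_update=True` the second value replaces it.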

add_merge(new_data: List[Dict], conflict_update: bool, merged_key: str) → int
delete(entries: List[Dict]) → int

Delete entries and return the number of entries deleted.
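A deletion that reports its count might look like the sketch below. It assumes, without confirmation from this page, that each entry identifies a result by `id` and `indexer_configuration_id`; `delete_sketch` and `store` are hypothetical names.

```python
from typing import Any, Dict, List, Tuple


def delete_sketch(
    store: Dict[Tuple[bytes, int], Dict[str, Any]],
    entries: List[Dict[str, Any]],
) -> int:
    """Remove matching entries and return how many were removed."""
    deleted = 0
    for entry in entries:
        # Assumed key shape: (sha1, tool id).
        key = (entry["id"], entry["indexer_configuration_id"])
        if key in store:
            del store[key]
            deleted += 1
    return deleted
```

Entries that are not in storage are ignored in this sketch and do not count toward the total.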

class swh.indexer.storage.in_memory.IndexerStorage

Bases: object

In-memory SWH indexer storage.

check_config(*, check_write)
content_mimetype_missing(mimetypes)
content_mimetype_get_range(start, end, indexer_configuration_id, limit=1000)
content_mimetype_add(mimetypes: List[Dict], conflict_update: bool = False) → Dict[str, int]
content_mimetype_get(ids)
content_language_missing(languages)
content_language_get(ids)
content_language_add(languages: List[Dict], conflict_update: bool = False) → Dict[str, int]
content_ctags_missing(ctags)
content_ctags_get(ids)
content_ctags_add(ctags: List[Dict], conflict_update: bool = False) → Dict[str, int]
content_fossology_license_get(ids)
content_fossology_license_add(licenses: List[Dict], conflict_update: bool = False) → Dict[str, int]
content_fossology_license_get_range(start, end, indexer_configuration_id, limit=1000)
content_metadata_missing(metadata)
content_metadata_get(ids)
content_metadata_add(metadata: List[Dict], conflict_update: bool = False) → Dict[str, int]
revision_intrinsic_metadata_missing(metadata)
revision_intrinsic_metadata_get(ids)
revision_intrinsic_metadata_add(metadata: List[Dict], conflict_update: bool = False) → Dict[str, int]
revision_intrinsic_metadata_delete(entries: List[Dict]) → Dict
origin_intrinsic_metadata_get(ids)
origin_intrinsic_metadata_add(metadata: List[Dict], conflict_update: bool = False) → Dict[str, int]
origin_intrinsic_metadata_delete(entries: List[Dict]) → Dict
origin_intrinsic_metadata_search_fulltext(conjunction, limit=100)
origin_intrinsic_metadata_search_by_producer(page_token='', limit=100, ids_only=False, mappings=None, tool_ids=None)
origin_intrinsic_metadata_stats()
indexer_configuration_add(tools)
indexer_configuration_get(tool)