swh.indexer.storage.in_memory module
- swh.indexer.storage.in_memory.check_id_types(data: List[Dict[str, Any]])
Checks that every element of the list has an ‘id’ whose type is ‘bytes’.
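The check described above can be sketched in a few lines. This is a minimal illustration, not the module's actual code: the name `check_id_types_sketch` is hypothetical, and raising `TypeError` is an assumption (the real function may raise a storage-specific exception instead).

```python
from typing import Any, Dict, List


def check_id_types_sketch(data: List[Dict[str, Any]]) -> None:
    """Raise if any element lacks an 'id' of type bytes.

    Hypothetical stand-in for check_id_types; the exception type
    (TypeError) is an assumption made for this sketch.
    """
    for item in data:
        identifier = item.get("id")
        if not isinstance(identifier, bytes):
            raise TypeError(f"'id' must be bytes, got {identifier!r}")


# Passes silently: every 'id' is a bytes object.
check_id_types_sketch([{"id": b"\x01\x02"}, {"id": b"\x03"}])
```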
- class swh.indexer.storage.in_memory.SubStorage(row_class: Type[TValue], tools, journal_writer)
Bases: Generic[TValue]
Implements common missing/get/add logic for each indexer type.
- missing(keys: Iterable[Dict]) → List[bytes]
List data missing from storage.
- Parameters:
keys (iterable) –
dictionaries with keys:
id (bytes): sha1 identifier
indexer_configuration_id (int): tool used to compute the results
- Returns:
missing sha1s
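To make the input and output shapes concrete, here is a minimal sketch of the lookup that missing describes. It assumes a toy store mapping each sha1 to the set of tool ids that have already produced results for it; `missing_sketch` and the `stored` layout are hypothetical and do not reflect how SubStorage keeps its data internally.

```python
from typing import Dict, Iterable, List, Set


def missing_sketch(stored: Dict[bytes, Set[int]], keys: Iterable[Dict]) -> List[bytes]:
    """Return the ids for which no result exists for the requested tool.

    `stored` is a hypothetical index: sha1 -> tool ids already recorded.
    """
    result = []
    for key in keys:
        known_tools = stored.get(key["id"], set())
        if key["indexer_configuration_id"] not in known_tools:
            result.append(key["id"])
    return result


# Toy data: sha1 b"\x01" was indexed by tool 1, sha1 b"\x02" by tool 2.
stored = {b"\x01": {1}, b"\x02": {2}}
queries = [
    {"id": b"\x01", "indexer_configuration_id": 1},  # present
    {"id": b"\x01", "indexer_configuration_id": 2},  # other tool: missing
    {"id": b"\x03", "indexer_configuration_id": 1},  # unknown id: missing
]
missing_ids = missing_sketch(stored, queries)
```

Note that the same sha1 counts as missing when it was indexed only by a different tool, which is why each query carries both an id and an indexer_configuration_id.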
- get(ids: Iterable[bytes]) → List[TValue]
Retrieve data per id.
- Parameters:
ids (iterable) – sha1 checksums
- Returns:
list of rows with the following keys:
id (bytes)
tool (dict): tool used to compute metadata
arbitrary data (as provided to add)
- get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
Retrieve ids of content with indexer_type within partition partition_id, bounded by limit.
- Parameters:
**indexer_type** – type of content data to index (mimetype, etc.)
**indexer_configuration_id** – the tool used to index data
**partition_id** – index of the partition to fetch
**nb_partitions** – total number of partitions to split into
**page_token** – opaque token used for pagination
**limit** – maximum number of results (defaults to 1000)
**with_textual_data** (bool) – whether to consider only textual content (True) or all content (False, the default)
- Raises:
IndexerStorageArgumentException – if limit is None or a wrong indexer_type is provided
- Returns:
PagedResult of Sha1. If next_page_token is None, there is no more data to fetch.
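The pagination contract above (keep fetching until next_page_token is None) can be sketched as a small client loop. Everything here is a stand-in so the example runs on its own: `PageSketch` mimics only the two fields this loop needs (the `results` attribute name is an assumption), and `fake_get_partition` is a hypothetical in-memory pager, not the storage's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PageSketch:
    """Hypothetical stand-in for PagedResult[bytes, str]."""
    results: List[bytes]
    next_page_token: Optional[str]


def fake_get_partition(ids: List[bytes], page_token: Optional[str] = None,
                       limit: int = 1000) -> PageSketch:
    """Toy pager: the token is simply the next start offset, as a string."""
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    token = str(end) if end < len(ids) else None
    return PageSketch(results=ids[start:end], next_page_token=token)


def fetch_all(get_page: Callable[..., PageSketch], limit: int) -> List[bytes]:
    """Follow next_page_token until it is None, collecting every id."""
    collected: List[bytes] = []
    token: Optional[str] = None
    while True:
        page = get_page(page_token=token, limit=limit)
        collected.extend(page.results)
        token = page.next_page_token
        if token is None:  # no more data to fetch
            return collected


ids = [bytes([i]) for i in range(5)]
all_ids = fetch_all(lambda **kw: fake_get_partition(ids, **kw), limit=2)
```

Callers should treat the token as opaque and pass it back unchanged; the integer offset used here is only a convenience of the toy pager.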
- class swh.indexer.storage.in_memory.IndexerStorage(journal_writer=None)
Bases: object
In-memory SWH indexer storage.
- content_mimetype_get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
- content_fossology_license_get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
- directory_intrinsic_metadata_get(ids: Iterable[bytes]) → List[DirectoryIntrinsicMetadataRow]
- directory_intrinsic_metadata_add(metadata: List[DirectoryIntrinsicMetadataRow]) → Dict[str, int]
- origin_intrinsic_metadata_search_fulltext(conjunction: List[str], limit: int = 100) → List[OriginIntrinsicMetadataRow]