swh.indexer.storage.in_memory module

swh.indexer.storage.in_memory.check_id_types(data: List[Dict[str, Any]])

Checks that all elements of the list have an 'id' whose type is bytes.
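The kind of check described here can be illustrated with a minimal standalone sketch. The function name `check_id_types_sketch` and the choice of `TypeError` are assumptions made for this illustration; the exception raised by the module itself is not stated on this page.

```python
from typing import Any, Dict, List


def check_id_types_sketch(data: List[Dict[str, Any]]) -> None:
    """Raise TypeError unless every item has an 'id' of type bytes.

    Hypothetical re-creation for illustration only; not the swh function.
    """
    for item in data:
        # A missing 'id' yields None, which also fails the isinstance check.
        if not isinstance(item.get("id"), bytes):
            raise TypeError(f"identifiers must be bytes, got {item.get('id')!r}")


# Passes silently: every 'id' is a bytes object.
check_id_types_sketch([{"id": b"\x01" * 20}])
```

Passing an item such as `{"id": "abc"}` to this sketch raises `TypeError`, because a `str` is not `bytes`.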

class swh.indexer.storage.in_memory.SubStorage(row_class: Type[TValue], tools, journal_writer)

Bases: Generic[TValue]

Implements common missing/get/add logic for each indexer type.

missing(keys: Iterable[Dict]) → List[bytes]

List data missing from storage.

Parameters:

keys (iterable) –

dictionaries with keys:

id (bytes): sha1 identifier
indexer_configuration_id (int): tool used to compute the results

Returns:

the missing sha1s
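The lookup described above can be sketched as follows. This is a toy model under stated assumptions, not the module's implementation: `missing_sketch` and the `stored` set of `(sha1, tool id)` pairs are hypothetical names introduced only for illustration.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical store: the (sha1, tool id) pairs that already have a result.
Stored = Set[Tuple[bytes, int]]


def missing_sketch(stored: Stored, keys: Iterable[Dict]) -> List[bytes]:
    """Return the ids from ``keys`` that have no result for the given tool."""
    result = []
    for key in keys:
        pair = (key["id"], key["indexer_configuration_id"])
        if pair not in stored:
            result.append(key["id"])
    return result


stored = {(b"\x01" * 20, 1)}
queries = [
    {"id": b"\x01" * 20, "indexer_configuration_id": 1},  # already indexed
    {"id": b"\x01" * 20, "indexer_configuration_id": 2},  # same content, other tool
    {"id": b"\x02" * 20, "indexer_configuration_id": 1},  # unknown content
]
missing_ids = missing_sketch(stored, queries)
```

In this sketch a result is keyed by both the sha1 and the tool, so the same content counts as missing when it is queried for a tool that has not indexed it yet.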

get(ids: Iterable[bytes]) → List[TValue]

Retrieve data per id.

Parameters:

ids (iterable) – sha1 checksums

Returns:

list of TValue rows with the following fields:

id (bytes)

tool (dict): tool used to compute metadata

arbitrary data (as provided to add)

get_all() → List[TValue]

get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]

Retrieve ids of content with indexer_type within partition partition_id, bounded by limit.

Parameters:

indexer_type – type of content data to index (mimetype, etc.)
indexer_configuration_id – the tool used to index data
partition_id – index of the partition to fetch
nb_partitions – total number of partitions to split into
page_token – opaque token used for pagination
limit – maximum number of results (defaults to 1000)
with_textual_data (bool) – restrict to textual content (True) or include all content (False, the default)

Raises:

IndexerStorageArgumentException – if limit is None or a wrong indexer_type is provided

Returns:

PagedResult of sha1s. If next_page_token is None, there is no more data to fetch.
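Partitioned, paginated retrieval of this kind can be sketched with plain Python. Everything below is an assumption made for illustration: the scheme that maps a sha1 to a partition (scaling its first two bytes), the hex-encoded page token, the `ValueError`, and the `(results, next_page_token)` tuple standing in for `PagedResult`. The module's actual partitioning and token format are not described on this page.

```python
from typing import Iterable, List, Optional, Tuple


def partition_of(sha1: bytes, nb_partitions: int) -> int:
    """Map a sha1 to a partition by scaling its first two bytes (assumed scheme)."""
    prefix = int.from_bytes(sha1[:2], "big")  # value in 0..65535
    return prefix * nb_partitions // 65536


def get_partition_sketch(
    ids: Iterable[bytes],
    partition_id: int,
    nb_partitions: int,
    page_token: Optional[str] = None,
    limit: int = 1000,
) -> Tuple[List[bytes], Optional[str]]:
    """Return one page of ids from a partition plus the token of the next page."""
    if limit is None:
        raise ValueError("limit should not be None")
    # Sort so that pages are stable across calls.
    in_partition = sorted(
        i for i in ids if partition_of(i, nb_partitions) == partition_id
    )
    if page_token is not None:
        # Hypothetical token format: hex of the first id of the next page.
        start = bytes.fromhex(page_token)
        in_partition = [i for i in in_partition if i >= start]
    page = in_partition[:limit]
    next_token = in_partition[limit].hex() if len(in_partition) > limit else None
    return page, next_token


ids = [bytes([n]) * 20 for n in (0x00, 0x10, 0x7F, 0x80, 0xFF)]
first_page, token = get_partition_sketch(ids, 0, 2, limit=2)
second_page, last_token = get_partition_sketch(ids, 0, 2, page_token=token, limit=2)
```

A caller would loop, passing each returned token back in, until the token is `None`. In the sketch the second call returns the single remaining id of partition 0 and a `None` token.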

add(data: Iterable[TValue]) → int

Add data not present in storage.

Parameters:

data (iterable) –

dictionaries with keys:

id: sha1
indexer_configuration_id: tool used to compute the results
arbitrary data
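To make the add/get behaviour concrete, here is a toy stand-in under stated assumptions. `TinySubStorage` is a hypothetical class, not the swh one. It assumes that rows are keyed by `(id, indexer_configuration_id)`, that rows whose key already exists are skipped, and that the returned integer counts the rows actually stored. None of these details are confirmed by this page.

```python
from typing import Any, Dict, Iterable, List, Tuple


class TinySubStorage:
    """Toy store keyed by (id, indexer_configuration_id); illustration only."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[bytes, int], Dict[str, Any]] = {}

    def add(self, data: Iterable[Dict[str, Any]]) -> int:
        """Store rows whose key is not yet present; return how many were added."""
        added = 0
        for row in data:
            key = (row["id"], row["indexer_configuration_id"])
            if key not in self._rows:
                self._rows[key] = dict(row)  # copy so later edits do not leak in
                added += 1
        return added

    def get(self, ids: Iterable[bytes]) -> List[Dict[str, Any]]:
        """Return every stored row whose id is among ``ids``."""
        wanted = set(ids)
        return [row for (id_, _), row in self._rows.items() if id_ in wanted]


store = TinySubStorage()
row = {"id": b"\x01" * 20, "indexer_configuration_id": 1, "mimetype": "text/plain"}
first_count = store.add([row, row])  # the duplicate within the batch is skipped
second_count = store.add([row])      # already present, nothing is added
```

After these calls, `store.get([b"\x01" * 20])` returns a single row that carries the arbitrary data (`mimetype` here) exactly as it was provided to `add`.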

class swh.indexer.storage.in_memory.IndexerStorage(journal_writer=None)

Bases: object

In-memory SWH indexer storage.

check_config(*, check_write)

content_mimetype_missing(mimetypes: Iterable[Dict]) → List[Tuple[bytes, int]]

content_mimetype_get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]

content_mimetype_add(mimetypes: List[ContentMimetypeRow]) → Dict[str, int]

content_mimetype_get(ids: Iterable[bytes]) → List[ContentMimetypeRow]

content_fossology_license_get(ids: Iterable[bytes]) → List[ContentLicenseRow]

content_fossology_license_add(licenses: List[ContentLicenseRow]) → Dict[str, int]

content_fossology_license_get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]

content_metadata_missing(metadata: Iterable[Dict]) → List[Tuple[bytes, int]]

content_metadata_get(ids: Iterable[bytes]) → List[ContentMetadataRow]

content_metadata_add(metadata: List[ContentMetadataRow]) → Dict[str, int]

directory_intrinsic_metadata_missing(metadata: Iterable[Dict]) → List[Tuple[bytes, int]]

directory_intrinsic_metadata_get(ids: Iterable[bytes]) → List[DirectoryIntrinsicMetadataRow]

directory_intrinsic_metadata_add(metadata: List[DirectoryIntrinsicMetadataRow]) → Dict[str, int]

origin_intrinsic_metadata_get(urls: Iterable[str]) → List[OriginIntrinsicMetadataRow]

origin_intrinsic_metadata_add(metadata: List[OriginIntrinsicMetadataRow]) → Dict[str, int]

origin_intrinsic_metadata_search_fulltext(conjunction: List[str], limit: int = 100) → List[OriginIntrinsicMetadataRow]

origin_intrinsic_metadata_search_by_producer(page_token: str = '', limit: int = 100, ids_only: bool = False, mappings: List[str] | None = None, tool_ids: List[int] | None = None) → PagedResult[str | OriginIntrinsicMetadataRow, str]

origin_intrinsic_metadata_stats()

origin_extrinsic_metadata_get(urls: Iterable[str]) → List[OriginExtrinsicMetadataRow]

origin_extrinsic_metadata_add(metadata: List[OriginExtrinsicMetadataRow]) → Dict[str, int]

indexer_configuration_add(tools)

indexer_configuration_get(tool)
