swh.indexer.storage.in_memory module
-
swh.indexer.storage.in_memory.check_id_types(data: List[Dict[str, Any]])
Checks that every element of the list has an ‘id’ whose type is ‘bytes’.
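As an illustration of the check described above, here is a minimal sketch. The name `check_id_types_sketch` and the use of `TypeError` are assumptions made for this example; the exception raised by the actual function is not stated here.

```python
from typing import Any, Dict, List


def check_id_types_sketch(data: List[Dict[str, Any]]) -> None:
    """Hypothetical stand-in: raise if any element lacks a bytes 'id'."""
    for item in data:
        value = item.get("id")
        if not isinstance(value, bytes):
            # TypeError is an assumption; the real exception type may differ.
            raise TypeError(f"'id' must be bytes, got {type(value).__name__}")


# A list whose ids are all bytes passes silently.
check_id_types_sketch([{"id": b"\x01" * 20}])
```

A string id, or an element with no 'id' key at all, would be rejected by this sketch.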
-
class swh.indexer.storage.in_memory.SubStorage(row_class: Type[TValue], tools, journal_writer)
Bases: Generic[swh.indexer.storage.in_memory.TValue]
Implements common missing/get/add logic for each indexer type.
-
missing(keys: Iterable[Dict]) → List[bytes]
List data missing from storage.
- Parameters
keys (iterable) –
dictionaries with keys:
id (bytes): sha1 identifier
indexer_configuration_id (int): tool used to compute the results
- Returns
missing sha1s
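The lookup described above can be sketched as follows, assuming results are keyed by the pair (id, indexer_configuration_id). The function `missing_sketch` and the `stored` set are hypothetical and only illustrate the idea; they are not the module's internal data structures.

```python
from typing import Dict, Iterable, List, Set, Tuple


def missing_sketch(
    stored: Set[Tuple[bytes, int]], keys: Iterable[Dict]
) -> List[bytes]:
    """Return ids whose (id, indexer_configuration_id) pair is not in `stored`."""
    result = []
    for key in keys:
        pair = (key["id"], key["indexer_configuration_id"])
        if pair not in stored:
            result.append(key["id"])
    return result


stored = {(b"a", 1)}
queries = [
    {"id": b"a", "indexer_configuration_id": 1},  # already indexed by tool 1
    {"id": b"a", "indexer_configuration_id": 2},  # same content, other tool
    {"id": b"b", "indexer_configuration_id": 1},  # never indexed
]
missing_ids = missing_sketch(stored, queries)
```

In this sketch the same sha1 can be present for one tool and missing for another, which is why each query dictionary carries both fields.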
-
get(ids: Iterable[bytes]) → List[TValue]
Retrieve data per id.
- Parameters
ids (iterable) – sha1 checksums
- Returns
list of rows, each with the following keys:
id (bytes)
tool (dict): tool used to compute metadata
arbitrary data (as provided to add)
-
get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: Optional[str] = None, limit: int = 1000) → swh.core.api.classes.PagedResult[bytes, str]
Retrieve the ids of content with indexer_type that fall within partition partition_id, bounded by limit.
- Parameters
indexer_type – type of content data to index (mimetype, language, etc.)
indexer_configuration_id – the tool used to index data
partition_id – index of the partition to fetch
nb_partitions – total number of partitions to split into
page_token – opaque token used for pagination
limit – maximum number of results (defaults to 1000)
with_textual_data (bool) – restrict to textual content (True) or consider all content (False, the default)
- Raises
IndexerStorageArgumentException – if limit is None or a wrong indexer_type is provided
- Returns
PagedResult of sha1s. If next_page_token is None, there is no more data to fetch.
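The pagination contract described above (keep requesting pages until next_page_token is None) can be sketched with a toy pager. `Page`, `fake_get_partition`, and `fetch_all` are hypothetical names for this example; `Page` only mimics the two fields the text mentions, and the offset-based token is an assumption, since real tokens are opaque.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Page:
    """Hypothetical stand-in for PagedResult[bytes, str]."""

    results: List[bytes]
    next_page_token: Optional[str]


def fake_get_partition(page_token: Optional[str] = None, limit: int = 1000) -> Page:
    """Toy pager over a fixed list; the token is just a stringified offset."""
    ids = [bytes([i]) for i in range(5)]
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    token = str(end) if end < len(ids) else None
    return Page(results=ids[start:end], next_page_token=token)


def fetch_all(get_page: Callable[..., Page], limit: int) -> List[bytes]:
    """Follow next_page_token until it is None, collecting every id."""
    collected: List[bytes] = []
    token: Optional[str] = None
    while True:
        page = get_page(page_token=token, limit=limit)
        collected.extend(page.results)
        token = page.next_page_token
        if token is None:
            return collected


all_ids = fetch_all(fake_get_partition, limit=2)
```

A caller should treat the token as opaque and pass it back unchanged, as `fetch_all` does, rather than interpreting its contents.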
-
-
class swh.indexer.storage.in_memory.IndexerStorage(journal_writer=None)
Bases: object
In-memory SWH indexer storage.
-
content_mimetype_get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: Optional[str] = None, limit: int = 1000) → swh.core.api.classes.PagedResult[bytes, str]
-
content_mimetype_add(mimetypes: List[swh.indexer.storage.model.ContentMimetypeRow]) → Dict[str, int]
-
content_mimetype_get(ids: Iterable[bytes]) → List[swh.indexer.storage.model.ContentMimetypeRow]
-
content_language_get(ids: Iterable[bytes]) → List[swh.indexer.storage.model.ContentLanguageRow]
-
content_language_add(languages: List[swh.indexer.storage.model.ContentLanguageRow]) → Dict[str, int]
-
content_fossology_license_get(ids: Iterable[bytes]) → List[swh.indexer.storage.model.ContentLicenseRow]
-
content_fossology_license_add(licenses: List[swh.indexer.storage.model.ContentLicenseRow]) → Dict[str, int]
-
content_fossology_license_get_partition(indexer_configuration_id: int, partition_id: int, nb_partitions: int, page_token: Optional[str] = None, limit: int = 1000) → swh.core.api.classes.PagedResult[bytes, str]
-
content_metadata_get(ids: Iterable[bytes]) → List[swh.indexer.storage.model.ContentMetadataRow]
-
content_metadata_add(metadata: List[swh.indexer.storage.model.ContentMetadataRow]) → Dict[str, int]
-
revision_intrinsic_metadata_get(ids: Iterable[bytes]) → List[swh.indexer.storage.model.RevisionIntrinsicMetadataRow]
-
revision_intrinsic_metadata_add(metadata: List[swh.indexer.storage.model.RevisionIntrinsicMetadataRow]) → Dict[str, int]
-
origin_intrinsic_metadata_get(urls: Iterable[str]) → List[swh.indexer.storage.model.OriginIntrinsicMetadataRow]
-
origin_intrinsic_metadata_add(metadata: List[swh.indexer.storage.model.OriginIntrinsicMetadataRow]) → Dict[str, int]
-
origin_intrinsic_metadata_search_fulltext(conjunction: List[str], limit: int = 100) → List[swh.indexer.storage.model.OriginIntrinsicMetadataRow]
-
origin_intrinsic_metadata_search_by_producer(page_token: str = '', limit: int = 100, ids_only: bool = False, mappings: Optional[List[str]] = None, tool_ids: Optional[List[int]] = None) → swh.core.api.classes.PagedResult[Union[str, swh.indexer.storage.model.OriginIntrinsicMetadataRow], str]
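The `conjunction` parameter of origin_intrinsic_metadata_search_fulltext suggests that every listed term must match. The sketch below illustrates that reading only, using plain case-insensitive substring matching over a hypothetical dictionary of URL to text. `search_conjunction_sketch` is an invented name, and the actual method's matching and ranking rules are not described in this page.

```python
from typing import Dict, List


def search_conjunction_sketch(
    documents: Dict[str, str], conjunction: List[str], limit: int = 100
) -> List[str]:
    """Return up to `limit` keys whose text contains every term (case-insensitive)."""
    terms = [term.lower() for term in conjunction]
    matches: List[str] = []
    for url, text in documents.items():
        if len(matches) >= limit:
            break
        lowered = text.lower()
        # A document qualifies only if all terms occur in it (logical AND).
        if all(term in lowered for term in terms):
            matches.append(url)
    return matches


docs = {
    "https://example.org/a": "Python indexer for metadata",
    "https://example.org/b": "Rust parser",
    "https://example.org/c": "metadata tools in Python",
}
hits = search_conjunction_sketch(docs, ["python", "metadata"])
```

Here only the first and third entries contain both terms, so the second is excluded even though it would match neither term partially.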
-