swh.indexer.storage package

Module contents

swh.indexer.storage.get_indexer_storage(cls, args)[source]

Get an indexer storage object of class cls, constructed with arguments args.

Parameters
  • cls (str) – storage’s class, either ‘local’ or ‘remote’

  • args (dict) – dictionary of arguments passed to the storage class constructor

Returns

an instance of swh.indexer’s storage (either local or remote)

Raises

ValueError if passed an unknown storage class.
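
The dispatch described above can be illustrated with a small stand-in. This is only a sketch of the factory pattern, not the package's actual code: LocalStorage, RemoteStorage, get_storage_sketch and the example argument values are hypothetical names introduced here for illustration.

```python
from typing import Any, Dict


class LocalStorage:
    """Hypothetical stand-in for a local storage class."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


class RemoteStorage:
    """Hypothetical stand-in for a remote storage client."""

    def __init__(self, **kwargs: Any) -> None:
        self.kwargs = kwargs


# Maps the documented cls values to the stand-in classes.
_STORAGE_CLASSES = {"local": LocalStorage, "remote": RemoteStorage}


def get_storage_sketch(cls: str, args: Dict[str, Any]):
    """Look up the class named by cls and construct it with args."""
    try:
        storage_class = _STORAGE_CLASSES[cls]
    except KeyError:
        # Unknown class names are rejected, as the "Raises" entry describes.
        raise ValueError(f"Unknown indexer storage class {cls!r}") from None
    return storage_class(**args)


# The argument name and value below are illustrative assumptions.
storage = get_storage_sketch("local", {"db": "dbname=example"})
```

Keeping the class lookup in a single table means an unknown name fails in one place, before any constructor runs.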

swh.indexer.storage.check_id_duplicates(data)[source]

If any two dictionaries in data have the same id, raises DuplicateId (see the example below).

Values associated with the id key must be hashable.

Parameters

data (List[dict]) – List of dictionaries to be inserted

>>> check_id_duplicates([
...     {'id': 'foo', 'data': 'spam'},
...     {'id': 'bar', 'data': 'egg'},
... ])
>>> check_id_duplicates([
...     {'id': 'foo', 'data': 'spam'},
...     {'id': 'foo', 'data': 'egg'},
... ])
Traceback (most recent call last):
  ...
swh.indexer.storage.exc.DuplicateId: ['foo']
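
One way such a check could be written is sketched below. It is an assumption about the approach, not the package's implementation. check_id_duplicates_sketch and DuplicateIdSketch are hypothetical names. Counting ids with collections.Counter also shows why the id values must be hashable.

```python
from collections import Counter
from typing import Dict, List


class DuplicateIdSketch(Exception):
    """Hypothetical stand-in for the DuplicateId exception."""


def check_id_duplicates_sketch(data: List[Dict]) -> None:
    """Raise DuplicateIdSketch listing every id that occurs more than once."""
    # Counter uses the ids as dictionary keys, so they must be hashable.
    counts = Counter(item["id"] for item in data)
    duplicates = [id_ for id_, count in counts.items() if count > 1]
    if duplicates:
        raise DuplicateIdSketch(duplicates)


# Distinct ids pass silently.
check_id_duplicates_sketch([
    {"id": "foo", "data": "spam"},
    {"id": "bar", "data": "egg"},
])
```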

class swh.indexer.storage.IndexerStorage(db, min_pool_conns=1, max_pool_conns=10)[source]

Bases: object

SWH Indexer Storage

get_db()[source]
put_db(db)[source]
check_config(*, check_write)[source]
content_mimetype_missing(mimetypes)[source]
content_mimetype_get_range(start, end, indexer_configuration_id, limit=1000)[source]
content_mimetype_add(mimetypes: List[Dict], conflict_update: bool = False) → Dict[str, int][source]

Add mimetypes to the storage (if conflict_update is True, this will override existing data if any).

Returns

A dict with the number of new elements added to the storage.
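
The effect of conflict_update can be modelled with an in-memory dictionary. This is a rough sketch under stated assumptions, not the storage's real behaviour: rows are keyed by id alone, overwritten rows are not counted as new, and the function name add_rows_sketch, the "added" result key and the mimetype field are all hypothetical.

```python
from typing import Dict, List


def add_rows_sketch(
    table: Dict[str, Dict],
    rows: List[Dict],
    conflict_update: bool = False,
) -> Dict[str, int]:
    """Insert rows into table; overwrite existing ids only if conflict_update."""
    added = 0
    for row in rows:
        key = row["id"]
        if key not in table:
            table[key] = dict(row)
            added += 1
        elif conflict_update:
            # Existing entry is replaced but not counted as new (assumption).
            table[key] = dict(row)
    return {"added": added}


table: Dict[str, Dict] = {}
add_rows_sketch(table, [{"id": "a", "mimetype": "text/plain"}])
# Without conflict_update the existing row is left untouched.
add_rows_sketch(table, [{"id": "a", "mimetype": "text/html"}])
# With conflict_update=True the existing row is replaced.
add_rows_sketch(table, [{"id": "a", "mimetype": "text/html"}], conflict_update=True)
```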

content_mimetype_get(ids)[source]
content_language_missing(languages)[source]
content_language_get(ids)[source]
content_language_add(languages: List[Dict], conflict_update: bool = False) → Dict[str, int][source]
content_ctags_missing(ctags)[source]
content_ctags_get(ids)[source]
content_ctags_add(ctags: List[Dict], conflict_update: bool = False) → Dict[str, int][source]
content_fossology_license_get(ids)[source]
content_fossology_license_add(licenses: List[Dict], conflict_update: bool = False) → Dict[str, int][source]
content_fossology_license_get_range(start, end, indexer_configuration_id, limit=1000)[source]
content_metadata_missing(metadata)[source]
content_metadata_get(ids)[source]
content_metadata_add(metadata: List[Dict], conflict_update: bool = False) → Dict[str, int][source]
revision_intrinsic_metadata_missing(metadata)[source]
revision_intrinsic_metadata_get(ids)[source]
revision_intrinsic_metadata_add(metadata: List[Dict], conflict_update: bool = False) → Dict[str, int][source]
revision_intrinsic_metadata_delete(entries: List[Dict]) → Dict[source]
origin_intrinsic_metadata_get(ids)[source]
origin_intrinsic_metadata_add(metadata: List[Dict], conflict_update: bool = False) → Dict[str, int][source]
origin_intrinsic_metadata_delete(entries: List[Dict]) → Dict[source]
origin_intrinsic_metadata_search_fulltext(conjunction, limit=100)[source]
origin_intrinsic_metadata_search_by_producer(page_token='', limit=100, ids_only=False, mappings=None, tool_ids=None)[source]
origin_intrinsic_metadata_stats()[source]
indexer_configuration_add(tools)[source]
indexer_configuration_get(tool)[source]