swh.indexer.storage.db module

swh.indexer.storage.db.execute_values_generator(cur: Cursor, query: str, values: Iterable[Any]) → Iterator[Any]
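Only the signature of execute_values_generator is documented here. By analogy with psycopg2's `execute_values` helper, it presumably expands the `VALUES %s` placeholder of query into rows taken from values and yields the rows the database returns. The following is a minimal sketch of that technique using the standard-library sqlite3 module; `execute_values_generator_sketch` and its `page_size` batching are hypothetical, not this module's implementation.

```python
import sqlite3
from itertools import islice
from typing import Any, Iterable, Iterator, Sequence


def execute_values_generator_sketch(
    cur: sqlite3.Cursor,
    query: str,
    values: Iterable[Sequence[Any]],
    page_size: int = 100,
) -> Iterator[Any]:
    """Hypothetical: run ``query`` once per page of ``values``, yielding result rows.

    ``query`` must contain a single ``%s`` standing for the VALUES row list.
    """
    rows = iter(values)
    while True:
        # Paging bounds the number of placeholders in each statement.
        page = [tuple(row) for row in islice(rows, page_size)]
        if not page:
            return
        # One "(?, ?, ...)" group per row; all rows are assumed to share one arity.
        group = "(" + ", ".join(["?"] * len(page[0])) + ")"
        sql = query.replace("%s", ", ".join([group] * len(page)), 1)
        # Flatten the page into one parameter list matching the placeholders.
        params = [value for row in page for value in row]
        cur.execute(sql, params)
        yield from cur.fetchall()
```

For example, `execute_values_generator_sketch(cur, "SELECT column1 * 10 FROM (VALUES %s)", [(1,), (2,), (3,)], page_size=2)` issues two statements and yields one result row per input row.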
class swh.indexer.storage.db.Db(conn: Connection[Any], pool: ConnectionPool | None = None)

Bases: BaseDb

Proxy to the SWH Indexer DB, with wrappers around stored procedures.

Create a DB proxy.

Parameters:
  • conn – psycopg connection to the SWH DB

  • pool – psycopg pool of connections

content_mimetype_hash_keys = ['id', 'indexer_configuration_id']
content_mimetype_missing_from_list(mimetypes: Iterable[Dict], cur=None) → Iterator[bytes]

List missing mimetypes.
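A pure-Python sketch of one plausible reading of "missing": an entry counts as stored only when its (id, indexer_configuration_id) pair, the fields listed in content_mimetype_hash_keys, is already present. The `stored` set is a hypothetical stand-in for the content_mimetype table, which the real method queries instead.

```python
from typing import Dict, Iterable, Iterator, Set, Tuple

# Copied from Db.content_mimetype_hash_keys.
HASH_KEYS = ["id", "indexer_configuration_id"]


def missing_from_list_sketch(
    entries: Iterable[Dict], stored: Set[Tuple]
) -> Iterator[bytes]:
    """Hypothetical: yield the id of each entry whose hash-key tuple is not stored."""
    for entry in entries:
        # Build the lookup key from the hash keys, in their declared order.
        key = tuple(entry[k] for k in HASH_KEYS)
        if key not in stored:
            yield entry["id"]
```

The same content id with a different indexer configuration is still reported as missing, since both hash keys take part in the comparison.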

content_mimetype_cols = ['id', 'mimetype', 'encoding', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
mktemp_content_mimetype(cur=None)
content_mimetype_add_from_temp(cur=None)
content_indexer_names = {'fossology_license': 'content_fossology_license', 'mimetype': 'content_mimetype'}
content_get_range(content_type, start, end, indexer_configuration_id, limit=1000, with_textual_data=False, cur=None)

Retrieve contents of the given content_type whose ids lie within the range [start, end] and that are associated with the given indexer configuration id, returning at most limit results.

When with_textual_data is true, only contents whose mimetype, as recorded in the mimetype table, is not binary are returned.
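A sketch of the range query described above, using sqlite3 and the content_indexer_names mapping to pick the table. The table layouts, the DISTINCT clause, and the approximation of "not binary" as a `text/*` mimetype are all assumptions; the real query runs against the SWH PostgreSQL schema.

```python
import sqlite3
from typing import Iterator

# Copied from Db.content_indexer_names: content type -> table name.
CONTENT_INDEXER_NAMES = {
    "fossology_license": "content_fossology_license",
    "mimetype": "content_mimetype",
}


def content_get_range_sketch(
    cur: sqlite3.Cursor,
    content_type: str,
    start: bytes,
    end: bytes,
    indexer_configuration_id: int,
    limit: int = 1000,
    with_textual_data: bool = False,
) -> Iterator[bytes]:
    """Hypothetical: yield ids in [start, end] indexed by one tool, at most ``limit``."""
    # KeyError for unknown content types; the table name never comes from free text.
    table = CONTENT_INDEXER_NAMES[content_type]
    join_clause = ""
    if with_textual_data:
        # Assumption: "not binary" is approximated as a text/* mimetype.
        join_clause = (
            "JOIN content_mimetype cm "
            "ON cm.id = t.id AND cm.mimetype LIKE 'text/%'"
        )
    # DISTINCT: a content may have several rows in a table (e.g. one per license).
    query = (
        f"SELECT DISTINCT t.id FROM {table} t {join_clause} "
        "WHERE t.indexer_configuration_id = ? AND ? <= t.id AND t.id <= ? "
        "ORDER BY t.id LIMIT ?"
    )
    cur.execute(query, (indexer_configuration_id, start, end, limit))
    for (content_id,) in cur.fetchall():
        yield content_id
```

The table name is interpolated into the SQL only after being looked up in the fixed mapping, so callers cannot inject arbitrary table names; the bounds, tool id, and limit are passed as bound parameters.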

content_mimetype_get_from_list(ids, cur=None)
content_fossology_license_cols = ['id', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration', 'license']
mktemp_content_fossology_license(cur=None)
content_fossology_license_add_from_temp(cur=None)

Add new licenses per content.
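The mktemp_* and *_add_from_temp pairs suggest a two-step bulk insert: rows are first loaded into a temporary staging table, then a single statement (in this module, a wrapper around a stored procedure) moves them into the main table. The following is a sqlite3 sketch of that pattern; the function names, the three-column layout, and the skip-duplicates policy are hypothetical.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[bytes, int, str]  # (id, indexer_configuration_id, license)


def mktemp_license_sketch(cur: sqlite3.Cursor) -> None:
    """Hypothetical: create a session-local staging table shaped like the main one."""
    cur.execute(
        "CREATE TEMP TABLE tmp_content_fossology_license "
        "(id BLOB, indexer_configuration_id INTEGER, license TEXT)"
    )


def license_add_from_temp_sketch(cur: sqlite3.Cursor) -> None:
    """Hypothetical: copy staged rows into the main table, then drop the staging table."""
    # Rows whose primary key already exists are skipped (assumed conflict policy).
    cur.execute(
        "INSERT OR IGNORE INTO content_fossology_license "
        "SELECT id, indexer_configuration_id, license "
        "FROM tmp_content_fossology_license"
    )
    cur.execute("DROP TABLE tmp_content_fossology_license")


def add_licenses_sketch(cur: sqlite3.Cursor, rows: Iterable[Row]) -> None:
    """Stage ``rows`` in a temporary table, then move them into the main table."""
    mktemp_license_sketch(cur)
    cur.executemany(
        "INSERT INTO tmp_content_fossology_license VALUES (?, ?, ?)", rows
    )
    license_add_from_temp_sketch(cur)
```

Staging lets the final insert run as one set-based statement, which is where deduplication against existing rows can happen cheaply.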

content_fossology_license_get_from_list(ids, cur=None)

Retrieve licenses per id.

content_metadata_hash_keys = ['id', 'indexer_configuration_id']
content_metadata_missing_from_list(metadata, cur=None)

List missing metadata.

content_metadata_cols = ['id', 'metadata', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
mktemp_content_metadata(cur=None)
content_metadata_add_from_temp(cur=None)
content_metadata_get_from_list(ids, cur=None)
directory_intrinsic_metadata_hash_keys = ['id', 'indexer_configuration_id']
directory_intrinsic_metadata_missing_from_list(metadata, cur=None)

List missing metadata.

directory_intrinsic_metadata_cols = ['id', 'metadata', 'mappings', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
mktemp_directory_intrinsic_metadata(cur=None)
directory_intrinsic_metadata_add_from_temp(cur=None)
directory_intrinsic_metadata_get_from_list(ids, cur=None)
directory_intrinsic_metadata_filter_by_tool(ids, tool_id, cur=None)

Return the DirectoryIntrinsicMetadata entries for the given ids, filtered to those produced by the tool tool_id.
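A pure-Python sketch of filtering by tool, reading the docstring as "among the entries for ids, keep those produced by tool_id". `DirectoryMetadataRowSketch` is a hypothetical stand-in for a stored row, not the storage model class.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class DirectoryMetadataRowSketch:
    """Hypothetical stand-in for a directory_intrinsic_metadata row."""

    id: bytes
    tool_id: int
    metadata: Dict[str, Any]


def filter_by_tool_sketch(
    rows: Iterable[DirectoryMetadataRowSketch], ids: Iterable[bytes], tool_id: int
) -> List[DirectoryMetadataRowSketch]:
    """Keep rows whose id is in ``ids`` and that were produced by ``tool_id``."""
    # A set gives constant-time membership tests for the requested ids.
    wanted = set(ids)
    return [r for r in rows if r.id in wanted and r.tool_id == tool_id]
```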

origin_intrinsic_metadata_cols = ['id', 'metadata', 'from_directory', 'mappings', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
origin_intrinsic_metadata_regconfig = 'pg_catalog.simple'

The text search configuration used to normalize ‘metadata’ and queries. ‘pg_catalog.simple’ has no stopwords, so it should be suitable for proper names and non-English content. When updating this value, make sure to add a new index on origin_intrinsic_metadata.metadata.
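A rough Python approximation of why a stopword-free configuration suits proper names: `simple_tokens` lowercases word tokens and drops nothing, loosely mimicking `pg_catalog.simple`, while `stopword_tokens` filters a tiny, hypothetical stopword list for contrast. PostgreSQL's real parser recognizes many more token types than this regex.

```python
import re
from typing import List

# Hypothetical, tiny stopword list used only for contrast with "simple".
EXAMPLE_STOPWORDS = {"a", "an", "the", "of", "who"}


def simple_tokens(text: str) -> List[str]:
    """Rough approximation of 'pg_catalog.simple': lowercase word tokens, none dropped."""
    # \w matches Unicode letters and digits, so non-English words survive intact.
    return [t.lower() for t in re.findall(r"\w+", text)]


def stopword_tokens(text: str) -> List[str]:
    """Same tokens with stopwords removed, as a stopword-aware configuration would do."""
    return [t for t in simple_tokens(text) if t not in EXAMPLE_STOPWORDS]
```

With stopwords removed, the band name "The Who" normalizes to no tokens at all, so it could never be matched; the simple configuration keeps both words.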

mktemp_origin_intrinsic_metadata(cur=None)
origin_intrinsic_metadata_add_from_temp(cur=None)
origin_intrinsic_metadata_get_from_list(ids, cur=None)
origin_intrinsic_metadata_filter_by_tool(ids, tool_id, cur=None)

Return the OriginIntrinsicMetadata entries for the given ids, filtered to those produced by the tool tool_id.

origin_intrinsic_metadata_search_fulltext(terms, *, limit, cur)
origin_intrinsic_metadata_search_by_producer(last, limit, ids_only, mappings, tool_ids, cur)
origin_extrinsic_metadata_cols = ['id', 'metadata', 'from_remd_id', 'mappings', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
mktemp_origin_extrinsic_metadata(cur=None)
origin_extrinsic_metadata_add_from_temp(cur=None)
origin_extrinsic_metadata_get_from_list(ids, cur=None)
indexer_configuration_cols = ['id', 'tool_name', 'tool_version', 'tool_configuration']
mktemp_indexer_configuration(cur=None)
indexer_configuration_add_from_temp(cur=None)
indexer_configuration_get(tool_name, tool_version, tool_configuration, cur=None)
indexer_configuration_get_from_id(id_, cur=None)