swh.indexer.storage.db module
- class swh.indexer.storage.db.Db(conn: connection, pool: AbstractConnectionPool | None = None)
Bases: BaseDb
Proxy to the SWH Indexer DB, with wrappers around stored procedures.
Create a DB proxy.
- Parameters:
conn – psycopg2 connection to the SWH DB
pool – psycopg2 pool of connections
- content_mimetype_hash_keys = ['id', 'indexer_configuration_id']
- content_mimetype_missing_from_list(mimetypes: Iterable[Dict], cur=None) -> Iterator[bytes]
List missing mimetypes.
- content_mimetype_cols = ['id', 'mimetype', 'encoding', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
- content_indexer_names = {'fossology_license': 'content_fossology_license', 'mimetype': 'content_mimetype'}
- content_get_range(content_type, start, end, indexer_configuration_id, limit=1000, with_textual_data=False, cur=None)
Retrieve contents of the given content_type whose ids fall within the range [start, end], returning at most limit results associated with the given indexer configuration id.
When with_textual_data is set, the query also filters against the mimetype table, keeping only contents whose mimetype is not binary.
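As an illustration of the behaviour described above, here is a minimal sketch of how such a bounded range query might be assembled. It is not the module's actual implementation: the helper name `build_range_query`, the SQL text, the parameter names, and the use of a `text/%` pattern to stand in for "not binary" are all assumptions made for this example. Only the `content_indexer_names` mapping and the argument names come from the documentation.

```python
from typing import Any, Dict, Tuple

# Copied from the documented content_indexer_names class attribute.
CONTENT_INDEXER_NAMES = {
    "fossology_license": "content_fossology_license",
    "mimetype": "content_mimetype",
}


def build_range_query(
    content_type: str,
    start: bytes,
    end: bytes,
    indexer_configuration_id: int,
    limit: int = 1000,
    with_textual_data: bool = False,
) -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: return (sql, params) for a bounded range scan."""
    # Resolve the table name from a fixed mapping, so that no
    # caller-supplied string is interpolated into the SQL text.
    table = CONTENT_INDEXER_NAMES[content_type]

    # Assumption: "not binary" is approximated by a text/* mimetype.
    # The doubled %% is a literal % under psycopg2 parameter substitution.
    join = (
        "inner join content_mimetype cm "
        "on (t.id = cm.id and cm.mimetype like 'text/%%') "
        if with_textual_data
        else ""
    )
    sql = (
        f"select t.id from {table} t {join}"
        "where t.indexer_configuration_id = %(tool_id)s "
        "and %(start)s <= t.id and t.id <= %(end)s "
        "order by t.id limit %(limit)s"
    )
    params = {
        "tool_id": indexer_configuration_id,
        "start": start,
        "end": end,
        "limit": limit,
    }
    return sql, params


sql, params = build_range_query("mimetype", b"\x00" * 20, b"\xff" * 20, 1)
```

The resulting pair could then be passed to a psycopg2 cursor's `execute(sql, params)`. An unknown content_type raises `KeyError` in this sketch, since only the two documented indexer names are mapped.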
- content_fossology_license_cols = ['id', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration', 'license']
- content_metadata_hash_keys = ['id', 'indexer_configuration_id']
- content_metadata_cols = ['id', 'metadata', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
- directory_intrinsic_metadata_hash_keys = ['id', 'indexer_configuration_id']
- directory_intrinsic_metadata_cols = ['id', 'metadata', 'mappings', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
- origin_intrinsic_metadata_cols = ['id', 'metadata', 'from_directory', 'mappings', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
- origin_intrinsic_metadata_regconfig = 'pg_catalog.simple'
The dictionary used to normalize ‘metadata’ and queries. ‘pg_catalog.simple’ provides no stopwords, so it should be suitable for proper names and non-English content. When updating this value, make sure to add a new index on origin_intrinsic_metadata.metadata.
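The reason a new index is needed can be sketched as follows. In PostgreSQL, an expression index on `to_tsvector(config, column)` is only usable by queries that call `to_tsvector` with the same configuration, so the index and the query have to be built from the same value. The helper `fulltext_sql` and the exact SQL below are assumptions for illustration, not the statements used by the indexer storage schema.

```python
from typing import Dict

# Value of the documented origin_intrinsic_metadata_regconfig attribute.
REGCONFIG = "pg_catalog.simple"


def fulltext_sql(regconfig: str) -> Dict[str, str]:
    """Hypothetical helper: build a matching index and search condition."""
    # Both statements share one tsvector expression, so the planner can
    # match the search condition against the expression index.
    tsvector = f"to_tsvector('{regconfig}', metadata)"
    return {
        "index": (
            "create index on origin_intrinsic_metadata "
            f"using gin ({tsvector})"
        ),
        "where": f"{tsvector} @@ plainto_tsquery('{regconfig}', %(terms)s)",
    }


statements = fulltext_sql(REGCONFIG)
```

Changing `REGCONFIG` changes the expression in the search condition, which would then no longer match an index built with the previous configuration.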
- origin_intrinsic_metadata_search_by_producer(last, limit, ids_only, mappings, tool_ids, cur)
- origin_extrinsic_metadata_cols = ['id', 'metadata', 'from_remd_id', 'mappings', 'tool_id', 'tool_name', 'tool_version', 'tool_configuration']
- indexer_configuration_cols = ['id', 'tool_name', 'tool_version', 'tool_configuration']
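The various `*_cols` attributes list column names in a fixed order. One plausible use of such a list, shown here as a sketch rather than as the module's own code, is to pair it with a row tuple returned by a cursor. The helper `row_to_dict` and the sample row values are made up for this example.

```python
from typing import Any, Dict, Sequence

# Copied from the documented indexer_configuration_cols attribute.
INDEXER_CONFIGURATION_COLS = ["id", "tool_name", "tool_version", "tool_configuration"]


def row_to_dict(cols: Sequence[str], row: Sequence[Any]) -> Dict[str, Any]:
    """Hypothetical helper: label a positional row with its column names."""
    # zip() silently truncates on a length mismatch, so check explicitly.
    if len(cols) != len(row):
        raise ValueError(f"expected {len(cols)} values, got {len(row)}")
    return dict(zip(cols, row))


# Illustrative row, in the same order as the column list.
row = (1, "file", "5.38", {"command": "file --mime"})
tool = row_to_dict(INDEXER_CONFIGURATION_COLS, row)
```

Keeping the column order in a single list means the same names can drive both the select clause, for example `", ".join(INDEXER_CONFIGURATION_COLS)`, and the conversion of each returned row.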