swh.storage.postgresql.storage module
- swh.storage.postgresql.storage.EMPTY_SNAPSHOT_ID = b'\x1a\x88\x93\xe6\xa8oDN\x8b\xe8\xe7\xbd\xa6\xcb4\xfb\x175\xa0\x0e'
Identifier for the empty snapshot
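The constant is a raw 20-byte identifier. As a small illustration, it can be rendered in the hexadecimal form used by SWHIDs; the variable names `hex_id` and `swhid` below are only for this example.

```python
# Value copied verbatim from the module constant documented above.
EMPTY_SNAPSHOT_ID = b"\x1a\x88\x93\xe6\xa8oDN\x8b\xe8\xe7\xbd\xa6\xcb4\xfb\x175\xa0\x0e"

# Hexadecimal rendering of the 20 raw bytes.
hex_id = EMPTY_SNAPSHOT_ID.hex()

# SWHIDs for snapshots use the "swh:1:snp:" prefix followed by the hex id.
swhid = f"swh:1:snp:{hex_id}"
print(swhid)
```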
- swh.storage.postgresql.storage.VALIDATION_EXCEPTIONS = (<class 'KeyError'>, <class 'TypeError'>, <class 'ValueError'>, <class 'psycopg.errors.CheckViolation'>, <class 'psycopg.IntegrityError'>, <class 'psycopg.errors.InvalidTextRepresentation'>, <class 'psycopg.errors.NotNullViolation'>, <class 'psycopg.errors.NumericValueOutOfRange'>, <class 'psycopg.errors.UndefinedFunction'>, <class 'psycopg.errors.ProgramLimitExceeded'>)
Exceptions raised by postgresql when validation of the arguments failed.
- swh.storage.postgresql.storage.convert_validation_exceptions()
Catches postgresql errors related to invalid arguments, and re-raises a StorageArgumentException.
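The catch-and-re-raise pattern described here can be sketched as a context manager. This is a minimal illustration, not the module's actual implementation: `StorageArgumentException` is redefined locally as a stand-in, the exception tuple is reduced to built-in types so the example runs without psycopg, and `parse_limit` is a hypothetical caller.

```python
from contextlib import contextmanager


class StorageArgumentException(Exception):
    """Local stand-in for the exception named in the text above."""


# Reduced tuple (assumption): only the built-in types from VALIDATION_EXCEPTIONS,
# so that the sketch does not require psycopg to be installed.
VALIDATION_EXCEPTIONS = (KeyError, TypeError, ValueError)


@contextmanager
def convert_validation_exceptions():
    """Re-raise argument validation errors as StorageArgumentException."""
    try:
        yield
    except VALIDATION_EXCEPTIONS as e:
        # Chain the original error so the root cause stays visible.
        raise StorageArgumentException(str(e)) from e


def parse_limit(raw):
    """Hypothetical caller: converts a user-supplied limit to an int."""
    with convert_validation_exceptions():
        return int(raw)
```

With this sketch, `parse_limit("abc")` raises `StorageArgumentException` instead of `ValueError`, while valid input passes through unchanged.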
- class swh.storage.postgresql.storage.Storage(db: str | Connection[Any], objstorage: Dict | None = None, min_pool_conns: int = 1, max_pool_conns: int = 10, journal_writer: Dict[str, Any] | None = None, query_options: Dict[str, Dict[str, Any]] | None = None)
Bases: object
SWH storage datastore proxy, encompassing DB and object storage
Instantiate a storage instance backed by a PostgreSQL database and an objstorage.
When db is passed as a connection string, then this module automatically manages a connection pool between min_pool_conns and max_pool_conns. When db is an explicit psycopg connection, then min_pool_conns and max_pool_conns are ignored and the connection is used directly.
- Parameters:
db – either a libpq connection string, or a psycopg connection
objstorage – configuration for the backend ObjStorage; if unset, use a NoopObjStorage
min_pool_conns – min number of connections in the psycopg pool
max_pool_conns – max number of connections in the psycopg pool
journal_writer – configuration for the JournalWriter
query_options – configuration for the sql connections; keys of the dict are the method names decorated with db_transaction() or db_transaction_generator() (e.g. content_find()), and values are dicts (config_name, config_value) used to configure the sql connection for the method_name. For example, using:
{"content_get": {"statement_timeout": 5000}}
will override the default statement timeout for the content_get() endpoint from 500ms to 5000ms.
See swh.core.db.common for more details.
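The per-method override described for query_options can be pictured as a dictionary merge. The sketch below is only an illustration of that idea: `options_for` and `DEFAULT_OPTIONS` are hypothetical names, not part of swh.storage, and the 500 ms default is taken from the sentence above.

```python
from typing import Any, Dict

# Assumed default, based on the "500ms" mentioned in the parameter description.
DEFAULT_OPTIONS: Dict[str, Any] = {"statement_timeout": 500}

# Same shape as the query_options argument: method name -> connection settings.
query_options: Dict[str, Dict[str, Any]] = {
    "content_get": {"statement_timeout": 5000},
}


def options_for(
    method_name: str,
    query_options: Dict[str, Dict[str, Any]],
    defaults: Dict[str, Any] = DEFAULT_OPTIONS,
) -> Dict[str, Any]:
    """Hypothetical helper: effective settings for one storage method."""
    merged = dict(defaults)  # copy, so the defaults are never mutated
    merged.update(query_options.get(method_name, {}))
    return merged


print(options_for("content_get", query_options))
print(options_for("content_find", query_options))
```

Only methods listed in query_options get different settings; every other method keeps the defaults.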
- content_get_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[Content, str]
- skipped_content_find(content: HashDict) → List[SkippedContent]
- directory_get_entries(directory_id: bytes, page_token: bytes | None = None, limit: int = 1000) → PagedResult[DirectoryEntry, str] | None
- directory_get_id_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
- revision_get_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[Revision, str]
- revision_get(revision_ids: List[bytes], ignore_displayname: bool = False) → List[Revision | None]
- revision_log(revisions: List[bytes], ignore_displayname: bool = False, limit: int | None = None) → Iterable[Dict[str, Any] | None]
- revision_shortlog(revisions: List[bytes], limit: int | None = None) → Iterable[Tuple[bytes, Tuple[bytes, ...]] | None]
- extid_get_from_extid(id_type: str, ids: List[bytes], version: int | None = None) → List[ExtID]
- extid_get_from_target(target_type: ObjectType, ids: List[bytes], extid_type: str | None = None, extid_version: int | None = None) → List[ExtID]
- release_get_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[Release, str]
- snapshot_get_id_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
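The `*_partition` methods above share the same `page_token` / `limit` signature. A minimal sketch of draining one partition page by page follows, assuming the returned page exposes `results` and `next_page_token` attributes. `Page`, `iter_partition` and `fake_fetch` are hypothetical names used so that the example runs without a database; `fake_fetch` stands in for a bound method such as `storage.snapshot_get_id_partition`.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Page(Generic[T]):
    """Stand-in for PagedResult; the two field names are an assumption."""

    results: List[T]
    next_page_token: Optional[str] = None


def iter_partition(
    fetch: Callable[..., Page], partition_id: int, nb_partitions: int, limit: int = 1000
) -> Iterator:
    """Yield every object of one partition, following page tokens."""
    page_token = None
    while True:
        page = fetch(partition_id, nb_partitions, page_token=page_token, limit=limit)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:  # no further page
            break


def fake_fetch(partition_id, nb_partitions, page_token=None, limit=1000):
    """In-memory replacement for a *_partition method (illustration only)."""
    data = [b"a", b"b", b"c", b"d", b"e"]
    start = int(page_token or 0)
    end = start + limit
    next_token = str(end) if end < len(data) else None
    return Page(results=data[start:end], next_page_token=next_token)


ids = list(iter_partition(fake_fetch, partition_id=0, nb_partitions=1, limit=2))
```

Iterating `partition_id` over `range(nb_partitions)` with the same helper would then cover the whole table, one partition at a time.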
- snapshot_count_branches(snapshot_id: bytes, branch_name_exclude_prefix: bytes | None = None) → Dict[str | None, int] | None
- snapshot_get_branches(snapshot_id: bytes, branches_from: bytes = b'', branches_count: int = 1000, target_types: List[str] | None = None, branch_name_include_substring: bytes | None = None, branch_name_exclude_prefix: bytes | None = None) → PartialBranches | None
- snapshot_branch_get_by_name(snapshot_id: bytes, branch_name: bytes, follow_alias_chain: bool = True, max_alias_chain_length: int = 100) → SnapshotBranchByNameResponse | None
- origin_visit_add(visits: List[OriginVisit]) → Iterable[OriginVisit]
- origin_visit_status_get_latest(origin_url: str, visit: int, allowed_statuses: List[str] | None = None, require_snapshot: bool = False) → OriginVisitStatus | None
- origin_visit_get(origin: str, page_token: str | None = None, order: ListOrder = ListOrder.ASC, limit: int = 10) → PagedResult[OriginVisit, str]
- origin_visit_get_with_statuses(origin: str, allowed_statuses: List[str] | None = None, require_snapshot: bool = False, page_token: str | None = None, order: ListOrder = ListOrder.ASC, limit: int = 10) → PagedResult[OriginVisitWithStatuses, str]
- origin_visit_find_by_date(origin: str, visit_date: datetime, type: str | None = None) → OriginVisit | None
- origin_visit_get_latest(origin: str, type: str | None = None, allowed_statuses: List[str] | None = None, require_snapshot: bool = False) → OriginVisit | None
- origin_visit_status_get(origin: str, visit: int, page_token: str | None = None, order: ListOrder = ListOrder.ASC, limit: int = 10) → PagedResult[OriginVisitStatus, str]
- origin_visit_status_get_random(type: str) → OriginVisitStatus | None
- origin_search(url_pattern: str, page_token: str | None = None, limit: int = 50, regexp: bool = False, with_visit: bool = False, visit_types: List[str] | None = None) → PagedResult[Origin, str]
- raw_extrinsic_metadata_get(target: ExtendedSWHID, authority: MetadataAuthority, after: datetime | None = None, page_token: bytes | None = None, limit: int = 1000) → PagedResult[RawExtrinsicMetadata, str]
- raw_extrinsic_metadata_get_authorities(target: ExtendedSWHID) → List[MetadataAuthority]
- metadata_authority_get(type: MetadataAuthorityType, url: str) → MetadataAuthority | None
- object_find_recent_references(target_swhid: ExtendedSWHID, limit: int) → List[ExtendedSWHID]
- object_delete(swhids: List[ExtendedSWHID]) → Dict[str, int]
Delete objects from the storage
All skipped content objects matching the given SWHID will be removed, including those that have the same SWHID due to hash collisions.
Origin objects are removed alongside their associated origin visit and origin visit status objects.
- Parameters:
swhids – list of SWHID of the objects to remove
- Returns:
number of objects removed. Details of each key:
- content:delete
Number of content objects removed
- content:delete:bytes
Sum of the removed contents’ data length
- skipped_content:delete
Number of skipped content objects removed
- directory:delete
Number of directory objects removed
- revision:delete
Number of revision objects removed
- release:delete
Number of release objects removed
- snapshot:delete
Number of snapshot objects removed
- origin:delete
Number of origin objects removed
- origin_visit:delete
Number of origin visit objects removed
- origin_visit_status:delete
Number of origin visit status objects removed
- ori_metadata:delete
Number of raw extrinsic metadata objects targeting an origin that have been removed
- snp_metadata:delete
Number of raw extrinsic metadata objects targeting a snapshot that have been removed
- rev_metadata:delete
Number of raw extrinsic metadata objects targeting a revision that have been removed
- rel_metadata:delete
Number of raw extrinsic metadata objects targeting a release that have been removed
- dir_metadata:delete
Number of raw extrinsic metadata objects targeting a directory that have been removed
- cnt_metadata:delete
Number of raw extrinsic metadata objects targeting a content that have been removed
- emd_metadata:delete
Number of raw extrinsic metadata objects targeting a raw extrinsic metadata object that have been removed
- Return type:
Dict[str, int]
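Because the result is a flat dict keyed by the names listed above, it can be condensed with ordinary dict operations. The sketch below is only an illustration: `summarize_deletion` is a hypothetical helper, and the values in `sample` are made up rather than actual output.

```python
from typing import Dict


def summarize_deletion(result: Dict[str, int]) -> Dict[str, int]:
    """Hypothetical helper: condense an object_delete() result dict."""
    # Keys such as "ori_metadata:delete" count raw extrinsic metadata objects.
    metadata = sum(
        count for key, count in result.items() if key.endswith("_metadata:delete")
    )
    # Every other "...:delete" key counts archive objects; the byte total
    # ("content:delete:bytes") does not end with ":delete" and is excluded.
    objects = sum(
        count
        for key, count in result.items()
        if key.endswith(":delete") and not key.endswith("_metadata:delete")
    )
    return {
        "objects": objects,
        "metadata": metadata,
        "bytes": result.get("content:delete:bytes", 0),
    }


# Made-up values, for illustration only.
sample = {
    "content:delete": 2,
    "content:delete:bytes": 2048,
    "origin:delete": 1,
    "origin_visit:delete": 3,
    "ori_metadata:delete": 4,
}
summary = summarize_deletion(sample)
```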
- extid_delete_for_target(target_swhids: List[CoreSWHID]) → Dict[str, int]
Delete ExtID objects from the storage
- Parameters:
target_swhids – list of SWHIDs targeted by the ExtID objects to remove
- Returns:
Summary dict with the following keys and associated values:
- extid:delete
Number of ExtID objects removed
- Return type:
Dict[str, int]
- object_references_create_partition(year: int, week: int) → Tuple[date, date]
Create the partition of the object_references table for the given ISO year and week.
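For reference, the calendar dates covered by an ISO year and week can be computed with the standard library. This sketch assumes that the returned tuple corresponds to the Monday of the week and the following Monday (an exclusive upper bound); the source does not state how the method defines the two dates. `iso_week_bounds` is a hypothetical helper, not part of swh.storage.

```python
from datetime import date, timedelta
from typing import Tuple


def iso_week_bounds(year: int, week: int) -> Tuple[date, date]:
    """Return (Monday of the ISO week, Monday of the following week)."""
    start = date.fromisocalendar(year, week, 1)  # day 1 is Monday
    end = start + timedelta(days=7)  # assumed exclusive upper bound
    return start, end


start, end = iso_week_bounds(2024, 1)
print(start, end)
```

Note that an ISO year can start in the previous calendar year, so the first date of week 1 does not always fall in January.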
- object_references_drop_partition(partition: ObjectReferencesPartition) → None
Delete the partition of the object_references table for the given partition.
- object_references_list_partitions() → List[ObjectReferencesPartition]
List existing partitions of the object_references table, ordered from oldest to the most recent.