swh.storage.postgresql.storage module
- swh.storage.postgresql.storage.EMPTY_SNAPSHOT_ID = b'\x1a\x88\x93\xe6\xa8oDN\x8b\xe8\xe7\xbd\xa6\xcb4\xfb\x175\xa0\x0e'
Identifier for the empty snapshot
- swh.storage.postgresql.storage.VALIDATION_EXCEPTIONS = (<class 'KeyError'>, <class 'TypeError'>, <class 'ValueError'>, <class 'psycopg2.errors.CheckViolation'>, <class 'psycopg2.IntegrityError'>, <class 'psycopg2.errors.InvalidTextRepresentation'>, <class 'psycopg2.errors.NotNullViolation'>, <class 'psycopg2.errors.NumericValueOutOfRange'>, <class 'psycopg2.errors.UndefinedFunction'>, <class 'psycopg2.errors.ProgramLimitExceeded'>)
Exceptions raised by postgresql when validation of the arguments failed.
- swh.storage.postgresql.storage.convert_validation_exceptions()
Catches postgresql errors related to invalid arguments, and re-raises a StorageArgumentException.
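The catch-and-re-raise behaviour described above can be sketched as a context manager. This is a self-contained illustration, not the module's actual code: StorageArgumentException is a local stand-in class, only the built-in members of VALIDATION_EXCEPTIONS are used (the psycopg2 classes are omitted so the sketch runs without a database driver), and parse_limit() is a hypothetical helper.

```python
from contextlib import contextmanager


class StorageArgumentException(Exception):
    """Local stand-in for the exception named in the text above."""


# Only the built-in members of VALIDATION_EXCEPTIONS; the psycopg2 classes
# are left out so this sketch has no third-party dependency.
BUILTIN_VALIDATION_EXCEPTIONS = (KeyError, TypeError, ValueError)


@contextmanager
def convert_validation_exceptions_sketch():
    """Re-raise argument-validation errors as StorageArgumentException."""
    try:
        yield
    except BUILTIN_VALIDATION_EXCEPTIONS as exc:
        # Keep the original message and chain the original exception.
        raise StorageArgumentException(str(exc)) from exc


def parse_limit(raw):
    """Hypothetical helper: validate a user-supplied limit."""
    with convert_validation_exceptions_sketch():
        return int(raw)
```

Callers then only need to handle a single exception type for invalid arguments, while the original error stays available through the exception's `__cause__`.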
- class swh.storage.postgresql.storage.Storage(db, objstorage=None, min_pool_conns=1, max_pool_conns=10, journal_writer=None, query_options=None)
Bases: object
SWH storage datastore proxy, encompassing DB and object storage
Instantiate a storage instance backed by a PostgreSQL database and an objstorage.
When db is passed as a connection string, this module automatically manages a connection pool of between min_pool_conns and max_pool_conns connections. When db is an explicit psycopg2 connection, min_pool_conns and max_pool_conns are ignored and the connection is used directly.
- Parameters:
db – either a libpq connection string, or a psycopg2 connection
objstorage – configuration for the backend ObjStorage; if unset, use a NoopObjStorage
min_pool_conns – min number of connections in the psycopg2 pool
max_pool_conns – max number of connections in the psycopg2 pool
journal_writer – configuration for the JournalWriter
query_options – configuration for the sql connections; keys of the dict are the names of methods decorated with db_transaction() or db_transaction_generator() (e.g. content_find()), and values are dicts (config_name, config_value) used to configure the sql connection for that method. For example, using {"content_get": {"statement_timeout": 5000}} will override the default statement timeout for the content_get() endpoint from 500ms to 5000ms. See swh.core.db.common for more details.
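The shape of query_options can be illustrated with a small, self-contained sketch. The effective_options() helper below is hypothetical and only mimics how a per-method override might be merged over default settings; it is not how swh.core.db.common is implemented. The 500 ms default is the value quoted above for content_get(); applying it to other methods is an assumption made for the example.

```python
from typing import Any, Dict

# The example value from the text above: raise the statement timeout of
# content_get() to 5000 ms.
query_options: Dict[str, Dict[str, Any]] = {
    "content_get": {"statement_timeout": 5000},
}


def effective_options(
    method_name: str,
    defaults: Dict[str, Any],
    overrides: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Hypothetical helper: merge per-method overrides over default settings."""
    merged = dict(defaults)  # copy, so the defaults are left untouched
    merged.update(overrides.get(method_name, {}))
    return merged


# Assumed default for illustration (500 ms, as quoted for content_get()).
defaults = {"statement_timeout": 500}

content_get_options = effective_options("content_get", defaults, query_options)
content_find_options = effective_options("content_find", defaults, query_options)
```

In real use, a dict of this shape would be passed as the query_options argument when constructing Storage, alongside a libpq connection string or a psycopg2 connection for db.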
- content_get_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[Content, str]
- skipped_content_find(content: HashDict) → List[SkippedContent]
- directory_get_entries(directory_id: bytes, page_token: bytes | None = None, limit: int = 1000) → PagedResult[DirectoryEntry, str] | None
- directory_get_id_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
- revision_get_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[Revision, str]
- revision_get(revision_ids: List[bytes], ignore_displayname: bool = False) → List[Revision | None]
- revision_log(revisions: List[bytes], ignore_displayname: bool = False, limit: int | None = None) → Iterable[Dict[str, Any] | None]
- revision_shortlog(revisions: List[bytes], limit: int | None = None) → Iterable[Tuple[bytes, Tuple[bytes, ...]] | None]
- extid_get_from_extid(id_type: str, ids: List[bytes], version: int | None = None) → List[ExtID]
- extid_get_from_target(target_type: ObjectType, ids: List[bytes], extid_type: str | None = None, extid_version: int | None = None) → List[ExtID]
- release_get_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[Release, str]
- snapshot_get_id_partition(partition_id: int, nb_partitions: int, page_token: str | None = None, limit: int = 1000) → PagedResult[bytes, str]
- snapshot_count_branches(snapshot_id: bytes, branch_name_exclude_prefix: bytes | None = None) → Dict[str | None, int] | None
- snapshot_get_branches(snapshot_id: bytes, branches_from: bytes = b'', branches_count: int = 1000, target_types: List[str] | None = None, branch_name_include_substring: bytes | None = None, branch_name_exclude_prefix: bytes | None = None) → PartialBranches | None
- snapshot_branch_get_by_name(snapshot_id: bytes, branch_name: bytes, follow_alias_chain: bool = True, max_alias_chain_length: int = 100) → SnapshotBranchByNameResponse | None
- origin_visit_add(visits: List[OriginVisit]) → Iterable[OriginVisit]
- origin_visit_status_get_latest(origin_url: str, visit: int, allowed_statuses: List[str] | None = None, require_snapshot: bool = False) → OriginVisitStatus | None
- origin_visit_get(origin: str, page_token: str | None = None, order: ListOrder = ListOrder.ASC, limit: int = 10) → PagedResult[OriginVisit, str]
- origin_visit_get_with_statuses(origin: str, allowed_statuses: List[str] | None = None, require_snapshot: bool = False, page_token: str | None = None, order: ListOrder = ListOrder.ASC, limit: int = 10) → PagedResult[OriginVisitWithStatuses, str]
- origin_visit_find_by_date(origin: str, visit_date: datetime, type: str | None = None) → OriginVisit | None
- origin_visit_get_latest(origin: str, type: str | None = None, allowed_statuses: List[str] | None = None, require_snapshot: bool = False) → OriginVisit | None
- origin_visit_status_get(origin: str, visit: int, page_token: str | None = None, order: ListOrder = ListOrder.ASC, limit: int = 10) → PagedResult[OriginVisitStatus, str]
- origin_visit_status_get_random(type: str) → OriginVisitStatus | None
- origin_search(url_pattern: str, page_token: str | None = None, limit: int = 50, regexp: bool = False, with_visit: bool = False, visit_types: List[str] | None = None) → PagedResult[Origin, str]
- raw_extrinsic_metadata_get(target: ExtendedSWHID, authority: MetadataAuthority, after: datetime | None = None, page_token: bytes | None = None, limit: int = 1000) → PagedResult[RawExtrinsicMetadata, str]
- raw_extrinsic_metadata_get_authorities(target: ExtendedSWHID) → List[MetadataAuthority]
- metadata_authority_get(type: MetadataAuthorityType, url: str) → MetadataAuthority | None
- object_find_recent_references(target_swhid: ExtendedSWHID, limit: int) → List[ExtendedSWHID]
- object_delete(swhids: List[ExtendedSWHID]) → Dict[str, int]
Delete objects from the storage
All skipped content objects matching the given SWHID will be removed, including those that share the same SWHID due to hash collisions.
Origin objects are removed alongside their associated origin visit and origin visit status objects.
- Parameters:
swhids – list of SWHIDs of the objects to remove
- Returns:
Summary dict with the following keys and associated values:
content:delete: Number of content objects removed
content:delete:bytes: Sum of the removed contents’ data length
skipped_content:delete: Number of skipped content objects removed
directory:delete: Number of directory objects removed
revision:delete: Number of revision objects removed
release:delete: Number of release objects removed
snapshot:delete: Number of snapshot objects removed
origin:delete: Number of origin objects removed
origin_visit:delete: Number of origin visit objects removed
origin_visit_status:delete: Number of origin visit status objects removed
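As a rough illustration of how the summary returned by object_delete() might be consumed, the sketch below totals the per-type counts and reads the byte count separately. The summarize_deletion() helper and the sample values are hypothetical; only the key names come from the description above.

```python
from typing import Dict, Tuple


def summarize_deletion(result: Dict[str, int]) -> Tuple[int, int]:
    """Return (objects_removed, bytes_removed) from an object_delete() summary."""
    # Keys ending in ":delete" are object counts; "content:delete:bytes"
    # ends in ":bytes" and is therefore not included in the object total.
    objects_removed = sum(
        count for key, count in result.items() if key.endswith(":delete")
    )
    bytes_removed = result.get("content:delete:bytes", 0)
    return objects_removed, bytes_removed


# Made-up sample values, for illustration only.
example_summary = {
    "content:delete": 2,
    "content:delete:bytes": 2048,
    "directory:delete": 1,
    "origin:delete": 1,
    "origin_visit:delete": 3,
    "origin_visit_status:delete": 5,
}

objects_removed, bytes_removed = summarize_deletion(example_summary)
```

Using .get() with a default keeps the helper tolerant of summaries that omit the byte count.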