swh.storage.postgresql.db module
-
class swh.storage.postgresql.db.Db(conn: psycopg2.extensions.connection, pool: Optional[psycopg2.pool.AbstractConnectionPool] = None)
Bases: swh.core.db.BaseDb
Proxy to the SWH DB, with wrappers around stored procedures.
-
current_version = 164
-
register_listener(notify_queue, cur=None)
Register a listener for the NOTIFY queue notify_queue.
-
content_get_metadata_keys = ['sha1', 'sha1_git', 'sha256', 'blake2s256', 'length', 'status']
-
content_add_keys = ['sha1', 'sha1_git', 'sha256', 'blake2s256', 'length', 'status', 'ctime']
-
skipped_content_keys = ['sha1', 'sha1_git', 'sha256', 'blake2s256', 'length', 'reason', 'status', 'origin']
-
content_get_range(start, end, limit=None, cur=None)
Retrieve contents within the range [start, end].
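To illustrate the inclusive range semantics, here is a minimal in-memory sketch. It assumes that contents are keyed by their sha1, returned in ascending sha1 order, and that limit caps the number of results. The helper content_get_range_sketch is hypothetical and does not query the database.

```python
from typing import Dict, List, Optional


def content_get_range_sketch(
    contents: List[Dict[str, bytes]],
    start: bytes,
    end: bytes,
    limit: Optional[int] = None,
) -> List[Dict[str, bytes]]:
    """Hypothetical in-memory analogue of Db.content_get_range.

    Assumption: both bounds are inclusive and results are ordered by sha1.
    """
    # Keep only contents whose sha1 lies in the closed interval [start, end]
    in_range = [c for c in contents if start <= c["sha1"] <= end]
    # Order by sha1, as a range scan over a sha1 index would
    in_range.sort(key=lambda c: c["sha1"])
    # limit=None means "no limit"
    return in_range if limit is None else in_range[:limit]


contents = [{"sha1": bytes([i]) * 20} for i in (5, 1, 9, 3)]
result = content_get_range_sketch(contents, bytes([1]) * 20, bytes([5]) * 20, limit=2)
print([c["sha1"][0] for c in result])  # [1, 3]
```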
-
content_hash_keys = ['sha1', 'sha1_git', 'sha256', 'blake2s256']
-
snapshot_count_cols = ['target_type', 'count']
-
snapshot_get_cols = ['snapshot_id', 'name', 'target', 'target_type']
-
snapshot_get_by_id(snapshot_id, branches_from=b'', branches_count=None, target_types=None, cur=None)
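The branches_from, branches_count and target_types parameters suggest a paginated read of a snapshot's branches. The sketch below is a hypothetical in-memory illustration under these assumptions: branches are ordered by name, branches_from is an inclusive lower bound on the name, the type filter is applied before the count limit, and an empty or missing target_types means "no filter". snapshot_branches_page is not part of swh.storage.

```python
from typing import List, Optional, Sequence, Tuple

# (name, target, target_type), mirroring part of snapshot_get_cols
Branch = Tuple[bytes, Optional[bytes], Optional[str]]


def snapshot_branches_page(
    branches: Sequence[Branch],
    branches_from: bytes = b"",
    branches_count: Optional[int] = None,
    target_types: Optional[List[str]] = None,
) -> List[Branch]:
    """Hypothetical sketch of paginating snapshot branches by name."""
    # Assumption: start at branches_from (inclusive), ordered by branch name
    selected = sorted((b for b in branches if b[0] >= branches_from), key=lambda b: b[0])
    # Assumption: None or an empty list disables the target type filter
    if target_types:
        selected = [b for b in selected if b[2] in target_types]
    # Assumption: the count limit is applied after filtering
    if branches_count is not None:
        selected = selected[:branches_count]
    return selected


branches = [
    (b"refs/tags/v1", b"\x01", "release"),
    (b"HEAD", b"refs/heads/main", "alias"),
    (b"refs/heads/main", b"\x02", "revision"),
    (b"refs/heads/dev", b"\x03", "revision"),
]
page = snapshot_branches_page(branches, branches_from=b"refs/", branches_count=2, target_types=["revision"])
```

A caller wanting the next page would pass a branches_from just past the last name returned.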
-
content_find_cols = ['sha1', 'sha1_git', 'sha256', 'blake2s256', 'length', 'ctime', 'status']
-
content_find(sha1: Optional[bytes] = None, sha1_git: Optional[bytes] = None, sha256: Optional[bytes] = None, blake2s256: Optional[bytes] = None, cur=None)
Find the content matching a combination of the following checksums: sha1, sha1_git, sha256 or blake2s256. Only the checksums that are given are used for the lookup.
- Parameters
sha1 – sha1 checksum of the content
sha1_git – sha1 of the content as computed by git
sha256 – sha256 checksum of the content
blake2s256 – blake2s256 checksum of the content
- Returns
The tuple (sha1, sha1_git, sha256, blake2s256) if found or None.
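The optional-checksum lookup can be pictured as a WHERE clause built from only the checksums that were provided, combined with AND, with the values passed as query parameters rather than interpolated. The sketch below is a hypothetical, database-free illustration: build_content_find_query is not part of swh.storage, the table name content is an assumption, and the guard against an empty checksum set is an addition of this sketch.

```python
from typing import List, Optional, Tuple

# Copied from Db.content_find_cols above
CONTENT_FIND_COLS = ["sha1", "sha1_git", "sha256", "blake2s256", "length", "ctime", "status"]


def build_content_find_query(
    sha1: Optional[bytes] = None,
    sha1_git: Optional[bytes] = None,
    sha256: Optional[bytes] = None,
    blake2s256: Optional[bytes] = None,
) -> Tuple[str, List[bytes]]:
    """Hypothetical sketch: return a (query, params) pair for a DB-API cursor."""
    checksums = {"sha1": sha1, "sha1_git": sha1_git, "sha256": sha256, "blake2s256": blake2s256}
    where_parts: List[str] = []
    params: List[bytes] = []
    for algo, value in checksums.items():
        # Only checksums that were actually given constrain the search.
        # Column names come from the fixed dict above, never from user input.
        if value is not None:
            where_parts.append(f"{algo} = %s")
            params.append(value)
    if not where_parts:
        # Added guard (assumption): an empty WHERE clause would be invalid SQL
        raise ValueError("at least one checksum is required")
    query = f"SELECT {', '.join(CONTENT_FIND_COLS)} FROM content WHERE " + " AND ".join(where_parts)
    return query, params


query, params = build_content_find_query(sha1=b"\x00" * 20, sha256=b"\x11" * 32)
```

The %s placeholders follow psycopg2's parameter style, so the pair could be handed to cursor.execute(query, params).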
-
directory_ls_cols = ['dir_id', 'type', 'target', 'name', 'perms', 'status', 'sha1', 'sha1_git', 'sha256', 'length']
-
directory_entry_get_by_path(directory, paths, cur=None)
Retrieve a directory entry by path.
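Resolving an entry by path amounts to walking the directory tree one component at a time. The sketch below assumes that paths is a list of path components (for example [b"src", b"main.c"]) relative to directory; the nested-dict tree model and entry_by_path_sketch are hypothetical stand-ins, whereas the real method reads entries from the database.

```python
from typing import Dict, List, Optional, Union

# Hypothetical in-memory model: a directory maps entry names to either
# a nested directory (dict) or a file identifier (bytes).
Tree = Dict[bytes, Union["Tree", bytes]]


def entry_by_path_sketch(directory: Tree, paths: List[bytes]) -> Optional[Union[Tree, bytes]]:
    """Walk `paths` from `directory`; return the entry found, or None."""
    current: Union[Tree, bytes] = directory
    for name in paths:
        # A path cannot descend through a file or through a missing entry
        if not isinstance(current, dict) or name not in current:
            return None
        current = current[name]
    return current


root: Tree = {b"src": {b"main.c": b"\x01"}, b"README": b"\x02"}
```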
-
revision_add_cols = ['id', 'date', 'date_offset', 'date_neg_utc_offset', 'committer_date', 'committer_date_offset', 'committer_date_neg_utc_offset', 'type', 'directory', 'message', 'author_fullname', 'author_name', 'author_email', 'committer_fullname', 'committer_name', 'committer_email', 'metadata', 'synthetic', 'extra_headers']
-
revision_get_cols = ['id', 'date', 'date_offset', 'date_neg_utc_offset', 'committer_date', 'committer_date_offset', 'committer_date_neg_utc_offset', 'type', 'directory', 'message', 'author_fullname', 'author_name', 'author_email', 'committer_fullname', 'committer_name', 'committer_email', 'metadata', 'synthetic', 'extra_headers', 'parents']
-
origin_visit_add(origin, ts, type, cur=None)
Add a new origin_visit for origin origin at timestamp ts.
- Parameters
origin – origin concerned by the visit
ts – the date of the visit
type – type of loader for the visit
- Returns
The index of the new visit for that origin.
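Visit indices are per origin: each origin has its own sequence of visits. The sketch below assumes the new index is the highest existing index for that origin plus one, starting at 1. VisitLogSketch is a hypothetical in-memory stand-in, not the stored procedure the real method calls; its `type` parameter name mirrors the documented signature.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple


class VisitLogSketch:
    """Hypothetical stand-in that allocates visit indices per origin."""

    def __init__(self) -> None:
        # origin url -> list of (visit index, date, visit type)
        self._visits: Dict[str, List[Tuple[int, datetime, str]]] = defaultdict(list)

    def origin_visit_add(self, origin: str, ts: datetime, type: str) -> int:
        visits = self._visits[origin]
        # Assumption: next index = highest existing index for this origin + 1 (1 if none)
        visit = max((v[0] for v in visits), default=0) + 1
        visits.append((visit, ts, type))
        return visit


log = VisitLogSketch()
now = datetime.now(timezone.utc)
first = log.origin_visit_add("https://example.org/repo.git", now, "git")
second = log.origin_visit_add("https://example.org/repo.git", now, "git")
other = log.origin_visit_add("https://example.org/other.git", now, "git")
```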
-
origin_visit_status_cols = ['origin', 'visit', 'date', 'status', 'snapshot', 'metadata']
-
origin_visit_status_add(visit_status: swh.model.model.OriginVisitStatus, cur=None) → None
Add a new origin visit status.
-
origin_visit_cols = ['origin', 'visit', 'date', 'type']
-
origin_visit_add_with_id(origin_visit: swh.model.model.OriginVisit, cur=None) → None
Insert an origin visit whose visit id is already set.
-
origin_visit_get_cols = ['origin', 'visit', 'date', 'type', 'status', 'metadata', 'snapshot']
-
origin_visit_select_cols = ['o.url AS origin', 'ov.visit', 'ov.date', 'ov.type AS type', 'ovs.status', 'ovs.metadata', 'ovs.snapshot']
-
origin_visit_status_select_cols = ['o.url AS origin', 'ovs.visit', 'ovs.date', 'ovs.status', 'ovs.snapshot', 'ovs.metadata']
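The *_select_cols lists hold SQL expressions (note the o., ov. and ovs. table aliases and the AS renames), while origin_visit_status_cols holds the plain keys of a result row. The two lists appear to line up position by position, so a fetched row can be turned into a dict with zip. This is a minimal sketch; select_clause and row_to_dict are hypothetical helpers, and the positional pairing is an assumption based on the matching length and order of the lists.

```python
from typing import Any, Dict, Sequence

# Copied from the attributes above
ORIGIN_VISIT_STATUS_SELECT_COLS = [
    "o.url AS origin", "ovs.visit", "ovs.date", "ovs.status", "ovs.snapshot", "ovs.metadata",
]
ORIGIN_VISIT_STATUS_COLS = ["origin", "visit", "date", "status", "snapshot", "metadata"]


def select_clause(select_cols: Sequence[str]) -> str:
    """Join SQL column expressions into a SELECT clause."""
    return "SELECT " + ", ".join(select_cols)


def row_to_dict(row: Sequence[Any], cols: Sequence[str]) -> Dict[str, Any]:
    """Map a result row to a dict, one key per selected expression."""
    # Each SQL expression yields exactly one column, so positions must match
    if len(row) != len(cols):
        raise ValueError("row and column list lengths differ")
    return dict(zip(cols, row))


row = ("https://example.org/repo.git", 3, "2020-01-01", "full", None, None)
status = row_to_dict(row, ORIGIN_VISIT_STATUS_COLS)
```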
-
origin_visit_status_get_latest(origin_url: str, visit: int, allowed_statuses: Optional[List[str]] = None, require_snapshot: bool = False, cur=None) → Optional[Dict[str, Any]]
Given an origin visit id, return its latest origin_visit_status.
-
origin_visit_status_get_range(origin: str, visit: int, date_from: Optional[datetime.datetime], order: swh.storage.interface.ListOrder, limit: int, cur=None)
Retrieve visit_status rows for visit (origin, visit) in a paginated way.
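Pagination here is driven by date_from, order and limit. The sketch below assumes that date_from is an inclusive bound (dates at or after it in ascending order, at or before it in descending order). ListOrder is redefined locally as a stand-in for swh.storage.interface.ListOrder so the block runs without swh installed, and visit_status_page is hypothetical.

```python
import enum
from datetime import datetime, timezone
from typing import List, Optional, Tuple


class ListOrder(enum.Enum):
    """Local stand-in for swh.storage.interface.ListOrder."""

    ASC = "asc"
    DESC = "desc"


Status = Tuple[datetime, str]  # (date, status) of one visit_status row


def visit_status_page(
    statuses: List[Status], date_from: Optional[datetime], order: ListOrder, limit: int
) -> List[Status]:
    """Hypothetical sketch of one page of visit statuses."""
    descending = order == ListOrder.DESC
    rows = sorted(statuses, key=lambda s: s[0], reverse=descending)
    if date_from is not None:
        # Assumption: inclusive bound, in the direction of the requested order
        rows = [s for s in rows if (s[0] <= date_from if descending else s[0] >= date_from)]
    return rows[:limit]


def day(n: int) -> datetime:
    return datetime(2020, 1, n, tzinfo=timezone.utc)


statuses = [(day(1), "created"), (day(2), "ongoing"), (day(3), "full")]
```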
-
origin_visit_get_range(origin: str, visit_from: int, order: swh.storage.interface.ListOrder, limit: int, cur=None)
-
origin_visit_get(origin_id, visit_id, cur=None)
Retrieve information on visit visit_id of origin origin_id.
- Parameters
origin_id – the origin concerned
visit_id – the visit index for that origin
- Returns
The origin_visit information
-
origin_visit_exists(origin_id, visit_id, cur=None)
Check whether an origin visit with the given ids exists.
-
origin_visit_get_latest(origin_id: str, type: Optional[str], allowed_statuses: Optional[Iterable[str]], require_snapshot: bool, cur=None)
Retrieve the most recent origin_visit of the given origin, with optional filters.
- Parameters
origin_id – the origin concerned
type – Optional visit type to filter on
allowed_statuses – the visit statuses allowed for the returned visit
require_snapshot (bool) – If True, only a visit with a known snapshot will be returned.
- Returns
The origin_visit information, or None if no visit matches.
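The filters combine with AND, and the most recent remaining visit wins. The sketch below works on a hypothetical flattened Visit record (a visit joined with its latest status). It assumes "most recent" means latest date, with the visit index as a tie-breaker, and that an empty allowed_statuses disables the status filter. latest_visit_sketch is not part of swh.storage.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Optional


@dataclass
class Visit:
    """Hypothetical flattened view of a visit and its latest status."""

    visit: int
    date: datetime
    type: str
    status: str
    snapshot: Optional[bytes]


def latest_visit_sketch(
    visits: List[Visit],
    type: Optional[str],
    allowed_statuses: Optional[Iterable[str]],
    require_snapshot: bool,
) -> Optional[Visit]:
    # Assumption: None or empty allowed_statuses means "any status"
    allowed = set(allowed_statuses) if allowed_statuses else None
    candidates = [
        v
        for v in visits
        if (type is None or v.type == type)
        and (allowed is None or v.status in allowed)
        and (not require_snapshot or v.snapshot is not None)
    ]
    # Most recent by date, visit index as tie-breaker; None if nothing matches
    return max(candidates, key=lambda v: (v.date, v.visit), default=None)


visits = [
    Visit(1, datetime(2020, 1, 1), "git", "full", b"\x01"),
    Visit(2, datetime(2020, 2, 1), "git", "partial", None),
    Visit(3, datetime(2020, 3, 1), "hg", "full", b"\x02"),
]
```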
-
origin_visit_get_random(type, cur=None)
Randomly select one origin visit of the given type that has status full and took place in the last 3 months.
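The selection is a filter followed by a uniform random pick. In this hypothetical sketch, visits are plain (origin url, type, status, date) tuples, and "the last 3 months" is approximated as 90 days before a caller-supplied now. Both the approximation and random_recent_full_visit are assumptions of the sketch, not the real SQL.

```python
import random
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

VisitRow = Tuple[str, str, str, datetime]  # (origin url, visit type, status, date)


def random_recent_full_visit(
    visits: List[VisitRow],
    visit_type: str,
    now: datetime,
    rng: Optional[random.Random] = None,
) -> Optional[VisitRow]:
    """Hypothetical sketch: pick one recent, full visit of the given type."""
    chooser = rng if rng is not None else random.Random()
    # Assumption: "last 3 months" approximated as 90 days
    cutoff = now - timedelta(days=90)
    eligible = [v for v in visits if v[1] == visit_type and v[2] == "full" and v[3] >= cutoff]
    return chooser.choice(eligible) if eligible else None


now = datetime(2020, 6, 1, tzinfo=timezone.utc)
visits = [
    ("https://a.example/repo", "git", "full", now - timedelta(days=10)),
    ("https://b.example/repo", "git", "partial", now - timedelta(days=5)),
    ("https://c.example/repo", "git", "full", now - timedelta(days=200)),
    ("https://d.example/repo", "hg", "full", now - timedelta(days=1)),
]
picked = random_recent_full_visit(visits, "git", now, random.Random(0))
```

Passing a seeded random.Random makes the pick reproducible in tests.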
-
revision_shortlog_cols = ['id', 'parents']
-
object_find_by_sha1_git_cols = ['sha1_git', 'type']
-
origin_cols = ['url']
-
origin_get_range_cols = ['id', 'url']
-
origin_get_range(origin_from: int = 1, origin_count: int = 100, cur=None)
Retrieve origin_count origins whose ids are greater than or equal to origin_from. Origins are sorted by id before retrieving them.
- Parameters
origin_from – the minimum id of origins to retrieve
origin_count – the maximum number of origins to retrieve
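Because origins are sorted by id and origin_from is an inclusive lower bound, a caller can iterate over all origins in batches by restarting just after the last id seen, which also copes with gaps in the id sequence. The sketch below uses (id, url) tuples as in origin_get_range_cols; origin_range_sketch and iter_all_origins are hypothetical in-memory helpers.

```python
from typing import Iterator, List, Tuple

Origin = Tuple[int, str]  # (id, url), as in origin_get_range_cols


def origin_range_sketch(origins: List[Origin], origin_from: int = 1, origin_count: int = 100) -> List[Origin]:
    """Hypothetical analogue: origins with id >= origin_from, sorted by id, at most origin_count."""
    ordered = sorted(origins, key=lambda o: o[0])
    return [o for o in ordered if o[0] >= origin_from][:origin_count]


def iter_all_origins(origins: List[Origin], batch_size: int = 2) -> Iterator[Origin]:
    """Page through every origin, batch_size at a time."""
    origin_from = 1
    while True:
        batch = origin_range_sketch(origins, origin_from, batch_size)
        if not batch:
            return
        yield from batch
        # Resume just after the last id seen; ids may have gaps
        origin_from = batch[-1][0] + 1


origins = [(7, "https://g.example"), (1, "https://a.example"), (4, "https://d.example")]
```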
-
origin_search(url_pattern: str, offset: int = 0, limit: int = 50, regexp: bool = False, with_visit: bool = False, cur=None)
Search for origins whose urls contain the provided string pattern or match the provided regular expression. The search is case-insensitive.
- Parameters
url_pattern – the string pattern to search for in origin urls
offset – number of found origins to skip before returning results
limit – the maximum number of found origins to return
regexp – if True, treat the provided pattern as a regular expression and return origins whose urls match it
with_visit – if True, filter out origins with no visit
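The matching rules can be mimicked in plain Python: a case-insensitive substring test by default, or a case-insensitive regular expression search when regexp is True, followed by the optional visit filter and offset/limit slicing. In the database these would more likely be PostgreSQL's ILIKE and ~* operators, but that is an assumption. In this hypothetical sketch, results keep input order and the `visited` set stands in for the visit table.

```python
import re
from typing import List, Optional, Set


def origin_search_sketch(
    urls: List[str],
    url_pattern: str,
    offset: int = 0,
    limit: int = 50,
    regexp: bool = False,
    with_visit: bool = False,
    visited: Optional[Set[str]] = None,
) -> List[str]:
    """Hypothetical in-memory analogue of Db.origin_search."""
    if regexp:
        # Case-insensitive regular expression search anywhere in the url
        compiled = re.compile(url_pattern, re.IGNORECASE)
        matches = [u for u in urls if compiled.search(u)]
    else:
        # Case-insensitive substring match
        needle = url_pattern.lower()
        matches = [u for u in urls if needle in u.lower()]
    if with_visit:
        # Hypothetical: `visited` lists the origins that have at least one visit
        seen = visited or set()
        matches = [u for u in matches if u in seen]
    return matches[offset:offset + limit]


urls = ["https://github.com/Foo/bar", "https://gitlab.com/foo/baz", "https://example.org/qux"]
```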
-
origin_count(url_pattern, regexp=False, with_visit=False, cur=None)
Count origins whose urls contain the provided string pattern or match the provided regular expression. The pattern search in origin urls is case-insensitive.
- Parameters
url_pattern (str) – the string pattern to search for in origin urls
regexp (bool) – if True, treat the provided pattern as a regular expression and return origins whose urls match it
with_visit (bool) – if True, filter out origins with no visit
-
release_add_cols = ['id', 'target', 'target_type', 'date', 'date_offset', 'date_neg_utc_offset', 'name', 'comment', 'synthetic', 'author_fullname', 'author_name', 'author_email']
-
release_get_cols = ['id', 'target', 'target_type', 'date', 'date_offset', 'date_neg_utc_offset', 'name', 'comment', 'synthetic', 'author_fullname', 'author_name', 'author_email']
-
raw_extrinsic_metadata_get_cols = ['raw_extrinsic_metadata.target', 'raw_extrinsic_metadata.type', 'discovery_date', 'metadata_authority.type', 'metadata_authority.url', 'metadata_fetcher.id', 'metadata_fetcher.name', 'metadata_fetcher.version', 'origin', 'visit', 'snapshot', 'release', 'revision', 'path', 'directory', 'format', 'raw_extrinsic_metadata.metadata']
List of columns of the raw_extrinsic_metadata, metadata_authority, and metadata_fetcher tables, used when reading object metadata.
-
raw_extrinsic_metadata_add(type: str, target: str, discovery_date: datetime.datetime, authority_id: int, fetcher_id: int, format: str, metadata: bytes, origin: Optional[str], visit: Optional[int], snapshot: Optional[str], release: Optional[str], revision: Optional[str], path: Optional[bytes], directory: Optional[str], cur)
-
raw_extrinsic_metadata_get(type: str, target: str, authority_id: int, after_time: Optional[datetime.datetime], after_fetcher: Optional[int], limit: int, cur)
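The after_time and after_fetcher parameters suggest keyset pagination over (discovery_date, fetcher id) pairs. The sketch below assumes rows are ordered by that pair and that the next page starts strictly after the last pair seen. Comparing the pair, not just the date, avoids skipping rows that share a discovery date. metadata_page and the Row layout are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple

Row = Tuple[datetime, int, bytes]  # (discovery_date, fetcher_id, metadata)


def metadata_page(
    rows: List[Row], after_time: Optional[datetime], after_fetcher: Optional[int], limit: int
) -> List[Row]:
    """Hypothetical keyset-pagination sketch over (discovery_date, fetcher_id)."""
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    if after_time is not None:
        if after_fetcher is not None:
            # Strictly after the (discovery_date, fetcher_id) pair of the last row seen
            ordered = [r for r in ordered if (r[0], r[1]) > (after_time, after_fetcher)]
        else:
            ordered = [r for r in ordered if r[0] > after_time]
    return ordered[:limit]


t0 = datetime(2020, 1, 1, tzinfo=timezone.utc)
t1, t2 = t0 + timedelta(days=1), t0 + timedelta(days=2)
rows = [(t1, 2, b"c"), (t1, 1, b"b"), (t0, 5, b"a"), (t2, 1, b"d")]
page1 = metadata_page(rows, None, None, 2)
last = page1[-1]
page2 = metadata_page(rows, last[0], last[1], 2)
```

With only after_time, the second page would start at t2 and silently drop the (t1, 2) row; the pair comparison keeps it.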
-
metadata_fetcher_cols = ['name', 'version', 'metadata']
-
dbversion_cols = ['version', 'release', 'description']
-