swh.storage.cassandra.cql module

swh.storage.cassandra.cql.PARTITION_KEY_RESTRICTION_MAX_SIZE = 100

Maximum number of restrictions in a single query. Usually this is a very low number (eg. SELECT … FROM … WHERE x=?), but some queries can request arbitrarily many (eg. SELECT … FROM … WHERE x IN ?).

This can cause performance issues, as the node getting the query need to coordinate with other nodes to get the complete results. See <https://github.com/scylladb/scylla/pull/4797> for details and rationale.

swh.storage.cassandra.cql.get_execution_profiles(consistency_level: str = 'ONE') Dict[object, cassandra.cluster.ExecutionProfile][source]
swh.storage.cassandra.cql.create_keyspace(hosts: List[str], keyspace: str, port: int = 9042, *, durable_writes=True)[source]
class swh.storage.cassandra.cql.CqlRunner(hosts: List[str], keyspace: str, port: int, consistency_level: str)[source]

Bases: object

Class managing prepared statements and building queries to be sent to Cassandra.

MAX_RETRIES = 3
content_add_prepare(content: swh.storage.cassandra.model.ContentRow, *, statement) Tuple[int, Callable[[], None]][source]

Prepares insertion of a Content to the main ‘content’ table. Returns a token (to be used in secondary tables), and a function to be called to perform the insertion in the main table.

content_get_from_pk(content_hashes: Dict[str, bytes], *, statement) Optional[swh.storage.cassandra.model.ContentRow][source]
content_missing_from_hashes(contents_hashes: List[Dict[str, bytes]]) Iterator[Dict[str, bytes]][source]
content_get_from_tokens(tokens, *, statement) Iterable[swh.storage.cassandra.model.ContentRow][source]
content_get_random(*, statement) Optional[swh.storage.cassandra.model.ContentRow][source]
content_get_token_range(start: int, end: int, limit: int, *, statement) Iterable[Tuple[int, swh.storage.cassandra.model.ContentRow]][source]

Returns an iterable of (token, row)

content_missing_by_sha1_git(ids: List[bytes], *, statement) List[bytes][source]
content_index_add_one(algo: str, content: swh.model.model.Content, token: int) None[source]

Adds a row mapping content[algo] to the token of the Content in the main ‘content’ table.

content_get_tokens_from_single_algo(algo: str, hashes: List[bytes]) Iterable[int][source]
skipped_content_add_prepare(content, *, statement) Tuple[int, Callable[[], None]][source]

Prepares insertion of a Content to the main ‘skipped_content’ table. Returns a token (to be used in secondary tables), and a function to be called to perform the insertion in the main table.

skipped_content_get_from_pk(content_hashes: Dict[str, bytes], *, statement) Optional[swh.storage.cassandra.model.SkippedContentRow][source]
skipped_content_get_from_token(token, *, statement) Iterable[swh.storage.cassandra.model.SkippedContentRow][source]
skipped_content_index_add_one(algo: str, content: swh.model.model.SkippedContent, token: int) None[source]

Adds a row mapping content[algo] to the token of the SkippedContent in the main ‘skipped_content’ table.

skipped_content_get_tokens_from_single_hash(algo: str, hash_: bytes) Iterable[int][source]
revision_missing(ids: List[bytes], *, statement) List[bytes][source]
revision_add_one(revision: swh.storage.cassandra.model.RevisionRow, *, statement) None[source]
revision_get_ids(revision_ids, *, statement) Iterable[int][source]
revision_get(revision_ids: List[bytes], *, statement) Iterable[swh.storage.cassandra.model.RevisionRow][source]
revision_get_random(*, statement) Optional[swh.storage.cassandra.model.RevisionRow][source]
revision_parent_add_one(revision_parent: swh.storage.cassandra.model.RevisionParentRow, *, statement) None[source]
revision_parent_get(revision_id: bytes, *, statement) Iterable[bytes][source]
release_missing(ids: List[bytes], *, statement) List[bytes][source]
release_add_one(release: swh.storage.cassandra.model.ReleaseRow, *, statement) None[source]
release_get(release_ids: List[str], *, statement) Iterable[swh.storage.cassandra.model.ReleaseRow][source]
release_get_random(*, statement) Optional[swh.storage.cassandra.model.ReleaseRow][source]
directory_missing(ids: List[bytes], *, statement) List[bytes][source]
directory_add_one(directory: swh.storage.cassandra.model.DirectoryRow, *, statement) None[source]

Called after all calls to directory_entry_add_one, to commit/finalize the directory.

directory_get_random(*, statement) Optional[swh.storage.cassandra.model.DirectoryRow][source]
directory_entry_add_one(entry: swh.storage.cassandra.model.DirectoryEntryRow, *, statement) None[source]
directory_entry_add_concurrent(entries: List[swh.storage.cassandra.model.DirectoryEntryRow], *, statement) None[source]
directory_entry_add_batch(entries: List[swh.storage.cassandra.model.DirectoryEntryRow], *, statement) None[source]
directory_entry_get(directory_ids, *, statement) Iterable[swh.storage.cassandra.model.DirectoryEntryRow][source]
directory_entry_get_from_name(directory_id: bytes, from_: bytes, limit: int, *, statement) Iterable[swh.storage.cassandra.model.DirectoryEntryRow][source]
snapshot_missing(ids: List[bytes], *, statement) List[bytes][source]
snapshot_add_one(snapshot: swh.storage.cassandra.model.SnapshotRow, *, statement) None[source]
snapshot_get_random(*, statement) Optional[swh.storage.cassandra.model.SnapshotRow][source]
snapshot_branch_add_one(branch: swh.storage.cassandra.model.SnapshotBranchRow, *, statement) None[source]
snapshot_count_branches_from_name(snapshot_id: bytes, from_: bytes, *, statement) Dict[Optional[str], int][source]
snapshot_count_branches_before_name(snapshot_id: bytes, before: bytes, *, statement) Dict[Optional[str], int][source]
snapshot_count_branches(snapshot_id: bytes, branch_name_exclude_prefix: Optional[bytes] = None) Dict[Optional[str], int][source]

Returns a dictionary from type names to the number of branches of that type.

snapshot_branch_get_from_name(snapshot_id: bytes, from_: bytes, limit: int, *, statement) Iterable[swh.storage.cassandra.model.SnapshotBranchRow][source]
snapshot_branch_get_range(snapshot_id: bytes, from_: bytes, before: bytes, limit: int, *, statement) Iterable[swh.storage.cassandra.model.SnapshotBranchRow][source]
snapshot_branch_get(snapshot_id: bytes, from_: bytes, limit: int, branch_name_exclude_prefix: Optional[bytes] = None) Iterable[swh.storage.cassandra.model.SnapshotBranchRow][source]
origin_add_one(origin: swh.storage.cassandra.model.OriginRow, *, statement) None[source]
origin_get_by_sha1(sha1: bytes, *, statement) Iterable[swh.storage.cassandra.model.OriginRow][source]
origin_get_by_url(url: str) Iterable[swh.storage.cassandra.model.OriginRow][source]
origin_list(start_token: int, limit: int, *, statement) Iterable[Tuple[int, swh.storage.cassandra.model.OriginRow]][source]

Returns an iterable of (token, origin)

origin_iter_all(*, statement) Iterable[swh.storage.cassandra.model.OriginRow][source]
origin_bump_next_visit_id(origin_url: str, visit_id: int, *, statement) None[source]
origin_generate_unique_visit_id(origin_url: str, *, statement) int[source]
origin_visit_get(origin_url: str, last_visit: Optional[int], limit: int, order: swh.storage.interface.ListOrder, *, statements) Iterable[swh.storage.cassandra.model.OriginVisitRow][source]
origin_visit_add_one(visit: swh.storage.cassandra.model.OriginVisitRow, *, statement) None[source]
origin_visit_get_one(origin_url: str, visit_id: int, *, statement) Optional[swh.storage.cassandra.model.OriginVisitRow][source]
origin_visit_get_all(origin_url: str, *, statement) Iterable[swh.storage.cassandra.model.OriginVisitRow][source]
origin_visit_iter(start_token: int) Iterator[swh.storage.cassandra.model.OriginVisitRow][source]

Returns all origin visits in order from this token, and wraps around the token space.

origin_visit_status_get_range(origin: str, visit: int, date_from: Optional[datetime.datetime], limit: int, order: swh.storage.interface.ListOrder, *, statements) Iterable[swh.storage.cassandra.model.OriginVisitStatusRow][source]
origin_visit_status_add_one(visit_update: swh.storage.cassandra.model.OriginVisitStatusRow, *, statement) None[source]
origin_visit_status_get_latest(origin: str, visit: int) Optional[swh.storage.cassandra.model.OriginVisitStatusRow][source]

Given an origin visit id, return its latest origin_visit_status

origin_visit_status_get(origin: str, visit: int, *, statement) Iterator[swh.storage.cassandra.model.OriginVisitStatusRow][source]

Return all origin visit statuses for a given visit

metadata_authority_add(authority: swh.storage.cassandra.model.MetadataAuthorityRow, *, statement)[source]
metadata_authority_get(type, url, *, statement) Optional[swh.storage.cassandra.model.MetadataAuthorityRow][source]
metadata_fetcher_add(fetcher, *, statement)[source]
metadata_fetcher_get(name, version, *, statement) Optional[swh.storage.cassandra.model.MetadataFetcherRow][source]
raw_extrinsic_metadata_by_id_add(row, *, statement)[source]
raw_extrinsic_metadata_get_by_ids(ids: List[bytes], *, statement) Iterable[swh.storage.cassandra.model.RawExtrinsicMetadataByIdRow][source]
raw_extrinsic_metadata_add(raw_extrinsic_metadata, *, statement)[source]
raw_extrinsic_metadata_get_after_date(target: str, authority_type: str, authority_url: str, after: datetime.datetime, *, statement) Iterable[swh.storage.cassandra.model.RawExtrinsicMetadataRow][source]
raw_extrinsic_metadata_get_after_date_and_id(target: str, authority_type: str, authority_url: str, after_date: datetime.datetime, after_id: bytes, *, statement) Iterable[swh.storage.cassandra.model.RawExtrinsicMetadataRow][source]
raw_extrinsic_metadata_get(target: str, authority_type: str, authority_url: str, *, statement) Iterable[swh.storage.cassandra.model.RawExtrinsicMetadataRow][source]
raw_extrinsic_metadata_get_authorities(target: str, *, statement) Iterable[Tuple[str, str]][source]
extid_add_prepare(extid: swh.storage.cassandra.model.ExtIDRow, *, statement) Tuple[int, Callable[[], None]][source]
extid_get_from_pk(extid_type: str, extid: bytes, extid_version: int, target: swh.model.swhids.CoreSWHID, *, statement) Optional[swh.storage.cassandra.model.ExtIDRow][source]
extid_get_from_token(token: int, *, statement) Iterable[swh.storage.cassandra.model.ExtIDRow][source]
extid_get_from_token_and_extid_version(token: int, extid_version: int, *, statement) Iterable[swh.storage.cassandra.model.ExtIDRow][source]
extid_get_from_extid(extid_type: str, extid: bytes, *, statement) Iterable[swh.storage.cassandra.model.ExtIDRow][source]
extid_get_from_extid_and_version(extid_type: str, extid: bytes, extid_version: int, *, statement) Iterable[swh.storage.cassandra.model.ExtIDRow][source]
extid_get_from_target(target_type: str, target: bytes, extid_type: Optional[str] = None, extid_version: Optional[int] = None) Iterable[swh.storage.cassandra.model.ExtIDRow][source]
extid_index_add_one(row: swh.storage.cassandra.model.ExtIDByTargetRow, *, statement) None[source]

Adds a row mapping extid[target_type, target] to the token of the ExtID in the main ‘extid’ table.

stat_counters() Iterable[swh.storage.cassandra.model.ObjectCountRow][source]
check_read(*, statement)[source]