swh.storage.cassandra.cql module

swh.storage.cassandra.cql.create_keyspace(hosts: List[str], keyspace: str, port: int = 9042, *, durable_writes=True)[source]
class swh.storage.cassandra.cql.CqlRunner(hosts: List[str], keyspace: str, port: int)[source]

Bases: object

Class managing prepared statements and building queries to be sent to Cassandra.

content_add_prepare(content, *, statement) → Tuple[int, Callable[[], None]][source]

Prepares insertion of a Content to the main ‘content’ table. Returns a token (to be used in secondary tables), and a function to be called to perform the insertion in the main table.

content_get_from_pk(content_hashes: Dict[str, bytes], *, statement) → Optional[tuple][source]
content_get_from_token(token, *, statement) → Iterable[tuple][source]
content_get_random(*, statement) → Optional[tuple][source]
content_get_token_range(start: int, end: int, limit: int, *, statement) → Iterable[tuple][source]
content_missing_by_sha1_git(ids: List[bytes], *, statement) → List[bytes][source]
content_index_add_one(algo: str, content: swh.model.model.Content, token: int) → None[source]

Adds a row mapping content[algo] to the token of the Content in the main ‘content’ table.

content_get_tokens_from_single_hash(algo: str, hash_: bytes) → Iterable[int][source]
skipped_content_add_prepare(content, *, statement) → Tuple[int, Callable[[], None]][source]

Prepares insertion of a Content to the main ‘skipped_content’ table. Returns a token (to be used in secondary tables), and a function to be called to perform the insertion in the main table.

skipped_content_get_from_pk(content_hashes: Dict[str, bytes], *, statement) → Optional[tuple][source]
skipped_content_index_add_one(algo: str, content: swh.model.model.SkippedContent, token: int) → None[source]

Adds a row mapping content[algo] to the token of the SkippedContent in the main ‘skipped_content’ table.

revision_missing(ids: List[bytes], *, statement) → List[bytes][source]
revision_add_one(revision: Dict[str, Any], *, statement) → None[source]
revision_get_ids(revision_ids, *, statement) → cassandra.cluster.ResultSet[source]
revision_get(revision_ids, *, statement) → cassandra.cluster.ResultSet[source]
revision_get_random(*, statement) → Optional[tuple][source]
revision_parent_add_one(id_: bytes, parent_rank: int, parent_id: bytes, *, statement) → None[source]
revision_parent_get(revision_id: bytes, *, statement) → cassandra.cluster.ResultSet[source]
release_missing(ids: List[bytes], *, statement) → List[bytes][source]
release_add_one(release: Dict[str, Any], *, statement) → None[source]
release_get(release_ids: List[str], *, statement) → None[source]
release_get_random(*, statement) → Optional[tuple][source]
directory_missing(ids: List[bytes], *, statement) → List[bytes][source]
directory_add_one(directory_id: bytes, *, statement) → None[source]

Called after all calls to directory_entry_add_one, to commit/finalize the directory.

directory_get_random(*, statement) → Optional[tuple][source]
directory_entry_add_one(entry: Dict[str, Any], *, statement) → None[source]
directory_entry_get(directory_ids, *, statement) → cassandra.cluster.ResultSet[source]
snapshot_missing(ids: List[bytes], *, statement) → List[bytes][source]
snapshot_add_one(snapshot_id: bytes, *, statement) → None[source]
snapshot_get(snapshot_id: bytes, *, statement) → cassandra.cluster.ResultSet[source]
snapshot_get_random(*, statement) → Optional[tuple][source]
snapshot_branch_add_one(branch: Dict[str, Any], *, statement) → None[source]
snapshot_count_branches(snapshot_id: bytes, *, statement) → cassandra.cluster.ResultSet[source]
snapshot_branch_get(snapshot_id: bytes, from_: bytes, limit: int, *, statement) → None[source]
origin_keys = ['sha1', 'url', 'type', 'next_visit_id']
origin_add_one(origin: swh.model.model.Origin, *, statement) → None[source]
origin_get_by_sha1(sha1: bytes, *, statement) → cassandra.cluster.ResultSet[source]
origin_get_by_url(url: str) → cassandra.cluster.ResultSet[source]
origin_list(start_token: int, limit: int, *, statement) → cassandra.cluster.ResultSet[source]
origin_iter_all(*, statement) → cassandra.cluster.ResultSet[source]
origin_generate_unique_visit_id(origin_url: str, *, statement) → int[source]
origin_visit_get(origin_url: str, last_visit: Optional[int], limit: Optional[int], order: str = 'asc') → cassandra.cluster.ResultSet[source]
origin_visit_add_one(visit: swh.model.model.OriginVisit, *, statement) → None[source]
origin_visit_status_add_one(visit_update: swh.model.model.OriginVisitStatus, *, statement) → None[source]
origin_visit_status_get_latest(origin: str, visit: int) → Optional[tuple][source]

Given an origin visit id, return its latest origin_visit_status

origin_visit_status_get(origin: str, visit: int, allowed_statuses: Optional[List[str]] = None, require_snapshot: bool = False, *, statement) → List[tuple][source]

Return all origin visit statuses for a given visit

origin_visit_get_one(origin_url: str, visit_id: int, *, statement) → Optional[tuple][source]
origin_visit_get_all(origin_url: str, *, statement) → cassandra.cluster.ResultSet[source]
origin_visit_iter(start_token: int) → Iterator[tuple][source]

Returns all origin visits in order from this token, and wraps around the token space.

metadata_authority_add(url, type, metadata, *, statement)[source]
metadata_authority_get(type, url, *, statement) → Optional[tuple][source]
metadata_fetcher_add(name, version, metadata, *, statement)[source]
metadata_fetcher_get(name, version, *, statement) → Optional[tuple][source]
object_metadata_add(object_type: str, id: str, authority_type, authority_url, discovery_date, fetcher_name, fetcher_version, format, metadata, context: Dict[str, Union[str, bytes, int]], *, statement)[source]
object_metadata_get_after_date(id: str, authority_type: str, authority_url: str, after: datetime.datetime, *, statement)[source]
object_metadata_get_after_date_and_fetcher(id: str, authority_type: str, authority_url: str, after_date: datetime.datetime, after_fetcher_name: str, after_fetcher_version: str, *, statement)[source]
object_metadata_get(id: str, authority_type: str, authority_url: str, *, statement) → Iterable[tuple][source]
check_read(*, statement)[source]
stat_counters(*, statement) → cassandra.cluster.ResultSet[source]