swh.storage.cassandra.storage module
- class swh.storage.cassandra.storage.CassandraStorage(hosts, keyspace, objstorage=None, port=9042, journal_writer=None, allow_overwrite=False, consistency_level='ONE', directory_entries_insert_algo='one-by-one', auth_provider: Optional[Dict] = None)
Bases: object
A backend of swh-storage backed by Cassandra.
- Parameters:
hosts – Seed Cassandra nodes, used to start connecting to the cluster
keyspace – Name of the Cassandra database to use
objstorage – Passed as argument to ObjStorage; if unset, a NoopObjStorage is used
port – Cassandra port
journal_writer – Passed as argument to JournalWriter
allow_overwrite – Controls whether *_add functions check if an object already exists in the database before sending it in an INSERT. With False (the default), the check is performed, which is more efficient when there is a moderately high probability the object is already known. With True, the check is skipped, which can be useful to overwrite existing objects (e.g. when applying a schema update), or when the database is known to be mostly empty. Note that a False value does not guarantee there won’t be any overwrite.
consistency_level – The default read/write consistency to use
directory_entries_insert_algo – Must be one of:
* one-by-one: naive, one INSERT per directory entry, serialized
* concurrent: one INSERT per directory entry, concurrent
* batch: using UNLOGGED BATCH to insert many entries in a few statements
auth_provider – An optional dict describing the authentication provider to use. Must contain at least a cls entry and the parameters to pass to the constructor. For example:
    auth_provider:
      cls: cassandra.auth.PlainTextAuthProvider
      username: myusername
      password: mypassword
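To illustrate the shape of the auth_provider dict, the following is a minimal sketch of how a dotted cls path plus constructor parameters could be turned into an object. The helper name instantiate_from_cls is hypothetical, and this is not necessarily how CassandraStorage processes the dict internally. The demonstration uses a standard-library class as a stand-in for cassandra.auth.PlainTextAuthProvider so that it runs without the Cassandra driver installed.

```python
import importlib
from typing import Any, Dict


def instantiate_from_cls(config: Dict[str, Any]) -> Any:
    """Hypothetical helper: build an object from a dict holding a dotted
    ``cls`` path and the keyword arguments for its constructor."""
    params = dict(config)  # copy, so the caller's dict is left untouched
    module_path, _, class_name = params.pop("cls").rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    return cls(**params)


# Same shape as the auth_provider example in the parameter description.
auth_provider_config = {
    "cls": "cassandra.auth.PlainTextAuthProvider",
    "username": "myusername",
    "password": "mypassword",
}

# Stand-in demonstration with a standard-library class (an assumption made
# only so the sketch runs without the Cassandra driver).
demo = instantiate_from_cls(
    {"cls": "fractions.Fraction", "numerator": 3, "denominator": 4}
)
```

With the Cassandra driver installed, passing auth_provider_config to the same helper would be expected to yield a PlainTextAuthProvider built from the given username and password.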
- content_get_partition(partition_id: int, nb_partitions: int, page_token: Optional[str] = None, limit: int = 1000) → PagedResult[Content, str]
- skipped_content_find(content: HashDict) → List[SkippedContent]
- directory_entry_get_by_path(directory: bytes, paths: List[bytes]) → Optional[Dict[str, Any]]
- directory_get_entries(directory_id: bytes, page_token: Optional[bytes] = None, limit: int = 1000) → Optional[PagedResult[DirectoryEntry, str]]
- directory_get_id_partition(partition_id: int, nb_partitions: int, page_token: Optional[str] = None, limit: int = 1000) → PagedResult[bytes, str]
- revision_get(revision_ids: List[bytes], ignore_displayname: bool = False) → List[Optional[Revision]]
- revision_get_partition(partition_id: int, nb_partitions: int, page_token: Optional[str] = None, limit: int = 1000) → PagedResult[Revision, str]
- revision_log(revisions: List[bytes], ignore_displayname: bool = False, limit: Optional[int] = None) → Iterable[Optional[Dict[str, Any]]]
- revision_shortlog(revisions: List[bytes], limit: Optional[int] = None) → Iterable[Optional[Tuple[bytes, Tuple[bytes, ...]]]]
- release_get(releases: List[bytes], ignore_displayname: bool = False) → List[Optional[Release]]
- release_get_partition(partition_id: int, nb_partitions: int, page_token: Optional[str] = None, limit: int = 1000) → PagedResult[Release, str]
- snapshot_get_id_partition(partition_id: int, nb_partitions: int, page_token: Optional[str] = None, limit: int = 1000) → PagedResult[bytes, str]
- snapshot_count_branches(snapshot_id: bytes, branch_name_exclude_prefix: Optional[bytes] = None) → Optional[Dict[Optional[str], int]]
- snapshot_get_branches(snapshot_id: bytes, branches_from: bytes = b'', branches_count: int = 1000, target_types: Optional[List[str]] = None, branch_name_include_substring: Optional[bytes] = None, branch_name_exclude_prefix: Optional[bytes] = None) → Optional[PartialBranches]
- snapshot_branch_get_by_name(snapshot_id: bytes, branch_name: bytes, follow_alias_chain: bool = True, max_alias_chain_length: int = 100) → Optional[SnapshotBranchByNameResponse]
- origin_get_one(origin_url: str) → Optional[Origin]
Given an origin url, return the origin if it exists, None otherwise
- origin_search(url_pattern: str, page_token: Optional[str] = None, limit: int = 50, regexp: bool = False, with_visit: bool = False, visit_types: Optional[List[str]] = None) → PagedResult[Origin, str]
- origin_visit_add(visits: List[OriginVisit]) → Iterable[OriginVisit]
- origin_visit_get(origin: str, page_token: Optional[str] = None, order: ListOrder = ListOrder.ASC, limit: int = 10) → PagedResult[OriginVisit, str]
- origin_visit_get_with_statuses(origin: str, allowed_statuses: Optional[List[str]] = None, require_snapshot: bool = False, page_token: Optional[str] = None, order: ListOrder = ListOrder.ASC, limit: int = 10) → PagedResult[OriginVisitWithStatuses, str]
- origin_visit_status_get(origin: str, visit: int, page_token: Optional[str] = None, order: ListOrder = ListOrder.ASC, limit: int = 10) → PagedResult[OriginVisitStatus, str]
- origin_visit_get_latest(origin: str, type: Optional[str] = None, allowed_statuses: Optional[List[str]] = None, require_snapshot: bool = False) → Optional[OriginVisit]
- origin_visit_status_get_latest(origin_url: str, visit: int, allowed_statuses: Optional[List[str]] = None, require_snapshot: bool = False) → Optional[OriginVisitStatus]
- origin_visit_status_get_random(type: str) → Optional[OriginVisitStatus]
- raw_extrinsic_metadata_get(target: ExtendedSWHID, authority: MetadataAuthority, after: Optional[datetime] = None, page_token: Optional[bytes] = None, limit: int = 1000) → PagedResult[RawExtrinsicMetadata, str]
- raw_extrinsic_metadata_get_authorities(target: ExtendedSWHID) → List[MetadataAuthority]
- metadata_authority_get(type: MetadataAuthorityType, url: str) → Optional[MetadataAuthority]
- extid_get_from_extid(id_type: str, ids: List[bytes], version: Optional[int] = None) → List[ExtID]
- extid_get_from_target(target_type: ObjectType, ids: List[bytes], extid_type: Optional[str] = None, extid_version: Optional[int] = None) → List[ExtID]
- object_find_recent_references(target_swhid: ExtendedSWHID, limit: int) → List[ExtendedSWHID]
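Several of the methods listed for this class take page_token and limit arguments and return a PagedResult, and the *_get_partition methods additionally take partition_id and nb_partitions. The sketch below shows one way a caller might walk every page of every partition. It rests on assumptions: that a PagedResult exposes results and next_page_token attributes, and that a next_page_token of None means there are no further pages. FakePagedResult, FakeStorage, item_get_partition and iter_partition are hypothetical names used only so the example runs on its own, and the modulo-based partitioning is a toy rule, not how the Cassandra backend assigns objects to partitions.

```python
from dataclasses import dataclass, field
from typing import Callable, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass
class FakePagedResult(Generic[T]):
    """Stand-in for PagedResult; the attribute names are an assumption."""

    results: List[T] = field(default_factory=list)
    next_page_token: Optional[str] = None


class FakeStorage:
    """Hypothetical in-memory stand-in with one *_get_partition-style method."""

    def __init__(self, items: List[int]) -> None:
        self._items = sorted(items)

    def item_get_partition(
        self,
        partition_id: int,
        nb_partitions: int,
        page_token: Optional[str] = None,
        limit: int = 1000,
    ) -> FakePagedResult[int]:
        # Toy rule: an item belongs to partition (item % nb_partitions).
        owned = [i for i in self._items if i % nb_partitions == partition_id]
        start = int(page_token) if page_token is not None else 0
        page = owned[start:start + limit]
        has_more = start + limit < len(owned)
        return FakePagedResult(
            results=page,
            next_page_token=str(start + limit) if has_more else None,
        )


def iter_partition(
    get_partition: Callable[..., FakePagedResult[T]],
    partition_id: int,
    nb_partitions: int,
    limit: int = 1000,
) -> Iterator[T]:
    """Yield every result of one partition, following page tokens."""
    page_token: Optional[str] = None
    while True:
        page = get_partition(
            partition_id, nb_partitions, page_token=page_token, limit=limit
        )
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:  # assumed end-of-results marker
            break


storage = FakeStorage(list(range(10)))
nb_partitions = 3
seen = [
    item
    for partition_id in range(nb_partitions)
    for item in iter_partition(
        storage.item_get_partition, partition_id, nb_partitions, limit=2
    )
]
```

Because each partition is read independently, the same loop could be handed to separate workers, one per partition_id, provided every worker uses the same nb_partitions.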