Software Heritage - Storage#
Abstraction layer over the archive, giving access to all stored source code artifacts as well as their metadata.
Quick start#
Dependencies#
Python tests for this module include tests that cannot be run without a local PostgreSQL database, so you need the PostgreSQL server executable on your machine (no need to have a running PostgreSQL server). They also expect the Cassandra server executable to be available.
Debian-like host#
$ sudo apt install libpq-dev postgresql-11 cassandra
Non-Debian-like host#
The tests expect the path to the cassandra executable either to be unspecified, in which case it is looked up at /usr/sbin/cassandra, or to be specified through the SWH_CASSANDRA_BIN environment variable.
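The lookup rule above can be summarized in a few lines of Python. This is a minimal sketch of the documented behavior only, not the actual swh-storage test fixture code; the function name resolve_cassandra_bin is hypothetical.

```python
import os
from typing import Mapping, Optional

# Default location used when SWH_CASSANDRA_BIN is not set (as documented above).
DEFAULT_CASSANDRA_BIN = "/usr/sbin/cassandra"


def resolve_cassandra_bin(environ: Optional[Mapping[str, str]] = None) -> str:
    """Return the cassandra executable path the tests would use.

    Hypothetical helper illustrating the documented rule; it is not
    the code used by the swh-storage test suite.
    """
    env = os.environ if environ is None else environ
    # An explicit SWH_CASSANDRA_BIN wins; otherwise fall back to the default path.
    return env.get("SWH_CASSANDRA_BIN", DEFAULT_CASSANDRA_BIN)


print(resolve_cassandra_bin({"SWH_CASSANDRA_BIN": "/opt/cassandra/bin/cassandra"}))
```

In practice this means exporting SWH_CASSANDRA_BIN in the shell before running tox whenever Cassandra is installed somewhere other than /usr/sbin.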
Optionally, you can skip the Cassandra tests:
(swh) :~/swh-storage$ tox -- -m 'not cassandra'
Installation#
It is strongly recommended to use a virtualenv. In the following, we assume you work in a virtualenv named swh. See the developer setup guide for more details on how to set up a working environment.
You can install the package directly from PyPI:
(swh) :~$ pip install swh.storage
[...]
Or from sources:
(swh) :~$ git clone https://forge.softwareheritage.org/source/swh-storage.git
[...]
(swh) :~$ cd swh-storage
(swh) :~/swh-storage$ pip install .
[...]
Then you can check that it is properly installed:
(swh) :~$ swh storage --help
Usage: swh storage [OPTIONS] COMMAND [ARGS]...
Software Heritage Storage tools.
Options:
-h, --help Show this message and exit.
Commands:
rpc-serve Software Heritage Storage RPC server.
Tests#
The best way of running Python tests for this module is to use tox.
(swh) :~$ pip install tox
tox#
From the sources directory, simply use tox:
(swh) :~/swh-storage$ tox
[...]
========= 315 passed, 6 skipped, 15 warnings in 40.86 seconds ==========
_______________________________ summary ________________________________
flake8: commands succeeded
py3: commands succeeded
congratulations :)
Note: it is possible to set the JAVA_HOME environment variable to specify the version of the JVM to be used by Cassandra. For example, at the time of writing, Cassandra is meant to be run with Java 11. On Debian bookworm, one needs to manually install openjdk-11-jre-headless from bullseye or unstable and set the appropriate environment variable:
(swh) :~/swh-storage$ export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
(swh) :~/swh-storage$ tox
[...]
Development#
The storage server can be started locally. It requires a configuration file and a running PostgreSQL database.
Sample configuration#
A typical storage.yml configuration file is:
storage:
  cls: postgresql
  db: "dbname=softwareheritage-dev user=<user> password=<pwd>"
  objstorage:
    cls: pathslicing
    root: /tmp/swh-storage/
    slicing: 0:2/2:4/4:6
This means that the configuration uses:
- a local storage instance whose database connection points to the softwareheritage-dev local instance,
- a local objstorage instance whose:
  - root path is /tmp/swh-storage,
  - slicing scheme is 0:2/2:4/4:6. This means that the content identifier (sha1) is split to build its on-disk location: the first directory level is named after the first 2 hex characters, the second level after the next 2 hex characters, and the third level after the next 2 hex characters. Finally, the file holding the raw content is named after the complete hash. For example, 00062f8bd330715c4f819373653d97b3cd34394c will be stored at 00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c (relative to the root path).

Note that the root path should exist on disk before starting the server.
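The path computation of the slicing scheme described above can be illustrated with a short Python sketch. The helper slice_path is hypothetical and only mirrors the documented 0:2/2:4/4:6 behavior; the real implementation lives in swh.objstorage and may differ in details.

```python
from pathlib import PurePosixPath


def slice_path(hex_hash: str, slicing: str = "0:2/2:4/4:6") -> str:
    """Compute the relative on-disk path of a content in a pathslicing objstorage.

    Hypothetical illustration of the scheme described above, not swh.objstorage code.
    """
    levels = []
    # Each "start:end" bound selects the slice of the hex hash used for one directory level.
    for bounds in slicing.split("/"):
        start, end = (int(x) for x in bounds.split(":"))
        levels.append(hex_hash[start:end])
    # The file holding the raw content is named after the complete hash.
    return str(PurePosixPath(*levels, hex_hash))


print(slice_path("00062f8bd330715c4f819373653d97b3cd34394c"))
# 00/06/2f/00062f8bd330715c4f819373653d97b3cd34394c
```

The absolute location is then the root path joined with this relative path, e.g. PurePosixPath("/tmp/swh-storage") / slice_path(...). The root directory itself can be created beforehand with mkdir -p /tmp/swh-storage.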
Starting the storage server#
If the Python package has been properly installed (e.g., in a virtualenv), you should be able to use the command:
(swh) :~/swh-storage$ swh storage -C storage.yml rpc-serve
This runs a local swh-storage API on port 5002.
(swh) :~/swh-storage$ curl http://127.0.0.1:5002
<html>
<head><title>Software Heritage storage server</title></head>
<body>
<p>You have reached the
<a href="https://www.softwareheritage.org/">Software Heritage</a>
storage server.<br />
See its
<a href="https://docs.softwareheritage.org/devel/swh-storage/">documentation
and API</a> for more information</p>
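The same check can be scripted from Python. This is a minimal sketch using only the standard library, assuming the server listens on the default 127.0.0.1:5002 address used above and that its index page contains the title shown in the curl output; the function names are hypothetical.

```python
import urllib.request

# Text taken from the <title> of the index page returned by the server (see above).
INDEX_MARKER = "Software Heritage storage server"


def looks_like_storage_index(html: str) -> bool:
    """Return True if an HTML page looks like the swh-storage index page."""
    return INDEX_MARKER in html


def storage_server_is_up(url: str = "http://127.0.0.1:5002", timeout: float = 5.0) -> bool:
    """Fetch the index page and check its content; return False on any network error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except OSError:
        # urllib.error.URLError (refused connection, HTTP errors) and timeouts
        # are all subclasses of OSError.
        return False
    return looks_like_storage_index(body)


print(storage_server_is_up())
```

This can be handy in scripts that start the server and need to wait until it answers before launching loaders against it.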
And then what?#
In an upper layer (loader-git, loader-svn, etc.), you can define a remote storage with this YAML configuration snippet:
storage:
  cls: remote
  url: http://localhost:5002/
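On the Python side, the storage section of such a configuration is typically expanded into keyword arguments of swh.storage.get_storage(), the same function used in the examples further down. The sketch below assumes that convention; REMOTE_CONFIG and make_storage are hypothetical names, and the dictionary simply mirrors what a YAML loader would return for the snippet above.

```python
from typing import Any, Dict

# Python equivalent of the YAML snippet above (what a YAML loader would return).
REMOTE_CONFIG: Dict[str, Any] = {
    "storage": {
        "cls": "remote",
        "url": "http://localhost:5002/",
    }
}


def make_storage(config: Dict[str, Any]):
    """Instantiate the storage described by the "storage" section of a config.

    Hypothetical helper; it assumes get_storage(cls, **kwargs) as used in this document.
    """
    # Imported lazily so this sketch can be loaded without swh.storage installed.
    from swh.storage import get_storage

    # Expands to get_storage(cls="remote", url="http://localhost:5002/").
    return get_storage(**config["storage"])
```

Calling make_storage(REMOTE_CONFIG) requires swh.storage to be installed and a server to be reachable at the configured URL for any subsequent request to succeed.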
You can also directly define a PostgreSQL storage with the following snippet:
storage:
  cls: postgresql
  db: service=swh-dev
  objstorage:
    cls: pathslicing
    root: /home/storage/swh-storage/
    slicing: 0:2/2:4/4:6
Cassandra#
As an alternative to PostgreSQL, swh-storage can use Cassandra as a database backend. It can be used like this:
storage:
  cls: cassandra
  hosts:
    - localhost
  keyspace: swh
  objstorage:
    cls: pathslicing
    root: /home/storage/swh-storage/
    slicing: 0:2/2:4/4:6
The Cassandra swh-storage implementation supports both Cassandra >= 4.0-alpha2 and ScyllaDB >= 4.4 (and possibly earlier versions, but this is untested).
While the main code supports both transparently, running tests or configuring the schema requires specific code when using ScyllaDB, which is enabled by setting the SWH_USE_SCYLLADB=1 environment variable.
The Software Heritage storage consists of a high-level storage layer (swh.storage) that exposes a client/server API (swh.storage.api). The API is exposed by a server (swh.storage.api.server) and accessible via a client (swh.storage.api.client).
The low-level implementation of the storage is split between an object storage (swh.objstorage), which stores all “blobs” (i.e., the leaves of the Data model), and a SQL representation of the rest of the graph (swh.storage.storage).
Using swh-storage#
First, note that swh-storage is an internal API of Software Heritage: it is only available to software running on the SWH infrastructure and to developers running their own Software Heritage instance.
If you want to access the Software Heritage archive without running your own instance, you should use the Web API instead.
As swh-storage has multiple backends, it is instantiated via the swh.storage.get_storage() function, which takes the backend type as argument (usually remote, if you already have access to a running swh-storage).
It returns an instance of a class implementing swh.storage.interface.StorageInterface, which is mostly a set of key-value stores, one for each object type.
Many of the arguments and return types are “model objects”, i.e., immutable objects that are instances of the classes defined in swh.model.model.
Methods returning long lists of results are paginated: they return both a list of results and an opaque token to get the next page of results. For example, to list all the visits of an origin using origin_visit_get, ten visits at a time, you can do:
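from swh.storage import get_storage

storage = get_storage("remote", url="http://localhost:5002")
page_token = None
while True:
    # Fetch up to ten visits, resuming from the token returned by the previous page.
    page = storage.origin_visit_get(
        origin="https://github.com/torvalds/linux", page_token=page_token, limit=10
    )
    for visit in page.results:
        print(visit)
    page_token = page.next_page_token
    if page_token is None:
        break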
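Or, for convenience, using swh.core.api.classes.stream_results():
from swh.core.api.classes import stream_results
from swh.storage import get_storage

storage = get_storage("remote", url="http://localhost:5002")
# stream_results() follows next_page_token internally and yields each visit.
visits = stream_results(
    storage.origin_visit_get, origin="https://github.com/torvalds/linux"
)
for visit in visits:
    print(visit)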
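To make the token-following pattern concrete without a running server, here is a self-contained sketch of the same loop. Page, iterate_pages and fake_visit_get are hypothetical stand-ins, not the swh.core PagedResult class or stream_results() itself, and the token is a stringified offset purely for illustration, whereas real page tokens are opaque.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Page(Generic[T]):
    """Stand-in for a paged result: a list of results plus a next-page token."""

    results: List[T]
    next_page_token: Optional[str]


def iterate_pages(fetch: Callable[..., Page[T]], **kwargs) -> Iterator[T]:
    """Yield every result, following next_page_token until it is None."""
    page_token: Optional[str] = None
    while True:
        page = fetch(page_token=page_token, **kwargs)
        yield from page.results
        page_token = page.next_page_token
        if page_token is None:
            break


def fake_visit_get(origin: str, page_token: Optional[str] = None, limit: int = 10) -> Page[str]:
    """Hypothetical backend serving 25 fake visits; the token is an offset."""
    visits = [f"{origin} visit {i}" for i in range(1, 26)]
    start = int(page_token) if page_token is not None else 0
    end = start + limit
    token = str(end) if end < len(visits) else None
    return Page(visits[start:end], token)


all_visits = list(iterate_pages(fake_visit_get, origin="https://example.org/repo", limit=10))
print(len(all_visits))  # 25
```

Here three requests are made (10, 10 and 5 visits) and the loop stops when the last page returns no token; the caller never inspects the token, which is what keeps it opaque.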
Database schema#
Archive copies#
Specifications#
- Extrinsic metadata specification
- Object Masking
- swh.storage package
- Subpackages
- swh.storage.algos package
- Submodules
- swh.storage.algos.diff module
- swh.storage.algos.dir_iterators module
DirectoryIterator
dir_iterator()
Remaining
DoubleDirectoryIterator
DoubleDirectoryIterator.restart()
DoubleDirectoryIterator.next_from()
DoubleDirectoryIterator.next_to()
DoubleDirectoryIterator.next_both()
DoubleDirectoryIterator.step_from()
DoubleDirectoryIterator.step_to()
DoubleDirectoryIterator.step_both()
DoubleDirectoryIterator.remaining()
DoubleDirectoryIterator.compare()
- swh.storage.algos.directory module
- swh.storage.algos.discovery module
- swh.storage.algos.origin module
- swh.storage.algos.revisions_walker module
- swh.storage.algos.snapshot module
- Module contents
- Submodules
- swh.storage.api package
- Submodules
- swh.storage.api.client module
RemoteStorage
RemoteStorage.api_exception
RemoteStorage.backend_class
RemoteStorage.reraise_exceptions
RemoteStorage.extra_type_decoders
RemoteStorage.extra_type_encoders
RemoteStorage.raise_for_status()
RemoteStorage.content_add()
RemoteStorage.reset()
RemoteStorage.stat_counters()
RemoteStorage.refresh_stat_counters()
RemoteStorage.check_config()
RemoteStorage.clear_buffers()
RemoteStorage.content_add_metadata()
RemoteStorage.content_find()
RemoteStorage.content_get()
RemoteStorage.content_get_data()
RemoteStorage.content_get_partition()
RemoteStorage.content_get_random()
RemoteStorage.content_missing()
RemoteStorage.content_missing_per_sha1()
RemoteStorage.content_missing_per_sha1_git()
RemoteStorage.content_update()
RemoteStorage.directory_add()
RemoteStorage.directory_entry_get_by_path()
RemoteStorage.directory_get_entries()
RemoteStorage.directory_get_id_partition()
RemoteStorage.directory_get_random()
RemoteStorage.directory_get_raw_manifest()
RemoteStorage.directory_ls()
RemoteStorage.directory_missing()
RemoteStorage.extid_add()
RemoteStorage.extid_get_from_extid()
RemoteStorage.extid_get_from_target()
RemoteStorage.flush()
RemoteStorage.metadata_authority_add()
RemoteStorage.metadata_authority_get()
RemoteStorage.metadata_fetcher_add()
RemoteStorage.metadata_fetcher_get()
RemoteStorage.object_find_by_sha1_git()
RemoteStorage.object_find_recent_references()
RemoteStorage.object_references_add()
RemoteStorage.origin_add()
RemoteStorage.origin_count()
RemoteStorage.origin_get()
RemoteStorage.origin_get_by_sha1()
RemoteStorage.origin_list()
RemoteStorage.origin_search()
RemoteStorage.origin_snapshot_get_all()
RemoteStorage.origin_visit_add()
RemoteStorage.origin_visit_find_by_date()
RemoteStorage.origin_visit_get()
RemoteStorage.origin_visit_get_by()
RemoteStorage.origin_visit_get_latest()
RemoteStorage.origin_visit_get_with_statuses()
RemoteStorage.origin_visit_status_add()
RemoteStorage.origin_visit_status_get()
RemoteStorage.origin_visit_status_get_latest()
RemoteStorage.origin_visit_status_get_random()
RemoteStorage.raw_extrinsic_metadata_add()
RemoteStorage.raw_extrinsic_metadata_get()
RemoteStorage.raw_extrinsic_metadata_get_authorities()
RemoteStorage.raw_extrinsic_metadata_get_by_ids()
RemoteStorage.release_add()
RemoteStorage.release_get()
RemoteStorage.release_get_partition()
RemoteStorage.release_get_random()
RemoteStorage.release_missing()
RemoteStorage.revision_add()
RemoteStorage.revision_get()
RemoteStorage.revision_get_partition()
RemoteStorage.revision_get_random()
RemoteStorage.revision_log()
RemoteStorage.revision_missing()
RemoteStorage.revision_shortlog()
RemoteStorage.skipped_content_add()
RemoteStorage.skipped_content_find()
RemoteStorage.skipped_content_missing()
RemoteStorage.snapshot_add()
RemoteStorage.snapshot_branch_get_by_name()
RemoteStorage.snapshot_count_branches()
RemoteStorage.snapshot_get()
RemoteStorage.snapshot_get_branches()
RemoteStorage.snapshot_get_id_partition()
RemoteStorage.snapshot_get_random()
RemoteStorage.snapshot_missing()
- swh.storage.api.serializers module
- swh.storage.api.server module
get_storage()
StorageServerApp
StorageServerApp.extra_type_decoders
StorageServerApp.extra_type_encoders
StorageServerApp.method_decorators
StorageServerApp.post_content_add()
StorageServerApp.post_content_add_metadata()
StorageServerApp.post_skipped_content_add()
StorageServerApp.post_directory_add()
StorageServerApp.post_revision_add()
StorageServerApp.post_release_add()
StorageServerApp.post_snapshot_add()
StorageServerApp.post_origin_visit_status_add()
StorageServerApp.post_origin_add()
StorageServerApp.post_raw_extrinsic_metadata_add()
StorageServerApp.post_metadata_fetcher_add()
StorageServerApp.post_metadata_authority_add()
StorageServerApp.post_extid_add()
StorageServerApp.post_origin_visit_add()
non_retryable_error_handler()
default_error_handler()
index()
stat_counters()
refresh_stat_counters()
load_and_check_config()
make_app_from_configfile()
- swh.storage.api.client module
- Module contents
- Submodules
- swh.storage.cassandra package
- Submodules
- swh.storage.cassandra.common module
- swh.storage.cassandra.converters module
- swh.storage.cassandra.cql module
PARTITION_KEY_RESTRICTION_MAX_SIZE
get_execution_profiles()
create_keyspace()
CqlRunner
CqlRunner.MAX_RETRIES
CqlRunner.content_add_prepare()
CqlRunner.content_get_from_pk()
CqlRunner.content_missing_from_all_hashes()
CqlRunner.content_get_from_tokens()
CqlRunner.content_get_random()
CqlRunner.content_get_token_range()
CqlRunner.content_delete()
CqlRunner.content_index_add_one()
CqlRunner.content_get_tokens_from_single_algo()
CqlRunner.skipped_content_add_prepare()
CqlRunner.skipped_content_get_from_pk()
CqlRunner.skipped_content_get_from_token()
CqlRunner.skipped_content_delete()
CqlRunner.skipped_content_index_add_one()
CqlRunner.skipped_content_get_tokens_from_single_hash()
CqlRunner.directory_missing()
CqlRunner.directory_add_one()
CqlRunner.directory_get_random()
CqlRunner.directory_get()
CqlRunner.directory_get_token_range()
CqlRunner.directory_delete()
CqlRunner.directory_entry_add_one()
CqlRunner.directory_entry_add_concurrent()
CqlRunner.directory_entry_add_batch()
CqlRunner.directory_entry_get()
CqlRunner.directory_entry_get_from_name()
CqlRunner.directory_entry_delete()
CqlRunner.revision_missing()
CqlRunner.revision_add_one()
CqlRunner.revision_get_ids()
CqlRunner.revision_get()
CqlRunner.revision_get_random()
CqlRunner.revision_get_token_range()
CqlRunner.revision_delete()
CqlRunner.revision_parent_add_one()
CqlRunner.revision_parent_get()
CqlRunner.revision_parent_delete()
CqlRunner.release_missing()
CqlRunner.release_add_one()
CqlRunner.release_get()
CqlRunner.release_get_random()
CqlRunner.release_get_token_range()
CqlRunner.release_delete()
CqlRunner.snapshot_missing()
CqlRunner.snapshot_add_one()
CqlRunner.snapshot_get_random()
CqlRunner.snapshot_get_token_range()
CqlRunner.snapshot_delete()
CqlRunner.snapshot_branch_add_one()
CqlRunner.snapshot_count_branches_from_name()
CqlRunner.snapshot_count_branches_before_name()
CqlRunner.snapshot_count_branches()
CqlRunner.snapshot_branch_get_from_name()
CqlRunner.snapshot_branch_get_range()
CqlRunner.snapshot_branch_get()
CqlRunner.snapshot_branch_delete()
CqlRunner.origin_add_one()
CqlRunner.origin_get_by_sha1()
CqlRunner.origin_get_by_url()
CqlRunner.origin_list()
CqlRunner.origin_iter_all()
CqlRunner.origin_bump_next_visit_id()
CqlRunner.origin_generate_unique_visit_id()
CqlRunner.origin_delete()
CqlRunner.origin_visit_get()
CqlRunner.origin_visit_add_one()
CqlRunner.origin_visit_get_one()
CqlRunner.origin_visit_iter_all()
CqlRunner.origin_visit_iter()
CqlRunner.origin_visit_delete()
CqlRunner.origin_visit_status_get_range()
CqlRunner.origin_visit_status_get_all_range()
CqlRunner.origin_visit_status_add_one()
CqlRunner.origin_visit_status_get_latest()
CqlRunner.origin_visit_status_get()
CqlRunner.origin_snapshot_get_all()
CqlRunner.origin_visit_status_delete()
CqlRunner.raw_extrinsic_metadata_by_id_add()
CqlRunner.raw_extrinsic_metadata_get_by_ids()
CqlRunner.raw_extrinsic_metadata_by_id_delete()
CqlRunner.raw_extrinsic_metadata_add()
CqlRunner.raw_extrinsic_metadata_get_after_date()
CqlRunner.raw_extrinsic_metadata_get_after_date_and_id()
CqlRunner.raw_extrinsic_metadata_get()
CqlRunner.raw_extrinsic_metadata_get_authorities()
CqlRunner.raw_extrinsic_metadata_delete()
CqlRunner.metadata_authority_add()
CqlRunner.metadata_authority_get()
CqlRunner.metadata_fetcher_add()
CqlRunner.metadata_fetcher_get()
CqlRunner.extid_add_prepare()
CqlRunner.extid_get_from_pk()
CqlRunner.extid_get_from_token()
CqlRunner.extid_get_from_token_and_extid_version()
CqlRunner.extid_get_from_extid()
CqlRunner.extid_get_from_extid_and_version()
CqlRunner.extid_get_from_target()
CqlRunner.extid_delete()
CqlRunner.extid_index_add_one()
CqlRunner.extid_delete_from_by_target_table()
CqlRunner.object_reference_add_concurrent()
CqlRunner.object_reference_get()
CqlRunner.stat_counters()
CqlRunner.check_read()
- swh.storage.cassandra.model module
MAGIC_NULL_PK
content_index_table_name()
BaseRow
ContentRow
SkippedContentRow
SkippedContentRow.TABLE
SkippedContentRow.PARTITION_KEY
SkippedContentRow.sha1
SkippedContentRow.sha1_git
SkippedContentRow.sha256
SkippedContentRow.blake2s256
SkippedContentRow.length
SkippedContentRow.ctime
SkippedContentRow.status
SkippedContentRow.reason
SkippedContentRow.origin
SkippedContentRow.from_dict()
DirectoryRow
DirectoryEntryRow
RevisionRow
RevisionParentRow
ReleaseRow
SnapshotRow
SnapshotBranchRow
OriginVisitRow
OriginVisitStatusRow
OriginVisitStatusRow.TABLE
OriginVisitStatusRow.PARTITION_KEY
OriginVisitStatusRow.CLUSTERING_KEY
OriginVisitStatusRow.origin
OriginVisitStatusRow.visit
OriginVisitStatusRow.date
OriginVisitStatusRow.type
OriginVisitStatusRow.status
OriginVisitStatusRow.metadata
OriginVisitStatusRow.snapshot
OriginVisitStatusRow.from_dict()
OriginRow
MetadataAuthorityRow
MetadataFetcherRow
RawExtrinsicMetadataRow
RawExtrinsicMetadataRow.TABLE
RawExtrinsicMetadataRow.PARTITION_KEY
RawExtrinsicMetadataRow.CLUSTERING_KEY
RawExtrinsicMetadataRow.id
RawExtrinsicMetadataRow.type
RawExtrinsicMetadataRow.target
RawExtrinsicMetadataRow.authority_type
RawExtrinsicMetadataRow.authority_url
RawExtrinsicMetadataRow.discovery_date
RawExtrinsicMetadataRow.fetcher_name
RawExtrinsicMetadataRow.fetcher_version
RawExtrinsicMetadataRow.format
RawExtrinsicMetadataRow.metadata
RawExtrinsicMetadataRow.origin
RawExtrinsicMetadataRow.visit
RawExtrinsicMetadataRow.snapshot
RawExtrinsicMetadataRow.release
RawExtrinsicMetadataRow.revision
RawExtrinsicMetadataRow.path
RawExtrinsicMetadataRow.directory
RawExtrinsicMetadataByIdRow
ObjectCountRow
ExtIDRow
ExtIDByTargetRow
ObjectReferenceRow
- swh.storage.cassandra.schema module
- swh.storage.cassandra.storage module
CassandraStorage
CassandraStorage.hosts
CassandraStorage.keyspace
CassandraStorage.port
CassandraStorage.check_config()
CassandraStorage.content_add()
CassandraStorage.content_update()
CassandraStorage.content_add_metadata()
CassandraStorage.content_get_data()
CassandraStorage.content_get_partition()
CassandraStorage.content_get()
CassandraStorage.content_find()
CassandraStorage.content_missing()
CassandraStorage.content_missing_per_sha1()
CassandraStorage.content_missing_per_sha1_git()
CassandraStorage.content_get_random()
CassandraStorage.skipped_content_add()
CassandraStorage.skipped_content_find()
CassandraStorage.skipped_content_missing()
CassandraStorage.directory_add()
CassandraStorage.directory_missing()
CassandraStorage.directory_entry_get_by_path()
CassandraStorage.directory_ls()
CassandraStorage.directory_get_entries()
CassandraStorage.directory_get_raw_manifest()
CassandraStorage.directory_get_random()
CassandraStorage.directory_get_id_partition()
CassandraStorage.revision_add()
CassandraStorage.revision_missing()
CassandraStorage.revision_get()
CassandraStorage.revision_get_partition()
CassandraStorage.revision_log()
CassandraStorage.revision_shortlog()
CassandraStorage.revision_get_random()
CassandraStorage.release_add()
CassandraStorage.release_missing()
CassandraStorage.release_get()
CassandraStorage.release_get_partition()
CassandraStorage.release_get_random()
CassandraStorage.snapshot_add()
CassandraStorage.snapshot_missing()
CassandraStorage.snapshot_get()
CassandraStorage.snapshot_get_id_partition()
CassandraStorage.snapshot_count_branches()
CassandraStorage.snapshot_get_branches()
CassandraStorage.snapshot_get_random()
CassandraStorage.snapshot_branch_get_by_name()
CassandraStorage.origin_get()
CassandraStorage.origin_get_one()
CassandraStorage.origin_get_by_sha1()
CassandraStorage.origin_list()
CassandraStorage.origin_search()
CassandraStorage.origin_count()
CassandraStorage.origin_snapshot_get_all()
CassandraStorage.origin_add()
CassandraStorage.origin_visit_add()
CassandraStorage.origin_visit_status_add()
CassandraStorage.origin_visit_get()
CassandraStorage.origin_visit_get_with_statuses()
CassandraStorage.origin_visit_status_get()
CassandraStorage.origin_visit_find_by_date()
CassandraStorage.origin_visit_get_by()
CassandraStorage.origin_visit_get_latest()
CassandraStorage.origin_visit_status_get_latest()
CassandraStorage.origin_visit_status_get_random()
CassandraStorage.object_find_by_sha1_git()
CassandraStorage.stat_counters()
CassandraStorage.refresh_stat_counters()
CassandraStorage.raw_extrinsic_metadata_add()
CassandraStorage.raw_extrinsic_metadata_get()
CassandraStorage.raw_extrinsic_metadata_get_by_ids()
CassandraStorage.raw_extrinsic_metadata_get_authorities()
CassandraStorage.metadata_fetcher_add()
CassandraStorage.metadata_fetcher_get()
CassandraStorage.metadata_authority_add()
CassandraStorage.metadata_authority_get()
CassandraStorage.extid_add()
CassandraStorage.extid_get_from_extid()
CassandraStorage.extid_get_from_target()
CassandraStorage.object_find_recent_references()
CassandraStorage.object_references_add()
CassandraStorage.object_delete()
CassandraStorage.extid_delete_for_target()
CassandraStorage.clear_buffers()
CassandraStorage.flush()
- Module contents
create_keyspace()
CassandraStorage
- Submodules
- swh.storage.postgresql package
- Submodules
- swh.storage.postgresql.converters module
- swh.storage.postgresql.db module
jsonize()
QueryBuilder
ObjectReferencesPartition
Db
Db.mktemp_dir_entry()
Db.mktemp_revision()
Db.mktemp_release()
Db.mktemp_snapshot_branch()
Db.content_add_from_temp()
Db.directory_add_from_temp()
Db.skipped_content_add_from_temp()
Db.revision_add_from_temp()
Db.extid_add_from_temp()
Db.release_add_from_temp()
Db.content_update_from_temp()
Db.content_get_metadata_keys
Db.content_add_keys
Db.skipped_content_keys
Db.content_get_metadata_from_hashes()
Db.content_get_range()
Db.content_hash_keys
Db.content_missing_from_list()
Db.content_missing_per_sha1()
Db.content_missing_per_sha1_git()
Db.content_find_cols
Db.content_find()
Db.content_get_random()
Db.skipped_content_missing()
Db.skipped_content_find_cols
Db.skipped_content_find()
Db.directory_missing_from_list()
Db.directory_ls_cols
Db.directory_walk_one()
Db.directory_walk()
Db.directory_entry_get_by_path()
Db.directory_get_entries_cols
Db.directory_get_entries()
Db.directory_get_raw_manifest()
Db.directory_get_id_range()
Db.directory_get_random()
Db.revision_missing_from_list()
Db.revision_add_cols
Db.revision_get_cols
Db.mangle_query_key()
Db.revision_get_from_list()
Db.revision_get_range()
Db.revision_log()
Db.revision_shortlog_cols
Db.revision_shortlog()
Db.revision_get_random()
Db.extid_cols
Db.extid_get_from_extid_list()
Db.extid_get_from_swhid_list()
Db.extid_delete_for_target()
Db.release_missing_from_list()
Db.release_add_cols
Db.release_get_cols
Db.release_get_from_list()
Db.release_get_range()
Db.release_get_random()
Db.snapshot_exists()
Db.snapshot_missing_from_list()
Db.snapshot_add()
Db.snapshot_count_cols
Db.snapshot_count_branches()
Db.snapshot_get_cols
Db.snapshot_get_by_id()
Db.snapshot_branch_get_by_name()
Db.snapshot_get_id_range()
Db.snapshot_get_random()
Db.origin_visit_add()
Db.origin_visit_status_cols
Db.origin_visit_status_add()
Db.origin_visit_cols
Db.origin_visit_add_with_id()
Db.origin_visit_get_cols
Db.origin_visit_select_cols
Db.origin_visit_status_select_cols
Db.origin_visit_status_get_latest()
Db.origin_visit_status_get_range()
Db.origin_visit_get_range()
Db.origin_visit_status_get_all_in_range()
Db.origin_visit_get()
Db.origin_visit_find_by_date()
Db.origin_visit_exists()
Db.origin_visit_get_latest()
Db.origin_visit_get_random()
Db.origin_add()
Db.origin_cols
Db.origin_get_by_url()
Db.origin_get_by_sha1()
Db.origin_id_get_by_url()
Db.origin_get_range_cols
Db.origin_get_range()
Db.origin_search()
Db.origin_count()
Db.origin_snapshot_get_all()
Db.object_find_by_sha1_git_cols
Db.object_find_by_sha1_git()
Db.stat_counters()
Db.raw_extrinsic_metadata_get_cols
Db.raw_extrinsic_metadata_add()
Db.raw_extrinsic_metadata_get()
Db.raw_extrinsic_metadata_get_by_ids()
Db.raw_extrinsic_metadata_get_authorities()
Db.metadata_fetcher_cols
Db.metadata_fetcher_add()
Db.metadata_fetcher_get()
Db.metadata_fetcher_get_id()
Db.metadata_authority_cols
Db.metadata_authority_add()
Db.metadata_authority_get()
Db.metadata_authority_get_id()
Db.object_references_get()
Db.object_references_add()
Db.object_references_create_partition()
Db.object_references_drop_partition()
Db.object_references_list_partitions()
Db.object_delete()
- swh.storage.postgresql.storage module
EMPTY_SNAPSHOT_ID
VALIDATION_EXCEPTIONS
convert_validation_exceptions()
db_transaction_generator()
db_transaction()
Storage
Storage.current_version
Storage.get_db()
Storage.put_db()
Storage.db()
Storage.get_flavor()
Storage.flavor
Storage.check_config()
Storage.content_add()
Storage.content_update()
Storage.content_add_metadata()
Storage.content_get_data()
Storage.content_get_partition()
Storage.content_get()
Storage.content_missing()
Storage.content_missing_per_sha1()
Storage.content_missing_per_sha1_git()
Storage.content_find()
Storage.content_get_random()
Storage.skipped_content_add()
Storage.skipped_content_find()
Storage.skipped_content_missing()
Storage.directory_add()
Storage.directory_missing()
Storage.directory_ls()
Storage.directory_entry_get_by_path()
Storage.directory_get_random()
Storage.directory_get_entries()
Storage.directory_get_raw_manifest()
Storage.directory_get_id_partition()
Storage.revision_add()
Storage.revision_missing()
Storage.revision_get_partition()
Storage.revision_get()
Storage.revision_log()
Storage.revision_shortlog()
Storage.revision_get_random()
Storage.extid_get_from_extid()
Storage.extid_get_from_target()
Storage.extid_add()
Storage.release_add()
Storage.release_missing()
Storage.release_get()
Storage.release_get_partition()
Storage.release_get_random()
Storage.snapshot_add()
Storage.snapshot_missing()
Storage.snapshot_get()
Storage.snapshot_get_id_partition()
Storage.snapshot_count_branches()
Storage.snapshot_get_branches()
Storage.snapshot_get_random()
Storage.snapshot_branch_get_by_name()
Storage.origin_visit_add()
Storage.origin_visit_status_add()
Storage.origin_visit_status_get_latest()
Storage.origin_visit_get()
Storage.origin_visit_get_with_statuses()
Storage.origin_visit_find_by_date()
Storage.origin_visit_get_by()
Storage.origin_visit_get_latest()
Storage.origin_visit_status_get()
Storage.origin_visit_status_get_random()
Storage.origin_get()
Storage.origin_get_by_sha1()
Storage.origin_get_range()
Storage.origin_list()
Storage.origin_search()
Storage.origin_count()
Storage.origin_snapshot_get_all()
Storage.origin_add()
Storage.object_find_by_sha1_git()
Storage.stat_counters()
Storage.refresh_stat_counters()
Storage.raw_extrinsic_metadata_add()
Storage.raw_extrinsic_metadata_get()
Storage.raw_extrinsic_metadata_get_by_ids()
Storage.metadata_fetcher_add()
Storage.metadata_fetcher_get()
Storage.raw_extrinsic_metadata_get_authorities()
Storage.metadata_authority_add()
Storage.metadata_authority_get()
Storage.object_find_recent_references()
Storage.object_references_add()
Storage.clear_buffers()
Storage.flush()
Storage.object_delete()
Storage.extid_delete_for_target()
- Module contents
- Submodules
- swh.storage.proxies package
- Subpackages
- swh.storage.proxies.blocking package
- Submodules
- swh.storage.proxies.blocking.cli module
- swh.storage.proxies.blocking.db module
BlockingState
BlockingStatus
BlockingRequest
RequestHistory
BlockingLogEntry
BlockedOrigin
BlockingDb
get_urls_to_check()
BlockingAdmin
BlockingAdmin.create_request()
BlockingAdmin.find_request()
BlockingAdmin.find_request_by_id()
BlockingAdmin.get_requests()
BlockingAdmin.set_origins_state()
BlockingAdmin.get_states_for_request()
BlockingAdmin.find_blocking_states()
BlockingAdmin.delete_blocking_states()
BlockingAdmin.record_history()
BlockingAdmin.get_history()
BlockingAdmin.get_log()
BlockingQuery
- Module contents
- Submodules
- swh.storage.proxies.masking package
- Submodules
- swh.storage.proxies.masking.cli module
- swh.storage.proxies.masking.db module
DuplicateRequest
RequestNotFound
MaskedState
MaskedStatus
MaskingRequest
MaskingRequestHistory
MaskedObject
DisplayName
MaskingDb
MaskingAdmin
MaskingAdmin.create_request()
MaskingAdmin.find_request()
MaskingAdmin.find_request_by_id()
MaskingAdmin.get_requests()
MaskingAdmin.set_object_state()
MaskingAdmin.get_states_for_request()
MaskingAdmin.find_masks()
MaskingAdmin.delete_masks()
MaskingAdmin.record_history()
MaskingAdmin.get_history()
MaskingAdmin.set_display_name()
MaskingAdmin.set_display_names()
MaskingQuery
- Module contents
get_datastore()
masking_overhead_timer()
MaskingProxyStorage
MaskingProxyStorage.content_get_data()
MaskingProxyStorage.RANDOM_ATTEMPTS
MaskingProxyStorage.TRevision
MaskingProxyStorage.TRelease
MaskingProxyStorage.revision_get()
MaskingProxyStorage.revision_log()
MaskingProxyStorage.revision_get_partition()
MaskingProxyStorage.release_get()
MaskingProxyStorage.release_get_partition()
- Submodules
- swh.storage.proxies.buffer module
- swh.storage.proxies.counter module
- swh.storage.proxies.filter module
- swh.storage.proxies.record_references module
- swh.storage.proxies.retry module
- swh.storage.proxies.tenacious module
- swh.storage.proxies.validate module
- Module contents
- Subpackages
- swh.storage.algos package
- Submodules
- swh.storage.backfill module
- swh.storage.cli module
- swh.storage.common module
- swh.storage.exc module
- swh.storage.fixer module
- swh.storage.in_memory module
Table
InMemoryCqlRunner
InMemoryCqlRunner.increment_counter()
InMemoryCqlRunner.stat_counters()
InMemoryCqlRunner.content_add_prepare()
InMemoryCqlRunner.content_get_from_pk()
InMemoryCqlRunner.content_get_from_tokens()
InMemoryCqlRunner.content_get_random()
InMemoryCqlRunner.content_get_token_range()
InMemoryCqlRunner.content_missing_from_all_hashes()
InMemoryCqlRunner.content_missing_by_sha1_git()
InMemoryCqlRunner.content_index_add_one()
InMemoryCqlRunner.content_get_tokens_from_single_algo()
InMemoryCqlRunner.skipped_content_add_prepare()
InMemoryCqlRunner.skipped_content_get_from_pk()
InMemoryCqlRunner.skipped_content_get_from_token()
InMemoryCqlRunner.skipped_content_index_add_one()
InMemoryCqlRunner.skipped_content_get_tokens_from_single_hash()
InMemoryCqlRunner.directory_missing()
InMemoryCqlRunner.directory_add_one()
InMemoryCqlRunner.directory_get_random()
InMemoryCqlRunner.directory_get()
InMemoryCqlRunner.directory_get_token_range()
InMemoryCqlRunner.directory_entry_add_one()
InMemoryCqlRunner.directory_entry_get()
InMemoryCqlRunner.directory_entry_get_from_name()
InMemoryCqlRunner.revision_missing()
InMemoryCqlRunner.revision_add_one()
InMemoryCqlRunner.revision_get_ids()
InMemoryCqlRunner.revision_get()
InMemoryCqlRunner.revision_get_token_range()
InMemoryCqlRunner.revision_get_random()
InMemoryCqlRunner.revision_parent_add_one()
InMemoryCqlRunner.revision_parent_get()
InMemoryCqlRunner.release_missing()
InMemoryCqlRunner.release_add_one()
InMemoryCqlRunner.release_get()
InMemoryCqlRunner.release_get_token_range()
InMemoryCqlRunner.release_get_random()
InMemoryCqlRunner.snapshot_missing()
InMemoryCqlRunner.snapshot_add_one()
InMemoryCqlRunner.snapshot_get_token_range()
InMemoryCqlRunner.snapshot_get_random()
InMemoryCqlRunner.snapshot_branch_get_from_name()
InMemoryCqlRunner.snapshot_branch_add_one()
InMemoryCqlRunner.snapshot_count_branches()
InMemoryCqlRunner.snapshot_branch_get()
InMemoryCqlRunner.origin_add_one()
InMemoryCqlRunner.origin_get_by_sha1()
InMemoryCqlRunner.origin_get_by_url()
InMemoryCqlRunner.origin_list()
InMemoryCqlRunner.origin_iter_all()
InMemoryCqlRunner.origin_bump_next_visit_id()
InMemoryCqlRunner.origin_generate_unique_visit_id()
InMemoryCqlRunner.origin_visit_get()
InMemoryCqlRunner.origin_visit_add_one()
InMemoryCqlRunner.origin_visit_get_one()
InMemoryCqlRunner.origin_visit_iter_all()
InMemoryCqlRunner.origin_visit_iter()
InMemoryCqlRunner.origin_visit_status_get_range()
InMemoryCqlRunner.origin_visit_status_get_all_range()
InMemoryCqlRunner.origin_visit_status_add_one()
InMemoryCqlRunner.origin_visit_status_get_latest()
InMemoryCqlRunner.origin_visit_status_get()
InMemoryCqlRunner.origin_snapshot_get_all()
InMemoryCqlRunner.metadata_authority_add()
InMemoryCqlRunner.metadata_authority_get()
InMemoryCqlRunner.metadata_fetcher_add()
InMemoryCqlRunner.metadata_fetcher_get()
InMemoryCqlRunner.raw_extrinsic_metadata_by_id_add()
InMemoryCqlRunner.raw_extrinsic_metadata_get_by_ids()
InMemoryCqlRunner.raw_extrinsic_metadata_add()
InMemoryCqlRunner.raw_extrinsic_metadata_get_after_date()
InMemoryCqlRunner.raw_extrinsic_metadata_get_after_date_and_id()
InMemoryCqlRunner.raw_extrinsic_metadata_get()
InMemoryCqlRunner.raw_extrinsic_metadata_get_authorities()
InMemoryCqlRunner.extid_add_prepare()
InMemoryCqlRunner.extid_index_add_one()
InMemoryCqlRunner.extid_get_from_pk()
InMemoryCqlRunner.extid_get_from_extid()
InMemoryCqlRunner.extid_get_from_extid_and_version()
InMemoryCqlRunner.extid_get_from_target()
InMemoryCqlRunner.object_reference_add_concurrent()
InMemoryCqlRunner.object_reference_get()
InMemoryStorage
- swh.storage.interface module
ListOrder
PartialBranches
SnapshotBranchByNameResponse
HashDict
TotalHashDict
OriginVisitWithStatuses
ObjectReference
deprecated()
StorageInterface
StorageInterface.check_config()
StorageInterface.content_add()
StorageInterface.content_update()
StorageInterface.content_add_metadata()
StorageInterface.content_get_data()
StorageInterface.content_get_partition()
StorageInterface.content_get()
StorageInterface.content_missing()
StorageInterface.content_missing_per_sha1()
StorageInterface.content_missing_per_sha1_git()
StorageInterface.content_find()
StorageInterface.content_get_random()
StorageInterface.skipped_content_add()
StorageInterface.skipped_content_find()
StorageInterface.skipped_content_missing()
StorageInterface.directory_add()
StorageInterface.directory_missing()
StorageInterface.directory_ls()
StorageInterface.directory_entry_get_by_path()
StorageInterface.directory_get_entries()
StorageInterface.directory_get_raw_manifest()
StorageInterface.directory_get_random()
StorageInterface.directory_get_id_partition()
StorageInterface.revision_add()
StorageInterface.revision_missing()
StorageInterface.revision_get_partition()
StorageInterface.revision_get()
StorageInterface.revision_log()
StorageInterface.revision_shortlog()
StorageInterface.revision_get_random()
StorageInterface.extid_get_from_extid()
StorageInterface.extid_get_from_target()
StorageInterface.extid_add()
StorageInterface.release_add()
StorageInterface.release_missing()
StorageInterface.release_get()
StorageInterface.release_get_random()
StorageInterface.release_get_partition()
StorageInterface.snapshot_add()
StorageInterface.snapshot_missing()
StorageInterface.snapshot_get()
StorageInterface.snapshot_count_branches()
StorageInterface.snapshot_get_branches()
StorageInterface.snapshot_get_random()
StorageInterface.snapshot_branch_get_by_name()
StorageInterface.snapshot_get_id_partition()
StorageInterface.origin_visit_add()
StorageInterface.origin_visit_status_add()
StorageInterface.origin_visit_get()
StorageInterface.origin_visit_find_by_date()
StorageInterface.origin_visit_get_by()
StorageInterface.origin_visit_get_latest()
StorageInterface.origin_visit_status_get()
StorageInterface.origin_visit_status_get_latest()
StorageInterface.origin_visit_get_with_statuses()
StorageInterface.origin_visit_status_get_random()
StorageInterface.origin_get()
StorageInterface.origin_get_by_sha1()
StorageInterface.origin_list()
StorageInterface.origin_search()
StorageInterface.origin_count()
StorageInterface.origin_snapshot_get_all()
StorageInterface.origin_add()
StorageInterface.object_find_recent_references()
StorageInterface.object_references_add()
StorageInterface.object_find_by_sha1_git()
StorageInterface.stat_counters()
StorageInterface.refresh_stat_counters()
StorageInterface.raw_extrinsic_metadata_add()
StorageInterface.raw_extrinsic_metadata_get()
StorageInterface.raw_extrinsic_metadata_get_by_ids()
StorageInterface.raw_extrinsic_metadata_get_authorities()
StorageInterface.metadata_fetcher_add()
StorageInterface.metadata_fetcher_get()
StorageInterface.metadata_authority_add()
StorageInterface.metadata_authority_get()
StorageInterface.clear_buffers()
StorageInterface.flush()
ObjectDeletionInterface
- swh.storage.metrics module
- swh.storage.migrate_extrinsic_metadata module
pypi_project_from_filename()
pypi_origin_from_project_name()
pypi_origin_from_filename()
cran_package_from_url()
npm_package_from_source_url()
remove_atom_codemeta_metadata_with_xmlns()
remove_atom_codemeta_metadata_without_xmlns()
debian_origins_from_row()
assert_origin_exists()
check_origin_exists()
load_metadata()
handle_deposit_row()
handle_row()
create_fetchers()
iter_revision_rows()
main()
- swh.storage.objstorage module
- swh.storage.pytest_plugin module
- swh.storage.replay module
- swh.storage.utils module
- swh.storage.writer module
model_object_dict_sanitizer()
JournalWriter
JournalWriter.write_addition()
JournalWriter.write_additions()
JournalWriter.content_add()
JournalWriter.content_update()
JournalWriter.content_add_metadata()
JournalWriter.skipped_content_add()
JournalWriter.directory_add()
JournalWriter.revision_add()
JournalWriter.release_add()
JournalWriter.snapshot_add()
JournalWriter.origin_visit_add()
JournalWriter.origin_visit_status_add()
JournalWriter.origin_add()
JournalWriter.raw_extrinsic_metadata_add()
JournalWriter.metadata_fetcher_add()
JournalWriter.metadata_authority_add()
JournalWriter.extid_add()
- Module contents
- Subpackages