Common database utilities#
The swh.core.db
module offers a set of common (postgresql) database
handling utilities and features for other swh packages implementing a
datastore, aka a service responsible for providing a data store via a common
interface which can use a postgresql database as backend. Examples are
swh.storage
or swh.scheduler
.
Most of the time, this database-based data storage facility will depend on a data
schema (may be based in swh.model
or not) and provide a unified interface
based on an Python class to abstract access to this datastore.
Some packages may implement only a postgresql backend, some may provide more backends.
This swh.core.db
only deals with the postgresql part and provides common
features and tooling to manage the database lifecycle in a consistent and
unified way among all the swh
packages.
It comes with a few command line tools to manage the specific swh
package database.
As such, most of the database management cli commands require a configuration file holding the database connection information.
For example, for the swh.storage
package, one will be able to create,
initialize and upgrade the postgresql database using simple commands.
To create the database and perform superuser initialization steps (see below):
$ swh db create storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage
If the database already exists but lacks superuser level initialization steps, you may use:
$ swh db init-admin storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage
Then assuming the config.yml
file existence:
storage:
cls: postgresql
db: host=localhost, port=5433, dbname=test-storage, username=normal-user, password=pwd
objstorage:
cls: memory
then you can run:
$ swh db --config-file=config.yml init storage
DONE database for storage initialized (flavor default) at version 182
Note: you can define the SWH_CONFIG_FILENAME
environment variable instead
of using the --config-name
command line option.
or check the actual data model version of this database:
$ swh db --config-file=config.yml version storage
module: storage
flavor: default
version: 182
as well as the migration history for the database:
$ swh db --config-file=config.yml version --all storage
module: storage
flavor: default
182 [2022-02-11 15:08:31.806070+01:00] Work In Progress
181 [2022-02-11 14:06:27.435010+01:00] Work In Progress
The database migration is done using the swh db upgrade
command.
Implementation of a swh.core.db datastore#
To use this database management tooling, in a swh
package, the following
conditions are expected:
the package should provide an
sql
directory in its root namespace providing initialization sql scripts. Scripts should be named likenn-xxx.sql
and are executed in order according to thenn
integer value. Scripts having-superuser-
in their name will be executed by theinit-admin
tool and are expected to require superuser access level, whereas scripts without-superuser-
in their name will be executed by theswh db init
command and are expected to require write access level (with no need for superuser access level).the package should provide a
sql/upgrade
directory with SQL migration scripts in its root namespace. Script names are expected to be of the formnnn.sql
where nnn is the version to which this script does the migration from a database at version nnn - 1.the initialization and migration scripts should not create nor fill the metadata related tables (
dbversion
anddbmodule
).the package should provide a
get_datastore
function in its root namespace returning an instance of the datastore object. Normally, this datastore object usesswh.core.db.BaseDb
to interact with the actual database.The datastore object should provide a
current_version
attribute returning the database version expected by the code.
See existing swh
packages like swh.storage
or swh.scheduler
for
usage examples.
Writing tests#
The swh.core.db.pytest_plugin
provides a few helper tools to write unit
tests for postgresql based datastores.
By default, when using these fixtures, a postgresql server will be started (by
the pytest_postgresql fixture) and a template database will be created using
the postgresql_proc
fixture factory provided by pytest_postgresql
.
Then a dedicated fixture must be declared to use the postgresql_proc
fixture generated by the fixture factory function.
This template database will then be used to create a new database for test using this dedicated fixture.
In order to help the database initialization process and make it consistent
with the database initialization tools from the swh db
cli, an
initialize_database_for_module()
function is provided to be used with the
fixture factory described above.
Typically, writing tests for a swh
package swh.example
would look like:
from functools import partial
from pytest_postgresql import factories
from swh.core.db.pytest_plugin import postgresql_fact
from swh.core.db.pytest_plugin import initialize_database_for_module
example_postgresql_proc = factories.postgresql_proc(
dbname="example",
load=[partial(initialize_database_for_module,
modname="example", version=1)]
)
postgresql_example = postgresql_fact("example_postgresql_proc")
def test_example(postgresql_example):
with postgresql_example.cursor() as c:
c.execute("select version from dbversion limit 1")
assert c.fecthone()[0] == 1
Note: most of the time, you will want to put the scaffolding part of the code
above in a conftest.py
file.
The load
argument of the factories.postgresql_proc
will be used to
initialize the template database that will be used to create a new database for
each test, while the load
argument of the postgresql_fact
fixture will
be executed before each test (in the database created from the template
database and dedicated to the test being executed).