Common database utilities

The swh.core.db module offers a set of common (postgresql) database handling utilities and features for other swh packages implementing a datastore, aka a service responsible for providing a data store via a common interface which can use a postgresql database as backend. Examples are swh.storage or swh.scheduler.

Most of the time, this database-based data storage facility will depend on a data schema (may be based in swh.model or not) and provide a unified interface based on an Python class to abstract access to this datastore.

Some packages may implement only a postgresql backend, some may provide more backends.

This swh.core.db only deals with the postgresql part and provides common features and tooling to manage the database lifecycle in a consistent and unified way among all the swh packages.

It comes with a few command line tools to manage the specific swh package database.

As such, most of the database management cli commands require a configuration file holding the database connection information.

For example, for the swh.storage package, one will be able to create, initialize and upgrade the postgresql database using simple commands.

To create the database and perform superuser initialization steps (see below):

$ swh db create storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage

If the database already exists but lacks superuser level initialization steps, you may use:

$ swh db init-admin storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage

Then assuming the config.yml file existence:

storage:
  cls: postgresql
      db: host=localhost, port=5433, dbname=test-storage, username=normal-user, password=pwd
      objstorage:
        cls: memory

then you can run:

$ swh db --config-file=config.yml init storage
DONE database for storage initialized (flavor default) at version 182

Note: you can define the SWH_CONFIG_FILENAME environment variable instead of using the --config-name command line option.

or check the actual data model version of this database:

$ swh db --config-file=config.yml version storage
module: storage
flavor: default
version: 182

as well as the migration history for the database:

$ swh db --config-file=config.yml version --all storage
module: storage
flavor: default
182 [2022-02-11 15:08:31.806070+01:00] Work In Progress
181 [2022-02-11 14:06:27.435010+01:00] Work In Progress

The database migration is done using the swh db upgrade command.

Implementation of a swh.core.db datastore

To use this database management tooling, in a swh package, the following conditions are expected:

  • the package should provide an sql directory in its root namespace providing initialization sql scripts. Scripts should be named like nn-xxx.sql and are executed in order according to the nn integer value. Scripts having -superuser- in their name will be executed by the init-admin tool and are expected to require superuser access level, whereas scripts without -superuser- in their name will be executed by the swh db init command and are expected to require write access level (with no need for superuser access level).

  • the package should provide a sql/upgrade directory with SQL migration scripts in its root namespace. Script names are expected to be of the form nnn.sql where nnn is the version to which this script does the migration from a database at version nnn - 1.

  • the initialization and migration scripts should not create nor fill the metadata related tables (dbversion and dbmodule).

  • the package should provide a get_datastore function in its root namespace returning an instance of the datastore object. Normally, this datastore object uses swh.core.db.BaseDb to interact with the actual database.

  • The datastore object should provide a current_version attribute returning the database version expected by the code.

See existing swh packages like swh.storage or swh.scheduler for usage examples.

Writing tests

The swh.core.db.pytest_plugin provides a few helper tools to write unit tests for postgresql based datastores.

By default, when using these fixtures, a posgresql server will be started (by the pytest_postgresql fixture) and a template database will be created using the postgresql_proc fixture factory provided by pytest_postgresql.

Then a dedicated fixture must be declared to use the postgresql_proc fixture generated by the fixture factory function.

This template database will then be used to create a new database for test using this dedicated fixture.

In order to help the database initialization process and make it consistent with the database initialization tools from the swh db cli, an initialize_database_for_module() function is provided to be used with the fixture factory described above.

Typically, writing tests for a swh package swh.example would look like:

from functools import partial

from pytest_postgresql import factories
from swh.core.db.pytest_plugin import postgresql_fact
from swh.core.db.pytest_plugin import initialize_database_for_module

example_postgresql_proc = factories.postgresql_proc(
  dbname="example",
  load=[partial(initialize_database_for_module,
                modname="example", version=1)]
  )

postgresql_example = postgresql_fact("example_postgresql_proc")

def test_example(postgresql_example):
    with postgresql_example.cursor() as c:
        c.execute("select version from dbversion limit 1")
        assert c.fecthone()[0] == 1

Note: most of the time, you will want to put the scaffolding part of the code above in a conftest.py file.

The load argument of the factories.postgresql_proc will be used to initialize the template database that will be used to create a new database for each test, while the load argument of the postgresql_fact fixture will be executed before each test (in the database created from the template database and dedicated to the test being executed).