.. _swh-core-db: Common database utilities ========================= The ``swh.core.db`` module offers a set of common (postgresql) database handling utilities and features for other swh packages implementing a `datastore`, aka a service responsible for providing a data store via a common interface which can use a postgresql database as backend. Examples are :mod:`swh.storage` or :mod:`swh.scheduler`. Most of the time, this database-based data storage facility will depend on a data schema (may be based in :mod:`swh.model` or not) and provide a unified interface based on an Python class to abstract access to this datastore. Some packages may implement only a postgresql backend, some may provide more backends. This :mod:`swh.core.db` only deals with the postgresql part and provides common features and tooling to manage the database lifecycle in a consistent and unified way among all the :mod:`swh` packages. It comes with a few command line tools to manage the specific :mod:`swh` package database. As such, most of the database management cli commands require a configuration file holding the database connection information. For example, for the :mod:`swh.storage` package, one will be able to create, initialize and upgrade the postgresql database using simple commands. To create the database and perform superuser initialization steps (see below): .. code-block:: bash $ swh db create storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage If the database already exists but lacks superuser level initialization steps, you may use: .. code-block:: bash $ swh db init-admin storage --dbname=postgresql://superuser:passwd@localhost:5433/test-storage Then assuming the ``config.yml`` file existence: .. code-block:: yaml storage: cls: postgresql db: host=localhost, port=5433, dbname=test-storage, username=normal-user, password=pwd objstorage: cls: memory then you can run: .. code-block:: bash $ swh db --config-file=config.yml init storage DONE database for storage initialized (flavor default) at version 182 Note: you can define the ``SWH_CONFIG_FILENAME`` environment variable instead of using the ``--config-name`` command line option. or check the actual data model version of this database: .. code-block:: bash $ swh db --config-file=config.yml version storage module: storage flavor: default version: 182 as well as the migration history for the database: .. code-block:: bash $ swh db --config-file=config.yml version --all storage module: storage flavor: default 182 [2022-02-11 15:08:31.806070+01:00] Work In Progress 181 [2022-02-11 14:06:27.435010+01:00] Work In Progress The database migration is done using the ``swh db upgrade`` command. Implementation of a swh.core.db datastore ----------------------------------------- To use this database management tooling, in a :mod:`swh` package, the following conditions are expected: - the package should provide an ``sql`` directory in its root namespace providing initialization sql scripts. Scripts should be named like ``nn-xxx.sql`` and are executed in order according to the ``nn`` integer value. Scripts having ``-superuser-`` in their name will be executed by the ``init-admin`` tool and are expected to require superuser access level, whereas scripts without ``-superuser-`` in their name will be executed by the ``swh db init`` command and are expected to require write access level (with no need for superuser access level). - the package should provide a ``sql/upgrade`` directory with SQL migration scripts in its root namespace. Script names are expected to be of the form ``nnn.sql`` where `nnn` is the version to which this script does the migration from a database at version `nnn - 1`. - the initialization and migration scripts should not create nor fill the metadata related tables (``dbversion`` and ``dbmodule``). - the package should provide a ``get_datastore`` function in its root namespace returning an instance of the datastore object. Normally, this datastore object uses ``swh.core.db.BaseDb`` to interact with the actual database. - The datastore object should provide a ``current_version`` attribute returning the database version expected by the code. See existing ``swh`` packages like ``swh.storage`` or ``swh.scheduler`` for usage examples. Writing tests ------------- The ``swh.core.db.pytest_plugin`` provides a few helper tools to write unit tests for postgresql based datastores. By default, when using these fixtures, a posgresql server will be started (by the pytest_postgresql fixture) and a template database will be created using the ``postgresql_proc`` fixture factory provided by ``pytest_postgresql``. Then a dedicated fixture must be declared to use the ``postgresql_proc`` fixture generated by the fixture factory function. This template database will then be used to create a new database for test using this dedicated fixture. In order to help the database initialization process and make it consistent with the database initialization tools from the ``swh db`` cli, an ``initialize_database_for_module()`` function is provided to be used with the fixture factory described above. Typically, writing tests for a ``swh`` package ``swh.example`` would look like: .. code-block:: python from functools import partial from pytest_postgresql import factories from swh.core.db.pytest_plugin import postgresql_fact from swh.core.db.pytest_plugin import initialize_database_for_module example_postgresql_proc = factories.postgresql_proc( dbname="example", load=[partial(initialize_database_for_module, modname="example", version=1)] ) postgresql_example = postgresql_fact("example_postgresql_proc") def test_example(postgresql_example): with postgresql_example.cursor() as c: c.execute("select version from dbversion limit 1") assert c.fecthone()[0] == 1 Note: most of the time, you will want to put the scaffolding part of the code above in a ``conftest.py`` file. The ``load`` argument of the ``factories.postgresql_proc`` will be used to initialize the template database that will be used to create a new database for each test, while the ``load`` argument of the ``postgresql_fact`` fixture will be executed before each test (in the database created from the template database and dedicated to the test being executed).