How to handle a storage database migration#
Intended audience
sysadm staff members
If a storage database upgrade is needed, a migration script should already exists in the swh-storage git repository.
Upgrade version#
Check the current database version (first one in desc order):
select dbversion from dbversion order by version desc limit 1;
Say, for example that the result is 159 here.
Check the migration script folder in swh-storage:/sql/upgrades/ (and find the next one, for example 160.sql). It’s previous version number + 1 from the given db version retrieved (so 160 with the current example).
Note: That you could need to run more than one migration. It depends on the current packaged version and the next version we want to deploy. Check the git history to determine that.
Requisite#
Ensure the migration script runs first in the staging database (db0.internal.staging.swh.network is the node holding the swh staging database). Then you can go ahead and run it in production database (belvedere.internal.softwareheritage.org).
Connect to the db with the user with write permission, then run the script:
$ psql -e ...
> \i sql/upgrades/160.sql
Note:
-e so you can see the queries currently running prior to its result
For long-running scripts, connect to the remote machine first [5] [6]
Adaptations#
Hopefully, in production, the script runs as is without adaptation…
Otherwise, if the data volume for a given table is large, you may want to adapt. See 160.sql and its adaptation
For such a case, consider working on ranges on the table id instead. So it uses index and keep the transaction short. Long-standing migration query (translates to long running transaction). This could create too many WALs accumulation (for the replication), thus disk space starvation issue, etc…
Note#
We use grafana to ensure everything is fine (for example, for the replication, we use the postgresql database dashboard, bottom page to the right).
We also use it to keep a reference of what happened for a given deployment. For this, Open a grafana dashboard (for example worker task processing dashboard) and add a tag deployment (so it’s shared across dashboards) with a description on what is the current deployment about. It’s usually a list of module names that gets deployed and associated version deployed.