PostgreSQL Upgrades and Maintenance with CloudNativePG Clusters

Intended audience

sysadm staff members

There are two kinds of PostgreSQL-related upgrades: upgrading the PostgreSQL version used by the database clusters, and upgrading the cloudnative-pg operator itself.

Always read the changelog of the new release before upgrading, to make sure no breaking change or pitfall can occur during the upgrade process.

Upgrade

Whichever component is upgraded (PostgreSQL or the operator), the PostgreSQL clusters behave the same way during the upgrade.

When high availability is configured, the PostgreSQL clusters go through a rolling update. The standby instances are upgraded first, and the primary is upgraded only once all standbys have been upgraded successfully. This ensures the database clusters experience no downtime.
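The ordering described above can be sketched as follows. This is a minimal, hypothetical model in plain Python. The `Instance` type, the two functions and the pod names are illustrative and are not part of cloudnative-pg. The model puts standbys first and lets the primary proceed only once every standby upgrade has succeeded. The real sequencing is performed by the operator.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical model of one PostgreSQL instance (pod) of a database cluster."""
    name: str
    is_primary: bool


def plan_rolling_update(instances: list[Instance]) -> list[str]:
    """Return the upgrade order: every standby first, the primary last."""
    primaries = [i.name for i in instances if i.is_primary]
    if len(primaries) != 1:
        raise ValueError("expected exactly one primary per cluster")
    standbys = [i.name for i in instances if not i.is_primary]
    return standbys + primaries


def can_upgrade_primary(standby_results: dict[str, bool]) -> bool:
    """The primary is only upgraded once all standby upgrades succeeded.

    With no standby at all (no high availability), nothing blocks the primary,
    which is exactly the case where downtime occurs.
    """
    return all(standby_results.values())


if __name__ == "__main__":
    cluster = [Instance("db-1", True), Instance("db-2", False), Instance("db-3", False)]
    print(plan_rolling_update(cluster))  # ['db-2', 'db-3', 'db-1']
    print(can_upgrade_primary({"db-2": True, "db-3": False}))  # False
```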

When high availability is not configured, there is no standby to take over from the primary, so downtime occurs. This is acceptable for testing environments such as staging-next-version.

PostgreSQL upgrade

The PostgreSQL version is selected through an ImageCatalog object. An ImageCatalog is a dictionary that maps each PostgreSQL major version to the container image to use for it. Currently, all databases run PostgreSQL 17.
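To make the "dictionary" idea concrete, the sketch below writes an ImageCatalog as a Python dict that mirrors its YAML manifest. It then resolves the image a cluster gets for its major version. The field layout follows our reading of the CloudNativePG documentation and should be checked against it: `apiVersion: postgresql.cnpg.io/v1`, `kind: ImageCatalog`, and `spec.images` as a list of `major`/`image` pairs. The catalog name and image tags are hypothetical placeholders, not the values used in swh-charts.

```python
# Hypothetical ImageCatalog, written as the dict equivalent of its YAML manifest.
# The catalog name and image tags are placeholders, not the swh-charts values.
IMAGE_CATALOG = {
    "apiVersion": "postgresql.cnpg.io/v1",
    "kind": "ImageCatalog",
    "metadata": {"name": "postgresql-example"},
    "spec": {
        "images": [
            {"major": 16, "image": "ghcr.io/cloudnative-pg/postgresql:16.6"},
            {"major": 17, "image": "ghcr.io/cloudnative-pg/postgresql:17.2"},
        ]
    },
}


def image_for_major(catalog: dict, major: int) -> str:
    """Return the image the catalog associates with a PostgreSQL major version."""
    for entry in catalog["spec"]["images"]:
        if entry["major"] == major:
            return entry["image"]
    raise KeyError(f"no image for PostgreSQL {major} in {catalog['metadata']['name']}")


if __name__ == "__main__":
    # A cluster referencing this catalog with major 17 gets the 17.x image.
    print(image_for_major(IMAGE_CATALOG, 17))  # ghcr.io/cloudnative-pg/postgresql:17.2
```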

For both major and minor upgrades, the cloudnative-pg operator is responsible for carrying out the database upgrade properly.

minor version

A minor upgrade happens when the image catalog is updated with a newer image for the same major version.

To update the ImageCatalog, retrieve its latest version and merge it into the corresponding file in the swh-charts repository, then commit and push the changes. At the next Argo CD sync, this triggers a rolling update of the PostgreSQL clusters.
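Before merging a new catalog, it can help to see which major versions get a new image, because only the clusters running those majors will be rolled. The sketch below assumes both catalogs are available as dicts shaped like the ImageCatalog manifest. Loading them from the YAML files is left out. The `catalog` helper, the cluster names and the image tags are hypothetical.

```python
def catalog(images: dict[int, str]) -> dict:
    """Build a minimal ImageCatalog-shaped dict from a {major: image} mapping."""
    return {"spec": {"images": [{"major": m, "image": i} for m, i in images.items()]}}


def changed_majors(old_catalog: dict, new_catalog: dict) -> dict[int, tuple[str, str]]:
    """Map each major present in both catalogs whose image changed to (old, new).

    Majors added to or removed from the catalog are deliberately ignored here.
    """
    old = {e["major"]: e["image"] for e in old_catalog["spec"]["images"]}
    new = {e["major"]: e["image"] for e in new_catalog["spec"]["images"]}
    return {
        major: (old[major], new[major])
        for major in sorted(old.keys() & new.keys())
        if old[major] != new[major]
    }


def clusters_to_roll(cluster_majors: dict[str, int], changed: dict[int, tuple[str, str]]) -> list[str]:
    """Names of the clusters running a major whose image changed (they get rolled)."""
    return sorted(name for name, major in cluster_majors.items() if major in changed)


if __name__ == "__main__":
    old = catalog({16: "ghcr.io/cloudnative-pg/postgresql:16.6",
                   17: "ghcr.io/cloudnative-pg/postgresql:17.2"})
    new = catalog({16: "ghcr.io/cloudnative-pg/postgresql:16.6",
                   17: "ghcr.io/cloudnative-pg/postgresql:17.4"})
    changed = changed_majors(old, new)
    print(sorted(changed))  # [17]
    # Hypothetical clusters and the major each one runs.
    print(clusters_to_roll({"storage-db": 17, "scheduler-db": 17, "legacy-db": 16}, changed))
    # ['scheduler-db', 'storage-db']
```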

See the official documentation on minor version upgrades for more information.

major version

The major version is declared in the swh-charts repository and can be set per cluster declaration. It is currently declared globally, but it can be overridden for each instance. After changing it, commit and push the changes. At the next Argo CD sync, this triggers the upgrade of the PostgreSQL clusters.
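The "declared globally, overridable per instance" rule boils down to a lookup with a fallback. The sketch below assumes a hypothetical values layout: a global `postgresql.majorVersion` key, plus optional `majorVersion` entries under each cluster. The actual key names in swh-charts may differ, and the cluster names and the version 18 override are illustrative only.

```python
# Hypothetical values layout: the real key names in swh-charts may differ.
VALUES = {
    "postgresql": {"majorVersion": 17},          # global declaration
    "clusters": {
        "storage-db": {},                        # no override: follows the global value
        "scheduler-db": {"majorVersion": 18},    # hypothetical per-instance override
    },
}


def effective_major(values: dict, cluster: str) -> int:
    """Major version for a cluster: its own override if set, else the global value."""
    override = values.get("clusters", {}).get(cluster, {}).get("majorVersion")
    return override if override is not None else values["postgresql"]["majorVersion"]


if __name__ == "__main__":
    for name in ("storage-db", "scheduler-db", "unlisted-db"):
        print(name, effective_major(VALUES, name))
    # storage-db 17
    # scheduler-db 18
    # unlisted-db 17
```

With this layout, an override touches a single instance only, while changing the global value affects every cluster without an override.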

See the official documentation on major version upgrades for more information.

PostgreSQL operator upgrade

The cloudnative-pg operator is managed in the swh-charts repository through the cluster-configuration chart. Update the version key either globally, or first in the values of a specific Kubernetes cluster to restrict the scope of the change.
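Setting the version for a single Kubernetes cluster first limits which operators get upgraded at the next sync. The sketch below compares the effective operator version of each Kubernetes cluster between two sets of values. The values layout (`cloudnativePg.version` globally, with an `overrides` map per Kubernetes cluster), the cluster names and the version numbers are all hypothetical. None of them are taken from the cluster-configuration chart.

```python
# Hypothetical values layout, Kubernetes cluster names and versions: the real
# keys of the cluster-configuration chart may differ.
KUBE_CLUSTERS = ["staging", "staging-next-version", "production"]


def operator_version(values: dict, kube_cluster: str) -> str:
    """Operator version for a Kubernetes cluster: its override if set, else the global one."""
    override = values.get("overrides", {}).get(kube_cluster, {}).get("version")
    return override if override is not None else values["cloudnativePg"]["version"]


def upgraded_clusters(before: dict, after: dict) -> list[str]:
    """Kubernetes clusters whose operator version differs between two sets of values."""
    return [c for c in KUBE_CLUSTERS
            if operator_version(before, c) != operator_version(after, c)]


if __name__ == "__main__":
    before = {"cloudnativePg": {"version": "1.25.1"}}
    # Step 1: bump a single Kubernetes cluster through an override.
    staged = {"cloudnativePg": {"version": "1.25.1"},
              "overrides": {"staging-next-version": {"version": "1.26.0"}}}
    # Step 2: once satisfied, bump the global value and drop the override.
    rolled_out = {"cloudnativePg": {"version": "1.26.0"}}
    print(upgraded_clusters(before, staged))      # ['staging-next-version']
    print(upgraded_clusters(staged, rolled_out))  # ['staging', 'production']
```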

As described previously, the cloudnative-pg operator is updated first, followed by a rolling update of the PostgreSQL clusters.

Kubernetes Upgrade

During a usual Kubernetes upgrade, nodes are drained automatically. How the PostgreSQL clusters behave then depends on their configuration. In all cases, the cloudnative-pg operator drives the behavior of the clusters when a node is drained.

The following sections describe what to expect in each environment.

staging

As mentioned in the document about PostgreSQL instances, the PostgreSQL clusters in the staging Kubernetes cluster are configured with one primary and one standby.

When a node is drained, any database cluster primary running on that node is evicted. Before the eviction, the cloudnative-pg operator performs a switchover, which promotes a standby running on another Kubernetes node to primary [1]. The database therefore remains available, with no downtime. Standbys running on the drained node are evicted as well. This causes no downtime either, since the primary runs on another node and is left untouched.

[1] Provided there is no issue in that cluster (inconsistent replication, …).
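The drain behavior described above can be summarized as a per-cluster decision. The following is a simplified, hypothetical model, not cloudnative-pg code. The `Instance` type, `drain_actions` and the pod and node names are illustrative. The `healthy` flag stands in for the condition of footnote [1], whereas the operator relies on the actual replication status. A primary with no standby elsewhere, as in an environment without high availability, ends up with downtime.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    """Hypothetical model of a PostgreSQL instance and the node it runs on."""
    name: str
    cluster: str
    node: str
    is_primary: bool
    healthy: bool = True  # stands in for "no replication issue", see footnote [1]


def drain_actions(instances: list[Instance], drained_node: str) -> dict[str, str]:
    """Describe, per database cluster, what draining `drained_node` implies."""
    actions: dict[str, str] = {}
    for inst in instances:
        if inst.node != drained_node:
            continue
        if not inst.is_primary:
            # The primary runs elsewhere and is left untouched.
            actions.setdefault(inst.cluster, "evict standby: no downtime")
            continue
        # A switchover needs a healthy standby of the same cluster on another node.
        candidates = [
            o for o in instances
            if o.cluster == inst.cluster and not o.is_primary
            and o.node != drained_node and o.healthy
        ]
        if candidates:
            actions[inst.cluster] = f"switchover to {candidates[0].name}, then evict: no downtime"
        else:
            actions[inst.cluster] = "evict primary, no healthy standby elsewhere: downtime"
    return actions


if __name__ == "__main__":
    instances = [
        Instance("db-a-1", "db-a", "node-1", is_primary=True),
        Instance("db-a-2", "db-a", "node-2", is_primary=False),
        Instance("db-b-1", "db-b", "node-2", is_primary=True),
        Instance("db-b-2", "db-b", "node-1", is_primary=False),
        Instance("db-c-1", "db-c", "node-1", is_primary=True),  # no standby at all
    ]
    for cluster, action in drain_actions(instances, "node-1").items():
        print(cluster, "->", action)
    # db-a -> switchover to db-a-2, then evict: no downtime
    # db-b -> evict standby: no downtime
    # db-c -> evict primary, no healthy standby elsewhere: downtime
```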

staging-next-version

As there is no high availability in this Kubernetes environment, upgrading to a new version causes downtime. The pod running the only database instance is stopped during the upgrade.

To make sure no service is impacted by this downtime, the simplest approach is to stop the next-version environment first, then proceed with the usual upgrade. That way, no service uses the impacted databases during the upgrade.

production

Nothing yet.