Frontends procedures

Intended audience

sysadm staff members

Pacemaker maintenance mode

In maintenance mode, Pacemaker does not start, stop or recover any resource: it neither manages the PostgreSQL service nor moves the virtual IPs from one node to another.

  • Enable maintenance mode

crm_attribute --name maintenance-mode --update true

  • Return to nominal mode

crm_attribute --name maintenance-mode --delete

  • Check the status

Nominal mode:

root@gloin001:~# crm status
Status of pacemakerd: 'Pacemaker is running' (last updated 2024-03-06 18:45:31 +01:00)
Cluster Summary:
   * Stack: corosync
   * Current DC: gloin001 (version 2.1.5-a3f44794f94) - MIXED-VERSION partition with quorum
   * Last updated: Wed Mar  6 18:45:31 2024
   * Last change:  Wed Mar  6 18:45:27 2024 by root via crm_attribute on gloin001
   * 2 nodes configured
   * 4 resource instances configured

Node List:
   * Online: [ gloin001 gloin002 ]

Full List of Resources:
   * r_vip_pub   (ocf:heartbeat:IPaddr2):         Started gloin001
   * r_vip_ha    (ocf:heartbeat:IPaddr2):         Started gloin001
   * Clone Set: ha_postgresql [r_postgresql] (promotable):
      * Promoted: [ gloin001 ]
      * Unpromoted: [ gloin002 ]

In maintenance mode:

root@gloin001:~# crm status
Status of pacemakerd: 'Pacemaker is running' (last updated 2024-03-06 18:43:58 +01:00)
Cluster Summary:
   * Stack: corosync
   * Current DC: gloin001 (version 2.1.5-a3f44794f94) - MIXED-VERSION partition with quorum
   * Last updated: Wed Mar  6 18:43:58 2024
   * Last change:  Wed Mar  6 18:41:47 2024 by root via crm_attribute on gloin001
   * 2 nodes configured
   * 4 resource instances configured

            *** Resource management is DISABLED ***
The cluster will not attempt to start, stop or recover services

Node List:
   * Online: [ gloin001 gloin002 ]

Full List of Resources:
   * r_vip_pub   (ocf:heartbeat:IPaddr2):         Started gloin001 (unmanaged)
   * r_vip_ha    (ocf:heartbeat:IPaddr2):         Started gloin001 (unmanaged)
   * Clone Set: ha_postgresql [r_postgresql] (promotable, unmanaged):
      * r_postgresql      (ocf:heartbeat:pgsqlms):         Unpromoted gloin002 (unmanaged)
      * r_postgresql      (ocf:heartbeat:pgsqlms):         Promoted gloin001 (unmanaged)
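The commands above can be wrapped in small shell functions so that each toggle also reports the resulting state. This is a minimal sketch, assuming it runs as root on a cluster node with the Pacemaker command-line tools installed; `maintenance_state` and `set_maintenance` are hypothetical names, and only the literal value `true` is treated as "on".

```shell
#!/usr/bin/env bash
# Hypothetical helpers (illustrative names) around the crm_attribute commands above.
# Assumes: run as root on a cluster node, Pacemaker CLI tools installed.

# Print "on" if the maintenance-mode cluster option is "true", "off" otherwise.
maintenance_state() {
    local value
    # --quiet prints only the value; the query fails when the option is unset
    # (the nominal state after --delete), which is treated as "off".
    value=$(crm_attribute --name maintenance-mode --query --quiet 2>/dev/null) || value=false
    if [ "$value" = "true" ]; then
        echo on
    else
        echo off
    fi
}

# Usage: set_maintenance on|off
set_maintenance() {
    case "${1:-}" in
        on)  crm_attribute --name maintenance-mode --update true || return ;;
        off) crm_attribute --name maintenance-mode --delete || return ;;
        *)   echo "usage: set_maintenance on|off" >&2; return 2 ;;
    esac
    echo "maintenance-mode: $(maintenance_state)"
}
```

Usage: source the file, run `set_maintenance on` before the intervention and `set_maintenance off` afterwards.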

Clear the Pacemaker error status of a resource

For example, to clean up the errors recorded for r_postgresql on gloin002:

crm_resource -r r_postgresql -H gloin002 -C
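The same cleanup can be spelled with long options and followed by a wait until the cluster settles, which reads more clearly in scripts. A sketch, assuming the Pacemaker 2.x `crm_resource` options `--cleanup`, `--resource`, `--node` and `--wait`; `cleanup_resource` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: clear the failure history of a resource on one node,
# then wait for the cluster to settle.
# Usage: cleanup_resource <resource> <node>
cleanup_resource() {
    if [ $# -ne 2 ]; then
        echo "usage: cleanup_resource <resource> <node>" >&2
        return 2
    fi
    crm_resource --cleanup --resource "$1" --node "$2" || return
    # Block until no cluster actions are pending.
    crm_resource --wait
}
```

Usage: `cleanup_resource r_postgresql gloin002`.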

Restore a PostgreSQL secondary from the primary

Ban r_postgresql from the secondary node (gloin002 here) so that Pacemaker stops PostgreSQL there:

crm --wait resource ban r_postgresql gloin002
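Banning the wrong node would stop the primary. One possible safeguard is to read the promoted node from `crm status` first and refuse to ban it. This sketch assumes the nominal output format shown earlier (`* Promoted: [ gloin001 ]`), which is not what `crm status` prints in maintenance mode; `promoted_node` and `ban_secondary` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the promoted node from `crm status` output on stdin.
# Assumes the nominal format: "      * Promoted: [ gloin001 ]".
promoted_node() {
    sed -n 's/^[[:space:]]*\* Promoted: \[ \(.*\) ]$/\1/p'
}

# Hypothetical guard: ban r_postgresql from <node> unless it is the current primary.
# Usage: ban_secondary <node>
ban_secondary() {
    local node=${1:-} primary
    if [ -z "$node" ]; then
        echo "usage: ban_secondary <node>" >&2
        return 2
    fi
    primary=$(crm status | promoted_node)
    if [ -z "$primary" ]; then
        echo "could not determine the promoted node; aborting" >&2
        return 1
    fi
    if [ "$node" = "$primary" ]; then
        echo "$node is the promoted node; refusing to ban it" >&2
        return 1
    fi
    crm --wait resource ban r_postgresql "$node"
}
```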

Check the PostgreSQL logs on gloin002 to confirm that the instance has stopped.
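PostgreSQL logs `database system is shut down` on a clean stop and `database system is ready to accept ...` on start, so the last of these two messages indicates whether the instance is down. A sketch, assuming the Debian default log file `/var/log/postgresql/postgresql-<version>-main.log` (the pgsqlms agent or `postgresql.conf` may send logs elsewhere); `pg_stopped_in_log` is a hypothetical name and the 200-line window is arbitrary.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if the most recent start/stop message in the
# PostgreSQL log is a clean shutdown.
# Usage: pg_stopped_in_log <version> [logfile]
pg_stopped_in_log() {
    # Default path follows the Debian postgresql-common layout (assumption).
    local log=${2:-/var/log/postgresql/postgresql-$1-main.log}
    # Keep only start/stop messages from the recent lines; the last one wins.
    tail -n 200 "$log" 2>/dev/null |
        grep -E 'database system is (shut down|ready to accept)' |
        tail -n 1 | grep -q 'shut down'
}
```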

If PostgreSQL does not stop, it can be stopped manually with pg_ctl:

export VERSION=<version>
sudo -u postgres /usr/lib/postgresql/$VERSION/bin/pg_ctl -D /var/lib/postgresql/$VERSION/main stop
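`pg_ctl status` exits with status 3 when no server is running in the data directory (and 4 when the data directory is not accessible), so it can be polled before touching the data directory. A minimal polling sketch, assuming the Debian paths used above; `wait_pg_stopped` and its 60-second default timeout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until PostgreSQL is no longer running in the
# Debian-style data directory of the given version.
# Usage: wait_pg_stopped <version> [timeout_seconds]
wait_pg_stopped() {
    if [ -z "${1:-}" ]; then
        echo "usage: wait_pg_stopped <version> [timeout_seconds]" >&2
        return 2
    fi
    local version=$1 timeout=${2:-60} waited=0 rc
    local pg_ctl=/usr/lib/postgresql/$version/bin/pg_ctl
    local datadir=/var/lib/postgresql/$version/main
    while :; do
        rc=0
        sudo -u postgres "$pg_ctl" status -D "$datadir" >/dev/null || rc=$?
        case $rc in
            3) echo "stopped"; return 0 ;;   # 3 = no server running
            4) echo "no accessible data directory: $datadir" >&2; return 4 ;;
        esac
        if [ "$waited" -ge "$timeout" ]; then
            echo "still running after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```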

  • Delete or move the contents of the PostgreSQL data directory, /var/lib/postgresql/<version>/main: pg_basebackup refuses to write into a directory that exists and is not empty.

  • Run the base backup from the primary:

sudo -u postgres pg_basebackup -h 10.25.1.1 -D /var/lib/postgresql/$VERSION/main/ -P -U replicator --wal-method=fetch
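The two steps above can be combined so that the old data directory is kept rather than deleted. A sketch, assuming the primary address and `replicator` role from the command above; `restore_standby` and the `.old.<timestamp>` suffix are hypothetical choices. `pg_basebackup` creates a missing target directory itself, which relies on the postgres user being able to write to `/var/lib/postgresql/<version>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move the old data directory aside, then re-clone it
# from the primary with pg_basebackup.
# Usage: restore_standby <datadir> [primary_host]
restore_standby() {
    if [ -z "${1:-}" ]; then
        echo "usage: restore_standby <datadir> [primary_host]" >&2
        return 2
    fi
    local datadir=${1%/} primary=${2:-10.25.1.1} saved
    saved="$datadir.old.$(date +%Y%m%d%H%M%S)"

    # Refuse to run while a server may still use this directory
    # (a stale postmaster.pid after a crash also blocks; check it by hand).
    if [ -e "$datadir/postmaster.pid" ]; then
        echo "postmaster.pid present: stop PostgreSQL first" >&2
        return 1
    fi
    if [ -d "$datadir" ]; then
        mv "$datadir" "$saved" || return
        echo "old data directory kept in $saved"
    fi
    # pg_basebackup creates the (now absent) target directory itself.
    sudo -u postgres pg_basebackup -h "$primary" -D "$datadir" -P -U replicator --wal-method=fetch
}
```

Usage: `restore_standby /var/lib/postgresql/16/main`, then remove the `.old.*` copy once the standby is healthy.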

Once the base backup has completed and the ban set above has been cleared, PostgreSQL should restart as a standby and catch up with the primary.

  • Once the secondary is up to date, check the Pacemaker status with crm status.
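Assuming the ban from the start of this procedure is still in place, crmsh's `crm resource clear` removes it; replication can then be checked on the primary in the `pg_stat_replication` view. A sketch: `unban_postgresql` and `standby_streaming` are hypothetical names, and the check assumes the standby connects with `application_name` set to its node name (the usual setup with the pgsqlms agent).

```shell
#!/usr/bin/env bash
# Hypothetical helper: lift the ban placed earlier with `crm resource ban`.
unban_postgresql() {
    crm --wait resource clear r_postgresql
}

# Hypothetical check, run on the primary: succeed if <node> appears in
# pg_stat_replication with state "streaming".
# Assumes application_name equals the standby's node name.
# Usage: standby_streaming <node>
standby_streaming() {
    local node=$1
    sudo -u postgres psql -XAt -F '|' -c \
        "SELECT application_name, state, replay_lag FROM pg_stat_replication;" |
        awk -F '|' -v n="$node" '$1 == n && $2 == "streaming" { found = 1 } END { exit !found }'
}
```

For example, run `unban_postgresql` on any node, then `standby_streaming gloin002 && crm status` on gloin001.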