.. _cassandra_upgrade_cluster:

How to upgrade a cassandra cluster
==================================

.. admonition:: Intended audience
   :class: important

   sysadm staff members

This page documents the actions to `upgrade an online cassandra cluster `_.

The overall plan is to upgrade each node of the cluster one at a time, in a
rolling upgrade fashion.

There are two ways to manage this upgrade procedure, either
:ref:`manually <manual_cassandra_upgrade>` or
:ref:`automatically <automatic_cassandra_upgrade>`.

Our (static) cassandra clusters are managed through puppet, which implies some
adaptations to do in the swh-site repository. Since our puppet manifest does
not manage the restart of the service, it's ok to let puppet apply the changes
in advance.

Then identify the desired new version and retrieve its sha512 hash:

- https://archive.apache.org/dist/cassandra/4.0.15/apache-cassandra-4.0.15-bin.tar.gz
- https://archive.apache.org/dist/cassandra/4.0.15/apache-cassandra-4.0.15-bin.tar.gz.sha512

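
If needed, the published checksum can be fetched and compared with a locally
computed one before committing it to puppet. This is a minimal sketch; 4.0.15
is only used as an example version here:

.. code-block:: shell

   # Example only: 4.0.15 is a placeholder, substitute the target version.
   $ version=4.0.15
   # Published digest:
   $ curl -sL "https://archive.apache.org/dist/cassandra/$version/apache-cassandra-$version-bin.tar.gz.sha512"
   # Locally computed digest of the tarball; the two values must match.
   $ curl -sL "https://archive.apache.org/dist/cassandra/$version/apache-cassandra-$version-bin.tar.gz" | sha512sum
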

Read the changelog just in case some extra actions are required for the
upgrade.

In the swh-site repository, adapt the environment's common.yaml file with
these values:

.. code-block:: shell

   $ echo $environment
   staging
   $ grep "cassandra::" .../swh-site/data/deployments/$environment/common.yaml
   cassandra::version: 4.0.15
   cassandra::version_checksum: 9368639fe07613995fec2d50de13ba5b4a2d02e3da628daa1a3165aa009e356295d7f7aefde0dedaab385e9752755af8385679dd5f919902454df29114a3fcc0

Commit and push the changes. Then connect to pergamon and deploy those
changes.

.. admonition:: Stop all repair jobs before upgrading
   :class: warning

   | All scheduled jobs must be paused and all running jobs must be stopped and aborted.
   | You can perform these actions from the web UI `reaper `_:
   | - `Reaper production `_
   | - `Reaper staging `_

.. admonition:: Grafana tag
   :class: note

   Set a Grafana tag to mark the start of the upgrade.

.. _manual_cassandra_upgrade:

Manual procedure
----------------

Connect to each machine of the cluster, in any order (lexicographic order is
fine though).

We'll need nodetool access, so here is a simple alias to simplify the commands
(used for the remaining part of this doc):

.. code-block:: shell

   $ USER=$(awk '{print $1}' /etc/cassandra/jmxremote.password)
   $ PASS=$(awk '{print $2}' /etc/cassandra/jmxremote.password)
   $ alias nodetool="/opt/cassandra/bin/nodetool --username $USER --password $PASS"

From another node in the cluster, connect and check that the status of the
cluster stays fine during the migration:

.. code-block:: shell

   $ period=10; while true; do \
       date; nodetool status -r; echo; nodetool netstats; sleep $period; \
     done

Let's do a drain call first so the commitlog is flushed to on-disk sstables.
It's recommended before an upgrade to avoid any pending data in the commit
log.

.. code-block:: shell

   $ nodetool drain

Look for the '- DRAINED' pattern in the service log to know it's done:

.. code-block:: shell

   $ journalctl -eu cassandra@instance1 | grep DRAINED
   Nov 27 14:09:06 cassandra01 cassandra[769383]: INFO  [RMI TCP Connection(20949)-192.168.100.181] 2024-11-27 14:09:06,084 StorageService.java:1635 - DRAINED

We stop the cassandra service:

.. code-block:: shell

   $ systemctl stop cassandra@instance1

In the output of ``nodetool status``, the node whose service is stopped should
be marked as DN (Down and Normal):

.. code-block:: shell

   $ nodetool -h cassandra02 status -r | grep DN
   DN  cassandra01.internal.softwareheritage.org  8.63 TiB  16  22.7%  cb0695ee-b7f1-4b31-ba5e-9ed7a068d993  rack1

Finally, we upgrade the cassandra version on the node (through puppet):

.. code-block:: shell

   $ puppet agent --enable && puppet agent --test

Let's check that the correct version is installed in ``/opt``:

.. code-block:: shell

   $ ls -lah /opt/ | grep cassandra-$version
   lrwxrwxrwx  1 root root   21 Nov 27 14:13 cassandra -> /opt/cassandra-$version
   drwxr-xr-x  8 root root 4.0K Nov 27 14:13 cassandra-$version

Now start the cassandra service again:

.. code-block:: shell

   $ systemctl start cassandra@instance1

Once the service is started again, ``nodetool status`` should display an
``UN`` (Up and Normal) status again for the upgraded node:

.. code-block:: shell

   $ nodetool status -r
   ...
   UN  cassandra01.internal.softwareheritage.org  8.63 TiB  16  22.7%  cb0695ee-b7f1-4b31-ba5e-9ed7a068d993  rack1

.. _automatic_cassandra_upgrade:

Automatic procedure
-------------------

It's the same procedure as previously described, but only one call to a script
on pergamon is required.

With environment in {staging, production}:

.. code-block:: shell

   root@pergamon:~# /usr/local/bin/cassandra-restart-cluster.sh $environment

Note that you can also use the previously described checks procedure from a
cluster node to follow the upgrade's progress.

.. _cassandra_upgrade_checks:

Final Checks
------------

Finally, check that the version is the expected one:

.. code-block:: shell

   $ nodetool version
   ReleaseVersion: $version

   $ nodetool describecluster
   Cluster Information:
        Name: archive_staging
        Snitch: org.apache.cassandra.locator.GossipingPropertyFileSnitch
        DynamicEndPointSnitch: enabled
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
             583470c4-6dae-372d-bdab-f0bcbd679c74: [192.168.130.181, 192.168.130.182, 192.168.130.183]

   Stats for all nodes:
        Live: 3
        Joining: 0
        Moving: 0
        Leaving: 0
        Unreachable: 0

   Data Centers:
        sesi_rocquencourt_staging #Nodes: 3 #Down: 0

   Database versions:
        5.0.2: [192.168.130.181:7000, 192.168.130.182:7000, 192.168.130.183:7000]

   Keyspaces:
        swh -> Replication class: NetworkTopologyStrategy {sesi_rocquencourt_staging=3}
        system_distributed -> Replication class: NetworkTopologyStrategy {replication_factor=3}
        provenance_test -> Replication class: NetworkTopologyStrategy {sesi_rocquencourt_staging=3}
        reaper_db -> Replication class: NetworkTopologyStrategy {sesi_rocquencourt_staging=3}
        system_traces -> Replication class: SimpleStrategy {replication_factor=2}
        system_auth -> Replication class: NetworkTopologyStrategy {sesi_rocquencourt_staging=3}
        system_schema -> Replication class: LocalStrategy {}
        system -> Replication class: LocalStrategy {}

.. admonition:: Upgrading to a major version
   :class: warning

   | When upgrading to a major version, you need to run ``nodetool upgradesstables``.
   | You can perform this command manually on each node or use a script from `pergamon`.
   | With environment in {staging, production}:

   .. code-block:: shell

      root@pergamon:~# /usr/local/bin/cassandra-upgradesstables.sh $environment

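
For the manual route mentioned in the admonition above, the command simply has
to be run against each node in turn. A minimal sketch, assuming the
``nodetool`` alias defined earlier and using example host names:

.. code-block:: shell

   # Example only: host names are placeholders, adjust to the target cluster.
   $ for node in cassandra01 cassandra02 cassandra03; do \
       echo "upgrading sstables on $node"; \
       nodetool -h $node upgradesstables; \
     done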