Cluster Management Guide
A simple yet robust Cluster Management experience awaits!
There are two main utilities for working with your Cluster. The first is a simple diagnostic utility that shows the LIVE status of the cluster in real time. The second is the main Cluster Management application, which lets you control and modify the cluster and even replace permanently crashed nodes. There is also a GUI Module that provides cluster status and some management functions. Addenda for HA Clusters, covering topics such as EFS migration and Unattended Node Setup, are available in our wiki.
The Cluster Diagnostic Utility is very simple. When run, it displays the current live cluster status and refreshes continuously until you pause it by pressing P or exit with CTRL+C. It takes no parameters. While connected via SSH to either Cluster node or to a configured Monitor Node (see below), run the following:
cluster-diag
cluster-mgr [<Command-Name> | <FieldName> [<Value> | null]]
The Cluster Manager is your one-stop shop for maintaining your cluster. Whether you merely want to change the destination email for cluster alerts or need to replace a permanently crashed node, this is the only command you need. Running it with no parameters, or with the --help option, shows the built-in help for quick reference:
cluster-mgr <-OR-> cluster-mgr --help
cluster-mgr <FieldName> [<Value> | null]
This allows you to change the Cluster DB fields. All FieldName and Value parameters are CaSeSeNsItIvE! Be very careful when making changes, as only the most basic error checking is performed on your inputs here. The most common tasks that change Cluster fields in the DB are handled automatically by the other commands below, so this is primarily useful for changing things like passwords, access keys, and alert email addresses. Some Cluster fields can be cleared by supplying null as the Value parameter. In this example, we are changing the ClusterName to "Home-Office":
cluster-mgr ClusterName Home-Office
Some Fields will ask for more information, or automatically alter related fields, when you change or null their Value. MailHost is a prime example. When you null the MailHost value, the values for MailUser, MailPass, MailFrom, and MailTo are wiped at the same time. If MailHost is currently blank and you add a new Value, you will be prompted for the other Mail values to complete the Alert Email setup, and a test email will be sent when you finish to confirm everything is working properly. First, we clear the Mail values:
cluster-mgr MailHost null
Now we add the values back to the Cluster, using Gmail/G Suite as our example. It will ask for the rest of the Mail values automatically and send a test email:
cluster-mgr MailHost smtp.gmail.com
NOTE: Do not manually change the following fields: PrimaryStatus, BackupStatus, PrimaryTime, BackupTime, PrimaryPublicIP, BackupPublicIP, My Role.
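If you script field changes, a small guard can stop you from touching the cluster-managed fields listed in the NOTE above. This is a minimal shell sketch, not part of cluster-mgr: the function name set_cluster_field and the CLUSTER_MGR override variable are hypothetical. Because the list spells one field as "My Role", the guard blocks both that spelling and MyRole; the exact name stored in the DB is an assumption.

```shell
# Hypothetical helper, not a cluster-mgr feature. CLUSTER_MGR lets you point it
# at another binary (or a stub) without editing the function.
CLUSTER_MGR="${CLUSTER_MGR:-cluster-mgr}"

set_cluster_field() {
    field="${1:-}"
    value="${2:-}"
    if [ -z "$field" ] || [ -z "$value" ]; then
        echo "usage: set_cluster_field <FieldName> <Value|null>" >&2
        return 2
    fi
    case "$field" in
        # Fields the cluster maintains itself; "My Role" vs MyRole spelling is assumed.
        PrimaryStatus|BackupStatus|PrimaryTime|BackupTime|PrimaryPublicIP|BackupPublicIP|"My Role"|MyRole)
            echo "refusing to change cluster-managed field: $field" >&2
            return 1
            ;;
    esac
    # FieldName and Value are passed through unchanged because they are case-sensitive.
    "$CLUSTER_MGR" "$field" "$value"
}
```

For example, set_cluster_field MailTo alerts@example.com forwards the change, while set_cluster_field PrimaryStatus Running is refused before cluster-mgr is ever called.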
This command clears the cluster-agent and cluster-sync-agent logs ONLY on the local Node it is run on, so to clear the logs on both Nodes you will need to connect to each one. Because these logs are included in the normal system logrotate schedule, there should be no need to run this unless you are troubleshooting and need a clean log to start from. The command asks for confirmation and then tells you when the logs are cleared.
This command clears the cluster-sync-agent Unison caches ONLY on the local Node it is run on, so to clear the caches on both Nodes you will need to connect to each one. If files are not syncing between nodes (no files are listed in the cluster-sync-log), corrupt cache data is a likely cause, and this command attempts to fix it. You do NOT need to restart or interrupt any services, or change the cluster status from Normal, to run this command, and it will NOT interfere with normal production services. The cluster-sync-agent simply forces a full check of all files during the next cycle and rebuilds the cache. This is usually enough to get files syncing to and from EFS again.
cluster-mgr force-ip-[primary | backup]
This command will force the Elastic IP to be assigned to either the Primary or Backup node. The cluster will also be placed in Maintenance mode to ensure that neither node attempts to change the Elastic IP after manual assignment.
cluster-mgr force-ip-primary <-OR-> cluster-mgr force-ip-backup
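Because force-ip-primary and force-ip-backup also switch the cluster to Maintenance mode, a planned work window on the Primary node follows a fixed sequence: move the Elastic IP to the Backup, do the work, move the IP back, and return the cluster to Normal. The sketch below wraps that sequence in a shell function. The function name, the CLUSTER_MGR variable, and the assumption that set-Normal is the correct final step are ours, not documented behavior.

```shell
# Hypothetical helper, not part of cluster-mgr. CLUSTER_MGR lets you point the
# helper at a different binary (or a stub) without editing it.
CLUSTER_MGR="${CLUSTER_MGR:-cluster-mgr}"

maintenance_on_primary() {
    # Move the Elastic IP to the Backup node; this also puts the cluster in Maintenance.
    "$CLUSTER_MGR" force-ip-backup || return 1

    # Run the maintenance work, passed in as a command plus its arguments.
    if ! "$@"; then
        # Leave the IP on the Backup and the cluster in Maintenance for a human to inspect.
        echo "work failed; Elastic IP stays on the Backup node, cluster stays in Maintenance" >&2
        return 1
    fi

    # Hand the Elastic IP back to the Primary node (the cluster is still in Maintenance).
    "$CLUSTER_MGR" force-ip-primary || return 1
    # Assumption: set-Normal is what re-enables automatic failover afterwards.
    "$CLUSTER_MGR" set-Normal
}
```

For example, maintenance_on_primary ./primary-work.sh runs a placeholder work script between the two IP moves. If the work fails, the function deliberately stops without moving the IP back, since a failed change on the Primary is exactly when you don't want it serving calls.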
This command issues a "safe" restart of all FreePBX services on the local node, briefly placing the cluster into Maintenance mode so that a failure condition is not declared and acted upon. If you just need to restart FreePBX services on the actively Running node (during after-hours maintenance, for example) and don't want the other server taking control of the Elastic IP, this ensures the cluster doesn't react to the services stopping.
This command launches the mysql command-line client and connects it to the Cluster RDS Server automatically. Use it when you need to access the asterisk, asteriskcdrdb, or your own custom central DBs for the Cluster manually.
cluster-mgr [reboot-this-node | reboot]
This command reboots the local node in a controlled manner so as to minimize service downtime. If the local node you are rebooting is actively Running, services will be moved to the other node. When you run this command, it explains the steps it plans to take and then asks you for confirmation:
cluster-mgr reboot-this-node <-OR-> cluster-mgr reboot
cluster-mgr [shutdown-this-node | halt]
This command halts the local node in a controlled manner so as to minimize service downtime. If the local node you are halting is actively Running, services will be moved to the other node. When you run this command, it explains the steps it plans to take and then asks you for confirmation:
cluster-mgr shutdown-this-node <-OR-> cluster-mgr halt
This trio of commands sets the desired Cluster Status: set-Normal, set-Maintenance, or set-FailOver. Maintenance mode prevents the Cluster nodes from taking actions based on service monitors, so you can work on the cluster without interruption. FailOver mode forces all services to switch to the Backup node. Run the command corresponding to the status you want, and the Cluster confirms by showing the new status:
cluster-mgr set-Normal <-OR-> cluster-mgr set-Maintenance <-OR-> cluster-mgr set-FailOver
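Maintenance mode is easy to forget to turn off. One way to guard against that, sketched here as a hypothetical shell helper rather than a cluster-mgr feature, is to run the work in a subshell whose EXIT trap always calls set-Normal, whether the work succeeds, fails, or is interrupted with CTRL+C. The with_maintenance name and the CLUSTER_MGR variable are assumptions.

```shell
# Hypothetical helper; CLUSTER_MGR can be overridden (for example with a stub).
CLUSTER_MGR="${CLUSTER_MGR:-cluster-mgr}"

with_maintenance() {
    (
        # If Maintenance cannot be set, do not run the work and do not touch the status.
        "$CLUSTER_MGR" set-Maintenance || exit 1
        # From here on, restore Normal whenever this subshell exits.
        trap '"$CLUSTER_MGR" set-Normal' EXIT
        # Turn INT/TERM (e.g. CTRL+C) into a normal exit so the EXIT trap still runs.
        trap 'exit 130' INT TERM
        "$@"
        rc=$?
        # Exit with the work's own status; the EXIT trap runs first.
        exit "$rc"
    )
}
```

with_maintenance ./after-hours-work.sh (a placeholder script) returns that script's exit status. Restoring Normal unconditionally is a design choice: staying in Maintenance would leave the cluster without automatic failover, so if a failed change means you do not want the nodes reacting yet, set Maintenance again by hand.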
This command shows the current Cluster Status details and then exits. Run as show-status, it hides sensitive information such as secret keys and passwords; run as show-status-pw, it shows that information.
cluster-mgr show-status <-OR-> cluster-mgr show-status-pw
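Before starting work it can help to keep a copy of the current status. This sketch saves show-status output to a timestamped file, and it deliberately uses show-status rather than show-status-pw so keys and passwords never end up on disk. The snapshot_status name, the CLUSTER_MGR variable, and the default ~/cluster-status directory are hypothetical.

```shell
# Hypothetical helper; CLUSTER_MGR can be overridden (for example with a stub).
CLUSTER_MGR="${CLUSTER_MGR:-cluster-mgr}"

snapshot_status() {
    # Target directory: first argument, or a hypothetical default under $HOME.
    dir="${1:-$HOME/cluster-status}"
    mkdir -p "$dir" || return 1
    file="$dir/status-$(date +%Y%m%d-%H%M%S).txt"
    # show-status (not show-status-pw) keeps secrets out of the saved file.
    "$CLUSTER_MGR" show-status > "$file" || return 1
    # Print the path so callers can log or diff it later.
    echo "$file"
}
```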
This command does exactly what the name suggests: it sends a test email using the current Cluster Mail values, shows a confirmation, and exits. If the Cluster Mail values are not set, or if you run it from a Monitor node, you will receive an error message instead.
This command upgrades all components of the Cluster software on the local node without disrupting calling services, by placing the Cluster in Maintenance mode for the duration of the update. It must be run on each node individually, one after the other, and NOT at the same time.
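Since the upgrade has to be run on each node in turn, a loop that stops at the first failure enforces that order from an admin workstation. This is a sketch under assumptions: it takes the upgrade command to be cluster-mgr upgrade-cluster, the host names are placeholders for your own SSH targets, the upgrade_nodes and SSH_CMD names are hypothetical, and it assumes your SSH user can run cluster-mgr on each node.

```shell
# Hypothetical helper; SSH_CMD can be overridden (for example with a stub).
SSH_CMD="${SSH_CMD:-ssh}"

upgrade_nodes() {
    for host in "$@"; do
        echo "Upgrading $host ..."
        # -t allocates a terminal in case upgrade-cluster prompts for input.
        # Each ssh call blocks until that node finishes, so nodes never overlap.
        if ! "$SSH_CMD" -t "$host" cluster-mgr upgrade-cluster; then
            echo "upgrade failed on $host; stopping before the next node" >&2
            return 1
        fi
    done
}
```

upgrade_nodes admin@pbx-node-1 admin@pbx-node-2 upgrades the first node completely before touching the second; this guide does not prescribe which node to upgrade first.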
This command shows a split-screen view of the live cluster-agent and cluster-sync-agent logs on the local node.
This command tails only the cluster-sync-agent log on the local node in real time.
WARNING: THIS IS A DESTRUCTIVE ACT!!!
This command can only be run locally on the Backup node. When run, it FORCES the current Primary node to detach from the cluster and revert to a standalone instance, and the Backup node then promotes itself to the Primary role. This lets you remove a bad Primary node and then add a fresh Backup node to the cluster. When you run the command, you are shown warnings and information about these steps and then asked to confirm your request.
WARNING: THIS IS A DESTRUCTIVE ACT!!!
This command performs a controlled detachment of the local node from the cluster. If run on the Primary node, it will either promote the Backup node to the Primary role (this requires a reboot) or COMPLETELY DISSOLVE THE CLUSTER! On either node, the command first clones all cluster data to the local instance before detaching it, so the detached node can act as a standalone server.
If your intention is to fully dissolve the cluster and return to a single standalone instance for your production environment, run the promote-backup-node command on the Backup node FIRST to force-detach the current Primary node (this saves time, as no data is cloned on a forced detach). You can then terminate the OLD Primary node (it will NOT have your data on it). Finally, run detach-this-node on the NEW Primary node (the OLD Backup node) to dissolve the cluster, leaving a single standalone production server with all of your data.
Watch and manage your Cluster from afar!
You can install a copy of the cluster-diag and cluster-mgr utilities on any CentOS-based Linux system, on or off AWS. This lets you view Cluster status and execute select management commands from a system other than the Cluster nodes. WARNING: To monitor the Cluster from a different AWS Region or from outside AWS, you will need to open an additional port (3306) in your Cluster Services Security Group to reach RDS. This exposes your Cluster to greater external security risk, especially if you are not using a strong RDS password. If the Monitor node is in the SAME AWS Region as the Cluster, you simply need to attach the Cluster Services SG to that instance as an additional SG.
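If you do need to open port 3306 for an off-Region or off-AWS Monitor node, limiting the rule to that node's address reduces the exposure described above. This sketch uses the AWS CLI command aws ec2 authorize-security-group-ingress; the function name, the security group ID, and the IP address are placeholders, and it assumes the AWS CLI is configured with credentials and a default region that can modify the Cluster Services Security Group.

```shell
# Hypothetical helper; AWS_CMD can be overridden (for example with a stub).
open_rds_for_monitor() {
    sg_id="${1:-}"        # Cluster Services SG, e.g. sg-0123456789abcdef0 (placeholder)
    monitor_ip="${2:-}"   # public IPv4 address of the Monitor node
    if [ -z "$sg_id" ] || [ -z "$monitor_ip" ]; then
        echo "usage: open_rds_for_monitor <sg-id> <monitor-ipv4>" >&2
        return 2
    fi
    case "$monitor_ip" in
        */*) echo "pass a bare IPv4 address, not a CIDR" >&2; return 2 ;;
    esac
    # Allow TCP 3306 from this single address only (/32), not from 0.0.0.0/0.
    "${AWS_CMD:-aws}" ec2 authorize-security-group-ingress \
        --group-id "$sg_id" \
        --protocol tcp \
        --port 3306 \
        --cidr "$monitor_ip/32"
}
```

open_rds_for_monitor sg-0123456789abcdef0 203.0.113.10 would allow only 203.0.113.10 to reach TCP 3306 through that group.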
You will need the ClusterRDSendpoint, ClusterRDSusername, and ClusterRDSpassword values on hand, and you must select the Monitor role at the beginning of the Wizard. Use the following 3 commands to download and run the cluster-install-wizard:
sudo chmod +x cluster-install-wizard.sh
Once the utilities are installed, you can run cluster-diag and SOME cluster-mgr commands. These commands include: force-ip-[primary|backup], mysql-rds, set-<Status>, show-status[-pw], and upgrade-cluster (only upgrades the utilities on the Monitor node).
Watch and manage your Cluster from the browser!
You can perform most basic monitoring and management of your Cluster right from the FreePBX Admin GUI. Simply navigate to AWSFPBX HA Cluster in the Settings menu. Note that this is meant for very "light duty" management and monitoring; all major management tasks should be executed from the SSH CLI to ensure you don't run into problems.
Avoid running commands like upgrade-cluster from here. Some commands, such as mysql-rds, fwconsole-restart, shutdowns, reboots, detachments, and promotions, simply won't work because they require an interactive shell. However, the GUI is very handy for set-Normal, set-Maintenance, set-FailOver, force-ip-primary, force-ip-backup, test-email, and changing fields like MailFrom, MailTo, MailUser, MailPass, and MailHost.