As FreePBX has continued to evolve and grow more complex, it has become clear that our previous HA strategy cannot keep up without running into resource scaling issues: cluster tasks should not compete with normal PBX services for resources! We also recognize that the old HA Cluster solution was not cost effective for smaller deployments, leaving many customers without the benefit of a quick recovery and with less protection from unexpected downtime, which can cost you valuable business.
Thankfully, the latest versions of FreePBX support a very simple and competent Warm Spare system that can provide much of the same protection from extended downtime as our previous HA Cluster solution, while also being much cheaper - with no EFS or RDS requirements and a lower resource overhead.
At the present time, the only way to achieve an automated fail over in the event of a random service failure is with Sangoma's Advanced Recovery module license. While the Advanced Recovery module will work with AWS FreePBX and we fully support its use, the $800/year ($400/server/year) licensing cost puts it out of budgetary reach for many small customers who will already be paying for two AWS FreePBX instances. So, we are working on our own fail over automation solution specifically tailored to the AWS environment, as a complimentary add-on, much like we did with the legacy HA solution. In the meantime, manually switching to the Backup instance is as simple as swapping the Elastic IP and, in some cases, enabling Trunks.
This guide will show you how to set up Warm Spare High Availability with AWS FreePBX.
◼️ A normal Instruction/Bullet Point
📌 Important Advice
🔷 Production Instance Step
🔶 Backup Instance Step
◼️ 2 AWS FreePBX Instances; a Production Instance and a Backup Instance
🔷 The Production Instance can be a system already in use independently or disconnected from a legacy HA Cluster and operating as a standalone instance
📌 The Backup Instance should be launched cleanly from the same major AMI version as the Production Instance
❗️Both Instances should be in the same Region so they can share the Elastic IP, but should be in different Availability Zones to ensure a single AWS AZ failure doesn't take down both Instances
❗️Using a multi-Region schema requires a much more advanced setup that won't be covered in this guide
❗️Both Instances require independent Activation/Deployment IDs and identical sets of Commercial Modules Licenses from Sangoma - it is not possible to share a single Deployment ID between the Instances
◼️ 1 Elastic IP Address
📌 This should be Attached to the Production Instance when you begin these steps
❗️Using a Private IP or VPN schema requires a much more advanced setup that won't be covered in this guide
◼️ 1 EC2 Security Group to be used on both Instances
📌 Needs to include access to TCP ports 22 and 80 granted to either the VPC Private Subnet CIDR (172.x.x.x) -OR- the Security Group ID (self-referencing) -OR- the Production and Backup Private IPs (two SG entries), allowing the Production Instance to connect to the Backup Instance to copy files and execute restore operations
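The self-referencing option can be sketched in Python. This is a minimal illustration rather than part of the official setup: `build_warm_spare_rules` is a hypothetical helper and the Security Group ID is a placeholder. The list it returns has the shape that boto3's `authorize_security_group_ingress` accepts in its `IpPermissions` argument.

```python
def build_warm_spare_rules(security_group_id, ports=(22, 80)):
    """Return ingress rules letting members of a Security Group reach each other.

    Hypothetical helper: one TCP rule per port, each referencing the group itself.
    """
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "UserIdGroupPairs": [{"GroupId": security_group_id}],
        }
        for port in ports
    ]


if __name__ == "__main__":
    # Placeholder ID; substitute the Security Group attached to both Instances.
    rules = build_warm_spare_rules("sg-0123456789abcdef0")
    for rule in rules:
        print(rule["IpProtocol"], rule["FromPort"], rule["UserIdGroupPairs"])
    # To apply (assumes boto3 is installed and AWS credentials are configured):
    #   import boto3
    #   boto3.client("ec2").authorize_security_group_ingress(
    #       GroupId="sg-0123456789abcdef0", IpPermissions=rules)
```

Referencing the group itself means the rule keeps working if either Instance is relaunched with a new Private IP, which is one reason to prefer it over the two-entry Private IP variant.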
Throughout the steps below, we will be switching back and forth between the Production and Backup Instances in order to complete steps via both the GUI and SSH. To save time, have the GUIs of both Production and Backup Instances loaded/logged into your browser and establish a single SSH connection to the Backup Instance. Remember that a newly launched instance will have a GUI 'admin' password set to the Instance ID (i-xxxxxxxxxxxxxxx).
🔷 Step 1 - Obtain Primary Public Key from 🔷Production🔷 Instance
🔷 Navigate the Production GUI to Admin > Backup and Restore
🔷 Access the Global Settings tab and copy the entire ssh-rsa Public Key field
🔶 Step 2 - Install Primary Public Key on 🔶Backup🔶 Instance via SSH
🔶 Connect to the Backup Instance via SSH and edit with nano /root/.ssh/authorized_keys
🔶 On a new empty line, paste the entire ssh-rsa key you copied in Step 1 and ensure an empty new line remains at the very end of the file, then save and close the file with Ctrl+o (letter O); Enter; Ctrl+x
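If you would rather script this step than edit the file by hand, the same result can be sketched in Python. `install_public_key` is a hypothetical helper, not something shipped with FreePBX. It assumes it is run as root on the Backup Instance and that the `/root/.ssh` directory already exists.

```python
from pathlib import Path


def install_public_key(public_key, authorized_keys_path="/root/.ssh/authorized_keys"):
    """Append public_key on its own line unless it is already present.

    Returns True if the file was changed, False if the key was already there.
    """
    key = public_key.strip()
    path = Path(authorized_keys_path)
    existing = path.read_text() if path.exists() else ""
    if key in (line.strip() for line in existing.splitlines()):
        return False
    # Make sure the previous entry ends with a newline before appending.
    if existing and not existing.endswith("\n"):
        existing += "\n"
    # The trailing "\n" leaves the empty final line the step asks for.
    path.write_text(existing + key + "\n")
    return True
```

Running it twice with the same key leaves the file unchanged the second time, so it is safe to repeat if you are unsure whether the key was already installed.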
🔷 Step 3 - Create a new SSH FileStore on 🔷Production🔷 Instance
🔷 Navigate the Production GUI to Settings > FileStore, switch to the SSH tab, and click Add SSH Server
🔷 Populate the fields to create a new SSH FileStore connection to the Backup Instance and Submit
🔷 Step 3a (Optional) - Create a second S3 FileStore on 🔷Production🔷
If you wish to keep an additional copy of your most recent backup file in an S3 Bucket, set up a second FileStore connecting to S3. This is strongly advised, as it can help you restore your config and data even if both Production and Backup Instances are somehow lost or damaged.
◼️ If necessary, create a set of IAM credentials with Access Key ID and Secret Access Key. YOU MUST BE CERTAIN TO SAFEGUARD THIS INFORMATION AS IT GRANTS FULL ACCESS TO YOUR AWS S3 BUCKETS AND WOULD BE VERY DANGEROUS IN THE WRONG HANDS!!!
◼️ If necessary, create a new S3 Bucket in the same region as your Production Instance to house your backup files
🔷 Navigate the Production GUI to Settings > FileStore, switch to the S3 tab, and click Add S3 Bucket
🔷 Populate the fields to create a new S3 FileStore connection to your desired Bucket and Submit
🔶 Step 4 - Create a new API Connection on 🔶Backup🔶 Instance
🔶 Navigate the Backup GUI to Connectivity > API, then click Add Application > Machine-to-Machine App
🔶 Fill in Your App Name and Description, supply gql:backup in the Allowed Scopes field, then click Add Application
◼️ You must now make note of the Token URL, GraphQL URL, Client ID, and Client Secret for use in the next step. YOU MUST BE CERTAIN TO SAFEGUARD THIS INFORMATION AS IT GRANTS READ/WRITE ACCESS TO YOUR FREEPBX BACKUPS AND WOULD BE VERY DANGEROUS IN THE WRONG HANDS!!!
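Before moving on, you may want to confirm that the values you noted actually work. The sketch below builds a standard OAuth2 client-credentials request (RFC 6749, section 4.4) using only the Python standard library. It rests on assumptions: that the Token URL accepts the credentials as form fields and replies with a JSON body containing `access_token`. The function names and example URL are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(token_url, client_id, client_secret, scope="gql:backup"):
    """Build (but do not send) an OAuth2 client-credentials token request."""
    form = urllib.parse.urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            "scope": scope,
        }
    ).encode("ascii")
    return urllib.request.Request(token_url, data=form, method="POST")


def fetch_access_token(request, timeout=10):
    """Send the request and return the access_token field of the JSON reply.

    Assumes a standard OAuth2 JSON response; raises on HTTP or network errors.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["access_token"]
```

A call such as `fetch_access_token(build_token_request(token_url, client_id, client_secret))` should return a token string if the application was created correctly. Avoid pasting the Client Secret into shell history or shared scripts.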
🔷 Step 5 - Create a new Backup Job on 🔷Production🔷 Instance
🔷 Navigate the Production GUI to Admin > Backup and Restore, then click Add Backup
❗️CAVEATS TO BE AWARE OF❗️
🔷 Populate the fields and set the parameters of your Warm Spare backup job
🔶🔷 Step 6 - Add 🔶Backup🔶 Local Network to 🔷Production🔷 Instance
🔶 This step is necessary so that calls are routed over NAT correctly when a fail-over is performed. First, navigate the Backup Instance GUI to Settings > Asterisk SIP Settings and copy the Local Networks CIDR shown there. NOTE: You can also source this from your VPC Subnets page on the AWS Console based on the Availability Zone the Backup Instance is located in, but copying it from the GUI is less confusing to the average user!
🔷 Navigate the Production Instance GUI to Settings > Asterisk SIP Settings and paste the CIDR you copied from the Backup Instance or your VPC Subnet information. If an extra blank field is not showing, click the Add Local Network Field button. You should now have two sets of Local Networks showing, one for each Instance. NOTE: If you have both Instances in a single Availability Zone, they will have the same Local Network CIDR and you do not need to duplicate this entry in the list!
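To check which Local Network entries you should end up with, the comparison described above can be sketched with Python's standard `ipaddress` module. The helper names and the CIDRs in the usage note are illustrative placeholders, not values from your VPC.

```python
import ipaddress


def merge_local_networks(*cidrs):
    """Return the distinct networks in first-seen order as CIDR strings."""
    seen = []
    for cidr in cidrs:
        # strict=False tolerates host bits, e.g. "172.31.5.10/20".
        network = ipaddress.ip_network(cidr, strict=False)
        if network not in seen:
            seen.append(network)
    return [str(network) for network in seen]


def covers(cidr, private_ip):
    """Return True if private_ip falls inside cidr."""
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(private_ip) in network
```

For Instances in different Availability Zones, `merge_local_networks("172.31.0.0/20", "172.31.16.0/20")` yields two entries. If both Instances share a subnet, the same call with identical CIDRs yields one, matching the note about not duplicating the entry. `covers` lets you confirm that each Instance's Private IP sits inside one of the listed networks.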
Your Warm Spare Setup is now complete. Data will be cloned from the Production Instance to the Backup Instance once per hour to keep it up-to-date.
This concludes the general Setup Guide.
The following sections cover additional information about Warm Spare High Availability Strategies, such as how to Fail Over and Upgrade your Instances.
...Fail Over to the 🔶Backup🔶 Instance
◼️ If your Production Instance fails for any reason or you need to keep call services active while you upgrade the Production Instance (24-hour offices, etc), switching to the Backup Instance is as simple as moving your Elastic IP to the Backup Instance via the EC2 Console or AWS APIs.
❗️If your Trunking setup necessitates disabling trunks on the Backup Instance, you'll also need to enable them once you've moved the Elastic IP to the Backup Instance
📌 Switching back to the Production Instance is completed in reverse: Disable Trunks on Backup (if necessary), then swap Elastic IP to Production Instance
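For those who prefer the AWS APIs over the EC2 Console, moving the Elastic IP can be sketched as follows. `move_elastic_ip` is a hypothetical wrapper around the EC2 `associate_address` call. It assumes you pass in a boto3 EC2 client created for the Region both Instances live in, along with your own Allocation ID and Instance ID. It does not touch Trunks, which still need to be enabled or disabled in the GUI where your setup requires it.

```python
def move_elastic_ip(ec2_client, allocation_id, target_instance_id):
    """Associate the Elastic IP with target_instance_id.

    AllowReassociation=True lets the address move even though it is
    currently attached to the other Instance.
    """
    response = ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=target_instance_id,
        AllowReassociation=True,
    )
    return response["AssociationId"]


# Example usage (assumes boto3 is installed and credentials are configured;
# the Region and IDs below are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   move_elastic_ip(ec2, "eipalloc-0123456789abcdef0", "i-0123456789abcdef0")
```

Failing back uses the same call with the Production Instance ID as the target.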
...Upgrade your Instances
📌 If you need to keep call services active during the upgrade, you may fail over to the 🔶Backup🔶 Instance before proceeding
❗️Create a Machine Image backup of the 🔷Production🔷 Instance via the EC2 Console before performing any upgrades
🔷 Temporarily Disable the Schedule of the Warm Spare Backup Job on the Production Instance
🔷 Run SmartUpgrade on the Production Instance
📌 If you failed over to the 🔶Backup🔶 Instance to start with, you should switch back to 🔷Production🔷 at this time
🔶 Run SmartUpgrade on the Backup Instance
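The Machine Image backup called for at the start of this procedure can also be created through the AWS API instead of the EC2 Console. The sketch below wraps the EC2 `create_image` call. `create_pre_upgrade_image` and its naming scheme are hypothetical, and it assumes a boto3 EC2 client is passed in. Note the trade-off in `no_reboot`: leaving it True avoids interrupting calls, but AWS cannot guarantee file system integrity for an image taken from a running instance.

```python
from datetime import datetime, timezone


def create_pre_upgrade_image(ec2_client, instance_id, no_reboot=True):
    """Request a Machine Image of instance_id and return the new ImageId."""
    # AMI names may not contain colons, so use a compact UTC timestamp.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
    response = ec2_client.create_image(
        InstanceId=instance_id,
        Name=f"freepbx-pre-upgrade-{instance_id}-{stamp}",
        Description="Taken before running SmartUpgrade",
        NoReboot=no_reboot,
    )
    return response["ImageId"]


# Example usage (placeholder Region and Instance ID):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   create_pre_upgrade_image(ec2, "i-0123456789abcdef0")
```

The call returns as soon as the image request is accepted. Wait until the image shows as available in the EC2 Console before starting the upgrade.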