High Availability Fail-Over Support for Instant Recovery

Updated: 2019-09-25

INTRODUCTION:

Now you can have TRUE HA clustering of FreePBX on AWS! Inspired by Sangoma's Local HA Cluster support for FreePBX, we have developed our very own HA Cluster solution custom designed from the ground up for AWS. This has been over a year of constant development and rigorous testing. Our solution provides the same level of robust fail-over support for all services as you will find in the Local FreePBX HA solution and leverages AWS services in place of the stringent hardware requirements you would find in a local HA setup. We use the Elastic File System (EFS) - the AWS NFS solution - in place of DRBD and a Relational Database Service (RDS) Aurora MySQL instance in place of a physical shared SQL datastore, which also provides for immersive live cluster monitoring and management even from outside the cluster itself! By using EFS and RDS, you have the same near-infinite ability to scale up the size of your cluster to meet your organization's needs now and in the future!

The best part of our HA Cluster solution is the price: FREE! That's right; we do not charge for our HA Cluster solution. You only pay the normal instance charges that will be associated with the second Backup node, RDS, EFS instances AND, per Sangoma licensing policies, a duplicate set of your paid Commercial module licenses from Sangoma for the Backup server node Deployment ID. With Amazon's Reserved Instances (https://aws.amazon.com/ec2/pricing/reserved-instances/) and our Annual Subscriptions (https://aws.amazon.com/marketplace/library), you can save up to 56% on your instance charges for your cluster by paying for them annually instead of hourly; a savings that can entirely pay for the Backup node instance!

Now, before we get started, there are several VERY IMPORTANT things you must know:

  • While you CAN add clustering support to any existing production instance, you MUST UPGRADE YOUR INSTANCE TO AMI v3.x IF YOU ARE STILL RUNNING AMI v2.x! This is accomplished via our Instance Migration Wizard or an In-Place Upgrade. More information about the transition to v3.x can be found here: https://www.thewebmachine.net/aws-freepbx-v3

  • If you will be converting an existing single-server production instance to a cluster, it is important to note that you will likely have to increase the instance size slightly over what you are currently using in order to accommodate the additional overhead involved in the automated cluster management and synchronization. For example, if you are running a t2.large, you may need to grow to a t2.xlarge instance size if you notice performance degradation. This generally isn't an issue in larger instance types with an abundance of vCPUs, as that is the resource most taxed by cluster management. 
     

  • The AWS Elastic File System (EFS) and, therefore, CLUSTER SUPPORT IS ONLY AVAILABLE IN THE FOLLOWING REGIONS AT THIS TIME: us-east-1 (N.Virginia), us-east-2 (Ohio), us-west-1 (N.California), us-west-2 (Oregon), ca-central-1 (Canada), eu-west-1 (Ireland), eu-west-2 (London), eu-west-3 (Paris), eu-central-1 (Frankfurt), ap-southeast-1 (Singapore), ap-southeast-2 (Sydney), ap-northeast-1 (Tokyo), ap-northeast-2 (Seoul), ap-south-1 (Mumbai). There is no way to work around this limitation, as EFS is not accessible to other regions outside of its local region for security reasons per AWS policy and inherent security limitations of the underlying NFS. AWS is expanding EFS support to all regions systematically. If your region is not currently supported, it likely will be soon. We will update this page as new regions are added.

  • If you are trying to convert an existing production server into a cluster and this existing production server instance is NOT located in one of the above regions, you will need to move your instance to a supported region. You can find well written instructions for accomplishing this here: https://www.cloudberrylab.com/blog/how-to-move-amazon-ec2-to-different-availability-zone-vpc-region/
     

  • Plan your time appropriately!!! This process WILL take 1-5 hours to fully complete, mostly depending on whether this is a new clean cluster or an existing Production conversion, AND ALL SERVICES ON AN EXISTING PRODUCTION SERVER WILL BE DISRUPTED FOR THE ENTIRE FIRST HALF OF THE INSTALLATION PROCESS.

  • Given that this process will take so long, we strongly recommend that you launch into a tmux session before starting the cluster-install-wizard to protect you from accidental disconnects, which would interrupt the wizard and can cause problems. You can view our primer on tmux here:  https://twm.tips/tmux

REQUISITE COMPONENTS:

(We will walk you through each of these components below. This is just a summary for quick reference.)

To help you keep track of all of the cluster information as we create each component, here is a handy worksheet. Simply copy this into a text editor and fill in each field as you complete this guide. Then, when you are ready to begin the Cluster Install Wizard down below, you can copy/paste the information more easily from just one location. The fields below are in the same order as you'll supply them to the Cluster Install Wizard:

ClusterRDSendpoint= 
ClusterRDSusername= 
ClusterRDSpassword= 
ClusterEFSendpoint= 
ElasticIP= 
ElasticAllocationID= 
AWSaccessKeyID= 
AWSsecretAccessKey= 
AWSregion= 
MailHost= 
MailUser= 
MailPass= 
MailFrom= 
MailTo= 

NOTE: If you wish to receive email alerts from the Cluster server nodes (strongly recommended!!), you will need to know the SMTP server Hostname/IP, username, and password for an appropriate account on your company's email server/service. If you use Google's GMail or GSuite for Business/Non-Profit, the MailHost= is smtp.gmail.com and the MailPass= MUST be a Google App Password IF you use multi-factor authentication with the sending email account you intend to use here. MailFrom= usually must be the same as the MailUser= unless there are multiple aliases assigned to the account. MailTo= can be any normal email recipient address and can even include multiple addresses separated by a comma (,) but do NOT include any spaces between the addresses. If you do NOT wish to receive email alerts or you wish to skip this for now (you can add this at any time later), simply leave the MailHost= parameter blank during the Cluster Install Wizard and the rest of the Mail parameters will be skipped.
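Before you start the Wizard, it can help to confirm that no Worksheet field was left blank. The following is only a minimal sketch, assuming you saved the Worksheet as KEY=value lines in a text file; the function names are hypothetical helpers and are not part of SmartUpgrade.

```shell
#!/bin/sh
# Hypothetical helpers (not part of SmartUpgrade): report Worksheet fields
# that are still empty in a KEY=value text file.

# Print the value stored for one key, trimmed of surrounding spaces.
worksheet_get() (
    key=$1; file=$2
    sed -n "s/^${key}=[[:space:]]*//p" "$file" | sed 's/[[:space:]]*$//' | head -n 1
)

# Print the name of every required field that is empty. The Mail fields
# are only required when MailHost is filled in, mirroring the NOTE above.
worksheet_missing() (
    file=$1
    required="ClusterRDSendpoint ClusterRDSusername ClusterRDSpassword ClusterEFSendpoint ElasticIP ElasticAllocationID AWSaccessKeyID AWSsecretAccessKey AWSregion"
    if [ -n "$(worksheet_get MailHost "$file")" ]; then
        required="$required MailUser MailPass MailFrom MailTo"
    fi
    for key in $required; do
        [ -n "$(worksheet_get "$key" "$file")" ] || echo "$key"
    done
    # MailTo may hold several comma-separated addresses, but no spaces.
    case "$(worksheet_get MailTo "$file")" in
        *" "*) echo "MailTo (contains spaces)" ;;
    esac
)
```

Running `worksheet_missing cluster-worksheet.txt` (the file name is just an example) prints nothing when every required field is filled in.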

PREPARE YOUR REQUISITES:


GENERAL: AWS Access Key and Secret Key
First, we will create a set of IAM user credentials for AWS so that your cluster server nodes can manage themselves and the ElasticIP assignment automatically. You'll start by going to the following AWS page: https://console.aws.amazon.com/iam/home?#/users

On this page, you will click Add User. Enter a Username (this is for your reference only) and choose the Programmatic Access option. Then select the Attach Existing Policies Directly tab and search for the AmazonEC2FullAccess permission. Once you click the Create User button on the last page, you MUST save the Access Key ID and Secret Access Key (click the 'show' link) for use during the cluster setup. You can also download this information in csv format for your records. EITHER WAY, YOU MUST BE CERTAIN TO SAFEGUARD THIS INFORMATION AS IT GRANTS FULL ACCESS TO YOUR AWS EC2 CONSOLE AND WOULD BE VERY DANGEROUS IN THE WRONG HANDS!!! If this information does become compromised in the future, you can return to this page, delete the user, create a new one, and reprogram your cluster with the new keys. Copy/paste this information into your Worksheet for the AWSaccessKeyID= and AWSsecretAccessKey= parameters.

NOTE: If you are creating multiple clusters across different AWS regions (or even within the same region), you only need to create one set of IAM Keys. This one account can be used by all of your clusters across all of your regions. However, if you ever revoke this key pair (as you are generally recommended to rotate keys on a schedule for optimal security), you will have to update all of your clusters simultaneously. If you use separate key pairs per cluster or per region, this minimizes the number of clusters that need to be changed simultaneously when you rotate keys.


The remaining requisite components are AWS Region specific!
From this point forward, you must commit to using one of the following AWS Regions for your cluster and all components must be housed within this single region. Make note of this region ID on your Worksheet for the AWSregion= parameter:

  • us-east-1 == N.Virginia

  • us-east-2 == Ohio

  • us-west-1 == N.California

  • us-west-2 == Oregon

  • ca-central-1 == Canada

  • eu-west-1 == Ireland

  • eu-west-2 == London

  • eu-west-3 == Paris

  • eu-central-1 == Frankfurt

  • ap-northeast-1 == Tokyo

  • ap-northeast-2 == Seoul

  • ap-southeast-1 == Singapore

  • ap-southeast-2 == Sydney

  • ap-south-1 == Mumbai
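If you want to double-check a region ID before writing it on your Worksheet, the list above can be expressed as a small shell test. This is a minimal sketch with hypothetical function names; the region list is copied from this page as of its last update and will go stale as AWS adds regions.

```shell
#!/bin/sh
# Hypothetical helpers: check a region ID against the list on this page.

# Return 0 when the region ID appears in the supported list above.
cluster_region_supported() {
    case $1 in
        us-east-1|us-east-2|us-west-1|us-west-2|ca-central-1|\
        eu-west-1|eu-west-2|eu-west-3|eu-central-1|\
        ap-northeast-1|ap-northeast-2|ap-southeast-1|ap-southeast-2|ap-south-1)
            return 0 ;;
        *)  return 1 ;;
    esac
}

# Strip the trailing zone letter from an Availability Zone name,
# e.g. us-east-1a becomes us-east-1.
az_to_region() {
    printf '%s\n' "${1%[a-z]}"
}

# Example: an existing instance in us-east-1a is in a supported region.
if cluster_region_supported "$(az_to_region us-east-1a)"; then
    echo "supported"
fi
```

You can read the Availability Zone of an existing instance from the Instances page of the EC2 Console and pass it to `az_to_region`.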

 

NOTE: GovCloud support is coming soon to US-Gov-West!


Again, if you are trying to convert an existing production server into a cluster and this existing production server instance is not located in one of the above regions, you will need to move your instance to a supported region. You can find well written instructions for accomplishing this here:
https://www.cloudberrylab.com/blog/how-to-move-amazon-ec2-to-different-availability-zone-vpc-region/

If you are adding cluster support to an existing production server, plan your time appropriately!!! This process WILL take 2-5 hours to fully complete depending on how much spool data is currently stored AND ALL SERVICES ON YOUR EXISTING PRIMARY SERVER WILL BE DISRUPTED FOR THE ENTIRE FIRST HALF OF THE INSTALLATION PROCESS!

 

 

Elastic IP Allocation

Once you have decided on your cluster region, we need information regarding an Elastic IP you wish to use with your cluster. If you are converting an existing Production instance, you should already have an Elastic IP assigned to it. If this is a brand new setup, you will need to create an Elastic IP for cluster use. Go to your EC2 Console and choose Elastic IPs under Network & Security or click here: https://console.aws.amazon.com/ec2/v2/home?#Addresses

Adding an Elastic IP is very simple. Click the Allocate New Address button, then click Allocate on the next page. That's it! You will be told if the allocation is successful. This can fail if you already have 5 Elastic IPs allocated in this region. You will need to contact AWS at the link they provide if you are denied for reaching this limit. AWS will approve most requests for more Elastic IPs, but can deny requests in regions with limited availability or if abuse of allocation (hoarding) is suspected. HOWEVER, IPv4 ADDRESS AVAILABILITY IS LIMITED INTERNET-WIDE AND YOU SHOULD REQUEST ONLY AS MANY IPv4 ADDRESSES AS YOU ABSOLUTELY NEED! (Learn more about IPv4 address scarcity and the struggle to move the internet to IPv6 here: https://en.wikipedia.org/wiki/IPv4_address_exhaustion)

With an Elastic IP allocated for your cluster, you'll want to make note of the following information from the main Elastic IPs page: Elastic IP and Allocation ID. Copy/paste this information into your Worksheet (ElasticIP= and ElasticAllocationID=) for easy reference during install.
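If you prefer the AWS CLI to the console, the same allocation and lookup can be done from a terminal. The sketch below only PRINTS the commands so you can review them first; it assumes the AWS CLI is installed and configured, and the function name is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) AWS CLI commands that allocate an
# Elastic IP and list the two values the Worksheet needs.
eip_commands() {
    region=$1
    # Allocates one new address; the JSON reply includes PublicIp and AllocationId.
    printf 'aws ec2 allocate-address --domain vpc --region %s\n' "$region"
    # Lists every Elastic IP in the region with its Allocation ID.
    printf 'aws ec2 describe-addresses --region %s --query "Addresses[].[PublicIp,AllocationId]" --output text\n' "$region"
}

eip_commands us-east-1
```

PublicIp goes on the Worksheet as ElasticIP= and AllocationId as ElasticAllocationID=. Skip the first command if you are reusing the Elastic IP of an existing Production instance.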

Cluster Services Security Group

The Cluster Services Security Group (SG) is used to simultaneously secure and interconnect all components of your cluster so they can freely communicate with one another. This SG is very basic, as it points only to itself. However, in order to create a self-sourced entry, the SG must first exist. Then, you will edit it to add the self-sourced entry. Go to the Security Groups page under Network & Security or click here: https://console.aws.amazon.com/ec2/v2/home?#SecurityGroups

First, click Create Security Group, enter the Name "Cluster Services" and a desired Description and then click Create. Once created, you'll want to Edit Inbound Rules and add a single entry with All Traffic set as the Type. In the Source field Custom option, type "sg" and your existing Security Groups will be listed. Choose the Cluster Services SG and it will fill in the rest of the ID for you. Save the rule.

You'll use this Cluster SG when we create the RDS, EFS and EC2 instances below.
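For reference, here is roughly how the self-sourced Security Group looks when built with the AWS CLI instead of the console. This is a sketch that only PRINTS the commands; the function names, the description text, and the example IDs are placeholders, and it assumes the AWS CLI is installed and configured.

```shell
#!/bin/sh
# Hypothetical helpers: print (do not run) the AWS CLI equivalents of the
# two console steps above.

# Step 1: create the empty group in your VPC. The reply contains its GroupId.
cluster_sg_create_command() {
    printf 'aws ec2 create-security-group --group-name "Cluster Services" --description "HA cluster interconnect" --vpc-id %s\n' "$1"
}

# Step 2: allow All Traffic whose source is the group itself.
cluster_sg_selfrule_command() {
    printf 'aws ec2 authorize-security-group-ingress --group-id %s --protocol all --source-group %s\n' "$1" "$1"
}

cluster_sg_create_command vpc-0123456789abcdef0
cluster_sg_selfrule_command sg-0123456789abcdef0
```

The second command needs the GroupId returned by the first, which is the same reason the console makes you create the group before you can add the self-sourced rule.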

Relational Database Service (RDS) Aurora MySQL Instance

The RDS Instance will store all Asterisk and FreePBX DBs, including your CDR/CEL, as well as the main Cluster Management DB. This allows both members of the cluster to share configuration information in real-time. Setting up a new RDS instance is also rather easy. You only have to concern yourself with the parameters and options we outline below. You should leave ALL unmentioned fields at their defaults for optimal results. Start by accessing the RDS management section of the EC2 Console by clicking here: https://console.aws.amazon.com/rds/home?#dbinstances

Click Launch DB Instance:

  1. Select Engine: Leave the default Aurora and MySQL 5.6 options

  2. Specify DB Details:

    • -Instance Specifications-
      • DB Instance Class: If your EC2 Instances will be of a Medium or Large size of any type (t2, m4, etc), you may use one of the db.t2 Classes. If you are using a larger instance size on EC2, use a db.r3 or db.r4 Class. This does NOT have to match your EC2 Instance size (Large, XLarge, etc) but may have to be LARGER than the EC2 Instance size, especially if you have a high call volume, use queues, or engage in heavy amounts of call recording.

      • DB Instance Identifier: Give this a name that makes sense.

    • -Settings-

      • Master Username: This can be whatever you want and should not be something as easy to guess as "admin". You will want to make a note of this on your Worksheet under ClusterRDSusername=

      • Master Password: This should be a secure password with more than 8 characters, so long as it doesn't include the following special characters: [ ] { } \ | / ; " ' & # or @. You will want to make a note of this on your Worksheet under ClusterRDSpassword=

  3. Configure Advanced Settings:

    • -Network & Security-
      • Public Accessibility: YES - This ensures full proper IP access for your cluster. It is protected by the Cluster SG and will NOT be visible on the internet.

      • VPC Security Groups: You will change this to Choose existing..., REMOVE the default entry listed, then add the Cluster Services SG you created earlier.

    • -Maintenance-

      • Auto Minor Version Upgrade: Change this to Disable...

      • Maintenance Window: Change this to Select Window and specify a day of the week and time of the day to perform minor maintenance operations on your RDS instance.

  4. Click the Launch DB Instance button: It can take 5-10 minutes for the RDS instance to become fully ready. You can monitor the progress and obtain the final piece of RDS information for your Worksheet by clicking the View DB Instance Details button on the confirmation page. Once RDS is live, the Endpoint in the -Connect- section will display the endpoint address of the server. Copy this to your Worksheet for the ClusterRDSendpoint= parameter.
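The Master Password rules from the -Settings- step above (more than 8 characters, none of the listed special characters) are easy to get wrong when pasting from a password generator. Here is a minimal sketch of a check you could run first; the function name is a hypothetical helper and the rules are taken only from this page, not from any AWS validation logic.

```shell
#!/bin/sh
# Hypothetical helper: return 0 when a candidate Master Password follows the
# rules on this page (longer than 8 characters, none of the characters
# [ ] { } \ | / ; " ' & # @).
rds_password_ok() {
    pw=$1
    [ "${#pw}" -gt 8 ] || return 1
    case $pw in
        *"["*|*"]"*|*"{"*|*"}"*|*"\\"*|*"|"*|*"/"*|*";"*|*'"'*|*"'"*|*"&"*|*"#"*|*"@"*)
            return 1 ;;
    esac
    return 0
}

# Example usage (single-quote the password so the shell does not alter it):
#   rds_password_ok 'My-Candidate-Pass9' && echo "OK to use"
```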

Elastic File System (EFS)

The Elastic File System will house a copy of all of the synchronized files between the cluster nodes. In the Cluster Management Guide, you can learn how to add virtually any custom directory on your server nodes to be synchronized, ensuring that even your custom applications can be protected by fail-over to the Backup node. To get started, navigate to the EFS service on the EC2 Console or click here: https://console.aws.amazon.com/efs/home?/filesystems

Click the Create File System button, DELETE all of the Default Security Group entries for EVERY Availability Zone, then add your Cluster Services SG to EVERY Zone. On Step 2, simply provide a Name for the EFS instance. Finally, review and click Create File System on Step 3. 

After the File System is created, you will be shown a confirmation screen that displays the EFS Endpoint address (labeled "DNS Name"). You will want to copy this to your Worksheet for the ClusterEFSendpoint= parameter.
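As a sanity check on the value you copied, the EFS DNS Name normally follows the pattern file-system-id.efs.region.amazonaws.com. The sketch below rebuilds that name from its parts so you can compare it with your Worksheet entry; the function name and the example file system ID are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: build the expected EFS DNS Name from the file system
# ID and the region, for comparison with the ClusterEFSendpoint= value.
efs_dns_name() {
    printf '%s.efs.%s.amazonaws.com\n' "$1" "$2"
}

efs_dns_name fs-12345678 us-east-1
```

If the name on the confirmation screen does not match this pattern for your region, re-check that the file system was created in the same region as the rest of your cluster.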

EC2 AWS FreePBX Server Instances

UPDATE 2018-11-11: We have released an Unattended node setup method. This is intended as an alternative to the Wizard method below, but can only be used on new instances being freshly launched. So, if you are converting an existing Production instance, you will need the Wizard method below to convert the Production server to the Primary node before using the Unattended method to create a fresh Backup node.

The final requisite components we must prepare are the actual AWS FreePBX server nodes. There are two scenarios here:

  1. If you already have a Production instance (and it is running v3.0 AND in an EFS-enabled region), then you only need to launch one new instance per the instructions below. You will set all Instance launch parameters to be identical to what you have set for your current Production instance with ONE EXCEPTION:

    • You will want to select a DIFFERENT Availability Zone for the new Backup node than the current instance. You can identify which AZ your current instance is in via the EC2 Console. When you launch a new instance, the AZ is selected by choosing a specific Subnet during Step 3 of the Launch Instance Wizard. We will point this out again below, when appropriate.

  2. If you are creating a brand new Production environment from the start, you will follow these steps TWICE while choosing a different Subnet for each one. DO NOT CHANGE Number of Instances to more than 1 in Step 3 as a shortcut; this will create both in the same AZ, which will defeat one of the primary purposes of the HA Cluster...recovery from a complete failure of an AWS datacenter location. Different Availability Zones = Different Physical Datacenters in the Same Local Region = Better Protection from Downtime!

STEP 1:

Click one of the buttons below based on which AWSRegion you are building your Cluster in. This will open a new window/tab directly into the Launch Instance Wizard:

STEP 2:

Choose an instance size EITHER identical to your current Production server or appropriate for your new environment. You may review the EC2 Console Instances section for the size of your current Production server instance. If you are unsure of what size to choose for your new environment, refer to this page for suggestions: EC2 Deployment Guide

STEP 3:

You will only need to change two options on this page from their defaults and there is a third option to consider if you chose a t2 type in Step 2:

  • Subnet: Again, you want to change this to a DIFFERENT subnet for each of the 2 instances you launch (or from the Primary server already in Production). Some AWS Regions may have more or fewer subnets than listed in the image below.

  • Enable Termination Protection: CHECKED! - This will ensure that you can't accidentally terminate your instances from the EC2 Console.

  • T2 Unlimited (OPTIONAL): This option is for t2 types only and allows an instance to burst well beyond its size limit for as long as it needs to (for an extra charge based on resource usage). You MAY want to enable this option if you are using a lower size and want to avoid random performance issues on occasional busy days. 

STEP 4:

You may specify a custom size ONLY for the EBS /dev/sdb Volume if you desire. DO NOT CHANGE THE SIZE OF THE ROOT /dev/sda1 VOLUME! The Volume sizes of BOTH server Instances must be IDENTICAL! If you are launching a small to medium cluster or plan to use features like our S3 Sync for Call Recordings, IMAP Storage for VoiceMail, and/or Auto File Deletion for Call Recordings, you can most likely leave all settings here at their defaults.

STEP 5:

You will want to add a new tag with a Key of "Name" and a Value with something to identify your cluster instances (ex "Main Office Cluster"). On each of the two instances, one should specify "Primary" and one should specify "Backup" in the Value. This will ensure you can identify each of the nodes individually on the EC2 Console.

STEP 6:

You will want to accept the default Security Group provided by us on this page UNLESS you have customized your own copy of this SG for your existing Production instance. In this case, change Assign a Security Group to "Select an Existing Security Group" and choose your existing Production SG. If this is a new environment, use our provided SG for the first instance you launch, then "Select an Existing Security Group" and choose the newly created SG when launching the second instance.

STEP 7:

Review and then Launch the new Instance. Repeat, if necessary, to create a second Instance.

STEP A:

Once both Instances are launched, we need to assign the Cluster Services Security Group to both servers as an additional SG. From the EC2 Console Instances section, select each Cluster Instance, choose Actions, Networking, Change Security Groups. In the dialog, leave the "AWS FreePBX" SG checked and also check the "Cluster Services" SG. Then click Assign Security Groups.
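The same change can be made with the AWS CLI. One thing to watch: the --groups option of modify-instance-attribute REPLACES the instance's whole Security Group list, so you must list the existing "AWS FreePBX" SG as well as "Cluster Services" (the CLI equivalent of leaving the first box checked). The sketch below only PRINTS the command; the function name and all IDs are placeholders, and it assumes the AWS CLI is installed and configured.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) the command that sets the full
# Security Group list of one instance. Pass the instance ID first, then
# EVERY group ID the instance should end up with.
attach_cluster_sg_command() {
    instance_id=$1
    shift
    printf 'aws ec2 modify-instance-attribute --instance-id %s --groups %s\n' "$instance_id" "$*"
}

# Example: keep the AWS FreePBX SG (first ID) and add Cluster Services (second ID).
attach_cluster_sg_command i-0123456789abcdef0 sg-aaaa1111 sg-bbbb2222
```

Run it once for the Primary instance and once for the Backup instance.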

CLUSTER INSTALL WIZARD:

Now that we finally have all the requisite components ready, we can connect to the first Instance via SSH and set up the Cluster with SmartUpgrade. This first server Instance will be called the Primary Node from now on. If you have an existing Production Instance that we are converting to a Cluster, this existing server is your Primary Node and MUST be converted to a cluster node before connecting the new Instance to the cluster.

If you are unfamiliar with SmartUpgrade and connecting to your AWS FreePBX Instances via SSH, please see this forum page for more information: https://twm.tips/su

Once connected to the Primary Node via SSH, it is important to run SmartUpgrade through one normal run to ensure everything on the server is up to date:

 smartupgrade --auto 

We strongly advise that you utilize the tmux utility to ensure the Cluster Install Wizard is not interrupted if you get disconnected from the SSH console. Tmux is a great terminal multiplexer that will retain your running terminals even if you disconnect from SSH, allowing commands to continue to run. More information on tmux can be found here: https://twm.tips/tmux  In short, you can run this command to both start a new tmux session AND reconnect to an existing tmux session in the event you get disconnected. Replace "MySession" with a session name of your choice:

 tmux new -A -s MySession 

 

You'll know you are in tmux when you see a green bar displayed at the bottom of the SSH console:

With the server fully up-to-date and tmux running, we can officially begin the Cluster Install Wizard. Run this command to get started:

 smartupgrade cluster-install-wizard 

The Wizard will first ask you for the Role of this machine. Type Primary and press ENTER. Proceed to fill in the requested information by copying/pasting from the Worksheet you created with all of the necessary fields. (In PuTTY, a right-click in the terminal window will Paste.) At the end, it will ask you if all of the information has been entered correctly. If so, it will then begin setting up the various core utilities needed for the Cluster.


After this, it will advise you that it is ready to begin migrating the initial copy of data into the EFS and RDS instances. THIS PROCESS WILL TAKE AT LEAST AN HOUR, but is fully automated. When it is finished, it will show the preliminary Cluster Status and return you to the prompt. You are now ready to repeat this process with the Backup Node. WARNING: DO NOT START THE WIZARD ON THE BACKUP NODE UNTIL THE PRIMARY NODE IS COMPLETELY FINISHED!!! You may, however, run the normal SmartUpgrade process to ensure the Backup Node is fully up-to-date.

Once you connect to the Backup node via SSH, you'll run the same commands to first fully update your instance and then start tmux and the Wizard:

 smartupgrade --auto 

 tmux new -A -s MySession 

 smartupgrade cluster-install-wizard 

This time, you will tell the Wizard that this is the Backup node and it will only ask you for the Cluster RDS information before confirming that the information is correct (it will obtain everything else it needs from RDS automatically). Once confirmed, it will install necessary core utilities and then offer to begin the integration with the Cluster. THIS PROCESS WILL TAKE AT LEAST AN HOUR, but is fully automated just as before. When it is finished, it will also show the current Cluster Status and then return you to the prompt.

THAT'S ALL FOLKS!

That's it! Once the Backup node has finished connecting to the cluster and syncing the initial data store, both servers will automatically switch to a "Normal" Cluster status with all calling services and the Elastic IP address running on the Primary Node. You should be able to connect to the FreePBX Web GUI, make configuration changes, connect endpoints and trunks, and everything else you would normally do with FreePBX.

 

If a failure occurs with the Primary Node, the cluster will automatically switch calling services and the Elastic IP to the Backup Node, which will immediately begin serving calls. You can expect that the switch to Backup will take at least 15 seconds after a failure on Primary. Please reference the Cluster Management Guide for suggestions if you find that fail-overs take longer than 30 seconds to complete (when the Elastic IP switches to the other node).

You MUST now familiarize yourself with the Cluster Management Guide, as this tells you everything you need to know to properly manage and maintain your Cluster. Read and understand this material NOW...so you aren't scrambling to do so in an emergency. No cluster solution is 100% guaranteed or without maintenance needs. Knowing what your cluster can and cannot recover from, as well as how to perform tasks like SmartUpgrade, is vitally important to the long term health of your Cluster. A cluster is not a replacement for proper maintenance of your FreePBX environment!

As always, our friendly and knowledgeable Support Personnel are here to help if you need us! Just click the Live Chat or Support Request link at the top or bottom of this page. We are also always looking for ways to improve. If you have any comments or suggestions on how we may improve or clarify this documentation, please let us know!

It is also important that you are subscribed to our Updates Mailing list, as any critical updates available for your HA Cluster will be communicated via these email updates. You may subscribe now, if you aren't already, by clicking here.

 


FreePBX® is a Registered Trademark of Sangoma Technologies and is used with permission.
TheWebMachine Networks is a fully licensed and certified Sangoma partner.
Amazon Web Services is a trademark of Amazon.com, Inc. or its affiliates in the United States and/or other countries.
TheWebMachine Networks is an AWS Partner Network Member.
Rebar IT Outsourcing • TheWebMachine Networks
P.O. Box 271365 • Dallas, TX 75227 • USA