Building an elastic high availability MQTT broker cluster on AWS

Written by Florian Raschbichler

Category: HiveMQ Clustering AWS Third Party

Published: September 13, 2019


HiveMQ is a cloud-first MQTT broker with elastic clustering capabilities and a resilient software design, which makes it a good fit for common cloud infrastructures. A previous blog post discussed the benefits an MQTT broker cluster offers. Today’s post aims to be more practical and shows how to set up a HiveMQ cluster on one of the most popular cloud computing platforms: Amazon Web Services.

Update
This post has been updated for HiveMQ 4 and the use of AWS Network Load Balancer has been added.

Running HiveMQ on cloud infrastructure

Running a HiveMQ cluster on cloud infrastructure like AWS not only provides elastic scalability of the infrastructure, it also ensures that state-of-the-art security standards are in place on the infrastructure side. These platforms are typically highly available, and new virtual machines can be spawned in a snap if necessary. HiveMQ’s ability to add (and remove) cluster nodes at runtime without any manual reconfiguration of the cluster allows linear scaling on IaaS providers. New cluster nodes can be started (manually or automatically) and the cluster size adapts automatically. For more detailed information about HiveMQ clustering and how to achieve true high availability and linear scalability with HiveMQ, we recommend reading the HiveMQ Clustering Paper.

As Amazon Web Services is among the best known and most widely used cloud platforms, we illustrate the setup of a HiveMQ cluster on AWS in this post. Note that concepts similar to those shown in this step-by-step guide for running an elastic HiveMQ cluster on AWS also apply to other cloud platforms such as Microsoft Azure or Google Cloud Platform.

Setup and Configuration

Amazon Web Services prohibits the use of UDP multicast, which is the default HiveMQ cluster discovery mode. Using Amazon Simple Storage Service (S3) buckets for auto-discovery is a good alternative when the individual HiveMQ broker nodes are running on AWS EC2 instances. HiveMQ has a free pre-built extension available for AWS S3 Cluster Discovery.

The following is a step-by-step guide on how to set up the brokers on AWS EC2 with automatic cluster member discovery via S3.

Setup Security Group

The first step is to create a security group that allows inbound traffic to the listeners we are going to configure for MQTT communication. It is also vital to have SSH access to the instances. After you have created the security group, edit it and add an additional rule that allows internal communication between the cluster nodes on all TCP ports (meaning the source of the rule is the security group itself).

To create and edit security groups, go to the EC2 console - NETWORK & SECURITY - Security Groups.

[Screenshot: Inbound traffic]

[Screenshot: Outbound traffic]
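
The same security group can also be created from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured with sufficient permissions. The function name, the group name hivemq-cluster, and the wide-open 0.0.0.0/0 source ranges are illustrative assumptions and should be narrowed for real deployments.

```shell
# Sketch: create a security group for a HiveMQ cluster with the AWS CLI.
# Assumption: "aws" is installed and configured; all names are placeholders.
create_hivemq_security_group() {
    local vpc_id="$1" name="${2:-hivemq-cluster}" sg_id

    # Create the group and capture its ID.
    sg_id="$(aws ec2 create-security-group \
        --group-name "$name" \
        --description "HiveMQ cluster nodes" \
        --vpc-id "$vpc_id" \
        --query GroupId --output text)" || return 1

    # MQTT listener (1883) and SSH (22) from outside.
    aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
        --protocol tcp --port 1883 --cidr 0.0.0.0/0 >/dev/null || return 1
    aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
        --protocol tcp --port 22 --cidr 0.0.0.0/0 >/dev/null || return 1

    # Internal cluster traffic: all TCP ports, source is the group itself.
    aws ec2 authorize-security-group-ingress --group-id "$sg_id" \
        --protocol tcp --port 0-65535 --source-group "$sg_id" >/dev/null || return 1

    echo "$sg_id"
}

# Example (hypothetical VPC ID):
# create_hivemq_security_group vpc-0123456789abcdef0
```

The function prints the ID of the new group, which is needed later when the instances are launched.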

The next step is to create an S3 bucket in the S3 console. Make sure to choose a region close to the one you want to run your HiveMQ instances in.
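
For those who prefer the command line, the bucket can be created with a single AWS CLI call. This is a small sketch that assumes a configured AWS CLI; the function name, bucket name, and region in the example are placeholders.

```shell
# Sketch: create the S3 bucket used for cluster discovery.
# Assumption: "aws" is installed and configured; names are placeholders.
create_discovery_bucket() {
    local bucket="$1" region="$2"
    # "mb" (make bucket) creates the bucket in the given region.
    aws s3 mb "s3://${bucket}" --region "$region"
}

# Example (hypothetical names):
# create_discovery_bucket my-hivemq-discovery eu-central-1
```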

Create IAM role

We recommend giving your EC2 instances access to the S3 bucket through an IAM role: create a role for EC2 and attach the S3 full access policy to it.

[Screenshot: Create IAM Role]

[Screenshot: S3 Full Access Policy]
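
The role can be scripted as well. The sketch below assumes a configured AWS CLI with IAM permissions; the function name and the role name hivemq-s3-access are hypothetical. Unlike the console, the CLI does not create an instance profile automatically, so the sketch adds one with the same name as the role.

```shell
# Sketch: IAM role that lets EC2 instances access S3.
# Assumption: "aws" is installed and configured; the role name is a placeholder.
create_s3_access_role() {
    local role="${1:-hivemq-s3-access}"
    # Trust policy: allow the EC2 service to assume this role.
    local trust='{"Version":"2012-10-17","Statement":[{"Effect":"Allow","Principal":{"Service":"ec2.amazonaws.com"},"Action":"sts:AssumeRole"}]}'

    aws iam create-role --role-name "$role" \
        --assume-role-policy-document "$trust" >/dev/null || return 1
    # AWS managed policy granting full S3 access.
    aws iam attach-role-policy --role-name "$role" \
        --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess || return 1
    # EC2 instances receive a role through an instance profile.
    aws iam create-instance-profile \
        --instance-profile-name "$role" >/dev/null || return 1
    aws iam add-role-to-instance-profile \
        --instance-profile-name "$role" --role-name "$role" || return 1

    echo "$role"
}
```

Full S3 access mirrors the policy used in this guide; a policy restricted to the discovery bucket would be a tighter choice for production.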

HiveMQ on AWS

To install two HiveMQ broker nodes on two EC2 instances on AWS, we use the HiveMQ AMI:

  1. Launch the AMI in your region of choice

  2. Select an instance type. We recommend using c5.xlarge for testing purposes.

  3. Configure the instance details
    [Screenshot: Select the EC2 instance type]

  4. Create 2 instances.

  5. Assign the newly created S3 Full Access role to the instances.

  6. Go to “Configure Security Group”.

  7. Select the Security Group we just created.

  8. Launch the instances.

This action will automatically spawn two separate EC2 instances that run HiveMQ as a service.
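
The launch steps above can also be expressed as one AWS CLI call. This is a sketch under assumptions: the function name is hypothetical, and the AMI ID, security group ID, instance profile name, and key pair name have to be supplied from your own account.

```shell
# Sketch: launch two HiveMQ instances from the AMI.
# Assumption: "aws" is installed and configured; all IDs are placeholders.
launch_hivemq_instances() {
    local ami_id="$1" sg_id="$2" profile="$3" key_name="$4"
    # Prints the IDs of the launched instances.
    aws ec2 run-instances \
        --image-id "$ami_id" \
        --count 2 \
        --instance-type c5.xlarge \
        --key-name "$key_name" \
        --security-group-ids "$sg_id" \
        --iam-instance-profile "Name=${profile}" \
        --query 'Instances[].InstanceId' --output text
}

# Example (hypothetical IDs):
# launch_hivemq_instances ami-0123456789abcdef0 sg-0123456789abcdef0 hivemq-s3-access my-key
```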

Install and configure HiveMQ S3 Cluster Discovery Extension

Next, we want to enable the cluster mode on both of our HiveMQ instances and provide a way for the instances to discover each other. For this purpose, install the HiveMQ S3 Cluster Discovery Extension.

  • Create an S3 Bucket the HiveMQ instances may use.
    Make sure to remember the bucket name. You can use the default configuration.

The following steps need to be done on each individual HiveMQ instance:

  • Connect to the instance via SSH

    ssh -i <your-deployment-key> ec2-user@<instance-ip-address>

  • Switch to the root user

    sudo su

  • Download the HiveMQ S3 Cluster Discovery Extension

    wget https://www.hivemq.com/releases/extensions/hivemq-s3-cluster-discovery-extension-4.0.1.zip

  • Unzip the distribution

    unzip hivemq-s3-cluster-discovery-extension-4.0.1.zip

  • This will create a folder hivemq-s3-cluster-discovery-extension

  • Open the HiveMQ S3 Cluster Discovery Extension configuration file (you may use a different text editor of course)

    vi hivemq-s3-cluster-discovery-extension/s3discovery.properties

  • Configure the S3 Bucket region and name

    ############################################################
    # S3 Bucket                                                #
    ############################################################
    
    #
    # Region for the S3 bucket used by hivemq
    # see http://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region for a list of regions for S3
    # example: us-west-2
    #
    s3-bucket-region:<your-region>
    
    #
    # Name of the bucket used by HiveMQ
    #
    s3-bucket-name:<your-bucket-name>

  • Change ownership of the extension folder to the hivemq user

    chown -R hivemq:hivemq hivemq-s3-cluster-discovery-extension

  • Move the folder into the HiveMQ extensions folder

    mv hivemq-s3-cluster-discovery-extension/ /opt/hivemq/extensions/
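
When many instances have to be prepared, editing the discovery configuration by hand becomes tedious. The following sketch sets the region and bucket name non-interactively with sed; the function name is hypothetical, and it assumes the configuration file contains the two keys shown in the configuration step above, each at the start of a line.

```shell
# Sketch: set region and bucket name in the S3 discovery configuration file.
# Assumption: the file contains "s3-bucket-region:" and "s3-bucket-name:" keys.
configure_s3_discovery() {
    local file="$1" region="$2" bucket="$3"
    # Rewrite only the two key lines; comments and other settings are kept.
    sed -i.bak \
        -e "s|^s3-bucket-region:.*|s3-bucket-region:${region}|" \
        -e "s|^s3-bucket-name:.*|s3-bucket-name:${bucket}|" \
        "$file" && rm -f "${file}.bak"
}

# Example (hypothetical path and values):
# configure_s3_discovery /path/to/discovery-config eu-central-1 my-hivemq-discovery
```

Because in-place editing replaces the file, it is safest to run this before the chown step or to repeat chown afterwards.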

Now that we have the HiveMQ S3 Cluster Discovery Extension successfully installed, let’s adjust the HiveMQ config. Change the /opt/hivemq/conf/config.xml file to look like the following:

<?xml version="1.0"?>
<hivemq>

    <listeners>
        <tcp-listener>
            <port>1883</port>
            <bind-address>0.0.0.0</bind-address>
        </tcp-listener>
    </listeners>

    <cluster>
        <enabled>true</enabled>
        <transport>
            <tcp>               
                <bind-address>IP_ADDRESS</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>

        <discovery>
            <extension/>
        </discovery>
    </cluster>

    <anonymous-usage-statistics>
        <enabled>true</enabled>
    </anonymous-usage-statistics>

    <control-center>
        <listeners>
            <http>
                <port>8080</port>
                <bind-address>0.0.0.0</bind-address>
            </http>
        </listeners>
    </control-center>
</hivemq>

Line 15: Replace IP_ADDRESS with your EC2 instance’s internal (private) IP address.
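
Filling in the address can be automated too. This is a sketch, not part of the official setup: the function name is hypothetical, and the example relies on the EC2 instance metadata endpoint being reachable without a session token (IMDSv1).

```shell
# Sketch: write the instance's private IP into the cluster transport section.
# Assumption: config.xml still contains the IP_ADDRESS placeholder.
set_cluster_bind_address() {
    local config="$1" ip="$2"
    # Only the placeholder line is touched; the 0.0.0.0 listeners stay as they are.
    sed -i.bak \
        "s|<bind-address>IP_ADDRESS</bind-address>|<bind-address>${ip}</bind-address>|" \
        "$config" && rm -f "${config}.bak"
}

# Example, run as root on the instance. With IMDSv2 enforced, a session token
# has to be requested first.
# ip="$(curl -s http://169.254.169.254/latest/meta-data/local-ipv4)"
# set_cluster_bind_address /opt/hivemq/conf/config.xml "$ip"
```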

All that is left to do is to restart the HiveMQ Service on both EC2 instances.

/etc/init.d/hivemq restart

The following log statement in the /opt/hivemq/log/hivemq.log file shows successful cluster establishment:

INFO - Cluster size = 2, members : [8Jojp, WlF1S].

Hint: This process can be applied to an arbitrary number of HiveMQ cluster nodes to create clusters of a bigger size than 2 if necessary.
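
To check for the log statement above from a script, the cluster size can be extracted from the log file. The function below is a small sketch with a hypothetical name; it assumes the log line format shown above and defaults to the log path used in this guide.

```shell
# Sketch: print the most recently logged cluster size.
# Returns a non-zero status if no cluster size line is found.
cluster_size() {
    local log="${1:-/opt/hivemq/log/hivemq.log}" size
    # Extract the number after "Cluster size = " and keep the last occurrence.
    size="$(sed -n 's/.*Cluster size = \([0-9][0-9]*\).*/\1/p' "$log" | tail -n 1)"
    [ -n "$size" ] && echo "$size"
}

# Example:
# if [ "$(cluster_size)" = "2" ]; then echo "cluster is complete"; fi
```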

Launch and configure AWS NLB

We are now able to take advantage of rapid elasticity. Scaling the HiveMQ cluster up or down by adding or removing EC2 instances is now possible without administrative intervention. One last step on our way to true high availability is adding a load balancer to our setup. This way our HiveMQ broker cluster can act as a single logical broker node to MQTT clients. An MQTT client simply needs to know the load balancer’s URL to connect, publish, and subscribe. The actual number of HiveMQ broker nodes active in the cluster is irrelevant to the MQTT client.

  1. Go to Target Groups of your EC2 account and click “Create target group”.

  2. Name your target group

  3. Choose “Instance” as type

  4. Select “TCP” as protocol

  5. Choose port “1883”

  6. Select the VPC your HiveMQ broker nodes are running in

  7. Select TCP as health check protocol

  8. Click “Create”

  9. Select your newly created target group, go to “Targets”, and click “Edit”

  10. Select your HiveMQ instances

  11. Click “Add to registered”

  12. Save

  13. Go to Load Balancers and click “Create Load Balancer”

  14. Create a Network Load Balancer

  15. Name your Load Balancer and make it internet-facing

  16. Choose “TCP” and Port “1883”

  17. Configure your VPC and availability zones according to your needs. Hint: It is best practice to choose all availability zones.

  18. Go to “Configure Security Settings”

  19. Go to “Configure Routing”. Hint: We recommend using plain TCP on your load balancer and configuring TLS for security on the HiveMQ broker nodes themselves, as none of AWS’ Load Balancer options are capable of mutual TLS handshakes.

  20. Select our newly created target group and go to “Register Targets”

  21. Go to “Review” and “Create” the Load Balancer

That’s it! Once the Load Balancer has finished provisioning, we can connect to our HiveMQ Broker Node cluster using the Load Balancer’s DNS name.
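
The console steps for the target group and the Network Load Balancer can be approximated with the AWS CLI. The following is a sketch under assumptions: the function name and the resource names hivemq-mqtt and hivemq-nlb are hypothetical, and the VPC, subnet, and instance IDs come from your own account.

```shell
# Sketch: target group, Network Load Balancer, and TCP listener for MQTT.
# Usage: create_hivemq_nlb <vpc-id> "<subnet-1> <subnet-2>" <instance-id>...
create_hivemq_nlb() {
    local vpc_id="$1" subnets="$2"
    shift 2
    local tg_arn lb_arn targets="" id

    # Target group: instance targets, TCP on 1883, TCP health check.
    tg_arn="$(aws elbv2 create-target-group \
        --name hivemq-mqtt --protocol TCP --port 1883 \
        --vpc-id "$vpc_id" --target-type instance \
        --health-check-protocol TCP \
        --query 'TargetGroups[0].TargetGroupArn' --output text)" || return 1

    # Register every instance ID passed after the first two arguments.
    for id in "$@"; do targets="$targets Id=$id"; done
    # $targets is intentionally unquoted so each Id=... is its own argument.
    aws elbv2 register-targets --target-group-arn "$tg_arn" --targets $targets || return 1

    # Internet-facing Network Load Balancer; $subnets is a space-separated list.
    lb_arn="$(aws elbv2 create-load-balancer \
        --name hivemq-nlb --type network --scheme internet-facing \
        --subnets $subnets \
        --query 'LoadBalancers[0].LoadBalancerArn' --output text)" || return 1

    # Plain TCP listener on 1883 that forwards to the target group.
    aws elbv2 create-listener --load-balancer-arn "$lb_arn" \
        --protocol TCP --port 1883 \
        --default-actions "Type=forward,TargetGroupArn=${tg_arn}" >/dev/null || return 1

    echo "$lb_arn"
}
```

The function prints the ARN of the load balancer. Its DNS name can then be looked up with aws elbv2 describe-load-balancers.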

For production environments it’s recommended to use automatic provisioning of the EC2 instances (e.g. by using Chef, Puppet, Ansible or similar tools) so you don’t need to configure each EC2 instance manually. Of course HiveMQ can also be used with Docker, which can also ease the provisioning of HiveMQ nodes.


About Florian Raschbichler

Florian serves as the head of the HiveMQ support team, with years of first-hand experience overcoming challenges in achieving reliable, scalable, and secure IoT messaging for enterprise customers.