Building a two node high availability MQTT cluster

In a professional M2M environment, the availability of your services is key. For MQTT environments this means your broker needs a stable connection and always-on functionality. Besides this requirement, you want to be able to update your private cloud infrastructure while it is running in production.
We are aware of these requirements, so we want to share some basic information on how to build such an environment using the HiveMQ broker. HiveMQ comes with the necessary cluster functionality built in, but you have to follow some rules depending on the hardware you are using and on how you want to balance incoming connections across your broker infrastructure. In this configuration the cluster looks like a single MQTT broker to the clients. The benefit of such a setup is failover: if one of your MQTT servers becomes unavailable, the remaining brokers handle the traffic. This is best described as a master-master configuration. If one of the two nodes stops working, the load balancer reroutes all incoming traffic to the working cluster node and you won’t have any interruptions on the client side.

Demo environment

In this blog post we are using three virtual machines for the setup. Two of them run a HiveMQ server and the third is a redundant TCP load balancer appliance.

[Figure: HiveMQ two node cluster diagram]

HiveMQ configuration

Both HiveMQ servers have to be configured as cluster nodes. We recommend implementing the cluster node interconnection on a dedicated network interface on both sides. If the interconnection spans two remote data centers in different geographical locations, be aware of the bandwidth and latency between the two locations. A HiveMQ cluster is not what you may know as “bridging” from other brokers. HiveMQ uses an inter-node message serialization mechanism, but more on that later.

How does HiveMQ build a cluster

Our two nodes need to know each other to build a cluster. We strongly recommend using a dedicated network for the cluster connection. To connect the cluster nodes you can choose, among other options, between auto discovery and a fixed list of the cluster members' IP addresses. For a two node high availability cluster in production, auto discovery is not optimal; it is more interesting for dynamic or elastic cluster environments with more than two nodes. For this demo, however, we save time by using auto discovery mode on a private network between the two cluster nodes.

As soon as the HiveMQ cluster nodes are connected and the cluster is established, they will forward the publishes and subscribes to the other node. Let us first take a look at how HiveMQ handles messages and subscriptions within a cluster:

Both clients publish and subscribe to the same topic. In the first scenario, Client A and Client B are both connected to Broker A.

[Figure: HiveMQ-cluster-howitworks-1]

Client A publishes a message to the topic both clients (A and B) are subscribed to. The message stays on cluster node A, because Client A and Client B are both subscribed to the topic on cluster node A. Cluster node B is not involved with this message at all.

[Figure: HiveMQ-cluster-howitworks-2]

In the second scenario, Client A is connected to cluster node A, Client B is connected to cluster node B, and Client A publishes a message to cluster node A.

[Figure: HiveMQ-cluster-howitworks-3]

The message is published to Broker A, and Broker A forwards it to Broker B because Client B is subscribed to the message's topic on Broker B. Of course, Broker A also delivers the message back to Client A, because Client A has a subscription on the topic, too.

How a HiveMQ cluster works

A message is only forwarded to another cluster node if that node has a subscriber interested in it. This matters when building a big cluster with many nodes, because nodes never forward unnecessary messages and the network traffic drops considerably.
A big advantage over bridging is that cluster subscriptions work dynamically. As soon as a client on a node subscribes to a topic, the subscription becomes known within the cluster. If a client anywhere in the cluster publishes to this topic, the message is delivered to the subscriber no matter which cluster node it is connected to. Since HiveMQ uses a message serialization mechanism to share publishes between cluster nodes, there is no MQTT protocol overhead [1]. This further reduces the network traffic between nodes.
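The forwarding rule described above can be illustrated with a small simulation. This is only a toy model and not HiveMQ's actual implementation: the names `Node` and `make_cluster` are made up for this sketch, topics are matched by exact string only (no MQTT wildcards), and "forwarding" is a plain method call instead of a network transfer.

```python
from collections import defaultdict


class Node:
    """Toy cluster node (hypothetical, for illustration only)."""

    def __init__(self, name):
        self.name = name
        self.peers = []
        self.local_subs = defaultdict(set)       # topic -> clients connected to this node
        self.remote_interest = defaultdict(set)  # topic -> peer nodes that have subscribers
        self.delivered = []                      # (client, topic, payload) delivered here
        self.forwards_sent = 0                   # messages sent to other nodes

    def subscribe(self, client, topic):
        self.local_subs[topic].add(client)
        # Announce the subscription so every peer knows this node is interested.
        for peer in self.peers:
            peer.remote_interest[topic].add(self)

    def publish(self, topic, payload):
        self._deliver_locally(topic, payload)
        # Forward only to peers that have at least one subscriber for the topic.
        for peer in self.remote_interest[topic]:
            self.forwards_sent += 1
            peer._deliver_locally(topic, payload)

    def _deliver_locally(self, topic, payload):
        for client in sorted(self.local_subs[topic]):
            self.delivered.append((client, topic, payload))


def make_cluster(*names):
    """Create fully connected toy nodes."""
    nodes = [Node(name) for name in names]
    for node in nodes:
        node.peers = [peer for peer in nodes if peer is not node]
    return nodes


if __name__ == "__main__":
    # Scenario 2 from the text: subscribers on different nodes.
    node_a, node_b = make_cluster("A", "B")
    node_a.subscribe("client_a", "demo/topic")
    node_b.subscribe("client_b", "demo/topic")
    node_a.publish("demo/topic", "hello")
    print(node_a.forwards_sent, node_b.delivered)
```

If both clients subscribe on node A instead, `forwards_sent` stays at zero and node B never sees the message, which corresponds to the first scenario.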

A potential problem you could stumble upon is reconnecting on the client side. What happens if the cluster node your client is connected to goes down? A classic approach is to build advanced reconnect logic into the client so that it connects to another cluster node (the fallback IP could be hard coded). However, this is not optimal and error prone, so we recommend avoiding this kind of solution.
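To make the drawback concrete, here is a minimal sketch of such client-side fallback logic. The broker addresses and the helper name `connect_with_fallback` are hypothetical, and the sketch only opens a TCP connection; a real MQTT client would additionally have to send CONNECT and restore its subscriptions on the new node.

```python
import socket

# Hypothetical, hard-coded broker addresses. Baking these into every
# deployed device is what makes this approach brittle.
BROKERS = [("10.0.0.11", 1883), ("10.0.0.12", 1883)]


def connect_with_fallback(brokers, connect=socket.create_connection, timeout=3.0):
    """Try each broker in order; return (address, connection) for the first success."""
    last_error = None
    for address in brokers:
        try:
            return address, connect(address, timeout)
        except OSError as exc:
            last_error = exc  # node down or unreachable, try the next one
    raise ConnectionError(f"no broker reachable: {last_error}")
```

Every client needs this logic plus an up-to-date address list, so adding or moving a cluster node means touching all deployed devices.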

Instead we recommend using a load balancer for this case. It is an easy way to build a high availability environment without the need of implementing any advanced reconnect logic on the client side. This approach can be used easily with legacy devices which are already deployed. You do not have to change their already configured connection information in this case.

Configuring the cluster

In this example configuration both cluster nodes are CentOS 6.4 machines with Oracle JDK 1.6.0_45 and a HiveMQ server installed. Besides the HiveMQ server, the “access.log” plugin was deployed to see which clients are connected to which node, their subscriptions, etc. We are using HiveMQ's auto discovery mode and both brokers are located in the same local area network, so the HiveMQ instances build the cluster automatically. No further configuration is needed for this setup. As you can see, only minimal configuration is necessary to set up a basic cluster environment.

Load balancer

We will not dive deep into the load balancer configuration, because it is out of scope here. Basically, you can use any TCP load balancer you have. For a production environment, though, the load balancer itself should be a high availability solution; otherwise it becomes your single point of failure. For this demonstration we used HAProxy as the TCP load balancer and configured it for round robin across both cluster nodes.
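As a rough illustration of what round robin with failover means for new connections, the following sketch picks backends in turn and skips nodes that have been marked as down. It is a toy model, not how HAProxy is implemented or configured; the class name `RoundRobinBalancer` and the backend labels are assumptions made for this example.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy balancer: hands out backends in turn, skipping ones marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark_down(self, backend):
        self.healthy.discard(backend)

    def mark_up(self, backend):
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Look at each backend at most once per call so we never loop forever.
        for _ in range(len(self.backends)):
            backend = next(self._ring)
            if backend in self.healthy:
                return backend
        raise RuntimeError("no healthy backend available")


if __name__ == "__main__":
    balancer = RoundRobinBalancer(["node_a", "node_b"])
    print([balancer.pick() for _ in range(4)])  # alternates between both nodes
    balancer.mark_down("node_a")                # simulate a failed cluster node
    print([balancer.pick() for _ in range(3)])  # only node_b is handed out
```

In a real deployment the health state would come from the load balancer's own health checks rather than from manual `mark_down` calls.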

Demonstration

In this short video we will demonstrate the configuration in a virtual environment with round robin for new connections.

As you can see, our tool SimpleMessage creates a new connection each time you click the send button. This may not be the most realistic client behavior, but it helps to demonstrate that round robin is working through our load balancer. The subscriber connection remains on the server it was first connected to and receives all messages, no matter which broker they were published to.

Conclusion

We demonstrated how to build an MQTT high availability cluster environment using a TCP load balancer and the HiveMQ broker. We will release a trial of HiveMQ soon so you can build your own MQTT cluster and try it yourself. If you’re interested in more specific demonstrations of how HiveMQ can fit into your use case, please feel free to contact us.


  1. This definitely matters when you want to make sure that each node receives a message only once. With bridging, that would require Quality of Service 2, which means at least four messages in the message flow between each pair of nodes. See this section in the official specification