Creating highly available and ultra-scalable MQTT clusters
Written by The HiveMQ Team
Published: October 6, 2016
MQTT servers are critical pieces of a messaging infrastructure and part of the business-critical application backbone that must not fail. The scalable publish/subscribe architecture of MQTT depends on an MQTT broker as the central distributor of messages. To avoid this single point of failure (if the broker is offline, clients can’t communicate!), MQTT broker clusters are required. One of the most sophisticated features of HiveMQ is its cluster capability, which is highly reliable and is typically used in cloud environments for systems that must not fail and need linear scalability over time.
This blog post series discusses why an MQTT broker cluster is needed for professional MQTT deployments and how to deploy one. You’ll also see why distributed topic trees (that is, routing messages between cluster nodes) are not enough for scaling MQTT. You’ll learn that brokers in cloud environments must support network partitions and that an MQTT broker cluster needs to be resilient and available at all times. Because a broker cluster is one logical MQTT broker from an MQTT client’s perspective, it must never lose any MQTT sessions, even when topology changes occur.
What is an MQTT broker cluster?
An MQTT broker cluster is a distributed system that represents one logical MQTT broker. It consists of multiple MQTT broker nodes that are typically installed on different physical machines and are connected over a network. From an MQTT client’s perspective, a cluster of brokers behaves like a single MQTT broker.
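To make the idea of one logical broker concrete, here is a minimal in-memory sketch. It is a toy model and not HiveMQ's implementation: the `ToyCluster` class, the node names and the topic are all hypothetical, topics are matched exactly (no wildcards), and no real networking takes place. It only illustrates that a message published through one node reaches a subscriber connected to another node.

```python
from collections import defaultdict


class ToyCluster:
    """Toy model of a broker cluster; NOT how HiveMQ works internally."""

    def __init__(self, node_names):
        self.nodes = set(node_names)
        # topic -> set of (node, client_id); a cluster-wide view of subscriptions
        self.subscriptions = defaultdict(set)
        # (node, client_id) -> list of (topic, payload) delivered to that client
        self.delivered = defaultdict(list)

    def subscribe(self, node, client_id, topic):
        if node not in self.nodes:
            raise ValueError(f"unknown node: {node}")
        self.subscriptions[topic].add((node, client_id))

    def publish(self, node, topic, payload):
        """Publish through any node; return the number of deliveries."""
        if node not in self.nodes:
            raise ValueError(f"unknown node: {node}")
        targets = self.subscriptions.get(topic, set())
        for target in targets:
            # The subscriber may be connected to a different node than the publisher.
            self.delivered[target].append((topic, payload))
        return len(targets)


cluster = ToyCluster(["node-1", "node-2", "node-3"])
cluster.subscribe("node-1", "dashboard", "plant/temperature")
count = cluster.publish("node-3", "plant/temperature", "21.5")
print(count, cluster.delivered[("node-1", "dashboard")])
```

The publisher and the subscriber never need to know which node the other one is connected to, which is what "one logical broker" means from a client's point of view.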
A HiveMQ broker cluster is well suited to cloud environments because of its advantages in scalability, elasticity and resilience. Deployments on virtual machines from cloud infrastructure providers such as Amazon Web Services, Microsoft Azure or Google Cloud Platform are as easy to operate and scale as deployments on classic on-premises data center infrastructure.
When talking about MQTT server clusters, never forget that an MQTT cluster is a distributed system. This is important to bear in mind when thinking about highly scalable MQTT deployments, because the Fallacies of Distributed Computing and the CAP theorem also apply to MQTT broker deployments. In essence this means: the individual components of the MQTT broker cluster are unreliable (everything can fail at any time, especially in cloud environments), while the system as a whole needs to be reliable. The cluster must therefore be very resilient to failures and topology changes, and it can’t sacrifice partition tolerance. So make sure your broker of choice supports network partitions if you must never lose any MQTT messages and need your broker cluster to be highly available.
Load balancers are often used together with MQTT broker clusters to have a single point of entry for all MQTT communication. Any TCP load balancer can be used together with HiveMQ with any load balancing algorithm. Sticky sessions are not required for HiveMQ, even if persistent MQTT sessions are used.
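The following sketch illustrates why sticky sessions are not needed when session state is available on every node. It is a simplified model under stated assumptions, not HiveMQ code: `SessionStore`, `ToyNode` and `RoundRobinBalancer` are hypothetical names, and a single shared dictionary stands in for session data that a real cluster distributes between its nodes. The value returned by `connect` mirrors the "session present" flag that an MQTT 3.1.1 broker reports when a client reconnects.

```python
import itertools


class SessionStore:
    """Stand-in for session state that is available on every cluster node."""

    def __init__(self):
        self.sessions = {}  # client_id -> {"subscriptions": set, "queued": list}


class ToyNode:
    def __init__(self, name, store):
        self.name = name
        self.store = store

    def connect(self, client_id, clean_session=False):
        """Return (session_present, session) for the connecting client."""
        if clean_session:
            self.store.sessions.pop(client_id, None)
        present = client_id in self.store.sessions
        session = self.store.sessions.setdefault(
            client_id, {"subscriptions": set(), "queued": []}
        )
        return present, session


class RoundRobinBalancer:
    """Hands out nodes in turn, with no memory of earlier client connections."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(nodes)

    def pick(self):
        return next(self._cycle)


store = SessionStore()
nodes = [ToyNode(f"node-{i}", store) for i in (1, 2, 3)]
balancer = RoundRobinBalancer(nodes)

first = balancer.pick()
present, session = first.connect("device-42")
session["subscriptions"].add("commands/device-42")
session["queued"].append("reboot")  # message queued while the client is offline

second = balancer.pick()  # the reconnect lands on a different node
present_again, resumed = second.connect("device-42")
print(first.name, second.name, present_again, resumed["queued"])
```

Because the balancer keeps no per-client state, any load balancing algorithm could be swapped in here; the session follows the client identifier, not the node.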
Advantages of an MQTT broker cluster with HiveMQ
What are the advantages of creating an MQTT broker cluster with HiveMQ for your production environment?
Eliminate the Single Point of Failure: Due to its brokered architecture, an MQTT system typically has a single point of failure: the broker. If this central component is not available, no MQTT communication is possible. A clustered HiveMQ installation eliminates this single point of failure, since multiple brokers (installed on different machines) act as a single broker. If one of the individual brokers is unavailable, the logical broker as a whole is still available.
Message distribution across cluster nodes: Since a cluster of HiveMQ brokers is one logical broker, the nodes need to ensure that the publish and subscribe mechanisms work the same way on every node and that messages are distributed across cluster nodes when necessary. HiveMQ is exceedingly smart when distributing messages in the cluster.
Clients can connect to any broker node to resume sessions: MQTT clients with persistent sessions can resume their sessions at any cluster node at any time, even in case of failures and network splits. This means clients can receive their queued messages and resume their Quality of Service flows, and their subscriptions are preserved. This is also true if the node the client originally connected to is no longer available.
Linear scalability: HiveMQ scales in a linear fashion and can scale from two cluster nodes up to hundreds. While we recommend scaling both vertically and horizontally for high-throughput and low-latency scenarios, it’s perfectly possible to use many small server instances instead of a few big ones. If the deployment requirements change, you can just add or remove cluster nodes as you need.
Resilient and fault-tolerant: Cloud environments can be tough and place high demands on resiliency and fault tolerance. Virtual machines or Docker containers can fail at any time for various reasons, and network splits may occur at any time. A HiveMQ broker cluster is designed to be operated in fragile environments and is fully resilient and fault tolerant in case of infrastructure problems. This means an arbitrary number of nodes can fail and the cluster is still operational.
Zero Downtime Upgrades: You certainly don’t want any downtime for your deployment just because you want to upgrade your MQTT brokers. Upgrading a HiveMQ cluster is as easy as it can get: just spawn a new broker node with the new version and remove older nodes from the cluster, node by node. You can also roll back nodes in a cluster if you experience problems while upgrading. It’s important to note that there’s no need for a Blue-Green upgrade (which can be quite expensive in terms of infrastructure for huge clusters), since the cluster itself allows different HiveMQ versions to be part of the cluster while upgrading.
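The node-by-node upgrade described above can be sketched as a simple loop. This is only an illustration of the ordering (add a new node first, then retire an old one), not a HiveMQ tool or API: the `rolling_upgrade` function, the node names and the version labels `v1` and `v2` are hypothetical, and the cluster is modelled as a plain dictionary.

```python
def rolling_upgrade(cluster, new_version):
    """Replace old nodes one at a time in a {node_name: version} mapping.

    Returns the cluster size observed after every add and remove step.
    """
    sizes = []
    old_nodes = [name for name, version in cluster.items() if version != new_version]
    for index, old_name in enumerate(old_nodes, start=1):
        cluster[f"new-{index}"] = new_version  # add a node on the new version first
        sizes.append(len(cluster))
        del cluster[old_name]  # then retire exactly one old node
        sizes.append(len(cluster))
    return sizes


cluster = {"old-1": "v1", "old-2": "v1", "old-3": "v1"}
sizes = rolling_upgrade(cluster, "v2")
print(sizes, sorted(cluster))
```

In this model the cluster never shrinks below its original size during the upgrade, and old and new versions run side by side until the last old node is removed. Rolling back would follow the same loop with the versions swapped.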
We saw in this blog post that MQTT broker clusters with HiveMQ have significant advantages in terms of scalability, reliability and high availability. If your business depends on an MQTT broker that needs to be available no matter what the circumstances are, we recommend going with a clustered MQTT broker. Just keep in mind that it’s not enough for an MQTT broker to route messages in the cluster. A HiveMQ cluster behaves like one logical MQTT broker, even if clients reconnect to other nodes and the topology changes. The important part is that MQTT sessions are available (and distributed) in the cluster and survive infrastructure failures and network splits. The question is not “Will the system fail?” but “When will the system fail?”. HiveMQ is designed for these failures and offers best-in-class availability in unreliable (cloud) environments.
Next week’s blog post will cover the HiveMQ cluster in detail. Stay tuned!