HiveMQ Cluster

One of the outstanding features of HiveMQ is its ability to form an MQTT broker cluster. Clustering is necessary for building a highly available service and can be used to achieve horizontal scaling.
By scaling a HiveMQ cluster, you can handle immense numbers of concurrently connected devices and very high message throughput.
Elastic clustering makes it simple to add cluster nodes at runtime.

An MQTT broker cluster is a distributed system that presents itself as a single logical MQTT broker to connected MQTT clients. For the MQTT client, it makes no difference whether it is connected to a single HiveMQ broker node or to a multi-node HiveMQ cluster.
Individual HiveMQ broker nodes, which are typically installed on separate physical or virtual machines, need to be connected over a network.
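
For illustration, an MQTT client connects to a cluster exactly as it would to a single broker. The following sketch assumes the open-source HiveMQ MQTT Client library and a placeholder hostname (mqtt.example.com), which could be a load balancer in front of the cluster or the address of any individual cluster node.

Example Java client connecting to a HiveMQ cluster (sketch)
import com.hivemq.client.mqtt.MqttClient;
import com.hivemq.client.mqtt.mqtt5.Mqtt5BlockingClient;

import java.nio.charset.StandardCharsets;

public class ClusterClientExample {

    public static void main(String[] args) {
        // "mqtt.example.com" is a placeholder: a load balancer in front of the
        // cluster or the address of any single HiveMQ node works the same way.
        Mqtt5BlockingClient client = MqttClient.builder()
                .useMqttVersion5()
                .identifier("cluster-demo-client")
                .serverHost("mqtt.example.com")
                .serverPort(1883)
                .buildBlocking();

        client.connect();

        // Publish a test message; the cluster handles it like a single broker would.
        client.publishWith()
                .topic("demo/cluster")
                .payload("hello".getBytes(StandardCharsets.UTF_8))
                .send();

        client.disconnect();
    }
}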

Prerequisites

In order to form a HiveMQ cluster, the individual broker nodes need to:

  • be able to reach each other over the network (see Cluster Transport)

  • have clustering enabled and configured in their configuration file (see Enable Cluster)

  • use a cluster discovery mechanism so they can find each other (see Cluster Discovery)

Enable Cluster

To configure the cluster, you have to choose a type of transport (TCP/UDP) and a type of discovery that fits your needs.

An example configuration using Static Discovery and TCP Transport to form a HiveMQ broker cluster can be found below.

Example HiveMQ cluster configuration
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>
        <transport>
           <tcp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
           </tcp>
        </transport>
        <discovery>
            <static>
                <node>
                    <!-- replace this IP with the IP address of your interface -->
                    <host>192.168.1.1</host>
                    <port>7800</port>
                </node>
                <node>
                    <!-- replace this IP with the IP address of another node -->
                    <host>192.168.1.2</host>
                    <port>7800</port>
                </node>
            </static>
        </discovery>

    </cluster>
    ...
</hivemq>
More detailed information on Cluster Transport and Discovery can be found in the following chapters.


The minimal configuration to enable clustering with the default values is as follows.

Example minimal HiveMQ cluster configuration
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>
    </cluster>
    ...
</hivemq>

If no other configuration for transport and discovery is specified, HiveMQ will start its cluster with UDP as transport and MULTICAST as discovery.

We recommend using this minimal configuration for testing in local networks only.

Cluster Discovery

Discovery describes the mechanism that individual HiveMQ broker nodes use to find each other and form a HiveMQ broker cluster.
Both static and dynamic mechanisms are available. Dynamic methods are ideal for increasing the size of the HiveMQ broker cluster during runtime, while static discovery is well suited for static HiveMQ deployments with fixed cluster sizes.
The following mechanisms are available. Be aware that not all discovery mechanisms and transport protocols are compatible.

Table 1. Discovery mechanisms

Name      | UDP transport | TCP transport | Dynamic | Description
static    | no            | yes           | no      | Static list of nodes and their TCP bind ports and IP addresses.
multicast | yes           | no            | yes     | Finds all nodes using the same multicast address and port.
broadcast | no            | yes           | yes     | Finds all nodes in the same subnet by using the IP broadcast address.
extension | no            | yes           | yes     | Uses information provided by a HiveMQ Extension to discover cluster nodes.

Additional custom mechanisms can be implemented with the Extension Discovery.

Static Discovery

When using the static cluster discovery mechanism, the configuration must include a list of the HiveMQ broker nodes that are intended to form a cluster, with the IP address and port each node uses for TCP cluster transport.
Each HiveMQ node regularly checks all nodes that are included in its own static discovery list and tries to include them in the cluster.
A HiveMQ broker node’s own IP address and port may be listed as well but are not necessary.

A typical use case for static discovery is a HiveMQ cluster deployment with a fixed size, often used with a focus on providing high availability for your MQTT broker.
Static discovery is also a viable option when multicast and broadcast are not available in the environment the HiveMQ cluster is deployed to.

Static discovery example configuration for a 3 node cluster
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>

        <enabled>true</enabled>

        <transport>
            <tcp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>

        <discovery>
            <static>
                <node>
                    <!-- replace this IP with the IP address of your interface -->
                    <host>192.168.1.1</host>
                    <port>7800</port>
                </node>
                <node>
                    <!-- replace this IP with the IP address of another node -->
                    <host>192.168.1.2</host>
                    <port>7800</port>
                </node>
                <node>
                    <!-- replace this IP with the IP address of another node -->
                    <host>192.168.1.3</host>
                    <port>7801</port>
                </node>
            </static>
        </discovery>

    </cluster>
    ...
</hivemq>
We recommend providing the full list of all HiveMQ broker nodes to each individual node for increased failure resistance and stability.

Multicast Discovery

Multicast discovery is a dynamic discovery mechanism that works by utilizing an IP multicast address configured with your UDP transport.
A node that uses this method regularly checks whether any other nodes are listening on the same IP multicast address.

If you have control over your network infrastructure, or you need true auto discovery that detects cluster nodes as they start up in the same network, UDP multicast can be a perfect fit.
This is a master-master scenario, so there is no single point of failure on the HiveMQ node side, and you can scale out quickly by simply starting up new HiveMQ nodes.
Most cloud providers prohibit the use of UDP multicast.
UDP multicast works well as a starting point for testing HiveMQ’s cluster functionality in local test and development environments.

Ensure that multicast is enabled and configured correctly before setting up a HiveMQ cluster with multicast discovery. Most cloud providers, such as AWS, do not allow IP multicast.

The following listing shows a configuration example that enables HiveMQ clustering with UDP multicast.

Example configuration for an elastic UDP multicast cluster
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

 <listeners>
     <tcp-listener>
         <port>1883</port>
         <bind-address>0.0.0.0</bind-address>
     </tcp-listener>
 </listeners>

 <cluster>
     <enabled>true</enabled>
     <transport>
         <udp>
             <!-- replace this IP with the IP address of your interface -->
             <bind-address>192.168.1.10</bind-address>
             <bind-port>8000</bind-port>
             <multicast-enabled>true</multicast-enabled>
             <!-- replace this IP with the multicast IP address of your interface -->
             <multicast-address>228.8.8.8</multicast-address>
             <multicast-port>45588</multicast-port>
         </udp>
     </transport>
     <discovery>
         <multicast/>
     </discovery>
 </cluster>

</hivemq>

HiveMQ instances automatically form a cluster as soon as the nodes discover each other.

It doesn’t work!!!
Cluster configuration is a complex subject, and there are many possible error sources outside of HiveMQ. These are the most common obstacles we have seen in the past:
  • Is your firewall enabled but not configured accordingly? (Tip: Disable the firewall for testing.)

  • Are your cluster nodes in the same network?

  • Is multicast enabled on your machine? Is multicast supported by your switch/router?

Broadcast Discovery

Broadcast discovery is a dynamic discovery mechanism, which finds other cluster nodes in the same IP subnet by sending out discovery messages over the IP broadcast address.

It can be configured with the following parameters:

Property Name     | Default value   | Description
broadcast-address | 255.255.255.255 | Broadcast address to be used. This should be configured to your subnet’s broadcast address. Example: 192.168.1.255.
port              | 8555            | Port on which the nodes exchange discovery messages. Must be in the same port range on all nodes.
port-range        | 5               | Amount of additional ports to check for other nodes. The range goes from port to port+port-range (with the defaults, ports 8555 to 8560).

When HiveMQ is deployed in an environment that allows broadcasting, this can be a viable option to create an elastic HiveMQ broker cluster with relative ease. Typical use cases are test, development, and integration environments as well as on-premises infrastructure. Most cloud providers do not allow broadcasting.

Broadcast discovery example configuration
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>

        <transport>
            <tcp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>

        <discovery>
            <broadcast>
                <!-- replace this IP with the broadcast IP address of your subnet -->
                <broadcast-address>192.168.1.255</broadcast-address>
                <port>8555</port>
                <port-range>5</port-range>
            </broadcast>
        </discovery>

    </cluster>
    ...
</hivemq>
Broadcast discovery only works if all your nodes are in the same IP subnet.

Extension Discovery

Extension discovery delegates the discovery of cluster nodes to a HiveMQ Extension.

This allows HiveMQ to be extended with custom discovery logic for specific use cases. For more information on creating a custom discovery extension, see the Extension Guide.

Discovery extensions for common use cases, like dynamic discovery on AWS or Docker, are available in the HiveMQ Extension Marketplace.

Extension discovery example configuration
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>
        ...

        <transport>
            <tcp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </tcp>
        </transport>

        <discovery>
            <extension/>
        </discovery>

        ...
    </cluster>
    ...
</hivemq>
When using an extension for discovery, two steps are necessary: the HiveMQ configuration must be set to extension discovery, and a discovery extension must be installed. A rough sketch of such an extension is shown below.
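
The following sketch shows what the core of a custom discovery extension can look like. It assumes the ClusterDiscoveryCallback interface of the HiveMQ Extension SDK (exact package names may differ between SDK versions), and the hard-coded node addresses are placeholders; a real extension would look them up dynamically, for example from a cloud API or DNS.

Example custom discovery callback (sketch)
import com.hivemq.extension.sdk.api.services.cluster.ClusterDiscoveryCallback;
import com.hivemq.extension.sdk.api.services.cluster.ClusterNodeAddress;
import com.hivemq.extension.sdk.api.services.cluster.parameter.ClusterDiscoveryInput;
import com.hivemq.extension.sdk.api.services.cluster.parameter.ClusterDiscoveryOutput;

import java.util.List;

public class MyDiscoveryCallback implements ClusterDiscoveryCallback {

    @Override
    public void init(ClusterDiscoveryInput input, ClusterDiscoveryOutput output) {
        // Called when the node starts: provide the currently known cluster nodes.
        output.provideCurrentNodes(lookupNodes());
    }

    @Override
    public void reload(ClusterDiscoveryInput input, ClusterDiscoveryOutput output) {
        // Called periodically: refresh the list so nodes can join and leave at runtime.
        output.provideCurrentNodes(lookupNodes());
    }

    @Override
    public void destroy(ClusterDiscoveryInput input) {
        // Release any resources that were used to query the external source.
    }

    private List<ClusterNodeAddress> lookupNodes() {
        // Placeholder addresses: a real implementation would query an external source.
        return List.of(
                new ClusterNodeAddress("192.168.1.1", 7800),
                new ClusterNodeAddress("192.168.1.2", 7800));
    }
}

The callback is registered from the extension’s start method, for example with Services.clusterService().addDiscoveryCallback(new MyDiscoveryCallback()).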


Cluster Transport

Cluster transport determines the network protocol that is used to transfer information between HiveMQ broker nodes in a cluster.
HiveMQ lets you choose between TCP, UDP, and secure TCP (with TLS encryption) as your cluster transport.

Table 2. Cluster Transport types

Transport type | Description
TCP            | Communication within the cluster is done via TCP.
UDP            | Communication within the cluster is done via UDP.
Secure TCP     | Communication within the cluster is done via TLS-encrypted, secure TCP.

TCP is the recommended transport protocol.

TCP Transport

TCP is an internet protocol that provides reliable, ordered, and error-checked delivery. Due to its reliability and the fact that it is widely available in all environments, it is the recommended transport protocol for HiveMQ broker clusters.

TCP transport can be configured with the following parameters:

Table 3. Configuring TCP transport

Property Name    | Default value | Description
bind-address     | null          | The network address to bind to. Example: 192.168.28.12
bind-port        | 8000          | The network port to listen on.
external-address | null          | The external address to use if the node is behind some kind of NAT.
external-port    | 0             | The external port to use if the node is behind some kind of NAT.

Example TCP transport configuration
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>

        <enabled>true</enabled>

        <transport>
            <tcp>
                <!-- replace this IP with the local IP address of your interface -->
                <bind-address>192.168.1.2</bind-address>
                <bind-port>8000</bind-port>

                <!-- replace this IP with the external IP address of your interface -->
                <external-address>10.10.10.10</external-address>
                <external-port>8008</external-port>
            </tcp>
        </transport>

        <discovery>
            <broadcast/>
        </discovery>

    </cluster>
    ...
</hivemq>
TCP is the recommended transport protocol.

UDP Transport

UDP is a network protocol that uses simple, connectionless communication. It provides no mechanisms to handle unreliability in the network:
there is no guarantee for delivery, ordering, or duplicate protection.
Most cloud providers prohibit the use of multicast, which means that UDP cannot be used as the transport protocol in these environments.
Typical use cases for UDP as the HiveMQ cluster transport are local test and development environments, as UDP multicast is a quick way to configure and establish a HiveMQ broker cluster. Using UDP transport in production is not recommended.

UDP transport can be configured with the following parameters:

Table 4. Configuring UDP transport

Property Name     | Default value | Description
bind-address      | null          | The network address to bind to. Example: 192.168.28.12
bind-port         | 8000          | The network port to listen on.
external-address  | null          | The external address to bind to if the node is behind some kind of NAT.
external-port     | 0             | The external port to bind to if the node is behind some kind of NAT.
multicast-enabled | true          | Whether UDP multicast should be used. This is required for multicast discovery.
multicast-address | 228.8.8.8     | The multicast network address to bind to.
multicast-port    | 45588         | The multicast port to listen on.

The only cluster discovery method that can be used with UDP transport is Multicast Discovery.

Example UDP transport configuration
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>

        <enabled>true</enabled>

        <transport>
            <udp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.2</bind-address>
                <bind-port>8000</bind-port>

                <multicast-enabled>true</multicast-enabled>
                <!-- replace this IP with the multicast IP address of your interface -->
                <multicast-address>228.8.8.8</multicast-address>
                <multicast-port>45588</multicast-port>
            </udp>
        </transport>

        <discovery>
            <multicast/>
        </discovery>

    </cluster>
    ...

</hivemq>
UDP transport is not recommended for production environments.

Secure TCP Transport with TLS

TLS is a cryptographic protocol that provides communication security. Taking advantage of this feature requires the use of TCP as the cluster transport and adding a <tls> configuration to it.
The primary purpose of using a secure TLS connection is to provide an encrypted connection between the HiveMQ broker nodes. This ensures privacy and data integrity. More details on the subject can be found in the Security chapter.
Encryption requires complex mathematical operations, which creates additional computing load on any service.
HiveMQ is no exception, so the best performance is achieved by providing a security layer around the plain TCP cluster transport instead, such as a secure network zone for the HiveMQ cluster nodes.
Typical use cases for TLS-encrypted TCP connections as cluster transport revolve around individual security requirements.

TLS transport can be configured using the following parameters:

Table 5. Configuring TLS transport

Property Name                  | Default value                 | Description
enabled                        | false                         | Enables TLS in the cluster transport.
protocols                      | All JVM enabled protocols     | Enables specific protocols.
cipher-suites                  | All JVM enabled cipher suites | Enables specific cipher suites.
server-keystore                | null                          | The JKS key store configuration for the server certificate.
server-certificate-truststore  | null                          | The JKS trust store configuration for trusting server certificates.
client-authentication-mode     | NONE                          | The client authentication mode: NONE, OPTIONAL (client certificate is used if presented), or REQUIRED (client certificate is required).
client-authentication-keystore | null                          | The JKS key store configuration for the client authentication certificate.
client-certificate-truststore  | null                          | The JKS trust store configuration for trusting client certificates.

Table 6. Configuring TLS key store

Property Name        | Default value | Description
path                 | null          | The path to the JKS key store that contains the certificate and private key.
password             | null          | The password for the key store.
private-key-password | null          | The password for the private key.

Table 7. Configuring TLS trust store

Property Name | Default value | Description
path          | null          | The path to the JKS trust store which includes the trusted certificates.
password      | null          | The password for the trust store.

It usually makes sense to use the same certificate for all HiveMQ cluster nodes. In that case, the same JKS file can be used as both key store and trust store.
Example TLS transport configuration
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    ...
    <cluster>
        <enabled>true</enabled>
        <transport>
           <tcp>
                <!-- replace this IP with the IP address of your interface -->
               <bind-address>192.168.2.31</bind-address>
               <bind-port>7800</bind-port>
               <tls>
                 <enabled>true</enabled>
                 <server-keystore>
                    <path>/path/to/the/key/server.jks</path>
                    <password>password-keystore</password>
                    <private-key-password>password-private-key</private-key-password>
                 </server-keystore>
                 <server-certificate-truststore>
                    <path>/path/to/the/trust/server.jks</path>
                    <password>password-truststore</password>
                 </server-certificate-truststore>
               </tls>
           </tcp>
        </transport>
    </cluster>
    ...
</hivemq>
Only the cluster transport itself can be secured with TLS. Discovery always uses plain TCP.

Cluster Failure Detection

HiveMQ provides several means of failure detection for added cluster stability and fault tolerance. By default, both failure detection mechanisms are enabled to ensure that failover scenarios are detected quickly and efficiently.

Table 8. Failure Detection mechanisms

Mechanism        | Description
Heartbeat        | Continuously sends a heartbeat between nodes.
TCP health check | Holds an open TCP connection between nodes.

The default values are suited for most HiveMQ deployments. Only apply changes if there is a specific reason to do so.

Heartbeat

To ensure all currently connected cluster nodes are available and responding, a continuous heartbeat is sent between all nodes of the HiveMQ broker cluster.
A node that does not respond to a heartbeat within the configured time is suspected to be unavailable and is removed from the cluster by the node that sent the initial heartbeat.

You can configure the heartbeat with the following parameters:

Property Name | Default value            | Description
enabled       | true                     | Enables the heartbeat.
interval      | 3000 (TCP) / 8000 (UDP)  | The interval in milliseconds in which a heartbeat message is sent to other nodes.
timeout       | 9000 (TCP) / 40000 (UDP) | The amount of time in milliseconds that is tolerated for the response to a heartbeat message before a node is temporarily removed from the cluster.

The port used for the heartbeat cannot be configured. The transport port is used for this mechanism. (Default: 8000)
Heartbeat example configuration
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>

        <transport>
            <udp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </udp>
        </transport>

        <discovery>
            <multicast />
        </discovery>

        <failure-detection>
            <heartbeat>
                <enabled>true</enabled>
                <interval>5000</interval>
                <timeout>15000</timeout>
            </heartbeat>
        </failure-detection>
    </cluster>
    ...
</hivemq>

TCP Health Check

The TCP health check holds an open TCP connection to other nodes in order to recognize a disconnecting node much faster than the heartbeat could. Additionally, the TCP health check enables nodes to disconnect from a cluster immediately.

You can configure the TCP health check with the following parameters:

Property Name    | Default value | Description
enabled          | true          | Enables the TCP health check.
bind-address     | null          | The network address to bind to.
bind-port        | 0             | The port to bind to. 0 uses an ephemeral port.
external-address | null          | The external address to bind to if the node is behind some kind of NAT.
external-port    | 0             | The external port to bind to if the node is behind some kind of NAT.
port-range       | 50            | Port range to check on other nodes.

TCP health check example configuration
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        <enabled>true</enabled>

        <transport>
            <udp>
                <!-- replace this IP with the IP address of your interface -->
                <bind-address>192.168.1.1</bind-address>
                <bind-port>7800</bind-port>
            </udp>
        </transport>

        <discovery>
            <multicast />
        </discovery>

        <failure-detection>
            <tcp-health-check>
                <enabled>true</enabled>
                <!-- replace this IP with the local IP address of your interface -->
                <bind-address>1.2.3.4</bind-address>
                <bind-port>0</bind-port>
                <!-- replace this IP with the external IP address of your interface -->
                <external-address>10.10.2.30</external-address>
                <external-port>0</external-port>
                <port-range>50</port-range>
            </tcp-health-check>
        </failure-detection>
    </cluster>
    ...
</hivemq>

Cluster Replicas

HiveMQ broker clusters replicate stored data across nodes dynamically, which means that each piece of persistent data is stored on more than one node.
The minimum number of replicas that must be persisted within the cluster before each node deems the replication sufficient can be configured.
Each time the HiveMQ cluster size changes, all persisted data is redistributed to ensure that the configured replica count is upheld. By default, the replica count is 2, meaning that two replicas of each persisted data set are available in the cluster.

Replica example configuration
<?xml version="1.0"?>
<hivemq xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">

    ...
    <cluster>
        ...
        <replication>
            <replica-count>2</replica-count>
        </replication>
    </cluster>
    ...
</hivemq>
The maximum replica count is cluster size - 1, which establishes so-called full replication: every piece of persistent data is stored on every node. Configurations with a higher value are automatically downgraded to this maximum. Be aware that this number changes whenever the cluster size changes in elastic HiveMQ cluster deployments.

Restart the Cluster with Persistent Data

On startup, HiveMQ automatically moves the persistent data to a backup folder, hivemq/data/cluster-backup. A HiveMQ cluster does not lose any data when a single node is restarted, because all the data is replicated to the remaining nodes. If you want to shut down and restart an entire cluster without losing the persistent data, there are a few things to consider.

Replica count for persistent data

It is highly recommended to have at least one replica (the default) if the cluster holds any persistent data. If the replica-count is configured to be zero, restarting a node causes persistent data to be lost.

Persistent data

The persistent data consists of the following:

  • retained messages

  • subscriptions of persistent session clients

  • queued messages of persistent session clients

  • the current state of each individual message transmission that is not yet completed

Shutting down the Cluster

To make sure that the data is not scattered across multiple nodes, it is important to shut down one node at a time. Make sure that the last running node has enough disk space to store the data of the entire cluster. The last node that is shut down must be the first to restart.

Restarting the Cluster

Go to the hivemq/bin folder of the last instance that was shut down and execute the recovery.sh file. This starts HiveMQ as usual, but without moving the data to a backup folder. As soon as the first instance is running, you can start the other instances with the run.sh file as usual. We do not recommend or support starting more than one HiveMQ instance with the recovery.sh file, as this is likely to lead to inconsistencies in the cluster.