Top 10 IoT Scalability Tests for an MQTT Broker
Written by Michael Walter
Category: HiveMQ MQTT Testing IoT
Published: March 26, 2020
Many IoT applications begin with a small number of connected devices and grow over time to a substantial number of devices. Therefore, it is important for organizations to understand how their IoT infrastructure will scale to handle an increase in the number of connected devices and the corresponding message throughput. Failure to plan for scalability can result in unplanned outages, a degraded end-user experience due to increased response times, and inefficient use of cloud or data center resources. It is vital to ensure that your IoT application can scale to handle any anticipated growth in usage.
At HiveMQ, many of our customers test the scalability of their IoT applications. The MQTT broker is a central communication component of an IoT application, so we encourage them to carefully test it for performance, scalability and reliability. The results of holistic MQTT broker tests help with planning infrastructure resources and provide assurance that the IoT application will respond appropriately to growing demand and usage.
To help with planning and analysis, we have prepared a list of test scenarios that are essential for testing the performance, scalability and reliability of an MQTT broker. These test scenarios are applicable to any MQTT broker or MQTT cloud service, not just HiveMQ.
Testing the Basics
The basic function of an MQTT broker is to ensure message delivery between MQTT clients. Therefore, a test plan needs to properly cover all basic MQTT operations. Here's a list of scenarios to consider when testing an MQTT broker's capacity for handling connections, message throughput and large numbers of MQTT topics.
1. MQTT Client Connections
Unlike request-response protocols such as HTTP, MQTT is a push communication protocol, so it requires a persistent TCP connection between the broker and each client. Establishing and maintaining these TCP connections is among the most resource-intensive tasks of an MQTT broker.
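Every one of these persistent connections begins with the client sending a CONNECT packet. As a rough illustration of what a load-generating client puts on the wire, the following Python sketch encodes a minimal MQTT 3.1.1 CONNECT packet (clean session, no username, password or Last Will) following the packet layout in the MQTT 3.1.1 specification. The function names are hypothetical; in practice a load-testing tool or MQTT client library builds these packets for you.

```python
import struct


def encode_remaining_length(length: int) -> bytes:
    """Encode the MQTT 'Remaining Length' field: 7 bits per byte, 1 to 4 bytes."""
    if not 0 <= length <= 268_435_455:
        raise ValueError("remaining length out of range")
    out = bytearray()
    while True:
        byte, length = length % 128, length // 128
        if length > 0:
            byte |= 0x80  # continuation bit: another length byte follows
        out.append(byte)
        if length == 0:
            return bytes(out)


def utf8_field(text: str) -> bytes:
    """MQTT strings carry a 16-bit big-endian byte-length prefix."""
    data = text.encode("utf-8")
    return struct.pack("!H", len(data)) + data


def connect_packet(client_id: str, keep_alive: int = 60, clean_session: bool = True) -> bytes:
    """Build a minimal MQTT 3.1.1 CONNECT packet (no username, password or will)."""
    flags = 0x02 if clean_session else 0x00
    # Variable header: protocol name "MQTT", protocol level 4, connect flags, keep-alive.
    variable_header = utf8_field("MQTT") + struct.pack("!BBH", 4, flags, keep_alive)
    payload = utf8_field(client_id)
    body = variable_header + payload
    # Fixed header: packet type 1 (CONNECT) in the high nibble, then the remaining length.
    return bytes([0x10]) + encode_remaining_length(len(body)) + body
```

Sending one such packet per simulated device, each over its own TCP or TLS socket, is the core of a connection load test.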
Many IoT applications need to accommodate usage spikes. For instance, a connected car platform typically sees a usage spike during rush hour, and a home automation system sees spikes in the evening. How an MQTT broker handles a sudden increase in the number of client connections is therefore important for capacity planning.
MQTT client connection tests should include the following:
- Identifying how many concurrent MQTT client connections a single broker instance with given computing resources can handle without service degradation. This will help with capacity planning.
- Testing how many simultaneous connection attempts a single broker instance with given computing resources can handle without service degradation. This will help you configure your brokers and infrastructure components to avoid intentional or unintentional denial-of-service scenarios caused by spikes of connection attempts.
- Running concurrent connection and connection spike scenarios against a cluster of MQTT brokers to verify that your setup scales elastically. We recommend including a load balancer in these test cases.
When running these client connection tests, it is important to enable TLS encryption, as you would in production. TLS handshakes and encryption consume considerable CPU, so tests without TLS overestimate the capacity of the broker.
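A concurrent-connection test essentially opens many connections, performs the MQTT handshake on each one and keeps them all open. The following asyncio sketch does this against a small stand-in "broker" that only answers CONNECT with a CONNACK, so it runs anywhere; to measure a real broker, you would call `connection_spike()` with that broker's host and port instead. The `max_in_flight` semaphore controls how many connection attempts are pending at once, which lets you move between a gradual ramp and a spike. All names and default values are hypothetical, and TLS is left out for brevity (`asyncio.open_connection()` accepts an `ssl` context for that).

```python
import asyncio
import time

# MQTT 3.1.1 CONNACK: packet type 2, remaining length 2, no session present, return code 0.
CONNACK_ACCEPTED = b"\x20\x02\x00\x00"


def connect_packet(client_id: str) -> bytes:
    """Minimal MQTT 3.1.1 CONNECT: protocol "MQTT", level 4, clean session, 60 s keep-alive."""
    cid = client_id.encode("utf-8")
    body = b"\x00\x04MQTT\x04\x02\x00\x3c" + len(cid).to_bytes(2, "big") + cid
    if len(body) > 127:
        raise ValueError("client ID too long for the one-byte length used here")
    return b"\x10" + bytes([len(body)]) + body


async def stub_broker(reader, writer):
    """Stand-in for a real broker: read one CONNECT, reply CONNACK, keep the socket open."""
    header = await reader.readexactly(2)  # packet type + one-byte remaining length
    await reader.readexactly(header[1])  # rest of the CONNECT packet
    writer.write(CONNACK_ACCEPTED)
    await writer.drain()
    await reader.read()  # returns b"" once the client disconnects
    writer.close()


async def open_client(host, port, client_id, gate):
    async with gate:  # caps the number of connection attempts in flight
        reader, writer = await asyncio.open_connection(host, port)
        writer.write(connect_packet(client_id))
        await writer.drain()
        accepted = await reader.readexactly(4) == CONNACK_ACCEPTED
    return accepted, writer


async def connection_spike(host, port, clients, max_in_flight):
    """Open `clients` MQTT connections concurrently; return (accepted, seconds taken)."""
    gate = asyncio.Semaphore(max_in_flight)
    start = time.perf_counter()
    results = await asyncio.gather(
        *(open_client(host, port, f"load-{i}", gate) for i in range(clients))
    )
    elapsed = time.perf_counter() - start
    accepted = sum(ok for ok, _ in results)
    for _, writer in results:  # all connections were held open until now
        writer.close()
        await writer.wait_closed()
    return accepted, elapsed


async def main(clients=100, max_in_flight=20):
    server = await asyncio.start_server(stub_broker, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    async with server:
        return await connection_spike("127.0.0.1", port, clients, max_in_flight)
```

For example, `accepted, seconds = asyncio.run(main(clients=100, max_in_flight=20))` reports how many connections were accepted and how long the spike took to complete.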
2. Subscriptions and Topics at Scale
MQTT message exchange is based on topics: clients publish messages to specific topics and subscribe to the topics they want to receive. To ensure your MQTT broker can support the scalability requirements of your IoT application, we recommend running the following tests for subscriptions and topics:
- Increasing the number of clients subscribed to the same topic. Clients often subscribe to the same topic to receive broadcast information, for example, a notification that an update is available. Testing how many subscribers per topic an MQTT broker can handle with a given set of resources will help you with capacity planning.
- Increasing the number of subscriptions each client uses. Adding more subscriptions to existing clients simulates the adoption of new services and functionality in your IoT application. You should test how many concurrently active topics the MQTT broker can handle without service degradation.
The number of subscribers and topics an MQTT broker cluster can handle is a very useful performance indicator. It is important to test not only a single MQTT broker instance but also an elastic MQTT broker cluster, to understand how your setup will scale with the future usage demands of your IoT application.
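Subscriber and topic counts matter because the broker has to match every published topic against the subscriptions it holds, including wildcard filters (`+` for exactly one level, `#` for all remaining levels). The Python sketch below shows these matching rules and how one publish fans out to the matching subscribers. It deliberately uses a linear scan to show why the cost grows with the number of subscriptions; production brokers generally use indexed structures such as topic trees instead. The function names are hypothetical.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter (with + and # wildcards) matches a topic name."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    # Filters starting with a wildcard must not match topics starting with '$' (e.g. $SYS).
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(filter_levels):
        if level == "#":  # multi-level wildcard: matches the parent level and everything below
            return True
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:  # '+' matches exactly one level
            return False
    return len(filter_levels) == len(topic_levels)


def matching_subscribers(subscriptions: dict[str, list[str]], topic: str) -> set[str]:
    """Client IDs with at least one subscription matching the topic (the fan-out of one publish)."""
    return {
        client_id
        for client_id, filters in subscriptions.items()
        if any(topic_matches(f, topic) for f in filters)
    }
```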
3. Testing MQTT Message Throughput
An MQTT broker is responsible for reliable messaging between MQTT clients. MQTT messaging is fully asynchronous in nature, based on the publish/subscribe paradigm. It is important to test the message throughput that is expected for your IoT application. MQTT message publishing tests should include the following:
- Fan-in: A large number of clients publish messages and a smaller number of clients are subscribed to the corresponding topics. This simulates data collection by backend services.
- Fan-out: A small number of clients publish messages and a larger number of clients are subscribed to the corresponding topics. This simulates broadcasting messages to IoT devices.
- One-to-one: Despite the asynchronous and decoupled nature of the publish/subscribe paradigm, it is often desirable to implement one-to-one communication between MQTT clients. To achieve this, it is common to use a dedicated topic for each client that contains the receiver's clientID. Large-scale load tests of this kind test an MQTT broker's capacity to maintain large numbers of concurrent connections and its message throughput at the same time.
It is also important to identify the maximum message throughput a single MQTT broker instance as well as an elastically scaling MQTT broker cluster can support without service degradation. This will help you with capacity planning and can ensure that you are able to scale your deployment appropriately when an increase in usage makes it necessary.
Because payload size affects the maximum message throughput, we recommend using message payloads that are representative of the planned IoT application.
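The three publishing patterns differ mainly in how many deliveries the broker must perform for each incoming message. A small back-of-the-envelope model, sketched here in Python, can help when choosing target rates for a throughput test. The scenario values are placeholders, not measurements, and the bandwidth figure counts payload bytes only.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Load profile of a throughput test; every number is an input you choose."""
    publishers: int
    msgs_per_publisher_per_s: float
    receivers_per_message: int  # subscribers that get a copy of each published message
    payload_bytes: int

    def inbound_per_s(self) -> float:
        """PUBLISH packets the broker receives per second."""
        return self.publishers * self.msgs_per_publisher_per_s

    def outbound_per_s(self) -> float:
        """PUBLISH packets the broker must deliver per second."""
        return self.inbound_per_s() * self.receivers_per_message

    def outbound_payload_mb_per_s(self) -> float:
        # Payload bytes only: MQTT headers and TCP/TLS overhead come on top of this.
        return self.outbound_per_s() * self.payload_bytes / 1_000_000


# Illustrative placeholder values, not measurements.
SCENARIOS = {
    "fan-in": Scenario(publishers=10_000, msgs_per_publisher_per_s=1,
                       receivers_per_message=1, payload_bytes=200),
    "fan-out": Scenario(publishers=1, msgs_per_publisher_per_s=10,
                        receivers_per_message=10_000, payload_bytes=200),
    "one-to-one": Scenario(publishers=10_000, msgs_per_publisher_per_s=0.5,
                           receivers_per_message=1, payload_bytes=200),
}

if __name__ == "__main__":
    for name, s in SCENARIOS.items():
        print(f"{name}: in={s.inbound_per_s():.0f}/s out={s.outbound_per_s():.0f}/s "
              f"payload={s.outbound_payload_mb_per_s():.1f} MB/s")
```

With these placeholder inputs, the fan-out scenario receives only 10 messages per second but has to deliver 100,000, which is why fan-out tests stress delivery rather than ingestion.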
Advanced MQTT Features
MQTT includes a set of features that go beyond the basic functionality of connecting clients and sending and receiving messages. Here's a list of popular and useful features that should be tested.
4. Quality of Service Levels
One of the key characteristics of MQTT is its ability to deliver messages reliably even under unreliable network conditions. This is accomplished with three Quality of Service (QoS) levels: Zero (at most once delivery), One (at least once delivery) and Two (exactly once delivery). Make sure to test that the MQTT broker or service supports each QoS level to the extent you plan to use it.
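Higher QoS levels cost the broker more work per message because each hop involves more control packets. The following sketch counts them using the packet flows defined in the MQTT specification, together with the rule that a subscriber receives a message at the lower of the publish QoS and the QoS granted for its subscription. It counts packets only; it says nothing about how much CPU time or memory each packet costs on a particular broker, and the function names are hypothetical.

```python
# Control packets exchanged on ONE hop (sender to receiver) for each QoS level.
QOS_FLOWS = {
    0: ["PUBLISH"],                                 # at most once: no acknowledgement
    1: ["PUBLISH", "PUBACK"],                       # at least once: duplicates are possible
    2: ["PUBLISH", "PUBREC", "PUBREL", "PUBCOMP"],  # exactly once: four-packet handshake
}


def delivery_qos(publish_qos: int, subscription_qos: int) -> int:
    """The broker forwards a message at the lower of the publish and granted subscription QoS."""
    return min(publish_qos, subscription_qos)


def packets_per_message(publish_qos: int, subscription_qos: int) -> int:
    """Packets the broker handles for one message: publisher hop plus one subscriber hop."""
    inbound = len(QOS_FLOWS[publish_qos])
    outbound = len(QOS_FLOWS[delivery_qos(publish_qos, subscription_qos)])
    return inbound + outbound
```

A message published and subscribed at QoS 2 costs 8 packets per subscriber path, four times as many as at QoS 0, so a test that only uses QoS 0 can considerably overstate the throughput available at QoS 2.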
5. Queued Messages
When a client uses a persistent session and subscribes with a Quality of Service level greater than zero, an MQTT broker makes sure that the client does not miss any messages for its subscribed topics while it is offline. The broker queues those messages and delivers them to the client in order once the connection is re-established. Well-architected brokers can limit the number of queued messages per client. Test that the MQTT broker can handle a realistic number of queued messages based on your expected message throughput and how long you expect clients to stay offline.
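A realistic number of queued messages can be estimated from the message rate per client and the expected offline duration, capped by the per-client limit. The sketch below shows that estimate together with a bounded per-client queue. It assumes that the oldest message is discarded when the queue is full; brokers handle this differently, so check your broker's documentation. All names and numbers are hypothetical.

```python
from collections import deque


def expected_queue_depth(msgs_per_s_per_client: float, offline_s: float, per_client_limit: int) -> int:
    """Messages a broker must hold for one offline client, capped by the per-client limit."""
    return min(int(msgs_per_s_per_client * offline_s), per_client_limit)


class OfflineQueue:
    """Bounded per-client queue.

    ASSUMPTION: when full, the oldest message is discarded. Brokers differ here
    (some discard the newest message instead)."""

    def __init__(self, limit: int):
        self.messages = deque(maxlen=limit)
        self.dropped = 0

    def enqueue(self, message) -> None:
        if len(self.messages) == self.messages.maxlen:
            self.dropped += 1
        self.messages.append(message)  # a full deque with maxlen evicts its oldest item

    def drain(self):
        """Deliver queued messages in order once the client reconnects."""
        while self.messages:
            yield self.messages.popleft()
```

For example, a client receiving one message every two seconds that stays offline for an hour needs room for 1,800 messages; across a fleet, multiply that by the number of clients expected to be offline at the same time.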
6. Shared Subscriptions
Shared Subscriptions are a client load balancing mechanism that allows a group of consumers to subscribe to the same topic as a group, while the broker sends each message that arrives on the topic to only one member of the group. This is very useful for backend consumer applications that must process more messages than a single client can handle and that may need to scale up and down with the current load. Include tests for the scalability of shared subscriptions by sending large volumes of messages on a single topic and adding many members to a shared subscription group that consumes those messages.
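In MQTT 5, a client joins a shared subscription group by subscribing to a filter of the form `$share/{ShareName}/{filter}`. The following Python sketch parses that form and distributes messages round-robin across the group members. Round-robin is only one possible strategy, since the specification leaves the choice of receiving member to the broker, and the names used here are hypothetical.

```python
from itertools import cycle


def parse_shared_subscription(topic_filter: str) -> tuple[str, str]:
    """Split '$share/{ShareName}/{filter}' into (share name, topic filter)."""
    parts = topic_filter.split("/", 2)
    if len(parts) != 3 or parts[0] != "$share" or not parts[1] or not parts[2]:
        raise ValueError(f"not a shared subscription: {topic_filter!r}")
    return parts[1], parts[2]


class SharedGroup:
    """Delivers each message to exactly one member, cycling through members in order."""

    def __init__(self, members: list[str]):
        self.received = {m: [] for m in members}
        self._next_member = cycle(members)

    def dispatch(self, message) -> str:
        member = next(self._next_member)
        self.received[member].append(message)
        return member
```

In a scalability test, the interesting measurements are whether every message is still consumed exactly once by the group and how throughput changes as members are added.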
7. Retained Messages
When an MQTT message is published with the retain flag set, the broker stores it as the last retained message for the corresponding topic. This gives newly subscribing clients immediate access to the latest data on each topic. For example, a thermometer might publish the temperature every 10 minutes. Without retained messages, a mobile phone app that subscribes to the thermometer's topic would need to wait up to 10 minutes for the next published value before it could show the temperature. A retained message for the temperature allows the mobile app to show the most recent temperature immediately.
There are a number of tests that can be done to ensure the MQTT broker processes retained messages at scale:
- Ensure each existing topic in the system can store a retained message without significant service degradation.
- If all messages are retained, test the publishing throughput to ensure there is minimal degradation.
- Test a large number of MQTT clients that are reconnecting or resubscribing to topics that will return a retained message. Measure the impact of this scenario on the broker to assist in capacity planning.
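The retained-message behavior from the thermometer example comes down to a per-topic store that keeps only the latest retained message and hands the matching entries to each new subscriber. The sketch below models this in Python, including the MQTT rule that a retained message with an empty payload clears the topic. The wildcard matcher is simplified (it ignores the special handling of `$` topics), and the class is a hypothetical model rather than any broker's implementation; the scan in `on_subscribe` is the kind of work the resubscribe test above exercises at scale.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    # Simplified + / # matcher (ignores the special rules for topics starting with '$').
    f, t = topic_filter.split("/"), topic.split("/")
    for i, level in enumerate(f):
        if level == "#":
            return True
        if i >= len(t) or (level != "+" and level != t[i]):
            return False
    return len(f) == len(t)


class RetainedStore:
    """Keeps only the last retained message per topic."""

    def __init__(self):
        self._by_topic: dict[str, bytes] = {}

    def publish(self, topic: str, payload: bytes, retain: bool) -> None:
        if not retain:
            return  # non-retained messages are delivered but not stored
        if payload:
            self._by_topic[topic] = payload  # replaces the previous retained message
        else:
            self._by_topic.pop(topic, None)  # an empty retained payload clears the topic

    def on_subscribe(self, topic_filter: str) -> dict[str, bytes]:
        """Retained messages sent to a client right after it subscribes."""
        return {t: p for t, p in self._by_topic.items() if topic_matches(topic_filter, t)}
```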
High Availability Tests
In addition to verifying that an MQTT broker provides the necessary scalability and performance, it is important to test how the broker behaves in high availability scenarios. The following scenarios cover the different failures an MQTT broker needs to handle when running in a cluster.
8. Large-scale Client Disconnect Scenarios
An IoT application needs to be able to recover gracefully from failures in the underlying infrastructure, for example, a network outage or a load balancer crash. This type of failure disconnects all MQTT clients that were connected to the broker at once, which results in a large number of client disconnect and reconnect events from the IoT devices. To ensure the resilience of the IoT application, the MQTT broker needs to handle these events as quickly as possible with minimal service degradation.
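How hard a mass disconnect hits the broker depends on how the devices reconnect, so it helps to model the reconnect timing of your device fleet in the test. The sketch below assumes that clients use capped exponential backoff, optionally with "full jitter" (a random delay between zero and the backoff ceiling), and compares the peak number of reconnect attempts per second with and without jitter. This reconnect strategy is an assumption about the devices, not something the broker controls, and the function names are hypothetical.

```python
import random
from collections import Counter


def reconnect_delays(clients, attempt, base_s=1.0, cap_s=60.0, jitter=True, rng=None):
    """Delay in seconds before each client's next reconnect attempt after a mass disconnect.

    ASSUMPTION: clients use capped exponential backoff; with jitter=True each client
    waits a uniformly random time between 0 and the backoff ceiling ("full jitter")."""
    rng = random.Random() if rng is None else rng
    ceiling = min(cap_s, base_s * 2 ** attempt)
    return [rng.uniform(0, ceiling) if jitter else ceiling for _ in range(clients)]


def peak_attempts_per_second(delays):
    """Largest number of reconnect attempts that land in the same one-second bucket."""
    return max(Counter(int(d) for d in delays).values())
```

Running the test once with synchronized reconnects (`jitter=False`, every client retries in the same second) and once with jittered reconnects shows both the worst case and the expected case that the broker has to absorb.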
For this type of test scenario, it is important to test the MQTT Last Will and Testament feature. Last Will and Testament allows an MQTT client to leave a message (the Last Will) with the broker, so that if the client disconnects ungracefully, the broker automatically publishes that message. If a large number of MQTT clients disconnect at once, publishing their Last Will messages will have an impact on broker performance.
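The sketch below models why a mass disconnect turns into a burst of publishes: every client that drops without a clean DISCONNECT triggers its stored Last Will. It follows MQTT 3.1.1 semantics (a clean DISCONNECT discards the will) and ignores MQTT 5 additions such as the Will Delay Interval. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Will:
    topic: str
    payload: bytes


class WillRegistry:
    """Tracks the Last Will each connected client left with the broker."""

    def __init__(self):
        self.wills: dict[str, Optional[Will]] = {}
        self.published: list[Will] = []

    def connect(self, client_id: str, will: Optional[Will] = None) -> None:
        self.wills[client_id] = will

    def disconnect(self, client_id: str, graceful: bool) -> None:
        will = self.wills.pop(client_id, None)
        # A clean DISCONNECT discards the will; a broken connection triggers it.
        if will is not None and not graceful:
            self.published.append(will)

    def network_outage(self) -> int:
        """All connected clients drop at once; return how many wills the broker must publish."""
        before = len(self.published)
        for client_id in list(self.wills):
            self.disconnect(client_id, graceful=False)
        return len(self.published) - before
```

In a real test, the equivalent check is that subscribers to the will topics receive one will message per ungracefully disconnected client, and how long it takes until all of them have arrived.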
9. Infrastructure Outage
An MQTT broker running in a private data center or on a cloud service needs to be able to adapt to infrastructure failures. Machines hosted in a public cloud typically have a monthly uptime of 99.99%, which still leaves roughly four minutes of downtime per month, so an MQTT broker running in such an environment must be expected to fail occasionally. It is important to test how an MQTT broker cluster responds when one broker node in the cluster fails. An individual cluster node failure should not impact the behaviour of the connected IoT devices or the end-user experience. The expected result is that the remaining cluster nodes do not lose the information associated with the failed node. Specifically, client sessions, queued messages for clients, subscriptions for clients, and retained messages all continue to be available to the other nodes.
NOTE: We recommend using a load balancer so the MQTT clients can be automatically redirected to another broker node.
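The 99.99% uptime figure translates into concrete downtime. The following sketch converts an uptime ratio into minutes of downtime per period and estimates the fraction of time at least one node of a cluster is down. The second function assumes that nodes fail independently, which is a simplification; the function names are hypothetical.

```python
def allowed_downtime_minutes(uptime: float, days: float = 30) -> float:
    """Minutes of downtime an uptime ratio (e.g. 0.9999) permits over `days` days."""
    return (1 - uptime) * days * 24 * 60


def share_of_time_some_node_down(uptime: float, nodes: int) -> float:
    """Fraction of time at least one of `nodes` machines is down.

    ASSUMPTION: machines fail independently of each other."""
    return 1 - uptime ** nodes
```

At 99.99% uptime, a 30-day month allows about 4.32 minutes of downtime per machine, and in a three-node cluster some node is down roughly 0.03% of the time, so node failures are a routine event that the test should cover.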
10. Prolonged Network Outage
In accordance with high availability best practices, the nodes of an MQTT broker cluster are often hosted across multiple availability zones. In this type of architecture, the network connection to one availability zone can be lost. During these so-called split-brain scenarios, a large MQTT broker cluster splits into two or more separate smaller clusters. The expected result is that the cluster nodes do not lose the information associated with the nodes they can no longer reach. Specifically, client sessions, queued messages for clients, subscriptions for clients, and retained messages all continue to be available.
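The expected result in both outage scenarios can be checked automatically by snapshotting the broker state before the failure and comparing it with the state after recovery. The helper below assumes you can export sessions, subscriptions, queued messages and retained messages as collections of identifiers, for example through your broker's administration interface; how to obtain these snapshots depends on the broker, and the function name is hypothetical.

```python
STATE_KINDS = ("sessions", "subscriptions", "queued_messages", "retained_messages")


def missing_state(before: dict, after: dict) -> dict:
    """Return, per state kind, the items present before the outage but missing after it.

    Each snapshot maps a state kind to a collection of identifiers, e.g. client IDs
    for sessions or (client_id, topic_filter) pairs for subscriptions."""
    report = {}
    for kind in STATE_KINDS:
        lost = set(before.get(kind, ())) - set(after.get(kind, ()))
        if lost:
            report[kind] = lost
    return report
```

An empty report means the test passed; any entry names exactly which sessions, subscriptions, queued messages or retained messages did not survive the outage.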
A robust test plan for IoT scalability is an essential part of any production rollout of an IoT application. This article identifies key scenarios for testing the scalability of an MQTT broker.
In the next article, we will identify typical pitfalls and best practices for implementing IoT scalability testing.
Let us know if you have comments or suggestions for other tests.