MQTT Topic Tree & Topic Matching: Challenges and Best Practices Explained
As both the size and complexity of IoT projects continue to grow, we talk to many IT architects who are working through the technical challenges of designing a data foundation built for scale. In a large MQTT deployment, there may be thousands or even millions of clients subscribing to different topics.
This article explains why finding the matching subscriptions among millions of subscribers is a challenge and how an MQTT broker can overcome this challenge.
What is Topic Matching in MQTT?
MQTT is a publish/subscribe protocol where devices act as MQTT Clients and exchange messages over an MQTT Broker. MQTT Clients send their data in PUBLISH control packets to a specific topic. The topic is separate from the packet’s payload, which lets the broker route the message without inspecting the payload. The broker delivers the published message to every client subscribed with a matching topic filter.
The main distinction between a topic and a topic filter is that a topic is used for publishing and cannot contain wildcard characters, whereas a topic filter can. Wildcard characters aggregate multiple streams of data into one and are therefore used on the subscriber’s side. A topic filter without wildcard characters matches exactly one topic; this case is often referred to as an exact subscription. For more information, refer to our article, MQTT Topics and Wildcards.
In a nutshell, a topic filter can be thought of as a selector for the topics that PUBLISH packets are sent to. The broker must be able to find the matching subscriptions for each published message.
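To make the selector idea concrete, here is a minimal sketch of how a single topic filter could be checked against a topic. The function name filter_matches is hypothetical and not taken from any broker. It assumes the filter is well-formed (a # appears only as the last level) and ignores the special handling of topics that start with $.

```python
def filter_matches(topic_filter: str, topic: str) -> bool:
    """Return True if topic_filter selects topic (simplified sketch)."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: matches the parent level and everything below it.
            return True
        if i >= len(topic_levels):
            # The filter has more levels than the topic.
            return False
        if level != "+" and level != topic_levels[i]:
            # A literal level must be identical; "+" accepts any single level.
            return False

    # Without "#", filter and topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


print(filter_matches("town/+/kitchen", "town/house/kitchen"))  # True
print(filter_matches("town/#", "town"))                        # True
print(filter_matches("town/+", "town/house/kitchen"))          # False
```

Checking one filter is cheap. The difficulty discussed in the rest of this article comes from having to do it for a very large number of subscriptions.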
Subscriptions can contain wildcard characters to match a broad range of topics. Subscriptions with wildcards are often used when there is uncertainty about the topics that publishing clients will use. For example, when publishing clients include their ID as one topic level, it may be impossible to reliably receive messages for all such topics without using wildcard characters. While this is useful for the clients, finding all the matching subscriptions presents a technical challenge. In some real-life scenarios, a broker would have to check millions of subscriptions for every published message.
MQTT Wildcard Topic Matching Challenge Explained
Since there are many use cases for wildcards, let’s examine the technical challenge of supporting wildcard subscriptions. First, looking at every subscription for every published message is not scalable: the number of steps needed to find the matching subscriptions increases linearly with the number of subscriptions. Alternatively, the broker could map the subscriptions to their topic filters and check the map for all filters matching the topic of a published message. This method is also impractical because the number of potentially matching topic filters grows quickly with the number of topic levels. For instance, if a message is published to the topic “town/house/kitchen”, subscriptions with topic filters such as “town/house/kitchen”, “town/+/kitchen”, “+/+/+”, “town/house/#”, “town/#”, and “#” would all match, along with every other combination of literal levels and wildcards.
The broker would have to look up each of these topic filters in the map. In production workloads, the broker has to find the matching subscriptions for published messages thousands of times per second, so it needs a specialized data structure for a fast lookup.
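To get a feel for how many filters are involved, the following sketch enumerates every topic filter that can match a given topic. The function name matching_filters is hypothetical. It assumes the standard wildcard rules (+ for exactly one level, # as the last level matching the parent and anything below) and ignores topics that start with $.

```python
from itertools import product


def matching_filters(topic: str) -> list[str]:
    """Enumerate all topic filters that match the given topic (sketch)."""
    levels = topic.split("/")
    filters = []

    # Filters ending in "#": any prefix of the topic (including the empty
    # prefix and the full topic), each level literal or "+", then "#".
    for prefix_len in range(len(levels) + 1):
        options = [(level, "+") for level in levels[:prefix_len]]
        for choice in product(*options):
            filters.append("/".join(choice + ("#",)))

    # Filters without "#": same number of levels, each literal or "+".
    options = [(level, "+") for level in levels]
    for choice in product(*options):
        filters.append("/".join(choice))

    return filters


candidates = matching_filters("town/house/kitchen")
print(len(candidates))  # 23
```

For a topic with n levels this yields 3 * 2^n - 1 filters, so a three-level topic has 23 candidates and a ten-level topic already has 3071.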
The Topic Tree
The Topic Tree is a data structure used to solve the wildcard topic matching problem described above. Each node represents one topic level, and the topic of the published message is used to collect the matching subscriptions present in the tree. We start at the root of the topic tree and proceed through its levels, using the topic segments to select the next node. If the current node has wildcard subscriptions (with # or +), they are added to the matching subscriptions. Once there are no more segments to match in the topic, any non-wildcard subscriptions in the current node are exact subscriptions for this topic and are added as well. An exact subscription matches only messages published to exactly that topic.
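The traversal can be sketched as a small trie keyed by topic level. This is an illustrative simplification, not the implementation of any particular broker: the class names Node and TopicTree and the choice to store + and # as ordinary child keys are assumptions, and $-topics, shared subscriptions, and QoS are left out.

```python
class Node:
    def __init__(self):
        self.children = {}        # topic level (or "+" / "#") -> Node
        self.subscribers = set()  # clients whose filter ends at this node


class TopicTree:
    def __init__(self):
        self.root = Node()

    def subscribe(self, topic_filter: str, client_id: str) -> None:
        node = self.root
        for level in topic_filter.split("/"):
            # Levels shared by several filters reuse the same node.
            node = node.children.setdefault(level, Node())
        node.subscribers.add(client_id)

    def match(self, topic: str) -> set:
        found = set()
        self._walk(self.root, topic.split("/"), 0, found)
        return found

    def _walk(self, node, levels, i, found):
        # A "#" child matches the current level and everything below it.
        hash_child = node.children.get("#")
        if hash_child is not None:
            found |= hash_child.subscribers
        if i == len(levels):
            # No segments left: filters ending here match the full topic.
            found |= node.subscribers
            return
        # Follow the literal segment and the single-level wildcard.
        for key in (levels[i], "+"):
            child = node.children.get(key)
            if child is not None:
                self._walk(child, levels, i + 1, found)


tree = TopicTree()
tree.subscribe("town/house/kitchen", "a")
tree.subscribe("town/+/kitchen", "b")
tree.subscribe("town/#", "c")
tree.subscribe("town/garage", "e")
print(sorted(tree.match("town/house/kitchen")))  # ['a', 'b', 'c']
```

In this sketch the work per published message depends on the depth of the topic and the branches that actually exist in the tree, not on the total number of subscriptions.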
The broker can continue delivering the published message to the subscribed clients once it has found all matching subscriptions. This way of storing the subscriptions also reduces memory usage because topic levels shared across multiple subscriptions are only stored once.
A few topic design considerations help your application filter messages as intended, regardless of how a particular MQTT broker implements topic matching.
It is good practice to avoid topic levels that do not add information, such as a level that is identical across all subscriptions. The most common example is using the company name as the first level of every topic. While some topic levels naturally have less variety than others, you should omit levels that are the same for every topic. Similarly, a leading forward slash creates an empty first topic level that every filter must also match, so avoid it unless you want it present on all topics.
In conclusion, topic matching is crucial to MQTT’s publish/subscribe model, enabling MQTT clients to exchange messages through the MQTT broker with minimal effort. Topic filters select the topics a subscriber receives messages from, and subscriptions with wildcard characters enable broad topic matching. Finding all matching subscriptions presents a technical challenge, which can be solved using a specialized data structure called the Topic Tree. It is essential to follow best practices when designing topics so that they remain agnostic of how a particular MQTT broker implements topic matching.
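As a small illustration of the leading-slash point, splitting a topic on the level separator shows the extra empty level it introduces. The helper name topic_levels is hypothetical.

```python
def topic_levels(topic: str) -> list[str]:
    """Split a topic into its levels using the MQTT level separator."""
    return topic.split("/")


print(topic_levels("town/house/kitchen"))   # ['town', 'house', 'kitchen']
print(topic_levels("/town/house/kitchen"))  # ['', 'town', 'house', 'kitchen']
```

The second topic has four levels, the first of which is empty, so a filter such as “town/#” does not match it.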