MQTT Essentials Part 2: Publish & Subscribe
Welcome to the second part of MQTT Essentials, a blog series about the core features and concepts of the MQTT protocol. In this post, we’ll discuss the publish/subscribe pattern. First we look at the general characteristics of publish/subscribe itself, and then we focus on MQTT. We’ll also explain how MQTT differs from traditional message queuing protocols.
In the first post, we introduced MQTT and explained its origin and history. If you haven’t read it yet, you should definitely check it out.
The publish/subscribe pattern
The publish/subscribe pattern (pub/sub) is an alternative to the traditional client-server model, in which a client communicates directly with an endpoint. Pub/sub instead decouples the client that sends a message (the publisher) from the client or clients that receive it (the subscribers). This means that publisher and subscriber don’t know about each other’s existence. A third component, the broker, is known to both publisher and subscriber; it filters all incoming messages and distributes them accordingly. Let’s look at these aspects in a bit more detail. Remember that this is still the basic part about pub/sub in general; we’ll talk about MQTT specifics in just a minute.
As already mentioned, the main aspect of pub/sub is the decoupling of publisher and subscriber, which can be broken down into several dimensions:
Space decoupling: Publisher and subscriber do not need to know each other (by IP address and port, for example).
Time decoupling: Publisher and subscriber do not need to run at the same time.
Synchronization decoupling: Operations on both components are not halted while publishing or receiving.
In summary, publish/subscribe decouples the publisher of a message from its receivers, and filtering makes it possible for only certain clients to receive certain messages. The decoupling has three dimensions: space, time, and synchronization.
Pub/sub also scales better than the traditional client-server approach, because operations on the broker can be highly parallelized and processed in an event-driven way. Message caching and intelligent routing of messages are often decisive for improving scalability as well. Still, scaling publish/subscribe to millions of connections is definitely a challenge. It can be achieved with clustered broker nodes that distribute the load across more individual servers behind load balancers. (We will discuss this in detail in a separate post, since it goes beyond the scope of this one.)
So the interesting question is: how does the broker filter messages so that each subscriber only gets the ones it is interested in?
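To make the role of the broker more concrete, here is a minimal in-memory sketch in Python. It is only an illustration and not how a real MQTT broker works: the class name Broker, its methods, and the topic names are hypothetical. It shows space decoupling and simple filtering by exact topic name. Because delivery is a direct function call, it does not show time or synchronization decoupling.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A handler receives the topic and the payload of a message.
Handler = Callable[[str, str], None]


class Broker:
    """Toy broker: routes each message to the handlers subscribed to its topic."""

    def __init__(self) -> None:
        self._subscriptions: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a handler for one exact topic name."""
        self._subscriptions[topic].append(handler)

    def publish(self, topic: str, payload: str) -> int:
        """Deliver the payload to every handler of the topic; return the count."""
        handlers = self._subscriptions.get(topic, [])
        for handler in handlers:
            handler(topic, payload)
        return len(handlers)


broker = Broker()
received = []

# The subscriber only knows the broker and a topic name, not the publisher.
broker.subscribe("home/temperature", lambda topic, payload: received.append((topic, payload)))

# The publisher only knows the broker and a topic name, not the subscriber.
delivered = broker.publish("home/temperature", "21.5")
ignored = broker.publish("home/humidity", "40")  # nobody subscribed to this topic
```

In this sketch the second message reaches nobody, and the publisher never finds out who, if anyone, received either message.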
Option 1: Subject-based filtering
The filtering is based on a subject or topic, which is part of each message. The receiving client subscribes with the broker to the topics it is interested in, and from then on it receives all messages published to those topics. Topics are generally strings with a hierarchical structure, which allows filtering based on a limited number of expressions.
Option 2: Content-based filtering
With content-based filtering, as the name implies, the broker filters messages based on a specific content filter language. Clients subscribe with filter queries that describe the messages they are interested in. A big downside is that the content of the message must be known beforehand and cannot easily be encrypted or changed.
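As a rough illustration of content-based filtering, the following Python sketch uses plain functions as the "filter language". The ContentBroker class and the message fields are made up for this example. Note that the broker has to read the message body to evaluate the filter, which is why encrypted payloads are a problem for this approach.

```python
from typing import Any, Callable, Dict, List, Tuple

Message = Dict[str, Any]
Predicate = Callable[[Message], bool]
Handler = Callable[[Message], None]


class ContentBroker:
    """Toy broker that filters on message content instead of on a topic."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Predicate, Handler]] = []

    def subscribe(self, predicate: Predicate, handler: Handler) -> None:
        """Register a handler together with the filter that selects its messages."""
        self._subscriptions.append((predicate, handler))

    def publish(self, message: Message) -> int:
        """Deliver the message to every handler whose filter accepts it."""
        delivered = 0
        for predicate, handler in self._subscriptions:
            if predicate(message):  # the broker must inspect the content here
                handler(message)
                delivered += 1
        return delivered


broker = ContentBroker()
alerts = []

# Hypothetical filter: only messages reporting more than 30 degrees.
broker.subscribe(lambda message: message.get("temperature", 0) > 30, alerts.append)

broker.publish({"sensor": "kitchen", "temperature": 21.5})
broker.publish({"sensor": "attic", "temperature": 35.0})
```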
Option 3: Type-based filtering
In object-oriented languages it is common practice to filter based on the type/class of the message (event). In this case a subscriber could listen to all messages that are of type Exception or any subtype of it.
Of course, publish/subscribe is not a silver bullet, and there are some things to consider before using it. The decoupling of publisher and subscriber, which is the key to pub/sub, brings a few challenges with it. You have to be aware of how the published data is structured beforehand. With subject-based filtering, both publisher and subscriber need to know the right topics to use. Another aspect is message delivery: a publisher can’t assume that somebody is listening to the messages it sends, so a message might not be read by any subscriber.
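The Exception example can be sketched in a few lines of Python. The TypeBroker class is hypothetical; the exception classes are Python built-ins, where KeyError is a subtype of LookupError and both are subtypes of Exception.

```python
from typing import Callable, List, Tuple, Type

Handler = Callable[[object], None]


class TypeBroker:
    """Toy broker that filters events by their class, including subclasses."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Type, Handler]] = []

    def subscribe(self, event_type: Type, handler: Handler) -> None:
        """Register a handler for an event type and all of its subtypes."""
        self._subscriptions.append((event_type, handler))

    def publish(self, event: object) -> None:
        """Deliver the event to every handler whose type it is an instance of."""
        for event_type, handler in self._subscriptions:
            if isinstance(event, event_type):
                handler(event)


broker = TypeBroker()
all_errors = []
lookup_errors = []

# One subscriber listens to Exception and every subtype of it,
# the other one only to the narrower LookupError branch.
broker.subscribe(Exception, lambda event: all_errors.append(type(event).__name__))
broker.subscribe(LookupError, lambda event: lookup_errors.append(type(event).__name__))

broker.publish(KeyError("missing"))  # a LookupError, so both subscribers get it
broker.publish(ValueError("bad"))    # only an Exception, so one subscriber gets it
```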
So now we have learned a lot about publish/subscribe in general, but what about MQTT specifically? MQTT embodies all of the aspects mentioned above, depending on what you want to achieve with it. MQTT decouples publisher and subscriber in space: they only have to know the hostname/IP and port of the broker in order to publish or subscribe to messages. It also decouples them in time, but this is often just a fallback behavior, because the typical use case is to deliver messages in near real time. The broker is nevertheless able to store messages for clients that are not online. (This requires two conditions: the client has connected once with a persistent session, and it has subscribed to a topic with a Quality of Service greater than 0.) MQTT is also able to decouple synchronization, because most client libraries work asynchronously and are based on callbacks or a similar model, so they won’t block other tasks while waiting for a message or publishing one. There are certain use cases where synchronization is desirable and also possible, so some libraries offer synchronous APIs for waiting on a certain message. But usually the flow is asynchronous.
Another thing worth mentioning is that MQTT is especially easy to use on the client side. Most pub/sub systems keep most of the logic on the broker side, but MQTT really boils pub/sub down to its essence when using a client library, and that makes it a lightweight protocol for small and constrained devices.
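The two conditions for storing messages for offline clients can be sketched as follows. This is a simplified model and not the behavior of any particular broker: the Session and Broker classes are hypothetical, topics are matched by exact name, and the QoS of the published message itself is ignored.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

Message = Tuple[str, str]  # (topic, payload)


class Session:
    """State the toy broker keeps for one client."""

    def __init__(self, persistent: bool) -> None:
        self.persistent = persistent
        self.online = True
        self.subscriptions: Dict[str, int] = {}  # topic -> QoS of the subscription
        self.pending: Deque[Message] = deque()   # stored while the client is offline
        self.inbox: List[Message] = []           # messages delivered to the client


class Broker:
    def __init__(self) -> None:
        self.sessions: Dict[str, Session] = {}

    def connect(self, client_id: str, persistent: bool) -> Session:
        session = self.sessions.get(client_id)
        if session is None or not persistent:
            # No stored session, or the client asked for a fresh one.
            session = Session(persistent)
            self.sessions[client_id] = session
        session.online = True
        while session.pending:  # hand over everything stored while offline
            session.inbox.append(session.pending.popleft())
        return session

    def disconnect(self, client_id: str) -> None:
        session = self.sessions[client_id]
        session.online = False
        if not session.persistent:
            del self.sessions[client_id]  # nothing is kept for this client

    def subscribe(self, client_id: str, topic: str, qos: int) -> None:
        self.sessions[client_id].subscriptions[topic] = qos

    def publish(self, topic: str, payload: str) -> None:
        for session in self.sessions.values():
            qos = session.subscriptions.get(topic)
            if qos is None:
                continue
            if session.online:
                session.inbox.append((topic, payload))
            elif qos > 0:
                session.pending.append((topic, payload))
            # Offline with QoS 0: the message is not stored for this client.


broker = Broker()
broker.connect("display", persistent=True)
broker.subscribe("display", "home/temperature", qos=1)
broker.subscribe("display", "home/humidity", qos=0)
broker.disconnect("display")

broker.publish("home/temperature", "21.5")  # stored: persistent session, QoS 1
broker.publish("home/humidity", "40")       # not stored: QoS 0

session = broker.connect("display", persistent=True)
```

After the reconnect, the session inbox in this sketch contains only the temperature message, because the humidity subscription used QoS 0.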
MQTT uses subject-based filtering of messages. Each message contains a topic, which the broker uses to determine whether a subscribing client will receive the message or not. Topics are explained in more detail in Part 5 of the MQTT Essentials. It would also be possible to do content-based filtering with the HiveMQ MQTT broker and its custom plugin system.
To handle the challenges of a pub/sub system in general, MQTT has quality of service (QoS) levels. They make it easy to specify that a message must be successfully delivered from the client to the broker or from the broker to a client. There is still the chance that nobody subscribes to a particular topic, though. If this is a problem, how such cases are handled depends on the broker. For example, the HiveMQ MQTT broker has a plugin system that is capable of identifying such cases and taking action, or of simply logging every message to a database for historical analytics. To mitigate the inflexibility of topics, it is important to design the topic tree very carefully and leave room for future use cases. If you follow these strategies, MQTT is well suited to production setups.
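As a small preview of how a broker could decide whether a subscription matches a topic, here is a sketch of a matching function in Python. It relies on the two MQTT wildcards, which this post has not introduced: + stands for exactly one topic level and # for any number of remaining levels. The function name topic_matches is hypothetical, and the sketch leaves out special cases such as topics starting with $.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if the topic is matched by the subscription's topic filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    for index, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level of the filter,
            # and it matches everything from here on (including nothing).
            return index == len(filter_levels) - 1
        if index >= len(topic_levels):
            return False  # the filter is longer than the topic
        if level != "+" and level != topic_levels[index]:
            return False  # a literal level that does not match
    # Without a trailing "#", both must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


exact = topic_matches("home/kitchen/temperature", "home/kitchen/temperature")
single_level = topic_matches("home/+/temperature", "home/kitchen/temperature")
multi_level = topic_matches("home/#", "home/kitchen/temperature")
too_deep = topic_matches("home/+", "home/kitchen/temperature")
```

In this example the first three calls return True, while the last one returns False because + covers only a single level.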
Distinction from Message Queues
There is a lot of confusion about MQTT, its name, and whether it is implemented as a message queue or not. We will try to shed some light on this and explain the differences. In our last post we already pointed out that the name MQTT comes from an IBM product called MQSeries and has nothing to do with “message queue”. But regardless of the name, what are the differences between MQTT and a traditional message queue?
A message queue stores messages until they are consumed
When using message queues, each incoming message is stored in the queue until it is picked up by a client (often called a consumer). Until then, the message simply stays in the queue and waits to be consumed. A message cannot end up unprocessed by any client, as it can in MQTT when nobody subscribes to a topic.
A message will only be consumed by one client
Another big difference is that in a traditional queue a message is processed by only one consumer, so that the load can be distributed among all consumers of a particular queue. In MQTT it is quite the opposite: every subscriber that has subscribed to the topic gets the message.
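The difference between the two delivery models can be shown with a few lines of Python. The names are invented for this example, and handing out queued messages in strict round-robin order is an assumption made to keep the sketch deterministic; real queues may distribute work differently.

```python
from collections import deque

messages = ["m1", "m2", "m3", "m4"]

# Queue semantics: consumers compete, so each message goes to exactly one of them.
work_queue = deque(messages)
queue_consumers = {"worker-a": [], "worker-b": []}
names = list(queue_consumers)
turn = 0
while work_queue:
    consumer = names[turn % len(names)]  # assumed round-robin distribution
    queue_consumers[consumer].append(work_queue.popleft())
    turn += 1

# Pub/sub semantics: every subscriber of the topic gets its own copy.
subscribers = {"subscriber-a": [], "subscriber-b": []}
for message in messages:
    for inbox in subscribers.values():
        inbox.append(message)
```

In the queue case each worker ends up with two of the four messages, while in the pub/sub case both subscribers receive all four.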
Queues are named and must be created explicitly
A queue is far less flexible than a topic. Before a queue can be used, it has to be created explicitly with a separate command. Only after that is it possible to publish or consume messages. In MQTT, topics are extremely flexible and can be created on the fly.
If there are any other differences that we have missed, we would love to hear about them in the comments.
So that’s the end of post number two. Next week we will take a closer look at the definitions of an MQTT client and broker, as well as at establishing a connection between them.