Leveraging Behavior Model on MQTT Communication to Optimize IoT Deployments

by Stefan Frehse

Dec 5, 2023 20 min read

MQTT is a versatile protocol designed for efficient, lightweight communication in IoT applications. Rooted in the publish-subscribe model, it inherently decouples clients from one another. Nevertheless, when working with MQTT, data producers and consumers often depend on a predefined set of rules. For instance: 1) a device comprising multiple MQTT connections must adhere to a specific initialization sequence to be recognized as online; 2) a predetermined sequence of messages is necessary before data producers are considered ready for full functionality; and 3) clients must use broker resources properly. This article introduces a behavior model based on a finite state machine and closes with a real-world example that can already be used today.

The Need for Behavior Policies on MQTT Communication

Since MQTT allows any of these communication schemes to be implemented flexibly, data consumers usually implement some control logic to validate them. However, this approach has drawbacks: data is transferred unnecessarily, and the respective control logic has to be implemented in every data consumer. Even when data producers are implemented according to agreed standards, checks are still required to catch inadvertently faulty deployments. Applying behavior policies removes the need to check on the consumer side.

The Model

Regardless of the MQTT version, the MQTT protocol implements a defined state machine to handle client connections. For example, a CONNECT packet is initially required to establish a client-to-broker connection. Once the connection has been established, the client is in a defined state and further packets can be sent. Next, the client can send a SUBSCRIBE packet to subscribe to an MQTT topic. After that, the client sends a PUBLISH packet to publish a payload to the broker. Finally, the client sends a DISCONNECT packet to disconnect from the broker, and the final state is reached.

The following diagram shows the client behavior we just described. Just keep in mind that this demonstrates the progress over time when a client is connecting to a broker rather than reflecting the actual implementation.

MQTT Protocol Usage Example on State Machine

Briefly, a finite state machine consists of a finite set of states and a finite set of transitions, each connecting two states. In the example above, the following states and transitions are shown:

States: Connected, Subscribed, Published, Start, End
Transitions:
- Start -> CONNECT -> Connected
- Connected -> SUBSCRIBE -> Subscribed
- Subscribed -> PUBLISH -> Published
- Published -> DISCONNECT -> End

Note: The above states and transitions are simple examples of how the client is interacting with an MQTT broker. HiveMQ’s broker implementation is much more flexible and highly scalable, but for simplicity’s sake, the state machine illustrates a typical scenario.
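The states and transitions listed above can be written down as a small lookup table. The following is a minimal sketch in Python, not HiveMQ code; the names TRANSITIONS and run are chosen here purely for illustration.

```python
# Transition table for the protocol usage example: (state, packet) -> next state.
TRANSITIONS = {
    ("Start", "CONNECT"): "Connected",
    ("Connected", "SUBSCRIBE"): "Subscribed",
    ("Subscribed", "PUBLISH"): "Published",
    ("Published", "DISCONNECT"): "End",
}


def run(packets, state="Start"):
    """Replay packets through the table; raise on an undefined transition."""
    for packet in packets:
        key = (state, packet)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {packet}")
        state = TRANSITIONS[key]
    return state


final_state = run(["CONNECT", "SUBSCRIBE", "PUBLISH", "DISCONNECT"])
print(final_state)  # End
```

A packet that has no entry for the current state, such as a PUBLISH directly after CONNECT, raises an error in this sketch, because the simplified table does not define that transition.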

Purpose-driven Protocol

Most use cases define their purpose-driven protocol on top of these states and transitions provided by the MQTT protocol, mostly on top of MQTT payloads with actual data. For example, a device is considered entirely online when the following MQTT packets are sent in this specific order: CONNECT, SUBSCRIBE, PUBLISH+, DISCONNECT, where PUBLISH+ means that at least one PUBLISH is sent. The essence is that the SUBSCRIBE packet must come before the PUBLISH messages for this device to be considered adequately initialized.

To validate this kind of sequence, a purpose-driven protocol is defined that requires this particular order. Other scenarios include: 1) a Last Will must be defined for a connection to be considered valid, and 2) no duplicate payloads may be sent within an hour.
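Because the required order CONNECT, SUBSCRIBE, PUBLISH+, DISCONNECT is a regular pattern, one simple way to check it is a regular expression over the packet names. The sketch below assumes that the packet types of one client session are available as a list of strings; the single-letter encoding and the function name is_fully_online are hypothetical.

```python
import re

# One letter per packet type so the sequence can be matched as a string.
CODES = {"CONNECT": "C", "SUBSCRIBE": "S", "PUBLISH": "P", "DISCONNECT": "D"}

# CONNECT, SUBSCRIBE, PUBLISH+, DISCONNECT
ONLINE_PATTERN = re.compile(r"CSP+D")


def is_fully_online(packets):
    """Return True if the packet sequence matches the required order."""
    try:
        encoded = "".join(CODES[packet] for packet in packets)
    except KeyError:
        return False  # unknown packet type
    return ONLINE_PATTERN.fullmatch(encoded) is not None


ok = is_fully_online(["CONNECT", "SUBSCRIBE", "PUBLISH", "PUBLISH", "DISCONNECT"])
wrong_order = is_fully_online(["CONNECT", "PUBLISH", "SUBSCRIBE", "DISCONNECT"])
print(ok, wrong_order)  # True False
```

A check like this only works once the whole sequence is known. A state machine, as introduced in the next section, can react to every packet as it arrives.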

Generally, this client behavior can be validated using custom code implementation as an extra microservice or by implementing a HiveMQ Extension.

Behavior Models as Purpose-driven Protocol Checker

In this section, we introduce Behavior Models. A behavior model is a finite state machine that specifies the purpose-driven protocol on top of any MQTT packets. The state machine has the following properties:

States:

Initial state: The state a behavior model starts in as soon as a client initiates the connection with the MQTT broker.
Terminal state: This state has two subtypes: Success indicates that a client has passed the behavior model successfully, and Failed indicates that it has not.
Intermediate state: All other states, which lie between the initial and terminal states.

Transitions:

An event may cause a state transition. Events include at least MQTT packets such as CONNECT and PUBLISH, but also more complex events such as timers or even external triggers.
A transition consists of a start state, a conditional event, and a target state.
A conditional event consists of either 1) an event, such as an MQTT packet (CONNECT or PUBLISH), or 2) an event together with a condition that returns true or false.

Actions:

A transition may have additional behavior to execute, e.g., modifying the payloads being sent before reaching any consumer, or building custom metrics.

Memory:

The state machine also has a limited memory for storing and loading data. It offers two operations: store(var, value), which stores a value, and load(var), which returns the stored value.

Example: DUPLICATE_COUNTER

Since the formalism might be a bit too dry, let’s consider an example: a behavior model called DUPLICATE_COUNTER, which counts how often two consecutive payloads are identical. This behavior model could be modeled as follows:

MQTT Behavior Model of DUPLICATE_COUNTER

States:

The model has three states, called Start, Connected, and End.
Start is an initial state, Connected is an intermediate state, and End is a successful terminal state.

Transitions & Actions:

Transition from Start to Connected: triggered once a client initiates a connection via the MQTT packet CONNECT. While executing the transition, a variable named counter is initialized to 0 and stored in the memory.
Loop from Connected to Connected: triggered by the conditional event once a client sends payloads via the MQTT PUBLISH packet and the condition isIdentical is true. Consequently, the action is executed that increments the variable counter by one.
Transition from Connected to End: triggered when a client sends an MQTT DISCONNECT packet. The state machine ends in the successful terminal state.

The counter increases every time two identical messages are published consecutively. Note, the function isIdentical is not further introduced here for the sake of simplicity.
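To make the model more concrete, here is a minimal Python sketch of DUPLICATE_COUNTER. It is an illustration under assumptions, not the Data Hub implementation: the class name, the last_payload variable, and the body of is_identical are invented here, since the article deliberately leaves isIdentical undefined. The sketch also assumes that a PUBLISH with a different payload simply leaves the model in Connected.

```python
class DuplicateCounter:
    """Sketch of the DUPLICATE_COUNTER behavior model."""

    def __init__(self):
        self.state = "Start"
        self._memory = {}

    # Memory operations, named after store(var, value) and load(var).
    def store(self, var, value):
        self._memory[var] = value

    def load(self, var):
        return self._memory[var]

    def is_identical(self, payload):
        # Assumed condition: the payload equals the previously published one.
        return "last_payload" in self._memory and self.load("last_payload") == payload

    def on_packet(self, packet, payload=None):
        if self.state == "Start" and packet == "CONNECT":
            self.store("counter", 0)  # action on Start -> Connected
            self.state = "Connected"
        elif self.state == "Connected" and packet == "PUBLISH":
            if self.is_identical(payload):
                # action on the Connected -> Connected loop
                self.store("counter", self.load("counter") + 1)
            self.store("last_payload", payload)
        elif self.state == "Connected" and packet == "DISCONNECT":
            self.state = "End"
        else:
            raise ValueError(f"no transition from {self.state} on {packet}")


model = DuplicateCounter()
model.on_packet("CONNECT")
for payload in ["a", "a", "b", "b"]:
    model.on_packet("PUBLISH", payload)
model.on_packet("DISCONNECT")
print(model.state, model.load("counter"))  # End 2
```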

The formalism of behavior models allows us to check whether a client uses the MQTT protocol in a well-defined way. A model has two possible outcomes: either the MQTT client follows the state machine correctly (the behavior model ends in a successful terminal state), or the MQTT client misbehaves (the model ends in the failed terminal state). Misbehavior can then be handled accordingly, e.g., by first making these clients visible in large-scale IoT deployments or by correcting the behavior through additional logic in the state transitions. The latter case is particularly interesting for clients that are hard to fix because they cannot be updated.

Implementation in Data Hub

With HiveMQ Platform version 4.20, we made the HiveMQ Data Hub generally available. Alongside data policies, it implements the new behavior policies with pre-defined Data Hub behavior models.

A behavior policy instantiates a behavior model checker for selectable client connections. Each MQTT packet received in the broker is passed to the checker to determine the state transitions of the instantiated behavior model. Moreover, state transitions can cause further actions. Even disconnecting a client or dropping a message is possible when a client misbehaves.

The general view of the architecture is shown below. MQTT packets from a connected client are processed by the broker and checked by the instantiated Behavior Model Checker, which we call the policy engine. The policy engine returns the further action to be taken, for example, drop the message, disconnect the client, or just log a message for further inspection.

Behavior Model Checker in HiveMQ Data Hub

As a consequence, each client connection carries additional information about the state of the instantiated behavior model and its variables. Remember, we created a counter variable in the example above.

An essential aspect is to make client behavior visible. For this purpose, the current state of each client connection can be requested via the REST API. Even state variables persisted in the memory are returned for further debugging.

Publish.duplicate

Consider the following behavior model, which is available in HiveMQ Data Hub and is similar to the example model presented above. Its purpose is to track duplicate messages in practice, and it has two terminal states.

States	Type	Description
Initial	Initial, Non-Terminal	The starting point of the model which is entered as soon as a client is matched by the policy
Connected	Intermediate, Non-Terminal	The state models that a client has successfully connected to the broker
NotDuplicated	Intermediate, Non-Terminal	Indicates that either the client has sent its first message or two consecutive messages are different.
Duplicated	Intermediate, Non-Terminal	Indicates that the client has sent a message which is equal to the previous one.
Violated	Failure, Terminal	When a client has sent two equal consecutive messages at any point in time and disconnects, the state Violated is the terminal state.
Disconnected	Success, Terminal	When a client has always sent different consecutive messages and disconnects, the state Disconnected is the terminal state.

Below, the state machine of the behavior model is shown.

The state machine of the behavior model in MQTT Communication

As you can see in the state machine, there is one state indicating whether the client sends duplicate messages – the Duplicated state.
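The following Python sketch replays a packet sequence through the six states from the table. The transitions are inferred from the state descriptions only, so treat them as an assumption rather than the exact Data Hub model; the function names step and replay and the memory keys are hypothetical.

```python
def step(state, memory, packet, payload=None):
    """Return the next state of the Publish.duplicate sketch for one packet."""
    if state == "Initial" and packet == "CONNECT":
        return "Connected"
    if state in ("Connected", "NotDuplicated", "Duplicated"):
        if packet == "PUBLISH":
            # The first message after connecting can never be a duplicate.
            duplicate = state != "Connected" and memory.get("previous") == payload
            memory["previous"] = payload
            if duplicate:
                memory["violated"] = True  # remembered until disconnect
                return "Duplicated"
            return "NotDuplicated"
        if packet == "DISCONNECT":
            return "Violated" if memory.get("violated") else "Disconnected"
    raise ValueError(f"no transition from {state} on {packet}")


def replay(events):
    """Return every state visited while replaying (packet, payload) pairs."""
    state, memory = "Initial", {}
    visited = [state]
    for packet, payload in events:
        state = step(state, memory, packet, payload)
        visited.append(state)
    return visited


visited = replay([
    ("CONNECT", None),
    ("PUBLISH", "a"),
    ("PUBLISH", "a"),
    ("PUBLISH", "b"),
    ("DISCONNECT", None),
])
print(visited[-1])  # Violated
```

In this sketch a single duplicate is enough to end in Violated, even if later messages differ again, which follows the "at any point in time" wording in the table.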

The listing below shows the complete behavior policy ready to be used:

{
  "id": "drop-duplicate-messages",
  "createdAt": "2023-11-22T18:02:06.853Z",
  "lastUpdatedAt": "2023-11-22T18:38:01.714Z",
  "matching": {
    "clientIdRegex": ".*"
  },
  "behavior": {
    "id": "Publish.duplicate",
    "arguments": {}
  },
  "onTransitions": [
    {
      "fromState": "Any.*",
      "toState": "Duplicated",
      "Mqtt.OnInboundPublish": {
        "pipeline": [
          {
            "id": "operation-GDgvk",
            "functionId": "Mqtt.drop",
            "arguments": {
              "reasonString": "The message you sent was a duplicate message caught by policy ${policyId}."
            }
          }
        ]
      }
    }
  ]
}

The policy instantiates the behavior model for each connecting client that matches the matching field. In this example, the Publish.duplicate behavior model is used. Next, an additional action is defined in the onTransitions field. In this particular case, every time the MQTT client sends a PUBLISH packet that moves the state machine from any state (the Any.* wildcard) into the Duplicated state, the pipeline is executed. Here, the pre-defined Mqtt.drop function is executed, which means the incoming MQTT PUBLISH packet is dropped. Finally, depending on the MQTT version of the client, a reason string is provided to the sending client.

This behavior policy is ready to be used in your IoT deployments with HiveMQ Data Hub to avoid message duplicates on the consumer side.

You may also use the HiveMQ Control Center to create the policy above. Please see the screenshot below showing the relevant part of the behavior policy to create it in your HiveMQ Control Center.

HiveMQ Control Center for Creating a Behavior Policy

Showcase

In this section, we demonstrate how clients behave once the behavior policy is registered in HiveMQ Data Hub. The animated illustration below consists of three parts:

The top-left console shows an MQTT client publishing data to the HiveMQ Broker using the MQTT CLI. The client connects to the broker with the clientId “testclient”. Once a connection is established, messages are published to the topic “test” in the following order:
1. { "temperature": 123 }
2. { "temperature": 123 }
3. { "temperature": 124 }
4. { "temperature": 124 }
5. { "temperature": 123 }

The bottom-left console shows an MQTT client that subscribes to the wildcard topic to consume all data. As you can see in the output, the client only consumes distinct consecutive messages:

{ "temperature": 123 }
{ "temperature": 124 }
{ "temperature": 123 }
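The effect seen in the two left consoles can be reproduced with a few lines of Python. This is a simplified stand-in for the policy, assuming that each message is compared with the message published immediately before it, whether or not that earlier message was dropped; the function name forward_non_duplicates is hypothetical.

```python
def forward_non_duplicates(payloads):
    """Yield only payloads that differ from the one published just before."""
    previous = object()  # sentinel that equals no payload
    for payload in payloads:
        if payload != previous:
            yield payload
        previous = payload


published = [
    '{ "temperature": 123 }',
    '{ "temperature": 123 }',
    '{ "temperature": 124 }',
    '{ "temperature": 124 }',
    '{ "temperature": 123 }',
]

delivered = list(forward_non_duplicates(published))
for payload in delivered:
    print(payload)
```

Messages 2 and 4 are dropped, so the subscriber receives three of the five messages. The last message is delivered even though the value 123 was seen before, because only consecutive messages are compared.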

The right console shows the requested state from the REST API for the client “testclient”. Every time the client has sent a duplicate message, the behavior model moves into the Duplicated state. The response also shows further information.

Practical Implications

The behavior model Publish.duplicate offers advantages in several dimensions:

Improves Efficiency: Reducing unnecessary data transmission enhances network efficiency, ensuring that only relevant, timely data is communicated.
Lowers Operational Costs: Less data transmission means lower bandwidth usage and potentially reduced costs associated with data storage and processing.
Enhances System Performance: Systems and networks are less burdened, leading to improved performance and reliability.
Facilitates Better Data Management: With more streamlined data flows, it becomes easier to manage, analyze, and leverage data effectively.
Supports Scalability: Efficient data transmission is crucial for scalability, especially in growing IoT networks where the number of devices and data points can grow exponentially.

Conclusion

Integrating behavior models into MQTT communication is essential for optimizing IoT deployments. The introduction of finite state machines ensures systematic validation of MQTT clients, enforcing predefined sequences and standards. The DUPLICATE_COUNTER behavior model exemplifies practical application, tracking conditions like duplicate message occurrences.

As the IoT landscape advances, leveraging behavior models in MQTT emerges as a proactive strategy for ensuring reliability, efficiency, and cost-effectiveness in large-scale IoT scenarios, marking a significant stride in achieving well-defined and optimized MQTT-based IoT deployments.

Stefan Frehse

Stefan Frehse is Engineering Manager at HiveMQ. He earned a Ph.D. in Computer Science from the University of Bremen and has worked in software engineering and in C-level management positions for 10 years. He has written many academic papers and spoken on topics including formal verification of fault-tolerant systems and debugging and synthesis of reversible logic.