Getting Started with MQTT Data Validation Using HiveMQ Data Hub

by Kudzai Manditereza

Oct 4, 2023

The HiveMQ Data Hub is an integrated policy engine within the HiveMQ MQTT broker. It helps detect and manage problematic data and misbehaving MQTT clients. A key feature of HiveMQ Data Hub is Data Validation, which lets you validate and manipulate MQTT data in motion so that recipients get data in the correct format. This centralized approach to data governance saves developers from writing separate code in each MQTT client to rectify data issues, and it simplifies the task of identifying and addressing bad data sources.

In this article, we’ll guide you through setting up data validation on the HiveMQ broker. This includes downloading the HiveMQ broker, activating the HiveMQ Data Hub and REST API, specifying and loading the schema that describes the MQTT message structure, and establishing policies that apply schema-based rules across the MQTT topic hierarchy.

Downloading HiveMQ MQTT Broker and Configuring Data Hub

Start by navigating to the HiveMQ website and clicking on “Get HiveMQ”. Choose “Download HiveMQ”, and once downloaded, unzip the folder to a location of your choosing.

[Screenshot: Getting Started with HiveMQ webpage]

From version 4.17 of HiveMQ onwards, a free version of the Data Hub is bundled with your HiveMQ package. This free mode permits the creation of a single policy and offers a basic function set. To explore HiveMQ Data Hub’s full potential, you can activate a five-hour trial mode, which we’ll illustrate shortly.

By default, HiveMQ Data Hub’s data validation is disabled. To activate it, go to the main folder of the unzipped HiveMQ broker and open the ‘conf’ directory. Here, you’ll find a config.xml file. Open this file in your preferred text editor and add the <data-governance-hub> and <rest-api> XML elements to its default contents, resulting in the config.xml file shown below.

[Screenshot: HiveMQ Data Hub config file]

Both elements should have their ‘enabled’ option set to ‘true’. Having done that, navigate to the ‘bin’ directory and start the HiveMQ broker by running run.bat on Windows or run.sh on macOS and Linux.

[Screenshot: HiveMQ broker started with run.sh]

Upon successful startup of the HiveMQ broker, you can observe the HiveMQ REST API endpoint running on port 8888 as shown in the image above. We will use this endpoint to add the broker’s data validation schemas and policies.

However, before diving in, we need to activate the trial mode to unlock the full capabilities of HiveMQ Data Hub. To achieve that, we need to send a POST request with an empty body to the following REST API endpoint:

http://localhost:8888/api/v1/management/data-governance-hub/start-trial

We will use Postman, a free developer API tool, for this and successive interactions with the HiveMQ REST API endpoint.

[Screenshot: Postman workspace, adding the API request]

The Postman screenshot above shows that the status code 204 is returned upon successful trial mode activation.
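
If you would rather script this step than click through Postman, the same request can be sketched in Python using only the standard library. The helper names below are illustrative and not part of HiveMQ; the URL is the one quoted above, and the sketch assumes the broker is running locally with the REST API enabled.

```python
import urllib.request

# Endpoint quoted in the article; assumes a local broker with the REST API on port 8888.
START_TRIAL_URL = "http://localhost:8888/api/v1/management/data-governance-hub/start-trial"


def build_start_trial_request(url: str = START_TRIAL_URL) -> urllib.request.Request:
    """Build a POST request with an empty body (hypothetical helper)."""
    # data=b"" sends a zero-length body; method="POST" makes the verb explicit.
    return urllib.request.Request(url, data=b"", method="POST")


def start_trial(url: str = START_TRIAL_URL) -> int:
    """Send the request and return the HTTP status code."""
    with urllib.request.urlopen(build_start_trial_request(url)) as response:
        return response.status
```

Calling start_trial() against a running broker should return 204, matching the status code described above.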

Adding Data Validation Schema to HiveMQ Data Hub

In HiveMQ Data Hub, Data Validation relies on the interaction between two entities: Schemas (which outline the expected structure and format of incoming MQTT message payload data) and Policies (which instruct the HiveMQ broker how to handle incoming MQTT messages that do not match the desired schema).

The HiveMQ Data Hub allows schema definitions for MQTT payloads using either the JSON or Protobuf data formats. You add, remove, and retrieve data validation schemas through its REST API methods “Create a New Schema,” “Delete Schema,” and “Get Schema,” respectively.

For this demonstration, we’ll create a JSON schema to check if a sensor’s MQTT payload includes temperature and humidity values within the desired range. If the values are outside this range, an appropriate message will be logged.

We’ll begin by defining the body for the “Create a New Schema” HTTP POST request. The body consists of three primary fields:

The “id” - serves as a unique identifier for the schema within the HiveMQ broker.
The “type” - indicates whether the schema is in JSON or Protobuf format.
The “schemaDefinition” - a Base64 encoded representation of the JSON schema file.

Here is the body for our “Create a New Schema” HTTP POST request. Right now, the “schemaDefinition” field is blank. First, we’ll define the JSON schema. After that, we’ll encode it using Base64. Finally, we’ll place the encoded text into the “schemaDefinition” field.

{
"id": "sensor_values",
"type": "JSON",
"schemaDefinition": ""
}

Below is the JSON schema definition designed to validate the value ranges for temperature and humidity in sensor readings. This schema outlines the acceptable structure and criteria for valid sensor data. The data should include two elements:

“temperature” - a number that falls between 20 and 70.
“humidity” - a number that ranges from 65 to 100.

{
        "title": "Valid Sensor Data",
        "description": "A schema that matches the temperature and humidity values of any object",
        "required": [
            "temperature",
            "humidity"
        ],
        "type": "object",
        "properties": {
            "temperature": {
                "type": "number",
                "minimum": 20,
                "maximum": 70
            },
            "humidity": {
                "type": "number",
                "minimum": 65,
                "maximum": 100
            }
        }
}

For example, a sensor reading with the JSON payload shown below would be valid according to our schema.

{
    "temperature": 25,
    "humidity": 80
}

However, a reading like the one below would be invalid because the temperature and humidity values are below the specified minimums.

{
    "temperature": 15,
    "humidity": 60
}
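
To make the two cases concrete, here is a minimal Python sketch that applies the same range rules by hand. It is not a JSON Schema validator and not how HiveMQ evaluates the schema internally; the function name and the wording of the messages are assumptions made for illustration only.

```python
from numbers import Real

# Ranges copied from the schema above: property name -> (minimum, maximum), inclusive.
LIMITS = {"temperature": (20, 70), "humidity": (65, 100)}


def check_reading(payload: dict) -> list[str]:
    """Return a list of problems; an empty list means the reading passes."""
    problems = []
    for name, (low, high) in LIMITS.items():
        if name not in payload:
            problems.append(f"{name}: required property is missing")
            continue
        value = payload[name]
        # JSON booleans are not numbers, but Python's bool is a subclass of int.
        if isinstance(value, bool) or not isinstance(value, Real):
            problems.append(f"{name}: expected a number")
        elif value < low:
            problems.append(f"{name}: {value} is below the minimum of {low}")
        elif value > high:
            problems.append(f"{name}: {value} is above the maximum of {high}")
    return problems


print(check_reading({"temperature": 25, "humidity": 80}))  # valid reading
print(check_reading({"temperature": 15, "humidity": 60}))  # both values too low
```

The first call reports no problems, while the second reports one problem per property because both values fall below their minimums.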

Let’s save our schema definition in a file named sensor_values_schema.json.

After creating the schema to validate sensor data, the next step is to encode this file in Base64. Use the command below to do this, and the encoded outcome will be saved in sensor_values_base64.txt.

cat sensor_values_schema.json | base64 > sensor_values_base64.txt

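Next, take the content from sensor_values_base64.txt and paste it into the “schemaDefinition” field of the “Create a New Schema” HTTP POST request body we prepared earlier. If your base64 tool wraps its output across several lines (GNU coreutils does this by default), join the lines into one unbroken string before pasting.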

{
"id": "sensor_values",
"type": "JSON",
"schemaDefinition": "ewogICAgInRpdGxlIjogIlZhbGlkIFNlbnNvciBEYXRhIiwKICAgICJkZXNjcmlwdGlvbiI6ICJBIHNjaGVtYSB0aGF0IG1hdGNoZXMgdGhlIHRlbXBlcmF0dXJlIGFuZCBodW1pZGl0eSB2YWx1ZXMgb2YgYW55IG9iamVjdCIsCiAgICAicmVxdWlyZWQiOiBbCiAgICAgICAgInRlbXBlcmF0dXJlIiwKICAgICAgICAiaHVtaWRpdHkiCiAgICBdLAogICAgInR5cGUiOiAib2JqZWN0IiwKICAgICJwcm9wZXJ0aWVzIjogewogICAgICAgICJ0ZW1wZXJhdHVyZSI6IHsKICAgICAgICAgICAgInR5cGUiOiAibnVtYmVyIiwKICAgICAgICAgICAgIm1pbmltdW0iOiAyMCwKICAgICAgICAgICAgIm1heGltdW0iOiA3MAogICAgICAgIH0sCiAgICAgICAgImh1bWlkaXR5IjogewogICAgICAgICAgICAidHlwZSI6ICJudW1iZXIiLAogICAgICAgICAgICAibWluaW11bSI6IDY1LAogICAgICAgICAgICAibWF4aW11bSI6IDEwMAogICAgICAgIH0KICAgIH0KfQ=="
}
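
The encode-and-paste steps can also be done in one go. The following Python sketch builds the request body from the schema text; the helper name is hypothetical, and the short schema string is only a stand-in for the contents of sensor_values_schema.json.

```python
import base64
import json


def build_schema_request_body(schema_id: str, schema_text: str) -> str:
    """Return the JSON body for "Create a New Schema" (hypothetical helper)."""
    # b64encode never inserts line breaks, so the result is a single unbroken string.
    encoded = base64.b64encode(schema_text.encode("utf-8")).decode("ascii")
    return json.dumps(
        {"id": schema_id, "type": "JSON", "schemaDefinition": encoded},
        indent=4,
    )


# A shortened stand-in for the contents of sensor_values_schema.json.
example_schema = '{"type": "object", "required": ["temperature", "humidity"]}'
print(build_schema_request_body("sensor_values", example_schema))
```

In practice you would read the schema text from the file, for example with pathlib.Path("sensor_values_schema.json").read_text(), instead of using the inline string.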

To add the schema to HiveMQ Data Hub, we’ll use Postman. Send a POST request containing the body we’ve prepared to this REST API endpoint:

http://localhost:8888/api/v1/data-validation/schemas

If you look at the Postman screenshot below, you’ll see a “201 Created” response. This confirms that our request was successful and the data validation schema has been created in HiveMQ Data Hub. Additionally, a JSON object is returned, verifying the creation of the schema and noting the time it was created.

[Screenshot: Postman showing the successful “201 Created” response]
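
As a scripted alternative to Postman, the request could look roughly like this in Python. The helper names are illustrative; the sketch assumes the endpoint accepts a JSON body with a Content-Type of application/json and, as described above, answers with a JSON object.

```python
import json
import urllib.request

# Endpoint quoted in the article; assumes a local broker with the REST API enabled.
SCHEMAS_URL = "http://localhost:8888/api/v1/data-validation/schemas"


def build_json_post(url: str, body: dict) -> urllib.request.Request:
    """Wrap a dict as a JSON POST request (hypothetical helper)."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )


def create_schema(body: dict, url: str = SCHEMAS_URL) -> tuple[int, dict]:
    """Send the schema and return (status code, parsed JSON response)."""
    with urllib.request.urlopen(build_json_post(url, body)) as response:
        return response.status, json.loads(response.read())
```

Passing the request body as a dict with the “id”, “type”, and “schemaDefinition” fields should yield status 201 on success.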

Adding Policies to HiveMQ Data Hub

With our JSON Data Validation Schema now in place within the HiveMQ Data Hub, the next step is to implement a Policy. This will instruct the HiveMQ broker on the appropriate actions for incoming MQTT messages that don’t align with our sensor_values schema.

Here’s a JSON representation of a Policy. It’s designed to cross-check all data pushed to the sensordata topic against the sensor_values schema. Should there be a successful match, the broker will log a message indicating “valid sensor data.” Conversely, if there’s a mismatch, it logs an “invalid sensor data” message.

{
    "id": "com.hivemq.policy.sensordata",
    "matching": {
        "topicFilter": "sensordata"
    },
    "validation": {
        "validators": [
            {
                "type": "schema",
                "arguments": {
                    "strategy": "ALL_OF",
                    "schemas": [
                        {
                            "schemaId": "sensor_values",
                            "version": "latest"
                        }
                    ]
                }
            }
        ]
    },
    "onSuccess": {
        "pipeline": [
            {
                "id": "logSuccess",
                "functionId": "System.log",
                "arguments": {
                    "level": "INFO",
                    "message": "${clientId} sent a valid sensor data on topic '${topic}' with result '${validationResult}'"
                }
            }
        ]
    },
    "onFailure": {
        "pipeline": [
            {
                "id": "logFailure",
                "functionId": "System.log",
                "arguments": {
                    "level": "WARN",
                    "message": "${clientId} sent an invalid sensor data on topic '${topic}' with result '${validationResult}'"
                }
            }
        ]
    }
}
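
If you expect to write several policies of this shape, it may help to generate the JSON rather than edit it by hand. The Python sketch below assembles the same structure from three parameters; build_policy and log_step are hypothetical helpers, and the field names are taken directly from the policy above.

```python
import json


def build_policy(policy_id: str, topic_filter: str, schema_id: str) -> dict:
    """Assemble a policy shaped like the one above (hypothetical helper)."""

    def log_step(step_id: str, level: str, verdict: str) -> dict:
        # One System.log pipeline step; the message mirrors the article's wording.
        return {
            "id": step_id,
            "functionId": "System.log",
            "arguments": {
                "level": level,
                "message": "${clientId} sent " + verdict + " sensor data on topic "
                           "'${topic}' with result '${validationResult}'",
            },
        }

    return {
        "id": policy_id,
        "matching": {"topicFilter": topic_filter},
        "validation": {
            "validators": [
                {
                    "type": "schema",
                    "arguments": {
                        "strategy": "ALL_OF",
                        "schemas": [{"schemaId": schema_id, "version": "latest"}],
                    },
                }
            ]
        },
        "onSuccess": {"pipeline": [log_step("logSuccess", "INFO", "a valid")]},
        "onFailure": {"pipeline": [log_step("logFailure", "WARN", "an invalid")]},
    }


policy = build_policy("com.hivemq.policy.sensordata", "sensordata", "sensor_values")
print(json.dumps(policy, indent=4))
```

The printed JSON has the same fields and values as the policy shown above, so it can be used as the request body in the next step.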

We’ll use Postman to upload the policy to HiveMQ Data Hub. Send a POST request, with the policy as the request body, to this REST API endpoint:

http://localhost:8888/api/v1/data-validation/policies

Once you execute this, Postman shows a “201 Created” response, as seen below. This confirms the request was successful and the policy has now been created in HiveMQ Data Hub.

[Screenshot: Postman showing the created policy]
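
The upload can be scripted as well. This Python sketch also catches error responses so that a rejected policy reports its status code and response text instead of raising; the helper names are assumptions, and the exact content of any error body is up to the broker.

```python
import json
import urllib.error
import urllib.request

# Endpoint quoted in the article; assumes a local broker with the REST API enabled.
POLICIES_URL = "http://localhost:8888/api/v1/data-validation/policies"


def build_policy_upload(policy: dict, url: str = POLICIES_URL) -> urllib.request.Request:
    """Wrap the policy dict as a JSON POST request (hypothetical helper)."""
    return urllib.request.Request(
        url,
        data=json.dumps(policy).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def upload_policy(policy: dict, url: str = POLICIES_URL) -> tuple[int, str]:
    """Return (status code, response text); 201 indicates the policy was created."""
    try:
        with urllib.request.urlopen(build_policy_upload(policy, url)) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as error:
        # urlopen raises on 4xx/5xx responses; return the details instead.
        return error.code, error.read().decode("utf-8")
```

Returning the status and text for both outcomes keeps the caller simple: anything other than 201 can be printed and inspected.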

Having successfully set up and uploaded both a data validation schema and a policy, it’s time to test the HiveMQ Data Hub. We’ll send sensor data to the HiveMQ broker for this purpose. We’ll use MQTT.fx, an MQTT Client tool, to publish a JSON message that doesn’t adhere to our data validation schema’s requirements.

[Screenshot: MQTT.fx publishing a JSON message]

Upon sending the first message, we receive a log notification indicating the required minimum values: temperature should be at least 20 and humidity should be no less than 65.

[Screenshot: HiveMQ log notification]

Next, we’ll send another MQTT message, ensuring it aligns with our data validation criteria.

[Screenshot: MQTT.fx sending the valid message]

When sent, we receive a log confirming that we’ve transmitted valid MQTT sensor data to the specified topic.

[Screenshot: log confirmation for valid MQTT sensor data]

Conclusion

Throughout this article, we’ve demonstrated the process of setting up and configuring MQTT data validation using the HiveMQ Data Hub. By walking through the steps to download the HiveMQ broker, activate the HiveMQ Data Hub and its REST API, specify the data schema, and implement policies, developers are better equipped to maintain the quality of MQTT messages in their IoT applications.

Ready to take MQTT data handling to the next level? Download the HiveMQ MQTT broker and explore the capabilities of the HiveMQ Data Hub.

Kudzai Manditereza

Kudzai is a tech influencer and electronic engineer based in Germany. As a Sr. Industry Solutions Advocate at HiveMQ, he helps developers and architects adopt MQTT, Unified Namespace (UNS), IIoT solutions, and HiveMQ for their IIoT projects. Kudzai runs a popular YouTube channel focused on IIoT and Smart Manufacturing technologies and he has been recognized as one of the Top 100 global influencers talking about Industry 4.0 online.
