Monitoring MQTT Messages with InfluxDB and Grafana
Written by Florian Raschbichler
Published: October 14, 2020
System monitoring is an essential part of any production software deployment. In many use cases, monitoring is just as critical as security and should be given the same level of attention. A lack of cohesive tooling combined with the wrong mindset are typical obstacles to effective monitoring. It is important not to fall victim to the false sense of security that the absence of accurate monitoring can impart.
You need to monitor your system
This blog post shows you how to set up a standardized dashboard that includes the metrics that are most commonly used for live monitoring of MQTT brokers. Of course, this does not mean that these are all the metrics you need to successfully monitor your system. It is up to you to decide which metrics are crucial for your individual use case and deployment. However, we hope to give you a good starting point.
HiveMQ exposes a large number of metrics with Java Management Extensions (JMX) and enables monitoring with JMX monitoring tools such as JConsole.
Real-time monitoring with tools such as JConsole is certainly better than nothing but has its own disadvantages. HiveMQ is often deployed in a containerized environment, so direct access to the HiveMQ process might not be possible. In addition, a time-series monitoring solution provides the added benefit of functioning as a great debugging tool when you need to find the root cause of a system crash or similar event.
To enable time-series monitoring, HiveMQ offers two pre-built monitoring extensions free of charge: the InfluxDB Monitoring Extension and the Prometheus Monitoring Extension.
We are frequently asked for monitoring tool recommendations. Ultimately, deciding on the right tool is a matter of personal preference and the choice is up to you. In the past, we have had good experiences with the combination of Telegraf, InfluxDB, and a Grafana dashboard.
Telegraf can be used for gathering system metrics and writing them to InfluxDB. HiveMQ is also able to write metrics to InfluxDB and a Grafana dashboard is a good solution for visualizing these gathered metrics.
Please note, this is just one monitoring solution. Countless other viable monitoring options are available.
Installation and Configuration
The first step to achieving our desired monitoring setup is to install and start InfluxDB. InfluxDB works out of the box with no additional configuration. Once InfluxDB is up and running, use the command line tool to create a database called ‘hivemq’.
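Using the influx command line tool, the database can be created with a single InfluxQL statement (InfluxDB 1.x syntax):

```sql
-- Start the CLI with `influx`, then create the database:
CREATE DATABASE hivemq
```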
Attention: InfluxDB does not enable authentication by default. When you run InfluxDB on an external server, the lack of authentication can expose your metrics to third parties. Make sure that you address this potential security issue adequately.
InfluxDB data grows rapidly. Over time, the continuously expanding volume of data uses large amounts of disk space. To mitigate this, InfluxDB lets you create retention policies. Usually, it is sufficient to retain your InfluxDB data for two weeks. The syntax for creating a two-week retention policy looks like this:
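For example, the following InfluxQL statement (InfluxDB 1.x syntax; the policy name "two_weeks" is just a placeholder) creates a default two-week retention policy on the hivemq database:

```sql
-- Keep data for 14 days, replicated once, as the default policy
CREATE RETENTION POLICY "two_weeks" ON "hivemq" DURATION 2w REPLICATION 1 DEFAULT
```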
Which, if any, retention policy is best for your individual use case is entirely up to you.
The second step is downloading the HiveMQ InfluxDB Monitoring Extension. For this example setup, all services run locally, so we can use the influxdb.properties file that is included in the extension without any further adjustments. When you run an external InfluxDB, you must change the IP address in the properties file.
If you run HiveMQ in a cluster, it is important to use the exact same influxdb.properties on each node with the exception of this property:
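A sketch of such a per-node setting, assuming the extension's tags property is used to attach a host tag to every metric (the node name below is a placeholder):

```properties
# Semicolon-separated list of tags; give each cluster node its own host tag
tags:host=hivemq-node-1
```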
This property should be set individually for each HiveMQ node in the cluster for better transparency.
The HiveMQ InfluxDB extension now gathers all the available HiveMQ metrics and writes them to the configured InfluxDB.
The third step is to install Telegraf on each HiveMQ cluster node.
You need to configure the telegraf.conf file to tell Telegraf which metrics to gather and write to InfluxDB. The default telegraf.conf file is full of comments and options that are not needed for HiveMQ monitoring. The basic configuration that we suggest looks like this:
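A minimal telegraf.conf along these lines might look as follows (the InfluxDB URL assumes a local instance on the default port 8086):

```toml
[agent]
  interval = "5s"           # gather metrics every five seconds

[[outputs.influxdb]]
  urls = ["http://127.0.0.1:8086"]
  database = "hivemq"

[[inputs.cpu]]
  percpu = true             # per-core CPU percentages
  totalcpu = true           # aggregated total CPU usage

[[inputs.system]]
[[inputs.disk]]
[[inputs.diskio]]
[[inputs.mem]]
[[inputs.kernel]]
[[inputs.processes]]
```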
This configuration provides metrics for
- CPU: CPU usage, divided into user, system, idle, etc.
- System: Load, Uptime
- Disk: Disk Usage, Free Space, Inodes used
- DiskIO: IO Time, Operations
- Memory: RAM Used, Buffered, Cached
- Kernel: Linux specific information like context switching
- Processes: Systemwide process information
Note that some modules such as Kernel may not be available on non-Linux systems.
When you use an external InfluxDB, be sure to update the URL in the configuration appropriately.
This configuration gathers the CPU percentage and total usage every five seconds. See this page for other possible configurations of the system input.
At this point, the terminal window on which you run InfluxDB should display something like this:
This indicates a successful write of the Telegraf metrics to InfluxDB.
The next step is to install and start Grafana.
Grafana works straight out of the box and can be reached at http://localhost:3000.
Once Grafana is installed, we can configure our InfluxDB as the data source for Grafana.
Step 1: Add Data Source
Step 2: Configure InfluxDB
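If you prefer configuration files over the UI, Grafana (5.0 and later) also supports data source provisioning. A sketch of a provisioning file placed in Grafana's provisioning/datasources/ directory might look like this (file name and URL are examples for a local setup):

```yaml
# influxdb.yaml -- Grafana data source provisioning
apiVersion: 1
datasources:
  - name: InfluxDB
    type: influxdb
    access: proxy
    url: http://localhost:8086
    database: hivemq
```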
Now, we need a dashboard. Because the dashboard question comes up quite often, HiveMQ provides a free dashboard template that displays useful metrics for most MQTT deployments. Use the template as a starting point for building a dashboard that is tailored to your own individual use case. You can download the template here. The JSON file in the dashboard template zip file can be imported into Grafana.
Step 3: Import Dashboard
That’s it. We have set up a working dashboard on which you can observe the types of metrics that have proven vital for successful monitoring of many MQTT deployments.
Disclaimer: This article presents one possibility and a good starting point for monitoring your MQTT use case. The actual requirements for each use case vary. To determine what works best for you and your deployment, we recommend reading the getting started guide from Grafana.