HiveMQ - Monitoring with InfluxDB and Grafana
Written by Florian Raschbichler
Category: HiveMQ Third-Party
Published: June 30, 2017
System monitoring is an essential part of any production software deployment. Some consider it as critical as security and believe it deserves the same attention. Historically, the main obstacles to effective monitoring have been a lack of cohesive tools and the wrong mindset. Both can lead to a false sense of security, which it is important not to fall victim to.
You need to monitor your system
In this blog post we provide a standardized dashboard, including metrics we believe to be useful for live monitoring of MQTT brokers. This in no way means that these are all the metrics you need to monitor, or that we could possibly know what is crucial to your use case and deployment.
To give you the opportunity to implement cohesive monitoring tools, the HiveMQ core distribution comes with the JVM Metrics Plugin and the JMX Plugin. The JVM Metrics Plugin adds crucial JVM metrics to the metrics HiveMQ already provides, and the JMX Plugin enables JMX monitoring with any JMX monitoring tool, such as JConsole.
Real-time monitoring with tools like JConsole is certainly better than nothing, but it has its disadvantages. HiveMQ is often deployed in a container environment, so direct access to the HiveMQ process might not be possible. Beyond that, a time series monitoring solution doubles as a great debugging tool when trying to find the root cause of a system crash or a similar incident.
We routinely get asked about recommendations for monitoring tools. At the end of the day this is down to preference and ultimately your decision. In the past we have had good experiences with the combination of Telegraf, InfluxDB and a Grafana dashboard.
Telegraf can be used to gather system metrics and write them to InfluxDB. HiveMQ can write its own metrics to InfluxDB as well, and a Grafana dashboard is a good solution for visualizing the gathered metrics.
Please note that there are countless other viable monitoring options available.
Installation and configuration
The first step to achieving our desired monitoring setup is installing and starting InfluxDB. InfluxDB works out of the box without adding additional configuration. When InfluxDB is installed and running, use the command line tool to create a database called ‘hivemq’.
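As a minimal sketch, assuming InfluxDB 1.x with its `influx` command line client on the `PATH` and a server listening on the default local port, the database could be created as follows. The fallback branches are only there so the snippet does not fail on machines without the client or without a running server:

```shell
# Assumption: InfluxDB 1.x, `influx` CLI installed, server on localhost:8086.
DB_NAME="hivemq"
CREATE_DB="CREATE DATABASE \"${DB_NAME}\""

if command -v influx >/dev/null 2>&1; then
  # -execute runs a single InfluxQL statement and exits.
  influx -execute "$CREATE_DB" || echo "could not reach InfluxDB"
else
  echo "influx CLI not found; run manually: $CREATE_DB"
fi
```

The same statement can also be typed directly into the interactive `influx` shell.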
Attention: InfluxDB does not provide authentication by default, which could open your metrics up to a third party when running the InfluxDB on an external server. Make sure you cover this potential security issue.
InfluxDB data grows rapidly and, over time, will consume large amounts of disk space. To deal with this challenge, InfluxDB lets you create so-called retention policies. In our opinion it is sufficient to retain your InfluxDB data for two weeks. The syntax for creating this retention policy looks like this:
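A minimal sketch of such a statement, assuming InfluxDB 1.x and an existing `hivemq` database. The policy name `two_weeks` is illustrative and not taken from this post; pick whatever name suits your setup:

```shell
# Assumptions: InfluxDB 1.x, database "hivemq" already exists,
# and "two_weeks" is an illustrative policy name.
RP_STATEMENT='CREATE RETENTION POLICY "two_weeks" ON "hivemq" DURATION 2w REPLICATION 1 DEFAULT'

if command -v influx >/dev/null 2>&1; then
  # Data older than the DURATION (2w = two weeks) is dropped automatically.
  influx -execute "$RP_STATEMENT" || echo "could not reach InfluxDB"
else
  echo "influx CLI not found; run manually: $RP_STATEMENT"
fi
```

`DEFAULT` makes this the policy used for writes that do not name one explicitly, and `REPLICATION 1` is the usual value for a single-node InfluxDB.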
Which retention policy, if any, is best for your individual use case is for you to decide.
The second step is downloading the InfluxDB HiveMQ Plugin. For this demonstration all the services will be running locally, so we can use the influxdb.properties file that is included in the HiveMQ Plugin without any adjustments. Bear in mind that you need to change the IP address when running an external InfluxDB.
When running HiveMQ in a cluster it is important you use the exact same influxdb.properties on each node with the exception of this property:
This property should be set individually for each HiveMQ node in the cluster for better transparency.
This plugin will now gather all the available HiveMQ metrics (given the JMX Plugin is also running) and write them to the configured InfluxDB.
The third step is installing Telegraf on each HiveMQ cluster node.
Now a telegraf.conf needs to be configured, telling Telegraf which metrics to gather and write to InfluxDB. The default telegraf.conf is bloated with comments and options that are not needed for HiveMQ monitoring. The config we propose looks like this:
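The following is a sketch of a trimmed-down configuration, not necessarily the exact file proposed here. It assumes a local InfluxDB on the default port 8086 and a database named `hivemq`, and it enables the standard Telegraf input plugins that correspond to the metric groups described in this section:

```shell
# Write a minimal telegraf.conf sketch to the current directory.
# Assumptions: local InfluxDB on port 8086, database "hivemq".
cat > telegraf.conf <<'EOF'
[agent]
  # Collect metrics from all inputs every five seconds.
  interval = "5s"

[[outputs.influxdb]]
  # Change this URL when InfluxDB is not running locally.
  urls = ["http://localhost:8086"]
  database = "hivemq"

[[inputs.cpu]]
  percpu = true
  totalcpu = true

[[inputs.system]]
[[inputs.disk]]
[[inputs.diskio]]
[[inputs.mem]]
[[inputs.kernel]]
[[inputs.processes]]
EOF
```

Running `telegraf --config telegraf.conf --test` should print one round of gathered metrics to the terminal without writing to InfluxDB, which is a convenient way to check the file before starting the service.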
This configuration provides metrics for
- CPU: CPU usage broken down by space (user, system, and so on)
- System: Load, Uptime
- Disk: Disk Usage, Free Space, INodes used
- DiskIO: IO Time, Operations
- Memory: RAM Used, Buffered, Cached
- Kernel: Linux specific information like context switching
- Processes: Systemwide process information
Note that some modules like Kernel may not be available on non-Linux systems.
Make sure to change the URL when not using a local InfluxDB.
This configuration will gather the CPU’s percentage and total usage every five seconds. See this page for other possible configurations of the system input.
At this point, the terminal window in which you are running influxd should be showing something like this:
This indicates a successful write of the Telegraf metrics to InfluxDB.
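Independently of the server log, one way to confirm that data is arriving is to list the measurements in the database. This is a sketch that assumes the InfluxDB 1.x `influx` client and the `hivemq` database created earlier:

```shell
# Assumptions: InfluxDB 1.x CLI, database "hivemq" already exists.
CHECK_QUERY='SHOW MEASUREMENTS'

if command -v influx >/dev/null 2>&1; then
  # Lists the measurement names stored in the hivemq database.
  influx -database hivemq -execute "$CHECK_QUERY" || echo "could not reach InfluxDB"
else
  echo "influx CLI not found; run manually against hivemq: $CHECK_QUERY"
fi
```

Once Telegraf has written at least once, measurements such as `cpu` and `mem` should appear in the list.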
The next step is installing and starting Grafana.
Grafana works out of the box and can be reached via localhost:3000.
The next step is configuring our InfluxDB as Grafana’s data source.
Step 1: Add Data Source
Step 2: Configure InfluxDB
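The same data source can also be created without the web interface, through Grafana's HTTP API. The sketch below assumes Grafana on localhost:3000 with the default `admin`/`admin` login and InfluxDB on localhost:8086. The data source name is illustrative, and the accepted payload fields may differ between Grafana versions:

```shell
# Assumptions: Grafana on localhost:3000 with default admin/admin login,
# InfluxDB on localhost:8086; "HiveMQ InfluxDB" is an illustrative name.
cat > influxdb-datasource.json <<'EOF'
{
  "name": "HiveMQ InfluxDB",
  "type": "influxdb",
  "access": "proxy",
  "url": "http://localhost:8086",
  "database": "hivemq"
}
EOF

# POST the payload to Grafana's data source endpoint.
curl -s --max-time 5 -u admin:admin \
  -H "Content-Type: application/json" \
  -X POST http://localhost:3000/api/datasources \
  -d @influxdb-datasource.json \
  || echo "Grafana not reachable; the payload is in influxdb-datasource.json"
```

Change the default Grafana password before exposing the instance to anything other than a local test setup.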
Now we need a dashboard. As this question comes up quite often, we decided to provide a dashboard template that displays some useful metrics for most MQTT deployments. It should give you a good starting point for building your own dashboard, tailored to your use case. You can download the template here. The JSON file inside the zip can be imported into Grafana.
Step 3: Import Dashboard
That’s it. We now have a working dashboard displaying metrics whose monitoring has proven vital in many MQTT deployments.
Disclaimer: This is one possible setup and a good starting point for monitoring your MQTT use case. Naturally, the requirements for your individual case may vary. We suggest reading the getting started guide from Grafana and finding what works best for you and your deployment.