Monitoring the HiveMQ MQTT Broker with Graphite and Collectl

When running any server software in production, it is crucial to monitor the relevant metrics and stats of your application, your software environment and your server's operating system. This gives you the opportunity to react quickly to unforeseen events or problems. It also helps you pinpoint and reproduce errors after a failure. And last but not least, it keeps you from flying blind: you will recognize when your production software needs to scale out, and you will learn your peak times and develop a better understanding of your users.


There are tons of monitoring tools available and most of them do their job very well. For collecting and visualizing stats we favor Graphite, because it is dead easy to integrate other tools with it and there are many third-party tools available that integrate perfectly with Graphite. HiveMQ has native support for Graphite, and it is extremely simple to enable via configuration. For monitoring the server environment itself, we use Collectl, which also offers a native Graphite integration.

Some people find the standard Graphite visualizations unpleasant, and fortunately there are many third-party front ends like Graphene or Giraffe. If these appeal to you more, it is easy to use them instead of the standard Graphite interface.

Here is an overview of the architecture, which we use to monitor HiveMQ:
[Image: Graphite HiveMQ monitoring architecture]


Graphite is a graphing system that monitors and displays stats from different data sources and can also aggregate them if needed. It is highly scalable and very easy to extend with new data sources.

It is strongly recommended to use a dedicated server instance for the monitoring server in production, because in a disaster you want at least the monitoring server to keep running, even if the monitored server dies.


Collectl is a handy tool for Linux servers, which collects stats about the server environment. Collectl is very friendly in terms of CPU and memory usage and is suitable for running in production environments. The built-in Graphite integration is trivial to configure. It may require some experimenting if you want to configure Collectl to monitor some very special system metrics, though.


Installing + Configuring Graphite

There are several ways to install Graphite. Graphite itself is written in Python, so you will probably need to install Python first. Please consult the official installation guide for more information.

Graphite itself does not need any special configuration to start monitoring. The only thing left to do on the Graphite side is to add monitoring widgets to your dashboard. You have to do this after the first metrics have been published; otherwise Graphite does not know which stats you want to display.

Make sure you configure your firewall properly. The Graphite standard port for receiving metrics is 2003.
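Before moving on, it can help to verify that port 2003 is actually reachable. Graphite's plaintext protocol is just one line per datapoint; here is a quick sketch (the metric path below is made up for illustration, and GRAPHITE_IP stands for your monitoring host):

```shell
# Graphite's plaintext protocol: "<metric.path> <value> <unix-timestamp>"
metric="servers.hivemq1.test 42 $(date +%s)"
echo "$metric"

# To actually send it to Graphite (assuming GRAPHITE_IP is your monitoring
# host and port 2003 is open in the firewall), pipe it through netcat:
#   echo "$metric" | nc -q0 GRAPHITE_IP 2003
```

If the metric shows up in the Graphite web UI a few seconds later, the firewall and the Carbon listener are set up correctly.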

Installing + Configuring Collectl

On Linux, you can install Collectl with your distribution's package manager. On RHEL or a RHEL-compatible distribution (like CentOS), execute the following command:

yum install collectl

If you want the most recent Collectl version, you can download it from the official website and manually replace the version installed by your package manager. On RHEL, execute the following commands:

wget YOUR_SOURCEFORGE_MIRROR/collectl-3.6.7-1.el6.noarch.rpm
yum localinstall collectl-3.6.7-1.el6.noarch.rpm

Now we are ready to collect metrics from the HiveMQ production server. Execute the following command to start collecting:

collectl -i 2 -scdnm --export graphite,GRAPHITE_IP

This command collects CPU (c), disk (d), network (n) and memory (m) stats every 2 seconds and exports each sample to Graphite. If you want other subsystems, please consult the Collectl documentation.

Configure HiveMQ

HiveMQ comes with Graphite integration out of the box. Open your HiveMQ configuration file and set the following properties:


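As a rough sketch (the property names below are assumptions for illustration, not taken from the HiveMQ documentation; check the docs for the exact keys in your HiveMQ version), a Graphite reporter configuration looks something like this:

```
# Sketch only -- property names are assumptions, consult the HiveMQ docs
graphite.enabled = true
graphite.host = GRAPHITE_IP
graphite.port = 2003
graphite.reporting.interval = 5
```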
That's all. HiveMQ is now configured to report its statistics to Graphite every 5 seconds.

Creating a dashboard

All your data from Collectl and HiveMQ is now available in Graphite, and you can proceed to create dashboards as you like. See the Graphite documentation for a getting-started guide on creating dashboards.
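If you prefer scripted checks over the web UI, Graphite's render API can also return raw datapoints. A small sketch (the metric path is a hypothetical example; browse your Graphite metrics tree to find the real names Collectl and HiveMQ publish, and GRAPHITE_IP is again your monitoring host):

```shell
# Build a render-API query for the last hour of a metric, as JSON.
# "HOSTNAME.cputotals.user" is a made-up target path for illustration.
target="HOSTNAME.cputotals.user"
url="http://GRAPHITE_IP/render?target=${target}&from=-1h&format=json"
echo "$url"

# Fetch the datapoints once metrics have arrived:
#   curl -s "$url"
```

The same `target=` expressions you use here are what you would paste into a dashboard graph, so this is a handy way to iterate on queries.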


  1. Mark Seger says:

    Always good to see others who have discovered collectl. A couple of comments:
    – have you tried colmux? think top-anything for collectl metrics on multiple nodes
    – do you collect any local metrics? it IS possible to send a subset to graphite while recording much more locally at one rate AND send them to graphite at a different rate

    1. Hi Mark,

      thanks for suggesting colmux, we haven’t seen that tool yet. Looks very promising, especially when monitoring clusters (like our HiveMQ clusters 🙂 ) or for load tests where we have to deal with many MQTT clients.

      We do not collect any local stats for our environment at the moment, we use Graphite as our main metrics datasource and so all relevant stats from collectl are published to Graphite.

      Last but not least, thanks Mark for creating Collectl, it’s really dead simple to use and does the job very well!

  2. Mark Seger says:

    Always glad to hear from happy users. As a comment, never say you collect everything you need as you never know what you’re missing until it’s not there. For example, one thing that’s real easy to forget is if you look at disk summary stats, collectl got them by collecting detailed info on each disk and adding it all up. In some cases, and I don’t know about your situation, people could have hundreds of disks they’re not sending stats upstream about because it could overwhelm the centralized collector OR they’re only sending them upstream say every minute and losing all that great detailed info.

    For example, what I tend to do in my collectl.conf file for part of the DaemonCommands would be something like:

    --export graphite,i=60 -P --rawtoo -f/var/log/collectl

    then when you run it as a daemon, collectl will collect all stats locally at 10 second intervals (we actually collect stats every second within hpcloud) and send them to graphite every minute – though you also need to figure out if you want totals for the minute, or just the high/low or average.

    Anyhow, now if you see something weird going on, you can either plot the finer-grained data (the plot file is generated by -P) OR replay the collectl data stored in the raw file. Here you can look at individual disks, cpus and even networks. I’d bet you don’t send the queue depths or service times for each disk to graphite, or do you? 😉

    And finally, if you do get colmux via the collectl-utils package, try this trick:
    – build a file called foo with the addresses of a bunch of machines (I’ve done this with a few hundred at once)
    – install TermReadKey
    – run the command "colmux --addr foo --command '-scdnm'"

    you'll now see real-time collectl data once a second sorted by the first column of output. if you use the right/left arrows you can move the sort column! there are a few other tricks but I'll let you figure them out yourself, kind of like a video game.

    alternatively you can try the command:
    colmux -addr foo -command '-sD'

    and now you’ll see all disks across all servers! now you can sort by the service times and see which disks are slow. try that with graphite 😉

    have fun
