HiveMQ Swarm Monitoring

Monitor Your Swarm with InfluxDB

InfluxDB is a widely used open-source time-series database that is written in Go and optimized for fast, high-availability storage and retrieval of time-series data.
InfluxDB is a popular choice for gathering and visualizing application metrics.

You can configure the InfluxDB connection in two ways:

Environment-variable Based Configuration

The following environment variables are available for configuration of your InfluxDB connection:

Variable                     Definition
SWARM_INFLUXDB_HOST          InfluxDB host
SWARM_INFLUXDB_DATABASE      The name of the InfluxDB database
SWARM_INFLUXDB_AUTH          Optional string used for basic HTTP authorization. For example, admin:admin.
SWARM_INFLUXDB_PREFIX        Optional string that is prepended to the names of all metrics
SWARM_INFLUXDB_INTERVAL      Interval at which metrics are sent to InfluxDB (in seconds)
SWARM_INFLUXDB_TAG_HOSTNAME  Adds the InfluxDB tag hostname
SWARM_INFLUXDB_TAG_{SUFFIX}  Adds the InfluxDB tag {suffix}
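
As a rough illustration of how these variables fit together, the following Python sketch assembles them into an environment mapping that could be handed to whatever process starts HiveMQ Swarm. The helper name influx_env and all example values are hypothetical. The sketch also assumes that the value of a SWARM_INFLUXDB_TAG_{SUFFIX} variable becomes the value of the tag, which the table above does not state explicitly.

```python
import os


def influx_env(host, database, auth=None, prefix=None, interval_s=None, tags=None):
    """Build the SWARM_INFLUXDB_* variables (hypothetical helper, not part of HiveMQ Swarm)."""
    env = {
        "SWARM_INFLUXDB_HOST": host,
        "SWARM_INFLUXDB_DATABASE": database,
    }
    # Optional settings are only emitted when a value is supplied.
    if auth is not None:
        env["SWARM_INFLUXDB_AUTH"] = auth
    if prefix is not None:
        env["SWARM_INFLUXDB_PREFIX"] = prefix
    if interval_s is not None:
        env["SWARM_INFLUXDB_INTERVAL"] = str(interval_s)  # seconds
    # Assumption: each tag maps to SWARM_INFLUXDB_TAG_<SUFFIX>=<tag value>.
    for suffix, value in (tags or {}).items():
        env[f"SWARM_INFLUXDB_TAG_{suffix.upper()}"] = value
    return env


if __name__ == "__main__":
    env = influx_env(
        "http://localhost:8086",
        "myInfluxDB",
        auth="admin:admin",
        interval_s=20,
        tags={"hostname": "agent-1"},
    )
    # Merge with the current environment before starting the process of your choice.
    merged = {**os.environ, **env}
    for name in sorted(env):
        print(f"{name}={merged[name]}")
```

Keeping the variables in one mapping makes it easy to see which optional settings are actually set before the process is launched.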

File-based InfluxDB Configuration

If no InfluxDB configuration can be acquired via environment variables, HiveMQ Swarm looks for the configuration in the HiveMQ Swarm config.xml configuration file.

<swarm>
    <metrics>
        <influxDB>myInfluxDB</influxDB>
        <influxDBHost>http://localhost:8086</influxDBHost>
        <influxDBAuthString>authString</influxDBAuthString>
        <influxDBInterval>20m</influxDBInterval>
        <influxDBPrefix>myPrefix</influxDBPrefix>
        <influxDBTags>
            <influxDBTag>
                <key>myTag</key>
                <value>myValue</value>
            </influxDBTag>
        </influxDBTags>
    </metrics>
</swarm>
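
To sanity-check a file-based configuration before a test run, a small script can read the metrics section back and list what it contains. The following Python sketch parses a copy of the example above. The function name read_metrics_settings is hypothetical, and the sketch only reports element names and values as written in the file; it does not claim to know how HiveMQ Swarm interprets them.

```python
import xml.etree.ElementTree as ET

# Copy of the example configuration shown above.
EXAMPLE_CONFIG = """
<swarm>
    <metrics>
        <influxDB>myInfluxDB</influxDB>
        <influxDBHost>http://localhost:8086</influxDBHost>
        <influxDBAuthString>authString</influxDBAuthString>
        <influxDBInterval>20m</influxDBInterval>
        <influxDBPrefix>myPrefix</influxDBPrefix>
        <influxDBTags>
            <influxDBTag>
                <key>myTag</key>
                <value>myValue</value>
            </influxDBTag>
        </influxDBTags>
    </metrics>
</swarm>
"""


def read_metrics_settings(xml_text):
    """Return the simple <metrics> child elements plus a dict of tags."""
    root = ET.fromstring(xml_text)
    metrics = root.find("metrics")
    if metrics is None:
        return {}
    # Collect every direct child except the nested tag list.
    settings = {
        child.tag: (child.text or "").strip()
        for child in metrics
        if child.tag != "influxDBTags"
    }
    # Each <influxDBTag> holds one <key>/<value> pair.
    settings["tags"] = {
        tag.findtext("key"): tag.findtext("value")
        for tag in metrics.findall("influxDBTags/influxDBTag")
    }
    return settings


if __name__ == "__main__":
    for name, value in read_metrics_settings(EXAMPLE_CONFIG).items():
        print(name, "=", value)
```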

Monitor Your Swarm with Prometheus

Prometheus is a popular open-source solution for event monitoring and alerting.
Prometheus provides a simple and powerful dimensional data model, flexible query language, efficient time-series database, and real-time metrics.

When the REST service of HiveMQ Swarm is enabled, the metrics are provided via the /metrics endpoint.

REST service example configuration to enable metrics reporting
<swarm>
    <commander>
        <agents>
            <agent>
                <host>localhost</host>
                <port>3881</port>
            </agent>
        </agents>
    </commander>

    <rest>
        <enabled>true</enabled>
        <listeners>
            <http>
                <enabled>true</enabled>
                <bindPort>8080</bindPort>
                <bindAddress>0.0.0.0</bindAddress>
            </http>
        </listeners>
    </rest>
</swarm>
Be sure to verify that your Prometheus server can reach the IP address of the network interface that the REST service is bound to.
The metrics of HiveMQ Swarm agents reset when a scenario finishes.
Because Prometheus gathers metrics periodically, it is possible that the final metric values are never polled.
To avoid this, we recommend that you add a stage to the end of your scenario that waits for at least twice the Prometheus scrape interval. Use the delay command to do so.
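
The arithmetic behind this recommendation can be sketched in a few lines of Python. The helper names duration_seconds and minimum_final_delay are hypothetical, and the sketch only computes the waiting time from a Prometheus-style duration string such as 5s or 1m30s. How the resulting value is passed to the delay command depends on your scenario file and is not shown here.

```python
import re

# Seconds per Prometheus duration unit.
_UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def duration_seconds(text):
    """Convert a duration such as '15s' or '1m30s' to seconds."""
    parts = re.findall(r"(\d+)(ms|s|m|h|d|w|y)", text)
    # Reject input that contains anything besides number/unit pairs.
    if not parts or "".join(number + unit for number, unit in parts) != text:
        raise ValueError(f"not a valid duration: {text!r}")
    return sum(int(number) * _UNIT_SECONDS[unit] for number, unit in parts)


def minimum_final_delay(scrape_interval, factor=2):
    """Waiting time in seconds: at least `factor` times the scrape interval."""
    return factor * duration_seconds(scrape_interval)


if __name__ == "__main__":
    print(minimum_final_delay("5s"))
```

With a scrape interval of 5 seconds, the final stage should therefore wait at least 10 seconds.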

Test Your REST Service Configuration

To test the configuration of your REST service, use your browser to navigate to <restservice-ip>:<restservice-port>/metrics. For example, http://localhost:8181/metrics for a REST service that listens on port 8181.

Information similar to the following verifies that the metrics are available:

# HELP com_hivemq_messages_incoming_publish_rate_total Generated from Dropwizard metric import (metric=com.hivemq.messages.incoming.publish.rate, type=com.codahale.metrics.Meter)
# TYPE com_hivemq_messages_incoming_publish_rate_total counter
com_hivemq_messages_incoming_publish_rate_total 0.0
# HELP com_hivemq_messages_incoming_pubrec_rate_total Generated from Dropwizard metric import (metric=com.hivemq.messages.incoming.pubrec.rate, type=com.codahale.metrics.Meter)
# TYPE com_hivemq_messages_incoming_pubrec_rate_total counter
com_hivemq_messages_incoming_pubrec_rate_total 0.0
...
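
The same check can be scripted instead of done in a browser. The following Python sketch downloads the endpoint and turns the text output shown above into a dictionary of metric names and values. The function names parse_metrics and fetch_metrics are hypothetical, the URL in the usage note is only an example, and the parser is deliberately simplified: it skips comment lines and ignores optional timestamps.

```python
from urllib.request import urlopen


def parse_metrics(text):
    """Parse Prometheus text-format sample lines into {metric name: value}."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '# HELP' / '# TYPE' comment lines.
        if not line or line.startswith("#"):
            continue
        if "}" in line:
            # Keep a label set such as {quantile="0.5"} as part of the name.
            head, _, rest = line.rpartition("}")
            name = head + "}"
        else:
            name, _, rest = line.partition(" ")
        fields = rest.split()
        if not fields:
            continue  # not a sample line
        # The first field is the value; an optional timestamp may follow.
        values[name] = float(fields[0])
    return values


def fetch_metrics(url, timeout=5.0):
    """Download a /metrics endpoint and parse its content."""
    with urlopen(url, timeout=timeout) as response:
        return parse_metrics(response.read().decode("utf-8"))
```

For example, fetch_metrics("http://localhost:8181/metrics") returns the current values if a REST service is reachable at that address.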

Install Prometheus

  1. Download Prometheus and install the Prometheus application on a machine of your choice.

For best results, we recommend that you do not run Prometheus on the same machine as HiveMQ Swarm.

A step-by-step getting-started guide and detailed configuration information are available in the Prometheus documentation.

  2. To enable Prometheus to gather metrics from HiveMQ Swarm, add a scrape configuration to your Prometheus configuration. Scrape from the <ip>:<port>/metrics address of the REST service.

  3. Open the web address of your Prometheus application and verify that HiveMQ Swarm metrics are visible.

global:
  scrape_interval: 15s
  query_log_file: /prometheus/query.log
scrape_configs:
  - job_name: 'swarm'
    scrape_interval: 5s
    metrics_path: '/metrics'
    static_configs:
      - targets: ['<agent-1>:8181', '<agent-2>:8181']
This example is for a setup with two agents. To scrape more agents, add their addresses to the targets list.
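
For setups with many agents, the targets list can be generated rather than typed by hand. The following Python sketch renders the scrape_configs section of the example for an arbitrary list of hosts. The function name swarm_scrape_config and the host names are hypothetical, and the port 8181 is simply taken from the example above.

```python
def swarm_scrape_config(agent_hosts, port=8181, scrape_interval="5s"):
    """Render a scrape_configs section with one target per agent host."""
    targets = ", ".join(f"'{host}:{port}'" for host in agent_hosts)
    lines = [
        "scrape_configs:",
        "  - job_name: 'swarm'",
        f"    scrape_interval: {scrape_interval}",
        "    metrics_path: '/metrics'",
        "    static_configs:",
        f"      - targets: [{targets}]",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(swarm_scrape_config(["agent-1", "agent-2", "agent-3"]))
```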

Display HiveMQ Swarm Metrics in Prometheus

Prometheus provides built-in functionality to display metrics on the fly. This is helpful when you want an in-depth look at specific metrics that you do not monitor constantly. To open the Prometheus web interface, navigate to http://localhost:9090/.

Figure: HiveMQ Swarm metrics in Prometheus

Frequently, Prometheus is used as a data source for monitoring dashboards such as Grafana. For a complete tutorial on how to set up a Grafana dashboard and use Prometheus as a data source, see HiveMQ - Monitoring with Prometheus and Grafana.
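
Metrics that Prometheus has scraped can also be read programmatically through the instant-query endpoint /api/v1/query of the Prometheus HTTP API. The following Python sketch builds such a query and extracts one value per scraped instance. The function names are hypothetical, and the sketch assumes that the metric is stored in Prometheus under the same name that HiveMQ Swarm exposes, for example agent_connection_count.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def query_url(base_url, expression):
    """Build the instant-query URL for a PromQL expression."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expression})}"


def instant_values(payload):
    """Extract {instance label: value} from an instant-query response."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error', 'unknown error')}")
    return {
        # Each sample carries its labels and a [timestamp, "value"] pair.
        sample["metric"].get("instance", ""): float(sample["value"][1])
        for sample in payload["data"]["result"]
    }


def query(base_url, expression, timeout=5.0):
    """Run an instant query against a Prometheus server."""
    with urlopen(query_url(base_url, expression), timeout=timeout) as response:
        return instant_values(json.load(response))
```

For example, query("http://localhost:9090", "agent_connection_count") returns the latest value for each agent, provided a Prometheus server is running at that address.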

HiveMQ Swarm Metrics

This draft documentation accompanies the Early Access Release of HiveMQ Swarm. To learn more or give us feedback, contact us or visit our Community Forum.

HiveMQ Swarm offers five types of metrics:

Table 1. HiveMQ Swarm Metric Types
Metric Type  Description
Gauge        A gauge returns a simple value at the point in time when the metric is requested.
Counter      A counter is a simple number that can be incremented and decremented.
Histogram    A histogram measures the distribution of values in a stream of data. It lets you measure the minimum, mean, maximum, and standard deviation of the values as well as quantiles.
Meter        A meter measures the rate at which a set of events occurs. Meters measure the mean rate and the 1-, 5-, and 15-minute moving averages of events.
Timer        A timer is essentially a histogram of the duration of a type of event combined with a meter of the rate of its occurrence. It captures rate and duration information.
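
To make the moving averages of a meter more concrete, the following Python sketch shows an exponentially weighted moving average that is updated in fixed ticks. It is modeled on the approach used by the Dropwizard Metrics library, which the sample /metrics output on this page names as the source of the metrics. The class name MovingAverage, the 5-second tick, and the simplifications are assumptions for illustration, not the HiveMQ Swarm implementation.

```python
import math


class MovingAverage:
    """Exponentially weighted moving average of an event rate (illustrative sketch)."""

    def __init__(self, window_minutes, tick_seconds=5.0):
        self.tick_seconds = tick_seconds
        # Smoothing factor: a longer window reacts more slowly to changes.
        self.alpha = 1.0 - math.exp(-tick_seconds / (60.0 * window_minutes))
        self.rate = None   # events per second, unknown until the first tick
        self.uncounted = 0  # events marked since the last tick

    def mark(self, n=1):
        """Record n events."""
        self.uncounted += n

    def tick(self):
        """Fold the events of the last tick interval into the average."""
        instant_rate = self.uncounted / self.tick_seconds
        self.uncounted = 0
        if self.rate is None:
            self.rate = instant_rate
        else:
            self.rate += self.alpha * (instant_rate - self.rate)
        return self.rate


if __name__ == "__main__":
    one_minute = MovingAverage(window_minutes=1)
    one_minute.mark(50)       # 50 events within one 5-second tick
    print(one_minute.tick())  # first tick adopts the instantaneous rate
    print(one_minute.tick())  # without new events the average decays
```

A counter, by contrast, would only report the total number of marked events, which is why a meter is the better choice for judging the current load.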

Table 2. HiveMQ Swarm Standard Metrics
Metric Name                                 Type     Description
commander_agents_connected_gauge            Gauge    The number of agents that are connected to the commander.
commander_scenario_stage_index_gauge        Gauge    The number of the scenario stage that is currently in progress. The value is 0 if no stage is currently in progress.
agent_connection_count                      Counter  The number of connection attempts.
agent_connect_failed_count                  Counter  The number of failed connection attempts.
agent_connect_successful_count              Counter  The number of successful connection attempts.
agent_publish_outgoing_count                Counter  The number of outgoing publish messages.
agent_publish_successful_count              Counter  The number of successful publish messages.
agent_publish_error_count                   Counter  The number of failed publish messages.
agent_publish_incoming_count                Counter  The number of incoming publish messages.
agent_subscribe_outgoing_count              Counter  The number of outgoing subscribes.
agent_subscribe_successful_count            Counter  The number of successful subscribes.
agent_subscribe_failed_count                Counter  The number of failed subscribes.
agent_unsubscribe_outgoing_count            Counter  The number of outgoing unsubscribes.
agent_unsubscribe_successful_count          Counter  The number of successful unsubscribes.
agent_unsubscribe_failed_count              Counter  The number of failed unsubscribes.
agent_connect_outgoing_meter                Meter    The rate of connection attempts.
agent_connect_successful_meter              Meter    The rate of successful connection attempts.
agent_connect_failed_meter                  Meter    The rate of failed connection attempts.
agent_subscribe_outgoing_meter              Meter    The rate of outgoing subscribes.
agent_subscribe_successful_meter            Meter    The rate of successful subscribes.
agent_subscribe_failed_meter                Meter    The rate of failed subscribes.
agent_unsubscribe_outgoing_meter            Meter    The rate of outgoing unsubscribes.
agent_unsubscribe_successful_meter          Meter    The rate of successful unsubscribes.
agent_unsubscribe_failed_meter              Meter    The rate of failed unsubscribes.
agent_publish_qos_0_outgoing_meter          Meter    The rate of outgoing QoS 0 publishes.
agent_publish_qos_0_successful_meter        Meter    The rate of successful QoS 0 publishes.
agent_publish_qos_0_failed_meter            Meter    The rate of failed QoS 0 publishes.
agent_publish_qos_0_outgoing_payload_meter  Meter    The rate of the payloads of outgoing QoS 0 publishes.
agent_publish_qos_1_outgoing_meter          Meter    The rate of outgoing QoS 1 publishes.
agent_publish_qos_1_successful_meter        Meter    The rate of successful QoS 1 publishes.
agent_publish_qos_1_failed_meter            Meter    The rate of failed QoS 1 publishes.
agent_publish_qos_1_outgoing_payload_meter  Meter    The rate of the payloads of outgoing QoS 1 publishes.
agent_publish_qos_2_outgoing_meter          Meter    The rate of outgoing QoS 2 publishes.
agent_publish_qos_2_successful_meter        Meter    The rate of successful QoS 2 publishes.
agent_publish_qos_2_failed_meter            Meter    The rate of failed QoS 2 publishes.
agent_publish_qos_2_outgoing_payload_meter  Meter    The rate of the payloads of outgoing QoS 2 publishes.
agent_publish_total_outgoing_meter          Meter    The rate of outgoing publishes.
agent_publish_total_successful_meter        Meter    The rate of successful publishes.
agent_publish_total_failed_meter            Meter    The rate of failed publishes.
agent_publish_total_outgoing_payload_meter  Meter    The rate of the payloads of outgoing publishes.
agent_publish_qos_0_incoming_meter          Meter    The rate of incoming QoS 0 publishes.
agent_publish_qos_0_incoming_payload_meter  Meter    The rate of the payloads of incoming QoS 0 publishes.
agent_publish_qos_1_incoming_meter          Meter    The rate of incoming QoS 1 publishes.
agent_publish_qos_1_incoming_payload_meter  Meter    The rate of the payloads of incoming QoS 1 publishes.
agent_publish_qos_2_incoming_meter          Meter    The rate of incoming QoS 2 publishes.
agent_publish_qos_2_incoming_payload_meter  Meter    The rate of the payloads of incoming QoS 2 publishes.
agent_publish_total_incoming_meter          Meter    The rate of incoming publishes.
agent_publish_total_incoming_payload_meter  Meter    The rate of the payloads of incoming publishes.
agent_disconnect_outgoing_meter             Meter    The rate of outgoing disconnects.
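
The counters in this table can be combined into simple health indicators. The following Python sketch derives success ratios from a dictionary of metric values, for example values collected from the /metrics endpoint. The function name success_ratios and the pairing of metrics are assumptions for illustration: the sketch treats agent_connection_count and the *_outgoing_count counters as the total number of attempts.

```python
# Pairs of (successful counter, attempts counter); the pairing is an assumption.
RATIO_PAIRS = {
    "connect": ("agent_connect_successful_count", "agent_connection_count"),
    "publish": ("agent_publish_successful_count", "agent_publish_outgoing_count"),
    "subscribe": ("agent_subscribe_successful_count", "agent_subscribe_outgoing_count"),
    "unsubscribe": ("agent_unsubscribe_successful_count", "agent_unsubscribe_outgoing_count"),
}


def success_ratios(values):
    """Return {operation: successful / attempts} for the counters that are present."""
    ratios = {}
    for operation, (ok_metric, total_metric) in RATIO_PAIRS.items():
        if ok_metric not in values or total_metric not in values:
            continue  # metric not reported, skip the operation
        total = values[total_metric]
        # None signals that no attempt has been counted yet.
        ratios[operation] = values[ok_metric] / total if total else None
    return ratios


if __name__ == "__main__":
    sample = {
        "agent_connect_successful_count": 90.0,
        "agent_connection_count": 100.0,
    }
    print(success_ratios(sample))
```

Because the agent metrics reset when a scenario finishes, such ratios are only meaningful for values that were read while the scenario was still running.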