MQTT Essentials Part 10: Keep Alive and Client Take-Over


Welcome to the tenth part of MQTT Essentials, a blog series about the core features and concepts in the MQTT protocol. In this post we talk about the Keep Alive feature of MQTT and why this feature is especially important for mobile networks.

The problem of half-open TCP connections

MQTT is based on the Transmission Control Protocol (TCP). This protocol ensures that packets are transferred over the internet in a “reliable, ordered, and error-checked” way. Nevertheless, from time to time, the communicating parties can get out of sync, for example, if one of the parties crashes or if transmission errors occur. In TCP, this state of incomplete connection is called a half-open connection. The important point to remember is that one side of the communication continues to function and is not notified about the failure of the other side. The side that is still connected keeps trying to send messages and waits for acknowledgements.

As Andy Stanford-Clark (one of the inventors of the MQTT protocol) points out, half-open connections are an even bigger problem in mobile networks:

“Although TCP/IP in theory notifies you when a socket breaks, in practice, particularly on things like mobile and satellite links, which often “fake” TCP over the air and put headers back on at each end, it’s quite possible for a TCP session to “black hole”, i.e. it appears to be open still, but in fact is just dumping anything you write to it onto the floor.”

Andy Stanford-Clark on the topic “Why is the keep-alive needed?” (Source)

MQTT Keep Alive

MQTT includes a keep alive function that provides a workaround for the issue of half-open connections (or at least makes it possible to assess if the connection is still open).

Keep alive ensures that the connection between the broker and client is still open and that the broker and the client are aware of being connected. When the client establishes a connection to the broker, it communicates a time interval in seconds to the broker. This interval defines the maximum length of time that the broker and client can go without communicating with each other.
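To make the interval concrete, here is a minimal Python sketch, assuming MQTT 3.1.1, of where the keep alive value sits in the CONNECT packet: a 2-byte, big-endian field at the end of the variable header. The function names are hypothetical, and the sketch builds only the variable header, not the fixed header or the payload (client identifier and so on).

```python
import struct


def build_connect_variable_header(keep_alive_s: int, clean_session: bool = True) -> bytes:
    """Build a CONNECT variable header (MQTT 3.1.1 layout assumed; no fixed header, no payload)."""
    if not 0 <= keep_alive_s <= 0xFFFF:
        raise ValueError("keep alive must fit in 16 bits (0..65535 seconds)")
    protocol_name = b"MQTT"
    header = struct.pack("!H", len(protocol_name)) + protocol_name  # length-prefixed protocol name
    header += bytes([0x04])  # protocol level 4 = MQTT 3.1.1
    header += bytes([0x02 if clean_session else 0x00])  # connect flags: only Clean Session is set here
    header += struct.pack("!H", keep_alive_s)  # keep alive in seconds, big-endian (MSB first)
    return header


def read_keep_alive(variable_header: bytes) -> int:
    # The keep alive occupies the last two bytes of this 10-byte variable header.
    return struct.unpack("!H", variable_header[8:10])[0]
```

For example, a keep alive of 60 seconds yields the bytes 00 04 4d 51 54 54 04 02 00 3c, where the trailing 00 3c is the value 60.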

The MQTT specification says the following:

“The Keep Alive … is the maximum time interval that is permitted to elapse between the point at which the Client finishes transmitting one Control Packet and the point it starts sending the next. It is the responsibility of the Client to ensure that the interval between Control Packets being sent does not exceed the Keep Alive value. In the absence of sending any other Control Packets, the Client MUST send a PINGREQ Packet.”

As long as messages are exchanged frequently and the keep-alive interval is not exceeded, there is no need to send an extra message to establish whether the connection is still open.

If the client does not send a message during the keep-alive period, it must send a PINGREQ packet to the broker to confirm that it is available and to make sure that the broker is also still available.
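The client-side rule can be sketched as a small timer: any control packet the client sends restarts the interval, and a PINGREQ is only due once a full keep alive period has passed without one. This is a minimal Python sketch; the class and method names are hypothetical, and time is passed in explicitly (in seconds) to keep the logic easy to follow.

```python
from dataclasses import dataclass


@dataclass
class ClientKeepAliveTimer:
    keep_alive_s: float  # value the client sent in CONNECT; 0 disables keep alive
    last_sent_s: float = 0.0  # when the client last sent any control packet

    def on_packet_sent(self, now_s: float) -> None:
        # Any control packet (PUBLISH, SUBSCRIBE, PINGREQ, ...) restarts the interval.
        self.last_sent_s = now_s

    def should_send_pingreq(self, now_s: float) -> bool:
        # With keep alive disabled, the client never has to ping.
        if self.keep_alive_s == 0:
            return False
        # Ping only when nothing else was sent for a full keep alive period.
        return now_s - self.last_sent_s >= self.keep_alive_s
```

A real client would probably leave some margin for network latency instead of waiting until the last moment; the sketch keeps the boundary exact for clarity.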

The broker must disconnect a client that does not send a message or a PINGREQ packet within one and a half times the keep alive interval. Likewise, the client is expected to close the connection if it does not receive a response from the broker within a reasonable amount of time.
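Both timeouts can be sketched as simple checks. The broker compares the time since it last heard from a client against 1.5 times that client's keep alive; the client checks how long its outstanding PINGREQ has waited for a PINGRESP. This is a minimal Python sketch with hypothetical names. Because the specification only asks for a “reasonable amount of time” on the client side, the response_timeout_s parameter is an assumption.

```python
from typing import Dict, List, Optional, Tuple


def broker_should_disconnect(keep_alive_s: int, last_received_s: float, now_s: float) -> bool:
    """Broker side: True once the client has been silent for more than 1.5x its keep alive."""
    if keep_alive_s == 0:
        return False  # keep alive disabled, so the broker does not time the client out
    return now_s - last_received_s > 1.5 * keep_alive_s


def expired_clients(clients: Dict[str, Tuple[int, float]], now_s: float) -> List[str]:
    """Scan {client_id: (keep_alive_s, last_received_s)} and list the clients to disconnect."""
    return sorted(
        client_id
        for client_id, (keep_alive_s, last_received_s) in clients.items()
        if broker_should_disconnect(keep_alive_s, last_received_s, now_s)
    )


def client_should_close(ping_sent_at_s: Optional[float], now_s: float, response_timeout_s: float) -> bool:
    """Client side: True if a PINGREQ has waited longer than response_timeout_s (an assumed value)."""
    if ping_sent_at_s is None:
        return False  # no PINGREQ is outstanding
    return now_s - ping_sent_at_s > response_timeout_s
```

For example, with a keep alive of 60 seconds the broker keeps a silent client until 90 seconds have passed and disconnects it after that.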

Keep Alive Flow

Let’s take a closer look at the keep alive messages. The keep alive feature uses two packets:

PINGREQ


The PINGREQ is sent by the client and indicates to the broker that the client is still alive. If the client does not send any other type of packet (for example, a PUBLISH or SUBSCRIBE packet) within the keep alive interval, it must send a PINGREQ packet to the broker. The client can also send a PINGREQ packet any time it wants to confirm that the network connection is still alive. The PINGREQ packet does not contain a payload.

PINGRESP


When the broker receives a PINGREQ packet, the broker must reply with a PINGRESP packet to show the client that it is still available. The PINGRESP packet also does not contain a payload.
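Because neither packet carries a variable header or payload, each one is just a 2-byte fixed header: the packet type in the high nibble of the first byte (12 for PINGREQ, 13 for PINGRESP, as in MQTT 3.1.1) and a remaining length of 0. The following minimal Python sketch encodes both packets and shows the broker's obligatory reply; the function names are hypothetical, and all other packet types are deliberately ignored.

```python
from typing import Optional

PINGREQ_TYPE = 12
PINGRESP_TYPE = 13


def encode_ping_packet(packet_type: int) -> bytes:
    # Packet type in the high nibble, reserved flags 0, then remaining length 0 (no payload).
    return bytes([packet_type << 4, 0x00])


PINGREQ = encode_ping_packet(PINGREQ_TYPE)    # bytes C0 00
PINGRESP = encode_ping_packet(PINGRESP_TYPE)  # bytes D0 00


def broker_reply(packet: bytes) -> Optional[bytes]:
    """Answer every PINGREQ with a PINGRESP; anything else is out of scope for this sketch."""
    if packet == PINGREQ:
        return PINGRESP
    return None
```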

Good to Know

  • If the broker does not receive a PINGREQ or any other packet from a client within one and a half times the keep alive interval, the broker closes the connection and sends the last will and testament message (if the client specified an LWT).
  • It is the responsibility of the MQTT client to set an appropriate keep alive value. For example, the client can adjust the keep-alive interval to its current signal strength.
  • The maximum keep alive is 65,535 seconds (18h 12min 15sec).
  • If the keep alive interval is 0, the keep alive mechanism is deactivated.
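The odd-looking maximum follows from the field size: the keep alive is stored in 2 bytes, so the largest value is 65,535 seconds. The Python sketch below converts that value into hours, minutes, and seconds and treats 0 as “disabled”. The helper names are hypothetical.

```python
MAX_KEEP_ALIVE_S = 0xFFFF  # largest value an unsigned 2-byte field can hold (65,535)


def format_duration(seconds: int) -> str:
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}h {minutes}min {secs}sec"


def describe_keep_alive(seconds: int) -> str:
    if not 0 <= seconds <= MAX_KEEP_ALIVE_S:
        raise ValueError("keep alive must be between 0 and 65535 seconds")
    if seconds == 0:
        return "keep alive disabled"  # 0 switches the mechanism off
    return format_duration(seconds)


print(describe_keep_alive(MAX_KEEP_ALIVE_S))  # prints "18h 12min 15sec"
```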

Client Take-Over

Usually, a disconnected client tries to reconnect. Sometimes, the broker still has a half-open connection for that client. In MQTT, if a client connects with a client identifier for which the broker already holds a connection, the broker performs a ‘client take-over’: it closes the previous connection to the same client (determined by the client identifier) and establishes the new connection with the client. This behavior ensures that the half-open connection does not stop the disconnected client from re-establishing a connection.
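The take-over rule boils down to a registry keyed by client identifier: registering a new connection under an identifier that is already in use closes the old connection first. The Python sketch below shows only that bookkeeping. The Connection and Broker classes are hypothetical, and the sketch does not model sessions, last will messages, or any network I/O.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Connection:
    conn_id: int
    open: bool = True

    def close(self) -> None:
        self.open = False  # stands in for closing the (possibly half-open) socket


@dataclass
class Broker:
    connections: Dict[str, Connection] = field(default_factory=dict)  # client identifier -> connection

    def on_connect(self, client_id: str, conn: Connection) -> Optional[Connection]:
        """Register conn; if client_id already has a connection, close it first (client take-over)."""
        previous = self.connections.get(client_id)
        if previous is not None and previous is not conn:
            previous.close()
        self.connections[client_id] = conn
        return previous
```

Usage: when a client reconnects as "sensor-1" while the broker still holds its half-open connection, on_connect closes the stale connection and returns it, and the new connection becomes the active one.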


That’s the end of part ten in our MQTT Essentials series. We hope that you have enjoyed the whole series. Although this is the last official post, we’re adding an MQTT Essentials special edition next week on MQTT over WebSockets.

Have a great week and see you on the next MQTT Monday! (We already have a lot of great ideas for future topics, so stay tuned for more helpful content about MQTT and HiveMQ.)

If you want to read more blog posts on MQTT, sign up for our newsletter and get notified as soon as a new post is available. If you prefer RSS, you can subscribe to our RSS feed here.

36 comments

  1. Pitouli says:

    This was super instructive!
    I loved the whole series: I already had the basics, so parts 1 to 5 were more “refreshing” than “necessary” in my case (though they are surely an excellent start for a complete beginner), but I learned a lot of things in parts 6 to 10.
    I especially loved the “Best Practices”, such as the “online/offline” status based on LWT and retained messages…
    Thank you!

  2. Hi,
    I was just wondering: in MQTT-SN, too, the client has to form a connection before publishing/subscribing. Isn’t this additional overhead?

    1. Hi Uwe,

      thanks, we fixed that link!

      Best,
      Dominik from the HiveMQ Team

  3. Christopher Donovan says:

    Great series. I am a beginner in the realm of IoT and messaging, and this was very informative, well written, short but to the point, and easy to understand the concepts involved.

    Thanks.

  4. Ervin says:

    “It can happen that one of the communicating parties gets out of sync with the other, often due to a crash of one side or because of transmission errors”. Does this mean it was an ungraceful disconnect?

    “The important point is that the still functioning end is not notified about the failure of the other side and is still trying to send messages and wait for acknowledgements”. What about the Last Will and Testament?

    Might I have missed something important?

    1. Hi Ervin,

      the key point here is that a half-open connection looks like an open connection and thus the parties (or one side) thinks the connection is still OK although it’s essentially blackholing. LWT is only sent after a broken TCP connection was detected (which needs the keep-alive in order to circumvent half-open sockets).

      Hope this helps,
      Dominik from the HiveMQ Team

  5. Nitin Ratnakaran says:

    Thanks a lot for this series. I’ve been searching for material on MQTT and this blog really explained all the concepts in simple language. Much appreciated.

  6. Oliver E says:

    Thanks for the series! Good intro.

  7. Thanks a lot for this series. I am new to MQTT, but after reading this series I understand it better.

    Thanks HiveMQ for this series.

  8. Kyle H says:

    Very informative series.

    Appreciate the work 🙂

  9. Max Morlock says:

    Is there a reason why the maximum keep alive period is exactly 18h 12min 15sec? Why not 12h or 24h?

    Thanks for the informative series!

    1. Hi Max,

      yes, there is a reason. For the keep alive value the MQTT protocol has allocated a size of 2 bytes. (see http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/errata01/os/mqtt-v3.1.1-errata01-os-complete.html#_Toc385349238)
      This makes it possible to store a number between 0 and 65,535.

      When you convert 65535 seconds into hours, minutes and seconds it is 18h 12min 15sec.

      Best,
      Christian from the HiveMQ Team

  10. Sudheer says:

    Thanks for the detailed series.

    Is there any association between TCP connection and client identifier? I assume broker can store client identifier and TCP connection information in the persistent session, if exists. What if client opted to not having a persistent session?

    1. Hi Sudheer,
      If a client is connected then its client identifier is indirectly associated with its TCP connection.
      If a client does not have a persistent session, the information is removed when the client disconnects or the TCP connection is gone.
      Hope that helps,
      The HiveMQ Team

  11. Sudheer says:

    Thanks for the response HiveMQ Team.

    I’m thinking of possibility to multiplex different client connections to broker using a load balancer. Since client identifier is sent only in the CONNECT packet, I believe it is not possible.
    Please let me know your thoughts on this.

    Thanks,
    Sudheer

    1. Hi Sudheer,

      you are right this is currently not possible.

      Regards,
      The HiveMQ Team.

  12. Ger says:

    The need to keep a TCP connection open all the time seems so contradictory to typical MQTT networks: low bandwidth, low energy, small footprint, etc. Especially in mobile networks, where providers drop the connection regularly, this requires keep alive periods of at most ten minutes or so, and that is a lot of communication for battery-powered systems. This is especially true if the field systems only need to send status information irregularly. In my systems there will be no messages for days or even weeks.
    I plan to deactivate the keep alive and set up a connection with the broker only if there is a message to be sent. The problem is that I also need to send messages from a central system (typically a command or config change) to a field system, and if there is no connection, the broker can’t send.
    In short: isn’t there a way the broker can start a connection if there is a message for a client?

    1. Hi there,

      for typical MQTT use cases, repeated SSL handshakes to re-establish connections consume more bandwidth than keeping the connection alive.
      According to the MQTT spec

      A Client always establishes the Network Connection to the Server.

      For networks taking advantage of NAT, initiating the connection is technically impossible for the broker.

      Hope that helps,
      Florian, from the HiveMQ Team.

  13. Sam says:

    When the client sends a PINGREQ message, the broker will reply with a PINGRESP. But when the client sends regular messages within the keep alive time, so that no PINGREQ message is necessary, how will the broker inform the client that it is still alive?

    1. Hallo Sam,

      The primary function of keepAlive is letting the broker know that the client is still alive.
      While only sending QoS=0 messages over an extended period of time, there is the possibility of a half-open connection occurring, since the client won’t receive any acknowledgement for these messages.
      As Quality of Service 0 doesn’t provide any guarantees, this is in line with the MQTT spec.
      The client can always send extra PINGREQs regardless of the keepAlive intervals, if this is a major concern in your use case.

      Hope this helps.

      Kind regards,
      Florian, from the HiveMQ Team.

  14. Chau says:

    Hi,
    Is Client Take-Over a default function in MQTT or is it customized for HiveMQ? How do I configure this function to work?
    Thank you.

    1. Hallo Chau,

      Client Take-Over is part of the MQTT 3.1.1 spec.
      See 3.1.4 Response:

      If the ClientId represents a Client already connected to the Server then the Server MUST disconnect the existing Client

      HiveMQ does not need any configuration for this functionality to work.

      Kind regards,
      Florian, from The HiveMQ Team.

  15. Thank you for the kind words. We are glad you liked it!

    Kind regards,
    Florian from The HiveMQ Team.

  16. Code101 says:

    Hi,

    I have a question about an MQTT broker issue; I hope someone can help me out with it. Sometimes the broker connection is not established properly. I am porting MQTT to a controller and I get a 210 error message, which according to the data sheets means that an SSL handshake failure is happening. What can I do to prevent it?

  17. Antonio says:

    Can I connect a client and not disconnect it until the keepalive timer skips?

    That is, I want to implement a service where clients only connect to the broker once and send messages (publish, publish, publish….) without having to reconnect as long as the keepalive timer has not finished.

    1. Hallo Antonio,

      Nice to see you are taking an interest in MQTT and HiveMQ.
      A client never has to reconnect. The keepAlive mechanism is just a sort of control to identify broken connections.
      When your client has connected and a stable connection is in place, it can stay connected and publish or subscribe messages indefinitely.

      Hope this helps.
      Kind regards,
      Florian from The HiveMQ Team.

  18. Justin Eltoft says:

    In Client Take-Over, does this also act as a ping? I assume it must, because there may be very little time left on the previous keep alive value. Thanks!

    1. Justin Eltoft says:

      Also, I should have asked: what if the process of sending out the LWT has already started? How much cleanup can be assumed by client take-over? The client may not know whether the LWT was sent out unless it also subscribes to watch for its own “death”, and then has to refute it. Or is the best practice on reconnect to always clean up an LWT and resend an “I’m online” message?

    2. Hi Justin,

      A client take-over is always triggered by the broker. Likewise, the LWT is triggered by the broker.
      Whenever a client gets disconnected through the client take over mechanism the broker will NOT send the LWT.
      Hope this helps.

      Kind regards,
      Florian from The HiveMQ Team.

    3. Hi Justin,

      Nice to see you’re taking an interest in MQTT.
      In a client take over scenario the client that does not get disconnected has by definition just sent a CONNECT packet, which counts as an MQTT packet.
      Any MQTT packet will “reset” the keepAlive timer.
      I hope this helps.

      Kind regards,
      Florian from The HiveMQ Team.

  19. Mark Johnson says:

    Thank you so much for this AWESOME blog about MQTT. I learned so much about MQTT.

    1. Hi Mark,

      Thanks a lot for the kind words, we appreciate them.

      Kind regards,
      Abdullah from The HiveMQ Team.

  20. Farhad says:

    Thanks for the very informative blog posts – great MQTT tutorial! Quick question, what is the rule of thumb for setting the keep-alive value? If I typically send messages every minute, should I set it to 2 minutes? Also, how does QOS 1 vs QOS 0 affect keepalive functionality?

    1. Hi Farhad,

      Thank you for the kind words. We appreciate them.

      There is no rule of thumb that we can recommend for everyone. It’s entirely use-case dependent. If you’re expecting a QoS 1 or 2 message every minute, then a keep alive of 2 minutes is perfect.

      It looks different when you’re sending messages with QoS 0. There is no acknowledgement of packets with QoS 0, so pinging is required to ensure that the connection is still healthy.

      Hope I could clear things up.

      Kind regards,
      Abdullah from the HiveMQ Team.
