Overcoming MQTT Cluster Sharding Challenges for IoT Scalability

by Anthony Olazabal
23 min read

In the first part of this series, Navigating the Challenges of MQTT Sharding for IoT Scalability, we discussed the challenges and limitations of the sharded approach. In this article, we look at how to address some of the challenges that come with a sharded architecture.

How to Address Some of the MQTT Cluster Challenges

Imagine a company that offers connected services with mobile applications for the automotive industry. This company has expanded in Europe and in the United States. To provide the best service to millions of users, they need to rely on a communication solution that delivers performance, reliability, and scalability in order to connect more and more vehicles.

After a study of the communication protocols currently available, the company decided to focus on MQTT and the market-leading HiveMQ platform. To provide a worldwide service with minimum latency for users, they designed a sharded deployment with multiple clusters all around the world.

This can be summarized in a very high-level schema as shown below:

Example of connected services with mobile applications using MQTT Cluster Sharding

Now that we have the big picture, let's discuss the challenges faced and the potential solutions adopted.

Data Consistency

Maintaining data consistency between shards is a complex task that involves careful design and implementation. Here are several strategies commonly employed to address the challenge of data consistency in a sharded environment:

  1. Transaction Management:

    Use distributed transactions or two-phase commit protocols to ensure that operations across multiple shards are atomic. This ensures that either all changes are applied, or none at all, maintaining consistency.

  2. Synchronization Services:

    Use external services or tools for synchronization. This may involve having a centralized service that tracks changes and propagates them to the relevant shards. Tools like Apache Kafka can be used for distributed event streaming.

  3. Shard-Aware Application Logic:

    Design the application logic to be aware of the sharding strategy and implement custom logic to ensure consistency. This may involve routing requests to the appropriate shard and handling conflicts in a way that preserves overall data integrity.

  4. Shard-Level Locking:

    Implement locking mechanisms at the shard level to ensure that only one shard can modify a particular piece of data at a time. This approach helps prevent conflicts but may introduce contention and affect performance.

The choice of strategy depends on the specific requirements and constraints of the application, the level of consistency needed, and the performance considerations.
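
To make the first strategy more concrete, here is a minimal in-memory sketch of a two-phase commit across shards. The `Shard` class, its `prepare`/`commit`/`rollback` methods, and the `two_phase_write` helper are hypothetical names introduced for illustration; a real deployment would also need durable logs, timeouts, and coordinator recovery, which are left out here.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory shard that stages a write before committing it."""
    name: str
    available: bool = True
    data: dict = field(default_factory=dict)
    staged: dict = field(default_factory=dict)

    def prepare(self, key, value):
        # Phase 1: vote "yes" only if the shard can accept the write.
        if not self.available:
            return False
        self.staged[key] = value
        return True

    def commit(self):
        # Phase 2 (success path): make the staged writes visible.
        self.data.update(self.staged)
        self.staged.clear()

    def rollback(self):
        # Phase 2 (failure path): discard the staged writes.
        self.staged.clear()


def two_phase_write(shards, key, value):
    """Apply the write on every shard, or on none of them."""
    prepared = []
    for shard in shards:
        if shard.prepare(key, value):
            prepared.append(shard)
        else:
            # One "no" vote aborts the whole transaction.
            for voted_yes in prepared:
                voted_yes.rollback()
            return False
    for shard in shards:
        shard.commit()
    return True
```

With two shards `eu` and `us`, `two_phase_write([eu, us], key, value)` returns `True` and updates both; if `us.available` is `False`, it returns `False` and `eu` keeps its previous value. The price of this atomicity is that every write waits for the slowest shard.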

Load Balancing

Load balancing or geo load balancing involves distributing network traffic across multiple clusters located in different locations to optimize performance, availability, and redundancy. Managing load balancing typically involves considerations such as proximity to end-users, network latency, and failover capabilities. Here are key steps and strategies to effectively manage load balancing:

  1. Global Server Load Balancers (GSLB):

    Implement GSLB solutions to direct users to the nearest or most responsive server based on factors like server health, load, and proximity. GSLB solutions often work in conjunction with DNS to provide efficient global load balancing.

  2. Health Monitoring:

    Regularly monitor the health and performance of clusters across different geographical locations. Health checks can be implemented to ensure that users are directed to healthy clusters, improving the overall reliability of the system.

  3. Geographic DNS Resolution:

    Use DNS resolution mechanisms that consider the geographic location of the user. This can be achieved through DNS providers that offer geo-based resolution, directing users to the closest server based on their geographical location.

  4. Latency-based Routing:

    Employ latency-based routing to direct traffic to the cluster with the lowest round-trip time for a particular user. This helps optimize performance by minimizing network latency.

  5. Failover and Redundancy:

    Plan for failover and redundancy across multiple geographical locations. If one cluster becomes unavailable, traffic can be automatically redirected to an alternate location to ensure continuous service availability.

  6. Traffic Shaping and Prioritization:

    Implement traffic shaping and prioritization policies to manage and prioritize different types of traffic. This can help ensure that critical applications receive the necessary resources and that less critical traffic doesn't impact performance.

  7. Cloud-based Load Balancers:

    Leverage cloud-based load balancers provided by major cloud service providers. These services often include built-in global load balancing features and can seamlessly integrate with other cloud services.

  8. Dynamic Configuration and Automation:

    Implement dynamic configuration and automation tools to adjust load balancing settings based on changing conditions, such as traffic patterns, server health, or geographical changes.

  9. Dispatcher Service:

    Implement a globally distributed dispatcher service that is aware of the load on each cluster and of how clients are distributed across clusters. This service can listen for Birth and Death messages (a concept borrowed from Sparkplug) to know whether a client is online and which cluster it is connected to.

By combining these strategies and tools, you can effectively manage geo load balancing to deliver a more responsive, reliable, and scalable experience for users across different regions. The specific approach may vary depending on the architecture, requirements, and technologies employed in a given environment.
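
The dispatcher idea can be sketched as a small registry that is updated on Birth and Death messages. This is only an illustration under assumptions: the `Dispatcher` class and its method names are hypothetical, no MQTT client is involved, and the sketch assumes something else subscribes to the relevant topics and calls `on_birth` and `on_death` with the client ID and the cluster name.

```python
from collections import Counter


class Dispatcher:
    """Hypothetical registry of which cluster each client is connected to."""

    def __init__(self, clusters):
        self.clusters = list(clusters)
        self.location = {}  # client_id -> cluster name

    def on_birth(self, client_id, cluster):
        # A Birth message means the client is online on this cluster.
        self.location[client_id] = cluster

    def on_death(self, client_id, cluster):
        # Ignore a stale Death from a cluster the client has already left.
        if self.location.get(client_id) == cluster:
            del self.location[client_id]

    def locate(self, client_id):
        # Returns the cluster name, or None when the client is offline.
        return self.location.get(client_id)

    def load(self):
        # Number of online clients per cluster, including empty clusters.
        counts = Counter(self.location.values())
        return {cluster: counts.get(cluster, 0) for cluster in self.clusters}

    def least_loaded(self):
        # Candidate cluster for the next client that asks where to connect.
        load = self.load()
        return min(self.clusters, key=lambda cluster: load[cluster])
```

The check in `on_death` matters when a vehicle reconnects to another region before the Death message from its previous cluster arrives; without it, the late message would mark an online client as offline.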

Fault Tolerance

Managing fault tolerance between shards in a distributed system involves implementing strategies to ensure system reliability in the face of node failures or other disruptions. Here are some key approaches to enhance fault tolerance in a sharded environment:

  1. Redundancy:

    Duplicate data across multiple sharded clusters. If one cluster fails, another can take over, ensuring continuity of service. This redundancy can be implemented at the data level, where each piece of data is stored on multiple shards, or at the service level, where entire shards have backup counterparts.

  2. Replication:

    Replicate data across multiple nodes or shards. Use technologies like master-slave replication or multi-master replication to ensure that data is consistently available even if one shard experiences a failure. Replication can be synchronous or asynchronous, depending on the trade-offs between consistency and performance.

  3. Automated Failover:

    Implement automated failover mechanisms that can quickly detect when a shard becomes unavailable and automatically redirect traffic to a healthy shard. This minimizes downtime and ensures continuous service availability.

  4. Health Monitoring:

    Regularly monitor the health of shards to identify potential issues before they lead to failures. Health checks can include checking the availability of resources, network connectivity, and overall system performance.

  5. Load Shedding:

    Implement load shedding mechanisms to dynamically redirect or limit traffic to specific shards during periods of high demand or when a shard is experiencing issues. This can prevent cascading failures and help maintain overall system stability.

  6. Isolation of Failures:

    Design the system in a way that failures in one shard do not impact the functionality of other shards. Isolating failures ensures that problems are contained and don't propagate across the entire system.

  7. Graceful Degradation:

    Implement strategies for graceful degradation, where the system can continue to operate with reduced functionality even if some shards are unavailable. This ensures that critical services remain accessible in the event of partial failures.

Implementing fault tolerance in a sharded system requires a combination of architectural choices, operational practices, and monitoring tools. The specific approach may vary based on the characteristics of the application, the level of fault tolerance required, and the constraints of the underlying infrastructure.
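
As one possible illustration of load shedding, the sketch below uses a token bucket to cap how many requests a shard admits per second. The `TokenBucket` class is a hypothetical helper, not part of any broker API, and the caller is assumed to pass the current time in seconds (for example from `time.monotonic()`), which also keeps the logic easy to test.

```python
class TokenBucket:
    """Admit about `rate` requests per second, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Refill in proportion to the elapsed time, capped at capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        # No token left: shed this request instead of queueing it.
        return False
```

A rejected request can be redirected to another shard or answered with a "try again later" response, so that a struggling shard sees bounded load instead of a growing backlog.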

Cross-Shard Communication

Handling cross-shard communication is a critical aspect of managing a sharded system where data is distributed across multiple shards or partitions. Cross-shard communication refers to interactions between clients or processes residing in different shards. Managing this communication effectively is essential for maintaining data consistency and ensuring that the system functions cohesively. Here are several strategies for handling cross-shard communication:

  1. Local Operations Whenever Possible:

    Design the system to favor local operations within a shard. Minimize the need for cross-shard communication by organizing data and operations in a way that reduces dependencies on data from other shards.

  2. Event Sourcing:

    Implement an event-sourcing pattern where changes in one shard produce events that are consumed by other shards. This can help maintain eventual consistency and reduce the need for direct cross-shard queries.

  3. Global Data Store:

    Use a global data store or distributed cache that is accessible by all shards. This can act as a shared repository for certain types of data, reducing the need for direct cross-shard communication.

  4. Shard Routing Layer:

    Introduce a shard routing layer that is responsible for directing requests to the appropriate shard. This layer can be aware of the location of data and route requests accordingly, reducing the complexity of cross-shard communication for application logic.

  5. Shard Awareness in Application Logic:

    Design the application logic to be aware of the sharding strategy. This may involve custom logic to route requests to the appropriate shard based on the location of the data being accessed.

The choice of strategy depends on the specific requirements of the application, the level of consistency needed, and the performance considerations.
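
A shard routing layer needs a deterministic way to map a key, such as a client ID, to a shard. The sketch below uses a consistent-hash ring, which is one common choice rather than the only one. `HashRing`, `stable_hash`, and the `vnodes` parameter are names introduced for illustration. SHA-256 is used because Python's built-in `hash()` for strings is randomized per process by default, so it would not give the same answer on every router instance.

```python
import bisect
import hashlib


def stable_hash(text):
    """64-bit hash that is identical across processes and machines."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    """Consistent-hash ring mapping keys (e.g. client IDs) to shard names."""

    def __init__(self, shards, vnodes=64):
        # Each shard gets several points on the ring to even out the spread.
        self._ring = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key):
        # Walk clockwise to the first point after the key's hash, wrapping around.
        index = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]
```

The design choice here is about what happens when a shard is added: with a ring, a key either stays where it was or moves to the new shard, whereas a plain `hash % number_of_shards` scheme would remap most keys at once.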


Elasticity

Handling elasticity in a sharded deployment involves adapting the system's capacity to varying workloads by dynamically scaling the number of sharded clusters or the number of nodes in each cluster. Elasticity ensures that the system can efficiently handle increased demand and scale down during periods of lower demand. Here are strategies for handling elasticity in a sharded deployment:

  1. Auto-Scaling:

    Implement auto-scaling mechanisms that can dynamically adjust the number of shards or nodes based on predefined metrics such as CPU usage, memory consumption, or request throughput. Cloud providers often offer auto-scaling features that can be configured to meet specific requirements.

  2. Horizontal Scaling:

    Embrace horizontal scaling by adding more clusters to the system. This approach allows the system to handle increased load by distributing the workload across a larger number of resources.

  3. Elastic Load Balancing:

    Use elastic load balancers that can distribute incoming traffic evenly across multiple sharded clusters. This ensures that the load is balanced efficiently, and new clusters can be seamlessly added to the system.

  4. Predictive Scaling:

    Implement predictive scaling based on historical data and workload patterns. Machine learning algorithms or rule-based systems can predict future demand and adjust the number of shards accordingly.

  5. Cost Monitoring and Optimization:

    Continuously monitor and optimize costs associated with elasticity. Scaling up and down can impact infrastructure costs, so it's important to find the right balance between performance and cost efficiency.

  6. Automated Provisioning:

    Use automated provisioning tools to deploy and configure new clusters. Infrastructure as Code (IaC) tools can automate the process of creating and configuring resources, streamlining the scaling process.

  7. Dynamic Configuration Management:

    Implement dynamic configuration management to adjust system settings, such as connection pools, cache sizes, and thread pools, in real-time based on the current workload and capacity.

  8. Monitoring and Alerts:

    Set up comprehensive monitoring and alerting systems to detect changes in system behavior and trigger scaling actions. Proactive monitoring helps prevent performance issues and ensures that the system can adapt to changes in demand.

Elasticity in a sharded deployment requires a combination of architectural considerations, automation, and proactive monitoring. It's important to continuously evaluate and optimize the system's ability to scale based on evolving requirements and workloads.
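
An auto-scaling decision can be as simple as a proportional rule: size the cluster so that the observed metric returns to its target. The function below is a sketch under assumptions; the name `desired_nodes`, the default bounds, and the tolerance band are illustrative values, not recommendations.

```python
import math


def desired_nodes(current, metric, target, min_nodes=3, max_nodes=12, tolerance=0.1):
    """Proportional scaling rule: node count that brings `metric` back to `target`."""
    ratio = metric / target
    # Inside the tolerance band, keep the current size to avoid flapping.
    if abs(ratio - 1.0) <= tolerance:
        return current
    wanted = math.ceil(current * ratio)
    # Never go below the minimum or above the maximum cluster size.
    return max(min_nodes, min(max_nodes, wanted))
```

For example, with 4 nodes at 90% CPU and a 60% target, the rule asks for 6 nodes; at 63% it leaves the cluster alone, and at 15% it scales down only as far as `min_nodes`.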

Data Migration

Data migration in a sharded deployment involves moving or redistributing data across clusters. This process may be necessary when scaling the system, adding or removing shards, or reorganizing data to optimize performance. Handling data migration in a sharded environment requires careful planning and execution to minimize downtime, ensure data consistency, and avoid performance degradation. Here's a guide on how to handle data migration in a sharded deployment:

  1. Sharding Strategy Adjustment:

    If the goal of data migration is to optimize performance or accommodate changing requirements, consider adjusting the sharding strategy. This may involve redefining shard boundaries, introducing new shards, or merging existing ones.

  2. Automated Migration Scripts:

    Develop automated scripts or tools to handle the migration process. Automated scripts can ensure consistency and reduce the risk of human error during data transfer.

  3. Monitor and Validate:

    Implement monitoring during the migration process to track progress, detect anomalies, and ensure that the system continues to operate within acceptable performance parameters. Validate the data on both the source and destination shards to confirm the accuracy of the migration.

Data migration in a sharded environment is a complex task that requires thorough planning, testing, and coordination. By following best practices and leveraging automation, organizations can ensure a smooth and efficient data migration process while minimizing disruptions to the overall system.
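
The "automate, then validate" steps above can be sketched as follows. Shards are modeled as plain dictionaries and `migrate` and `checksum` are hypothetical helpers; a real migration would read from and write to the actual shard stores and would have to deal with writes that arrive while the copy is in progress.

```python
import hashlib
import json


def checksum(records):
    """Order-independent digest of a dict of JSON-serializable records."""
    payload = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def migrate(source, destination, keys, batch_size=100):
    """Copy `keys` in batches, validate the copy, then delete from the source."""
    keys = list(keys)
    # Copy in batches so each step does a bounded amount of work.
    for start in range(0, len(keys), batch_size):
        for key in keys[start:start + batch_size]:
            destination[key] = source[key]
    # Validate both sides before anything is removed from the source.
    moved_src = {key: source[key] for key in keys}
    moved_dst = {key: destination[key] for key in keys}
    if checksum(moved_src) != checksum(moved_dst):
        raise RuntimeError("validation failed; source left untouched")
    for key in keys:
        del source[key]
    return len(keys)
```

Deleting from the source only after the checksums match means a failed run can simply be retried, at the cost of temporarily storing the migrated records twice.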

To Wrap Up

In conclusion, MQTT sharding represents a powerful approach for scaling and optimizing communication in large-scale IoT deployments. By distributing data across multiple shards, organizations can achieve enhanced performance, improved fault tolerance, and the ability to handle massive workloads. However, navigating the world of MQTT sharding comes with both opportunities and challenges.

The strengths of MQTT sharding lie in its ability to provide scalability, reliability, and efficient resource utilization. Sharded clusters empower organizations to cater to the growing demands of IoT applications, where a multitude of devices seeks real-time, bi-directional communication. With features like load balancing, fault tolerance, and elastic scaling, MQTT sharding is well-suited for dynamic and resource-intensive IoT ecosystems.

Yet, as with any powerful tool, there are limitations to MQTT sharding. Challenges in maintaining message order, handling cross-shard communication, and ensuring consistency across shards require careful consideration and strategic planning. The complexity introduced by sharding demands expertise in development, maintenance, and operational aspects, making it imperative for organizations to weigh the benefits against the potential drawbacks.

As the IoT landscape continues to evolve, MQTT sharding stands out as a key solution for addressing the scalability needs of modern applications. By understanding the intricacies of sharding and tailoring its implementation to specific use cases, organizations can unlock the full potential of MQTT in building resilient and responsive communication infrastructures.

In the realm of MQTT sharding, success hinges on a nuanced understanding of the application requirements, thoughtful architectural design, and continuous monitoring and optimization. As organizations embark on the journey of sharded MQTT clusters, they find themselves at the forefront of innovation, ready to harness the full power of scalable and efficient communication in the dynamic world of IoT.

Anthony Olazabal

Anthony is part of the Solutions Engineering team at HiveMQ. He is a technology enthusiast with many years of experience working on infrastructure and development around Azure cloud architectures. His expertise extends to development and cloud technologies, with a keen interest in IaaS, PaaS, and SaaS services and in writing about MQTT and IoT.
