Evolution of Software Testing at HiveMQ
Written by Michael Walter
Published: April 14, 2023
In this blog post, I’ll detail what we improved over the years, the challenges we experienced, and the unique solutions we found while continuously improving the software testing process. If you are a software tester, I hope you can learn from our mistakes, successes, and software testing improvement methodology.
When I started working for HiveMQ (dc-square back then) in February 2017, software testing was already an established process in the company. Unit and integration tests, manual testing, and automated rolling upgrade tests for the HiveMQ broker all existed already. Still, as a software tester, it was my job to improve the testing process where I could.
Testing of the HiveMQ broker
As mentioned, the developers already had thousands of unit and integration tests (UT/IT) running on our CI (continuous integration). But there is a problem with integration tests: they do not test the end product a customer will interact with. A prime example: the zip that customers receive contains an obfuscated jar to run the broker, whereas in an IT the broker is run via a custom Java class where nothing is obfuscated. So while everything works fine in the integration tests, the same use case can fail when the customer runs the obfuscated broker because a Java class was wrongly obfuscated.
This, and the fact that we wanted to automate as much of the ‘acceptance test’ phase as possible, led us to create what we call the hivemq-testsuite (based on JUnit 5). It allows us to add system tests and run them automatically, either locally or via the CI. A test in the testsuite gets a hivemq.zip plus configuration files as input, and based on that, the HiveMQ cluster is built. We run the test against this cluster and clean up the setup after the test. Such a test currently takes about 10 seconds.
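To make the shape of such a test concrete, here is a rough sketch. `ClusterSetup` and its builder methods are invented stand-ins for HiveMQ's internal, non-public testsuite API; only the JUnit 5 annotations and the overall flow described above are real.

```java
import java.nio.file.Path;

import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

// Hypothetical sketch of a hivemq-testsuite system test.
// ClusterSetup is an invented name for the internal testsuite API.
class PublishSubscribeSystemTest {

    private final ClusterSetup cluster = ClusterSetup.builder()
            .hivemqZip(Path.of("hivemq.zip"))    // the distribution zip under test
            .configXml(Path.of("config.xml"))    // broker configuration
            .logbackXml(Path.of("logback.xml"))  // logging configuration
            .nodeCount(1)
            .build();

    @BeforeEach
    void setUp() {
        cluster.start(); // the HiveMQ cluster is built from the inputs above
    }

    @Test
    void publishedMessage_isReceivedBySubscriber() {
        // run the actual MQTT use case against the running cluster
    }

    @AfterEach
    void tearDown() {
        cluster.stop(); // clean up the setup after the test
    }
}
```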
Over roughly five years, almost 2,700 system tests were added that automatically test the functionality of the HiveMQ broker.
Many changes, be it dependency updates or code refactoring, could be tested just by running these tests. This saved, and still saves, a lot of hours of manual work that we can now use for non-trivial testing.
If a test fails, all information about the test setup is kept. This eases reporting: just zip the information and attach it to the ticket, which in turn helps when debugging the issue.
Hurdles along the way
The refined process I just presented didn’t materialize all at once; not everything went smoothly along the way. Below, I will share some of the biggest issues we faced while maintaining an ever-growing testsuite, along with our solutions.
One of the early issues was the time cost of the test setup. We are talking about one minute just to set up a one-node cluster.
Initially, we used Docker exclusively to create the test environment, building the complete container image for each new test. It was a total time waster. To alleviate this, we introduced a base image containing the reusable components, which cut the time to create a one-node cluster from one minute to around 14 seconds.
Another improvement: instead of running all tests in Docker, we added the ability to create the setup locally, reducing the setup duration even further to around 7 seconds. This is where the roughly 10 seconds per test mentioned above comes from.
This worked well until there were simply too many tests. Even if all tests were single-node tests, just creating the test environments would still take 315 minutes (2,700 × 7 s = 18,900 s), and that is the best-case scenario.
So even with all these improvements, our CI took around 90 minutes to execute all tests across 16 agents, which is quite long. The simple solution would have been to scale up the agent count, say to 32, to cut the build time in half (45 minutes). Instead, I will describe how we achieved a similar duration by revisiting the testsuite.
First, we decided to review all existing tests and merge them where feasible. Around 1,000 tests were removed in this process without losing any coverage. You might also find a few unnecessary duplicate tests covering the same use case, as can happen when different people add tests.
To give an idea of the kinds of tests we merged: we had separate tests using MQTT 3 and MQTT 5 clients for many features. So, why not test both versions in one test?
Another big factor was the parameterization of tests. For example, we ran the same test with nine different combinations of Quality of Service (QoS) levels for the subscriber and publisher clients. In many cases, this could be reduced: parameterize only the subscriber QoS and publish at all possible QoS levels within each test. This reduces the number of tests from nine to three while keeping the quality of the test.
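The counting behind this reduction can be sketched in plain Java. The class and method names here are illustrative only, not HiveMQ test code; the point is that three tests still cover all nine subscriber/publisher QoS pairs.

```java
import java.util.HashSet;
import java.util.Set;

public class QosParameterization {

    static final int[] QOS_LEVELS = {0, 1, 2};

    // Old scheme: one parameterized test per (subscriber QoS, publisher QoS)
    // combination, i.e. 3 x 3 = 9 test executions.
    static int fullMatrixCount() {
        int count = 0;
        for (int sub : QOS_LEVELS)
            for (int pub : QOS_LEVELS)
                count++;
        return count;
    }

    // New scheme: parameterize only the subscriber QoS; inside each test,
    // publish once at every QoS level. Only 3 test executions remain.
    static int reducedTestCount() {
        return QOS_LEVELS.length;
    }

    // The reduced scheme still exercises every (subscriber, publisher) pair.
    static int reducedPairCoverage() {
        Set<String> pairs = new HashSet<>();
        for (int sub : QOS_LEVELS)      // one test per subscriber QoS ...
            for (int pub : QOS_LEVELS)  // ... publishing at every QoS level
                pairs.add(sub + "/" + pub);
        return pairs.size();
    }

    public static void main(String[] args) {
        System.out.println("old: " + fullMatrixCount() + " tests");
        System.out.println("new: " + reducedTestCount() + " tests, covering "
                + reducedPairCoverage() + " QoS pairs");
    }
}
```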
While merging tests, we came up with a second idea: instead of continuing to merge tests, let all tests in the same test class share the same HiveMQ test setup. For this, we went on a two-month journey reworking our tool that creates the test environment, so it could build a setup either per test or per test class, based on JUnit’s lifecycle feature. Next, we had to split and rework the test classes so that only test cases that can use the same environment are grouped in one class.
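The JUnit 5 side of this can be sketched as follows. `ClusterSetup` is again a hypothetical stand-in for the internal testsuite API; the real mechanism is JUnit's per-class lifecycle hooks, `@BeforeAll` and `@AfterAll`, which run once for the whole class instead of once per test.

```java
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.Test;

// Sketch only: ClusterSetup is an invented name for the internal testsuite API.
class SharedSetupSystemTest {

    private static ClusterSetup cluster;

    @BeforeAll
    static void startCluster() {
        // pay the ~7s setup cost once for the whole test class ...
        cluster = ClusterSetup.singleNode("hivemq.zip");
        cluster.start();
    }

    @Test
    void retainedMessage_isDelivered() { /* runs against the shared cluster */ }

    @Test
    void subscription_survivesReconnect() { /* same shared cluster */ }

    @AfterAll
    static void stopCluster() {
        cluster.stop(); // ... and clean up once at the end
    }
}
```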
The result was quite astounding: the impact of creating and deleting the test environment per test can be clearly seen.
Despite improving the tests, we only saw a real improvement in the total build time (the sum of the durations of all agents), but not in the individual build time.
Here is our last tip: if you also distribute your test classes across multiple agents, take a look at the execution times of your test classes. The build time is only as fast as the slowest agent. If, for example, tests are split across test classes and one test class alone needs 60 minutes, you will never get a faster build time than that. The solution is quite simple: break up those classes into smaller ones until the load is distributed equally across all agents.
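This effect is easy to demonstrate with a small, self-contained simulation (not HiveMQ code). It distributes test-class durations across agents using a simple greedy strategy and reports the build time, i.e. the load of the slowest agent: one monolithic 60-minute class pins the build at 60 minutes no matter how the rest is distributed, while splitting it lets four agents finish the same total work in 30 minutes.

```java
import java.util.Arrays;
import java.util.PriorityQueue;

public class AgentBalancing {

    // Greedy "longest class first" assignment: always hand the next-longest
    // test class to the currently least-loaded agent, then return the build
    // time, which equals the load of the slowest agent.
    static int buildTimeMinutes(int[] classDurations, int agents) {
        int[] sorted = classDurations.clone();
        Arrays.sort(sorted);
        PriorityQueue<Integer> load = new PriorityQueue<>(); // min-heap of agent loads
        for (int i = 0; i < agents; i++) load.add(0);
        for (int i = sorted.length - 1; i >= 0; i--) {       // longest first
            load.add(load.poll() + sorted[i]);
        }
        int max = 0;
        for (int l : load) max = Math.max(max, l);
        return max;
    }

    public static void main(String[] args) {
        // One 60-minute class dominates: 4 agents can never beat 60 minutes.
        int[] withMonolith = {60, 10, 10, 10, 10, 10, 10};
        // Same total work, but the 60-minute class split into 10-minute classes.
        int[] split = {10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10};

        System.out.println("monolith: " + buildTimeMinutes(withMonolith, 4)); // 60
        System.out.println("split:    " + buildTimeMinutes(split, 4));        // 30
    }
}
```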
Summing up this section
- revisit tests and, where feasible:
  - remove or merge tests
  - if parameterization is used, check whether it can be reduced
- use one test setup for multiple tests
- make sure tests are distributed equally across the agents executing them
Currently, with all the above improvements to the testsuite, we have reduced the build time from the original 90 minutes to around 50 minutes without adding new agents, and all this without having completed the per-class rework yet.
Testing of the Enterprise Extensions
With our first enterprise extension, the Kafka extension, in development, we saw the need for system tests for extensions similar to the broker tests. The difference from the broker tests is that here we need, in addition to the broker, the extension itself and the third-party service against which we test the extension. The solution was simple: we could already build extension zips programmatically to test the functionality of the broker’s extension SDKs via the testsuite. We just needed to add the ability to also build extensions from an existing extension zip file. The only other requirement was to have the appropriate third-party service available to test the extensions (for the Kafka extension, we need a running Kafka). As we already use Docker, the solution was Testcontainers, as they provide ready-made modules for Kafka and many other services (note: HiveMQ also provides its own module).
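For readers unfamiliar with Testcontainers, spinning up the third-party service looks roughly like this. `KafkaContainer` and `getBootstrapServers()` are the real Testcontainers Kafka module API; the image tag and the test body are illustrative assumptions, and running it requires a local Docker daemon.

```java
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.utility.DockerImageName;

// Sketch: a disposable Kafka broker for an extension system test.
class KafkaExtensionTestSketch {

    void runAgainstKafka() {
        try (KafkaContainer kafka =
                 new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.4.0"))) {
            kafka.start();
            // point the HiveMQ Kafka extension at the containerized broker
            String bootstrapServers = kafka.getBootstrapServers();
            // ... run MQTT-to-Kafka forwarding assertions here ...
        } // the container is stopped and removed automatically
    }
}
```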
This worked really well for a long time, as for each new enterprise extension we found a Testcontainer module that let us test all features of the enterprise extension against the respective service. Only recently, with the release of the AWS Kinesis extension, did we run into problems where the respective module didn’t allow us to test all features against it. This issue appeared specifically when testing more than 2-3 shards, which is an integral feature of AWS Kinesis.
Rolling upgrade tests
Rolling upgrade tests are critical, as they ensure customers can upgrade seamlessly from their current HiveMQ version to the next one. Therefore, it was a problem that our old framework for upgrade tests was not easy to maintain and took unnecessarily long to execute. The old framework used many scripts to create the setup and execute the tests. As not everyone is fluent in scripting languages, almost no one added new tests to it. The test environment was built via Terraform, which created virtual machines for each test.
This resulted in around 90-minute build times for executing just 12 tests. Therefore, we decided to migrate all the tests to a new framework based on Java, as every developer in the company knows the language and can therefore add tests.
We also decided to create the setup using Docker, which has less overhead than creating virtual machines. Attentive readers will now think, “those two criteria are already covered by the testsuite.” This is correct, and we decided to use that framework.
We only needed to add a feature that allowed the testsuite to select two different HiveMQ versions: the “old version” to start from and the “new version” to upgrade to.
After migrating all 12 tests to the new framework, they completed in less than 10 minutes, so we successfully reduced our testing time. What about our second criterion, easily adding tests?
A whopping 138 new rolling upgrade tests have been added since the migration. Even so, we still only have a build time of around 37 minutes, thanks to the new framework being able to distribute the tests across multiple agents (in this case, four).
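The shape of such an upgrade test might look roughly like the sketch below. All names (`ClusterSetup`, `upgradeNode`) and the version strings are invented; the real part is the idea described above: start a cluster on the old version, upgrade one node at a time, and verify that nothing breaks along the way.

```java
// Hypothetical sketch of a rolling upgrade test in the reworked testsuite.
class RollingUpgradeTestSketch {

    void rollingUpgrade_keepsClientsConnected() {
        ClusterSetup cluster = ClusterSetup.builder()
                .oldVersion("4.9.0")    // illustrative: version the cluster starts on
                .newVersion("4.10.0")   // illustrative: version each node upgrades to
                .nodeCount(3)
                .build();
        cluster.start();

        // keep an MQTT session with subscriptions open throughout the upgrade

        for (int node = 0; node < 3; node++) {
            cluster.upgradeNode(node); // stop the node, swap the zip, restart
            // assert: cluster healthy, session intact, messages still flowing
        }

        cluster.stop();
    }
}
```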
Finally, I want to show you the test regime we can now run daily and fully automated, thanks to the improvements and migrations described above.
As can be seen, we test all our currently supported broker and extension versions and make sure rolling upgrades don’t break deployments. Just thinking about running all those upgrade tests with our old framework sends shivers down my spine.
I would like to use the outro to thank my good buddy Abdullah (formerly a Software Tester, now a Software Engineer at HiveMQ), as without him, tackling these big initiatives while still doing all the daily work would have been near impossible.