Machine Learning in IIoT: Foundations for Using KDE for Predictive Maintenance and OEE

by Bill Sommers
20 min read

Manufacturing process control relies on a loop of data collection, analysis, feedback, and corrective control. Data quality is essential for proper operations, and it is influenced and managed at many points in the manufacturing process: in planning and architecture, in process control during actual production, in data collection, and in post-manufacturing analyses. In Part 1 of this series, we looked at using the K-means method to find clusters within data in an unsupervised manner.

In this second article, we’ll explore how a relatively simple yet powerful machine learning (ML) technique called Kernel Density Estimation can be used to rapidly ingest and analyze data to gain powerful insights into the operation of your devices, machinery, and sensors in IoT and IIoT applications. 

By applying this technique, organizations can not only identify patterns in equipment performance but also predict potential failures, enabling effective predictive maintenance strategies. Furthermore, these insights can help optimize Overall Equipment Effectiveness (OEE) by reducing unplanned downtime, improving production efficiency, and enhancing the overall operational performance of industrial systems.

There is a fair amount of background information that has to be covered to introduce Kernel Density Estimation and its automated use in machine learning. This article introduces the background concepts, intuitions, basic interpretation, and related use case details. With the base information in mind, the next article in the series will start applying these concepts and show an example of how KDE can be used in an automated fashion to analyze signal data and make decisions about collected data. 

In this article, we will walk you through how Kernel Density Estimation (KDE) is highly useful for IIoT applications due to its simplicity, scalability, and effectiveness in estimating the distribution of data for large datasets from a smaller sample. As IIoT systems generate massive amounts of data from sensors, machines, and other connected devices, KDE can play a critical role in deriving actionable insights.

What is Kernel Density Estimation (KDE)?

Kernel Density Estimation is an unsupervised machine learning technique whose goal is to estimate the distribution of data from a sample. The resulting kernel density estimate (KDE) is an estimate of the probability density function (PDF) of the underlying data, and it is closely related to the histogram of that data.

At the heart of KDE is understanding and calculating how data is distributed. From this calculation, we get a statistical representation of the data that we can use numerically in further applications. In this article series, we are setting up to use the KDE as a primary tool to measure and compare acquired signal data against well-known conditions. With these comparisons, we can assess how well the distribution of the data fits expectations, take action such as issuing warnings and alerts, and track the quality and performance of plant-floor equipment.

Benefits of KDE

  • Computationally efficient - KDE can work on relatively small samples provided that they are representative

  • Good estimation - KDE provides a useful statistical basis for estimating the underlying distribution

  • Relevant - given ideal or expected samples, KDE is good at showing the effect of noise and other data aberrations

Applications of Kernel Density Estimation (KDE) Analysis for IIoT

In this article and the series that follows, we touch briefly on a few common applications of KDE for:

  • Noise filtering

  • Anomaly detection

  • Prediction of future behavior for equipment

This makes KDE, like K-means, useful in the domains of:

  • Equipment Baselining

  • Overall Equipment Effectiveness (OEE) 

  • Predictive Maintenance (PM)

We continue with a brief review of the base scenario for this article series—the pizza dough manufacturing facility:

Predictive maintenance example for a fleet of pizza dough mixers at a fictional pizza dough manufacturer.

As in the previous article, the plant floor has mixers generating data, including current draw and power output. The PM/OEE, MES, and other control clients are interested in the performance of this equipment. (The diagram shows these clients consuming data from a plant broker.) These clients could also be configured to subscribe directly to HiveMQ Edge to do processing near the edge. In fact, given appropriate hardware at the edge, the processing could be co-located on the same hardware or virtual machine as HiveMQ Edge, reducing network traffic and increasing locality of reference for the data. A benefit of edge processing is potentially shorter cycle times and improved detection response time.

Some Math and Statistics

At HiveMQ, we use MQTT as the lingua franca for machine-to-machine, machine-to-human, and human-to-machine communications. Using HiveMQ Edge as depicted above, we take signals and data from multiple, disparate devices, machines, and sensors and translate them into common, machine- and human-readable MQTT messages organized into well-ordered topic trees. This harnesses the power of your organization’s data as a uniform, unified basis for making intelligent, near-real-time decisions, from the operational technology (OT) layer where the data is produced all the way through the information technology (IT) layers of your organization.
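As a concrete illustration, here is a minimal sketch of a client subscribing to mixer current readings over MQTT. The broker host, topic layout, and JSON payload shape are assumptions made for this example (using the paho-mqtt 2.x client library), not values prescribed by HiveMQ or by this article.

import json
import paho.mqtt.client as mqtt

BROKER_HOST = "hivemq-edge.local"  # hypothetical HiveMQ Edge host on the plant network
TOPIC = "plant/dough-line/mixers/+/motor/current"  # illustrative topic tree

def on_message(client, userdata, msg):
    # Assumed payload shape, e.g. {"amps": 28.4, "ts": "2025-01-01T00:00:00Z"}
    reading = json.loads(msg.payload)
    print(f"{msg.topic} -> {reading['amps']} A")

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
client.on_message = on_message
client.connect(BROKER_HOST, 1883)
client.subscribe(TOPIC)
client.loop_forever()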

Having harnessed the power of that machine data, let’s look at a few data samples and a few statistical charts to better understand what KDE is and how it works. First, we have the ideal or pure sample of motor current draw in an AC motor for these pizza dough mixers:

Ideal or pure sample of motor current draw in an AC motor for these pizza dough mixers.

As with all real-world processes, noise is unavoidable. In this next picture, we see the overlay of noise on the ideal signal:

Motor current signal sample: the overlay of noise on the ideal signal.

We would like to understand the role of noise and other data problems and how they affect the measured sample. Doing so provides a good quantitative basis for computer algorithms to measure, detect, and ultimately respond to departures from normal.

Here we see the noise component broken out:

Motor current signal sample: noise.

Finally, when the actual readings and samples are taken from the dough mixers, the sampled data looks like this:

Sampled data of actual readings from the dough mixers.

Clearly, this is very distorted from the pure or ideal sample. Absolute thresholding, such as flagging current draw above 30 A or below -30 A, is easy to implement. But how do we know if something is wrong, such as a stuck dough hook, a failing motor, or a potential safety issue that is happening or about to happen? To detect that, we need to see more samples of bad data.

Let’s look at a signal with high noise and signal attenuation due to a brown-out or other current drops:

High noise and signal attenuation due to brown-out or other current drops.

And a sample with missing data:

Motor current signal (missing data with noise)
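For readers who want to experiment, here is a minimal sketch of how signals like these might be simulated. The sampling rate, amplitude, noise level, and fault windows are illustrative assumptions, not parameters taken from the mixers described above.

import numpy as np

rng = np.random.default_rng(42)

fs = 1000                       # assumed sampling rate in Hz
t = np.arange(0, 1.0, 1 / fs)   # one second of samples
amplitude = 25.0                # assumed nominal peak current in amps

ideal = amplitude * np.sin(2 * np.pi * 60 * t)      # ideal 60 Hz motor current
noisy = ideal + rng.normal(0, 5.0, size=t.shape)    # ideal signal plus Gaussian noise

# Brown-out: attenuate a stretch of the signal; missing data: blank out a window.
brownout = noisy.copy()
brownout[300:600] *= 0.4
missing = noisy.copy()
missing[700:800] = np.nan

# A fixed threshold only catches gross excursions, not these subtler faults.
print("Samples beyond +/-30 A:", int(np.sum(np.abs(noisy) > 30)))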

Histograms

We can see that these situations could be very challenging to detect with simple thresholds and range-based alarms. What we need is to view the signal differently. Enter the histogram, which gives us the familiar “bell curve” style of viewing data in terms of how frequently values fall into low, medium, and high ranges:

Ideal Signal and Corresponding Histogram:

Now, let’s look at the noisy signal:

Sampled data of actual readings from the dough mixers.

And its histogram:

Ideal Signal and Corresponding Histogram (with noise).

Wow! That histogram is telling for the noisy signal. And, knowing what the ideal signal should be, we can easily and visually detect the difference. The good thing is that we now have a quantitative method, the histogram and its underlying counts, with which to detect a problem.
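As a quick sketch of what this looks like numerically, the following code buckets an ideal and a noisy signal into a small number of bins, the same way a plotting library would before drawing the bars. The signal parameters are the same illustrative assumptions used in the simulation sketch above.

import numpy as np

rng = np.random.default_rng(42)
t = np.arange(0, 1.0, 1 / 1000)
ideal = 25.0 * np.sin(2 * np.pi * 60 * t)
noisy = ideal + rng.normal(0, 5.0, size=t.shape)

# Bucket each signal into 10 bins across the expected current range.
bins = np.linspace(-40, 40, 11)
ideal_counts, _ = np.histogram(ideal, bins=bins)
noisy_counts, _ = np.histogram(noisy, bins=bins)

print("ideal:", ideal_counts)
print("noisy:", noisy_counts)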

Limitations of the Histogram

The histogram is great if we have a very large and representative data set. However, in actual scenarios, we need to work in near-real-time and with limited data, and with limited data the histogram of the distribution is far less telling. Additionally, the histogram discretizes the data into only a handful of buckets, often somewhere between 5 and 11, so it does not give us enough resolution. As a result, the kind of analysis we want may be skewed or may misrepresent the true signal behavior.

Kernel Density Estimation for Industrial Applications 

The kernel density estimator may look daunting:

\hat{f}_h(x) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - X_i}{h}\right)

Here, X_1, ..., X_n are the sample points, h is the bandwidth, and K is the kernel function. Viewed in pseudo code, or Python-like code, it is just a loop with a few multiplications and divisions:

# Evaluate the kernel density estimate f_hat at a single point x,
# given samples X[0..n-1], bandwidth h, and a kernel function K.
total = 0
for i in range(n):
    total += K((x - X[i]) / h)   # kernel weight for sample i

fhat = (1 / (n * h)) * total     # normalize by sample count and bandwidth

This code is simple, it optimizes very well on standard CPUs, and it is highly parallelizable. For larger datasets, it can be tuned to run well on dedicated hardware such as GPGPUs or other ML accelerators. All of these factors contribute to KDE’s computational efficiency, which means it runs well on a range of systems, even constrained edge hardware, and gives results quickly.
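In practice, you would rarely hand-roll the loop above. Here is a minimal sketch using SciPy’s gaussian_kde, which selects a bandwidth automatically (Scott’s rule by default) and evaluates the estimate with vectorized NumPy operations. The signal parameters remain the same illustrative assumptions used earlier.

import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
t = np.arange(0, 1.0, 1 / 1000)
noisy = 25.0 * np.sin(2 * np.pi * 60 * t) + rng.normal(0, 5.0, size=t.shape)

# Fit the KDE to the sampled current values and evaluate it on a fine grid.
kde = gaussian_kde(noisy)
grid = np.linspace(-40, 40, 200)
density = kde(grid)

# The estimated density should integrate to roughly 1 over the signal range.
print("approximate area under the KDE:", density.sum() * (grid[1] - grid[0]))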

We are now precisely at the point where methods like Kernel Density Estimation are helpful. We’ll get into more detail about how to use the KDE method in Part 3 of this article series. But, for now, let’s look at the kernel density estimate (KDE) and the histogram for an ideal signal:

Kernel density estimate (KDE) and histogram for an ideal signal.

And the KDE and histogram of the noisy signal:

KDE and histogram of the noisy signal.

From both examples above, we can see that the KDE provides a more fine-grained estimate of the probability distribution of the signal under study. This additional detail in the KDE is helpful when comparing and fitting multiple estimates. KDE’s higher resolution lets us look at more of the signal spectrum, thereby increasing the sensitivity of the tests we can perform. Consequently, we can make better predictions about signals. So equipped, we can use these insights to improve our understanding of connected equipment, manufacturing processes, and overall plant behavior.

Usage of Analysis Results for Optimizing Operations in IIoT

Great! Now we have a visual means to easily compare signals. It is easy to see the differences among the pure, ideal signal, noisy signals, and signals with missing data or other abnormalities. The KDE and histogram show these signal differences readily and plainly. From these comparisons, we can designate “good” samples and differentiate them from “bad” signals.

Now what we need is a programmatic, numerical way to differentiate between good and bad signals. We also need a measure of how close-to-good the signal is or how much the signal departs from the ideal.

In statistics and machine learning, this value is typically called a goodness-of-fit measure. Using this value, we can set thresholds for how far from “good” we are willing to accept. We use these thresholds to establish conditions such as “OK”, “degraded”, “invalid / error”, and more, as sketched below.
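Here is a minimal sketch of such a mapping. The threshold values are purely illustrative assumptions; in practice they would be tuned against known-good and known-bad reference samples for the specific equipment.

# Map a goodness-of-fit distance (larger = further from the ideal signal)
# to an operational status. The cutoffs below are hypothetical.
def classify_fit(distance: float) -> str:
    if distance < 0.05:
        return "OK"
    if distance < 0.15:
        return "degraded"
    return "invalid / error"

print(classify_fit(0.02))   # OK
print(classify_fit(0.30))   # invalid / error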

Statistical Tests of Fitness Using the K-S Test

Now, we’re ready to introduce the goodness-of-fit statistic. The Kolmogorov-Smirnov Goodness-of-Fit Test, more commonly known simply as the K-S Test, is perfect for the job here. The K-S Test is useful for comparing KDEs: it is a nonparametric test of the equality of continuous distributions (real signals) that makes minimal assumptions about the underlying data. It provides a solid, single-valued metric we’ll use for comparing two kernel density estimates.

The two main metrics from the K-S test are the:

  • K-S Statistic (called D) - the maximum distance between the two empirical cumulative distributions being compared

  • p-value - the measure of statistical significance. A p-value < 0.05 implies that the two measured samples are unlikely to come from the same distribution.

For this application, we are going to pay more attention to the K-S statistic (D) as a measure of the difference between the two measured signals (samples). We’ll relax the stronger p-value interpretation, but not ignore it. From these values we’ll make classifications, comparisons, and ultimately decisions about the quality of the signal data we’ve collected from our MQTT-enabled devices.
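As a concrete illustration, here is a minimal sketch using SciPy’s two-sample K-S test, applied directly to an ideal reference sample and two degraded samples (the signal parameters are the same illustrative assumptions used earlier). How the test is combined with KDEs in an automated pipeline is the subject of Part 3.

import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
t = np.arange(0, 1.0, 1 / 1000)
ideal = 25.0 * np.sin(2 * np.pi * 60 * t)
noisy = ideal + rng.normal(0, 5.0, size=t.shape)
attenuated = 0.4 * ideal + rng.normal(0, 5.0, size=t.shape)

# Compare each acquired sample against the ideal reference sample.
for label, sample in [("noisy", noisy), ("attenuated", attenuated)]:
    result = ks_2samp(ideal, sample)
    print(f"{label} vs ideal: D = {result.statistic:.3f}, p = {result.pvalue:.3g}")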

Conclusion

We now have a basic understanding of the KDE, what sample data distributions look like, and how the KDE compares to a histogram. It is easy enough to see the differences, in the visuals and charts, in how the data are distributed. With the KDE calculated, we can now use these results to perform programmatic analysis of various samples and set the stage for classification and action on the measurements and calculations. 

Stay tuned! In Part 3 of this series, we’ll explore applying the Kernel Density Estimation techniques and gaining insights from the results. We’ll go into more depth about the code and the automated use of the KDE.

Bill Sommers

Bill Sommers is a Technical Account Manager at HiveMQ, where he champions customer success by bridging technical expertise with IoT innovation. With a strong background in capacity planning, Kubernetes, cloud-native integration, and microservices, Bill brings extensive experience across diverse domains, including healthcare, financial services, academia, and the public sector. At HiveMQ, he guides customers in leveraging MQTT, HiveMQ, UNS, and Sparkplug to drive digital transformation and Industry 4.0 initiatives. A skilled advocate for customer needs, he ensures seamless technical support, fosters satisfaction, and contributes to the MQTT community through technical insights and code contributions.

  • Bill Sommers on LinkedIn