Predictive maintenance helps anticipate when maintenance should be performed on machinery. In many industries, this approach uses AI and machine learning techniques, which in turn need to be run effectively on specific devices.
Recently, Capgemini defined an interesting architecture that allows predictive maintenance AI models to be run on edge devices directly. This idea has several benefits, including faster detection of, and response to, anomalies and the ability to reduce interaction with the Cloud when needed.
The process also requires the implementation of a proper software architecture, in which technologies such as 5G might have a crucial impact (especially for high scalability).
This article provides an overview of predictive maintenance, with a special focus on how the AI solutions used for such tasks can be deployed on edge devices.
We also describe Capgemini’s approach to implementing predictive maintenance, which combines edge machine learning, cloud technologies and even 5G connectivity.
What is predictive maintenance?
With the increasingly widespread adoption of artificial intelligence and machine learning, predictive maintenance stands out as one of the most prominent examples of such data-driven solutions.
Given the broad diffusion of IoT sensors and AI, especially in smart manufacturing, predictive maintenance maximizes productivity and product quality while also reducing costs, since maintenance tasks are scheduled only when the data suggests they are needed.
From a technical point of view, predictive maintenance can be seen as a set of artificial intelligence techniques that make use of neural networks, deep learning and other machine learning solutions.
Such algorithms are generally fed with data gathered in the process of monitoring specific machinery, and are used to train models that can anticipate possible anomalies.
To better understand how predictive maintenance works, let’s see an example of its application.
Consider a robotic arm used on an industrial production line, and imagine that this arm is programmed to move an object from one position to another.
In this context, the manufacturer will likely want a set of sensors to measure the robotic arm’s performance and other parameters. For instance, the arm’s final positioning might be monitored by a camera and stored in a database.
At the same time, the arm’s joints can also be monitored – recording their temperature, for example.
If such data are collected continuously, along with the results of maintenance interventions, over a period of months they form a reliable dataset for training a predictive model. By observing the robotic arm’s performance in real time, together with the temperature data, this model can then anticipate when a specific maintenance intervention needs to take place.
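As a rough illustration of this training step, the following Python sketch fits a classifier on a hypothetical log of sensor readings labelled with maintenance outcomes. The file name, column names and model choice are assumptions made for the example, not part of any specific product.

```python
# Minimal sketch: training a failure-prediction model from sensor logs.
# The CSV file and its columns are hypothetical; any tabular dataset of
# sensor readings labelled with maintenance outcomes would work similarly.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Months of monitoring data: one row per observation window.
data = pd.read_csv("robot_arm_history.csv")
features = data[["joint_temperature", "positioning_error_mm", "cycle_time_s"]]
labels = data["needs_maintenance"]  # 1 if an intervention followed, else 0

X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")
```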
Preventive maintenance vs. predictive maintenance
The introduction of predictive models in such contexts is relatively recent. In the past, the traditional approach to machinery upkeep involved regular scheduling of maintenance. In parallel, any anomalies that arose were resolved as soon as possible and considered extraordinary events.
However, although these anomalous events may be extraordinary, they can have a broad impact on both the production chain and the company’s finances.
The above approach, often referred to as “preventive maintenance”, thus has two main drawbacks:
- It is difficult (if not impossible) to anticipate (and thus avoid) the occurrence of extraordinary maintenance events;
- Scheduled maintenance has a constant cost, which is incurred even when the machinery is performing normally.
As a general rule, the main benefits of predictive maintenance address both of the points above. By continuously monitoring machinery and using a predictive machine learning model on real-time data, it becomes possible to anticipate extraordinary events and avoid their negative impact on production.
Moreover, instead of scheduling maintenance regularly, predictive maintenance models enable such activities to be carried out only when actually needed, thus reducing costs.
Predictive maintenance on edge devices
Like any other machine learning model, the models used for predictive maintenance require a training process, in which data are used to optimize their internal parameters so that they work properly.
The training process is usually run on high-performance machines, with one or more GPUs capable of significantly speeding up this heavy workload.
Today, companies usually rely on cloud services, such as Microsoft Azure AI. Such services have some interesting advantages.
Training models is a task that needs to be run only rarely: once a model is trained, it usually doesn’t require retraining – or at least, such activity can be scheduled quite infrequently.
Consequently, instead of buying and maintaining a powerful machine that would sit idle most of the time, it is often more economical to rely on the pay-per-use offers provided by the aforementioned Azure AI services.
Once the models are trained and built (on the Cloud), they need to be deployed and put into use. The inference process (i.e., deriving predictions from the AI models by feeding them new data) is usually far less computationally demanding than training.
However, when AI models are deployed on the Cloud or a remote server, the data must be sent over the network to obtain inference results.
For predictive maintenance applications, deploying the models on a remote server can have drawbacks. For instance, network latency can be an issue in contexts where connectivity is slow or unstable, leading to slow response times.
This issue is even worse where connectivity cannot be provided at all. Finally, even when connectivity is present and reliable, data transfer is always a critical point, especially if the data contains sensitive information that needs to be properly secured.
To address such issues, in recent years, we have witnessed an increasing number of applications adopting a paradigm known as “edge machine learning”.
Here, the idea is to deploy the actual machine learning models locally, without the need to query an external cloud server. Applying this idea would allow the system to react more quickly to anomalies, with no latency due to possible network issues (which might be significant in some situations).
Moreover, the whole inference process can work without connectivity, allowing operators to activate the appropriate maintenance procedure whenever an anomaly is detected.
In terms of data security, this approach avoids data transfer and even allows sensitive data to be stored locally (a significant plus, if you think of all the limitations introduced by GDPR in Europe, to give just one example).
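As a minimal sketch of what on-device inference can look like, the example below runs a previously trained model locally with ONNX Runtime, a common choice for edge deployment. The model file, its single output and the input layout are assumptions for illustration, not details from the architecture described here.

```python
# Minimal sketch of local (edge) inference with ONNX Runtime. Once the
# trained model file has been copied to the device, no network access
# is required to obtain predictions.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("predictive_maintenance.onnx")  # hypothetical file
input_name = session.get_inputs()[0].name

# A single reading from the local sensors: temperature, position error, cycle time.
reading = np.array([[71.3, 0.42, 5.1]], dtype=np.float32)

# We assume the model exposes a single output holding the predicted label.
outputs = session.run(None, {input_name: reading})
print("Anomaly predicted:", bool(outputs[0][0]))
```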
Capgemini’s architecture for predictive maintenance on the edge
It should be clear by now that implementing effective predictive maintenance can significantly impact productivity and reduce costs in companies that use machinery within their production chain.
It is therefore no surprise that Capgemini decided to invest significant effort into the development of a specific software architecture that supports predictive maintenance applications.
Capgemini designed this architecture by implementing edge machine learning, exploiting the increasing capabilities of IoT devices, which can be deployed to run AI models directly next to the machinery.
Such devices collect data from sensors in real time, allowing continuous monitoring of operations. The data is then processed locally by predictive machine learning models, which can detect anomalies reactively, benefiting from being as close as possible to the data source.
As explained in the previous section, this approach also allows for operation in the absence of connectivity, which is a major plus in contexts where network availability might not be taken for granted.
When the machine learning models identify an anomaly, the edge device sends a notification to the monitored machinery to vary its operability accordingly (e.g. reducing speed, or activating an alternative operational state).
In parallel, the system sends a notification to a plant manager, thus opening a maintenance case and reactively managing it to solve the anomaly.
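To make this loop concrete, here is an illustrative Python sketch of the detect-and-notify cycle described above. The sensor read, the anomaly test and the notification steps are hypothetical stand-ins, not Capgemini’s actual interfaces.

```python
# Illustrative sketch of the edge monitoring loop: read sensors, detect
# anomalies locally, then adjust the machinery and alert the plant manager.
import random
import time

def read_sensors():
    """Stand-in for a real sensor read: joint temperature in Celsius."""
    return random.gauss(65.0, 5.0)

def is_anomalous(temperature, threshold=80.0):
    """Stand-in for the on-device ML model's prediction."""
    return temperature > threshold

def monitoring_loop(cycles=10, interval_s=1.0):
    for _ in range(cycles):
        temperature = read_sensors()
        if is_anomalous(temperature):
            print("Anomaly: asking the arm to reduce speed")  # vary operability
            print("Notifying the plant manager")              # open a maintenance case
        time.sleep(interval_s)

if __name__ == "__main__":
    monitoring_loop()
```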
Digital twins: monitoring machine learning performance
While reactivity is ensured by continuous execution on the edge devices, the Capgemini architecture is designed so that the data gathered from sensors is also sent to a cloud service, Microsoft Azure AI.
The rationale behind this step is that it allows the machine learning models’ performance to be monitored.
In other words, the Cloud is used to assess whether or not the predictions are actually reliable. If they are not, a new training process is started. The new models can be automatically deployed on edge devices.
To implement this sort of “meta-monitoring”, Capgemini relies on Microsoft Azure AI services to implement the “digital twin” paradigm: a digital replica of the monitored machine is defined directly on the Cloud.
This replica is fed with the same data, received continuously from the edge devices, that the machine learning models use locally. This enables a double check: if the predictions (made on the edge devices) do not coincide with the digital replica’s state (computed on the Cloud), the machine learning models can be considered less reliable.
If such reliability drops under a certain threshold, a re-training action is triggered to build a newly trained model which is then sent to the edge devices.
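As an illustration of this reliability check, the following Python sketch compares edge predictions against the corresponding digital twin states and triggers retraining below a threshold. The 0.9 threshold and the function names are assumptions made for the example, not values from Capgemini’s architecture.

```python
# Sketch of the "meta-monitoring" comparison: edge predictions vs. the
# states computed by the cloud-side digital twin.
def model_reliability(edge_predictions, twin_states):
    """Fraction of edge predictions that agree with the digital twin."""
    matches = sum(p == t for p, t in zip(edge_predictions, twin_states))
    return matches / len(edge_predictions)

def check_and_retrain(edge_predictions, twin_states, threshold=0.9):
    reliability = model_reliability(edge_predictions, twin_states)
    if reliability < threshold:
        # In the real architecture this would launch a cloud training job
        # and push the newly trained model to the edge devices.
        print(f"Reliability {reliability:.2f} < {threshold}: retraining triggered")
    else:
        print(f"Reliability {reliability:.2f}: model still trusted")

check_and_retrain([1, 0, 0, 1, 1], [1, 0, 1, 1, 1])
```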
The role of 5G
The double-sided nature of this architecture, which exploits both edge and cloud technologies, provides a higher level of reliability and scalability.
However, in cases where the number of edge devices is particularly high, network reliability and speed may become particularly relevant to overcoming congestion and the resulting slowness in anomaly response.
Consequently, the use of 5G is crucial in this context to support reliable and effective streaming of data to the servers.
In addition to the above, 5G increases the speed of communication, reducing latency dramatically – down to around 1 ms (roughly 20 times lower than 4G).
The integration of 5G can also happen gradually and in a hybrid manner, supporting both new and legacy technologies and ensuring backward compatibility.
This allows better partitioning of the available bandwidth and a smoother transition in terms of the cost of converting all the devices to 5G.
Needless to say, independently of how 5G is integrated within the network, local users can always connect both to the edge devices (with lower latency) and to the central cloud. There, data from edge devices located worldwide can be stored, or used to train more general AI models.
These two levels (edge and cloud) also allow the implementation of smart data segregation policies; for example, allowing local users to access only data from edge devices, but not that stored in the cloud.
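As a toy illustration of such a segregation policy, the sketch below grants a hypothetical local operator access to edge data only, while a plant manager can also read the cloud store. The roles and store names are invented for the example.

```python
# Toy sketch of a two-level data segregation policy: local users may read
# only data that originates from their own edge devices, not the global
# cloud store.
def can_access(user_role, data_location):
    policy = {
        "local_operator": {"edge"},          # edge-only visibility
        "plant_manager": {"edge", "cloud"},  # full visibility
    }
    return data_location in policy.get(user_role, set())

assert can_access("local_operator", "edge")
assert not can_access("local_operator", "cloud")
assert can_access("plant_manager", "cloud")
```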
Conclusions
Although implementing such solutions requires integrating complex systems in order to react effectively to anomalies and produce reliable predictions, the benefits of predictive maintenance are evident.
Capgemini’s architecture uses both edge machine learning and cloud solutions (digital twins) to provide a highly reliable predictive maintenance system.
Although such a system can provide reliable results even with low connectivity, the idea’s scalability benefits dramatically from 5G connectivity, especially when the number of edge devices significantly increases.