Artificial intelligence and machine learning are huge topics at the moment. So, when I saw the title of this talk at Codemotion Amsterdam 2019 I was intrigued. The talk was given by three developers from ING, who build components for the rest of the business. Read on to find out how they managed to create a solution that can efficiently serve ML models across multiple teams.
Effi Bennekers, Pierre Venter and Marcin Pakulnicki work as developers in ING’s Omnichannel and DevOps teams, providing tools and services for other teams to use. Many teams at ING now use machine learning (ML) models. But generally, the teams have worked in isolation from each other. Each team researches ML, locates a suitable model from an external source, trains it with their data and then uses it. But this is a bad idea because teams keep reinventing the wheel. So the challenge was to create a system that allowed teams to share their trained models.
Autonomy equals anarchy?
At ING, development teams have a lot of autonomy. This means they are free to choose what technology to use and to find their own solutions to problems. Over recent years, a trend has developed. Teams identify a problem that can be solved by machine learning. They do some research and choose a well-known ML library. They then source their own data and train a model. This model is then pushed to their repo and packaged into the source code.
So, what’s the problem, you might ask? Well firstly, ING teams have high churn, so the knowledge of how a model was built tends to leave with the people who built it; the next time the team faces an identical problem, they go through the whole process again from scratch. Secondly, because models are baked into production code, they can’t be updated when the data changes. Thirdly, there is little or no knowledge transfer between teams, which leads to wasted effort. Finally, all too often the models simply get lost and have to be recreated.
Scaling is sensible
The solution was to build a system that can serve multiple models from a central repository. These models can be version-controlled and reused by multiple teams, and they are accessible via APIs, as shown below.
Automatic classification of feedback
As an illustration, the speakers showed how to improve the process for classifying user feedback. This user feedback comes from multiple sources such as reviews, Twitter, direct communications, etc. It needs to be properly categorised so that the appropriate team can action it. The old-fashioned approach involved employing people to classify all the feedback. Feedback was captured in a spreadsheet and then manually assigned to one of 52 very specific categories. Needless to say, that process was not very quick and gave rise to an internal saying “Banking in the rearview mirror”.
The aim was to create a system that was able to classify the feedback much faster. In turn, this would allow the relevant team to identify the problem and code a new feature for production. The solution was clearly to develop a machine learning system that can do this classification automatically. But this service has to be available to every team across the company. So, they came up with “The Voice”. This is a web app that can be used to classify any feedback by clustering and labelling it.
The Voice effectively provides user feedback classification as a service, powered by ML. It receives input from users of the app, performs an ML/rules-based analysis, and then immediately passes the feedback to the responsible team.
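To make the classify-and-route idea concrete, here is a minimal sketch (not ING’s actual code) of what that flow looks like: feedback comes in from some channel, a model assigns it a category, and the feedback is forwarded to whichever team owns that category. The category names, team names and rule-based stand-in for the model are purely illustrative.

```python
from dataclasses import dataclass

# Hypothetical mapping from predicted category to the team that owns it.
CATEGORY_TO_TEAM = {
    "login_issue": "authentication-team",
    "payment_delay": "payments-team",
    "app_crash": "mobile-team",
}


@dataclass
class Feedback:
    source: str  # e.g. "twitter", "app_review"
    text: str


def classify(feedback: Feedback) -> str:
    """Stand-in for the ML/rules-based analysis; a real system would call a trained model."""
    text = feedback.text.lower()
    if "log in" in text or "login" in text:
        return "login_issue"
    if "payment" in text or "transfer" in text:
        return "payment_delay"
    return "app_crash"


def route(feedback: Feedback) -> str:
    """Classify a piece of feedback and hand it to the responsible team."""
    category = classify(feedback)
    team = CATEGORY_TO_TEAM.get(category, "triage-team")
    print(f"[{feedback.source}] {feedback.text!r} -> {category} -> {team}")
    return team


if __name__ == "__main__":
    route(Feedback("twitter", "The app keeps failing when I try to log in"))
    route(Feedback("app_review", "My payment took three days to arrive"))
```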
Data labelling and model training
As with all categorisation problems in ML, there are two distinct stages. Firstly, you need to label the data. Then you need to classify it.
The team started with unlabelled data, but because they knew the number of categories they could easily use KMeans clustering to label it. Having labelled the data, they could then train a classifier. Because of the relatively low sample count, they used the SGD (stochastic gradient descent) classifier. They then demonstrated how this model can be used to classify live feedback.
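The talk did not walk through the exact code, but the two-stage approach looks roughly like this with scikit-learn. The feedback strings, the vectorisation step and the number of clusters here are illustrative; the key point is clustering first to obtain labels, then training a classifier on those labels.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import SGDClassifier

feedback = [
    "cannot log in to the mobile app",
    "login keeps failing on the website",
    "my payment arrived two days late",
    "transfer to another bank was delayed",
    "the app crashes when I open my account overview",
    "app freezes on the overview screen",
]

# Stage 1: label the unlabelled feedback with k-means clustering.
# Knowing the number of categories up front lets us fix n_clusters.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(feedback)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Stage 2: train an SGD classifier on the now-labelled data.
clf = SGDClassifier(random_state=42).fit(X, labels)

# Classify new, live feedback with the trained model.
live = vectorizer.transform(["why is my payment still pending?"])
print(clf.predict(live))
```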
Containerising ML models
The next challenge was how to package a model like this so that it can be served via a suitable platform; the actual training of the model is out of scope here. The answer was to containerise the models with Docker. They also needed to be able to serve multiple versions of the same model (e.g. for A/B testing), which meant using gRPC to route requests to the right version. The initial version of the architecture used a combination of Docker, ZooKeeper, Kafka and a custom-built ML platform app.
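The gRPC and Docker plumbing is too involved to reproduce here, but the routing idea itself is simple: several versions of the same model live side by side, and each request is deterministically assigned to one version so that an A/B test is stable per user. This sketch is an assumption-laden illustration of that idea, not ING’s implementation; the registry, model names and traffic split are invented.

```python
import hashlib

# Hypothetical registry: model name -> version -> a callable that scores input.
MODEL_REGISTRY = {
    "feedback-classifier": {
        "v1": lambda text: "payment_delay",
        "v2": lambda text: "login_issue",
    }
}

# Fraction of traffic sent to the newer version during an A/B test (illustrative).
V2_TRAFFIC_SHARE = 0.10


def pick_version(user_id: str) -> str:
    """Hash the user id so the same user always hits the same model version."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "v2" if bucket < V2_TRAFFIC_SHARE * 100 else "v1"


def predict(model_name: str, user_id: str, text: str) -> str:
    """Route a request to one version of the model and return its prediction."""
    version = pick_version(user_id)
    model = MODEL_REGISTRY[model_name][version]
    print(f"user {user_id} routed to {model_name}:{version}")
    return model(text)


if __name__ == "__main__":
    print(predict("feedback-classifier", "user-42", "my transfer is late"))
```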
The initial architecture worked well. But because ING is a bank, there are regulatory compliance issues that need to be addressed. They also had to be able to scale and prevent the system from becoming a single point of failure. The solution was to containerise the whole system, packaging just the relevant models into each container. The resulting architecture is shown below.
The final set of challenges was around how to make the system handle data frames (the data abstraction used by most ML libraries) and how to streamline the build/release process. The resulting solution uses GitLab Runner to coordinate a pipeline with three phases.
Conclusions
Building a system that is capable of serving ML models in a scalable manner is hard. Doing so in a tightly regulated industry like banking is even harder. But the benefits are clear. The end result is a system that can serve multiple ML models and versions scalably, securely and efficiently. By containerising the models within a parent container, the team created a system that is robust, easy to deploy and doesn’t suffer from the curse of centralisation.