Thiago de Faria, Solution Engineer at Linkit, strongly believes that the entire data science (ML, AI) pipeline should run in the cloud. Working locally is only one of the cultural limitations of a development world torn between computer science degrees and AI Master’s/PhD programmes that never have to deliver.
Why don’t we create an AI devops cloud-native app on K8S using blockchain backed by IoT devices that the user will experience with a VR headset? A truly disruptive digital transformation!
That is the marketing way of saying absolutely nothing in ICT slang.
All of this has to be avoided if the goal is to be a state-of-the-art developer in machine learning and artificial intelligence.
“This is the 50th and last time this speech is to be given,” starts the solution engineer at Linkit and editor at ITNEXT. It has been a valuable lesson in how to approach machine learning programming correctly.
“Thiago wakes up every day with one goal: develop happy high-performing teams to decrease time-to-market and build production-ready applications, always!”, states his LinkedIn profile. The brilliant Brazilian is full of surprises. “I must say that not having a formal computer engineer formation was quite nice because I could focus on the real things. I worked over theoretical math, differential geometry and the like”, recalls the AI engineer. “My master studies were later on AI, but from a very theoretical standpoint, on neural networks executing support vector machine algorithms”.
From theory to development, and then to engineering. “I understood that the problem was not the algorithm, not the calculation, but having the correct data, at the correct time, in the correct data store: that’s why I moved to the backend”. Needless to say, he’s a very big advocate of open source tools.
Thiago playing guitar at Codemotion Rome, 2019. His fingers move fast enough to blur the instrument. How fast can he code?
Culture is everything
A path made of several steps matters, because the biggest problem is always culture: the ability to picture the real world from different points of view.
AI engineers usually come from Master’s/PhD programmes, where models are built for publication. For them, it’s OK to have something of lower quality, even something that fails. This is their cultural problem: they don’t need to deliver and they don’t compete in the real world. They have a single, non-competitive point of view.
Fighting the downward spiral
There is a timeless conflict between designers, developers and sellers. The fight between designers and developers seems simple. “The answer is this model”, states a designer, showing a new physical device he envisioned. “The solution is in my code!”, counterattacks the developer. This endless fight is settled by the commercial side: they listen to both and, when they meet the client, they listen to him, maybe take notes, then say something ending with “we have this beautiful PowerPoint!”.
Thiago’s definition of ML:
Make machines find patterns
without explicitly programming them to do so.
Thiago’s definition of AI:
Making computers capable of doing things
that when done by a human,
would be thought to require intelligence.
Local DS is dead. Don’t use that laptop!
The conflict also arises in ML/AI/DS projects. ML development is still in its infancy, which makes the problem even bigger than in conventional software development. It’s hard to track ML experiments and identify the mix of parameters, data and code that led to a good model. Models can’t easily be moved between tools, and each platform supports its own set of algorithms. You can’t easily reproduce results, deploy a model or standardise the process. Observing software is hard; observing ML is harder, and harder still in production.
Clearly there is a need for a development platform built for ML, and in Thiago’s view it can only live in the cloud.
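To make the tracking problem concrete, here is a minimal, tool-agnostic sketch in Python of what an experiment record needs to hold: the parameters, a fingerprint of the data, the code version and the resulting metric. It is not taken from the talk; the function names, the JSON-lines file and the `accuracy` metric are all assumptions chosen for illustration.

```python
import hashlib
import json
import time
from pathlib import Path


def fingerprint(data: bytes) -> str:
    """Short SHA-256 digest identifying the exact bytes used for training."""
    return hashlib.sha256(data).hexdigest()[:12]


def record_run(log_path: Path, params: dict, data: bytes,
               code_version: str, metrics: dict) -> dict:
    """Append one experiment record as a JSON line and return it."""
    record = {
        "timestamp": time.time(),
        "params": params,
        "data_fingerprint": fingerprint(data),
        "code_version": code_version,  # assumed to be e.g. a Git commit hash
        "metrics": metrics,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, sort_keys=True) + "\n")
    return record


def best_run(log_path: Path, metric: str) -> dict:
    """Return the recorded run with the highest value of `metric`."""
    with log_path.open(encoding="utf-8") as fh:
        runs = [json.loads(line) for line in fh if line.strip()]
    return max(runs, key=lambda run: run["metrics"][metric])
```

Calling `record_run` after every training attempt and `best_run` at the end recovers the parameters, data and code behind the best model. A file like this on one laptop is exactly what does not scale to a team, which is the gap a shared tracking platform is meant to fill.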
It’s official: local data science is dead.
MLflow is the answer
Thiago suggests writing AI projects on open-source platforms. This will make it easier to update the code, maintain the system and observe the executions.
In ML development you never know the true results of the tool you are using before testing it. This means the ML developer needs to try as many tools as possible for each step.
Hundreds of ML tools have been created to cover each phase of the ML lifecycle. Among them, Thiago strongly believes in MLflow, an open machine learning platform and framework by Databricks.
For versioning code and data, GitHub is his preferred tool. All other steps can be handled inside MLflow: evaluation, packaging, deployment and model serving.
MLflow brings openness to both the interface and the code. It is designed to work with any ML library, algorithm, deployment tool or language. It’s built around REST APIs and simple data formats that can be used from a variety of tools. You can easily add MLflow to your existing ML code and roll it out across your team. A model can be viewed as a Lambda function (in the serverless model), so you can run it immediately at a very small cost without having to take care of the hardware capacity it needs.
You can also share workflow steps across organisations. Even more options are coming.