In recent years, artificial intelligence has evolved at a remarkable pace, especially since a new kind of machine learning, or perhaps an evolution of it, emerged: deep learning.
Deep learning techniques renewed neural networks both by defining new algorithms (or rediscovering old ones, such as recurrent networks) and by exploiting the computational power of modern processors such as GPUs.
The first application fields of such deep networks were image, video and speech recognition, and text classification. In this sense, these networks, increasingly adopted over the last decade, did the same job as previous machine learning algorithms, only with better performance. Indeed, the performance was at times superhuman: deep networks recognising cancer better than experienced oncologists, or outperforming world chess champions.
In a word, all those applications of deep networks are related to “classification”: discriminating objects as belonging to a given cluster and separating them into different classes, such as “cancer tissue”, “non-cancer tissue”, and so on. But in recent times, deep networks have also been employed to perform the opposite operation: “to generate”. This is what Alberto Massidda’s talk at Codemotion Rome 2019 was all about.
Classification, or “discrimination” as it is sometimes termed, is a task that is useful in virtually any business area. In a sense, it imposes order on the disorder of data, turning raw data into information.
In machine learning, machines learn how to discriminate according to two essential paradigms:
- supervised learning
- unsupervised learning
In the first case, the algorithm needs a set of already-classified data on which to train, so as to be able (hopefully) to discriminate new, unclassified data with good precision and accuracy. Examples of supervised learning are interpolation and regression algorithms, classical neural networks, etc. This paradigm has been successfully applied to computer vision, natural language processing, time series forecasting, and more.
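To make the idea concrete, here is a minimal supervised-learning sketch in Python using scikit-learn (the library choice is ours, not something prescribed in the talk): the classifier is trained on labelled examples and then scored on data it has never seen.

```python
# Supervised learning: train on labelled data, evaluate on unseen data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Synthetic labelled dataset: every sample X[i] comes with a class y[i].
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression()
clf.fit(X_train, y_train)          # learn from classified data
print(clf.score(X_test, y_test))   # accuracy on new, unclassified data
```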
Unsupervised learning does not need classified data to work. The training set is just a heap of unstructured data in which the algorithm tries to let an “innermost”, hidden order emerge. Examples of unsupervised learning are clustering algorithms, dimensionality reduction and feature extraction algorithms, self-organising maps, etc.
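By contrast, here is an unsupervised sketch in the same vein (again, just our illustration): k-means receives the data without any labels and lets a grouping emerge on its own.

```python
# Unsupervised learning: no labels, the algorithm discovers structure.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Two well-separated blobs of points, but the algorithm is never told so.
data = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
print(kmeans.labels_[:10])  # cluster assignments found by the algorithm
```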
In any case, both kinds of algorithms learn mathematical functions whose values on a given item are used to discriminate: if the function is positive, the item belongs to class A; if it is negative, the item belongs to class B; and so on.
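In code, such a decision rule can be as simple as checking the sign of a learned function (the weights below are made up purely for illustration):

```python
# Discrimination via the sign of a learned function f(x) = w·x + b.
import numpy as np

w = np.array([1.0, -2.0])  # weights the algorithm would have learned
b = 0.5                    # bias the algorithm would have learned

def f(x):
    return w @ x + b

x = np.array([3.0, 1.0])
print("class A" if f(x) > 0 else "class B")  # positive -> A, negative -> B
```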
The widespread success of classifying networks is due to the amount of data which, as in a second flood, the advent of the Internet has provided to all of us, and which the daily use of digital media by more than half the people in the world keeps producing. In this flood there is no god who sends the waters upon us: we feed it ourselves! This huge amount of unstructured data is the perfect pasture for machine learning algorithms. And usually, the more data, the better the performance.
Interestingly, outperforming all previous classification algorithms is not deep learning’s only merit. The new paradigm has also made it possible to accomplish the opposite task, namely to generate meaningful data. How is that possible?
The idea is that to generate something new we usually assemble pieces of things which already exist. For example, imagine that someone tells us: “draw a bluebird on a brown branch”. Even if we have never seen a bluebird on a brown branch, we may have seen a bluebird on a roof and a brown branch in a park. Consequently, we can easily imagine (and thus draw) the bluebird on the brown branch, mixing things we have actually seen to imagine things we have never seen.
The same conceptual device applies to generative deep networks. First, they are exposed to a huge amount of labelled data: for example, in a directory named “bird” we put some thousands of pictures of birds, in a directory named “branches” we put some hundreds of pictures of branches, and so on. Next, the networks learn a function which approximates (or better, estimates) the probability density from which the items in our repository were sampled, so as to be able to draw objects that maximise the probability of matching the query they are asked to answer.
Generation is therefore reduced to sampling from a probability distribution, namely the empirical distribution of the data the algorithm has learnt about. This means that a generative network can only imagine something it has already seen, in one form or another, time and again.
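As a toy version of this pipeline (our sketch, not the talk’s code), one can estimate a density from observed one-dimensional data and then generate simply by sampling from it:

```python
# Generation as sampling: estimate a density, then draw new samples.
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
observed = rng.normal(loc=2.0, scale=0.5, size=1000)  # "things we saw"

density = gaussian_kde(observed)         # estimate the data distribution
generated = density.resample(5).ravel()  # "imagine" new, similar things
print(generated)  # new values, yet always in the neighbourhood of the data
```

The generated values are new, but they never stray far from the observed ones, which is exactly the point above.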
Actually, as Alberto Massidda nicely explained (even though the subject is far from simple and its technicalities may drown the newcomer), there is an important difference to stress between explicit density estimation and implicit density estimation. The former postulates a specific form for the density to estimate, parametrised by some variables which are calibrated on the training data; the latter does not assume any stochastic model a priori and just tries to imitate the data patterns.
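The explicit approach can be illustrated in a few lines (a deliberately simple example of ours): postulate a Gaussian form, calibrate its two parameters on the data, and sample from the fitted model. The implicit approach, by contrast, never writes the density down at all; the GAN sketch further below is an example of it.

```python
# Explicit density estimation: assume a Gaussian, fit its parameters.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.5, size=1000)

mu, sigma = data.mean(), data.std()      # maximum-likelihood estimates
new_samples = rng.normal(mu, sigma, 5)   # generate from the fitted model
print(mu, sigma, new_samples)
```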
Generative networks stem from this second class of probability density estimators: they imitate the data and train themselves to replicate them. Alberto Massidda mentioned many popular techniques based on generative networks, including WaveNet, autoencoders, deepfakes, GANs, face sum and subtraction, etc.
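To give a taste of how such imitation training works, here is a minimal GAN sketch in PyTorch (our own toy, certainly not code from the talk): a generator learns to imitate samples from a one-dimensional Gaussian, while a discriminator learns to tell real samples from generated ones.

```python
# A toy GAN: G imitates N(2, 0.5); D tries to tell real from fake.
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
D = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 2.0   # samples from the real data
    fake = G(torch.randn(64, 1))            # samples "imagined" by G

    # Train D: label real samples 1, generated samples 0.
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Train G: try to make D output 1 on generated samples.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

samples = G(torch.randn(1000, 1))
print(samples.mean().item(), samples.std().item())  # should approach 2.0, 0.5
```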
There is no room here to explain each of those techniques in detail; the list is intended merely to give an idea of the breadth of this growing research area.