Artificial Intelligence
Machine Learning
March 16
15 minutes

About Machine Learning in Simple Terms

In this article, we'll start exploring the field of machine learning. We'll try to understand what it is, what types exist, and what problems it can solve.


Vyacheslav Gorash

3D Graphics and Machine Learning Developer with 6 years of experience

This article is part of a series on the fundamentals of machine learning.

The internet today offers a huge number of articles about artificial intelligence, machine learning, and neural networks, at very different levels: from very simple pieces to ones that require serious mathematical background. So before writing yet another one (or rather, not one but a small series), it makes sense to decide right away what niche it will occupy in all this variety.

I decided to aim for something in between, moving from simple to complex. This series is written for a reader who knows mathematics at the level of a regular (non-specialized) high school and wants to understand the field of ML in real depth. So, if this topic interests you, let's get started.

Algorithms, ML, and Cakes

First of all, we need to understand what machine learning is as a whole. And for this, we'll have to go through the inevitable comparison of machine learning with classical algorithms.

A classical algorithm can be thought of as an instruction whose every step is executed exactly. A cake recipe is an easy example: take so many grams of the first ingredient, add so much of the second, then the next, then bake for a certain time at a certain temperature. If the recipe is well written and the quality of the ingredients doesn't change, following it exactly will produce identical cakes.

But this exact following is also the major disadvantage of the classical approach: a lack of flexibility. Imagine that the quality of some ingredient, say the flour, has changed. If we add it strictly according to the recipe, the dough will come out too runny or too thick. A confectioner will likely notice this and adjust the recipe. In other words, the executor starts changing the parameters of the algorithm they are working with. This idea is the basis of machine learning.

But for a person, such a recipe change is largely intuitive, based on experience and common sense. For a computer, only instructions exist. So, in essence, to implement machine learning we need to create an algorithm that changes another algorithm.

The target algorithm (the one we actually change) is usually called the machine-learning model. The process of tuning this model is what we call training. Thus, without going beyond strict instructions, we achieve the algorithmic flexibility we need.
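To make the "algorithm that changes another algorithm" idea concrete, here is a minimal sketch. Everything in it is hypothetical: the model is just a rule with one tunable parameter `w`, and `train` is a second routine that nudges that parameter to reduce the observed error.

```python
def model(x, w):
    """The target algorithm: a fixed rule with a tunable parameter w."""
    return w * x

def train(data, w, step=0.01, epochs=200):
    """The algorithm that changes the model: it adjusts the parameter w
    based on the error, not the code of model() itself."""
    for _ in range(epochs):
        for x, y in data:
            error = model(x, w) - y
            w -= step * error * x  # nudge w toward a smaller error
    return w

# Examples of (input, correct answer) pairs that follow the rule y = 3x.
data = [(1, 3), (2, 6), (3, 9)]
w = train(data, w=0.0)
print(round(w, 2))  # w converges toward 3
```

The point of the sketch is only the separation of roles: `model` never changes as text, yet its behavior changes because `train` keeps rewriting its parameter.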

Supervised or Unsupervised?

Now we need to figure out exactly how training is performed. Here we can draw an analogy with how people learn, or rather, with how children learn.

Let's say we're teaching a child the names of animals. We have a set of pictures; we show them to the child one by one and ask who is in each. If the child names the animal incorrectly, we tell them who is actually there, and in this way the child gradually learns all the names. This approach is called supervised learning: it implies the presence of someone who knows all the answers and can check the learner's answers.

From a machine learning perspective, everything is very similar, but here we'll resort to mathematical notation for the first time. We'll denote our model (recall, this is the algorithm we change during training) by the letter $F$. The input data (in our case, the image we show to the model) is $X$. For each image, the model produces its prediction (what animal is in the picture). Let's denote such a prediction by $\widehat{y}$ (read as "y-hat"). In the end, we get:

$$\widehat{y} = F(X)$$

Also, for each $X$ we know the correct answer $y$ (y, but without the hat). Now all that's left is to compare them and understand how far the model's answer is from the truth. For this, we introduce the concept of a loss function. This function (let's denote it $L$) compares the model's answer with the correct one and outputs a number $l$:

$$l = L(y,\ \widehat{y})$$

If the model's answer matches the correct one, the number $l$ will be zero. Otherwise, it grows the further the model's answer is from the correct one. From this we can conclude that the model should learn so that $l$ is as small as possible for any $X$, ideally zero. To achieve this, we change our model, that is, the function $F$, according to certain rules. In mathematical language, the supervised learning problem reduces to selecting a function $F$ such that the sum of all losses is minimal:

$$\sum_{i = 1}^{N}{L(y_{i},\ F(X_{i}))}\ \rightarrow \ 0$$

$N$ in this formula is the number of training examples (data plus correct answer) that we have.
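The sum above is easy to compute directly. Below is a sketch in which both the model $F$ and the loss $L$ are stand-ins I chose for illustration (a fixed rule and squared error); the article itself doesn't prescribe a particular choice.

```python
def F(x):
    """Placeholder model: a fixed rule standing in for F(X)."""
    return 2 * x

def L(y, y_hat):
    """Loss function: zero when the answers match, larger otherwise.
    Squared error is one common illustrative choice."""
    return (y - y_hat) ** 2

# Training examples as (X_i, y_i) pairs; the last one is mispredicted.
examples = [(1, 2), (2, 4), (3, 5)]

total_loss = sum(L(y, F(x)) for x, y in examples)
print(total_loss)  # 1: the first two examples contribute 0, the last 1
```

Training would then mean changing `F` until `total_loss` is as close to zero as possible.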

There is, however, another approach, which works in a slightly different situation. Let's say we ask the same child to sort cards with animal drawings into three different boxes so that the animals within each box are as similar to each other as possible. In this case, there is simply no single known correct solution. There are a huge number of ways to solve the task, and each is correct in its own way (by size, by color, by species if the child already knows what that is, and so on). Learning in this form is unsupervised learning: we don't show the child a known correct answer and ask them to reproduce it; instead, we give some initial conditions and the task itself. I'll repeat, this is only suitable for some tasks. Learning animal names, as considered above, is hardly possible without knowing the correct answers.

In the case of unsupervised learning, it's slightly harder to describe the process formally. The notation will be similar to the supervised case: we have our function $F(X)$, which produces a result $\widehat{y}$. But there is no known correct $y$ this time. What do we optimize then?

The answer is some internal quality function. It can be very different depending on the task. In the example above with the child and the animal cards, it's some measure of similarity from the child's perspective, and it differs from child to child. We try to minimize or maximize the value of this function depending on the task. That is, here everything is much less standardized and can vary greatly from algorithm to algorithm.
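Here is one toy internal quality function, invented purely for illustration: score a grouping of numbers by the total spread within each group, so a smaller score means the groups are "more similar inside". Unsupervised learning would then search for the grouping with the best score.

```python
def spread(groups):
    """Hypothetical quality function: total within-group spread.
    Smaller means each group holds more similar items."""
    return sum(max(g) - min(g) for g in groups)

items = [1, 2, 10, 11]
good = [[1, 2], [10, 11]]  # similar items grouped together
bad = [[1, 10], [2, 11]]   # dissimilar items mixed together

print(spread(good))  # 2: tight groups
print(spread(bad))   # 18: loose groups
```

No correct answer is stored anywhere; the function itself tells us that the first grouping is better than the second.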

There's a third learning variant, but to describe it we'll move from teaching a child to another example (you'll see why in a moment). Let's say we're biologists studying mouse behavior. In a cage, a mouse has two buttons. If the mouse presses the first, it gets food; if the second, it gets an electric shock. Understandably, after some time the mouse will press only the first button. Now let's change the conditions: food now appears only when the buttons are pressed in turn. The mouse will initially press the first button, but, to its surprise, instead of food it will get a shock. After some number of attempts, the mouse will figure out how to press the buttons.

This type of learning is called reinforcement learning. Note how this method differs from those discussed above. On the one hand, we don't show the mouse the correct sequence of presses; it finds it itself, so there's no teacher. On the other hand, we do interact with the mouse, rewarding or punishing it, which distinguishes the method from unsupervised learning, where we have no interaction at all.

Within the article series, I won't cover reinforcement learning, as it's a very extensive field with many nuances. So we'll limit ourselves to just a verbal description.

What Can Supervised Learning Do?

Now that we've figured out how to teach an algorithm something, let's see how this can be applied in practice.

First, let's look at the problems supervised learning can solve. Recall that in this case we already have known correct answers. The first such problem is classification. We've actually already discussed it: it's that very guessing of animal names. In classification, for each set of input data (not necessarily an image; it can be text, video, numbers, a graph, and much more) we must output a class, that is, one element from a finite set. In the example above, these are the animal names. The finiteness of the set is very important: there can be 10, 100, or 1000 classes, but not infinitely many. The number of classes is also fixed; we can't add a new one while the algorithm is running. Usually classes are encoded with numbers: we simply number them from zero up to the maximum value, because it's much easier for a computer to work with a number than with text.
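The class numbering just described can be sketched in a couple of lines (class names here are illustrative):

```python
# Number the classes from zero: the standard encoding for classification.
classes = ["cat", "dog", "horse"]

class_to_id = {name: i for i, name in enumerate(classes)}
id_to_class = {i: name for i, name in enumerate(classes)}

print(class_to_id)   # {'cat': 0, 'dog': 1, 'horse': 2}
print(id_to_class[1])  # dog: decoding a model's numeric output back to a name
```

The model then works purely with the numbers, and the reverse mapping turns its output back into a readable class name.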

Classification itself is divided into types depending on how many classes there are and how they can be assigned. The simplest type is binary classification. The name speaks for itself: we have a choice of two options, for example, yes or no. If there are more than two options, it's multi-class classification.

It's very important not to confuse this with multi-label classification, where we can assign several class labels at once. Imagine we need to distinguish photos of dogs and cats. We get three classes (dog, cat, neither). But what if there's both a dog and a cat in the picture? We could introduce a fourth class (dog and cat together). But with more than two species, the number of classes would very quickly exceed all reasonable limits: every combination needs its own class, so three species already give 8 classes, four give 16, and so on. It's much more practical to allow assigning not exactly one class but several at once (or none). Then we only need two class labels. If there's no one in the picture, the output is zero labels; if there's a dog or a cat, one label; if both, two labels at once.
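The multi-label scheme above can be sketched as one 0/1 flag per label (the two-label dog/cat setup is taken straight from the example):

```python
LABELS = ["dog", "cat"]

def encode(present):
    """Multi-label encoding: one 0/1 flag per label, so zero, one,
    or both labels can be active at the same time."""
    return [1 if label in present else 0 for label in LABELS]

print(encode([]))              # [0, 0]: nobody in the picture
print(encode(["cat"]))         # [0, 1]: only a cat
print(encode(["dog", "cat"]))  # [1, 1]: both at once
```

Adding a third species means adding one more flag, not doubling the number of classes.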

The next problem we'll look at is regression. As an example, take determining an animal's weight from a photo (although, I'll emphasize again, most machine learning methods apply not only to photos but to any input data). The main difference from classification is that the output is now not discrete (several possible options) but continuous: it can be any number, though most often from a given range.

By the way, it's worth noting that a classification problem can be represented as regression. Say we have two classes. Then we can output a number from 0 to 1: if the number is less than 0.5, it's the first class; if greater, the second. This number is essentially the probability that we have the second class. The same works for multi-label classification, except we predict not one such number but several, one per class. If a number is above the threshold (in our case 0.5), we consider that class present.

It's slightly more complicated with multi-class classification, when we need one class out of several. Here we can proceed in a similar way. We predict the probabilities of all classes, and then simply choose the most probable one for our case.
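The two decision rules just described can be sketched side by side (the probabilities and the 0.5 threshold are illustrative):

```python
def multilabel_decision(probs, threshold=0.5):
    """Multi-label rule: keep every class whose probability
    exceeds the threshold (possibly none, possibly several)."""
    return [i for i, p in enumerate(probs) if p > threshold]

def multiclass_decision(probs):
    """Multi-class rule: pick the single most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])

probs = [0.9, 0.2, 0.7]  # predicted probabilities for classes 0, 1, 2
print(multilabel_decision(probs))  # [0, 2]: two classes pass the threshold
print(multiclass_decision(probs))  # 0: the single most probable class
```

Note how the same probabilities give different answers under the two rules: multi-label can report several classes, multi-class always reports exactly one.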

The next two problems mostly concern image and video processing: segmentation and detection. Here we must not just say that an object is in the picture but show where exactly it is; the difference between the two is how we show it. In detection, we draw a rectangle around the object (a so-called bounding box), so the output is the centers and sizes of the boxes along with their class labels. In segmentation, we paint over the entire object, which gives a more precise description of its boundaries; plain numbers are no longer enough, so the output is another image in which the areas where we found objects are painted (the so-called segmentation mask).
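A detector's output record might look something like the sketch below. The field names and values are hypothetical; the point is only that detection outputs are a handful of numbers per object plus a class label, whereas segmentation would instead output a whole mask image.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: box centre, box size, and class label."""
    cx: float      # centre of the bounding box, x coordinate
    cy: float      # centre of the bounding box, y coordinate
    width: float   # box width
    height: float  # box height
    label: str     # class label of the object inside the box

found = [Detection(cx=120.0, cy=80.0, width=40.0, height=30.0, label="dog")]
print(found[0].label)  # dog
```

One image can yield any number of such records, one per object the detector finds.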

Separately, I want to mention recognition problems. This is a much more complex area, where the output is neither numbers nor pictures but text, the most difficult category of data to process. There are mainly two types: text recognition (image → text) and speech recognition (sound → text). Such problems require much more complex approaches.

What Can Unsupervised Learning Do?

Historically, unsupervised learning was most often used for clustering and dimensionality reduction problems. Clustering is grouping objects by their similarity. That is, the problem we considered when we started talking about unsupervised learning. Moreover, the number of clusters (groups of objects) can be both set in advance and determined by the algorithm itself during the process.

Dimensionality reduction is closely related to clustering but slightly harder to grasp. Here we try to encode objects with numbers (or sets of numbers) so that the distance (the difference between the numbers) is smaller the more similar the objects are. Say we're encoding words. Let the word "red" be 1. Then we encode "scarlet" as 2 (close to red), "pink" as 5 (slightly further), and "blue" as 40 (far). That is, we represent our objects in a more compact form while preserving the relationships between them.

Today, unsupervised learning is actively used in generation problems: the well-known GPT models (GPT stands for Generative Pre-trained Transformer; a separate article about transformers is planned), as well as image and video generators. It's worth noting, though, that training such models is a very complex process, and unsupervised learning may be only one of many stages.

Conclusion

In this article, I've only given an overview of the field of machine learning. In subsequent articles, I plan to go deeper and explain not only what machine learning algorithms do but also how they do it. I'll try to examine algorithms in detail, starting from the simplest and ending with complex ones, such as neural networks of various architectures.

