[{"data":1,"prerenderedAt":784},["ShallowReactive",2],{"article-id-en-ml-basic-1":3},{"id":4,"title":5,"body":6,"description":466,"extension":767,"meta":768,"navigation":778,"path":25,"seo":779,"stem":782,"__hash__":783},"content/en/blog/ml-basic-1.mdx","Ml Basic 1",{"type":7,"value":8,"toc":759},"minimark",[9,662],[10,11,12,16,33,36,39,44,47,50,53,60,71,75,78,85,151,191,256,297,370,445,461,470,477,533,536,543,546,553,556,560,563,570,581,588,595,601,604,607,618,624,631,635,642,648,655,659],"section-md",{},[13,14,15],"p",{},"This article is part of a series on the fundamentals of machine learning.",[17,18,19,27],"ul",{},[20,21,22],"li",{},[23,24,26],"a",{"href":25},"/en/blog/ml-basic-1","Part 1. About Machine Learning in Simple Terms",[20,28,29],{},[23,30,32],{"href":31},"/en/blog/ml-basic-2","Part 2. Linear Regression as Simple as It Gets",[13,34,35],{},"The internet currently has a huge number of articles about artificial\nintelligence, machine learning, and neural networks. And they're of very\ndifferent levels, from very simple to those requiring serious mathematical\nknowledge. Therefore, when there's a desire to write another one (or rather\n— not one, but a small cycle), you need to immediately imagine what niche\nthis article will occupy in all this variety.",[13,37,38],{},"And I decided to try to create some intermediate option from simple to\ncomplex. I assume that my series of articles is designed for a reader with\nknowledge of mathematics at the level of high school grades in a regular\n(non-physics-math) school and wanting to quite deeply understand the field\nof ML. 
So, if this topic interests you, let's get started.",[40,41,43],"h2",{"id":42},"algorithms-ml-and-cakes","Algorithms, ML, and Cakes",[13,45,46],{},"First of all, we need to understand what machine learning is as a whole.\nAnd for this, we'll have to go through the inevitable comparison of machine\nlearning with classical algorithms.",[13,48,49],{},"A classical algorithm can be thought of as an instruction, each step of\nwhich is strictly executed. For example, it's easy to imagine a cake recipe:\ntake so many grams of the first ingredient, add so much of the second, then\nmore, then bake for a certain time at a certain temperature. And, if the\nrecipe is well-written and the quality of ingredients doesn't change,\nfollowing the recipe exactly will produce identical cakes.",[13,51,52],{},"But this exact following is also a major disadvantage of the classical\napproach: lack of flexibility. Let's imagine that the quality of some\ningredient has changed, for example, flour. As a result, if we put it in\nstrictly according to the recipe, the dough will be too runny or too thick.\nA confectioner will likely notice this and adjust the recipe. That is, the\nexecutor themselves begins adjusting the parameters of the algorithm they're\nworking with. This is the basis of machine learning.",[13,54,55,56],{},"But for a person, such a recipe change is largely intuitive and based on\ntheir experience and common sense. For a computer, however, only instructions\nexist. And essentially, to implement machine learning, we need to create\n",[57,58,59],"strong",{},"an algorithm that changes another algorithm.",[13,61,62,63,66,67,70],{},"The target algorithm (the one we're actually changing) is usually called the\nmachine learning ",[57,64,65],{},"model",". The process of tuning this model is what we\ncall ",[57,68,69],{},"training",". 
Thus, without going beyond strict instructions, we can\nachieve the algorithm flexibility we need.",[40,72,74],{"id":73},"supervised-or-unsupervised","Supervised or Unsupervised?",[13,76,77],{},"Now we need to figure out exactly how we perform training. And here we can\ndraw an analogy with how people learn, or rather — children.",[13,79,80,81,84],{},"Let's say we're teaching a child the names of animals. We have a set of\npictures; we show them to the child and ask who is in each one. If the child\nnames the animal incorrectly, we tell them who is actually there. And in\nthis way, the child gradually learns all the names. This variant is called\n",[57,82,83],{},"supervised learning",". In other words, it implies the presence of someone\nwho knows all the answers and can check the learner's answers.",[13,86,87,88,111,112,126,127,150],{},"From a machine learning perspective, everything will be very similar. But\nwe'll have to resort to mathematical notation for the first time. So, we'll\ndenote our model (recall, this is the algorithm we change during training)\nwith the letter ",[89,90,93],"span",{"className":91},[92],"katex",[94,95,97],"math",{"xmlns":96},"http://www.w3.org/1998/Math/MathML",[98,99,100,107],"semantics",{},[101,102,103],"mrow",{},[104,105,106],"mi",{},"F",[108,109,106],"annotation",{"encoding":110},"application/x-tex",". Input data (in our case — the image we show to the\nmodel) — ",[89,113,115],{"className":114},[92],[94,116,117],{"xmlns":96},[98,118,119,124],{},[101,120,121],{},[104,122,123],{},"X",[108,125,123],{"encoding":110},". For each image, the model produces its prediction (what\nanimal is in the picture). Let's denote such a prediction with the symbol\n",[89,128,130],{"className":129},[92],[94,131,132],{"xmlns":96},[98,133,134,147],{},[101,135,136],{},[137,138,140,143],"mover",{"accent":139},"true",[104,141,142],{},"y",[144,145,146],"mo",{"stretchy":139},"^",[108,148,149],{"encoding":110},"\\widehat{y}"," (read as \"y-hat\"). 
In the end, we get:",[13,152,153],{},[89,154,156],{"className":155},[92],[94,157,158],{"xmlns":96},[98,159,160,188],{},[101,161,162,168,172,175,177,179,183,185],{},[137,163,164,166],{"accent":139},[104,165,142],{},[144,167,146],{"stretchy":139},[169,170,171],"mtext",{}," ",[144,173,174],{},"=",[169,176,171],{},[104,178,106],{},[144,180,182],{"stretchy":181},"false","(",[104,184,123],{},[144,186,187],{"stretchy":181},")",[108,189,190],{"encoding":110},"\\widehat{y}\\  = \\ F(X)",[13,192,193,194,207,208,221,222,225,226,240,241,255],{},"Also, for each ",[89,195,197],{"className":196},[92],[94,198,199],{"xmlns":96},[98,200,201,205],{},[101,202,203],{},[104,204,123],{},[108,206,123],{"encoding":110}," we know the correct answer ",[89,209,211],{"className":210},[92],[94,212,213],{"xmlns":96},[98,214,215,219],{},[101,216,217],{},[104,218,142],{},[108,220,142],{"encoding":110}," (y, but without the hat).\nNow all that's left is to compare them and understand how far the model's\nanswer is from the truth. 
For this, we introduce the concept of a ",[57,223,224],{},"loss\nfunction."," This function (let's denote it as ",[89,227,229],{"className":228},[92],[94,230,231],{"xmlns":96},[98,232,233,238],{},[101,234,235],{},[104,236,237],{},"L",[108,239,237],{"encoding":110},") compares the model's\nanswer with the correct one and outputs a number ",[89,242,244],{"className":243},[92],[94,245,246],{"xmlns":96},[98,247,248,253],{},[101,249,250],{},[104,251,252],{},"l",[108,254,252],{"encoding":110},":",[13,257,258],{},[89,259,261],{"className":260},[92],[94,262,263],{"xmlns":96},[98,264,265,294],{},[101,266,267,269,271,273,275,277,279,281,284,286,292],{},[104,268,252],{},[169,270,171],{},[144,272,174],{},[169,274,171],{},[104,276,237],{},[144,278,182],{"stretchy":181},[104,280,142],{},[144,282,283],{"separator":139},",",[169,285,171],{},[137,287,288,290],{"accent":139},[104,289,142],{},[144,291,146],{"stretchy":139},[144,293,187],{"stretchy":181},[108,295,296],{"encoding":110},"l\\  = \\ L(y,\\ \\widehat{y})",[13,298,299,300,313,314,327,328,341,342,355,356,369],{},"If the model's answer matches the correct one, the number ",[89,301,303],{"className":302},[92],[94,304,305],{"xmlns":96},[98,306,307,311],{},[101,308,309],{},[104,310,252],{},[108,312,252],{"encoding":110}," will be zero.\nOtherwise, it will be greater the further the model's answer is from the\ncorrect one. From here, we can conclude that the model should learn so that\n",[89,315,317],{"className":316},[92],[94,318,319],{"xmlns":96},[98,320,321,325],{},[101,322,323],{},[104,324,252],{},[108,326,252],{"encoding":110}," for any ",[89,329,331],{"className":330},[92],[94,332,333],{"xmlns":96},[98,334,335,339],{},[101,336,337],{},[104,338,123],{},[108,340,123],{"encoding":110}," is as small as possible, ideally equal to zero. 
For this,\nwe change our model according to certain rules, that is, the function ",[89,343,345],{"className":344},[92],[94,346,347],{"xmlns":96},[98,348,349,353],{},[101,350,351],{},[104,352,106],{},[108,354,106],{"encoding":110},".\nSpeaking in mathematical language, the supervised learning problem\nreduces to selecting a function ",[89,357,359],{"className":358},[92],[94,360,361],{"xmlns":96},[98,362,363,367],{},[101,364,365],{},[104,366,106],{},[108,368,106],{"encoding":110}," such that the sum of all losses is\nminimal:",[13,371,372],{},[89,373,375],{"className":374},[92],[94,376,377],{"xmlns":96},[98,378,379,442],{},[101,380,381,401,432,434,437,439],{},[382,383,384,387,398],"msubsup",{},[144,385,386],{},"∑",[101,388,389,392,394],{},[104,390,391],{},"i",[144,393,174],{},[395,396,397],"mn",{},"1",[104,399,400],{},"N",[101,402,403,405,407,414,416,418,420,422,428,430],{},[104,404,237],{},[144,406,182],{"stretchy":181},[408,409,410,412],"msub",{},[104,411,142],{},[104,413,391],{},[144,415,283],{"separator":139},[169,417,171],{},[104,419,106],{},[144,421,182],{"stretchy":181},[408,423,424,426],{},[104,425,123],{},[104,427,391],{},[144,429,187],{"stretchy":181},[144,431,187],{"stretchy":181},[169,433,171],{},[144,435,436],{},"→",[169,438,171],{},[395,440,441],{},"0",[108,443,444],{"encoding":110},"\\sum_{i = 1}^{N}{L(y_{i},\\ F(X_{i}))}\\  \\rightarrow \\ 0",[13,446,447,460],{},[89,448,450],{"className":449},[92],[94,451,452],{"xmlns":96},[98,453,454,458],{},[101,455,456],{},[104,457,400],{},[108,459,400],{"encoding":110}," in this formula is the number of training examples (data + correct\nanswer) that we have.",[13,462,463],{},[464,465],"img",{"alt":466,"height":467,"src":468,"width":469},"","4.406038932633421in","/img/blog/ml-basic-1/image1.png","6.284351487314086in",[13,471,472,473,476],{},"However, there's another method, one that works in a slightly\ndifferent situation. 
Let's say we ask the same child to sort cards with\nanimal drawings into three different boxes so that animals in one box are\nas similar to each other as possible. In this case, there's simply no known\ncorrect solution. There are a huge number of ways to solve the task, and\neach will be correct in some way (for example, by size, by color, by\nspecies, if the child already knows what that is, and so on). Learning in\nthis form is ",[57,474,475],{},"unsupervised learning,"," that is, we don't show the child a\nknown correct variant and ask them to repeat it. Instead, we give some\ninitial conditions and the task itself. To repeat: this is only suitable\nfor solving some tasks. For example, learning animal names, as we considered\nabove, is hardly possible without knowing the correct answers.",[13,478,479,480,500,501,518,519,532],{},"In the case of unsupervised learning, it's slightly more difficult to\nformally describe the process. The notation will be similar to the supervised\ncase. We have our function ",[89,481,483],{"className":482},[92],[94,484,485],{"xmlns":96},[98,486,487,497],{},[101,488,489,491,493,495],{},[104,490,106],{},[144,492,182],{"stretchy":181},[104,494,123],{},[144,496,187],{"stretchy":181},[108,498,499],{"encoding":110},"F(X)",", which produces a result ",[89,502,504],{"className":503},[92],[94,505,506],{"xmlns":96},[98,507,508,516],{},[101,509,510],{},[137,511,512,514],{"accent":139},[104,513,142],{},[144,515,146],{"stretchy":139},[108,517,149],{"encoding":110},".\nBut there's no known correct ",[89,520,522],{"className":521},[92],[94,523,524],{"xmlns":96},[98,525,526,530],{},[101,527,528],{},[104,529,142],{},[108,531,142],{"encoding":110}," in this case. What do we optimize then?",[13,534,535],{},"The answer is some internal quality function. It can be very different\ndepending on the task. 
In the example above about the child and animal\ncards — it's some measure of similarity from the child's perspective, and\nit's different for different children. We try to minimize or maximize the\nvalue of this function depending on the task conditions. That is, in this\ncase, everything is much less template-based and can vary greatly depending\non the algorithm.",[13,537,538],{},[464,539],{"alt":466,"height":540,"src":541,"width":542},"4.341666666666667in","/img/blog/ml-basic-1/image2.png","6.270138888888889in",[13,544,545],{},"There's a third learning variant. But to describe it, let's move from\nteaching a child to another example (why — you'll understand now). Let's\nsay we're biologists studying mouse behavior. In a cage, a mouse\nhas two buttons. If the mouse presses the first one, it gets food. If the\nsecond one — it gets an electric shock. Understandably, after some time the\nmouse will press only the first button. Now let's change the condition:\nfood will appear only when the buttons are pressed in turn. The mouse\nwill initially press the first one, but, to its surprise, instead of food,\nit will get a shock. After some number of attempts, the mouse will figure\nout the right way to press the buttons.",[13,547,548,549,552],{},"This type of learning is called ",[57,550,551],{},"reinforcement learning."," Note that this\nmethod differs from those discussed above. On one hand, we don't show the\nmouse the correct sequence of presses; it finds it itself. That is, there's\nno teacher. But at the same time, we interact with the mouse, rewarding or\npunishing it. This distinguishes the method from unsupervised learning.\nThere, we have no interaction at all.",[13,554,555],{},"Within the article series, I won't cover reinforcement learning, as it's\na very extensive field with many nuances. 
So we'll limit ourselves to just\na verbal description.",[40,557,559],{"id":558},"what-can-supervised-learning-do","What Can Supervised Learning Do?",[13,561,562],{},"Now that we've figured out how to teach an algorithm something, let's see\nhow this can be applied in practice.",[13,564,565,566,569],{},"And first, let's look at the problems that supervised learning can solve.\nRecall, in this case, we already have known correct answers. And the first\nsuch problem will be ",[57,567,568],{},"classification."," Actually, we've already talked about\nthis problem. Yes, this is that very guessing of animal names. In\nclassification, the input can be any kind of data: not necessarily an\nimage, but also text, video, numbers, a graph, and much more. The most\nimportant feature of the problem is what we expect at the output. And we\nexpect a class, that is, one element from a finite set. For example, in the\ncase described above, it will be animal names. And here the finiteness of\nthe set is very important. That is, there can be 10, 100, 1000 classes, but\nnot infinity. The number of classes is also fixed: we can't add a new\none while the algorithm is running. Usually, classes are encoded with\nnumbers. For this, we simply number classes from zero to the maximum value.\nThis is done because it's much easier for a computer to work with a number\nthan with text.",[13,571,572,573,576,577,580],{},"Classification itself is also divided into types depending on how many\nclasses there are and how they can be defined. The simplest type of\nclassification is ",[57,574,575],{},"binary",". The name speaks for itself. We have a choice\nof two options. For example, yes or no. If there are more than two options —\nit's already ",[57,578,579],{},"multi-class classification",".",[13,582,583,584,587],{},"And it's very important not to confuse it with ",[57,585,586],{},"multi-label","\nclassification. Here we can assign several class labels at once. 
Let's imagine\na situation where we need to distinguish photos of dogs and cats. We get\nthree classes (dog, cat, none). But what to do if there's both a dog and a\ncat in the picture? We can introduce a fourth class (dog and cat together).\nBut if there weren't two options, as in our case, but more, the number of\nclasses would very quickly exceed all reasonable limits (for three species —\nthat's already 8 classes, for four — 16, and so on). It's much more practical\nto allow assigning not just one class, but several at once (or zero). Then\nwe only need two class labels. If there's no one in the picture, the output\nis zero labels. If there's a dog or cat in the picture, there will be one\nlabel. If both — two labels at once.",[13,589,590,591,594],{},"The next problem we'll look at is ",[57,592,593],{},"regression",". As an example, we can\nuse determining an animal's weight from a photo (although I'll emphasize\nagain, most machine learning methods are applicable not only to photos, but\nto any input data). Here the main difference from classification is that\nour output is now not discrete (several possible options), but continuous.\nThat is, the output can be absolutely any number, but most often from a\ngiven range.",[13,596,597],{},[464,598],{"alt":466,"height":599,"src":600,"width":542},"5.884722222222222in","/img/blog/ml-basic-1/image3.png",[13,602,603],{},"By the way, it's worth noting that a classification problem can be\nrepresented as regression. For example, we have two classes. Then we can\noutput a number from 0 to 1. If the number is less than 0.5 — it's the first\nclass. If greater — the second. This number is essentially the probability\nthat we have class 2. Similarly with multi-label. Only we predict not one\nsuch number, but several, one for each class. 
If the number is above the\nthreshold (in our case 0.5), we consider that this class is present.",[13,605,606],{},"It's slightly more complicated with multi-class classification, when we\nneed one class out of several. Here we can proceed in a similar way. We\npredict the probabilities of all classes, and then simply choose the most\nprobable one for our case.",[13,608,609,610,613,614,617],{},"The next two problems mostly concern image and video processing. These are\n",[57,611,612],{},"segmentation"," and ",[57,615,616],{},"detection",". Here we need to not just say that an\nobject is in the picture, but show where exactly it is. The difference between\nthem is how we show this. In detection, we draw a rectangle around the object\n(so-called bounding box). In segmentation, we paint the entire object. That\nis, this is a more precise description of the object's boundaries. In the\ncase of detection, the output data is the centers and sizes of the frames,\nas well as their class labels. In the case of segmentation, just numbers are\nnot enough for us. Here the output is another image, in which the areas where\nwe found objects are painted (this is the so-called segmentation mask).",[13,619,620],{},[464,621],{"alt":466,"height":622,"src":623,"width":542},"3.8777777777777778in","/img/blog/ml-basic-1/image4.png",[13,625,626,627,630],{},"Separately, I want to note ",[57,628,629],{},"recognition"," problems. This is a much more\ncomplex area, where the output is not numbers or pictures, but text — the\nmost difficult category of data to process. Mainly, there are two types of\nrecognition: text recognition (image → text) and speech recognition\n(sound → text). Such problems already require much more complex approaches\nto solve them.",[40,632,634],{"id":633},"what-can-unsupervised-learning-do","What Can Unsupervised Learning Do?",[13,636,637,638,641],{},"Historically, unsupervised learning was most often used for clustering and\ndimensionality reduction problems. 
",[57,639,640],{},"Clustering"," is grouping objects by\ntheir similarity. That is, the problem we considered when we started talking\nabout unsupervised learning. Moreover, the number of clusters (groups of\nobjects) can either be set in advance or determined by the algorithm itself\nduring the process.",[13,643,644,647],{},[57,645,646],{},"Dimensionality reduction"," is a task quite close to clustering,\nbut slightly harder to grasp. Here we try to encode objects\nwith numbers or sets of numbers so that the distance (difference between\nnumbers) is smaller the closer the objects are to each other. Let's say we're\nencoding words. Let the word \"red\" be 1. Then we encode \"scarlet\" as 2\n(close to red), \"pink\" as 5 (slightly further), and \"blue\" as 40 (far).\nThat is, we represent our objects in a more compact form, but preserve the\nconnections between them.",[13,649,650,651,654],{},"Today, unsupervised learning is actively used in ",[57,652,653],{},"generation"," problems.\nThese are the well-known GPT models (short for Generative Pre-trained\nTransformer; a separate article about transformers is planned in the future),\nas well as image and video generators. However, it's worth noting that\ntraining such models is a very complex process, and unsupervised learning\nmay be only one of many stages.",[40,656,658],{"id":657},"conclusion","Conclusion",[13,660,661],{},"In this article, I only overviewed the field of machine learning. In\nsubsequent articles, I plan to delve deeper into each topic and explain not\nonly what machine learning algorithms do, but also how they do it. 
I'll try to\nexamine algorithms in detail, starting from the simplest and ending with\ncomplex ones, such as neural networks of various architectures.",[663,664,666,681,694,707,720,733,746],"faq",{"title":665},"Questions and Answers",[667,668,670,676],"faq-item",{"value":669},"item-1",[671,672,673],"template",{"v-slot:question":466},[13,674,675],{},"How is machine learning different from regular programming?",[671,677,678],{"v-slot:answer":466},[13,679,680],{},"In classical programming, a developer manually writes all the rules and\ninstructions — the algorithm strictly follows a given recipe. In machine\nlearning, the algorithm adjusts its own parameters based on data. Roughly\nspeaking, instead of writing rules, we give the computer examples and\nask it to find patterns on its own.",[667,682,684,689],{"value":683},"item-2",[671,685,686],{"v-slot:question":466},[13,687,688],{},"What is a loss function and why is it needed?",[671,690,691],{"v-slot:answer":466},[13,692,693],{},"A loss function measures how far the model's prediction is from the\ncorrect answer. The larger the error, the greater the function's value.\nThe goal of training is to adjust the model's parameters so that the\ntotal loss across all examples is as small as possible.",[667,695,697,702],{"value":696},"item-3",[671,698,699],{"v-slot:question":466},[13,700,701],{},"When should I use supervised learning versus unsupervised learning?",[671,703,704],{"v-slot:answer":466},[13,705,706],{},"If you have labeled data — that is, for every input example you know the\ncorrect answer — use supervised learning. This covers classification,\nregression, detection, and other tasks. 
If there are no correct answers\nand you need to find structure in the data (for example, group similar\nobjects or compress data descriptions) — unsupervised learning is the\nway to go.",[667,708,710,715],{"value":709},"item-4",[671,711,712],{"v-slot:question":466},[13,713,714],{},"What is the difference between classification and regression?",[671,716,717],{"v-slot:answer":466},[13,718,719],{},"In classification, the model predicts one of several predefined classes\n— for example, \"cat,\" \"dog,\" or \"bird.\" The set of options is finite.\nIn regression, the output is a number from a continuous range — for\nexample, an animal's weight or an apartment's price.",[667,721,723,728],{"value":722},"item-5",[671,724,725],{"v-slot:question":466},[13,726,727],{},"What is segmentation and how does it differ from detection?",[671,729,730],{"v-slot:answer":466},[13,731,732],{},"Both tasks work with images and determine the location of objects. In\ndetection, a rectangle (bounding box) is drawn around each object. In\nsegmentation, the exact contour of the object is highlighted pixel by\npixel — this provides more precise information about its boundaries, but\nrequires more computational resources.",[667,734,736,741],{"value":735},"item-6",[671,737,738],{"v-slot:question":466},[13,739,740],{},"Can a classification problem be solved using regression methods?",[671,742,743],{"v-slot:answer":466},[13,744,745],{},"Yes, this is a common technique. For binary classification, the model\ncan output a number from 0 to 1, which is interpreted as the probability\nof belonging to the second class. If the number is below 0.5, we assign\nit to the first class; if above — to the second. 
A similar approach\nworks for multi-label classification.",[667,747,749,754],{"value":748},"item-7",[671,750,751],{"v-slot:question":466},[13,752,753],{},"Why is dimensionality reduction needed?",[671,755,756],{"v-slot:answer":466},[13,757,758],{},"Many real-world datasets are described by hundreds or thousands of\nfeatures, which makes them hard to process and visualize. Dimensionality\nreduction encodes data with fewer numbers while preserving important\nrelationships between objects. Objects that are semantically close remain\nclose in the new representation as well.",{"title":466,"searchDepth":760,"depth":760,"links":761},2,[762,763,764,765,766],{"id":42,"depth":760,"text":43},{"id":73,"depth":760,"text":74},{"id":558,"depth":760,"text":559},{"id":633,"depth":760,"text":634},{"id":657,"depth":760,"text":658},"mdx",{"readTime":769,"image":770,"date":771,"dateModified":772,"tags":773,"authors":776},"15 minutes","/img/blog/ml-basic-1/preview.png","2026-03-16","2026-04-20",[774,775],"Artificial Intelligence","Machine Learning",[777],"vgorash",true,{"title":780,"description":781},"About Machine Learning in Simple Terms","In this article, we'll start exploring the field of machine learning. We'll try to understand what it is, what types exist, and what problems it can solve.","en/blog/ml-basic-1","OquMbnWN-lBolaPLTjRpHEjCsTWc6mjv5Rr-j6RGNis",1777111174094]