Decision Trees are a type of machine learning algorithm that uses a tree structure to split data according to logical rules and predict the class of new observations. They are easy to interpret and adaptable to different types of data, but they can also suffer from overfitting, excessive complexity, and sensitivity to imbalanced data.
Let’s understand a bit more about them and examine a simple example of use in R.
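The full example in the post is in R; as a language-agnostic preview of the idea, here is a minimal sketch in Python of the core operation a tree performs: finding the single split of a feature that minimizes Gini impurity (a depth-1 tree, or "decision stump"). The feature, labels, and values are hypothetical toy data.

```python
# Minimal decision stump: a depth-1 decision tree that picks the single
# threshold split minimizing weighted Gini impurity. Toy data is hypothetical.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_split(xs, ys):
    """Return (threshold, impurity) of the best binary split on one feature."""
    best = (None, float("inf"))
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if w < best[1]:
            best = (t, w)
    return best

# Toy example: predict "buys" (1) vs "doesn't buy" (0) from age alone.
ages = [22, 25, 30, 35, 40, 45, 50, 55]
labels = [0, 0, 0, 1, 1, 1, 1, 1]
threshold, impurity = best_split(ages, labels)
print(threshold, impurity)  # the split at age <= 30 separates the classes cleanly
```

A real decision tree simply applies this search recursively to each resulting subset, which is what the R example in the post does via a library.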
The Gradient Descent Algorithm Explained Clearly: From Intuition to Practice
A blindfolded person on a mountain
Imagine standing on a mountainous terrain, completely blindfolded. Your goal: reach the lowest point in the valley. You can’t see anything, but you can feel the slope of the ground beneath your feet. What do you do? You move in the direction where the ground goes down, one step at a time. If it slopes more steeply to the left, you go left. If it drops more to the right, you go right. With each step, you feel the slope again and redirect yourself.
This strategy, so simple and natural, is exactly what neural networks use to learn. Every time an AI model improves — learning to recognize a face, translate a sentence, or generate text — it does so by descending through a mathematical landscape, one step at a time, following the slope.
It’s called gradient descent, and it’s arguably the most important algorithm in modern machine learning.
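The blindfolded descent above can be sketched in a few lines. Here is a minimal Python illustration on a one-dimensional "landscape", f(x) = (x − 3)²: the derivative plays the role of the slope underfoot, and each update steps against it. The function, starting point, and learning rate are illustrative choices, not from the post.

```python
# Gradient descent on a simple one-dimensional landscape: f(x) = (x - 3)^2.
# The gradient f'(x) = 2*(x - 3) is the "slope felt underfoot".

def grad(x):
    return 2 * (x - 3)

x = 10.0             # start somewhere on the mountainside
learning_rate = 0.1  # step size
for _ in range(100):
    x -= learning_rate * grad(x)  # step downhill, against the gradient

print(round(x, 4))  # converges to the minimum at x = 3
```

Neural networks do exactly this, except x is millions of weights and the slope is computed by backpropagation.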

The Hypergeometric Distribution
We have seen that the binomial distribution rests on the assumption of an effectively infinite population, a condition that can be realized in practice by sampling from a finite population with replacement.
If this does not hold, meaning if we are sampling from a finite population without replacement, we must use the hypergeometric distribution. (In practice, when the population size N is large, the hypergeometric probability mass function tends toward the binomial.)
The hypergeometric distribution is used to calculate the probability of obtaining a certain number of successes in a series of binary trials (yes or no), which are dependent and have a variable probability of success.
The hypergeometric distribution allows us to answer questions like:
From a population of N elements, M of which meet certain requirements, if I draw a sample of size n, what is the probability that exactly x of the sampled elements meet those requirements?
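This question translates directly into a formula, P(X = x) = C(M, x) · C(N − M, n − x) / C(N, n), which can be sketched with Python's standard library. The playing-card example is illustrative, not from the post.

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(exactly x successes) when drawing n items without replacement
    from a population of N items, M of which meet the requirement."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Example: probability of exactly 1 ace in a 5-card hand from a 52-card deck.
p = hypergeom_pmf(1, N=52, M=4, n=5)
print(round(p, 4))  # about 0.2995
```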
Continue reading “The Hypergeometric Distribution”

The Negative Binomial Distribution (or Pascal Distribution)
The negative binomial distribution describes the number of trials needed to achieve a given number of successes in a series of independent trials. For example, it can be used to calculate the probability that exactly 5 flips of a coin are needed to obtain the third head, assuming the coin is balanced and therefore the probability of getting heads on each flip is 50%.
The negative binomial distribution is useful in many fields, including statistics, economics, biology, and physics. And also in “our” SEO.
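The coin example above can be computed with the formula P(X = n) = C(n − 1, r − 1) · pʳ · (1 − p)ⁿ⁻ʳ, the probability that the r-th success occurs exactly on trial n. A minimal Python sketch:

```python
from math import comb

def neg_binom_pmf(n, r, p):
    """P(the r-th success occurs exactly on trial n) for independent
    trials with success probability p: C(n-1, r-1) * p^r * (1-p)^(n-r)."""
    return comb(n - 1, r - 1) * p**r * (1 - p) ** (n - r)

# Probability that the 3rd head arrives exactly on the 5th flip of a fair coin.
prob = neg_binom_pmf(n=5, r=3, p=0.5)
print(prob)  # 0.1875
```

Intuitively: the first 4 flips must contain exactly 2 heads (C(4, 2) = 6 ways), and the 5th flip must be a head.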
Continue reading “The Negative Binomial Distribution (or Pascal Distribution)”

First Steps into the World of Probability: Sample Space, Events, Permutations, and Combinations
Probability and combinatorics are two fundamental concepts in mathematics and statistics that help us understand and interpret many phenomena in everyday life. In this introductory post, we’ll “touch upon” the main concepts together, seeing how they can be applied in various contexts.
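As a small taste of those concepts, Python's standard library already provides permutation and combination counts, which in turn let us compute simple probabilities over a sample space. The runner and coin examples below are illustrative, not from the post.

```python
from math import comb, perm

# Permutations: ordered arrangements. How many ways can gold, silver,
# and bronze be awarded among 5 runners?
print(perm(5, 3))  # 60

# Combinations: order does not matter. How many 3-person committees
# can be formed from 5 people?
print(comb(5, 3))  # 10

# Sample space: 3 coin flips give 2**3 = 8 equally likely outcomes,
# so P(exactly 2 heads) = C(3, 2) / 8.
print(comb(3, 2) / 2**3)  # 0.375
```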
Continue reading “First Steps into the World of Probability: Sample Space, Events, Permutations, and Combinations”