Logistic Regression: Predicting the Outcome of an Event

Logistic regression is a statistical model used to predict the probability of an event based on a set of independent variables. It’s particularly useful when you want to classify an observation as belonging to a specific category or not (for example, whether a customer will buy a product, or whether a patient will develop a disease).

It is a Supervised Machine Learning algorithm that can be used to model the probability of a specific class or event. It is used when the data are linearly separable (that is, when there exists a line or plane that cleanly separates the observations into different classes) and the outcome is binary or dichotomous. This means that logistic regression is typically used for binary classification problems (Yes/No, Correct/Incorrect, True/False, etc.).

In this post, I will demonstrate how to perform binomial logistic regression to build a classification model that predicts binary responses from a given set of predictors.
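As a quick preview of the idea, here is a minimal sketch of binomial logistic regression, assuming scikit-learn and synthetic data (both the library choice and the data are my own for illustration; the full post works through its own example):

```python
# A minimal sketch of binomial logistic regression on synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)

# Two predictors and a binary response that depends on them.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression().fit(X_train, y_train)

# predict_proba gives the estimated probability of each class;
# column 1 is P(y = 1) for each observation.
probabilities = model.predict_proba(X_test)[:, 1]
predictions = model.predict(X_test)
print(f"Test accuracy: {accuracy_score(y_test, predictions):.2f}")
```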

Continue reading “Logistic Regression: Predicting the Outcome of an Event”

Non-Parametric Tests: The Wilcoxon Test for Non-Normal Data

The Wilcoxon test is a non-parametric test used to compare two independent samples (the rank-sum form, equivalent to the Mann-Whitney U test) or a sample against a known reference value (the signed-rank form).
The test is used when the data do not follow a normal distribution, or when the parameters of the distribution are unknown.
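To make the two use cases concrete, here is a minimal sketch assuming scipy and synthetic, deliberately non-normal (exponential) data; the details are covered in the full post:

```python
# A minimal sketch of both Wilcoxon use cases on skewed synthetic data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Two independent samples drawn from skewed distributions.
a = rng.exponential(scale=1.0, size=30)
b = rng.exponential(scale=1.5, size=30)

# Rank-sum (Mann-Whitney U) test: two independent samples.
u_stat, p_two = stats.mannwhitneyu(a, b)
print(f"Mann-Whitney U = {u_stat:.1f}, p = {p_two:.3f}")

# Signed-rank test against a reference value: subtract the reference
# and test whether the differences are symmetric around zero.
reference = 1.0
w_stat, p_one = stats.wilcoxon(a - reference)
print(f"Wilcoxon signed-rank W = {w_stat:.1f}, p = {p_one:.3f}")
```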

Continue reading “Non-Parametric Tests: The Wilcoxon Test for Non-Normal Data”

The Beta Distribution Explained Simply

The Beta distribution is a crucial probability distribution in Bayesian statistics.

In theoretical probability problems, we know the exact probability of a single event, so it is relatively straightforward to apply the basic rules of probability to reach the desired result.

In real life, however, it’s much more common to deal with collections of observations, and it’s from this data that we must derive probability estimates.
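As a small illustration of going from observed data to a probability estimate, here is a minimal sketch using scipy’s Beta distribution; the counts and the uniform Beta(1, 1) prior are assumptions made for the example:

```python
# A minimal sketch: turning observed counts into a probability estimate
# with the Beta distribution (counts and prior are illustrative).
from scipy import stats

successes, failures = 27, 73  # e.g. 27 conversions out of 100 visits

# With a uniform Beta(1, 1) prior, the posterior is
# Beta(1 + successes, 1 + failures).
posterior = stats.beta(1 + successes, 1 + failures)

print(f"Posterior mean: {posterior.mean():.3f}")
print(f"95% credible interval: "
      f"({posterior.ppf(0.025):.3f}, {posterior.ppf(0.975):.3f})")
```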

Continue reading “The Beta Distribution Explained Simply”

Multicollinearity, Heteroscedasticity, Autocorrelation: Three Difficult-Sounding Concepts (Explained Simply)

In various posts, particularly those on regression analysis, analysis of variance, and time series, we’ve come across terms that seem deliberately designed to scare the reader.
The aim of these articles is to explain these key concepts simply, beyond their apparent complexity (something I really wanted as a student, instead of facing texts written in a needlessly convoluted and difficult way).
So, it’s time to spend a few words on three very important concepts that often recur in statistical analysis and need to be well understood. The reality is much, much clearer than it seems, so… don’t be afraid!

Continue reading “Multicollinearity, Heteroscedasticity, Autocorrelation: Three Difficult-Sounding Concepts (Explained Simply)”

Analysis of Variance, ANOVA. Explained Simply

Analysis of Variance (ANOVA) is a parametric test that evaluates the differences between the means of two or more groups of data.
It is a statistical hypothesis test widely used in scientific research, and it determines whether the means of at least two populations differ.
At a minimum, it requires a continuous dependent variable and a categorical independent variable that divides the data into comparison groups.
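As a minimal illustration of those prerequisites, here is a one-way ANOVA sketch assuming scipy and three synthetic groups (the group means and library choice are assumptions for the example):

```python
# A minimal sketch of one-way ANOVA: a categorical grouping variable
# (three synthetic groups) and a continuous dependent variable.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

group_a = rng.normal(loc=10.0, scale=2.0, size=25)
group_b = rng.normal(loc=11.5, scale=2.0, size=25)
group_c = rng.normal(loc=10.2, scale=2.0, size=25)

# H0: all group means are equal; a small p-value suggests at least
# one group mean differs from the others.
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```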

Continue reading “Analysis of Variance, ANOVA. Explained Simply”