Paolo Gironi - notes on data analysis, SEO, statistics, and retrocomputing

Descriptive Statistics: Measures of Variability (or Dispersion)

Measures of variability describe how widely observations are dispersed around a measure of central tendency.

In other words, measures of variability allow us to assess how data are spread around a central value, which may be represented, for example, by the mean or the median. They provide valuable information about the distribution of data, enabling a better understanding of the phenomenon under observation.

The techniques for measuring the variability of datasets are numerous. Among them, the most widely known (and most commonly used) are:

the range
the mean deviation and the variance
the standard deviation
the coefficient of variation
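
As a minimal sketch of the measures listed above, the following computes each of them with Python's standard library. The dataset is hypothetical, and the population formulas (dividing by n) are an assumption; a sample version would use `statistics.variance` and `statistics.stdev` instead.

```python
import statistics

# Hypothetical dataset, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(data)

# Range: distance between the largest and smallest observation.
data_range = max(data) - min(data)

# Mean (absolute) deviation: average distance from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)

# Population variance and standard deviation (divide by n).
variance = statistics.pvariance(data)
std_dev = statistics.pstdev(data)

# Coefficient of variation: standard deviation relative to the mean.
coeff_variation = std_dev / mean

print(data_range, mean_deviation, variance, std_dev, coeff_variation)
```

The coefficient of variation is unitless, which is what makes it convenient for comparing the dispersion of datasets measured on different scales.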

We will also visualise the concepts of central tendency and dispersion by revisiting skewness and introducing the concept of kurtosis.


Probability Distributions: Discrete Distributions and the Binomial

A random variable (also called a stochastic variable) is a variable whose value depends on the outcome of a random phenomenon. Many statistics textbooks abbreviate it as r.v. Its values are numerical.

When probability values are assigned to all the possible numerical values of a random variable x, the result is a probability distribution.

In even simpler terms: a random variable is a variable whose values are each associated with a probability of being observed. The set of all possible values of a random variable and their associated probabilities is called a probability distribution. The sum of all probabilities is 1.
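
To make this concrete, here is a small sketch that builds a discrete probability distribution and checks that its probabilities sum to 1. It uses the binomial distribution as the example; the function name `binomial_pmf` and the parameters n = 4 and p = 0.5 (four tosses of a fair coin) are illustrative assumptions.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials,
    each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical setup: number of heads in 4 tosses of a fair coin.
n, p = 4, 0.5

# Map every possible value of the random variable to its probability.
distribution = {k: binomial_pmf(k, n, p) for k in range(n + 1)}

# The probabilities of all possible values must add up to 1.
total = sum(distribution.values())

for k, prob in distribution.items():
    print(k, prob)
print("total:", total)
```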


Hypothesis Testing: A Step-by-Step Guide

In everyday life, we often have to make decisions based on incomplete information.

We may need to decide, for instance, whether a certain educational procedure is more effective than another, whether a new drug has genuinely positive effects on the course of a disease, and so on.

Hypothesis testing is a statistical procedure that allows us to pose a question about a population and answer it on the basis of sample information, so that the decision we reach is supported by statistical evidence.

In clearer and more direct terms: is my experimental finding due to chance? Hypothesis testing is precisely a statistical procedure for verifying whether chance is a plausible explanation of an experimental result.
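
One way to sketch this idea is a two-sided one-sample z-test, which asks how likely a sample mean at least this far from the hypothesised mean would be if chance alone were at work. The function name `z_test_two_sided` and all the numbers are hypothetical, and the sketch assumes the population standard deviation is known.

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_sided(sample_mean, mu0, sigma, n):
    """Return the z statistic and two-sided p-value for a one-sample
    z-test, assuming the population standard deviation sigma is known."""
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Probability of a result at least this extreme under the null hypothesis.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical example: null hypothesis mean 100, sigma 15,
# and a sample of 100 observations with mean 103.
z, p_value = z_test_two_sided(sample_mean=103.0, mu0=100.0, sigma=15.0, n=100)
print(z, p_value)
```

A small p-value suggests that chance is an implausible explanation for the observed result; the threshold for "small" (often 0.05) is a convention chosen before looking at the data.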


The t Distribution and Hypothesis Testing

In a previous article we presented the concept of hypothesis testing, a statistical method widely used to assess the validity of a claim based on a sample of data.

In the examples we proposed, however, we knew the value of the population standard deviation, sigma. That knowledge allowed us to use the normal distribution and compute the Z-score, but in practice it is rarely available.

If instead we do not know the population sigma, or if we are working with small samples, we must turn to a different type of distribution, called the t distribution or Student’s distribution.

Put more simply and clearly:

Student’s t distribution is a probability distribution used to assess the statistical significance of results when dealing with small sample sizes and uncertainty about the variance.
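
As a rough illustration, the sketch below computes a one-sample t statistic, replacing the unknown population sigma with the sample standard deviation. The data, the hypothesised mean `mu0`, and the function name `one_sample_t` are all hypothetical; the critical value 2.262 is the two-sided 5% value for 9 degrees of freedom, as found in a standard t table.

```python
from math import sqrt
from statistics import fmean, stdev

def one_sample_t(data, mu0):
    """Return the t statistic and degrees of freedom for a one-sample
    t-test, estimating sigma with the sample standard deviation."""
    n = len(data)
    s = stdev(data)  # sample standard deviation (divides by n - 1)
    t = (fmean(data) - mu0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical small sample and hypothesised population mean.
data = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.5, 5.7, 5.2, 5.9]
t, df = one_sample_t(data, mu0=5.0)

# Two-sided critical value at alpha = 0.05 for df = 9, from a t table.
T_CRITICAL = 2.262
reject_null = abs(t) > T_CRITICAL
print(t, df, reject_null)
```

The only structural difference from the Z-score is the use of the sample standard deviation and a reference distribution that depends on the degrees of freedom.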


The Two-Sample t-Test: How to Test a Hypothesis for Dependent or Independent Samples

In a previous article we discussed hypothesis testing as it relates to a single measurement: the sample mean.

There are, however, numerous situations in which we need to carry out statistical analysis involving two samples. Think, for example, of the case where we want to study the difference between men and women with respect to the results of a given examination.
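
For a comparison like this one, where the two groups are independent, a possible sketch is Welch's t statistic, which does not assume equal variances. This is one common choice rather than necessarily the method used in the article; the scores and the function name `welch_t` are hypothetical.

```python
from math import sqrt
from statistics import fmean, variance

def welch_t(a, b):
    """Return Welch's t statistic and the Welch-Satterthwaite degrees
    of freedom for two independent samples."""
    va = variance(a) / len(a)  # squared standard error of mean of a
    vb = variance(b) / len(b)  # squared standard error of mean of b
    t = (fmean(a) - fmean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    return t, df

# Hypothetical examination scores for two independent groups.
men = [72, 75, 68, 80, 77]
women = [78, 82, 74, 85, 81]

t, df = welch_t(men, women)
print(t, df)
```

For dependent (paired) samples, the usual approach is instead to compute the per-pair differences and run a one-sample t-test on them against a mean of zero.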
