Statistics and SEO

Over the years, I’ve written a series of posts meant to introduce the foundational topics of descriptive statistics, inferential statistics, and time series analysis. I’m grouping them here so they form a path: a journey that I hope will be stimulating.

Statistics and SEO: The Topics

1. The Data: The 4 Scales of Measurement
Quantitative and qualitative data | The 4 levels of measurement | Nominal scale | Ordinal scale | Interval scale | Ratio scale | Complexity levels of measurement types

2. Descriptive Statistics: Measures of Position and Central Tendency
Measures of central tendency | Arithmetic mean | Weighted mean | Geometric mean | Harmonic mean | Trimmed mean | Median | Mode | Relationship between mean, median and mode | Quartiles, deciles and percentiles | The five-number summary | Box-plot

3. Descriptive Statistics: Measures of Variability (or Dispersion) — coming soon
Range | Mean deviation | Variance | Standard deviation | Coefficient of variation | Shape of a distribution | Kurtosis

4. First Steps into the World of Probability: Sample Space, Events, Permutations, and Combinations
Probability | Addition rule for mutually exclusive events | Multiplication principle | Permutations | Combinations | The binomial distribution as an application

5. Probability Distributions: Discrete Distributions – The Binomial — coming soon
Discrete and continuous variables | Bernoulli random variable | Binomial distribution | Binomial coefficient | Mean, expected value, variance | The hypergeometric distribution

6. The Beta Distribution Explained Simply
An important probability distribution in Bayesian statistics | A practical example using R

7. The Geometric Distribution — coming soon
How many attempts until the first success? | Examples | Using R or TI-83

8. The Hypergeometric Distribution
Starting from the formula | The hypergeometric distribution explained with examples | The urn and balls example | Further reading

9. The Negative Binomial Distribution (or Pascal Distribution)
Definition | Usage examples | Differences between the geometric and Pascal distributions

10. The Poisson Distribution — coming soon
Lambda: the average rate of events | Poisson vs Binomial | Practical examples | SEO applications

11. The Normal Distribution
Visualizing the “normality” of data | Transforming data | The empirical rule | Standardization | Examples | Chebyshev’s inequality

12. The Central Limit Theorem: Why Statistics Works (Even When Data Isn’t Normal)
What is the CLT | Why it matters | Simulation in R | The practical rule: how large should n be? | Standard error | Daily organic traffic example | When the CLT is not enough

13. Hypothesis Testing — coming soon
Statistical hypotheses | Type I and II errors | One-tailed or two-tailed? | Setting null and alternative hypotheses | Significance level | Choosing the distribution | Drawing conclusions | Power of a test | Determining sample size

14. The t-Distribution and Hypothesis Testing — coming soon
A brief historical digression | Example | The p-value approach | Confidence intervals | The t-test with R

15. The Two-Sample t-Test — coming soon
Independent samples hypothesis test | Paired t-test for dependent samples | Example

16. Confidence Intervals: What They Are, How to Calculate Them (and What They Do NOT Mean)
What is a CI | The 95% misconception | CI for means | CI for proportions | CI vs hypothesis testing | Confidence levels: 90%, 95%, 99% | What affects CI width | Practical example: organic CTR

17. Guide to Statistical Tests for A/B Analysis
Z test | Student’s t-test | Welch’s t-test | Chi-square test | Analysis of Variance (ANOVA) | Mann-Whitney U test | Fisher’s exact test | Regression analysis | Comparative overview table

18. Bayesian Statistics: How to Learn from Data, One Step at a Time
Frequentist vs Bayesian | Bayes’ theorem: derivation and components (prior, likelihood, posterior, evidence) | Numerical example in R: ad campaign click rate | Sequential updating | Informative and non-informative priors | Credible interval vs confidence interval | When to use the Bayesian approach

19. Anomaly Detection: How to Identify Outliers in Your Data
Why recognizing anomalies matters | Working dataset: simulated sessions with injected anomalies | Method 1: z-score and the empirical rule | Method 2: IQR and Tukey’s method | Method 3: Grubbs’ test and iterative approach | Comparing the three methods on web traffic data

20. Contingency Tables and Conditional Probability — coming soon
Two-way tables and marginal distributions | Conditional probability | Dependence and independence

21. The Chi-Square Test: Goodness of Fit and Test of Independence
Goodness of Fit test | Understanding through examples | Using R | The Independence test

22. Parametric and Non-Parametric Statistical Tests
Parametric tests: the power of normality | Non-parametric tests: versatility and creativity

23. Analysis of Variance (ANOVA), Explained Simply
ANOVA: a parametric test | Why ANOVA instead of multiple t-tests? | One-way ANOVA | The ANOVA table | Using R

24. The Gini Index: What It Is, Why It Matters, How to Calculate It in R — coming soon
The Lorenz curve | Example | The concentration index R | Computing in R | The Gini index worldwide

25. Correlation and Regression Analysis – Linear Regression
Simple regression | Pearson’s r | R-squared | Spearman’s rank correlation | Regression equation | Outliers and leverage points | Model assumptions | Residual analysis | Other correlation coefficients

26. Multiple Regression Analysis, Explained Simply
The multiple regression equation | What information can I extract? | Prerequisites | Getting started | How good is my model? | Summary

27. Logistic Regression: Predicting the Outcome of an Event
How logistic regression works | Example in R: Titanic survival probability | The logit equation | Summary | Resources

28. Time Series Analysis and Forecasting in R
What is a time series | Classical analysis and decomposition | The four classic components | Creating time series in R | Smoothing techniques | SEO applications | Moving averages | LOESS decomposition | Holt-Winters exponential smoothing | ARIMA models

29. Multicollinearity, Heteroscedasticity, Autocorrelation: Three Difficult-Sounding Concepts (Explained Simply)
Multicollinearity | How to reduce the problem | Heteroscedasticity | Autocorrelation

30. Understanding the Basics of Machine Learning: A Beginner’s Guide — coming soon
Introduction | What is ML | Supervised and unsupervised ML | Main stages | How to get started | Jupyter Lab and Google Colab

Additional Topics

Non-Parametric Tests: The Wilcoxon Test for Non-Normal Data
Wilcoxon signed-rank test | Wilcoxon rank-sum test | Practical examples with R

The Monte Carlo Method Explained Simply with Real-World Applications
What is Monte Carlo simulation | Random sampling | Practical applications | R examples

How to Use Decision Trees to Classify Data
Decision tree algorithm | Classification and regression trees | Practical examples

The Gradient Descent Algorithm Explained Simply
How gradient descent works | Learning rate | Practical implementation


This blog is listed on R-bloggers.com, an aggregator of R tutorials and news from the R community.