Statistics is one of the most important foundations of any machine learning application. A basic knowledge of its various areas is therefore indispensable for anyone who wants to understand the algorithms behind machine learning in more detail.
In general, statistical methods aim to describe the relationships between different variables and to express these relationships mathematically. Ultimately, the goal is often to uncover cause-and-effect relationships (causation), for example:
- How much does an exam grade improve if you study more?
- How does the election result change depending on the campaign that was run?
- Is it safer to fly by plane or to take the train?
To examine such relationships more precisely, data analysis also includes tools for evaluating, displaying, and summarizing large amounts of data. Graphical evaluations such as bar charts, pie charts, or line charts are just as much a part of the statistical repertoire as the calculation of means or medians.
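As a small illustration of summarizing data with a mean and a median, the sketch below uses Python's built-in `statistics` module on a hypothetical list of exam scores. The single outlier shows why both measures are useful: the mean is pulled down by it, while the median barely moves.

```python
from statistics import mean, median

# Hypothetical exam scores; the value 12 is a deliberate outlier
scores = [68, 72, 75, 75, 80, 12]

m = mean(scores)      # arithmetic mean: sum of values divided by their count
med = median(scores)  # middle value of the sorted data (average of the two middle values here)

print(f"mean:   {m:.2f}")
print(f"median: {med}")
```

Here the median (73.5) describes the typical score better than the mean (about 63.67), because only the mean is sensitive to the one extreme value.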
Some of our Articles in the Field of Statistics
What is the Standard Deviation?
Understand the standard deviation: definition, calculation, and interpretation. Learn how to measure data variability with examples.
What is Selection Bias?
Learn how selection bias can skew your data analysis and lead to errors in decision-making, and how to avoid it.
t-SNE: t-Distributed Stochastic Neighbor Embedding
Visualize complex data with t-SNE: a powerful dimensionality reduction technique. Learn how it works and its applications in data science.
Principal Component Analysis – easily explained!
Principal component analysis explained with examples, including its prerequisites.
Population and Sample – simply explained!
Definition of population and sample with examples, the advantages of sampling, and common sampling methods.
Correlation and Causation – easily explained!
Correlation and causality: the differences explained with examples, how to calculate the correlation coefficient, and how to establish causality.
Normal Distribution – easily explained!
The normal distribution: definition, a calculation example, and the distinction between the density function and the distribution function.
Expected Value – easily explained!
The expected value explained with examples, including how it differs from the arithmetic mean.
Difference between statistical methods and stochastics
In everyday language, probability theory is often mistakenly lumped in with statistics. Strictly speaking, however, statistics is only one subfield of so-called stochastics. In addition to data analysis, stochastics also includes probability theory, i.e. all calculations relating to random experiments such as tossing a coin, rolling dice, or placing bets.
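A typical probability-theory calculation for a random experiment, sketched here in Python under the assumption of two fair six-sided dice, is the probability that the two faces sum to 7. Enumerating all equally likely outcomes and counting the favorable ones gives the exact answer.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair six-sided dice
outcomes = list(product(range(1, 7), repeat=2))

# Outcomes whose faces sum to 7: (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
favorable = [o for o in outcomes if sum(o) == 7]

# Probability = favorable outcomes / possible outcomes, kept as an exact fraction
p_seven = Fraction(len(favorable), len(outcomes))
print(p_seven)  # 1/6
```

Using `Fraction` instead of floating-point division keeps the result exact, which makes the reasoning (6 favorable out of 36 possible) easy to read off.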
This distinction matters because statistical methods in this narrower sense, i.e. the analysis of data, are not the same as probability calculations, even though the two are sometimes confused. For machine learning algorithms, statistical methods are the more important of the two and form one of the most significant foundations of ML. Within artificial intelligence, probabilities come into play most visibly when a model outputs its results. A machine learning algorithm can never make a prediction with complete certainty; instead, it outputs its results together with probabilities that express how certain it is about the outcome. A probability of 99.5%, for example, means that the model is very confident that its prediction is correct, although this is only a reliable guide if the model's probabilities are well calibrated.
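One common way, though not the only one, in which classification models turn raw output scores into such probabilities is the softmax function. The sketch below is plain Python; the class labels and score values are hypothetical and chosen only for illustration.

```python
import math

def softmax(scores):
    """Convert raw model scores (logits) into probabilities that sum to 1."""
    # Subtracting the maximum avoids overflow in exp() and does not change the result
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores a classifier produced for three classes
labels = ["cat", "dog", "bird"]
logits = [4.0, 1.0, 0.5]
probs = softmax(logits)

for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
```

With these scores the model assigns roughly 93% to "cat", which would be read as the model being fairly confident in that prediction, but never completely certain, since every class keeps a nonzero probability.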
Statistical methods are one of the most important foundations for understanding and correctly applying machine learning models. The articles in this chapter explain the methods that are indispensable for basic machine learning.