Statistics is indisputably one of the most important foundations of any machine learning application. Basic knowledge of its various subfields is therefore indispensable for anyone who wants to understand the algorithms behind machine learning in detail.
In general, statistical methods aim to express relationships between different variables, and the inferences drawn from them, in mathematical terms. In other words, cause-and-effect relationships (causation) are sought:
- How much does the grade on an exam improve if you study more?
- How does the election result change depending on the campaign that was run?
- Is it safer to fly by plane or to take the train?
To examine such relationships more closely, data analysis also includes tools for evaluating, displaying, and summarizing large amounts of data. Graphical evaluations, such as bar charts, pie charts, or line charts, are just as much a part of the statistical repertoire as the calculation of means or medians.
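The summary statistics mentioned above can be computed directly with Python's standard library. A minimal sketch, using made-up exam scores purely for illustration:

```python
import statistics

# Hypothetical exam scores, invented for this example
scores = [52, 67, 67, 71, 74, 80, 85, 93, 98]

mean_score = statistics.mean(scores)      # arithmetic mean of all values
median_score = statistics.median(scores)  # middle value of the sorted data

print(f"Mean:   {mean_score:.2f}")
print(f"Median: {median_score}")
```

The median is often preferred over the mean when the data contain outliers, since a single extreme value shifts the mean but barely affects the median.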
Some of our Articles in the Field of Statistics
- What is Gibbs Sampling? Explore Gibbs sampling: learn its applications, implementation, and how it is used in real-world data analysis.
- What is a Bias? Understand, recognize, and address bias, its impact, and measures to mitigate it.
- What is Variance? Explore the role of variance in statistics and data analysis, and understand how it measures data dispersion.
- What is the Kullback-Leibler Divergence? Explore the Kullback-Leibler divergence, a vital metric in information theory and machine learning, and its applications.
- What is Maximum Likelihood Estimation? Understand Maximum Likelihood Estimation (MLE), a powerful statistical tool for parameter estimation and data modeling.
- What is the Variance Inflation Factor (VIF)? Learn how the Variance Inflation Factor (VIF) detects multicollinearity in regression models for better data analysis.
Difference between statistical methods and stochastics
In everyday language, probability theory is often mistakenly equated with statistics, although this is not accurate. Statistics is merely one subfield of so-called stochastics, which, in addition to data analysis, also includes probability theory, i.e. all calculations relating to random experiments such as coin tosses, dice rolls, or bets.
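Such a random experiment is easy to simulate. The following sketch rolls a fair die many times and compares the empirical frequency of a six with its theoretical probability of 1/6 (the seed and number of rolls are arbitrary choices):

```python
import random

random.seed(42)  # fixed seed so the experiment is reproducible

n_rolls = 60_000
rolls = [random.randint(1, 6) for _ in range(n_rolls)]

# Empirical frequency of rolling a six vs. the theoretical probability 1/6
freq_six = rolls.count(6) / n_rolls
print(f"Empirical: {freq_six:.4f}, theoretical: {1 / 6:.4f}")
```

By the law of large numbers, the empirical frequency approaches 1/6 as the number of rolls grows.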
This distinction matters because statistical methods do not include probability calculations, even though this is sometimes erroneously claimed. Statistical calculations are clearly more important for machine learning algorithms and form one of their most significant foundations. Within artificial intelligence, probabilities come into play mainly when outputting results: a machine learning algorithm can never make a prediction with complete certainty. Instead, results are output together with probabilities that express how confident the algorithm is about the outcome. A probability of 99.5% therefore means that the model is very sure its prediction is correct.
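As a rough sketch of how such confidence values arise: classifiers typically convert raw model scores into probabilities, for example with the softmax function. The class labels and scores below are invented for illustration:

```python
import math

def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    exps = [math.exp(x - max(logits)) for x in logits]  # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores of a classifier for three classes
logits = [4.2, 1.1, 0.3]
probs = softmax(logits)

for label, p in zip(["cat", "dog", "bird"], probs):
    print(f"{label}: {p:.1%}")
```

The highest raw score always receives the highest probability; the output merely re-expresses the scores on a 0-to-1 scale that can be read as the model's confidence.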
Conclusion
Statistical methods are one of the most important foundations for understanding and correctly applying models in the field of machine learning. The articles in this chapter aim to explain the methods that are indispensable for a basic understanding of machine learning.