Normal Distribution – easily explained!

The normal distribution, or Gaussian distribution, is one of the most important continuous probability distributions because many quantities we encounter in everyday life are approximately normally distributed. Body height (within a gender), a swimmer's 100m times across different races, and even something as specific as the weight of coffee packets approximately follow the Gaussian distribution once the sample is sufficiently large.

If we repeat a random experiment, such as measuring a swimmer's time again and again, we want to obtain a so-called density function. It tells us how likely outcomes near a certain value are. For example, we might be interested in how likely it is that the swimmer completes the 100m in a time of about 1:15 min. Additionally, we might be interested in the probability that the athlete swims the 100m in at most 1:15 min. We can answer this question with the help of the distribution function, which gives the probability that the result of the random experiment is less than or equal to a certain value.

What is the definition of the Normal Distribution?

A continuous random variable X with a density function f(x) of the form

\[f(x) = \frac{1}{\sigma \sqrt{2 \pi}} \cdot e^{-\frac{1}{2} \cdot \frac{(x - \mu)^2}{\sigma^2}}\]

with the expected value µ and the variance σ² is called normally distributed (short: N(µ, σ²)). The expected value µ…

  • … is a real number, so it can also be negative.
  • … is the x-coordinate of the maximum of the density function.

The variance σ²…

  • … is the squared standard deviation σ.
  • … must always be greater than 0.
  • … determines how much the graph is stretched or compressed horizontally. Low variance means that the graph is narrow. 
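To make the formula concrete, the density can be written out in a few lines of code. The following is a minimal sketch in Python using only the standard library. The function name `normal_pdf` and the values µ = 180 and σ² = 7 (borrowed from the height example in this article) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def normal_pdf(x: float, mu: float, variance: float) -> float:
    """Density of N(mu, variance) at x, written out term by term."""
    if variance <= 0:
        raise ValueError("variance must be greater than 0")
    sigma = math.sqrt(variance)
    return math.exp(-0.5 * (x - mu) ** 2 / variance) / (sigma * math.sqrt(2 * math.pi))


mu, variance = 180.0, 7.0  # illustrative values, not prescribed by the formula

# The hand-written formula should agree with the standard library implementation.
reference = NormalDist(mu=mu, sigma=math.sqrt(variance))
matches_reference = math.isclose(normal_pdf(176, mu, variance), reference.pdf(176))

# The maximum of the density sits at x = mu ...
peak = normal_pdf(mu, mu, variance)
# ... and a smaller variance gives a narrower, taller curve.
narrow_peak = normal_pdf(mu, mu, variance / 4)

print(f"agrees with NormalDist.pdf: {matches_reference}")
print(f"peak height, variance 7:    {peak:.4f}")
print(f"peak height, variance 1.75: {narrow_peak:.4f}")
```

Dividing the variance by four halves σ, so the curve becomes half as wide and, because the total area must stay 1, twice as tall.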

What is the Density function?

In connection with the normal distribution, the density function is usually shown as its well-known bell curve. The height of the curve at a value x indicates how likely outcomes near x are. Because the variable is continuous, probabilities correspond to areas under the curve: the probability that the outcome falls into an interval is the area under the curve above that interval.

The graph depicts the normal distribution of heights in centimeters measured in male subjects. The expected value µ = 180 is the average height and marks the peak of the curve, so heights close to 180cm are the most common. The variance σ² in this example is 7. The density at x = 176 is about 0.05, which means that the measured height of a random male test subject rounds to 176cm (lies between 175.5cm and 176.5cm) with a probability of about 5%.

[Figure: the characteristic bell curve of the normal distribution on a grid background.]
Normal Distribution Density Function | Source: Author
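For a continuous variable, the height of the bell curve is a density, and probabilities come from areas under the curve. The sketch below compares the two for the height example, assuming µ = 180 and σ² = 7 as stated in the text. The variable names and the choice of the interval from 175.5cm to 176.5cm (heights that round to 176cm) are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Height model from the example: variance 7, so sigma = sqrt(7).
heights = NormalDist(mu=180, sigma=sqrt(7))

# Height of the bell curve at 176 cm (a density, not a probability).
density_at_176 = heights.pdf(176)

# Probability that a measured height rounds to 176 cm,
# i.e. the area under the curve between 175.5 cm and 176.5 cm.
prob_rounds_to_176 = heights.cdf(176.5) - heights.cdf(175.5)

print(f"density at 176 cm:       {density_at_176:.4f}")
print(f"P(175.5 <= X <= 176.5):  {prob_rounds_to_176:.4f}")
```

Because the interval is exactly one unit wide and the curve is nearly straight over it, the area is close to the curve height, which is why the density value can be read as an approximate probability per centimeter.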

What is the distribution function?

The distribution function F(x) of the normal distribution is defined by

\[F(x) = \frac{1}{\sigma \sqrt{2 \pi}} \cdot \int_{-\infty}^{x} e^{-\frac{1}{2} \cdot \frac{(t - \mu)^2}{\sigma^2}} \, dt\]

It is thus the integral of the density function f from −∞ up to the value x. Accordingly, the distribution function indicates the probability that the random variable X takes on a value less than or equal to x:

\[ F(x) = Prob(X \leq x) \]

For x = 176, with µ = 180 and σ² = 7, the distribution function gives a probability of about 6.5%. A random male person is thus 176cm tall or shorter with a probability of about 6.5%.
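The relationship between the density and the distribution function can be checked numerically. The following sketch, assuming the height example with µ = 180 and σ² = 7, evaluates F(176) once with the built-in `NormalDist.cdf` and once by integrating the density with the trapezoidal rule. The helper `cdf_by_integration` and its parameters are hypothetical names introduced for illustration, and −∞ is replaced by a point ten standard deviations below the mean.

```python
from math import sqrt
from statistics import NormalDist

heights = NormalDist(mu=180, sigma=sqrt(7))


def cdf_by_integration(dist: NormalDist, x: float,
                       lower_sigmas: float = 10.0, steps: int = 20_000) -> float:
    """Approximate F(x) by integrating the density with the trapezoidal rule.

    Minus infinity is replaced by mean - lower_sigmas * stdev, where the
    density is negligibly small.
    """
    a = dist.mean - lower_sigmas * dist.stdev
    if x <= a:
        return 0.0
    h = (x - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(x))
    for i in range(1, steps):
        total += dist.pdf(a + i * h)
    return total * h


exact = heights.cdf(176)
approx = cdf_by_integration(heights, 176)

print(f"F(176) via cdf():       {exact:.4f}")
print(f"F(176) via integration: {approx:.4f}")
```

Both routes should give essentially the same number, which illustrates that the distribution function is the accumulated area under the density curve.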

What is the empirical rule of normal distribution?

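The empirical rule, also known as the 68-95-99.7 rule, is a statistical guideline for the normal distribution. It states that:

  • about 68% of the values lie within one standard deviation of the mean (µ ± σ),
  • about 95% of the values lie within two standard deviations of the mean (µ ± 2σ),
  • about 99.7% of the values lie within three standard deviations of the mean (µ ± 3σ).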

This rule can be helpful in interpreting and understanding data that follows a normal distribution. For example, if we know that a data set is normally distributed and we calculate its mean and standard deviation, we can use the empirical rule to estimate the proportion of the data that falls within certain ranges.

Example of the Standard Deviation of Grades | Source: Author

It is important to note that the percentages in the empirical rule are rounded, and that the rule only applies to data that follow a normal distribution, at least approximately. For skewed or heavy-tailed data, or for categorical data, the stated proportions can be far off. Nevertheless, the empirical rule can be a useful tool for gaining insight into normally distributed data.
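The percentages of the 68-95-99.7 rule can be reproduced from the distribution function. The sketch below is a minimal illustration using Python's standard library. The helper name `share_within` and the grade distribution with mean 70 and standard deviation 12 are hypothetical values chosen for the example.

```python
from statistics import NormalDist


def share_within(k: float, dist: NormalDist) -> float:
    """Share of a normal distribution within k standard deviations of its mean."""
    return dist.cdf(dist.mean + k * dist.stdev) - dist.cdf(dist.mean - k * dist.stdev)


standard = NormalDist()             # mean 0, standard deviation 1
grades = NormalDist(mu=70, sigma=12)  # hypothetical grade distribution

for k in (1, 2, 3):
    print(f"within {k} sd: standard {share_within(k, standard):.4f}, "
          f"grades {share_within(k, grades):.4f}")

# Applying the rule to the hypothetical grades: range covering about 95%.
low, high = grades.mean - 2 * grades.stdev, grades.mean + 2 * grades.stdev
print(f"about 95% of grades lie between {low:.0f} and {high:.0f}")
```

The shares depend only on the number of standard deviations k, so both distributions print the same values for each k.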

What are the alternatives to the normal distribution?

While the normal distribution is a commonly used distribution for modeling continuous random variables in statistics, there are situations where other distributions may be more appropriate. Here are some of the alternatives to the normal distribution:

  1. Binomial distribution: used to model the number of successes in a fixed number of trials, with each trial having a binary outcome (e.g., heads or tails)
  2. Poisson distribution: used to model the number of events occurring in a fixed interval of time or space, when the events occur independently of each other at a constant average rate
  3. Exponential distribution: used to model the time between consecutive events occurring in a Poisson process, such as the time between two earthquakes or the time between two customers arriving at a store
  4. Gamma distribution: a family of distributions that includes the exponential distribution as a special case, and can be used to model the waiting time until a specified number of events occur in a Poisson process
  5. Beta distribution: used to model probabilities or proportions that have a bounded range, such as the proportion of voters in favor of a particular candidate
  6. Weibull distribution: used to model the time to failure of a system, with the failure rate increasing or decreasing over time
  7. Uniform distribution: used to model random variables with a constant probability density function over a finite range.
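Several of the distributions in this list can be sampled directly with Python's standard `random` module, which gives a quick feel for how they behave. The sketch below draws from six of them and compares each sample mean with the theoretical mean. All parameter values are arbitrary assumptions for illustration, the binomial draw is simulated as a sum of yes/no trials, and the Poisson distribution is left out because the standard library has no sampler for it.

```python
import math
import random
from statistics import fmean

rng = random.Random(42)  # fixed seed so repeated runs give the same draws
n = 50_000


def binomial_draw(trials: int, p: float) -> int:
    """Number of successes in `trials` independent yes/no trials."""
    return sum(rng.random() < p for _ in range(trials))


# name -> (list of draws, theoretical mean)
draws = {
    "binomial (10 trials, p=0.3)": (
        [binomial_draw(10, 0.3) for _ in range(n)], 10 * 0.3),
    "exponential (rate 2)": (
        [rng.expovariate(2.0) for _ in range(n)], 1 / 2.0),
    "gamma (shape 3, scale 2)": (
        [rng.gammavariate(3.0, 2.0) for _ in range(n)], 3.0 * 2.0),
    "beta (a=2, b=5)": (
        [rng.betavariate(2.0, 5.0) for _ in range(n)], 2.0 / (2.0 + 5.0)),
    "weibull (scale 1, shape 1.5)": (
        [rng.weibullvariate(1.0, 1.5) for _ in range(n)], math.gamma(1 + 1 / 1.5)),
    "uniform on [0, 10]": (
        [rng.uniform(0.0, 10.0) for _ in range(n)], 5.0),
}

for name, (values, expected) in draws.items():
    print(f"{name:30s} sample mean {fmean(values):7.4f}   "
          f"theoretical mean {expected:7.4f}")
```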

It is important to choose the appropriate distribution based on the nature of the data and the research question at hand. Since choosing the right distribution is immensely important for the later results and there are so many different choices, we will now look at how to find the optimal distribution for the data set.

How to choose the appropriate data distribution?

When working with data, it is crucial to choose the appropriate distribution for the given dataset. Selecting the wrong distribution can lead to incorrect assumptions about the data and affect the results of any analysis or modeling performed.

One approach to choosing the correct distribution is to examine the characteristics of the data. For instance, if the data is roughly symmetric with a single peak or mode, it may be appropriate to assume a normal distribution. Alternatively, if the data is positively skewed, it may be appropriate to assume a log-normal or gamma distribution. On the other hand, if the data is negatively skewed, a Weibull distribution with a large shape parameter or, for bounded data, a beta distribution may be more appropriate.
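Skewness can be estimated directly from a sample before picking a candidate distribution. The following sketch computes a simple moment-based sample skewness with the standard library. The function name `sample_skewness` and the three synthetic data sets are assumptions made up for illustration, not data from this article.

```python
import random
from math import sqrt
from statistics import fmean, pstdev


def sample_skewness(data: list[float]) -> float:
    """Moment-based skewness: mean of the cubed standardized deviations."""
    m = fmean(data)
    s = pstdev(data, mu=m)
    return fmean(((x - m) / s) ** 3 for x in data)


rng = random.Random(0)  # fixed seed so repeated runs give the same draws
n = 20_000

symmetric = [rng.gauss(180, sqrt(7)) for _ in range(n)]          # normal
right_skewed = [rng.lognormvariate(0.0, 0.5) for _ in range(n)]  # log-normal
left_skewed = [rng.weibullvariate(1.0, 10.0) for _ in range(n)]  # Weibull, shape 10

skew_symmetric = sample_skewness(symmetric)
skew_right = sample_skewness(right_skewed)
skew_left = sample_skewness(left_skewed)

print(f"normal sample:     skewness {skew_symmetric:+.2f}")
print(f"log-normal sample: skewness {skew_right:+.2f}")
print(f"Weibull sample:    skewness {skew_left:+.2f}")
```

A value near zero is compatible with a symmetric distribution such as the normal, a clearly positive value points toward right-skewed candidates, and a clearly negative value toward left-skewed ones. A histogram of the data remains a useful companion to this single number.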

Another approach is to use statistical goodness-of-fit tests to compare the fit of different distributions to the data. Some commonly used tests include the Kolmogorov-Smirnov test, the Anderson-Darling test, and the chi-squared goodness-of-fit test. These tests can rule out distributions that fit poorly and help to determine which candidate provides the best fit for the data.
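As an illustration of such a test, the sketch below implements a basic one-sample Kolmogorov-Smirnov test with the standard library only. It measures the largest gap D between the empirical distribution function of the sample and a fully specified reference distribution, here N(180, 7) from the height example, and converts it into an approximate p-value using the asymptotic Kolmogorov series. The function names and the synthetic samples are assumptions for illustration. In practice a statistics library (for example `scipy.stats.kstest`) would typically be used, and the p-value is no longer reliable when the reference parameters are estimated from the same sample.

```python
import math
import random
from statistics import NormalDist


def ks_statistic(data, cdf) -> float:
    """Largest vertical gap between the empirical CDF and the reference CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue_asymptotic(d: float, n: int, terms: int = 100) -> float:
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # the p-value is practically 1 in this range
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


rng = random.Random(1)  # fixed seed so repeated runs give the same draws
n = 2_000
reference = NormalDist(mu=180, sigma=math.sqrt(7))

normal_sample = [rng.gauss(180, math.sqrt(7)) for _ in range(n)]
skewed_sample = [176 + rng.expovariate(0.25) for _ in range(n)]  # mean 180, skewed

d_normal = ks_statistic(normal_sample, reference.cdf)
d_skewed = ks_statistic(skewed_sample, reference.cdf)
p_normal = ks_pvalue_asymptotic(d_normal, n)
p_skewed = ks_pvalue_asymptotic(d_skewed, n)

print(f"normal sample: D = {d_normal:.4f}, p = {p_normal:.4f}")
print(f"skewed sample: D = {d_skewed:.4f}, p = {p_skewed:.2e}")
```

A small D with a large p-value means the sample is compatible with the reference distribution, while a large D with a tiny p-value indicates a poor fit.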

It is also important to consider the context of the analysis or modeling. For example, if the data represents a count of discrete events, it may be appropriate to assume a Poisson or negative binomial distribution. If the data represents a proportion, a beta distribution may be more appropriate.

Ultimately, choosing the correct distribution for data requires careful consideration and understanding of the data and the context in which it will be used.

What are hypothesis tests and how do they use the normal distribution?

Hypothesis testing is a statistical method used to determine whether a hypothesis about a population parameter is likely to be true or not based on sample data. The normal distribution is a commonly used distribution in hypothesis testing because many natural phenomena are distributed normally. In hypothesis testing, we start by formulating a null hypothesis, which is a statement about the population parameter that we are testing. We then collect sample data and use it to calculate a test statistic.

If the sample data are consistent with the null hypothesis, we fail to reject it, which is not the same as proving it true. If the sample data contradict the null hypothesis, we reject it in favor of the alternative hypothesis. To decide, we compare the test statistic to a critical value determined by the significance level and, for some tests such as the t-test, the degrees of freedom. In a one-sided test we reject the null hypothesis if the test statistic exceeds the critical value; in a two-sided test we reject it if the absolute value of the test statistic exceeds the critical value.
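The steps described above can be sketched for the simplest case, a two-sided one-sample z-test for the mean with a known standard deviation. This is only one of many hypothesis tests and is chosen here because it uses the normal distribution directly. The function name `z_test`, the null hypothesis µ₀ = 180, the known variance of 7, and the twelve height measurements are hypothetical values made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_test(sample, mu0: float, sigma: float, alpha: float = 0.05):
    """Two-sided one-sample z-test for the mean with known standard deviation.

    Returns (test statistic, critical value, p-value, reject null hypothesis?).
    """
    n = len(sample)
    z = (fmean(sample) - mu0) / (sigma / sqrt(n))
    standard_normal = NormalDist()
    critical = standard_normal.inv_cdf(1 - alpha / 2)
    p_value = 2 * (1 - standard_normal.cdf(abs(z)))
    return z, critical, p_value, abs(z) > critical


# Hypothetical height measurements in cm.
sample = [176.2, 181.5, 178.9, 183.1, 179.4, 177.8,
          182.6, 180.3, 175.9, 184.0, 179.7, 181.2]

z, critical, p_value, reject = z_test(sample, mu0=180, sigma=sqrt(7))
print(f"z = {z:.3f}, critical value = {critical:.3f}, "
      f"p = {p_value:.3f}, reject H0: {reject}")

# The same measurements shifted by 3 cm, tested against the same null hypothesis.
shifted = [x + 3 for x in sample]
z_s, _, p_s, reject_s = z_test(shifted, mu0=180, sigma=sqrt(7))
print(f"z = {z_s:.3f}, p = {p_s:.5f}, reject H0: {reject_s}")
```

The sample mean of the first data set is very close to 180, so the test statistic stays well below the critical value and the null hypothesis is not rejected. Shifting every measurement by 3cm moves the test statistic far beyond the critical value.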

The normal distribution plays this role because, if we are testing a hypothesis about a normally distributed population, we can use its properties to make inferences about the population parameter. For example, we can use the mean and standard deviation of a sample to estimate the mean and standard deviation of the population, and we can use the properties of the normal distribution to calculate the probability of observing a certain range of values.

However, it is important to note that not all phenomena are normally distributed. If the data is not normally distributed, we may need to use a different distribution in hypothesis testing. There are many different probability distributions, each with its own set of properties and applications. Choosing the correct distribution for a particular set of data requires careful consideration of the nature of the data and the hypothesis being tested.

This is what you should take with you

  • The normal distribution is a fundamental concept in statistics and probability theory.
  • It is widely used to model various phenomena in the natural and social sciences.
  • The empirical rule provides a useful guideline for understanding the distribution of data.
  • While the normal distribution is a common and useful model, it is important to consider alternative distributions when appropriate.
  • Choosing the correct distribution of data is essential for accurate statistical analysis.
  • Hypothesis testing is a powerful tool that often relies on the normal distribution to make inferences about population parameters.
  • Understanding the normal distribution and its properties is an important foundation for further study in statistics and data analysis.
