Confidence Interval - easily explained!

A confidence interval is a range of values that contains the true value of a parameter with a certain probability. This parameter is usually a statistical measure, such as the mean or the variance, whose true value cannot be determined exactly from the available data. In many cases, the confidence interval is calculated from a sample and consists of an estimate of the parameter and a margin of error that indicates the range in which the true value could lie. A confidence level is also specified, which indicates how often intervals constructed in this way contain the true value.

In this article, we look at confidence intervals and explain why we need to work with statistical estimates at all and how confidence intervals can help us to express this uncertainty. We will explain the concept using a simple example and show how the confidence interval can be calculated and interpreted. We will also examine the factors that influence the interval and some of its extensions.

Why do Uncertainties occur in Statistical Estimates?

Uncertainty often arises in statistics because we have to work with samples, as it is often not practical or efficient to survey the whole population to get the true value. In many cases, it is therefore sufficient to survey as representative a sample of the population as possible in order to obtain the most accurate estimate of the quantities we are looking for.

Assuming we want to find out how tall the average person in New York City is, it would not be economical to conduct a survey that asks every registered resident. Moreover, even if the resources were available, some of the respondents might not answer or we might have an outdated database and therefore not cover everyone. As a result, it is very difficult for us to find out the actual average height of a New York City resident. For this reason, we have to rely on random samples, so that we only survey a selected number of residents, for example, 10,000.

However, there is uncertainty as to whether the average height of the 10,000 respondents corresponds to the actual value for the entire city, which is why we have to rely on probabilities. Some factors can increase the accuracy of our estimate, such as increasing the sample size or improving data quality by ensuring that all neighborhoods are included in the survey. Nevertheless, uncertainty remains a central point in the statistical analysis, which must be quantified and communicated. Confidence intervals help here by quantifying how certain one can be about an estimate.

How is a Confidence Interval constructed and how is it calculated?

The starting point of a confidence interval is the point estimate calculated from the sample, around which the interval is centered. In our case, this is the average height calculated from the survey of 10,000 New Yorkers.

Next, we need the limits of the interval. The margin of error, also known as the marginal error, indicates how far the true value could be from the estimate. It therefore specifies a maximum and a minimum value within which the actual value of the parameter can lie. The following steps are taken to calculate this interval:

1. Calculation of the Standard Error

This value measures how widely the sample statistic is spread around the true parameter and is calculated using this formula:

\[SE = \frac{s}{\sqrt{n}}\]

Here, \(s\) is the standard deviation of the sample and \(n\) is the sample size. As can be seen, the standard error decreases when the standard deviation is lower, i.e. when the data points lie closer to the mean. It also decreases as the sample size grows, since \(\sqrt{n}\) appears in the denominator: quadrupling the sample size halves the standard error. With a larger sample, random fluctuations average out, so the estimate varies less from sample to sample.
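As a minimal sketch of the formula, the following Python function computes the standard error of the mean with the standard library's `statistics` module. The function name `standard_error` and the height values are hypothetical, chosen only for illustration.

```python
import math
import statistics


def standard_error(sample: list[float]) -> float:
    """Standard error of the mean: SE = s / sqrt(n)."""
    s = statistics.stdev(sample)  # sample standard deviation (divides by n - 1)
    return s / math.sqrt(len(sample))


# Hypothetical heights in cm, for illustration only
heights = [162.0, 175.5, 168.2, 181.0, 170.4, 159.9, 173.1, 177.6]
print(f"SE = {standard_error(heights):.3f} cm")

# With s = 10 held fixed, SE shrinks like 1 / sqrt(n)
for n in (100, 400, 10_000):
    print(n, 10 / math.sqrt(n))
```

The loop prints 1.0, 0.5 and 0.1: going from 100 to 400 observations halves the standard error, and 10,000 observations give the 0.1 used in the example below.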

2. Calculation of the Margin of Error

The so-called confidence level determines how wide the confidence interval is. It indicates how certain we can be that the calculated confidence interval contains the true value of the parameter. A confidence level of 95% means, for example, that if we drew 100 random samples and calculated an interval from each, about 95 of these intervals would contain the true value and about five would not. A higher confidence level means greater certainty that the true value lies within the interval, but comes at the cost of a wider interval.

To translate the confidence level into a numerical value, either the z-distribution or the t-distribution is used; the differences between them are explained in a later section. The critical value of the distribution for the chosen confidence level, for example 1.96 for the z-distribution at a confidence level of 95%, is then multiplied by the standard error to obtain the margin of error:

\[\text{Margin of Error} = z \cdot SE\]
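The step from confidence level to critical value can be sketched with Python's `statistics.NormalDist`. For a two-sided interval, the remaining probability \(1 - \text{level}\) is split evenly between both tails, so the critical value is the \((1 + \text{level}) / 2\) quantile of the standard normal distribution. The helper names `z_critical` and `margin_of_error` are hypothetical.

```python
from statistics import NormalDist


def z_critical(level: float) -> float:
    """Two-sided critical value: the (1 + level) / 2 quantile of the standard normal."""
    return NormalDist().inv_cdf((1 + level) / 2)


def margin_of_error(level: float, se: float) -> float:
    """Margin of error = z * SE for a z-based interval."""
    return z_critical(level) * se


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {z_critical(level):.3f}")
```

This should print approximately 1.645, 1.960 and 2.576, which shows directly why a higher confidence level produces a wider interval.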

3. Determine the Confidence Interval

Using the values calculated so far, the confidence interval is obtained by taking the sample estimate, for example the mean or another statistic, subtracting the margin of error to get the lower limit and adding it to get the upper limit. The sample estimate therefore lies exactly in the middle of the confidence interval.

Example:

Let’s say we asked 10,000 randomly selected New York City residents how tall they are and measured an average height of \(\bar{x} = 170\,\text{cm}\) with a standard deviation of \(s = 10\,\text{cm}\). We can then calculate the standard error:

\[SE = \frac{s}{\sqrt{n}} = \frac{10}{\sqrt{10000}} = 0.1\]

For our evaluation, we choose a confidence level of 95% and use the z-distribution with the corresponding z-value of 1.96. This gives us the margin of error:

\[\text{Margin of Error} = z \cdot SE = 1.96 \cdot 0.1 = 0.196\]

This gives us the following confidence interval for this sample:

\[\bar{x} \pm z \cdot SE = 170 \pm 0.196\]

Written differently, this results in this value range:

\[\left[169.804\,\text{cm};\ 170.196\,\text{cm}\right]\]
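The three steps of the worked example can be combined into one small Python function, sketched here under the assumption that only summary statistics (mean, standard deviation, sample size) are available. The name `z_interval` is hypothetical; it uses the exact normal quantile instead of the rounded 1.96, which makes no difference at three decimal places.

```python
import math
from statistics import NormalDist


def z_interval(mean: float, sd: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Two-sided z-based confidence interval for a mean from summary statistics."""
    se = sd / math.sqrt(n)                     # step 1: standard error
    z = NormalDist().inv_cdf((1 + level) / 2)  # critical value, about 1.96 for 95%
    margin = z * se                            # step 2: margin of error
    return mean - margin, mean + margin        # step 3: estimate minus/plus margin


low, high = z_interval(mean=170, sd=10, n=10_000)
print(f"[{low:.3f} cm; {high:.3f} cm]")
```

Running it reproduces the interval above, [169.804 cm; 170.196 cm].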

What is the difference between the z-value and the t-value?

The z- and t-distributions are two probability distributions with a similar shape that differ in certain details. Because of these differences, they are suited to different situations.

The z-distribution, or standard normal distribution, has a mean of 0 and a standard deviation of 1. It has the characteristic bell-shaped curve that is symmetrical about the y-axis. The z-values should be used for the confidence intervals if:

The sample is large, i.e. about \(n \geq 30\), because the central limit theorem then makes the sampling distribution of the mean approximately normal, and the estimate is less influenced by random fluctuations.
The standard deviation of the entire population is known. This is an important criterion that allows us to better calculate the uncertainty.

In many cases, the z-value is used to determine the confidence interval, as the values for the different confidence levels remain constant and do not change with the sample size.

The t-distribution has a similar shape to the z-distribution, with the difference that it has so-called “heavier tails”, i.e. the curve lies slightly higher at the edges. This means that extreme values are more likely under this distribution. The shape of the t-distribution depends on the sample size through its degrees of freedom (\(n - 1\) when estimating a mean), and it approaches the z-distribution more and more closely as the sample size increases, because more precise estimates are then possible.

Difference between z and t Distribution | Source: Author

The t-values are interesting for the confidence intervals if the following conditions are met:

The sample is small, i.e. \(n < 30\). If fewer data are available, the uncertainty is greater and the t-distribution should be used.
The standard deviation of the entire population is not known. It must then be estimated from the sample, which adds further uncertainty that the heavier tails of the t-distribution account for.
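Python's standard library has no t-distribution, so the following sketch computes two-sided t critical values numerically (Simpson's rule for the CDF plus bisection) to show how they approach the z-value of 1.96 as the degrees of freedom grow. In practice you would typically call a library routine such as `scipy.stats.t.ppf` instead; the helper names `t_pdf`, `t_cdf`, `t_critical` and `t_interval` are hypothetical. The sketch assumes a one-sample interval for the mean, where the degrees of freedom are \(n - 1\).

```python
import math
import statistics
from statistics import NormalDist


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t-distribution with df degrees of freedom."""
    # lgamma avoids the overflow math.gamma would hit for large df
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 1000) -> float:
    """CDF via Simpson's rule on [0, |x|], using symmetry around 0."""
    h = abs(x) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_critical(level: float, df: int) -> float:
    """Two-sided critical value: solve t_cdf(t) = (1 + level) / 2 by bisection."""
    target = (1 + level) / 2
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def t_interval(sample: list[float], level: float = 0.95) -> tuple[float, float]:
    """Two-sided t-based confidence interval for the mean (df = n - 1)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    margin = t_critical(level, n - 1) * se
    return mean - margin, mean + margin


z = NormalDist().inv_cdf(0.975)
for df in (4, 9, 29, 99, 999):
    print(f"df = {df:>3}: t = {t_critical(0.95, df):.3f}   (z = {z:.3f})")
```

For 9 degrees of freedom (a sample of ten) this gives roughly 2.262 instead of 1.96, so the interval is noticeably wider; at 999 degrees of freedom the t-value is already within about 0.003 of the z-value.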

How to interpret the Confidence Level and what are the Misconceptions?

The confidence level is used to indicate how reliable the method for calculating the confidence interval really is. However, this is not the same as how certain we are about a specific interval.

With a confidence level of 95%, we mean that if samples are drawn repeatedly and the confidence interval is recalculated each time, the true value lies within the calculated interval in about 95 out of 100 cases. Strictly speaking, the 95% therefore does not mean that the true value lies within one particular computed interval with a probability of 95%: once an interval has been calculated, it is fixed, and the true value either lies within it or not. The following misunderstandings also often arise when interpreting the confidence level:

“95% of the data lies within the confidence interval.”: A confidence interval makes no direct statement about the distribution of the data, only about the estimate of a parameter, such as the mean or the variance. Many individual data points can lie outside the interval, since the interval only refers to the estimated parameter.
“The true value lies within the interval with a probability of 95%.”: The true value of the parameter is unknown and can lie either in the interval or outside the interval. The probability refers to the uncertainty as to whether the method provides correct intervals and not to the value itself. Strictly speaking, this statement is therefore not correct, as the true value of the parameter is not a random variable and is therefore not stochastic. The upper and lower limits of the confidence interval, on the other hand, are stochastic. However, this formulation is very often encountered in reality (as in this article).
“A wide confidence interval is worse than a narrow one.”: A wider interval often occurs when there is more uncertainty in the data set, i.e. a higher standard deviation or a small sample. However, it is also more honest than a narrow interval, which may underestimate the uncertainty.
“A single interval guarantees precision.”: The confidence level refers to the long-term accuracy of the interval calculation method, not a single interval. Even with a high confidence level, the true value may not lie within the confidence interval. Regardless of the confidence level, the statistical uncertainty remains.

The confidence level describes the certainty of the method and should not be confused with the probability that the true value lies within the interval.
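The repeated-sampling interpretation can be checked with a small simulation. This Python sketch assumes a normally distributed population with a known true mean and standard deviation (the values 170 cm and 10 cm are hypothetical, borrowed from the height example), draws many samples, builds a z-interval for each, and counts how often the true mean is covered. The function name `coverage` and its parameters are illustrative.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(true_mean: float = 170.0, sigma: float = 10.0, n: int = 50,
             level: float = 0.95, trials: int = 2000, seed: int = 1) -> float:
    """Fraction of simulated samples whose z-interval contains the true mean.

    Assumes a normal population with known standard deviation sigma.
    """
    rng = random.Random(seed)
    z = NormalDist().inv_cdf((1 + level) / 2)
    margin = z * sigma / math.sqrt(n)  # same width for every sample because sigma is known
    hits = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / trials


print(f"share of intervals containing the true mean: {coverage():.3f}")
```

The printed share should be close to 0.95, yet every individual interval in the simulation either contains the true mean or does not; the 95% describes the method across many samples.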

What Extensions are there for Confidence Intervals?

Confidence intervals are an important tool in statistics, which is why various applications and methods have been developed over time that are based on these intervals or modified to be used for non-normally distributed data. In this section, we look at the most important extensions of confidence intervals to demonstrate the adaptability and versatility of this method.

One-sided vs. two-sided intervals

Generally speaking, the confidence interval defines a range of values in which the true value of a parameter, such as the mean or the variance, lies. Depending on the question, a distinction is made between two-sided intervals, in which both directions of deviation from the estimated value are taken into account, and one-sided intervals, in which only one direction of deviation is considered:

Two-sided intervals: With two-sided intervals, the estimated value of the parameter lies in the middle of the confidence interval and the true value may be smaller or larger than the estimated value. In our previous example of estimating the average height in New York City, for example, the true average can be greater than the estimated value of 170 cm, but it can also be smaller. In our example, this resulted in an interval with the following limits: [169.804cm; 170.196cm]. Two-sided intervals are mainly used when there is no specific assumption about the direction of the deviation and therefore both directions should be included in the estimate. The calculation of the two-sided confidence interval works in detail as described above and can be summarized with the following formula:

\[\bar{x} \pm z \cdot SE = \bar{x} \pm z \cdot \frac{s}{\sqrt{n}}\]

One-sided intervals: With one-sided confidence intervals, on the other hand, only one direction of deviation from the estimated value is considered, so the interval is bounded only from above or only from below. One-sided intervals are useful when only one direction of deviation matters, for example because limit values or safety standards must be adhered to. Suppose a machine may produce at most 100 units per hour on average, as otherwise not enough raw materials can be supplied. We are then only interested in an upper bound for the true mean output, which should stay below 100, since only an upward deviation is critical. The calculation is very similar to the two-sided case, except that only one limit is computed: an upper limit when only upward deviations matter, as here, or a lower limit in the opposite case:

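\[\text{upper limit: } \bar{x} + z \cdot SE = \bar{x} + z \cdot \frac{s}{\sqrt{n}}, \qquad \text{lower limit: } \bar{x} - z \cdot SE = \bar{x} - z \cdot \frac{s}{\sqrt{n}}\]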

However, it is important to note that the z-value for one-sided intervals is not the same as for a two-sided interval, even if the confidence level is identical. For a confidence level of 95%, for example, the z-value for a two-sided interval is 1.96, and for a one-sided interval approximately 1.645, because the entire 5% error probability lies in a single tail instead of being split into 2.5% per tail.
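To see where the two critical values come from, this Python sketch reads both off the standard normal distribution with `statistics.NormalDist` and applies the one-sided value to the height example (mean 170 cm, standard error 0.1 cm). The variable names are illustrative.

```python
from statistics import NormalDist

level = 0.95
z_two = NormalDist().inv_cdf(1 - (1 - level) / 2)  # 5% split across both tails
z_one = NormalDist().inv_cdf(level)                # all 5% in a single tail

# Figures from the height example: mean 170 cm, SE 0.1 cm
mean, se = 170.0, 0.1
upper_bound = mean + z_one * se  # one-sided: true mean is at most this value
lower_bound = mean - z_one * se  # one-sided: true mean is at least this value

print(f"z two-sided = {z_two:.3f}, z one-sided = {z_one:.3f}")
print(f"upper bound: {upper_bound:.3f} cm, lower bound: {lower_bound:.3f} cm")
```

Because the one-sided value is smaller, a one-sided bound lies closer to the estimate than the corresponding limit of the two-sided interval at the same confidence level.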

Hypothesis Tests

Confidence intervals and hypothesis tests are two closely related statistical concepts that are used to make reliable statements about estimated parameters. While a hypothesis test only checks whether a specific value is plausible, the confidence interval offers a broader perspective: it shows the whole range of plausible values, and we can check whether the hypothesized value lies within it.

To understand the procedure in more detail, let’s look at the example of a company that manufactures batteries. The company claims that its batteries have an average life of 50 hours. This claim becomes the so-called null hypothesis that the true mean of all its batteries is 50 hours, i.e. \(H_0: \mu = 50\). Since it is impossible to examine every battery ever produced, we take a sample of 30 batteries and determine their service life. The aim of the hypothesis test is to determine whether the sample is consistent with the hypothesized value.

Two-Tail Hypothesis Test — Graphical Interpretation of a Hypothesis Test | Source: Author

Assume that the 95% confidence interval calculated from our sample ranges from 45 to 49 hours. As we have learned, this means that the method produces intervals that contain the true value in about 95 out of 100 samples. The company’s hypothesized value of 50 hours does not lie within this interval, so we can reject the null hypothesis at a significance level of five percent. The data instead suggest that the actual average service life is somewhat below 50 hours.
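The link between the interval and the test can be sketched in Python. Assuming the 45 to 49 hour interval is a symmetric two-sided z-interval at the 95% level (an assumption; the example does not say how it was computed), its centre is the sample mean and its half-width is \(z \cdot SE\), so the standard error and a two-sided z-test p-value can be recovered from it. The function `ci_and_z_test` is hypothetical.

```python
from statistics import NormalDist


def ci_and_z_test(ci_low: float, ci_high: float, mu0: float, level: float = 0.95):
    """Check H0: mu = mu0 via the interval and via a two-sided z-test.

    Assumes [ci_low, ci_high] is a symmetric z-based interval, so the
    standard error can be recovered from its half-width.
    """
    z_crit = NormalDist().inv_cdf((1 + level) / 2)
    estimate = (ci_low + ci_high) / 2       # centre of the interval
    se = (ci_high - ci_low) / 2 / z_crit    # margin = z * SE, so SE = margin / z
    z_stat = (estimate - mu0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))
    inside = ci_low <= mu0 <= ci_high
    return inside, p_value


inside, p = ci_and_z_test(45.0, 49.0, mu0=50.0)
print(f"50 h inside interval: {inside}, p-value: {p:.4f}")
```

Here 50 hours lies outside the interval and the p-value is roughly 0.003, below 0.05, so both views lead to the same decision; a hypothesized value inside the interval, such as 48 hours, would give a p-value above 0.05.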

Confidence intervals are an important tool in hypothesis testing because, unlike a p-value alone, they also provide information about the precision of the result and show the range in which the true value plausibly lies. This makes the results easier to interpret and the procedure more intuitive.

Bootstrapping

Depending on the application, the basic assumptions behind confidence intervals may not hold, e.g. when the data are not normally distributed. This becomes a problem especially with very small samples, where the central limit theorem cannot be relied on. Instead of assuming a normal distribution, we can use the bootstrapping method and generate artificial samples to estimate the distribution of the parameter. To do this, values are randomly drawn from the existing data points and put back after each draw (sampling with replacement), so the same data point can be drawn more than once. Repeating this produces many new samples, usually of the same size as the original one.

Let’s assume we are investigating the average sleep duration of test subjects, but only have a sample of ten people with which we still want to calculate a confidence interval for the mean. To do this, we randomly draw one of the ten people from the “pot”, record them, and put them back, so that the same person can be drawn more than once. After ten draws we have one bootstrap sample and calculate its mean. Repeating this, say, a thousand times yields a thousand means, which we have generated artificially from just ten people. Their spread approximates the sampling distribution of the mean, and a 95% confidence interval can be read off, for example, as the range between the 2.5th and 97.5th percentiles of these bootstrap means.

The bootstrapping method is not only suitable for very small samples but can also be used, for example, if the data are skewed, follow some other distribution, or if nothing can be said about the underlying distribution. Bootstrapping does not force the data into a normal distribution; instead, the resampled statistics approximate the sampling distribution of the parameter directly from the data, whatever its shape.
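A minimal Python sketch of the percentile bootstrap, one common way to turn resampled statistics into a confidence interval. It assumes the data are a list of floats; the ten sleep durations are made-up values for illustration, and the function name `bootstrap_ci` and its defaults (1,000 resamples, fixed seed) are hypothetical choices. Each resample draws ten values with replacement using `random.choices`, the mean of every resample is recorded, and the interval limits are taken from the 2.5th and 97.5th percentiles of those means.

```python
import random
from statistics import fmean


def bootstrap_ci(data: list[float], level: float = 0.95,
                 n_resamples: int = 1000, seed: int = 42) -> tuple[float, float]:
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    n = len(data)
    # Each bootstrap sample draws n values from the data with replacement
    boot_means = sorted(fmean(rng.choices(data, k=n)) for _ in range(n_resamples))
    alpha = 1 - level
    lo_idx = round(alpha / 2 * (n_resamples - 1))
    hi_idx = round((1 - alpha / 2) * (n_resamples - 1))
    return boot_means[lo_idx], boot_means[hi_idx]


# Hypothetical sleep durations (hours) of ten test subjects
sleep_hours = [6.1, 7.4, 5.8, 8.0, 6.9, 7.2, 5.5, 7.8, 6.4, 9.1]
low, high = bootstrap_ci(sleep_hours)
print(f"sample mean: {fmean(sleep_hours):.2f} h, 95% bootstrap CI: [{low:.2f}; {high:.2f}] h")
```

Fixing the seed makes the result reproducible; with more resamples, the interval limits fluctuate less from run to run.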

This is what you should take with you

The confidence interval defines a range of values around an estimated parameter from a sample that contains the actual parameter of the population with a certain probability.
One-sided intervals, which only allow deviations in one direction, and two-sided intervals, which allow deviations in both directions, can be determined.
The z-value is used to calculate the margin of error if a sufficiently large sample is available and the standard deviation of the population is known. If one of these characteristics is not fulfilled, the t-value should be used.
The confidence interval plays a particularly important role in hypothesis tests, as it makes this method more intuitive by not only providing a statement as to whether the hypothesis is rejected or not, but also the range of values.
With the help of bootstrapping, confidence intervals can also be calculated for very small samples or asymmetrical data distributions.