Correlation and Causation

Correlation describes a statistical relationship between two variables: the two variables tend to change together. A positive correlation between two variables A and B therefore means that an increase in A tends to go along with an increase in B. The correlation is undirected, so the reverse also holds: an increase in B tends to go along with an increase in A. Correlation alone says nothing about which variable, if either, influences the other.

Causation, on the other hand, describes a cause-effect relationship between two variables. Causation between A and B, therefore, means that the increase in A is also the cause of the increase in B. The difference quickly becomes clear with a simple example:

A study could very likely find a positive correlation between a person’s risk of skin cancer and the number of times they visit the outdoor pool: people who visit the outdoor pool frequently also tend to have a higher risk of developing skin cancer. A clear positive correlation. But is there also causation between outdoor pool visits and skin cancer? Probably not, because that would mean that visiting the outdoor pool in itself raises the risk of skin cancer.

It is much more likely that people who spend more time at outdoor pools are also exposed to significantly more sunlight. If they do not take sufficient precautions, such as using sunscreen, they get sunburned more often, which increases the risk of skin cancer. Sun exposure is the common factor behind both variables, so the correlation between outdoor pool visits and skin cancer risk is very likely not causal.
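The pool example can be sketched as a small simulation. This is only an illustration under invented assumptions: the variable names, the coefficients 0.5 and 0.3, and the noise levels are hypothetical and not taken from any study. Sun exposure drives both pool visits and the risk score, while pool visits have no effect on risk at all. The strength of the relationship is measured with the correlation coefficient that is introduced further below.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: weekly sun exposure is the hidden common cause.
sun_hours = [rng.uniform(0, 20) for _ in range(20000)]
pool_visits = [0.5 * s + rng.gauss(0, 2) for s in sun_hours]  # depends on sun only
cancer_risk = [0.3 * s + rng.gauss(0, 2) for s in sun_hours]  # depends on sun only

# Across everyone, pool visits and risk are correlated,
# although neither influences the other in this model.
r_all = pearson(pool_visits, cancer_risk)

# Restrict to people with nearly the same sun exposure (about 10 hours).
similar = [
    (p, c)
    for s, p, c in zip(sun_hours, pool_visits, cancer_risk)
    if 9.5 <= s <= 10.5
]
r_similar = pearson([p for p, _ in similar], [c for _, c in similar])

print(f"all people:           r = {r_all:.2f}")
print(f"similar sun exposure: r = {r_similar:.2f}")
```

In this toy model the correlation is clearly positive across all people and close to zero once sun exposure is held roughly constant, which is the pattern one would expect when a third variable is responsible for the correlation.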

A variety of curious correlations that very likely do not show causation can be found at tylervigen.com.

[Figure: line chart with two lines illustrating a correlation. Margarine consumption and the divorce rate both decline over the period shown.]
Correlation between divorce rate and margarine consumption in Maine (USA) | Photo: tylervigen.com

For example, there is a very high correlation between the divorce rate in the American state of Maine and the per capita consumption of margarine. Whether this is also causation can be doubted.

Correlation Coefficient

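The correlation coefficient indicates how strong the correlation between the two variables is. It takes values between -1 and 1. In the example from tylervigen.com, the correlation is very strong at 99.26% (a coefficient of about 0.99), which means that the two variables follow an almost perfectly linear relationship: years with lower margarine consumption are almost always years with a lower divorce rate. The coefficient does not say how large the change in one variable is relative to the other, so it does not mean that a 10% increase in margarine consumption goes along with a 10% increase in the divorce rate. The correlation coefficient can also assume negative values.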
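A minimal sketch of how such a coefficient can be computed, assuming the commonly used Pearson correlation coefficient (the covariance of the two variables divided by the product of their standard deviations). The data series below are made up purely for illustration. They show that the coefficient reflects how closely the points follow a straight line and in which direction, not how steep that line is.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Illustrative, made-up data series.
x = [1, 2, 3, 4, 5]
steep = [10 * v for v in x]     # rises by 10 per step of x
shallow = [0.1 * v for v in x]  # rises by 0.1 per step of x
falling = [5, 4, 3, 2, 1]       # falls as x rises
noisy = [2, 1, 4, 3, 5]         # rising trend with some scatter

for name, ys in [("steep", steep), ("shallow", shallow),
                 ("falling", falling), ("noisy", noisy)]:
    print(f"{name:8s} r = {pearson(x, ys):+.2f}")
```

The steep and the shallow series both lie exactly on a line, so they get the same coefficient despite very different slopes. The falling series gets a negative coefficient, and the scattered series a positive one that is smaller than for the perfectly linear cases.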

A correlation coefficient smaller than 0 describes an anti-correlation (also called negative correlation) and states that the two variables move in opposite directions. For example, a negative correlation exists between current age and remaining life expectancy: the older a person gets, the shorter their remaining life expectancy.

How do you prove Causation?

To reliably prove causation, scientific experiments are conducted. In these experiments, people or test objects are divided into groups (you can read more about how this happens in our article about Sampling) so that, in the best case, all characteristics of the participants are similar or identical except for the characteristic that is assumed to be the cause.

For the “skin cancer and outdoor pool” case, this means that we try to form two groups whose participants have similar or, preferably, identical characteristics, such as age, gender, physical health, and weekly exposure to sunlight. We then examine whether the skin cancer risk of the group that visits the outdoor pool differs from that of the group that does not (note: sun exposure must remain the same in both groups). If the difference is larger than what chance alone would plausibly produce, one can speak of causation.
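The logic of such an experiment can be sketched with a simulation. Everything here is a hypothetical toy model: the function names, the risk formula, and the effect sizes are assumptions made for illustration. Participants with similar sun exposure are assigned to the pool group or the control group at random, and the average risk score of the two groups is compared.

```python
import random

rng = random.Random(1)  # fixed seed so the run is repeatable


def simulate_risk(sun_hours, goes_to_pool, pool_effect):
    """Hypothetical risk score: sun exposure plus an optional pool effect."""
    effect = pool_effect if goes_to_pool else 0.0
    return 0.3 * sun_hours + effect + rng.gauss(0, 1)


def run_experiment(pool_effect, n=10000):
    """Return mean risk of the pool group minus mean risk of the control group."""
    pool_group, control_group = [], []
    for _ in range(n):
        sun_hours = rng.uniform(9, 11)     # similar sun exposure for everyone
        goes_to_pool = rng.random() < 0.5  # random assignment to a group
        risk = simulate_risk(sun_hours, goes_to_pool, pool_effect)
        (pool_group if goes_to_pool else control_group).append(risk)
    mean_pool = sum(pool_group) / len(pool_group)
    mean_control = sum(control_group) / len(control_group)
    return mean_pool - mean_control


print(f"pool visits have no effect:    difference = {run_experiment(0.0):+.2f}")
print(f"pool visits raise risk by 0.5: difference = {run_experiment(0.5):+.2f}")
```

When pool visits have no effect in the model, the difference between the groups stays close to zero. When an effect is built in, the difference between the groups recovers it approximately. A real study would additionally use a statistical test to judge whether an observed difference could be due to chance.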

This is what you should take with you

  • A correlation alone does not prove causation; only in some cases is a causal relationship actually behind it.
  • Correlation means that two variables tend to change together. Causation, on the other hand, means that the change in one variable is the cause of the change in the other.
  • The correlation coefficient indicates the strength of the correlation and lies between -1 and 1. If the coefficient is negative, it is called anticorrelation.
  • Proving causation requires carefully controlled experiments.

Other Articles on the Topic of Correlation and Causation

  • Detailed definitions of the terms can be found here.