Correlation refers to the relationship between two statistical variables. The two variables are then dependent on each other and change together. A positive correlation of two variables, therefore, means that an increase in A also leads to an increase in B. The association is undirected. It is therefore also true in the reverse case and an increase in variable B also changes the slope of A to the same extent.
Causation, on the other hand, describes a cause-effect relationship between two variables. Causation between A and B, therefore, means that the increase in A is also the cause of the increase in B. The difference quickly becomes clear with a simple example:
A study could very likely find a positive correlation between a person’s risk of skin cancer and the number of times they visit the outdoor pool. So if a person visits the outdoor pool frequently, then their risk of developing skin cancer also increases. A clear positive association. But is there also causation between outdoor swimming pool visits and skin cancer? Probably not, because that would mean that only outdoor swimming pool visits are the cause of the increased risk of skin cancer.
It is much more likely that people who spend more time in outdoor swimming pools are also exposed to significantly more sunlight. If they do not take sufficient precautions with sunscreen or similar, more sunburns can occur, which increases the risk of skin cancer. It is clear that the correlation between outdoor swimming pool visits and skin cancer risk is not causal.
A variety of curious correlations that very likely do not show causation can be found at tylervigen.com.
For example, there is a very high association between the divorce rate in the American state of Maine and the per capita consumption of margarine. Whether this is also causation can be doubted.
What are the Types of Correlation?
In general, there are two types of contexts that can be distinguished:
- Linear or Non-Linear: The dependencies are linear if the changes in variable A always trigger a change with a constant factor in variable B. If this is not the case, the dependency is said to be non-linear.
- Positive or Negative: If the increase in variable A leads to an increase in variable B, then there is a positive correlation. If, on the other hand, the increase in A leads to a decrease in B, then the dependency is negative.
What is the Correlation Coefficient?
The Correlation Coefficient indicates how strong the association between the two variables is. In the example of tylervigen.com, this correlation is very strong at 99.26% and means that the two variables move almost 1 to 1, i.e. an increase in the consumption of Margarine by 10% also leads to an increase in the divorce rate by 10%. The correlation coefficient can also assume negative values.
A correlation coefficient smaller than 0 describes the Anti-Correlation and states that the two variables behave in opposite ways. For example, a negative association exists between current age and remaining life expectancy. The older a person gets, the shorter his or her remaining life expectancy.
How do you prove Causation?
In order to reliably prove causation, scientific experiments are conducted. In these experiments, people or test objects are divided into groups (you can read more about how this happens in our article about Sampling), so that in the best case all characteristics of the participants are similar or identical except for the characteristic that is assumed to be the cause.
For the “skin cancer outdoor swimming pool case”, this means that we try to form two groups in which both groups of participants have similar or preferably even the same characteristics, such as age, gender, physical health, and exposure to sunlight per week. Now it is examined whether the outdoor swimming pool visits of one group (note: the exposed sun exposure must remain constant), changes the skin cancer risk compared to the group that did not go to the outdoor swimming pool. If this change exceeds a certain level, one can speak of causation.
This is what you should take with you
- Only in very few cases does a correlation also imply causation.
- Correlation means that two variables always change together. Causation, on the other hand, means that the change in one variable is the cause of the change in the other.
- The correlation coefficient indicates the strength of the association. It can be either positive or negative. If the coefficient is negative, it is called anticorrelation.
- To prove a causation one needs complex experiments.
Other Articles on the Topic of Correlation and Causation
- Detailed definitions of the terms can be found here.