Correlation refers to the relationship between two statistical variables. The two variables are then dependent on each other and change together. A positive correlation of two variables, therefore, means that an increase in A also leads to an increase in B. The association is undirected. It is therefore also true in the reverse case and an increase in variable B also changes the slope of A to the same extent.
Causation, on the other hand, describes a cause-effect relationship between two variables. Causation between A and B, therefore, means that the increase in A is also the cause of the increase in B. The difference quickly becomes clear with a simple example:
A study could likely find a positive correlation between a person’s risk of skin cancer and the number of times they visit the outdoor pool. So if a person visits the outdoor pool frequently, their risk of developing skin cancer also increases. A clear positive association. But is there also a causal effect between outdoor swimming pool visits and skin cancer? Probably not, because that would mean that only outdoor swimming pool visits are the cause of the increased risk of skin cancer.
It is much more likely that people who spend more time in outdoor swimming pools are also exposed to significantly more sunlight. If they do not take sufficient precautions with sunscreen or similar, more sunburns can occur, which increases the risk of skin cancer. The correlation between outdoor swimming pool visits and skin cancer risk is not causal.
A variety of curious correlations that very likely do not show causation can be found at tylervigen.com.
For example, there is a very high association between the divorce rate in the American state of Maine and the per capita consumption of margarine. Whether this is also causation can be doubted.
What are the Types of Correlation?
In general, two types of contexts can be distinguished:
- Linear or Non-Linear: The dependencies are linear if the changes in variable A always trigger a change with a constant factor in variable B. If this is not the case, the dependency is said to be non-linear. For example, there is a linear correlation between height and body weight. With every new centimeter of height gained, one is very likely to also gain a fixed amount of body weight, as long as one’s stature does not change. A non-linear relationship exists, for example, between the development of sales and the development of a company’s share price. With a 30% increase in sales, the stock price may increase by 10%, but with the subsequent 30% increase in sales, the stock price may only increase by 5%.
- Positive or Negative: If the increase in variable A leads to an increase in variable B, then there is a positive correlation. If, on the other hand, the increase in A leads to a decrease in B, then the dependency is negative.
What is the Correlation Coefficient?
The Correlation Coefficient indicates how strong the association between the two variables is. In the example of tylervigen.com, this correlation is very strong at 99.26% and means that the two variables move almost 1 to 1, i.e. an increase in the consumption of Margarine by 10% also leads to an increase in the divorce rate by 10%. This is illustrated in the screenshot above, as margarine consumption and the divorce rate decrease almost in parallel. This shows that a decrease in margarine consumption also leads to a decrease in the divorce rate.
The coefficient can also assume negative values. A coefficient smaller than 0 describes the Anti-Correlation and states that the two variables behave in opposite ways. For example, a negative association exists between current age and remaining life expectancy. The older a person gets, the shorter his or her remaining life expectancy.
How do you prove Causation?
To reliably prove causation, scientific experiments are conducted. In these experiments, people or test objects are divided into groups (you can read more about how this happens in our article about Sampling), so that in the best case all characteristics of the participants are similar or identical except for the characteristic that is assumed to be the cause.
For the “skin cancer outdoor swimming pool case”, this means that we try to form two groups in which both groups of participants have similar or preferably even the same characteristics, such as age, gender, physical health, and exposure to sunlight per week. Now it is examined whether the outdoor swimming pool visits of one group (note: the exposed sun exposure must remain constant), changes the skin cancer risk compared to the group that did not go to the outdoor swimming pool. If this change exceeds a certain level, one can speak of causation.
This is what you should take with you
- Only in very few cases does a correlation also imply causation.
- Correlation means that two variables always change together. Causation, on the other hand, means that the change in one variable is the cause of the change in the other.
- The correlation coefficient indicates the strength of the association. It can be either positive or negative. If the coefficient is negative, it is called anticorrelation.
- To prove a causal effect one needs complex experiments.
What is the Standard Deviation?
Explanation of the standard deviation and the relationship to the variance.
What is the Selection Bias?
Explanation of selection bias with examples.
tSNE: t-distributed stochastic neighbor embedding
Explanation of tSNE including example in Python.
Principal Component Analysis – easily explained!
Principal Component Analysis explained with examples and defining the prerequisites.
Population and Sample – simply explained!
Definition of population and sample with examples, advantages of sampling and sampling methods.
Normal distribution with definition, calculation example and the distinction between density function and distribution function.
Expected Value – easily explained!
Expected Value explained with examples and difference to arithmetic mean shown.
Other Articles on the Topic of Correlation and Causation
- Detailed definitions of the terms can be found here.