Selection bias occurs when a sample is not chosen completely at random and is therefore no longer representative. In other words, the distribution of characteristics in the sample differs from that in the population.
What is Selection Bias?
Selection bias refers to a bias in the composition of samples that can distort the data in surveys or studies, so such data must be interpreted with care. Selection bias is not necessarily obvious; it often only becomes apparent on close examination of how the sample was drawn.
A perfect sample is one in which every person in the population has the same probability of being selected. If this condition is not met, the sample is biased. Depending on the situation, the bias can be stronger or weaker.
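The effect of unequal selection probabilities can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of monthly spending values is made up, and the selection mechanism, in which the chance of being drawn is proportional to a person's spending, is an assumption chosen for illustration. The helpers `simple_random_sample` and `size_biased_sample` are hypothetical names, not part of any library.

```python
import random
import statistics

random.seed(42)

# Hypothetical population: 100,000 monthly consumer-spending values in euros.
population = [random.gauss(2000, 500) for _ in range(100_000)]

def simple_random_sample(pop, n):
    # Every member has the same probability of being selected (drawn without replacement).
    return random.sample(pop, n)

def size_biased_sample(pop, n):
    # Assumed mechanism: selection probability proportional to the value itself,
    # e.g. big spenders are more likely to be reached. Drawn with replacement.
    weights = [max(x, 1.0) for x in pop]  # guard against the rare negative draw
    return random.choices(pop, weights=weights, k=n)

n = 5_000
pop_mean = statistics.mean(population)
srs_mean = statistics.mean(simple_random_sample(population, n))
biased_mean = statistics.mean(size_biased_sample(population, n))

print(f"population mean:    {pop_mean:8.1f}")
print(f"random sample mean: {srs_mean:8.1f}")
print(f"biased sample mean: {biased_mean:8.1f}")
```

With equal selection probabilities, the sample mean should stay close to the population mean of about 2,000 euros, while the size-biased sample should land near μ + σ²/μ ≈ 2,125 euros. Both samples are equally large, so a bigger sample alone does not remove this kind of bias.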
What are some examples of Selection Bias?
Suppose we want to find out how much money adults in Germany spend on consumption on average. Surveying all German adults would be too time-consuming and costly, so we decide to draw a sample instead: we go to Munich's city center and survey random passers-by.
Because the participants were selected at random, we assume that our sample is representative. However, when we publish the results, they are met with massive criticism. The accusation: selection bias! Our selection raises the following problems:
- Does every German adult really have the same probability of appearing in the sample? What about adults from Berlin or Hamburg?
- Is the income level of Munich comparable to the German average? If not, what impact does that have on our sample?
- What about adults who do most of their consumer spending online? How would these individuals change the results?
- What age groups do we encounter on a Friday afternoon in Munich? Which age groups might not be represented at that time?
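One way to examine such a sample more closely is to compare its composition with known population shares. The sketch below uses made-up age-group shares and made-up counts from 400 interviewed passers-by (none of these are real figures) and computes Pearson's chi-square goodness-of-fit statistic. The helper `composition_report` is a hypothetical name.

```python
# Hypothetical population shares of age groups (illustrative, not official statistics)
population_shares = {"18-29": 0.17, "30-49": 0.31, "50-64": 0.27, "65+": 0.25}

# Hypothetical counts from 400 passers-by interviewed on a Friday afternoon
sample_counts = {"18-29": 150, "30-49": 130, "50-64": 80, "65+": 40}

# 5% critical value of the chi-square distribution with 4 - 1 = 3 degrees of freedom
CRITICAL_VALUE_DF3 = 7.815

def composition_report(shares, counts):
    """Return per-group (sample share, population share) and the chi-square statistic."""
    n = sum(counts.values())
    chi2 = 0.0
    rows = {}
    for group, share in shares.items():
        observed = counts.get(group, 0)
        expected = share * n  # count expected if the sample mirrored the population
        chi2 += (observed - expected) ** 2 / expected
        rows[group] = (observed / n, share)
    return rows, chi2

rows, chi2 = composition_report(population_shares, sample_counts)
for group, (sample_share, pop_share) in rows.items():
    print(f"{group:>6}: sample {sample_share:5.1%}  population {pop_share:5.1%}")
print(f"chi-square = {chi2:.1f}, critical value = {CRITICAL_VALUE_DF3}")
if chi2 > CRITICAL_VALUE_DF3:
    print("The sample composition differs noticeably from the population.")
```

A large statistic only shows that the sample does not mirror the population on the characteristic that was checked. A small one does not rule out bias on characteristics that were not checked, such as online shopping habits.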
Such biases are not always as obvious as in our example. Sometimes they cannot be prevented at all and must be taken into account when interpreting the results.
The choice of a profession is another example of selection bias that affects many people. Anyone who relies only on the experiences and opinions of close family and friends is already subject to bias. This group covers only a limited range of professions and is not representative of all possible ones. The result is distorted because one obtains little or no information about certain occupational groups.
The following examples are also subject to bias:
- Surveys: People can decide for themselves whether or not to participate in a survey. This leads to bias whenever those who decline differ systematically from those who take part, because that group of people is missing from the sample.
- Direct Questioning: The way the data are collected can also introduce bias. Most people are probably uncomfortable admitting in a face-to-face interview that they have ever driven drunk. In a written survey, on the other hand, more participants might answer honestly.
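The effect of the survey mode on honest answers can be sketched with another simulation. The true prevalence of 20% and the admission rates per mode are assumptions chosen for illustration, not empirical values, and `simulate_survey` is a hypothetical helper.

```python
import random

random.seed(7)

TRUE_PREVALENCE = 0.20  # hypothetical share of respondents who have ever driven drunk
# Assumed probability that such a respondent admits it, per survey mode
ADMIT_RATE = {"interview": 0.50, "written": 0.85}

def simulate_survey(mode, n):
    """Return the estimated prevalence from n simulated respondents."""
    yes = 0
    for _ in range(n):
        has_done_it = random.random() < TRUE_PREVALENCE
        # Those who have done it admit it only with the mode-specific probability;
        # everyone else truthfully answers "no" (a simplifying assumption).
        if has_done_it and random.random() < ADMIT_RATE[mode]:
            yes += 1
    return yes / n

estimates = {mode: simulate_survey(mode, 10_000) for mode in ADMIT_RATE}
for mode, estimate in estimates.items():
    print(f"{mode:>9}: estimated prevalence {estimate:.1%} (true {TRUE_PREVALENCE:.0%})")
```

The interview estimate should come out around 10% and the written one around 17%. Both understate the assumed true value, but the mode that makes an honest answer easier gets closer to it.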
What are the types of Selection Bias?
Selection bias can arise for many reasons. Here we list only the most common types:
- Attrition bias occurs when participants drop out of a study or survey prematurely and are therefore not counted in the final results. If dropping out is related to the outcome, for example because the treatment did not work for them, simply removing these subjects from the sample distorts the result.
- A similar phenomenon is volunteer bias, which arises because participants actively agree to be part of the sample. The willingness to participate can itself be a characteristic that distinguishes the sample from the population and thus distorts the result. In practice, this bias is often difficult to prevent, but it should be taken into account when interpreting the results.
- Social bias (often called social desirability bias) occurs when the type of survey or study makes it likely that people will not answer truthfully. Instead of the truth, respondents give an answer that is socially acceptable or that puts them in a better light.
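The warning about removing dropouts can be made concrete with a simulated trial. In this sketch, the treatment is assumed to help 40% of patients, and patients it does not help are assumed to drop out more often (50% versus 10%); all numbers are hypothetical. Unlike in a real study, the simulation knows the outcome of every dropout, which makes the distortion visible.

```python
import random

random.seed(1)

N = 10_000
TRUE_SUCCESS = 0.40                  # hypothetical share of patients the treatment helps
DROPOUT = {True: 0.10, False: 0.50}  # assumed dropout probability by outcome

patients = []
for _ in range(N):
    success = random.random() < TRUE_SUCCESS
    dropped_out = random.random() < DROPOUT[success]
    patients.append((success, dropped_out))

# Success rate over everyone who started the study (known only in a simulation)
all_rate = sum(s for s, _ in patients) / N

# Naive analysis: only patients who stayed until the end are counted
completers = [s for s, d in patients if not d]
completer_rate = sum(completers) / len(completers)

print(f"success rate, all participants: {all_rate:.1%}")
print(f"success rate, completers only:  {completer_rate:.1%}")
```

The rate among all participants should be close to 40%, while the completer-only rate should settle near 0.36 / 0.66 ≈ 55%, making the treatment look considerably more effective than it is.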
How can Selection Bias be prevented?
The most important step in preventing selection bias is to be aware of possible problems in one's own experimental setup. At the same time, some sampling biases simply cannot be prevented. Anyone who wants to conduct a large-scale study, for example in the medical field, has to rely on volunteers, so volunteer bias cannot be avoided.
There is therefore no general recipe for avoiding selection bias, since it depends strongly on the individual case. What matters is being honest when publishing the results and providing as much information as possible about how the sample was created. It always helps to be open and transparent about possible problems.
This is what you should take with you
- Selection bias, or sampling bias, occurs when a sample is not chosen completely at random and is therefore no longer representative.
- There are many different types of selection bias, such as volunteer bias or attrition bias, which can occur depending on the experiment.
- Possible strategies to prevent sampling bias depend on the individual case. However, it is important to transparently show how the samples were created when publishing the results.