A sample is a subset of the population: the individual elements (e.g. members of a society) from which data are actually collected in a study. These data can then be used for statistical analysis.
The population is the entirety of all units under investigation. The aim of statistical analysis is to make statements about this whole group.
Both concepts are used when conducting scientific experiments and determining whether there is a statistical relationship between several variables (Correlation and Causation).
A brief example: On the evening of the Bundestag election (election of the German parliament), the first projection is shown punctually at 6 p.m. Because the polling stations do not close until then, only a fraction of all votes cast can have been evaluated at that point; this fraction is the sample. The purpose of the extrapolation is to make an accurate statistical statement about what the result will be for all votes cast, the population. As the evening progresses and more ballots are counted, the extrapolation approaches the final election result and reflects reality more and more accurately.
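The effect described in the election example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the electorate of 100,000 votes with a 30% share for "party A" is made up, the helper `projection_error` is a hypothetical name, and the counted ballots are treated as a simple random sample, which real election-night counts are not.

```python
import random

def projection_error(votes, sample_size, rng):
    """Absolute gap between party A's share in a random sample and in all votes."""
    true_share = sum(votes) / len(votes)
    sample = rng.sample(votes, sample_size)  # draw without replacement
    return abs(sum(sample) / sample_size - true_share)

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical electorate: 1 = vote for party A, 0 = vote for any other party
votes = [1] * 30_000 + [0] * 70_000

# The more ballots are "counted", the smaller the gap tends to become
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(projection_error(votes, n, rng), 4))
```

Once every vote is in the sample, the sample and the population coincide and the error is zero. For smaller samples the error varies from draw to draw, but it tends to shrink as the sample grows.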
Population vs. Sample Examples
|Research question|Population|Sample|
|How much money does a German citizen spend on food per month?|All German citizens (over 18 years)|10,000 randomly encountered supermarket visitors|
|How old is the average student at the University of Stuttgart?|All students enrolled at the University of Stuttgart|Survey of students visiting Stuttgart University Library on a Saturday|
|How long is a song on the streaming platform Spotify?|All songs uploaded to the platform at the time, excluding podcasts|100,000 randomly selected songs available in Germany|
4 Reasons for Using Samples Instead of the Population
- Practicability: It is easier and more feasible to collect data only from the sample, rather than the entire population.
- Resource efficiency: The study saves costs for the survey, for example, through less time spent by the researchers or lower logistical costs, such as travel costs.
- Necessity: Depending on the research question, it may be nearly impossible to study the entire population. For example, the U.S. conducts a complete census only every 10 years. Because there is no mandatory resident registration in the U.S., counting the whole population is such a large expense that it is undertaken only once a decade.
- Simpler data management: Due to the smaller number of people surveyed, less data is generated overall. Thus, there are lower costs for storing and processing the data. In addition, the calculations can also be performed much more quickly and easily.
To obtain a sample of a population, two types of sampling are distinguished:
Probability sampling is characterized by random selection: each element of the population has a known, non-zero chance of being part of the sample. In the simplest case, simple random sampling, this chance is equal for all elements. If one person is drawn at random from a population of 100 people, for example, each person has a 1 in 100 (= 1%) chance of being selected; if 10 people are drawn, each has a 10 in 100 (= 10%) chance. These methods are usually very costly and time-consuming.
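A minimal sketch of simple random sampling, the most basic form of probability sampling, using Python's standard `random` module. The population of 100 numbered people, the sample size of 10 and the helper name `simple_random_sample` are assumptions for illustration. Repeating the draw many times shows that each person ends up in the sample about equally often, namely in roughly n / N = 10 / 100 = 10% of the draws.

```python
import random
from collections import Counter

def simple_random_sample(population, n, rng):
    """Draw n distinct elements; every element is equally likely to be included."""
    return rng.sample(population, n)

rng = random.Random(0)           # fixed seed so the run is repeatable
population = list(range(100))    # hypothetical: 100 people, labelled 0..99
n = 10                           # sample size
trials = 20_000                  # number of repeated draws

# Count how often each person ends up in a sample
counts = Counter()
for _ in range(trials):
    counts.update(simple_random_sample(population, n, rng))

# Empirical inclusion frequency per person; each should be close to n / N = 0.10
frequencies = [counts[p] / trials for p in population]
print(min(frequencies), max(frequencies))
```

With a sample size of 1, the same reasoning gives an inclusion probability of 1 in 100 for each person.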
Non-probability sampling is the opposite. Here the selection is not random, so not all elements of the population have the same probability of becoming part of the study, and some may have no chance at all. An example would be if the University of Stuttgart wanted to draw conclusions about all German students but surveyed only students from its own university. This saves the research team the time and expense of interviewing students outside Stuttgart, but the results may not be representative of the population.
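The difference between the two approaches can be illustrated with a small simulation. All numbers here are invented for the sketch: the 20,000 students, the two groups "Stuttgart" and "Elsewhere" and their age distributions are assumptions, not real data. The point is only that a sample drawn from a single university can miss the population mean even when it is as large as a random sample.

```python
import random
from statistics import mean

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical population: (university, age) pairs with made-up age distributions
students = (
    [("Stuttgart", rng.gauss(23, 2)) for _ in range(2_000)]
    + [("Elsewhere", rng.gauss(26, 3)) for _ in range(18_000)]
)

def mean_age(group):
    """Average age of a list of (university, age) pairs."""
    return mean(age for _, age in group)

# Probability sample: every student in the population can be drawn
random_sample = rng.sample(students, 500)

# Non-probability (convenience) sample: only students from one university
local = [s for s in students if s[0] == "Stuttgart"]
convenience_sample = rng.sample(local, 500)

print("population: ", round(mean_age(students), 2))
print("random:     ", round(mean_age(random_sample), 2))
print("convenience:", round(mean_age(convenience_sample), 2))
```

In this made-up setting, the random sample lands close to the population mean, while the convenience sample reflects only the local students.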
This is what you should take with you
- A sample consists of the individual elements of the population from which data are actually collected in an investigation.
- The population is the entirety of all units under study.
- The use of samples is preferable to the use of the entire population for various reasons, such as practicality or resource efficiency.
- Samples can be collected either by probability (random) sampling or by non-probability (non-random) sampling. The difference is that in random sampling, all elements of the population have the same probability of appearing in the sample. In non-random sampling, this is not the case.
Other Articles on the Topic of Population and Sample
- The selection procedures for research units are described in more detail here.