
Population and Sample – simply explained!

A sample is a subset of the units of a larger group (e.g. society) from which data are actually collected in a study. These data can then be used for statistical analysis.

The population is the entirety of all units under investigation. The aim of statistical analysis is to make statements about this whole group.

Populations and samples are used to conduct scientific experiments and to determine whether there is a statistical relationship between several variables (Correlation and Causation).

The image shows several groups of people. The largest is the entire population and the smaller one is the sample.
Population and Sample | Source: Author

A brief example: On the evening of the Bundestag election (the election of the German parliament), the first projection of the results is shown punctually at 6 p.m. Since the polling stations do not close until this time, only a fraction of all votes cast can have been counted when the early projections are made; this fraction is the sample. The purpose of the extrapolation is to make an accurate statistical statement about what the result will be for all votes cast, the population. As the evening progresses and more ballots are counted, the extrapolation approaches the actual election result and reflects reality more and more accurately.
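The effect described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions: the ballots, the party name "A" and its 30% share are invented for illustration, and the ballots are assumed to be counted in random order, which is not how real election projections work.

```python
import random

def running_estimates(ballots, checkpoints):
    """Share of votes for party "A" among the first n counted ballots."""
    estimates = {}
    for n in checkpoints:
        counted = ballots[:n]  # the sample: ballots counted so far
        estimates[n] = counted.count("A") / n
    return estimates

random.seed(42)  # fixed seed so the run is reproducible

# Hypothetical population: 100,000 ballots, exactly 30% of them for party "A"
population = ["A"] * 30_000 + ["other"] * 70_000
random.shuffle(population)  # assumption: ballots are counted in random order

true_share = population.count("A") / len(population)
estimates = running_estimates(population, [100, 10_000, 100_000])

for n, share in estimates.items():
    print(f"{n:>7} ballots counted: estimated share {share:.3f} (true share {true_share:.3f})")
```

Once every ballot has been counted, the sample is identical to the population and the estimate matches the true share exactly.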

What are the types of the population?

In statistics, a distinction is made between three types of population based on the number of elements and the actual countability of this population.

  • Finite population: The finite population comprises a limited number of members, which can therefore be counted. Examples of a finite population are the workforce of a company or all households in an area or country. Most of the populations studied in practice are finite.
  • Infinite population: The infinite population, on the other hand, contains an unlimited number of members, so it is not possible to examine the entire population. This group includes, for example, all possible coin tosses. In practice, populations that are finite but far too large to count, such as the bacteria in a certain environment, are often treated as infinite as well.
  • Theoretical population: The theoretical population comprises a group of people, animals or objects that are considered for a statistical study and are theoretically finite, but whose members simply cannot be determined. One example is all people who have ever lived on planet Earth. In the same way, all people with a certain genetic characteristic form a theoretical population, as it is not possible to genetically examine every human being.

Knowledge of these types of populations is essential for the selection of a suitable sampling method and, above all, for correct statistical conclusions. Without this knowledge, incorrect generalizations can be made.

Population vs. Sample Examples

Research Question | Population | Sample
How much money does a German citizen spend on food per month? | All German citizens (over 18 years) | 10,000 randomly encountered supermarket visitors
How old is the average student at the University of Stuttgart? | All students enrolled at the University of Stuttgart | Survey of students visiting Stuttgart University Library on a Saturday
How long is a song on the streaming platform Spotify? | All songs uploaded to the platform at the time, excluding podcasts | 100,000 randomly selected songs available in Germany
Practical examples for population and sample

4 Reasons for using samples instead of population

  • Practicability: It is easier and more feasible to collect data only from the sample, rather than the entire population.
  • Resource efficiency: The study saves costs for the survey, for example, through less time spent by the researchers or lower logistical costs, such as travel costs.   
  • Necessity: Depending on the research question, it may also be nearly impossible to study the entire population. For example, the U.S. only conducts a complete census every 10 years. Because there is no mandatory resident registration in the U.S., counting everyone is such a large expense that it is only done once a decade.
  • Simpler data management: Due to the smaller number of people surveyed, less data is generated overall. Thus, there are lower costs for storing and processing the data. In addition, the calculations can also be performed much more quickly and easily.

Sampling Methods

To obtain a sample of a population, two types of sampling are distinguished: 

Probability sampling is characterized by the fact that each element of a population has a known, non-zero chance of being part of the sample. In the simplest case, this chance is the same for every element: when one person is drawn from a population of 100 people, each person has a 1 in 100 (= 1%) chance of being selected. These methods are usually very costly and time-consuming.

Non-probability sampling is the opposite. In this case, the chance of becoming part of the study is unknown, and some elements of the population may have no chance at all. An example of this would be if the University of Stuttgart wanted to make a statement about all students in Germany, but only surveyed its own students for the study. This saves the research team the time and expense of interviewing and studying students outside of Stuttgart, but students elsewhere can never be selected.
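The contrast between the two types can be sketched in a few lines. The data below are invented for illustration: a hypothetical population of 10,000 students at ten universities, under the assumption that the average age differs from university to university.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical population: 10 universities with 1,000 students each.
# Assumption for illustration: the average age rises with the university index.
population = [
    {"university": uni, "age": random.gauss(21 + 0.5 * uni, 2)}
    for uni in range(10)
    for _ in range(1_000)
]
true_mean = mean(s["age"] for s in population)

# Probability sampling (simple random sample): every student has the same
# chance of being selected, here 500 / 10,000 = 5%.
random_sample = random.sample(population, 500)
random_mean = mean(s["age"] for s in random_sample)

# Non-probability sampling: only students of university 0 can be selected,
# everyone else has a selection probability of zero.
single_uni_sample = [s for s in population if s["university"] == 0][:500]
single_uni_mean = mean(s["age"] for s in single_uni_sample)

print(f"population mean age:      {true_mean:.2f}")
print(f"simple random sample:     {random_mean:.2f}")
print(f"single-university sample: {single_uni_mean:.2f}")
```

In this setup the single-university sample misses the population mean by a wide margin, because the excluded universities differ systematically from the surveyed one.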

In addition to this very general subdivision, more detailed sampling methods can also be found:

  • Stratified random sampling: Here, the population is divided into subgroups (strata) that are formed according to certain characteristics, such as age or gender. A random sample is then drawn from each of these subgroups, the size of which depends on the share of the subgroup in the population. This procedure ensures that the overall sample is representative of the population with respect to these characteristics.
  • Cluster samples: The cluster sample divides the population into clusters. These can be regional, for example, such as cities or districts. A random selection of clusters is then drawn, and all members of the selected clusters, or a random selection of them, are examined. This method can be more efficient than a random selection from the entire population if the clusters are as similar to one another as possible, so that each cluster is a small-scale image of the population. Clusters that differ strongly from one another, on the other hand, lead to less precise results than a random sample of the entire population.
  • Systematic random sample: In this method, the members of the population are sorted according to a specific characteristic and then every nth member is included in the sample. With a large population and a simple characteristic for sorting, it can lead to a more efficient random sample.
  • Convenience sampling: This method is used to create quick and inexpensive samples. It involves selecting people who are readily available or easy to reach. A survey of visitors who happen to be at a weekly market is an example of a convenience sample. However, this method can lead to serious distortions if the selection is not representative of the population.
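The stratified, cluster and systematic methods from the list above can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names and the population of 1,000 people are hypothetical, and the cluster variant follows the common textbook form in which whole clusters are drawn at random and all of their members are kept.

```python
import random
from collections import defaultdict

def stratified_sample(units, key, n, rng):
    """Proportional stratified sample: each stratum contributes according to its share."""
    strata = defaultdict(list)
    for unit in units:
        strata[key(unit)].append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(units))  # rounding may shift the total slightly
        sample.extend(rng.sample(members, k))
    return sample

def systematic_sample(units, step, rng):
    """Every step-th unit, starting at a random position within the first interval."""
    start = rng.randrange(step)
    return units[start::step]

def cluster_sample(units, key, n_clusters, rng):
    """Draw whole clusters at random and keep all of their members."""
    clusters = sorted({key(unit) for unit in units})
    chosen = set(rng.sample(clusters, n_clusters))
    return [unit for unit in units if key(unit) in chosen]

rng = random.Random(1)  # fixed seed so the run is reproducible

# Hypothetical population: 1,000 people, 25% under 30, spread evenly over 20 cities
people = [
    {
        "id": i,
        "age_group": "under 30" if i % 4 == 0 else "30 and over",
        "city": f"city {i % 20}",
    }
    for i in range(1_000)
]

stratified = stratified_sample(people, lambda p: p["age_group"], 100, rng)
systematic = systematic_sample(people, 10, rng)
clustered = cluster_sample(people, lambda p: p["city"], 2, rng)

print(len(stratified), len(systematic), len(clustered))
```

All three calls return 100 people here, but they are chosen in very different ways: the stratified sample preserves the 25% share of the younger age group, the systematic sample takes every tenth person, and the cluster sample contains the complete populations of two cities.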

The choice of the appropriate sampling method depends on various factors, such as the research question, the characteristics of the population, the resources available, and the desired level of precision and accuracy. It is important to consider these factors carefully.

How to find the right sample size?

Before starting the data collection and statistical analysis, it should be determined how large the sample should ideally be. This value depends on several factors. One of them is the size of the population itself: for small populations, the sample should grow with the population, whereas for very large populations the required sample size levels off. The sampling method also affects the required sample size. In a random sample, for example, the sample should be large enough that chance fluctuations in who is selected do not distort the result.

In addition, a certain buffer should always be planned for the size of the sample, especially for longer-term studies, as problems may arise in the course of the experiment that require members to be left out of the sample, which reduces the sample size.

The desired degree of precision is another factor that influences the sample size. If a higher degree of precision is to be achieved, more members must be included in the sample. The desired confidence level and the acceptable width of the confidence interval also play an important role here, as does the variability of the characteristic in the population: greater variability requires a larger sample size.

The diagram shows the bell curve with the expected value in orange in the middle of the curve.
Confidence interval for a normal distribution | Source: Author
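How precision, variability and sample size interact can be made concrete with the margin of error of a confidence interval for a mean. The sketch below assumes a normal approximation with a known standard deviation; the function name and the example values are chosen for illustration only.

```python
import math
from statistics import NormalDist

def margin_of_error(std_dev, n, confidence=0.95):
    """Half-width of a normal-approximation confidence interval for a mean."""
    # z-value that cuts off (1 - confidence) / 2 in each tail of the normal curve
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * std_dev / math.sqrt(n)

# Assumed standard deviation of 10 units in the population
for n in (100, 400, 1_600):
    print(f"n = {n:>5}: margin of error = {margin_of_error(10, n):.2f}")
```

Because the sample size enters through a square root, halving the margin of error requires four times as many members, while a larger standard deviation or a higher confidence level widens the interval for the same sample size.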

Finally, the available resources should also be taken into account in order to determine the size of the sample. In many cases, the time and cost budget of the study limits the size of the sample.

Many different factors therefore influence the size of the sample and should be taken into account. The most important ones include the size of the population, the sampling method, the desired degree of precision and the available budget. There are also formulas and software tools that can help calculate a suitable sample size based on these characteristics.
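One widely used formula of this kind, often attributed to Cochran, estimates the sample size needed to measure a proportion and can be adjusted for a finite population. The following is a sketch under stated assumptions, not a complete planning tool: it assumes simple random sampling, the function names are hypothetical, and the dropout buffer is a simple rule of thumb.

```python
import math
from statistics import NormalDist

def sample_size(population_size=None, margin=0.05, confidence=0.95, proportion=0.5):
    """Sample size for estimating a proportion (Cochran's formula).

    proportion=0.5 is the most cautious assumption (largest variability).
    If population_size is given, the finite population correction is applied.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z**2 * proportion * (1 - proportion) / margin**2
    if population_size is not None:
        n0 = n0 / (1 + (n0 - 1) / population_size)
    return math.ceil(n0)

def with_dropout_buffer(n, expected_dropout):
    """Number of members to recruit so that about n remain after dropouts."""
    return math.ceil(n / (1 - expected_dropout))

for size in (1_000, 1_000_000, None):
    label = "very large" if size is None else f"{size:,}"
    print(f"population {label:>10}: sample size {sample_size(size)}")

print("recruit for 278 remaining members at 20% dropout:", with_dropout_buffer(278, 0.2))
```

With these settings, the required sample size barely changes once the population is large, whereas a stricter margin of error or a higher confidence level increases it noticeably.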

This is what you should take with you

  • A sample is a subset of the units of a population from which data are collected in an investigation.
  • The population is the entirety of all units of study.
  • The use of samples is preferable to the use of the entire population for various reasons, such as practicality or resource efficiency.
  • Samples can be collected either by probability sampling or by non-probability sampling. In probability sampling, every element of the population has a known chance of appearing in the sample, which is the same for all elements in the simplest case. In non-probability sampling, this is not the case.
