
What is the Median?

The median is a statistical measure that represents the central value of a dataset. It is the value that separates the dataset into two equal halves, with half of the values being higher and the other half being lower than the median. In this article, we will explore what the median is, how it is calculated, and its uses in statistics and data analysis.

What is the Median?

The median is a measure of central tendency in statistics. It is the value that separates a dataset into two equal halves. To calculate it, we first need to sort the values in the dataset in either ascending or descending order. Once the values are sorted, we can find the middle value or the average of the two middle values if there are an even number of values in the dataset. The resulting value is the median.

Figure: Example for seven numeric values | Source: Author

For example, let’s consider the following dataset of 7 values: 4, 6, 8, 10, 11, 15, 20. The values are already sorted in ascending order, so no reordering is needed. Since there are 7 values, the middle value is the fourth value, which is 10. Therefore, the median of this dataset is 10.

How do you calculate the Median?

The process of calculating this measure is always the same and can be described by this algorithm:

  1. Sort the values in the dataset in either ascending or descending order.
  2. Determine the middle value(s) of the dataset based on the number of values in the dataset:
    • If there are an odd number of values, the median is the middle value.
    • If there are an even number of values, it is the average of the two middle values.
  3. The resulting value is the median.
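The steps above can be sketched in a few lines of plain Python. This is only an illustrative implementation; the function name `median_of` is made up for this example and does not come from any library.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median_of() needs at least one value")

    ordered = sorted(values)   # step 1: sort in ascending order
    n = len(ordered)
    mid = n // 2

    if n % 2 == 1:
        # Odd number of values: the median is the single middle value.
        return ordered[mid]

    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([4, 6, 8, 10, 11, 15, 20]))  # 10
print(median_of([4, 6, 8, 10, 11, 15]))      # 9.0
```

The first call reproduces the seven-value example from above. The second call drops the largest value, so the result is the average of 8 and 10.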

What is the difference between the Median and the Mean?

When it comes to measuring central tendency, the median and mean are two of the most commonly used measures. Although they both provide information on the center of the data, they have different properties and are appropriate for different types of data.

The mean is calculated by adding up all the values in the dataset and dividing by the total number of values. It is sensitive to extreme values, also known as outliers, and is pulled in the direction of the outliers. In contrast, the median is the middle value in the dataset when the values are arranged in ascending or descending order. It is less sensitive to outliers, making it a robust measure of central tendency.

There, we can already see one difference between these two measures. Because of the way it is calculated, the mean is often a value that does not occur in the dataset. The median, however, is always a value from the dataset if the number of data points is odd.

Figure: Comparison of Mean and Median in a Dataset | Source: Author

Here are some key differences between the median and mean:

  1. Outliers: As mentioned, the mean is influenced by extreme values or outliers, while the median is largely unaffected by them. If there are outliers in the data, the mean may not be a representative measure of central tendency, and the median may be a better choice.
  2. Skewness: The mean is pulled toward the long tail of a skewed distribution, which can make it unrepresentative of typical values in skewed datasets. The median, on the other hand, is robust to skewness and can provide a more accurate representation of central tendency in these cases.
  3. Sample size: Both measures become more stable as the sample size increases. For roughly normal data without outliers, the mean usually varies less from sample to sample than the median, while for heavy-tailed data the median is often the more stable of the two.
  4. Type of data: The median is appropriate for ordinal, interval, and ratio data, while the mean requires interval or ratio data.

In summary, the median and mean are both useful measures of central tendency, but their suitability depends on the properties of the data. If the data is skewed or contains outliers, the median may be a better choice, while the mean may be more appropriate for roughly symmetric data without extreme values.
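To make the effect of outliers concrete, here is a small sketch using the `statistics` module from the Python standard library. The salary figures are invented purely for illustration.

```python
import statistics

# Hypothetical yearly salaries (made-up numbers).
salaries = [42_000, 45_000, 48_000, 50_000, 53_000]

# The same data with one extreme value added.
with_outlier = salaries + [1_000_000]

print(statistics.mean(salaries), statistics.median(salaries))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

Without the outlier, the mean is 47,600 and the median is 48,000. After adding the single extreme value, the mean jumps to roughly 206,333, while the median only moves to 49,000.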

What are the use cases of the Median?

The median is a useful measure in statistics and data analysis and has applications in various fields, including:

  1. Descriptive statistics: The median is a common measure of central tendency used in descriptive statistics to summarize a dataset. It is used to provide an overview of the typical or central value in the dataset.
  2. Skewed data: The measure is more robust than the mean for skewed datasets because it is less affected by extreme values or outliers.
  3. Income distribution: It is used to describe the typical income in a population because it represents the income level at which half of the population earns more and half earns less. Unlike the mean, it is not inflated by a small number of very high incomes.
  4. Survival analysis: The median is used in survival analysis to represent the time at which half of the subjects in a study have an event or fail.

Beyond these, the median is used in many other fields.

In statistics, it is used as a measure of central tendency that, unlike the mean, is largely unaffected by extreme values. This makes it useful in analyzing datasets with outliers or skewed distributions. It is also used in inferential statistics as a robust estimate of the center of a distribution, for which confidence intervals can be constructed.

In finance, the median is used to summarize the prices of stocks or other assets, which can give a more representative picture of the market than the mean price when a few prices are extreme. It is also used in real estate to report the median price of homes or apartments in a particular area, which can be helpful for both buyers and sellers.

In healthcare, it is used to report the median survival time of patients with a certain disease or condition, that is, the time by which half of the patients have experienced the event. This can help doctors and researchers evaluate the effectiveness of treatments and develop new therapies.

In transportation, the median is used to summarize typical travel times or distances between locations, which can help in planning and optimizing routes.

The median is also used in social sciences to describe the typical income or wealth of a population, which can help in evaluating economic trends and policies. It is also used in education to summarize the test scores of students, which can be helpful in evaluating academic performance and identifying areas for improvement.

What are the advantages and disadvantages?

The median has several advantages over other measures of central tendency, such as the mean:

  1. Robustness: It is a more robust measure than the mean for skewed datasets because it is less affected by extreme values or outliers.
  2. Easy to calculate: The measure is easy to calculate, even for large datasets.
  3. Meaningful for ordinal data: The median is meaningful for ordinal data, where the values have a natural order but the differences between the values are not meaningful.

But the median also has some disadvantages:

  1. Not sensitive to all values: The median depends only on the middle value(s) and the ordering of the rest, so it ignores how far the other values lie from the center. It may therefore hide changes in the tails of a dataset that the mean would reveal.
  2. Can be ambiguous: With an even number of values, any value between the two middle values splits the dataset into two equal halves. Averaging the two is a convention, and the result may be a value that does not occur in the dataset.
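For an even number of values, Python's `statistics` module offers alternatives to averaging the two middle values. The sketch below uses `median_low()` and `median_high()`, which always return a value that is actually in the dataset; the example data is the earlier dataset with the largest value removed.

```python
import statistics

values = [4, 6, 8, 10, 11, 15]  # even number of values

print(statistics.median(values))       # 9.0, average of 8 and 10
print(statistics.median_low(values))   # 8, the lower middle value
print(statistics.median_high(values))  # 10, the upper middle value
```

Which variant fits best depends on the data: for ordinal or discrete values, where an average of two categories has no meaning, the low or high median may be preferable.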

What are the limitations of the Median?

While the median is a useful measure of central tendency, it also has some limitations. Here are some situations where other measures of central tendency may be more appropriate:

  1. Skewed Distributions: When the distribution of data is skewed, the median describes a typical value but says nothing about the total. If totals or per-unit averages matter, for example when budgeting overall costs, the mean is the more appropriate measure.
  2. Outliers: The median ignores how far extreme values lie from the center. When outliers should be damped rather than ignored, while still using more of the data than the median does, the trimmed mean or winsorized mean can be a useful compromise.
  3. Small Sample Sizes: For small samples from roughly symmetric distributions without outliers, the median usually varies more from sample to sample than the mean. In such cases, the mean may be the more reliable measure of central tendency.
  4. Continuous Data: The median is well defined for continuous data, but if the data is roughly symmetric, the mean lies close to it and makes use of every value. For multiplicative quantities such as growth rates, the geometric mean may be a better measure of central tendency.
  5. Nominal Data: The median cannot be calculated for nominal data, which are data that do not have any order or hierarchy. In such cases, the mode is the appropriate measure of central tendency.

In summary, the choice of the appropriate measure of central tendency depends on the distribution of data, presence of outliers, sample size, type of data, and research question. Therefore, it is important to understand the strengths and limitations of each measure and choose the one that best suits the research question at hand.
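The trimmed mean and winsorized mean mentioned in the list can be sketched in plain Python as follows. The helper names `trimmed_mean` and `winsorized_mean`, the parameter `k`, and the sample data are assumptions made for this illustration, not a reference implementation.

```python
import statistics


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and the k largest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this dataset")
    return statistics.mean(ordered[k:len(ordered) - k])


def winsorized_mean(values, k):
    """Mean after replacing the k smallest and k largest values
    with the nearest remaining value instead of dropping them."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this dataset")
    low, high = ordered[k], ordered[-k - 1]
    clipped = [min(max(v, low), high) for v in ordered]
    return statistics.mean(clipped)


data = [4, 6, 8, 10, 11, 15, 200]  # one extreme value

print(statistics.mean(data))     # pulled up strongly by 200
print(statistics.median(data))   # 10
print(trimmed_mean(data, 1))     # mean of 6, 8, 10, 11, 15
print(winsorized_mean(data, 1))  # mean of 6, 6, 8, 10, 11, 15, 15
```

With `k = 1`, both variants land close to the median of 10, while the plain mean is above 36 because of the single value 200.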

What is the difference between the Median and other Quantiles?

The median is a measure of central tendency that is often used in statistics to describe the distribution of a dataset. It is a type of quantile, which is a value that divides a dataset into equal portions based on the ranking of the values. While it is a type of quantile, there are some key differences between the median and other quantiles.

One important difference is that the median is the middle value in a dataset, whereas other quantiles divide the dataset into equal portions but do not necessarily fall in the middle. For example, the first quartile (Q1) is the value that separates the lowest 25% of the dataset from the rest, while the third quartile (Q3) separates the highest 25% of the dataset from the rest.

Another difference is that the median is more robust to outliers than quantiles closer to the tails. Outliers are values that are much larger or smaller than the other values in the dataset, and they can have a significant impact on the mean. Because the median is based on the middle of the dataset, almost half of the values would have to be extreme before it could be moved arbitrarily far, whereas a quantile such as the 95th percentile reacts to changes in a much smaller share of the data.

Finally, the median and the other quantiles answer different questions. The median describes the center of the data, while other quantiles describe its spread and tails. For example, the interquartile range, the difference between Q3 and Q1, is a robust measure of spread. Both can be used whether or not the data is normally distributed.
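As a rough illustration of how the median relates to the quartiles, the following sketch uses `statistics.quantiles()` (available since Python 3.8) on the seven example values from earlier. Note that several conventions for computing quartiles exist; this example relies on the function's default method, so other tools may report slightly different values for Q1 and Q3.

```python
import statistics

values = [4, 6, 8, 10, 11, 15, 20]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(q1, q2, q3)                       # 6.0 10.0 15.0
print(q2 == statistics.median(values))  # True: Q2 is the median
print(q3 - q1)                          # 9.0, the interquartile range
```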

How do you calculate the Median in Python?

In Python, you can calculate the median using the median() function of the same name provided by the statistics module of the standard library.
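A minimal sketch of how this could look, assuming the seven example values from earlier in the article (the variable names are only illustrative):

```python
import statistics

values = [4, 6, 8, 10, 11, 15, 20]

# statistics.median() sorts the data internally, so the input order does not matter.
result = statistics.median(values)
print(result)  # 10
```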

Alternatively, you can use the median() function of the NumPy library.
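A comparable sketch with NumPy, assuming the library is installed and again using the seven example values:

```python
import numpy as np

values = np.array([4, 6, 8, 10, 11, 15, 20])

# np.median() also accepts plain Python lists and returns a NumPy float.
result = np.median(values)
print(result)  # 10.0
```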

Both functions return the same median for the same data.

This is what you should take with you

  • The median is a measure of central tendency that represents the middle value in a dataset.
  • It is useful in situations where extreme values or outliers may skew the data.
  • Compared to the mean, it is less affected by outliers and is more robust.
  • However, it may not be the best measure of central tendency in all situations, and other measures like the mode or mean may be more appropriate.
  • It can be easily calculated in Python using the statistics module or the NumPy library.

