Skip to content

Pandas Series – easily explained!

A Pandas Series is a one-dimensional array that can have a labeled, i.e., not a numerical, index. A series can also be thought of as a column in a table that can store data of various types.

What do you need Pandas for?

The basic installation of Python already brings four different data structures in which any data type can be stored:

  • The list is an ordered collection of elements, which is changeable and can also contain duplicate elements.
  • The tuple is in effect a list, with the difference that it is no longer changeable. So no elements can be added or removed afterward.
  • The set does not allow duplicate entries. At the same time, the arrangement of the elements within the set is variable. The set itself can be changed, but the individual elements cannot be changed afterward.
  • Since Python version 3.7, a dictionary is an ordered collection of elements that can be changed. In the earlier versions, the dictionary is unordered.

Although you can already handle many use cases with these data structures, there are situations where they are not sufficient. For example, tables cannot be displayed with these native structures. Therefore, there are so-called modules, such as Pandas or NumPy, which make it possible to use further functionalities that would otherwise not be available in the basic installation. At the same time, it is also often the case that data structures from modules are significantly more performant than Python’s standard data objects. NumPy arrays, for example, have been optimized for vector and matrix calculations.

Likewise, there are modules for many other applications that complement the Python programming language and its functionality. The modules TensorFlow or Scikit-Learn, for example, are used for the creation of Machine Learning models.

What are the components of a Pandas Series?

The Pandas Series is one of the most basic data structures used in Pandas. It is a one-dimensional, ordered data structure that can be used, for example, to store the information in a table column or the numbers in a vector.

It offers the possibility to use a labeled index. However, if this is not explicitly specified, a numerical index starting with zero is automatically set. Thus it becomes also clear that the order of the elements in the Pandas Series plays an important role. The Series is said to be an ordered data structure. This means that two Pandas Series with the same elements in a different order are not the same object.

The simplest Pandas Series of all is the empty Series, which can be defined as follows:

import pandas as pd

series_1 = pd.Series()
print(series_1)

Out:
Series([], dtype: float64)

There are some parameters that can be specified within the Pandas Series function that change the properties of the object. If these are not explicitly specified, they will either be set automatically or the default value will be used. The following parameters can, but do not have to be set:

  • data: This parameter defines the data to be stored in the series. Different data structures can be used, such as a list, a dictionary, or even a single value.
  • index: With the help of the index, a labeled index for the elements in the series can be defined. If the parameter is not set, the elements will be numbered automatically, starting at zero.
  • dtype: The optional parameter dtype sets the data types of the series. This is especially useful if all data in the series are of the same data type. For example, you can then define whether numbers are to be stored as integers or decimals.
  • name: With this parameter, the series can be named. This is especially useful if the Series is to be part of a DataFrame. Then the name is the corresponding column name in the DataFrame.
  • copy: This parameter can only take the values True or False and is therefore a Boolean value. It specifies whether the passed data should be saved as a copy or not. In most cases, however, it is not of great importance.

When all these parameters are used, a completely defined Series looks like this:

series_1 = pd.Series([1, 2, 3], index = ["A", "B", "C"], dtype = "int64", name = "Series of Integers", copy = False)
print(series_1)

Out:
A    1
B    2
C    3
Name: Series of Integers, dtype: int64

How can data be retrieved from a Series?

When querying data from a Pandas Series, we use the index in square brackets, as we already know from the Python list. If a textual index is available we use it, otherwise, the numeric index can be used:

series_1 = pd.Series(["first element", 2, "third element"])
print(series_1[0])

series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
print(series_1["A"])

Out:
first element
first element

It can happen that we only know the element we want to query from the Series, but not the corresponding index. However, in the Pandas Series, it is not as easy to find out the associated index as it is, for example, with the list. One way to do this is to convert the Series to a list and then use the “.index” function to find out the corresponding index:

series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
list(series_1).index(2)

Out:
1

How can values in a series be overwritten or added?

Existing values in a series can be overwritten by calling the corresponding index:

series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
series_1["A"] = 1
series_1

Out:
A                1
B                2
C    third element
dtype: object

This call can also be used if new values are to be included in an existing series. To do this, simply use an index that has not yet been used:

series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
series_1["D"] = "fourth_element"
series_1

Out:
A     first element
B                 2
C     third element
D    fourth_element
dtype: object

How to query data with a condition?

Especially with numeric data, it can be useful to query data from a series that fulfill a certain condition. For this purpose, the corresponding filter is defined in square brackets instead of the index. For example, this can be used to output all elements of the series that are greater than four:

series_1 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index = ["A", "B", "C", "D", "E", "F", "G", "H"])
series_1[series_1 > 4]

Out:
E    5
F    6
G    7
H    8
dtype: int64

If you want to use several conditions at the same time, you can define them in successive square brackets. Note that the conditions are connected with a logical “and”, which means that only values that meet all conditions are output. This allows us to filter our series for all values greater than four but not equal to eight, for example:

series_1 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index = ["A", "B", "C", "D", "E", "F", "G", "H"])
series_1[series_1 > 4][series_1 != 8]

Out:
E    5
F    6
G    7
dtype: int64

How to create a Series from a Dictionary?

Besides a list, you can also use a Python Dictionary as data for a Pandas Series. This has the advantage that the index does not have to be specified explicitly. The keys of the dictionary are used as an index for the series and the values of the dictionary as data:

dict_1 = {"A": 1, "B": 2, "C": 3}
print(pd.Series(dict_1))

Out:
A    1
B    2
C    3
dtype: int64

What is the Pandas Series used for?

The Pandas Series is mainly used in connection with the Pandas DataFrames. They are used to build these table-like data structures since each column of the DataFrame consists of a separate Series.

It can also be used to store one-dimensional data with different data types and perform calculations with them. In the field of machine learning, they can also be used to store one-dimensional vectors and perform complex calculations such as vector multiplications.

How to create a DataFrame from a Pandas Series?

A DataFrame is basically a collection of several Pandas Series. Thus, it can be created relatively easily by naming the Series used.

series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
series_2 = pd.Series([4, 5, 6], index = ["A", "B", "C"])
series_3 = pd.Series(["i", "don't", "know"], index = ["A", "B", "C"])

pd.DataFrame([series_1, series_2, series_3])

Out:

It is important that the Series objects either all have the same index or no index. Otherwise, a separate column will be created for each different index, for which the other rows have no value:

series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
series_2 = pd.Series([4, 5, 6], index = ["A", "B", "C"])
series_3 = pd.Series(["i", "don't", "know"], index = ["A", "B", "D"])

pd.DataFrame([series_1, series_2, series_3])

Out:

What is the difference between a Pandas Series and a Python List?

At this point in the article, you might have gotten the impression that the Pandas Series and the Python List are two very similar data structures whose main difference is that the list can only use numeric indexes, while the Pandas Series also allows textual indexes.

The main difference between the Series and list is not in their functionality or structure but in their application possibilities. In the field of Data Science, the Series is mainly used as a preliminary stage for tabular data, which in turn are to be illustrated in diagrams. This means that the Series can be viewed in concrete terms.

The list, on the other hand, is used to temporarily store complex data structures, so it tends to stay in the background and serves as a tool for complex calculations. The main differences are mainly:

  • Homogeneity: A Pandas series requires all data elements to be of the same data type, while a Python list can contain elements of different data types.
  • Memory efficiency: Pandas series are more memory efficient than Python lists because they use NumPy arrays internally, which are more compact and faster for numerical computations.
  • Vectorization: Pandas series support vectorized operations that allow us to perform calculations on entire sequences of data at once, making data processing faster and more efficient. Python lists, on the other hand, do not support vectorized operations.

What is the difference between a Pandas Series and a Python Dictionary?

Although both Pandas Series and Python Dictionaries are key-value pairs, there are some important differences between them:

  • Indexing: In a Pandas Series, a user-defined index can be used, which does not necessarily have to be numeric or sequential. The Python Dictionary, on the other hand, can only use hashable objects as keys, so usually, either strings, numbers, or tuples are used.
  • Order: The elements in a Pandas Series are ordered. The Python Dictionary was originally an unordered data structure, but this has changed since Python version 3.7.
  • Data type: A Pandas Series can only use a single data type for all elements. This does not mean that no more data types can be stored, but they are all converted to an “object”. The Python Dictionary, on the other hand, natively allows different data types as values.
  • Functions: A Pandas Series can be used for data manipulation and analysis by using the built-in functions, for example for count or average. The Python Dictionary does not have these functions for data manipulation.
  • Memory consumption: The memory consumption for a Pandas Series is significantly higher than for a comparable Python Dictionary, as the data is stored in a table format with index and column labels. The structure in dictionaries, on the other hand, is much more memory-optimized.

The Pandas Series is a specialized data structure for data analysis and is also optimized for this purpose. The Python Dictionary, on the other hand, is a more general data structure that can be used for a wide range of applications.

This is what you should take with you

  • The Pandas Series is a one-dimensional array that can have a labeled, i.e., not a numerical, index.
  • It is mainly used together with Pandas DataFrames. Here you can think of a Series as a single table column.
  • The Series resembles the Python List in many functionalities with the difference that the List does not allow textual indices, but the Series does.
Classes and Objects in Python / Klassen und Objekte in Python

What are Classes and Objects in Python?

Mastering Python's Object-Oriented Programming: Explore Classes, Objects, and their Interactions in our Informative Article!

Threading and Multiprocessing in Python.

What is Threading and Multiprocessing in Python?

Boost your Python performance and efficiency with threading and multiprocessing techniques. Learn how to harness parallel processing power.

Anaconda Python

What is Anaconda for Python?

Learn the essentials of Anaconda in Python for efficient package management and data science workflows. Boost your productivity today!

Regular Expressions

What are Regular Expressions?

Unlock powerful text manipulation in Python with regular expressions. Master patterns, syntax, and advanced techniques for data processing.

Object-Oriented Programming / Objektorientierte Programmierung

What is Object-Oriented Programming?

Master Object-Oriented Programming concepts in Python with our beginner's guide. Learn to create reusable code and optimize your coding skills.

Plotly

What is Plotly?

Learn how to create interactive visualizations and dashboards with Plotly, a Python data visualization library.

  • You can find more useful information about the Pandas Series in the detailed tutorial from w3schools.
Das Logo zeigt einen weißen Hintergrund den Namen "Data Basecamp" mit blauer Schrift. Im rechten unteren Eck wird eine Bergsilhouette in Blau gezeigt.

Don't miss new articles!

We do not send spam! Read everything in our Privacy Policy.

Cookie Consent with Real Cookie Banner