A Pandas Series is a one-dimensional array that can have a labeled, i.e., not a numerical, index. A series can also be thought of as a column in a table that can store data of various types.
What do you need Pandas for?
The basic installation of Python already brings four different data structures in which any data type can be stored:
- The list is an ordered collection of elements, which is changeable and can also contain duplicate elements.
- The tuple is in effect a list, with the difference that it is no longer changeable. So no elements can be added or removed afterward.
- The set does not allow duplicate entries. At the same time, the arrangement of the elements within the set is variable. The set itself can be changed, but the individual elements cannot be changed afterward.
- Since Python version 3.7, a dictionary is an ordered collection of elements that can be changed. In the earlier versions, the dictionary is unordered.
Although you can already handle many use cases with these data structures, there are situations where they are not sufficient. For example, tables cannot be displayed with these native structures. Therefore, there are so-called modules, such as Pandas or NumPy, which make it possible to use further functionalities that would otherwise not be available in the basic installation. At the same time, it is also often the case that data structures from modules are significantly more performant than Python’s standard data objects. NumPy arrays, for example, have been optimized for vector and matrix calculations.
Likewise, there are modules for many other applications that complement the Python programming language and its functionality. The modules TensorFlow or Scikit-Learn, for example, are used for the creation of Machine Learning models.
What are the components of a Pandas Series?
The Pandas Series is one of the most basic data structures used in Pandas. It is a one-dimensional, ordered data structure that can be used, for example, to store the information in a table column or the numbers in a vector.
It offers the possibility to use a labeled index. However, if this is not explicitly specified, a numerical index starting with zero is automatically set. Thus it becomes also clear that the order of the elements in the Pandas Series plays an important role. The Series is said to be an ordered data structure. This means that two Pandas Series with the same elements in a different order are not the same object.
The simplest Pandas Series of all is the empty Series, which can be defined as follows:
import pandas as pd
series_1 = pd.Series()
print(series_1)
Out:
Series([], dtype: float64)
There are some parameters that can be specified within the Pandas Series function that change the properties of the object. If these are not explicitly specified, they will either be set automatically or the default value will be used. The following parameters can, but do not have to be set:
- data: This parameter defines the data to be stored in the series. Different data structures can be used, such as a list, a dictionary, or even a single value.
- index: With the help of the index, a labeled index for the elements in the series can be defined. If the parameter is not set, the elements will be numbered automatically, starting at zero.
- dtype: The optional parameter dtype sets the data types of the series. This is especially useful if all data in the series are of the same data type. For example, you can then define whether numbers are to be stored as integers or decimals.
- name: With this parameter, the series can be named. This is especially useful if the Series is to be part of a DataFrame. Then the name is the corresponding column name in the DataFrame.
- copy: This parameter can only take the values True or False and is therefore a Boolean value. It specifies whether the passed data should be saved as a copy or not. In most cases, however, it is not of great importance.
When all these parameters are used, a completely defined Series looks like this:
series_1 = pd.Series([1, 2, 3], index = ["A", "B", "C"], dtype = "int64", name = "Series of Integers", copy = False)
print(series_1)
Out:
A 1
B 2
C 3
Name: Series of Integers, dtype: int64
How can data be retrieved from a Series?
When querying data from a Pandas Series, we use the index in square brackets, as we already know from the Python list. If a textual index is available we use it, otherwise, the numeric index can be used:
series_1 = pd.Series(["first element", 2, "third element"])
print(series_1[0])
series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
print(series_1["A"])
Out:
first element
first element
It can happen that we only know the element we want to query from the Series, but not the corresponding index. However, in the Pandas Series, it is not as easy to find out the associated index as it is, for example, with the list. One way to do this is to convert the Series to a list and then use the “.index” function to find out the corresponding index:
series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
list(series_1).index(2)
Out:
1
How can values in a series be overwritten or added?
Existing values in a series can be overwritten by calling the corresponding index:
series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
series_1["A"] = 1
series_1
Out:
A 1
B 2
C third element
dtype: object
This call can also be used if new values are to be included in an existing series. To do this, simply use an index that has not yet been used:
series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
series_1["D"] = "fourth_element"
series_1
Out:
A first element
B 2
C third element
D fourth_element
dtype: object
How to query data with a condition?
Especially with numeric data, it can be useful to query data from a series that fulfill a certain condition. For this purpose, the corresponding filter is defined in square brackets instead of the index. For example, this can be used to output all elements of the series that are greater than four:
series_1 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index = ["A", "B", "C", "D", "E", "F", "G", "H"])
series_1[series_1 > 4]
Out:
E 5
F 6
G 7
H 8
dtype: int64
If you want to use several conditions at the same time, you can define them in successive square brackets. Note that the conditions are connected with a logical “and”, which means that only values that meet all conditions are output. This allows us to filter our series for all values greater than four but not equal to eight, for example:
series_1 = pd.Series([1, 2, 3, 4, 5, 6, 7, 8], index = ["A", "B", "C", "D", "E", "F", "G", "H"])
series_1[series_1 > 4][series_1 != 8]
Out:
E 5
F 6
G 7
dtype: int64
How to create a Series from a Dictionary?
Besides a list, you can also use a Python Dictionary as data for a Pandas Series. This has the advantage that the index does not have to be specified explicitly. The keys of the dictionary are used as an index for the series and the values of the dictionary as data:
dict_1 = {"A": 1, "B": 2, "C": 3}
print(pd.Series(dict_1))
Out:
A 1
B 2
C 3
dtype: int64
What is the Pandas Series used for?
The Pandas Series is mainly used in connection with the Pandas DataFrames. They are used to build these table-like data structures since each column of the DataFrame consists of a separate Series.
It can also be used to store one-dimensional data with different data types and perform calculations with them. In the field of machine learning, they can also be used to store one-dimensional vectors and perform complex calculations such as vector multiplications.
How to create a DataFrame from a Pandas Series?
A DataFrame is basically a collection of several Pandas Series. Thus, it can be created relatively easily by naming the Series used.
series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
series_2 = pd.Series([4, 5, 6], index = ["A", "B", "C"])
series_3 = pd.Series(["i", "don't", "know"], index = ["A", "B", "C"])
pd.DataFrame([series_1, series_2, series_3])
Out:
It is important that the Series objects either all have the same index or no index. Otherwise, a separate column will be created for each different index, for which the other rows have no value:
series_1 = pd.Series(["first element", 2, "third element"], index = ["A", "B", "C"])
series_2 = pd.Series([4, 5, 6], index = ["A", "B", "C"])
series_3 = pd.Series(["i", "don't", "know"], index = ["A", "B", "D"])
pd.DataFrame([series_1, series_2, series_3])
Out:
What is the difference between a Pandas Series and a Python List?
At this point in the article, you might have gotten the impression that the Pandas Series and the Python List are two very similar data structures whose main difference is that the list can only use numeric indexes, while the Pandas Series also allows textual indexes.
The main difference between the Series and list is not in their functionality or structure but in their application possibilities. In the field of Data Science, the Series is mainly used as a preliminary stage for tabular data, which in turn are to be illustrated in diagrams. This means that the Series can be viewed in concrete terms.
The list, on the other hand, is used to temporarily store complex data structures, so it tends to stay in the background and serves as a tool for complex calculations. The main differences are mainly:
- Homogeneity: A Pandas series requires all data elements to be of the same data type, while a Python list can contain elements of different data types.
- Memory efficiency: Pandas series are more memory efficient than Python lists because they use NumPy arrays internally, which are more compact and faster for numerical computations.
- Vectorization: Pandas series support vectorized operations that allow us to perform calculations on entire sequences of data at once, making data processing faster and more efficient. Python lists, on the other hand, do not support vectorized operations.
What is the difference between a Pandas Series and a Python Dictionary?
Although both Pandas Series and Python Dictionaries are key-value pairs, there are some important differences between them:
- Indexing: In a Pandas Series, a user-defined index can be used, which does not necessarily have to be numeric or sequential. The Python Dictionary, on the other hand, can only use hashable objects as keys, so usually, either strings, numbers, or tuples are used.
- Order: The elements in a Pandas Series are ordered. The Python Dictionary was originally an unordered data structure, but this has changed since Python version 3.7.
- Data type: A Pandas Series can only use a single data type for all elements. This does not mean that no more data types can be stored, but they are all converted to an “object”. The Python Dictionary, on the other hand, natively allows different data types as values.
- Functions: A Pandas Series can be used for data manipulation and analysis by using the built-in functions, for example for count or average. The Python Dictionary does not have these functions for data manipulation.
- Memory consumption: The memory consumption for a Pandas Series is significantly higher than for a comparable Python Dictionary, as the data is stored in a table format with index and column labels. The structure in dictionaries, on the other hand, is much more memory-optimized.
The Pandas Series is a specialized data structure for data analysis and is also optimized for this purpose. The Python Dictionary, on the other hand, is a more general data structure that can be used for a wide range of applications.
This is what you should take with you
- The Pandas Series is a one-dimensional array that can have a labeled, i.e., not a numerical, index.
- It is mainly used together with Pandas DataFrames. Here you can think of a Series as a single table column.
- The Series resembles the Python List in many functionalities with the difference that the List does not allow textual indices, but the Series does.
What are Conditional Statements in Python?
Learn how to use conditional statements in Python. Understand if-else, nested if, and elif statements for efficient programming.
What is XOR?
Explore XOR: The Exclusive OR operator's role in logic, encryption, math, AI, and technology.
How can you do Python Exception Handling?
Unlocking the Art of Python Exception Handling: Best Practices, Tips, and Key Differences Between Python 2 and Python 3.
What are Python Modules?
Explore Python modules: understand their role, enhance functionality, and streamline coding in diverse applications.
What are Python Comparison Operators?
Master Python comparison operators for precise logic and decision-making in programming.
What are Python Inputs and Outputs?
Master Python Inputs and Outputs: Explore inputs, outputs, and file handling in Python programming efficiently.
Other Articles on the Topic of Pandas Series
- You can find more useful information about the Pandas Series in the detailed tutorial from w3schools.
Niklas Lang
I have been working as a machine learning engineer and software developer since 2020 and am passionate about the world of data, algorithms and software development. In addition to my work in the field, I teach at several German universities, including the IU International University of Applied Sciences and the Baden-Württemberg Cooperative State University, in the fields of data science, mathematics and business analytics.
My goal is to present complex topics such as statistics and machine learning in a way that makes them not only understandable, but also exciting and tangible. I combine practical experience from industry with sound theoretical foundations to prepare my students in the best possible way for the challenges of the data world.