Pandas Basic Analysis 02: Series

Python
Pandas
Jupyter
Data Analytics
Part two of data analysis with Python Pandas: the series data structure.
Author

Dennis Chua

Published

May 20, 2025

02 Pandas Data Structure: Series

Content Outline

  • Introduction
  • Create a series from a Pandas data frame
  • Create a series from a Python collection
  • Taking sections of a series
  • Statistical operations on a series
  • Vector operation on a series

Introduction

In Pandas, the series is one of the core data structures for computation. A series is a one-dimensional array with a labeled index. Like a Python list, a series is ordered, and its elements can be accessed with the [ ] notation. Element types can be heterogeneous; the index labels must be hashable. While each element of a series is mutable, its length is generally treated as fixed: most operations that add or remove elements return a new series rather than resizing the original.
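
As a quick illustration of these properties, here is a minimal sketch. The values and the labels "a", "b" and "c" are made up for the example; they are not part of the notebook's data set.

```python
import pandas as pd

# Hypothetical data: three values of mixed types, each with a string label.
s = pd.Series([10, "ten", 10.0], index=["a", "b", "c"])

first = s["a"]   # look up an element by its label
s["a"] = 11      # elements can be reassigned in place

print(s)
print(len(s))    # the number of elements is unchanged
```

Because the values have mixed types, Pandas stores them with the generic object dtype.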

For this notebook, we’re going to demonstrate Pandas series using the “ww2_leaders.csv” file, which we uploaded to Google Colab beforehand.

Create a Series From a Data Frame

import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("sample_data/ww2_leaders.csv")
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Name     12 non-null     object
 1   Born     12 non-null     object
 2   Died     12 non-null     object
 3   Age      12 non-null     int64 
 4   Title    12 non-null     object
 5   Country  12 non-null     object
dtypes: int64(1), object(5)
memory usage: 708.0+ bytes
df
    Name                   Born        Died        Age  Title           Country
0   Franklin Roosevelt     1882-01-30  1945-04-12  63   President       United States
1   Joseph Stalin          1878-12-06  1953-03-05  74   Great Leader    Soviet Union
2   Adolph Hitler          1889-04-20  1945-04-30  56   Fuhrer          Germany
3   Michinomiya Hirohito   1901-04-29  1989-01-07  87   Emperor         Japan
4   Charles de Gaulle      1890-11-22  1970-11-09  79   President       France
5   Winston Churchill      1874-11-30  1965-01-24  90   Prime Minister  United Kingdom
6   Manuel Camacho         1897-04-24  1955-10-13  58   President       Mexico
7   Jan Smuts              1870-05-24  1950-09-11  80   Prime Minister  South Africa
8   Ibn Saud               1875-01-15  1953-11-09  78   King            Saudi Arabia
9   Plaek Phibunsongkhram  1897-07-14  1965-06-11  66   Prime Minister  Thailand
10  John Curtin            1885-01-08  1945-07-05  60   Prime Minister  Australia
11  Haile Selassie         1892-07-23  1975-08-27  83   Emperor         Ethiopia

Recall that a Pandas data frame is a two-dimensional collection made up of rows and columns. Conceptually, each row of a data frame is a Pandas series. Using the loc[] indexer, we can select a row by its index label and get a series back. (The result of loc[] is already a series, so wrapping it in pd.Series() below is optional.)

de_gaulle = 4  # index label of the Charles de Gaulle row
s = pd.Series(df.loc[de_gaulle])
print(f"{s}\n\nType of s: {type(s)}")
Name       Charles de Gaulle
Born              1890-11-22
Died              1970-11-09
Age                       79
Title              President
Country               France
Name: 4, dtype: object

Type of s: <class 'pandas.core.series.Series'>
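
To see that row selection alone already yields a series, here is a small sketch. It assumes a two-row stand-in data frame typed in by hand (values copied from the table above) so that it runs without the CSV file.

```python
import pandas as pd

# A two-row stand-in for the leaders table, not the full data set.
df_small = pd.DataFrame({
    "Name": ["Charles de Gaulle", "Winston Churchill"],
    "Age": [79, 90],
    "Country": ["France", "United Kingdom"],
})

row = df_small.loc[0]        # select by index label
same_row = df_small.iloc[0]  # select by integer position

print(type(row))
print(row.equals(same_row))
```

With the default integer index, label 0 and position 0 refer to the same row, so both selections return equal series.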

A series pairs a collection of labels (its index) with a collection of values. Because this series came from a data frame row, its labels are the data frame’s column names and its values are the cells of that row. The keys() method returns the labels and the values attribute returns the elements. As with Python lists, we can index into either one by position.

print(f"{s.keys()}\n\n{s.keys()[0]}\n{s.keys()[3]}")
Index(['Name', 'Born', 'Died', 'Age', 'Title', 'Country'], dtype='object')

Name
Age
print(s.values)
['Charles de Gaulle' '1890-11-22' '1970-11-09' np.int64(79) 'President'
 'France']
print(f"Name:\t\t{s.values[0]}\nCountry:\t{s.values[5]}\nAge:\t\t{s.values[3]}")
Name:       Charles de Gaulle
Country:    France
Age:        79
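
The same information is available through a few related accessors. The sketch below assumes a three-element series built by hand from the de Gaulle row rather than read from the CSV file.

```python
import pandas as pd

# Hand-built stand-in for part of the row shown above.
row = pd.Series({"Name": "Charles de Gaulle", "Age": 79, "Country": "France"})

labels = row.index.tolist()  # the same labels that keys() returns
values = row.to_numpy()      # the elements as a NumPy array
pairs = list(row.items())    # (label, value) tuples, like dict.items()

print(labels)
print(pairs)
```

Iterating over items() is handy when we want each label together with its value.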

Create a Series From a Python Collection

Can we create our own series object on the fly? Yes, by calling the Pandas Series() constructor and passing it a collection of values, of homogeneous or heterogeneous types.

s = pd.Series(range(100, 120, 5))
print(f"{s}\n\nType of s: {type(s)}")
0    100
1    105
2    110
3    115
dtype: int64

Type of s: <class 'pandas.core.series.Series'>
s = pd.Series(['AAA', 32.4907, 100])
print(f"{s}\n\nType of s: {type(s)}")
0        AAA
1    32.4907
2        100
dtype: object

Type of s: <class 'pandas.core.series.Series'>

By default, Pandas assigns zero-based integers as the index labels of the series we’ve just created. To specify our own labels, we pass them through the index parameter of Series().

s = pd.Series(['AAA', 32.4907, 100], index=['word', 'float', 'integer'])
print(f"{s}\n\nType of s: {type(s)}")
word           AAA
float      32.4907
integer        100
dtype: object

Type of s: <class 'pandas.core.series.Series'>

Alternatively, we can pass a Python dictionary to create a Pandas series. The dictionary keys become the index labels.

us_states_admission = {"Alabama": "1819-12-14", "Illinois": "1818-12-03", "Nevada": "1864-10-31"}
s = pd.Series(us_states_admission)
print(f"{s}\n\nType of s: {type(s)}")
Alabama     1819-12-14
Illinois    1818-12-03
Nevada      1864-10-31
dtype: object

Type of s: <class 'pandas.core.series.Series'>
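
A dictionary can also be combined with the index parameter to pick and order the labels we want. This is a minimal sketch; "Utopia" is a made-up key used only to show what happens when a label has no matching entry.

```python
import pandas as pd

admission = {"Illinois": "1818-12-03", "Nevada": "1864-10-31"}

# Request the labels in a chosen order; "Utopia" has no entry in the dictionary.
s = pd.Series(admission, index=["Nevada", "Illinois", "Utopia"])

print(s)
print(s.isna())
```

Labels without a matching dictionary key are kept, and their values are filled with NaN.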

Taking Sections of a Series

Conceptually, a Pandas series behaves like an ordered list. As with ordered Python collections, we can take slices of it. For starters, we can treat a series like a Python list, accessing elements with the [ ] operator.

s = pd.Series(range(1,10))
s
0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
dtype: int64

Again, because we supplied no index, Pandas assigns default zero-based integer labels, so the series can be indexed much like a Python list. Here each label coincides with the element’s position. Now we can access individual elements.

print(s[5])
6

We can also take slices.

s[2:6]
2    3
3    4
4    5
5    6
dtype: int64
s[:3]
0    1
1    2
2    3
dtype: int64
s[1:7:2]
1    2
3    4
5    6
dtype: int64

If our series has custom labels, positional slices such as s[0:2] still work. We can also treat the series like a Python dictionary, with the index labels playing the role of dictionary keys, and access elements by label. Note that a slice between two labels includes both endpoints, unlike a positional slice.

s = pd.Series(['AAA', 32.4907, 100], index=['word', 'float', 'integer'])
s['float']
32.4907
s['word':'float']
word         AAA
float    32.4907
dtype: object
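
When we want to be explicit about whether we mean labels or positions, Pandas offers the loc[] and iloc[] indexers. Here is a short sketch using the same series as above.

```python
import pandas as pd

s = pd.Series(['AAA', 32.4907, 100], index=['word', 'float', 'integer'])

by_label = s.loc['float']             # label-based lookup
by_position = s.iloc[1]               # position-based lookup
label_slice = s.loc['word':'float']   # both end labels are included
position_slice = s.iloc[0:1]          # the stop position is excluded

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

The two slices look alike but differ in length, because label slices include the stop label and positional slices do not.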

Statistical Operations on Series

Let’s go back to our table of WW2 leaders.

df = pd.read_csv("sample_data/ww2_leaders.csv")
df
    Name                   Born        Died        Age  Title           Country
0   Franklin Roosevelt     1882-01-30  1945-04-12  63   President       United States
1   Joseph Stalin          1878-12-06  1953-03-05  74   Great Leader    Soviet Union
2   Adolph Hitler          1889-04-20  1945-04-30  56   Fuhrer          Germany
3   Michinomiya Hirohito   1901-04-29  1989-01-07  87   Emperor         Japan
4   Charles de Gaulle      1890-11-22  1970-11-09  79   President       France
5   Winston Churchill      1874-11-30  1965-01-24  90   Prime Minister  United Kingdom
6   Manuel Camacho         1897-04-24  1955-10-13  58   President       Mexico
7   Jan Smuts              1870-05-24  1950-09-11  80   Prime Minister  South Africa
8   Ibn Saud               1875-01-15  1953-11-09  78   King            Saudi Arabia
9   Plaek Phibunsongkhram  1897-07-14  1965-06-11  66   Prime Minister  Thailand
10  John Curtin            1885-01-08  1945-07-05  60   Prime Minister  Australia
11  Haile Selassie         1892-07-23  1975-08-27  83   Emperor         Ethiopia

In our earlier example, we used the data frame’s loc[] indexer to isolate a row as a series. Pandas also lets us obtain a series from a data frame column.

age = df['Age']
print(f"Type of age: {type(age)}")
Type of age: <class 'pandas.core.series.Series'>

We can use the describe() method to report descriptive statistics for this series.

age.describe()
count    12.000000
mean     72.833333
std      11.784684
min      56.000000
25%      62.250000
50%      76.000000
75%      80.750000
max      90.000000
Name: Age, dtype: float64

Furthermore, if we just need the average age, we can use the series’ mean() method.

age.mean()
np.float64(72.83333333333333)
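
The other figures reported by describe() have their own methods too. The sketch below assumes the Age column typed in by hand from the table above, so it runs without the CSV file.

```python
import pandas as pd

# The Age column from the leaders table, entered manually.
age = pd.Series([63, 74, 56, 87, 79, 90, 58, 80, 78, 66, 60, 83], name="Age")

print(age.median())        # the 50% row of describe()
print(age.min(), age.max())
print(age.quantile(0.25))  # the 25% row of describe()
print(age.std())           # sample standard deviation

# agg() computes several statistics in one call.
summary = age.agg(["min", "median", "max"])
print(summary)
```

Calling individual methods is convenient when only one or two statistics are needed downstream.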

Vector Operations on Series

Let’s say we’re interested in the age values that are less than or equal to the average age in this series. Pandas lets us use the [ ] notation to apply a filter, screening out the elements for which the condition evaluates to false.

age[age <= age.mean()]
0     63
2     56
6     58
9     66
10    60
Name: Age, dtype: int64

Behind the scenes, the comparison evaluates to a series of true or false values, one per element. Pandas then uses this series to return only the elements whose flag is true.

print(age <= age.mean())
0      True
1     False
2      True
3     False
4     False
5     False
6      True
7     False
8     False
9      True
10     True
11    False
Name: Age, dtype: bool

We can see how this works by using our own ad-hoc list of true and false values as a mask. In Pandas lingo, such a list is known as a boolean vector or boolean mask.

names = pd.Series(df['Name'])
potentates = [False, True, True, True, False, False, False, False, True, False, False, True]
names[potentates]
1            Joseph Stalin
2            Adolph Hitler
3     Michinomiya Hirohito
8                 Ibn Saud
11          Haile Selassie
Name: Name, dtype: object
democratic_leaders = [not p for p in potentates]
names[democratic_leaders]
0        Franklin Roosevelt
4         Charles de Gaulle
5         Winston Churchill
6            Manuel Camacho
7                 Jan Smuts
9     Plaek Phibunsongkhram
10              John Curtin
Name: Name, dtype: object
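
When the mask is itself a Pandas series rather than a plain list, it can be inverted or combined with operators instead of a loop. This is a minimal sketch, assuming the Age column typed in by hand from the table above.

```python
import pandas as pd

age = pd.Series([63, 74, 56, 87, 79, 90, 58, 80, 78, 66, 60, 83], name="Age")

below_mean = age <= age.mean()           # boolean series, one flag per element
above_mean = age[~below_mean]            # ~ flips every flag
sixties = age[(age >= 60) & (age < 70)]  # & combines two masks element-wise

print(above_mean.tolist())
print(sixties.tolist())
```

The parentheses around each comparison are required, because & binds more tightly than >= and <.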

Now it’s time to look at vector operations applied to series. We can add a scalar value to a series, or multiply a series by one, and Pandas will apply the operation to each element individually.

age = df['Age']
age + 1000
0     1063
1     1074
2     1056
3     1087
4     1079
5     1090
6     1058
7     1080
8     1078
9     1066
10    1060
11    1083
Name: Age, dtype: int64
age * 2
0     126
1     148
2     112
3     174
4     158
5     180
6     116
7     160
8     156
9     132
10    120
11    166
Name: Age, dtype: int64
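
To make the element-by-element behaviour concrete, the sketch below compares the vectorized expression with an equivalent hand-written loop. It assumes a shortened, three-element version of the age series.

```python
import pandas as pd

# The first three ages from the table, entered manually.
age_sample = pd.Series([63, 74, 56], name="Age")

doubled = age_sample * 2  # vectorized: no explicit loop
looped = pd.Series([a * 2 for a in age_sample], name="Age")  # same result, element by element

print(doubled.tolist())
print(doubled.equals(looped))
```

Both approaches give the same values; the vectorized form is shorter and is the usual style in Pandas code.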

We can also add one series to another. Pandas matches elements by index label and adds each matching pair. In the example below we add the age series to itself, so the outcome is the same as doubling each age, as in the notebook cell above.

age + age
0     126
1     148
2     112
3     174
4     158
5     180
6     116
7     160
8     156
9     132
10    120
11    166
Name: Age, dtype: int64

What happens when we add a series that doesn’t share the same shape?

age + pd.Series([100, 100])
0     163.0
1     174.0
2       NaN
3       NaN
4       NaN
5       NaN
6       NaN
7       NaN
8       NaN
9       NaN
10      NaN
11      NaN
dtype: float64

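As we see here, Pandas carries out the operation label by label. Only labels 0 and 1 exist in both series, so every other label has no matching operand and its result is NaN (not a number). Because NaN is a floating-point value, the result is converted to float64.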
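
If NaN is not the outcome we want, the add() method accepts a fill_value for the missing side. The sketch below assumes a shortened, three-element age series; the names bonus and shuffled are made up for the example. It also shows that matching follows labels rather than positions.

```python
import pandas as pd

age_sample = pd.Series([63, 74, 56], name="Age")
bonus = pd.Series([100, 100])  # has labels 0 and 1 only

plain = age_sample + bonus                    # label 2 has no partner
filled = age_sample.add(bonus, fill_value=0)  # treat the missing partner as 0

# Same three labels as age_sample, but listed in reverse order.
shuffled = pd.Series([1, 2, 3], index=[2, 1, 0])
aligned = age_sample + shuffled

print(plain)
print(filled)
print(aligned)
```

In the last result, the age at label 0 is paired with the value that shuffled stores under label 0, even though that value sits in the last position.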