Pandas Basic Analysis 02: Series

Python

Pandas

Jupyter

Data Analytics

Part two of data analysis with Python Pandas: the series data structure.

Author

Dennis Chua

Published

May 20, 2025

02 Pandas Data Structure: Series

Content Outline

Introduction
Create a series from a Pandas data frame
Create a series from a Python collection
Taking sections of a series
Statistical operations on a series
Vector operations on a series

Jupyter notebook

WW2 Leaders CSV

Introduction

In Pandas the series is one of the core data structures for computation. A series is a one-dimensional array with a labeled index. Like a Python list, a series is an ordered data type: its elements can be indexed with the [ ] notation. Element types can be heterogeneous; the index labels must be of a hashable type. While each element of a series is mutable, Pandas generally treats the length of a series as fixed: operations such as dropping or concatenating elements return a new series rather than resizing the original.
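
As a minimal sketch of these properties, we can build a small labeled series, read an element by its label, and overwrite an element in place. The labels and values here are made up for illustration and are not part of the notebook's data set.

```python
import pandas as pd

# Hypothetical example data: three values labeled 'a', 'b', 'c'.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

first = s["a"]      # look up an element by its label
s["b"] = 25         # elements are mutable: overwrite the value at label 'b'

print(s)
print(f"Length: {len(s)}")
```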

For this notebook, we’re going to demonstrate the Pandas series using the “ww2_leaders.csv” file, which we uploaded beforehand to Google Colab.

Create a Series From a Data Frame

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("sample_data/ww2_leaders.csv")
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 12 entries, 0 to 11
Data columns (total 6 columns):
 #   Column   Non-Null Count  Dtype 
---  ------   --------------  ----- 
 0   Name     12 non-null     object
 1   Born     12 non-null     object
 2   Died     12 non-null     object
 3   Age      12 non-null     int64 
 4   Title    12 non-null     object
 5   Country  12 non-null     object
dtypes: int64(1), object(5)
memory usage: 708.0+ bytes

df

	Name	Born	Died	Age	Title	Country
0	Franklin Roosevelt	1882-01-30	1945-04-12	63	President	United States
1	Joseph Stalin	1878-12-06	1953-03-05	74	Great Leader	Soviet Union
2	Adolph Hitler	1889-04-20	1945-04-30	56	Fuhrer	Germany
3	Michinomiya Hirohito	1901-04-29	1989-01-07	87	Emperor	Japan
4	Charles de Gaulle	1890-11-22	1970-11-09	79	President	France
5	Winston Churchill	1874-11-30	1965-01-24	90	Prime Minister	United Kingdom
6	Manuel Camacho	1897-04-24	1955-10-13	58	President	Mexico
7	Jan Smuts	1870-05-24	1950-09-11	80	Prime Minister	South Africa
8	Ibn Saud	1875-01-15	1953-11-09	78	King	Saudi Arabia
9	Plaek Phibunsongkhram	1897-07-14	1965-06-11	66	Prime Minister	Thailand
10	John Curtin	1885-01-08	1945-07-05	60	Prime Minister	Australia
11	Haile Selassie	1892-07-23	1975-08-27	83	Emperor	Ethiopia

Recall that a Pandas data frame is a two-dimensional collection made up of rows and columns. Conceptually, each row of a data frame is equivalent to a Pandas series. Using the loc[] indexer, we can select a row by its index label and obtain it as a series.

de_gaulle = 4  # index label of the Charles de Gaulle row
s = df.loc[de_gaulle]  # loc[] already returns the row as a series
print(f"{s}\n\nType of s: {type(s)}")

Name       Charles de Gaulle
Born              1890-11-22
Died              1970-11-09
Age                       79
Title              President
Country               France
Name: 4, dtype: object

Type of s: <class 'pandas.core.series.Series'>
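
The row label happens to be an integer here because the data frame uses the default index. The following sketch, which assumes a hypothetical two-row stand-in for the leaders table indexed by made-up country codes, shows the difference between selecting a row by label with loc[] and by position with iloc[].

```python
import pandas as pd

# Hypothetical two-row stand-in for the leaders table, indexed by country code.
df_small = pd.DataFrame(
    {"Name": ["Charles de Gaulle", "Winston Churchill"], "Age": [79, 90]},
    index=["FR", "UK"],
)

by_label = df_small.loc["UK"]      # select the row whose index label is 'UK'
by_position = df_small.iloc[1]     # select the second row, whatever its label

print(type(by_label))
print(by_label.equals(by_position))
```

Both expressions return the same row as a series; the series takes its name from the row's index label.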

A series pairs a collection of labels (here, the column names of the data frame) with a collection of values (the cells of the selected row). The keys() method returns the labels and the values attribute returns the values. Just like a Python list, each of these can be indexed by position.

print(f"{s.keys()}\n\n{s.keys()[0]}\n{s.keys()[3]}")

Index(['Name', 'Born', 'Died', 'Age', 'Title', 'Country'], dtype='object')

Name
Age

print(s.values)

['Charles de Gaulle' '1890-11-22' '1970-11-09' np.int64(79) 'President'
 'France']

print(f"Name:\t\t{s.values[0]}\nCountry:\t{s.values[5]}\nAge:\t\t{s.values[3]}")

Name:       Charles de Gaulle
Country:    France
Age:        79
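
Besides keys() and values, a series exposes a few related accessors. The sketch below rebuilds part of the de Gaulle row by hand, so it runs without the CSV file, and shows the index attribute, the to_numpy() method, and the items() method for walking label and value pairs together.

```python
import pandas as pd

# Values copied from the de Gaulle row shown above.
s = pd.Series({"Name": "Charles de Gaulle", "Age": 79, "Country": "France"})

labels = list(s.index)          # s.keys() returns this same index
values = s.to_numpy()           # the values as a NumPy array
pairs = list(s.items())         # (label, value) tuples, in order

print(labels)
print(pairs)
```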

Create a Series From a Python Collection

Can we create our own series object on the fly? Yes, by calling the Pandas Series() constructor and passing it a collection of values, of homogeneous or heterogeneous types.

s = pd.Series(range(100, 120, 5))

print(f"{s}\n\nType of s: {type(s)}")

0    100
1    105
2    110
3    115
dtype: int64

Type of s: <class 'pandas.core.series.Series'>

s = pd.Series(['AAA', 32.4907, 100])

print(f"{s}\n\nType of s: {type(s)}")

0        AAA
1    32.4907
2        100
dtype: object

Type of s: <class 'pandas.core.series.Series'>

By default Pandas assigns zero-based integers as the labels of the series we’ve just created. To specify our own labels, we pass them through the index parameter of Series().

s = pd.Series(['AAA', 32.4907, 100], index=['word', 'float', 'integer'])

print(f"{s}\n\nType of s: {type(s)}")

word           AAA
float      32.4907
integer        100
dtype: object

Type of s: <class 'pandas.core.series.Series'>

Alternatively, we can pass a Python dictionary to create a Pandas series; the dictionary keys become the labels.

us_states_admission = {"Alabama": "1819-12-04", "Illinois": "1818-12-03", "Nevada": "1864-10-31"}
s = pd.Series(us_states_admission)

print(f"{s}\n\nType of s: {type(s)}")

Alabama     1819-12-04
Illinois    1818-12-03
Nevada      1864-10-31
dtype: object

Type of s: <class 'pandas.core.series.Series'>
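
The dictionary form can be combined with the index parameter. In the sketch below, the index list selects and orders the keys to keep; the name wanted and the extra 'Texas' entry are introduced only for illustration. A label that has no matching dictionary key gets a missing value (NaN).

```python
import pandas as pd

us_states_admission = {"Alabama": "1819-12-04", "Illinois": "1818-12-03", "Nevada": "1864-10-31"}

# 'Texas' is deliberately absent from the dictionary to show the missing-key case.
wanted = ["Nevada", "Texas", "Alabama"]
s = pd.Series(us_states_admission, index=wanted)

print(s)
```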

Taking Sections of a Series

Conceptually a Pandas series behaves like an ordered list. As with ordered Python collections, we can take slices of a series. For starters, we can treat a series like a Python list, indexing elements with the [ ] operator.

s = pd.Series(range(1,10))
s

0    1
1    2
2    3
3    4
4    5
5    6
6    7
7    8
8    9
dtype: int64

Again, because we supplied no index, Pandas assigns default zero-based integer labels, so the series can be indexed like a Python list. Now we can access individual elements.

print(s[5])

6

We can also take slices.

s[2:6]

2    3
3    4
4    5
5    6
dtype: int64

s[:3]

0    1
1    2
2    3
dtype: int64

s[1:7:2]

1    2
3    4
5    6
dtype: int64

If our series has custom labels, we can still slice it by position with the syntax shown above. Alternatively, we can treat it like a Python dictionary, with the labels analogous to dictionary keys, and access elements by label. Note that a slice by label, unlike a slice by position, includes its end label.

s = pd.Series(['AAA', 32.4907, 100], index=['word', 'float', 'integer'])
s['float']

32.4907

s['word':'float']

word         AAA
float    32.4907
dtype: object
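
The [ ] operator accepts both labels and positions, which can be ambiguous. As a sketch of a more explicit style, the loc[] indexer always works with labels and the iloc[] indexer always works with positions. The example reuses the three-element series from above.

```python
import pandas as pd

s = pd.Series(['AAA', 32.4907, 100], index=['word', 'float', 'integer'])

by_label = s.loc['word':'float']   # label slice: the end label 'float' is included
by_position = s.iloc[0:2]          # positional slice: the end position 2 is excluded

print(by_label.equals(by_position))
print(s.iloc[-1])                  # last element by position
```

Both slices select the same two elements, even though the label slice names its last element and the positional slice stops one short of its end.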

Statistical Operations on Series

Let’s go back to our table of WW2 leaders.

df = pd.read_csv("sample_data/ww2_leaders.csv")
df

	Name	Born	Died	Age	Title	Country
0	Franklin Roosevelt	1882-01-30	1945-04-12	63	President	United States
1	Joseph Stalin	1878-12-06	1953-03-05	74	Great Leader	Soviet Union
2	Adolph Hitler	1889-04-20	1945-04-30	56	Fuhrer	Germany
3	Michinomiya Hirohito	1901-04-29	1989-01-07	87	Emperor	Japan
4	Charles de Gaulle	1890-11-22	1970-11-09	79	President	France
5	Winston Churchill	1874-11-30	1965-01-24	90	Prime Minister	United Kingdom
6	Manuel Camacho	1897-04-24	1955-10-13	58	President	Mexico
7	Jan Smuts	1870-05-24	1950-09-11	80	Prime Minister	South Africa
8	Ibn Saud	1875-01-15	1953-11-09	78	King	Saudi Arabia
9	Plaek Phibunsongkhram	1897-07-14	1965-06-11	66	Prime Minister	Thailand
10	John Curtin	1885-01-08	1945-07-05	60	Prime Minister	Australia
11	Haile Selassie	1892-07-23	1975-08-27	83	Emperor	Ethiopia

In our earlier example, we used the data frame loc[] indexer to isolate a row as a series. Pandas also allows us to obtain a series from a data frame column.

age = df['Age']
print(f"Type of age: {type(age)}")

Type of age: <class 'pandas.core.series.Series'>

We can use the Pandas describe() method to report descriptive statistics for this series.

age.describe()

count    12.000000
mean     72.833333
std      11.784684
min      56.000000
25%      62.250000
50%      76.000000
75%      80.750000
max      90.000000
Name: Age, dtype: float64

Furthermore, if we just need the average age, we can use the mean() method of the series.

age.mean()

np.float64(72.83333333333333)
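
The other figures reported by describe() are available as individual methods too. The sketch below copies the ages from the leaders table into a series by hand, so it runs without the CSV file, and also uses idxmin() and idxmax() to find the row labels of the youngest and oldest leaders.

```python
import pandas as pd

# Ages copied from the WW2 leaders table above, so the CSV file is not needed here.
age = pd.Series([63, 74, 56, 87, 79, 90, 58, 80, 78, 66, 60, 83], name="Age")

print(f"Median:   {age.median()}")
print(f"Youngest: {age.min()} (row {age.idxmin()})")
print(f"Oldest:   {age.max()} (row {age.idxmax()})")
print(f"Std dev:  {age.std():.2f}")
```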

Vector Operations on Series

Let’s say we’re interested in the age values that are less than or equal to the average age in this series. Pandas lets us use the [ ] notation to apply filtering logic, screening out the elements for which the condition evaluates to false.

age[age <= age.mean()]

0     63
2     56
6     58
9     66
10    60
Name: Age, dtype: int64

Behind the scenes, the comparison evaluates to a series of True or False values, one per element. Pandas uses this series as a mask and returns only the elements whose entry is True.

print(age <= age.mean())

0      True
1     False
2      True
3     False
4     False
5     False
6      True
7     False
8     False
9      True
10     True
11    False
Name: Age, dtype: bool

We can see how this works by using our own ad hoc list of True and False values as a mask. In Pandas this is commonly called a boolean mask, or a vector of boolean values.

names = pd.Series(df['Name'])
potentates = [False, True, True, True, False, False, False, False, True, False, False, True]
names[potentates]

1            Joseph Stalin
2            Adolph Hitler
3     Michinomiya Hirohito
8                 Ibn Saud
11          Haile Selassie
Name: Name, dtype: object

democratic_leaders = list(map(lambda x: not x, potentates))
names[democratic_leaders]

0        Franklin Roosevelt
4         Charles de Gaulle
5         Winston Churchill
6            Manuel Camacho
7                 Jan Smuts
9     Plaek Phibunsongkhram
10              John Curtin
Name: Name, dtype: object
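
The cell above inverts a plain Python list with map() and a lambda. When the mask is itself a boolean series, a common alternative is the ~ operator for inversion, with & and | to combine conditions. The sketch below uses the first four leaders from the table, with the names as labels; the variable names are chosen for illustration.

```python
import pandas as pd

# Names and ages copied from the first four rows of the leaders table.
leaders = pd.Series(
    [63, 74, 56, 87],
    index=["Franklin Roosevelt", "Joseph Stalin", "Adolph Hitler", "Michinomiya Hirohito"],
    name="Age",
)

over_60 = leaders > 60               # boolean series, one flag per element
under_80 = leaders < 80

print(leaders[~over_60])             # ~ inverts the mask: ages of 60 or less
print(leaders[over_60 & under_80])   # & keeps elements where both masks are True
```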

Now it’s time to look at vector operations applied to series. We can add or multiply a series by a scalar value and Pandas will apply the operation to each element individually.

age = df['Age']

age + 1000

0     1063
1     1074
2     1056
3     1087
4     1079
5     1090
6     1058
7     1080
8     1078
9     1066
10    1060
11    1083
Name: Age, dtype: int64

age * 2

0     126
1     148
2     112
3     174
4     158
5     180
6     116
7     160
8     156
9     132
10    120
11    166
Name: Age, dtype: int64

A series can also be combined with another series. Here we add the age series to itself: Pandas matches elements by index label and adds each pair, so the outcome is the same as doubling each age, as in the notebook cell above.

age + age

0     126
1     148
2     112
3     174
4     158
5     180
6     116
7     160
8     156
9     132
10    120
11    166
Name: Age, dtype: int64
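
The pairing is done by label, not by position. As a small sketch with made-up data, the two series below hold the same labels in a different order, and the addition still pairs x with x, y with y, and z with z.

```python
import pandas as pd

# Hypothetical series: same labels, stored in a different order.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20, 30], index=["z", "y", "x"])

total = a + b   # pairs are matched by label, regardless of position

print(total)
```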

What happens when we add a series that doesn’t have the same shape?

age + pd.Series([100, 100])

0     163.0
1     174.0
2       NaN
3       NaN
4       NaN
5       NaN
6       NaN
7       NaN
8       NaN
9       NaN
10      NaN
11      NaN
dtype: float64

As we see here, Pandas aligns the two series by index label and adds the matching pairs. Labels present in only one series get NaN (not a number), and the result becomes float64 because NaN is a floating-point value.
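
If missing values are not what we want, one option is the add() method with its fill_value parameter, which substitutes a default for any label missing from one of the two series. The sketch below uses the first four ages from the table; the name bonus is hypothetical.

```python
import pandas as pd

# First four ages from the leaders table, enough to show the effect.
age = pd.Series([63, 74, 56, 87], name="Age")
bonus = pd.Series([100, 100])        # only has labels 0 and 1

# Treat labels missing from either series as 0 before adding.
total = age.add(bonus, fill_value=0)

print(total)
```

The first two elements receive the extra 100 and the rest keep their original value, although the result is still stored as floating-point numbers.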