Pandas Prep of DnD5e Monsters Data

Python

Pandas

Jupyter

Data Analytics

Part one of DnD5e Monsters analysis with Pandas.

Author

Dennis Chua

Published

June 7, 2025

D&D 5th Edition Monsters Preparation (Kaggle Dataset)

Content Outline

Introduction
Loading the Data
Data Reorganization and Cleanup
- Split ‘Race + alignment’ column
- Remove duplicate row
- Split ‘Challenge rating (XP)’ column
- Split ‘Armor’ column

Jupyter notebook

Original DnD5e Monsters CSV

Introduction

Dungeons and Dragons (DnD) is a cooperative role-playing game with roots in the tabletop war gaming community. At fifty years old as of this writing, DnD has enjoyed phenomenal success among players world wide. Of all the many challenges player face in the game, the most deadly ones are monsters. Originally drawn from mythology and works of fiction, a myriad of fantastical creatures have emerged from the imagination of countless fans. The catalog has only grown since the first Monster Manual was published in 1977.

In this notebook we take a look at monster game stats for the 5th edition of DnD. The format of the Kaggle data set isn’t suitable for data exploration and analysis (EDA). Using Python language features such as lambda expressions and regex, we’ll pre-process the data before we analyze it any further.

import pandas as pd

Loading the Data

Kaggle has a handy dataset for D&D 5th Edition monsters. Before going to work on this, we download the CSV file, rename it as “DnD5e_monsters_orig.csv” and then upload it to Colab.

df = pd.read_csv('sample_data/DnD5e_monsters_orig.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 324 entries, 0 to 323
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Name                    324 non-null    object
 1   Size                    324 non-null    object
 2   Race + alignment        324 non-null    object
 3   HP                      324 non-null    object
 4   Armor                   324 non-null    object
 5   Speed                   324 non-null    object
 6   Challenge rating  (XP)  324 non-null    object
dtypes: object(7)
memory usage: 17.8+ KB

df.head()

	Name	Size	Race + alignment	HP	Armor	Speed	Challenge rating (XP)
0	Aboleth	Large	aberration, Lawful Evil	135 (18d10+36)	17 (Natural Armor)	10 ft., swim 40 ft.	10 (5,900 XP)
1	Acolyte	Medium	humanoid (any race), Any Alignment	9 (2d8)	10	30 ft.	1/4 (50 XP)
2	Adult Black Dragon	Huge	dragon, Chaotic Evil	195 (17d12+85)	19 (Natural Armor)	40 ft., fly 80 ft., swim 40 ft.	14 (11,500 XP)
3	Adult Blue Dragon	Huge	dragon, Lawful Evil	225 (18d12+108)	19	40 ft., burrow 30 ft., fly 80 ft.	16 (15,000 XP)
4	Adult Brass Dragon	Huge	dragon, Chaotic Good	172 (15d12+75)	18 (Natural Armor)	40 ft., burrow 40 ft., fly 80 ft.	13 (10,000 XP)

Data Reorganization and Cleanup

The data set has eight columns and 324 rows. Looking at the column headers, we’d like to split the Race + alignment information into two separate column. We note that the monster race and alignment and separated by a comma. We can instruct Pandas to divide the infomration along this delimeter. This, then, is the first alteration we perform on our data frame.

def split_racealignment_column(df_orig, inplace=True):
  """
  Splits the 'Race + alignment' column into 'Race' and 'Alignment'.

  Args:
    df_orig: The input DataFrame.

  Returns:
    A new DataFrame with the 'Race + alignment' column split.
  """
  retval_df = df_orig.copy()
  split_columns_df = retval_df['Race + alignment'].str.split(r',', expand=True)
  retval_df['Race'] = split_columns_df[0].astype(str).str.strip()
  retval_df['Alignment'] = split_columns_df[1].astype(str).str.strip()
  retval_df.drop(columns=['Race + alignment'], inplace=True)
  return retval_df

split_race_alignment_df = split_racealignment_column(df)
split_race_alignment_df.head()

	Name	Size	HP	Armor	Speed	Challenge rating (XP)	Race	Alignment
0	Aboleth	Large	135 (18d10+36)	17 (Natural Armor)	10 ft., swim 40 ft.	10 (5,900 XP)	aberration	Lawful Evil
1	Acolyte	Medium	9 (2d8)	10	30 ft.	1/4 (50 XP)	humanoid (any race)	Any Alignment
2	Adult Black Dragon	Huge	195 (17d12+85)	19 (Natural Armor)	40 ft., fly 80 ft., swim 40 ft.	14 (11,500 XP)	dragon	Chaotic Evil
3	Adult Blue Dragon	Huge	225 (18d12+108)	19	40 ft., burrow 30 ft., fly 80 ft.	16 (15,000 XP)	dragon	Lawful Evil
4	Adult Brass Dragon	Huge	172 (15d12+75)	18 (Natural Armor)	40 ft., burrow 40 ft., fly 80 ft.	13 (10,000 XP)	dragon	Chaotic Good

Problems immediately arise. An extraneous comma chacter is embedded in the subrace data for ‘Imp’ and the various were-monsters (‘Werebear’, ‘Wereboar’, ‘Wererat’, ‘Weretiger’, ‘Werewolf’). When it encounters this, our Pandas str.split() operation fails to correctly distinguish the race from the alignment.

Here are the offending rows isolated with the help of regular expressions.

import re
bad_racealign_df = df[df['Race + alignment'].apply(lambda x: True if re.search('.*,.*,', x) else False)]
bad_racealign_df

	Name	Size	Race + alignment	HP	Armor	Speed	Challenge rating (XP)
173	Imp	Tiny	fiend (devil, shapechanger), Lawful Evil	10 (3d4+3)	13	20 ft., fly 40 ft.	1 (200 XP)
299	Werebear	Medium	humanoid (human, shapechanger), Neutral Good	135 (18d8+54)	10 (In Humanoid Form, 11 In Bear And Hybrid Fo...	30 ft. (40 ft., climb 30 ft. in bear or hybrid...	5 (1,800 XP)
300	Wereboar	Medium	humanoid (human, shapechanger), Neutral Evil	78 (12d8+24)	10 (In Humanoid Form, 11 In Boar And Hybrid Fo...	30 ft. (40 ft. in boar form)	4 (1,100 XP)
301	Wererat	Medium	humanoid (human, shapechanger), Lawful Evil	33 (6d8+6)	12	30 ft.	2 (450 XP)
302	Weretiger	Medium	humanoid (human, shapechanger), Neutral	120 (16d8+48)	12	30 ft. (40 ft. in tiger form)	4 (1,100 XP)
303	Werewolf	Medium	humanoid (human, shapechanger), Chaotic Evil	58 (9d8+18)	11 (In Humanoid Form, 12 In Wolf Or Hybrid Form)	30 ft. (40 ft. in wolf form)	3 (700 XP)

As you can see, re.split() will divide the column along the first comma of the two. This runs counter to our expectation that there should only be one comma delimiter in all the values of the Race + alignment column. To deal with this obstacle, let’s first get the list of indices of the offending rows.

bad_racealign_idx = list(bad_racealign_df.index)
bad_racealign_idx

[173, 299, 300, 301, 302, 303]

With this list, it’s a simple matter up applying a Python lambda that uses regex to subsitute the first comma with a space, updating only the rows in our bad_realign_idx list.

df.loc[bad_racealign_idx, 'Race + alignment'] = df.loc[bad_racealign_idx, 'Race + alignment'].apply(lambda str: re.sub(',', ' ', str, count=1))
df.loc[bad_racealign_idx]

	Name	Size	Race + alignment	HP	Armor	Speed	Challenge rating (XP)
173	Imp	Tiny	fiend (devil shapechanger), Lawful Evil	10 (3d4+3)	13	20 ft., fly 40 ft.	1 (200 XP)
299	Werebear	Medium	humanoid (human shapechanger), Neutral Good	135 (18d8+54)	10 (In Humanoid Form, 11 In Bear And Hybrid Fo...	30 ft. (40 ft., climb 30 ft. in bear or hybrid...	5 (1,800 XP)
300	Wereboar	Medium	humanoid (human shapechanger), Neutral Evil	78 (12d8+24)	10 (In Humanoid Form, 11 In Boar And Hybrid Fo...	30 ft. (40 ft. in boar form)	4 (1,100 XP)
301	Wererat	Medium	humanoid (human shapechanger), Lawful Evil	33 (6d8+6)	12	30 ft.	2 (450 XP)
302	Weretiger	Medium	humanoid (human shapechanger), Neutral	120 (16d8+48)	12	30 ft. (40 ft. in tiger form)	4 (1,100 XP)
303	Werewolf	Medium	humanoid (human shapechanger), Chaotic Evil	58 (9d8+18)	11 (In Humanoid Form, 12 In Wolf Or Hybrid Form)	30 ft. (40 ft. in wolf form)	3 (700 XP)

With the two-comma problem resolved, we can go ahead and split the Race + alignment column into two, removing the original column in the process.

split_racealignment_df = split_racealignment_column(df)

split_racealignment_df.groupby('Race').count()

	Name	Size	HP	Armor	Speed	Challenge rating (XP)	Alignment
Race
Fiend (Shapechanger)	2	2	2	2	2	2	2
aberration	5	5	5	5	5	5	5
beast	87	87	87	87	87	87	87
celestial	6	6	6	6	6	6	6
construct	9	9	9	9	9	9	9
dragon	43	43	43	43	43	43	43
elemental	16	16	16	16	16	16	16
fey	6	6	6	6	6	6	6
fiend	4	4	4	4	4	4	4
fiend (demon)	8	8	8	8	8	8	8
fiend (devil shapechanger)	1	1	1	1	1	1	1
fiend (devil)	9	9	9	9	9	9	9
fiend (shapechanger)	1	1	1	1	1	1	1
giant	10	10	10	10	10	10	10
humanoid (any race)	21	21	21	21	21	21	21
humanoid (dwarf)	1	1	1	1	1	1	1
humanoid (elf)	1	1	1	1	1	1	1
humanoid (gnoll)	1	1	1	1	1	1	1
humanoid (gnome)	1	1	1	1	1	1	1
humanoid (goblinoid)	3	3	3	3	3	3	3
humanoid (grimlock)	1	1	1	1	1	1	1
humanoid (human shapechanger)	5	5	5	5	5	5	5
humanoid (human)	1	1	1	1	1	1	1
humanoid (kobold)	1	1	1	1	1	1	1
humanoid (lizardfolk)	1	1	1	1	1	1	1
humanoid (merfolk)	1	1	1	1	1	1	1
humanoid (orc)	1	1	1	1	1	1	1
humanoid (sahuagin)	1	1	1	1	1	1	1
monstrosity	35	35	35	35	35	35	35
monstrosity (shapechanger)	2	2	2	2	2	2	2
monstrosity (titan)	2	2	2	2	2	2	2
ooze	4	4	4	4	4	4	4
plant	6	6	6	6	6	6	6
swarm of tiny beasts	10	10	10	10	10	10	10
undead	17	17	17	17	17	17	17
undead (shapechanger)	1	1	1	1	1	1	1

OK we come to the second update that we need to perform. Scanning that output above, we’re alerted to two entries that potentially overlap: * Fiend (Shapechanger) * fiend (shapechanger)

In fact, examining the split data frame, we isolate the three suspect rows.

split_racealignment_df.loc[split_racealignment_df['Race'].isin(['Fiend (Shapechanger)','fiend (shapechanger)'])]

	Name	Size	HP	Armor	Speed	Challenge rating (XP)	Race	Alignment
174	Incubus	Medium	66 (12d8 + 12)	15 (Natural Armor)	30 ft., fly 60 ft.	4 (1,100 XP)	Fiend (Shapechanger)	Neutral Evil
268	Succubus	Medium	66 (12d8 + 12)	15 (Natural Armor)	30 ft., fly 60 ft.	4 (1,100 XP)	Fiend (Shapechanger)	Neutral Evil
269	Succubus (Incubus)	Medium	66 (12d8+12)	15 (Natural Armor)	30 ft., fly 60 ft.	4 (1,100 XP)	fiend (shapechanger)	Neutral Evil

It’s clear that index 269 duplicates the information in rows 174 and 268. (In the traditional interpretation of the game, a succcubus is female while an incubus is male. Both monsters are different.) Let’s go ahead and delete row 269.

split_racealignment_df.drop(index=269, inplace=True)

We proceed to the third alteration we want to make to our data frame. The Challenge rating (XP) column holds two distinct pieces of information. Let’s break that into Challenge and XP column, removing the original source column as we go.

def split_challengexp_column(df_orig, inplace=True):
  """
  Splits the 'Challenge rating (XP)' column into 'Challenge' and 'XP'.

  Args:
    df_orig: The input DataFrame.

  Returns:
    A new DataFrame with the 'Challenge rating (XP)' column split.
  """
  retval_df = df_orig.copy()
  split_columns_df = retval_df['Challenge rating  (XP)'].str.split(r' \(', expand=True)
  retval_df['Challenge'] = split_columns_df[0].astype(str).str.strip()
  retval_df['XP'] = (split_columns_df[1].str.split(' XP\)', expand=True))[0].astype(str).str.strip()
  retval_df.drop(columns=['Challenge rating  (XP)'], inplace=True)
  return retval_df

<>:14: SyntaxWarning: invalid escape sequence '\)'
<>:14: SyntaxWarning: invalid escape sequence '\)'
/tmp/ipykernel_10330/2349129103.py:14: SyntaxWarning: invalid escape sequence '\)'
  retval_df['XP'] = (split_columns_df[1].str.split(' XP\)', expand=True))[0].astype(str).str.strip()

split_challengexp_column_df = split_challengexp_column(split_racealignment_df)
split_challengexp_column_df.head()

	Name	Size	HP	Armor	Speed	Race	Alignment	Challenge	XP
0	Aboleth	Large	135 (18d10+36)	17 (Natural Armor)	10 ft., swim 40 ft.	aberration	Lawful Evil	10	5,900
1	Acolyte	Medium	9 (2d8)	10	30 ft.	humanoid (any race)	Any Alignment	1/4	50
2	Adult Black Dragon	Huge	195 (17d12+85)	19 (Natural Armor)	40 ft., fly 80 ft., swim 40 ft.	dragon	Chaotic Evil	14	11,500
3	Adult Blue Dragon	Huge	225 (18d12+108)	19	40 ft., burrow 30 ft., fly 80 ft.	dragon	Lawful Evil	16	15,000
4	Adult Brass Dragon	Huge	172 (15d12+75)	18 (Natural Armor)	40 ft., burrow 40 ft., fly 80 ft.	dragon	Chaotic Good	13	10,000

The last change we need to make to our data frame is to separate the Armor information into Armor_Class and related notes, called Armor_Info. In the process we create two new columns and remove the original one. Again we call on regular expressions but this time we use a more complex matching pattern: ()(\((.*)\))?

Our regex pattern has three components, or groups, of regular and special characters.

() - Matches one or more integers representing the Armor_Class
() - Matches zero or more white space.
((.*)\))? - Matches the Armor_Info, a parenthetical sequence made of a series of characters followed by a right parenthesis. For example, “(Leather Armor)”. The surrounding ( )? groups the pattern and indicates that we’re looking for at most a single occurence of this.

def split_armor_column(df_orig):
  """
  Splits the 'Armor' column into 'Armor_Class' and 'Armor_Type'.
  Handles cases where Armor has no type specified.

  Args:
    df_orig: The input DataFrame.

  Returns:
    A new DataFrame with the 'Armor' column split.
  """
  retval_df = df_orig.copy()

  # Initialize new columns
  retval_df['Armor_Class'] = None
  retval_df['Armor_Info'] = None

  for index, row in retval_df.iterrows():
      armor_value = str(row['Armor']).strip()

      # Use regex to split, handling potential missing parentheses
      match = re.match(r'(\d+)(\s*)(\((.*)\))?', armor_value)
      if match:
          retval_df.loc[index, 'Armor_Class'] = match.group(1).strip()
          # Check if there was a type specified in parentheses
          if match.group(3):
            retval_df.loc[index, 'Armor_Info'] = match.group(3).strip().lstrip('(').rstrip(')')
          else:
            # If no type in parentheses, assume Natural Armor
            retval_df.loc[index, 'Armor_Info'] = "Natural Armor"
      else:
          # Handle cases that don't match the expected pattern
          retval_df.loc[index, 'Armor_Class'] = armor_value
          retval_df.loc[index, 'Armor_Info'] = "Unknown" # Or handle as appropriate

  # Drop the original 'Armor' column after processing
  retval_df.drop(columns=['Armor'], inplace=True)
  return retval_df

split_armor_column_df = split_armor_column(split_challengexp_column_df)
split_armor_column_df.head()

	Name	Size	HP	Speed	Race	Alignment	Challenge	XP	Armor_Class	Armor_Info
0	Aboleth	Large	135 (18d10+36)	10 ft., swim 40 ft.	aberration	Lawful Evil	10	5,900	17	Natural Armor
1	Acolyte	Medium	9 (2d8)	30 ft.	humanoid (any race)	Any Alignment	1/4	50	10	Natural Armor
2	Adult Black Dragon	Huge	195 (17d12+85)	40 ft., fly 80 ft., swim 40 ft.	dragon	Chaotic Evil	14	11,500	19	Natural Armor
3	Adult Blue Dragon	Huge	225 (18d12+108)	40 ft., burrow 30 ft., fly 80 ft.	dragon	Lawful Evil	16	15,000	19	Natural Armor
4	Adult Brass Dragon	Huge	172 (15d12+75)	40 ft., burrow 40 ft., fly 80 ft.	dragon	Chaotic Good	13	10,000	18	Natural Armor

We’ve reached a good checkpoint in our work. Before we draw things to a close, let’s store the cleaned-up data to a CSV file. As we don’t need the row headers – the index – we strip this away with the index=False option before writing the data frame to the file.

split_armor_column_df.to_csv('sample_data/DnD5e_monsters_cleaned.csv', index=False)