Pandas Prep of DnD5e Monsters Data

Python
Pandas
Jupyter
Data Analytics
Part one of DnD5e Monsters analysis with Pandas.
Author

Dennis Chua

Published

June 7, 2025

Open In Colab

D&D 5th Edition Monsters Preparation (Kaggle Dataset)

Content Outline

  • Introduction
  • Loading the Data
  • Data Reorganization and Cleanup
    • Split ‘Race + alignment’ column
    • Remove duplicate row
    • Split ‘Challenge rating (XP)’ column
    • Split ‘Armor’ column

Introduction

Dungeons and Dragons (DnD) is a cooperative role-playing game with roots in the tabletop war gaming community. At fifty years old as of this writing, DnD has enjoyed phenomenal success among players world wide. Of all the many challenges player face in the game, the most deadly ones are monsters. Originally drawn from mythology and works of fiction, a myriad of fantastical creatures have emerged from the imagination of countless fans. The catalog has only grown since the first Monster Manual was published in 1977.

In this notebook we take a look at monster game stats for the 5th edition of DnD. The format of the Kaggle data set isn’t suitable for data exploration and analysis (EDA). Using Python language features such as lambda expressions and regex, we’ll pre-process the data before we analyze it any further.

import pandas as pd

Loading the Data

Kaggle has a handy dataset for D&D 5th Edition monsters. Before going to work on this, we download the CSV file, rename it as “DnD5e_monsters_orig.csv” and then upload it to Colab.

df = pd.read_csv('sample_data/DnD5e_monsters_orig.csv')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 324 entries, 0 to 323
Data columns (total 7 columns):
 #   Column                  Non-Null Count  Dtype 
---  ------                  --------------  ----- 
 0   Name                    324 non-null    object
 1   Size                    324 non-null    object
 2   Race + alignment        324 non-null    object
 3   HP                      324 non-null    object
 4   Armor                   324 non-null    object
 5   Speed                   324 non-null    object
 6   Challenge rating  (XP)  324 non-null    object
dtypes: object(7)
memory usage: 17.8+ KB
df.head()
Name Size Race + alignment HP Armor Speed Challenge rating (XP)
0 Aboleth Large aberration, Lawful Evil 135 (18d10+36) 17 (Natural Armor) 10 ft., swim 40 ft. 10 (5,900 XP)
1 Acolyte Medium humanoid (any race), Any Alignment 9 (2d8) 10 30 ft. 1/4 (50 XP)
2 Adult Black Dragon Huge dragon, Chaotic Evil 195 (17d12+85) 19 (Natural Armor) 40 ft., fly 80 ft., swim 40 ft. 14 (11,500 XP)
3 Adult Blue Dragon Huge dragon, Lawful Evil 225 (18d12+108) 19 40 ft., burrow 30 ft., fly 80 ft. 16 (15,000 XP)
4 Adult Brass Dragon Huge dragon, Chaotic Good 172 (15d12+75) 18 (Natural Armor) 40 ft., burrow 40 ft., fly 80 ft. 13 (10,000 XP)

Data Reorganization and Cleanup

The data set has eight columns and 324 rows. Looking at the column headers, we’d like to split the Race + alignment information into two separate column. We note that the monster race and alignment and separated by a comma. We can instruct Pandas to divide the infomration along this delimeter. This, then, is the first alteration we perform on our data frame.

def split_racealignment_column(df_orig, inplace=True):
  """
  Splits the 'Race + alignment' column into 'Race' and 'Alignment'.

  Args:
    df_orig: The input DataFrame.

  Returns:
    A new DataFrame with the 'Race + alignment' column split.
  """
  retval_df = df_orig.copy()
  split_columns_df = retval_df['Race + alignment'].str.split(r',', expand=True)
  retval_df['Race'] = split_columns_df[0].astype(str).str.strip()
  retval_df['Alignment'] = split_columns_df[1].astype(str).str.strip()
  retval_df.drop(columns=['Race + alignment'], inplace=True)
  return retval_df
split_race_alignment_df = split_racealignment_column(df)
split_race_alignment_df.head()
Name Size HP Armor Speed Challenge rating (XP) Race Alignment
0 Aboleth Large 135 (18d10+36) 17 (Natural Armor) 10 ft., swim 40 ft. 10 (5,900 XP) aberration Lawful Evil
1 Acolyte Medium 9 (2d8) 10 30 ft. 1/4 (50 XP) humanoid (any race) Any Alignment
2 Adult Black Dragon Huge 195 (17d12+85) 19 (Natural Armor) 40 ft., fly 80 ft., swim 40 ft. 14 (11,500 XP) dragon Chaotic Evil
3 Adult Blue Dragon Huge 225 (18d12+108) 19 40 ft., burrow 30 ft., fly 80 ft. 16 (15,000 XP) dragon Lawful Evil
4 Adult Brass Dragon Huge 172 (15d12+75) 18 (Natural Armor) 40 ft., burrow 40 ft., fly 80 ft. 13 (10,000 XP) dragon Chaotic Good

Problems immediately arise. An extraneous comma chacter is embedded in the subrace data for ‘Imp’ and the various were-monsters (‘Werebear’, ‘Wereboar’, ‘Wererat’, ‘Weretiger’, ‘Werewolf’). When it encounters this, our Pandas str.split() operation fails to correctly distinguish the race from the alignment.

Here are the offending rows isolated with the help of regular expressions.

import re
bad_racealign_df = df[df['Race + alignment'].apply(lambda x: True if re.search('.*,.*,', x) else False)]
bad_racealign_df
Name Size Race + alignment HP Armor Speed Challenge rating (XP)
173 Imp Tiny fiend (devil, shapechanger), Lawful Evil 10 (3d4+3) 13 20 ft., fly 40 ft. 1 (200 XP)
299 Werebear Medium humanoid (human, shapechanger), Neutral Good 135 (18d8+54) 10 (In Humanoid Form, 11 In Bear And Hybrid Fo... 30 ft. (40 ft., climb 30 ft. in bear or hybrid... 5 (1,800 XP)
300 Wereboar Medium humanoid (human, shapechanger), Neutral Evil 78 (12d8+24) 10 (In Humanoid Form, 11 In Boar And Hybrid Fo... 30 ft. (40 ft. in boar form) 4 (1,100 XP)
301 Wererat Medium humanoid (human, shapechanger), Lawful Evil 33 (6d8+6) 12 30 ft. 2 (450 XP)
302 Weretiger Medium humanoid (human, shapechanger), Neutral 120 (16d8+48) 12 30 ft. (40 ft. in tiger form) 4 (1,100 XP)
303 Werewolf Medium humanoid (human, shapechanger), Chaotic Evil 58 (9d8+18) 11 (In Humanoid Form, 12 In Wolf Or Hybrid Form) 30 ft. (40 ft. in wolf form) 3 (700 XP)

As you can see, re.split() will divide the column along the first comma of the two. This runs counter to our expectation that there should only be one comma delimiter in all the values of the Race + alignment column. To deal with this obstacle, let’s first get the list of indices of the offending rows.

bad_racealign_idx = list(bad_racealign_df.index)
bad_racealign_idx
[173, 299, 300, 301, 302, 303]

With this list, it’s a simple matter up applying a Python lambda that uses regex to subsitute the first comma with a space, updating only the rows in our bad_realign_idx list.

df.loc[bad_racealign_idx, 'Race + alignment'] = df.loc[bad_racealign_idx, 'Race + alignment'].apply(lambda str: re.sub(',', ' ', str, count=1))
df.loc[bad_racealign_idx]
Name Size Race + alignment HP Armor Speed Challenge rating (XP)
173 Imp Tiny fiend (devil shapechanger), Lawful Evil 10 (3d4+3) 13 20 ft., fly 40 ft. 1 (200 XP)
299 Werebear Medium humanoid (human shapechanger), Neutral Good 135 (18d8+54) 10 (In Humanoid Form, 11 In Bear And Hybrid Fo... 30 ft. (40 ft., climb 30 ft. in bear or hybrid... 5 (1,800 XP)
300 Wereboar Medium humanoid (human shapechanger), Neutral Evil 78 (12d8+24) 10 (In Humanoid Form, 11 In Boar And Hybrid Fo... 30 ft. (40 ft. in boar form) 4 (1,100 XP)
301 Wererat Medium humanoid (human shapechanger), Lawful Evil 33 (6d8+6) 12 30 ft. 2 (450 XP)
302 Weretiger Medium humanoid (human shapechanger), Neutral 120 (16d8+48) 12 30 ft. (40 ft. in tiger form) 4 (1,100 XP)
303 Werewolf Medium humanoid (human shapechanger), Chaotic Evil 58 (9d8+18) 11 (In Humanoid Form, 12 In Wolf Or Hybrid Form) 30 ft. (40 ft. in wolf form) 3 (700 XP)

With the two-comma problem resolved, we can go ahead and split the Race + alignment column into two, removing the original column in the process.

split_racealignment_df = split_racealignment_column(df)
split_racealignment_df.groupby('Race').count()
Name Size HP Armor Speed Challenge rating (XP) Alignment
Race
Fiend (Shapechanger) 2 2 2 2 2 2 2
aberration 5 5 5 5 5 5 5
beast 87 87 87 87 87 87 87
celestial 6 6 6 6 6 6 6
construct 9 9 9 9 9 9 9
dragon 43 43 43 43 43 43 43
elemental 16 16 16 16 16 16 16
fey 6 6 6 6 6 6 6
fiend 4 4 4 4 4 4 4
fiend (demon) 8 8 8 8 8 8 8
fiend (devil shapechanger) 1 1 1 1 1 1 1
fiend (devil) 9 9 9 9 9 9 9
fiend (shapechanger) 1 1 1 1 1 1 1
giant 10 10 10 10 10 10 10
humanoid (any race) 21 21 21 21 21 21 21
humanoid (dwarf) 1 1 1 1 1 1 1
humanoid (elf) 1 1 1 1 1 1 1
humanoid (gnoll) 1 1 1 1 1 1 1
humanoid (gnome) 1 1 1 1 1 1 1
humanoid (goblinoid) 3 3 3 3 3 3 3
humanoid (grimlock) 1 1 1 1 1 1 1
humanoid (human shapechanger) 5 5 5 5 5 5 5
humanoid (human) 1 1 1 1 1 1 1
humanoid (kobold) 1 1 1 1 1 1 1
humanoid (lizardfolk) 1 1 1 1 1 1 1
humanoid (merfolk) 1 1 1 1 1 1 1
humanoid (orc) 1 1 1 1 1 1 1
humanoid (sahuagin) 1 1 1 1 1 1 1
monstrosity 35 35 35 35 35 35 35
monstrosity (shapechanger) 2 2 2 2 2 2 2
monstrosity (titan) 2 2 2 2 2 2 2
ooze 4 4 4 4 4 4 4
plant 6 6 6 6 6 6 6
swarm of tiny beasts 10 10 10 10 10 10 10
undead 17 17 17 17 17 17 17
undead (shapechanger) 1 1 1 1 1 1 1

OK we come to the second update that we need to perform. Scanning that output above, we’re alerted to two entries that potentially overlap: * Fiend (Shapechanger) * fiend (shapechanger)

In fact, examining the split data frame, we isolate the three suspect rows.

split_racealignment_df.loc[split_racealignment_df['Race'].isin(['Fiend (Shapechanger)','fiend (shapechanger)'])]
Name Size HP Armor Speed Challenge rating (XP) Race Alignment
174 Incubus Medium 66 (12d8 + 12) 15 (Natural Armor) 30 ft., fly 60 ft. 4 (1,100 XP) Fiend (Shapechanger) Neutral Evil
268 Succubus Medium 66 (12d8 + 12) 15 (Natural Armor) 30 ft., fly 60 ft. 4 (1,100 XP) Fiend (Shapechanger) Neutral Evil
269 Succubus (Incubus) Medium 66 (12d8+12) 15 (Natural Armor) 30 ft., fly 60 ft. 4 (1,100 XP) fiend (shapechanger) Neutral Evil

It’s clear that index 269 duplicates the information in rows 174 and 268. (In the traditional interpretation of the game, a succcubus is female while an incubus is male. Both monsters are different.) Let’s go ahead and delete row 269.

split_racealignment_df.drop(index=269, inplace=True)

We proceed to the third alteration we want to make to our data frame. The Challenge rating (XP) column holds two distinct pieces of information. Let’s break that into Challenge and XP column, removing the original source column as we go.

def split_challengexp_column(df_orig, inplace=True):
  """
  Splits the 'Challenge rating (XP)' column into 'Challenge' and 'XP'.

  Args:
    df_orig: The input DataFrame.

  Returns:
    A new DataFrame with the 'Challenge rating (XP)' column split.
  """
  retval_df = df_orig.copy()
  split_columns_df = retval_df['Challenge rating  (XP)'].str.split(r' \(', expand=True)
  retval_df['Challenge'] = split_columns_df[0].astype(str).str.strip()
  retval_df['XP'] = (split_columns_df[1].str.split(' XP\)', expand=True))[0].astype(str).str.strip()
  retval_df.drop(columns=['Challenge rating  (XP)'], inplace=True)
  return retval_df
<>:14: SyntaxWarning: invalid escape sequence '\)'
<>:14: SyntaxWarning: invalid escape sequence '\)'
/tmp/ipykernel_9819/2349129103.py:14: SyntaxWarning: invalid escape sequence '\)'
  retval_df['XP'] = (split_columns_df[1].str.split(' XP\)', expand=True))[0].astype(str).str.strip()
split_challengexp_column_df = split_challengexp_column(split_racealignment_df)
split_challengexp_column_df.head()
Name Size HP Armor Speed Race Alignment Challenge XP
0 Aboleth Large 135 (18d10+36) 17 (Natural Armor) 10 ft., swim 40 ft. aberration Lawful Evil 10 5,900
1 Acolyte Medium 9 (2d8) 10 30 ft. humanoid (any race) Any Alignment 1/4 50
2 Adult Black Dragon Huge 195 (17d12+85) 19 (Natural Armor) 40 ft., fly 80 ft., swim 40 ft. dragon Chaotic Evil 14 11,500
3 Adult Blue Dragon Huge 225 (18d12+108) 19 40 ft., burrow 30 ft., fly 80 ft. dragon Lawful Evil 16 15,000
4 Adult Brass Dragon Huge 172 (15d12+75) 18 (Natural Armor) 40 ft., burrow 40 ft., fly 80 ft. dragon Chaotic Good 13 10,000

The last change we need to make to our data frame is to separate the Armor information into Armor_Class and related notes, called Armor_Info. In the process we create two new columns and remove the original one. Again we call on regular expressions but this time we use a more complex matching pattern: ()(\((.*)\))?

Our regex pattern has three components, or groups, of regular and special characters.

  • () - Matches one or more integers representing the Armor_Class
  • () - Matches zero or more white space.
  • ((.*)\))? - Matches the Armor_Info, a parenthetical sequence made of a series of characters followed by a right parenthesis. For example, “(Leather Armor)”. The surrounding ( )? groups the pattern and indicates that we’re looking for at most a single occurence of this.
def split_armor_column(df_orig):
  """
  Splits the 'Armor' column into 'Armor_Class' and 'Armor_Type'.
  Handles cases where Armor has no type specified.

  Args:
    df_orig: The input DataFrame.

  Returns:
    A new DataFrame with the 'Armor' column split.
  """
  retval_df = df_orig.copy()

  # Initialize new columns
  retval_df['Armor_Class'] = None
  retval_df['Armor_Info'] = None

  for index, row in retval_df.iterrows():
      armor_value = str(row['Armor']).strip()

      # Use regex to split, handling potential missing parentheses
      match = re.match(r'(\d+)(\s*)(\((.*)\))?', armor_value)
      if match:
          retval_df.loc[index, 'Armor_Class'] = match.group(1).strip()
          # Check if there was a type specified in parentheses
          if match.group(3):
            retval_df.loc[index, 'Armor_Info'] = match.group(3).strip().lstrip('(').rstrip(')')
          else:
            # If no type in parentheses, assume Natural Armor
            retval_df.loc[index, 'Armor_Info'] = "Natural Armor"
      else:
          # Handle cases that don't match the expected pattern
          retval_df.loc[index, 'Armor_Class'] = armor_value
          retval_df.loc[index, 'Armor_Info'] = "Unknown" # Or handle as appropriate

  # Drop the original 'Armor' column after processing
  retval_df.drop(columns=['Armor'], inplace=True)
  return retval_df
split_armor_column_df = split_armor_column(split_challengexp_column_df)
split_armor_column_df.head()
Name Size HP Speed Race Alignment Challenge XP Armor_Class Armor_Info
0 Aboleth Large 135 (18d10+36) 10 ft., swim 40 ft. aberration Lawful Evil 10 5,900 17 Natural Armor
1 Acolyte Medium 9 (2d8) 30 ft. humanoid (any race) Any Alignment 1/4 50 10 Natural Armor
2 Adult Black Dragon Huge 195 (17d12+85) 40 ft., fly 80 ft., swim 40 ft. dragon Chaotic Evil 14 11,500 19 Natural Armor
3 Adult Blue Dragon Huge 225 (18d12+108) 40 ft., burrow 30 ft., fly 80 ft. dragon Lawful Evil 16 15,000 19 Natural Armor
4 Adult Brass Dragon Huge 172 (15d12+75) 40 ft., burrow 40 ft., fly 80 ft. dragon Chaotic Good 13 10,000 18 Natural Armor

We’ve reached a good checkpoint in our work. Before we draw things to a close, let’s store the cleaned-up data to a CSV file. As we don’t need the row headers – the index – we strip this away with the index=False option before writing the data frame to the file.

split_armor_column_df.to_csv('sample_data/DnD5e_monsters_cleaned.csv', index=False)