“It is the little grey cells, mon ami, on which one must rely.”


I have been a fan of Agatha Christie’s books for many years now.
‘And Then There Were None’ was my first Agatha Christie book, and to this day I have not preferred any other detective fiction to hers.
Hercule Poirot’s narcissistic but caring demeanour and Miss Marple’s inconspicuous but shrewd nature are so well put on paper that the characters feel very real.

I am an aspiring data analyst. This is my first project done entirely in the R language.
I wanted to take up a fun topic to work on, and I thought, “What better than our dear old Poirot?”
So here it is...



The dataset is from: https://github.com/JamesJackson1/AgathaChristie
The dataset needed some rearrangements to derive meaningful insights. I have used Power Query to unpivot certain columns, converting the table from wide to long format, and added filters.
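
For readers who would rather stay in R, the same unpivot step can be sketched with `pivot_longer()` from tidyr. The wide layout below is hypothetical (one count column per method, with made-up book and murderer names), since the original column layout is not shown here; only the target column names `Method` and `Number_Method` come from the converted table.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one column per murder method, holding a count
wide <- data.frame(
  Book = c("Book A", "Book B"),
  Name = c("Murderer A", "Murderer B"),
  Poisoned = c(2, 0),
  Strangulation = c(1, 0),
  Shot = c(0, 1)
)

# Unpivot the method columns into (Method, Number_Method) pairs,
# then drop the methods a murderer never used
long <- wide %>%
  pivot_longer(cols = c(Poisoned, Strangulation, Shot),
               names_to = "Method",
               values_to = "Number_Method") %>%
  filter(Number_Method > 0)

print(long)
```

Each murderer ends up with one row per method actually used, which is the shape the rest of the analysis relies on.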

The dataset is converted to the following format:

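library(dplyr)    # filter(), group_by(), summarise() and the %>% pipe
library(ggplot2)  # all plots below
df <- read.csv('D:/agatha/AgathaChristie/analysis/hercule.csv')
knitr::kable(head(df), format = "markdown")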

| Book | Name | Gender | Relative | NumberofMurders | England | Profession | Method | Number_Method |
|---|---|---|---|---|---|---|---|---|
| 4:50 from Paddington | Dr Quimper | 1 | 1 | 3 | 1 | Medical | Poisoned | 2 |
| 4:50 from Paddington | Dr Quimper | 1 | 1 | 3 | 1 | Medical | Strangulation | 1 |
| A Caribbean Mystery | Tim Kendal | 1 | 0 | 3 | 0 | Business | Poisoned | 1 |
| A Caribbean Mystery | Tim Kendal | 1 | 0 | 3 | 0 | Business | Stabbed | 1 |
| A Caribbean Mystery | Tim Kendal | 1 | 0 | 3 | 0 | Business | Other | 1 |
| After the Funeral | Miss Gilchrist | 0 | 0 | 1 | 1 | Servant | BluntInstrument | 1 |


There are many stories where the murderer is a relative of the victim, and I wanted to see whether Christie's plots favour one gender for that role. From the analysis, it appears that Christie considered male relatives to be more dangerous than female ones 😂


df1 <- df %>% filter(Gender == 1 & Relative == 1) %>% distinct(Book, Name)  # male relatives, counted once each
df2 <- df %>% filter(Gender == 0 & Relative == 1) %>% distinct(Book, Name)  # female relatives (df has one row per method)
relative_gender <- data.frame(Gender = c('Male', 'Female'),
                              Number_of_Murders = c(nrow(df1), nrow(df2)))
ggplot(data = relative_gender, aes(Gender, Number_of_Murders)) +
  geom_bar(stat = "identity", fill = '#a0fa4b', width = 0.6) +
  theme_classic() + ylab("Number of Murderers") +
  geom_text(aes(label = Number_of_Murders), vjust = -0.3, size = 3.5)


There are books with multiple murders, like 'The A.B.C. Murders', where Franklin Clarke commits four murders to get at a single intended victim.

But there are also some joint ventures, where several people, like Anne Protheroe and Lawrence Redding, come together with the common goal of killing one person.

I wanted to find the books with the largest number of murders and the largest number of murderers.

The book with maximum murders is very conveniently named: Murder Is Easy 😂


Books_with_most_murders <- df %>% group_by(Book) %>% summarise(Number_of_Murders = max(NumberofMurders))
Books_with_most_murders <- Books_with_most_murders[order(Books_with_most_murders$Number_of_Murders, decreasing = TRUE),]
knitr::kable(head(Books_with_most_murders), format = "markdown")

| Book | Number of Murders |
|---|---|
| Murder Is Easy | 7 |
| The A.B.C. Murders | 4 |
| 4:50 from Paddington | 3 |
| A Caribbean Mystery | 3 |
| Cat Among the Pigeons | 3 |
| Dead Man’s Folly | 3 |

Books_with_most_murderers <- df %>% group_by(Book) %>% summarise(Murderers = n_distinct(Name))
Books_with_most_murderers<- Books_with_most_murderers[order(Books_with_most_murderers$Murderers, decreasing = TRUE),]
knitr::kable(head(Books_with_most_murderers), format = "markdown")

| Book | Murderers |
|---|---|
| Murder on the Orient Express | 11 |
| The Clocks | 3 |
| Cat Among the Pigeons | 2 |
| Endless Night | 2 |
| Hallowe’en Party | 2 |
| The Murder at the Vicarage | 2 |



Most of the stories I have read so far are set in London or in other English cities. So I checked whether England-based books really are the majority, or whether I simply need to read more of the others.


df4 <- df %>% group_by(Book) %>% summarise(
  England_yes = max(England),
  .groups = 'drop'
)

data1 <- data.frame(Country = c('England', 'Other than England') , Count_of_Books = c(sum(df4$England_yes), nrow(filter(df4, England_yes == 0))))
ggplot(data = data1, aes( x="", y=Count_of_Books, fill=Country)) + geom_bar(stat="identity") + coord_polar("y", start=0)+scale_fill_brewer(palette="Blues")+ theme_minimal()+
  theme(
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    panel.border = element_blank(),
    axis.text = element_blank(),
    axis.ticks = element_blank(),
    panel.grid  = element_blank(),
    plot.title=element_text(size=14, face="bold")
  ) +  geom_text(aes(label=Count_of_Books), position = position_stack(vjust = 0.5), size=4.5)

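Before any further analysis, I pre-processed the data a little: I kept only the books that have a single murderer.


# One row per murderer per book
data <- df %>% group_by(Name, Book) %>% summarise(
  Gender = max(Gender), .groups = 'drop')

data2 <- as.data.frame(table(data$Book))  # columns: Var1 (book), Freq (number of murderers)
data3 <- data2 %>% filter(Freq > 1)       # books with more than one murderer
data3 <- data3$Var1  # was data3$books, which is NULL, so nothing was filtered out

data3 <- df %>% filter(! Book %in% data3)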
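
The pre-processing above appears intended to keep only the books that have a single murderer. As a minimal sketch of that idea on made-up data (the `toy` data frame and its values are hypothetical), a grouped `filter()` with `n_distinct()` does it in one step:

```r
library(dplyr)

# Hypothetical long-format data: book "A" has two murderers,
# "B" has one, and "C" has one murderer listed with two methods
toy <- data.frame(
  Book = c("A", "A", "B", "C", "C"),
  Name = c("x", "y", "z", "w", "w"),
  Method = c("Shot", "Stabbed", "Poisoned", "Shot", "Other")
)

# Keep only the rows of books in which exactly one distinct name appears
single_murderer <- toy %>%
  group_by(Book) %>%
  filter(n_distinct(Name) == 1) %>%
  ungroup()

print(single_murderer)
```

Counting distinct names rather than rows matters here, because the long format repeats a murderer once per method.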



Many methods of killing are depicted throughout Agatha Christie's books. I wanted to see whether she predominantly associates any method with a particular gender.

In this analysis, men come out as more likely to strangle or to hit with a blunt instrument, while women more often reach for a gun.


d <- data3 %>% group_by(Name, Method) %>% summarise(Gender = max(Gender))
method_gender <- d %>% group_by(Method, Gender) %>% summarise(Murders = n())
method_gender$Gender <- replace(method_gender$Gender, method_gender$Gender==1, 'Male') 
method_gender$Gender <- replace(method_gender$Gender, method_gender$Gender==0, 'Female') 
level_order <- c('Poisoned','Strangulation','Stabbed','Shot','BluntInstrument','Other')
ggplot(method_gender, aes(x = Method, y = Murders, fill = Gender)) + geom_bar(stat="identity", position="dodge") + scale_x_discrete(limits = level_order) 
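
The `Gender` column is coded 0/1, and the code above treats 1 as male and 0 as female. As a small sketch on a made-up vector (the values are hypothetical), `factor()` can do the same recoding in one call instead of two `replace()` calls:

```r
# Hypothetical 0/1 gender codes, following the coding used in this post
gender <- c(1, 0, 1, 0, 0)

# Map the codes to labels: level 0 -> "Female", level 1 -> "Male"
gender_label <- factor(gender, levels = c(0, 1), labels = c("Female", "Male"))

print(table(gender_label))
```

A factor also fixes the order of the legend entries in ggplot2, which plain character labels leave to alphabetical sorting.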



I always find the characters in Agatha Christie’s books to be very lifelike.

The thought process of each character is well aligned with the personality depicted in the book. To check whether the method of murder also depends on the character's background, I compared each murderer's method of killing with his or her profession.

The outcome was convincing: medical professionals use poison more than any other method. But I could see the aristocrats and wealthy people using poison too. I guess it is because poison is expensive after all. 😂


prof_method <- data3 %>% group_by(Profession,Method) %>% summarise(Number_of_Murders = sum(Number_Method)) %>% filter(!Profession %in% list('Child','CivilServant'))

ggplot(data = prof_method, aes(Method,Number_of_Murders, fill = Method)) + geom_bar(stat="identity", width = 0.6) + theme_classic() +theme(axis.text.x=element_blank())+ ylab("Number of Murders") +  geom_text(aes(label=Number_of_Murders), vjust=-0.3, size=3.5) + facet_wrap(~Profession,nrow = 6, ncol = 3,strip.position="bottom")



The professions of the characters have always been very diverse.

But among the murderers in this dataset, those working in the armed forces, the police, as solicitors, etc., are predominantly men. More women are seen working in low-paid jobs and as household help.

This was the reality of the 20th century, and it is very well depicted in her books.


d <- data3 %>% group_by(Name, Profession) %>%summarise(Gender = max(Gender)) 
prof_gender <- d %>% group_by(Profession, Gender) %>% summarise(Murders = n())
prof_gender$Gender <- replace(prof_gender$Gender, prof_gender$Gender==1, 'Male') 
prof_gender$Gender <- replace(prof_gender$Gender, prof_gender$Gender==0, 'Female') 

ggplot(prof_gender, aes(x = Profession, y = Murders, fill = Gender)) + geom_bar(stat="identity", position="dodge") + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))



That was all. Thank you for viewing my first venture in R programming!!!

I have put my little grey cells to use for this analysis, and I hope you enjoyed reading it! 😄