I have been a fan of Agatha Christie’s books for many years now.
‘And Then There Were None’ was my first Agatha Christie book, and to this day I have not found any detective fiction I prefer to hers.
Hercule Poirot’s narcissistic but caring demeanour and Miss Marple’s inconspicuous but shrewd nature are put on paper so well that the characters feel very real.
I am an aspiring data analyst, and this is my first project done entirely in R.
I wanted to take up a fun topic to work on, and I thought, “What better than our dear old Poirot?”
So here it is……
The dataset is from: https://github.com/JamesJackson1/AgathaChristie
The dataset needed some rearranging before it could yield meaningful insights. I used Power Query to unpivot certain columns, converting the table from wide to long format, and added filters.
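Power Query handled the unpivoting here, but the same wide-to-long reshape can also be done in R itself. Below is a minimal base-R sketch. The `wide` frame, with one row per murderer and one count column per method, is a hypothetical layout for illustration, not the original repository file:

```r
# Hypothetical wide layout (illustration only): one row per murderer,
# one count column per method of killing
wide <- data.frame(
  Book = c("4:50 from Paddington", "A Caribbean Mystery"),
  Name = c("Dr Quimper", "Tim Kendal"),
  Poisoned = c(2, 1),
  Strangulation = c(1, 0),
  Stabbed = c(0, 1),
  stringsAsFactors = FALSE
)
method_cols <- c("Poisoned", "Strangulation", "Stabbed")

# Unpivot: repeat the id columns once per method column and stack the counts
long <- data.frame(
  Book = rep(wide$Book, times = length(method_cols)),
  Name = rep(wide$Name, times = length(method_cols)),
  Method = rep(method_cols, each = nrow(wide)),
  Number_Method = unlist(wide[method_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Filter out methods a murderer never used, then sort by book and murderer
long <- long[long$Number_Method > 0, ]
long <- long[order(long$Book, long$Name), ]
rownames(long) <- NULL
print(long)
```

The zero-count filter plays the role of the Power Query filters: a murderer only gets a row for the methods they actually used.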
The dataset is converted to the following format:
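library(dplyr)    # filter(), group_by(), summarise(), arrange(), %>%
library(ggplot2)  # all the charts below
df <- read.csv('D:/agatha/AgathaChristie/analysis/hercule.csv')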
knitr::kable(head(df), format = "markdown")
Book | Name | Gender | Relative | NumberofMurders | England | Profession | Method | Number_Method |
---|---|---|---|---|---|---|---|---|
4:50 from Paddington | Dr Quimper | 1 | 1 | 3 | 1 | Medical | Poisoned | 2 |
4:50 from Paddington | Dr Quimper | 1 | 1 | 3 | 1 | Medical | Strangulation | 1 |
A Caribbean Mystery | Tim Kendal | 1 | 0 | 3 | 0 | Business | Poisoned | 1 |
A Caribbean Mystery | Tim Kendal | 1 | 0 | 3 | 0 | Business | Stabbed | 1 |
A Caribbean Mystery | Tim Kendal | 1 | 0 | 3 | 0 | Business | Other | 1 |
After the Funeral | Miss Gilchrist | 0 | 0 | 1 | 1 | Servant | BluntInstrument | 1 |
There are many stories where the murderer is a relative of the victim. I wanted to see whether Christie favoured male or female relatives as murderers.
From the analysis, it was clear that Christie considered male relatives to be more dangerous than female ones 😂
# Long format (one row per murderer per method): sum Number_Method to count murders, not rows
df1 <- filter(df, Gender == 1 & Relative == 1)
df2 <- filter(df, Gender == 0 & Relative == 1)
relative_gender <- data.frame(Gender = c('Male', 'Female'), Number_of_Murders = c(sum(df1$Number_Method), sum(df2$Number_Method)))
ggplot(data = relative_gender, aes(Gender, Number_of_Murders)) +
  geom_bar(stat = "identity", fill = '#a0fa4b', width = 0.6) + theme_classic() +
  ylab("Number of Murders") + geom_text(aes(label = Number_of_Murders), vjust = -0.3, size = 3.5)
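Because the table is in long format (one row per murderer per method), counting rows is not the same as counting murders. Dr Quimper fills two rows but commits three murders. Below is a minimal base-R sketch on the six sample rows shown above. It assumes, as those rows suggest, that each murderer's `Number_Method` values add up to their `NumberofMurders`. The name `sample_df` is just for illustration:

```r
# The six sample rows from the table above (only the columns needed here)
sample_df <- data.frame(
  Name = c("Dr Quimper", "Dr Quimper", "Tim Kendal", "Tim Kendal",
           "Tim Kendal", "Miss Gilchrist"),
  Gender = c(1, 1, 1, 1, 1, 0),    # 1 = male, 0 = female
  Relative = c(1, 1, 0, 0, 0, 0),  # 1 = murderer is a relative of the victim
  Number_Method = c(2, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

relatives <- sample_df[sample_df$Relative == 1, ]
# A labelled factor keeps a gender with no rows in the output as a 0
relatives$Gender <- factor(relatives$Gender, levels = c(1, 0),
                           labels = c("Male", "Female"))

rows_by_gender <- table(relatives$Gender)                             # counts rows
murders_by_gender <- xtabs(Number_Method ~ Gender, data = relatives)  # sums murders
print(rows_by_gender)
print(murders_by_gender)
```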
There are books with multiple murders, like 'The A.B.C. Murders', where, in order to kill one person, Franklin Clarke kills four.
But there are also some joint ventures, where several people, like Anne Protheroe and Lawrence Redding, come together with the common goal of killing one person.
I wanted to find out which books have the most murders and which have the most murderers.
The book with the most murders is very conveniently named: Murder Is Easy 😂
# NumberofMurders is repeated on each of a murderer's rows, so take the max per book
Books_with_most_murders <- df %>% group_by(Book) %>% summarise(Number_of_Murders = max(NumberofMurders)) %>%
  arrange(desc(Number_of_Murders))
knitr::kable(head(Books_with_most_murders), format = "markdown")
Book | Number of Murders |
---|---|
Murder Is Easy | 7 |
The A.B.C. Murders | 4 |
4:50 from Paddington | 3 |
A Caribbean Mystery | 3 |
Cat Among the Pigeons | 3 |
Dead Man’s Folly | 3 |
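The same per-book summary can also be sketched in base R with `aggregate()` and `order()`, without dplyr. This toy version uses only the sample rows from the first table, so its numbers are illustrative, not the full results above:

```r
# Sample rows from the first table (only the columns needed here)
sample_df <- data.frame(
  Book = c("4:50 from Paddington", "4:50 from Paddington",
           "A Caribbean Mystery", "A Caribbean Mystery",
           "A Caribbean Mystery", "After the Funeral"),
  NumberofMurders = c(3, 3, 3, 3, 3, 1),
  stringsAsFactors = FALSE
)

# NumberofMurders repeats on every method row, so take the max per book
per_book <- aggregate(NumberofMurders ~ Book, data = sample_df, FUN = max)
names(per_book)[2] <- "Number_of_Murders"

# Sort from most to fewest murders
per_book <- per_book[order(per_book$Number_of_Murders, decreasing = TRUE), ]
print(per_book)
```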
# Distinct names per book, so a murderer's repeated method rows count once
Books_with_most_murderers <- df %>% group_by(Book) %>% summarise(Murderers = n_distinct(Name)) %>%
  arrange(desc(Murderers))
knitr::kable(head(Books_with_most_murderers), format = "markdown")
Book | Murderers |
---|---|
Murder on the Orient Express | 11 |
The Clocks | 3 |
Cat Among the Pigeons | 2 |
Endless Night | 2 |
Hallowe’en Party | 2 |
The Murder at the Vicarage | 2 |
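Counting murderers means counting *distinct* names per book, because one murderer can fill several method rows. Below is a base-R sketch on a toy frame: a few sample rows plus the two Vicarage murderers named earlier, with one illustrative row each:

```r
# Toy frame: sample rows plus the two Vicarage murderers (one row each, illustrative)
toy <- data.frame(
  Book = c("4:50 from Paddington", "4:50 from Paddington",
           "A Caribbean Mystery", "The Murder at the Vicarage",
           "The Murder at the Vicarage"),
  Name = c("Dr Quimper", "Dr Quimper", "Tim Kendal",
           "Anne Protheroe", "Lawrence Redding"),
  stringsAsFactors = FALSE
)

# Distinct names per book; Dr Quimper's two rows count as one murderer
counts <- tapply(toy$Name, toy$Book, function(x) length(unique(x)))
most_murderers <- data.frame(Book = names(counts), Murderers = as.vector(counts),
                             stringsAsFactors = FALSE)
most_murderers <- most_murderers[order(most_murderers$Murderers, decreasing = TRUE), ]
print(most_murderers)
```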
Most of the stories I have read so far are set in London or elsewhere in England. So I checked whether England-based books really are the majority, or whether I simply need to read more of the others.
df4 <- df %>% group_by(Book) %>% summarise(
England_yes = max(England),
.groups = 'drop'
)
data1 <- data.frame(Country = c('England', 'Other than England') , Count_of_Books = c(sum(df4$England_yes), nrow(filter(df4, England_yes == 0))))
ggplot(data = data1, aes( x="", y=Count_of_Books, fill=Country)) + geom_bar(stat="identity") + coord_polar("y", start=0)+scale_fill_brewer(palette="Blues")+ theme_minimal()+
theme(
axis.title.x = element_blank(),
axis.title.y = element_blank(),
panel.border = element_blank(),
axis.text = element_blank(),
axis.ticks = element_blank(),
panel.grid = element_blank(),
plot.title=element_text(size=14, face="bold")
) + geom_text(aes(label=Count_of_Books), position = position_stack(vjust = 0.5), size=4.5)
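Behind the pie chart are two steps: collapse each book to a single England flag (1 if any of its rows is set in England), then count books per flag. Here is a base-R sketch of those steps on the sample rows from the first table:

```r
sample_df <- data.frame(
  Book = c("4:50 from Paddington", "4:50 from Paddington",
           "A Caribbean Mystery", "A Caribbean Mystery",
           "A Caribbean Mystery", "After the Funeral"),
  England = c(1, 1, 0, 0, 0, 1),  # 1 = set in England
  stringsAsFactors = FALSE
)

# One flag per book: 1 if any of its rows is set in England
england_flag <- tapply(sample_df$England, sample_df$Book, max)

# Count books per setting, keeping both labels even if one is empty
setting_counts <- table(factor(england_flag, levels = c(1, 0),
                               labels = c("England", "Other than England")))
print(setting_counts)
```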
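Before going further, I pre-processed the data a little: books with more than one murderer (such as Murder on the Orient Express) are set aside, so that joint murders don't skew the per-murderer comparisons.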
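# One row per (murderer, book) pair
data <- df %>% group_by(Name, Book) %>% summarise(Gender = max(Gender), .groups = 'drop')
# Murderers per book; as.data.frame(table()) names its columns Var1 and Freq
data2 <- as.data.frame(table(data$Book))
# Books with more than one murderer
data3 <- data2 %>% filter(Freq > 1)
data3 <- as.character(data3$Var1)
# Keep only the books with a single murderer
data3 <- df %>% filter(!Book %in% data3)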
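The intent of this pre-processing step is to drop every book with more than one murderer. Below is a base-R sketch of the same idea on a toy frame: a few sample rows plus the two Vicarage murderers, one illustrative row each. The names `pairs` and `multi_murderer_books` are hypothetical:

```r
toy <- data.frame(
  Book = c("4:50 from Paddington", "4:50 from Paddington",
           "After the Funeral", "The Murder at the Vicarage",
           "The Murder at the Vicarage"),
  Name = c("Dr Quimper", "Dr Quimper", "Miss Gilchrist",
           "Anne Protheroe", "Lawrence Redding"),
  stringsAsFactors = FALSE
)

# One row per (murderer, book) pair, then count murderers per book
pairs <- unique(toy[c("Name", "Book")])
murderers_per_book <- table(pairs$Book)

# Books with more than one murderer, as a plain character vector
multi_murderer_books <- names(murderers_per_book)[murderers_per_book > 1]

# Keep every row of the single-murderer books
single <- toy[!toy$Book %in% multi_murderer_books, ]
print(single)
```

Taking the book titles as a character vector (rather than a column that may not exist) matters here. `%in%` against `NULL` silently matches nothing, which would keep every book.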
There are many methods of killing depicted throughout Agatha Christie's books. But I wanted to see whether she associates particular methods with a particular gender.
From the analysis, it was clear that she saw men as more likely to strangle their victims or hit them with a blunt instrument, while women would rather shoot.
d <- data3 %>% group_by(Name, Method) %>% summarise(Gender = max(Gender), .groups = 'drop')
# Count murderers (not murders) per method and gender, with readable gender labels
method_gender <- d %>% group_by(Method, Gender) %>% summarise(Murderers = n(), .groups = 'drop') %>%
  mutate(Gender = ifelse(Gender == 1, 'Male', 'Female'))
level_order <- c('Poisoned', 'Strangulation', 'Stabbed', 'Shot', 'BluntInstrument', 'Other')
ggplot(method_gender, aes(x = Method, y = Murderers, fill = Gender)) +
  geom_bar(stat = "identity", position = "dodge") + scale_x_discrete(limits = level_order)
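The chart is a cross-tabulation of distinct murderers by method and gender. Base R's `xtabs()` computes it directly. Here is a sketch on the sample rows from the first table:

```r
sample_df <- data.frame(
  Name = c("Dr Quimper", "Dr Quimper", "Tim Kendal", "Tim Kendal",
           "Tim Kendal", "Miss Gilchrist"),
  Gender = c(1, 1, 1, 1, 1, 0),  # 1 = male, 0 = female
  Method = c("Poisoned", "Strangulation", "Poisoned", "Stabbed",
             "Other", "BluntInstrument"),
  stringsAsFactors = FALSE
)

# One row per (murderer, method) pair, with readable gender labels
pairs <- unique(sample_df[c("Name", "Method", "Gender")])
pairs$Gender <- ifelse(pairs$Gender == 1, "Male", "Female")

# Number of murderers using each method, split by gender
method_by_gender <- xtabs(~ Method + Gender, data = pairs)
print(method_by_gender)
```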
I always find the characters in Agatha Christie’s books very lifelike.
Their thought processes are well aligned with the personalities depicted in the books. To check whether the method of murder also depends on a character's personality, I compared each murderer's method of killing with their profession.
The outcome was very convincing: medical professionals use poison more than any other method. But I could see aristocrats and wealthy people using poison too. I guess that is because poison is expensive, after all. 😂
prof_method <- data3 %>% group_by(Profession, Method) %>% summarise(Number_of_Murders = sum(Number_Method), .groups = 'drop') %>% filter(!Profession %in% c('Child', 'CivilServant'))
ggplot(data = prof_method, aes(Method, Number_of_Murders, fill = Method)) + geom_bar(stat = "identity", width = 0.6) + theme_classic() + theme(axis.text.x = element_blank()) +
  ylab("Number of Murders") + geom_text(aes(label = Number_of_Murders), vjust = -0.3, size = 3.5) + facet_wrap(~Profession, nrow = 6, ncol = 3, strip.position = "bottom")
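Each facet of that chart is one row of a weighted cross-tab: murders (the sum of `Number_Method`) per profession and method. Below is a base-R sketch on the sample rows. It also picks the most frequent method per profession; the `top_method` helper is hypothetical, added only to make the comparison explicit:

```r
sample_df <- data.frame(
  Profession = c("Medical", "Medical", "Business", "Business",
                 "Business", "Servant"),
  Method = c("Poisoned", "Strangulation", "Poisoned", "Stabbed",
             "Other", "BluntInstrument"),
  Number_Method = c(2, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Leave out the same professions as the chart
kept <- sample_df[!sample_df$Profession %in% c("Child", "CivilServant"), ]

# Weighted cross-tab: total murders per profession and method
prof_by_method <- xtabs(Number_Method ~ Profession + Method, data = kept)
print(prof_by_method)

# Most frequent method per profession (which.max takes the first on ties)
top_method <- colnames(prof_by_method)[apply(prof_by_method, 1, which.max)]
names(top_method) <- rownames(prof_by_method)
print(top_method)
```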
Her characters work in a very diverse range of professions.
But the people working in the armed forces, the police, as solicitors, etc. are predominantly men, while women are more often seen in low-paid jobs or working as household help.
This was the reality of the 20th century, and her books depict it very well.
d <- data3 %>% group_by(Name, Profession) %>% summarise(Gender = max(Gender), .groups = 'drop')
# Count murderers (not murders) per profession and gender, with readable gender labels
prof_gender <- d %>% group_by(Profession, Gender) %>% summarise(Murderers = n(), .groups = 'drop') %>%
  mutate(Gender = ifelse(Gender == 1, 'Male', 'Female'))
ggplot(prof_gender, aes(x = Profession, y = Murderers, fill = Gender)) + geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1))
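Raw counts let the larger professions dominate the chart. The gender split *within* each profession is what the point about men and women's jobs is really about. A row-wise share via base R's `prop.table()` shows that split. Here is a sketch on the sample rows:

```r
sample_df <- data.frame(
  Name = c("Dr Quimper", "Dr Quimper", "Tim Kendal", "Tim Kendal",
           "Tim Kendal", "Miss Gilchrist"),
  Gender = c(1, 1, 1, 1, 1, 0),  # 1 = male, 0 = female
  Profession = c("Medical", "Medical", "Business", "Business",
                 "Business", "Servant"),
  stringsAsFactors = FALSE
)

# One row per murderer and profession, with labelled genders
pairs <- unique(sample_df[c("Name", "Profession", "Gender")])
pairs$Gender <- factor(pairs$Gender, levels = c(1, 0), labels = c("Male", "Female"))

counts <- xtabs(~ Profession + Gender, data = pairs)
# Row-wise shares: fraction of each profession's murderers who are men or women
shares <- prop.table(counts, margin = 1)
print(round(shares, 2))
```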
That's all. Thank you for reading about my first venture into R programming!!!
I have put my little grey cells to use in this analysis, and I hope you enjoyed reading it! 😄