Grad Research: Data-mining Jane Austen

Before there was Mr. Big, there was Mr. Darcy.

Tall and handsome, wealthy and witty, aloof and arrogant… Fitzwilliam Darcy makes a poor first impression in Jane Austen’s Pride and Prejudice before winning the affections of protagonist Elizabeth Bennet with his gentlemanliness and kindness. Elizabeth marries him for love, and preserves the family fortune in the process.

The End.

Except not really. Austen’s oeuvre has persisted, thriving from the age of aristocratic suitors through modern dating and into the era of Tinder.

Research by Carleton’s Jenna Herdman illustrates this. The second-year English PhD student has used the Google Ngram Viewer to track mentions of Austen. The tool searches Google Books and shows that Austen’s work is mentioned more often as time passes, particularly Pride and Prejudice, which has been on a steady upward trend since about 1990.

Herdman also uses text-mining and distant reading to create data visualizations of the content of Austen’s novels. This helps students understand how the novels are structured, and the techniques she is using have helped academics critique entire bodies of literature and move beyond the literary canon.

There may have been as many as 60,000 novels published in 19th-century England. At a pace of one novel per day, a reader would need more than 160 years to finish them all. But by aggregating data on grammar and language, it’s possible to recognize patterns within the full body of work.
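
The reading-time estimate above is simple arithmetic, and a quick sketch makes it easy to check. The figure of 60,000 novels and the one-per-day pace come from the article; ignoring leap days is an assumption made here for simplicity.

```python
# Rough reading-time estimate.
# Assumptions: 60,000 novels (the article's upper estimate), one novel
# read per day, and 365-day years (leap days ignored).
NOVELS = 60_000
DAYS_PER_YEAR = 365

years = NOVELS / DAYS_PER_YEAR
print(f"{years:.1f} years")  # prints "164.4 years"
```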

“Thematically, Austen novels generally focus on a female protagonist and a marriage plot,” Herdman says. “The heroine surmounts the financial difficulty of her inheritance position by settling into a marriage that, conveniently, fulfills a domestic ideal of having both romantic love and economic security. The heroine rejects the ‘wrong’ choice of husband – often defined by sexual attractiveness, but which will lead to a ruinous union – in favour of the ‘right’ choice.”

In addition to Pride and Prejudice, Herdman used Voyant, a web-based text analysis tool, to analyze patterns in the romantic rivalries in Sense and Sensibility, Northanger Abbey and Mansfield Park.

Dividing each book into 10 segments, Herdman identifies the number of mentions of each romantic rival. The resulting graph for Pride and Prejudice shows mentions of Darcy, the romantic hero, fluctuating alongside those of Wickham, who falsely accuses Darcy of denying him a lucrative post. Darcy ultimately wins the protagonist’s heart and dominates mentions in the novel’s conclusion.
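
The segment-and-count approach described above might be sketched roughly as follows. This is not Voyant’s implementation or Herdman’s actual workflow; the function name `mentions_by_segment`, the simple word-splitting rule, and the toy sample text are all hypothetical, chosen only to illustrate the idea of splitting a text into equal parts and tallying character names in each.

```python
import re

def mentions_by_segment(text, names, segments=10):
    """Count mentions of each name in equal-sized word segments of a text.

    Hypothetical helper for illustration; real tools tokenize more carefully.
    """
    # Split on letters only, so a possessive like "Darcy's" still counts as "Darcy".
    words = re.findall(r"[A-Za-z]+", text)
    counts = {name: [0] * segments for name in names}
    for i, word in enumerate(words):
        # Map the word's position to a segment index from 0 to segments - 1.
        seg = i * segments // len(words)
        if word in counts:
            counts[word][seg] += 1
    return counts

# Toy text (not taken from the novel), split into 2 segments for brevity.
sample = (
    "Wickham spoke. Darcy frowned. Wickham left. "
    "Darcy stayed. Darcy smiled. Darcy proposed."
)
result = mentions_by_segment(sample, ["Darcy", "Wickham"], segments=2)
print(result)  # {'Darcy': [1, 3], 'Wickham': [2, 0]}
```

Plotting each name’s list of counts against the segment number would give a line per character, similar in spirit to the graph described above.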

Read the full story on the Faculty of Graduate Studies and Postdoctoral Affairs page.