Literature by the Numbers

Data journalist Ben Blatt takes a mathematical approach to the writers of fiction.

Jessica Gross | Longreads | March 2017 | 12 minutes (2,982 words)


If you’ve ever taken a writing class—or enrolled in high school English—you’ve probably been advised to use fewer adverbs. But does a glut of adverbs really degrade writing? Moreover, do the writers who’ve given this advice even follow it?

This is just the opening gambit of data journalist Ben Blatt’s deep dive into the mathematics of literature. In his new book, Nabokov’s Favorite Word Is Mauve: What the Numbers Reveal About the Classics, Bestsellers, and Our Own Writing, Blatt examines the stylistic fingerprints of writers (which follow them even when they write under pen names in different genres), whether Americans are “louder” than Brits in their writing, the differences between how men and women write, whether books are getting simpler (yup), and many other curiosities.

Blatt has a penchant for numbers. In his first book, I Don’t Care if We Never Get Back (co-written with his friend Eric Brewster), Blatt mathematically engineers the ideal baseball road trip. In this new book, he makes a convincing case that words aren’t any less suited for mathematical analysis than baseball is—and that data can actually help us see and appreciate rule-breaking that really works. We spoke by phone about why he’s drawn to treating art as data, as well as some of his most compelling findings.

* * *

I’m not sure if you chose the title Nabokov’s Favorite Word Is Mauve or if your publisher did—but if it was you, I wondered if you could walk me through that choice. Was that finding the most delightful to you?

So, the title was a collaboration between me and the publisher. But what we were going for was, the book covers a lot. It covers the reading level of New York Times Best Sellers, the adverb use of your classic authors, the difference in how men and women write, book cover design—and with this title, we were going for a bit of intrigue, and a bit of the possibilities of combining numbers and writing, or science and art. And yes, the specific finding about Nabokov was very exciting when I stumbled across it.

In an interview, Ray Bradbury had said his favorite word was “cinnamon.” If you look at the numbers, he actually does use the word “cinnamon” at a high rate. And his reasoning for liking cinnamon was that it reminded him of his grandmother’s pantry. If you look at a bunch of other words that relate to pantries, spices and smells, he also uses those at an extremely high rate. So I repeated that experiment on a hundred other authors, not knowing what to expect or if anything would come up.

For Nabokov, I found that his favorite word was “mauve,” and that struck me as a bit curious. And then I remembered, and found in some further reading, that he had synesthesia. He wrote in his autobiography about how when he would write a certain sound or letters, he would visualize, automatically, that color in his head. And mauve was one of them. I thought this was a nice way of showing that there’s not an opposition between the numbers and the words. This is probably what he would say his favorite word was anyway, but the numbers do back it up.

I was so delighted by the charts you created of authors’ frequently used words, compared to other writers’ usages. You break them down into “cinnamon words” and “nod words.” Can you walk us through the difference?

I was trying to come up with an unbiased, objective ranking of what an author’s favorite words were, and there’s no one way to do this. So I drew on two examples from literature. One was Ray Bradbury. “Cinnamon” is a word that’s used very rarely among writers. It’s not an obscure word, but it’s not something that comes up all the time. But Ray Bradbury used “cinnamon” at a very high rate, considering it’s a very rare word. So for that, I determined that it had to be a word that an author uses in at least half their books and they use once per 100,000 words.

For “nod words,” I was looking at Michael Connelly. In an interview, he’d said his favorite word was “nodded.” That’s a pretty common word in novels with dialogue and action, but he uses it at an obscene rate—upwards of a hundred times per book, sometimes three or four or five times on a single page. So a “nod word” is just a word that comes up a lot. It could be “nodded,” it could be “felt,” it could be “sir.” These are words that are in every book that an author wrote and that they use at least 100 times per 100,000 words. With the charts I made of authors’ cinnamon and nod words, I think a lot of times if you covered up the name, you could almost guess who the author was just by looking at the words.
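
A minimal sketch of the two thresholds described above, for readers who want to try them on their own texts. This is not Blatt's code: the tokenizer, the function names, and the choice to compute rates over all of an author's books combined are assumptions made for illustration. The step that compares an author's rate with other writers' rates, which is what makes a word a true "cinnamon word," is left out, so the second function returns only candidates.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumption: a "word" is a run of lowercase letters or apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def rate_per_100k(count, total_words):
    # Occurrences per 100,000 words.
    return 100_000 * count / total_words if total_words else 0.0


def frequent_words(books, min_book_share, min_rate):
    """Return {word: rate} for words that appear in at least
    `min_book_share` of the books and at least `min_rate` times
    per 100,000 words across all the books combined."""
    per_book = [Counter(tokenize(book)) for book in books]
    combined = Counter()
    for counts in per_book:
        combined.update(counts)
    total_words = sum(combined.values())

    result = {}
    for word, count in combined.items():
        share = sum(1 for counts in per_book if word in counts) / len(per_book)
        rate = rate_per_100k(count, total_words)
        if share >= min_book_share and rate >= min_rate:
            result[word] = rate
    return result


def nod_words(books):
    # In every book, at least 100 uses per 100,000 words.
    return frequent_words(books, min_book_share=1.0, min_rate=100.0)


def cinnamon_candidates(books):
    # In at least half the books, at least 1 use per 100,000 words.
    # Rarity relative to other authors is NOT checked here.
    return frequent_words(books, min_book_share=0.5, min_rate=1.0)
```

Each function takes a list of strings, one per book, and returns a dictionary mapping words to their rate per 100,000 words.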

Really? Like what?

Well, if someone told you an author’s cinnamon words were “civility,” “fancying” and “imprudence,” Jane Austen would be one of your top guesses. Some of them are more genre-related, like Agatha Christie’s words: “inquest,” “alibi” and “frightful.”

Right, that makes sense—someone probably could have guessed that “dwarfs,” “witch,” and “lion” were C. S. Lewis’s words.


As you mentioned, you drilled down in so many different directions in this book. Can you talk me through the process of narrowing down which questions you wanted to pursue in your research?

So the general premise of the book was applying data analysis to literature, because I believe that hasn’t been done to the extent that I would’ve liked. My central thesis, lingering in the background of everything I looked at, is that even though we’re talking about writing and art, data and objective analysis can be useful in understanding these novels, even from the writers’ point of view. That being said, once I had this thesis in mind, I just came up with a bunch of things to look at that are on the forefront of what authors and writers and editors are already talking about.

The first chapter of the book is all about adverb use, and -ly adverbs in particular. That’s one example that will hopefully draw the reader in: you take any writing class or read any book on writing and they probably say, be concise, don’t use -ly adverbs. So I wanted to see, is there actually anything to this? If you saw the numbers, would they back up this advice? And then I wanted to really look at a range both of what makes a book critically popular and remembered.

I was almost disappointed to find that the adverb advice held up so well. It feels almost like a truism, and I kind of wanted there to be a twist. But as you found, it turns out that even within a certain author’s works, their least adverb-y books are their most popular.

Right. And I will point out that in the book I talk about -ly adverb rates particularly, and the premise is that Stephen King, in On Writing, says not to use them. Toni Morrison, Chuck Palahniuk, and many other authors have talked about not using -ly adverbs, too. And I wanted to know both if the people giving the advice followed it, and then if the great authors, especially those known to be concise, follow it.

Everyone talks about Ernest Hemingway as a very direct, concise writer. As it turns out, he does use very few adverbs. One takeaway is that in general, the numbers show that fewer -ly adverbs do translate to a book that lasts longer. But I also discuss, is it actually the -ly adverbs? If you took a Hemingway book that was not as popular and just took out all the -ly adverbs, I don’t think it would become A Farewell To Arms. I think it’s a symptom of kind of direct, interesting writing that focuses on ideas.

Why is it so appealing to you to quantify art? Art is unruly, and I wondered if that was part of it—categorizing or taming this unruly thing.

Yeah. It definitely is a concern of mine, and I’m not trying to create a perfect set of rules to create a novel. I think the appeal of this is that novels are unruly in many ways, and writing is unruly in many ways. But because there are these guidelines that everyone follows, you almost forget that there actually is a set structure.

Most novels are within a general range of length, they have a story with a beginning, middle and end and a climax, and they follow a lot of structure. Writing is, except for maybe a few experimental novels, organized within sentences, within paragraphs, within chapters. So I think there is a lot of inherent structure within a novel to begin with that you kind of take for granted. And the stuff that I was looking at was the trends that come through as a result of the structure that writers are already confined by.

The book is as much about finding trends in popular authors versus amateur writers, or Best Sellers versus classics, as it is about seeing the enormous range. There definitely are some authors—even in the -ly adverbs chapter—who are off the charts in the other direction. Even though they are anomalies, and I think it’s important to know what everyone else is doing versus the anomalies, there are some people who benefit from breaking the rules.

Which is so comforting. I want to go through some of my favorites among your findings, but before I do, did anything really surprise you, or could you tell me the few that you were most interested in?

I have a chapter on the differences between how male and female novelists write. I took in the ratio of the pronoun “he” versus the pronoun “she” in 100 books that different library associations and such have ranked the hundred best of all time. I chose 50 books by males and 50 books by females, and I wanted to know, are these classics generally 50/50 male/female, or are they, say, 10/90 in their pronoun rates? I found a definite skew: of female novelists of these classic books, 29 out of the 50 used “she” more than “he.” That’s a pretty modest split, close to 50/50. But in the classic books by men, 44 out of 50 used “he” more than “she.” And these are books that, obviously, are taught throughout high school and college as the classics. And they’re really extremely skewed male.

There are plenty of books that are 80 or 100 percent skewed male. And I don’t mean to say that a book can’t be all male and be even-handed. Obviously there are some books, like The Old Man and the Sea—that’s about one man on a boat, so it’s a bit hard to incorporate different characters and different pronouns. But I still was taken aback by how extreme the ratio was.
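
The pronoun comparison described above can be approximated in a few lines. This is a sketch under stated assumptions, not Blatt's method: it counts only the exact words "he" and "she" (not "him," "her," or "his"), and the function names are hypothetical.

```python
import re


def he_she_counts(text):
    # Split into whole words first so that "the" or "shed" never match.
    words = re.findall(r"[a-z]+", text.lower())
    return words.count("he"), words.count("she")


def he_share(text):
    """Fraction of he/she pronouns that are "he", or None if neither occurs."""
    he, she = he_she_counts(text)
    total = he + she
    return he / total if total else None


def books_favoring_he(texts):
    # Number of texts in which "he" appears more often than "she".
    favoring = 0
    for text in texts:
        he, she = he_she_counts(text)
        if he > she:
            favoring += 1
    return favoring
```

For scale, the figures quoted in the interview work out to 29/50 = 58 percent of the classics by women using "she" more than "he," against 44/50 = 88 percent of the classics by men using "he" more than "she."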

You found some other big gender differences, too. Much more frequently, female characters are assigned the verbs “shivered,” “wept,” “murmured,” “screamed” and “married.” Male characters “muttered,” “grinned,” “shouted,” “chuckled” and “killed”! [Laughs] Wow. So “chuckled”—I sort of feel that no one should write “chuckled” and was surprised it even came up. “Grinned” surprised me in a different way; I’d never thought of that as a particularly male verb. All in all, I was fascinated by that finding.

Right. And there was another example—with women writers, there’s not a huge difference between their male characters and female characters having a situation where they interrupt. But in books written by males, in classic literature and Best Seller literature, the most common time that a variation on “interrupt” comes up is when they’re describing a female.

[Laughs] Wow. Okay. I also really wanted to talk about your finding that books have indeed become simpler—there are truly more guilty pleasures today than there used to be. That confirms a lot of people’s suspicions. But you wrote that you don’t really see this as a problem.

So I took every single number-one New York Times Best Seller from 1960 to 2014 and applied a formula that’s used to calculate reading levels. I wanted to know, has there been any change over time? The Flesch-Kincaid formula just has two components. One is sentence length, and one is the number of syllables in a word. Each one gets a value and then those are added together to get the baseline reading level. So in the 1960s, the formula would suggest the number-one Best Sellers had a reading level of eight, which means you would have to have graduated eighth grade to be able to understand them.

In 2000 and 2010, the reading level was sixth grade. And if you look at the chart, not only does this trend continue over 50 years, but every 10 years it took one step down as the writing got simpler and simpler. Anything that was in the bottom 25 percent in the 1960s would actually have been the most complicated book in terms of sentence length and syllable length in the last four years. Sentences have got shorter and more direct and the number of syllables in the words has declined, too.

But I think there is a difference between the directness and simplicity of writing and kind of the thoughts that are expressed in it. Books like The Grapes of Wrath, The Sun Also Rises, and To Kill a Mockingbird all have reading levels of six and under. These are classic books that had a major breakthrough that are great reads, and that are simple. A lot of it comes down to style and audience. So it depends on what the goal for writing is. If your goal is to communicate interesting ideas and thoughts, I don’t think that this is too worrisome. If your idea of writing is that you want carefully woven sentences with a kind of poetry, then it may be a bit alarming. But I do think even within a simply worded book, you can still express every idea you’re trying to aim for.
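
For readers curious about the reading-level calculation mentioned above: the published Flesch-Kincaid grade formula weights its two components and subtracts a constant, giving 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies that formula. It is not Blatt's code. The syllable counter is a crude vowel-group heuristic assumed here for illustration, and the sentence splitter simply breaks on ".", "!" and "?", so results will differ somewhat from those of dedicated readability tools.

```python
import re


def count_syllables(word):
    # Assumption: one syllable per run of vowels, with a minimum of one.
    # This miscounts silent e's and some diphthongs.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_kincaid_grade(text):
    """Approximate U.S. school grade level of `text`."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
```

Short sentences built from short words score low, and very simple text can even score below zero. Longer sentences and longer words push the grade up, which is why a drop in both over the decades shows up as a falling reading level.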

You also found that in line with perhaps some people’s suspicions, including mine, James Patterson is the most clichéd writer. He uses many more clichés than any other popular writer you looked at. Do you hope that he’d read your book or discover your finding, and if so, what would you hope his response would be?

I discuss James Patterson quite a lot in this book just because he does come out at these extremes. Frankly, I would be honored if he read the book. And I think his takeaway would probably be not to take any of the cliché or reading-level findings as cautionary findings. Instead, maybe he would have the attitude of, if this is where writing is going, and this is what people expect to read, then there’s really no reason to hold back. Because at least over the last decade or so, he’s the most-sold author in the United States. I think the things he relies on and does without shame are in some ways very successful for him.

Good point. He’s done totally fine.

I think if he looked at the cliché chart, his response would probably be, “Why is everyone else not using this technique?”

[Laughs] Touché. You also found that once a writer has a hit, their subsequent books tend to get longer and longer. Why do you think that is?

I looked at 25 different writers since 1980 whose first novel was a finalist or winner of the Pulitzer, the National Book Award, or the NBCC Award. With about 72 percent of these, the author comes back with their second book and it’s longer. And then 44 percent come back with a book that’s much longer. So there’s definitely a bottleneck where if you’re a first-time author, an editor may not want to read a huge book that you’ve written; maybe it’s easier to get a smaller book published. But then you look at an author like Amy Tan—The Joy Luck Club was such a hit, and it was about 95,000 words. Since then, she’s written books that have 150,000 or 200,000 words, and it’s hard to repeat the success that she had, but still, nothing she’s written since has been as relevant. Even Harry Potter—by books four through seven, they’re two or three times as long as that first book.

I do conjecture that some of this is differences between how the author sees their own work and how the common person sees their own work: a much smaller book is easy to read. But if you’re Harry Potter, or Twilight, even, you know that by the time you’re writing the last few books of the series, there is a huge fan base that wants as much as possible, so there is less pressure to make it extremely tight. Instead, you can explore every avenue that you want.

I loved your finding that authors’ names take up a greater percentage of the cover of a book the more famous they get, which I’d never noticed before.

You get to a point where the author’s name is kind of a brand within itself. And it makes sense: you have to think, what is going to get people’s interest enough to get them to open the book? For some authors, like Nora Roberts, her name is consistently the selling point. This is my second book, the first one that I’ve written solo. And having my name, giant, on the cover would not really draw anyone in. There’s a reason that Nabokov’s name is bigger than mine.

* * *

Jessica Gross is a writer based in New York City.