Ngram Diversity

The Chronicle of Higher Education, the New York Times, the Harvard Gazette, and Science have all reported on the news from Google. The company has made publicly available a database of all the words (in context) in the 5.2 million books it has digitized. A team of researchers, led by Erez Lieberman Aiden and Jean-Baptiste Michel, both Harvard fellows, helped assemble the data and is publishing a paper in Science that demonstrates how it can be used to illuminate cultural history.

One especially inviting part of the project is Google’s public release of its “Books Ngram Viewer,” which allows the user to plug in words and generate a graph of their frequency in print since 1500. Because the software allows you to search multiple terms at once, you can produce graphs of the contrasting frequency of related words. Since 1500, “rights” has been on the rise, while “honor” has waned.

“Fame” peaked in the first half of the 18th century, “fun” in the second; “knowledge” has always outstripped “intelligence,” except during the final decade of the 16th century, Shakespeare’s most productive years.

The Harvard scholars who helped create this marvelous tool foresee a new field of “culturomics” that will infuse the humanities with quantitative research opportunities. It is too soon to assess that claim, but not too soon to discover a playground of words and ideas.

When I was writing Diversity: The Invention of a Concept in 2001 and 2002, it was heavy labor trying to find the early English uses of the word and what it meant. “Diversity” was originally a way of describing political unrest and civil strife. Though it gradually moved towards a blander sense of mere variety, it long retained a pejorative tone. But it was a minor word in the English vocabulary until the early 19th century, when it was picked up by biologists among others as a way to characterize the burgeoning knowledge of the natural world. Darwin eventually gave “diversity” its real heft as key to his theory of natural selection.

Those were hard-won observations in 2002. They are a lot easier today. For most of its history as a word in English, “diversity” described unwelcome realities. In 1669, one author referred to the “diversity of factions,” and in 1677, another wrote of “the misery arising to Men from Diversity of Religions.” But the same century presented other authors taking positive note of the “diversity of gifts” (1687) and “diversity of delights” (1671). The word diversity in these latter cases, however, still signals to the authors something to be explained. How and why does God give us diverse gifts? Why do we delight in such different things?

Linguistically, we are a very long way from the slogans that festoon contemporary New York City, such as “Diversity is our strength.” According to Google, we began to “celebrate diversity” (in print anyway) only in 1993. “Diversity,” of course, has become a name for an unalloyed good in most of American higher education. It doesn’t hurt to be reminded that the term once marked out explicitly what is now only implicit. “Diversity” is still the realm of competing interests, group resentments, and difficult-to-resolve and perhaps unresolvable tensions. Celebrating it may help us ignore these aspects, but they don’t go away just because we have reversed the emotional polarity of the word.

Before long, Google’s new tool may be as taken for granted as its extraordinary Google Earth is now. Dive in and discover it while it is fresh. The instant graphs are a delight, regardless of what comes from “culturomics.”
