Crunching Words in Great Number

June 03, 2010

In the June 4 issue, The Chronicle published an article on what Google Books could mean for researchers. We asked some leading scholars to comment on how "big data" will change the humanities. Here are their responses:

'We fool ourselves when we pretend that Google did this for us. Google is not a library.'

There are few recent developments in the U.S. academy more exciting than the rise of digital humanities. Hundreds of talented, bold scholars are unpacking the raw materials of traditional humanities in new and exciting ways. Universities, foundations, and federal funding agencies have recently realized the great potential of digital humanities scholarship. A bit farther behind, more traditional colleagues are just now beginning to consider ways to judge and support digital projects. We have a long way to go. But the signs are all positive that digital tools are poised to energize and promote academic work in ways beyond our imaginations.

But all this energy and excitement could cause us to stumble in our rush to do cool stuff too fast. The reliance on the research corpus generated by Google Book Search is one such hazard.

To do any sort of data- or computer-based analysis of any phenomenon, one should ensure that the research subject is uncontaminated, of high quality, and fairly comprehensive. Ideally, digital humanities projects should be exploiting a set of collections expressly designed for research, in open formats, selected and vetted by scholars themselves, and maintained in an archival system with projected viability and utility that would last well into the next century. Google Book Search is none of these things.

Google Book Search -- like everything Google does -- is amazing and useful. It's also -- like everything that Google does -- designed to benefit Google. That's the way it should be. That's what we should expect.

But we fool ourselves when we pretend that Google did this for us. Google is not a library. It is not run for scholars and by librarians. It is a big, important business that considers risk and reward in every decision. The proposed settlement over the massive copyright infringements committed by Google on behalf of universities just further demonstrates that Google is now on its way to becoming one of the world's largest bookstores.

It's a happy accident for all of us that Google is so rich today that it can throw so much money at projects that do benefit scholars. But nothing is free. And nothing that seems free is worth depending on for the long term.

Since 2004, a collection of universities, including my own, have been donating many millions of dollars' worth of their rare collections of riches to one of the wealthiest companies in the world. This certainly stands as one of the most absurd cases of corporate welfare that universities have ever been involved in. If we can manage to turn the research corpus into some outstanding scholarly work, then we can all give each other high-fives. But if that happens, it will be because of the work and imagination of a brilliant collection of scholars. And we can only imagine what such a group could do with a collection that was actually designed by librarians for scholars.

Siva Vaidhyanathan
Associate Professor of Media Studies and Law, University of Virginia
Author of the forthcoming book The Googlization of Everything


'No humanities research ... has ever been done "alone"'

As to digital technology and its relation to humanities scholarship and education, I want to say: let a thousand flowers bloom. Data mining may prove a useful device in our longstanding effort to understand our cultural inheritance.

But please, let's not go there forgetting what scholarship and education have always involved. I find it hard to believe that "Mr. Moretti and his colleagues" would agree that "One lesson they've learned is you can't do this humanities research the old way: like a monk, alone." No humanities research and scholarship has ever been done "alone," as a glance at the footnotes and bibliographies that typically come with humanities research publications pretty clearly shows. (And as to that figure of the solitary monk, one might usefully recall the fundamentally collaborative nature of monastic orders and their individual communities. Those astonishing products of the great scriptoria were not only the works of individual genius.)

And another thing. If "computational methods" for studying literary and cultural work hold out a certain scholarly promise, and they do, the minute particulars of the objects—their social and material character—remain indispensable. Imagine what we would have forgotten about ourselves, what we would never be able to know, if our books were gone and we had only digital simulations.

And finally: if Google Books has "changed the landscape" of our scholarly perception, and it has, perhaps its greatest legacy will be the spur it gave to the educational community to "do it right"—to create a virtual depository of the kind Robert Darnton and others like him have been pleading for: a virtual collection of our cultural heritage that actually meets the needs of scholarship and public education. At least we can hope that will be its great legacy.

Jerome McGann
Professor of English
University of Virginia


'How can we pretend to be surprised?'

Oh come on—one could have predicted a recurrence of the (still) false opposition between quantitative and interpretive methods known during the 60s as "the structuralist controversy." New Historicism has grown old; it has settled into a senescence lacking both interpretive ingenuity and archival depth and freshness. How, then, can we pretend to be surprised at the apparently "meteoric" success of a method that feeds both the fetishism of the archive fostered by New Historicism and our more recent enchantment with information technology and some of the discoveries of neuroscience? What makes Moretti's enterprise so compelling, however, is clearly Moretti himself. The wizard behind the curtain of Google literary studies, he brings a combination of extensive reading, intuitive genius, and rhetorical mastery to the act of selecting just the right details to indicate a major change of narrative pattern. As a unique synthesis of the quantitative and interpretive wings of literary studies, his method cannot be said to represent either one or the other alone. The literary field has periodically been invigorated by just such interdisciplinary incursions. I say we embrace this newly available information and use it to develop interpretive strategies capable of rethinking our field for the new century.

Nancy Armstrong
Professor of English
Duke University


'Measurement without theory never tells us much'

The history of science is in no small part the history of instruments—better and better (and, usually, more and more expensive) gadgets and techniques employed in the service of increasingly precise measurement. Telescopes and particle accelerators allow us to see almost to the beginning of the universe, microscopes resolve the unimaginably small, and supercomputers find order in vast quantities of data. The humanities also use technology—classicists were early adopters of photography, and every new technology of imaging has opened up texts that were theretofore invisible. Indeed, literary theory itself can be thought of as a technology, similar to mathematical technique, in that both provide powerful ways of analyzing and thinking about their respective domains.

Lest humanists be too worried that "mere" computation will take over, we should remember that measurement without theory never tells us much; good academic work always requires scholarly skill and creativity. Moreover, successful computation in the humanities will require that the corpus of texts and other objects of study be developed by scholars and institutions that serve scholars. It will be Stanford, the HathiTrust, and other library-based entities, not Google, that will do the painstaking work of assuring the integrity of the data.

An interesting problem for humanists will be learning how to apportion credit for work that relies on diversity of expertise in teams of scholars. But I am hopeful that this is exactly the kind of interpretative work that humanists are especially suited to do well.

Paul N. Courant
Librarian and Dean of Libraries
University of Michigan


'The true payoff will come when the collaborators ... read a set of works closely'

The time is long overdue for literary scholars to start working collaboratively. I think, though, that both proponents and opponents of Franco Moretti's ideas have too often treated "distant reading" purely in opposition to "close reading," as though one precludes the other. I suspect that the Stanford lab's greatest contributions will come through the perspective it will give us for better readings of particular works or defined sets of works. By mining the Google database, it should be possible to trace literary relations in a whole new way: to show who was the first person to use an influential term or to highlight a theme, and to find verbal patterns that will help reveal the real literary relations whereby the few novelists we still read emerged from the background noise of the genre fiction of their day. The true payoff will then come when the collaborators sit down together to read a set of works closely, both canonical works and forgotten books their research has led them to focus on, yielding a more solid middle-distance reading than we can reach either by close or distant reading alone.

David Damrosch
Professor of Literature
Harvard University


'We should embrace the promise of the moment'

Fortuitously, the invitation to participate in this e-book exchange arrived as I was reading through Frances Yates's "The Art of Memory." Yet again, it seems, a new technology is destabilizing longstanding relations among textuality, mind, and cosmos. We should embrace the promise of this moment. Who can resist the potential for understanding—or the shift in what "understanding" may come to mean—once memory has expanded to contain twelve million volumes?

We will surely look back on the current resistance to the e-book wistfully. After all, to the manuscript scribe, the information omitted in printing must have seemed a similarly appalling loss, and the digitized humanities research of the 1960s and 1970s has not provided a very promising model. We must hope that Professor Moretti and his students—and their students—will be able to formulate illuminating questions about the literary canon and interpret the information computers provide in a meaningful fashion.

As for those who fear that the Stanford initiative will make our painstakingly cultivated practices of reading, criticism, and theory seem like an antiquated rhetorical mysticism, we should realize that that is exactly what they are. How wonderful if scholars could work together to understand literature, turning technology to humanistic advantage in the process.

Wendy Steiner
Professor of English
University of Pennsylvania


'Close reading ... has been joined by two other reading modes'

It's time to change the view that close reading gives literary studies its disciplinary identity.

Close reading will not disappear (nor should it!), but it has been joined by two other reading modes central to contemporary research: hyper reading and machine reading. Hyper reading is human screen-based, reader-directed, computer-assisted reading; machine reading is human-assisted algorithmic reading. Hyper reading includes skimming (reading quickly to get the gist), scanning (looking for a particular item), and juxtaposing (putting several texts side by side, as in a Google search). Moretti lumps hyper and machine reading together in "distant reading," but it is helpful to distinguish between what humans do with computer help, and what computers do with human help. Focusing on hyper and machine reading opens the field to work such as Moretti's and Matt Jockers', and it allows us to see their work as a continuum of the kind of reading literary scholars already practice. Our disciplinary identity, in this view, comes from rich articulations of the intersections between pattern and meaning, which can happen by reading one text closely, by surveying a landscape of texts in hyper reading, and by analyzing thousands of texts with machine algorithms. They all count!

N. Katherine Hayles
Professor Emerita of English
University of California at Los Angeles


'The contemporary is ... haunted by the digital'

In a collection of essays I've been writing called "The Classic and the Contemporary," a startling connection has emerged.

The Greek and premodern classics were produced before the Gutenberg era, under conditions of oral transmission or circulation in handwritten manuscripts. They predated printed, mass-produced, and mass-circulated books—or existed in a proximate relationship to them.

Contemporary writing exists on the cusp of a different age—ushered in not just by computers, which can look crude in retrospect, but by high-speed wireless Internet and multiple personal devices from smart cells to iPads. The transition to online publication and distribution now under way may never eliminate printed texts. But it will surely challenge their dominance. Will it undermine the stability of multiple copies, in multiple places, that private and public libraries have offered in the past? It's hard to say—rapid change being very much the condition of technology today.

In short, the classics and the contemporary bracket or frame the post-Gutenberg era of mass-produced, mass-circulated, and mass-read printed books. If Greek and premodern classics evoke the direct, unmediated conditions of oral narrative, the contemporary is haunted, though not yet replaced, by the digital.

Marianna Torgovnick
Professor of English and Director of Duke in New York Arts and Media
Duke University