September 5, 2015

Crunching Words in Great Number

In the June 4 issue, The Chronicle published an article on what Google Books could mean for researchers. We asked some leading scholars to comment on how "big data" will change the humanities. Here are their responses:

'We fool ourselves when we pretend that Google did this for us. Google is not a library.'

There are few recent developments in the U.S. academy more exciting than the rise of digital humanities. Hundreds of talented, bold scholars are unpacking the raw materials of traditional humanities in new and exciting ways. Universities, foundations, and federal funding agencies have recently realized the great potential of digital humanities scholarship. A bit farther behind, more traditional colleagues are just now beginning to consider ways to judge and support digital projects. We have a long way to go. But the signs are all positive that digital tools are poised to energize and promote academic work in ways beyond our imaginations.

But all this energy and excitement could cause us to stumble in our rush to do cool stuff too fast. The reliance on the research corpus generated by Google Book Search is one such hazard.

To do any sort of data- or computer-based analysis of any phenomenon, one should ensure that the research subject is uncontaminated, of high quality, and fairly comprehensive. Ideally, digital humanities projects should be exploiting a set of collections expressly designed for research, in open formats, selected and vetted by scholars themselves, and maintained in an archival system with projected viability and utility that would last well into the next century. Google Book Search is none of these things.

Google Book Search -- like everything Google does -- is amazing and useful. It's also -- like everything that Google does -- designed to benefit Google. That's the way it should be. That's what we should expect.

But we fool ourselves when we pretend that Google did this for us. Google is not a library. It is not run for scholars and by librarians. It is a big, important business that considers risk and reward in every decision. The proposed settlement over the massive copyright infringements committed by Google on behalf of universities just further demonstrates that Google is now on its way to becoming one of the world's largest bookstores.

It's a happy accident for all of us that Google is so rich today that it can throw so much money at projects that do benefit scholars. But nothing is free. And nothing that seems free is worth depending on for the long term.

Since 2004, a collection of universities, including my own, has been donating many millions of dollars' worth of their rare collections of riches to one of the wealthiest companies in the world. This certainly stands as one of the most absurd cases of corporate welfare that universities have ever been involved in. If we can manage to turn the research corpus into some outstanding scholarly work, then we can all give each other high-fives. But if that happens, it will be because of the work and imagination of a brilliant collection of scholars. And we can only imagine what such a group could do with a collection that was actually designed by librarians for scholars.

Siva Vaidhyanathan
Associate Professor of Media Studies and Law, University of Virginia
Author of the forthcoming book The Googlization of Everything


'No humanities research ... has ever been done "alone"'

As to digital technology and its relation to humanities scholarship and education, I want to say: let a thousand flowers bloom. Data mining may prove a useful device in our longstanding effort to understand our cultural inheritance.

But please, let's not go there forgetting what scholarship and education have always involved. I find it hard to believe that "Mr. Moretti and his colleagues" would agree that "One lesson they've learned is you can't do this humanities research the old way: like a monk, alone." No humanities research and scholarship has ever been done "alone," as a glance at the footnotes and bibliographies that typically come with humanities research publications pretty clearly shows. (And as to that figure of the solitary monk, one might usefully recall the fundamentally collaborative nature of monastic orders and their individual communities. Those astonishing products of the great scriptoria were not only the works of individual genius.)

And another thing. If "computational methods" for studying literary and cultural work hold out a certain scholarly promise, and they do, the minute particulars of the objects—their social and material character—remain indispensable. Imagine what we would have forgotten about ourselves, what we would never be able to know, if our books were gone and we had only digital simulations.

And finally: if Google Books has "changed the landscape" of our scholarly perception, and it has, perhaps its greatest legacy will be the spur it gave to the educational community to "do it right"—to create a virtual depository of the kind Robert Darnton and others like him have been pleading for: a virtual collection of our cultural heritage that actually meets the needs of scholarship and public education. At least we can hope that will be its great legacy.

Jerome McGann
Professor of English
University of Virginia


'How can we pretend to be surprised?'

Oh come on—one could have predicted a recurrence of the (still) false opposition between quantitative and interpretive methods known during the 60s as "the structuralist controversy." New Historicism has grown old; it has settled into a senescence lacking both interpretive ingenuity and archival depth and freshness. How, then, can we pretend to be surprised at the apparently "meteoric" success of a method that feeds both the fetishism of the archive fostered by New Historicism and our more recent enchantment with information technology and some of the discoveries of neuroscience? What makes Moretti's enterprise so compelling, however, is clearly Moretti himself. The wizard behind the curtain of Google literary studies, he brings a combination of extensive reading, intuitive genius, and rhetorical mastery to the act of selecting just the right details to indicate a major change of narrative pattern. As a unique synthesis of the quantitative and interpretive wings of literary studies, his method cannot be said to represent either one or the other alone. The literary field has periodically been invigorated by just such interdisciplinary incursions. I say we embrace this newly available information and use it to develop interpretive strategies capable of rethinking our field for the new century.

Nancy Armstrong
Professor of English
Duke University


'Measurement without theory never tells us much'

The history of science is in no small part the history of instruments—better and better (and, usually more and more expensive) gadgets and techniques employed in the service of increasingly precise measurement. Telescopes and particle accelerators allow us to see almost to the beginning of the universe, microscopes resolve the unimaginably small, and supercomputers find order in vast quantities of data. The humanities also use technology—classicists were early adopters of photography, and every new technology of imaging has opened up texts that were theretofore invisible. Indeed, literary theory itself can be thought of as a technology, similar to mathematical technique, in that both provide powerful ways of analyzing and thinking about their respective domains.

Lest humanists be too worried that "mere" computation will take over, we should remember that measurement without theory never tells us much; good academic work always requires scholarly skill and creativity. Moreover, successful computation in the humanities will require that the corpus of texts and other objects of study be developed by scholars and institutions that serve scholars. It will be Stanford, the HathiTrust, and other library-based entities, not Google, that will do the painstaking work of assuring the integrity of the data.

An interesting problem for humanists will be learning how to apportion credit for work that relies on diversity of expertise in teams of scholars. But I am hopeful that this is exactly the kind of interpretative work that humanists are especially suited to do well.

Paul N. Courant
Librarian and Dean of Libraries
University of Michigan


'The true payoff will come when the collaborators ... read a set of works closely'

The time is long overdue for literary scholars to start working collaboratively. I think, though, that both proponents and opponents of Franco Moretti's ideas have too often treated "distant reading" purely in opposition to "close reading," as though one precludes the other. I suspect that the Stanford lab's greatest contributions will come through the perspective it will give us for better readings of particular works or defined sets of works. By mining the Google database, it should be possible to trace literary relations in a whole new way: to show who was the first person to use an influential term or to highlight a theme, and to find verbal patterns that will help reveal the real literary relations whereby the few novelists we still read emerged from the background noise of the genre fiction of their day. The true payoff will then come when the collaborators sit down together to read a set of works closely, both canonical works and forgotten books their research has led them to focus on, yielding a more solid middle-distance reading than we can reach either by close or distant reading alone.

David Damrosch
Professor of Literature
Harvard University


'We should embrace the promise of the moment'

Fortuitously, the invitation to participate in this e-book exchange arrived as I was reading through Frances Yates's "The Art of Memory." Yet again, it seems, a new technology is destabilizing longstanding relations among textuality, mind, and cosmos. We should embrace the promise of this moment. Who can resist the potential for understanding—or the shift in what "understanding" may come to mean—once memory has expanded to contain twelve million volumes?

We will surely look back on the current resistance to the e-book wistfully. After all, to the manuscript scribe, the information omitted in printing must have seemed a similarly appalling loss, and the digitized humanities research of the 1960s and 1970s has not provided a very promising model. We must hope that Professor Moretti and his students—and their students—will be able to formulate illuminating questions about the literary canon and interpret the information computers provide in a meaningful fashion.

For those who fear that the Stanford initiative will make our painstakingly cultivated practices of reading, criticism, and theory seem like an antiquated rhetorical mysticism, we should realize that that is exactly what they are. How wonderful if scholars could work together to understand literature, turning technology to humanistic advantage in the process.

Wendy Steiner
Professor of English
University of Pennsylvania


'Close reading ... has been joined by two other reading modes'

It's time to change the view that close reading gives literary studies its disciplinary identity.

Close reading will not disappear (nor should it!), but it has been joined by two other reading modes central to contemporary research: hyper reading and machine reading. Hyper reading is human screen-based, reader-directed, computer-assisted reading; machine reading is human-assisted algorithmic reading. Hyper reading includes skimming (reading quickly to get the gist), scanning (looking for a particular item), and juxtaposing (putting several texts side by side, as in a Google search). Moretti lumps hyper and machine reading together in "distant reading," but it is helpful to distinguish between what humans do with computer help, and what computers do with human help. Focusing on hyper and machine reading opens the field to work such as Moretti's and Matt Jockers', and it allows us to see their work as a continuum of the kind of reading literary scholars already practice. Our disciplinary identity, in this view, comes from rich articulations of the intersections between pattern and meaning, which can happen by reading one text closely, by surveying a landscape of texts in hyper reading, and by analyzing thousands of texts with machine algorithms. They all count!

N. Katherine Hayles
Professor Emerita of English
University of California at Los Angeles


'The contemporary is ... haunted by the digital'

In a collection of essays I've been writing called "The Classic and the Contemporary," a startling connection has emerged.

The Greek and premodern classics were produced before the Gutenberg era, under conditions of oral transmission or circulation in handwritten manuscripts. They predated printed, mass-produced, and mass-circulated books—or existed in a proximate relationship to them.

Contemporary writing exists on the cusp of a different age—ushered in not just by computers, which can look crude in retrospect, but by high-speed wireless Internet and multiple personal devices from smart cells to iPads. The transition to online publication and distribution now under way may never eliminate printed texts. But it will surely challenge their dominance. Will it undermine the stability of multiple copies, in multiple places, that private and public libraries have offered in the past? It's hard to say—rapid change being very much the condition of technology today.

In short, the classics and the contemporary bracket or frame the post-Gutenberg era of mass-produced, mass-circulated, and mass-read printed books. If Greek and premodern classics evoke the direct, unmediated conditions of oral narrative, the contemporary is haunted, though not yet replaced, by the digital.

Marianna Torgovnick
Professor of English and Director of Duke in New York Arts and Media
Duke University


1. duffybjp - June 04, 2010 at 05:04 pm

Aren't we in danger of looking through the wrong end of the Google telescope? For millennia, human beings could ponder the infiniteness of detail in the night sky with our naked eyes. Inspirational as that was, it took the invention of the telescope to permit a detailed view of an infinitesimally small fraction of what our eyes saw to enlarge our grasp of what the nature and totality of the heavens might actually be. Will 20,000-30,000 19th Century British novels prove to have evidence of "viral" composition heretofore indiscernible to individual scholars who haven't had world enough and time to spot it? Very likely so, but that viral composition will only be a facet of what culture and society were in the time and place of their origin, subjects already meaningfully and impressively, and never exhaustively, assessed by the old technology of scholarship. It's upon the analogue creativity of exceptional individual creative imaginations, much less so than upon the infinite regression of influences on them, which much scholarship has rightly focused. 19th Century Britain produced novels great and inconsequential but each, one at a time. Cookbooks reduce the meals of a culture to a list of ingredients. Assembling ingredients for one's own table can't capture the experience of the resulting meal which each of the constituents of a given gastronomic culture experienced. The digitization of all books runs the risk of leaving inquiring minds faced with the conditions our ancestors faced, whether as individuals or in community: something, perhaps, commensurate to our capacity to wonder, but something vulnerable to the rich but fantastical surmises of mythologists.

2. bitnetted - June 04, 2010 at 06:31 pm

The digital tools as described here are finding aids, hints as to where to look next, alerting us to patterns that might not surface as we work within pre-established boundaries of genre and form. Reproducing and refining tidal canonical generic divisions is an important step towards validation of these sorts of methodologies (though of course we don't want to get caught in the trap of producing tools that yield the results we anticipated in creating them) but what would be far more compelling is discovering how such methods cross genres and forms. What (structurally, linguistically, materially) distinguishes the newspaper of c. 19, the essay, the travel narrative, the novel, the poem, the personal letter, the history, the catalogue, the autobiography, the account book? What binds them together as being of an era, a people, a nation, a mindset (or an author)? Are there hidden patterns that we simply haven't conned? And what about when we start crossing into visual, architectural, landscape, spatial, geographic, algorithmic, market, demographic, and other forms of knowledge production, representation, and dissemination? How might we write our cultural histories differently if we weren't bounded by received packets of analysis? Machine-aided pointers towards disparate volumes might yet reveal hidden webs of connection and association that don't reproduce our disciplinary boundaries. They could then be applied towards all sorts of relations; figuring out what and how it means could be the next task of the humanities scholar.

However, laying that groundwork will take more than text mining, and certainly more than the subset of textual materials on offer. This is one small part of a great surfacing of influence networks made possible by computational methods, not only textual, but also visual, topographical, social...

3. mattjockers - June 07, 2010 at 12:29 pm

It's a shame that all the discussion fueled by the main article isn't taking place on one page.  There has been some good discussion in the comments section beneath the main article and also over at the Valve.

I'm concerned about some of the comments that seem to be pushing a point of view that Moretti and I are somehow suggesting the abandonment of close reading.  Over on the Valve site, Josh Landy writes: "it's a great shame that the recent re-advocacy of "distant reading" (which has always been a good idea) has been accompanied by a needless and counterproductive polemic against close reading."  I'm not sure who is opposing "close-reading," but it is certainly not me.

I have written elsewhere about what I call "macroanalysis," my term for the methodological approach to Moretti's idea of distant reading.  The type of analysis I am talking about is in some ways similar to macroeconomics.  Microeconomics studies the economic behavior of individual consumers and individual businesses.  Macroeconomics, however, is about the study of the entire economy.  It tends toward enumeration and quantification.  While there is an inherent need for understanding the economy at the micro level, in order to contextualize the macro-economy, macroeconomics does not directly involve itself in the specific cases, choosing instead to see the cases in the aggregate, looking to those elements of the specific cases that can be generalized, aggregated, and quantified.

Micro-oriented approaches to literature, interpretive, close readings of literature, remain fundamentally important, and it is the exact interplay between the macro and micro scale that promises a new, enhanced, and (dare I say it) perhaps even better understanding of the literary record.  The two approaches work in tandem and inform each other.  Human interpretation of the "data," whether it be mined at the macro or micro scale, remains essential.  While the methods of enquiry, of evidence gathering, are different, they are not antithetical, and they share the same ultimate goal of informing our understanding of the literary record, be it writ large or small.  At its most basic, then, the macroanalytic approach we are employing is simply another method of gathering information about texts.  The information is different from what is derived via close reading, but it is not of lesser or greater value to scholars for being such.

There is no antithesis, except perhaps among those possessed of knees that jerk in spasms whenever the words "computation," "digital," or "statistics" are mentioned in the context of the humanities.

--Matthew Jockers

