Google is getting antsy.
Even as a lawsuit over its book-digitization project remains up in the air, the search giant has quietly started reaching out to universities in search of humanities scholars who are ready to roll up their sleeves and hit the virtual stacks.
The company is creating a “collaborative research program to explore the digital humanities using the Google Books corpus,” according to a call for proposals obtained by The Chronicle. Some of Google’s academic partners say the grant program marks the company’s first formal foray into supporting humanities text-mining research.
The call went out to a select group of scholars, offering up to $50,000 for one year. Google says it may choose to renew the grants for a second year. It is not clear whether anybody can apply for the money, or just the group that got the solicitation.
To date, Google has digitized more than 12 million books in over 300 languages, an enormous increase in digital content that Google says will open up “new avenues of literary research.” Libraries at Stanford University, the University of Michigan, and the University of Oxford are among those that have collaborated on Google’s controversial project.
It’s unclear whether professors awarded grants under Google’s humanities program will be able to work with newer digitized volumes still protected by copyright. In 2005 the Authors Guild filed a lawsuit accusing the company of “massive copyright infringement.” A proposed settlement of that suit, which has drawn criticism from the U.S. Department of Justice, awaits court approval.
Even without the protected works, though, the corpus available to researchers “would still be enormous,” says Matthew L. Jockers, a lecturer and academic-technology specialist in Stanford’s English department. “Far, far bigger than anything we have had access to in the past.”
Mr. Jockers and Franco Moretti, an English professor, have organized a digital-humanities-research group at Stanford whose members plan to apply for a Google grant to conduct text-mining work that goes “beyond anecdotal studies of literary history.”
Literature is one of eight “disciplines of interest” that Google has identified for its program. The others are linguistics, history, classics, philosophy, sociology, archaeology, and anthropology.
The effort seems largely focused on building tools to comb and improve Google’s digital library, whose book-search metadata—dates and other search-assisting information—one academic researcher calls a “train wreck.” These are some of the sample projects that Google lists in its call for proposals:
• Building software for tracking changes in language over time.
• Creating utilities to discover books and passages of interest to a particular discipline.
• Developing systems for crowd-sourced corrections to book data and metadata.
• Testing a literary or historical hypothesis through innovative analysis of a book.
Breakthrough for Digital Humanities?
In part, the program reflects Google’s self-interest. One of the company’s imperatives is to encourage people to use its collections in creative ways, so that those collections “become essential parts of daily life,” says Siva Vaidhyanathan, an associate professor of media studies and law at the University of Virginia and author of the forthcoming book The Googlization of Everything. But he argues that Google’s support could also be a breakthrough for digital humanities.
Over the years, digital-humanities scholars have received sporadic support from organizations like the Andrew W. Mellon Foundation and the National Endowment for the Humanities. For example, the “Digging Into Data Challenge,” recently organized by the NEH and other agencies internationally, is supporting research teams whose work includes mining 53,000 18th-century letters to analyze how the effects of the Enlightenment appear in the correspondence of people in different occupations. Digital-humanities research, says Mr. Vaidhyanathan, is “full of great ideas and short on the tools needed to execute these great ideas,” largely because of a lack of money.
But digital humanists will need to be wary of becoming dependent on Google, whether for research money or for the raw material of their work, Mr. Vaidhyanathan cautions. “The last thing we need is such a close relationship that the tools that scholars develop only work with Google-supplied data sets,” he says. “That would be a tragic lock-in.”
Submissions for the grants are due by April 15. Google is keeping the process so low-profile that details don’t seem to be available even on its own research Web site.
And it does not appear to be wide open. According to the call for ideas obtained by The Chronicle, Google is requesting proposals from “select researchers and faculty members.” That presumably includes Google’s book-digitization partners within academe. (Administrators at two of those partners, Michigan and the University of Illinois at Urbana-Champaign, confirmed that Google had solicited them.) Daniel J. Clancy, engineering director of Google Book Search, says the grant money will support about eight researchers. He referred further questions to a colleague, Jon Orwant, who was unavailable for comment.
Another unanswered question is what relationship this program will have to Google’s long-term plans for enabling research on its digital books. The proposed legal settlement would permit the use of millions of in-copyright works owned by universities for “nonconsumptive” computational research, meaning large-scale data analysis that is not focused on reading texts. One or two research centers would be created for this work, and Google would back the effort with $5-million.
By comparison, $50,000-maximum research awards are fairly small scale. “The difference might be that the center might support longer-running, larger-scale projects—and probably more collaborative,” says John M. Unsworth, dean of the Graduate School of Library and Information Science at Illinois. “The scale of the Google research awards suggests a single-investigator model.”
So should researchers worry about being co-opted by Google?
“It’s not like the tobacco industry sponsoring cigarette research,” Mr. Jockers says in an e-mail message. “Google’s profit model vis-à-vis the book-scanning project is pretty clear. ... That Google will also sponsor humanities research and give researchers access to the corpus does not, in my opinion, create any of the conflicts of interest that one finds in other kinds of sponsored research.”