The Chronicle of Higher Education
Today's News
Friday, August 25, 2006

U. of California Will Provide Up to 3,000 Books a Day to Google for Scanning, Contract States

By SCOTT CARLSON

A mere two months after the University of California begins its book-digitization project with Google, the university could be providing the search company with a whopping 3,000 books a day for scanning. That nugget, and many others, can be found in a confidential contract that allowed California to join Harvard and Stanford Universities, the University of Michigan at Ann Arbor, and the University of Oxford, as well as the New York Public Library, in the search-engine company's elaborate and controversial library-digitization effort.

The contract was released in part as a response to an open-records request from The Chronicle.

According to the document, the university will provide at least 2.5 million volumes to Google for scanning, starting with 600 books a day and ratcheting up over time to 3,000 volumes a day. Materials pulled for scanning will be back on the shelves of their libraries within 15 days.

The contract offers clues to the scale of Google's ambition. "It is simply stunning that they can work with 3,000 books a day," said Prudence S. Adler, associate executive director of the Association of Research Libraries, after reviewing the contract.

Daniel Greenstein, director of the California Digital Library, who helped set up the deal, said Google had committed early on to a core value for the university: public access to the public-domain materials at no cost.

"They said, 'As long as we are alive as a company, or successors are alive using this file, we will make it available for free,'" he said. "I've never seen this from anybody. That was their opening gambit."

Under the contract, the university agrees to pay for pulling and shelving the books, bandwidth and hardware to store digital copies, rooms in which to do the digitization, and transportation of materials to those rooms, among other things. Google will cover its own labor, hardware and software to do the scanning, space in which to do scanning, and transportation to its spaces, along with other costs.

Both the university and Google will get digital copies of the scanned works, but there are some restrictions on how the university can use its copies. The university can offer the digital copy, whole or in parts, "as part of services offered to the university library patrons." But the university must prevent users from downloading portions of the digital copies and stop automated scanning of the copies by, for example, other search engines.

Entire works not covered under copyright can be distributed to scholars and students for research purposes, but there are limits on in-copyright material. The university retains a right to distribute no more than 10 percent of the collection to other libraries and educational institutions for noncommercial research. Before receiving the digital copies of works, other institutions have to enter a written agreement with Google regarding the use of the copies and provide indemnity to Google. The company has already been sued by a handful of publishers over its library-digitization project (The Chronicle, October 28, 2005).

The contract also reveals the project's branding opportunities for Google. According to the agreement, any time the university makes a digital copy of a book publicly available, the university has to identify the works as "Digitized by Google, or in a substantially similar manner."

Officials at Google provided few insights into the contract. The restrictions placed on the digital files, particularly those covered by copyright, were requested by both Google and the University of California, Adam M. Smith, the group business-product manager at Google, said in an e-mail message.

Mixed Reactions

Observers of Google's reach into academic libraries found elements in the agreement to applaud and to condemn.

Ms. Adler, of the research-libraries group, said the agreement to digitize millions of books over six years, and offer them free to the public, "greatly enhances the ability to mine and access these collections and opens up new venues for research."

She also pointed to "very clear provisions in here that will protect publishers that are concerned that there will be a release of the digital copies of their published works," including a requirement to prevent patrons from downloading and distributing the scanned files.

But some publishers have been worried about how libraries might use their digital copies from Google. Sanford Thatcher, director of Pennsylvania State University Press and president-elect of the Association of American University Presses, said that the agreement gave the university too much leeway.

"California could set itself up as a facility for providing e-reserves to all land-grant institutions," he said.

Publishers have worried that libraries are distributing too much digitized material through electronic reserves, defying copyright and cutting into publishers' income. Recently, the Association of American Publishers sent a letter to the University of California, complaining about the use of e-reserves at its San Diego campus.

The language allowing distribution in the Google contract "opens the door to the kind of thing that we are worried about with e-reserves," Mr. Thatcher said.

Others fretted that the University of California was giving too much to Google. Brewster Kahle, co-founder of the nonprofit Internet Archive, said the contract was another step in the "balkanization" of the digital library system. He said that while each of the institutions that have partnerships with Google will get digitized versions of their own books, they will not be able to share those versions to build a digital library. Only Google will have the most comprehensive collection, he said.

"We want a public library system in the digital age, but what we are getting is a private library system controlled by a single corporation," he said.

Mr. Kahle forged a partnership with the University of California in forming the Open Content Alliance, which also includes Yahoo, Microsoft, and institutions such as Columbia University, the Johns Hopkins University, and the University of Toronto. The alliance, which has made open access a core component of its mission, is scanning only out-of-copyright materials.

"Microsoft, Yahoo, the Sloan Foundation, and dozens of libraries are funding a public and open system, but this is made more difficult by UC's agreeing to spend millions of taxpayers' dollars to benefit a single corporation's interest in building a private library," he said. "Needless to say, I am disappointed and hope it does not undermine others' interest in pursuing broad public benefit."

Mr. Greenstein said that the University of California was digitizing at full capacity with the Open Content Alliance, and would continue to do so. But one has to look at the Google deal from the university's point of view, he said. With the Open Content Alliance, "I think last month we did 3,500 books. ... Google is going to do that in a day. So, what do you do?"

"I understand [Google's] ends are commercial," he said. "But it's one of these things where their business model, their interests, and our interests align around public access for the public domain forever and for free."