The Chronicle of Higher Education
Today's News
Monday, October 3, 2005

Yahoo Works With 2 Academic Libraries and Other Archives on Project to Digitize Collections

By SCOTT CARLSON and JEFFREY R. YOUNG


Another search engine company has joined with academic libraries to digitize large collections of books to make them easily searchable online. Yahoo Inc. has teamed up with the University of California, the University of Toronto, and several archives and technology companies on a project that could potentially bring the complete texts of millions of volumes into digital form.

Yahoo officials say that the project is not a response to Google's partnership with five major research libraries to scan millions of books, and that some planning for the Yahoo project was under way before Google announced its plans last December.

The new effort is called the Open Content Alliance, and it was conceived in part by Brewster Kahle, director of the Internet Archive, a nonprofit digital library. The Internet Archive will do much of the actual scanning for the project, using a process it has developed in recent years. Libraries involved in the project can have their books scanned by the Internet Archive for 10 cents per page, which leaders of the project say is far below the standard price of scanning.

Other participants in the project are Adobe, the European Archive, the National Archives of England, O'Reilly Media, and Hewlett-Packard Labs. The project hopes to attract more libraries and partners, however, as well as additional financial support.

Leaders of the project stressed that no books that are under copyright will be scanned unless the copyright holders give explicit permission. In that way the project hopes to avoid the controversy raised by Google's plan to scan nearly every book at the library of the University of Michigan at Ann Arbor, even works under copyright. Publishers' and authors' groups have said that Google must obtain permission before scanning copyrighted books, even if it offers only short excerpts of their content, as it plans to do.

In fact, one publishing group that has been critical of Google's project, the Association of Learned and Professional Society Publishers, has endorsed the Yahoo plan. In a press release, Sally Morris, chief executive of the association, said, "We welcome the launch of the OCA because its approach respects the rights of publishers and other copyright owners."

That policy means the Open Content Alliance will be limited mostly to out-of-copyright works, and to works by publishers who are willing to experiment with giving their content away online. The project will allow generous access to the materials it holds, however, in some cases even allowing users to download the full texts of books.

Neither Yahoo nor any other group involved has been given exclusive rights to the content, according to the project's leaders. In fact, the books will be made available in ways that can be searched by other search engines, David Mandelbrot, Yahoo's vice president for search content, said in an interview Friday.

The project is modeled on open-source software projects, in which volunteers extend and improve free software.

"Open source was a fantastic success; they figured it out," Mr. Kahle said in an interview on Sunday. He hopes the Open Content Alliance "can do the same for open content."

"We would like to see the great wealth of our libraries get made much more available, where everybody is psyched and everybody knows their place and part," Mr. Kahle said.

"This is a stab at what different organizations should do and what if any restrictions should be made on what is out there," he added.

Daniel Greenstein, executive director of the California Digital Library, a project of the University of California system, said, "The focus of this thing is really open access."

To help jump-start the project, Mr. Mandelbrot said, Yahoo will pay for the scanning of an 18,000-volume collection of American literature at the University of California. Yahoo is also developing the technology to search the books.

Adobe and HP Labs are contributing software and services to the project.

Mr. Greenstein said the University of California would add materials by selecting and scanning certain collections. Scanning the first couple of collections will probably cost the university $500,000, he said.

"One meaningful service for a library community is to build something which enables the libraries to identify instantly what's in there and what's not in there," and then add to the collection, he said. "One of the interests of the group is exploring ways to get people to upload materials directly to the archive," he said.

Starting later this year, some of the scanned books will be available at the Open Content Alliance's Web site, as well as through Yahoo, and more books will be added as they are ready. "The scanning has actually begun," said Mr. Mandelbrot, "but it's a somewhat time-consuming process."

The Internet Archive has been working with the University of Toronto for the past year in a pilot project to test its scanning process, Carole Moore, the university's chief librarian, said in an interview on Friday. So far, she said, about 2,000 books have been scanned, and more than 1,000 of those are already available through a section of the Internet Archive.

She said Toronto has coordinated with six other Canadian university libraries, as well as Library and Archives Canada, to select books by Canadian authors to be scanned for the project. "We're trying to contribute for everyone a certain amount of Canadian material," she said.

Leaders of the project hope that more and more libraries will add unique portions of their collections, so that jointly the new central digital library can one day hold nearly every public domain work.

"We're trying to nail bringing public access to the public domain," said Mr. Kahle. "We want people to be able to do great things with the classics of humankind."


