Yahoo Works With Academic Libraries on a New Project to Digitize Books

By Scott Carlson and Jeffrey R. Young

October 14, 2005

Another search-engine company has joined with academic libraries to digitize large collections of books and make them easily searchable online. Yahoo Inc. has teamed up with the University of California system, the University of Toronto, and several archives and technology companies on a project that could bring the complete texts of millions of volumes into digital form.

Yahoo officials say that the project is not a response to Google’s partnership with five major research libraries to scan millions of books, and that some planning for the Yahoo project was under way before Google announced its plans last December.

The new project is called the Open Content Alliance, and it was conceived in part by Brewster Kahle, director of the Internet Archive, a nonprofit digital library. The Internet Archive will do much of the actual scanning for the project, using a process it has developed in recent years.

Libraries involved in the project can have their books scanned by the Internet Archive for 10 cents per page, which leaders of the project say is far below the standard price of scanning.

Other participants in the project are Adobe, the European Archive, the National Archives of England, O’Reilly Media, and Hewlett-Packard Labs. The project hopes to attract more libraries and other partners, as well as more financial support.

Leaders of the project stressed that no books that are under copyright will be scanned unless the copyright holders give explicit permission.

In that way the project hopes to avoid the controversy raised by Google’s plan to scan nearly every book at the library of the University of Michigan at Ann Arbor, even works under copyright. Publishers’ and authors’ groups have said that Google must obtain permission before scanning copyrighted books, even if it offers only short excerpts of their content, as it plans to do.

In fact, one publishing group that has been critical of Google’s project, the Association of Learned and Professional Society Publishers, has endorsed the Yahoo plan. In a press release, Sally C.L. Morris, chief executive of the association, said, “We welcome the launch of the OCA because its approach respects the rights of publishers and other copyright owners.”

Classic Works

That plan means the Open Content Alliance will be limited mostly to out-of-copyright works — and to works by publishers who are willing to experiment with giving their content away online.

The project will allow generous access to the materials it holds, however — in some cases even allowing users to download the full texts of books.

Neither Yahoo nor any other group involved has been given exclusive rights to the content, according to the project’s leaders.

In fact, the books will be made available in ways that can be searched by other search engines, David Mandelbrot, Yahoo’s vice president for search content, said in an interview.

The project is modeled on open-source-software projects, in which volunteers extend and improve free software.

“Open source was a fantastic success; they figured it out,” Mr. Kahle said in an interview. He hopes the Open Content Alliance “can do the same for open content.”

“We would like to see the great wealth of our libraries get made much more available, where everybody is psyched and everybody knows their place and part,” Mr. Kahle said.

“This is a stab at what different organizations should do and what, if any, restrictions should be made on what is out there,” he added.

Daniel Greenstein, executive director of the California Digital Library, a project of the University of California system, agreed. “The focus of this thing is really open access,” he said.

Scanning 18,000 Volumes

To help jump-start the project, Mr. Mandelbrot said, Yahoo will pay for the scanning of an 18,000-volume collection of American literature at the University of California system. Yahoo is also developing the technology to search the books.

Yahoo does not expect to profit from the arrangement, said Mr. Mandelbrot, but sees it instead as a “philanthropic effort” to put more content online.

“Any monetization that’s able to be generated specifically as part of this program would only be used to fund additional digitization of public-domain or copyrighted works,” he said.

Troy D. Mastin, an analyst who watches Yahoo for William Blair & Company, said that Yahoo could see this project as an opportunity to “keep up with the Joneses,” specifically its competitor, Google. “I don’t think that they are doing it just for appearance purposes, but these two players are pushing each other,” he says. “One will innovate, and the other will respond.”

He says that Yahoo could make money on the project indirectly if the digitized books can attract people to Yahoo’s site. “There still may not be that much commercial opportunity,” he says. “We don’t know if there is going to be a significant amount of advertising attached to book passages.”

Adobe and HP Labs are contributing software and services to the project.

Sharing Information

Mr. Greenstein said the University of California would add materials by selecting and scanning certain collections. The project will probably cost the university system $500,000 for the first couple of collections, he said.

“One meaningful service for a library community is to build something which enables the libraries to identify instantly what’s in there and what’s not in there,” and then add to the collection, he said. “One of the interests of the group is exploring ways to get people to upload materials directly to the archive,” he said.

Starting later this year, some of the scanned books will be available at the Open Content Alliance’s Web site (http://www.opencontentalliance.org), as well as through Yahoo, and more books will be added as they are ready. “The scanning has actually begun,” said Mr. Mandelbrot, “but it’s a somewhat time-consuming process.”

The Internet Archive has been working with the University of Toronto for the past year in a pilot project to test its scanning process, Carole Moore, the university’s chief librarian, said in an interview. So far, she said, about 2,000 books have been scanned, and more than 1,000 of those are already available through a section of the Internet Archive.

She said Toronto has coordinated with six other Canadian university libraries, as well as Library and Archives Canada, to select books by Canadian authors to be scanned for the project. “We’re trying to contribute for everyone a certain amount of Canadian material,” she said.

Leaders of the project hope that more and more libraries will add unique portions of their collections, so that jointly the new central digital library can one day hold nearly every public-domain work.

“We’re trying to nail bringing public access to the public domain,” said Mr. Kahle. “We want people to be able to do great things with the classics of humankind.”

http://chronicle.com Section: Information Technology Volume 52, Issue 8, Page A34