Grounding Tomorrow's Digital Library in Traditional Values

John Wilkin
February 25, 2012

"I've always felt like a bit of an outlier in the digital library sphere," says John P. Wilkin, associate university librarian for library information technology at the University of Michigan. That's not what one expects to hear from a leader in the drive to build, connect, and preserve enormous collections of digitized material—a super-library for the 21st century.

Mr. Wilkin is executive director of HathiTrust, an online digital repository with more than 10 million volumes. Created in 2008 with the help of Google's ambitious book-scanning project, the effort is housed at Michigan but draws on the collections and resources of more than 60 partner institutions. "From the beginning, it was about the collective interest of libraries," he says. "Not about Michigan's collections, but about the ways those collections are meaningful to other libraries."

THE INNOVATOR: John P. Wilkin, University of Michigan

THE BIG IDEA: Pool digital collections from universities to build a super-library for the 21st century.

To Mr. Wilkin, emphasizing the digital in "digital library" misses the point. The challenges of building an online library are those of building any library in an era of superabundant information. What do you include? How do you make collections findable and usable? The answers get harder to pin down as the amount of material increases.

"Our sense of the scope of the problems is imperfect," he says. "We don't know what a corpus is, what the comprehensive corpus is. We don't know what we're aiming at."

Figuring that out absorbs a lot of Mr. Wilkin's attention at HathiTrust. Even in the face of a lawsuit brought by the Authors Guild and other groups over access to digitized, copyrighted material, the repository has pressed ahead with efforts to get a handle on orphan works, whose rights-holders can't be identified or located. "The orphan-works problem, the in-copyright problem, all these things don't have numbers in the way they could have numbers," Mr. Wilkin says, noting the lack of estimates of how many works are affected. Part of his mission is to find those numbers.

Meanwhile, under his direction, the HathiTrust repository continues to grow, as new partners join and more volumes are added.

Mr. Wilkin originally planned to be an English professor. Working on his master's degree, he was uncertain enough about his tech skills that he brought his kid brother to the Library of Congress for help in using its online catalog.

After contemplating his job prospects, Mr. Wilkin decided to abandon the Ph.D. track and head to library school. There he discovered that he did have a knack for using computer systems, and for tasks such as database design. It turned out that such things "came very easily to me," he recalls.

That facility has served him well as the projects he takes on have gotten bigger and more complex. In his early days as a librarian at the University of Virginia and at Michigan, he helped put literature collections and government data online. Since the mid-1990s, he has been involved in large-scale digitization at Michigan. For instance, he worked on the Making of America project, a joint venture with Cornell University, which created an online library of primary-source documents about American history from the antebellum period through Reconstruction.

At that point, Mr. Wilkin realized that it wasn't enough to focus on text encoding and transcription. Technology had made possible "the reproduction of library materials on a large scale," he says. At the University of Michigan, that included shifting the library's preservation strategy "from reformatting and microfilming to entirely digital," he adds.

Other libraries were dubious. "We didn't convince anybody—anybody—that that was the right thing," he says. "There was so much skepticism."

When Google's book-scanning project came along, digitization got big enough to capture people's attention. The dream of a large-scale digital repository didn't look so far-fetched after all. "Having a sense of scale changes everything," Mr. Wilkin says. "I will say, emphatically, that we had this conception in mind from the beginning. The first drafts of the agreement with Google had the seeds of the idea."

Digital preservation has become the watchword now, but some of the fundamental challenges that confront libraries have always been with them: how to manage ever-bigger amounts of information and how to make best collective use of resources. "What has been interesting to me is how technology can help the library transform its work—not the digital library, but the library," Mr. Wilkin says. "Because we're hybrid libraries, and will be for as long as the artifact matters."