October 23, 2014

Digital Public Library of America: Young but Well Connected

Kelvin Ma for The Chronicle

"I think one of the reasons people are liking DPLA is you can find material from a small rural archive alongside things from the Smithsonian," says the library's executive director, Dan Cohen (center, with staff members).

This past spring, after years of hopeful talk, the idea of a U.S. national digital library took the leap into reality.

The early signs are promising. After only seven months, the Digital Public Library of America, or DPLA, serves as the central link in an expanding network of cultural institutions that want to make their holdings more visible to the public. It has attracted financial support from foundations and government agencies, among them the National Endowment for the Humanities, the Alfred P. Sloan Foundation, and, most recently, the Bill & Melinda Gates Foundation. And it's begun to attract not only users in search of far-flung information but also developers who want to build new tools and applications on its open-source platform.

But its small staff also has a lot of work to do before the digital library fully realizes the vision that brought it to life.

Relying on many partner institutions, the library shuns what Dan Cohen, its executive director, calls an "imperial" model. It's not meant to be a virtual equivalent of, say, the Library of Congress, a central storehouse for collections of images and texts. It's not in the business of preservation. Instead the new digital library acts as a connector or superaggregator. It takes in millions of records of items held by libraries, museums, historical societies, and other cultural institutions across the country, more than 1,100 so far. Then it standardizes the records' metadata and uses it to point searchers toward items relevant to their interests.

To do all that, the digital library relies on a system of "service hubs" that feed it records from smaller entities in their parts of the country. In turn, the DPLA leads patrons back to individual collections, wherever they are. Mr. Cohen likens the system to an ecosystem in which water flows from pond to stream to ocean in a continuing cycle.

The service hubs include nine major state and regional digital libraries or library collaborations, including New York's Empire State Digital Network, the Kentucky Virtual Library, the Mountain West Digital Library, the Minnesota Digital Library, and the North Carolina Digital Heritage Center. The hubs help "regularize metadata and get content into shape," Mr. Cohen says, and host scanned and digitized content so that DPLA doesn't have to.

There are also "content hubs," institutions that generally have more than 250,000 unique records to add to the collective pool. ARTstor, the Internet Archive, the New York Public Library, and the libraries of Harvard University and the University of Virginia are among the content hubs. So is HathiTrust, the digital repository based at the University of Michigan at Ann Arbor that encompasses millions of digitized volumes from more than 80 partner institutions.

Creating a Network

"I think one of the reasons people are liking DPLA is you can find material from a small rural archive alongside things from the Smithsonian," Mr. Cohen says.

That increased visibility has already given a boost to some smaller institutions. For instance, the Nicollet County Historical Society, in St. Peter, Minn., about 70 miles southwest of the Twin Cities, has plenty to interest genealogists and historians of the region. It has lots of 19th- and 20th-century photographs, the records of Minnesota's first hospital, the ledger of the state's first school board, and other material that tells the story of the Minnesota River Valley and beyond. But it hasn't been overrun with visitors, according to its executive director, Ben Leonard. "We have some real gems in the collection that people don't know are here," he says.

That's changing. The Minnesota Digital Library, one of the Digital Public Library of America's service hubs, helped with scanning and metadata, and the historical society was able to upload its records and make them findable via DPLA. Since then, traffic to the historical society's website has more than doubled.

"It's astronomically increased our ability to tell people about our collections," Mr. Leonard says. Beyond that, "it's not just images being digitized for us" that are valuable, he says. "It's this network being created" that his organization can tap for expertise and ideas.

About 160 institutions, including the historical society, belong to the Minnesota Digital Library, according to John T. Butler, associate university librarian for data and technology at the University of Minnesota Libraries. (The university administers the state digital library; the state supports it financially.)

DPLA brings those local and regional institutions "onto the national stage," Mr. Butler says. The Minnesota Digital Library has seen a 55-percent increase in traffic to its repository site since April 2013. "That's a powerful incentive" to participate, he says. "This is becoming something that you really want to be part of."

Advocacy for Openness

Mr. Butler and his colleagues in Minnesota have seen other attempts at large-scale metadata aggregations, he says. "This one feels different, I think, because of a number of things." DPLA is not only "reaching across cultural-heritage organizations" in a new way, he says; it has encouraged those organizations to be less fearful about sharing their metadata. "We're beginning to see an aggregation of advocacy around openness," Mr. Butler says.

With the service and content hubs feeding it metadata, the digital library has added records at a fast clip. It began in April with about 2.5 million items, according to Mr. Cohen. As of early December, that number had grown to about 5.5 million. (Each record may represent multiple items, like serials or multivolume works; DPLA has about 1.5 million records so far from HathiTrust, for instance, which describe some 3.5 million volumes.)

Once the records arrive, somebody has to standardize that information. "Folks who've been around libraries, archives, and museums for a while know that everybody's got weird, idiosyncratic databases," Mr. Cohen says. "One of the big things we're doing behind the scenes is just normalizing all this data. It's not perhaps a very sexy process," but a necessary one.
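For readers curious what that normalizing might involve, here is a minimal sketch in Python. It is not DPLA's actual process: the hub names, field names, shared schema, and sample records are all invented for illustration. It shows only the general idea of renaming each contributor's local fields to one common set and tidying the values.

```python
# Illustrative only: the hubs, field names, and shared schema below are
# assumptions made for this sketch, not DPLA's actual metadata model.

# Hypothetical per-hub mappings from local field names to shared names.
FIELD_MAPS = {
    "hub_a": {"Title": "title", "Date Created": "date", "Owning Institution": "provider"},
    "hub_b": {"dc:title": "title", "dc:date": "date", "contributor_org": "provider"},
}


def normalize(record, hub):
    """Rename a hub's local fields to the shared names and tidy the values."""
    mapping = FIELD_MAPS[hub]
    normalized = {}
    for local_name, value in record.items():
        shared_name = mapping.get(local_name)
        if shared_name is None:
            continue  # drop fields the shared schema does not cover
        # Collapse runs of whitespace and trim both ends.
        normalized[shared_name] = " ".join(str(value).split())
    return normalized


# Made-up sample records in two different local formats.
records = [
    ("hub_a", {"Title": "  Main Street,  looking north ", "Date Created": "1895", "Shelf": "B-12"}),
    ("hub_b", {"dc:title": "School board ledger", "dc:date": "1870", "contributor_org": "County archive"}),
]

normalized_records = [normalize(record, hub) for hub, record in records]
```

After this step, both sample records use the same field names, so a single search index could treat them alike, whichever hub they came from.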

That massaging also makes it possible to display results in creative ways—mapping results by geographic location, for instance. The small DPLA team just hired a director of technology—Mark A. Matienzo, a digital archivist at Yale University—and Mr. Cohen hopes to have a staff of eight by the end of the year.

In the meantime, users have begun to dig in to the site. By October, the DPLA was seeing hundreds of thousands of page views per month—a decent rate, Mr. Cohen says. He's encouraged by "a really high level of engagement" among site users, who look at an average of six pages per visit. "I'm happy about those kinds of stats, but I think we still have a ways to go in terms of visibility," he says.

The DPLA was designed to be not just a gateway to information but a platform on which people can build new applications. That seems to be part of its appeal: Its application-programming interface, or API, has had some 1.7 million hits so far. The OpenPics application is a nice example of the possibilities. It uses GPS to pull together place-specific material from DPLA-linked collections, so that a user can create a customized, location-specific set of results. That kind of engagement, Mr. Cohen says, is "a form of success that won't be apparent in our straight-up Google rankings."

A recent $1-million grant from the Gates foundation may help the DPLA address another concern: its relationship with public libraries. The Gates money will finance a training program to equip public librarians with more digital skills so they can work with DPLA materials more easily.

In spite of its rapid growth, the Digital Public Library of America has a lot more expanding to do if it is going to live up to its name. Several regions of the country don't yet have service hubs, and adding more hubs is a top priority. So is expanding the range of cultural institutions and objects included. Mr. Cohen would like to see more audiovisual material represented, for instance, along with more recent books, although how to deal with copyright remains a question mark.

"The term 'library' is a very expansive term," Mr. Cohen says, "and we just need to do a better job fulfilling that."
