December 20, 2014

As Researchers Turn to Google, Libraries Navigate the Messy World of Discovery Tools

A.J. Mast for The Chronicle

Andrew Asher, assessment librarian at Indiana U. at Bloomington, studied how students use new library search tools. “It’s a logical impossibility to create a querying tool that doesn’t have any form of bias,” he says.

Many professors and students gravitate to Google as a gateway to research. Libraries want to offer them a comparably simple and broad experience for searching academic content. As a result, a major change is under way in how libraries organize information. Instead of bewildering users with a bevy of specialized databases—books here, articles there—many libraries are bulldozing their digital silos. They now offer one-stop search boxes that comb entire collections, Google style.

That’s the ideal, anyway. The reality is turning out to be messier.

The rise of these "discovery" tools, which mine giant indexes of aggregated content, is generating new tensions. Because some companies that make the search tools are also in the content business, selling article databases and other material to libraries, one fear is that firms could favor their own content in results.

Another is that discovery software, by sluicing content together, could deluge users with less-appropriate resources. Either way, researchers could miss relevant articles.

Discovery tools have fed a broad and sometimes bitter debate within the library world. Last year, for example, one library consortium, the Orbis Cascade Alliance, grew so frustrated with the lack of cooperation between two major vendors in the discovery business, Ex Libris Group and Ebsco Information Services, that it issued open letters urging the companies to "bring this nonsense to an end." Promising signs are emerging, however, including Ebsco’s announcement last week of a new data-sharing policy that the company calls "a huge advancement in cooperation."

Controversy aside, library patrons are reaping the benefits of what has become a vibrant and innovative technology market. Those patrons can now get library search results tailored to their interests. They can search across a sea of curated academic content, not just the limited pond of one library’s holdings. They can also use the software to explore features that go beyond just search results, including topic-focused research guides and names of campus librarians who can help them further investigate a subject.

The big question is how these emerging tools are influencing research. Scholars have begun several studies to find out. The work is important because "unlike almost anything that libraries have done before," the rollout of one-stop search tools is "really intentionally trying to change the way people do research," says Michael Levine-Clark, associate dean for scholarly communication and collections services at the University of Denver Libraries. "That’s bound to change what people find."

Mr. Levine-Clark and two collaborators—John McDonald of the University of Southern California and Jason Price of the Statewide California Electronic Library Consortium—have studied how adoption of a discovery tool changes the use of articles from publisher-hosted online journals. Based on data from 33 libraries and 8,765 journals from six major publishers, their analysis showed "an overall increase in usage for the entire set of journals in the year after implementation, though the extent of change varied by discovery service and publisher."

But what are people finding?

That’s at the heart of a separate study by Andrew Asher, assessment librarian at Indiana University at Bloomington. Mr. Asher, an anthropologist by training, gained notice for previous work on a five-university study of the student research process, which ran from 2008 to 2010 and used ethnographic methods to closely observe students’ habits. In 2011, he began a fresh experiment to figure out how undergraduates use the new library search tools and how they stack up against Google. The results, published last year in a College & Research Libraries paper written with Lynda M. Duke and Suzanne Wilson of Illinois Wesleyan University, shed some light on these sometimes opaque products, along with the bias issues that have dogged them.

Built-In Bias?

The study divided undergraduates from two universities, Bucknell, in Pennsylvania, and Illinois Wesleyan, into test groups. The groups were assigned different search systems: Ebsco Discovery Service; Summon, from ProQuest; Google Scholar; and conventional library catalog and periodical databases. Students were instructed to find resources they would use to complete various assignments. Librarians rated their choices.

To appreciate what Mr. Asher and his co-authors found, it helps to understand how discovery tools work. Libraries make large investments in different kinds of content, such as their subscriptions to databases of scholarly articles, or the books that fill their local catalogs. The new breed of search software hinges on building "a very large, consolidated index that represents all of those things," says Marshall Breeding, a consultant who specializes in library technology. Vendors of discovery tools will make deals with providers that sell content to libraries, he says, so that content can be represented in the discovery tools’ indexes and made available for search. (Beyond products from Ebsco and ProQuest, other major tools in this genre, known as "web scale" or "index based" discovery, include Primo, from Ex Libris, and WorldCat Discovery Services, from OCLC.)

Vendors describe their discovery tools as unbiased arbiters of information. Ebsco, for example, sells both search software and content, as does ProQuest. Asked whether Ebsco favors its own content in the results generated by its search tool, Sam Brooks, executive vice president for sales and marketing, dismissed the idea as "competitor-driven propaganda." He added, "There’s no truth to that whatsoever." Bias toward a content provider, he says, "would be commercial suicide for any discovery vendor."

Mr. Brooks points out, however, that Ebsco makes design choices about article relevance that may seem like bias, yet actually have nothing to do with content providers. For example, a university that uses Ebsco’s search tool, he says, will find that "a two-sentence news blurb will lose to a four-page, peer-reviewed article."

Mr. Asher’s experiment discovered that default settings of the tools had a major effect on what resources students chose. Working with Google Scholar, which is integrated with Google Books, students used more books. With Summon, they used a lot of shorter newspaper and magazine articles. With Ebsco Discovery Service, they used more journals, which meant they scored highest under the study’s rating rubric. (In a blog post responding to Mr. Asher’s study, ProQuest said the methodology "inadvertently penalized" Summon, its product.)

Mr. Asher believes that "it’s a logical impossibility to create a querying tool that doesn’t have any form of bias." He speculates that discovery vendors may have better information about their own content, boosting certain articles higher in results.

After Bucknell adopted Summon, his study notes, the university saw significant increases in use of newspaper databases, including a jump of more than 700 percent for the ProQuest-owned Ethnic NewsWatch.

‘Content Neutrality’

In this competitive market—where rival players angle to sell discovery tools, content databases, and in some cases both—other complications can arise when competitors refuse to play nice with each other. Say, for example, a library buys a discovery tool from one vendor and a content database from another. If the database vendor declines to share information about its content with the discovery-tool provider, that content may fail to appear in the discovery tool’s search results—even though the library pays for both products. It can be difficult for librarians to evaluate which discovery tools cover which content, and how well.

"If I subscribe to something, I want my users to be able to find it regardless of what discovery system I choose," says Laura Morse, director of library systems for Library Technology Services at Harvard University, which picked the Ex Libris search tool.

Ms. Morse belongs to a group, the Open Discovery Initiative, which unites various players to promote transparency and best practices for discovery tools. In the emerging conversations around this topic, one buzz phrase is "content neutrality." The idea resembles "net neutrality," the notion that network operators shouldn’t block or favor certain content. Proponents of content neutrality argue that discovery providers should have equal access to the information needed to surface content in the search tools’ results. That information includes things like the full text of journal articles and the "metadata" that describe the articles, such as the author, subject, journal title and publication date.

The ice has cracked a bit. In January, two competitors, ProQuest and Ex Libris, announced a data-sharing deal. Ebsco, meanwhile, points to its new metadata-sharing policy, as well as partnerships with three other discovery providers, Innovative Interfaces, SirsiDynix, and OCLC.

How much all this will matter is debatable. Only about 20 percent of faculty members begin research at their libraries’ online catalogs, according to a 2012 survey by Ithaka S+R. And while undergraduates, in particular, enjoy the new one-stop discovery tools, others emphasize that specialized databases remain important for serious scholarship.

Meanwhile, the competition for student and faculty attention has only intensified since 2004, when Google’s "simple way to broadly search for scholarly literature" made its debut. That free service, called Google Scholar, has many fans in academe.

One of those is Mr. Asher, the Indiana University librarian who studies discovery software. He appreciates the "cited by" feature on Google Scholar, which lets you trace how an article is used. "It’s faster," he says of the product, "and I’m just used to it."

Mr. Asher is familiar with the criticisms of Google Scholar. After all, his own study listed them: "limited advanced search functionality, incomplete or inaccurate metadata, inflated citation counts, lack of usage statistics, and inconsistent coverage across disciplines." Perhaps for this reason, he sounded a bit sheepish admitting his preference.

"I kind of hate to say it, since I am a librarian," he says. "We pay a lot of money for discovery tools. And then I go off and just use Google Scholar."
