
The Chronicle of Higher Education: Information Technology
From the issue dated January 25, 2002


Once-Trustworthy Newspaper Databases Have Become Unreliable and Frustrating

Supreme Court decision led publishers to purge much archival material, to the dismay of scholars

By SCOTT CARLSON

Before Xiaotian Chen confronts a class of undergraduates, usually to teach them about methods of library research, he likes to get his material together months ahead of time. Mr. Chen was teaching a course last fall at Truman State University, where he was an electronic-resources librarian; naturally, he started gathering material in the spring.

While digging through Lexis-Nexis, he found an incisive San Francisco Chronicle article about the fragility of digital media. Later, in the fall, he told his students to look the article up. That'll be good practice for using electronic databases, he thought.

But his students came back telling him that they couldn't find it, or that the article didn't exist. Mr. Chen went online and was embarrassed when he couldn't find it, either. Then it dawned on him: "I knew right away it was because of Tasini," he says. These days, Mr. Chen, who has moved to Bradley University, uses the story to discuss The New York Times Company v. Jonathan Tasini, and how that case and related cases have perhaps permanently changed electronic databases.

Mr. Chen is merely one librarian whose work has been affected by the Tasini case and others like it. Six months ago, the U.S. Supreme Court ruled that publishers don't own the rights to online freelance articles. Other copyright battles between freelancers and publishers -- such as lawsuits between the National Geographic Society and a group of photographers -- are also moving through the courts. The publishers have responded by purging freelance articles -- sometimes entire newspaper archives -- from online databases. Almost 20 years' worth of newspaper history, a vital source of information for those studying history, politics, society, the media, and other subjects, is shot through with more holes than a block of Swiss cheese.

Scholars worry that they might find holes in their research. No one in academe seems to know how many articles, and which ones, are missing from the databases. After all, online databases, with their ethereal form, aren't like broadsheets of newsprint -- you can't open them like you would a morning paper and see the holes cut out.

Old-Fashioned Skills

Bruce P. Keller, a lawyer who represented the publishing industry in the Tasini case, says that publishers are merely protecting themselves from new lawsuits, not retaliating against writers.

"The implication right now is that nothing is safe if it's in electronic form unless you have contracts to cover it," he says. "The fact of the matter is that massive redactions have occurred." He adds that most of the latest material is safe; by the mid- to late 1990s, most publishers had added online-publication rights to their freelance contracts. But as for the articles from the late '70s, '80s, and early '90s, no one, even among publishers, is yet certain how much is coming down and for how long.

The news doesn't frighten everybody. In fact, some scholars, such as Bonnie Sue Brennen, are looking on the bright side. Ms. Brennen, an associate professor of journalism who teaches journalism history at the University of Missouri at Columbia, says that the databases are already incomplete, often omitting short articles, news briefs, and alternative papers. Yet they have become a crutch for lazy students. "I have a concern that all they rely on these days is Lexis-Nexis," she says. "They aren't really looking." She says the gaps in the databases might force students to dig through real newspapers.

Lee W. Formwalt, the executive director of the Organization of American Historians, says he has not heard any complaints from his members and that "the downside was not as catastrophic as some predicted." After all, as a 50-year-old historian, he had to do quite a bit of research without electronic databases, and he hopes that young historians are learning the same old-fashioned skills.

"Historians have been working this way for years," he says -- digging through old papers, traveling to relevant localities, and so on. "Yes, it might be inconvenient, but it doesn't prevent the research from happening. There are other sources out there."

Complications in Research

But other scholars and librarians are disturbed by the gaps in the databases. David M. Kennedy, a professor of history at Stanford, signed a brief with the popular historians Ken Burns and David McCullough, supporting the publishers. "This was exactly the fear of those of us who signed the brief -- that this would create an inferior online source," he says. Mr. Kennedy expects the case to affect his work. "To the extent that articles are not available or there's a disparity between the electronic and paper record, that just complicates my research program."

Stanley N. Katz, a professor of history at Princeton University who supports the writers, says the publishers' response to the ruling has been "devastating."

"Oh God, it's just terrible," he says. "I'm particularly appalled because I still think that my position was the correct one legally and politically, but it never occurred to me that the publishers were going to behave this way."

He doesn't regret supporting the writers. "We did the right thing. ... I think that had The New York Times taken a more responsible position and come to a settlement that was offered by Tasini and his group, it could have been solved at reasonable cost."

There has been serious damage to historical research, he says -- and even more to some other disciplines. "The people who are worst hit are the social scientists," who rely on electronic searching techniques to see the big picture, he says.

One such scholar is a colleague of Mr. Katz's, Steven J. Tepper, the deputy director of Princeton University's Center for Arts and Cultural Policy Studies. Mr. Tepper has been studying how the culture wars played out in 75 cities from 1995 to 1998, with a focus on controversial art: Would a work like Andres Serrano's Piss Christ cause more trouble in Atlanta or Missoula, and why? His main tools have been media databases like Lexis-Nexis, Dow Jones Interactive, and Dialog, which he scans for articles about art-related controversies.

But last fall he began stumbling across omissions. "This message started popping up, saying, We can't guarantee this is a full record," Mr. Tepper says. Now he's worried. If he misses even one controversy, "it really biases the results -- the difference between 12 cases and eight cases can be really important when you are doing statistical analysis."

"I knew this was a problem, but I was just assuming that the whole purging hadn't happened yet," he adds. "We don't know what the implications are, but we're trying to race like hell against time, to get as much done as we can before even more get purged."

Electronic research is the only practical way to study trends in 75 cities, Mr. Tepper says. He can't afford to send researchers out to each place to interview artists and curators and to dig through the local newspaper morgue. If the articles continue to disappear from databases, Mr. Tepper says, more research than his will be jeopardized. "Anyone who wants to do research on, say, abortion conflicts in the early '90s or late '80s couldn't do this kind of study because they couldn't be guaranteed that their findings would be consistent."

The debate made headlines last summer when the U.S. Supreme Court decided the Tasini case. In the suit, Mr. Tasini and other freelance writers said that, under the copyright law of 1978, traditional publishers did not have the right to republish freelance work in online databases; such use was infringement because it significantly altered the original work, the writers said, and thus the publications owed them money. Lawyers for the publications, in turn, argued that producing and selling online versions did not constitute infringement. They said that if the court ruled against them, the public would lose access to many online articles, as the publishers would begin taking material offline.

Both the American Library Association and the Association of Research Libraries filed briefs in support of the writers, as did a group of prominent historians, including Mr. Katz. Another group of historians -- including Gordon S. Wood from Brown University and Jack N. Rakove from Stanford, along with Mr. Kennedy and Mr. McCullough -- filed a brief in support of the publishers, fearing that significant access to newspaper files would be lost if the writers won the case.

The Supreme Court ruled in favor of the writers. The case has influence and bearing on similar freelance cases still wending through the courts. Soon after the ruling, the Authors Guild and another group of freelancers sued The New York Times, charging infringement. The National Geographic Society is battling through a set of infringement suits related to a CD-ROM set that compiles an image-based archive of all of the magazine's issues in its 114-year history; each side in that case says that the Supreme Court's ruling supports its claim. And the National Writers Union, of which Jonathan Tasini is president, has filed a new lawsuit against the Times, charging that the paper now forces freelancers to unfairly sign away their rights to online articles.

Some top librarians and library organizations are still pondering the role that they played in the case, along with the new world it created. Miriam M. Nisbet, the legislative counsel for the American Library Association, says that librarians are still trying to assess the effects of the case. "We have heard from a few different libraries and universities that are having some databases that have been pulled or are not accessible," but those libraries didn't think that the problem would be permanent, she adds.

'Tough Calls'

Ms. Nisbet says that while the Tasini case rose through the court system, both the American Library Association and the Association of Research Libraries tried to remain neutral because they could see the merits and dangers of each side's positions. But when the case reached the Supreme Court, the associations had to take a stand, she says.

Kenneth Frazier, the director of libraries at the University of Wisconsin at Madison and a member of several committees of the Association of Research Libraries, says that the decision to support the writers' side in this case was one of the "tough calls" that the organization has made. But ultimately librarians should be in the creators' corner every time, he says. "There were good reasons to worry about this outcome, and I was one to argue that reasonable people would not allow this, and I was wrong."

James G. Neal, the university librarian at Columbia University, says that the libraries' position "was in many ways a gamble, but an informed gamble."

"We felt that the larger issue -- demonstrating our support of authors -- was more important than the loss of information, which we did not think was going to be significant anyway," he says. "Now the degree to which that stuff is being pulled is unknown. We don't have a handle on that."

No one seems to. With the help of his research assistants and librarians, Mr. Tepper has been trying to figure out how much material is missing from the databases and whether he can continue without it. "We're having trouble getting straight answers," he says.

Indeed, tallying the articles missing from online sources can be time-consuming and tricky business, fraught with dead ends. When called by a Chronicle reporter, David Garcia, a spokesman for The Los Angeles Times, said that he wouldn't reveal the numbers of articles that the paper has pulled offline. He also wouldn't give numbers for the newspaper's parent corporation, the Tribune Company. Officials at The Washington Post referred a reporter to the paper's lawyer, who did not return calls.

But some news media and database companies confirmed that articles had been pulled from databases -- either temporarily or permanently. For example, Factiva, which is owned by Dow Jones, lists on its Web site various newspapers whose online archives have been reduced or are no longer featured on the database, including the Star Tribune of Minneapolis and many papers owned by the Gannett Company, such as The Cincinnati Enquirer, The Arizona Republic, The Detroit News, and others. Tara Connell, a spokeswoman for Gannett, says that according to company policy, entire newspaper files are being pulled offline and reviewed. She doesn't know when the files will go back up.

Toby Usnik, a spokesman for The New York Times, said that since the Supreme Court ruling, in June, the newspaper has pulled 100,000 articles offline; however, 15,000 of those articles have since been restored after the paper struck deals with writers. Mr. Usnik says the paper's management is determined to get all of the articles back online, but he's not certain when that will happen.

There have been similar purges among Knight-Ridder papers. Dick Cooper, the research-services manager at Philadelphia Newspapers -- which includes The Philadelphia Inquirer and The Philadelphia Daily News -- says the company has permanently purged one-third of its 2.5 million online articles. "Unless there is a change in the law, that is going to be lost from our public file," he says. "We're not going to be renegotiating any past work."

All these deletions have led some scholars to find ways to get around the gaps by combining various media. One reference librarian at a southern California college says he uses ProQuest, microfilm, and digital archives on a CD-ROM set to find older articles from The San Diego Union-Tribune, which has deleted a great deal of its online material. The database ProQuest has retained citations of the paper's articles; the librarian simply uses the database to find articles he needs, then gets the text from the microfilm or the CD-ROM set, which the paper sold to the library several years ago. The librarian requested anonymity because he fears that the Union-Tribune, or lawyers for freelancers, will force the library to give up its CD-ROM's.

'A Moving Target'

Though the numbers of deletions seem to add up to quite a sum, officials at database companies maintain that the effect of the court cases has been minimal. George Plosker, vice president of content support for the Gale Group, a database company owned by the Thomson Corporation, says that the number of lost articles is "kind of a moving target because we keep getting additional notifications from publishers." When he discussed the issue on a panel at an Internet-librarian's conference in November, fewer than 100,000 articles were affected; the number has at least doubled since then. Still, he says, that's less than 1 percent of the material Gale serves.

It is up to the publishers to determine which articles should be pulled, he says, adding: "We are making every attempt to retain citation indexing and written abstracts whenever possible." If the abstract was written by the author, however, it has to go, and Gale doesn't put anything in its place.

Charles Sims, a lawyer for Lexis-Nexis, says that database has seen similar losses. In many cases, he says, a citation is left in the article's place, so that people can tell what is missing and be able to look it up on microfilm. He couldn't say how often a citation is left behind.


http://chronicle.com
Section: Information Technology
Page: A29




Copyright © 2002 by The Chronicle of Higher Education