Faculty, students, researchers, and librarians can now create archived collections of Web sites through the California Digital Library’s Web Archiving Service – a way to preserve information on the Web that could otherwise be removed or deleted.
The frequency with which Web pages disappear is an “inherent vulnerability” for faculty and students presenting papers and research, said Tracy Seneca, the library’s Web archiving service manager. It’s difficult to validate online sources, she said, because cited links die, on average, three to four years after they are created.
“Government information is disappearing at the federal level and at the local level,” Ms. Seneca said. “All it takes is taking that document off of one server and it’s gone.”
With the service, clients — which include the University of California — can act as curators of a collection of Web sites, choosing which pages to archive and accessing them through a Web interface, Ms. Seneca said. Those collections can then be made available to the public.
After a client selects a site it wishes to archive, the service will crawl that Web address for 36 hours, being careful “not to be disruptive, especially to information related to events,” she said. The California Digital Library then curates the results and holds them privately for six months, to avoid any copyright violations.
Because the service and its clients are based in California, most of the information stored will initially be local to the state, like Los Angeles government pages, Ms. Seneca said. But it could soon be used to crawl national sites as well.
The library has already created archives of pages about the 2003 California recall election, the 2007 Southern California Wildfires, the Guantanamo Bay Detention Camp, California’s state government, and Middle East politics.
“This is all public domain information that our tax dollars pay for,” Ms. Seneca said. “We need to make sure it doesn’t disappear.”