The Chronicle of Higher Education
From the issue dated July 2, 1999

Speeding Up the Flow of Scholarly Data

Tennessee professor wants to create a network of virtual warehouses

By JEFFREY R. YOUNG

Knoxville, Tenn.

Start chatting with Micah Beck, and he's likely to give you an earful about his latest research project -- how it will speed up the Web and change the very nature of the Internet.

His enthusiasm is infectious, and his idea is so simple that it just might work. Over cappuccino at the cozy Golden Roast coffee shop here -- just a few blocks away from the University of Tennessee, where he is a research associate professor of computer science -- Mr. Beck explains his scheme, which can be summed up in one word: storage.

Let's say you're a professor with a large set of scientific images you want to share with colleagues around the world. Rather than putting the data on your campus Web server, as you would today, you would use a system that would automatically clone the data and place copies in virtual warehouses across the Internet. When a scientist at a distant university followed a link to your site, the system would automatically deliver the data from the nearest warehouse.

Because the file would have less distance to travel, it would load much faster, even on a slow Internet connection. Although such a file now travels quickly across the network -- for most of its trip, it moves at close to the speed of light -- it encounters split-second delays at multiple relay points along the way, and those delays add up.
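The way those delays add up can be sketched with a rough calculation. The distances, relay counts, and per-relay delay below are illustrative assumptions rather than measurements from the project, and the function name is hypothetical; the signal speed is the approximate figure for optical fiber.

```python
# Rough model of one-way delivery time: propagation delay plus per-relay delay.
# Every distance, relay count, and per-relay delay below is an assumption.

SIGNAL_SPEED_M_PER_S = 2.0e8  # roughly two-thirds the speed of light, typical for optical fiber


def delivery_delay(distance_m, relay_points, delay_per_relay_s):
    """Return the estimated one-way delay in seconds."""
    propagation = distance_m / SIGNAL_SPEED_M_PER_S
    relaying = relay_points * delay_per_relay_s
    return propagation + relaying


# A distant origin server: 4,000 km away, 15 relay points, 5 ms lost at each.
far = delivery_delay(4_000_000, 15, 0.005)
# A nearby warehouse: 50 km away, 2 relay points, same per-relay delay.
near = delivery_delay(50_000, 2, 0.005)

print(f"distant server: {far * 1000:.1f} ms, nearby warehouse: {near * 1000:.1f} ms")
```

Under these assumed figures, most of the distant server's delay comes from the relay points rather than from the distance itself, which is the effect a nearby copy is meant to avoid.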

"It's crazy to be fetching these pages over and over again across the wide-area network," says Mr. Beck. "Why not send the data part of the way there before you need it?"

His long-term vision: to get universities around the world to set up data warehouses on their campuses, and to put researchers in charge of deciding which information is important enough to store in multiple places.

He is off to a good start. He has persuaded the Internet 2 consortium (http://dsi.internet2.edu/) to sponsor his research. The consortium, which includes about 160 universities, is building a high-speed network that will form a kind of digital fast lane for researchers. Working with Internet 2 gives Mr. Beck backstage access to its experimental network, allowing him to test his software in a controlled environment before deploying it on the regular Internet.

So far, Mr. Beck has attracted seven corporate sponsors, which have donated equipment and expertise. He has also enlisted two co-directors who are nearly as excited about the project as he is: Terry Moore, associate director for project development in Tennessee's Innovative Computing Laboratory, and Bert J. Dempsey, an assistant professor of information science at the University of North Carolina at Chapel Hill.

Some network experts say Mr. Beck has a good idea, at least in theory. But pulling it off will require him to overcome many technical challenges, and some observers point out that the system will pay off only if its managers pick materials that are popular enough to warrant storing in so many locations.

In the race to build a faster Internet, Mr. Beck and his team take inspiration from the tortoise rather than the hare. Many high-speed-network projects are looking for ways to zip more data across the network faster. But Mr. Beck's project, known as the Distributed Storage Infrastructure Initiative, takes advantage of the shrinking costs of hard drives and other storage devices to achieve similar results with the slow but steady connections already available.

The underlying concept isn't new. In fact, a number of researchers across the country are working on better methods of what is often called "Web caching" -- pre-loading data from the Web onto local hard drives to improve surfing speeds. There's even an annual conference on Web caching, which this year met for the fourth time.

Some commercial and academic Web sites now use a similar strategy, setting up "mirrors" of their material on servers around the world. But users generally have to choose which mirror site they wish to visit -- that means an added step to get to the data. And users might not know which site is physically closest.

Mr. Beck hopes to take the idea of caching and mirrors one step further, by designing new software -- actually a set of translation protocols known as "middleware" -- that will help make the process of finding the most accessible data invisible to the user.

The software will be written so that "a single U.R.L. gets you to the closest copy, without human intervention," explains Mr. Beck. "We believe that makes a huge difference."
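A minimal sketch of that idea follows. It is not the project's software: the `Replica` record, the `resolve` function, the example host names, and the use of relay-point counts as the measure of "closest" are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    """One copy of a resource held at a storage site (hypothetical model)."""
    site: str
    hops: int  # assumed distance metric: relay points between the user and the site


def closest_replica(replicas):
    """Return the replica with the fewest relay points to cross."""
    if not replicas:
        raise ValueError("no replicas registered for this resource")
    return min(replicas, key=lambda r: r.hops)


def resolve(url, catalog):
    """Map one public URL to the site holding the nearest copy."""
    return closest_replica(catalog[url]).site


# Hypothetical catalog: one URL, three copies at different distances.
catalog = {
    "http://example.edu/images/set1": [
        Replica("origin.example.edu", hops=14),
        Replica("warehouse-a.example.edu", hops=3),
        Replica("warehouse-b.example.edu", hops=7),
    ]
}

print(resolve("http://example.edu/images/set1", catalog))
```

The user supplies only the single URL; the choice among copies happens inside `resolve`, which is the step a mirror site leaves to the reader.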

So far, the project has placed large server units -- those data warehouses -- at five locations around the country. One challenge, though, is making sure that if a change is made in the data in one place, all of the copies are updated as well.
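One common way to approach the update problem, shown here only as a sketch, is to give each data set a version number and refresh any copy that has fallen behind. The article does not say how the project handles this; the function names and site names below are hypothetical.

```python
def stale_sites(origin_version, replica_versions):
    """List the sites whose copy is older than the origin's current version."""
    return sorted(
        site for site, version in replica_versions.items()
        if version < origin_version
    )


def propagate(origin_version, replica_versions):
    """Return a new mapping in which every stale copy has been brought up to date."""
    updated = dict(replica_versions)
    for site in stale_sites(origin_version, replica_versions):
        updated[site] = origin_version
    return updated


# Hypothetical state: the origin is at version 3, two warehouses lag behind.
versions = {"knoxville": 3, "chapel-hill": 2, "bloomington": 1}
print(stale_sites(3, versions))
print(propagate(3, versions))
```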

After coffee, Mr. Beck shows off the server here. It consists of two components, one the size of a refrigerator and the other the size of a nightstand. The smaller unit is a jukebox of data tapes that are loaded into drives by a robotic arm. The larger unit stores some of the data from the tapes in a high-capacity hard drive, and transfers the data, on request, to the computer network. The sleek, black cabinets stand amid stacks of computer hardware and tangles of wires in the "machine room" of the university's computer-science building.

The server system can hold about a terabyte of data. That's enough space to store the contents of about 1,500 CD-ROMs, or more than 250 feature-length Hollywood movies. International Business Machines Corporation donated the server, and five others like it. Each system retails for about $200,000.
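Those comparisons can be checked with a quick calculation. The sketch assumes a decimal terabyte and the common 650-megabyte CD-ROM, neither of which the article specifies; it also works backward from the movie figure to see what size per film it implies.

```python
TERABYTE = 10**12        # bytes, assuming the decimal convention
CD_ROM = 650 * 10**6     # bytes, assuming a standard 650 MB disc

# How many full CD-ROMs fit in one terabyte.
cd_roms_per_terabyte = TERABYTE // CD_ROM

# Size per film implied by fitting 250 movies into one terabyte.
bytes_per_movie = TERABYTE // 250

print(cd_roms_per_terabyte, "CD-ROMs;", bytes_per_movie / 10**9, "GB per movie")
```

The result is consistent with the article's "about 1,500 CD-ROMs," and the movie figure works out to about four gigabytes per film.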

Mr. Beck hopes to install servers at about seven more universities by the fall, using equipment donated by other server manufacturers.

The additions will provide a testbed for the software Mr. Beck and his team are developing to let the servers communicate with one another. The researchers hope to make the software easy to use, so that others can add their own servers to the network once the full system is in place.

"We're creating real estate," says Mr. Beck, pointing at the server unit here. "The real estate is these disks."

In a way, the servers would be a digital reserve-reading room for the global library of the Internet. Once an item had been placed on the server here, researchers on the campus could get it quickly, without having to wait for it to travel across the Internet.

"One example of when it could be worth it is if material will be accessed for a class," says Mr. Beck. "Let's say someone's doing a film class, and they want a film archive available to anyone from anywhere on campus" -- even though the original material is stored elsewhere.

A set of data in the storage system would be called a channel. Mr. Beck sees the channels as not just static collections of material, but groups of resources that would be updated regularly by researchers who would serve as channel managers. The managers would encourage other universities to replicate the channels on their warehouse servers.

If a university wanted to be the host of a set of films for a class on its campus, for instance, it would subscribe to a channel that would include the material it needed.
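The channel-and-subscription arrangement might be modeled along these lines. This is a sketch of the idea as described, not the project's design: the `Channel` and `Warehouse` classes, their methods, and the film titles are all hypothetical.

```python
class Warehouse:
    """A campus storage server that holds copies of channel material."""

    def __init__(self, campus):
        self.campus = campus
        self.holdings = {}  # channel name -> list of stored items

    def store(self, channel_name, item):
        self.holdings.setdefault(channel_name, []).append(item)


class Channel:
    """A managed collection that subscribed warehouses replicate."""

    def __init__(self, name):
        self.name = name
        self.items = []
        self.subscribers = []

    def subscribe(self, warehouse):
        """Add a warehouse and send it everything already in the channel."""
        self.subscribers.append(warehouse)
        for item in self.items:
            warehouse.store(self.name, item)

    def publish(self, item):
        """Called by the channel manager: add an item and copy it to every subscriber."""
        self.items.append(item)
        for warehouse in self.subscribers:
            warehouse.store(self.name, item)


# Hypothetical use: a campus subscribes to a film channel for a class.
films = Channel("film-archive")
films.publish("film-001")
campus = Warehouse("University A")
films.subscribe(campus)
films.publish("film-002")
print(campus.holdings)
```

A campus that subscribes late still receives the existing material, and later additions by the channel manager reach it automatically.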

"We're looking for curators or channel developers who will put stuff on that disk," Mr. Beck says. "As far as we can, we will put it in their hands to manage that storage."

His group has identified several digital libraries that might benefit from a distributed-storage system, and their data sets will be used to test the system. The sets include collections of digitized music at Indiana University (The Chronicle, May 2, 1997), medical images at Vanderbilt University, digital video at Northwestern University, and scientific software at the university here, among others.

The distributed-storage project is a good example of how the Internet 2 consortium is doing more than just installing a faster backbone among members, says Greg Wood, a spokesman for the group.

"It really is an example of a new service that makes the network not just faster, but smarter and more efficient," he says.

And, he says, the project is an example of research sponsored by Internet 2 that could be quickly transferred to the existing Internet, benefiting everyone in cyberspace. If the system works, businesses that run popular commercial Web sites might want to use it to increase the speeds of their sites.

But some experts say the technical challenges facing the project are immense.

"I think this is clearly a good idea, but getting it to work well is a different question," says Peter Steenkiste, a senior research computer scientist at Carnegie Mellon University. "The real question is, How far can you push this?"

If it succeeds, the increases in speed could be substantial, Mr. Steenkiste says. "It can be a factor of 100 faster, or more."

Lixia Zhang, a professor of computer science at the University of California at Los Angeles who is also doing research on caching, says the biggest challenge of Mr. Beck's project is deciding what material to put in the data warehouses. "I support the idea of pushing, if you can predict the demand and what will be needed," she says.

In fact, the concept behind the distributed-storage project resembles one of the Internet's most notorious failures -- "push technology." Push services deliver information to the hard drives of registered users before the users even ask for the data. When users want to see the information, it pops up almost instantly on their screens, because it already exists on their computers.

When push technology appeared, in 1997, Wired magazine and other industry observers declared it the future of the Internet. That prophecy turned out to be false, however, as the technology failed to catch on with users. Mr. Beck says that his version of push has a better chance of success, because it relies on institutional efforts rather than individual ones.

Why is he so excited about the project? He believes his system will dramatically improve the Internet -- nearly as much as the Web did a few years ago. "To have your work be a part of other people's daily lives -- to me that's an incredible opportunity," he says. "It's a once-in-a-lifetime kind of thing."


http://chronicle.com
Section: Information Technology
Page: A21


Copyright © 1999 by The Chronicle of Higher Education