November 27, 2007
Utah Network Crash Erases Student Data
Thousands of Utah students and instructors lost grades, assignments, and other information when the state’s Blackboard Vista database crashed as a result of corrupt data.
The Utah Statesman reported that the crash erased all information entered in the statewide database between November 13 and 16 at Utah State University. The information is permanently lost. Many students will have to resubmit assignments, and faculty members will have to re-enter grades.
The database is run by the Utah Education Network, which operates the Blackboard system for colleges and schools across the state. To what extent various institutions were affected by the crash depends on how much they relied on the Blackboard system for classwork.
Scott Allen, learning systems administrator at the Utah Education Network, said in an interview that the crash was not caused by a defect in the Blackboard product but most likely by a problem with a computer network port. —Dan Carnevale
Posted on Tuesday, November 27, 2007
The information “entered between November 13 and 16” was lost, not “all information entered in the statewide database.”
The network went down again yesterday (26th). It doesn’t appear to be the fault of the Blackboard software.
— Doug Holton Nov 27, 12:28 PM #
“[T]he crash erased all information entered in the statewide database between November 13 and 16 at Utah State University.”
I think it said just that.
— Landrum Kelly Nov 27, 06:58 PM #
While I’m sure that Scott’s analysis is technically true (he’s a super-competent admin) it’s important to contextualize this. The ultimate worth of a system is contingent on the systems it depends upon; when we disaggregate Blackboard from these other systems, or from Blackboard’s historical track record in the state of Utah, we’re engaging in a form of abstraction that ultimately inhibits our ability to assess the product.
The more grievous problem is that Utah schools have experienced multiple catastrophic failures with Blackboard in the past. Given this history it’s perplexing that UEN has elected to partner with Blackboard anyway. One way UEN can redress this liability is by making sure that it is carefully evaluating alternative learning management solutions. There are a lot of other systems out there that are, in many ways, better aligned with the mission and spirit of higher education. Let’s hope UEN continues to evaluate these other options as it moves forward.
— Luke Fernandez Nov 27, 07:20 PM #
I do not know Scott, but the quote doesn’t make any sense from a technical standpoint. Either the reporter is not accurate or someone is engaging in CYA. How often are they backing up the database? I cannot believe it’s any less than once a day. The Bb database is a file or series of files. Those files should be backed up to disk and/or tape frequently. If there is a problem, the most that should be lost is data entered between the last backup and the failure.
If there was a corruption problem that forced them to roll back four days, then perhaps it was a Blackboard issue after all. Blackboard as a company has always seemed more interested in making a buck than providing a quality product. At any rate, there’s no such thing as a “computer network port” that would have anything to do with the inability to recover three days of data when a database crashes. If the database is down (or crashed), you can’t enter data. If it was “up” during those three days, then the data should exist in backups. And by the way, it shouldn’t be some systems administrator facing the public; it should be the CIO or IT Director providing an honest assessment of what happened and what they are going to do to absolutely minimize the chance of a repeat.
— Bill Nov 28, 07:55 AM #
NB: Vista is the WebCT product recently acquired by Bb Inc.
— Corrie Bergeron Nov 28, 08:33 AM #
I completely disagree with “I do not know Scott, but the quote doesn’t make any sense from a technical standpoint.”
There can be quite a few reasons for this type of failure, most of which a person could not competently comment on without knowledge of the systems and how they are set up and interact.
For instance:
On a huge system like Utah’s, Scott may not be responsible for the backup systems. Someone could have blocked a port on a firewall (a computer network port), which could have made the backups, or parts of the backups, fail.
He could be using an Oracle script to do hot backups, which puts each tablespace into ‘hot backup mode’. Some of these could have failed. The database could have run out of space on some but not all of the tablespaces, which could cause corruption.
I can think of more reasons.
We administrators always seem to end up as the scapegoats for systems failures. Speaking from similar experience, I bet Scott is the only admin working on what is probably a very large and complex system. I bet he has warned his superiors on multiple occasions that he needs more resources and extra staff to support the systems, and I suspect this has fallen on deaf ears.
Also, to the guy who says
“There are a lot of other systems out there that are, in many ways, better aligned with the mission and spirit of higher education.”
Can anyone name a single product that even comes close to the functionality or scalability of WebCT (sorry, Blackboard Vista)? And I mean someone who knows both Vista and the other product.
(Moodle doesn’t count because it’s rubbish, and most people who say it’s good have never used Vista.)
I once had an academic who kept pushing for us to replace Vista with Moodle ‘because you can put a picture of yourself in a discussion topic’. WOW (that’s a sarcastic wow!)
— Garry Nov 29, 04:26 AM #
I have to interject some comments here because there are so many inaccuracies in what has been reported about this problem.
First of all, we do not believe that Blackboard Vista software had anything to do with the problem. This particular problem could have occurred with any software that uses an Oracle database server and a Storage Area Network (SAN). It was a hardware failure, not a problem with software.
The problem, as reported by Dan Carnevale in this article, was not “a problem with a computer network port”, but an intermittent data transfer problem with a port on the switch that connects the database server with the SAN. No further errors have occurred since we moved database traffic to a different port on the SAN switch.
This was a very difficult problem to diagnose because it was an intermittent problem, and the faulty switch port did not report any unusual errors until some time later (when the switch was reset). The first indication that we might have a problem was that incremental database backups (done twice daily) started failing.
We suffered data loss (about 2.5 days) because the data transfer issues with the SAN switch caused data corruption in both the Oracle data files and the archive log files. We had tape backups of the data files and archive log files, but they were also corrupt. Unfortunately, we could only recover the database to the last point at which we had clean archive log files.
Could we have been better prepared to deal with this problem? Certainly. The staff at Utah Education Network (UEN) realizes the grief that this hardware problem caused for faculty, students, and administrators throughout the state. UEN staff is implementing additional procedures to monitor the various system components so we will be alerted sooner when problems occur, and to increase the redundancy of systems when problems do occur.
Scott Allen
UEN Learning Systems Administrator
— Scott Allen Nov 29, 02:06 PM #
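Editorial illustration: the recovery limit described in the comment above can be shown with a minimal sketch. This is not UEN’s procedure and not an Oracle API; the `ArchiveLog` type, the `is_clean` flag, and the sequence numbers are hypothetical. The sketch only assumes that rolling a restored database forward needs an unbroken chain of readable logs, so recovery stops at the first corrupt or missing one, even when later logs are intact.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ArchiveLog:
    sequence: int   # hypothetical log sequence number
    is_clean: bool  # hypothetical result of a corruption check


def last_recoverable_sequence(logs: List[ArchiveLog]) -> Optional[int]:
    """Return the last sequence reachable through an unbroken chain of clean logs.

    Returns None when not even the first log can be applied.
    """
    last_good = None
    expected = None
    for log in sorted(logs, key=lambda entry: entry.sequence):
        if expected is not None and log.sequence != expected:
            break  # a gap in the chain: later logs cannot be applied
        if not log.is_clean:
            break  # a corrupt log: recovery stops just before it
        last_good = log.sequence
        expected = log.sequence + 1
    return last_good


if __name__ == "__main__":
    logs = [
        ArchiveLog(101, True),
        ArchiveLog(102, True),
        ArchiveLog(103, False),  # corrupt
        ArchiveLog(104, True),   # intact, but unreachable past 103
    ]
    print(last_recoverable_sequence(logs))
```

In this toy sequence, everything written after log 102 is unrecoverable even though log 104 is intact, which mirrors how one stretch of corrupt logs can cost days of data.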
The Utah Education Network obviously backs up its data, but a simple continuous backup may not be enough. Is it possible to transfer the information regularly to a separate server where corruption cannot spread? Some data entered between scheduled transfers might still be lost, but this would at least limit the loss to less than one day’s worth, a great improvement. If the data is transferred manually each night and the connection between the auxiliary server and the main servers is then cut, the copy cannot be unexpectedly corrupted. The data could also be checked for corruption as it is transferred.
— Patrick Swartout Dec 11, 12:14 PM #
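Editorial illustration: the idea in the comment above of checking data as it is transferred can be sketched as a copy followed by a checksum comparison. This is a minimal sketch under stated assumptions, not a description of any UEN system; the function names `sha256_of` and `copy_and_verify` are hypothetical, and it assumes the backup is an ordinary file.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large backups need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_and_verify(source: Path, destination: Path) -> bool:
    """Copy source to destination; return True only if the checksums match."""
    expected = sha256_of(source)
    shutil.copyfile(source, destination)
    return sha256_of(destination) == expected
```

A check like this only catches corruption introduced during the copy. If the source file was already damaged before it was hashed, as in the failure Scott Allen describes, the copy would verify cleanly, so a content-level integrity check on the source would still be needed.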