May 7, 2008
In Wikipedia, Length Matters
A new study found that in Wikipedia, word count can be used to predict article quality.
Joshua E. Blumenstock at the University of California at Berkeley analyzed articles to see if he could predict whether an article was “featured” on Wikipedia’s homepage, which would indicate that it had received extra vetting from top editors to verify its exceptional quality. He looked at 100 variables that might correlate with whether an article ended up featured, such as number of citations, readability metrics, and counts of one-syllable words.
He found that using word count alone, he could predict with 97% accuracy whether an article was featured or not. Considering the full “kitchen sink” of all 100 variables only improved his accuracy slightly to 97.99%. The magic word-count cut-off seemed to be 1,830 words, above which articles were likely to be higher-quality, featured entries. Mr. Blumenstock speculated that the collaborative nature of Wikipedia may force longer articles to be higher quality.
Still, “[f]eatured articles are meant to be ‘the best that Wikipedia has to offer’; these results indicate that they might merely be the longest Wikipedia has to offer,” he wrote. “The high degree to which word count can approximate Wikipedia’s elaborate peer-review process is somewhat unsettling.”—Catherine Rampell
This isn’t “unsettling” at all… it’s exactly what you would expect in a collaborative editing environment.
A Wikipedia article can only become long if it is of the highest quality. Why? Think about the process: if someone adds something of poor quality, it’s sure to be deleted quickly. Changes only ‘stick’ if they are good… which means that article length only sticks if it is good.
This should be neither surprising nor unsettling.
— Brad May 7, 04:34 PM #
It is far more “unsettling” that C.V. length predicts academic rank. Researchers crazy enough to spend the long years of effort needed to solve a deep problem are unlikely to receive promotions, because they have failed in their duty to pollute the academic environment with hundreds of write-only books, papers, and other vita lengtheners.
— S. Britchky May 7, 07:15 PM #
This is a misleading summary of the paper’s findings. Basically, the study found that, for a sample including all the featured articles (about 1500) and a random sample of other articles (about 9500) more than 50 words long, article length gave 97% accuracy in sorting the featured from the random.
All this shows is that the vast majority of random Wikipedia articles are much shorter than the typical Featured Article.
Given the entire Wikipedia dataset (where the featured to not-featured ratio is about two orders of magnitude smaller), length alone would not make a good predictor of featured status.
Unless I am seriously misinterpreting the methodology, the conclusion of the paper is totally unsupported, in that the data set had 13% featured articles instead of .1% like in the actual Wikipedia.
There are very (very) few Featured Articles shorter than 1830 words, and apparently only about 3% of all articles are above 1830 words (or were as of July 2007; average article size is growing).
By my back of the envelope calculation, this method (classifying all articles over 1830 words as featured) would have about 5% accuracy if applied randomly to the entire Wikipedia database.
Extremely shoddy work.
— Sage Ross May 7, 07:56 PM #
Comments like those from Sage Ross and Brad are what I love about collaborative environments, and why I always look at the Wikipedia discussion pages after reading an article when I want a take on the quality of the work presented. Together we are often smarter than we are individually.
— DW May 7, 08:10 PM #
I did a bit of math wrong. The last two paragraphs of my previous post should be:
“There are very (very) few Featured Articles shorter than 1830 words, and apparently only about .5% of all articles are above 1830 words (or were as of July 2007; average article size is growing).
By my back of the envelope calculation, this method (classifying all articles over 1830 words as featured) would have about 24% accuracy if applied randomly to the entire Wikipedia database.”
Assume that all Featured Articles are longer than 1830 words (this is very nearly true, possibly exactly true). Assume that .5% of all non-featured articles are longer than 1830 words. Using a sample of 1550 featured articles and 9550 non-featured articles, one expects length to successfully predict featured articles with 97% accuracy. (That is 1550 featured plus 48 non-featured articles longer than 1830 words, so 1550/1598 = 97%.)
However, applying that test to a random article in the Wikipedia database, where there are (by Blumenstock’s sorting methods) one million non-featured articles but still only 1550 featured articles, would mean 5000 non-featured articles over 1830 words, for a 1550/6550 = 24% success rate in predicting featured articles.
— Sage Ross May 7, 08:31 PM #
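The back-of-the-envelope arithmetic in Sage Ross's corrected comment can be checked with a short sketch. It takes the comment's own assumptions at face value (every featured article is over the 1830-word cutoff, and .5% of non-featured articles are too); the function name `featured_precision` is made up here for illustration and does not come from the paper.

```python
def featured_precision(n_featured, n_nonfeatured, frac_long_nonfeatured):
    """Share of articles over the word-count cutoff that are actually featured.

    Assumption taken from the comment: every featured article is over the
    cutoff, so all of them count as correct "featured" predictions.
    """
    true_positives = n_featured
    # Non-featured articles that are also over the cutoff get mislabeled.
    false_positives = n_nonfeatured * frac_long_nonfeatured
    return true_positives / (true_positives + false_positives)


# The study's sample: 1550 featured vs. 9550 random non-featured articles.
study_sample = featured_precision(1550, 9550, 0.005)

# The whole encyclopedia, using the comment's figure of one million
# non-featured articles.
whole_wikipedia = featured_precision(1550, 1_000_000, 0.005)

print(f"study sample:    {study_sample:.0%}")     # 97%
print(f"whole Wikipedia: {whole_wikipedia:.0%}")  # 24%
```

The only thing that changes between the two calls is the number of non-featured articles, which is the point of the comment: the same cutoff looks far less impressive once the featured articles are as rare as they are on the real site.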
Hi Sage,
Thanks for drawing my attention to this post – I had no idea this report would get any sort of press. Anyhow, as I responded in the email I just sent you, there are a couple of things to point out:
1) For the most part, you’re correct. There is nothing earth-shattering in this paper. I found the tight correlation between length and “Featured” surprising. I guess I was getting poetic when I said “unsettling,” cuz my point wasn’t to badmouth Wikipedia, or force any pejorative conclusions on anyone. Brad’s comments are totally correct, and plenty of people won’t find these results surprising. That said, there are a lot of people out there, especially in the research community, that assume that “Featured” articles are the be-all and end-all of quality on wikipedia. A lot of effort is devoted to designing complex methods for predicting which articles will be featured. My point is simple: if you want to classify between featured and not, length is enough. If you want to do something more sophisticated, you need to look beyond featured articles for a proxy for quality.
2) I think your back of the envelope calculations are correct, but you’re glossing over different meanings of the word “accuracy.” There’s (a) accuracy of predicting featured articles, (b) accuracy of predicting non-featured articles, (c) overall accuracy, and (d) more advanced metrics such as F-measures, kappa statistics, etc. (see my WWW2008 paper for more info on these numbers). In this report I was mostly referring to (c), and this motivated my use of 1500/9500 articles, instead of a more representative sample, in which case it would be trivial to maximize (c) because you could classify everything as not-featured (although this is where the kappa statistic is useful). If we applied my test to the entire Wikipedia, using your .5% statistic, the (c) accuracy would be roughly 99.5% – again a trivial result. The accuracy you cite of 24% refers to (a) – which is an important thing to think about, but it isn’t what most researchers have been trying to predict, and wasn’t the focus of my work.
3) I would be interested to know how well normal people could differentiate between long articles that weren’t featured, and long articles that were. A truly interesting test would be to see if a computer could do better than a human in that task. My guess is that the computer could do as well as the human, using methods like those in my paper; I’m guessing you would put your money on the humans. Unfortunately, I didn’t have the time/resources to put together a human panel to make the comparison.
Again, this is a simple paper pointing out a simple fact. My personal conclusion was that I will place a little less blind faith in the holy status of featured articles, but that is just my conclusion, and not one implied by the research. The research merely establishes the correlation. What you conclude is up to you, even if your conclusion is that there was no point in doing the research to begin with ;)
— Josh Blumenstock May 8, 02:40 AM #
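The different meanings of “accuracy” in point 2 of the reply above can be put side by side in a small sketch. The counts are the thread's own rough figures (1550 featured articles, one million non-featured, .5% of the latter over the cutoff), not numbers from the paper. Reading (a) as precision on the featured class is an interpretation, and the kappa shown is the standard two-class Cohen's kappa, which may differ from the variant used in the paper. The `summarize` helper is hypothetical.

```python
def summarize(tp, fp, fn, tn):
    """Basic metrics from a 2x2 confusion matrix (featured = positive class)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n                            # overall accuracy, (c)
    precision = tp / (tp + fp) if (tp + fp) else None   # one reading of (a)
    recall = tp / (tp + fn) if (tp + fn) else None
    # Cohen's kappa: agreement beyond what matching the class frequencies
    # by chance would give.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "kappa": kappa}


# Length cutoff applied to the whole encyclopedia, under the thread's
# assumptions: all 1550 featured articles pass, as do 5000 non-featured ones.
length_rule = summarize(tp=1550, fp=5000, fn=0, tn=995_000)

# Trivial baseline: label every article "not featured".
all_negative = summarize(tp=0, fp=0, fn=1550, tn=1_000_000)

for name, result in [("length rule", length_rule),
                     ("all not-featured", all_negative)]:
    print(name, result)
```

Under these assumptions the do-nothing baseline has a slightly higher overall accuracy than the length rule, while its kappa is zero. That is the sense in which overall accuracy becomes a trivial number on a heavily imbalanced sample, and why the 24% and 99.5% figures in the thread do not contradict each other.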
Unless you actually vetted the CONTENT of the articles, how can you make any conclusion about their QUALITY?
— RobJ May 8, 09:41 AM #