• Tuesday, May 29, 2012
Author Topic: "Google Books Mutilates the Printed Past"  (Read 3914 times)
diefluffykitty
Junior member
Posts: 90


« on: June 18, 2009, 10:46:36 AM »

http://chronicle.com/free/v55/i39/39b00401.htm

I thought this article, by Ronald G. Musto, was going to be some sort of anti-corporate or anti-tech whine. (In fact, it is largely about Ronald G. Musto.) But near the end he finally gets around to his topic, and I was surprised to see his specific complaint about Google Books. Perusing one incredibly rare book that has been scanned, he found:

The books' pages were hurriedly reproduced: No apparent quality control was employed, either during or after scanning. The result is that 29 percent of the pages in Volume 1 and 38 percent of the pages in Volume 2 are either skewed, blurred, swooshed, folded back, misplaced, or just plain missing. A few images even contain the fingers of the human page-turner. (Like a medieval scribe, he left his own pointing hand on the page!)  . . . .

A random spot-check of other Google-scanned books has yielded some better results, but the general drift is clear: good enough for our mutilated view of the past, rushed through the scanning process so that Google could lay claim to as many artifacts of our cultural past in as short a time and with as small a budget as possible.


I wonder how many of you have found similar things? Not the odd scanned finger, but books that were wholly or partially unreadable because of the sort of sloppy human scanning errors Musto describes? I have not found any pages like that, but I have not done any serious research in Google Books, though I often play with it.

balancing_act
Irritable, cranky, and non-smoking
Distinguished Senior Member
Posts: 2,034

I come to the Fora to learn snark.


« Reply #1 on: June 18, 2009, 10:52:58 AM »

I have done serious research in Google Books and have run into what he describes exactly once, when Google Books was first launched. I thought it was sloppy: I did see a human finger, bent pages, blurred pages, and so forth. I haven't seen anything that bad since, but some of the older works that you can download aren't the best quality.

"Which of these stories will you be talking about tomorrow?"
bmljenny
New member
Posts: 7


« Reply #2 on: June 18, 2009, 03:29:16 PM »

Overall I would say the quality I've seen in Google Books is pretty good, and excellent compared to the quality of "legacy" scan formats like micro-opaques. Having said that, there are some problems, and it would be good if there were an easy way to report quality problems and know that someone would address them. I get that Google really views this as a numbers game to get their hands on as much text as possible, and that quality and preservation are not their main concerns, so maybe that reporting mechanism would have to be directed back to the owning library, which would be unfunded workload for them to deal with. One thing I find annoying is that the OCR has been run on the entire scanned page, not just the text, so you get text matches on property stamps and date-due slips.

watermarkup
Distinguished Senior Member
Posts: 1,431


« Reply #3 on: June 18, 2009, 11:09:03 PM »

I've run into this problem several times. A related issue is that the scan resolution is too low for the very fine print you find in some 19th-century reference works. The biggest problem, though, is that the publication metadata Google makes available is horribly inadequate. You can't search by publisher, and the volumes of multi-volume works get scattered all over the place.
sciencephd
Distinguished Senior Member
Posts: 6,040


« Reply #4 on: June 18, 2009, 11:26:56 PM »

This is my first encounter with Dr. Musto.   What a self-massage-in-mirror piece of work he must be. 

I just hate it that I constantly have to like everyone and everything. -- moonstone

O, what a hateful feminist concoction!
Jews, communists, "lesbians", feminists and marihuana addicts  --Pyshnov
systeme_d_
Distinguished Senior Member
Posts: 11,580

ஜ۩۞۩ஜ


« Reply #5 on: June 18, 2009, 11:33:35 PM »

When I am away from my office and cannot consult the actual (dozens of) volumes of a particular 19th-century work I use constantly in my research, I have used GoogleBooks. This particular scanned set of books pisses me off for two reasons. The first is that the front matter of each volume is evidently set in a typeface that doesn't scan well: the same pages in every volume are blurred to indecipherability.

Second, there is recent handwriting in the margins of the volumes they chose to scan! Who in the world would write in these relatively rare (and definitely expensive) volumes is completely beyond me, but the second question is, of course, why would GoogleBooks scan these defaced volumes?

Finally, and of the least concern, there is some weird blurry digitization stuff at the bottom of many pages.  Thankfully, this does not detract from the content at all.
« Last Edit: June 18, 2009, 11:34:58 PM by systeme_d »

sciencephd
Distinguished Senior Member
Posts: 6,040


« Reply #6 on: June 18, 2009, 11:38:52 PM »

Google books had a contract with a handful of libraries.  They scanned the books therein.  Simple as that, unless their model changed more recently than the articles I've read about it.

Most of the books classified as not rare or fragile were scanned in a completely automated fashion using robotics (with automatic page turners).

Clearly there are many weaknesses, but it is free. Imagine the cost if the government had funded it, as well as the time it would have taken. They would still be planning it. Compare it, for example, to the digital projects at the Library of Congress.
peppergal
Distinguished Senior Member
Posts: 1,107


« Reply #7 on: June 18, 2009, 11:53:17 PM »

On the occasions I have used Google books I have also encountered blurred typeface, marginalia, missing pages, and incomplete scans (like only the top half of the page).  German Fraktur is particularly blurry, in my experience (not that it's particularly easy to read in the original, but the digitized version is much worse).  It would be nice if they had some sort of better quality control, but I still think it's valuable.
polly_mer
Distinguished Senior Member
Posts: 30,222

hiding out from my grading. Shhh!


« Reply #8 on: June 20, 2009, 08:20:45 AM »

The biggest scanning problems I have encountered are blurred pages and out-of-focus type.

The bigger problem I have with Google Books is that some lazy people have taken to citing only what they can get through Google Books rather than the best sources. That's not Google's fault, but I'm annoyed by the hoopla surrounding this "fantastic" new resource, which in my field has only 50-year-old junk books that no one should be citing in place of the 20- to 40-year-old reference books that people use all the time.

If you haven't got either the anatomical or metaphorical balls to post your own question on a pseudonymous internet forum, then academia is the wrong job for you.