Researching the Recent Past Online

[This is a guest post by Dan Royles, a lecturer at the University of Angers in western France, where he teaches American Studies and English as a foreign language. He's previously written on "Digital Workflows for the Archives" for ProfHacker. You can find him online at, or follow him on Twitter at @danroyles.--@JBJ]

When I was writing my dissertation on African American AIDS activism, I ran into the problem that plagues many historians of the recent past: lack of archival sources. I had identified a handful of interesting stories to anchor my five chapters, but several of them involved organizations from the 1990s and 2000s that had left little in the way of a paper trail. I solved some of my problem with an oral history project, which filled in some major gaps while creating a set of resources that will be useful for the future. However, I didn’t want to rely on oral histories alone. Memory is faulty, and although the narrative choices that people make in telling their stories are often instructive, without a set of corroborating sources it can be hard to piece together even a rough chronology of what “actually” happened.

This is why I was overjoyed to stumble on the Internet Archive’s Wayback Machine. ProfHacker readers may already be familiar with the Internet Archive’s TV News Search & Borrow from this earlier post by Anastasia Salter. Since 1996, the Wayback Machine has been crawling the Internet, caching websites as it goes. Depending on Internet traffic, the Wayback Machine may return to any given site numerous times, storing numerous snapshots over the years. Since my work deals with recent social movement and non-profit groups, I found this enormously useful. Armed with a group’s past URL, I could easily click through a timeline of cached pages to see how their site—and thus their language, programs, and membership—changed over time. I had heard of the Wayback Machine before, but hadn’t connected it to my research, much less to the particular problem I was facing. Nevertheless, it turned out to be my saving grace when writing about some organizations in the recent past for which few materials exist in traditional collections.
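For anyone who would rather pull a list of snapshots than click through the timeline by hand, here is a minimal Python sketch. It assumes the Wayback Machine's public CDX endpoint at web.archive.org/cdx/search/cdx and its JSON output, in which the first row holds the field names. The function names and the example address are hypothetical, and this is not the workflow described above, just one way to automate it.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public endpoint for the Wayback Machine's capture index.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(site, first_year, last_year):
    """Build a query URL listing captures of `site` between two years."""
    params = {
        "url": site,
        "from": str(first_year),
        "to": str(last_year),
        "output": "json",
        "filter": "statuscode:200",  # skip redirects and error pages
        "collapse": "digest",        # drop adjacent captures with identical content
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_cdx_rows(rows):
    """Turn CDX JSON rows (first row = field names) into snapshot records."""
    if not rows:
        return []
    header, captures = rows[0], rows[1:]
    snapshots = []
    for row in captures:
        record = dict(zip(header, row))
        # Each capture can be viewed at /web/<timestamp>/<original URL>.
        record["snapshot_url"] = "https://web.archive.org/web/{}/{}".format(
            record["timestamp"], record["original"])
        snapshots.append(record)
    return snapshots


def list_snapshots(site, first_year, last_year):
    """Fetch and parse the capture list (needs network access)."""
    with urlopen(build_cdx_query(site, first_year, last_year)) as response:
        return parse_cdx_rows(json.loads(response.read().decode("utf-8")))
```

Calling `list_snapshots("example.org", 1996, 2005)` with a real address in place of the placeholder should return one record per distinct capture, each with a `snapshot_url` you can open in a browser or hand to Zotero.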

Of course, the Wayback Machine isn’t a magical window onto the Internet of old. For some sites, weeks or months may separate the webcrawler’s snapshots, so the archive doesn’t reflect every change to a given site. Moreover, there are broken links, missing images, and more bad web design (white Comic Sans on a black background, anyone?) than you can shake a stick at. But for getting at the textual meat of an otherwise dead webpage, it’s invaluable. [1]
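To get a feel for how patchy the coverage of a given site is, you can measure the gaps between captures. The sketch below assumes capture times arrive as 14-digit strings (YYYYMMDDhhmmss), the form the Wayback Machine uses in its snapshot addresses. The function names are hypothetical.

```python
from datetime import datetime


def capture_gaps(timestamps):
    """Return (start, end, days) for each interval between consecutive captures.

    `timestamps` are 14-digit strings such as "19990125000000" (assumed format).
    """
    moments = sorted(datetime.strptime(t, "%Y%m%d%H%M%S") for t in timestamps)
    return [(a, b, (b - a).days) for a, b in zip(moments, moments[1:])]


def longest_gap(timestamps):
    """Return the widest interval between captures, or None if fewer than two."""
    gaps = capture_gaps(timestamps)
    return max(gaps, key=lambda gap: gap[2]) if gaps else None
```

A long gap is a reminder that anything the site published and withdrew in that window may simply be missing from the archive.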

Another strategy I found to be almost stupidly useful: Internet searches of e-mail addresses. Combing through my relevant Wayback picks, I’d find old e-mail addresses for the actors in my story. Simply dropping those into my search engine of choice turned up a whole other treasure trove of old press releases, Usenet lists, and forums that still live online. For example, one of my e-mail searches turned up this gem, a chain of e-mail correspondence among black gay artists and intellectuals upon the death of the black gay writer Essex Hemphill. In his introduction to the 2000 edition of Hemphill’s Ceremonies, African American and queer studies scholar Charles Nero recounts that Hemphill’s mother tried to expunge any trace of her son’s sexuality from his funeral service, maybe even destroying his personal papers, which were slated for donation to the New York Public Library’s Schomburg Center for Research in Black Culture. [2] Going beyond Nero’s recollection, this e-mail chain gives us a sense of Hemphill’s friends’ real-time reaction to his death, including such bon mots as Colin Robinson’s wry comment about black gay men using Audre Lorde quotes to build their e-mail “signature prose pieces.”
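If you have saved the text of a batch of archived pages, harvesting the addresses can be automated too. Here is a rough Python sketch: the pattern is deliberately loose (fine for gathering leads, not for validating addresses), the function names are hypothetical, and the quoting step assumes your search engine treats a quoted string as an exact phrase.

```python
import re

# Loose pattern: a local part, an @ sign, and a dotted domain name.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def find_email_addresses(page_text):
    """Return the unique e-mail addresses in `page_text`, in order of appearance."""
    seen = set()
    addresses = []
    for match in EMAIL_PATTERN.finditer(page_text):
        address = match.group(0).lower()  # treat differently-cased copies as one
        if address not in seen:
            seen.add(address)
            addresses.append(address)
    return addresses


def as_exact_search(address):
    """Wrap an address in double quotes for an exact-phrase search."""
    return '"{}"'.format(address)
```

Each quoted address can then be pasted into a search engine one at a time, which is the manual step described above.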

Finally, as a Zotero user, I’ve found the Readability plugin for Chrome to be invaluable. (See Amy Cavender’s previous post on Readability and Zotero.) Occasionally I run across a web page that Zotero can’t “read,” meaning that it can’t take a snapshot. However, for whatever reason, Readability lets Zotero read these pages, making it easy to grab the site’s address, a snapshot (of the text, anyway), and basic metadata such as the time and date of access. Evernote users obviously needn’t bother with such a barbaric web-clipping practice, but for those who prefer Zotero, this has proven to be an effective workaround.

Of course, folks who do current research on the Internet and social media have their own sophisticated set of tools for stripping Twitter data, doing online ethnography, and the like. But for scholars working on the-recent-past-which-is-not-quite-the-present, this gives us a starting point for research methods on the receding digital present as it flows ever further into the past.

Have you come up with your own tips and tricks for doing this kind of work? Leave them in the comments below!

[1] (As an aside, anyone interested in the recent histories of sex, gender, and social movements should take a look at the web pages and mailing lists hosted by Philadelphia AIDS activist Kiyoshi Kuromiya and his Critical Path Project.)

[2] This, for a man who had once written, “When I die,/Honey chil’,/my angels/will be tall/Black drag queens.” (Essex Hemphill, “The Tomb of Sorrow,” in Brother to Brother: New Writings by Black Gay Men, ed. Essex Hemphill (Boston: Alyson Publications, 1991), 75-83.)

Photo “Pews of Statues at the Internet Archive” by Flickr user Eric Fischer / Creative Commons licensed BY-2.0
