[Editors’ note: this is a draft that Mark Sample uploaded to ProfHacker last week. We have been unable to contact Mark for the final revisions, so we are posting it as-is. Our apologies for any errors.]
In late 2012 Twitter began rolling out a long-requested feature: a complete archive of your public (non-DM) Twitter activity, from your very first tweet up to the moment you request the archive from your Twitter Settings page. Shortly after you submit your request, Twitter emails you a unique link to a zipped folder. Download the folder, unzip it, and open the index.html file in your browser: there’s your complete archive, organized by month and year and fully searchable.
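If you’d rather script that download-and-unzip routine, a few lines of Python will do it. This is a minimal sketch: the function name extract_archive is my own invention, and it assumes index.html sits at the top level of the zip file rather than inside a subfolder.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Unzip a downloaded Twitter archive and return the path to its index.html.

    Hypothetical helper. Assumes index.html sits at the top level of the zip.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as zf:
        # extractall creates dest (and any subfolders) as needed.
        zf.extractall(dest)
    index = dest / "index.html"
    if not index.is_file():
        raise FileNotFoundError(f"no index.html found in {dest}")
    return index
```

Something like webbrowser.open(extract_archive("tweets.zip", "archive").resolve().as_uri()) would then open the archive in your default browser; the resolve() call is there because as_uri() needs an absolute path.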
I was initially skeptical, half expecting the official Twitter archive to be a plain text file stripped of any metadata, but I must admit that the archive is quite robust. All the metadata of every tweet is there (twice, in fact, as the archive includes both CSV and JSON files of your tweets, broken down by month). And because the archive is powered by HTML and JavaScript, you can easily turn around and upload your complete archive to your own website. Here’s mine: every single tweet, from when I joined Twitter in August 2007 to January 17, 2013, when I requested my archive.
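The monthly JSON files are easy to work with outside the browser, too. Below is a minimal Python sketch that loads them into one list of tweets per month. It rests on an assumption about the archive layout that the description above doesn’t confirm: that the monthly files live in data/js/tweets/ with names like 2013_01.js, and that each begins with a JavaScript assignment (something like Grailbird.data.tweets_2013_01 =) followed by a plain JSON array. Adjust the path if your archive is organized differently.

```python
import json
from pathlib import Path


def load_monthly_tweets(archive_dir):
    """Return {"YYYY_MM": [tweet dicts]} from an unzipped Twitter archive.

    Assumes (not confirmed) that monthly files live in data/js/tweets/
    and each starts with a JavaScript assignment before a JSON array.
    """
    tweets_dir = Path(archive_dir) / "data" / "js" / "tweets"
    months = {}
    for path in sorted(tweets_dir.glob("*.js")):
        text = path.read_text(encoding="utf-8")
        # Drop everything up to the first "=", leaving only the JSON array.
        _, _, payload = text.partition("=")
        months[path.stem] = json.loads(payload)
    return months
```

Calling load_monthly_tweets("archive") on the folder you unzipped should give you one list per month, in chronological order, so counting your tweets by month is just a matter of taking len() of each list.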
The main drawback of the official Twitter archive (compared with backup methods such as ThinkUp or Google Docs) is that your archive can go “stale,” and fast. Any tweet you post after downloading your archive is (and I hope this is obvious) not in that archive.
It is true that after a month has gone by, you can request a new archive, download it, unzip it, and so on. But what if you could automate the process of updating your archive and do it daily?
Enter Martin Hawksey’s ingenious Google Spreadsheet script for keeping your Twitter archive fresh. Hawksey (whose TAGSExplorer is another terrific tool for working with Twitter) uses Google Apps Script and Google Drive to create a Twitter archive “seeded” by the official archive you’ve downloaded, which is then updated daily with your newest tweets. The archive is hosted on Google Drive, which can now (and this is a new feature) double as a website host. Martin provides a thorough set of instructions as well as a video walkthrough.
I recommend watching the video, as Martin highlights a few subtle but important steps (like authorizing the system through Twitter’s OAuth process). I also ran into a problem: my Google Drive was set to convert any files I uploaded into Google Docs format, which makes them unviewable as a website (my index.html file, for example, was converted into a word processing document). Once I turned off this conversion setting, I was good to go.
Now, in addition to the currently stale Twitter archive I’ve uploaded to my own site, I have a fresh archive hosted on Google. And the best thing is, the updating process is totally automated. Every 24 hours my latest tweets show up there, seamlessly, magically.
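Hawksey’s script handles that daily update for you, but the core idea is simple enough to sketch. Here is a hypothetical Python version (not Hawksey’s code, which runs as a Google Apps Script) that folds newly fetched tweets into an archive seeded from the official download. It assumes every tweet is a dict carrying an id_str field, as in Twitter’s JSON, and uses that ID as the key so a tweet fetched twice is stored only once.

```python
def merge_new_tweets(archived, fetched):
    """Fold freshly fetched tweets into an archive seeded from the download.

    Hypothetical sketch, not Hawksey's implementation. Tweets are dicts
    with an "id_str" field; on overlap, the already archived copy wins.
    """
    by_id = {int(t["id_str"]): t for t in archived}
    for tweet in fetched:
        # setdefault keeps the existing entry if this ID is already archived.
        by_id.setdefault(int(tweet["id_str"]), tweet)
    # Tweet IDs grow over time, so ascending ID order is roughly chronological.
    return [by_id[key] for key in sorted(by_id)]
```

Keying on the ID rather than on timestamps means overlapping fetches are harmless. The largest ID in the merged list is also the natural value to pass as since_id (a real parameter of Twitter’s user_timeline API) on the next day’s request, so each run only asks for tweets the archive doesn’t already have.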
fresh market photograph courtesy of Flickr user Robert S. Donovan / Creative Commons Licensed