• May 20, 2013

A Gentle Introduction to Version Control

March 25, 2010, 11:38 am

Here at ProfHacker we’ve written a lot about backups, but never about version control. In fact, when I recently wrote “A Few Ways to Back Up Your Website”, I specifically said “I’m not going into things like version control software.” You see, for a lot of people there’s something about the phrase “version control” that makes it sound all super high tech, possibly scary, and definitely something only software developers would need to use. Well, it is pretty high tech on the back end, and software developers do use it, but it’s not all that scary—look at the tree-eating mascot for the free and open source, distributed version control system called Git, used in this post. It’s not all that scary, is it?

So maybe it is scary and high-tech to some, but that doesn’t mean non-software developers shouldn’t or can’t use it. In fact, a lot of you (non-software developers) probably have used version control before, and with positive results, as the processes are built into many popular tools.

For example, Google Docs features revision history for all documents. When I first wrote about using Google Docs in the classroom I mentioned that the students and I used the revision history while working through drafts of their paper—sometimes to step through the drafts to visualize the changes as they occurred over time, and sometimes to compare one revision to another (side by side).

Compare the Google Docs method of version control, in which Google force-saves (or you can manually save) revisions as you work, with maintaining a directory of files named doc1.doc, newdoc.doc, revision.doc, paper9.doc and so on.

Or, suppose you don’t have any version control methods in place. Suppose you are writing a dissertation and you have a file called diss.doc in which you keep writing and writing and writing. You store multiple copies on external drives and perhaps even in the cloud. But what if your dissertation director says “I really liked that section you wrote last August but that we decided to cut out. Let’s put that back in there.” What do you do? With version control software and processes in place, you can say “oh sure, I’ll go grab that version and work it back in” instead of “[gulp]” and “[expletive]”.

If you’ve used Google Docs, you’ve used version control. If you’ve used a wiki of any type, you’ve used version control (the “history” of a page). If you’ve used WordPress or some other blogging platforms, you’ve used version control (when editing a post, you can see all the auto-saves and manually-saved versions).

There’s More to Version Control than Just Revision History, Isn’t There?

Yes, there is. But the title says “gentle introduction” after all, and I didn’t want to jump right in with terms like “branching” or “atomic operations”.

I’ve described some tools that integrate version control by maintaining backups of your digital objects such that you can retrieve versions of them at any point in their creation. But full-fledged version control software works more functionality and terminology into the mix, such as:

  • commit/checkin and checkout: when you put an object into the repository (there are nuances to this, but go with it for now), you are committing that file; when you check out a file, you are grabbing it from the repository (where all the current and historical versions are stored) and working on it until you are ready to commit, or check in, the file again.
  • branch: the files you have under version control can branch or fork at any point, thus creating two or more development paths. Using a non-software example, suppose you are creating an assignment for the subject that you teach. Suppose you teach a lower-division course and an upper-division course in the same subject. You might have started with one master document of information but then forked it for the lower-division class and the upper-division class, continuing to develop them independently. If you continued developing the master document, the one you started with, you would be working with the trunk.
  • change/diff: this is just the term (change OR diff) for a modification made under version control. You might also hear “diff” used as a verb, as in “I diffed the files,” to refer to the action of comparing two versions of an object (there is an underlying UNIX command called “diff”).
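To make “diff” a little more concrete, here is a minimal sketch in Python using the standard library’s difflib module. The two drafts and the diff_drafts helper are made up for illustration; a real version control system does this comparison for you, and its output may look a bit different.

```python
import difflib

# Two hypothetical drafts of the same paragraph, one line per list item.
draft_1 = [
    "Version control is scary.",
    "Only software developers need it.",
]
draft_2 = [
    "Version control is not that scary.",
    "Only software developers need it.",
    "Writers can use it too.",
]


def diff_drafts(old, new):
    """Return the unified diff between two drafts as a list of lines."""
    return list(
        difflib.unified_diff(
            old, new, fromfile="draft_1", tofile="draft_2", lineterm=""
        )
    )


changes = diff_drafts(draft_1, draft_2)
print("\n".join(changes))
```

In the printed result, lines that start with a minus sign exist only in the first draft, and lines that start with a plus sign exist only in the second.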

There are many more terms than just these few listed above, but if you can conceptualize the repository, the (local) working copy, and the process of checking in and checking out files, then you are well on your way to implementing version control for your digital objects.
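If it helps to see the repository, the working copy, and checking in and out as something you can poke at, here is a toy sketch in Python. The ToyRepository class and everything in it are hypothetical names invented for this example. It keeps a whole copy of every version in memory, which is an assumption made for simplicity and not how Git or Subversion actually store things.

```python
class ToyRepository:
    """A toy in-memory repository, for illustration only."""

    def __init__(self):
        # Each branch name maps to its list of committed versions, oldest first.
        self.branches = {"trunk": []}

    def commit(self, text, branch="trunk"):
        """Check in a new version; return its revision number (starting at 1)."""
        self.branches[branch].append(text)
        return len(self.branches[branch])

    def checkout(self, revision=None, branch="trunk"):
        """Return a working copy of the given revision (the latest if omitted)."""
        history = self.branches[branch]
        if not history:
            raise ValueError(f"nothing committed on branch {branch!r}")
        if revision is None:
            revision = len(history)
        if not 1 <= revision <= len(history):
            raise ValueError(f"no revision {revision} on branch {branch!r}")
        return history[revision - 1]

    def branch(self, new_name, source="trunk"):
        """Fork a new line of development that starts from the source's history."""
        if new_name in self.branches:
            raise ValueError(f"branch {new_name!r} already exists")
        self.branches[new_name] = list(self.branches[source])


# The assignment example: one master document, forked for two courses.
repo = ToyRepository()
repo.commit("Assignment: master version")
repo.branch("lower-division")
repo.branch("upper-division")
repo.commit("Assignment: lower-division version", branch="lower-division")
repo.commit("Assignment: master version, revised")  # work continues on the trunk

first_draft = repo.checkout(revision=1)  # grab an older version from the trunk
latest_lower = repo.checkout(branch="lower-division")
print(first_draft)
print(latest_lower)
```

The point of the sketch is only that nothing is ever overwritten: every commit adds to the history, so any earlier version on any branch can be checked out again later.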

Two Open Source Version Control Systems: Subversion and Git

Although there are several different version control systems available for use—some free and open source and some proprietary—two of the most popular systems are Subversion and Git.

If you have a web hosting service that allows you to install Subversion, then you can create your own repository and use a Subversion client to connect to it.

But an increasingly popular tool is Git, which takes a decentralized approach to version control and comes with numerous tools and hosting options for users who want to get started with a repository but don’t necessarily want/need/understand all the extra installation and maintenance overhead that goes with it.

For anyone wanting to get started with version control, I recommend Git. But first I recommend viewing the slideshow below:

You may also find these Git tutorials helpful.

As I was preparing this gentle introduction—which will lead to more specific discussions and how-to posts if there is interest—@benwbrum (Ben Brumfield) offered the following comments regarding version control:

Based on my own experience as a developer and two years of work in a university helpdesk, I’m convinced that researchers need to be comfortable with source control tools.

Think of the non-programming use cases: I can see the differences between the four revisions to my THATCamp application. And since the differences live in the repository, my letter is a text file: no worries that the eventual recipient will turn on “track changes” and see the embarrassing typos in the first rev.

Why fool with key fobs you might run through the laundry? I can get at my source code from our laptop, our desktop, AND from the server I’m deploying it on.

It’s easier to collaborate via an SCM system than by passing emails around, or the like. Systems like GitHub (which arrived after the original email) integrate social tools into the process, which help you track forking and merging.

All true! Ok, folks—now what do you want to know about integrating version control into your professional life (developer or not)?

This entry was posted in Productivity, Software.

13 Responses to A Gentle Introduction to Version Control

Aaron - March 25, 2010 at 11:52 am

Good post. Versioned backups often are overlooked by end users, sometimes with disastrous results.

If you have a Mac, Time Machine is a perfect example of user friendly versioned backups.
DropBox.com keeps versions of any file you have in your Dropbox folder (Mac & PC). The Mac App ForeverSave (in the current MUpromo.com bundle) offers versioned, auto-save, backups for all the apps on your Mac! Very slick.

Another ProfHacker favorite: http://www.backblaze.com also uses versioned backups.

Versioning should be included in your backup strategy for at least your current working files! If I had a dime for every time someone came into the Apple office with a file that’s been overwritten by a clumsy grad student, or accidentally/incorrectly updated by a confused colleague, I’d have my student loans paid off…

OPIEWeb - March 25, 2010 at 12:06 pm

I wholeheartedly agree that researchers should be getting on the Version Control train. My academic wife and I have this conversation every time she emails a Word Doc to a colleague for editing. No, “Track Changes” is not the same thing.

I have to disagree with your examples, however. Git in a “gentle introduction” to Version Control? If version control in general is analogous to, say, chess (easy to understand, difficult to play), then Git is closer to the Chinese game “Go” (http://en.wikipedia.org/wiki/Go_(game)). I’m a full time software/web developer and even I have a hard time wrapping my head around Git.

Subversion is heading in the right direction; however, I would submit that Mercurial (http://mercurial.selenic.com) would be more suitable for researchers. It’s a different class of version control. Instead of dealing with entire “Versions” of a project, it deals in much more manageable “Change Sets.” See http://hginit.com/ for an introduction. Everywhere the author says “code”, think non-trivial document, dataset, or even statistics package syntax.

Instead of “committing” an entire file you are sending over individual changes which can be accepted or rejected individually or as a whole.

My “Saturday Morning New Software Idea” was for a word processor for larger projects (manuscripts, books, non-trivial stuff that Word falls down on) with Mercurial style version control built in (there is more to it than that – if you want to talk more, hit me up).

Steffen - March 25, 2010 at 12:28 pm

While I use Git myself, I recommend DropBox to my students for backup and versioning in its most simple form. It works fine for most of them, and some even switch to a more powerful tool once they start collaborating with others.

Lincoln Mullen - March 25, 2010 at 10:36 pm

I’ve been hoping for an introduction to version control for digital humanists, so this post is great. I’d like to know more about two things: first, how to use a specific tool, like Git; second, how to use version control for documents like DOCs and ODTs.

Julie Meloni - March 26, 2010 at 9:21 am

First, thanks everyone for commenting on this post. You all seem to have picked up on the mighty rhetorical trick I employed…the one in which I brazenly ignored many things in an attempt to get people to talk about what they use or what they want to learn about. For years I used CVS, then SVN, and now I personally use Git. But I also still use SVN in some places, other services for other clients, etc. I’m no stranger to the system(s). But what I was trying to do here was gauge interest and also gather information from real readers about what you all use in academia or with your students.

With a reader base as varied in both skillsets and academic disciplines as ProfHacker readers are, there wasn’t going to be any way that I could cover all the options. Thus, the post was intended as a general introduction just to the concept, which is where we have to start.

I think this post and its comments show that there is an interest in a more detailed look at the various common systems and also the ability to use/maintain version control with more traditional backup services that people already use (much like how version control is integrated into Google Docs, wikis, etc).

Aaron, OPIEWeb, Steffen, I therefore thank you for your comments and appreciate your support for this sort of technical post/series. Lincoln, I’m going to keep you in my mind as my target audience! :)

OPIEWeb - March 27, 2010 at 11:36 am

@Sean do you have a link for Dan’s comment? I missed it. Thanks.

Sean Gillies - March 28, 2010 at 4:40 am

I was thinking of the first one in particular.

Sean Gillies - March 27, 2010 at 10:08 am

Mercurial, or Hg, is also very good, and has nice new tutorial material at http://hginit.com/. Joel Spolsky recently wrote a post about distributed version control, Mercurial in particular, but also applicable to Git, that argues that the big change here is thinking in terms of changesets instead of versions (http://www.joelonsoftware.com/items/2010/03/17.html). After reading a tweet from Dan Cohen yesterday that scholarship might emulate software development, I wonder if distributed version control doesn’t point to a future of scholarship that’s less about editions and more about “diffs” or edits.

Scott - March 28, 2010 at 1:48 pm

Thanks so much for this post, Julie! I’ve been interested in learning about git for two main reasons, which, from the comments above, sound like pretty common use cases:

1 – I’m approaching my digital humanities project as an amateur with every intention of learning and using programming best practices. Git seems fantastic for keeping good versioned backups of my code and for a variety of collaboration scenarios that are looming on the horizon.

2 – Working on my first revise-and-resubmit, I realized just how good some kind of versioning would be for my dissertation work. (I had the old article, several writing-sample-specific updates, updates that integrated my article in the chapter. Then, there will be at least one more version, the copy edits, etc for the article.) Being able to “branch” certain revisions, when pieces are used for different purposes, seems like it would be worth a learning curve, though I’m not sure if it’s worth writing my diss as .txt files. But Word’s “diff” feature can be a bit spotty, especially if things get rearranged. It feels like this would help my writing discipline (I’m an inveterate, easily distracted tweaker), to think of writing sessions as “commits” with particular purposes. But I’m not sure if I’m dreaming that git or other version control will do more than is actually the case.

I second Lincoln’s suggestions above, then, as candidates for future posts. Thanks!

Heikki - March 29, 2010 at 3:43 am

I’ve learned to use Git in coding projects, and when I recently started doing my PhD (humanities, not comp. sci), using git seemed a natural decision. I’ve started with plain text files, but I’m thinking of moving to use MultiMarkDown, which allows for keeping the “source” files in plain text, and yet creating nice-looking PDF and HTML versions easily.

I think Git works beautifully with a writing project, at least for me. Haven’t done a lot of branching yet, but I imagine it will be very useful with proposals, article versions etc.

I agree with what Sean Gillies wrote above in that we’ll be heading towards a “culture of version control”, where authorship will be more distributed. I just wish there were an easy, wiki-like web frontend for Git repositories, for those who are not interested in learning the command line interface. I have to confess I haven’t looked into the existing Git frontends, but I’m afraid they are not oriented toward writing or commenting, but coding.

Julie Meloni - April 2, 2010 at 11:46 am

Thanks for this, Michael. I’m collecting as many “reports from the field” as I can, so I can tailor the “real” posts about version control to the greatest number of people and in a good order. This is good stuff.

Michael Wojcik - April 2, 2010 at 11:09 am

Karl Stolley gave a nice demo of git at ATTW a couple of weeks ago, and we had a decent discussion afterward. There was some concern about the technical difficulties of git or other revision control systems.

There, and at my own ATTW talk a couple hours later (about using agile development for non-trivial web projects in writing classes), I noted that at MSU we’ve been successful in getting students to use version control for class projects. Specifically, this was a cross-listed 400-level (undergrad/grad) Tech Comm / Writing & Rhetoric class, where the students developed a community website using Ruby on Rails. We used Subversion in the usual software-development mode – students worked in their local sandboxes and committed changes to the server, and the site was periodically updated using Capistrano deployment tasks that pulled the code from the tip of the Subversion trunk. It only took a couple of weeks to get students comfortable with this system.

On another topic: since revision control systems work better with plain text, they play very nicely with LaTeX. LyX plus git or Subversion is a great way to create non-trivial documents with robust versioning and much better typesetting than what you’d get from Word. (LyX driving pdftex with a good font collection and features like microspacing and ligatures enabled is a thing of beauty. You’ll never be able to look at a Word document again without wincing at the horrible kerning.)

I wrote many of my undergrad lit papers in troff with vi and RCS. I’ve yet to find a word processor that could come close to that sort of combination in features, output quality, or price (free, including source).
