Cookies are small pieces of text that websites store in your browser. Most often they support things like shopping carts or personalized settings for a site: for example, they can identify subscribers to a news site or commenters on a blog.
They are also routinely used to track what pages people visit, and in what order. Here’s Wikipedia’s explanation of how this works:
1. If the user requests a page of the site, but the request contains no cookie, the server presumes that this is the first page visited by the user; the server creates a random string and sends it as a cookie back to the browser together with the requested page;
2. From this point on, the cookie will be automatically sent by the browser to the server every time a new page from the site is requested; the server sends the page as usual, but also stores the URL of the requested page, the date/time of the request, and the cookie in a log file.
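The two steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular web server does it: the names `handle_request` and `access_log` are made up for this example, the "log file" is just a list in memory, and the sketch assumes the server logs the very first request as well as later ones.

```python
import secrets
from datetime import datetime, timezone

# Hypothetical in-memory stand-in for the server's log file.
access_log = []

def handle_request(url, cookie=None):
    """Log one page request and return the cookie the browser should keep."""
    if cookie is None:
        # Step 1: the request carried no cookie, so presume a first visit
        # and create a random string to identify this browser.
        cookie = secrets.token_hex(16)
    # Step 2: store the cookie, the requested URL, and the time of the request.
    access_log.append((cookie, url, datetime.now(timezone.utc)))
    return cookie

# The first request has no cookie; later requests send back the one received.
visitor = handle_request("/home")
handle_request("/articles/cookies", cookie=visitor)
handle_request("/about", cookie=visitor)

# The log now reveals which pages this browser visited, and in what order.
pages_in_order = [url for c, url, _ in access_log if c == visitor]
print(pages_in_order)
```

Nothing in the log says who the visitor is by name. The random string is enough to tie the requests together.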
On the one hand, if you’re a web developer, this probably seems like useful information to have. On the other hand, the full implications for privacy come into view when you consider that, on any given page you visit, there might well be multiple web servers leaving cookies on your computer. For example, if the site serves ads, which are probably managed by another company, then there is an excellent chance that the advertising company’s server is placing cookies on your computer. That company’s server can then read those cookies back any time you visit *another* site that uses the same advertising company. Over time, this allows advertisers to build up a remarkably accurate view of who you are and what you’re interested in.
For example, visiting us at ProfHacker means that you’re served ads from such companies as DoubleClick and Mediaplex, both of which are major players in online advertising. Hypothetically, if you subsequently visited InsideHigherEd.com to continue your higher-ed fix, you would also be served DoubleClick ads. As you visit ESPN to get ready for the Premier League season . . . more DoubleClick ads. Checking out the headlines at the New York Times gets you more DoubleClick ads. Plus, ESPN and the Times also share two additional ad providers, so there are even more connections. Looking for a literary fix, you visit the Blog of a Bookslut: DoubleClick! Wired? Your favorite band’s discussion forum? Not only DoubleClick, but now there are increasing connections among the secondary ad providers on different sites, as the advertisers start to exchange more information about you.
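To see why one ad company showing up on many sites matters, here is a rough Python sketch of the cross-site case. Everything in it is hypothetical: the `AdServer` class, the `serve_ad` method, and the `.example` site names are invented for illustration, and passing the embedding site in as an argument is a simplification (a real ad server would typically learn it from the request itself, for instance from the Referer header).

```python
import secrets
from collections import defaultdict

class AdServer:
    """Hypothetical ad server whose ads are embedded on many unrelated sites."""

    def __init__(self):
        # Maps each cookie value to the list of sites where it was seen.
        self.profiles = defaultdict(list)

    def serve_ad(self, embedding_site, cookie=None):
        if cookie is None:
            # First time this browser has loaded an ad from this company.
            cookie = secrets.token_hex(8)
        # The same cookie comes back no matter which site embeds the ad,
        # so every visit lands in the same profile.
        self.profiles[cookie].append(embedding_site)
        return cookie

ad_network = AdServer()
browser_cookie = None  # the browser starts with no cookie for the ad company

for site in ["blog.example", "news.example", "sports.example"]:
    browser_cookie = ad_network.serve_ad(site, browser_cookie)

history = ad_network.profiles[browser_cookie]
print(history)
```

None of the three sites shares anything with the others directly. The ad company's single cookie is what links the visits.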
For a rhetorically bracing illustration of why you should care about this, see DuckDuckGo.
I learned all this by playing around with Collusion (h/t George Siemens), an informative Firefox add-on that visualizes data that’s being shared across websites as you surf the web. Once you start the add-on, it opens a tab in Firefox that constantly updates as the cookies on your browser change. There’s also a demo for people who don’t have Firefox installed.
The plug-in’s author is Atul Varma, who works at Mozilla on the Hackasaurus project. Varma developed the tool for the best of reasons:
I actually didn’t know a lot about tracking myself, so I whipped up a Firefox add-on called Collusion to help me visualize it better. The results were a little unsettling.
The comments to Varma’s post suggest different strategies for managing privacy when browsing online. Melanie Gross at gHacks recently explained how to selectively block cookies in IE and Firefox. There is a move afoot to standardize on a user-selectable “Do Not Track” option, but advertisers aren’t jumping at the chance to support it.
Check out more ProfHacker posts on privacy.
Do you have favorite privacy-management tools or strategies? Let us know in the comments.
Photo by Flickr user Metro Centric / Creative Commons licensed


