The scary power of search logs

Stay off those rails
My experience is that it’s often hard to get people excited about things like Google’s highly detailed records of what you’ve been searching for (e.g., check out “Verbatim: Search firms surveyed on privacy” for a survey of what four major search engines collect, “FAQ: When Google is not your friend” for further analysis, and “Keeping secrets” on Slate for a related discussion). Responses often run along the “I have nothing to hide” or “Only criminals/bad people would have reason to worry about that” vein, but an underlying theme is often a sense that there’s really nothing useful/interesting to be learned from trolling through search terms.

Recently, however, there was a leak of the search histories of over half a million AOL users over a three-month period. No names are attached, but they do have user numbers so you can pull together all the search terms from single individuals. Declan McCullagh (of the Politech mailing list) has sifted through some of it in “AOL’s disturbing glimpse into users’ lives” over on CNet, and it’s a real eye-opener. And while there are clearly some deeply scary and deeply troubled people in his discussion (makes for better reading), step back for a moment from the pathologies and think about just how much these small subsets of these people’s search histories tell us about them and their lives. And then think about how much of your life (good, bad, and slightly smelly) could be reconstructed from your search history. Then throw in some web access logs, your Amazon search and buying patterns, your e-mail address book, and IM logs.
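To make the "pull together all the search terms" step concrete, here is a minimal Python sketch. The sample rows and the (user number, query) layout are made up for illustration; they are not taken from the leaked data, and the real file layout may differ.

```python
from collections import defaultdict

# Hypothetical sample rows in a (user_number, query) shape.
# These are invented for illustration, not real log entries.
SAMPLE_LOG = [
    (1001, "cheap flights to denver"),
    (2002, "how to fix a leaky faucet"),
    (1001, "divorce lawyer denver"),
    (1001, "apartments for rent denver"),
    (2002, "plumber near me"),
]


def group_by_user(rows):
    """Collect each user's queries, in log order, under their user number."""
    histories = defaultdict(list)
    for user_number, query in rows:
        histories[user_number].append(query)
    return dict(histories)


if __name__ == "__main__":
    for user_number, queries in group_by_user(SAMPLE_LOG).items():
        print(user_number, queries)
```

Even with only three invented queries, user 1001 starts to look like a particular person in a particular situation, which is the whole point: no name is needed once the queries are grouped.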

To misquote a fairly paranoid TV show: The data are out there.

And they can be accidentally leaked, deliberately sold (with or without the knowledge of the holding company), or subpoenaed. Currently there is very little legal protection of your privacy in the realm of search terms; basically all you’ve got is the good will and good word of the search engines, and faith that the courts care and get it enough to look after your privacy interests. As the AOL situation makes clear, though, mistakes happen, and lord knows that not all judges are equally together on these issues.

What to do? Not clear that there’s a simple “fix it” action here, but at a minimum it would be useful to spread the word and help raise awareness. Bug your favorite search engine and let them know that you have expectations of privacy and are willing to vote with your feet. Supporting the work of groups like the Center for Democracy and Technology and the ACLU wouldn’t hurt, since they’ve both been pretty front-line on these and other important privacy issues. You might also take a librarian to lunch, since they’re a pretty cool bunch when it comes to protecting the privacy of their users.

The more hard-core among you might explore the many anonymous browsing tools. These are usually just a server that sits between you and, for example, Google, so Google’s records show all the searches for lots of people (including you) as coming from the anonymous proxy. This sounds good in principle, but in some sense you’re just moving your trust from Google to the proxy, since the proxy’s logs can allow people to reconstruct your search (and browsing) history. If you’ve got toys, you could set up your own proxy (you do trust yourself, don’t you?), but then you lose the anonymity that comes from mixing your history in with that of hundreds of other users. In other words, this is a complicated road to travel, and you’d best do your homework if you really want to anonymize your history.
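The trade-off is easier to see in a toy model. The Python sketch below is only a simulation under simplifying assumptions: the function names are hypothetical, the addresses come from the ranges reserved for documentation, and real proxies and search engines log far more than an address and a query.

```python
def search_engine_log(requests, proxy_ip=None):
    """Return (source_ip, query) pairs as the search engine would record them.

    requests is a list of (client_ip, query) pairs. If proxy_ip is given,
    every request appears to come from the proxy instead of the client.
    """
    return [(proxy_ip or client_ip, query) for client_ip, query in requests]


def proxy_log(requests):
    """Return what the proxy itself can record: the real client addresses."""
    return list(requests)


if __name__ == "__main__":
    # Made-up clients and queries.
    requests = [
        ("198.51.100.7", "symptoms of shingles"),
        ("203.0.113.9", "used car prices"),
        ("198.51.100.7", "shingles contagious"),
    ]
    print("Direct:     ", search_engine_log(requests))
    print("Via proxy:  ", search_engine_log(requests, proxy_ip="192.0.2.1"))
    print("Proxy's log:", proxy_log(requests))
```

Through the shared proxy the search engine sees a single address for everyone, but the proxy's own log still ties each query to the client that sent it. Run a proxy for yourself alone and the single address the search engine sees is, in effect, just you again.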

Be careful out there. And don’t talk to strangers (at least not too often)…

3 thoughts on “The scary power of search logs”

  1. Informative post. Thanks for the CDT link. Always nice to know where more of the ‘good guys’ are.

    Shouldn’t you be at the fair drinking liberally?

  2. This is a good roundup of the issue. Unfortunately, most people are pretty oblivious (including, as you mention, judges) to this sort of thing. And the fixes are just too technical at this stage in our evolution. My grandma is NOT going to set up an anonymous proxy. She might, however, get a pretty convincing Phish-mail with information that has been personalized based on the data that is out there.

    Someday, maybe people will be so tech savvy that they run their own proxy, host their own RSS feed readers, and collect their own data about themselves and their online habits. Maybe they could even sell this information to advertisers instead of giving it to Yahoo, Google (TiVo?) for them to sell.

    For a funny treatment of the AOL deal, with examples from one strange user, they’ve got one over at Boing Boing.

  3. Donkey: I got hooked up with CDT years ago when it was just an e-mail thing, and have continued to be impressed by their work. Glad you found the link useful! I probably should have been drinking liberally, but I’m pretty hit and miss about those sorts of things (even though I thoroughly enjoy them when I go).

    Dykstra: Your example of your grandma is spot on. And, to be honest, while I can run all this stuff, I’m getting tired of running so many different things myself. Just because I can doesn’t mean it’s fun.

    I like the idea of collecting the info and selling it yourself, though, if you were so inclined :-). This raises the question, though, of why they’d trust/value the data if it came directly from you (as opposed to a “trusted” third party like Google).

    An interesting option that this suggests is to deliberately “spoil” the data being collected. It wouldn’t be hard to write a simple robot that generates semi-random queries all day long from your IP. It would take a little work to do it in a way that would effectively mask your real queries, but it might be do-able.

    Thanks for the Boing Boing link – very cool and strange!
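The "spoil the data" idea from the reply above can be sketched in a few lines of Python. This is a rough illustration, not a working robot: it only builds the mixed list of queries and sends nothing anywhere, and the word list and function names are hypothetical.

```python
import random

# Hypothetical vocabulary for decoys; a real attempt would need a much
# larger and more realistic source of query terms.
WORDS = [
    "recipe", "weather", "tickets", "review", "garden", "history",
    "repair", "lyrics", "hotel", "puppy", "mortgage", "soccer",
]


def decoy_query(rng, min_words=2, max_words=4):
    """Build one semi-random query from distinct words in WORDS."""
    count = rng.randint(min_words, max_words)
    return " ".join(rng.sample(WORDS, count))


def mix_with_decoys(real_queries, decoys_per_real, rng):
    """Scatter decoys at random positions among the real queries.

    The real queries keep their original relative order.
    """
    mixed = list(real_queries)
    for _ in range(decoys_per_real * len(real_queries)):
        # randint is inclusive, so a decoy can also land at the very end.
        mixed.insert(rng.randint(0, len(mixed)), decoy_query(rng))
    return mixed


if __name__ == "__main__":
    rng = random.Random()
    real = ["divorce lawyer denver", "apartments for rent denver"]
    for query in mix_with_decoys(real, decoys_per_real=3, rng=rng):
        print(query)
```

Decoys built from uniformly random words like these would probably be easy to filter out, which is the "little work" part: masking real queries would take decoys that look like something a person would actually type, sent at a plausible pace.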
