Steve: Developing on the Edge - Buzz?
Thoughts on development, Web-services, technology and mountains.
Fri 19 Feb 2010
Buzz?

Lots of fuss about Google Buzz.

I can see the selling point: "integrate the traces people leave round the web into a single point of contact, integrate with Gmail, make things like Twitter and Facebook obsolete". But there were some other goals: "make it easy to bootstrap" and presumably "create public pages showing your followers". Why these too? To avoid the Facebook problem of the service having no value if you don't know which of your friends are on it, or if your friends aren't very interesting or active. A big limit on Facebook is that for people with no active friends, it's not compelling. Hence their recent privacy changes: if you can publish your activities more, it's easier to share them with others. Twitter solves the no-friends problem by having some recommended feeds. Google try to share things through your contacts; they do this with Google Reader and tried it with Buzz. It almost makes sense. Almost.

What assumptions did they get wrong?

  1. You want to follow all the people you email. Really, as if I care about other people that much.
  2. You want the people you email to follow you. Really, as if they care about me that much.
  3. You don't want to keep any of your contacts secret.
  4. You know what public activities you already get up to online.

By glueing you together with your contacts, they may have bootstrapped a social network, but by trying to publish that stuff, they have exposed a bit too much about the communications graph. Indeed, so much that it comes very close to breaking EU data protection rules. Now they are in damage limitation. Yet, as with Facebook, being able to make that communications graph public allows them to exploit it and make money from it. Hence the continual pressure from once-private apps (Gmail, Facebook) to turn your actions public. I wonder what they will try next?

For everyone worried that Google is needlessly publishing facts, here are some of the other things they may know about you and have chosen not to share. That matters: hopefully they recognise why this stuff needs to stay secret. It is not just that publishing it may be illegal in some jurisdictions and would upset people everywhere; letting your users know you log it may cause them to stop using your services.

  • Location of use: IP address and inferred location. Yes, Latitude wants to do this, especially with your phone.
  • Time spent in their apps; document and email titles as well as content
  • When and where you use the Google chat apps, possibly through third-party XMPP clients
  • What you've been searching for
  • What sites you've been clicking through to from the search results
  • If those sites use Google Analytics, they could probably do some de-anonymisation tricks to work out who you are, even without sharing cookies (the timestamp, IP address and referrer header should be enough to correlate)
  • What you've been buying using the Google payment services
  • Photo metadata from Picasa Web Albums: location, device info

The interesting one is Google Analytics. If your browser downloads the analytics .js, it ends up issuing GET requests to Google sites that give accurate clock and IP info. If that script can get the HTTP referrer header, then they can see where you go within every web site you reach via their search engine. Now, if you are proxied/NATed it may be hard to get 100% accuracy; the correlation only works cleanly if nobody else behind the same address asks for "avonmouth massage parlour" at roughly the same time you do. And if the Google clickthrough links are devious enough, they could stick some id on every referral which you don't see but which is in the headers and which analytics cares about. It doesn't take much extra effort, and before long you've got a track of a user's behaviour not just across the Google-cookied web sites but across every other web site they go to via a search. And from there, and the cross-correlation with other people, you've got a very nice model of the user.
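To make the correlation trick concrete, here is a minimal sketch of joining a search click-through log to analytics hits on IP address, target site and a short time window. Everything in it is an assumption for illustration: the `SearchClick` and `AnalyticsHit` records, the `correlate` function, the 30-second window and the sample data are all hypothetical, and nothing here describes what Google actually logs or does.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class SearchClick:
    """Hypothetical record of a logged-in user clicking a search result."""
    user: str
    ip: str
    time: datetime
    target_host: str


@dataclass(frozen=True)
class AnalyticsHit:
    """Hypothetical record of an analytics script request from some site."""
    ip: str
    time: datetime
    host: str
    path: str


def correlate(clicks, hits, window=timedelta(seconds=30)):
    """Pair each analytics hit with the users whose click could explain it.

    A click is a candidate when it came from the same IP address, pointed at
    the same host, and happened no more than `window` before the hit.
    """
    matches = []
    for hit in hits:
        candidates = {
            c.user
            for c in clicks
            if c.ip == hit.ip
            and c.target_host == hit.host
            and timedelta(0) <= hit.time - c.time <= window
        }
        matches.append((hit, candidates))
    return matches


# Made-up data: three users behind one NATed address.
t0 = datetime(2010, 2, 19, 12, 0, 0)
nat_ip = "203.0.113.7"
clicks = [
    SearchClick("alice", nat_ip, t0, "example.org"),
    SearchClick("bob", nat_ip, t0 + timedelta(seconds=5), "example.net"),
    SearchClick("carol", nat_ip, t0 + timedelta(seconds=8), "example.net"),
]
hits = [
    AnalyticsHit(nat_ip, t0 + timedelta(seconds=2), "example.org", "/"),
    AnalyticsHit(nat_ip, t0 + timedelta(seconds=10), "example.net", "/"),
]

results = correlate(clicks, hits)
for hit, users in results:
    print(hit.host, sorted(users))
```

The first hit has exactly one candidate, so it is de-anonymised without any shared cookie. The second has two, which is the NAT caveat: two people behind the same address going to the same site at about the same time leave the match ambiguous, and that is exactly the gap a hidden per-referral id would close.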

So really, those people worrying about Google publishing private info are to an extent overreacting: Google are not publishing most of the data they hold on you, not even a significant fraction of it. But what they were doing was publishing the graph of who you talk to. And that can be quite sensitive. Why did they do it? Because if you can make the entire graph public, you can do interesting things with it.
