Lots of fuss about Google Buzz.
I can see the selling point: "integrate the traces people leave
round the web into a single point of contact, integrate with Gmail,
make things like Twitter and Facebook obsolete". But there were
some other goals "make it easy to bootstrap" and presumably "create
public pages showing your followers". Why these too? To avoid the
Facebook problem: the service has no value if you don't know which
of your friends are on it, or if your friends aren't very
interesting or active. A big limit on Facebook is that for people
with no active friends, it's not compelling. Hence their recent
privacy changes: if you can publish your activities more, it's
easier to share them with others. Twitter solves the no-friends
problem by having some recommended feeds. Google tries to share
things through your contacts: they do this with Google Reader and
tried it with Buzz. It almost makes sense. Almost.
What assumptions did they get wrong?
- You want to follow all the people you email. Really, as if I
care about other people that much.
- You want the people you email to follow you. Really, as if they
care about me that much.
- You don't want to keep any of your contacts secret
- You know what public activities you already get up to
online
By glueing you together with your contacts, they may have
bootstrapped a social network, but by trying to publish that stuff,
they have exposed a bit too much about the communications graph.
Indeed, so much that it comes very close to breaking EU data
protection rules. Now they are in damage-limitation mode. Yet, as
with Facebook, being able to make that communications graph public
allows them to exploit it and make money from it. Hence the
continual pressure from once-private apps (Gmail, Facebook) to turn
your actions public. I wonder what they will try next?
For everyone worried that Google is needlessly publishing facts,
here are some of the other things they may know about you that
they chose not to share. That choice matters: hopefully they
recognise why this stuff has to stay secret. It's not just that
publishing it may be illegal in some jurisdictions and upset people
everywhere; letting your users know you log it may cause them to
stop using your services.
- Location of use: IP address and inferred location. Yes, Latitude
wants to do this, especially with your phone.
- Time spent in their apps, document and email titles as well as
content
- When and where you use the Google chat apps, possibly through
third-party XMPP clients
- What you've been searching for
- What sites you've been clicking through to from the search
results
- If those sites use Google Analytics, they could probably do some
de-anonymisation tricks to work out who you are, even without
sharing cookies (the timestamp, IP address and referrer header
should be enough to correlate)
- What you've been buying using the Google payment services
- Photo metadata from picasaweb: location, device info
The interesting one is Google Analytics. If your browser
downloads the analytics .js, it ends up issuing GET requests to
Google sites that give accurate clock and IP info. If that script
can get the HTTP referrer header, then they can see where you go
within every web site you reach via their search engine. Now, if
you are proxied/NATed it may be hard to get 100% accuracy, though
it still works provided nobody else behind the same address asks
for "avonmouth massage parlour" at roughly the same time you do.
And if the Google clickthrough links are devious enough, they could
stick some id on every referral which you don't see but which is in
the headers and which analytics cares about. It doesn't take much
extra effort, and before long you've got a track of a user's
behaviour not just across the Google-cookied web sites but across
every other web site they go to via a search. And from there, and
the cross-correlation with other people, you've got a very nice
model of the user.
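To make the correlation idea concrete, here is a minimal Python sketch of
the kind of join I have in mind. Everything in it is hypothetical: the
record types, the field names, the `q` query parameter and the
five-second window are my own illustrative assumptions, not anything
Google is known to run. It simply matches a search click to an analytics
hit when the IP address, landing page, search terms in the referrer and
timestamp all line up, and gives up when more than one user fits.

```python
from dataclasses import dataclass
from urllib.parse import urlparse, parse_qs


@dataclass(frozen=True)
class SearchClick:
    """A click on a search result, logged by the search engine (hypothetical)."""
    user_id: str       # identity the search engine already knows
    ip: str
    timestamp: float   # seconds since some epoch
    query: str
    target_url: str


@dataclass(frozen=True)
class AnalyticsHit:
    """A page view reported by the analytics script (hypothetical)."""
    ip: str
    timestamp: float
    page_url: str
    referrer: str      # full referring URL, search terms assumed in ?q=


def referrer_query(referrer):
    """Extract the search terms from a referrer URL, or None if absent."""
    values = parse_qs(urlparse(referrer).query).get("q")
    return values[0] if values else None


def correlate(clicks, hits, window_seconds=5.0):
    """Map hit index -> user_id for hits that match exactly one user.

    A hit matches a click when IP, landing page and search terms agree
    and the hit follows the click within window_seconds. Hits with zero
    or several candidate users (say, two people behind one NAT running
    the same search at once) are left unattributed.
    """
    matches = {}
    for index, hit in enumerate(hits):
        query = referrer_query(hit.referrer)
        candidates = {
            click.user_id
            for click in clicks
            if click.ip == hit.ip
            and click.target_url == hit.page_url
            and click.query == query
            and 0 <= hit.timestamp - click.timestamp <= window_seconds
        }
        if len(candidates) == 1:
            matches[index] = next(iter(candidates))
    return matches
```

The "exactly one candidate" rule is the NAT caveat from above in code
form: a shared address only defeats the match when somebody else behind
it runs the same search in the same few seconds.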
So really, those people worrying about Google publishing private
info are to an extent overreacting: Google are not publishing most
of the data they hold on you, not even a significant fraction of
it. But what they were doing was publishing the graph of who you
talk to. And that can be quite sensitive. Why did they do it?
Because if you can make the entire graph public, you can do
interesting things with it.