Patrice Riemens on Tue, 31 Mar 2009 22:27:59 -0400 (EDT)


<nettime> Ippolita Collective: The Dark Side of Google (Chapter 5, first part)

There was a week-end interruption as I had gone to the Union Territory of
Puducherry (Pondicherry, formerly French India)
patrizio and Diiiinooos!

NB this book and translation are published under Creative Commons license
2.0  (Attribution, Non Commercial, Share Alike).
Commercial distribution requires the authorisation of the copyright holders:
Ippolita Collective and Feltrinelli Editore, Milano (.it)

Ippolita Collective

The Dark Side of Google (continued)

Chapter 5  As a bonus: other funky functionalities

Filtered algorithms: ready-made data banks and control of the users

Graph Theory [*N1] is the mathematical basis of all network algorithms,
PageRank[TM] among them. This branch of mathematics studies methods to
create, manage, and navigate different classes of networks, to describe
them with graphs, and {to rank them} according to their size. With the
introduction of electronic computers, Graph Theory took off in a big way
from the mid-1950s onwards. Geometrically, a graph can be pictured as a
collection of points in space joined in pairs by continuous curves that
do not cross one another. In Graph Theory, a graph (not to be confused
with a graphic) is a figure made up of points, called vertices or nodes,
and of the lines connecting them, called arcs, edges or arrows. [cf.
Wikipedia, 'graph' & associated entries][*N2].
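The definition above (vertices plus the lines connecting them) maps
directly onto a very simple data structure. The following is a minimal
sketch, not taken from the book: the vertex names and the helper
`adjacency` are hypothetical, chosen only to show a graph stored as an
adjacency list.

```python
# A graph as plain data: a set of vertices (nodes) and a set of edges (arcs).
# Hypothetical toy example; the vertex names are arbitrary labels.
vertices = {"A", "B", "C", "D"}
edges = {("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")}

def adjacency(vertices, edges):
    """Build an undirected adjacency list: vertex -> set of its neighbours."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)  # each edge is recorded from both ends
        adj[v].add(u)
    return adj

adj = adjacency(vertices, edges)
print(sorted(adj["C"]))  # ['A', 'B', 'D']
```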

A network is a particular type of graph in which it is possible to assign
a different value, or weight, to each arc. This makes it possible to
establish different values for different routes {between nodes}. The
Internet is a graph, and the same can be said of all web pages taken
together. Google's search system is based on this principle.
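To make the idea of weighted routes concrete, here is a small sketch under
stated assumptions: the network and its weights are invented, and
Dijkstra's shortest-path algorithm is used simply as a standard example of
a route algorithm on weighted arcs. It is not a description of how
Google's system works.

```python
import heapq

# Hypothetical weighted network: node -> {neighbour: weight of the arc}.
network = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 1, "D": 5},
    "C": {"D": 1},
    "D": {},
}

def route_cost(network, route):
    """Sum the weights of the arcs along a given route (list of nodes)."""
    return sum(network[u][v] for u, v in zip(route, route[1:]))

def cheapest_cost(network, source, target):
    """Dijkstra's algorithm: lowest total weight from source to target, or None."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for nxt, w in network[node].items():
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                heapq.heappush(queue, (nd, nxt))
    return None  # target unreachable from source

print(route_cost(network, ["A", "C", "D"]))       # 5
print(route_cost(network, ["A", "B", "C", "D"]))  # 3
print(cheapest_cost(network, "A", "D"))           # 3
```

The longer route in hops (A, B, C, D) turns out cheaper than the shorter
one, which is exactly why weights matter.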

One of the {most} fundamental aspects of network algorithms is the
relationship between the time factor and the number of examined nodes. The
'travel time' of a search, for instance, that is, the time it takes to
connect one node to another, depends on the number of elements in the
network, and always lies between a minimum and a maximum value, which can
vary widely according to the type of routing algorithm used.
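The dependence of search time on network size, bounded by a minimum and a
maximum, can be illustrated by counting how many nodes a breadth-first
search examines before it reaches its target. This is a sketch on a
hypothetical 'chain' network (the `chain` helper is invented for the
example); node count stands in for time.

```python
from collections import deque

def nodes_examined(adj, source, target):
    """Breadth-first search; return how many nodes were dequeued up to and including target."""
    seen = {source}
    queue = deque([source])
    examined = 0
    while queue:
        node = queue.popleft()
        examined += 1
        if node == target:
            return examined
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return examined  # target unreachable: every reachable node was examined

def chain(n):
    """Hypothetical network: n nodes in a line, 0 - 1 - ... - (n-1)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}

for n in (10, 100, 1000):
    net = chain(n)
    best = nodes_examined(net, 0, 1)       # target right next door
    worst = nodes_examined(net, 0, n - 1)  # target at the far end
    print(n, best, worst)
# 10 2 10
# 100 2 100
# 1000 2 1000
```

The minimum stays constant while the maximum grows with the network,
which is why an exhaustive walk over a web-sized graph is out of reach.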

In the network of web pages, every page is a node in the graph, and every
link is an arc. Taking the time factor as a starting point, it becomes
clear that the {search} results Google generates in answer to a question
(technically, the results of a query on its databases) cannot possibly be
based on an examination of the 'entire' Internet.

Google's spider is constantly busy copying the Internet into its database:
not an easy task. However, it is not credible that the search engine
browses through its complete database every time in order to retrieve the
most important results. The key factor enabling Google to return almost
immediate results lies in hidden sequences that narrow down the overall
selection {of data}; concretely, it depends on the application of specific
filtering devices. Starting from the query itself, the filter ensures that
the final result is reached promptly by way of successive detours and
choices, developed with the explicit aim of limiting the range of the
blocks {of data} likely to be analysed {for that particular query}.
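The narrowing described above can be sketched as a chain of filters that
discard whole blocks of an index before a single document is read. This is
a hypothetical illustration: the block layout, the metadata fields `lang`
and `topic`, and the functions `narrow` and `search` are assumptions made
for the example, not Google's actual mechanism.

```python
# Hypothetical index split into blocks; each block carries some metadata.
BLOCKS = [
    {"lang": "fr", "topic": "cooking", "docs": ["recette crepes", "pain maison"]},
    {"lang": "fr", "topic": "sport",   "docs": ["tour de france"]},
    {"lang": "en", "topic": "cooking", "docs": ["pancake recipe", "bread at home"]},
    {"lang": "en", "topic": "sport",   "docs": ["world cup final"]},
]

def narrow(blocks, filters):
    """Apply each filter in turn; each one discards blocks before any document is read."""
    for keep in filters:
        blocks = [b for b in blocks if keep(b)]
    return blocks

def search(blocks, filters, term):
    """Scan only the surviving blocks; return (hits, number of documents scanned)."""
    candidates = narrow(blocks, filters)
    scanned = sum(len(b["docs"]) for b in candidates)
    hits = [d for b in candidates for d in b["docs"] if term in d]
    return hits, scanned

filters = [lambda b: b["lang"] == "en", lambda b: b["topic"] == "cooking"]
hits, scanned = search(BLOCKS, filters, "recipe")
total = sum(len(b["docs"]) for b in BLOCKS)
print(hits, scanned, "of", total)  # ['pancake recipe'] 2 of 6
```

The same filters also show the opacity discussed next: with them in place,
`search(BLOCKS, filters, "final")` returns `([], 2)`, although a matching
document exists in a block that was never looked at.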

This is how Google can return results for queries in an astonishingly
short time. However, this makes the search {process} just as opaque as it
is fast; in other words, the search bears no consistent relation to the
body of data extant in the indexed part of the network. Results for a
search are returned very quickly not only thanks to the {massive}
computing power available, but also, and above all, because filters limit
the extent of the data pool that will be searched.

The filter's difficult task is to make a drastic selection among the
network nodes {to be looked at}, excluding some and giving greater weight
to others and their associated linkages. This method aims to exclude or
include whole blocks of data from among those that would generate results
[French text not really clear].

All this is made possible by the existence of pre-set, ready-to-use search
databases, which return standard answers to standard questions but also
tweak the results through the profiling of individual users. The user's
profile is made up from her/ his search history, language, geographic
position {(IP address)}, etc. If a user habitually conducts searches only
in French, for instance, not the whole of Google's database will be
queried, but only the French-language part, obviously saving a lot of time
in the process.
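The French-language example can be sketched as follows, assuming (for
illustration only) an index partitioned by language and a profile reduced
to a list of past queries tagged with their language. `INDEX`,
`dominant_language` and `profiled_search` are hypothetical names.

```python
from collections import Counter

# Hypothetical index partitioned by language.
INDEX = {
    "fr": ["histoire de paris", "musee du louvre"],
    "en": ["history of paris", "louvre museum"],
    "de": ["geschichte von paris"],
}

def dominant_language(history):
    """Most frequent language in a user's past queries (hypothetical profile field)."""
    return Counter(lang for lang, _query in history).most_common(1)[0][0]

def profiled_search(history, term):
    """Query only the partition matching the user's profile, not the whole index."""
    lang = dominant_language(history)
    partition = INDEX[lang]
    return [doc for doc in partition if term in doc]

history = [("fr", "meteo"), ("fr", "recette"), ("en", "weather")]
print(profiled_search(history, "paris"))  # ['histoire de paris']
```

Only two of the five documents are ever examined; the English and German
pages about Paris never enter the picture for this user.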

Given the humongous amount of data, it is simply unthinkable to use
transparent algorithms, meaning ones that would hit _all_ the network's
nodes. It is therefore unavoidable that some manipulations,
simplifications, and {deliberate} limitations in the number of possible
analyses take place, both for technical reasons of computability in the
strict sense and for obvious economic reasons. And one can, without
falling into unjustified vilification, easily conceive that within a
system already biased by the approximations that filtering causes, further
filters could be added to insert, or manoeuvre into a more visible
position, those results that go with paid advertisements, or which carry
some doctrinal message [?].
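The scenario the authors consider conceivable, an extra filter layered on
top of an existing ranking, takes very little code. This is purely a
hypothetical sketch: the `sponsored` flag, the scores and the `bonus`
parameter are invented, and nothing here claims that Google does this.

```python
# Hypothetical results with an organic relevance score; 'sponsored' flags paid placements.
results = [
    {"url": "a.example", "score": 0.9, "sponsored": False},
    {"url": "b.example", "score": 0.7, "sponsored": False},
    {"url": "c.example", "score": 0.4, "sponsored": True},
]

def rank(results):
    """Order results by score, highest first."""
    return sorted(results, key=lambda r: r["score"], reverse=True)

def boost_sponsored(results, bonus=0.6):
    """An extra filter: add a bonus to flagged results (copies, originals untouched)."""
    return [dict(r, score=r["score"] + (bonus if r["sponsored"] else 0)) for r in results]

print([r["url"] for r in rank(results)])                   # ['a.example', 'b.example', 'c.example']
print([r["url"] for r in rank(boost_sponsored(results))])  # ['c.example', 'a.example', 'b.example']
```

From the outside, both orderings look equally 'natural', which is the
point the paragraph makes about an already opaque system.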

However, seen from Google's point of view, it must be noted that filters
are not directly linked to an economic motive, since they are not meant to
sell anything. They are linked to the habits of the user, and to her/ his
personal interests. Google sells ads, not products (or if so, only in a
very limited way, like the Google Mini hardware, or indexing systems for
companies {and organisations}). Google's prime consideration is therefore
to obtain data from which it can derive parameters for accurately
targeting advertising campaigns. The personalisation of results according
to their recipients is made possible by the information Google
/furnishes and/ gathers in the most discreet way possible. E-mail, blogs,
'cloud computing' (or 'virtual hard disks') and other services function as
so many databases, far better suited to profiling users than these users
could or would ever imagine.

Hence, the additional services Google offers over and above search are
very useful to the firm for experimenting with new avenues {of business},
but also, and above all, because they play a key role as 'aggregators' of
personal information about users.

A prime example is the electronic mail service Gmail, a virtual hard disk
of sorts (2GB for the moment, and counting...), which [in its beta phase,
when the book was written -TR] is made available through a distribution
system based on PageRank[TM]. Put simply, each (user) node of the Google
network has a certain weight (an allowed number of invitations to join)
and can use it to offer the service (via a link) to her/ his
acquaintances. This method enables control over the usage made of the
service, and at the same time the user discloses to Google key
intelligence about her/ his {own network of} friends and acquaintances.

In a second stage, this mechanism spreads out among invited individuals,
who may extend new invitations: this way, a graph of /human/ relationships
between the users will be created, representing an enormous strategic
value with respect to 'personalised' ad targeting.
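The invitation mechanism of the two paragraphs above can be sketched as a
quota per account, with every invitation recorded as an arc from inviter
to invitee. The class `InviteNetwork`, the quota of 2 and the user names
are hypothetical; the book does not describe Gmail's actual
implementation.

```python
class InviteNetwork:
    """Hypothetical invitation system: each member has a quota of invitations,
    and every invitation sent is recorded as an arc inviter -> invitee."""

    def __init__(self, quota):
        self.quota = quota
        self.remaining = {}  # member -> invitations left
        self.arcs = []       # the social graph, built as a side effect

    def join(self, user):
        # setdefault: re-inviting an existing member does not reset their quota
        self.remaining.setdefault(user, self.quota)

    def invite(self, inviter, invitee):
        if self.remaining.get(inviter, 0) <= 0:
            return False  # no invitations left, or inviter is not a member
        self.remaining[inviter] -= 1
        self.arcs.append((inviter, invitee))
        self.join(invitee)  # the invitee can now invite others in turn
        return True

net = InviteNetwork(quota=2)
net.join("alice")
net.invite("alice", "bob")
net.invite("alice", "carol")
print(net.invite("alice", "dave"))  # False: quota used up
net.invite("bob", "dave")           # second generation of invitations
print(net.arcs)  # [('alice', 'bob'), ('alice', 'carol'), ('bob', 'dave')]
```

The service itself only needs the quota; the list of arcs is the 'free'
by-product, a growing map of who knows whom.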

If one considers all the information that can be gathered from e-mail
traffic (to whom, why, in which language, which formats, which key words,
which attachments, etc.) one can surmise the existence, in Google's data
vaults, not only of a partial, but significant, double of the Internet,
but also of a copy, equally partial and equally significant, of the
relationships, personal, professional, and affective, of the service's
users.

In theory, filters merely serve to make the query process faster and more
closely tailored to individual requests. Technically speaking, they are
even necessary. Their usage, however, shows how easy it is for a party in
a dominant position in search services to profit commercially from the
data at its disposal, with little regard for the privacy of its users.

To sum up, Google's database today is able, based on what it knows about
this or that user, to marshal a query built from a few key words in a
manner that varies according to the type of user. Far from being
'objective', search results are actually {pre-set and} fine-tuned, and
using the search service enables Google to 'recognise' an individual
better and better, and to present her/ him with 'appropriate' results.
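The claim that the same few key words yield a different query outcome per
type of user can be sketched with a profile of interest weights used to
reorder matching documents. The documents, topics, weights and the
function `personalised` are all hypothetical.

```python
# Hypothetical documents, each tagged with a topic; all match the keyword "jaguar".
DOCS = [
    ("Jaguar XF review", "cars"),
    ("Jaguar habitat in the Amazon", "wildlife"),
    ("Jacksonville Jaguars score", "sport"),
]

def personalised(keyword, profile):
    """Same keyword for everyone; the order depends on the user's interest weights."""
    matches = [d for d in DOCS if keyword.lower() in d[0].lower()]
    ranked = sorted(matches, key=lambda d: profile.get(d[1], 0), reverse=True)
    return [title for title, _topic in ranked]

print(personalised("jaguar", {"cars": 0.8, "sport": 0.1}))
print(personalised("jaguar", {"wildlife": 0.9}))
```

The first user sees the car review on top, the second the Amazon page:
neither list is 'the' answer to the query.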

Use of each Google service comes with the users' acceptance of {a whole
set of} rules and liability disclaimers. Google, for its part, promises it
will not reveal personal information {to third parties}. Yet it is easy
to presume that Google is able to exploit and commercialise users' data to
its own ends [French text: different ends]. And we need not even consider
the possibility (or rather, the probability) that all sorts of
intelligence and police services can access this information for any
reason of 'national security' they may like to invoke. [The addition of
{more} search filters in order to further personalise results is the most
likely outcome. - unclear sentence in French text]

Google's cookies: stuff that leaves traces....

User profiles are always based on a system of search and selection [*N3].
Two types of profiling are prevalent on the Internet: one is explicit, the
other implicit. Explicit profiling requires registration, whereby the user
fills in a form disclosing personal details. The information sent is
archived in a database, to be analysed with the help of a string of
parameters partitioning registered users into homogeneous groups
(according to age, sex, occupation, interests, etc.). Conversely, implicit
profiling is arrived at by tracking anonymous users during their visit to
a site, through their IP address, or through cookies. Cookies are little
text files used by web sites to leave some data behind on the user's
computer. Every time the user comes back to that site, the browser resends
the data stored in the cookie. The aim is to automate login
authentication, to restore any operations in progress, but mostly to
'reunite' the user with data from her/ his previous visits.
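The round trip described above (the site sets a cookie, the browser sends
it back, the site 'reunites' the visitor with earlier data) can be
simulated with Python's standard `http.cookies` module. This is a
simplified sketch: the server-side store `VISITS`, the cookie name
`visitor` and the function `handle_request` are hypothetical, and a real
browser does the storing and resending that is simulated here by hand.

```python
import uuid
from http.cookies import SimpleCookie

VISITS = {}  # hypothetical server-side store: visitor id -> pages seen before

def handle_request(cookie_header, page):
    """Return (Set-Cookie value or None, pages this visitor saw on earlier visits)."""
    incoming = SimpleCookie(cookie_header or "")
    if "visitor" in incoming:
        vid = incoming["visitor"].value
        set_cookie = None  # already known: nothing new to set
    else:
        vid = uuid.uuid4().hex  # new anonymous identifier
        out = SimpleCookie()
        out["visitor"] = vid
        out["visitor"]["path"] = "/"
        set_cookie = out["visitor"].OutputString()  # e.g. "visitor=...; Path=/"
    history = VISITS.setdefault(vid, [])
    previous = list(history)
    history.append(page)
    return set_cookie, previous

# First visit: no cookie yet, so the server issues one.
set_cookie, previous = handle_request(None, "/home")
# The browser stores it and sends back only the name=value pair next time.
browser_cookie = set_cookie.split(";")[0]
print(handle_request(browser_cookie, "/news"))  # (None, ['/home'])
```

No name or address is needed: the random identifier alone is enough to tie
the second visit to the first.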

Most Internet sites offering online services use cookies, and Google is
surely no exception [*N4]. The combination of cookies and algorithmic
filters makes it possible to track an individual's navigation, and hence
to accumulate information on her or his 'fingerprint'.

Let's take an example: Individual 'X' has a mobile phone number in her
name, and uses her mobile to call her family, friends and a few work
colleagues. After some time, she decides to do away with this phone and
take another one, not in her name, for the sake of her privacy. With her
new phone, she reconstructs her circle of acquaintances by contacting
family, friends, and colleagues at work. This sequence of 'social links'
/(family, friends, colleagues)/ is, within {the totality of} all the
world's phone calls, a unique one, which cannot be dissociated from the
individual in question. So it is not impossible to formalise such a
sequence as a graph representing the nodes and the arcs of a network, the
values of which (the respective 'weight' of the links between different
nodes) could be determined by assigning an 'affinity value' according to
'proximity', starting from the departure point of the analysis, in our
case individual 'X'.
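The re-identification in this example can be sketched by comparing the set
of numbers each phone calls. Jaccard similarity (shared contacts divided
by all contacts) is used here as a simple stand-in for the 'affinity'
weighting; it is an assumption, since the text does not specify a
measure. All numbers and the helper `link_numbers` are hypothetical.

```python
# Hypothetical call records: phone number -> set of numbers it calls.
OLD_RECORDS = {
    "old-X": {"mum", "dad", "sister", "ana", "ben", "boss"},
    "old-Y": {"mum-y", "carl", "boss"},
}
NEW_RECORDS = {
    "new-1": {"carl", "dora", "boss"},
    "new-2": {"mum", "dad", "sister", "ana", "boss"},
}

def jaccard(a, b):
    """Share of contacts two numbers have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def link_numbers(old, new):
    """For each new number, the old number whose circle of contacts it most resembles."""
    return {n: max(old, key=lambda o: jaccard(old[o], contacts))
            for n, contacts in new.items()}

print(link_numbers(OLD_RECORDS, NEW_RECORDS))  # {'new-1': 'old-Y', 'new-2': 'old-X'}
```

'new-2' shares five of six contacts with 'old-X' (similarity about 0.83),
so changing the phone did not change the pattern that identifies her.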

Getting rid of cookies is an excellent privacy protection practice since,
as one can easily extrapolate from the preceding example to search
engines, cookies make it possible, just by looking at some specific search
themes, to identify groups, or even single individuals, according to the
'fingerprints' they leave behind on the Web.
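The distinction between identifying groups and identifying single
individuals can be sketched by collapsing each cookie's searches into a
set of themes and counting how many cookies share exactly the same set.
The log, the cookie ids and both helper functions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical query log: (cookie id, search theme). No names, no addresses.
LOG = [
    ("c1", "football"), ("c1", "pizza"),
    ("c2", "football"), ("c2", "pizza"),
    ("c3", "football"), ("c3", "rare orchids"), ("c3", "tax law"),
]

def fingerprints(log):
    """Collapse each cookie's searches into a set of themes (its 'fingerprint')."""
    themes = defaultdict(set)
    for cookie, theme in log:
        themes[cookie].add(theme)
    return {c: frozenset(t) for c, t in themes.items()}

def crowd_sizes(prints):
    """How many cookies share exactly the same fingerprint; 1 means a single individual."""
    counts = Counter(prints.values())
    return {c: counts[fp] for c, fp in prints.items()}

print(crowd_sizes(fingerprints(LOG)))  # {'c1': 2, 'c2': 2, 'c3': 1}
```

Common interests place c1 and c2 in a group; the unusual combination of
themes singles out c3.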

The unique trace which identifies our movements, our social contacts, our
telephone calls, is just as unique as our preferences, our tastes, our
idiosyncrasies, and our passions, which make each one of us different. In
this case, our passions would be the sites we visit, and {for Google} more
specifically the searches we launch while browsing. This mass of
information we give to a search engine makes the compiling of our
'fingerprint' possible [*N5].

Like all cookies, the Internet kind have a 'sell-by' date. Internet sites
sending cookies to our browsers set a date after which the browser is
allowed to delete the information contained in the cookie. Smart use of
cookies is not often encountered, which makes it all the more interesting
that Google was able to exploit to its own advantage a technical quirk
well known to POSIX developers. (POSIX is the international standard that
permits interoperability between Unix and Unix-like OSs, such as
GNU/Linux.) The expiration date for Google cookies is somewhere in 2038,
more or less the maximum possible. This means that for all practical
purposes, the browsers on our respective OSs will 'never' delete these
cookies and the information contained therein [*N6].
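The '2038, more or less the maximum possible' most likely refers to the
limit of a signed 32-bit `time_t`, the POSIX count of seconds since
1 January 1970 UTC (the so-called Year 2038 problem). That reading is our
interpretation, not a statement from the book. The sketch below computes
that limit, formats it in the date style used by cookie `Expires`
attributes, and shows what a 32-bit counter wraps around to one second
later; `to_int32` is a helper written for the example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)  # POSIX time zero
MAX_INT32 = 2**31 - 1  # largest value a signed 32-bit time_t can hold

def to_int32(n):
    """Wrap an integer the way a signed 32-bit counter overflows."""
    return (n + 2**31) % 2**32 - 2**31

last_second = EPOCH + timedelta(seconds=MAX_INT32)
print(last_second)                                         # 2038-01-19 03:14:07+00:00
print(format_datetime(last_second, usegmt=True))           # Tue, 19 Jan 2038 03:14:07 GMT
print(EPOCH + timedelta(seconds=to_int32(MAX_INT32 + 1)))  # 1901-12-13 20:45:52+00:00
```

An expiry set near that limit is about as far in the future as software
of the period could reliably represent, hence 'never' in practice.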

Techno-masturbation: create! search! consume! ... your own contents!

It is next to impossible to keep up with the rapid changes Google is
constantly going through. New services are launched in a quasi-convulsive
way, and it is difficult to understand which ones are actually meant to
have an impact on our lives, and which ones are likely to be discarded in
the next few months or even weeks. And anyway, given the fast rate of
innovation and information 'burn' on the Internet, it does not make much
sense to lose oneself in complicated descriptions and exhaustive
classifications which would inevitably contain errors and omissions. The
natural dynamics and fluidity of the networks should dissuade anyone from
attempting to attain complete knowledge, should someone be tempted to do
so. One would get lost even before having started on such an ill-advised
endeavour.

This being said, one can, albeit from a subjective and fragmentary
viewpoint, try to formulate a general critique of Google, without going
into technical details and even less engaging in uncertain forecasts. As
far as personalisation is concerned, the aspect most worth considering is
probably the increasing prevalence of the concept of the 'prosumer'
[*N7].

Google is well known for its propensity to launch 'beta' versions of its
services while these are still provisional and in testing mode. This is a
dynamic, as we have seen in the previous chapter, which is directly
inspired by {the modus operandi of} the Free Software development
communities. Users contribute significantly to the development of the new
service through their feedback, impressions, and suggestions regarding its
usability; they are at the same time users and producers of the service,
'prosumers' being the name given to this hybrid breed.

In its bid to become the global mediator of web content, Google sells
technology and search results (through advertisements) to users who, for
their part, tend {more and more} to be both the creators of net content
and its consumers through Google's services, which are more and more
personalised.

Two examples, which would seem at first sight to have very little to do
with each other, should make the point regarding this closed cycle of
content production and consumption: Google Web Toolkit (GWT)[*N8] and the
alliance between Google's Gtalk and Nokia [*N9].

(nice cliff hanger, huh?)
(to be continued)

Translated by Patrice Riemens
This translation project is supported and facilitated by:

The Center for Internet and Society, Bangalore
The Tactical Technology Collective, Bangalore Office
Visthar, Dodda Gubbi post, Kothanyur-Bangalore

#  distributed via <nettime>: no commercial use without permission
#  <nettime>  is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  more info:
#  archive: contact: