nettime's_oceanic_feeling on Wed, 1 Feb 2017 15:44:13 +0100 (CET)

<nettime> Little Atoms > Robbins > the myth that British data scientists won the election

The myth that British data scientists won the election for Trump

By Martin Robbins


Claims that social media data won the presidency are greatly exaggerated

A piece of data science mythology has been floating around the internet
for several weeks now. It surfaced most recently in Vice, and it tells
the story of a firm, Cambridge Analytica, that was supposedly
instrumental in Donald Trump's campaign.

The story goes that by analysing marketing and social media data during
the EU Referendum, data scientists were able to model the personalities
of voters in unprecedented detail, helping the Leave campaign to an
unlikely victory. Shortly after that, the firm was employed by the Trump
campaign where, we are told, it contributed to another unlikely victory.

For me this story is like candy floss -- it looks nice and substantial,
but when you stick it in your mouth there's not much there and you're
still hungry. The reporting leaves a ton of questions unanswered, and
when you try to look into them the results are less than satisfying.

Before we even get into methods, there's Ted Cruz. The article posted by
Vice doesn't just gloss over him; it tries to present his campaign as
some sort of victory for Cambridge Analytica's approach. This would be
the campaign where Ted Cruz was wiped out in a few short weeks by a
reality TV demagogue with no data science operation, and subjected to
months of national humiliation.

They mention the Iowa caucuses on 1 February 2016, where the data science
outfit helped to identify target voters. Cruz did indeed win, but took
just 27 per cent of the vote in a four-way race, only three points ahead
of Trump. The authors don't mention the next three states in February --
New Hampshire, South Carolina and Nevada -- where he was thrashed. Nor do
they mention Super Tuesday, on 1 March, where Trump thrashed him by
double-digit margins in six states.

Remember, Cambridge Analytica weren't hired by Trump until June that
year. Meanwhile, Trump's entire data science operation was, as the
article admits, "a marketing entrepreneur and failed start-up founder
who created a rudimentary website for Trump for $1,500."

So the story of the Republican primaries is actually that Cambridge
Analytica's flashy data science team got beaten by a dude with a
thousand-dollar website. To turn that into this breathtaking story of an
unbeatable voodoo-science outfit, powering Trump inexorably to victory,
is quite a stretch. Who else have they even worked for? Without a list
of clients it's very easy to cherry-pick the winners.

That's before we even get into the question of what they were actually
doing. The authors tell us that Cambridge Analytica were using some
combination of survey data, content scraped from social media, and
traditional marketing data. They're then doing some kind of sentiment
analysis to build a 'five traits' profile of millions of Americans (and
Britons, in the case of the Brexit campaign).

The five traits model is a real thing in psychology, sure, and it may
have some predictive power for things like mortality. It's important to
note that it's not undisputed or unflawed, however. It's also true that
you can take demographic data and correlate it with political leaning
with a reasonable amount of success -- we know that votes in the EU
referendum tended to correlate with education, for example. That's the big
grain of truth at the heart of the story.

   "Establishing personality traits from someone's Facebook feed is
   at best untested science"

But let's just think this through. First, this is data that's
available to every other major data science campaign outfit. It's not
some secret buried hard drive they found. Second, this usage would go
far beyond anything that any published science can support. OCEAN
personality traits would normally be assessed through a questionnaire.
To establish them from someone's Facebook feed is at best an untested
piece of science. Is their feed representative? Is it even public or
available to you? Is your algorithm 100 per cent confident or (more
likely) only 75 per cent?

Then there's the challenge of bringing all this data together with any
degree of accuracy. How confidently can you match a given Facebook
account to a given record on the electoral roll? You might get lucky and
find some location information that can match you to the only person
with a specific name in a given town, and you might then be able to
match that to a credit report or other bit of data.

What you end up with is a series of steps that individually sound
plausible, but collectively turn to mush. Only 60 per cent of Britons
are on Facebook. Of those, many will only use it sporadically. Maybe
half have their profiles public. Maybe half of those yield enough
information to do an accurate OCEAN profile. Maybe 75 per cent of those
yield data unambiguous enough to match to a credit report. You're
down to about 10 per cent of the population at this point -- of course
I'm eyeballing the numbers here for illustration, but they're not
unrealistic and you see where I'm going with this.
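
The arithmetic behind that figure can be sketched in a few lines of
Python. The stage names are illustrative labels, the rates are the
eyeballed ones from the paragraph above, and sporadic use is left out
because no number was put on it, so the result is if anything on the
generous side.

```python
# Eyeballed retention rates from the text; stage names are illustrative labels.
FUNNEL = [
    ("on Facebook", 0.60),
    ("profile is public", 0.50),
    ("enough data for an OCEAN profile", 0.50),
    ("matchable to a credit report", 0.75),
]


def surviving_fraction(stages):
    """Multiply the retention rate of every stage to get the overall fraction."""
    fraction = 1.0
    for _name, rate in stages:
        fraction *= rate
    return fraction


# Print the share of the population still in play after each step.
remaining = 1.0
for name, rate in FUNNEL:
    remaining *= rate
    print(f"{name:<35} {remaining:6.1%} of the population left")
```

Each step on its own keeps half or more of the people, yet the product
of the four rates is a little over a tenth.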

A claim attributed to the company is, "We have profiled the personality
of every adult in the United States of America -- 220 million people."
By the rough arithmetic above, perhaps 20-30 million of those will have
been profiled using social media data. Even for that sample, there's no
way of independently
verifying whatever unpublished techniques they're using. For the vast
bulk of those people the only data available will be the bog standard
marketing data used by any other direct marketing firm.
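
As a rough check on scale, the sketch below works out what share of the
claimed 220 million adults a figure of 20-30 million represents. The
function name is hypothetical and the inputs are the illustrative
numbers from the text, not measured data.

```python
CLAIMED_ADULTS = 220_000_000  # figure attributed to the company in the text


def share_of_claimed(profiled):
    """Return the fraction of the claimed total that `profiled` people represent."""
    return profiled / CLAIMED_ADULTS


# The 20-30 million range from the text, expressed as a share of the total.
for profiled in (20_000_000, 30_000_000):
    share = share_of_claimed(profiled)
    print(f"{profiled:,} profiled = {share:.1%} of the claimed total")
```

Both ends of the range sit close to the one-in-ten ballpark, which
leaves roughly nine in ten adults covered only by conventional
marketing data.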

And that seems to have played out on the ground. There's no evidence of
this voodoo marketing in action, and we have plenty of anecdotes
pointing to less than stellar use of data by campaigns. Leonid
Bershidsky wrote an excellent piece in Bloomberg in which he describes
his own experience:

"I would have believed in the efficiency of these shamanic manipulations
had I not been the recipient of numerous e-mail messages from the Trump
campaign that designated me as a 'Big League Supporter' and doggedly
asked for contributions and moral support, though I am disqualified as a
Russian citizen. Whatever contact lists Trump's data team had, it didn't
even match them against open social network data. Cambridge Analytica's
microtargeting was obviously failing in my case. Even though I'd given
my e-mail address to the campaigns of Bernie Sanders and Clinton, too,
as I registered for their rallies, they didn't senselessly bombard me
with messages as Trump did."

Then we come to the twist in the tale. After the story first appeared
online, a spokesman for Cambridge Analytica came forward with the
following statement: "Cambridge Analytica does not use data from
Facebook. It has had no dealings with Dr. Michal Kosinski. It does not
subcontract research. It does not use the same methodology.
Psychographics was hardly used at all."

Now, you may or may not choose to believe that statement, but if you
agree with my assessment so far, it seems likely to be the truth. Even
if some Big Voodoo Data Science Company did have all this data, it still
wouldn't tell campaigns how to use it effectively. Nor would it have had
much influence on the most effective strategies in play, like the news
media war that Donald Trump engaged in so successfully.

So if you step right back and look at all this, what do we see? We see a
data science firm with Steve Bannon on the board, bigly claims about its
powers, whose exact methodology is unclear to us. We see a candidate,
Donald Trump, who used the same successful strategy right the way
through his campaign whether he was employing Cambridge Analytica or a
random dude with HTML skills. We have another candidate, Ted Cruz, who
used the same firm and tanked. We have another candidate, Hillary
Clinton, who used something very similar to Cambridge Analytica and also
lost.

How exactly do you turn all that into the story of an unstoppable data
science behemoth?

#  distributed via <nettime>: no commercial use without permission
#  <nettime>  is a moderated mailing list for net criticism,
#  collaborative text filtering and cultural politics of the nets
#  @nettime_bot tweets mail w/ sender unless #ANON is in Subject: