AI guides your daily life, but is it liberal or conservative?
Recommender systems influence our cultural, social and
political lives, but are they agents of diversity or conservative guardians?
Martin Robbins | Friday 26 August 2016 06.24 EDT | Last modified on Wednesday 22 February 2017 12.45 EST
Imagine you’re a billionaire, with your own film studio.
You’re sitting there on your golden throne, eating peeled grapes off Channing
Tatum’s abs. Your assistant has just handed you the script for The Expendables
7 or yet another Spider-Man reboot. You yawn theatrically in his face. Surely, you think to yourself, in this data-driven age there has to be a better way.
Couldn’t we use machine learning to design the optimum new film? Something
guaranteed to be a box office hit?
So you toss the grapes and get to work on some code. You
write a fancy algorithm, and you feed it the scripts and the box office takings
of every film ever made. It crunches through all the data, learns the
characteristics of a hit script, and using that knowledge it spits out the
blueprint for the most commercially lucrative film of all time. It’s a Pixar
remake of the Wizard of Oz, using space dinosaurs. It’s bloody brilliant.
The film is a disaster. People walk out of test
screenings, complaining about domestic abuse and racism. The opening weekend
sees protests outside every cinema. The voiceover cast launches a class action
to have their names removed from the credits. You’re no longer a billionaire.
You’re broke. A van comes to take your Channing Tatum away.
Your script was almost certainly sexist. And racist. Lots
of other –ists, too. We know this because we know the data you used. Many of
Hollywood’s greatest hits come from an era when women were routinely spanked
on-screen and racial segregation was still widespread. Even in recent times,
Hollywood makes films that silence women, whitewash race, and generally look
like Donald Trump’s mid-life crisis.
It doesn’t matter whether you use basic statistics or deep learning, Excel or TensorFlow: if you feed this data into your system and try to draw general conclusions from it, the results will always be polluted by these inherent biases. But then you wouldn’t actually do this. You wouldn’t try
to create the one best film of all time; instead you’d want to identify good
film ideas for specific markets.
To achieve this you might build something like a
recommender system. You’ve almost certainly used a recommender system today. A
recommender system suggests other products for you to buy on Amazon. A
recommender system tells you what else to watch on Netflix. A recommender system
tells you whom to follow on Twitter. When an online service suggests something
to you, from which news story to follow to what book to read, there’s a good
chance it’s using a recommender system. Like pushy best friends, they exert a constant influence on your life, shaping your social, cultural and political experiences.
So how do they work? Let’s run through a simple example.
Imagine listing every TV show ever made in column ‘A’ of a massive spreadsheet. In column ‘B’ you put a ‘1’ next to each show if you’ve seen it, or a ‘0’ if you haven’t. In column ‘C’ I do the same, for the shows I have and haven’t seen. We end up with something like this:

  Show (A)                   You (B)   Me (C)
  Game of Thrones               1        1
  House of Cards                0        1
  BBC Question Time             1        1
  Orange is the New Black       1        1
Now we can compare how similar our tastes are. To do
this, we just add up the differences between our columns - the smaller the
number, the closer matched we are. With a difference of just 1, you and I have
similar taste – good taste – in television, and since we’re so well matched
it’s a good bet that you’ll like House of Cards.
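Here’s that comparison as a minimal Python sketch (the titles and 0/1 values are illustrative, matching the toy table above):

```python
# Columns B and C from the spreadsheet above, as 0/1 viewing histories.
shows = ["Game of Thrones", "House of Cards",
         "BBC Question Time", "Orange is the New Black"]

you = {"Game of Thrones": 1, "House of Cards": 0,
       "BBC Question Time": 1, "Orange is the New Black": 1}
me  = {"Game of Thrones": 1, "House of Cards": 1,
       "BBC Question Time": 1, "Orange is the New Black": 1}

# Add up the differences between the two columns: the smaller the
# total, the more closely matched our tastes.
difference = sum(abs(you[s] - me[s]) for s in shows)
print(difference)  # 1

# Recommend whatever your closest match has seen and you haven't.
print([s for s in shows if me[s] == 1 and you[s] == 0])
# ['House of Cards']
```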
Now imagine extending the list to cover every single film and TV show, and adding columns for millions of people. For any
given person, you can find the people who most closely match their viewing
habits, and use that information to recommend new shows for them to watch.
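A sketch of that scaled-up version, assuming `histories` maps every person to the same kind of 0/1 viewing dictionary as above:

```python
# User-based nearest-neighbour recommendation: find the people whose
# columns differ least from yours, then suggest what they've watched.
def difference(a, b):
    # Assumes both viewing histories cover the same set of shows.
    return sum(abs(a[s] - b[s]) for s in a)

def recommend(person, histories, k=5):
    mine = histories[person]
    # The k people whose viewing habits differ least from ours.
    neighbours = sorted((p for p in histories if p != person),
                        key=lambda p: difference(mine, histories[p]))[:k]
    # Score unseen shows by how many close neighbours have watched them.
    scores = {}
    for p in neighbours:
        for show, seen in histories[p].items():
            if seen and not mine.get(show):
                scores[show] = scores.get(show, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)
```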
What if we want to get more abstract? Instead of listing
TV shows, we could list themes. The main themes of Game of Thrones for example
are ‘tits’, ‘dragons’ and ‘evil people.’ House of Cards has ‘evil people’ and
‘politics.’ If we take the shows above and break them down into themes, we can
rewrite our data as follows:
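Sketched in Python (the theme sets for Game of Thrones and House of Cards are the ones just named; the mapping for BBC Question Time is my assumption):

```python
# Each show broken down into its main themes.
themes = {
    "Game of Thrones":   ("tits", "dragons", "evil people"),
    "House of Cards":    ("evil people", "politics"),
    "BBC Question Time": ("politics", "evil people"),  # assumed mapping
}

# Rewrite a 0/1 viewing history as a tally of themes watched.
def theme_profile(history):
    profile = {}
    for show, seen in history.items():
        if seen:
            for theme in themes.get(show, ()):
                profile[theme] = profile.get(theme, 0) + 1
    return profile

print(theme_profile({"Game of Thrones": 1, "House of Cards": 1,
                     "BBC Question Time": 0}))
# {'tits': 1, 'dragons': 1, 'evil people': 2, 'politics': 1}
```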
Now we can do something quite cool – instead of trying to
match up people individually, you can look for larger clusters of people with
common interests. Identifying these clusters allows you to spot new niche
audiences, which you can then target specific programming at. If our
demographic likes House of Cards and BBC Question Time, for example, we’d
probably watch more shows about evil people and politics.
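One way to find those clusters is ordinary k-means, sketched here with scikit-learn over made-up theme tallies (the article doesn’t name a method; any clustering algorithm would do):

```python
# Cluster viewers by their theme profiles to spot niche audiences.
import numpy as np
from sklearn.cluster import KMeans

# Rows are viewers, columns are theme tallies:
# [tits, dragons, evil people, politics] -- all values illustrative.
viewers = np.array([
    [3, 2, 1, 0],   # fantasy-heavy viewing
    [2, 3, 1, 0],
    [0, 0, 2, 3],   # political-drama-heavy viewing
    [0, 0, 3, 2],
    [0, 0, 2, 2],
])

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(viewers)
print(labels)  # e.g. [1 1 0 0 0] -- two audiences to commission shows for
```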
Obviously I’m giving you a very simplified version here,
but this ability to harvest data and identify new markets is something that
traditional broadcasters in the era of Nielsen ratings simply couldn’t do.
Niche genres that wouldn’t have attracted funding in the past can now
demonstrate their worth, which is good for diversity. It’s what made shows like
Orange is the New Black possible, and it’s driving a whopping $5bn investment
in original programming by Netflix in 2016.
These systems can also be flawed. Imagine I’ve just joined a new website or service and they have no information about anything I’ve bought or viewed. What do they recommend then? With no comparison possible (the ‘cold start’ problem), one common option is to default back to demographic data. I’m shown whatever they think a
white male 30-something would like to see.
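As a sketch, with a hypothetical demographic lookup standing in for whatever a real service uses:

```python
# Cold-start fallback: with no viewing history to compare, fall back
# to what's popular with the new user's demographic group.
# `popular_by_demographic` is hypothetical, not any real service's data.
popular_by_demographic = {
    ("male", "30-39"):   ["House of Cards", "Game of Thrones"],
    ("female", "18-29"): ["Orange is the New Black"],
}

def recommend(history, demographic):
    if not history:  # a brand-new user: nothing to match against
        return popular_by_demographic.get(demographic, [])
    ...  # otherwise, use the nearest-neighbour approach from earlier
```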
That’s a potential source of bias. The shows you recommend
will in some way influence the choices I make, which will affect the data you
gather on me. That effect may wane over time, but even without it we carry
biases with us wherever we go: the result of our upbringing, and our place
within a biased culture.
There are ways to mitigate this, of course. Serendipity has become a hot topic in the field: having discovered that consumers find it boring to see the same suggestions over and over again, services now try to build in an element of ‘discovery’. But just as newspapers and populist politicians tend
to pander to and ultimately reinforce the biases of their audiences, poorly
designed recommender systems could act to polarise people over time.
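One crude way to build in that element of discovery, as a sketch: occasionally swap a personalised pick for a random one, an epsilon-greedy-style exploration step (the 10% rate here is arbitrary):

```python
import random

# Inject 'discovery': with probability epsilon, replace a personalised
# suggestion with a random pick from the whole catalogue.
def with_serendipity(personalised, catalogue, epsilon=0.1):
    return [random.choice(catalogue) if random.random() < epsilon else pick
            for pick in personalised]
```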
So are these algorithms agents of diversity, or
conservative guardians? The truth is we don’t really know. If a supermarket has
an aisle full of pink toys for girls it’s pretty obvious to any observer what’s
going on. An online retailer looks different to every customer, it changes each
time you touch it, and there’s no human being making decisions. Good luck
unpicking that one, activists.
Different services use different algorithms and different
data with different structural biases. Many of the algorithms are proprietary
and much of the data is private. They’re not simply black boxes; a black box at
least looks the same to each observer. They’re kaleidoscopic mirages,
simultaneously all colours and none of them, virtually inscrutable.
It’s likely that all kinds of weird and wonderful
policies are at work within them, that many popular websites target different
content to black people and white people, men and women, rich and poor. Some may
be driving radical social change. Who knows, but it would be smart to try to
find out.