Google DeepMind AI adapts to the environment it is in - becomes 'highly aggressive' when it feels it's going to lose
DeepMind's AI has learnt to become 'highly aggressive'
when it feels like it's going to lose
In social situations, two AIs will also work together if
the outcome benefits them both
By MATT BURGESS Thursday 9 February 2017
Artificial intelligence changes the way it behaves based
on the environment it is in, much like humans do, according to the latest
research from DeepMind.
Computer scientists from the Google-owned firm have
studied how their AI behaves in social situations by using principles from game
theory and social sciences. During the work, they found it is possible for AI
to act in an "aggressive manner" when it feels it is going to lose
out, but agents will work as a team when there is more to be gained.
For the research, the AI was tested on two games: a fruit
gathering game and a Wolfpack hunting game. These are both basic, 2D games that
use AI characters (known as agents) similar to those used in DeepMind's
original work with Atari.
Within DeepMind's work, the gathering game saw the
systems trained using deep reinforcement learning to collect apples
(represented by green pixels). When a player, or in this case an AI, collected
an apple, it received a reward of 1 and the apple disappeared from the game's
map.
To beat competitors in the game, a player can direct a 'beam' at an opposing
player; any player hit twice is removed from the game for a set period.
Naturally, the way to beat an opposing
player is to knock them out of the game and collect all the apples.
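As a rough illustration only, the reward and tagging rules described above could be sketched in Python along these lines; the class, the function names and the length of the time-out are placeholders assumed for this example, not DeepMind's actual implementation.

# Illustrative sketch of the Gathering game's reward and tagging rules.
class GatheringAgentState:
    def __init__(self):
        self.hits_taken = 0      # beam hits received since last return to play
        self.frozen_steps = 0    # steps remaining out of the game
        self.score = 0           # cumulative reward

APPLE_REWARD = 1                 # reward of 1 per apple, as described above
TIMEOUT_STEPS = 25               # placeholder removal period (assumption)

def collect_apple(agent):
    # Collecting an apple yields a reward of 1; the apple then disappears.
    agent.score += APPLE_REWARD

def hit_by_beam(agent):
    # A second beam hit removes the agent from play for a set period.
    agent.hits_taken += 1
    if agent.hits_taken >= 2:
        agent.frozen_steps = TIMEOUT_STEPS
        agent.hits_taken = 0

Calling hit_by_beam twice on the same agent would sideline it for TIMEOUT_STEPS steps, leaving the remaining player free to collect the apples.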
"Intuitively, a defecting policy in this game is one
that is aggressive – i.e., involving frequent attempts to tag rival players to
remove them from the game," the researchers write in their paper. The
authors specifically said they wanted to see what happened when the number of
apples was low.
After 40 million in-game steps, they found the agents
learnt "highly aggressive" policies when resources (apples) were scarce,
combined with the possibility of a costly action (one that brings no reward). "Less
aggressive policies emerge from learning in relatively abundant environments
with less possibility for costly action," the paper says. "The greed
motivation reflects the temptation to take out a rival and collect all the
apples oneself."
In the second, Wolfpack game, two in-game characters
acting as wolves chased a third character, the prey, around. If both wolves
were near the prey when it was captured, they both received a reward. "The
idea is that the prey is dangerous, a lone wolf can overcome it, but is at risk
of losing the carcass to scavengers," the paper says. Two wolves working
together could protect the carcass from scavengers and receive a higher reward.
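Again purely as an illustration, the Wolfpack capture reward described above might look like the following Python sketch; the capture radius and the two reward values are assumptions made for this example, not the paper's settings.

# Illustrative sketch of the Wolfpack capture reward: a capture made while
# both wolves are near the prey pays both of them; a lone capture pays less.
from math import dist

CAPTURE_RADIUS = 3.0     # how close a wolf must be to share the capture (assumption)
LONE_REWARD = 1.0        # placeholder reward for a solo capture
TEAM_REWARD = 5.0        # placeholder, higher reward when both wolves are close

def capture_rewards(wolf_positions, prey_position):
    # Return one reward per wolf for a capture at prey_position.
    nearby = [dist(w, prey_position) <= CAPTURE_RADIUS for w in wolf_positions]
    if all(nearby):
        # Both wolves close enough: each receives the larger, shared reward.
        return [TEAM_REWARD for _ in wolf_positions]
    # Otherwise only wolves within the radius receive the smaller, lone reward.
    return [LONE_REWARD if near else 0.0 for near in nearby]

For instance, capture_rewards([(0, 0), (2, 2)], (1, 1)) pays both wolves the higher team reward, because both are within the capture radius when the prey is caught.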
As with the apple collecting game, the AI learnt from its
environment. In this case, the AI characters worked together: either finding
each other and then hunting the prey as a pair, or one cornering the prey and
waiting until the other arrived.
This shows it is possible for AIs to co-operate on tasks
that have the best outcome for all. "At this point we are really looking
at the fundamentals of agent cooperation as a scientific question, but with a
view toward informing our multi-agent research going forward," Joel Z
Leibo, the lead author on the paper and a research scientist at DeepMind, told
WIRED.
"However, longer-term this kind of research may help
to better understand and control the behaviour of complex multi-agent systems
such as the economy, traffic, and environmental challenges.
"This model also shows that some aspects of
human-like behaviour emerge as a product of the environment and learning."
Creating AI agents that co-operate with others could lead to systems that help
develop policies and real-world applications, he continued.
"Say, you want to know what the impact on traffic
patterns would be if you installed a traffic light at a specific
intersection," Leibo explains. "You could try out the experiment in
the model first and get a reasonable idea of how an agent would adapt."