Google DeepMind AI adapts to the environment it is in - becomes 'highly aggressive' when it feels it's going to lose

DeepMind's AI has learnt to become 'highly aggressive' when it feels like it's going to lose

In social situations, two AIs will also work together if the outcome benefits them both

By MATT BURGESS Thursday 9 February 2017

Artificial intelligence changes the way it behaves based on the environment it is in, much like humans do, according to the latest research from DeepMind.

Computer scientists from the Google-owned firm have studied how their AI behaves in social situations by using principles from game theory and social sciences. During the work, they found it is possible for AI to act in an "aggressive manner" when it feels it is going to lose out, but agents will work as a team when there is more to be gained.

For the research, the AI was tested on two games: a fruit-gathering game and a Wolfpack hunting game. Both are basic 2D games that use AI-controlled characters (known as agents), similar to those used in DeepMind's original work on Atari games.

In the gathering game, the agents were trained using deep reinforcement learning to collect apples (represented by green pixels). When a player, in this case an AI agent, collected an apple, it received a reward of 1 and the apple disappeared from the game's map.

To get ahead of competitors in the game, a player can direct a 'beam' at an opposing player. A player hit twice is removed from the game for a set period. Naturally, one way to beat an opponent is to knock them out of the game and collect all the apples.
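The paper describes these mechanics rather than publishing the environment itself, so the following is only a minimal sketch of how the rules above could be wired up. All names and constants here (GatheringEnv, GRID_SIZE, TAG_TIMEOUT, the reward value) are hypothetical and chosen for illustration; they are not DeepMind's code.

```python
import random

# Sketch of Gathering-style mechanics (hypothetical, not DeepMind's implementation).
# Collecting an apple is worth a reward of 1 and removes the apple; an agent hit
# twice by a rival's beam is taken out of play for a fixed number of steps.

GRID_SIZE = 10
TAG_TIMEOUT = 25   # steps a tagged-out agent sits out (assumed value)

class GatheringEnv:
    def __init__(self, n_apples=5):
        self.apples = {self._random_cell() for _ in range(n_apples)}
        self.positions = {"agent_a": (0, 0), "agent_b": (GRID_SIZE - 1, GRID_SIZE - 1)}
        self.hits = {"agent_a": 0, "agent_b": 0}
        self.timeout = {"agent_a": 0, "agent_b": 0}

    def _random_cell(self):
        return (random.randrange(GRID_SIZE), random.randrange(GRID_SIZE))

    def step(self, agent, move, fire_beam_at=None):
        """Apply one agent's action; return that agent's reward for this step."""
        if self.timeout[agent] > 0:          # tagged-out agents skip their turn
            self.timeout[agent] -= 1
            return 0
        x, y = self.positions[agent]
        dx, dy = move
        self.positions[agent] = (max(0, min(GRID_SIZE - 1, x + dx)),
                                 max(0, min(GRID_SIZE - 1, y + dy)))
        if fire_beam_at is not None:         # two hits remove the rival temporarily
            self.hits[fire_beam_at] += 1
            if self.hits[fire_beam_at] >= 2:
                self.hits[fire_beam_at] = 0
                self.timeout[fire_beam_at] = TAG_TIMEOUT
        if self.positions[agent] in self.apples:
            self.apples.discard(self.positions[agent])
            return 1                         # collecting an apple is worth 1
        return 0

# Example: agent_a moves right; agent_b moves up and fires at agent_a (first hit).
env = GatheringEnv()
print(env.step("agent_a", move=(1, 0)))
print(env.step("agent_b", move=(0, -1), fire_beam_at="agent_a"))
```

In a setup like this, a deep reinforcement learning agent would only ever see the rewards returned by step(), so whether it learns to chase apples or to fire the beam depends entirely on which behaviour pays off in its environment.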

"Intuitively, a defecting policy in this game is one that is aggressive – i.e., involving frequent attempts to tag rival players to remove them from the game," the researchers write in their paper. The authors specifically said they wanted to see what happened when the number of apples was low.

After 40 million in-game steps, they found the agents learnt "highly aggressive" policies when resources (apples) were scarce and there was the possibility of costly action (missing out on a reward). "Less aggressive policies emerge from learning in relatively abundant environments with less possibility for costly action," the paper says. "The greed motivation reflects the temptation to take out a rival and collect all the apples oneself."

In the second game, Wolfpack, two in-game characters acting as wolves chased a third character, the prey, around the map. If both wolves were near the prey when it was captured, they both received a reward. "The idea is that the prey is dangerous, a lone wolf can overcome it, but is at risk of losing the carcass to scavengers," the paper says. Two wolves working together could protect the carcass from scavengers and so receive a higher reward.
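As a minimal sketch of that reward rule (with hypothetical constants and function names, not DeepMind's actual implementation), the payout for a capture could depend on how many wolves are within a "capture radius" of the prey when it is caught:

```python
# Sketch of the Wolfpack reward rule described above (assumed values throughout):
# a capture pays out more when both wolves are near the prey, and every wolf
# inside the capture radius shares in the reward.

LONE_CAPTURE_REWARD = 1.0    # assumed value for a solitary capture
TEAM_CAPTURE_REWARD = 5.0    # assumed, larger value when the pack hunts together
CAPTURE_RADIUS = 2           # grid distance counted as "near the prey" (assumed)

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def wolfpack_rewards(wolf_positions, prey_position):
    """Return one reward per wolf for the step on which the prey is captured."""
    nearby = [manhattan(w, prey_position) <= CAPTURE_RADIUS for w in wolf_positions]
    if not any(nearby):
        return [0.0 for _ in wolf_positions]          # no capture this step
    payout = TEAM_CAPTURE_REWARD if sum(nearby) > 1 else LONE_CAPTURE_REWARD
    return [payout if near else 0.0 for near in nearby]

# Example: both wolves within the radius when the prey is caught -> both rewarded.
print(wolfpack_rewards([(3, 3), (4, 2)], prey_position=(3, 2)))   # [5.0, 5.0]
```

Under a rule like this, a wolf that learns to wait for its partner earns more per capture than one that hunts alone, which is the cooperative pressure the paper describes.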

As with the apple-collecting game, the AI learnt from its environment. In this case, the AI characters worked together: either finding each other and then hunting the prey as a pair, or with one cornering the prey and waiting until the other arrived.

This shows it is possible for AIs to co-operate on tasks that have the best outcome for all. "At this point we are really looking at the fundamentals of agent cooperation as a scientific question, but with a view toward informing our multi-agent research going forward," Joel Z Leibo, the lead author on the paper and a research scientist at DeepMind, told WIRED.

"However, longer-term this kind of research may help to better understand and control the behaviour of complex multi-agent systems such as the economy, traffic, and environmental challenges.

"This model also shows that some aspects of human-like behaviour emerge as a product of the environment and learning." Creating AI agents that co-operate with others could lead to systems that can develop policies and real-world applications, he continued.

"Say, you want to know what the impact on traffic patterns would be if you installed a traffic light at a specific intersection," Leibo explains. "You could try out the experiment in the model first and get a reasonable idea of how an agent would adapt."


