DeepMind’s new system could take us a step closer to general AI

One of the key challenges of deep reinforcement learning models—the kind of AI systems that have mastered Go, StarCraft 2, and other games—is their inability to generalize their capabilities beyond their training domain. This limit makes it very hard to apply these systems to real-world settings, where situations are much more complicated and unpredictable than the environments where AI models are trained.

But scientists at AI research lab DeepMind claim to have taken the “first steps to train an agent capable of playing many different games without needing human interaction data,” according to a blog post about their new “open-ended learning” initiative. Their new project includes a 3D environment with realistic dynamics and deep reinforcement learning agents that can learn to solve a wide range of challenges.

The new system, according to DeepMind’s AI researchers, is an “important step toward creating more general agents with the flexibility to adapt rapidly within constantly changing environments.”

The paper’s findings show some impressive advances in applying reinforcement learning to complicated problems. But they are also a reminder of how far current systems are from achieving the kind of general intelligence that the AI community has pursued for decades.

The brittleness of deep reinforcement learning

The key advantage of reinforcement learning is its ability to develop behavior by taking actions and getting feedback, similar to the way humans and animals learn by interacting with their environment. Some scientists describe reinforcement learning as “the first computational theory of intelligence.”
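To make the act-and-get-feedback loop concrete, here is a minimal sketch of an epsilon-greedy learner on a toy two-option problem. It is not taken from DeepMind's work; the function name `run_bandit`, the payout probabilities, and the exploration rate are all illustrative assumptions.

```python
import random

def run_bandit(payout_probs, steps=2000, epsilon=0.1, seed=0):
    """Learn the value of each action purely from trial-and-error feedback."""
    rng = random.Random(seed)
    estimates = [0.0] * len(payout_probs)  # running value estimate per action
    counts = [0] * len(payout_probs)       # how often each action was tried
    for _ in range(steps):
        # Explore occasionally, otherwise exploit the best current estimate.
        if rng.random() < epsilon:
            action = rng.randrange(len(payout_probs))
        else:
            action = max(range(len(payout_probs)), key=lambda a: estimates[a])
        # The environment answers with a reward of 1 or 0.
        reward = 1.0 if rng.random() < payout_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: nudge the estimate toward the observed reward.
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.8])
```

The learner is never told which option is better. It discovers this only through the rewards it receives, which is the core idea the paragraph above describes.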

The combination of reinforcement learning and deep neural networks, known as deep reinforcement learning, has been at the heart of many advances in AI, including DeepMind’s famous AlphaGo and AlphaStar models. In both cases, the AI systems were able to defeat top-ranked human professionals at their respective games.

But reinforcement learning systems are also notorious for their lack of flexibility. For example, a reinforcement learning model that can play StarCraft 2 at an expert level won’t be able to play a game with similar mechanics (e.g., Warcraft 3) at any level of competency. Even slight changes to the original game will considerably degrade the AI model’s performance.

“These agents are often constrained to play only the games they were trained for – whilst the exact instantiation of the game may vary (e.g. the layout, initial conditions, opponents) the goals the agents must satisfy remain the same between training and testing. Deviation from this can lead to catastrophic failure of the agent,” DeepMind’s researchers write in a paper that provides the full details on their open-ended learning.

Humans, on the other hand, are very good at transferring knowledge across domains.

The XLand environment


The goal of DeepMind’s new project was to create “an artificial agent whose behaviour generalises beyond the set of games it was trained on.”

To this end, the team created XLand, an engine that can generate 3D environments composed of static topology and moveable objects. The game engine simulates rigid-body physics and allows players to use the objects in various ways (e.g., to create ramps or block paths).

XLand is a rich environment in which you can train agents on a virtually unlimited number of tasks. One of the main advantages of XLand is the capability to use programmatic rules to automatically generate a vast array of environments and challenges to train AI agents. This addresses one of the key challenges of machine learning systems, which often require vast amounts of manually curated training data.

According to the blog post, the researchers created “billions of tasks in XLand, across varied games, worlds, and players.” The games range from very simple goals, such as finding objects, to more complex settings in which the AI agents must weigh the benefits and tradeoffs of different rewards. Some of the games include cooperation or competition elements involving multiple agents.
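A rough sketch can show how programmatic rules multiply into a very large task space. The vocabularies `OBJECTS` and `RELATIONS`, the `Task` structure, and both functions below are hypothetical stand-ins, since the article does not describe XLand's actual rule set.

```python
import random
from dataclasses import dataclass

# Hypothetical vocabularies; the real XLand rules are not given in the article.
OBJECTS = ["purple cube", "yellow sphere", "black pyramid"]
RELATIONS = ["near", "hold", "see"]

@dataclass(frozen=True)
class Task:
    world_seed: int  # seed for a procedurally generated layout
    goals: tuple     # one goal description per player

def sample_task(rng, max_players=3):
    """Draw one task by composing a world, a player count, and per-player goals."""
    num_players = rng.randint(1, max_players)
    goals = tuple(
        f"{rng.choice(RELATIONS)} {rng.choice(OBJECTS)}" for _ in range(num_players)
    )
    return Task(world_seed=rng.randrange(2**32), goals=goals)

def count_distinct_tasks(num_worlds, max_players=3):
    """Size of the task space: worlds times goal assignments for each player count."""
    goals_per_player = len(RELATIONS) * len(OBJECTS)
    return num_worlds * sum(goals_per_player ** n for n in range(1, max_players + 1))

task = sample_task(random.Random(0))
total = count_distinct_tasks(num_worlds=1000)  # 819000 with these toy vocabularies
```

Even with three objects, three relations, and a thousand worlds, the toy space holds 819,000 tasks. Richer vocabularies and compound goals grow that number quickly, with no manual curation.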

Deep reinforcement learning

DeepMind uses deep reinforcement learning and a few clever tricks to create AI agents that can thrive in the XLand environment.

The reinforcement learning model of each agent receives a first-person view of the world, the agent’s physical state (e.g., whether it is holding an object), and its current goal. Each agent fine-tunes the parameters of its policy neural network to maximize its rewards on the current task. The neural network architecture contains an attention mechanism to ensure the agent can balance optimization for the subgoals required to accomplish the main goal.
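The sketch below illustrates the three inputs and a generic attention step over subgoals. It uses standard scaled dot-product attention as a stand-in and is not DeepMind's architecture; the `Observation` fields, the `attend` function, and the toy embeddings are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

@dataclass
class Observation:
    pixels: list          # first-person view (placeholder for an image array)
    holding_object: bool  # part of the agent's physical state
    goal: str             # description of the current goal

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, subgoal_embeddings):
    """Scaled dot-product attention of one query over subgoal embeddings."""
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in subgoal_embeddings
    ]
    weights = softmax(scores)  # how much focus each subgoal receives
    dim = len(subgoal_embeddings[0])
    mixed = [
        sum(w * emb[i] for w, emb in zip(weights, subgoal_embeddings))
        for i in range(dim)
    ]
    return weights, mixed

obs = Observation(pixels=[], holding_object=False, goal="hold purple cube")
# Toy vectors: the query stands for the agent's internal state.
weights, mixed = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The weights always sum to one, so giving more focus to one subgoal means giving less to the others. That is one simple way to picture the balancing act the paragraph describes.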

Once the agent masters its current challenge, the computational task generator creates a new challenge for the agent. Each new task is generated according to the agent’s training history and in a way that helps distribute the agent’s skills across a vast range of challenges.
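One plausible way to condition task selection on training history is to prefer tasks the agent solves sometimes but not always. The sketch below follows that idea. The thresholds, the `success_rate` dictionary, and the `pick_next_task` function are assumptions, not details from the article.

```python
import random

def pick_next_task(candidates, success_rate, low=0.1, high=0.9, rng=None):
    """Choose a task the agent sometimes, but not always, solves."""
    rng = rng or random.Random()
    # Tasks with no history default to 0.5 so that they stay eligible.
    useful = [t for t in candidates if low <= success_rate.get(t, 0.5) <= high]
    # Fall back to all candidates if the filter removes everything.
    pool = useful or list(candidates)
    return rng.choice(pool)

# Hypothetical history: fraction of past episodes in which the goal was met.
history = {"find cube": 0.98, "build ramp": 0.4, "block rival": 0.0}
next_task = pick_next_task(list(history), history, rng=random.Random(0))
```

Here "find cube" is already mastered and "block rival" is still out of reach, so only "build ramp" passes the filter. Tasks in that middle band give the agent a useful learning signal.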

DeepMind also used its vast computational resources (courtesy of its owner Alphabet Inc.) to train a large population of agents in parallel and transfer learned parameters across different agents to improve the general capabilities of the reinforcement learning systems.
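Parameter transfer across a population can be pictured with a small sketch in the spirit of population-based training. The article does not specify DeepMind's exact procedure, so the `exploit_and_explore` function, the noise level, and the toy parameter vectors below are assumptions.

```python
import random

def exploit_and_explore(population, scores, noise=0.05, rng=None):
    """Replace the weakest agent's parameters with a perturbed copy of the strongest's."""
    rng = rng or random.Random()
    ranked = sorted(range(len(population)), key=lambda i: scores[i])
    worst, best = ranked[0], ranked[-1]
    # Copy every agent so the input population is left unchanged.
    new_population = [list(params) for params in population]
    # Transfer the best parameters, adding small noise to keep some diversity.
    new_population[worst] = [p + rng.gauss(0.0, noise) for p in population[best]]
    return new_population

# Three toy agents, each reduced to a two-number parameter vector.
population = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
scores = [0.1, 0.5, 0.9]
population = exploit_and_explore(population, scores, rng=random.Random(0))
```

Repeating this step during training lets weak agents inherit what strong agents have learned, while the noise keeps the population from collapsing into identical copies.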