Computers Conquer Limit Texas Hold'em Poker for First Time
Computers Conquer Texas Hold'em Poker for First Time
By Jeremy Hsu
Posted 8 Jan 2015 | 18:59 GMT
All your poker chips may soon belong to the computers. A new algorithm has taken the first big step in figuring out poker, the globally popular card game played by more than 150 million people, by solving a two-player version known as heads-up limit Texas hold’em.
It’s been almost two decades since the IBM computer called Deep Blue beat the world chess champion Garry Kasparov. Since that stunning moment in 1997, computer algorithms have solved games such as Connect Four and checkers by analyzing all the possible plays and figuring out the perfect strategy for each move starting from the beginning of each game.
Future programs might even master the ancient game of Go. But computers face a different challenge in consistently winning at poker, because each player has two hidden cards that represent information hidden from the opponent. By solving an “imperfect-information game” such as poker, computer algorithms could also potentially handle real-world scenarios with similar levels of uncertainty.
“The solutions for imperfect-information games require computers to handle the additional complication of not knowing exactly what the game’s state is, such as not knowing an opponent’s hand,” says Neil Burch, a Ph.D. student in computer science at the University of Alberta in Canada. “Such techniques require more computer memory and computing power.”
Burch and his coauthors laid out their algorithm’s solution in a paper published in the 8 Jan 2014 issue of the journal Science. In computer science speak, it’s only a “weak” solution to a specific version of Texas hold’em that has just two players, fixed bet sizes and a fixed number of bet raises. But it’s still good enough so that it’s virtually impossible—from a statistical significance standpoint—to distinguish the algorithm’s solution from perfect play during a lifetime of poker games. The paper defines a lifetime of games as 200 games of poker an hour for 12 hours a day over the course of 70 years.
“While strong strategies have been computed for larger imperfect-information games as well, this is, to my knowledge, the largest imperfect-information game essentially solved to date, and the first one competitively played by humans that has now been essentially solved,” says Tuomas Sandholm, a computer scientist at Carnegie Mellon University in Pittsburgh, in a Perspective article written for the journal Science.
The algorithm, named CFR+ by its creators, uses an improved version of a technique called counterfactual regret minimization (CFR). Past CFR algorithms have tried to solve poker by using several steps at each decision-point: coming up with counterfactual values representing different game outcomes; applying a regret minimization approach to figure out the strategy leading to the best outcome; and averaging the latest strategy with all past strategies.
But past CFR algorithms never actually tried to solve the full game of heads-up limit Texas hold’em or any other poker game variation because of the huge amount of memory required; roughly 262 terabytes of memory. That’s about 268,288 times as much memory as the 1-gigabyte memory available to an iPhone 6.
Instead, CFR algorithms solved simplified versions of poker and used the resulting strategies to imperfectly play the full versions of poker games. Such algorithms often competed against one another in an Annual Computer Poker Competition that coincides with the main conference of the Association for the Advancement of Artificial Intelligence.
The CFR+ algorithm improves on the past CFR algorithms in several ways. One change is how it uses a slightly different regret minimization technique to select the best strategy at each step. Another change involves skipping the usual step of averaging the latest strategy with all previous strategies; the algorithm just uses the most recent strategy.
“The algorithm goes from three steps to two steps,” Burch explains. “We throw away the final step.”
CFR+ still works like the old CFR algorithms by gradually developing better solutions through playing thousands and hundreds of thousands of hands of poker. But it can develop a very good solution much faster than any past CFR algorithm by being more efficient; basically the equivalent of taking fewer, bigger steps toward the best solution.
That efficiency allowed Burch and his colleagues to try solving the full game of heads-up limit Texas hold’em, rather than just a simplified version. By applying compression, they reduced the memory requirements to less than 11 terabytes for storing the counterfactual values and just 6 terabytes for computing the main strategy. They also spread the memory requirements across a cluster of 200 computation nodes and stored the values on the local disks of each node. Each node consisted of 24 2.1-GHz AMD cores, 32 GB of RAM, and a 1-TB local disk.
Calculating the new best computer solution for heads-up limit Texas hold’em still took about 68.5 days with enough hardware to theoretically fill several fridges worth of physical space. In reality, the computer hardware was spread out on racks in a large university research building containing 1,500 machines.
The near-perfect “Cepheus” strategy developed by the CFR+ algorithm for heads-up limit Texas hold’em revealed several trends and strategies that might be of particular interest for poker players. For instance, Cepheus confirmed the common poker wisdom that the dealer has a significant winning advantage in heads-up limit Texas hold’em. (This effectively doesn’t matter as much in actual play because players take turns being the dealer.)
Cepheus also confirmed the common poker strategy of raising the bet on the first action rather than just match the highest bet by calling. The algorithm’s solution called on the first action just 0.06 percent of the time overall.
But sometimes Cepheus took actions that ran counter to conventional poker wisdom. The computer strategy almost never “capped” or made the final allowed raise in the first round as dealer, even if it held the strongest hand involving a pair of aces. The computer strategy also played a broader range of hands as the non-dealer rather than simply fold and quit. And it was much more likely to reraise when it held a low-rank pair of cards.
Still, Burch warned that human poker players should take the Cepheus strategy with a grain of salt. After all, Cepheus honed its strategy by playing the equivalent of a near-perfect opponent that made practically no mistakes. Certain strategies that wouldn’t work against such a powerful opponent could still prove very profitable for human poker players when exploiting the mistakes of other human players.
The Canadian team hopes to use the CFR+ algorithm to tackle more complex versions of poker with more players, or other similarly complex games. A similar algorithm approach could even prove useful in game theory applications for the real world. For instance, the algorithm could help develop strategies for security officials to deploy certain resources—maybe a bomb-sniffing dog or Coast Guard patrol boat—to certain areas at different times of the day without getting predictable.
“Just like poker, you have to bluff to play it perfectly,” Burch says. “If you don’t bluff, an opponent or attacker can take advantage of that.”