Computers Conquer Limit Texas Hold'em Poker for First Time
Computers Conquer Texas Hold'em Poker for First Time
By Jeremy Hsu
Posted 8 Jan 2015 | 18:59 GMT
All your poker chips may soon belong to the computers. A
new algorithm has taken the first big step in figuring out poker, the globally
popular card game played by more than 150 million people, by solving a
two-player version known as heads-up limit Texas hold’em.
It’s been almost two decades since the IBM computer
called Deep Blue beat the world chess champion Garry Kasparov. Since that
stunning moment in 1997, computer algorithms have solved games such as Connect
Four and checkers by analyzing all the possible plays and figuring out the
perfect strategy for each move starting from the beginning of each game.
Future programs might even master the ancient game of Go.
But computers face a different challenge in consistently winning at poker,
because each player has two hidden cards that represent information hidden from
the opponent. By solving an “imperfect-information game” such as poker,
computer algorithms could also potentially handle real-world scenarios with
similar levels of uncertainty.
“The solutions for imperfect-information games require
computers to handle the additional complication of not knowing exactly what the
game’s state is, such as not knowing an opponent’s hand,” says Neil Burch, a
Ph.D. student in computer science at the University of Alberta in Canada. “Such
techniques require more computer memory and computing power.”
Burch and his coauthors laid out their algorithm’s
solution in a paper published in the 8 Jan 2014 issue of the journal Science.
In computer science speak, it’s only a “weak” solution to a specific version of
Texas hold’em that has just two players, fixed bet sizes and a fixed number of
bet raises. But it’s still good enough so that it’s virtually impossible—from a
statistical significance standpoint—to distinguish the algorithm’s solution
from perfect play during a lifetime of poker games. The paper defines a
lifetime of games as 200 games of poker an hour for 12 hours a day over the
course of 70 years.
“While strong strategies have been computed for larger
imperfect-information games as well, this is, to my knowledge, the largest
imperfect-information game essentially solved to date, and the first one
competitively played by humans that has now been essentially solved,” says
Tuomas Sandholm, a computer scientist at Carnegie Mellon University in
Pittsburgh, in a Perspective article written for the journal Science.
The algorithm, named CFR+ by its creators, uses an
improved version of a technique called counterfactual regret minimization
(CFR). Past CFR algorithms have tried to solve poker by using several steps at
each decision-point: coming up with counterfactual values representing
different game outcomes; applying a regret minimization approach to figure out
the strategy leading to the best outcome; and averaging the latest strategy
with all past strategies.
But past CFR algorithms never actually tried to solve the
full game of heads-up limit Texas hold’em or any other poker game variation
because of the huge amount of memory required; roughly 262 terabytes of memory.
That’s about 268,288 times as much memory as the 1-gigabyte memory available to
an iPhone 6.
Instead, CFR algorithms solved simplified versions of
poker and used the resulting strategies to imperfectly play the full versions
of poker games. Such algorithms often competed against one another in an Annual
Computer Poker Competition that coincides with the main conference of the
Association for the Advancement of Artificial Intelligence.
The CFR+ algorithm improves on the past CFR algorithms in
several ways. One change is how it uses a slightly different regret
minimization technique to select the best strategy at each step. Another change
involves skipping the usual step of averaging the latest strategy with all
previous strategies; the algorithm just uses the most recent strategy.
“The algorithm goes from three steps to two steps,” Burch
explains. “We throw away the final step.”
CFR+ still works like the old CFR algorithms by gradually
developing better solutions through playing thousands and hundreds of thousands
of hands of poker. But it can develop a very good solution much faster than any
past CFR algorithm by being more efficient; basically the equivalent of taking
fewer, bigger steps toward the best solution.
That efficiency allowed Burch and his colleagues to try
solving the full game of heads-up limit Texas hold’em, rather than just a
simplified version. By applying compression, they reduced the memory
requirements to less than 11 terabytes for storing the counterfactual values
and just 6 terabytes for computing the main strategy. They also spread the
memory requirements across a cluster of 200 computation nodes and stored the
values on the local disks of each node. Each node consisted of 24 2.1-GHz AMD
cores, 32 GB of RAM, and a 1-TB local disk.
Calculating the new best computer solution for heads-up
limit Texas hold’em still took about 68.5 days with enough hardware to
theoretically fill several fridges worth of physical space. In reality, the
computer hardware was spread out on racks in a large university research
building containing 1,500 machines.
The near-perfect “Cepheus” strategy developed by the CFR+
algorithm for heads-up limit Texas hold’em revealed several trends and
strategies that might be of particular interest for poker players. For
instance, Cepheus confirmed the common poker wisdom that the dealer has a
significant winning advantage in heads-up limit Texas hold’em. (This
effectively doesn’t matter as much in actual play because players take turns
being the dealer.)
Cepheus also confirmed the common poker strategy of
raising the bet on the first action rather than just match the highest bet by
calling. The algorithm’s solution called on the first action just 0.06 percent
of the time overall.
But sometimes Cepheus took actions that ran counter to
conventional poker wisdom. The computer strategy almost never “capped” or made
the final allowed raise in the first round as dealer, even if it held the
strongest hand involving a pair of aces. The computer strategy also played a
broader range of hands as the non-dealer rather than simply fold and quit. And
it was much more likely to reraise when it held a low-rank pair of cards.
Still, Burch warned that human poker players should take
the Cepheus strategy with a grain of salt. After all, Cepheus honed its
strategy by playing the equivalent of a near-perfect opponent that made
practically no mistakes. Certain strategies that wouldn’t work against such a
powerful opponent could still prove very profitable for human poker players
when exploiting the mistakes of other human players.
The Canadian team hopes to use the CFR+ algorithm to
tackle more complex versions of poker with more players, or other similarly
complex games. A similar algorithm approach could even prove useful in game
theory applications for the real world. For instance, the algorithm could help
develop strategies for security officials to deploy certain resources—maybe a
bomb-sniffing dog or Coast Guard patrol boat—to certain areas at different
times of the day without getting predictable.
“Just like poker, you have to bluff to play it
perfectly,” Burch says. “If you don’t bluff, an opponent or attacker can take
advantage of that.”
Comments
Post a Comment