EC 2016 Computer Poker Tutorial, part 2: Equilibrium computation
Note: This page is mostly for my part 2. Here is a link to the tutorial's abstract, and to Sam Ganzfried's slides for part 1.
Here are my slides for part 2.
Suggested starting points for anybody new to CFR/Poker (see reference below):
- Model AI Assignments, Intro to CFR Tutorial: Neller and Lanctot '13 (intended for undergrads)
- Sandholm '10 AI Magazine article
- Rubin & Watson '11 survey
- Michael Johanson's PhD Thesis
References
Many of these (and more) are available from UAlberta's Computer Poker Research site.
- Abou Risk, N. & Szafron, D. (2010). Using counterfactual regret minimization to create competitive multiplayer
poker agents. In AAMAS, pages 159-166.
- Bosansky, B., Kiekintveld, C., Lisy, V., and Pechoucek, M. (2014). An Exact Double-Oracle Algorithm for
Zero-Sum Extensive-Form Games with Imperfect Information. In
Journal of Artificial Intelligence Research 51, pp. 829-866.
- Bowling, M., Burch, N., Johanson M., and
Tammelin, O.. (2015). Heads-up limit hold'em poker is solved.
Science, 347(6218):145-149. (Solving HULHE paper.)
- Brown, N. and Sandholm, T. (2014). Regret Transfer and Parameter Optimization. In
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
- Brown, N. and Sandholm, T. (2015). Simultaneous Abstraction and Equilibrium Finding in Games. In
Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
- Brown, N. and Sandholm, T. (2015). Regret-Based Pruning in Extensive-Form Games. In
Neural Information Processing Systems (NIPS).
- Brown, N. and Sandholm, T. (2016). Strategy-Based Warm Starting for Regret Minimization in Games. In
Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
- Burch, N., Johanson, M. , Bowling, M.. Solving
imperfect information games using decomposition. (2014). In
28th AAAI Conference on Artificial Intelligence. (CFR-D.)
- Chen, K. and Bowling, M. (2012). Tractable Objectives for Robust Policy Optimization.
In Advances in Neural Information Processing Systems 25 (NIPS), pp. 2078-2086.
- Gatti, N., Panozzo, F., and Restelli, M. (2013). Efficient
evolutionary dynamics with extensive-form games. In
27th AAAI Conference on Arti cial Intelligence (AAAI).
- Gilpin, A., Hoda, S., Peña, J., and Sandholm, T.. Gradient-based algorithms for finding Nash equilibria
in extensive form games. In Workshop on Internet and Network Economics (WINE-07), 2007. (EGT conference paper.)
- Gibson, R., and Szafron, D. (2011). On Strategy Stitching in Large Extensive Form Multiplayer Games.
In Advances in Neural Information Processing Systems 24.
- Gibson, R., Lanctot, M., Burch, N., Szafron, D., and Bowling, M. (2012).
Generalized sampling and variance in counterfactual regret minimization.
In Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12).
- Gibson, R. Burch, N., Lanctot, M. and Szafron, D. (2012). Efficient Monte
Carlo counterfactual regret minimization in games with many player actions. In
Advances in Neural Information Processing Systems 25.
- Gibson, R. (2014). Regret Minimization in Non-Zero-Sum Games with Applications to
Building Champion Multiplayer Computer Poker Agents. arxiv.org/abs/1305.0034
(Multiplayer and dominated actions paper.)
- Gibson, R. (2014). Regret Minimization in Games and the
Development of Champion Multiplayer Computer Poker-Playing Agents.
PhD Thesis, University of Alberta.
- Hart, S., & Mas-Colell, A. (2000). A simple adaptive procedure leading to correlated equilibrium.
Econometrica, 68(5), 1127-1150. (Regret matching.)
- Heinrich, J. and Silver, D. (2015). Smooth UCT search in computer poker. In IJCAI.
- Heinrich, J., Lanctot, M., and Silver, D. (2015). Fictitious Self-Play in Extensive-Form Games.
In Proceedings of International Conference on Machine Learning.
- Heinrich, J. and Silver, D. (2016). Deep Reinforcement Learning from Self-Play in Imperfect-Information Games.
arxiv.org/abs/1603.01121
- Hoda, S., Gilpin, A., Pena, J., & Sandholm, T. (2010). Smoothing techniques for computing Nash
equilibria of sequential games. Mathematics of Operations Research, 35(2), 494-512. (EGT journal paper.)
- Jackson, E. (2012). Slumbot: An implementation of counterfactual regret minimization on
commodity hardware. In Proceedings of AAAI 2012 Poker Symposium, 2012.
- Jackson, E. (2014). A Time and Space Efficient Algorithm for Approximately
Solving Large Imperfect Information Games. In Proceedings of AAAI 2014 Computer Poker
and Imperfect Information Workshop.
-
Johanson, M., Bard, N., Burch, N., and Bowling, M. (2012).
Finding Optimal Abstract Strategies in Extensive-Form Games.
In Proceedings of Twenty-Sixth AAAI Conference. (CFR-BR paper.)
- Johanson, M., Bard, N., Lanctot, M., Gibson, R., and
Bowling, M.. (2012). Efficient Nash equilibrium approximation
through Monte Carlo counterfactual regret
minimization. In Proceedings of the Eleventh
International Conference on Autonomous Agents and
Multi-Agent Systems (AAMAS).
- Johanson, M., & Bowling, M. (2009). Data biased robust counter strategies. In Proceedings of
the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), pp.
264-271.
- Johanson, M., Zinkevich, M., & Bowling, M. (2008). Computing robust counter-strategies. In
Advances in Neural Information Processing Systems 20 NIPS. (Restricted Nash responses.)
- Johanson, M.B. (2016). Robust Strategies and Counter-Strategies: From Superhuman to Optimal Play.
PhD Thesis, Department of Computing Science, University of Alberta.
- Kroer, C., and Sandholm, T. (2014.) Extensive-form game
imperfect-recall abstractions with bounds. arXiv preprint
arXiv:1409.3302.
- Kroer, C., & Sandholm, S.. Imperfect-Recall Abstractions with Bounds in Games. (2016).
ACM conference on Economics and Computation (EC).
- Lanctot, M., Waugh, K., Zinkevich, M., & Bowling, M. (2009). Monte Carlo sampling for regret
minimization in extensive games. In Advances in Neural Information Processing Systems 22
(NIPS), pp. 1078-1086. (MCCFR paper.)
- Lanctot, M., Gibson, R., Burch, N. and Bowling, M. (2012). No-regret learning in extensiveform
games with imperfect recall. In Proceedings of the Twenty-Ninth International
Conference on Machine Learning (ICML 2012).
- Lanctot, M. (2014).
Further Developments of Extensive-Form Replicator Dynamics using the Sequence-Form Representation.
In AAMAS.
- Lisy, V., Lanctot, M. and Bowling, M. (2015). Online monte carlo counterfactual regret minimization
for search in imperfect information games. In Proceedings of the Fourteenth International Conference
on Autonomous Agents and Multi-Agent Systems (AAMAS).
- Lisy, V., Davis, T., and Bowling. M. (2016). Counterfactual Regret Minimization in Sequential
Security Games. In AAAI.
- Neller, T.W. and Hnath, S. (2011). Approximating optimal Dudo play with fixed-strategy
iteration counterfactual regret minimization. In Computers and Games, 2011.
- Neller, T.W. and Lanctot, M. (2013). An Introduction to Counterfactual Regret Minimization.
In Proceedings of Model {AI} Assignments,
The Fourth Symposium on Educational Advances in Artificial Intelligence ({EAAI}-2013).
link to site.
- Osborne, M.J. and Rubinstein, A. (1994). A course in game theory. MIT Press. ISBN 0-262-65040-1.
- Ponsen, M., de Jong, S., and Lanctot, M. (2011). Computing approximate Nash equilibria and
robust best-responses using sampling. Journal of Artificial Intelligence Research,
42:575-605.
- Rubin, J. and Watson, I. (2011). Computer poker: A review.
Artificial Intelligence, 175(5-6):958-987.
- Sandholm, T. (2010). The state of solving large incomplete information games, and application
to poker. AI Magazine 13-32. Special issue on Algorithmic Game Theory.
- von Stengel, B. (2007). Equilibrium computation for two-player games in strategic
and extensive form. In Algorithmic Game Theory, chapter 4.
Cambridge University Press.
- Szafron, D., Gibson, R., Sturtevant, N. (2013).
A Parameterized Family of Equilibrium Profiles for Three-Player Kuhn Poker.
In Proceedings of the Twelfth International Conference on
Autonomous Agents and Multiagent Systems (AAMAS).
- Tammelin, O., Burch, N., Johanson, M. and Bowling, M. (2015). Solving Heads-up Limit Texas Hold'em.
In Proceedings of IJCAI. (CFR+ paper).
- Waugh, K., Zinkevich, M., Johanson, M., Kan, M., Schnizlein, D., & Bowling, M. (2009). A practical
use of imperfect recall. In Proceedings of the 8th Symposium on Abstraction, Reformulation
and Approximation (SARA).
- Waugh, K., Morrill, D., Bagnell, J.A., and Bowling, M. (2015). Solving Games with Functional Regret Estimation.
Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, and Michael Bowling.
In AAAI Conference on Artificial Intelligence.
- Waugh, K. and Bagnell, J.A. (2015). A Unified View of Large-scale Zero-sum Equilibrium Computation.
Kevin Waugh and J. Andrew Bagnell. In AAAI Workshop on Computer Poker and Imperfect Information.
- Zinkevich, M., Bowling, M., and Burch, N. (2007). A New Algorithm for Generating Equilibria in Massive Zero-Sum Games.
In Proceedings of National Conference on Artificial Intelligence (AAAI), pp. 788-793.
- Zinkevich, M., Johanson, M., Bowling, M., & Piccione, C. (2007). Regret minimization in games
with incomplete information. In Advances in Neural Information Processing Systems 20
NIPS. (CFR paper.)