EC 2016 Computer Poker Tutorial, part 2: Equilibrium computation

Note: This page is mostly for my part 2. Here is a link to the tutorial's abstract, and to Sam Ganzfried's slides for part 1.

Here are my slides for part 2.

Suggested starting points for anybody new to CFR/Poker (see reference below):

Model AI Assignments, Intro to CFR Tutorial: Neller and Lanctot '13 (intended for undergrads)
Sandholm '10 AI Magazine article
Rubin & Watson '11 survey
Michael Johanson's PhD Thesis

References

Many of these (and more) are available from UAlberta's Computer Poker Research site.

Abou Risk, N. & Szafron, D. (2010). Using counterfactual regret minimization to create competitive multiplayer poker agents. In AAMAS, pages 159-166.
Bosansky, B., Kiekintveld, C., Lisy, V., and Pechoucek, M. (2014). An Exact Double-Oracle Algorithm for Zero-Sum Extensive-Form Games with Imperfect Information. In Journal of Artificial Intelligence Research 51, pp. 829-866.
Bowling, M., Burch, N., Johanson M., and Tammelin, O.. (2015). Heads-up limit hold'em poker is solved. Science, 347(6218):145-149. (Solving HULHE paper.)
Brown, N. and Sandholm, T. (2014). Regret Transfer and Parameter Optimization. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
Brown, N. and Sandholm, T. (2015). Simultaneous Abstraction and Equilibrium Finding in Games. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI).
Brown, N. and Sandholm, T. (2015). Regret-Based Pruning in Extensive-Form Games. In Neural Information Processing Systems (NIPS).
Brown, N. and Sandholm, T. (2016). Strategy-Based Warm Starting for Regret Minimization in Games. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI).
Burch, N., Johanson, M. , Bowling, M.. Solving imperfect information games using decomposition. (2014). In 28th AAAI Conference on Artificial Intelligence. (CFR-D.)
Chen, K. and Bowling, M. (2012). Tractable Objectives for Robust Policy Optimization. In Advances in Neural Information Processing Systems 25 (NIPS), pp. 2078-2086.
Gatti, N., Panozzo, F., and Restelli, M. (2013). Efficient evolutionary dynamics with extensive-form games. In 27th AAAI Conference on Arti cial Intelligence (AAAI).
Gilpin, A., Hoda, S., Peña, J., and Sandholm, T.. Gradient-based algorithms for finding Nash equilibria in extensive form games. In Workshop on Internet and Network Economics (WINE-07), 2007. (EGT conference paper.)
Gibson, R., and Szafron, D. (2011). On Strategy Stitching in Large Extensive Form Multiplayer Games. In Advances in Neural Information Processing Systems 24.
Gibson, R., Lanctot, M., Burch, N., Szafron, D., and Bowling, M. (2012). Generalized sampling and variance in counterfactual regret minimization. In Proceedings of the Twenty-Sixth Conference on Artificial Intelligence (AAAI-12).
Gibson, R. Burch, N., Lanctot, M. and Szafron, D. (2012). Efficient Monte Carlo counterfactual regret minimization in games with many player actions. In Advances in Neural Information Processing Systems 25.
Gibson, R. (2014). Regret Minimization in Non-Zero-Sum Games with Applications to Building Champion Multiplayer Computer Poker Agents. arxiv.org/abs/1305.0034 (Multiplayer and dominated actions paper.)
Gibson, R. (2014). Regret Minimization in Games and the Development of Champion Multiplayer Computer Poker-Playing Agents. PhD Thesis, University of Alberta.
Hart, S., & Mas-Colell, A. (2000). A simple adaptive procedure leading to correlated equilibrium. Econometrica, 68(5), 1127-1150. (Regret matching.)
Heinrich, J. and Silver, D. (2015). Smooth UCT search in computer poker. In IJCAI.
Heinrich, J., Lanctot, M., and Silver, D. (2015). Fictitious Self-Play in Extensive-Form Games. In Proceedings of International Conference on Machine Learning.
Heinrich, J. and Silver, D. (2016). Deep Reinforcement Learning from Self-Play in Imperfect-Information Games. arxiv.org/abs/1603.01121
Hoda, S., Gilpin, A., Pena, J., & Sandholm, T. (2010). Smoothing techniques for computing Nash equilibria of sequential games. Mathematics of Operations Research, 35(2), 494-512. (EGT journal paper.)
Jackson, E. (2012). Slumbot: An implementation of counterfactual regret minimization on commodity hardware. In Proceedings of AAAI 2012 Poker Symposium, 2012.
Jackson, E. (2014). A Time and Space Efficient Algorithm for Approximately Solving Large Imperfect Information Games. In Proceedings of AAAI 2014 Computer Poker and Imperfect Information Workshop.
Johanson, M., Bard, N., Burch, N., and Bowling, M. (2012). Finding Optimal Abstract Strategies in Extensive-Form Games. In Proceedings of Twenty-Sixth AAAI Conference. (CFR-BR paper.)
Johanson, M., Bard, N., Lanctot, M., Gibson, R., and Bowling, M.. (2012). Efficient Nash equilibrium approximation through Monte Carlo counterfactual regret minimization. In Proceedings of the Eleventh International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).
Johanson, M., & Bowling, M. (2009). Data biased robust counter strategies. In Proceedings of the Twelfth International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 264-271.
Johanson, M., Zinkevich, M., & Bowling, M. (2008). Computing robust counter-strategies. In Advances in Neural Information Processing Systems 20 NIPS. (Restricted Nash responses.)
Johanson, M.B. (2016). Robust Strategies and Counter-Strategies: From Superhuman to Optimal Play. PhD Thesis, Department of Computing Science, University of Alberta.
Kroer, C., and Sandholm, T. (2014.) Extensive-form game imperfect-recall abstractions with bounds. arXiv preprint arXiv:1409.3302.
Kroer, C., & Sandholm, S.. Imperfect-Recall Abstractions with Bounds in Games. (2016). ACM conference on Economics and Computation (EC).
Lanctot, M., Waugh, K., Zinkevich, M., & Bowling, M. (2009). Monte Carlo sampling for regret minimization in extensive games. In Advances in Neural Information Processing Systems 22 (NIPS), pp. 1078-1086. (MCCFR paper.)
Lanctot, M., Gibson, R., Burch, N. and Bowling, M. (2012). No-regret learning in extensiveform games with imperfect recall. In Proceedings of the Twenty-Ninth International Conference on Machine Learning (ICML 2012).
Lanctot, M. (2014). Further Developments of Extensive-Form Replicator Dynamics using the Sequence-Form Representation. In AAMAS.
Lisy, V., Lanctot, M. and Bowling, M. (2015). Online monte carlo counterfactual regret minimization for search in imperfect information games. In Proceedings of the Fourteenth International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS).
Lisy, V., Davis, T., and Bowling. M. (2016). Counterfactual Regret Minimization in Sequential Security Games. In AAAI.
Neller, T.W. and Hnath, S. (2011). Approximating optimal Dudo play with fixed-strategy iteration counterfactual regret minimization. In Computers and Games, 2011.
Neller, T.W. and Lanctot, M. (2013). An Introduction to Counterfactual Regret Minimization. In Proceedings of Model {AI} Assignments, The Fourth Symposium on Educational Advances in Artificial Intelligence ({EAAI}-2013). link to site.
Osborne, M.J. and Rubinstein, A. (1994). A course in game theory. MIT Press. ISBN 0-262-65040-1.
Ponsen, M., de Jong, S., and Lanctot, M. (2011). Computing approximate Nash equilibria and robust best-responses using sampling. Journal of Artificial Intelligence Research, 42:575-605.
Rubin, J. and Watson, I. (2011). Computer poker: A review. Artificial Intelligence, 175(5-6):958-987.
Sandholm, T. (2010). The state of solving large incomplete information games, and application to poker. AI Magazine 13-32. Special issue on Algorithmic Game Theory.
von Stengel, B. (2007). Equilibrium computation for two-player games in strategic and extensive form. In Algorithmic Game Theory, chapter 4. Cambridge University Press.
Szafron, D., Gibson, R., Sturtevant, N. (2013). A Parameterized Family of Equilibrium Profiles for Three-Player Kuhn Poker. In Proceedings of the Twelfth International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Tammelin, O., Burch, N., Johanson, M. and Bowling, M. (2015). Solving Heads-up Limit Texas Hold'em. In Proceedings of IJCAI. (CFR+ paper).
Waugh, K., Zinkevich, M., Johanson, M., Kan, M., Schnizlein, D., & Bowling, M. (2009). A practical use of imperfect recall. In Proceedings of the 8th Symposium on Abstraction, Reformulation and Approximation (SARA).
Waugh, K., Morrill, D., Bagnell, J.A., and Bowling, M. (2015). Solving Games with Functional Regret Estimation. Kevin Waugh, Dustin Morrill, J. Andrew Bagnell, and Michael Bowling. In AAAI Conference on Artificial Intelligence.
Waugh, K. and Bagnell, J.A. (2015). A Unified View of Large-scale Zero-sum Equilibrium Computation. Kevin Waugh and J. Andrew Bagnell. In AAAI Workshop on Computer Poker and Imperfect Information.
Zinkevich, M., Bowling, M., and Burch, N. (2007). A New Algorithm for Generating Equilibria in Massive Zero-Sum Games. In Proceedings of National Conference on Artificial Intelligence (AAAI), pp. 788-793.
Zinkevich, M., Johanson, M., Bowling, M., & Piccione, C. (2007). Regret minimization in games with incomplete information. In Advances in Neural Information Processing Systems 20 NIPS. (CFR paper.)