This is an implementation of the Counterfactual Regret Minimization (CFR) algorithm [1] that uses Temporal Difference (TD) learning instead of dynamic programming [1] or Monte Carlo sampling [2].
Coming soon!
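Whatever method is used to estimate counterfactual values (full tree traversal, Monte Carlo sampling, or TD-style bootstrapping), every CFR variant turns the accumulated regrets at an information set into its current strategy via regret matching. A minimal sketch of that step (the function name and the use of NumPy are illustrative, not part of this repository):

import numpy as np

def regret_matching(cumulative_regrets):
    """Turn cumulative counterfactual regrets for one information set
    into a probability distribution over its actions."""
    positive = np.maximum(cumulative_regrets, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # no action has positive regret: fall back to the uniform strategy
    return np.full(len(cumulative_regrets), 1.0 / len(cumulative_regrets))

# e.g. regrets [2.0, -1.0, 3.0] give the strategy [0.4, 0.0, 0.6]
print(regret_matching(np.array([2.0, -1.0, 3.0])))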
You can play against your agent on the console by specifying the game rules and creating a simulator instance:
# load the rules of the game
leduc = leduc_rules()

# learn the agent's policy
tdcfr_agent = ...

# create a human player in seat 0 and give the learned agent seat 1
p0 = HumanAgent(leduc, 0)
p1 = tdcfr_agent
agents = [p0, p1]

# create a simulator instance
sim = GameSimulator(leduc, agents, verbose=True, showhands=True)

# play forever
while True:
    sim.play()
    # move the button after every hand by swapping seats
    if p0.seat == 0:
        p0.seat = 1
        p1.seat = 0
    else:
        p0.seat = 0
        p1.seat = 1
    print('')
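Swapping the two seats after every hand moves the button, so neither player is permanently stuck acting first and the match stays fair over many hands.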
To run the code, you need the pyCFR library checked out in a sibling folder, ../cfr. The library provides implementations of poker game trees, expected value and best response computation, and the canonical CFR algorithm.
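Since pyCFR is used from a sibling checkout rather than installed as a package, one way to make it importable is to put that folder on sys.path before importing from it. A minimal sketch, assuming the checkout lives at ../cfr relative to this script and that leduc_rules is exposed by pyCFR's pokergames module:

import os
import sys

# point Python at the sibling pyCFR checkout (the ../cfr layout is this repo's convention)
sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'cfr')))

from pokergames import leduc_rules  # module name assumed from pyCFR's layout

leduc = leduc_rules()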
The following is a list of items that still need to be implemented:
Author: Wesley Tansey
[1] Zinkevich, M., Johanson, M., Bowling, M., & Piccione, C. (2008). Regret minimization in games with incomplete information. Advances in Neural Information Processing Systems, 20, 1729-1736.
[2] Lanctot, M., Waugh, K., Zinkevich, M., & Bowling, M. (2009). Monte Carlo sampling for regret minimization in extensive games. Advances in Neural Information Processing Systems, 22, 1078-1086.