Code for our PRICAI 2022 paper:
"Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior"
by Baihan Lin (Columbia)*, Djallel Bouneffouf (IBM Research), Guillermo Cecchi (IBM Research).
*Corresponding author
For the latest full paper: https://arxiv.org/abs/2006.06580
All the experimental results can be reproduced using the code in this repository. Feel free to contact me at doerlbh@gmail.com if you have any questions about our work.
Abstract
As an important psychological and social experiment, the Iterated Prisoner's Dilemma (IPD) treats the choice to cooperate or defect as an atomic action. We propose to study the behaviors of online learning algorithms in the IPD game, where we investigate the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits, and reinforcement learning. We evaluate them in an IPD tournament in which multiple agents compete sequentially. This allows us to analyze the dynamics of policies learned by multiple self-interested, independent, reward-driven agents, and also to study the capacity of these algorithms to fit human behaviors. Results suggest that making decisions based on the current situation performs the worst in this kind of social dilemma game. We report multiple findings on online learning behaviors, along with clinical validations, as an effort to connect artificial intelligence algorithms with human behaviors and their abnormal states in neuropsychiatric conditions.
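To make the game concrete, here is a minimal, self-contained sketch of a two-player IPD match and a round-robin tournament among the three handcrafted strategies. It is an illustration, not the repository's code: the payoff values are the standard Axelrod ones (T=5, R=3, P=1, S=0) and are an assumption here, and the names `PAYOFF`, `play_match`, and `round_robin` are hypothetical.

```python
import itertools

# Action encoding: cooperate / defect.
C, D = 0, 1

# Assumed standard Axelrod payoffs (row player, column player):
# T (temptation) > R (reward) > P (punishment) > S (sucker), and 2R > T + S.
PAYOFF = {
    (C, C): (3, 3),  # mutual cooperation: R, R
    (C, D): (0, 5),  # sucker vs. temptation: S, T
    (D, C): (5, 0),  # temptation vs. sucker: T, S
    (D, D): (1, 1),  # mutual defection: P, P
}


def always_cooperate(my_history, opp_history):
    return C


def always_defect(my_history, opp_history):
    return D


def tit_for_tat(my_history, opp_history):
    # Cooperate in the first round, then copy the opponent's previous move.
    return opp_history[-1] if opp_history else C


def play_match(strategy_a, strategy_b, rounds=100):
    """Play an iterated two-player game and return both cumulative scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        reward_a, reward_b = PAYOFF[(a, b)]
        score_a += reward_a
        score_b += reward_b
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b


def round_robin(strategies, rounds=100):
    """Every pair of distinct strategies plays one match; scores are summed."""
    totals = {name: 0 for name in strategies}
    for (name_a, fa), (name_b, fb) in itertools.combinations(strategies.items(), 2):
        score_a, score_b = play_match(fa, fb, rounds)
        totals[name_a] += score_a
        totals[name_b] += score_b
    return totals


if __name__ == "__main__":
    players = {"AC": always_cooperate, "AD": always_defect, "TFT": tit_for_tat}
    print(round_robin(players, rounds=100))
```

Running it prints `{'AC': 300, 'AD': 604, 'TFT': 399}`: in this small pool without self-play, always defect exploits always cooperate, while tit-for-tat is exploited only in the first round against always defect.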
Language: Python3, Python2, bash
Platform: MacOS, Linux, Windows
by Baihan Lin, April 2020
If you find this work helpful, please try the models out and cite our work. Thanks!
@inproceedings{lin2020online,
title={Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior},
author={Lin, Baihan and Bouneffouf, Djallel and Cecchi, Guillermo},
booktitle={Pacific Rim International Conference on Artificial Intelligence},
year={2022},
organization={Springer}
}
- Iterated Prisoner's Dilemma (IPD) with two players
- Iterated Prisoner's Dilemma (IPD) with N players
- Bandits: UCB1, Thompson Sampling, epsilon-greedy, EXP3, Human Behavior Thompson Sampling
- Contextual bandits: LinUCB, Contextual Thompson Sampling, EXP4, Split Contextual Thompson Sampling
- Reinforcement learning: Q-learning, Double Q-learning, SARSA, Split Q-learning
- Handcrafted: Always cooperate, Always defect, Tit-for-tat
- Requirements: numpy and scikit-learn
- For details on the variants modeling neuropsychiatric conditions used in this work, see: https://github.com/doerlbh/mentalRL
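As a rough illustration of how one of the reinforcement learning agents listed above can learn in the IPD, the sketch below trains a generic tabular Q-learning agent with epsilon-greedy exploration against tit-for-tat. Assumptions: the state is the previous round's joint action, the payoffs are the standard Axelrod values, and `QLearningAgent`, `train`, and all hyperparameters are hypothetical; the repository's agents (including the Split Q-learning variants) may be structured differently.

```python
import random

C, D = 0, 1  # cooperate / defect

# Assumed standard Axelrod payoffs for the learning agent (row player).
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def tit_for_tat(last_joint):
    # last_joint is (agent_move, opponent_move) from the previous round, or None.
    # The opponent cooperates first, then copies the agent's previous move.
    return C if last_joint is None else last_joint[0]


class QLearningAgent:
    """Hypothetical tabular Q-learner; state = previous round's joint action."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q = {}  # maps (state, action) -> value; missing entries are 0.0

    def value(self, state, action):
        return self.q.get((state, action), 0.0)

    def act(self, state):
        # Epsilon-greedy: explore uniformly, otherwise pick the greedy action
        # (ties resolve to C because max() keeps the first maximal element).
        if self.rng.random() < self.epsilon:
            return self.rng.choice((C, D))
        return max((C, D), key=lambda a: self.value(state, a))

    def update(self, state, action, reward, next_state):
        # One-step Q-learning target: r + gamma * max_a' Q(s', a').
        best_next = max(self.value(next_state, a) for a in (C, D))
        td_error = reward + self.gamma * best_next - self.value(state, action)
        self.q[(state, action)] = self.value(state, action) + self.alpha * td_error


def train(rounds=5000, seed=0):
    """Play `rounds` IPD rounds against tit-for-tat, learning online."""
    agent = QLearningAgent(seed=seed)
    state = None  # no history before the first round
    rewards = []
    for _ in range(rounds):
        action = agent.act(state)
        opp_action = tit_for_tat(state)
        reward = PAYOFF[(action, opp_action)]
        next_state = (action, opp_action)
        agent.update(state, action, reward, next_state)
        rewards.append(reward)
        state = next_state
    return agent, rewards


if __name__ == "__main__":
    agent, rewards = train(rounds=5000)
    print("mean reward over last 1000 rounds:", sum(rewards[-1000:]) / 1000)
```

Conditioning on the last joint action gives the agent, in principle, the information it needs to learn that defecting against tit-for-tat is punished in the following round; a context-free bandit in the same setting only sees the average payoff of each action.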