
dilemmaRL

Code for our PRICAI 2022 paper:

"Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior"

by Baihan Lin (Columbia)*, Djallel Bouneffouf (IBM Research), Guillermo Cecchi (IBM Research).

*Corresponding author

For the latest full paper: https://arxiv.org/abs/2006.06580

All the experimental results can be reproduced using the code in this repository. Feel free to contact me at doerlbh@gmail.com if you have any questions about our work.

Abstract

As an important psychological and social experiment, the Iterated Prisoner's Dilemma (IPD) treats the choice to cooperate or defect as an atomic action. We propose to study the behaviors of online learning algorithms in the IPD game, where we investigate the full spectrum of reinforcement learning agents: multi-armed bandits, contextual bandits, and reinforcement learning. We evaluate them in a tournament of iterated prisoner's dilemma where multiple agents compete in a sequential fashion. This allows us to analyze the dynamics of policies learned by multiple self-interested, independent, reward-driven agents, and also to study how well these algorithms fit human behaviors. Results suggest that considering only the current situation when making a decision is the worst strategy in this kind of social dilemma game. We report multiple findings on online learning behaviors and clinical validations, as an effort to connect artificial intelligence algorithms with human behaviors and their abnormal states in neuropsychiatric conditions.
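To make the evaluation protocol concrete, below is a minimal sketch of a round-robin IPD tournament among a few handcrafted strategies, using the conventional payoff values (T=5, R=3, P=1, S=0). The payoff values, strategy names, and function signatures are illustrative assumptions and do not correspond to this repository's actual code.

```python
from itertools import combinations

# Conventional IPD payoff values (assumed for illustration): 0 = cooperate, 1 = defect.
PAYOFF = {(0, 0): (3, 3), (0, 1): (0, 5), (1, 0): (5, 0), (1, 1): (1, 1)}

# A few handcrafted strategies: each maps the opponent's previous move to an action.
STRATEGIES = {
    "always_cooperate": lambda opp_last: 0,
    "always_defect":    lambda opp_last: 1,
    "tit_for_tat":      lambda opp_last: 0 if opp_last is None else opp_last,
}

def play_match(strat_a, strat_b, rounds=100):
    """Play one IPD match and return each side's total score."""
    last_a = last_b = None
    score_a = score_b = 0
    for _ in range(rounds):
        a, b = strat_a(last_b), strat_b(last_a)
        ra, rb = PAYOFF[(a, b)]
        score_a, score_b = score_a + ra, score_b + rb
        last_a, last_b = a, b
    return score_a, score_b

def tournament():
    """Round-robin tournament: every pair of strategies plays one match."""
    totals = {name: 0 for name in STRATEGIES}
    for (na, fa), (nb, fb) in combinations(STRATEGIES.items(), 2):
        sa, sb = play_match(fa, fb)
        totals[na] += sa
        totals[nb] += sb
    return totals

if __name__ == "__main__":
    print(tournament())
```

In the paper's setting, the learning agents listed under Algorithms below take the place of these fixed strategies, updating their policies round by round from the rewards they observe.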

Info

Language: Python3, Python2, bash

Platform: macOS, Linux, Windows

by Baihan Lin, April 2020

Citation

If you find this work helpful, please try the models out and cite our work. Thanks!

@inproceedings{lin2020online,
  title={Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior},
  author={Lin, Baihan and Bouneffouf, Djallel and Cecchi, Guillermo},
  booktitle={Pacific Rim International Conference on Artificial Intelligence},
  year={2022},
  organization={Springer}
}

Tasks

  • Iterated Prisoner's Dilemma (IPD) with two players
  • Iterated Prisoner's Dilemma (IPD) with N players (see the payoff sketch after this list)
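As a reference for these two tasks, here is a minimal sketch of a two-player IPD step using the conventional payoff matrix, together with one common N-player generalization in which each player's payoff is averaged over its pairwise games. Both the payoff values and the N-player formulation are assumptions for illustration, not necessarily the exact definitions used in this repository.

```python
import numpy as np

# Conventional 2-player IPD payoffs (T=5, R=3, P=1, S=0); assumed for illustration.
# Action encoding: 0 = cooperate, 1 = defect.
PAYOFF = np.array([[3, 0],   # I cooperate: opponent cooperates -> 3, defects -> 0
                   [5, 1]])  # I defect:    opponent cooperates -> 5, defects -> 1

def ipd_step(a1, a2):
    """One round of the 2-player IPD; returns (reward_1, reward_2)."""
    return PAYOFF[a1, a2], PAYOFF[a2, a1]

def n_player_ipd_step(actions):
    """One common N-player generalization (an assumption, not necessarily the
    paper's): each player's payoff is the average of its pairwise IPD payoffs
    against every other player."""
    actions = list(actions)
    n = len(actions)
    rewards = []
    for i, ai in enumerate(actions):
        others = [PAYOFF[ai, aj] for j, aj in enumerate(actions) if j != i]
        rewards.append(sum(others) / (n - 1))
    return rewards

if __name__ == "__main__":
    print(ipd_step(0, 1))              # cooperator vs defector -> (0, 5)
    print(n_player_ipd_step([0, 0, 1]))
```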

Algorithms:

  • Bandits: UCB1, Thompson Sampling, epsilon Greedy, EXP3, Human Behavior Thompson Sampling (see the agent sketch after this list)
  • Contextual bandits: LinUCB, Contextual Thompson Sampling, EXP4, Split Contextual Thompson Sampling
  • Reinforcement learning: Q Learning, Double Q Learning, SARSA, Split Q Learning
  • Handcrafted: Always cooperate, Always defect, Tit for tat
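As a minimal illustration of how an agent from the bandit family interacts with a handcrafted opponent, the sketch below pits an epsilon-greedy bandit (treating cooperate/defect as two context-free arms) against Tit for tat under the conventional payoffs. Class names, interfaces, and payoff values are illustrative assumptions rather than this repository's actual implementations.

```python
import random

class EpsilonGreedyAgent:
    """Minimal epsilon-greedy bandit: treats cooperate (0) and defect (1) as two
    arms and ignores the opponent entirely; an illustrative sketch only."""
    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0, 0]
        self.values = [0.0, 0.0]

    def act(self):
        if random.random() < self.epsilon:
            return random.randrange(2)          # explore
        return max(range(2), key=lambda a: self.values[a])  # exploit

    def update(self, action, reward):
        self.counts[action] += 1
        # incremental running mean of the reward for the chosen arm
        self.values[action] += (reward - self.values[action]) / self.counts[action]

class TitForTat:
    """Handcrafted baseline: cooperate first, then copy the opponent's last move."""
    def __init__(self):
        self.opponent_last = 0

    def act(self):
        return self.opponent_last

    def update(self, opponent_action):
        self.opponent_last = opponent_action

def play(rounds=200, payoff=((3, 0), (5, 1))):
    """Repeated IPD between the bandit and Tit for tat, using the conventional
    payoff values (assumed for illustration); returns the bandit's mean reward."""
    bandit, tft = EpsilonGreedyAgent(), TitForTat()
    total = 0.0
    for _ in range(rounds):
        a, b = bandit.act(), tft.act()
        r_bandit = payoff[a][b]
        bandit.update(a, r_bandit)
        tft.update(a)
        total += r_bandit
    return total / rounds

if __name__ == "__main__":
    print(play())
```

A contextual bandit or Q-learning agent would differ mainly in conditioning its action on the recent history of moves rather than ignoring it.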

Requirements

  • numpy and scikit-learn

Mental variants