MARL Penalty Shot Challenge

This project implements a MARL (Multi-Agent Reinforcement Learning) Penalty Shot Challenge: a custom platform for pitting state-of-the-art (SOTA) deep reinforcement learning algorithms against each other. Two agents simulate a penalty shootout, each controlling one of the two entities in the game: the Bar and the Puck.

Course and Professor

This project was carried out as part of a course at IIT Bhilai:

Professor: Soumajit Pramanik
Course: DS251

Features

Visualization of the unique environment at every step
Complete customization of environments and policies
Asynchronous server for manual interaction with a policy

How to begin

Install packages

The -e flag installs the project packages in editable mode.

Log in to wandb.ai to record your experimental runs.

pip install -e .
pip install -e ./gym-env
wandb login

Train and test a model

Use the files in utils/config/ to configure agent-specific policy hyperparameters and environment parameters.

The example command below trains a puck and a bar with the PPO algorithm, loads a previously saved policy for each agent, and uses 1 training environment and 2 test environments:

python ./utils/train.py  --wandb-name "ds251_project" --training-num 1 --test-num 2 --puck ppo --bar ppo --load-puck-id both_ppo --load-bar-id both_ppo
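
For readers unfamiliar with how such flags map onto a configuration, the following is a minimal sketch of an argument parser that accepts the flags used in the command above. It is not the project's train.py: the function name build_parser, the defaults, and the help strings are assumptions made for illustration, and the real script may accept more options or validate them differently.

```python
import argparse


def build_parser():
    """Hypothetical parser covering only the flags shown in the example command."""
    p = argparse.ArgumentParser(description="Train puck and bar policies (sketch)")
    p.add_argument("--wandb-name", default=None, help="name of the wandb run/project")
    p.add_argument("--training-num", type=int, default=1, help="number of training environments")
    p.add_argument("--test-num", type=int, default=1, help="number of test environments")
    p.add_argument("--puck", default="ppo", help="algorithm for the puck agent")
    p.add_argument("--bar", default="ppo", help="algorithm for the bar agent")
    p.add_argument("--load-puck-id", default=None, help="id of a saved puck policy to load")
    p.add_argument("--load-bar-id", default=None, help="id of a saved bar policy to load")
    return p


# Parse the same arguments as the example command.
args = build_parser().parse_args([
    "--wandb-name", "ds251_project",
    "--training-num", "1", "--test-num", "2",
    "--puck", "ppo", "--bar", "ppo",
    "--load-puck-id", "both_ppo", "--load-bar-id", "both_ppo",
])
print(vars(args))
```

Note that argparse converts dashes to underscores, so --load-puck-id becomes args.load_puck_id.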

To play as the bar:

Open 3 terminals and run one of the following commands in each:

python ./examples/server/start_server.py

python ./examples/server/agent_puck.py

python ./examples/server/agent_bar.py

Click Start and use the mouse slider to control the direction of the bar.

Codebase

Game Environment

The environment comprises a puck and a bar, with the puck moving horizontally at a constant speed towards the bar. Each entity is controlled independently by its own agent. The puck's objective is to get past the bar and reach the final line, while the bar aims to intercept the puck before it reaches the final line.

The environment is built with the OpenAI Gym library. Its step function accepts two actions, one for the puck and one for the bar, advances the game by one time step, and returns a tuple of state, reward, completion flag, and an additional info object.
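
The step interface described above can be illustrated with a toy stand-in. This is only a sketch under stated assumptions, not the project's environment: the class name ToyPenaltyShotEnv, the state layout (puck x, puck y, bar y), the constants, and the zero-sum reward from the puck's point of view are all hypothetical, and the class mimics the Gym-style reset/step shape in plain Python rather than subclassing the Gym library.

```python
import random


class ToyPenaltyShotEnv:
    """Toy two-agent environment with a Gym-style reset/step interface.

    Every constant and rule here is an illustrative assumption.
    """

    def __init__(self, n_steps=10, max_move=0.1, bar_half_len=0.2):
        self.n_steps = n_steps            # steps the puck needs to reach the final line
        self.max_move = max_move          # max vertical move per step for either agent
        self.bar_half_len = bar_half_len  # half of the bar's length
        self.reset()

    def reset(self):
        self.t = 0
        self.puck_y = 0.0
        self.bar_y = 0.0
        return self._state()

    def _state(self):
        # The puck's horizontal position grows linearly: constant speed.
        return (self.t / self.n_steps, self.puck_y, self.bar_y)

    @staticmethod
    def _clip(a):
        return max(-1.0, min(1.0, a))

    def step(self, puck_action, bar_action):
        """Apply both actions, advance one time step, return (state, reward, done, info)."""
        self.puck_y += self.max_move * self._clip(puck_action)
        self.bar_y += self.max_move * self._clip(bar_action)
        self.t += 1
        done = self.t >= self.n_steps
        reward = 0.0  # reward from the puck's point of view
        if done:
            blocked = abs(self.puck_y - self.bar_y) <= self.bar_half_len
            reward = -1.0 if blocked else 1.0
        info = {"puck_reward": reward, "bar_reward": -reward}
        return self._state(), reward, done, info


def run_episode(env, puck_policy, bar_policy):
    """Roll out one episode; each policy maps a state to an action in [-1, 1]."""
    state, done, reward, steps = env.reset(), False, 0.0, 0
    while not done:
        state, reward, done, _ = env.step(puck_policy(state), bar_policy(state))
        steps += 1
    return reward, steps


if __name__ == "__main__":
    rng = random.Random(0)
    env = ToyPenaltyShotEnv()
    print(run_episode(env, lambda s: rng.uniform(-1, 1), lambda s: rng.uniform(-1, 1)))
```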

Agents

lib-agents: Features trivial, value-based, and policy-based algorithms, including smurve, DQN, TD3, PPO, and DDPG.
comm-agents: Implements a hardcoded approach for establishing a baseline and a pure exploration strategy. It also implements the mouse slider.
The package also implements a TwoAgentPolicyWrapper to combine the policies for the puck and the bar.
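
The idea of combining two per-agent policies into one joint policy can be sketched as follows. This is a simplified, hypothetical stand-in and not the project's TwoAgentPolicyWrapper: the class name SimpleTwoAgentWrapper, the state layout (puck x, puck y, bar y), and the hardcoded track_puck baseline are assumptions chosen for illustration.

```python
from typing import Callable, Sequence, Tuple

State = Sequence[float]
Policy = Callable[[State], float]


class SimpleTwoAgentWrapper:
    """Combine a puck policy and a bar policy into one joint policy (sketch)."""

    def __init__(self, puck_policy: Policy, bar_policy: Policy):
        self.puck_policy = puck_policy
        self.bar_policy = bar_policy

    def __call__(self, state: State) -> Tuple[float, float]:
        # Both agents observe the same state and choose their actions independently.
        return self.puck_policy(state), self.bar_policy(state)


def track_puck(state: State) -> float:
    """Hardcoded bar baseline: move towards the puck's vertical position."""
    _, puck_y, bar_y = state
    return max(-1.0, min(1.0, (puck_y - bar_y) * 10.0))


def drift_up(state: State) -> float:
    """Trivial puck policy: always move upwards at full speed."""
    return 1.0


joint = SimpleTwoAgentWrapper(drift_up, track_puck)
puck_action, bar_action = joint((0.0, 0.5, 0.0))
print(puck_action, bar_action)
```

Keeping the two policies separate inside one wrapper means either side can be swapped (for example, a learned puck against a hardcoded bar) without changing the code that steps the environment.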

Utils

Includes a training script and utility functions that implement wrappers.
Holds the policy and environment configurations.

Async Communication

To support asynchronous inputs from agents, a central server manages the environment. Agents use a client class to connect to the server and call its step function to submit their actions and receive the corresponding result tuple. The server collects the actions from the agents, synchronizes them, and advances the environment by a single time step.
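
The synchronization described above can be sketched in-process with asyncio. This is a minimal illustration under stated assumptions, not the project's server or client: the names StepServer and agent_loop are hypothetical, there is no networking, and env_step is a placeholder callable that stands in for the real environment's step function.

```python
import asyncio


class StepServer:
    """Collect one action per agent, then advance the environment exactly once."""

    def __init__(self, env_step, agents=("puck", "bar")):
        self.env_step = env_step  # callable(puck_action, bar_action) -> result
        self.agents = agents
        self.pending = {}         # agent name -> action submitted for this step
        self.waiters = []         # futures of agents waiting for the step result

    async def step(self, agent, action):
        if agent not in self.agents:
            raise ValueError(f"unknown agent: {agent}")
        if agent in self.pending:
            raise RuntimeError(f"{agent} already acted this step")
        self.pending[agent] = action
        fut = asyncio.get_running_loop().create_future()
        self.waiters.append(fut)
        if len(self.pending) == len(self.agents):
            # All actions are in: step once and hand the same result to everyone.
            result = self.env_step(self.pending["puck"], self.pending["bar"])
            waiters, self.waiters, self.pending = self.waiters, [], {}
            for w in waiters:
                w.set_result(result)
        return await fut


async def agent_loop(server, name, action, delay, n_steps):
    """An agent that takes `delay` seconds to decide, then submits its action."""
    results = []
    for _ in range(n_steps):
        await asyncio.sleep(delay)
        results.append(await server.step(name, action))
    return results


def demo():
    log = []

    def env_step(puck_action, bar_action):
        log.append((puck_action, bar_action))
        return len(log)  # placeholder for (state, reward, done, info)

    async def main():
        server = StepServer(env_step)
        return await asyncio.gather(
            agent_loop(server, "puck", 1.0, 0.01, 3),
            agent_loop(server, "bar", -1.0, 0.03, 3),
        )

    return asyncio.run(main()), log


if __name__ == "__main__":
    print(demo())
```

Although the two agents respond at different speeds, the faster one simply waits on its future until the slower one has submitted, so the environment advances once per pair of actions.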

Examples

Script for playing against the puck as the bar
A notebook demonstrating smurves

Team Members

Abhishek Kumar (12140040)
Arnav Gautam (12140280)
Dhruv Gupta (12140580)
Mitul Vardhan (12141070)

Acknowledgement

We thank Prof. Soumajit Pramanik for giving us the opportunity to explore and learn more about SOTA algorithms through this project.
We also thank the creators of the Tianshou and OpenAI Gym libraries, which form a core part of our codebase.
We thank the open-source community for its wonderful libraries for everything under the sun!
