arnavwinner/AI_Project

MARL Penalty Shot Challenge

This project implements a MARL (Multi-Agent Reinforcement Learning) Penalty Shot Challenge: a custom platform that pits state-of-the-art (SOTA) deep reinforcement learning algorithms against each other. Two agents simulate a penalty shootout, each controlling one of the two entities in the game: the Bar and the Puck.

Course and Professor

  • This project was done as part of a course at IIT Bhilai:
  1. Professor: Soumajit Pramanik
  2. Course: DS251

Features

  • Visualization of the unique environment at every step
  • Complete customization of environments and policies
  • Asynchronous server for manual interaction with a policy

How to begin

Install packages

The -e flag installs the project packages in editable mode.

Log in to wandb.ai to record your experimental runs.

pip install -e .
pip install -e ./gym-env
wandb login

Train and test a model

Use the files in utils/config/ to configure agent-specific policy hyperparameters and environment parameters.

The example command below trains a puck and a bar with the PPO algorithm, loads a previously saved policy for each agent, and uses 1 training environment and 2 test environments:

python ./utils/train.py  --wandb-name "ds251_project" --training-num 1 --test-num 2 --puck ppo --bar ppo --load-puck-id both_ppo --load-bar-id both_ppo 

To play as bar:

Open 3 terminals and run one of the following commands in each:

python ./examples/server/start_server.py
python ./examples/server/agent_puck.py
python ./examples/server/agent_bar.py

Click Start and use the mouse slider to control the direction of the bar.

Codebase

Game Environment

The game comprises a puck and a bar, with the puck moving horizontally at a constant speed towards the bar. Each entity is controlled independently by its own agent. The puck's objective is to get past the bar and reach the final line, while the bar aims to intercept the puck before it reaches the final line.

The environment is built with the OpenAI Gym library. Its step function accepts two action parameters, one for the puck and one for the bar, advances the game by one time step, and returns a tuple of state, reward, completion flag, and an additional information object.
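
The step interface described above can be sketched as follows. This is a minimal toy, not the project's actual environment: the class name PenaltyShotSketch, the integer grid, the action set {-1, 0, 1}, the bar position and the reward values are all assumptions made for illustration, and the sketch deliberately avoids importing Gym so that it runs on its own.

```python
class PenaltyShotSketch:
    """Toy stand-in for the puck-vs-bar game. All constants are assumptions."""

    BAR_X = 8         # hypothetical horizontal position of the bar
    BAR_HALF_LEN = 1  # hypothetical half-length of the bar

    def reset(self):
        self.puck_x, self.puck_y, self.bar_y = 0, 0, 0
        return self._state()

    def _state(self):
        return (self.puck_x, self.puck_y, self.bar_y)

    def step(self, puck_action, bar_action):
        """Advance one time step given one action per agent (-1, 0 or 1)."""
        # Each agent steers only its own entity along the vertical axis.
        self.puck_y += puck_action
        self.bar_y += bar_action
        # The puck's horizontal motion is constant and not controllable.
        self.puck_x += 1

        reward, done, winner = 0.0, False, None
        if self.puck_x >= self.BAR_X:
            done = True
            caught = abs(self.puck_y - self.bar_y) <= self.BAR_HALF_LEN
            # Reward is expressed from the puck's point of view (an assumption).
            reward = -1.0 if caught else 1.0
            winner = "bar" if caught else "puck"
        return self._state(), reward, done, {"winner": winner}


if __name__ == "__main__":
    env = PenaltyShotSketch()
    state, done = env.reset(), False
    while not done:
        # The puck always moves up while the bar stays put.
        state, reward, done, info = env.step(1, 0)
    print(state, reward, info)
```

The result tuple has the same shape as a classic Gym step result (state, reward, done, info), so a training loop written against this sketch would look much like one written against a Gym environment.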

Agents

  • lib-agents: Features trivial, value-based, and policy-based algorithms, including smurve, DQN, TD3, PPO, and DDPG.
  • comm-agents: Implements a hardcoded approach for establishing a baseline and a pure exploration strategy. It also implements the mouse slider.
  • A TwoAgentPolicyWrapper is also implemented to combine the policies for the puck and the bar.
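
The idea of combining two per-agent policies into one joint policy can be illustrated with a short sketch. It is not the repository's TwoAgentPolicyWrapper: the class name TwoAgentPolicySketch, the observation layout (puck_x, puck_y, bar_y) and the two example policies are hypothetical, and policies are modelled as plain callables for simplicity.

```python
from typing import Callable, Tuple

Observation = Tuple[int, int, int]   # assumed layout: (puck_x, puck_y, bar_y)
Policy = Callable[[Observation], int]


class TwoAgentPolicySketch:
    """Hypothetical wrapper: holds one policy per agent, returns a joint action."""

    def __init__(self, puck_policy: Policy, bar_policy: Policy):
        self.puck_policy = puck_policy
        self.bar_policy = bar_policy

    def __call__(self, obs: Observation) -> Tuple[int, int]:
        # Both policies see the same observation; each picks its own action.
        return self.puck_policy(obs), self.bar_policy(obs)


def always_up(obs: Observation) -> int:
    """Trivial puck policy: always move up."""
    return 1


def track_puck(obs: Observation) -> int:
    """Hardcoded bar baseline: move toward the puck's vertical position."""
    _, puck_y, bar_y = obs
    return (puck_y > bar_y) - (puck_y < bar_y)  # evaluates to 1, 0 or -1


joint_policy = TwoAgentPolicySketch(always_up, track_puck)
puck_action, bar_action = joint_policy((0, 3, 0))
```

Keeping the two policies separate behind one call makes it straightforward to swap the algorithm used by either side, for example training one agent against a fixed baseline.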

Utils

  • Includes a training script and utility functions that implement wrappers.
  • Holds the policy and environment configurations.

Async Communication

To support asynchronous inputs from agents, a central server manages the environment. Each agent uses a client class to connect to the server and calls the client's step function to submit its action and receive the corresponding result tuple. The server collects the actions from both agents, synchronizes them, and advances the environment by a single time step.
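
The synchronization step described above can be sketched with asyncio. This is an in-process illustration under stated assumptions, not the project's server: the class name StepSyncServerSketch, the agent names "puck" and "bar", and the env_step callable are hypothetical, and no network connection is involved.

```python
import asyncio


class StepSyncServerSketch:
    """Hypothetical stand-in for the central server (no networking)."""

    def __init__(self, env_step):
        self.env_step = env_step   # callable: (puck_action, bar_action) -> result
        self.pending = {}          # agent name -> action submitted for this step
        self.result = None         # result of the most recent environment step
        self.tick = 0              # number of environment steps taken so far
        self.cond = asyncio.Condition()

    async def step(self, agent, action):
        """Submit one action and wait for the shared result of this time step."""
        async with self.cond:
            my_tick = self.tick
            self.pending[agent] = action
            if len(self.pending) == 2:
                # The last agent to arrive advances the environment exactly once.
                self.result = self.env_step(self.pending["puck"], self.pending["bar"])
                self.pending = {}
                self.tick += 1
                self.cond.notify_all()
            else:
                # The first agent waits until the other one has submitted.
                await self.cond.wait_for(lambda: self.tick > my_tick)
            return self.result


async def demo():
    # Dummy environment step that simply echoes the joint action.
    server = StepSyncServerSketch(lambda p, b: {"puck": p, "bar": b})

    async def agent(name, actions):
        return [await server.step(name, a) for a in actions]

    return await asyncio.gather(agent("puck", [1, 0]), agent("bar", [-1, 1]))


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Both agents receive the same result for a given time step, and neither can run ahead, because a new step is taken only once both actions have arrived.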

Examples

  • A script for playing against the puck as the bar
  • A notebook demonstrating smurves

Team Members

  • Abhishek Kumar (12140040)
  • Arnav Gautam (12140280)
  • Dhruv Gupta (12140580)
  • Mitul Vardhan (12141070)

Acknowledgement

  • We thank Prof. Soumajit Pramanik for giving us the opportunity to explore and learn more about SOTA algorithms through this project.
  • We also thank the creators of the Tianshou and OpenAI Gym libraries, which form a core part of our codebase.
  • We thank the open source community for wonderful libraries for everything under the sun!
