In this project, an agent explores a gridworld and tries to maximize its reward. We assume that the transition (T) and reward (R) values are unknown, so the agent has to learn them while exploring. There are 10 environments (worlds), and each world can be explored multiple times. The agent's score is kept as an average and is updated via the API.
API details: https://docs.google.com/presentation/d/15L3VPdl-hGUzM64wst5_eIzQZlL7wivbWo6TKK92lhA/edit#slide=id.g710fd126e1_5_5
Our presentation: https://docs.google.com/presentation/d/1r6ZSIM3X5BzC1qOpAFmh9nBAPx56MnF1J-AZDCerBd0/edit#slide=id.p1
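
To make the idea of learning without known T and R values concrete, here is a minimal sketch of tabular Q-learning, one common way to handle this setting. It is not necessarily the method used in this project. Everything in it is an assumption for illustration: the 4x4 grid, the goal position, the reward numbers, and the local `step` function, which stands in for the real API described in the slides.

```python
import random
from collections import defaultdict

# Hypothetical 4x4 local gridworld; the real worlds are served by the API.
SIZE = 4
GOAL = (3, 3)
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}


def step(state, action):
    """Simulated environment. The agent never reads T or R directly;
    it only observes (next_state, reward, done) after each move."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), SIZE - 1)  # walls clamp the move
    y = min(max(state[1] + dy, 0), SIZE - 1)
    next_state = (x, y)
    if next_state == GOAL:
        return next_state, 10.0, True   # assumed goal reward
    return next_state, -0.1, False      # assumed small step cost


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn Q(state, action) from experience with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(200):  # cap episode length
            if rng.random() < epsilon:
                action = rng.choice(list(ACTIONS))  # explore
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])  # exploit
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Q-learning update: move the estimate toward the observed target.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_path(q, start=(0, 0), max_steps=20):
    """Follow the highest-valued action from each state, with no exploration."""
    state, path = start, [start]
    for _ in range(max_steps):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the list of cells the trained agent visits from the start cell. To run against the real worlds, `step` would be replaced by calls to the API, whose exact endpoints are not reproduced here.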