Project examples
Here are some examples of course projects.
There is a popular kind of video & board game that revolves around collecting cards and playing them in a turn-based strategy fashion:
- Hearthstone
- Good old Magic the Gathering
- Gwent
- you'll find 100+ others if you google
Most of those games have a replay system and even an API, e.g. this.
The first challenge is to pre-train your bot on human expert sessions. The core problem to solve is how to efficiently generalize over the large action space of possible cards, only a fraction of which are available at any given moment. Luckily for us, there's an article for this.
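One way to approach that generalization (a minimal sketch, not the article's method; the class names, dimensions, and DSSM-style dot-product scoring below are all illustrative assumptions) is to embed every card and the current game state into a shared space and only score the cards that are actually playable:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CardScorer(nn.Module):
    """Scores every card in the pool by similarity between its embedding and the game-state embedding."""
    def __init__(self, n_cards, state_dim, emb_dim=64):
        super().__init__()
        self.card_emb = nn.Embedding(n_cards, emb_dim)   # one learned vector per card in the full pool
        self.state_enc = nn.Sequential(                  # encodes the current game state
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, emb_dim),
        )

    def forward(self, state, available_mask):
        # state: [batch, state_dim]; available_mask: [batch, n_cards], 1 for cards currently in hand
        s = self.state_enc(state)                        # [batch, emb_dim]
        scores = s @ self.card_emb.weight.T              # [batch, n_cards] dot-product scores
        return scores.masked_fill(available_mask == 0, float('-inf'))  # forbid unplayable cards

def behavioral_cloning_loss(scorer, state, available_mask, expert_card):
    """Pre-training on expert replays: cross-entropy between masked scores and the expert's chosen card."""
    logits = scorer(state, available_mask)
    return F.cross_entropy(logits, expert_card)
```

The same masked scorer can be pre-trained on expert replays with the cross-entropy helper above and later fine-tuned with policy gradient during self-play; adding new cards only means adding new embedding rows, which is what keeps the action space tractable.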
The things your bot may learn to do include:
- building a perfect deck to counter your opponent (may include heavy deep learning stuff, metric learning / DSSM)
- pre-training on human replay sessions
- playing & training Vs oneself or human 'expert'
[ysda/hse] Should you wish, we can offer extensive assistance with theory and coding for this project.
There have been several successful attempts to speed up RL by introducing many parallel computation nodes.
- Google's Gorila - http://www.humphreysheil.com/blog/gorila-google-reinforcement-learning-architecture
- Asynchronous on-policy RL - https://arxiv.org/abs/1602.01783
The grand quest is to reproduce the same idea for the newly developed PGQ and its continuous version. Also, trying out different kinds of parameter server would be really nice.
- The architecture is "many nodes that play, few nodes that train"
- Tech stack: redis as a DB, any DL framework you want (a minimal actor/learner sketch with redis is given right after this list).
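To make the architecture concrete, here is a hedged sketch of the "many nodes that play, few nodes that train" loop, assuming a plain redis server; the key names, pickle serialization, and the `make_model` / `play_episode` / `train_on_batch` / `get_weights` / `set_weights` helpers are hypothetical placeholders for your own code:

```python
import pickle
import redis

r = redis.Redis(host='localhost', port=6379)   # assumes a redis server is running locally

WEIGHTS_KEY = 'param_server/weights'           # parameter server = a single redis key
REPLAY_KEY = 'replay/transitions'              # shared queue of transitions from actor nodes

def actor_loop(make_model, play_episode):
    """Runs on many 'player' nodes: pull fresh weights, play, push experience."""
    model = make_model()
    while True:
        blob = r.get(WEIGHTS_KEY)
        if blob is not None:
            model.set_weights(pickle.loads(blob))          # sync with the parameter server
        transitions = play_episode(model)                  # e.g. a list of (s, a, r, s', done)
        r.rpush(REPLAY_KEY, pickle.dumps(transitions))

def learner_loop(make_model, train_on_batch, batch_size=32):
    """Runs on a few 'trainer' nodes: pop experience, update, publish weights."""
    model = make_model()
    while True:
        batch = []
        while len(batch) < batch_size:
            _, blob = r.blpop(REPLAY_KEY)                  # blocking pop from the shared queue
            batch.extend(pickle.loads(blob))
        train_on_batch(model, batch)
        r.set(WEIGHTS_KEY, pickle.dumps(model.get_weights()))
```

Here redis plays both roles at once: a single key acts as the parameter server and a list acts as the experience queue. Comparing this scheme against sharded keys, or against shipping gradients instead of weights, is exactly the "different kinds of parameter server" part of the project.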
Taken by: udimus, bestxolodec
Benchmarking Q-learning with double/dueling/bootstrap/prioritized_er/constrained/soft_targetnet/PGQ on Doom
This project aims to figure out whether the numerous articles about improving DQN actually improve it :) We cover an array of those guys in the week 5 lecture.
Most of them are trained on Atari envs, so it makes sense to consider non-Atari problems as a "private test set".
Luckily, there's a set of such problems called VizDoom. The simplest of the Doom problems was already covered in assignment 4.2.
So the goal is to reproduce those articles and see how well they generalize to Doom envs with minimal tuning.
It is not necessary to implement all of those, just the ones you like most. The project milestones will include one or a few methods at a time.
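Many of these papers boil down to a one-line change in the Q-learning target. For example, here is a hedged numpy sketch of the double DQN target (assuming you already computed `q_online_next` and `q_target_next` for a batch of next states; the function name is just illustrative):

```python
import numpy as np

def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN: select the best next action with the online net, evaluate it with the target net.

    rewards, dones:               float arrays of shape [batch]
    q_online_next, q_target_next: arrays of shape [batch, n_actions] with Q-values for next states
    """
    best_actions = q_online_next.argmax(axis=1)                               # selection: online net
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]   # evaluation: target net
    return rewards + gamma * (1.0 - dones) * next_values
```

Vanilla DQN would use `q_target_next.max(axis=1)` instead; the benchmark is essentially about measuring how much such small changes help once you leave Atari for Doom.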
The task is simple: given the episodic reward R(z), try to deduce per-step rewards r(s,a) such that maximizing them is equivalent to maximizing R(z), only easier :). We have a baseline that does so on tabular envs; the goal is to generalize it to the "deep RL" case.
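One naive way to set up the deep version (a sketch only; this is not the provided tabular baseline, and the network, loss, and names below are illustrative assumptions) is to train a small net whose per-step predictions are forced to sum to the observed episodic return:

```python
import torch
import torch.nn as nn

class StepRewardNet(nn.Module):
    """Predicts a per-step reward r_hat(s, a) for every step of an episode."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_actions, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states, actions_onehot):
        # states: [T, obs_dim], actions_onehot: [T, n_actions] -> per-step rewards of shape [T]
        return self.net(torch.cat([states, actions_onehot], dim=-1)).squeeze(-1)

def redistribution_loss(model, states, actions_onehot, episodic_return):
    """Force the predicted per-step rewards to sum up to the observed episodic R(z)."""
    r_hat = model(states, actions_onehot)
    return (r_hat.sum() - episodic_return) ** 2
```

The learned r_hat(s, a) can then be fed to any standard RL algorithm as a dense reward; checking that optimizing it really is equivalent to optimizing R(z), and that it actually makes learning easier, is the core of the project.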