Skip to content

Temporal Difference Learning and Basic Reinforcement Learning Demos in Matlab

Notifications You must be signed in to change notification settings

sinairv/Temporal-Difference-Learning

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 

Repository files navigation

Temporal-Difference Learning Demos in MATLAB

In this package you will find MATLAB codes which demonstrate some selected examples of temporal-difference learning methods in prediction problems and in reinforcement learning.

To begin:

  • Run DemoGUI.m
  • Start with the set of predefined demos: select one and press Go
  • Modify demos: select one of the predefined demos, and modify the options

Feel free to distribute or use package especially for educational purposes. I personally, learned too much from cliff-walking.

The repository for the package is hosted on GitHub.

Why temporal difference learning is important

A quotation from R. S. Sutton, and A. G. Barto from their book Introduction to Reinforcement Learning (here):

If one had to identify one idea as central and novel to reinforcement learning, it would undoubtedly be temporal-difference (TD) learning.

Many basic reinforcement learning algorithms such as Q-Laerning and SARSA are in essence temporal difference learning methods.

Demos

  • Prediciton random walk: see how precise we can predict the probability of visiting nodes

  • RL random walk: see how RL generated random walk policy converges the computed probabilities.

  • Simple grid world (with and without king moves): see how RL generated policy helps the agent find the goal through time (by king-moves it is meant moving along the four main directions and the diagonals, i.e., the way king moves in chess).

  • Windy grid world: the wind distracts the agent from its destination sought by its actions. See how RL solves this problem.

  • Cliff walking: the agent should reach its destination while avoiding the cliffs. A truly instructive example, which shows the differences between on-policy, and off-policy learning algorithms.

References

[1] Sutton, R. S., "Learning to predict by the methods of temporal differences, In Machine Learning, pp. 9-44, 1988 (available online)

[2] Sutton, R. S. and Barto, A. G., "Reinforcement learning: An introduction," 1998 (available online)

[3] Kaelbling, L. P., Littman, M. L., and Moore, A. W., "Reinforcement learning: A survey," Journal of Artificial Intelligence Research, Vol.4, pp.237-285, 1997 (available online)

Contact

Copyright (c) 2011 Sina Iravanian - licensed under MIT.

Homepage: sinairv.github.io

GitHub: github.com/sinairv

Twitter: @sinairv

Screenshots

Prediction random walk demo:

Prediction random walk demo

RL random walk demo:

RL random walk demo

Simple grid-world demo:

Simple grid-world demo

About

Temporal Difference Learning and Basic Reinforcement Learning Demos in Matlab

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages