Implementation and visualization of optimization algorithms.
Please add the MathJax Plugin for GitHub to your browser so the equations render correctly.
- Optimization-and-Search
- Table of Contents
- 1. Numerical Optimization
- 2. Stochastic Search
- 3. Classical Search
- 4. Dynamic Programming
- 5. Monte Carlo Methods
- 6. Temporal Difference
- 7. Q-Learning
- 8. Deep Q-Network
Here we solve a simple quadratic problem.
The animation on the left is tested with N=10, the one on the right with N=2.
Gradient descent: use the first-order gradient with a fixed learning rate.
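A minimal sketch of this step, assuming the objective is a quadratic $f(x) = \frac{1}{2}x^TQx - b^Tx$ (the function names and the toy matrices are illustrative, not the exact code in this repo):

```python
import numpy as np

def gradient_descent(Q, b, x0, lr=0.01, n_iters=500):
    """Minimize f(x) = 0.5 * x^T Q x - b^T x with a fixed learning rate."""
    x = x0.copy()
    path = [x.copy()]
    for _ in range(n_iters):
        grad = Q @ x - b          # first-order gradient of the quadratic
        x = x - lr * grad         # fixed step size (learning rate)
        path.append(x.copy())
    return x, np.array(path)

# toy 2-D example (N = 2)
Q = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
x_star, path = gradient_descent(Q, b, x0=np.zeros(2))
```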
Line search: compute the step size by searching along the descent direction instead of fixing a learning rate.
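For the same quadratic, the step size along the negative gradient can be computed in closed form by exact line search; a sketch under the same assumptions as above:

```python
import numpy as np

def gradient_descent_line_search(Q, b, x0, n_iters=100):
    """Steepest descent with exact line search for f(x) = 0.5 x^T Q x - b^T x."""
    x = x0.copy()
    for _ in range(n_iters):
        g = Q @ x - b                     # gradient at the current point
        if np.linalg.norm(g) < 1e-10:
            break
        alpha = (g @ g) / (g @ Q @ g)     # exact minimizer of f(x - alpha * g)
        x = x - alpha * g
    return x
```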
Conjugate gradient: find the conjugate direction, with the coefficient given by the Fletcher–Reeves (FR) formula:
$$ \beta_k = \frac{\nabla f(x^{k+1})^T\nabla f(x^{k+1})}{\nabla f(x^k)^T\nabla f(x^k)} \text{ (FR)}$$
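A sketch of nonlinear conjugate gradient with the FR update on the quadratic above (exact line search is used for the step size; the function name is illustrative):

```python
import numpy as np

def conjugate_gradient_fr(Q, b, x0, n_iters=50):
    """Conjugate gradient with the Fletcher-Reeves beta on f(x) = 0.5 x^T Q x - b^T x."""
    x = x0.copy()
    g = Q @ x - b                             # gradient at x_k
    d = -g                                    # initial search direction
    for _ in range(n_iters):
        if np.linalg.norm(g) < 1e-10:
            break
        alpha = -(g @ d) / (d @ Q @ d)        # exact line search along d
        x = x + alpha * d
        g_new = Q @ x - b
        beta = (g_new @ g_new) / (g @ g)      # FR formula from above
        d = -g_new + beta * d                 # new conjugate direction
        g = g_new
    return x
```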
Newton's method: use the second-order information (the Hessian) to compute the step.
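A minimal Newton's-method sketch; for the quadratic it converges in a single step (the `grad`/`hess` callables are assumptions for illustration):

```python
import numpy as np

def newton_method(grad, hess, x0, n_iters=20):
    """Newton's method: step along -H^{-1} g using the second-order information."""
    x = x0.copy()
    for _ in range(n_iters):
        g = grad(x)
        if np.linalg.norm(g) < 1e-10:
            break
        x = x - np.linalg.solve(hess(x), g)   # Newton step
    return x

# on f(x) = 0.5 x^T Q x - b^T x the Hessian is constant, so one step suffices
Q = np.array([[3.0, 0.5], [0.5, 1.0]])
b = np.array([1.0, -1.0])
x_star = newton_method(lambda x: Q @ x - b, lambda x: Q, np.zeros(2))
```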
Here we use stochastic search to solve the traveling salesman problem (TSP).
The cross-entropy method converges in fewer steps.
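A minimal sketch of the cross-entropy idea on a continuous toy objective (the repository applies it to TSP; the Gaussian parameterization and function names here are only for illustration):

```python
import numpy as np

def cross_entropy_minimize(f, dim, n_samples=100, n_elite=10, n_iters=50):
    """Cross-entropy method: sample, keep the elite fraction, refit the distribution."""
    mu, sigma = np.zeros(dim), np.ones(dim) * 5.0
    for _ in range(n_iters):
        samples = np.random.randn(n_samples, dim) * sigma + mu
        scores = np.array([f(s) for s in samples])
        elite = samples[np.argsort(scores)[:n_elite]]     # lowest objective values
        mu, sigma = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mu

# example: minimize a shifted quadratic
best = cross_entropy_minimize(lambda x: np.sum((x - 3.0) ** 2), dim=2)
```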
Here we instead try to find the lowest position on a surface, since I could not find a good way to visualize this for TSP.
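One way to search for the lowest position is a simulated-annealing style descent; this sketch on a generic objective is an assumption about the setup, not the exact demo code:

```python
import numpy as np

def simulated_annealing(f, x0, n_iters=5000, step=0.5, T0=1.0):
    """Always accept downhill moves; accept uphill moves with a temperature-dependent probability."""
    x, fx = x0.copy(), f(x0)
    best_x, best_f = x.copy(), fx
    for k in range(n_iters):
        T = T0 * (1 - k / n_iters) + 1e-9                 # simple linear cooling schedule
        cand = x + np.random.randn(*x.shape) * step       # random neighbor
        fc = f(cand)
        if fc < fx or np.random.rand() < np.exp((fx - fc) / T):
            x, fx = cand, fc
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f

# example: a bumpy 1-D landscape
best_x, best_f = simulated_annealing(lambda x: np.sum(x ** 2 + np.sin(5 * x)), np.array([4.0]))
```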
The code is written following the slides of CSE 257 at UCSD.
In this implementation, the reward of every ordinary tile is -0.04, a credit tile is +4, and an obstacle tile is -5. Press the space key to switch between [credit] and [obstacle], and left-click to add or remove a credit/obstacle tile. We can see that the optimal policy is obtained before the utility function converges.
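A minimal sketch of the value-iteration backup behind such a grid world; the deterministic transition model, grid layout, and tile rewards of -0.04 / +4 / -5 here are assumptions mirroring the description above, not the exact demo code:

```python
import numpy as np

def value_iteration(rewards, gamma=0.9, n_iters=100):
    """Value iteration on a deterministic grid world; rewards[r, c] is the tile reward."""
    H, W = rewards.shape
    V = np.zeros((H, W))
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]            # up, down, left, right
    for _ in range(n_iters):
        V_new = np.empty_like(V)
        for r in range(H):
            for c in range(W):
                q_values = []
                for dr, dc in moves:
                    nr = min(max(r + dr, 0), H - 1)       # bumping into the wall keeps you in place
                    nc = min(max(c + dc, 0), W - 1)
                    q_values.append(rewards[r, c] + gamma * V[nr, nc])
                V_new[r, c] = max(q_values)               # Bellman optimality backup
        V = V_new
    return V

# 3x4 example: -0.04 everywhere, one +4 credit tile, one -5 obstacle tile
R = np.full((3, 4), -0.04)
R[0, 3], R[1, 3] = 4.0, -5.0
print(value_iteration(R))
```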
- Policy: $\pi(s_i) = a_i$
- Trajectory: $s_0, s_1, s_2, \dots, s_T$
- Rewards (discounted return): $R(s_0) + \gamma R(s_1) + \dots + \gamma^T R(s_T)$
Follow the policy to collect many sample trajectories, and estimate the expectation $$V^{\pi}(s_0) = \mathbb{E}_{\pi}\left[\sum_{i=0}^T\gamma^iR(s_i)\right]$$
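A sketch of this Monte Carlo estimate, assuming an environment with `reset()`/`step(action)` and a policy function (these interface names are illustrative, not the repo's API):

```python
import numpy as np

def mc_value_estimate(env, policy, gamma=0.99, n_episodes=1000, max_T=200):
    """Estimate V^pi(s0) by averaging the discounted return over sampled trajectories."""
    returns = []
    for _ in range(n_episodes):
        s = env.reset()
        G, discount = 0.0, 1.0
        for _ in range(max_T):
            s, r, done = env.step(policy(s))
            G += discount * r                 # accumulate gamma^i * R(s_i)
            discount *= gamma
            if done:
                break
        returns.append(G)
    return np.mean(returns)                   # sample mean approximates the expectation
```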
The difference between the TD and MC methods is that TD can update at every step and does not have to wait for a whole trajectory to be generated.
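The corresponding one-step TD(0) update, sketched with a tabular value function `V` indexed by state (an illustrative assumption):

```python
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    """TD(0): move V(s) toward the one-step bootstrapped target r + gamma * V(s')."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
    return V
```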
Note that the learning rate $\alpha$ should decrease with the number of visits to a state; in that case the method is guaranteed to converge to the true action values by the law of large numbers.
Off-policy method
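Q-learning is the standard off-policy example: the update bootstraps with the greedy action even though behavior may be $\epsilon$-greedy. A sketch with a tabular Q array indexed by `(state, action)` (illustrative, not the repo's exact code):

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy TD target: bootstrap with max_a' Q(s', a'), not the action actually taken."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

def epsilon_greedy(Q, s, eps=0.1):
    """Behavior policy: mostly greedy, sometimes random (exploration)."""
    if np.random.rand() < eps:
        return np.random.randint(Q.shape[1])
    return int(np.argmax(Q[s]))
```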
In this case I used a Deep Q-Network (DQN) to train a self-driving car. The environment was created by NeuralNine (GitHub), though he used the NEAT package to train the car.
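A minimal sketch of a DQN update in PyTorch; the network size, replay-buffer interface, and state/action dimensions are assumptions for illustration, not the exact setup of the demo:

```python
import random
import numpy as np
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small MLP mapping a state vector (e.g. the car's radar readings) to per-action Q-values."""
    def __init__(self, n_states, n_actions):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(n_states, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, x):
        return self.net(x)

def dqn_update(q_net, target_net, optimizer, replay, batch_size=64, gamma=0.99):
    """One DQN step: sample a minibatch from the replay buffer and regress Q(s,a) onto the TD target."""
    if len(replay) < batch_size:
        return
    s, a, r, s2, done = zip(*random.sample(replay, batch_size))
    s = torch.tensor(np.array(s), dtype=torch.float32)
    a = torch.tensor(a, dtype=torch.int64)
    r = torch.tensor(r, dtype=torch.float32)
    s2 = torch.tensor(np.array(s2), dtype=torch.float32)
    done = torch.tensor(done, dtype=torch.float32)

    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)          # Q(s, a) for the taken actions
    with torch.no_grad():                                         # frozen target network
        target = r + gamma * (1.0 - done) * target_net(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# usage sketch: store (s, a, r, s2, done) transitions in a list/deque, act epsilon-greedily,
# call dqn_update every step, and periodically copy q_net's weights into target_net.
```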
The demo is below: you can modify the network and train your own car!
Random case: