A reimplementation of the paper "Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution", applied to MountainCarContinuous-v0.
Note that training is not very stable. If you find any tricks that make it more stable, please let me know.
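
For readers new to the paper, the core idea of a Beta policy can be sketched as follows. This is a minimal illustration using only the Python standard library, not the code in this repository: the function names (`softplus`, `beta_params`, `beta_log_prob`, `sample_action`) and the clipping constant are assumptions made for the example, and `raw_alpha` / `raw_beta` stand in for whatever the policy network outputs. It assumes the `softplus(.) + 1` parameterization, which keeps both shape parameters above 1 so the density is unimodal, and the `[-1, 1]` action bounds of MountainCarContinuous-v0.

```python
import math
import random

# Action bounds of MountainCarContinuous-v0.
ACTION_LOW, ACTION_HIGH = -1.0, 1.0


def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def beta_params(raw_alpha, raw_beta):
    # softplus(.) + 1 keeps both shape parameters > 1 (unimodal density).
    return softplus(raw_alpha) + 1.0, softplus(raw_beta) + 1.0


def beta_log_prob(x, alpha, beta):
    # Log-density of Beta(alpha, beta) at x in (0, 1).
    log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
    return (alpha - 1.0) * math.log(x) + (beta - 1.0) * math.log(1.0 - x) - log_norm


def sample_action(raw_alpha, raw_beta, rng=random):
    # Hypothetical helper: draw x ~ Beta on (0, 1), then rescale to the action range.
    alpha, beta = beta_params(raw_alpha, raw_beta)
    x = rng.betavariate(alpha, beta)
    # Assumption: clip away from 0 and 1 so the log terms stay finite.
    x = min(max(x, 1e-6), 1.0 - 1e-6)
    action = ACTION_LOW + (ACTION_HIGH - ACTION_LOW) * x
    # The log-probability is taken at x; the rescaling only adds a constant,
    # which does not change the policy gradient.
    return action, beta_log_prob(x, alpha, beta)
```

Because the Beta distribution has bounded support, the sampled action always lies inside the valid range, so no action clipping is needed at the environment boundary. Clipping `x` slightly away from 0 and 1 is one common safeguard against infinite log-probabilities; whether it helps with the instability mentioned above has not been verified here.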