This code base is an attempt at solving an orbital circularization control problem for a small craft around a fixed massive object using Reinforcement Learning.
- Python version: 3.8.12
- Numpy version: 1.19.5
- Matplotlib version: 3.5.0
- Tensorflow version: 2.6.0
- ffmpeg version: 4.2.2
- gym version: 0.21.0
Remark: These runs were completed on an Intel MacBook Pro running macOS 12.3 Monterey in a Conda environment. Compatibility may differ on other machines.
The notebook `rocket_dqn.ipynb` and the file `rocket_dqn.py` both run the Deep Q-Network implementation. They contain the newest hyperparameters but may not be readily functional. Due to how the animation is implemented, the notebook version cannot generate an animation; if you need an animation at run time, please run `rocket_dqn.py`. We suggest training in the notebook and loading the model in `rocket_dqn.py` to see the animated gameplay. For execution instructions, see Execution.
For the hyperparameters, along with the versions that used them, please refer to the running notes.
The 2D version of the circularization problem does not work at the moment. We transformed the problem into a 1D problem with only radial motion and radial thrust, which has a separate gym environment. To train, use the `radial_rocket.ipynb` notebook. The environment only provides a summary chart, but no animation.
For an animation, use `rocket_dqn.py` with the 2D environment. Note that for radial stabilization to work, the initialization function `init_func` must be set to `target_l()` in the `reset` method in `rocket_gym.py`. This ensures that the rocket starts with the angular momentum of a circular orbit at the target radius.
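For intuition, the quantity that `target_l()` presumably computes follows from basic orbital mechanics: a circular orbit at radius $r_t$ has tangential speed $v = \sqrt{GM/r_t}$, so its specific angular momentum is $l = r_t v = \sqrt{GM\,r_t}$. A minimal sketch (the function name and normalized `GM` are illustrative, not the repository's):

```python
import numpy as np

def circular_orbit_angular_momentum(r_target, GM=1.0):
    """Specific angular momentum l = r * v of a circular orbit at r_target.

    GM = 1.0 is an illustrative normalized constant, not necessarily the
    value used by the environment.
    """
    v_tangential = np.sqrt(GM / r_target)  # speed that balances gravity
    return r_target * v_tangential         # l = sqrt(GM * r_target)
```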
Additionally, the environment should also be configured with the class decorators

```python
env = rocket_gym.RadialObservation(
    rocket_gym.RadialThrust(
        rocket_gym.PolarizeAction(env)))
```

and network settings

```python
model = DeepQNetwork(dims=[2, 128, 128, 3],
                     epsilon=1.0, epsilon_decay=.1, gamma=.95,
                     memory=100000, start_updating=10000,
                     batch_size=32, learning_rate=1e-4,
                     descent_frequency=16, update_frequency=1,
                     use_target=True, target_frequency=8)
```
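Presumably, the input dimension of 2 and the 3 output actions correspond to the radial state (radius and radial velocity) and the three radial-thrust choices; check the documentation and running notes to confirm.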
We will work on making that more customizable in the future.
A VPG implementation is also available, but without Experience Replay it is not as efficient as DQN. Some improved policy networks may work better, but they are not well explored. To run the VPG implementation, make sure that `wandb` is installed, and run `python main.py`. As an alternative, the notebook `run.ipynb` can also run the VPG model.
LQR is a type of simplified Optimal Control problem. By linearizing the dynamics near an equilibrium, we can approximate the Rocket Circularization problem as an LQR problem; this works for states close enough to the equilibrium. For more information, check out the Google Colab demonstration. To run LQR, uncomment the LQR code in `main.py` and run the file as before.
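To illustrate the general LQR recipe (not the repository's actual matrices): linearize the dynamics $\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}$ about the equilibrium, pick quadratic costs $Q$ and $R$, solve the continuous-time algebraic Riccati equation, and apply the feedback law $\mathbf{u} = -K\mathbf{x}$. The `A`, `B`, `Q`, and `R` values below are placeholders:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Placeholder linearized dynamics x_dot = A x + B u near the
# circular-orbit equilibrium; not the repository's actual matrices.
A = np.array([[0.0, 1.0],
              [3.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)   # state-deviation cost
R = np.eye(1)   # control-effort cost

P = solve_continuous_are(A, B, Q, R)  # Riccati solution
K = np.linalg.solve(R, B.T @ P)       # optimal gain: u = -K x

x = np.array([0.1, 0.0])  # small deviation from equilibrium
u = -K @ x                # LQR feedback control
```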
According to Newton's Law of Gravitation (Inverse Square Law), we have the following second-order vector ODE:

$$\ddot{\mathbf{r}} = -\frac{GM}{\|\mathbf{r}\|^3}\,\mathbf{r}$$

where $G$ is the gravitational constant, $M$ is the mass of the fixed massive object, and $\mathbf{r}$ is the position of the craft relative to it.
We use the Euler-Cromer Method to approximate the motion:

$$\mathbf{v}_{t+1} = \mathbf{v}_t + \mathbf{a}_{net,\,t} \cdot \Delta t$$

$$\mathbf{r}_{t+1} = \mathbf{r}_t + \mathbf{v}_{t+1} \cdot \Delta t$$

where $\mathbf{a}_{net,\,t}$ is the net acceleration (gravity plus thrust) at step $t$ and $\Delta t$ is the simulation timestep.
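A minimal sketch of one such update step (the function names and the normalized `GM = 1.0` are illustrative, not taken from the repository):

```python
import numpy as np

def net_acceleration(r, thrust, GM=1.0):
    """Inverse-square gravity toward the origin plus the applied thrust.

    GM = 1.0 is an illustrative normalized constant, not the environment's.
    """
    return -GM * r / np.linalg.norm(r)**3 + thrust

def euler_cromer_step(r, v, thrust, dt):
    """One Euler-Cromer update: velocity first, then position using the
    updated velocity, which is more stable for orbits than plain Euler."""
    v_next = v + net_acceleration(r, thrust) * dt
    r_next = r + v_next * dt
    return r_next, v_next
```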
To ensure that observations stay in a reasonable range for the network, we clip the norms of the velocity and the radius. Additionally, when the craft hits a boundary, it loses all velocity normal to the boundary in an inelastic collision. This ensures that the craft stays in bounds.
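A minimal sketch of this bounds enforcement, assuming a circular outer boundary (the function names and limits are hypothetical):

```python
import numpy as np

def clip_norm(vec, max_norm):
    """Rescale vec so that its Euclidean norm does not exceed max_norm."""
    n = np.linalg.norm(vec)
    return vec if n <= max_norm else vec * (max_norm / n)

def enforce_bounds(r, v, max_radius, max_speed):
    """Hypothetical in-bounds rule: clip the speed, and on contact with the
    outer boundary remove the outward (normal) velocity component,
    modeling an inelastic collision."""
    v = clip_norm(v, max_speed)
    if np.linalg.norm(r) >= max_radius:
        r_hat = r / np.linalg.norm(r)
        v = v - max(np.dot(v, r_hat), 0.0) * r_hat  # kill outward motion only
        r = clip_norm(r, max_radius)                # keep the position in bounds
    return r, v
```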
OpenAI Gym is one of the standard APIs for Reinforcement Learning. Other APIs differ slightly but usually have a similar format. For a detailed specification of Gym environments, check out this link. In this repository, `rocket_gym.py` provides the gym environment. To use the environment, call the `make` function. More customization can be done in the `make` function itself.
```python
import rocket_gym

with rocket_gym.make('RocketCircularization-v0') as env:
    # Simulation loop
    obs = env.reset()
    done = False
    while not done:
        env.render()
        u = env.action_space.sample()  # placeholder control (assumes a standard
                                       # Gym action_space); see below for what u is
        obs, rwd, done, info = env.step(u)
    # Note that to produce an animation or a summary,
    # the show method must be run
    env.show()
```
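Note that the four-tuple returned by `step` (observation, reward, done, info) follows the Gym 0.21 API pinned above; newer Gym and Gymnasium releases split `done` into `terminated` and `truncated`.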
The state is a vector of length 4. The first two elements give the position and the last two give the velocity, both in Cartesian coordinates. The `u` above is a control vector in Cartesian space. To obtain polar observations and give polar controls, use the class wrappers provided in `rocket_gym`.
To use the class wrappers, simply apply them to the environment instance:

```python
env = rocket_gym.PolarizeAction(env)
```

Note that the order in which they are applied may matter, depending on how they are implemented. Reference the documentation in `rocket_gym.py` for more details on the class wrappers.
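For instance, the radial configuration shown earlier composes the wrappers from the inside out; a sketch using the names from this README:

```python
import rocket_gym

env = rocket_gym.make('RocketCircularization-v0')
# PolarizeAction wraps the raw environment first, then RadialThrust,
# then RadialObservation transforms what the caller finally sees.
env = rocket_gym.RadialObservation(
    rocket_gym.RadialThrust(
        rocket_gym.PolarizeAction(env)))
```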
The details of the environment are not yet finalized. This includes the choice of simulation parameters, number of timesteps, reward structure, etc. Reference the running notes and documentation for more details.
`rocket_circularization.py` is an environment previously used for training. Without modifications, VPG and LQR still use this environment. It is NOT an OpenAI Gym environment and is NOT guaranteed to work with the wrappers, but it does have settings that accomplish those functions and possibly more.
To run the notebook, using VS Code with the Jupyter Notebook plugin should be sufficient. Otherwise, type `jupyter notebook` in the command line with the environment activated, then navigate to the file and it should run. Note that for changes to the imported files to take effect, the notebook kernel needs to be restarted to clear the module cache. If necessary, the `./__pycache__` folder can also be deleted and the notebook restarted.
To run the Python file, use the command `python rocket_dqn.py`.