This code base is an attempt at solving an orbital circularization control problem for a small craft around a fixed massive object using Reinforcement Learning.
- Python version: 3.8.12
- Numpy version: 1.19.5
- Matplotlib version: 3.5.0
- Tensorflow version: 2.6.0
- ffmpeg version: 4.2.2
- gym version: 0.21.0
Remark: These runs were completed on an Intel MacBook Pro running macOS 12.3 Monterey in a Conda environment. Compatibility may differ on other machines.
The notebook `rocket_dqn.ipynb` and the file `rocket_dqn.py` both run the Deep Q-Network implementation. They contain the newest hyperparameters but may not be readily functional. Due to how the animation is implemented, the notebook version cannot generate an animation; if you need an animation at run time, please run `rocket_dqn.py`. We suggest training in the notebook and loading the model in `rocket_dqn.py` to see the animated gameplay. For execution instructions, see Execution.
For the hyperparameters, along with the versions that used them, please refer to the running notes.
The 2D version of the circularization problem does not work at the moment. We transformed the problem into a 1D problem with only radial motion and radial thrust, which has a separate gym environment. To train, use the `radial_rocket.ipynb` notebook. The environment only provides a summary chart, but no animation.
For an animation, use `rocket_dqn.py` with the 2D environment. Note that for radial stabilization to work, the initialization function `init_func` must be set to `target_l()` in the `reset` method in `rocket_gym.py`. This ensures that the rocket starts with the angular momentum of a circular orbit at the target radius.
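For intuition, the quantity that `target_l()` presumably computes follows from basic orbital mechanics: a circular orbit at radius $r_t$ has tangential speed $v = \sqrt{GM/r_t}$, so its specific angular momentum is $l = r_t v = \sqrt{GM\,r_t}$. A minimal sketch (the function name and normalized `GM` are illustrative, not the repository's):

```python
import numpy as np

def circular_orbit_angular_momentum(r_target, GM=1.0):
    """Specific angular momentum l = r * v of a circular orbit at r_target.

    GM = 1.0 is an illustrative normalized constant, not necessarily the
    value used by the environment.
    """
    v_tangential = np.sqrt(GM / r_target)  # speed that balances gravity
    return r_target * v_tangential         # l = sqrt(GM * r_target)
```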
Additionally, the environment should also be configured with the class decorators

```python
env = rocket_gym.RadialObservation(
    rocket_gym.RadialThrust(
        rocket_gym.PolarizeAction(env)))
```

and network settings

```python
model = DeepQNetwork(dims=[2, 128, 128, 3],
                     epsilon=1.0, epsilon_decay=.1, gamma=.95,
                     memory=100000, start_updating=10000,
                     batch_size=32, learning_rate=1e-4,
                     descent_frequency=16, update_frequency=1,
                     use_target=True, target_frequency=8)
```
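Presumably, the input dimension of 2 and the 3 output actions correspond to the radial state (radius and radial velocity) and the three radial-thrust choices; check the documentation and running notes to confirm.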
We will work on making that more customizable in the future.
A VPG implementation is also available, but without Experience Replay it is not as efficient as DQN. Some improved policy networks may work better, but they are not well explored. To run the VPG implementation, make sure that `wandb` is installed, and run `python main.py`. As an alternative, the notebook `run.ipynb` can also run the VPG model.
LQR is a type of simplified Optimal Control problem. By linearizing the dynamics near an equilibrium, we can approximate the Rocket Circularization problem as an LQR problem; this works for states close enough to the equilibrium. For more information, check out the Google Colab demonstration. To run LQR, uncomment the LQR code in `main.py` and run the file as before.
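To illustrate the general LQR recipe (not the repository's actual matrices): linearize the dynamics $\dot{\mathbf{x}} = A\mathbf{x} + B\mathbf{u}$ about the equilibrium, pick quadratic costs $Q$ and $R$, solve the continuous-time algebraic Riccati equation, and apply the feedback law $\mathbf{u} = -K\mathbf{x}$. The `A`, `B`, `Q`, and `R` values below are placeholders:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Placeholder linearized dynamics x_dot = A x + B u near the
# circular-orbit equilibrium; not the repository's actual matrices.
A = np.array([[0.0, 1.0],
              [3.0, 0.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)   # state-deviation cost
R = np.eye(1)   # control-effort cost

P = solve_continuous_are(A, B, Q, R)  # Riccati solution
K = np.linalg.solve(R, B.T @ P)       # optimal gain: u = -K x

x = np.array([0.1, 0.0])  # small deviation from equilibrium
u = -K @ x                # LQR feedback control
```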
According to Newton's Law of Gravitation (Inverse Square Law), we have the following second-order vector ODE:

$$\ddot{\mathbf{r}} = -\frac{GM}{\|\mathbf{r}\|^3}\,\mathbf{r}$$

where $G$ is the gravitational constant, $M$ is the mass of the fixed massive object, and $\mathbf{r}$ is the position of the craft relative to it.
We use the Euler-Cromer Method to approximate the motion:

$$\mathbf{v}_{t+1} = \mathbf{v}_t + \mathbf{a}_{net,\,t} \cdot \Delta t$$

$$\mathbf{r}_{t+1} = \mathbf{r}_t + \mathbf{v}_{t+1} \cdot \Delta t$$

where $\mathbf{a}_{net,\,t}$ is the net acceleration (gravity plus thrust) at step $t$ and $\Delta t$ is the simulation timestep.
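A minimal sketch of one such update step (the function names and the normalized `GM = 1.0` are illustrative, not taken from the repository):

```python
import numpy as np

def net_acceleration(r, thrust, GM=1.0):
    """Inverse-square gravity toward the origin plus the applied thrust.

    GM = 1.0 is an illustrative normalized constant, not the environment's.
    """
    return -GM * r / np.linalg.norm(r)**3 + thrust

def euler_cromer_step(r, v, thrust, dt):
    """One Euler-Cromer update: velocity first, then position using the
    updated velocity, which is more stable for orbits than plain Euler."""
    v_next = v + net_acceleration(r, thrust) * dt
    r_next = r + v_next * dt
    return r_next, v_next
```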
To ensure that observations stay in a reasonable range for the network, we clip the norms of the velocity and the radius. Additionally, when the craft hits a boundary, it loses all velocity normal to the boundary in an inelastic collision. This ensures that the craft stays in bounds.
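A minimal sketch of this bounds enforcement, assuming a circular outer boundary (the function names and limits are hypothetical):

```python
import numpy as np

def clip_norm(vec, max_norm):
    """Rescale vec so that its Euclidean norm does not exceed max_norm."""
    n = np.linalg.norm(vec)
    return vec if n <= max_norm else vec * (max_norm / n)

def enforce_bounds(r, v, max_radius, max_speed):
    """Hypothetical in-bounds rule: clip the speed, and on contact with the
    outer boundary remove the outward (normal) velocity component,
    modeling an inelastic collision."""
    v = clip_norm(v, max_speed)
    if np.linalg.norm(r) >= max_radius:
        r_hat = r / np.linalg.norm(r)
        v = v - max(np.dot(v, r_hat), 0.0) * r_hat  # kill outward motion only
        r = clip_norm(r, max_radius)                # keep the position in bounds
    return r, v
```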
OpenAI Gym is one of the standard APIs for Reinforcement Learning. Other APIs differ slightly but usually have a similar format. For a detailed specification of Gym environments, check out this link. In this repository, `rocket_gym.py` provides the gym environment. To use the environment, call the `make` function. More customization can be done in the `make` function itself.
```python
import rocket_gym

with rocket_gym.make('RocketCircularization-v0') as env:
    # Simulation loop
    obs = env.reset()
    done = False
    while not done:
        env.render()
        u = env.action_space.sample()  # placeholder control (assumes a standard
                                       # Gym action_space); see below for what u is
        obs, rwd, done, info = env.step(u)
    # Note that to produce an animation or a summary,
    # the show method must be run
    env.show()
```
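Note that the four-tuple returned by `step` (observation, reward, done, info) follows the Gym 0.21 API pinned above; newer Gym and Gymnasium releases split `done` into `terminated` and `truncated`.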
The state is a vector of length 4. The first two elements give the position and the last two give the velocity, both in Cartesian coordinates. The `u` above is a control vector in Cartesian space. To obtain polar observations and give polar controls, use the class wrappers provided in `rocket_gym`.
To use the class wrappers, simply apply them to the environment instance:

```python
env = rocket_gym.PolarizeAction(env)
```

Note that the order in which they are applied may matter, depending on how they are implemented. Reference the documentation in `rocket_gym.py` for more details on the class wrappers.
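For instance, the radial configuration shown earlier composes the wrappers from the inside out; a sketch using the names from this README:

```python
import rocket_gym

env = rocket_gym.make('RocketCircularization-v0')
# PolarizeAction wraps the raw environment first, then RadialThrust,
# then RadialObservation transforms what the caller finally sees.
env = rocket_gym.RadialObservation(
    rocket_gym.RadialThrust(
        rocket_gym.PolarizeAction(env)))
```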
The details of the environment are not yet finalized. This includes the choice of simulation parameters, number of timesteps, reward structure, etc. Reference the running notes and documentation for more details.
`rocket_circularization.py` is an environment previously used for training. Without modifications, VPG and LQR still use this environment. It is NOT an OpenAI Gym environment and is NOT guaranteed to work with the wrappers, but it does have settings that accomplish those functions and possibly more.
To run the notebook, using VS Code with the Jupyter Notebook plugin should be sufficient. Otherwise, type `jupyter notebook` in the command line with the environment activated, then navigate to the file and it should run. Note that for changes to the imported files to take effect, the notebook kernel needs to be restarted to clear the module cache. If necessary, the `./__pycache__` folder can also be deleted and the notebook restarted.
To run the Python file, use the command `python rocket_dqn.py`.