This project attempts to use Reinforcement Learning to train a model to perform simple maneuvers, plan navigation, and avoid dynamic obstacles.
Training the Model to perform simple maneuvers
The environment used is the PyBullet Gym environment for quadrotors.
The base class for all 'drone aviary' environments is defined in `BaseAviary.py`. This file loads the drone model, sets up the physics simulator, defines various environment parameters, and loads parameters from URDF files to render the simulation.
`BaseSingleAgentAviary.py` is a subclass of BaseAviary dedicated to all single-drone RL environments. The action space and observation space are defined here, along with a function to compute the current observation of the environment.
There are 6 available Action Types:
- RPM - Desired rpm values for each propeller
- DYN - Desired thrust and torque values for the drone
- PID - PID controller
- ONE_D_RPM - Identical input rpm value to all propellers
- ONE_D_DYN - Identical thrust/torque to all propellers
- ONE_D_PID - Identical PID controller for all propellers
While training our agent, we used the default action type, `ActionType.RPM`.
There are also 2 different Observation Types:
- KIN - Kinematic information (position, linear and angular velocities)
- RGB - camera capture of each drone's Point of View
`ObservationType.KIN` was used in training our agent.
The above class is then used to construct four single-agent RL problems (a construction sketch follows the list):
- TakeOffAviary : Single agent RL problem: take-off
- HoverAviary : Single agent RL problem: hover at position
- FlyThruGateAviary : Single agent RL problem: fly through a gate
- TuneAviary : Single agent RL problem: optimize PID coefficients
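For reference, here is a minimal sketch of constructing one of these environments with the action and observation types discussed above. It assumes the gym-pybullet-drones package layout and the `obs`/`act` constructor keywords; adjust the imports to your installed version.

```python
from gym_pybullet_drones.envs.single_agent_rl.BaseSingleAgentAviary import (
    ActionType,
    ObservationType,
)
from gym_pybullet_drones.envs.single_agent_rl.HoverAviary import HoverAviary

# Kinematic observations and per-propeller RPM actions (the types we trained with)
env = HoverAviary(obs=ObservationType.KIN, act=ActionType.RPM)

obs = env.reset()
print(env.action_space)       # 4 normalized RPM inputs, one per propeller
print(env.observation_space)  # kinematic state vector
```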
We trained various agents extensively on HoverAviary, and this repo documents the problems we faced and our progress.
The models used in this project are from Stable Baselines3. We tried a variety of RL algorithms to train models for all the tasks; so far, we have used two: Deep Deterministic Policy Gradient (DDPG) and Proximal Policy Optimization (PPO).
While using DDPG, we found that training took a long time: the model progressed through training episodes at a slow pace, and the results did not look promising, so we shifted to PPO.
We found that PPO trained faster than DDPG for the same number of time steps in the environment, but the results still were not encouraging. So we increased the network size of the model from `[32, 32]` to `[512, 512, 256, 128]`, and the larger neural network showed better results (see the sketch below).
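As an illustrative sketch of enlarging the network with Stable Baselines3 (the timestep budget and other hyperparameters here are placeholders, not our final configuration):

```python
from stable_baselines3 import PPO
from gym_pybullet_drones.envs.single_agent_rl.HoverAviary import HoverAviary

env = HoverAviary()

# Enlarge the MLP policy/value network to [512, 512, 256, 128]
model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[512, 512, 256, 128]),
    verbose=1,
)
model.learn(total_timesteps=1_000_000)  # timestep budget here is illustrative
model.save("ppo_hover")
```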
In this task, the drone starts from a position on the ground. The agent is then supposed to make the drone take off and hover at a target location.
The environment for this task is `HoverAviary`. The reward function used in this environment did not seem rich enough, so we made certain modifications, as described below.
The default reward function in `HoverAviary` generated the reward from the Euclidean distance to the target as follows:
```python
reward = -1 * np.linalg.norm(np.array([0, 0, 1]) - state[0:3])**2  # state[0:3] denotes the current position of the drone
```
Euclidean distance alone was not able to ensure that the agent moved towards the target location, so we added error spheres, which give rewards of different magnitudes based on distance. This was done to ensure that the agent did not stray too far from the target location and maintained a stable hover near it.
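A minimal sketch of the error-sphere idea; the radii and reward magnitudes below are illustrative assumptions, not the exact values we used:

```python
import numpy as np

TARGET = np.array([0, 0, 1])  # hover target position

def sphere_reward(state):
    """Reward shaped by concentric 'error spheres' around the target."""
    dist = np.linalg.norm(TARGET - state[0:3])
    if dist < 0.1:    # innermost sphere: stable hover at the target
        return 10.0
    elif dist < 0.5:  # middle sphere: close to the target
        return 1.0
    elif dist < 1.0:  # outer sphere: still in the neighbourhood
        return -1.0
    return -10.0      # strayed too far from the target
```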
The reward functions up to this point only rewarded reaching the target location and did not depend on the roll, pitch, or yaw. So we picked value ranges for the roll, pitch, yaw, and their respective time derivatives, and returned a negative reward whenever they exceeded these ranges.
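A sketch of this attitude penalty; the thresholds, penalty magnitude, and the state-vector slices for roll/pitch/yaw and angular rates are assumptions for illustration:

```python
import numpy as np

MAX_RPY = np.radians(30)  # allowed roll/pitch/yaw range (illustrative)
MAX_RATE = 2.0            # allowed angular rate in rad/s (illustrative)

def attitude_penalty(state):
    """Negative reward when attitude or its rate leaves the allowed range."""
    rpy = state[7:10]     # roll, pitch, yaw (assumed kinematic-state slice)
    rates = state[13:16]  # angular velocities (assumed kinematic-state slice)
    if np.any(np.abs(rpy) > MAX_RPY) or np.any(np.abs(rates) > MAX_RATE):
        return -5.0       # illustrative penalty magnitude
    return 0.0
```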
The error-sphere idea was not very successful in training the agent to move towards the target, so we instead returned rewards based on the exponential of the Euclidean distance.
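A sketch of this shaping, assuming the exponential of the negative distance so that the reward peaks at the target and decays smoothly:

```python
import numpy as np

TARGET = np.array([0, 0, 1])

def exp_reward(state):
    """Reward decays exponentially with Euclidean distance to the target."""
    dist = np.linalg.norm(TARGET - state[0:3])
    return np.exp(-dist)  # 1.0 at the target, tending to 0 far away
```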
- We changed the action space of the environment to apply the same action (RPM value) to all four propellers (see the sketch after this list).
- After training for 3,250,000 steps, the model was able to learn the task of hovering.
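Putting these pieces together, a hedged sketch of the final setup (`ActionType.ONE_D_RPM` corresponds to applying the same RPM value to all propellers; module paths and hyperparameters are assumed as above):

```python
from stable_baselines3 import PPO
from gym_pybullet_drones.envs.single_agent_rl.BaseSingleAgentAviary import ActionType
from gym_pybullet_drones.envs.single_agent_rl.HoverAviary import HoverAviary

# One shared RPM command applied to all four propellers
env = HoverAviary(act=ActionType.ONE_D_RPM)

model = PPO(
    "MlpPolicy",
    env,
    policy_kwargs=dict(net_arch=[512, 512, 256, 128]),
    verbose=1,
)
model.learn(total_timesteps=3_250_000)  # the budget at which hovering was learned
model.save("ppo_hover_one_d")
```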