This project explores making a model of a car self-driving. The model is controlled by a neural network trained with Q-learning. The environment for making custom tracks, as well as the collision detection, is also built from scratch. Below is an example of the car driving on its own. The red lines illustrate how the car measures distances, which are used as features along with a couple of other parameters.
RL-racecar-compressed.mp4
Example of a neural network controlling the car model, trained for around 5 minutes on an NVIDIA GeForce RTX 3050 laptop GPU. That's right, just a few minutes, on a laptop! The neural network generalises well to previously unseen test tracks, as seen when drawing a new track and letting the agent race on it. Most of the tracks shown have not been used for training. We can even use several cars, as seen at the end; at one point, one car overtakes another.
For the time being, the brain of the car is a simple fully connected neural network.
The neural network accepts seven features:
- Far-left distance
- Left-front distance
- Front distance
- Right-front distance
- Far-right distance
- Angle of wheels
- Speed
All of the features are normalised so that their absolute values lie between 0 and 1. I try to make this normalisation a habit, since it keeps the inputs in the same range where the weights of the neural network are likely to be initialised. It may also help the network generalise: since I normalise by dividing with physical measures, the features become nondimensional. As such, it does not matter if the measures of the car, track and speed were all scaled up by some factor; the network would still see the same input.
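As a minimal sketch of what this normalisation might look like (the attribute names here are hypothetical, not the ones used in the actual code):

```python
import numpy as np

def get_features(car, distances):
    # `distances` holds the five ray measurements (far-left ... far-right).
    # Dividing by the physical maxima makes every feature nondimensional
    # and keeps its absolute value between 0 and 1.
    features = np.empty(7)
    features[:5] = np.asarray(distances) / car.max_distance
    features[5] = car.wheel_angle / car.max_wheel_angle
    features[6] = car.speed / car.max_speed
    return features
```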
Deep Q-learning can be very unstable. One way to stabilise the learning procedure is to introduce a target network, which is synchronised with the prediction network only periodically and is thus kept constant for prolonged periods. The target network is used to estimate the future Q-values, while the prediction network estimates the current Q-values.
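In code, the idea can be sketched roughly as follows (the network here is a stand-in, and `gamma` and the sync interval are illustrative choices, not taken from the repository):

```python
import copy
import torch
import torch.nn as nn

# Stand-in prediction network: 7 features in, 4 Q-values out.
prediction_net = nn.Sequential(nn.Linear(7, 32), nn.ReLU(), nn.Linear(32, 4))
target_net = copy.deepcopy(prediction_net)  # frozen copy, synced only periodically

def q_targets(rewards, next_states, dones, gamma=0.99):
    # Future Q-values come from the frozen target network, which keeps
    # the regression targets stable between synchronisations.
    # `dones` is a 0/1 float tensor marking terminal transitions.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
    return rewards + gamma * next_q * (1.0 - dones)

# Every N training steps, copy the prediction weights into the target network:
target_net.load_state_dict(prediction_net.state_dict())
```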
The network outputs Q-values for:
- Increasing the angle of the wheels relative to the car
- Decreasing the angle of the wheels relative to the car
- Increasing the speed of the car
- Decreasing the speed of the car
By changing the angle of the wheels, the turning radius changes as well. There is a smallest allowed turning radius, which determines how quickly the car can turn. The angular velocity is the quotient of the car's speed and the signed turning radius, and the turning radius is in turn the cotangent of the wheel angle multiplied by the minimum turning radius.
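In code, the kinematics boil down to a few lines (a sketch under the conventions above; the function and argument names are illustrative):

```python
import numpy as np

def angular_velocity(speed, wheel_angle, min_turning_radius):
    # Straight wheels give an infinite turning radius, i.e. no rotation.
    if np.isclose(wheel_angle, 0.0):
        return 0.0
    # turning_radius = cot(wheel_angle) * min_turning_radius; the sign of
    # the wheel angle carries over, giving a signed radius.
    turning_radius = min_turning_radius / np.tan(wheel_angle)
    return speed / turning_radius
```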
The neural network is built with PyTorch (1.8.1+cu102). In addition, NumPy (1.18.5) is used for many operations and for storing data. To reduce the computational burden, I have used Numba (0.51.2) for several movement-handling operations, collision detection, distance measuring and so on. For parsing command line arguments, argparse (1.4.0) is used.
First off, make sure you have all of the necessary packages: set up a virtual environment and install the packages listed under Dependencies.
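For example, assuming a Unix-like shell (pin the versions listed under Dependencies if you want to match the setup above exactly):

```
python -m venv venv
source venv/bin/activate
pip install torch numpy numba
```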
Draw.track.mp4
I've stuck to a pretty simple approach for defining tracks. When you run `python draw_track.py`, you can start drawing a track by holding the letter `A` and moving the mouse to define a set of nodes. When you get close enough to the original node again, the track is formed. Getting the borders right is not particularly easy, since one border needs to be shorter than the other in a turn. I have a provisional solution for this, but it is not perfect: turns that are too sharp may lead to kinks in the track, while sufficiently smooth curves lead to a good track.
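The condition for closing the loop can be as simple as a distance check against the first node. A minimal sketch of the idea (the threshold and data layout are hypothetical, not taken from the repository):

```python
import numpy as np

def should_close_track(nodes, threshold=20.0):
    # Close the loop once the cursor has returned near the starting node.
    # `nodes` is a list of (x, y) positions; `threshold` is in pixels.
    if len(nodes) < 3:
        return False
    return np.linalg.norm(np.asarray(nodes[-1]) - np.asarray(nodes[0])) < threshold
```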
To train a neural network, use the script `train.py`. There are many parameters that can be changed, but I find that the ones already defined in the script work well. For a short explanation of the parameters, see the script `racing_agent.py`. `epsilon` controls the initial randomness of the Q-learning algorithm and thus promotes exploration; it is kept between 1.0 and 0.0. Using `epsilon_start = 1.0`, `epsilon_final = 0.0` and `epsilon_scale = runs` usually works well. These settings mean that `epsilon` decreases linearly from 1.0 to 0.0 from the first generation to the last, as in the sketch below.
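A linear schedule with those settings, together with the epsilon-greedy action choice it drives, could look roughly like this (an illustrative sketch, not the repository's exact code):

```python
import numpy as np

def epsilon_at(generation, epsilon_start=1.0, epsilon_final=0.0, epsilon_scale=1000):
    # Linear decay from epsilon_start at generation 0 down to
    # epsilon_final at generation epsilon_scale, clamped afterwards.
    fraction = min(generation / epsilon_scale, 1.0)
    return epsilon_start + fraction * (epsilon_final - epsilon_start)

def choose_action(q_values, epsilon):
    # Epsilon-greedy: with probability epsilon, pick one of the four
    # actions at random (explore); otherwise take the best one (exploit).
    if np.random.random() < epsilon:
        return np.random.randint(len(q_values))
    return int(np.argmax(q_values))
```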
To train a new agent, specify an agent name, such as `agent_dense2`, and run the script with the command `python train.py`. This will train the agent and save it in the folder `build`.
Currently, two neural network architectures are implemented: `DenseNetwork` and `RecurrentNetwork`. For `RecurrentNetwork`, you can specify a sequence length for the series of features which the neural network will use as input. However, I find that `DenseNetwork` works best for this application. By default, the sequence length `seq_length` is kept at 1, which it should be for `DenseNetwork`. To change the number of hidden neurons, change the setting `hidden_neurons` in the instance of `RacingAgent`. Relatively few neurons work well, such as `[64, 32, 16]`, which means that hidden layer 1 has 64 neurons, hidden layer 2 has 32 neurons and hidden layer 3 has 16 neurons. The pretrained agent `agent_dense` has `hidden_neurons = [32, 32, 32]`.
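Building a fully connected network from such a list is straightforward in PyTorch. The following is a sketch of the idea, not the repository's actual `DenseNetwork` class:

```python
import torch.nn as nn

def build_dense_network(in_features=7, hidden_neurons=(64, 32, 16), out_features=4):
    # One Linear + ReLU pair per entry in hidden_neurons, followed by a
    # linear output layer producing one Q-value per action.
    layers = []
    for width in hidden_neurons:
        layers += [nn.Linear(in_features, width), nn.ReLU()]
        in_features = width
    layers.append(nn.Linear(in_features, out_features))
    return nn.Sequential(*layers)
```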
To test a trained agent, you can use the script `game.py`. You can either change the name of the agent and the racetrack in the script itself, or provide command line arguments. The argument `--agent-name` specifies the name of the agent to be loaded from the folder `build`, and the argument `--track-name` specifies the name of the race track to be loaded from the folder `tracks`. For example, you can run

`python game.py --agent-name agent_dense --track-name racetrack1`

in the command line, which will load the pretrained agent `agent_dense` and try it out on the track `racetrack1`. Note that there is no guarantee that the deep Q-learning algorithm will lead to an optimal solution; sometimes the agent might end up going quite slowly, even if it has been trained to be fast. Training for longer periods can mitigate this.