A Systematic Comparison of Activation Functions

This project is a comparison of various activation functions on the MNIST dataset. We compared the performance of five activation functions based on accuracy, precision/recall, training time, and the number of epochs each model required.

Model

We utilized the same machine learning model for each experiment: a feedforward neural network with 784 inputs (one per pixel), 100 hidden units, and 10 outputs (one per digit, 0–9).
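
As a rough illustration of this setup (the repository's own code is not shown here), a minimal PyTorch sketch of the 784-100-10 network might look like the following, with the activation passed in so it can be swapped per experiment:

```python
import torch.nn as nn

def make_model(activation: nn.Module) -> nn.Sequential:
    """784 inputs -> 100 hidden units -> 10 output logits (digits 0-9)."""
    return nn.Sequential(
        nn.Flatten(),          # 28x28 MNIST image -> 784-vector
        nn.Linear(784, 100),   # hidden layer
        activation,            # swapped per experiment (sigmoid, tanh, ReLU, ...)
        nn.Linear(100, 10),    # one logit per digit
    )

model = make_model(nn.ReLU())
```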

Activation Functions

Sigmoid/Tanh

The downside to the sigmoid and tanh activation functions is that they are susceptible to the vanishing gradient problem, which drastically slows down training and makes the network very sensitive to its initial weights.
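
To make the vanishing-gradient point concrete, here is a small NumPy check (illustrative only, not part of this project): the sigmoid's derivative never exceeds 0.25 and shrinks toward zero for large |x|, so gradients shrink further with every layer they pass through.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
grad = sigmoid(x) * (1.0 - sigmoid(x))  # d/dx sigmoid(x)
print(grad)  # at most 0.25 (at x = 0), roughly 4.5e-05 at |x| = 10
```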

ReLU

The ReLU activation function grows linearly for positive values, so for x >> 1 there is no vanishing gradient, which means faster training. The downsides are that many neurons can "die" (once a unit's input falls below zero its gradient is zero, so it stops updating) and that, because the output is unbounded, activations can grow very large.
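
For reference, a NumPy sketch of ReLU and its (sub)gradient (illustrative, not the project's implementation). The zero gradient for negative inputs is what produces dead neurons, and the unbounded positive side is what lets activations grow without limit:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)     # unbounded for x > 0

def relu_grad(x):
    return (x > 0).astype(float)  # exactly 0 for x < 0: a "dead" unit stops learning
```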

Saturating Linear Function

This is essentially ReLU with an upper bound, which prevents the output-explosion problem.
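
A saturating linear unit can be sketched as ReLU clipped at 1, mirroring MATLAB's satlin (an illustration, not the project's code):

```python
import numpy as np

def satlin(x):
    # Linear on [0, 1], saturating at 0 below and 1 above, so outputs cannot explode.
    return np.clip(x, 0.0, 1.0)
```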

Leaky ReLU (custom)

This function is ReLU with a small slope below zero, controlled by a hyperparameter α. It fixes the dead-neuron problem, since there is now a nonzero gradient for inputs below zero.
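
A sketch of the custom leaky ReLU, shown with the α = 0.1 used in the results below (illustrative; the project's exact implementation may differ):

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # Slope alpha for x < 0 keeps a nonzero gradient, so a unit can keep learning.
    return np.where(x > 0.0, x, alpha * x)
```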

Linear

This is the control for our experiments: using a linear activation function is essentially the same as having no activation function at all.

Results

| Model | Accuracy | Precision | Recall |
| --- | --- | --- | --- |
| sigmoid | 93.12% | 0.9786 | 0.9946 |
| tanh | 91.81% | 0.9776 | 0.9939 |
| ReLU | 93.80% | 0.9735 | 0.9953 |
| satlin | 92.86% | 0.9806 | 0.9945 |
| Leaky ReLU (α = 0.1) | 93.84% | 0.9786 | 0.9952 |
| purelin (none) | 91.16% | 0.9684 | 0.9938 |

Other Results / Meta-Learning

[Figure: alpha]