A Deep Learning Algorithm with TensorFlow
The purpose of this project is to apply a neural network machine learning algorithm to analyze how efficiently the foundation's funds and donations are used.
- The variable considered the target for the model: after carefully investigating the dataset, the IS_SUCCESSFUL feature was chosen as the target variable, which is what our model predicts by utilizing the rest of the features in the dataset.
- The variables considered the features for the model:
- NAME
- APPLICATION_TYPE
- AFFILIATION
- CLASSIFICATION
- USE_CASE
- ORGANIZATION
- STATUS
- INCOME_AMT
- SPECIAL_CONSIDERATIONS
- ASK_AMT
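The target/feature split described above can be sketched with the standard pandas workflow; the small stand-in DataFrame here is illustrative only, not the real charity_data.csv:

```python
import pandas as pd

# Stand-in rows mimicking a few columns of charity_data.csv (illustrative only).
df = pd.DataFrame({
    "EIN": [101, 102, 103],
    "NAME": ["ORG A", "ORG B", "ORG A"],
    "APPLICATION_TYPE": ["T3", "T4", "T3"],
    "ASK_AMT": [5000, 10000, 5000],
    "IS_SUCCESSFUL": [1, 0, 1],
})

# EIN is an identifier, neither target nor feature, so it is dropped;
# IS_SUCCESSFUL is the target, and everything else becomes a feature.
df = df.drop(columns=["EIN"])
y = df["IS_SUCCESSFUL"]
X = df.drop(columns=["IS_SUCCESSFUL"])
```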
- The variables that are neither targets nor features and were removed from the input data: for the first model, EIN and NAME were removed from the dataset before performing any feature engineering. However, after compiling the model and extracting the accuracy report, it was discovered that keeping NAME as a feature helps optimize the model and increase its accuracy. Therefore, only EIN was removed from the input data. After applying a variety of optimization steps, model performance increased from 72% to 75%.
- 1- Used NAME as a feature and binned it instead of removing it.
- 2- Increased the number of neurons in each layer.
- 3- Inserted a third layer into the model, then removed it because it made no significant changes and decreased model performance. Since we are using a relatively small dataset, more than two layers is not recommended for this model.
- 4- Changed the activation function of the first layer from relu to tanh. In my opinion, tanh normalizes the data better than relu.
- 5- Made no changes to the number of epochs, as increasing it would easily overfit the model.
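Step 1 above (binning rare NAME values instead of dropping the column) might be sketched as follows; the cutoff of 5 occurrences is an assumed example value, not the project's actual threshold:

```python
import pandas as pd

# Illustrative NAME column; in the real project this comes from charity_data.csv.
names = pd.Series(["PTA"] * 6 + ["RED CROSS"] * 4 + ["TINY ORG", "ONE-OFF ORG"])

# Count occurrences and collect the rare names (cutoff of 5 is an assumed choice).
counts = names.value_counts()
rare = counts[counts < 5].index

# Replace rare names with a common "Other" bin so one-hot encoding stays manageable.
binned = names.replace(list(rare), "Other")
```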
Only two different models were created to reach optimum accuracy for this project. There are simple steps that can increase model performance almost instantly, such as adding more neurons to each layer, changing the activation functions, and re-engineering the features. In addition, the loss score is around 50%, which is within an acceptable range for this model. One of the fundamental issues while optimizing a model is overfitting. To avoid overfitting, the number of layers should be minimized for small datasets, and the number of training iterations should be kept within a certain range.
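Putting the retained choices together (two hidden layers, tanh on the first), a minimal Keras sketch might look like the following; the neuron counts, the feature-matrix width, and the second layer's relu activation are assumptions for illustration:

```python
import tensorflow as tf

NUM_FEATURES = 40  # assumed width of the encoded feature matrix

# Two hidden layers only: a third layer decreased performance on this small dataset.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(NUM_FEATURES,)),
    tf.keras.layers.Dense(100, activation="tanh"),   # first hidden layer switched to tanh
    tf.keras.layers.Dense(30, activation="relu"),    # second hidden layer (activation assumed)
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary IS_SUCCESSFUL output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```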
Since this is a categorical dataset, using a Random Forest classifier could be more efficient, as it uses fewer resources and requires less coding.
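The Random Forest alternative suggested above might be sketched like this; the synthetic data stands in for the encoded charity features, and the hyperparameters are assumed defaults:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the one-hot-encoded charity features (illustrative only).
X, y = make_classification(n_samples=200, n_features=10, random_state=1)

# A Random Forest needs no feature scaling or epoch tuning, which is the appeal here.
clf = RandomForestClassifier(n_estimators=100, random_state=1)
clf.fit(X, y)
score = clf.score(X, y)  # training accuracy, just to confirm the model fits
```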
- Dataset: charity_data.csv
- Software/Languages: Jupyter Notebook, Google Colab, Python.
- Libraries: Scikit-learn, TensorFlow, Pandas, Matplotlib