The nonprofit foundation Alphabet Soup wants a tool that can help it select the applicants for funding with the best chance of success in their ventures. With your knowledge of machine learning and neural networks, you’ll use the features in the provided dataset to create a binary classifier that can predict whether applicants will be successful if funded by Alphabet Soup.
From Alphabet Soup’s business team, you have received a CSV containing more than 34,000 organizations that have received funding from Alphabet Soup over the years. Within this dataset are a number of columns that capture metadata about each organization, such as:
- EIN and NAME: Identification columns
- APPLICATION_TYPE: Alphabet Soup application type
- AFFILIATION: Affiliated sector of industry
- CLASSIFICATION: Government organization classification
- USE_CASE: Use case for funding
- ORGANIZATION: Organization type
- STATUS: Active status
- INCOME_AMT: Income classification
- SPECIAL_CONSIDERATIONS: Special considerations for application
- ASK_AMT: Funding amount requested
- IS_SUCCESSFUL: Was the money used effectively
- Step 1: Preprocess the Data
- Step 2: Compile, Train, and Evaluate the Model
- Step 3: Optimize the Model
- Step 4: Write a Report on the Neural Network Model
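Steps 1 and 2 roughly follow the preprocessing pattern sketched below. This is a minimal sketch, assuming the source file is named charity_data.csv and using illustrative binning cutoffs (500 and 1000); the exact file name and cutoffs may differ from the original notebook.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Load the charity dataset (file name is an assumption)
application_df = pd.read_csv("charity_data.csv")

# Drop the non-beneficial ID columns
application_df = application_df.drop(columns=["EIN", "NAME"])

# Bin rare categories into "Other" (cutoff values are assumptions)
for column, cutoff in [("APPLICATION_TYPE", 500), ("CLASSIFICATION", 1000)]:
    counts = application_df[column].value_counts()
    rare = counts[counts < cutoff].index
    application_df[column] = application_df[column].replace(rare, "Other")

# One-hot encode the categorical features
dummies_df = pd.get_dummies(application_df)

# Split into features and target, then into training and test sets
y = dummies_df["IS_SUCCESSFUL"]
X = dummies_df.drop(columns=["IS_SUCCESSFUL"])
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Scale the features
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)
```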
Overview
The purpose of creating these neural network models was to build a tool that can help select the funding applicants with the best chance of success. After the first model was created, I tried several optimization approaches to see whether I could improve accuracy while maintaining a low loss.
Results
Method 1 - 73% Accuracy, 0.6% Loss
- 🔸 Achieved by dropping non-beneficial ID columns "EIN" and "NAME"
- 🔸 "APPLICATION_TYPE" and "CLASSIFICATION" used for binning
- 🔸 Two hidden relu layers
- 🔸 100 epochs to train the model
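A minimal sketch of the Method 1 model, assuming the X_train_scaled/y_train variables from the preprocessing sketch above; the hidden-layer sizes (80 and 30 units) are illustrative assumptions rather than the exact values used.

```python
import tensorflow as tf

# Sequential model with two hidden relu layers (unit counts are assumptions)
nn = tf.keras.models.Sequential([
    tf.keras.layers.Dense(80, activation="relu", input_dim=X_train_scaled.shape[1]),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output: IS_SUCCESSFUL
])
nn.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

# Train for 100 epochs, then evaluate on the held-out test set
nn.fit(X_train_scaled, y_train, epochs=100)
model_loss, model_accuracy = nn.evaluate(X_test_scaled, y_test, verbose=2)
```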
Method 2 - 78% Accuracy, 1.7% Loss
- 🔹 Achieved by dropping one non-beneficial ID column "EIN"
- 🔹 "NAME" and "APPLICATION_TYPE" used for binning (see the binning sketch below)
- 🔹 Two hidden relu layers
- 🔹 100 epochs to train the model
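The key change in Methods 2 and 3 is keeping NAME and binning its infrequent values instead of dropping the column. A sketch, assuming a cutoff of fewer than 5 occurrences (the actual cutoff is not stated in this report):

```python
import pandas as pd

# Keep NAME and drop only the EIN column (file name is an assumption)
application_df = pd.read_csv("charity_data.csv").drop(columns=["EIN"])

# Replace organizations that appear only a handful of times with "Other"
name_counts = application_df["NAME"].value_counts()
rare_names = name_counts[name_counts < 5].index
application_df["NAME"] = application_df["NAME"].replace(rare_names, "Other")

# APPLICATION_TYPE is binned the same way before one-hot encoding and scaling
```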
Method 3 – 78% Accuracy, 2% Loss
- 🔸 Achieved by dropping one non-beneficial ID column "EIN"
- 🔸 "NAME" and "APPLICATION_TYPE" used for binning
- 🔸 Two hidden relu layers
- 🔸 200 epochs to train the model
Method 4 – 78% Accuracy, 0.8% Loss
- 🔹 Achieved by dropping additional columns "EIN", "STATUS", and "SPECIAL_CONSIDERATIONS"
- 🔹 "NAME" and "APPLICATION_TYPE" used for binning
- 🔹 Three hidden relu layers (see the sketch below)
- 🔹 100 epochs to train the model
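A sketch of the Method 4 changes: the additional columns are dropped before encoding and a third hidden relu layer is added. Unit counts are assumptions; the preprocessing otherwise follows the earlier sketches.

```python
import tensorflow as tf

# Drop the extra low-information columns before one-hot encoding
application_df = application_df.drop(columns=["STATUS", "SPECIAL_CONSIDERATIONS"])

# Three hidden relu layers (unit counts are assumptions)
nn = tf.keras.models.Sequential([
    tf.keras.layers.Dense(80, activation="relu", input_dim=X_train_scaled.shape[1]),
    tf.keras.layers.Dense(30, activation="relu"),
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
nn.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
nn.fit(X_train_scaled, y_train, epochs=100)
```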
Method 5 – 79% Accuracy, 0.5% Loss
- 🔸 Achieved by dropping one non-beneficial ID column "EIN"
- 🔸 "APPLICATION_TYPE" and "CLASSIFICATION" used for binning
- 🔸 Used Keras Tuner (kerastuner) to search for the best hyperparameters
- 🔸 Maximum of 20 epochs to train the model
- 🔸 Maximum of 30 hidden layers and neurons
- 🔸 relu and tanh activations used to train the model
- 🔸 Tuner trials split into two rounds of 25 each, with tanh performing best in both (see the tuner sketch below)
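A sketch of the Method 5 hyperparameter search with Keras Tuner. The search space (up to 30 units per layer, relu or tanh, at most 20 epochs) follows the bullets above, but the tuner type (Hyperband), the layer-count range, and the step sizes are assumptions:

```python
import keras_tuner as kt  # published as keras-tuner; older code imports kerastuner
import tensorflow as tf

def build_model(hp):
    """Build a candidate model from a set of tuner-chosen hyperparameters."""
    activation = hp.Choice("activation", ["relu", "tanh"])
    model = tf.keras.models.Sequential()
    model.add(tf.keras.layers.Dense(
        units=hp.Int("first_units", min_value=1, max_value=30, step=5),
        activation=activation, input_dim=X_train_scaled.shape[1]))
    # Additional hidden layers; the 1-5 range is an assumption
    for i in range(hp.Int("num_layers", 1, 5)):
        model.add(tf.keras.layers.Dense(
            units=hp.Int(f"units_{i}", min_value=1, max_value=30, step=5),
            activation=activation))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

# Hyperband trains each candidate for at most 20 epochs; two iterations
# loosely match the "split into two" trial rounds noted above
tuner = kt.Hyperband(build_model, objective="val_accuracy", max_epochs=20,
                     hyperband_iterations=2)
tuner.search(X_train_scaled, y_train, epochs=20,
             validation_data=(X_test_scaled, y_test))

best_model = tuner.get_best_models(1)[0]
```

Calling tuner.results_summary() afterwards makes it easy to compare how the relu and tanh trials performed.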
Summary
Overall, the best results came from the Keras Tuner run, since it tests different epoch counts and both relu and tanh activation layers. Watching the trials, it was clear that tanh performed best, noticeably so with a smaller number of epochs. I was successful at achieving over 75% accuracy with all of the optimization models; if I had more time, I would like to see whether it is possible to reach 80-90% accuracy while maintaining a low loss.