UNC_data_bootcamp_module_21
The nonprofit foundation Alphabet Soup wants a tool that can help it select the applicants for funding with the best chance of success in their ventures. With your knowledge of machine learning and neural networks, you’ll use the features in the provided dataset to create a binary classifier that can predict whether applicants will be successful if funded by Alphabet Soup. From Alphabet Soup’s business team, you have received a CSV containing more than 34,000 organizations that have received funding from Alphabet Soup over the years. Within this dataset are a number of columns that capture metadata about each organization, such as:
- EIN and NAME—Identification columns
- APPLICATION_TYPE—Alphabet Soup application type
- AFFILIATION—Affiliated sector of industry
- CLASSIFICATION—Government organization classification
- USE_CASE—Use case for funding
- ORGANIZATION—Organization type
- STATUS—Active status
- INCOME_AMT—Income classification
- SPECIAL_CONSIDERATIONS—Special considerations for application
- ASK_AMT—Funding amount requested
- IS_SUCCESSFUL—Was the money used effectively
The description above is taken from the UNC Bootcamp instructions for this challenge.
This challenge is carried out in Google Colab and completed in the following five steps, per the challenge instructions:
- Preprocess the Data
- Compile, Train, and Evaluate the Model
- Optimize the Model
- Write a Report on the Neural Network Model
- Copy Files Into This Repository
In this step I'll use my knowledge of Pandas and scikit-learn’s `StandardScaler()` to preprocess the dataset, following the instructions outlined below for the initial model, called `deep_learning_nn_SDT.ipynb`. These steps prepare the data for Step 2, where I'll compile, train, and evaluate the neural network model.
Start by uploading the starter file to Google Colab. Then, using the information provided in the Challenge files, follow the instructions to complete the preprocessing steps.
- Read in the `charity_data.csv` to a Pandas DataFrame, and be sure to identify the following in your dataset:
  - What variable(s) are the target(s) for your model?
  - What variable(s) are the feature(s) for your model?
- Drop the `EIN` and `NAME` columns.
- Determine the number of unique values for each column.
- For columns that have more than 10 unique values, determine the number of data points for each unique value.
- Use the number of data points for each unique value to pick a cutoff point to bin "rare" categorical variables together in a new value, `Other`, and then check if the binning was successful.
- Use `pd.get_dummies()` to encode categorical variables.
- Split the preprocessed data into a features array, `X`, and a target array, `y`. Use these arrays and the `train_test_split` function to split the data into training and testing datasets.
- Scale the training and testing features datasets by creating a `StandardScaler` instance, fitting it to the training data, then using the `transform` function.
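As a rough sketch of the preprocessing steps above (not the notebook's actual code), the following runs the same sequence on a tiny made-up DataFrame standing in for `charity_data.csv`. The helper names `bin_rare` and `preprocess`, the cutoff of 2, the 25% test split, and the random seed are all assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def bin_rare(series: pd.Series, cutoff: int) -> pd.Series:
    """Replace values that occur fewer than `cutoff` times with "Other"."""
    counts = series.value_counts()
    rare = counts[counts < cutoff].index
    return series.where(~series.isin(rare), "Other")


def preprocess(df: pd.DataFrame, cutoffs: dict, test_size: float = 0.25, seed: int = 42):
    """Drop ID columns, bin rare categories, one-hot encode, split, and scale."""
    df = df.drop(columns=["EIN", "NAME"])
    for column, cutoff in cutoffs.items():
        df[column] = bin_rare(df[column], cutoff)
    encoded = pd.get_dummies(df, dtype=float)

    # IS_SUCCESSFUL is the target; every remaining column is a feature.
    y = encoded["IS_SUCCESSFUL"].to_numpy()
    X = encoded.drop(columns=["IS_SUCCESSFUL"]).to_numpy()
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )

    # Fit the scaler on the training features only, then transform both sets.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_test), y_train, y_test


# Made-up rows for illustration only; these are not values from the real CSV.
demo = pd.DataFrame({
    "EIN": range(8),
    "NAME": [f"org_{i}" for i in range(8)],
    "APPLICATION_TYPE": ["T3", "T3", "T3", "T3", "T4", "T4", "T5", "T6"],
    "ASK_AMT": [5000, 5000, 10000, 7500, 5000, 20000, 5000, 6000],
    "IS_SUCCESSFUL": [1, 0, 1, 1, 0, 1, 0, 1],
})
X_train, X_test, y_train, y_test = preprocess(demo, cutoffs={"APPLICATION_TYPE": 2})
print(X_train.shape, X_test.shape)  # (6, 4) (2, 4)
```

Here `T5` and `T6` each appear once, so they fall below the cutoff and are merged into `Other`, leaving three one-hot columns plus `ASK_AMT`. Fitting the scaler on the training split alone keeps information from the test rows out of the scaling statistics.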
For the next step in the challenge, I’ll design a neural network, or deep learning model, to create a binary classification model that can predict whether an Alphabet Soup-funded organization will be successful based on the features in the dataset. I will need to consider how many inputs there are before determining the number of neurons and layers in the model. Once that is done, I can compile, train, and evaluate the binary classification model to calculate its loss and accuracy. The challenge instructions for this step are:
- Continue using the file in Google Colab in which you performed the preprocessing steps from Step 1.
- Create a neural network model by assigning the number of input features and nodes for each layer using TensorFlow and Keras.
- Create the first hidden layer and choose an appropriate activation function.
- If necessary, add a second hidden layer with an appropriate activation function.
- Create an output layer with an appropriate activation function.
- Check the structure of the model.
- Compile and train the model.
- Create a callback that saves the model's weights every five epochs. (Not required for this challenge.)
- Evaluate the model using the test data to determine the loss and accuracy.
- Save and export your results to an HDF5 file. Name the file `AlphabetSoupCharity.h5`.
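The model-building steps above could look roughly like the sketch below, assuming TensorFlow 2.x with its bundled Keras. The function names, the hidden layer sizes (80 and 30), the ReLU activations, and the Adam optimizer are illustrative assumptions, not the settings used in the notebook. The small `dense_param_count` helper is a hypothetical aid for checking the parameter total that `model.summary()` reports.

```python
def dense_param_count(n_features, layer_units):
    """Weights plus biases for a stack of fully connected layers."""
    total, inputs = 0, n_features
    for units in layer_units:
        total += inputs * units + units
        inputs = units
    return total


def build_model(n_features, hidden_units=(80, 30), activation="relu"):
    """Binary classifier: hidden Dense layers plus one sigmoid output unit.

    The layer sizes and activation are assumptions for illustration.
    """
    # Imported inside the function so dense_param_count works without TensorFlow.
    import tensorflow as tf

    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(n_features,)))
    for units in hidden_units:
        model.add(tf.keras.layers.Dense(units, activation=activation))
    # A single sigmoid unit outputs the probability of the positive class.
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model


def train_and_evaluate(model, X_train, y_train, X_test, y_test, epochs=100, save_path=None):
    """Fit on the training split, score on the test split, optionally save to HDF5."""
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    loss, accuracy = model.evaluate(X_test, y_test, verbose=0)
    if save_path is not None:
        model.save(save_path)  # e.g. "AlphabetSoupCharity.h5"
    return loss, accuracy
```

With the scaled arrays from Step 1, `build_model(X_train.shape[1])` followed by `model.summary()` checks the structure. For 10 inputs and the default layers, the reported total should equal `dense_param_count(10, [80, 30, 1])`, which is 3,341.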
After running the model for the first time, I will need to optimize it to achieve a target predictive accuracy higher than 75%. This could take multiple attempts; however, per the instructions, I should not exceed three. The steps and hints listed below guide the next models and may help reach the target.
Use any or all of the following methods to optimize your model:
- Adjust the input data to ensure that no variables or outliers are causing confusion in the model, such as:
  - Dropping more or fewer columns.
  - Creating more bins for rare occurrences in columns.
  - Increasing or decreasing the number of values for each bin.
- Add more neurons to a hidden layer.
- Add more hidden layers.
- Use different activation functions for the hidden layers.
- Increase or reduce the number of epochs in the training regimen.
- Create a new Google Colab file and name it `AlphabetSoupCharity_Optimization.ipynb`.
- Import your dependencies and read in the `charity_data.csv` to a Pandas DataFrame.
- Preprocess the dataset as you did in Step 1. Be sure to adjust for any modifications that came out of optimizing the model.
- Design a neural network model, and be sure to adjust for modifications that will optimize the model to achieve higher than 75% accuracy.
- Save and export your results to an HDF5 file. Name the file `AlphabetSoupCharity_Optimization.h5`.
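One of the input-data adjustments listed above, changing the cutoff used to bin rare values, directly changes how many one-hot columns (and therefore network inputs) the model receives. The sketch below compares a few cutoffs on a made-up column. The helper names `bin_rare` and `feature_count_by_cutoff`, the category counts, and the cutoff values are assumptions for illustration only.

```python
import pandas as pd


def bin_rare(series: pd.Series, cutoff: int) -> pd.Series:
    """Replace values that occur fewer than `cutoff` times with "Other"."""
    counts = series.value_counts()
    rare = counts[counts < cutoff].index
    return series.where(~series.isin(rare), "Other")


def feature_count_by_cutoff(df: pd.DataFrame, column: str, cutoffs) -> dict:
    """Map each cutoff to the number of one-hot columns it yields for `column`."""
    return {c: pd.get_dummies(bin_rare(df[column], c)).shape[1] for c in cutoffs}


# Made-up category counts; the real column has far more rows and categories.
demo = pd.DataFrame(
    {"CLASSIFICATION": ["C1000"] * 6 + ["C2000"] * 3 + ["C1200"] * 2 + ["C7000"]}
)
print(feature_count_by_cutoff(demo, "CLASSIFICATION", cutoffs=[1, 2, 3, 7]))
# {1: 4, 2: 4, 3: 3, 7: 1}
```

A higher cutoff folds more categories into `Other` and shrinks the input layer, while a lower one keeps more detail at the cost of sparser columns. Comparing a few cutoffs this way before retraining is one way to plan each optimization attempt.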
For this part of the challenge, I’ll write a report in a markdown file on the performance of the deep learning model I created for Alphabet Soup. The report will be called `Report_AlphabetSoup_SDT.md`, and it will follow the format below and answer the questions per the challenge instructions:
- Overview of the analysis: Explain the purpose of this analysis.
- Results: Using bulleted lists and images to support your answers, address the following questions:
  - Data Preprocessing
    - What variable(s) are the target(s) for your model?
    - What variable(s) are the features for your model?
    - What variable(s) should be removed from the input data because they are neither targets nor features?
  - Compiling, Training, and Evaluating the Model
    - How many neurons, layers, and activation functions did you select for your neural network model, and why?
    - Were you able to achieve the target model performance?
    - What steps did you take in your attempts to increase model performance?
- Summary: Summarize the overall results of the deep learning model. Include a recommendation for how a different model could solve this classification problem, and then explain your recommendation.
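The summary asks for a recommendation of a different model, and the choice of model is left open here. Purely as an illustration, a random forest is one common candidate for tabular binary classification, and trying it with scikit-learn could look like the sketch below. The data is synthetic (generated with `make_classification`), and the hyperparameters are assumptions, so the printed accuracy says nothing about the Alphabet Soup dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real features would come from the Step 1 preprocessing.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Tree ensembles do not require feature scaling, so StandardScaler is skipped here.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

accuracy = accuracy_score(y_test, forest.predict(X_test))
print(f"Random forest test accuracy: {accuracy:.3f}")
```

Comparing this accuracy with the neural network's, on the same train and test split, would give the report a concrete basis for the recommendation.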
When finished with the analysis in Google Colab, I'll move the files into my repository for final submission by performing the following steps:
- Download your Colab notebooks to your computer.
- Move them into your Deep Learning Challenge directory in your local repository.
- Push the added files to GitHub.
Starter_Code
- Stater_Code.ipynb
Special Thanks:
- Jamie Miller
- Mounika Mamindla
- Lisa Shemanciik
(Links to their websites will be provided where possible.)