ACM Research Coding Challenge (Spring 2022)

No Collaboration Policy

You may not collaborate with anyone on this challenge. You are allowed to use Internet documentation. If you use existing code (from GitHub, Stack Overflow, or other sources), please cite your sources in the README.

Submission Procedure

Please follow the instructions below to submit your answers.

Create a public fork of this repo and name it ACM-Research-Coding-Challenge-S22. To fork this repo, click the "Fork" button at the top right of the page.
Clone your fork to your computer using git clone [the URL of your fork]. You may need to install Git for this (Google it).
Complete the Challenge based on the instructions below.
Submit your solution by filling out this form.

Assessment Criteria

Submissions will be evaluated holistically based on a combination of effort, validity of approach, analysis, adherence to the prompt, use of outside resources (encouraged), promptness of your submission, and other factors. Your approach and explanation (detailed below) are the most heavily weighted criteria, and partial solutions are accepted.

Question One

Binary classification is a classification task that sorts the elements of a set (e.g. a dataset) into two groups. An example would be identifying whether people have a specific disease based on certain health characteristics. The dataset in mushrooms.csv holds data (22 different characteristics, specifically) about different types of mushrooms, including a mushroom's cap shape, cap surface texture, cap color, bruising, odor, and more. Remember to split the data into training and test sets (you can choose your own percentage split). The meaning of the letters under each column is explained in the file attributelegend.txt.

With the file mushrooms.csv, use an algorithm of your choice to classify whether a mushroom is poisonous or edible.
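The split step mentioned above can be sketched with pandas and scikit-learn as follows. This is a minimal sketch, not the author's code: `split_mushrooms` is a hypothetical helper, the 80/20 split and seed are illustrative choices, and it assumes the label column is named "class" with the values "e" (edible) and "p" (poisonous).

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def split_mushrooms(df: pd.DataFrame, test_size: float = 0.2, seed: int = 0):
    """Separate features from the label and make a stratified train/test split.

    Assumption: the label column is named "class"; every other column is a
    categorical mushroom characteristic.
    """
    X = df.drop(columns=["class"])  # the categorical characteristics
    y = df["class"]                 # assumed "e" (edible) or "p" (poisonous)
    # stratify=y keeps the edible/poisonous ratio the same in both sets
    return train_test_split(X, y, test_size=test_size, random_state=seed, stratify=y)
```

In practice this would be called on `pd.read_csv("mushrooms.csv")`. Stratifying is a design choice that keeps a small test set from being skewed toward one class.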

You may use any programming language you feel most comfortable with. We recommend Python because it is the easiest to implement in. You're allowed to use any library or API you want, just document which ones you used in this README file. Try to complete this as soon as possible.

Regardless of whether you can answer the question, provide a short explanation in your README.md file of how you got your solution or how you think it could be solved. However, we highly recommend giving the challenge a try; you just might learn something new!

My Approach:

Neural Network (using Keras/TensorFlow)

Model details:

keras.Sequential model
Adam optimizer
- chosen because it is general-purpose, works well on many deep learning problems, and has an adaptive learning rate, which simplified tuning for me
Cross-entropy loss function
- standard for binary classification problems
I also tried adding batch normalization and dropout layers, but they didn't noticeably improve training speed or accuracy, so I removed them to simplify the model.
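A minimal sketch of a model matching the details above (keras.Sequential, Adam, cross-entropy loss) might look like this. The README does not list the actual layers, so the two hidden layers of 32 ReLU units and the single sigmoid output are assumptions for illustration; `build_model` and `predict_labels` are hypothetical helper names. With one sigmoid output, Keras's `binary_crossentropy` is the cross-entropy loss for two classes.

```python
import numpy as np
from tensorflow import keras


def build_model(n_features: int) -> keras.Model:
    """Small fully connected binary classifier (layer sizes are hypothetical)."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),             # one-hot-encoded features
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(32, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),  # probability of class 1
    ])
    # Adam with an adaptive learning rate + binary cross-entropy loss
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model


def predict_labels(model: keras.Model, X, threshold: float = 0.5) -> np.ndarray:
    """Convert predicted probabilities into 0/1 labels."""
    probs = model.predict(np.asarray(X, dtype="float32"), verbose=0)
    return (probs.ravel() >= threshold).astype(int)
```

Training would then be a call such as `model.fit(X_train, y_train, validation_data=(X_test, y_test), epochs=...)` on the encoded arrays, with the epoch count left as a tuning choice.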

Model diagram:

Libraries used:

Keras/TensorFlow for the actual model
Matplotlib for graphing the training results
Scikit-learn for data preprocessing (binary encoding of the class column, one-hot encoding of the categorical feature columns, and training/testing splits)
Pandas for data manipulation (reading the csv, input/output separation, selecting training history data to plot)
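The preprocessing listed above (one-hot encoding of the features, binary encoding of the class) could be sketched with scikit-learn like this. The README does not say which scikit-learn class performed the binary encoding, so `LabelEncoder` is an assumed stand-in; `encode_mushrooms` is a hypothetical helper. The encoders are fit on the training set only, so no information from the test set leaks into training.

```python
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


def encode_mushrooms(X_train, X_test, y_train, y_test):
    """One-hot encode categorical features and map the class labels to 0/1."""
    # handle_unknown="ignore": a category unseen during fitting becomes all zeros
    feature_enc = OneHotEncoder(handle_unknown="ignore")
    X_train_enc = feature_enc.fit_transform(X_train).toarray()  # sparse -> dense
    X_test_enc = feature_enc.transform(X_test).toarray()

    # LabelEncoder sorts classes alphabetically, so "e" -> 0 and "p" -> 1
    label_enc = LabelEncoder()
    y_train_enc = label_enc.fit_transform(y_train)
    y_test_enc = label_enc.transform(y_test)
    return X_train_enc, X_test_enc, y_train_enc, y_test_enc, feature_enc
```

Returning the fitted feature encoder makes it possible to apply the same encoding to new mushrooms at prediction time.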

Resources used:

Keras/TensorFlow docs
Scikit-learn docs
Kaggle's Intro to Deep Learning course

Results:

Best Validation Loss: 0.00008
Best Validation Accuracy: 1.00000
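Values like the ones above could be read off a Keras training history and plotted with pandas and Matplotlib roughly as follows. This is a sketch, not the author's plotting code: `summarize_history` and the output filename are hypothetical, and it assumes the model was compiled with `metrics=["accuracy"]` and given validation data, so the history contains `val_loss` and `val_accuracy`.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: write the figure to a file
import matplotlib.pyplot as plt
import pandas as pd


def summarize_history(history: dict, out_path: str = "training_curves.png"):
    """Return (best val_loss, best val_accuracy) and save the training curves.

    `history` is the dict stored on a Keras History object (history.history).
    """
    df = pd.DataFrame(history)  # one row per epoch, one column per metric
    best_loss = float(df["val_loss"].min())
    best_acc = float(df["val_accuracy"].max())

    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))
    df[["loss", "val_loss"]].plot(ax=ax_loss, title="Loss")
    df[["accuracy", "val_accuracy"]].plot(ax=ax_acc, title="Accuracy")
    for ax in (ax_loss, ax_acc):
        ax.set_xlabel("Epoch")
    fig.tight_layout()
    fig.savefig(Path(out_path))
    plt.close(fig)
    return best_loss, best_acc
```

The minimum loss and maximum accuracy are taken independently, so they may come from different epochs. Formatting them with `f"{best_loss:.5f}"` gives the five-decimal style used above.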
