AMLProject2

Attributes:

Age
Sex
Chest Pain Type
Resting Blood Pressure
Cholesterol
Fasting Blood Sugar
Resting Electrocardiographic Results
Maximum Heart Rate
Exercise Induced Angina
ST Depression Induced By Exercise Relative To Rest
Slope of the peak exercise ST segment
Number of major vessels
thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
Diagnosis of heart disease (predicted)
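
A per-class summary of attributes like those listed above (max, min, mean, median, mode, standard deviation) can be sketched with the Python standard library alone. This is a minimal illustration, not the project's actual script: the rows, the tuple layout `(age, cholesterol, diagnosis)`, and the helper names `summarize` and `per_class_summary` are all assumptions made for the example, and the values are made up.

```python
import statistics
from collections import defaultdict

# Hypothetical rows: (age, cholesterol, diagnosis). Values are illustrative only.
rows = [
    (63, 233, 0),
    (67, 286, 1),
    (67, 229, 1),
    (37, 250, 0),
    (41, 204, 0),
    (56, 236, 1),
]


def summarize(values):
    """Return the six summary statistics for one attribute within one class."""
    return {
        "max": max(values),
        "min": min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # With several equally common values, Python 3.8+ returns the first one seen.
        "mode": statistics.mode(values),
        # Sample standard deviation (divides by n - 1); needs at least two values.
        "std": statistics.stdev(values),
    }


def per_class_summary(rows, attribute_index, label_index=-1):
    """Group one attribute's values by class label, then summarize each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[label_index]].append(row[attribute_index])
    return {label: summarize(values) for label, values in sorted(groups.items())}


age_summary = per_class_summary(rows, attribute_index=0)
for label, stats in age_summary.items():
    print(f"class {label}: {stats}")
```

Comparing the per-class means and spreads of each attribute is one quick way to see which attributes are likely to help separate the classes.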

Final project – Logistic Regression, SVM and Neural Networks

For project 1, you selected a dataset and investigated how a kNN classifier could help classify test samples after training. For project 2, you will continue your exploration with other supervised ML approaches: logistic regression, support vector machines, and neural networks.

This project is to be done in a group. Each student is responsible for contributing to the group, including problem formulation, dataset selection, ML tool implementation, and project presentation.

Requirement:

Source code in Python (done in a group). Your code must run on CS lab machines.
Individual project report (~6 pages + appendices if needed, font size >= 11)

Specific requirements:

Dataset:

Each group needs to pick a new dataset to work on.
The dataset must be interesting and challenging. If a kNN classifier reaches very high accuracy (say 99%) or very low accuracy (<50%), select a different dataset! Very high accuracy means the problem can be solved without any machine learning algorithm; very low accuracy means it is beyond what we have learned in this class.
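
One way to apply the accuracy check above to a candidate dataset is to cross-validate a kNN baseline and compare the result with the two thresholds. The sketch below assumes scikit-learn is available; the synthetic data from `make_classification` only stands in for a real dataset, and the helper names `knn_baseline_accuracy` and `is_suitable`, as well as `k=5` and 5 folds, are choices made for this example.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def knn_baseline_accuracy(X, y, k=5, folds=5):
    """Mean cross-validated accuracy of a kNN classifier on standardized features."""
    model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=k))
    return cross_val_score(model, X, y, cv=folds, scoring="accuracy").mean()


def is_suitable(accuracy, low=0.50, high=0.99):
    """True when the baseline is neither too low (< 50%) nor too high (>= 99%)."""
    return low <= accuracy < high


# Synthetic stand-in for a candidate dataset (assumption: 13 features, 2 classes).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, flip_y=0.1, random_state=0
)

baseline = knn_baseline_accuracy(X, y)
print(f"kNN baseline accuracy: {baseline:.3f}")
print("keep dataset" if is_suitable(baseline) else "select a different dataset")
```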

Your individual report must include:

Abstract - Give a brief presentation of the problem, dataset used, summarize the methods, and outline your results and conclusions.

Introduction - Detailed problem description and background of the dataset. Justify why the dataset is appropriate and worth exploring. Outline the approaches you take to solve the problem.
Statistical summary of your data - For each class, report the max, min, mean, median, mode, and standard deviation. If you used only a subset of attributes, justify why the other attributes were not used. Summarize what the statistics tell you and any insights you obtained from them.
Methods - A brief description of each model: logistic regression, support vector machine (linear kernel), and neural networks. Also state which parameter ranges and neural network architectures you will explore, and why (consider at least 2 different hidden-layer configurations with different numbers of neurons, and 2 different gradient descent solvers). Demonstrate that you have an intuitive understanding of the ML algorithms.
Results - Summary of your classification results, including best set of parameters and architectures, accuracy, and confusion matrices from a) logistic regression, b) SVM, and c) neural nets.
Discussion - Describe and analyze the results. Are the results what you expected? How do the three different models compare? Why is one better or worse than another?
Conclusion of your exploration. Did you solve the problem? How helpful are the ML algorithms in terms of answering your questions? What have you learned?
(Graduate students) Give a more detailed description of each model (logistic regression, support vector machine, and neural nets), and compare SVM with a linear kernel against SVM with a Gaussian kernel. One page per model, so 3 additional pages.
References
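
The Methods and Results items above could be prototyped along the following lines. This is a hedged sketch using scikit-learn, not the group's actual script: the synthetic data stands in for a real dataset, and the specific values (`C=1.0`, hidden layers `(8,)` and `(16, 8)`, solvers `adam` and `sgd`, a 70/30 split) are assumptions picked only to show the shape of the experiment. A Gaussian (RBF) kernel SVM is included for the graduate comparison.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the real dataset (assumption: 13 features, 2 classes).
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, flip_y=0.1, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Candidate models; the hyperparameter values here are illustrative only.
models = {
    "logistic regression": LogisticRegression(C=1.0, max_iter=1000),
    "SVM (linear kernel)": SVC(kernel="linear", C=1.0),
    "SVM (Gaussian kernel)": SVC(kernel="rbf", C=1.0, gamma="scale"),
}

# Two hidden-layer architectures crossed with two gradient-based solvers.
for hidden in [(8,), (16, 8)]:
    for solver in ["adam", "sgd"]:
        models[f"MLP {hidden} / {solver}"] = MLPClassifier(
            hidden_layer_sizes=hidden, solver=solver, max_iter=2000, random_state=0
        )

results = {}
for name, estimator in models.items():
    # Standardize features inside the pipeline so the scaler is fit on training data only.
    pipeline = make_pipeline(StandardScaler(), estimator)
    pipeline.fit(X_train, y_train)
    predictions = pipeline.predict(X_test)
    results[name] = (
        accuracy_score(y_test, predictions),
        confusion_matrix(y_test, predictions),
    )

for name, (accuracy, matrix) in results.items():
    print(f"{name}: accuracy = {accuracy:.3f}")
    print(matrix)  # rows are true classes, columns are predicted classes
```

For a real report, the parameter choices would normally be made with cross-validation on the training set rather than by comparing test-set accuracy directly, so that the reported test accuracy stays an honest estimate.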

Demo:

Your Python script.
Here are some sample datasets:

Flight Delays and Cancellations: https://www.kaggle.com/usdot/flight-delays
Heart Disease Data Set: https://archive.ics.uci.edu/ml/datasets/Heart+Disease
MIT Leukemia cancer dataset: http://portals.broadinstitute.org/cgibin/cancer/publications/pub_paper.cgi?mode=view&paper_id=43

Submission instructions:

This project has multiple due dates, tentatively:

March 9: Dataset selection
March 30: Intro and Stats sections of your individual report (Word or PDF)
April 6: Methods and Results from logistic regression & SVM (Word or PDF)
April 13: Results from neural nets (Word or PDF)
April 20: Discussion & Conclusion (Word or PDF)
April 25: Presentation (PPT, 1 copy per group)
April 25: An electronic copy of your Python scripts (only 1 copy per group is needed)
April 25: Final individual project report (Word or PDF)
