mariagilr/ds-projects

 
 


Data Science Projects

Presentations about data science.

Supervised Learning Projects

  • Detecting_Implicit_Bias_in_Traffic_Stops by Mark Ferguson.

  • Lemons: Predicting whether a Vehicle will be kicked back to the auction by Will Morgan.

  • Predicting the success of cyber-related terrorist attacks by Rebecca Green.

  • Breast cancer survivor models by Rich Gohram.

  • Predicting Disruptive Children (including visualization of PCA on binary variables) by Greg Condit.

  • Predicting churning telco customers by Eve Ben Ezra. Churn, or customer attrition, is the loss of customers. Churn is an area of interest for many industries, since it is often more expensive to bring in a new customer than to retain one. Using the popular Telco Customer Churn dataset from Kaggle, I hope to explore the data and determine which features might cause a customer to leave, and whether a combination of features might make a customer "high risk" for leaving the company.

  • Santander Bank Customer Transaction Prediction by Fred Etter. Santander Bank is trying to predict whether a customer will make a specific transaction in the future. Anonymized data with 200,000 rows and 200 columns was provided through Kaggle. Multiple supervised learning algorithms were tested and evaluated to determine the best method and produce an accuracy metric.

  • Safe Driver Prediction for Automobile Insurance by Murali Mandayam. Correctly classifying a driver during underwriting is an important aspect of automobile insurance. All the supervised learning algorithms I used classify a driver as 1, to indicate a safe driver, or 0, to indicate that the driver's information needs a review prior to issuing a policy.

  • Digit Recognizer by Slava Sablin. A straightforward approach to testing some basic models and their combinations on a classic machine learning problem. The goal is to correctly identify digits from the MNIST ("Modified National Institute of Standards and Technology") dataset of tens of thousands of handwritten images.

  • Heart Disease Prediction by Valentin Fehr. A look into the UCI heart disease dataset. Predicting heart disease using supervised learning, with a focus on feature selection using SHAP and the real-life cost of acquiring those features.

  • Predicting Divorce by Helen Skinner. Is it possible to predict whether an individual has ever been divorced based on their demographic traits? This supervised learning project tests five different algorithms to find out.

  • Predicting Forest Fire Causes by Matt Francsis. Human-caused fires account for between 43% and 59% of all wildfires in the western US. While wildfires can be beneficial to the ecosystem, they also pose serious threats to lives, property, and infrastructure. Predicting the cause of forest fires can help investigators bring arsonists to justice and act as a catalyst for fire abatement strategies. This talk will discuss supervised learning modeling techniques for this large, imbalanced, multi-class problem.
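For imbalanced multi-class problems like the forest fire project above, one common starting point is to weight each class inversely to its frequency. The sketch below is illustrative only and is not taken from any of the presentations; the function name and the sample labels are hypothetical. The formula matches the "balanced" heuristic used by scikit-learn, weight = n_samples / (n_classes * class_count).

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get weights above 1, common classes get weights below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical cause labels, invented for illustration only.
sample_labels = ["lightning"] * 6 + ["debris burning"] * 3 + ["arson"] * 1
weights = balanced_class_weights(sample_labels)

for cause, weight in sorted(weights.items(), key=lambda item: item[1]):
    print(f"{cause}: {weight:.3f}")
```

Weights like these can then be passed to a classifier that accepts per-class weights, so that errors on rare causes count more during training.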

Unsupervised learning report

  • Math lectures Part 1 by William Morgan. Combines NLP with supervised and unsupervised learning to classify math lectures.

Final capstone

  • Predicting Life Expectancy by Country by Trent Casillas. Using linear regression, mixed effect models, and clustering to predict and determine important factors for a country's life expectancy average.

  • Cover to Cover: A (not so) Novel Approach to Book Recommendations by Mark Espina. The saying goes "Don't judge a book by its cover." But why? Anyone who shops at a local bookstore is definitely paying attention to the covers. And from personal experience, the cover is a key determinant of whether I end up purchasing a book. First, I will discuss the pros and cons of applying convolutional neural nets to image classification, attempting to predict genre labels. In the second half, I will explore the application of feature extraction with similarity models as the basis for a content-based image retrieval system, Cover-to-Cover.

  • Using machine learning to cluster and classify math lectures by Will Morgan.

  • Capstone_2016_us_elections by Emile Badran. In this capstone project, I process tweets from the leading Democratic (Hillary Clinton) and Republican (Donald Trump) candidates and key 2016 US election hashtags. I apply Natural Language Processing and Network Analysis techniques to find the key topics and the most influential actors that guided the public debate.

  • DNA Sequence detection with Genetically trained weights by Chistopher Sanchez

  • Assessing Gender Bias in Tech Job Descriptions by Tiffany French. After reading a report and infographic from the World Economic Forum about gender inequity in AI positions, I designed this project to use NLP techniques to assess bias in job descriptions that could ultimately contribute to the gender inequity we see in hiring. I used web scraping techniques, LDA and pyLDAvis, as well as supervised techniques to gain understanding and identify future areas of research.

  • PyTrader: Algorithmic Trading and Time Series Predictions Using LSTM by Sohaib Khuram. After exploring the capabilities of time series models through traditional ARIMA methods and LSTM neural networks, I decided to use these models to predict stock price direction and implement algorithmic trading strategies to see how accurate the results are. Using four separate strategies based on technical indicators, I was able to create an accurate model using LSTM that closely replicated trade signals around the original data. The strategies were then backtested on Quantopian to see how they performed on historical data.

  • Unmasking the Face: Emotion detection using Machine Learning by Maria Gil Rodriguez. Humans express six basic emotions: happiness, anger, surprise, sadness, fear, and disgust. A model that accurately classifies these emotions can be useful in a variety of areas such as image processing, cybersecurity, robotics, psychological studies, and virtual reality. The main objective of this capstone is to create models that categorize faces, based on the emotion shown in the expression, into one of seven categories (the six basic emotions plus one category for neutral).

  • The Twelve Pack: Classifying Beer by Images of Label by Kyle Knoebel. The Twelve Pack is a concept program that aims to classify beers from pictures of the label using TensorFlow/Keras. Images of twelve beers were collected using an iPhone 8 Plus and fed into a Keras model using the flow_from_dataframe method of the Keras ImageDataGenerator. The final objective of this program is to classify beers from images of the label and connect to an API that provides more information about the beer, such as ingredients, flavor profile, and locations where the beer is sold.

Coursework repositories

  • Please fork this repo, add a link, and make a pull request to add your repo here.

About

Data science portfolio examples.
