
Portfolio of my data science projects from academic work, self-learning, and hobbies.

Shantanu-Gupta-au16/Data-Science-Portfolio


Data-Science-Portfolio 🔥


Contents

  • R

    1. Data Visualization: Corruption and Human Development: The purpose of this project is to explore, through data visualization, the relationship between corruption and human development across nations, based on the UN Human Development Report. The main output is a scatter plot of each country's 'Human Development Index' against its 'Corruption Perceptions Index'.

    2. Visualizing Inequalities in Life Expectancy: Do women live longer than men? By how much? Does this happen everywhere? Is life expectancy increasing, and is it increasing everywhere? Which country has the lowest life expectancy, and which has the highest? In this project, I answer these questions by manipulating and visualizing United Nations life expectancy data with ggplot2. The dataset contains the average life expectancies of men and women by country (in years) and covers four periods: 1985-1990, 1990-1995, 1995-2000, and 2000-2005.

    3. Rise and Fall of Programming Languages: How can you tell which programming languages and technologies are used by the most people? Which languages are growing and which are shrinking, so that you can tell which are most worth investing time in? One excellent source of data is Stack Overflow, a programming question-and-answer site with more than 16 million questions on programming topics. Counting the questions about each technology gives an approximate sense of how many people use it. In this project, you'll use open data from the Stack Exchange Data Explorer to examine how the relative popularity of languages like R, Python, Java, and JavaScript has changed over time.

    4. Degrees that pay you back: Wondering if that Philosophy major will really help you pay the bills? Think you're set with an Engineering degree? Whether you're in school or navigating the postgrad world, this project guides you through the short- and long-term financial implications of this major decision. After some data cleanup, you'll compare the recommendations of three different methods for determining the optimal number of clusters, apply a k-means cluster analysis, and visualize the results.

    5. A Visual History of Nobel Prize Winners: The Nobel Prize is perhaps the world's best-known scientific award. Every year it is given to scientists and scholars in chemistry, literature, physics, medicine, economics, and peace. The first Nobel Prize was handed out in 1901, and at that time the prize was Eurocentric and male-focused, but nowadays it's not biased in any way. Surely, right? Well, let's find out! In this project, you get to explore patterns and trends in over 100 years' worth of Nobel Prize winners. What characteristics do the prize winners have? Which country gets it most often? Has anybody won it twice? It's up to you to figure this out.

  • Python

    1. Telecom Customer Churn: Customer churn, also known as customer attrition, occurs when customers or subscribers stop doing business with a company or service; it is also referred to as loss of clients or customers. Churn rates are particularly useful in the telecommunications industry, because most customers have multiple providers to choose from within a geographic location. Data source: the dataset is available on Kaggle, and the notebook can be opened directly in Google Colaboratory. By building a model that predicts customer churn with logistic regression, we can ideally nip the problem of unsatisfied customers in the bud and keep the revenue flowing.

    2. Directing Customers to Subscription Through App Behavior Analysis: In today’s market, many companies have a mobile presence. Often, these companies provide free products or services in their mobile apps in an attempt to convert their customers to a paid membership. Examples of paid products that originate from free ones include YouTube Red, Pandora Premium, Audible subscriptions, and You Need a Budget. Since marketing efforts are never free, these companies need to know exactly whom to target with offers and promotions.

    3. Minimizing Churn Rate Through Analysis of financial habits: Subscription products are often the main source of revenue for companies across all industries. These products can come as a ‘one size fits all’ all-encompassing subscription or as multi-level memberships. Regardless of how they structure their memberships, or what industry they are in, companies almost always try to minimize customer churn (a.k.a. subscription cancellations). To retain their customers, these companies first need to identify the behavioural patterns that act as a catalyst for disengagement from the product.

    4. Car Price Prediction: Predicts car prices using Ridge and Lasso regression. The dataset for this project was obtained from the UCI Machine Learning Repository.

    5. Customer Segmentation using RFM analysis: Python code that uses the RFM (Recency, Frequency, Monetary) model to segment customers. You can use it to perform RFM analysis and segment customers based on their purchase history.

    6. Movie Recommendations using Recommender Systems: Recommender systems suggest items such as movies or songs to users based on their interests. This micro project builds a recommendation system that makes movie recommendations based on the similarity of user reviews.

    7. Minimizing Churn Rate through analysis of financial habits: Developed a machine learning model using a Random Forest classifier on the financial habits of the customers in the bank database. After feature selection and hyperparameter tuning, the model reached an accuracy of 79.83%.

    8. Decline in Viewership in a Digital Media Company: A digital media company (similar to Voot, Hotstar, or Netflix) launched a show. Initially the show got a good response, with a high TRP (television rating point), but then its viewership declined. This real-life case study asks what caused the decline and what action the company can take to fix it. It is framed as a multiple regression problem: build a model that identifies which factors (columns) affect viewership and predicts future views.

    9. 911 calls: Exploratory Data Analysis of the 911 calls dataset hosted on Kaggle. Demonstrates extraction of useful features from different variables.

    10. Predicting the likelihood of E-signing a loan based on financial history: Lending companies work by analyzing the financial history of their loan applicants and deciding whether an applicant is too risky to be given a loan. If not, the company then determines the terms of the loan. Companies can acquire applicants organically through their websites and apps, often with the help of advertising campaigns. Other times, lending companies partner with peer-to-peer (P2P) lending marketplaces to acquire leads of possible applicants; example marketplaces include Upstart, LendingTree, and Lending Club. In this project, we assess the quality of the leads our company receives from these marketplaces.

  • Deep Learning

    1. Fashion-Class-Classification-using-MNIST-dataset: Training machine learning models on the Fashion MNIST dataset. The full write-up is in the article "Image Recognition for Fashion with Machine Learning".

    2. MNIST Using PCA: The global fashion industry is valued at three trillion dollars and accounts for 2 percent of the world's GDP. The industry is undergoing a dramatic transformation by adopting new computer vision, machine learning, and deep learning techniques.

    3. Deep Learning for Time Series: Americans are driving more than ever before. Predicted and plotted future traffic trends using RNN and LSTM deep learning models.

  • Natural Language Processing

    1. Named Entity Recognition: Named entity recognition (NER), also known as entity extraction, is probably the first step towards information extraction. It locates named entities in text and classifies them into predefined categories such as persons, organizations, companies, locations, cities, dates, times, quantities, monetary values, percentages, and product terminology. It adds a wealth of semantic knowledge to your content and helps you quickly understand the subject of any given text.

    2. Part of Speech Assessment:

    3. Text Classification:

    4. Text Generation with Neural Networks:

  • Time Series

    1. Mauna Loa Atmospheric CO2 Concentration Forecasting using SARIMA: Models the trend and seasonal variation of atmospheric CO2 concentrations (measured in parts per million) derived from air samples collected at Mauna Loa Observatory, Hawaii.

    2. Miles Travelled using ARIMA Model: Americans are driving more than ever before. Forecasts and plots future traffic trends (miles travelled) with an ARIMA model.

    3. Avocado Price Prediction: Predicts avocado prices using a Kaggle dataset.
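Degrees that pay you back compares methods for choosing the number of clusters before running k-means. That project is written in R and its code is not reproduced here. The following is a minimal standard-library Python sketch of Lloyd's k-means plus the within-cluster sum of squares (WCSS) that the elbow method plots against k. The elbow method is only one common heuristic (this README does not say which three methods the project compares), and the salary pairs are hypothetical.

```python
import math

def kmeans(points, k, iters=100):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    # Start from the first point, then repeatedly add the point farthest
    # from every centroid chosen so far (no randomness, so runs repeat).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[j])  # leave an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return labels, centroids

def wcss(points, labels, centroids):
    """Within-cluster sum of squared distances; the elbow method plots this against k."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical (starting, mid-career) median salaries, in $1000s.
salaries = [(40, 70), (42, 72), (41, 75), (60, 110), (62, 115), (61, 108), (80, 150), (82, 155)]
for k in range(1, 5):
    labels, centroids = kmeans(salaries, k)
    print(k, round(wcss(salaries, labels, centroids), 1))
```

WCSS typically keeps falling as k grows. The "elbow" is the k after which it stops falling steeply, here k = 3 for the three salary groups.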
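The Telecom Customer Churn project predicts churn with logistic regression. The notebook's own code is not reproduced here. As an illustration of the technique only, this standard-library sketch fits a logistic regression by batch gradient descent on the mean log-loss. The two features (tenure in years, monthly charge in hundreds of dollars) and every data value are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.2, epochs=5000):
    """Batch gradient descent on the mean log-loss (no regularisation)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # The log-loss gradient w.r.t. the linear score is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def churn_probability(w, b, x):
    """Predicted probability that a customer with features x churns."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical customers: (tenure in years, monthly charge in $100s); 1 = churned.
X = [(0.5, 0.9), (1.0, 0.8), (0.2, 0.95), (0.8, 0.85), (5.0, 0.3), (6.0, 0.5), (4.0, 0.4), (5.5, 0.2)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
w, b = fit_logistic(X, y)
print([round(churn_probability(w, b, x), 2) for x in X])
```

A customer is flagged as likely to churn when the predicted probability exceeds a chosen threshold such as 0.5. In practice the threshold is tuned to balance missed churners against wasted retention offers.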
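The Car Price Prediction project uses Ridge and Lasso regression. The clearest way to see how the two penalties differ is the special case of an orthonormal design matrix (XᵀX = I), where both have closed-form solutions in terms of the ordinary least-squares (OLS) coefficients. This sketch assumes the objectives ½‖y − Xβ‖² + (λ/2)‖β‖² for ridge and ½‖y − Xβ‖² + λ‖β‖₁ for lasso. Real car-price features are not orthonormal, so a real model needs a numerical solver, and the OLS coefficients below are hypothetical.

```python
import math

def ridge_orthonormal(beta_ols, lam):
    """Ridge with X^T X = I: every coefficient is shrunk by the factor 1 / (1 + lam)."""
    return [b / (1.0 + lam) for b in beta_ols]

def lasso_orthonormal(beta_ols, lam):
    """Lasso with X^T X = I: soft-thresholding, so any |b| <= lam becomes exactly 0."""
    return [math.copysign(abs(b) - lam, b) if abs(b) > lam else 0.0 for b in beta_ols]

# Hypothetical OLS coefficients for three standardized car features.
beta_ols = [3.0, -0.5, 1.2]
print(ridge_orthonormal(beta_ols, 1.0))
print(lasso_orthonormal(beta_ols, 1.0))
```

Ridge keeps all three features with smaller weights, while lasso sets the weakest one to exactly zero. This is why lasso also works as a feature-selection method.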
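RFM analysis scores each customer on Recency (days since the last purchase), Frequency (number of purchases), and Monetary value (total spend). The project's code is not shown here. This standard-library sketch assumes transactions arrive as hypothetical (customer_id, order_date, amount) tuples and uses simple rank-based scores from 1 to 3. Real analyses often use quartiles or quintiles instead.

```python
from collections import defaultdict
from datetime import date

def rfm_table(transactions, snapshot):
    """Aggregate (customer_id, order_date, amount) rows into (recency_days, frequency, monetary)."""
    last_order, frequency, monetary = {}, defaultdict(int), defaultdict(float)
    for cid, order_date, amount in transactions:
        last_order[cid] = max(last_order.get(cid, order_date), order_date)
        frequency[cid] += 1
        monetary[cid] += amount
    return {cid: ((snapshot - last_order[cid]).days, frequency[cid], monetary[cid])
            for cid in last_order}

def rank_scores(values, higher_is_better, bins=3):
    """Score each customer 1..bins by rank (best gets `bins`); ties keep input order."""
    ranked = sorted(values, key=values.get, reverse=not higher_is_better)
    n = len(ranked)
    return {cid: 1 + (i * bins) // n for i, cid in enumerate(ranked)}

def rfm_segments(transactions, snapshot, bins=3):
    """Concatenate R, F and M scores into a segment code such as '333'."""
    table = rfm_table(transactions, snapshot)
    r = rank_scores({c: v[0] for c, v in table.items()}, higher_is_better=False, bins=bins)
    f = rank_scores({c: v[1] for c, v in table.items()}, higher_is_better=True, bins=bins)
    m = rank_scores({c: v[2] for c, v in table.items()}, higher_is_better=True, bins=bins)
    return {c: f"{r[c]}{f[c]}{m[c]}" for c in table}

# Hypothetical purchase history.
tx = [
    ("A", date(2024, 1, 5), 120.0), ("A", date(2024, 3, 20), 80.0), ("A", date(2024, 3, 28), 200.0),
    ("B", date(2023, 11, 2), 40.0),
    ("C", date(2024, 2, 14), 60.0), ("C", date(2024, 3, 1), 90.0),
]
print(rfm_segments(tx, snapshot=date(2024, 4, 1)))
```

Recency is the only metric where a lower value is better, which is why it is ranked in reverse. A segment code of "333" marks the most recent, most frequent, and highest-spending customers.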
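The movie recommender bases its recommendations on user review similarities. One common way to do that is user-based collaborative filtering, which is sketched here as an assumption rather than as the project's exact method. It computes the cosine similarity between users' rating vectors (unrated movies count as 0) and scores each unseen movie by the similarity-weighted average of other users' ratings. The users, movies, and ratings are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse rating dicts; missing ratings count as 0."""
    dot = sum(u[m] * v[m] for m in u.keys() & v.keys())
    norm_u = math.sqrt(sum(r * r for r in u.values()))
    norm_v = math.sqrt(sum(r * r for r in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend(ratings, user, top_n=2):
    """Rank movies `user` hasn't rated by similarity-weighted average rating."""
    totals, weights = {}, {}
    for other, their in ratings.items():
        if other == user:
            continue
        sim = cosine(ratings[user], their)
        if sim <= 0:
            continue  # skip users with no overlapping ratings
        for movie, rating in their.items():
            if movie not in ratings[user]:
                totals[movie] = totals.get(movie, 0.0) + sim * rating
                weights[movie] = weights.get(movie, 0.0) + sim
    ranked = sorted(totals, key=lambda m: totals[m] / weights[m], reverse=True)
    return ranked[:top_n]

# Hypothetical 1-5 star ratings.
ratings = {
    "ana": {"Alien": 5, "Heat": 4, "Up": 1},
    "ben": {"Alien": 5, "Heat": 5, "Up": 1, "Ronin": 5},
    "cat": {"Up": 5, "Coco": 5, "Alien": 1},
    "dan": {"Alien": 4, "Heat": 4, "Ronin": 4, "Coco": 1},
}
print(recommend(ratings, "ana"))
```

Because the score is a weighted average, a movie rated by a single weakly similar user can still rank highly. Production systems often add a minimum-support threshold or mean-centre the ratings.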
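The 911 calls EDA extracts useful features from existing variables. The field names and formats below are assumptions about the Kaggle dataset, not details taken from this README. The sketch assumes a `title` field such as "EMS: BACK PAINS/INJURY", where the text before the colon is the call category, and a `timeStamp` string in `YYYY-MM-DD HH:MM:SS` form. Under those assumptions, this standard-library sketch derives a call reason plus hour, month, and weekday features. The output keys are hypothetical names.

```python
from datetime import datetime

WEEKDAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

def extract_features(record):
    """Derive categorical and time-based features from one (assumed-format) 911 call record."""
    reason, _, detail = record["title"].partition(":")
    ts = datetime.strptime(record["timeStamp"], "%Y-%m-%d %H:%M:%S")
    return {
        "reason": reason.strip(),               # text before the colon
        "detail": detail.strip(),               # text after the colon
        "hour": ts.hour,
        "month": ts.month,
        "day_of_week": WEEKDAYS[ts.weekday()],  # locale-independent, unlike strftime("%a")
    }

call = {"title": "EMS: BACK PAINS/INJURY", "timeStamp": "2015-12-10 17:40:00"}
print(extract_features(call))
```

Applied to every row, these derived columns support the usual EDA questions, such as which call reason is most common or at which hours calls peak.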
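A SARIMA(p, d, q)(P, D, Q)s model handles trend and seasonality partly by differencing. It applies d regular differences and D seasonal differences at lag s (s = 12 for monthly data such as the Mauna Loa CO2 series). The model fitting itself is usually delegated to a library (for example statsmodels' `SARIMAX`), and the project's code is not shown here. This standard-library sketch only illustrates the differencing step and its inverse. It uses a synthetic monthly series with a linear trend and a repeating 12-month pattern, and all values are hypothetical.

```python
def difference(series, lag=1):
    """Return y[t] - y[t - lag]; the first `lag` observations are consumed."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def undifference(diffs, history, lag=1):
    """Invert difference() given the first `lag` original observations."""
    restored = list(history[:lag])
    for d in diffs:
        restored.append(restored[-lag] + d)
    return restored

# Synthetic monthly series: trend of +1 per month plus a fixed seasonal shape.
pattern = [0, 1, 2, 3, 2, 1, 0, -1, -2, -3, -2, -1]
series = [300 + t + pattern[t % 12] for t in range(36)]

seasonal = difference(series, lag=12)  # D = 1, s = 12: removes the seasonal shape
both = difference(seasonal, lag=1)     # d = 1: removes the remaining trend
print(seasonal[:3], both[:3])
```

After one seasonal and one regular difference, the synthetic series becomes constant. This illustrates the stationarity that the ARMA part of the model assumes. Forecasts made on the differenced scale are then mapped back with the inverse transform.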

License

MIT

Help

If you find any mistakes or can't figure something out, raise a question and I will get back to you as soon as possible. If you liked what you saw, or want to chat with me about the portfolio, work opportunities, or collaboration, shoot an email to 📧 shantanu97@gmail.com. More information about me: LinkedIn 🔎