Data Science Portfolio

This portfolio contains my data science projects, done for academic and self-learning purposes.

They are written in either Python or R.

Data Analysis and Visualization

San Francisco City Data Analysis

In this notebook, I dig into an SQLite database containing data on crime, parking, schools, housing, and more, and draw insights about housing choices from maps built with the Matplotlib Basemap toolkit.

Natural Language Processing

Open-Source Python package steam-review-scraper

(PyPI page; GitHub page)

Inspired by my earlier Steam review analysis of the game No Man's Sky, I decided to build a package that downloads the reviews of any game on Steam given its game id. The package provides the following functionality:

get game id from a given game name
get a list of n game ids
get the total number of reviews for a game
get all reviews of a game, including attributes such as review content, post date, recommendation, and helpfulness

Web Scraping and Sentiment Analysis for Steam reviews of No Man's Sky

(Jupyter: Data Collection; Analysis)

No Man's Sky received a flood of negative reviews at launch because it failed to deliver the features it had promised to players. The situation has changed after several big updates. This notebook collects all the English Steam reviews of this indie game and explores how the game has improved. Finally, a model is trained for sentiment analysis.

Machine Learning

Home Credit Default Risk Challenge

(Jupyter)

This is a former Kaggle competition on predicting applicants' ability to repay loans. In this notebook, I do not compete. Instead, I read the great kernels from the competition and try to learn from the top competitors how to gain insights from massive data and do feature engineering.

Time Series

Nintendo Stock Price Analysis

(R Markdown)

Time series analysis of the log returns of Nintendo's stock price using ARIMA, POMP, and GARCH models.
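For readers unfamiliar with log returns, here is a minimal Python sketch of the transformation, r_t = ln(P_t / P_{t-1}). The project itself is written in R Markdown; the function name `log_returns` and the prices below are made up for illustration and are not taken from the Nintendo data.

```python
import math


def log_returns(prices):
    """Return r_t = ln(P_t / P_{t-1}) for each pair of consecutive prices."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be positive")
    return [math.log(curr / prev) for prev, curr in zip(prices, prices[1:])]


# Made-up closing prices, for illustration only (not Nintendo data).
closes = [100.0, 102.0, 101.0, 105.0]
returns = log_returns(closes)
print(returns)
```

One convenient property is that log returns add up over time: their sum equals the log return over the whole period, ln(105 / 100) in this example.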

Statistical Test

NBA Player Salary Analysis

(R Markdown)

In this project, I use players' salaries and regular-season stats from the 2017-2018 season to explore the most important factors affecting player salaries with a linear regression model. Using simulated data and a Monte Carlo approach, I choose between the nonparametric bootstrap and the parametric bootstrap based on their power in the hypothesis test of whether the coefficients equal zero, and use the preferred one to construct confidence intervals for the coefficients.
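As a rough illustration of the nonparametric bootstrap mentioned above, the following Python sketch builds a percentile confidence interval for a single regression slope. The project itself is in R Markdown; resampling (x, y) pairs, the percentile interval, the function names, and the simulated data are all assumptions made for this example, not necessarily what the project does.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least-squares slope of y on x for a single predictor."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pairs_bootstrap_ci(xs, ys, n_boot=1000, alpha=0.05, seed=0):
    """Percentile CI for the slope, resampling (x, y) pairs with replacement."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    slopes.sort()
    lo = slopes[int((alpha / 2) * n_boot)]
    hi = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Simulated data (hypothetical): true intercept 1.0, true slope 2.0, unit noise.
data_rng = random.Random(42)
xs = [data_rng.uniform(0, 10) for _ in range(200)]
ys = [1.0 + 2.0 * x + data_rng.gauss(0, 1) for x in xs]

lo, hi = pairs_bootstrap_ci(xs, ys)
print(f"95% bootstrap CI for the slope: ({lo:.3f}, {hi:.3f})")
```

An interval that excludes zero corresponds to rejecting the hypothesis that the coefficient equals zero. A parametric bootstrap would instead fit the model once and simulate new responses from the fitted error distribution.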
