Welcome to my portfolio!
Listed here are all of the projects I've tried my hand at. Check them out!
Mostly data-related things
Project | Completion Date | Tools | Description |
---|---|---|---|
Coursera Data Warehouse for Business Intelligence Capstone | June 2023 | Postgres, dbt, Looker | Completed the final part of Coursera's Data Warehouse for Business Intelligence Specialization: assignments and case studies involving Kimball-style modeling, integration, query problems, and dashboard building |
Fluffy Folks Dashboard | April 2023 | GraphQL API, sqlite, dbt, Streamlit, Airflow, Google Sheets | End-to-end pipeline for my online circle's anime- and manga-related statistics. Initially built to replace the painful manual tracking and curation that a friend of mine did weekly. It includes a sheet and a dashboard that give users access to 11 kinds of statistics. Sheets available online here |
Anime and Manga Recommendation API | March 2023 | FastAPI, Docker | Deploys the recommendation model and association rules I've created behind an API, applying best practices I've learned from various resources on using Docker for a Python deployment |
Caltech Data Engineering Exercises | January 2023 | Postgres, MongoDB | Working through Caltech Data Engineering course exercises and case studies alongside the materials given |
Project | Completion Date | Tools | Description |
---|---|---|---|
Hackernews Topic Modeling | November 2022 | SQL, asyncio, pandas, BERTopic | Extracts the Hacker News public dataset served in BigQuery, performs a data cleaning routine, and builds a topic model from the data. Curates personal browsing history to build a user profile from the topic model that can be used for content-based recommendation |
MAL Favorites Association Rules | February 2022 | pyspark | Uses Spark's FP-Growth algorithm to mine users' favourites data and discover patterns and rules of favourites among anime, manga, characters, and staff |
MAL Recommendation Model | November 2021 | pandas, Matplotlib, PyTorch, Spotlight | Builds a matrix factorization model that can serve as the engine of a recommendation system for anime and manga. Trained on users' ratings data from MyAnimeList |
Project | Completion Date | Tools | Description |
---|---|---|---|
Stratascratch Business Analysis | June 2023 | pyspark, pandas, Matplotlib, H3, folium | A data project (case study) analyzing the reasons for failures at a corporate transportation management company. Answers the business questions given in the task and completes the bonus task of visualizing on a map where most failures happen |
Tab Session Manager Deduplicator | October 2022 | pandas | Data cleaning and record deduplication of my Tab Session Manager sessions, along with a little bit of fun exploration |
Skripsi Kawanku | 2021 ~ | requests, bs4, Scrapy, Selenium, Tweepy, pandas, NLTK, scikit-learn, Matplotlib, seaborn | Helping some of my friends work through their bachelor's theses; I get involved in all phases of the data science lifecycle across many projects |
MAL Scraper and EDA | 2021 | requests, bs4, pandas, Matplotlib | Collects users' profile information and ratings from MyAnimeList, runs deduplication, cleans the data to find sources of errors, and performs EDA on the dataset |
Project | Completion Date | Tools | Description |
---|---|---|---|
SQL Interview Preps | June 2023 ~ | SQL | Working through SQL exercises on various coding platforms (HackerRank, LeetCode, StrataScratch), plus other exercises I could find |
Scala Forth Exercism | March 2023 | Scala | Implementing a simple subset of Forth, a stack-oriented programming language |
Scala Advent of Code | December 2022 | Scala | Working through Advent of Code 2022 problems (only made it through 14 days) |
Non-data-related things!
- Fixing Vim Vixen extension bug (2022)
- Testing Notes (2021)
- React and Firebase E-Voting App (2019)
- Plant Identification using AutoML (2019)
- Personal shell scripts repository (2019 ~)
This portfolio is inspired by Katie Huang's Portfolio