MuhammadAhmedSuhail/DataScience-AI-ML-Portfolio


DataScience-AI-ML Portfolio

This is a collection of projects related to Data Science, Machine Learning, and AI, done for academic and self-learning purposes. It is updated on a regular basis.

Projects

Exploratory Data Analysis (EDA):

  • EDA on Pakistan Floods: This project performs exploratory data analysis (EDA) on the floods in Pakistan. The goal is to scrape flood data, extract insights and patterns from it, and create meaningful visualizations to better understand the situation. The project involves data scraping, preprocessing, and visualization using Python libraries such as Selenium, Matplotlib, and Seaborn.
  • EDA on DBLP using Python and MongoDB: This project performs exploratory data analysis (EDA) on the DBLP computer science bibliography. By analyzing bibliographic information from major computer science journals and proceedings, I aim to gain insight into trends and patterns in computer science research over time. I use MongoDB to store and manage the data, and pandas, Matplotlib, and Seaborn to generate visualizations.
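
A minimal sketch of the kind of aggregation behind the DBLP analysis — counting publications per year. The field names here are illustrative, not DBLP's actual schema; in the real project this grouping would be a MongoDB aggregation pipeline rather than plain Python:

```python
from collections import Counter

def publications_per_year(records):
    """Count how many bibliographic records fall in each year."""
    return Counter(r["year"] for r in records if "year" in r)

# Tiny illustrative sample (not real DBLP data).
sample = [
    {"title": "A", "year": 2019},
    {"title": "B", "year": 2020},
    {"title": "C", "year": 2020},
    {"title": "D"},  # records missing a year are skipped
]
counts = publications_per_year(sample)
print(counts[2020])  # 2
```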

Web Scraping:

  • WebScraping Ecommerce Website: This project is a web scraping tool that extracts data from the e-commerce website Daraz.pk. The tool can scrape the reviews of a single product page, search for products by keyword and return the top 80 results, and extract products on flash sale, returning each product's name, price, discounted price, top 3 reviews, and rating.
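
The extraction step of such a scraper can be sketched with the standard-library HTML parser; the tag and class names below are hypothetical, not Daraz.pk's actual markup (the real tool drives the live site with Selenium):

```python
from html.parser import HTMLParser

class ProductParser(HTMLParser):
    """Collect the text inside <span class="product-name"> tags (illustrative markup)."""
    def __init__(self):
        super().__init__()
        self.in_name = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "product-name") in attrs:
            self.in_name = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_name = False

    def handle_data(self, data):
        if self.in_name:
            self.names.append(data.strip())

html = '<div><span class="product-name">USB Cable</span><span class="price">Rs. 250</span></div>'
parser = ProductParser()
parser.feed(html)
print(parser.names)  # ['USB Cable']
```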

Data Analysis:

  • Weather Data Analysis using Python: This project performs a comprehensive analysis of weather data using Python. Weather data provides information about temperature, humidity, wind speed, and other atmospheric conditions. The project also addresses the common challenge of missing data in data analysis. The dataset used is "Weather Records.xlsx", which contains 13 columns, including a prediction column for precipitation (in).
  • Customer Churn Analysis: Understanding customers is critical for retailers to stay competitive and profitable: retail data grows in variety, volume, velocity, and value each year, and retailers who integrate and analyze it can improve the quality of their decision-making and increase profits. This project analyzes a dataset provided by a food store in Pakistan, covering September 17 to October 26, 2014 (roughly six weeks), to understand customer churn behavior. The dataset has only three columns, and each row represents a customer's purchases on a given day.
  • Retail Analysis using R: This project uses R to analyze the same six-week food-store dataset (September 17 to October 26, 2014) described above, studying customers' purchasing and churn behavior. The dataset has only three columns, and each row represents a customer's purchases on a given day.
  • Admissions Analysis using R: This project uses R to analyze the actual data of NUCES 2018 admissions. Each row corresponds to one student. The dataset includes various features such as HSSC marks, Matric marks, city of residence, and more. The goal of this project is to answer several questions about the dataset and gain insights into the admissions process.
  • Relation Analysis R: This project analyzes university admissions using the R programming language, testing the relationships between variables such as SSC Marks Obtained, NU_BS_Test Marks, and HSSC Marks Obtained. It also involves creating residual plots and calculating R² values to confirm the results obtained.
  • Near Real-Time DataWarehouse Analysis: The objective of this project is to design, implement, and analyze a near-real-time data warehouse (DW) prototype for METRO Shopping Store in Pakistan. As one of the biggest superstore chains in Pakistan, METRO has thousands of customers, and it is important for the store to analyze its customers' shopping behavior in real time so it can optimize selling strategies, such as offering promotions on different products.
  • MapReduce DBLP using Hadoop Framework: This project implements a MapReduce algorithm in Java to find the number of articles published in each journal per year from the DBLP articles dataset. The input to the MapReduce program is an XML file containing bibliographic information about articles, including the journal/book title, authors, year, and more. The algorithm consists of two stages: the map stage and the reduce stage.
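
For the weather project's missing-data challenge, one common remedy is linear interpolation between neighbouring readings. A stdlib sketch of the idea (in practice pandas' `Series.interpolate` does this in one call); leading or trailing gaps with no known neighbour on both sides are left untouched:

```python
def interpolate_gaps(values):
    """Fill None gaps by linear interpolation between the nearest known neighbours."""
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            t = (i - left) / span  # fractional position inside the gap
            filled[i] = filled[left] + t * (filled[right] - filled[left])
    return filled

temps = [20.0, None, None, 26.0, 27.0]
print(interpolate_gaps(temps))  # [20.0, 22.0, 24.0, 26.0, 27.0]
```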
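
The two stages of the journal-count job can be sketched in miniature. In the actual project this runs as Java on Hadoop over the DBLP XML; the record fields here are simplified stand-ins:

```python
from collections import defaultdict

def map_stage(article):
    """Map: emit a ((journal, year), 1) pair for one parsed article record."""
    return ((article["journal"], article["year"]), 1)

def reduce_stage(pairs):
    """Reduce: sum the counts for each (journal, year) key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)

articles = [
    {"journal": "TOCS", "year": 2001},
    {"journal": "TOCS", "year": 2001},
    {"journal": "CACM", "year": 2001},
]
result = reduce_stage(map_stage(a) for a in articles)
print(result[("TOCS", 2001)])  # 2
```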

Natural Language Processing (NLP):

  • Text Normalization NLTK: This project implements text normalization using the Natural Language Toolkit (NLTK) library. Text normalization is the process of converting text into a canonical (standard) form. The project uses normalization techniques such as stemming, lemmatization, and stop-word removal to preprocess text data, and implements specific rules to normalize certain types of text such as URLs, dates, and phone numbers. The language of the text is Roman Urdu.
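
The rule-based part of such normalization (URLs, phone numbers) can be illustrated with regular expressions. The patterns below are simplified stand-ins for the project's actual rules, and the stemming/lemmatization/stop-word steps would come from NLTK:

```python
import re

def normalize(text):
    """Lowercase the text and replace URLs and phone-like numbers with placeholder tokens."""
    text = text.lower()
    text = re.sub(r"https?://\S+", "<URL>", text)
    text = re.sub(r"\b\d{4}-\d{7}\b", "<PHONE>", text)  # e.g. 0300-1234567
    return re.sub(r"\s+", " ", text).strip()

print(normalize("Call 0300-1234567 ya visit https://example.com  ABHI"))
# call <PHONE> ya visit <URL> abhi
```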

Data Mining:

  • Apriori Algorithm and Multiple Minimum Support: This project implements the Apriori algorithm in Python from scratch and extends it to handle multiple minimum support thresholds simultaneously. It also uses data visualization techniques to explore and analyze the frequent itemsets generated by the implementation. The maximum value for k in both algorithms is 5.
  • Face Recognition Using KNN: In this project, I used a simplified version of the CMU Pose, Illumination, and Expression (PIE) Dataset to implement a k-Nearest Neighbors (k-NN) classifier for face recognition. The dataset consisted of 10 subjects spanning five near-frontal poses, and there were 170 images for each individual. In addition, all the images were resized to 32x32 pixels. The dataset was provided in the form of a CSV file with 1700 rows and 1024 columns. Each row was an instance and each column a feature. The first 170 instances belonged to the first subject, the next 170 to the second subject, and so on.
  • Foreground Segmentation using K-Means: In this project, I implemented a basic version of the interactive image cut-out / segmentation approach called Lazy Snapping. I was given several test images along with corresponding auxiliary images depicting the foreground and background seed pixels marked with red and blue brush-strokes, respectively. My program exploited these partial human annotations to compute a precise figure-ground segmentation.
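
The core Apriori loop — keep only itemsets meeting the minimum support, then join the survivors to form the next round of candidates — can be sketched as follows. This is a single-threshold sketch only; the project above extends it to multiple minimum supports:

```python
from itertools import combinations

def apriori(transactions, min_support, max_k=5):
    """Return frequent itemsets (as frozensets) mapped to their support counts."""
    frequent = {}
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]  # k = 1 candidates
    k = 1
    while current and k <= max_k:
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(survivors)
        # Join step: union pairs of surviving k-itemsets into (k+1)-candidates.
        keys = list(survivors)
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
        k += 1
    return frequent

tx = [frozenset("ab"), frozenset("abc"), frozenset("bc"), frozenset("ac")]
freq = apriori(tx, min_support=2)
print(freq[frozenset("ab")])  # 2
```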

Machine Learning:

  • House Price Prediction: The problem statement for this project is to predict the final sale price of residential homes in Ames, Iowa using machine learning algorithms based on the explanatory variables in the dataset. The target variable is SalePrice, which represents the property's sale price in dollars.
  • Kafka ML Classification App: This project collects and classifies data from a mobile phone's accelerometer and gyroscope sensors. It is divided into three main parts: data collection, data classification, and frontend implementation. Data collection involves a mobile app that reads the phone's sensors and stores the readings in a database; data classification processes the data with machine learning classification models; and the frontend displays live data from the phone and predicts its position or state based on the labeled data.
  • RUHA Voice Assistant: This project trains 5 different classifiers on the RUHA audio dataset using the scikit-learn library, exploring different machine learning algorithms and techniques. After training, each classifier's performance is evaluated with a confusion matrix, accuracy, recall, precision, and F1-measure. The core of the project is a web application built with Flask, a Python web framework, where users interact with the deployed models: for example, a user can input "Ruha lights band kardo" on the website, and the 5 trained classifiers predict the result, which is displayed creatively on the webpage. A result-analytics button then redirects the user to another page showing the complete results, including how the 5 classifiers performed on the dataset.
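
The evaluation metrics reported across these classifiers (accuracy, precision, recall, F1) all reduce to counts from the confusion matrix. A minimal binary-class sketch of that reduction (scikit-learn's `metrics` module computes the same quantities):

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

m = binary_metrics([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(round(m["accuracy"], 2))  # 0.6
```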

Artificial Intelligence (AI):

  • Clashless-TimeTable-using-AI: This project generates a clash-free timetable for a given set of courses and rooms using a genetic algorithm. The algorithm takes a timetable in CSV format as input, preprocesses the data to get the number of rooms, all courses, and all time slots, and then generates a new timetable that satisfies constraints such as no section having multiple courses at the same time and no teacher having multiple classes at the same time.
  • Image Reconstruction using A* Algorithm: In this project, I implemented the A* algorithm to solve the problem of image reconstruction. My task was to reconstruct a 512x512 image that had been divided into 16x16 boxes and shuffled. To accomplish this, I created a Python program that takes input as the shuffled image and uses the A* algorithm to determine the optimal sequence of moves to reconstruct the original image. I used the skimage library to display the shuffled and reconstructed images side by side.
  • Sales Automation Assist: Developed Sales Automation Assist, an automated solution enhancing sales efficiency using Python. Implemented the Opener Agent to generate personalized cold opening emails and track interactions, and the Escalator Agent to handle lead responses and determine follow-up actions. Utilized LangChain, llama-index, and OpenAI APIs to optimize sales processes.
  • LegalEase: LegalEase is a platform designed to simplify the understanding of legal rights under Pakistani law, addressing the issue of legal illiteracy. It leverages advanced natural language processing (using Mistral) and secure data storage (Firebase Firestore) to provide clear and accessible legal information to the public. Promoted on LinkedIn, LegalEase quickly gained traction and received positive feedback for its impactful approach to improving legal awareness and accessibility.
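
The genetic algorithm's fitness in the timetable project boils down to counting constraint violations. Below is a sketch of that clash-counting step under a simplified encoding — each entry as a dict of (course, section, teacher, room, slot) — which is illustrative, not the project's exact schema:

```python
from collections import Counter

def count_clashes(timetable):
    """Count clashes: a room, section, or teacher booked more than once in the same slot."""
    clashes = 0
    for key_fn in (
        lambda e: (e["room"], e["slot"]),     # room double-booked
        lambda e: (e["section"], e["slot"]),  # section has two courses at once
        lambda e: (e["teacher"], e["slot"]),  # teacher teaches two classes at once
    ):
        counts = Counter(key_fn(e) for e in timetable)
        clashes += sum(n - 1 for n in counts.values() if n > 1)
    return clashes

tt = [
    {"course": "AI", "section": "A", "teacher": "T1", "room": "R1", "slot": 1},
    {"course": "DB", "section": "A", "teacher": "T2", "room": "R2", "slot": 1},  # section A clash
    {"course": "OS", "section": "B", "teacher": "T1", "room": "R3", "slot": 2},
]
print(count_clashes(tt))  # 1
```

A genetic algorithm would then minimize this count (or maximize its negation) over generations of candidate timetables.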

Core Competencies

  • Methodologies: Machine Learning, Data Mining, Natural Language Processing, Advanced Statistics, AI, Big Data Analytics, Exploratory Data Analysis, Data Preprocessing
  • Languages: Python (pandas, NumPy, scikit-learn, Seaborn, Matplotlib), R (dplyr, ggplot2), SQL, C++, Java, JavaScript (React, Node, Express)
  • Tools: Jupyter Notebook, Tableau, Git, Flask, MS Excel, VS Code, R Studio, Eclipse, MongoDB
