
Data Mining to Extract Insights from Steel Industry Energy Consumption.


mihson95/data_mining_project

Summary

This project was developed for CS4168 - Data Mining at UL. It showcases data mining techniques including exploratory data analysis (EDA), data preparation, clustering, classification, and regression.

Overview

The project examined energy consumption in the steel industry using Python and Jupyter Notebook. It began with Exploratory Data Analysis (EDA), which produced statistical summaries and examined distributions, correlations, and temporal trends in the dataset. Data preparation then cleaned, formatted, imputed, and transformed the data so that it was consistent and usable for modelling. Clustering was applied to find natural groupings in the data that could correspond to distinct energy consumption profiles. Classification used several classifiers to predict energy consumption behaviour, and regression analysis modelled and predicted energy consumption, with the aim of informing efforts to optimize energy usage and improve sustainability in the steel industry. Together, these steps were intended to support informed decision-making and improvements in operational efficiency.
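The EDA and data preparation steps described above can be sketched with pandas as follows. This is a minimal illustration, not the notebook's code: the data is synthetic, and the column names (usage_kwh, reactive_kvarh) as well as the choices of median imputation, a log transform, and z-score standardization are assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the dataset: one week of 15-minute readings.
# Column names are hypothetical, chosen only for this sketch.
rng = np.random.default_rng(0)
idx = pd.date_range("2018-01-01", periods=96 * 7, freq="15min")
df = pd.DataFrame(
    {
        "usage_kwh": rng.gamma(2.0, 10.0, len(idx)),
        "reactive_kvarh": rng.gamma(2.0, 5.0, len(idx)),
    },
    index=idx,
)
# Inject a few missing values so that imputation has something to do.
df.loc[df.sample(frac=0.02, random_state=0).index, "usage_kwh"] = np.nan

# --- EDA ---
summary = df.describe()  # count, mean, std, min, quartiles, max per column
corr = df.corr()         # pairwise Pearson correlations (NaNs skipped pairwise)
# Temporal trend: mean usage for each hour of the day.
hourly_profile = df["usage_kwh"].groupby(df.index.hour).mean()

# --- Data preparation ---
prepared = df.copy()
# Median imputation for the missing usage readings.
prepared["usage_kwh"] = prepared["usage_kwh"].fillna(prepared["usage_kwh"].median())
# Derived time feature.
prepared["hour"] = prepared.index.hour
# Log transform to reduce the right skew of the usage distribution.
prepared["log_usage"] = np.log1p(prepared["usage_kwh"])
# Z-score standardization of the raw numeric columns.
num_cols = ["usage_kwh", "reactive_kvarh"]
prepared[num_cols] = (prepared[num_cols] - prepared[num_cols].mean()) / prepared[num_cols].std()

print(summary.round(2))
print(hourly_profile.head())
```

In the real notebooks the imputation and scaling choices would depend on what the EDA reveals; median imputation is used here only because it is robust to the skewed, synthetic distribution.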

Detailed Summary

The project analyzed energy consumption in the steel industry using Python and Jupyter Notebook. It began with Exploratory Data Analysis (EDA) to uncover trends and patterns: statistical summaries, distribution and correlation analysis, and time series analysis, supported by visualizations such as histograms, pair plots, and heatmaps. Data preparation covered cleaning, formatting, imputation, transformation, feature selection, sampling, and validation. Clustering used K-Means and hierarchical clustering, alongside the dimensionality-reduction techniques MDS and t-SNE, and was evaluated with the elbow method and the silhouette score. Classification trained SVM, Random Forest, K-Neighbors, MLP, and Naïve Bayes models and compared them using accuracy, precision, recall (true positive rate, TPR), F1-score, and AUC. Regression models, including Random Forest, Linear Regression, and Lasso Regression, were evaluated and tuned using grid search and dimensionality reduction. Together, these steps give a comprehensive view of energy consumption in the steel industry and how it might be better managed.
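A compact scikit-learn sketch of the clustering evaluation, classifier comparison, and regression tuning summarized above follows. It is an illustration under stated assumptions, not the project's code: it runs on synthetic data from make_blobs, make_classification, and make_regression rather than the project's dataset, uses only a subset of the listed models, and treats classification as a binary problem so that the default f1_score and roc_auc_score settings apply. The candidate k range, the parameter grids, and the PCA step (standing in for "dimensionality reduction") are also assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs, make_classification, make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Lasso
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score, silhouette_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# --- Clustering: elbow method (inertia) and silhouette score over k ---
X_clu, _ = make_blobs(n_samples=300, centers=[(-5, 0), (0, 5), (5, 0)],
                      cluster_std=1.0, random_state=0)
inertias, silhouettes = {}, {}
for k in range(2, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X_clu)
    inertias[k] = km.inertia_  # plot against k and look for the "elbow"
    silhouettes[k] = silhouette_score(X_clu, km.labels_)
best_k = max(silhouettes, key=silhouettes.get)  # highest mean silhouette

# --- Classification: compare models on a stratified hold-out set ---
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)
models = {
    "random_forest": RandomForestClassifier(random_state=0),
    "k_neighbors": KNeighborsClassifier(),
    "naive_bayes": GaussianNB(),
}
clf_scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    proba = model.predict_proba(X_te)[:, 1]  # probability of the positive class
    clf_scores[name] = {
        "accuracy": accuracy_score(y_te, pred),
        "precision": precision_score(y_te, pred),
        "recall_tpr": recall_score(y_te, pred),  # recall is the TPR
        "f1": f1_score(y_te, pred),
        "auc": roc_auc_score(y_te, proba),
    }

# --- Regression: grid search over PCA size and Lasso regularization ---
X_reg, y_reg = make_regression(n_samples=300, n_features=10, noise=10.0,
                               random_state=0)
search = GridSearchCV(
    make_pipeline(StandardScaler(), PCA(), Lasso(max_iter=10_000)),
    param_grid={
        "pca__n_components": [5, 10],
        "lasso__alpha": [0.01, 0.1, 1.0, 10.0],
    },
    scoring="r2",
    cv=5,
)
search.fit(X_reg, y_reg)

print("best k:", best_k)
for name, scores in clf_scores.items():
    print(name, {m: round(v, 3) for m, v in scores.items()})
print("best regression params:", search.best_params_)
```

For a multi-class target, the precision, recall, and F1 functions need an explicit average (for example average="macro"), and roc_auc_score needs the full predict_proba matrix together with multi_class="ovr".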

How to use this repo

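  1. Clone the repository and enter it: git clone https://github.com/mihson95/data_mining_project.git, then cd data_mining_project
  2. Create a virtual environment named venv (python -m venv venv) and activate it (source venv/bin/activate on Linux/macOS, venv\Scripts\activate on Windows)
  3. Install the dependencies: pip install -r requirements.txt
  4. Check out the main branch: git checkout main
  5. Create a new branch named task-{task name}-{your name}, for example task-eda-{your name}: git checkout -b task-eda-{your name}
  6. Push your branch to origin (git push -u origin <branch name>). Do not merge it into main.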

Languages

  • Jupyter Notebook 99.9%
  • Python 0.1%