This project aims to classify the quality of sensors based on various features extracted from sensor data. It consists of several Python scripts organized into a pipeline for data preprocessing, model training, inference, and evaluation. Below is an overview of the project structure and how to use it.
The inputs of various sensors for different wafers have been provided. In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor used for the fabrication of integrated circuits. The goal is to build a machine learning model that predicts whether a wafer needs to be replaced (i.e., whether it is working or not) based on the inputs from various sensors. There are two classes:
- +1: the wafer is in working condition and does not need to be replaced.
- -1: the wafer is faulty and needs to be replaced.
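The +1/-1 labels can stay as they are in the raw data and be converted only where the model needs them. The sketch below is an illustration, not the project's actual code: the helper names `to_model_label` and `to_wafer_label` are hypothetical, and it assumes the classifier expects classes 0 and 1, as recent XGBoost releases do for `XGBClassifier`.

```python
# Hypothetical helpers: map wafer labels (+1 working, -1 faulty) to the
# 0/1 classes a binary classifier typically expects, and back again.

WORKING = 1
FAULTY = -1


def to_model_label(wafer_label: int) -> int:
    """Convert a wafer label (+1 or -1) to a model class (1 or 0)."""
    if wafer_label not in (WORKING, FAULTY):
        raise ValueError(f"unexpected wafer label: {wafer_label!r}")
    return 1 if wafer_label == WORKING else 0


def to_wafer_label(model_class: int) -> int:
    """Convert a model class (1 or 0) back to a wafer label (+1 or -1)."""
    if model_class not in (0, 1):
        raise ValueError(f"unexpected model class: {model_class!r}")
    return WORKING if model_class == 1 else FAULTY


if __name__ == "__main__":
    raw = [1, -1, -1, 1]
    encoded = [to_model_label(y) for y in raw]
    print(encoded)
    print([to_wafer_label(c) for c in encoded])
```

Rejecting unexpected values early makes a mislabeled row fail loudly instead of silently becoming a "faulty" prediction.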
- config: Contains configuration files.
- saved_artifacts: Directory to store model artifacts files.
- sensorqualityclassifier (src):
- pipeline:
- data_extraction_pipeline.py: Module for downloading the training data from the data source specified in config.yml.
- data_transform_and_loading_pipeline.py: Module for data loading and preprocessing.
- model_training_pipeline.py: Module for training the classification model.
- inference_pipeline.py: Module for running inference on new data.
- utils: Utility functions used across the project, such as the logger.
- README.md: Overview and instructions for the project.
In this project, we have utilized several tools and technologies to streamline the development, deployment, and operation processes:
- Poetry: Manages the project's environment setup and dependencies.
- Hopsworks: Used for data loading, which aids in data versioning, data integrity, and monitoring for data drift and concept drift.
- XGBClassifier: The XGBClassifier from XGBoost is used to solve the binary classification problem. XGBoost is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable.
- Hopsworks: Employed as the model registry to maintain and manage different versions of the models.
- Streamlit: Utilized to create a prototype user interface for demonstrating the model's usage.
- GitHub Actions: Implements CI/CD pipelines to automate the testing and deployment processes.
- Docker: Used to containerize the application, ensuring consistency across various development and deployment environments.
- Heroku: Serves as the platform for hosting the application during the development phase.
- Clone the repository:
git clone https://github.com/your_username/sensor-quality-classifier.git
cd sensor-quality-classifier
- Install the required Python dependencies:
pip install -r requirements.txt
The data used in this project comes from a tutorial video by Krish Naik. It is not taken from actual sources and is used here solely for demonstration purposes. The data_extraction_pipeline.py script downloads this dataset from the specified repository for use in the project.
To begin the process, run the data extraction pipeline to gather and assemble the data:
python sensorqualityclassifier/pipeline/data_extraction_pipeline.py
Modify the configuration file config/config.yml according to your data locations and parameters.
Run the data loading and preprocessing pipeline:
python sensorqualityclassifier/pipeline/data_transform_and_loading_pipeline.py
Modify the configuration file config/config.yml if necessary.
Run the model training pipeline:
python sensorqualityclassifier/pipeline/model_training_pipeline.py
Ensure that the trained model is available at the location specified in the configuration.
Modify the configuration file config/config.yml if necessary.
Run the inference pipeline:
python sensorqualityclassifier/pipeline/inference_pipeline.py
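The four commands above can also be chained from a single script. The following is a minimal sketch, assuming it is run from the repository root and that each stage exits with a non-zero status on failure; the helper names `build_commands` and `run_all` are hypothetical and not part of the project.

```python
import subprocess
import sys
from pathlib import Path

# Stage scripts in the order described above (paths taken from this README).
PIPELINE_DIR = Path("sensorqualityclassifier") / "pipeline"
STAGES = [
    "data_extraction_pipeline.py",
    "data_transform_and_loading_pipeline.py",
    "model_training_pipeline.py",
    "inference_pipeline.py",
]


def build_commands(python: str = sys.executable) -> list[list[str]]:
    """Return one command per stage, in execution order."""
    return [[python, (PIPELINE_DIR / stage).as_posix()] for stage in STAGES]


def run_all() -> None:
    """Run each stage in turn; check=True stops at the first failing stage."""
    for command in build_commands():
        print("Running:", " ".join(command))
        subprocess.run(command, check=True)
```

Calling `run_all()` executes the stages sequentially. Because `subprocess.run` is called with `check=True`, a failed extraction or training step raises `CalledProcessError` and the later stages do not run against stale artifacts.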
The config/config.yml file contains various parameters such as file paths, model hyperparameters, and feature settings. Modify this file according to your requirements.
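Since every pipeline stage reads the same file, it can help to check the configuration before running anything. The sketch below assumes the YAML has already been parsed into a dictionary (for example with PyYAML's `yaml.safe_load`). The section names `paths`, `model`, and `features` and all example values are made up for illustration; the real config/config.yml may be organized differently.

```python
# Hypothetical top-level sections; the real config/config.yml may use other names.
REQUIRED_SECTIONS = ("paths", "model", "features")


def missing_sections(config: dict) -> list[str]:
    """Return the required sections that are absent or not mappings."""
    return [
        name
        for name in REQUIRED_SECTIONS
        if not isinstance(config.get(name), dict)
    ]


def validate_config(config: dict) -> dict:
    """Raise a clear error before any pipeline stage starts on a bad config."""
    missing = missing_sections(config)
    if missing:
        raise ValueError(f"config is missing sections: {', '.join(missing)}")
    return config


# Example with made-up values, standing in for the parsed YAML file.
example = {
    "paths": {"raw_data": "data/raw.csv"},
    "model": {"n_estimators": 100, "max_depth": 4},
    "features": {"drop_columns": ["Wafer"]},
}
validate_config(example)
```

Failing at load time with the names of the missing sections is usually easier to debug than a `KeyError` raised halfway through training.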
Contributions are welcome! If you have any suggestions or improvements, please open an issue or create a pull request.
This project is open source and licensed under the MIT License.
This project is inspired by the need to classify sensor quality efficiently. Thanks to the contributors of the libraries used in this project.
- Krish Naik
- Pau Labarta Bajo
For any questions or inquiries, please contact krunalss@outlook.com