Industrial Scale Penicillin Simulation

Overview

This project uses the Industrial-scale Penicillin Simulation dataset to investigate and improve control strategies in large-scale fermentations. The objectives are to explore how input parameters affect outcomes, to identify the batch that achieves the highest penicillin concentration, and to examine correlations between penicillin concentration and other variables. The broader goal is to advance bioprocess control and optimization in contemporary biopharmaceutical facilities.

Dataset

The dataset used in this project is from Kaggle: Big Data-Biopharmaceutical manufacturing.

The original dataset is 2.7 GB, so it was compressed to 700 MB for upload to GitHub. The uncompressed original can be downloaded from the Kaggle page named above.

Installation

Clone the repository:

git clone https://github.com/ansh-info/Industrial-Scale-Penicillin-Simulation.git

Navigate to the project directory:

cd Industrial-Scale-Penicillin-Simulation

Install the required packages:
```
pip install -r requirements.txt
```
Download the original dataset from Kaggle (see the Dataset section above) and place it in the data/ directory.

Usage

Data Cleaning:
- Run the 1. Cleaning-Dataset.ipynb notebook to clean the dataset and generate the final CSV file with 33 columns.
Machine Learning Analysis:
- Run the 2. Regression-Penicillin Simulation.ipynb notebook to perform the regression analysis and other machine learning tasks.

Machine Learning Analysis

Data Cleaning

The 1. Cleaning-Dataset.ipynb notebook performs the following tasks:

- Loads the dataset from CSV.
- Handles missing values.
- Normalizes and scales features.
- Generates a cleaned dataset with 33 relevant columns.
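
The cleaning steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the tiny inline CSV, the column names, the choice of linear interpolation for missing values, and min-max scaling are all assumptions made for the example.

```python
import io

import pandas as pd

# Hypothetical miniature stand-in for the raw CSV; these column names are
# illustrative and are not taken from the real dataset.
RAW_CSV = """time_h,temperature_K,pH,penicillin_g_L
0.2,298.0,6.5,0.001
0.4,,6.4,0.002
0.6,298.2,,0.004
0.8,298.1,6.5,0.007
"""


def clean(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Fill missing values, then min-max scale every column except the target."""
    out = df.copy()
    # Assumed strategy: linear interpolation along the time axis, with
    # leading/trailing gaps filled from the nearest valid value.
    out = out.interpolate(limit_direction="both")
    features = out.columns.drop(target)
    lo = out[features].min()
    span = out[features].max() - lo
    # Constant columns would give a zero span; divide by 1 instead.
    span = span.replace(0, 1.0)
    out[features] = (out[features] - lo) / span
    return out


raw = pd.read_csv(io.StringIO(RAW_CSV))
cleaned = clean(raw, target="penicillin_g_L")
print(cleaned)
```

The target column is left unscaled here so that predictions stay in the original concentration units; whether the notebook does the same is not stated in this README.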

Regression Analysis

The 2. Regression-Penicillin Simulation.ipynb notebook performs the following tasks:

- Loads the cleaned dataset.
- Performs exploratory data analysis (EDA).
- Identifies variables that are highly correlated with penicillin concentration.
- Develops regression models to predict penicillin concentration.
- Evaluates model performance using various metrics.
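
As a rough sketch of the workflow listed above (correlation screening, model fitting, evaluation), the example below runs on synthetic data. The column names, the 0.3 correlation threshold, the plain linear model, and the choice of R² and RMSE as metrics are assumptions for illustration only; the notebook may use different models and metrics.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset (hypothetical column names).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "substrate": rng.uniform(0, 1, n),
    "oxygen": rng.uniform(0, 1, n),
    "unrelated": rng.uniform(0, 1, n),
})
df["penicillin"] = 3.0 * df["substrate"] + 2.0 * df["oxygen"] + rng.normal(0, 0.05, n)

target = "penicillin"

# Pearson correlation of each feature with the target.
corr = df.corr()[target].drop(target)
# Keep features whose absolute correlation exceeds an assumed threshold.
selected = corr[corr.abs() > 0.3].index.tolist()

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    df[selected], df[target], test_size=0.2, random_state=0
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"selected features: {selected}")
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

Screening by correlation only captures linear relationships, so a feature with a strong nonlinear effect could be dropped by this filter.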

Images

Data Cleaning Process

Regression Analysis

Acknowledgements

We would like to thank the original dataset creator and Kaggle for providing the platform to share this data.
