README.md
This repository contains the code and data for a project focused on predicting crop yields based on various agricultural factors. This was my capstone project during my internship, where I worked on developing predictive models to optimize agricultural production.
The dataset used in this project contains information related to agricultural factors and crop yield. It includes data such as state names, crop types, nutrient levels (N, P, K), pH levels, weather conditions (rainfall, temperature), area in hectares, production in tons, and yield per hectare. The dataset was obtained from [provide data source if available].
The preprocessing pipeline included the following steps:
-
Data Cleaning:
- Handled missing values by either imputing them or removing rows/columns with missing data.
- Removed irrelevant columns such as 'Unnamed: 0' or other identifiers.
- Ensured data consistency and uniformity by standardizing formats and addressing any inconsistencies.
-
Exploratory Data Analysis (EDA):
- Visualized the distribution of states, crop types, and production by state using bar charts and pie charts.
- Explored relationships between different agricultural factors and crop yield using correlation analysis and visualizations like histograms and heatmaps.
-
Feature Engineering:
- Encoded categorical variables using techniques like Label Encoding or One-Hot Encoding.
- Scaled numerical features to handle skewness and ensure uniformity in feature scales.
-
Data Splitting:
- Split the dataset into training and testing sets to facilitate model training and evaluation.
-
Model Building and Evaluation:
- Built predictive models using machine learning algorithms such as Linear Regression, Decision Tree Regressor, Random Forest Regressor, and K Neighbors Regressor.
- Evaluated model performance using metrics like R-squared score, mean squared error (MSE), and mean absolute error (MAE).
-
Data Transformation:
- Reverted numerical data back into categorical format as needed for better interpretation or downstream analysis.
This preprocessing pipeline ensures that the dataset is cleaned, processed, and prepared for model training, enabling the development of accurate predictive models for crop yield prediction.
For detailed code implementation, refer to the Jupyter Notebook files provided in this repository.
Note: Replace file paths and specific details with appropriate information relevant to your project setup.