This project focuses on Medicinal Plants Detection using Machine Learning techniques to identify various medicinal plants through images. The process involves Segmentation, Gray Scale Conversion, and advanced Feature Extraction methods such as GLCM, Gabor, and LBP. Multiple models are trained and evaluated to identify the most suitable one for prediction.
The primary objective is to develop a reliable model capable of accurately identifying medicinal plants from images. The project emphasizes manual feature extraction techniques without relying on neural networks, ensuring high performance across various machine learning models.
The dataset comprises images of 30 different plant species, each labeled with both common and botanical names. These images vary in size, providing a comprehensive dataset for robust classification tasks.
- Dataset URL: Medicinal Leaf Dataset
Images are segmented using the HSV (Hue, Saturation, Value) color space to isolate the relevant plant regions, removing unnecessary background.
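A minimal sketch of this step with OpenCV might look like the following; the hue/saturation/value bounds are illustrative and would need tuning for the actual dataset:

```python
import cv2
import numpy as np

def segment_plant(image_bgr):
    """Isolate the plant by masking green hues in HSV space."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    # Assumed bounds for "plant green"; tune per dataset.
    lower_green = np.array([25, 40, 40])
    upper_green = np.array([90, 255, 255])
    mask = cv2.inRange(hsv, lower_green, upper_green)
    # Keep only the masked pixels; the background becomes black.
    segmented = cv2.bitwise_and(image_bgr, image_bgr, mask=mask)
    return segmented, mask
```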
The segmented images are then converted to grayscale to simplify further processing while preserving essential details.
A Sobel Filter is applied to highlight the edges in the images, making them more suitable for feature extraction.
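The grayscale conversion and Sobel filtering could be sketched as follows; the kernel size and output depth are common defaults rather than the project's exact settings:

```python
import cv2

def grayscale_and_edges(segmented_bgr):
    """Convert the segmented image to grayscale and highlight edges with a Sobel filter."""
    gray = cv2.cvtColor(segmented_bgr, cv2.COLOR_BGR2GRAY)
    grad_x = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)  # horizontal gradient
    grad_y = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)  # vertical gradient
    edges = cv2.convertScaleAbs(cv2.magnitude(grad_x, grad_y))
    return gray, edges
```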
Features are extracted from both segmented and grayscale images using the following techniques:
LBP converts an image into a binary pattern by comparing neighboring pixels to the center pixel. This binary code is then converted to a decimal value, serving as a texture descriptor.
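A compact LBP sketch using scikit-image; the number of points, radius, and "uniform" method are assumptions, not necessarily the project's configuration:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray, points=8, radius=1):
    """LBP code per pixel, summarized as a normalized histogram."""
    codes = local_binary_pattern(gray, P=points, R=radius, method="uniform")
    # The "uniform" variant produces points + 2 distinct code values.
    hist, _ = np.histogram(codes, bins=points + 2, range=(0, points + 2), density=True)
    return hist
```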
GLCM analyzes pixel pairs within a specific spatial relationship, creating a matrix that reflects their frequency. This matrix is used to derive texture features such as contrast, correlation, energy, and homogeneity.
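A possible GLCM computation with scikit-image; the distance and angles shown here are common defaults rather than the project's exact parameters:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(gray):
    """Contrast, correlation, energy, and homogeneity from a gray-level co-occurrence matrix."""
    glcm = graycomatrix(gray, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    props = ["contrast", "correlation", "energy", "homogeneity"]
    # Average each property over the four orientations.
    return [graycoprops(glcm, p).mean() for p in props]
```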
Gabor filters process the image by convolving it with a sinusoidal wave modulated by a Gaussian envelope. These filters are sensitive to specific image features, such as edges and textures, at various scales and orientations.
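A small Gabor filter bank with OpenCV might be built like this; the kernel size, sigma, wavelengths, and number of orientations are illustrative choices:

```python
import cv2
import numpy as np

def gabor_features(gray, orientations=4, wavelengths=(4, 8)):
    """Mean and standard deviation of responses from a small Gabor filter bank."""
    feats = []
    for theta in np.arange(0, np.pi, np.pi / orientations):
        for lam in wavelengths:
            kernel = cv2.getGaborKernel((21, 21), sigma=4.0, theta=theta,
                                        lambd=lam, gamma=0.5, psi=0)
            response = cv2.filter2D(gray, cv2.CV_32F, kernel)
            feats.extend([response.mean(), response.std()])
    return feats
```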
Color moments, including mean, standard deviation, and skewness, capture the color distribution in each channel (e.g., RGB). These moments are highly effective for image classification tasks.
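Color moments can be computed per channel roughly as follows; restricting the statistics to the segmentation mask is an assumption:

```python
import cv2
import numpy as np
from scipy.stats import skew

def color_moments(image_bgr, mask=None):
    """Mean, standard deviation, and skewness of each color channel."""
    feats = []
    for channel in cv2.split(image_bgr):
        values = channel[mask > 0] if mask is not None else channel.ravel()
        values = values.astype(np.float64)
        feats.extend([values.mean(), values.std(), skew(values)])
    return feats
```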
Note: A total of 62 features are extracted from each image and stored in a FeatureExtracted.csv file. If you require this file, please contact me using the information provided at the end.
To improve model performance and reduce dimensionality, the dataset is split into training and test sets, and the following techniques are applied:
PCA identifies the principal components that capture the most variance in the data, reducing the dimensionality while retaining essential information.
The StandardScaler normalizes the data by transforming each feature to have a mean of 0 and a standard deviation of 1, ensuring consistency during model training.
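Putting the split, scaling, and PCA together might look like the sketch below; the "Label" column name, test split size, and 95% variance threshold are assumptions:

```python
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumes FeatureExtracted.csv has one row per image and the class in a "Label" column.
df = pd.read_csv("FeatureExtracted.csv")
X, y = df.drop(columns=["Label"]), df["Label"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

scaler = StandardScaler().fit(X_train)        # fit on training data only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

pca = PCA(n_components=0.95).fit(X_train_s)   # keep ~95% of the variance
X_train_p = pca.transform(X_train_s)
X_test_p = pca.transform(X_test_s)
```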
The extracted features are used to train multiple models, with a focus on identifying the best-performing one.
SVM achieved the highest accuracy of 99%, outperforming all other models. Its ability to find the optimal hyperplane that maximizes class separation makes it particularly effective for this classification task.
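Training and evaluating the SVM on the scaled, PCA-reduced features from the previous sketch could look like this; the kernel and C value are illustrative, not the tuned hyperparameters:

```python
from sklearn.metrics import accuracy_score, classification_report
from sklearn.svm import SVC

svm = SVC(kernel="rbf", C=10, gamma="scale")  # illustrative hyperparameters
svm.fit(X_train_p, y_train)

y_pred = svm.predict(X_test_p)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```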
Real-time data can be used for prediction by leveraging pre-trained models, including PCA for dimensionality reduction, StandardScaler for normalization, and SVM for classification. The process involves:
- Data Preprocessing: Incoming data is normalized using the saved StandardScaler to ensure consistency.
- Dimensionality Reduction: The data is transformed using the saved PCA model, retaining the most significant features.
- Prediction: The transformed data is fed into the saved SVM model to generate real-time predictions.
This approach yields efficient and accurate predictions on new, unseen data by reusing the transformations and decision boundaries the pre-trained models learned during training.
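A rough end-to-end prediction sketch, reusing the helper functions from the earlier snippets; the saved-model file names and the exact feature ordering are assumptions and must match what was used during training:

```python
import cv2
import joblib
import numpy as np

# Artifact file names are illustrative; use whatever names the models were saved under.
scaler = joblib.load("scaler.pkl")
pca = joblib.load("pca.pkl")
svm = joblib.load("svm_model.pkl")

def predict_species(image_path):
    """Run a new image through the same preprocessing and feature pipeline as training."""
    image = cv2.imread(image_path)
    segmented, mask = segment_plant(image)       # HSV segmentation (sketched earlier)
    gray, _ = grayscale_and_edges(segmented)     # grayscale + Sobel
    features = np.hstack([
        lbp_histogram(gray),
        glcm_features(gray),
        gabor_features(gray),
        color_moments(image, mask),
    ]).reshape(1, -1)
    # Apply the saved transformations in the same order as during training.
    features = pca.transform(scaler.transform(features))
    return svm.predict(features)[0]

print(predict_species("new_leaf.jpg"))
```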
If you have any questions, need additional data, or have suggestions or feedback, feel free to contact me:
- 📧 Email: logeshwaranks01@gmail.com
- LinkedIn: Logeshwaran KS
Thank You for Checking Out This Project! 😄