Spam Email Classification

This project demonstrates how to classify emails as Spam or Ham (Not Spam) using Natural Language Processing (NLP) and a Random Forest Classifier.

Features

Preprocessing: Cleans email text by converting it to lowercase, removing punctuation, stemming words, and removing stopwords.
Vectorization: Converts text data into numerical format using CountVectorizer.
Model Training: Uses a Random Forest Classifier for prediction.
Prediction: Classifies new emails as Spam or Ham.

Requirements

Python 3.7 or higher
Libraries:
- numpy
- pandas
- nltk
- scikit-learn

Install required libraries:

pip install numpy pandas nltk scikit-learn

Dataset

The project uses spam_ham_dataset.csv, which is included in this repository.

Columns:
- text: The email content.
- label_num: The label (0 for Ham, 1 for Spam).

To use your own data, replace 'spam_ham_dataset.csv' with the path to a CSV file that has the same two columns.
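
A quick check that a dataset has the expected shape can save a confusing error later in the notebook. The sketch below is not part of the project; the helper name `load_dataset`, the `REQUIRED_COLUMNS` constant, and the two sample rows are made up for illustration. It reads from an in-memory CSV so it runs without the real file, and you would pass a file path instead.

```python
import io

import pandas as pd

# Stand-in for spam_ham_dataset.csv: two made-up rows with the documented columns.
SAMPLE_CSV = io.StringIO(
    "text,label_num\n"
    '"Subject: meeting notes attached",0\n'
    '"Subject: claim your free prize now",1\n'
)

# Columns described in this README.
REQUIRED_COLUMNS = {"text", "label_num"}


def load_dataset(source):
    """Read a CSV (path or file-like object) and verify the expected columns."""
    data = pd.read_csv(source)
    missing = REQUIRED_COLUMNS - set(data.columns)
    if missing:
        raise ValueError(f"Dataset is missing columns: {sorted(missing)}")
    # Labels are expected to be 0 (Ham) or 1 (Spam).
    if not data["label_num"].isin([0, 1]).all():
        raise ValueError("label_num must contain only 0 (Ham) or 1 (Spam)")
    return data


data = load_dataset(SAMPLE_CSV)
print(data["label_num"].value_counts())
```

With the real file, the call would be `load_dataset('spam_ham_dataset.csv')`.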

How It Works

Data Preprocessing:
- Converts text to lowercase.
- Removes punctuation.
- Applies stemming to reduce words to their root forms.
- Removes stopwords (e.g., "the", "is", "in").
Feature Extraction:
- Text is converted to a bag-of-words representation using CountVectorizer.
Model Training:
- Splits data into training and testing sets.
- Trains a Random Forest Classifier on the training data.
Email Prediction:
- Takes an example email, preprocesses it, and predicts if it's Spam or Ham.
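
The first two steps above (preprocessing and feature extraction) can be sketched as follows. This is a minimal illustration, not the notebook's exact code: the function name `preprocess`, the sample emails, and the order of stemming before stopword removal (taken from the list above) are assumptions. If the NLTK stopword corpus has not been downloaded, the sketch falls back to a tiny stand-in list so that it still runs.

```python
import string

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

try:
    from nltk.corpus import stopwords

    STOPWORDS = set(stopwords.words("english"))
except LookupError:
    # Stand-in list, used only when nltk.download('stopwords') has not been run.
    STOPWORDS = {"the", "is", "in", "a", "an", "and", "to", "of"}

stemmer = PorterStemmer()
# Translation table that deletes every ASCII punctuation character.
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip punctuation, stem, and drop stopwords."""
    words = text.lower().translate(PUNCT_TABLE).split()
    stems = [stemmer.stem(word) for word in words]
    return " ".join(word for word in stems if word not in STOPWORDS)


# Made-up emails for illustration.
emails = ["The offer is in the mail!", "Is the meeting in the office?"]
corpus = [preprocess(email) for email in emails]

# Bag-of-words: one row per email, one column per vocabulary term.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)

print(sorted(vectorizer.vocabulary_))
print(X.toarray())
```

Stemming before filtering means a stopword whose stem differs from its original form would slip through; filtering first and stemming afterwards is an equally reasonable ordering.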

Usage

Load the dataset:

import pandas as pd
data = pd.read_csv('spam_ham_dataset.csv')

Run the notebook to train the model, then evaluate its accuracy on the test set:
```
cl.score(X_test, y_test)
```
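
For context, here is a hedged sketch of how `cl`, `X_test`, and `y_test` might come about. The toy corpus, the 80/20 split, and the hyperparameters (`n_estimators=100`, `random_state=42`) are assumptions chosen for illustration; the notebook's actual values may differ.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Made-up, already-cleaned emails standing in for the preprocessed dataset.
corpus = [
    "meet tomorrow project review",
    "lunch plan friday team",
    "report attach pleas review",
    "schedul call client monday",
    "invoic attach last month",
    "win free prize click now",
    "cheap loan approv click here",
    "claim free gift card now",
    "urgent offer limit time win",
    "free money click link now",
]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]  # 0 = Ham, 1 = Spam

# Convert text to bag-of-words counts.
X = CountVectorizer().fit_transform(corpus)

# Hold out 20% of the emails for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels
)

cl = RandomForestClassifier(n_estimators=100, random_state=42)
cl.fit(X_train, y_train)

# score() returns mean accuracy on the given test data.
accuracy = cl.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

On a dataset this small the accuracy figure is not meaningful; the point is only the shape of the workflow.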

Predict an email:

prediction = cl.predict(x_email)
print(f"Prediction: {'Spam' if prediction[0] == 1 else 'Ham'}")
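
The input `x_email` has to be built with the same vectorizer that was fitted on the training data. The sketch below shows one way to do that; the helper name `classify`, the toy corpus, and the hyperparameters are hypothetical, and the input email is assumed to be preprocessed already.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

# Made-up, already-cleaned training emails (0 = Ham, 1 = Spam).
corpus = [
    "meet tomorrow project review",
    "lunch plan friday team",
    "report attach pleas review",
    "win free prize click now",
    "cheap loan approv click here",
    "claim free gift card now",
]
labels = [0, 0, 0, 1, 1, 1]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(corpus)
cl = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, labels)


def classify(email_text):
    """Vectorize one cleaned email and return 'Spam' or 'Ham'."""
    # transform (not fit_transform) reuses the vocabulary learned in training,
    # so the feature columns line up with what the model expects.
    x_email = vectorizer.transform([email_text])
    prediction = cl.predict(x_email)
    return "Spam" if prediction[0] == 1 else "Ham"


print(f"Prediction: {classify('claim your free prize now')}")
```

Words that were never seen during training are silently ignored by `transform`, so an email made up entirely of new words becomes an all-zero vector.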

Output

Prints the model's prediction (Spam or Ham) for a sample email.
Displays the actual label from the dataset for comparison.

Notes

Ensure the dataset is in the correct format before running the notebook.
The stopword list is not bundled with nltk, so download it once before running the notebook:
```
import nltk
nltk.download('stopwords')
```

License

Feel free to use and modify this project for learning purposes.
