GitHub - syedahmedrz/AI-Based-Spam-Email-Classification-Using-Naive-Bayes: A spam email classification model using Naive Bayes with machine learning, designed to categorize emails as spam or not spam.

Import Libraries

import numpy as np 
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
import json

Libraries and Their Uses

numpy and pandas: Used for handling and manipulating data. The code shown here calls pandas directly; numpy is imported but not used.
sklearn: Contains tools for machine learning tasks:
- train_test_split: Splits the data into training and testing sets.
- CountVectorizer: Converts text into a matrix of token counts.
- MultinomialNB: A Naive Bayes classifier suited to count features such as word counts.
- accuracy_score: Computes the fraction of predictions that match the true labels.
json: For reading JSON files.

Read Data from JSON Files

def read_file(file_path):
    """Reads a JSON file and returns the parsed data."""
    # Explicit encoding avoids platform-dependent defaults.
    with open(file_path, encoding='utf-8') as file:
        data = json.load(file)
    return data

# load data
data = read_file('data.json')
test_data = read_file('test_data.json')
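
The layout of data.json and test_data.json is not shown here. Because the next step reads the 'email' and 'label' columns, one layout that would work is a list of records, with test_data.json holding a plain list of subject strings. The sketch below writes two such files to a temporary directory and loads them back; the sample contents are assumptions, not the repository's actual data.

```python
import json
import os
import tempfile

def read_file(file_path):
    """Reads a JSON file and returns the parsed data."""
    with open(file_path, encoding="utf-8") as file:
        return json.load(file)

# Hypothetical records; the real data.json may be laid out differently.
sample_records = [
    {"email": "Win money now!", "label": "spam"},
    {"email": "Meeting on Friday", "label": "not spam"},
]
# Hypothetical test file: a plain list of subject strings.
sample_subjects = ["Cheap loans available", "Team lunch tomorrow"]

with tempfile.TemporaryDirectory() as tmp_dir:
    data_path = os.path.join(tmp_dir, "data.json")
    test_path = os.path.join(tmp_dir, "test_data.json")
    with open(data_path, "w", encoding="utf-8") as f:
        json.dump(sample_records, f)
    with open(test_path, "w", encoding="utf-8") as f:
        json.dump(sample_subjects, f)

    # Load both files back while the temporary directory still exists.
    loaded_data = read_file(data_path)
    loaded_subjects = read_file(test_path)

print(loaded_data[0]["email"])
print(loaded_subjects)
```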

Prepare the Data

# Create DataFrame
df = pd.DataFrame(data)
X = df['email']
Y = df['label']

df: The loaded data as a pandas DataFrame.
X: The email texts (the 'email' column).
Y: The labels (e.g., "spam" or "not spam").
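
Before vectorizing, it can help to drop incomplete rows and look at the class balance, since accuracy is easy to misread on a lopsided dataset. A minimal sketch, assuming the data is a list of records with 'email' and 'label' keys (the records below are illustrative, not the repository's data):

```python
import pandas as pd

# Hypothetical records standing in for the contents of data.json.
data = [
    {"email": "Win money now!", "label": "spam"},
    {"email": "Meeting on Friday", "label": "not spam"},
    {"email": "Cheap loans available", "label": "spam"},
    {"email": "Team lunch tomorrow", "label": "not spam"},
]

df = pd.DataFrame(data)

# Drop rows with a missing text or label; they would break later steps.
df = df.dropna(subset=["email", "label"])

X = df["email"]
Y = df["label"]

# Number of emails per label.
label_counts = Y.value_counts()
print(label_counts)
```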

Text Data to Numerical Data

vectorizer = CountVectorizer()
X_vectorized = vectorizer.fit_transform(X)

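CountVectorizer: Converts text into a matrix of token counts, one row per email and one column per vocabulary word.
X_vectorized: The numerical (sparse matrix) representation of the email texts. Note that the vocabulary is learned from all emails here, including those that later land in the test set.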

Split Data into Training and Testing Sets

X_train, X_test, y_train, y_test = train_test_split(X_vectorized, Y, test_size=0.25, random_state=42)

train_test_split: Splits the data into training (75%) and testing (25%) sets; random_state=42 makes the split reproducible.
X_train and y_train: Used to train the model.
X_test and y_test: Used to evaluate the model's performance.
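
The code above vectorizes all emails first and splits afterwards. A possible variation, shown here as a sketch rather than the repository's method, is to split the raw text first and fit the vectorizer on the training part only, so the test emails do not influence the vocabulary. The stratify argument is also an addition; it keeps the spam ratio the same in both parts. The subjects are the example subjects from this README with lowercase labels assumed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

emails = [
    "Win money now!", "Important information about your account",
    "Cheap loans available", "Meeting on Friday",
    "Get rich quick", "Your invoice is ready",
    "Earn money while you sleep", "Team lunch tomorrow",
]
labels = ["spam", "not spam"] * 4  # alternates in the same order as emails

# Split the raw text; stratify preserves the class ratio in both parts.
train_text, test_text, y_train, y_test = train_test_split(
    emails, labels, test_size=0.25, random_state=42, stratify=labels
)

# Learn the vocabulary from training text only, then reuse it for the test text.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_text)
X_test = vectorizer.transform(test_text)

print(X_train.shape, X_test.shape)
```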

Train the Model

model = MultinomialNB()
model.fit(X_train, y_train)

MultinomialNB: Creates a multinomial Naive Bayes classifier.
model.fit: Trains the model on the training data.
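
As a sketch of what fitting produces, the example below trains on four subjects and inspects the learned classes and the class probabilities for a new subject. MultinomialNB uses additive (Laplace) smoothing with alpha=1.0 by default, which keeps a word that never appeared in one class from forcing that class's probability to zero. The training subjects and the query string are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_text = [
    "Win money now!", "Cheap loans available",
    "Meeting on Friday", "Your invoice is ready",
]
y_train = ["spam", "spam", "not spam", "not spam"]

vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_text)

# alpha=1.0 is the default; it is spelled out here for visibility.
model = MultinomialNB(alpha=1.0)
model.fit(X_train, y_train)

# Classes are stored in sorted order; predict_proba columns follow it.
print(model.classes_)

query = vectorizer.transform(["Win cheap money"])
probs = model.predict_proba(query)[0]
predicted = model.predict(query)[0]
print(dict(zip(model.classes_, probs)), predicted)
```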

Make Predictions and Evaluate

y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy:.2f}')

model.predict: Predicts a label for each email in the test set.
accuracy_score: Computes the fraction of test predictions that match the true labels.
print: Displays the accuracy to two decimal places.
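
Accuracy alone does not show which kind of mistake the model makes. For a spam filter, flagging a legitimate email is usually worse than missing a spam one, so a confusion matrix and per-class precision and recall can be useful additions. The labels and predictions below are made up purely to demonstrate the metric calls.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Hypothetical true labels and predictions, not results from the model above.
y_test = ["spam", "spam", "not spam", "not spam", "not spam", "spam"]
y_pred = ["spam", "not spam", "not spam", "not spam", "spam", "spam"]

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy:.2f}")

# Rows are true labels, columns are predicted labels, in the order given.
cm = confusion_matrix(y_test, y_pred, labels=["not spam", "spam"])
print(cm)

# Precision, recall, and F1 for each class.
print(classification_report(y_test, y_pred))
```

In this made-up case, cm[0][1] counts legitimate emails flagged as spam and cm[1][0] counts spam that slipped through.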

Classify New Email Function

def classify_new_email(new_email):
    """Classifies a new email as 'spam' or 'not spam'."""
    new_email_vectorized = vectorizer.transform([new_email])
    prediction = model.predict(new_email_vectorized)
    return prediction[0]

# Test with a new email subject
new_email = "Important information about your account"
result = classify_new_email(new_email)
print(f'The new email is classified as: {result}')

classify_new_email: Takes a new email, converts it to token counts with the already fitted vectorizer, and predicts its label.
new_email: An example email used to test the classification.
result: The predicted label, "spam" or "not spam".
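
A possible extension is to return how confident the model is along with the label. The helper below, classify_with_confidence, is a hypothetical name and not part of the repository; it relies on predict_proba, which MultinomialNB provides. The training subjects are the example subjects from this README, with lowercase labels assumed.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Stand-in training set built from the README's example subjects.
train_text = [
    "Win money now!", "Cheap loans available", "Get rich quick",
    "Earn money while you sleep", "Important information about your account",
    "Meeting on Friday", "Your invoice is ready", "Team lunch tomorrow",
]
y_train = ["spam"] * 4 + ["not spam"] * 4

vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(train_text), y_train)

def classify_with_confidence(new_email):
    """Hypothetical helper: returns (label, probability of that label)."""
    features = vectorizer.transform([new_email])
    probabilities = model.predict_proba(features)[0]
    best = probabilities.argmax()
    return model.classes_[best], float(probabilities[best])

label, confidence = classify_with_confidence("Get rich quick money")
print(f"{label} ({confidence:.2f})")
```

If an email contains no word from the training vocabulary, its feature row is all zeros and the prediction falls back to the class priors, so a probability near the prior is a hint that the model has little to go on.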

Classify Multiple Emails

# Classify every subject in test_data
for subject in test_data:
    label = classify_new_email(subject)
    print(f'"{subject}" is classified as: {label}')

for loop: Iterates through test_data and classifies each email with classify_new_email.
print: Displays the classification result for each email.
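
Both vectorizer.transform and model.predict accept a whole list, so the same results can be obtained with one call each instead of one call per subject. A minimal sketch, assuming test_data is a list of subject strings; the training set and test subjects here are illustrative stand-ins.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Stand-in training set built from the README's example subjects.
train_text = [
    "Win money now!", "Cheap loans available", "Get rich quick",
    "Earn money while you sleep", "Important information about your account",
    "Meeting on Friday", "Your invoice is ready", "Team lunch tomorrow",
]
y_train = ["spam"] * 4 + ["not spam"] * 4

vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(train_text), y_train)

# Hypothetical contents of test_data.json.
test_data = ["Cheap money now", "Lunch meeting on Friday"]

# Vectorize and classify all subjects in a single pass.
predictions = model.predict(vectorizer.transform(test_data))

for subject, label in zip(test_data, predictions):
    print(f'"{subject}" is classified as: {label}')
```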

Email Subjects Dataset Example

Email	Label
"Win money now!"	Spam
"Important information about your account"	Not Spam
"Cheap loans available"	Spam
"Meeting on Friday"	Not Spam
"Get rich quick"	Spam
"Your invoice is ready"	Not Spam
"Earn money while you sleep"	Spam
"Team lunch tomorrow"	Not Spam
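
The eight rows above are enough to run the whole workflow end to end. The sketch below uses scikit-learn's make_pipeline to bundle the vectorizer and the classifier into one object, which is a swapped-in convenience and not how the code in this README is organized. With so few examples the result only illustrates the mechanics, not real-world accuracy.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# The example rows from the table, labels kept as written there.
rows = [
    ("Win money now!", "Spam"),
    ("Important information about your account", "Not Spam"),
    ("Cheap loans available", "Spam"),
    ("Meeting on Friday", "Not Spam"),
    ("Get rich quick", "Spam"),
    ("Your invoice is ready", "Not Spam"),
    ("Earn money while you sleep", "Spam"),
    ("Team lunch tomorrow", "Not Spam"),
]
emails = [email for email, _ in rows]
labels = [label for _, label in rows]

# The pipeline vectorizes raw text and then classifies it.
pipeline = make_pipeline(CountVectorizer(), MultinomialNB())
pipeline.fit(emails, labels)

# Raw strings go straight in; no separate transform call is needed.
prediction = pipeline.predict(["Earn cheap money quick"])[0]
print(prediction)
```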
