Quora Insincere Question Classifier

Insincere Question Classification: Unmasking Deception on Quora

This repository contains the code and resources for a project on classifying insincere questions on Quora. The goal is to develop a machine learning model that accurately identifies insincere or deceptive questions on the platform.

Dataset

The project uses the Quora Insincere Questions Classification dataset, which is publicly available on Kaggle. The dataset is a large collection of Quora questions, each labeled as sincere or insincere, and it is the basis for training and evaluating our classification model.

Dataset Columns Description

qid: Unique question identifier
question_text: Text of the question
target: Binary label indicating whether the question is insincere (1) or sincere (0)
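
As a minimal sketch of how these three columns might be read and sanity-checked, the snippet below parses a few rows with Python's standard csv module. The sample rows, the SAMPLE_CSV constant, and the helper names load_questions and insincere_rate are hypothetical and used only for illustration; they are not taken from the project notebooks.

```python
import csv
import io

# Hypothetical sample rows laid out with the columns described above.
SAMPLE_CSV = """qid,question_text,target
q001,How do I learn linear algebra?,0
q002,Why is everyone who disagrees with me an idiot?,1
q003,What is a good book on statistics?,0
"""


def load_questions(handle):
    """Read rows from a CSV handle and convert target to an int label (0 or 1)."""
    rows = []
    for row in csv.DictReader(handle):
        label = int(row["target"])
        if label not in (0, 1):
            raise ValueError(f"unexpected target {label!r} for qid {row['qid']}")
        rows.append(
            {"qid": row["qid"], "question_text": row["question_text"], "target": label}
        )
    return rows


def insincere_rate(rows):
    """Fraction of rows labeled insincere (target == 1)."""
    return sum(r["target"] for r in rows) / len(rows) if rows else 0.0


rows = load_questions(io.StringIO(SAMPLE_CSV))
print(len(rows), round(insincere_rate(rows), 3))
```

To use the real data, an open file handle for the downloaded CSV can be passed in place of the in-memory sample.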

Methodology

The project follows a supervised learning approach to tackle the problem of insincere question classification. It involves several key steps, including data collection, data preprocessing, feature extraction, model selection, model training, and model evaluation.

Approach

Data Collection: The Quora Insincere Questions Classification dataset is downloaded from Kaggle, and its structure and features are explored to gain a better understanding of the data.
Data Preprocessing: Text data is preprocessed by tokenization, lowercasing, and removal of stop words and punctuation.
Feature Extraction: The preprocessed text will be transformed into numerical representations suitable for machine learning algorithms. We will use word embeddings, such as Word2Vec or GloVe, to convert the text into dense vectors that capture semantic relationships between words.
Model Selection and Training: We will explore several NLP models suitable for insincere question classification, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs) including long short-term memory (LSTM) networks, and transformer models like BERT. These models have shown promising results in NLP tasks and can capture complex patterns and dependencies in text data. We will train the candidate models on the labeled training dataset and select the one that performs best on the validation dataset.
Model Evaluation: The trained model will be evaluated on the test dataset using accuracy, precision, recall, and F1-score.
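
Two of the steps above, preprocessing and evaluation, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code from the project notebooks: the STOP_WORDS set is a tiny hypothetical list, tokenization is a simple whitespace split, and the helper names are invented for this example. Feature extraction and model training are not covered here.

```python
import string

# Tiny illustrative stop-word list (an assumption; a fuller list is typical in practice).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "do", "i", "why", "how", "what"}

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text):
    """Lowercase, strip punctuation, split on whitespace, and drop stop words."""
    cleaned = text.lower().translate(_PUNCT_TABLE)
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 for the positive (insincere) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


tokens = preprocess("How do I learn the basics of Linear Algebra?")
print(tokens)

# Hypothetical labels and predictions, used only to exercise the metric helpers.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred), precision_recall_f1(y_true, y_pred))
```

Reporting precision, recall, and F1 alongside accuracy shows how well the insincere class itself is detected, which accuracy alone does not reveal.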

Repository Structure

1. Data/: Holds the Quora Insincere Questions Classification dataset.
2. Notebooks/: Contains Jupyter notebooks with code for data collection, data preprocessing, NLP model development, and evaluation.
3. Models/: Contains the trained NLP model checkpoints.
4. Documentations/: Holds the project documentation.
5. Utils/: Includes utility functions and scripts used in the project.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

ibenjamin-ojo/Quora-Insincere-Question-Classifier
