
COVID-19 Fake News Detection Using a Multimodal Classification Approach

Introduction

The rapid growth of communication applications in the era of Web 2.0 has given people unlimited and unprecedented access to instantaneous information and news. This access comes not only through established news outlets and credible journalists, but also, and mostly, through social media platforms. As a result, it has become hard to verify both the validity of the information being shared and the integrity of the person sharing it. The problem became especially visible during the COVID-19 pandemic, when medical knowledge and advice on managing the effects of the disease at a micro level were of critical importance. Spreading false information and "fake news" did not only risk deepening political polarization in the long term and distorting and slowing epidemic-management efforts; it also directly endangered many lives and cost others. Framing this issue as a data mining problem, we can lean on the currently available machine learning models for text processing and classification to separate credible, true information from fake news.
A model that predicts the class of a given social media post can help fact-checking organizations automate parts of their work. It also gives academic and industrial researchers a basis for exploring best practices that improve the automatic detection of misinformation on social media platforms and help mitigate its spread.

Project Description

Our project uses COVID-19 news tweets posted during the height of the pandemic to train different machine learning models on a binary classification task, assigning each tweet to either the "fake" or the "real" class. The dataset was compiled and used by Patwa et al. (2020); the curated set consists of 10,700 tweets manually annotated as "fake" or "real".
The potential of natural language processing (NLP) and classical machine learning classifiers for labelling social media posts and articles as fake or real has become a topic of high interest and urgency for researchers, especially given the surge of information during the COVID-19 pandemic. Researchers are testing different combinations of attributes alongside the text, as well as various NLP models and machine learning classifiers. In our project, we explore the effectiveness of combining NLP pipelines for feature extraction with various classification models, and we compare their prediction accuracy against the same classifiers trained on plain word-count features, all on the task of classifying COVID-19-related tweets as either "fake" or "real".
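As a minimal sketch of the word-count baseline described above, the Python below turns each tweet into raw word counts (a bag of words) and classifies it with a multinomial Naive Bayes model with Laplace smoothing. The choice of Naive Bayes, the simple regex tokenizer, and the six toy tweets are assumptions made for illustration only. They are not taken from the Patwa et al. dataset or from this project's own code. An NLP-pipeline variant could keep the same `fit`/`predict` interface and replace the hypothetical `word_counts` helper with richer extracted features.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; a deliberately simple tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def word_counts(text: str) -> Counter:
    # Bag-of-words features: token -> raw count (hypothetical helper).
    return Counter(tokenize(text))


class CountNaiveBayes:
    """Multinomial Naive Bayes over raw word counts, with Laplace smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.log_prior, self.class_counts, self.token_totals = {}, {}, {}
        self.vocab = set()
        for c in self.classes:
            docs = [word_counts(t) for t, y in zip(texts, labels) if y == c]
            self.log_prior[c] = math.log(len(docs) / len(texts))
            merged = Counter()
            for d in docs:
                merged.update(d)  # pool token counts over all docs of class c
            self.class_counts[c] = merged
            self.token_totals[c] = sum(merged.values())
            self.vocab.update(merged)
        return self

    def predict_one(self, text):
        counts = word_counts(text)
        v = len(self.vocab)
        best, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            denom = self.token_totals[c] + self.alpha * v
            for tok, n in counts.items():
                if tok not in self.vocab:
                    continue  # skip words never seen during training
                score += n * math.log((self.class_counts[c][tok] + self.alpha) / denom)
            if score > best_score:
                best, best_score = c, score
        return best

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the gold labels.
    return sum(a == b for a, b in zip(y_true, y_pred)) / len(y_true)


# Toy, made-up tweets for illustration only (not from the real dataset).
train_texts = [
    "drinking bleach cures covid overnight",
    "garlic water kills the virus instantly",
    "5g towers spread the virus",
    "health ministry reports new confirmed cases today",
    "vaccination centres open for registration this week",
    "who publishes updated guidance on masks",
]
train_labels = ["fake", "fake", "fake", "real", "real", "real"]

model = CountNaiveBayes().fit(train_texts, train_labels)
preds = model.predict(["bleach cures the virus", "ministry confirms new cases"])
print(preds)  # ['fake', 'real']
print(accuracy(["fake", "real"], preds))  # 1.0
```

Naive Bayes is used here only because it is a common, dependency-free baseline for count features; the comparison described in this README would swap in other classifiers and feature extractors and report accuracy on a held-out split rather than on two toy sentences.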