VERITAS-NLI: Validation and Extraction of Reliable Information Through Automated Scraping and Natural Language Inference

This repository contains the artifacts for our paper, "VERITAS-NLI: Validation and Extraction of Reliable Information Through Automated Scraping and Natural Language Inference".

Our proposed solution uses real-time web scraping and Natural Language Inference (NLI) models for fake news detection:

Dynamic web scraping enables real-time retrieval of external knowledge for fact-checking headlines.
NLI models check whether the web-scraped text supports the claim made in an input headline, which determines the headline's veracity.
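As a rough illustration of the scrape-then-verify idea, the following Python sketch parses an already-fetched HTML page, splits its paragraphs into sentences, scores each sentence against the headline, and labels the headline by its best-supported sentence. It is not the repository's implementation: the repository has its own scraping code (e.g. `scraping_selenium.py`) and scores with FactCC or SummaC. Here `toy_support_score` is a hypothetical word-overlap placeholder for an NLI model, and `THRESHOLD`, `ParagraphExtractor`, and `verify_headline` are illustrative names.

```python
# Minimal sketch of scrape-then-verify; NOT the repository's implementation.
# Assumptions (hypothetical): the page HTML is already fetched, toy_support_score
# stands in for a real NLI model (FactCC/SummaC), and THRESHOLD is illustrative.
import re
from html.parser import HTMLParser

THRESHOLD = 0.5  # hypothetical decision threshold, not a tuned value


class ParagraphExtractor(HTMLParser):
    """Collects the text inside <p> tags of a scraped page."""

    def __init__(self) -> None:
        super().__init__()
        self._in_p = False
        self.paragraphs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.paragraphs[-1] += data


def split_sentences(text: str) -> list[str]:
    """Naive splitter: breaks after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def toy_support_score(premise: str, hypothesis: str) -> float:
    """Placeholder for an NLI model: share of hypothesis words present in the premise."""
    hyp = _tokens(hypothesis)
    return len(hyp & _tokens(premise)) / len(hyp) if hyp else 0.0


def verify_headline(headline: str, html: str) -> tuple[str, float]:
    """Label the headline REAL if its best-supporting sentence scores >= THRESHOLD."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    sentences = [s for p in parser.paragraphs for s in split_sentences(p)]
    best = max((toy_support_score(s, headline) for s in sentences), default=0.0)
    return ("REAL" if best >= THRESHOLD else "FAKE"), best


page = "<p>NASA launched the Artemis I mission in 2022. It orbited the Moon.</p>"
print(verify_headline("NASA launched Artemis I", page))  # ('REAL', 1.0)
print(verify_headline("Aliens landed in Paris", page))   # ('FAKE', 0.25)
```

Taking the maximum over sentences is only one simple aggregation choice; SummaC, for example, aggregates sentence-level NLI scores internally, so a real pipeline would pass it the scraped text rather than rely on a per-sentence maximum like this one.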

Visual Abstract

Directory Structure

├── Liar.csv  #LIAR dataset used to train the classical ML and BERT models
├── Test_dataset(FINAL).csv #Our new evaluation dataset of curated and synthetic headlines
├── classical_ml_EVAL
│   ├── Classical_ml_EVAL.csv #Predictions from our classical ML models on the evaluation dataset
│   └── *.ipynb #Notebooks used to train and evaluate the classical baseline models
│ 
├── classical_ml_LIAR
│   └── *.ipynb
│
├── BERT_EVAL
│   ├── BERT_eval.csv #Predictions from our fine-tuned BERT model on the evaluation dataset
│   └── BERT_eval.ipynb #Notebook to compute baseline BERT predictions and results.
│ 
├── Pipeline_Article.ipynb
├── Pipeline_QNA.ipynb
├── Pipeline_SLM(Mistral).ipynb
├── Pipeline_SLM(Phi).ipynb
│ 
├── FactCC_Results #Contains the results for our pipelines utilizing FactCC as the NLI model
│   ├── Pipeline_Article.csv
│   ├── Pipeline_QNA.csv
│   ├── Pipeline_SLM(Mistral).csv
│   └── Pipeline_SLM(Phi).csv
├── Pipeline_Results_FactCC.ipynb #Computation of metrics for the pipelines utilizing FactCC
│ 
├── SummaC_Results  #Contains the results for our pipelines utilizing SummaC (ZS and Conv) as the NLI model
│   ├── Pipeline_Article.csv
│   ├── Pipeline_QNA.csv
│   ├── Pipeline_SLM(Mistral).csv
│   └── Pipeline_SLM(Phi).csv
├── Pipeline_Results_SummaC.ipynb #Computation of the SummaC threshold and metrics for the pipelines utilizing SummaC (ZS and Conv)
│ 
├── Efficiency test
│   ├── Efficiency_Test.ipynb #Computes the average execution time for each step of our pipelines
│   └── *.csv #Execution-time results for each of our pipelines and their configurations
│
├── unique_decisions.ipynb #Used to generate the Venn diagrams of unique correct/incorrect decisions
├── scraping_selenium.py #Selenium function used for web scraping in the QNA and LLM pipelines
└── requirements.txt
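`Pipeline_Results_SummaC.ipynb` computes a decision threshold for the SummaC scores before reporting metrics. This README does not describe how that threshold is chosen, so the following is only a minimal sketch of one common approach, assuming that a higher score means stronger support, that label 1 marks a real headline, and that the threshold is picked by accuracy on labelled data. The function names `predict`, `metrics`, and `best_threshold` are hypothetical, and the scores and labels are made-up toy values, not results.

```python
# Sketch of threshold selection plus metrics; NOT the notebook's exact procedure.
# Assumptions (hypothetical): scores are NLI consistency scores (e.g. from SummaC),
# label 1 = real headline, and a headline is predicted real when score >= threshold.


def predict(scores: list[float], threshold: float) -> list[int]:
    """Binary predictions: 1 (real) when the score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]


def metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, precision, recall and F1 with 'real' (1) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def best_threshold(scores: list[float], y_true: list[int]) -> float:
    """Try every observed score as a threshold; keep the most accurate (ties -> lowest)."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: (metrics(y_true, predict(scores, t))["accuracy"], -t))


# Toy, made-up scores and labels purely for illustration.
scores = [0.91, 0.15, 0.78, 0.40, 0.66, 0.05]
labels = [1, 0, 1, 1, 0, 0]
threshold = best_threshold(scores, labels)
print(threshold)  # 0.4
print(metrics(labels, predict(scores, threshold)))
```

The counts are spelled out to make the metric definitions explicit; in practice scikit-learn's `accuracy_score` and `precision_recall_fscore_support` compute the same quantities. Choosing the threshold on the same data used for reporting can overstate performance, which a held-out split avoids.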

Figure: Workflow of our three proposed pipelines, illustrated with an example input headline.

Preprint: https://arxiv.org/abs/2410.09455

GitHub repository: Hetens/VERITAS-NLI