This repository includes the meat of a pipeline that generates a reliable news database. This database is meant to be employed by an LLM-powered retrieval augmented generation browser extension to find low-credibility posts on Facebook and generate bridging conversations.

mr-devs/reliable_news_db

reliable_news_db

This repository includes the meat of a pipeline that generates a reliable news database. This database is meant to be employed by an AI browser extension, powered by retrieval-augmented LLMs, that finds low-credibility posts on Facebook and allows users to generate bridging responses rooted in reliable news.

Note:

  • Some of this code may need tweaking and/or cleaning up. The production version of the linked extension was moved to another repository.
  • This code requires paid API keys for serp (to collect Google News links) as well as OpenAI (to summarize article text).

Overview of the pipeline

  1. serp is used to search for recent Google News articles in the US from a specific list of domains.
  2. The text of these articles is programmatically scraped.
     • We use newspaper3k to do this automatically. Scraping works as of July 2024 for the current list of domains. If you change the list, or if a great deal of time has passed, the scraping process may no longer work. You should check your data!
  3. Each article is summarized using OpenAI's GPT-3.5 Turbo.
  4. Summary text is inserted into a vector database for fast semantic search by the browser extension.
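
Since the scraping step can silently break when sites change, a quick sanity check on the scraped text may be useful before summarizing. The following is a minimal sketch only: the `ScrapedArticle` record, the `find_suspect_articles` helper, and the 200-character threshold are all assumptions made for illustration and are not taken from this repository's code.

```python
from dataclasses import dataclass


@dataclass
class ScrapedArticle:
    # Hypothetical record shape; the repository's actual schema may differ.
    url: str
    text: str


def find_suspect_articles(articles, min_chars=200):
    """Return (url, reason) pairs for articles whose text looks unusable.

    min_chars is an arbitrary illustrative threshold, not a tuned value.
    """
    suspects = []
    for article in articles:
        text = article.text.strip()
        if not text:
            suspects.append((article.url, "empty text"))
        elif len(text) < min_chars:
            suspects.append((article.url, f"only {len(text)} characters"))
    return suspects


# Made-up sample data: one plausible article, one empty, one paywall stub.
sample = [
    ScrapedArticle("https://example.com/a", "word " * 100),
    ScrapedArticle("https://example.com/b", ""),
    ScrapedArticle("https://example.com/c", "Subscribe to continue reading."),
]

for url, reason in find_suspect_articles(sample):
    print(f"{url}: {reason}")
```

A sudden rise in flagged articles for one domain is a reasonable hint that the scraper no longer handles that site's layout.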

The entire pipeline is run by a single bash script: code/collect_summarize_update_vdb.sh
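
The script itself is not reproduced here. As a rough illustration of how the four stages feed into one another, the sketch below chains them in Python with each stage passed in as a callable. The function `run_pipeline`, its parameter names, and the stand-in stages are hypothetical; a real run would plug in serp, newspaper3k, OpenAI, and a vector database client instead.

```python
def run_pipeline(domains, search, scrape, summarize, index):
    """Chain the four stages; each stage is supplied by the caller."""
    indexed = 0
    for url in search(domains):      # 1. find recent article links
        text = scrape(url)           # 2. download and extract article text
        if not text:
            continue                 # skip articles that failed to scrape
        summary = summarize(text)    # 3. condense the article
        index(url, summary)          # 4. store the summary for semantic search
        indexed += 1
    return indexed


# Stand-in stages so the sketch runs without network access or API keys.
store = {}
fake_pages = {
    "https://example.com/a": "Long article text.",
    "https://example.com/b": "",  # simulates a failed scrape
}

count = run_pipeline(
    domains=["example.com"],
    search=lambda domains: list(fake_pages),
    scrape=lambda url: fake_pages[url],
    summarize=lambda text: text[:10],
    index=store.__setitem__,
)
print(count, store)
```

Passing the stages in as arguments keeps the control flow testable on its own, separate from the paid services each stage depends on.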

Repository structure

  • code/: contains all code/scripts
  • data/: contains all data
