This repository has been archived by the owner on Jul 5, 2022. It is now read-only.

Access to quality, relevant data on what retail investors are talking about


fischersean/phish-food


PhishFood

This project has migrated to sourcehut


Like the Ben & Jerry's Ice Cream

About

PhishFood is the source code for the ETL pipeline and associated cloud infrastructure required to produce the TheKettle database. The project aims to provide reliable, high-quality data that end users can trust. At its core, the pipeline and database attempt to collect and summarize what is "Hot" on Reddit's most popular trading subreddits.

Below is an example database entry (NoSQL version):

{
    "id": "wallstreetbets_20210408",
    "hour": 18,
    "data": [
        {
            "Stock": {
                "Symbol": "GME",
                "FullName": "GameStop Corporation Common Stock",
                "Exchange": "NYSE"
            },
            "Count": {
                "PostScore": 16852,
                "CommentScore": 3306,
                "TotalScore": 975.1710719570256,
                "PostMentions": 2,
                "CommentMentions": 50
            }
        }
    ]
}
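An entry like this can be read with ordinary JSON tooling. The sketch below is a minimal illustration, not code from PhishFood. It assumes every id follows the "<subreddit>_<YYYYMMDD>" pattern seen in this single example, and the helper names split_id and top_symbols are hypothetical. How TotalScore is derived from the other counts is not described here, so the sketch only reads that field and ranks stocks by it.

```python
import json
from datetime import date, datetime

# The example entry above, in compact form.
ENTRY_JSON = """
{
    "id": "wallstreetbets_20210408",
    "hour": 18,
    "data": [
        {
            "Stock": {"Symbol": "GME", "FullName": "GameStop Corporation Common Stock", "Exchange": "NYSE"},
            "Count": {"PostScore": 16852, "CommentScore": 3306, "TotalScore": 975.1710719570256,
                      "PostMentions": 2, "CommentMentions": 50}
        }
    ]
}
"""


def split_id(entry_id: str) -> tuple[str, date]:
    """Split an id such as 'wallstreetbets_20210408' into (subreddit, date).

    Assumption: ids use the '<subreddit>_<YYYYMMDD>' layout of the example.
    rsplit keeps subreddit names that contain '_' intact.
    """
    subreddit, day = entry_id.rsplit("_", 1)
    return subreddit, datetime.strptime(day, "%Y%m%d").date()


def top_symbols(entry: dict, n: int = 10) -> list[tuple[str, float]]:
    """Return up to n (Symbol, TotalScore) pairs, highest TotalScore first."""
    ranked = sorted(entry["data"], key=lambda d: d["Count"]["TotalScore"], reverse=True)
    return [(d["Stock"]["Symbol"], d["Count"]["TotalScore"]) for d in ranked[:n]]


entry = json.loads(ENTRY_JSON)
subreddit, day = split_id(entry["id"])
print(subreddit, day, entry["hour"], top_symbols(entry))
```

Sorting on TotalScore rather than raw mention counts simply mirrors the one aggregate score the entry exposes; swap the sort key if another field suits your use better.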

Currently there are 3 supported subreddits:

  • stocks
  • wallstreetbets
  • investing
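If the ids really do follow the "<subreddit>_<YYYYMMDD>" pattern of the example entry (an assumption drawn from one sample, not a documented format), the keys for all three supported subreddits on a given day could be built as follows. The function name entry_ids is hypothetical.

```python
from datetime import date

# The three supported subreddits listed above.
SUBREDDITS = ("stocks", "wallstreetbets", "investing")


def entry_ids(day: date) -> list[str]:
    """Build one id per supported subreddit for the given day.

    Assumption: ids look like 'wallstreetbets_20210408', as in the example entry.
    """
    return [f"{sub}_{day:%Y%m%d}" for sub in SUBREDDITS]


print(entry_ids(date(2021, 4, 8)))
```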

Get the SQLite Version of the database here
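The SQLite schema is not described here, so a sensible first step after downloading the file is to list its tables. The sketch below relies only on SQLite's built-in sqlite_master catalog and Python's standard sqlite3 module; the function name list_tables is hypothetical.

```python
import sqlite3
from contextlib import closing


def list_tables(db_path: str) -> list[str]:
    """Return the names of all tables in a SQLite database file, sorted by name."""
    # sqlite_master is SQLite's own catalog table, so no schema knowledge is needed.
    # closing() ensures the connection is closed, not just committed.
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Call it with the path of the downloaded file, for example list_tables("thekettle.db") (a hypothetical filename). Note that sqlite3.connect creates an empty database if the path does not exist, so an empty result may simply mean a wrong path.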

Why?

It was widely reported during the GameStop hype that hedge funds were setting up or buying applications to scrape Reddit for the latest trending stock data. I thought it would be helpful to a retail trader to have access to the same data.
