Bunyip is a Chrome extension that detects AI-generated text. It helps users spot fake news articles that may have been generated automatically rather than written by a real human!

CT83/Bunyip

bunyip demo

You can install the extension from the Chrome Web Store: Bunyip - Detect all the Glitter in the Wild

As with several of my older projects, the initial aim with Bunyip was to monetize it.

But I could not find a market for it, and for now I decided against making it a paid extension on the Chrome Web Store, so I open sourced it instead.

Working

  1. The selected text is sent to a Serverless Function for classification.
  2. The response contains the words, along with the likelihood of each word having been generated by an AI.
  3. The extension then visualizes these words, using different colors to correspond to different probabilities.
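
As a rough sketch of step 3 (not the extension's actual code, which runs in the browser), the Python snippet below maps each word's probability to a display bucket. The response shape (a list of word/probability pairs), the thresholds, and the color names are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical thresholds and colors, chosen only for illustration.
# Each entry is (minimum probability, color), ordered from highest to lowest.
BUCKETS = [
    (0.75, "red"),
    (0.50, "orange"),
    (0.25, "yellow"),
    (0.00, "green"),
]


def color_for(probability: float) -> str:
    """Return the color of the first bucket whose threshold is met."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError(f"probability out of range: {probability}")
    for threshold, color in BUCKETS:
        if probability >= threshold:
            return color
    return BUCKETS[-1][1]  # not reached for valid input; kept as a safe default


def annotate(words: List[Tuple[str, float]]) -> List[Tuple[str, str]]:
    """Pair every word with the color it should be highlighted in."""
    return [(word, color_for(probability)) for word, probability in words]
```

Calling `annotate` on the classifier response would give the UI a list of words paired with the color each one should be rendered in.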

bunyip-screencap

This project builds on the Giant Language Model Test Room (GLTR), which enables forensic analysis of how likely it is that a text was generated by an automatic system. - Live Demo

Components

Three components make up Bunyip.

  1. Bunyip - Chrome Extension

    This simply sends the selected text to the GCP Cloud Function proxy, which then forwards it to GLTR.

  2. Serverless Proxy running on Google Cloud Platform

    Every Algorithmia REST call must include an API key. The only way I could think of to avoid hardcoding the key in the Chrome extension was to put a proxy in between, hence this workaround.

  3. Modified Version of GLTR - A tool to detect automatically generated text

    This is deployed on Algorithmia's serverless environment and exposed through a REST API. The GCP function calls it internally and returns the response to the Chrome extension.

bunyip

How did I go about making it?

Step 1 - Analyzing the problem statement at hand

The goal: create a Chrome extension that detects whether the selected text was generated by an AI.

I made a list of everything I needed to learn: Chrome extensions, serverless deployment, GCP Cloud Functions, and the GLTR integration.

Step 2 - Getting GLTR up and running locally

This was way easier than I thought it would be, and everything worked in a jiffy. I installed the requirements, started the Flask server, and used Postman to test everything locally.

Step 3 - Creating the Chrome Extension

extension

This was the easiest but most time-consuming part of the process. The UI took longer to make than I expected, but the results were impressive!

Step 4 - Deploying the Flask App onto a Serverless Cloud Platform

This was super tricky and I touch more on this in the Challenges section.

Step 5 - Publishing the Extension to Chrome Webstore

Documentation on how to do this was pretty clear, so I was able to power through this.

Challenges

Deployment is always a doozy

Yes, one of the most understated parts of building Bunyip was the overwhelming amount of extra work needed to make it run in the wild and not just on my laptop. Deploying the entire setup somewhere cheap and scalable was the major challenge.

1. Models cannot be directly deployed onto Serverless functions

I had assumed that I would just be able to directly deploy my entire app to some Serverless Environment and everything would be a breeze, well....

Turns out the PyTorch package, which is needed to run the model, was over 500 MB, which made it too big for AWS Lambda functions and GCP Cloud Functions to handle.

lambda-limit
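
A quick way to catch this kind of problem before deploying is to measure the installed dependencies and compare them against the platform's limit. The sketch below is only an illustration: the function names are hypothetical, and the default limit is an assumption based on the 250 MB unzipped deployment-package limit that AWS has documented for zip-based Lambda functions, so check the current quotas for whichever platform you target.

```python
import os

MB = 1024 * 1024

# Assumed limit for illustration; verify against the platform's current quotas.
ASSUMED_LIMIT_BYTES = 250 * MB


def directory_size_bytes(root: str) -> int:
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def fits_limit(size_bytes: int, limit_bytes: int = ASSUMED_LIMIT_BYTES) -> bool:
    """Return True when a package of size_bytes is within limit_bytes."""
    return size_bytes <= limit_bytes
```

Pointing `directory_size_bytes` at the folder where the requirements were installed, and passing the result to `fits_limit`, shows whether a dependency of 500 MB or more has already ruled the platform out.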

Then, I thought about deploying the Flask App to AWS EC2 instances instead.

ec2-pricing

But I realized I was going to need at least a t2.large instance, which was more than I wanted to spend on a side project.

Then I stumbled across Algorithmia, which lets you wrap your Python code in a REST API, complete with authentication, hosting, logging, client-side libraries for all major languages, and so much more!

With a little bit of refactoring and after a few tries, I was able to get my app running on it. The next step was simply making POST calls to it from my Chrome extension.

2. Accessing the Algorithmia API without hardcoding the API Keys in the Chrome Extension

Algorithmia requires an API key with every request. Traditionally, this would mean the Bunyip Chrome extension would have to include it. But I didn't think it was wise to expose my credentials to the whole internet!

I got around this by creating a simple proxy function and deploying it as a GCP Cloud Function. The proxy makes authenticated calls on behalf of the browser and returns the responses, which keeps my API key out of the extension's code.
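
The core of such a proxy can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the deployed function: the upstream URL is a placeholder, the `UPSTREAM_API_KEY` environment variable and the `{"text": ...}` payload shape are hypothetical, and the `Authorization: Simple <key>` header scheme is assumed rather than taken from this project.

```python
import json
import os
import urllib.request
from typing import Callable

# Placeholder endpoint; the real upstream URL is not part of this sketch.
UPSTREAM_URL = "https://example.invalid/classify"


def build_upstream_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the authenticated POST that the proxy sends upstream."""
    body = json.dumps({"text": text}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        UPSTREAM_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Simple {api_key}",  # assumed header scheme
        },
        method="POST",
    )


def _send(request: urllib.request.Request) -> bytes:
    """Perform the HTTP call and return the raw response body."""
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read()


def proxy(text: str, send: Callable[[urllib.request.Request], bytes] = _send) -> bytes:
    """Forward text upstream using a key that only the server can read."""
    api_key = os.environ["UPSTREAM_API_KEY"]  # hypothetical variable name
    return send(build_upstream_request(text, api_key))
```

The browser only ever talks to `proxy`, so the key stays in the server's environment. The `send` parameter exists so the function can be exercised without network access.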

Motivation

Andrej Karpathy tweeted this, and I thought, "Yes! That's something I could actually do!"

karpathy_tweet_1192169928079503360

So I did!

_Rohan_Sawant__tweet_1205442557041229824

References

  • This project builds on the strong foundation provided by the Giant Language Model Test Room (GLTR), built by Hendrik Strobelt, Sebastian Gehrmann and Alexander M. Rush. GLTR enables forensic analysis of how likely it is that a text was generated by an automatic system.

  • You can find the GLTR instance deployed as an API on Algorithmia - bunyip-gpt-detector

  • You can find OpenAI's original GPT Detector, deployed as an API here - gpt-detector

Credits