Sentiment Analysis as a Service


Getting Started

The goal of this project is to build a sentiment analysis microservice that takes a new EDGAR file in JSON format and generates a sentiment for each statement in the referenced EDGAR file.

To build this service, we train a sentiment analysis model on labeled EDGAR datasets and deploy the model using a Flask app.

Architecture Diagram

[Image: architecture diagram]

Annotation Pipeline

  • We built an annotation pipeline that ingests 44 earnings call files from various companies.
  • We pre-process each file to remove extra white space and special characters.
  • After pre-processing, each earnings call file is a list of sentences.
  • We use IBM Watson to label each sentence with a sentiment score.
  • We then normalize the scores to a scale from -1 (negative) to 1 (positive).
  • Finally, we write the sentences and their scores to a CSV file, label each sentence as positive or negative, and push the data to an S3 bucket.
  • We use Airflow to automate the workflow by creating tasks that install the libraries and run our Python file.
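
The pre-processing, normalization, and labeling steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the set of characters kept, the linear rescaling, and the threshold of 0 for the positive/negative label are all assumptions.

```python
import re
from typing import List


def split_into_sentences(text: str) -> List[str]:
    """Collapse white space, drop special characters, and split into sentences."""
    text = re.sub(r"\s+", " ", text).strip()
    # Assumption: keep letters, digits, spaces, and basic punctuation only.
    text = re.sub(r"[^A-Za-z0-9 .,!?'%$-]", "", text)
    # Split after sentence-ending punctuation followed by white space.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return [s.strip() for s in sentences if s.strip()]


def normalize_score(score: float, low: float, high: float) -> float:
    """Linearly rescale a score from the range [low, high] to [-1, 1]."""
    if high == low:
        return 0.0
    return 2.0 * (score - low) / (high - low) - 1.0


def label_for(score: float) -> str:
    """Map a normalized score to a label (threshold at 0 is an assumption)."""
    return "positive" if score >= 0 else "negative"
```

Each sentence returned by `split_into_sentences` would be scored, passed through `normalize_score`, and then labeled with `label_for` before being written to the CSV file.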

Training Pipeline

  • We fetch the labelled CSV from the S3 bucket, where it was stored by the Annotation Pipeline.
  • We fine-tune a BERT model for sentiment analysis, training and validating it on our labelled data.
  • The model reached an accuracy of 92%.
  • We save the model in our BERT folder so that the Flask app can load it.
  • We use Airflow to automate the workflow by creating tasks that install the libraries and run our Python file.
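
As a rough sketch of the data handling around fine-tuning, the snippet below loads a labelled CSV, splits it into training and validation sets, and computes accuracy. The column names `sentence` and `label`, the 80/20 split, and the fixed seed are assumptions for illustration; the BERT fine-tuning step itself is not shown.

```python
import csv
import io
import random
from typing import List, Sequence, Tuple


def load_labelled_rows(csv_text: str) -> List[Tuple[str, int]]:
    """Parse CSV text into (sentence, label) pairs, with 1 = positive, 0 = negative."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: the CSV has "sentence" and "label" columns.
    return [(row["sentence"], 1 if row["label"] == "positive" else 0) for row in reader]


def train_validation_split(rows: Sequence, validation_fraction: float = 0.2, seed: int = 42):
    """Shuffle deterministically and hold out a fraction of rows for validation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)
```

The validation rows would be passed through the fine-tuned model, and `accuracy` applied to the resulting predictions.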

Microservice

Prerequisites

Your development and production environments are built with Docker. Install Docker Desktop for your OS.

To verify that Docker is installed, run docker --version.

Simple Case: One Container

This directory contains a Dockerfile, the blueprint for our development environment, and requirements.txt, which lists the Python dependencies.

We used the following Dockerfile to create our Docker image:

  • It pulls a TensorFlow base image that matches the TensorFlow version we need.
ARG BASE_IMG=tensorflow/tensorflow:2.1.0-py3-jupyter
FROM $BASE_IMG
ARG PROJECT_ROOT="."
ARG PROJECT_MOUNT_DIR="/"
ADD $PROJECT_ROOT $PROJECT_MOUNT_DIR
WORKDIR $PROJECT_MOUNT_DIR
RUN pip install --upgrade pip && \
    pip install -r requirements.txt
ENTRYPOINT [ "python" ]
CMD [ "/app.py" ]

To serve the provided pre-trained model, follow these steps:

  1. Clone this repo with git clone
  2. cd Assignment_2/Microservices/app/
  3. docker build -t assign:latest . -- this uses the Dockerfile in . (the current directory) to build our Docker image and tags the image as assign:latest
  4. docker run -it --rm -p 5000:5000 assign -- this runs a Docker container from the image we built

If everything worked properly, you should now have a container running, which:

  1. Spins up a Flask server that accepts POST requests at http://0.0.0.0:5000/predict
  2. Runs our BERT sentiment classifier on the "data" field of the request (which should be a list of text strings: e.g. '{"data": ["this is the best!", "this is the worst!"]}')
  3. Returns a response with the model's prediction for each string (a score between 0 and 1, where values near 1 indicate positive sentiment and values near 0 indicate negative sentiment)

To test this, write your own POST request (e.g. using Postman or curl). Here is an example response:
{
    "input": {
        "data": [
            "this is the best!",
            "this is the worst!"
        ]
    },
    "pred": [
        [
            0.9935178756713867
        ],
        [
            0.6359626054763794
        ]
    ]
}
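
A minimal Python client for the /predict endpoint might look like the sketch below, using only the standard library. It assumes the container is reachable at localhost:5000, that the Flask app accepts a JSON body with a Content-Type of application/json, and that a threshold of 0.5 is a reasonable way to turn scores into 1/0 labels; none of these are confirmed by the service code.

```python
import json
import urllib.request
from typing import List

# Assumption: the container's port 5000 is published on localhost.
PREDICT_URL = "http://localhost:5000/predict"


def build_predict_request(sentences: List[str], url: str = PREDICT_URL) -> urllib.request.Request:
    """Build a POST request whose JSON body has the "data" field the service reads."""
    body = json.dumps({"data": sentences}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(sentences: List[str], url: str = PREDICT_URL) -> dict:
    """Send the request and decode the JSON response (requires the container to be running)."""
    with urllib.request.urlopen(build_predict_request(sentences, url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def scores_to_labels(response: dict, threshold: float = 0.5) -> List[int]:
    """Convert the nested "pred" scores into 1/0 labels (threshold is an assumption)."""
    return [1 if row[0] >= threshold else 0 for row in response["pred"]]
```

With the container running, `predict(["this is the best!", "this is the worst!"])` should return a dictionary shaped like the example response above.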

Inference Pipeline

  • We used FastAPI to test the inference pipeline.
  • Running the fastapi.py file returns the transcript for whichever of the 8 companies we provide as input.
  • We preprocess the transcript into a list of sentences to send to our Flask app.
  • The BERT model returns a prediction for every sentence, and we write the sentences and their respective predictions to a CSV file.
  • We use Airflow to automate the workflow by creating tasks that install the libraries and run our Python file.
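
The last step, pairing each sentence with its prediction and writing the result to CSV, could be sketched as follows. The function names and the column headers `sentence` and `score` are hypothetical; the nested `pred` shape follows the example response in the Microservice section.

```python
import csv
from typing import List, TextIO


def flatten_predictions(response: dict) -> List[float]:
    """Turn the nested "pred" field ([[score], [score], ...]) into a flat list."""
    return [row[0] for row in response["pred"]]


def write_predictions(sentences: List[str], scores: List[float], out: TextIO) -> int:
    """Write one CSV row per sentence and return the number of rows written."""
    if len(sentences) != len(scores):
        raise ValueError("sentences and scores must have the same length")
    writer = csv.writer(out)
    # Assumption: column names chosen for illustration.
    writer.writerow(["sentence", "score"])
    for sentence, score in zip(sentences, scores):
        writer.writerow([sentence, score])
    return len(sentences)
```

In practice `out` would be a file opened with `open(path, "w", newline="")`, which is the mode the `csv` module documentation recommends for writers.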

Project Structure

Assignment_2/
├── Annotation_pipeline/
│   └── dags/
│       ├── annotation_pipeline.py
│       └── preprocessing.py
├── Inference_pipeline/
│   ├── app.py
│   ├── CompanyList.csv
│   ├── dags/
│   │   └── inference_pipeline.py
│   ├── fastapi.py
│   ├── inference-data/
│   │   ├── ACFN
│   │   ├── BLFS
│   │   ├── BMMJ
│   │   ├── CELTF
│   │   ├── GHLD
│   │   ├── IRIX
│   │   ├── KGFHF
│   │   └── TME
│   ├── main.py
│   └── requirements.txt
├── Microservices/
│   └── app/
│       ├── __init__.py
│       ├── app.py
│       ├── bert/
│       ├── Dockerfile
│       └── requirements.txt
├── README.md
├── requirements.txt
├── sec-edgar/
│   └── call_transcripts/
└── Training_pipeline/
    └── dags/
        ├── bert.py
        └── ml_pipeline.py

Team Members:

  1. Nidhi Goyal
  2. Kanika Damodarsingh Negi
  3. Rishvita Reddy Bhumireddy

Citation: