The goal of this project is to build a sentiment-analysis microservice that takes a new EDGAR file in JSON format and generates a sentiment for each statement in the referenced EDGAR file.
To build this service, we train a sentiment-analysis model on labeled EDGAR datasets and deploy the model behind a Flask app.
- We built an annotation pipeline by ingesting 44 earnings-call files from various companies.
- We pre-processed each file to remove extra whitespace and special characters.
- Each earnings-call file is then represented as a list of sentences.
- We used IBM Watson to label each sentence with a sentiment score.
- We normalized the scores to a scale from -1 (negative) to 1 (positive).
- We wrote the sentences, their scores, and their positive/negative labels to a CSV file, and pushed the data to an S3 bucket.
- We used Airflow to automate the workflow, creating tasks that install the required libraries and run our Python file.
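The pre-processing and normalization steps above can be sketched as follows; the regular expressions and the assumption that raw scores fall in [0, 1] are illustrative, not taken from the repo:

```python
import re

def preprocess(text):
    """Strip special characters, collapse whitespace, and split into sentences."""
    text = re.sub(r"[^A-Za-z0-9.!?'\s]", " ", text)  # drop special characters
    text = re.sub(r"\s+", " ", text).strip()         # collapse runs of whitespace
    # split after sentence-ending punctuation
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]

def normalize(score, lo=0.0, hi=1.0):
    """Rescale a raw sentiment score from [lo, hi] to [-1, 1]."""
    return 2 * (score - lo) / (hi - lo) - 1

print(preprocess("Hello,   world! Great  results."))  # → ['Hello world!', 'Great results.']
print(normalize(0.5))                                 # → 0.0
```

Sentences that normalize above 0 are labelled positive and the rest negative before the CSV is written.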
- We fetched the labelled CSV from the S3 bucket, as generated by the annotation pipeline.
- We fine-tuned a BERT model on the labelled data to perform sentiment analysis, using it for both training and validation.
- The model reached an accuracy of 92%.
- We saved the model to our BERT folder so it can be loaded by the Flask app.
- We used Airflow to automate the workflow, creating tasks that install the required libraries and run our Python file.
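A minimal sketch of how the Airflow automation described above might look, assuming Airflow 2.x; the DAG id, task ids, paths, and schedule are illustrative, not taken from the repo:

```python
from datetime import datetime
from airflow import DAG
from airflow.operators.bash import BashOperator

# Two tasks: install dependencies, then run the training script.
with DAG(
    dag_id="training_pipeline",
    start_date=datetime(2021, 1, 1),
    schedule_interval=None,  # triggered manually
    catchup=False,
) as dag:
    install_libraries = BashOperator(
        task_id="install_libraries",
        bash_command="pip install -r /opt/airflow/requirements.txt",
    )
    run_training = BashOperator(
        task_id="run_training",
        bash_command="python /opt/airflow/dags/bert.py",
    )
    install_libraries >> run_training  # install first, then train
```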
Your development and production environments are built with Docker. Install Docker Desktop for your OS. To verify that Docker is installed, run `docker --version`.
In this directory, we have a `Dockerfile`, a blueprint for our development environment, and a `requirements.txt` that lists the Python dependencies.
We used the following `Dockerfile` to create our Docker image. It pulls a TensorFlow base image that satisfies our TensorFlow version:

```dockerfile
ARG BASE_IMG=tensorflow/tensorflow:2.1.0-py3-jupyter
FROM $BASE_IMG
ARG PROJECT_ROOT="."
ARG PROJECT_MOUNT_DIR="/"
ADD $PROJECT_ROOT $PROJECT_MOUNT_DIR
WORKDIR $PROJECT_MOUNT_DIR
RUN pip install --upgrade pip && \
    pip install -r requirements.txt
ENTRYPOINT [ "python" ]
CMD [ "/app.py" ]
```
To serve the provided pre-trained model, follow these steps:

- `git clone` this repo, then `cd assignment_2/microservices/app/`
- `docker build -t assign:latest .` -- this references the `Dockerfile` at `.` (the current directory) to build our Docker image and tags the image with `assign:latest`
- `docker run -it --rm -p 5000:5000 assign` -- this runs a Docker container from the image we built
If everything worked properly, you should now have a container running, which:

- Spins up a Flask server that accepts POST requests at http://0.0.0.0:5000/predict
- Runs our BERT sentiment classifier on the `"data"` field of the request (which should be a list of text strings, e.g. `'{"data": ["this is the best!", "this is the worst!"]}'`)
- Returns a response with the model's prediction (1 = positive sentiment, 0 = negative sentiment)
To test this, you can:

- Write your own POST request (e.g. using Postman or `curl`); here is an example response:

```json
{
  "input": {
    "data": [
      "this is the best!",
      "this is the worst!"
    ]
  },
  "pred": [
    [0.9935178756713867],
    [0.6359626054763794]
  ]
}
```
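The request and response shapes can be handled with the standard library alone; in this sketch the scores are copied from the example response, and actually sending the payload (e.g. with `requests.post` against the running container) is assumed:

```python
import json

# Build the request body for POST http://0.0.0.0:5000/predict
payload = json.dumps({"data": ["this is the best!", "this is the worst!"]})

# Example response body, copied from the sample response above
body = ('{"input": {"data": ["this is the best!", "this is the worst!"]},'
        ' "pred": [[0.9935178756713867], [0.6359626054763794]]}')
response = json.loads(body)

# Each prediction arrives as a one-element list; flatten to per-sentence scores
scores = [p[0] for p in response["pred"]]
print(scores)  # → [0.9935178756713867, 0.6359626054763794]
```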
- We used FastAPI to run the inference-pipeline test.
- Running the `fastapi.py` file fetches the transcript for whichever of the 8 companies we supply as input.
- We pre-process the transcript into a list of sentences to send to our Flask app.
- We then collect the output of our BERT model for every sentence and write each sentence with its prediction to a CSV file.
- We used Airflow to automate the workflow, creating tasks that install the required libraries and run our Python file.
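The final CSV-writing step above can be sketched as follows; the file name and column headers are illustrative, not taken from the repo:

```python
import csv

def write_predictions(sentences, predictions, path="predictions.csv"):
    """Write each sentence alongside its model prediction to a CSV file."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["sentence", "prediction"])  # header row
        for sentence, pred in zip(sentences, predictions):
            writer.writerow([sentence, pred])

# Example: two sentences with their 1 = positive / 0 = negative predictions
write_predictions(["this is the best!", "this is the worst!"], [1, 0])
```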
```
Assignment_2/
├── Annotation_pipeline/
│   └── dags/
│       ├── annotation_pipeline.py
│       └── preprocessing.py
├── Inference_pipeline/
│   ├── app.py
│   ├── CompanyList.csv
│   ├── dags/
│   │   └── inference_pipeline.py
│   ├── fastapi.py
│   ├── inference-data/
│   │   ├── ACFN
│   │   ├── BLFS
│   │   ├── BMMJ
│   │   ├── CELTF
│   │   ├── GHLD
│   │   ├── IRIX
│   │   ├── KGFHF
│   │   └── TME
│   ├── main.py
│   └── requirements.txt
├── Microservices/
│   └── app/
│       ├── __init__.py
│       ├── app.py
│       ├── bert/
│       ├── Dockerfile
│       └── requirements.txt
├── README.md
├── requirements.txt
├── sec-edgar/
│   └── call_transcripts/
└── Training_pipeline/
    └── dags/
        ├── bert.py
        └── ml_pipeline.py
```
- Nidhi Goyal
- Kanika Damodarsingh Negi
- Rishvita Reddy Bhumireddy
- https://github.com/Harvard-IACS/2020-ComputeFest/tree/master/notebook_to_cloud/ml_deploy_demo
- https://github.com/holladileep/CSYE7245-Spring2021-Labs/tree/main/transcript-simulated-api
- https://www.docker.com/sites/default/files/d8/2019-09/docker-cheat-sheet.pdf
- https://www.tensorflow.org/tutorials/text/classify_text_with_bert