Disclosure
The content produced by this application is for informational purposes only; you should not construe any such information or other material as legal, tax, investment, financial, or other advice. Nothing contained in this article, Git repo, or within the output produced by this application constitutes a solicitation, recommendation, endorsement, or offer by any member working on this project, any company they represent, or any third-party service provider to buy or sell any securities or other financial instruments in this or in any other jurisdiction in which such solicitation or offer would be unlawful under the securities laws of such jurisdiction.
The use of the words "opinion" or "recommendation", or any other word with a similar meaning, in this article, within the Technitrade application, or within information produced by the application is for demonstration purposes only, and is not a recommendation to buy or sell any securities or other financial instruments!
This application was created solely to satisfy the requirements of Columbia University FinTech Bootcamp Project #2 Homework, and the results produced by this application may be incorrect.
- Overview
- Application Logic
- Libraries
- Flask API
- SQL Database
- Interface
- Technical Analysis
- Machine Learning Model
- Sentiment Analysis
- Team
Technitrade lets users track a portfolio of stocks, periodically getting News Sentiment, Twitter Sentiment, and a Machine Learning AI Stock Opinion. The machine learning model calculates the "opinion" based on market data and technical analysis, while the investor sentiment is calculated by natural language processing analysis of recent news articles and Tweets.
The user interacts with the program via an Amazon Lex chatbot. The machine learning analysis is performed using an LSTM (Long Short-Term Memory) model, trained on technical analysis indicators. Sentiment analysis is performed by Google Cloud Natural Language, using NewsAPI and the Twitter API as data sources.
Demo Jupyter Notebooks
- Technical Analysis Demo: technicals_demo.ipynb
- Machine Learning Demo: lstm_demo.ipynb
- Sentiment Analysis Demo: nlp_demo.ipynb
Production Code
- Flask API
- Application (Production Machine Learning LSTM model, Sentiment Analysis, etc.)
- Infrastructure
- Docker container
All of the above can be found here: code/api/
- The Lambda file can be viewed here: lambda.py
The following libraries are used:
- Numpy - "The fundamental package for scientific computing with Python".
- Pandas - data analysis and manipulation tool.
- Matplotlib - comprehensive library for creating static, animated, and interactive visualizations in Python.
- boto3 - AWS SDK for Python to create, configure, and manage AWS services, such as Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Simple Storage Service (Amazon S3). The SDK provides an object-oriented API as well as low-level access to AWS services.
- psycopg2 - database adapter for the Python programming language.
- Dotenv - python-dotenv reads key-value pairs from a .env file and can set them as environment variables.
- Alpaca Trade API - Internet brokerage and market data connection service.
- NewsAPI - NewsAPI locates articles and breaking news headlines from news sources and blogs across the web and returns them as JSON.
- Twitter API - Twitter API enables programmatic access to Twitter.
- tweepy - An easy-to-use Python library for accessing the Twitter API.
- Scikit-Learn - machine learning library for Python.
- TensorFlow - end-to-end open source platform for machine learning.
- Keras - a Python API used to interact with TensorFlow.
- NLTK - leading platform for building Python programs to work with human language data.
- Google Cloud language_v1 - API that connects to Google Cloud Natural Language.
- Flask - micro web framework written in Python.
- AWS Lex Bot - service for building conversational interfaces into any application using voice and text.
- Twilio - service to programmatically send and receive SMS messages via a Python API.
- Twilio SendGrid - communication platform for transactional and marketing email.
The user interfaces with the application using SMS, enabled by the Twilio service. Twilio connects to the Amazon Lex bot, which handles all of the conversation logic.
Amazon Lex Bot gathers the following user info:
- Name
- n portfolio stock tickers
The user gets the News Sentiment, Twitter Sentiment, and Machine Learning AI Stock Opinion via periodic emails. The first email is received right after the machine learning model finishes training and is fitted with data to predict future stock prices.
The emails are distributed via Twilio's SendGrid service.
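For illustration, a minimal sketch of sending such an email through SendGrid's Python library. The addresses, subject, and API-key environment variable name are assumptions, not the project's actual values:

```python
import os
from sendgrid import SendGridAPIClient
from sendgrid.helpers.mail import Mail

# Hypothetical email dispatch; addresses and env var name are assumptions.
message = Mail(
    from_email='bot@technitrade.example',
    to_emails='user@example.com',
    subject='Your Technitrade portfolio update',
    html_content='<p>Machine learning opinion: buy KO.</p>')

sg = SendGridAPIClient(os.environ.get('SENDGRID_API_KEY'))
response = sg.send(message)
print(response.status_code)  # 202 on success
```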
The resulting email looks something like this:
A Flask API was built to handle all tasks between the following components:
- Amazon Lex Bot via Lambda
- Data sources: Market Data Connection (see [code/marketdata/] folder), NewsAPI, Twitter API
- Technical Analysis module: technicals.py
- Machine Learning module: lstm_model.py
- Sentiment Analysis service
- Amazon RDS PostgreSQL server
All events are triggered by AWS CloudWatch. AWS Lambda functions handle all of the production Python code.
- Flask API services can be found here: Project2API
- Project Application code can be found here: Project2Application
- Project Infrastructure code can be found here: Project2Infrastructure
The steps by which the Flask API executes the application workflow are outlined in the table below.
| Step | Objective | Action | Trigger |
|---|---|---|---|
| 1 | User Data | User & Portfolio Creation | Amazon Lex |
| 2 | Model - Training | Trigger the API to run the training | Lambda / CloudWatch |
| 3 | Model - Training | Save the model in Amazon S3 | API |
| 4 | Model - Forecast | Forecast the tickers | Lambda / CloudWatch / API |
| 5 | User Data | Update the user portfolio | Lambda / CloudWatch / API |
| 6 | User Data | Send email to the users | Lambda / CloudWatch / API |
A PostgreSQL database hosted on Amazon RDS is utilized to store all the user data and machine learning models.
All database code can be viewed here: code/src/
Amazon Relational Database Service (Amazon RDS) makes it easy to set up, operate, and scale a relational database in the cloud. It provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups.
PostgreSQL is a powerful, open source object-relational database system with over 30 years of active development that has earned it a strong reputation for reliability, feature robustness, and performance.
psycopg2 was used to interface Python with the PostgreSQL database. pgAdmin was used for testing and debugging.
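As an illustration, a minimal psycopg2 connection sketch. The environment-variable names are assumptions; the real credentials live in the project's .env file:

```python
import os
import psycopg2
from dotenv import load_dotenv

load_dotenv()  # read credentials from the .env file

# Hypothetical connection; the env var names are assumptions.
conn = psycopg2.connect(
    host=os.getenv("DB_HOST"),
    dbname=os.getenv("DB_NAME"),
    user=os.getenv("DB_USER"),
    password=os.getenv("DB_PASSWORD"),
)

with conn.cursor() as cur:
    cur.execute("SELECT version();")
    print(cur.fetchone())
conn.close()
```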
Technical analysis is performed via the technicals module. A demonstration of the module can be seen in technicals_demo.ipynb.
RSI is a momentum indicator which measures the magnitude of recent price changes to evaluate overbought or oversold conditions in the price of a stock. [Investopedia]
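RSI is computed inside the technicals module; purely as an illustration, a minimal pandas sketch of the standard 14-day RSI (simple-average variant) might look like this:

```python
import pandas as pd

def rsi(close: pd.Series, window: int = 14) -> pd.Series:
    # Hypothetical sketch; the production implementation lives in technicals.py.
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(window).mean()   # average gains
    loss = -delta.clip(upper=0).rolling(window).mean()  # average losses
    rs = gain / loss                                    # relative strength
    return 100 - 100 / (1 + rs)
```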
Williams %R is a momentum indicator which measures overbought and oversold levels. It has a domain between 0 and -100. The Williams %R may be used to find entry and exit points in the market. [Investopedia]
$$\%R = \frac{\text{Highest High} - \text{Close}}{\text{Highest High} - \text{Lowest Low}} \times (-100)$$
where:
Highest High = Highest price in the lookback period.
Close = Most recent closing price.
Lowest Low = Lowest price in the lookback period.
The money flow index (MFI) is an oscillator that ranges from 0 to 100. It is used to show the money flow (an approximation of the dollar value of a day's trading) over several days. [Wikipedia]
$$MFI = 100 - \frac{100}{1 + \dfrac{\text{Positive Money Flow}}{\text{Negative Money Flow}}}$$

- The money flow is divided into positive and negative money flow.
- Positive money flow is calculated by adding the money flow of all the days where the typical price is higher than the previous day's typical price.
- Negative money flow is calculated by adding the money flow of all the days where the typical price is lower than the previous day's typical price.
- If the typical price is unchanged, then that day is discarded.
The stochastic oscillator is a momentum indicator comparing a particular closing price of a security to a range of its prices over a certain period of time. The sensitivity of the oscillator to market movements is reducible by adjusting that time period or by taking a moving average of the result. It is used to generate overbought and oversold trading signals, utilizing a 0–100 bounded range of values. [Investopedia]
$$\%K = \left(\frac{C - L_n}{H_n - L_n}\right) \times 100$$
where:
C = the most recent closing price
L_n = the lowest price traded of the n previous trading sessions
H_n = the highest price traded during the same n-day period
%K = the current value of the stochastic indicator
MACD is a trend-following momentum indicator that shows the relationship between two moving averages of a security’s price. The MACD is calculated by subtracting the 26-period exponential moving average (EMA) from the 12-period EMA. [Investopedia]
$$MACD = EMA_{12} - EMA_{26}$$
An exponential moving average is a moving average that places greater weight on the most recent data points and less on older ones. In finance, the EMA reacts more significantly to recent price changes than a simple moving average (SMA), which applies an equal weight to all observations in the period. In statistics, a moving average is a calculation used to analyze data points by creating a series of averages of different subsets of the full data set.
The moving average is a calculation used to smooth data and in finance used as a stock indicator. [Investopedia]
The exponential moving average is a type of moving average that gives more weight to recent prices in an attempt to make it more responsive to new information. [Investopedia]
$$EMA_t = V_t \cdot \frac{s}{1+d} + EMA_y \cdot \left(1 - \frac{s}{1+d}\right)$$
where:
EMA_t = EMA today
EMA_y = EMA yesterday
V_t = value today
s = smoothing (commonly 2)
d = number of days
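For reference, pandas implements the same recursion: with the common choice s = 2, the decay factor s / (1 + d) corresponds to span=d. A minimal sketch with illustrative values:

```python
import pandas as pd

# Hypothetical sketch: 12-day EMA of a closing-price series.
close = pd.Series([44.3, 44.1, 44.9, 45.2, 45.6, 45.4, 46.0, 46.2])
ema_12 = close.ewm(span=12, adjust=False).mean()  # alpha = 2 / (12 + 1)
```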
The high-low and close-open indicators are the differences between the high and low prices of the day and the close and open prices of the day, respectively.
A Bollinger Band® is a technical analysis tool defined by a set of trendlines plotted two standard deviations (positively and negatively) away from a simple moving average (SMA) of a security's price. Bollinger Bands® were developed and copyrighted by famous technical trader John Bollinger, designed to discover opportunities that give investors a higher probability of properly identifying when an asset is oversold or overbought. [Bollinger Bands],[Investopedia]
$$BOLU = SMA(TP, n) + m\,\sigma[TP, n]$$
$$BOLD = SMA(TP, n) - m\,\sigma[TP, n]$$
where:
BOLU = upper Bollinger Band
BOLD = lower Bollinger Band
TP (typical price) = (High + Low + Close) / 3
σ = standard deviation over the smoothing period
m = number of standard deviations
n = number of days in the smoothing period
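As an illustration (not the project's technicals implementation), a minimal pandas sketch of the bands following the equations above:

```python
import pandas as pd

def bollinger_bands(high: pd.Series, low: pd.Series, close: pd.Series,
                    n: int = 20, m: int = 2):
    # Hypothetical sketch following the Bollinger Band equations above.
    tp = (high + low + close) / 3            # typical price
    sma = tp.rolling(n).mean()               # n-day simple moving average
    sigma = tp.rolling(n).std()              # n-day standard deviation
    return sma + m * sigma, sma - m * sigma  # upper band, lower band
```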
An LSTM (Long Short-Term Memory) model built with TensorFlow and Keras is used. An example of the machine learning model code is provided in the lstm_demo.ipynb notebook.
This application utilizes an LSTM (Long Short-Term Memory) machine learning model. The LSTM model was developed by Sepp Hochreiter and Jürgen Schmidhuber and published in Neural Computation in 1997 [Hochreiter 1997]. A common LSTM unit is composed of a cell, an input gate, an output gate, and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell [Wikipedia].
TensorFlow is an end-to-end open source platform for machine learning. It has a comprehensive, flexible ecosystem of tools, libraries and community resources that lets developers easily build and deploy ML powered applications.
Keras is an open-source software library that provides a Python interface for artificial neural networks. Keras acts as an interface for the TensorFlow library. Keras allows for easy implementation of TensorFlow methods without the need to build out complex machine learning infrastructure.
Data is acquired from the Alpaca Trade API and processed using the technicals module. The resulting DataFrame contains the Closing price and all of the technical indicators.
The market data is obtained by calling the ohlcv() method within the alpaca module. The method takes a list of tickers, as well as the start_date and end_date, and returns a pd.DataFrame.
from datetime import datetime, timedelta

today = datetime.now()  # keep a datetime object so date arithmetic works
start_date = (today - timedelta(days=1000)).strftime('%Y-%m-%d')
end_date = today.strftime('%Y-%m-%d')
ohlcv_df = alpaca.ohlcv(['tickers'], start_date=start_date, end_date=end_date)
The TechnicalAnalysis class must first be instantiated with the pd.DataFrame containing market data.
tech_ind = technicals.TechnicalAnalysis(ohlcv_df)
tech_ind_df = tech_ind.get_all_technicals('ticker')
The LSTM model is contained within the MachineLearningModel class located in the lstm_model module. The class must first be instantiated with a pd.DataFrame containing the technical analysis data.
my_model = lstm_model.MachineLearningModel(tech_ind_df)
Building and fitting the model is done by calling the build_model() class method.
hist = my_model.build_model()
The model is then saved as an .h5 file.
my_model.save_model('model.h5')
The MachineLearningModel class handles all machine learning methods. The build_model() class method builds and fits the model, implementing the following methodology:
The LSTM model is programmed to look back 100 days to predict 14 days. The number of features is set by the shape of the DataFrame.
n_steps_in = 100
n_steps_out = 14
n_features = tech_ind_df.shape[1]
A RobustScaler is used to scale the technical analysis data [ScikitLearn].
sklearn.preprocessing.RobustScaler()
Scale features using statistics that are robust to outliers.
This Scaler removes the median and scales the data according to the quantile range (defaults to IQR: Interquartile Range). The IQR is the range between the 1st quartile (25th quantile) and the 3rd quartile (75th quantile). Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Median and interquartile range are then stored to be used on later data using the transform method.
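A minimal usage sketch, reusing the tech_ind_df variable from the surrounding examples:

```python
from sklearn.preprocessing import RobustScaler

# fit the scaler on the technical-analysis features and transform them
scaler = RobustScaler()
scaled_data = scaler.fit_transform(tech_ind_df.to_numpy())
```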
The DataFrame is then parsed to an np.array and split into X and y subsets.
X, y = split_sequence(tech_ind_df.to_numpy(), n_steps_in, n_steps_out)
where split_sequence() is a helper method that splits the multivariate time sequences, as sketched below.
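The helper is not reproduced in this section; a minimal sketch of such a splitter, assuming the target (closing price) sits in column 0 of the array, could look like this:

```python
import numpy as np

def split_sequence(sequences, n_steps_in, n_steps_out):
    # Hypothetical sketch; assumes the target (closing price) is column 0.
    X, y = [], []
    for i in range(len(sequences)):
        end_ix = i + n_steps_in            # end of the input window
        out_end_ix = end_ix + n_steps_out  # end of the forecast window
        if out_end_ix > len(sequences):    # stop once the window overruns the data
            break
        X.append(sequences[i:end_ix, :])           # all features as input
        y.append(sequences[end_ix:out_end_ix, 0])  # target column as output
    return np.array(X), np.array(y)
```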
A Sequential() model is utilized, as it groups a linear stack of layers into a tf.keras.Model [TensorFlow].
model = tf.keras.Sequential()
A hyperbolic tangent activation function (tanh) is used [TensorFlow].
activation_function = tf.keras.activations.tanh
Input and hidden layers
LSTM input and hidden layers are utilized. [TensorFlow]
The input layer contains 60 nodes, while the hidden layers contain 30 nodes by default; this can be set by the administrator to an arbitrary amount via the n_nodes variable. The number of hidden layers defaults to 1 but can also be modified by the administrator.
Hidden layers are added with an add_hidden_layers() helper function (a sketch follows the code below).
n_nodes = 30
# input layer
model.add(LSTM(60,
activation=activation_function,
return_sequences=True,
input_shape=(n_steps_in, n_features)))
# hidden layers ...
model.add(LSTM(n_nodes, activation=activation_function, return_sequences=True))
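The helper itself is not reproduced here; a minimal sketch of what add_hidden_layers() might do, given the defaults described above (the signature and loop are assumptions):

```python
from tensorflow.keras.layers import LSTM

def add_hidden_layers(model, n_layers=1, n_nodes=30):
    # Hypothetical sketch: append n_layers LSTM layers of n_nodes each,
    # reusing the tanh activation_function defined earlier.
    for _ in range(n_layers):
        model.add(LSTM(n_nodes,
                       activation=activation_function,
                       return_sequences=True))
```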
Two dense layers are used in the model. Dense layers are added using the add_dense_layers class method.
model.add(Dense(30))
The model uses the Adam optimizer (short for Adaptive Moment Estimation) [TensorFlow]. Adam is a stochastic gradient descent method that is based on adaptive estimation of first-order and second-order moments. The Adam optimizer was developed by Diederik Kingma and Jimmy Ba and published in 2014 [Kingma et al. 2014]. Adam is defined by its creators as "an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments."
optimizer = tf.keras.optimizers.Adam
The model uses the Mean Squared Error loss function, which computes the mean of squares of errors between labels and predictions [TensorFlow].
loss = tf.keras.losses.MeanSquaredError
The model is trained for 16 epochs using a 128-unit batch size. The validation split is 0.1.
The model is then compiled and fit.
model.compile(optimizer='adam', loss='mse', metrics=['accuracy'])
hist = model.fit(X, y, epochs=16, batch_size=128, validation_split=0.1)
An example of model training results, conducted with The Coca-Cola Company stock: KO.
Predictions are calculated with a validator() helper method.
To forecast stock prices using the saved model, the application uses the ForecastPrice class located within the lstm_model module.
The module pre-processes the data using the aforementioned methods and then utilizes the model.predict() TensorFlow method.
The application accomplishes this by:

- Getting stock prices for the past 200 days using the alpaca module
- Getting technical indicators using the get_all_technicals() method within the technicals.TechnicalAnalysis class
- Instantiating the ForecastPrice class with the technical data:

forecast_model = lstm_model.ForecastPrice(tech_ind_df)

- Calling the forecast() method within the ForecastPrice class:
forecast = forecast_model.forecast()
The ForecastPrice class handles all of the forecasting functions. The forecast() class method implements the following methodology:
- Loading the model using the load_model Keras method:
from tensorflow.keras.models import load_model
forecast_model = load_model("model.h5")
- Pre-processing the data following the same methodology as the MachineLearningModel class.
- Predicting the prices:
forecasted_price = forecast_model.predict(tech_ind_df)
- Inverse scaling the prices:
forecasted_price = scaler.inverse_transform(forecasted_price)[0]
If the predicted price 14 days from now is higher than the current price, the application will issue a buy "opinion"; if the price is lower than the current price, it will issue a sell "opinion" on the date of the highest predicted price.
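A minimal sketch of that decision rule (the function and variable names are illustrative, not the project's actual code):

```python
import numpy as np

def opinion(current_price: float, forecasted_price: np.ndarray) -> dict:
    # Hypothetical sketch of the buy/sell "opinion" rule described above.
    if forecasted_price[-1] > current_price:
        return {'opinion': 'buy'}
    # otherwise, sell on the day of the highest predicted price
    return {'opinion': 'sell', 'day': int(np.argmax(forecasted_price))}
```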
Sentiment analysis is performed using the Google Cloud Natural Language service.
The data utilized in sentiment analysis is obtained from two sources: NewsAPI and the Twitter API (via Tweepy).
Implementation of NewsAPI and Tweepy can be found in the demo notebook: nlp_demo.ipynb
The sentiment analysis implementation:
import os
from google.cloud import language_v1

def GetSentimentAnalysisGoogle(text_content):
    # point the client at the service-account credentials file
    os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = '../your_credentials_file.json'
    client = language_v1.LanguageServiceClient()

    # build a plain-text document from the input
    type_ = language_v1.Document.Type.PLAIN_TEXT
    document = {'content': text_content, 'type_': type_}
    encoding_type = language_v1.EncodingType.UTF8

    # request document-level sentiment
    response = client.analyze_sentiment(request={'document': document,
                                                 'encoding_type': encoding_type})
    return {'score': response.document_sentiment.score,
            'magnitude': response.document_sentiment.magnitude}
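A hypothetical usage example (the input text and output values are illustrative only):

```python
result = GetSentimentAnalysisGoogle("Shares rallied after a strong earnings report.")
print(result)  # e.g. {'score': 0.8, 'magnitude': 0.8}
```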