This project examines several forecasting techniques for predicting future stock returns from past returns and numerical news indicators, with the aim of constructing a portfolio of multiple stocks that diversifies risk. We apply supervised learning methods to forecast stock prices from seemingly chaotic market data. The stock market fluctuates sharply and is described by many complicated financial indicators. Advances in technology, however, make it possible to model these fluctuations and help experts find the most informative indicators for better predictions. Predicting market value matters because it helps maximize the profit of stock option purchases while keeping the risk low. We used historical stock datasets and news headlines for the forecasting.
Install the following software and libraries on your machine before running this project.
Python 3 (Anaconda): Installs IPython Notebook and most of the required libraries, such as sklearn, pandas, seaborn, matplotlib, numpy, scipy, and streamlit.
Pandas: For creating and manipulating dataframes.
Scikit Learn: For importing k-means clustering.
JSON: Library to handle JSON files.
XML: To separate data from presentation; XML stores data in plain-text format.
Beautiful Soup and Requests: For web scraping and for handling HTTP requests.
Matplotlib: Python plotting module.
The dataset was web scraped from APIs. The historical dataset came from the NASDAQ API and the news articles from Yahoo Finance.
HistoricalData_APPLE.csv
Data Source --> Dataset/
Data points --> 2517 rows
Dataset date range --> October 2011 to September 2021
Dataset Attributes:
- Close/Last - Close/Last Prices
- Volume - Volume of Stocks
- Open - Opening Prices of Stocks
- High - Highest Prices of Stocks
- Low - Lowest Prices of Stocks
Deleted the "Unnamed:7" column because of its NaN values. Parsed the Date attribute into the "datetime64" data type. Checked for duplicate rows (none found). Dropped features that are of no use to the model. Removed outliers to make the data cleaner for further use.
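The cleaning steps above could be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the sample values are made up, the helper name `clean_history` is hypothetical, and the 1.5 × IQR rule on `Volume` is only one common way to remove outliers (the rule the project used is not stated).

```python
import numpy as np
import pandas as pd


def clean_history(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the cleaning steps described above to a raw price table."""
    # Drop columns that contain only NaN (such as the stray "Unnamed:7").
    df = df.dropna(axis=1, how="all")
    # Parse the Date column into datetime64 and sort chronologically.
    df = df.assign(Date=pd.to_datetime(df["Date"])).sort_values("Date")
    # Remove exact duplicate rows, if any.
    df = df.drop_duplicates()
    # Assumption: treat Volume values outside 1.5 * IQR as outliers.
    q1, q3 = df["Volume"].quantile([0.25, 0.75])
    iqr = q3 - q1
    in_range = df["Volume"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return df[in_range].reset_index(drop=True)


# Made-up rows standing in for HistoricalData_APPLE.csv.
raw = pd.DataFrame({
    "Date": ["2021-09-27", "2021-09-28", "2021-09-29",
             "2021-09-30", "2021-09-30", "2021-10-01"],
    "Close/Last": [145.0, 142.0, 143.0, 141.0, 141.0, 142.5],
    "Volume": [100, 110, 105, 95, 95, 1000],
    "Open": [144.0, 143.0, 142.5, 143.5, 143.5, 141.5],
    "High": [145.5, 144.0, 144.0, 144.0, 144.0, 143.0],
    "Low": [143.5, 141.5, 142.0, 140.5, 140.5, 139.0],
    "Unnamed:7": [np.nan] * 6,
})

clean = clean_history(raw)
print(clean.dtypes)
```

In this toy table the duplicated 2021-09-30 row and the row with the extreme volume are removed, and the all-NaN column disappears.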
Exploratory Data Analysis is a process of examining or understanding the data and extracting insights or main characteristics of the data. EDA is generally classified into two methods, i.e. graphical analysis and non-graphical analysis.
The primary goals of EDA are to:
- Examine the data distribution
- Handle missing values in the dataset (the most common issue with any dataset)
- Handle outliers
- Remove duplicate data
- Encode the categorical variables
- Normalize and scale the data
- Data visualization of all columns, year-wise
- Data visualization of all columns, month-wise
- Data visualization of all columns, quarter-wise
- Scatter plot between each pair of attributes (trend)
- Heat map of the correlation between each pair of attributes (linear relation)
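As a rough sketch of the numbers behind these views, the yearly, quarterly, and monthly aggregates and the correlation matrix can be computed with pandas. The data here is synthetic and the choice of the mean as the aggregate is an assumption; the project's notebooks may aggregate differently.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices standing in for the cleaned Apple dataset.
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", "2021-12-31")
close = 100 + rng.normal(0, 1, len(dates)).cumsum()
prices = pd.DataFrame(
    {
        "Close/Last": close,
        "Open": close + rng.normal(0, 0.5, len(dates)),
        "Volume": rng.integers(1_000_000, 5_000_000, len(dates)),
    },
    index=dates,
)

# Group by calendar period to get the yearly, quarterly, and monthly views.
yearly = prices.groupby(prices.index.year).mean()
quarterly = prices.groupby(prices.index.to_period("Q")).mean()
monthly = prices.groupby(prices.index.to_period("M")).mean()

# Pearson correlation matrix: the values a correlation heat map displays.
corr = prices.corr()
print(corr.round(2))
```

Each aggregate can be drawn with `DataFrame.plot()`, and the matrix can be rendered with `seaborn.heatmap(corr, annot=True)`.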
After the exploratory data analysis, we started modelling in Python. We applied machine learning algorithms to the datasets to build models that predict stock prices. In this step we split the data into training and test sets in an 80%/20% ratio. We tried several algorithms and applied hyperparameter tuning to improve their performance. The algorithms we tried are:
- Linear Regression
- Naïve Bayes
- Neural networks
Linear Regression is a supervised learning algorithm in machine learning. It models a predicted value as a function of independent variables and helps find the relationship between those variables and the forecast. In this case we used the companies' data from past years to predict future stock values.
The scores of the linear regression model:
RMSE (Root Mean Squared Error) = 0.1459830874093662
R² (R-Squared Score) = 0.9998357614326422
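A minimal sketch of how such scores can be computed with scikit-learn on an 80%/20% split is shown below. The data is synthetic, and both the feature set (Open, High, Low) and the chronological split (`shuffle=False`) are assumptions made for illustration; the features and split order used in the project are not stated here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic prices: a random-walk open with high/low around it.
rng = np.random.default_rng(42)
n = 500
open_ = 100 + rng.normal(0, 1, n).cumsum()
high = open_ + rng.uniform(0, 2, n)
low = open_ - rng.uniform(0, 2, n)
close = (high + low) / 2 + rng.normal(0, 0.2, n)

X = np.column_stack([open_, high, low])  # assumed feature set
y = close

# 80% train, 20% test; shuffle=False keeps the time order intact.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=False
)

model = LinearRegression().fit(X_train, y_train)
pred = model.predict(X_test)

# RMSE is the square root of the mean squared error.
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
r2 = r2_score(y_test, pred)
print(f"RMSE = {rmse:.4f}, R^2 = {r2:.4f}")
```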
Naïve Bayes is a probabilistic classifier, which means it predicts on the basis of the probability of an object. It is called Naïve because it assumes that the occurrence of a certain feature is independent of the occurrence of other features. It is called Bayes because it depends on the principle of Bayes' Theorem.
We predict the impact of news articles on the closing price of Apple Inc. stock using a Naive Bayes classifier. First, we merge the news articles dataset and the historical stock dataset into a single dataset on the 'Date' column after making some necessary changes to them. We then add two columns, 'close_price_diff' and 'Impact': 'close_price_diff' contains the difference in closing price from the previous day, and 'Impact' contains 1 if that difference is positive and 0 if it is negative. Next, we apply natural language processing to the news headlines and convert them to vectorized form, obtaining a bag of words made up of the 20000 most common words. We train the Naive Bayes model (Gaussian, Multinomial, or Bernoulli, each in a separate file) after splitting the dataset into 80% training data and 20% test data. Finally, we tune hyperparameters to get the best predictions.
The model classifies each news article as a profit or a loss.
The label comes from the difference between the current day's closing price and the previous day's.
The accuracy score of Naïve Bayes is 51.93%.
After hyperparameter tuning it increased to 53.29%.
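The pipeline described above can be sketched as follows with scikit-learn. This is an illustration under assumptions, not the project's code: the prices and headlines are made up, the column name 'headline' is hypothetical, only the Multinomial variant is shown, the tuned hyperparameter (`alpha`) and its grid are guesses, and a zero price difference is mapped to 0.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up prices and headlines, one row per day.
n_days = 40
dates = pd.date_range("2021-01-01", periods=n_days, freq="D")
prices = pd.DataFrame({
    "Date": dates,
    "close_price": [100.0 + (i // 2) * 0.5 + (i % 2) for i in range(n_days)],
})
news = pd.DataFrame({
    "Date": dates,
    "headline": [
        "Apple shares rally after strong iPhone sales" if i % 2
        else "Apple stock slides on supply chain worries"
        for i in range(n_days)
    ],
})

# Merge on 'Date', then derive 'close_price_diff' and the 'Impact' label.
data = prices.merge(news, on="Date")
data["close_price_diff"] = data["close_price"].diff()
data["Impact"] = (data["close_price_diff"] > 0).astype(int)  # 0 also covers a zero diff
data = data.dropna(subset=["close_price_diff"]).reset_index(drop=True)

# 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    data["headline"], data["Impact"],
    test_size=0.2, random_state=0, stratify=data["Impact"],
)

# Bag of words limited to the 20000 most frequent terms, then Naive Bayes.
pipeline = make_pipeline(CountVectorizer(max_features=20000), MultinomialNB())

# Hyperparameter tuning: search over the smoothing parameter alpha.
search = GridSearchCV(pipeline, {"multinomialnb__alpha": [0.1, 0.5, 1.0]}, cv=3)
search.fit(X_train, y_train)

accuracy = search.score(X_test, y_test)
print(search.best_params_, f"test accuracy = {accuracy:.2%}")
```

Putting the vectorizer inside the pipeline means the vocabulary is learned from the training folds only during the grid search.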
Neural networks, also known as artificial neural networks (ANNs) or simulated neural networks (SNNs), are a subset of machine learning and are at the heart of deep learning algorithms. As the name suggests, a neural network is quite like our brain, in which neurons work together to produce an output. A recurrent neural network (RNN) is a type of neural network that uses sequential or time series data. Long Short-Term Memory (LSTM) networks are a type of recurrent neural network capable of learning order dependence in sequence prediction problems.
The RMSE score of the LSTM is 101.3501.
We predict the closing stock price of Apple Inc. from the past 20 days of stock prices using an artificial recurrent neural network called LSTM. We combine the historical Apple stock price data and the news articles data, after some necessary changes, into a combined dataset. We then apply sentiment analysis to the news headlines to get 'compound', 'positive', 'negative', and 'neutral' values. After making some necessary changes and visualizing the data in various ways, we settle on 'close_price' and 'compound' as our features and 'close_price' as our dependent variable. The model is trained on 80% of the data after applying feature scaling (MinMaxScaler) to the features, and tested on the remainder. We build the model from a sufficient number of LSTM and Dense layers with appropriate parameter values. Finally, the model predicts the closing price of the 21st day from the closing prices of the past 20 days and the compound values generated from the news headlines.
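The data preparation for this model can be sketched as below: scale the two features, then cut the series into 20-day windows whose target is the next day's scaled closing price. The data is synthetic and the helper name `make_windows` is hypothetical. Fitting the scaler on the training rows only is a design choice made in this sketch; the text above does not say how the project fitted it.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def make_windows(features: np.ndarray, target: np.ndarray, lookback: int = 20):
    """Return (X, y): X[i] holds `lookback` consecutive rows, y[i] the next target."""
    X = np.stack(
        [features[i:i + lookback] for i in range(len(features) - lookback)]
    )
    y = target[lookback:]
    return X, y


# Synthetic stand-ins for the 'close_price' and 'compound' columns.
rng = np.random.default_rng(1)
n = 200
close_price = 100 + rng.normal(0, 1, n).cumsum()
compound = rng.uniform(-1, 1, n)
features = np.column_stack([close_price, compound])

# 80% of the rows are used for training.
split = int(n * 0.8)
lookback = 20

# Assumption: fit the scaler on the training rows only, then scale everything.
scaler = MinMaxScaler().fit(features[:split])
scaled = scaler.transform(features)

# Each sample is 20 days of (close_price, compound); the target is day 21's close.
X, y = make_windows(scaled, scaled[:, 0], lookback=lookback)

# Windows whose target day falls in the training rows go to the training set.
X_train, y_train = X[: split - lookback], y[: split - lookback]
X_test, y_test = X[split - lookback:], y[split - lookback:]
print(X_train.shape, X_test.shape)
```

The resulting arrays have the shape (samples, timesteps, features), which is the input layout Keras LSTM layers expect.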
The model we finally chose is Linear Regression, and we deployed it on Heroku and Streamlit. We used the Flask framework to upload the model to the website. We also deployed the LSTM Combined_Data model using Streamlit. It predicts the closing price of the 21st day from the closing prices of the past 20 days. For this we combined the news articles data and the historical data into a new dataset named stock_data. We used stock_data to train our neural network, which is built from LSTM and Dense layers. Finally, we save the model in .h5 format, which the app.py file uses to display the results in the Streamlit interface. The app.py file loads the model.h5 file and predicts the result. It also defines the Streamlit interface and controls what is shown on it. The user can then interact with the index.html file to enter a date for which he/she wants the closing price predicted.
Here is the deployment link of the model Click Here
Here are some screenshots of the website deployed in Streamlit.
- Web scraping
- Data Loading
- Data Preprocessing
- Exploratory data analysis
- Feature engineering
- Feature selection
- Feature transformation
- Model building
- Model evaluation
- Model tuning
- Predictions
- Python
- PyCharm
- Jupyter Notebook
- Google Colab
- GitHub
- Git Bash
- Sublime Text
- Chandrachud Singh Chundawat
- V. Nanda Gopal
- Rahul Amarwal
- Kondapu Lavanya
- Sunil Mali
- Sandeep Mannam
- Giduturi Namrata Sai
- Bale Meghana
- Sital Agrawal
- Chandrachud Singh Chundawat
- Mr. Yasin shah