This repository has been archived by the owner on Feb 8, 2019. It is now read-only.

stepchoi/DL-Price-Mom

Dependencies

  • Python 3.6+, Postgres, Unix (tested on Ubuntu)
  • pip install -r requirements.txt
  • Install hyperopt from source

Setup

  1. createdb dl_price_mom
  2. cp config.example.json config.json then modify config.json
  3. Sign in to Kaggle and download this dataset (price-volume-data-for-all-us-stocks-etfs.zip)
  4. Unzip
    unzip price-volume-data-for-all-us-stocks-etfs.zip -d tmp
    pushd tmp
    unzip Data.zip
    popd
    

The important additions to your dir structure should be:

- config.json (modified)
- tmp
  - Stocks 
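
Before starting the import, it may help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the `missing_paths` helper and the `EXPECTED` tuple are hypothetical names, and the check assumes it is run from the repository root.

```python
from pathlib import Path

# Entries taken from the directory layout listed above,
# relative to the repository root (an assumption of this sketch).
EXPECTED = ("config.json", "tmp/Stocks")


def missing_paths(root="."):
    """Return the expected entries that do not exist under `root`."""
    base = Path(root)
    return [entry for entry in EXPECTED if not (base / entry).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("Layout looks complete.")
```

The check only verifies that the paths exist; it does not validate the contents of config.json.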

Import

python import.py

This populates your database from the Kaggle dataset. Expect it to take a few hours.
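
Because the import runs for hours, it can be worth spot-checking one price file first. The sketch below is not how import.py works; it only shows one way to parse a single file, assuming each file under tmp/Stocks is a CSV with a header row containing `Date`, `Close` and `Volume` columns and ISO-formatted dates. The `read_prices` helper and the sample rows are hypothetical and made up for illustration.

```python
import csv
import io
from datetime import datetime


def read_prices(handle):
    """Yield (date, close, volume) tuples from a CSV file object with a header row."""
    for row in csv.DictReader(handle):
        day = datetime.strptime(row["Date"], "%Y-%m-%d").date()
        yield (day, float(row["Close"]), int(row["Volume"]))


# Made-up sample in the assumed column layout; not real market data.
SAMPLE = """Date,Open,High,Low,Close,Volume,OpenInt
2017-01-03,10.0,10.5,9.8,10.2,1000,0
2017-01-04,10.2,10.4,10.0,10.1,1200,0
"""

rows = list(read_prices(io.StringIO(SAMPLE)))
print(len(rows), "rows parsed; first close =", rows[0][1])
```

To try it on a real file, pass an open file handle instead of the `io.StringIO` sample. A `KeyError` or `ValueError` here would suggest the file does not match the assumed layout.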

Note re: dataset

We have arranged the code to work with the Kaggle dataset, but there are a few important caveats:

  1. Kaggle data was NOT used for our analysis. Our research is based on proprietary licensed data from FTSE Russell, which we cannot redistribute under the terms of our agreement.
  2. The Kaggle dataset does not specify an investment universe. Ours was the Russell 1000-based universe, sourced from the FTSE Russell database.

Run

You'll train three separate components to completion (they depend on each other sequentially):

  1. Autoencoder
  2. Embedded Clustering
  3. Recurrent Neural Network (GRU/FFN)

Each component can take 6h or more to run; EmbedClust, in particular, takes multiple days. Run each step in a tmux session and check back in 24h. After each step completes, you'll choose the optimal autoencoder configurations or embedded clusterings before starting the next step.

  1. python ae.py - runs hyperopt for autoencoding of the data
  2. python select_winners.py - selects the optimal hyperparameter configurations for the autoencoder
  3. python embed_clust.py - runs embedded clustering on the data for origins and k clusters, based on the selected autoencoder
  4. Manually select the embed_clust clustering outputs to use for each origin (set use to TRUE), based on the normalized X_B or S_Dbw scores
  5. python rnn.py - runs the GRU/FFN based on the selected optimal embedded clusterings
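
The sequence above can be sketched as a small driver that runs the scripts in order and stops at the manual selection step. This is an illustration rather than something the repository ships: `STEPS`, `commands` and `run` are hypothetical names, and the sketch assumes each script terminates on its own and exits with a non-zero status on failure.

```python
import subprocess
import sys

# Script names come from the steps above; None marks the manual
# selection step (step 4), which this sketch cannot automate.
STEPS = ["ae.py", "select_winners.py", "embed_clust.py", None, "rnn.py"]


def commands(start=0):
    """Return the commands from index `start` up to the next manual step."""
    cmds = []
    for script in STEPS[start:]:
        if script is None:
            break
        cmds.append([sys.executable, script])
    return cmds


def run(start=0):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands(start):
        subprocess.run(cmd, check=True)


# Usage (left commented out so nothing runs on import):
# run(0)   # steps 1-3, then do the manual selection
# run(4)   # step 5, after the manual selection is done
```

Given the run times involved, launching `run(0)` inside a tmux session keeps the work alive if the terminal disconnects.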

Credit
