Chore(AI): Fix structure in sp500
Oumaimafisaoui committed Sep 25, 2024
1 parent 9f96f29 commit f37ea58
Showing 2 changed files with 20 additions and 11 deletions.
29 changes: 19 additions & 10 deletions subjects/ai/sp500-strategies/README.md
@@ -1,20 +1,26 @@
## Financial strategies on the SP500
## SP500 strategies

### Overview

In this project, you'll apply machine learning to finance. Your goal as a Quant/Data Scientist is to create a financial strategy that uses a signal generated by a machine learning model to outperform the [SP500](https://en.wikipedia.org/wiki/S%26P_500).

The S&P 500 Index is a collection of 500 stocks that represent the overall performance of the U.S. stock market. The stocks in the S&P 500 are chosen based on factors like market value, liquidity, and industry. These selections are made by the S&P 500 Index Committee, which is a group of analysts from Standard & Poor's.

The S&P 500 started in 1926 with only 90 stocks and has grown to include 500 stocks since 1957. Historically, the average annual return of the S&P 500 has been about 10-11% since 1926, and around 8% since 1957.

### Role play

As a Quantitative Researcher, your challenge is to develop a strategy that can consistently outperform the S&P 500, not just in one year, but over many years. This is a difficult task and is the primary goal of many hedge funds around the world.

The project is divided in parts:
### Learning Objective

- **Data processing and feature engineering**: Build a dataset: insightful features and the target
- **Machine Learning pipeline**: Train machine learning models on the dataset, select the best model and generate the machine learning signal.
- **Strategy backtesting**: Generate a strategy from the Machine Learning model output and backtest the strategy. As a reminder, the idea here is to see how the strategy would have performed if you had invested.

### Data processing and features engineering
### Instructions

#### Data processing and features engineering

The file `HistoricalData.csv` contains the open-high-low-close (OHLC) SP500 index data and the other file, `all_stocks_5yr.csv`, contains the open-high-low-close-volume (OHLCV) data on the SP500 constituents.

@@ -42,7 +48,7 @@ We assume it is day `D`, and we want to take a position on the next n days. The

> Remark: The target is the return computed from the price, not the price itself. There are statistical reasons for this choice: the price is not stationary, and a machine learning model tends to overfit when trained on non-stationary data.
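
As an illustration of the remark above, here is a minimal sketch of a return-based target. The exact target formula is not visible in this hunk, so the definition used here (simple return from day `D` to day `D+n`) is an assumption, and the function name `forward_return` and the toy prices are hypothetical.

```python
import pandas as pd


def forward_return(close: pd.Series, n: int = 1) -> pd.Series:
    """Simple return over the next n rows, aligned on day D: close[D+n] / close[D] - 1."""
    # shift(-n) brings the price observed n rows later onto row D.
    return close.shift(-n) / close - 1.0


# Toy prices (made-up numbers, not taken from the project data).
close = pd.Series(
    [100.0, 102.0, 101.0, 105.0],
    index=pd.date_range("2020-01-01", periods=4, freq="D"),
)

# The last n rows have no future price, so their target is NaN.
target = forward_return(close, n=1)
```

Rows whose target is NaN would typically be dropped before training, since their future price is unknown.
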
### Machine learning pipeline
#### Machine learning pipeline

- Cross-validation deliverables:
  - Implement a cross-validation with at least 10 folds. The train set must cover more than 2 years of history.
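
The two constraints above (at least 10 folds, more than 2 years of training history) can be checked with a small expanding-window splitter. This is only a sketch of one possible scheme, not the splitter the subject requires: the function name, the sample sizes, and the 252-trading-days-per-year convention are all assumptions. scikit-learn's `TimeSeriesSplit` offers comparable behaviour.

```python
from typing import Iterator, Tuple

TRADING_DAYS_PER_YEAR = 252  # common convention; an assumption, not from the subject


def expanding_window_splits(
    n_samples: int, n_splits: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges where the train set always precedes the test set."""
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested folds")
    for k in range(n_splits):
        test_start = first_test_start + k * test_size
        # Train on everything before the test block, so no future data leaks in.
        yield range(0, test_start), range(test_start, test_start + test_size)


# Hypothetical sizes: 2000 daily rows, 10 folds of 100 test days each.
folds = list(expanding_window_splits(n_samples=2000, n_splits=10, test_size=100))

# Smallest train set across folds, compared with the 2-year requirement.
min_train_size = min(len(train) for train, _ in folds)
meets_history_rule = min_train_size > 2 * TRADING_DAYS_PER_YEAR
```
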
@@ -80,7 +86,7 @@ Once you'll have run the grid search on the cross validation (choose either Bloc

- (optional): [Train an RNN/LSTM](https://towardsdatascience.com/predicting-stock-price-with-lstm-13af86a74944). This is a nice way to discover and learn about recurrent neural networks. But keep in mind that there are some new neural network architectures that seem to outperform recurrent neural networks. Here is an [interesting article](https://towardsdatascience.com/the-fall-of-rnn-lstm-2d1594c74ce0) about the topic.

### Strategy backtesting
#### Strategy backtesting

- Backtesting module deliverables. The module takes a machine learning signal as input and converts it into a financial strategy. A financial strategy DataFrame gives the amount invested at time `t` on asset `i`. The module returns the following metrics on the train set and the test set.
- Profit and Loss (PnL) plot: save it as `strategy.png`
Expand All @@ -107,7 +113,7 @@ Once you'll have run the grid search on the cross validation (choose either Bloc
- PnL plot
- strategy metrics on the train set and test set
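
To make the backtesting step concrete, here is a hedged sketch of turning a signal into a strategy DataFrame and a PnL series. The subject's own PnL formula is not visible in this diff, so the definition below (amount invested times next-period return, summed over assets) is an assumption. The tickers, the 0.5 threshold, and all numbers are hypothetical.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=3, freq="D")

# Hypothetical model output: score that each asset goes up over the next day.
signal = pd.DataFrame(
    {"AAA": [0.7, 0.4, 0.9], "BBB": [0.2, 0.6, 0.8]}, index=dates
)

# Hypothetical returns, aligned so that row t holds the return earned from t to t+1.
future_returns = pd.DataFrame(
    {"AAA": [0.01, -0.02, 0.03], "BBB": [0.05, 0.01, -0.01]}, index=dates
)

# Long-only binary strategy: invest 1 unit in an asset when its signal exceeds 0.5.
strategy = (signal > 0.5).astype(float)

# PnL at time t: sum over assets of (amount invested) * (return from t to t+1).
daily_pnl = (strategy * future_returns).sum(axis=1)
cumulative_pnl = daily_pnl.cumsum()
```

Plotting `cumulative_pnl` over time gives the kind of curve the PnL plot deliverable refers to.
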

### Example of strategies:
#### Example of strategies:

- Long only:
- Binary signal:
@@ -172,7 +178,7 @@ Here's an example on how to convert a machine learning signal into a financial s
project
├── data
│   └── sp500.csv
├── environment.yml
├── requirements.txt
├── README.md
├── results
│   ├── cross-validation
@@ -199,7 +205,10 @@ project

Note: `features_engineering.py` can be used in `gridsearch.py`

### Files for this project
### Tips

Remember, the goal of this project is not just to beat the S&P 500 in a backtest, but to learn about the process of developing and testing trading strategies using machine learning techniques.

### Resources

You can find the data required for this project in this :
[link](https://assets.01-edu.org/ai-branch/project4/project04-20221031T173034Z-001.zip)
You can find the data required for this project in this : [link](https://assets.01-edu.org/ai-branch/project4/project04-20221031T173034Z-001.zip)
2 changes: 1 addition & 1 deletion subjects/ai/sp500-strategies/audit/README.md
@@ -1,4 +1,4 @@
#### Financial strategies on the SP500
#### SP500 strategies

###### Is the structure of the project like the one presented in the `Project repository structure` in the subject?
