In time-series scenario, the dataset can't not be split randomly. We should keep:
- The test data always follows the training data.
- Each split should keep the order of time.
- Packages:
- numpy
- pandas
- tsCrossValidation.py: Functions for Time Series Cross Validation.
- pic: Visualization and pseudo code for Time Series Cross Validation.
This Cross Validation is the same with scikit-learn's TimeSeriesSplit. The length of test split is fixed depending on how many splits you want totally.
Compare with Multiple Splits Cross Validation, Blocked Time Series Cross Validation can avoid the potential data leakage from the future data. That's why Blocked Time Series Cross Validation is introduced.
Walk Forward Validation is the one which are very similar with the real world(time-series incremental data) scenario.
This project is licensed under the MIT License - see the LICENSE file for details.
[1]. https://machinelearningmastery.com/backtest-machine-learning-models-time-series-forecasting/
[2]. https://hub.packtpub.com/cross-validation-strategies-for-time-series-forecasting-tutorial/
[3]. https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.TimeSeriesSplit.html