Merge pull request #52 from WenjieDu/dev
Adding the ETT dataset
WenjieDu authored Dec 20, 2023
2 parents 6820642 + dc82c5e commit 69d48ac
Showing 9 changed files with 124 additions and 71 deletions.
23 changes: 12 additions & 11 deletions README.md
@@ -46,13 +46,13 @@
</a>
</p>

-> 📣 TSDB now supports a total of 1️⃣6️⃣8️⃣ time-series datasets ‼️
+> 📣 TSDB now supports a total of 1️⃣6️⃣9️⃣ time-series datasets ‼️
<a href='https://github.com/WenjieDu/PyPOTS'><img src='https://pypots.com/figs/pypots_logos/PyPOTS_logo_FFBG.svg?sanitize=true' width='160' align='left' /></a>
-TSDB is a part of
+TSDB is a part of
<a href="https://github.com/WenjieDu/PyPOTS">
PyPOTS <img align="center" src="https://img.shields.io/github/stars/WenjieDu/PyPOTS?style=social">
-</a>
+</a>
(a Python toolbox for data mining on Partially-Observed Time Series), and was separated from PyPOTS for decoupling datasets from learning algorithms.

TSDB is created to help researchers and engineers get rid of data collecting and downloading, and focus back on data processing details. TSDB provides all-in-one-stop convenience for downloading and loading open-source time-series datasets (available datasets listed [below](https://github.com/WenjieDu/TSDB#-list-of-available-datasets)).
@@ -99,14 +99,15 @@ That's all. Simple and efficient. Enjoy it! 😃

## ❖ List of Available Datasets

-| Name | Main Tasks |
-|----------------------------------------------------------------------------------|-----------------------------------------|
-| [PhysioNet Challenge 2012](dataset_profiles/physionet_2012) | Classification, Forecasting, Imputation |
-| [PhysioNet Challenge 2019](dataset_profiles/physionet_2019) | Classification, Imputation |
-| [Beijing Multi-Site Air-Quality](dataset_profiles/beijing_multisite_air_quality) | Forecasting, Imputation |
-| [Electricity Load Diagrams](dataset_profiles/electricity_load_diagrams) | Forecasting, Imputation |
-| [UCR & UEA Datasets](dataset_profiles/ucr_uea_datasets) (all 163 datasets) | Classification |
-| [Vessel AIS](dataset_profiles/vessel_ais) | Classification, Forecasting, Imputation |
+| Name | Main Tasks |
+|---------------------------------------------------------------------------------------------------|-----------------------------------------|
+| [PhysioNet Challenge 2012](dataset_profiles/physionet_2012) | Forecasting, Imputation, Classification |
+| [PhysioNet Challenge 2019](dataset_profiles/physionet_2019) | Forecasting, Imputation, Classification |
+| [Beijing Multi-Site Air-Quality](dataset_profiles/beijing_multisite_air_quality) | Forecasting, Imputation |
+| [Electricity Load Diagrams](dataset_profiles/electricity_load_diagrams) | Forecasting, Imputation |
+| [Electricity Transformer Temperature (ETT)](dataset_profiles/electricity_transformer_temperature) | Forecasting, Imputation |
+| [Vessel AIS](dataset_profiles/vessel_ais) | Forecasting, Imputation, Classification |
+| [UCR & UEA Datasets](dataset_profiles/ucr_uea_datasets) (all 163 datasets) | Classification |


## ❖ Citing TSDB/PyPOTS
22 changes: 22 additions & 0 deletions dataset_profiles/electricity_transformer_temperature/README.md
@@ -0,0 +1,22 @@
+# Electricity Transformer Temperature
+
+## Citing this dataset 🤗
+
+`Zhou, H., Zhang, S., Peng, J., Zhang, S., Li, J., Xiong, H., & Zhang, W. (2021, May).
+Informer: Beyond efficient transformer for long sequence time-series forecasting.
+In Proceedings of the AAAI conference on artificial intelligence (Vol. 35, No. 12, pp. 11106-11115).`
+
+or
+
+```bibtex
+@inproceedings{zhou2021informer,
+author = {Haoyi Zhou and Shanghang Zhang and Jieqi Peng and Shuai Zhang and Jianxin Li and Hui Xiong and Wancai Zhang},
+title = {Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting},
+booktitle = {The Thirty-Fifth {AAAI} Conference on Artificial Intelligence, {AAAI} 2021, Virtual Conference},
+volume = {35},
+number = {12},
+pages = {11106--11115},
+publisher = {{AAAI} Press},
+year = {2021},
+}
+```
21 changes: 11 additions & 10 deletions docs/index.rst
@@ -98,16 +98,17 @@ That's all. Simple and efficient. Enjoy it! 😃

❖ List of Available Datasets
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
-=================================================================================================================================================================== ==========================================
-Name Main Tasks
-=================================================================================================================================================================== ==========================================
-`PhysioNet Challenge 2012 <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/physionet_2012>`_ :cite:`silva2012physionet` Classification, Forecasting, Imputation
-`PhysioNet Challenge 2019 <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/physionet_2019>`_ :cite:`reyna2019physionet` Classification, Imputation
-`Beijing Multi-Site Air-Quality <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/beijing_multisite_air_quality>`_ :cite:`zhang2017airquality` Forecasting, Imputation
-`Electricity Load Diagrams <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/electricity_load_diagrams>`_ :cite:`trindade2015electricity` Forecasting, Imputation
-`UCR & UEA Datasets <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/ucr_uea_datasets>`_ (all 163 datasets) :cite:`bagnall2018uea` :cite:`dau2018ucr` Classification
-`Vessel AIS data <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/vessel_ais>`_ :cite:`grgicevic2023ais` Imputation, Forecasting, Classification
-=================================================================================================================================================================== ==========================================
+========================================================================================================================================================================== ==========================================
+Name Main Tasks
+========================================================================================================================================================================== ==========================================
+`PhysioNet Challenge 2012 <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/physionet_2012>`_ :cite:`silva2012physionet` Forecasting, Imputation, Classification
+`PhysioNet Challenge 2019 <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/physionet_2019>`_ :cite:`reyna2019physionet` Forecasting, Imputation, Classification
+`Beijing Multi-Site Air-Quality <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/beijing_multisite_air_quality>`_ :cite:`zhang2017airquality` Forecasting, Imputation
+`Electricity Load Diagrams <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/electricity_load_diagrams>`_ :cite:`trindade2015electricity` Forecasting, Imputation
+`Electricity Transformer Temperature (ETT) <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/electricity_transformer_temperature>`_ :cite:`zhou2021informer` Forecasting, Imputation
+`Vessel AIS data <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/vessel_ais>`_ :cite:`grgicevic2023ais` Forecasting, Imputation, Classification
+`UCR & UEA Datasets <https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/ucr_uea_datasets>`_ (all 163 datasets) :cite:`bagnall2018uea` :cite:`dau2018ucr` Classification
+========================================================================================================================================================================== ==========================================

❖ Citing TSDB/PyPOTS
^^^^^^^^^^^^^^^^^^^^^
11 changes: 11 additions & 0 deletions docs/references.bib
@@ -64,3 +64,14 @@ @misc{grgicevic2023ais
doi = {10.5281/zenodo.8064564},
url = {https://doi.org/10.5281/zenodo.8064564}
}

+@inproceedings{zhou2021informer,
+author = {Haoyi Zhou and Shanghang Zhang and Jieqi Peng and Shuai Zhang and Jianxin Li and Hui Xiong and Wancai Zhang},
+title = {Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting},
+booktitle = {The Thirty-Fifth {AAAI} Conference on Artificial Intelligence, {AAAI} 2021, Virtual Conference},
+volume = {35},
+number = {12},
+pages = {11106--11115},
+publisher = {{AAAI} Press},
+year = {2021},
+}
3 changes: 3 additions & 0 deletions tsdb/data_processing.py
@@ -15,6 +15,7 @@
     load_physionet2012,
     load_physionet2019,
     load_electricity,
+    load_ett,
     load_beijing_air_quality,
     load_ucr_uea_dataset,
     load_ais,
@@ -94,6 +95,8 @@ def load(dataset_name: str, use_cache: bool = True) -> dict:
         result = load_physionet2019(dataset_saving_path)
     elif dataset_name == "electricity_load_diagrams":
         result = load_electricity(dataset_saving_path)
+    elif dataset_name == "electricity_transformer_temperature":
+        result = load_ett(dataset_saving_path)
     elif dataset_name == "beijing_multisite_air_quality":
         result = load_beijing_air_quality(dataset_saving_path)
     elif dataset_name == "vessel_ais":
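The hunk above extends the name-to-loader dispatch in `load` with one more branch. As a rough illustration of that mapping only, the same idea can be written as a lookup table. This is a sketch, not how `tsdb.load` is written (the commit keeps the `if`/`elif` chain); `LOADERS`, `_stub_loader`, and `dispatch` are hypothetical names, and the stub loaders stand in for the real ones so the example runs without downloading anything.

```python
from typing import Callable, Dict


def _stub_loader(tag: str) -> Callable[[str], dict]:
    """Build a stand-in loader that only records which branch was taken."""

    def loader(local_path: str) -> dict:
        return {"loader": tag, "path": local_path}

    return loader


# Hypothetical registry; the keys are dataset names that appear in the diff.
LOADERS: Dict[str, Callable[[str], dict]] = {
    "electricity_load_diagrams": _stub_loader("load_electricity"),
    "electricity_transformer_temperature": _stub_loader("load_ett"),
    "beijing_multisite_air_quality": _stub_loader("load_beijing_air_quality"),
}


def dispatch(dataset_name: str, dataset_saving_path: str) -> dict:
    """Look up the loader registered for dataset_name and call it."""
    try:
        loader = LOADERS[dataset_name]
    except KeyError:
        raise ValueError(f"Unknown dataset name: {dataset_name!r}") from None
    return loader(dataset_saving_path)


result = dispatch("electricity_transformer_temperature", "/tmp/ett")
```

With the real package, the new branch is reached by passing `"electricity_transformer_temperature"` as `dataset_name` to `load`, whose signature is shown in the hunk header.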
57 changes: 8 additions & 49 deletions tsdb/database.py
@@ -36,58 +36,17 @@
     #
     # https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/vessel_ais
     "vessel_ais": "https://zenodo.org/record/8064564/files/parquets.zip",
+    #
+    # https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/electricity_transformer_temperature
+    "electricity_transformer_temperature": [
+        "https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTm1.csv",
+        "https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTm2.csv",
+        "https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTh1.csv",
+        "https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTh2.csv",
+    ],
 }


-# The list of raw data files to be downloaded
-MATR_LINKS = (
-    (
-        "https://data.matr.io/1/api/v1/file/5c86c0b5fa2ede00015ddf66/download",
-        "2017-05-12_batchdata_updated_struct_errorcorrect.mat",
-    ),
-    (
-        "https://data.matr.io/1/api/v1/file/5c86bf13fa2ede00015ddd82/download",
-        "2017-06-30_batchdata_updated_struct_errorcorrect.mat",
-    ),
-    (
-        "https://data.matr.io/1/api/v1/file/5c86bd64fa2ede00015ddbb2/download",
-        "2018-04-12_batchdata_updated_struct_errorcorrect.mat",
-    ),
-    (
-        "https://data.matr.io/1/api/v1/file/5dcef152110002c7215b2c90/download",
-        "2019-01-24_batchdata_updated_struct_errorcorrect.mat",
-    ),
-)
-
-HUST_LINKS = (
-    (
-        "https://data.mendeley.com/public-files/datasets/nsc7hnsg4s/"
-        "files/5ca0ac3e-d598-4d07-8dcb-879aa047e98b/file_downloaded",
-        "hust_data.zip",
-    ),
-)
-
-CALCE_LINKS = (
-    ("https://web.calce.umd.edu/batteries/data/CS2_33.zip", "CS2_33.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CS2_34.zip", "CS2_34.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CS2_35.zip", "CS2_35.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CS2_36.zip", "CS2_36.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CS2_37.zip", "CS2_37.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CS2_38.zip", "CS2_38.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CX2_16.zip", "CX2_16.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CX2_33.zip", "CX2_33.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CX2_35.zip", "CX2_35.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CX2_34.zip", "CX2_34.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CX2_36.zip", "CX2_36.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CX2_37.zip", "CX2_37.zip"),
-    ("https://web.calce.umd.edu/batteries/data/CX2_38.zip", "CX2_38.zip"),
-)
-
-
-RWTH_LINKS = (
-    ("https://publications.rwth-aachen.de/record/818642/files/Rawdata.zip", "raw.zip"),
-)
-
 # https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/ucr_uea_datasets
 # 128 UCR + 33 UEA + 2 old removed (NonInvasiveFatalECGThorax1 and 2) = 163
 _ucr_uea_datasets = [
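In this hunk the new ETT entry maps a dataset name to a list of four URLs, whereas an entry such as `vessel_ais` maps to a single URL string. The diff does not show how TSDB's downloader handles the two shapes, so the following is only a sketch of one way to treat them uniformly and to derive the local file names that `load_ett` expects. `as_url_list` and `file_name_from_url` are hypothetical helpers, not part of TSDB; the URLs are copied from the diff.

```python
import posixpath
from typing import List, Union
from urllib.parse import urlparse


def as_url_list(entry: Union[str, List[str]]) -> List[str]:
    """Wrap a single URL in a list so callers can always iterate."""
    return [entry] if isinstance(entry, str) else list(entry)


def file_name_from_url(url: str) -> str:
    """Take the last path component of the URL as the local file name."""
    return posixpath.basename(urlparse(url).path)


# Entries copied from the diff: one single-URL dataset and the new ETT list.
DATABASE = {
    "vessel_ais": "https://zenodo.org/record/8064564/files/parquets.zip",
    "electricity_transformer_temperature": [
        "https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTm1.csv",
        "https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTm2.csv",
        "https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTh1.csv",
        "https://raw.githubusercontent.com/zhouhaoyi/ETDataset/main/ETT-small/ETTh2.csv",
    ],
}

ett_files = [
    file_name_from_url(url)
    for url in as_url_list(DATABASE["electricity_transformer_temperature"])
]
ais_files = [file_name_from_url(url) for url in as_url_list(DATABASE["vessel_ais"])]
```

The derived ETT names match the `sub_datasets` list hard-coded in the new loading function.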
2 changes: 2 additions & 0 deletions tsdb/loading_funcs/__init__.py
@@ -11,6 +11,7 @@
 from .physionet_2019 import load_physionet2019
 from .ucr_uea_datasets import load_ucr_uea_dataset
 from .vessel_ais import load_ais
+from .electricity_transformer_temperature import load_ett

 __all__ = [
     "load_beijing_air_quality",
@@ -19,4 +20,5 @@
     "load_physionet2019",
     "load_ucr_uea_dataset",
     "load_ais",
+    "load_ett",
 ]
54 changes: 54 additions & 0 deletions tsdb/loading_funcs/electricity_transformer_temperature.py
@@ -0,0 +1,54 @@
+"""
+Scripts related to dataset Electricity Transformer Temperature.
+For more information please refer to:
+https://github.com/WenjieDu/TSDB/tree/main/dataset_profiles/electricity_transformer_temperature
+"""
+
+# Created by Wenjie Du <wenjay.du@gmail.com>
+# License: BSD-3-Clause
+
+import os
+
+import pandas as pd
+
+
+def load_ett(local_path):
+    """Load dataset Electricity Transformer Temperature.
+
+    Parameters
+    ----------
+    local_path : str,
+        The local path of dir saving the raw data of Electricity Transformer Temperature.
+
+    Returns
+    -------
+    data : dict
+        A dictionary contains all four sub datasets:
+        ETTm1 : pandas.DataFrame
+            The time-series data of ETTm1
+        ETTm2 : pandas.DataFrame
+            The time-series data of ETTm2
+        ETTh1 : pandas.DataFrame
+            The time-series data of ETTh1
+        ETTh2 : pandas.DataFrame
+            The time-series data of ETTh2
+    """
+    sub_datasets = [
+        "ETTm1.csv",
+        "ETTm2.csv",
+        "ETTh1.csv",
+        "ETTh2.csv",
+    ]
+
+    data = {}
+    for sub_set in sub_datasets:
+        file_path = os.path.join(local_path, sub_set)
+        df = pd.read_csv(file_path, index_col="date")
+        df.index = pd.to_datetime(df.index)
+        df_name = sub_set.removesuffix(".csv")
+        data[df_name] = df
+
+    return data
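The loader above reads each CSV, parses the `date` column into timestamps, and stores the result under the file name with `.csv` stripped. Note that `str.removesuffix` only exists in Python 3.9 and later. The sketch below mirrors the same loop with the standard library alone, so it runs without pandas or a download. It is an illustration under stated assumptions, not the TSDB implementation: `strip_suffix` and `load_csv_tables` are hypothetical names, the nested-dict layout replaces the `pandas.DataFrame` that `load_ett` returns, and the two data rows are synthetic.

```python
import csv
import os
import tempfile
from datetime import datetime


def strip_suffix(name: str, suffix: str) -> str:
    """Equivalent of str.removesuffix that also works before Python 3.9."""
    if suffix and name.endswith(suffix):
        return name[: -len(suffix)]
    return name


def load_csv_tables(local_path: str, file_names: list) -> dict:
    """Read each CSV into {timestamp: {column: float}}, keyed by the file stem."""
    data = {}
    for file_name in file_names:
        table = {}
        with open(os.path.join(local_path, file_name), newline="") as f:
            for row in csv.DictReader(f):
                # The "date" column becomes the index, as in load_ett.
                stamp = datetime.strptime(row.pop("date"), "%Y-%m-%d %H:%M:%S")
                table[stamp] = {col: float(val) for col, val in row.items()}
        data[strip_suffix(file_name, ".csv")] = table
    return data


# Write a tiny synthetic file (values are made up) and load it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "ETTh1.csv"), "w", newline="") as f:
        f.write("date,HUFL,OT\n")
        f.write("2016-07-01 00:00:00,5.8,30.5\n")
        f.write("2016-07-01 01:00:00,5.7,27.8\n")
    demo = load_csv_tables(tmp_dir, ["ETTh1.csv"])
```

As in `load_ett`, the keys of the returned dictionary are the sub-dataset names without the file extension.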
2 changes: 1 addition & 1 deletion tsdb/utils/downloading.py
@@ -65,7 +65,7 @@ def _download_and_extract(url: str, saving_path: str) -> Optional[str]:
         logger.info("Download cancelled by the user.")
         raise

-    logger.info(f"Successfully downloaded data to {raw_data_saving_path}.")
+    logger.info(f"Successfully downloaded data to {raw_data_saving_path}")

     if (
         suffix in supported_compression_format
