add synchronized_brainwave_datase and its test case, modify readme (#48)
* add synchronized_brainwave_datase and its test case, add readme

* change url, replace print with logger, add doc for function, change return type

* improve visualizations

* use file_utils.download to download file

* fix redundant import

---------

Co-authored-by: Alexander Nikitin <1243786+AlexanderVNikitin@users.noreply.github.com>
uncircle and AlexanderVNikitin authored Jun 18, 2024
1 parent b821cd7 commit d5b8cd1
Showing 3 changed files with 55 additions and 12 deletions.
25 changes: 13 additions & 12 deletions README.md
@@ -150,18 +150,19 @@ TSGM implements many metrics for synthetic time series evaluation. Check Section


## :floppy_disk: Datasets
| Dataset | API | Description |
| ------------- | ------------- | ------------- |
| UCR Dataset | `tsgm.utils.UCRDataManager` | https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/ |
| Mauna Loa | `tsgm.utils.get_mauna_loa()` | https://gml.noaa.gov/ccgg/trends/data.html |
| EEG & Eye state | `tsgm.utils.get_eeg()` | https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State |
| Power consumption dataset | `tsgm.utils.get_power_consumption()` | https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption |
| Stock data | `tsgm.utils.get_stock_data(ticker_name)` | Gets historical stock data from YFinance |
| COVID-19 over the US | `tsgm.utils.get_covid_19()` | Covid-19 distribution over the US |
| Energy Data (UCI) | `tsgm.utils.get_energy_data()` | https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction |
| MNIST as time series | `tsgm.utils.get_mnist_data()` | https://en.wikipedia.org/wiki/MNIST_database |
| Samples from GPs | `tsgm.utils.get_gp_samples_data()` | https://en.wikipedia.org/wiki/Gaussian_process |
| Physionet 2012 | `tsgm.utils.get_physionet2012()` | https://archive.physionet.org/pn3/challenge/2012/ |
| Dataset | API | Description |
| - |---------------------------------------------------| ------------- |
| UCR Dataset | `tsgm.utils.UCRDataManager` | https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/ |
| Mauna Loa | `tsgm.utils.get_mauna_loa()` | https://gml.noaa.gov/ccgg/trends/data.html |
| EEG & Eye state | `tsgm.utils.get_eeg()` | https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State |
| Power consumption dataset | `tsgm.utils.get_power_consumption()` | https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption |
| Stock data | `tsgm.utils.get_stock_data(ticker_name)` | Gets historical stock data from YFinance |
| COVID-19 over the US | `tsgm.utils.get_covid_19()` | Covid-19 distribution over the US |
| Energy Data (UCI) | `tsgm.utils.get_energy_data()` | https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction |
| MNIST as time series | `tsgm.utils.get_mnist_data()` | https://en.wikipedia.org/wiki/MNIST_database |
| Samples from GPs | `tsgm.utils.get_gp_samples_data()` | https://en.wikipedia.org/wiki/Gaussian_process |
| Physionet 2012 | `tsgm.utils.get_physionet2012()` | https://archive.physionet.org/pn3/challenge/2012/ |
| Synchronized Brainwave Dataset | `tsgm.utils.get_synchronized_brainwave_dataset()` | https://www.kaggle.com/datasets/berkeley-biosense/synchronized-brainwave-dataset |

TSGM provides an API for convenient use of many time-series datasets (currently more than 140 datasets). The comprehensive list of the datasets is available in the [documentation](https://tsgm.readthedocs.io/en/latest/guides/datasets.html).

6 changes: 6 additions & 0 deletions tests/test_utils.py
@@ -366,3 +366,9 @@ def test_extract_targz():

def test_version():
    assert isinstance(tsgm.__version__, str)


def test_get_synchronized_brainwave_dataset():
    X, y = tsgm.utils.get_synchronized_brainwave_dataset()
    assert X.shape == (30013, 12)
    assert y.shape == (30013,)
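
The test above needs network access because it downloads the full dataset. As a complement, the feature/label split could be checked offline on a tiny in-memory CSV. The sketch below is not part of this commit: `split_features_labels` is a hypothetical helper, the sample rows and label values are invented for illustration, and it uses only the standard library rather than pandas.

```python
import csv
import io
from typing import Dict, List, Tuple


def split_features_labels(
    csv_text: str, label_column: str = "label"
) -> Tuple[List[Dict[str, str]], List[str]]:
    """Split CSV rows into feature dicts (X) and a list of labels (y).

    Hypothetical helper: it mirrors the idea of `df.drop("label", axis=1)`
    and `df["label"]` without depending on pandas.
    """
    X: List[Dict[str, str]] = []
    y: List[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        y.append(row.pop(label_column))  # take the label out of the features
        X.append(row)
    return X, y


def test_split_features_labels():
    # Invented sample data; the real dataset has different columns and values.
    sample = "id,signal,label\n1,0.5,relax\n2,0.7,math\n"
    X, y = split_features_labels(sample)
    assert len(X) == len(y) == 2
    assert all("label" not in row for row in X)
    assert y == ["relax", "math"]
```

A test of this shape runs without downloading anything, so it can stay enabled in environments where the Dropbox URL is unreachable.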
36 changes: 36 additions & 0 deletions tsgm/utils/datasets.py
@@ -296,6 +296,42 @@ def get_eeg() -> T.Tuple[TensorLike, TensorLike]:
    return X, y


def get_synchronized_brainwave_dataset() -> T.Tuple[pd.DataFrame, pd.Series]:
    """
    Loads the EEG Synchronized Brainwave dataset.
    This function downloads the EEG Synchronized Brainwave dataset from Dropbox
    and returns the input features (X) and target labels (y).
    :return: A tuple containing the input features (X) and target labels (y).
    :rtype: tuple[pd.DataFrame, pd.Series]
    """
    url = ("https://www.dropbox.com/scl/fi/uqah9rthwrt5i2q6evtws/eeg-data.csv.zip?rlkey=z7sautwq74jow2xt9o6q7lcij&st"
           "=hvpvvfez&dl=1")
    cur_path = os.path.dirname(__file__)
    path_to_folder = os.path.join(cur_path, "../../data/")
    path_to_resource = os.path.join(path_to_folder, 'eeg-data.csv.zip')
    path_to_renamed_csv = os.path.join(path_to_folder, "synchronized_brainwave_dataset.csv")
    os.makedirs(path_to_folder, exist_ok=True)
    # Download and extract only when the renamed CSV is not cached yet.
    if not os.path.exists(path_to_renamed_csv):
        file_utils.download(url, path_to_folder)
        logger.info("Download completed.")
        file_utils.extract_archive(path_to_resource, path_to_folder)
        logger.info("Extraction completed.")
        original_csv = os.path.join(path_to_folder, "eeg-data.csv")
        if os.path.exists(original_csv):
            os.rename(original_csv, path_to_renamed_csv)
            logger.info(f"File renamed to {path_to_renamed_csv}")
        else:
            # pd.read_csv below raises FileNotFoundError in this case.
            logger.warning("The expected CSV file was not found.")
    else:
        logger.info("File exists.")
    df = pd.read_csv(path_to_renamed_csv)
    # X keeps every column except "label"; y is the "label" column (a Series).
    X = df.drop("label", axis=1)
    y = df["label"]
    return X, y
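
The function above follows an extract, rename, and reuse pattern: the archive is unpacked once, the CSV is renamed, and later calls read the cached file. A minimal standalone sketch of that pattern is shown below, assuming a zip archive and using only the standard library. `extract_and_rename` is a hypothetical helper, not TSGM's `file_utils` API, and unlike the committed code it raises immediately when the expected member is missing instead of logging a warning.

```python
import os
import tempfile
import zipfile


def extract_and_rename(archive_path: str, member: str, target_path: str) -> str:
    """Extract `member` from a zip archive and store it at `target_path`.

    The work is skipped when `target_path` already exists, so repeated
    calls reuse the cached file. Hypothetical helper for illustration.
    """
    if os.path.exists(target_path):
        return target_path  # cache hit: nothing to do
    folder = os.path.dirname(target_path) or "."
    os.makedirs(folder, exist_ok=True)
    with zipfile.ZipFile(archive_path) as archive:
        if member not in archive.namelist():
            # Fail early rather than warn now and fail later on read.
            raise FileNotFoundError(f"{member} not found in {archive_path}")
        extracted = archive.extract(member, folder)
    os.replace(extracted, target_path)  # rename the extracted file
    return target_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Build a tiny stand-in archive; the contents are invented.
        archive_path = os.path.join(tmp, "eeg-data.csv.zip")
        with zipfile.ZipFile(archive_path, "w") as archive:
            archive.writestr("eeg-data.csv", "a,label\n1,x\n")
        target = os.path.join(tmp, "data", "synchronized_brainwave_dataset.csv")
        print(extract_and_rename(archive_path, "eeg-data.csv", target))
```

Raising on a missing member is a design choice of this sketch: it surfaces a broken download at the point of failure rather than as a later read error.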


def get_power_consumption() -> npt.NDArray:
    """
    Retrieves the household power consumption dataset.