improve docs
AlexanderVNikitin committed Mar 14, 2024
1 parent dc62543 commit 127dd64
Showing 5 changed files with 72 additions and 12 deletions.
6 changes: 6 additions & 0 deletions docs/guides/datasets.rst
@@ -28,6 +28,9 @@ The package provides easy access to many time series datasets.
   * - Stock data
     - tsgm.utils.get_stock_data(ticker_name)
     - Gets historical stock data from YFinance
   * - COVID-19 over the US
     - tsgm.utils.get_stock_data(ticker_name)
     - https://github.com/AlexanderVNikitin/covid19-on-graphs
   * - Energy Data (UCI)
     - tsgm.utils.get_energy_data
     - https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction
@@ -37,3 +40,6 @@ The package provides easy access to many time series datasets.
   * - Samples from GPs
     - tsgm.utils.get_gp_samples_data
     - https://en.wikipedia.org/wiki/Gaussian_process
   * - Physionet 2012
     - tsgm.utils.get_physionet2012
     - https://archive.physionet.org/pn3/challenge/2012/
2 changes: 2 additions & 0 deletions docs/guides/installation.rst
@@ -8,3 +8,5 @@ The package itself can be installed via pip:
.. code-block:: none

    $ pip install tsgm

To install TSGM from source, follow `CONTRIBUTING.md <https://github.com/AlexanderVNikitin/tsgm/blob/main/CONTRIBUTING.md>`_
70 changes: 61 additions & 9 deletions docs/guides/introduction.rst
@@ -15,7 +15,7 @@ TSGM offers a wide range of features to support the generation and evaluation of

- **Evaluation Approaches:** TSGM provides multiple approaches for evaluating the quality of synthetic time series data. These evaluation methods help assess the fidelity of the generated data by comparing it to real-world time series, enabling researchers to measure the accuracy and statistical properties of the synthetic data.

- **Built on TensorFlow:** TSGM is built on top of the `TensorFlow <https://www.tensorflow.org/>`_ deep learning framework. TensorFlow offers efficient computation and enables seamless integration with other TensorFlow-based models and libraries, allowing users to leverage its extensive ecosystem for further customization and experimentation.
- **Built on Keras:** TSGM is built on top of the `Keras <https://www.keras.io/>`_ deep learning framework. It offers efficient computation and enables seamless integration with other TensorFlow-based models and libraries, allowing users to leverage its extensive ecosystem for further customization and experimentation.


Augmentations
@@ -37,17 +37,20 @@ A central concept of TSGM is `Generator`. The generator can be trained on histor

The training of data-driven simulators can be done via likelihood optimization, adversarial training procedures, or variational methods. Some of the implemented data-driven simulators include:

- `tss.models.cgan.GAN` - standard GAN model adapted for time-series simulation,\\
- `tss.models.cgan.ConditionalGAN` - conditional GAN model for labeled and temporally labeled time-series simulation,\\
- `tss.models.cvae.BetaVAE` - beta-VAE model adapted for time-series simulation,\\
- `tss.models.cvae.cBetaVAE` - conditional beta-VAE model for labeled and temporally labeled time-series simulation.
- `tsgm.models.sts.STS` - Structural Time Series model for time series generation,\\
- `tsgm.models.cgan.GAN` - standard GAN model adapted for time-series simulation,\\
- `tsgm.models.cgan.ConditionalGAN` - conditional GAN model for labeled and temporally labeled time-series simulation,\\
- `tsgm.models.cvae.BetaVAE` - beta-VAE model adapted for time-series simulation,\\
- `tsgm.models.cvae.cBetaVAE` - conditional beta-VAE model for labeled and temporally labeled time-series simulation,\\
- `tsgm.models.cvae.TimeGAN` - extended GAN-based model for time series generation.

A minimalistic example of synthetic data generation with VAEs:

.. code-block:: python

    import tsgm
    from tensorflow import keras

    n, n_ts, n_features = 1000, 24, 5
    data = tsgm.utils.gen_sine_dataset(n, n_ts, n_features)
    scaler = tsgm.utils.TSFeatureWiseScaler()
@@ -66,30 +69,79 @@ In TSGM, time series datasets are often stored in one of two ways: wrapped in a

The class `tsgm.dataset.DatasetProperties` implements a generic placeholder for data that are unavailable.

`tsgm.utils` has a plenty of datasets, see :ref:`datasets-label`.
`tsgm.utils` provides plenty of datasets; see :ref:`datasets-label`. For instance,

.. code-block:: python

    import tsgm

    ucr_data_manager = tsgm.utils.UCRDataManager(ds="gunpoint")
    assert ucr_data_manager.summary() is None
    X_train, y_train, X_test, y_test = ucr_data_manager.get()

Architectures Zoo
=============================
Architectures Zoo is a storage object of NN architectures that can be utilized by the framework users. It provides architectures for GANs, VAEs, and downstream task models. It also provides additional information on the implemented architectures via `zoo.summary()`.
Architectures Zoo is a storage object of NN architectures that can be utilized by the framework users.
It provides architectures for GANs, VAEs, and downstream task models. It also provides additional information on the implemented architectures via `tsgm.models.zoo.summary()`. The `tsgm.models.zoo` object supports the Python dictionary API; in particular, users can add their custom models to it.

For example, the models from zoo can be used as follows:

.. code-block:: python

    import tsgm

    model_type = tsgm.models.architectures.zoo["cgan_lstm_n"]
    arch = model_type(
        seq_len=seq_len, feat_dim=feat_dim,
        latent_dim=latent_dim, output_dim=output_dim)
    arch_dict = arch.get()
    # arch will store `.generator` and `.discriminator` fields for cGAN

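To illustrate the dictionary-style access described above, here is a minimal, self-contained sketch of a registry that behaves like a Python dict and adds a `summary()` method. The names `ArchitectureRegistry`, `ToyArchitecture`, and the key `"my_custom_arch"` are hypothetical and made up for this example; this is not TSGM's actual implementation of `zoo`.

```python
from collections import UserDict


class ArchitectureRegistry(UserDict):
    """Hypothetical dict-like registry mapping names to architecture classes."""

    def summary(self) -> None:
        # Print one line per registered architecture; returns None.
        for name, arch_cls in self.data.items():
            print(f"{name}: {arch_cls.__name__}")


class ToyArchitecture:
    """Placeholder standing in for a real architecture class (assumption)."""

    def __init__(self, seq_len: int, feat_dim: int) -> None:
        self.seq_len = seq_len
        self.feat_dim = feat_dim


registry = ArchitectureRegistry()
registry["my_custom_arch"] = ToyArchitecture  # register a custom model
arch = registry["my_custom_arch"](seq_len=24, feat_dim=5)  # look up, then instantiate
registry.summary()
```

Subclassing `UserDict` is one simple way to get the full dictionary protocol (item assignment, lookup, iteration, `in`) while still attaching extra helper methods.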
Metrics
=============================
In `tsgm.metrics`, we implemented several metrics for evaluating generated time series. These metrics fall into six types:

- data similarity,
- data similarity / distance,
- predictive consistency,
- fairness,
- privacy,
- downstream effectiveness,
- visual similarity.

See the following code for an example of using metrics:

.. code-block:: python

    import tsgm
    import functools
    import numpy as np

    Xr, yr = tsgm.utils.gen_sine_vs_const_dataset(10, 100, 20, max_value=2, const=1)  # real data
    Xs, ys = Xr + 1e-5, yr  # synthetic data

    d_real = tsgm.dataset.Dataset(Xr, yr)
    d_syn = tsgm.dataset.Dataset(Xs, ys)

    statistics = [
        functools.partial(tsgm.metrics.statistics.axis_max_s, axis=None),
        functools.partial(tsgm.metrics.statistics.axis_min_s, axis=None)]
    # The discrepancy compares the statistic vectors of the two datasets.
    sim_metric = tsgm.metrics.DistanceMetric(
        statistics=statistics, discrepancy=lambda x, y: np.linalg.norm(x - y)
    )
    sim_metric(d_real, d_syn)

Implementations and examples of these methods are described in `tutorials/metrics.ipynb`.
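
The idea behind a distance-style metric can be sketched with the standard library alone: compute the same summary statistics on the real and the synthetic series, then pass the two statistic vectors to a discrepancy function. The helper `distance_between` and the toy sine series below are assumptions made for illustration; they do not reproduce the internals of `tsgm.metrics.DistanceMetric`, and plain lists stand in for NumPy arrays.

```python
import math


def distance_between(real, synthetic, statistics, discrepancy):
    """Apply each statistic to both series and compare the resulting vectors."""
    stats_real = [stat(real) for stat in statistics]
    stats_syn = [stat(synthetic) for stat in statistics]
    return discrepancy(stats_real, stats_syn)


# A toy univariate series and a slightly shifted copy playing the synthetic data.
real = [math.sin(0.1 * t) for t in range(100)]
synthetic = [value + 1e-5 for value in real]

distance = distance_between(
    real, synthetic,
    statistics=[max, min],   # global maximum and minimum
    discrepancy=math.dist,   # Euclidean distance between the statistic vectors
)
print(distance)  # a small positive number, since the two series nearly coincide
```

A smaller value means the chosen statistics of the synthetic data are closer to those of the real data; which statistics and which discrepancy to use is a modelling choice.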


Citing
=======================
If you find the *Time Series Generator Modeling framework* useful, please consider citing our paper:
If you find *TSGM* useful, please consider citing our paper:

.. code-block:: latex

1 change: 0 additions & 1 deletion tests/test_dataset.py
@@ -35,7 +35,6 @@ def test_dataset():
    assert d1.Xy_concat.shape == (10, 20, 23)



def test_temporally_labeled_ds():
    X = np.ones((10, 100, 2))
    y = np.ones((10, 100))
5 changes: 3 additions & 2 deletions tsgm/metrics/metrics.py
@@ -332,8 +332,6 @@ def __call__(self, d: tsgm.dataset.DatasetOrTensor) -> float:


class DemographicParityMetric(Metric):
    _DEFAULT_KS_METRIC = lambda data1, data2: scipy.stats.ks_2samp(data1, data2).statistic # noqa: E731

    """
    Measuring demographic parity between two datasets.
@@ -361,6 +359,9 @@ class DemographicParityMetric(Metric):
    >>> result = metric(dataset_hist, groups_hist, dataset_synth, groups_synth)
    >>> print(result)
    """

    _DEFAULT_KS_METRIC = lambda data1, data2: scipy.stats.ks_2samp(data1, data2).statistic # noqa: E731

    def __call__(self, d_hist: tsgm.dataset.DatasetOrTensor, groups_hist: TensorLike, d_synth: tsgm.dataset.DatasetOrTensor, groups_synth: TensorLike, metric: T.Callable = _DEFAULT_KS_METRIC) -> T.Dict:
        """
        Calculate the demographic parity metric for the input datasets.
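
The hunk above moves `_DEFAULT_KS_METRIC` below the class docstring. A plausible motivation (an inference from Python semantics, not something the commit message states) is that a string literal becomes a class's `__doc__` only when it is the first statement in the class body. The classes `AttributeFirst` and `DocstringFirst` below are hypothetical and exist only to demonstrate that behavior:

```python
class AttributeFirst:
    _DEFAULT = lambda a, b: a - b  # noqa: E731

    """This string is a plain expression, not a docstring."""


class DocstringFirst:
    """This string is the class docstring."""

    _DEFAULT = lambda a, b: a - b  # noqa: E731

    # A name defined earlier in the class body can serve as a default argument,
    # because defaults are evaluated while the class body is executing.
    def __call__(self, a, b, metric=_DEFAULT):
        return metric(a, b)


print(AttributeFirst.__doc__)   # None
print(DocstringFirst.__doc__)   # This string is the class docstring.
print(DocstringFirst()(5, 3))   # 2
```

With the attribute placed after the docstring, tools that read `__doc__` (such as `help()` or documentation generators) can find the class description, and the attribute is still defined before the method signature that uses it as a default.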