Improves documentation
* checks and improves doc strings and doc string formatting

* adds a background page to the documentation

* fixes the copyright notice

* adds verbosity flag to CNN predictions and suppresses the tensorflow warnings when loading paat

* exposes ENMO calculation 

* makes mvpa_cutpoint and sb_cutpoint mandatory arguments

* adds extended example plus example data

* renames test workflow to match with the other badges
Trybnetic authored Sep 11, 2024
1 parent 44b5d54 commit 305caa5
Showing 16 changed files with 497 additions and 128 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/python-test.yml
@@ -1,4 +1,4 @@
name: Run Unit Tests
name: tests

on:
push:
12 changes: 10 additions & 2 deletions README.rst
@@ -62,11 +62,19 @@ examples and more information on the functions can be found in the documentation
data.loc[:, "Time in Bed"] = paat.detect_time_in_bed_weitz2024(data, sample_freq)
# Classify moderate-to-vigorous and sedentary behavior
data.loc[:, ["MVPA", "SB"]] = paat.calculate_pa_levels(data, sample_freq)
data.loc[:, ["MVPA", "SB"]] = paat.calculate_pa_levels(
data,
sample_freq,
mvpa_cutpoint=.069,
sb_cutpoint=.015
)
# Merge the activity columns into one labelled column. The columns argument
# indicates the priority of the columns; later names take precedence and will be kept
data.loc[:, "Activity"] = paat.create_activity_column(data, columns=["SB", "MVPA", "Time in Bed", "Non Wear Time"])
data.loc[:, "Activity"] = paat.create_activity_column(
data,
columns=["SB", "MVPA", "Time in Bed", "Non Wear Time"]
)
# Remove the other columns after merging
data = data[["X", "Y", "Z", "Activity"]]
70 changes: 68 additions & 2 deletions docs/source/background.rst
@@ -1,7 +1,73 @@
Background
==========

Accelerometers have become a popular tool for assessing physical activity over the last
decades. The small body-worn sensors provide an easy and more objective alternative
to classic questionnaire-based assessment while keeping the burden on researchers and
participants low. In particular, raw data accelerometry, the analysis of the raw
acceleration signals measured in g (1 g = 9.80665 m/s²), has received great attention
in recent years and is a rapidly advancing field. Many algorithms have been proposed
during this time, also by our lab. At the same time, openly available data to benchmark
algorithms on is scarce due to privacy concerns. Nevertheless, new algorithms can only
be adopted in research after rigorous external validation.

.. warning::
While publishing code and, in the context of machine learning, trained models has become
more common, this does not automatically mean that the published code is easily usable
for validation. Often, reimplementations are necessary, even though they increase the
risk of bias through incorrect implementation. For that reason, we developed *paat* as a
simple and easy-to-use package to facilitate replicating and validating our findings and,
prospectively, to apply the algorithms in research. The package is structured according
to the respective applications (io, preprocessing, features, wear time, sleep,
estimation), and the methods are easily applicable in isolation as well, as sketched
below. An overview of the different submodules can be found in the
:doc:`API Documentation <paat>`.
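
For instance, the time-in-bed detection can be run on its own, without the full
pipeline. The following is a minimal sketch: the file name is made up, and it assumes
that ``read_gt3x`` returns the acceleration data together with the sampling frequency::

    from paat import io, sleep

    # Load raw acceleration data from a (hypothetical) ActiGraph gt3x file
    data, sample_freq = io.read_gt3x("subject01.gt3x")

    # Detect time in bed with the pretrained model shipped with paat
    data.loc[:, "Time in Bed"] = sleep.detect_time_in_bed_weitz2024(data, sample_freq)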

Moreover, *paat* has already been used in various studies. Syed et al. [1]_, for
instance, developed and used the general gt3x reading functionality and implemented
and used the NWT algorithm from Van Hees et al. [2]_ for a comparison study of
different NWT algorithms. Syed et al. [3]_ also used the functions to develop a new
non-wear time algorithm which is now included in *paat*. Weitz et al. [4]_ used
the package to load and process the acceleration data to investigate the effect of
accelerometer calibration on physical activity in general and MVPA in particular.

If you are using *paat* in research, feel free to cite it as

Weitz, M., Syed, S., & Horsch A. (2024). PAAT: Physical Activity Analysis
Toolbox for analysis of hip-worn raw accelerometer data

If you are using BibTeX, you may want to use this example BibTeX entry::

@misc{weitz_paat_2024,
author = {Marc Weitz and
Shaheen Syed and
Alexander Horsch},
title = {PAAT: Physical Activity Analysis Toolbox for analysis
of hip-worn raw accelerometer data},
year = 2024,
url = {https://github.com/Trybnetic/paat}
}

This also helps us to keep this page up to date.


----

.. [1] Syed, S., Morseth, B., Hopstock, L. A., & Horsch, A. (2020). Evaluating the
performance of raw and epoch non-wear algorithms using multiple accelerometers
and electrocardiogram recordings. *Scientific Reports*, 10(1), 1–18.
https://doi.org/10.1038/s41598-020-62821-2
.. [2] Van Hees, V. T., Renström, F., Wright, A., Gradmark, A., Catt, M., et al. (2011).
Estimation of Daily Energy Expenditure in Pregnant and Non-Pregnant Women Using a
Wrist-Worn Tri-Axial Accelerometer. *PLOS ONE*, 6(7), e22922.
https://doi.org/10.1371/journal.pone.0022922
.. [3] Syed, S., Morseth, B., Hopstock, L. A., & Horsch, A. (2021). A novel algorithm to
detect non-wear time from raw accelerometer data using deep convolutional neural
networks. *Scientific Reports*, 11(1), 8832.
https://doi.org/10.1038/s41598-021-87757-z
.. [4] Weitz, M., Morseth, B., Hopstock, L. A., & Horsch, A. (2024). Influence of
Accelerometer Calibration on the Estimation of Objectively Measured Physical
Activity: The Tromsø Study. *Journal for the Measurement of Physical Behaviour*, 7(1).
https://doi.org/10.1123/jmpb.2023-0019
This page is currently under construction. Soon you find useful background info here as well as links to relevant papers.
5 changes: 3 additions & 2 deletions docs/source/conf.py
@@ -12,14 +12,15 @@
#
import os
import sys
import time
sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------

project = 'Physical Activity Analysis Toolbox'
copyright = '2021 Marc Weitz & Shaheen Syed'
author = 'Marc Weitz & Shaheen Syed'
author = 'Marc Weitz, Shaheen Syed & Alexander Horsch'
copyright = time.strftime('2022 - %Y ') + author


# -- General configuration ---------------------------------------------------
337 changes: 303 additions & 34 deletions docs/source/example_notebooks/analyze_data.ipynb

Large diffs are not rendered by default.

Binary file not shown.
12 changes: 10 additions & 2 deletions docs/source/index.rst
@@ -47,11 +47,19 @@ examples and more information on the functions can be found in the documentation
data.loc[:, "Time in Bed"] = paat.detect_time_in_bed_weitz2024(data, sample_freq)
# Classify moderate-to-vigorous and sedentary behavior
data.loc[:, ["MVPA", "SB"]] = paat.calculate_pa_levels(data, sample_freq)
data.loc[:, ["MVPA", "SB"]] = paat.calculate_pa_levels(
data,
sample_freq,
mvpa_cutpoint=.069,
sb_cutpoint=.015
)
# Merge the activity columns into one labelled column. The columns argument
# indicates the priority of the columns; later names take precedence and will be kept
data.loc[:, "Activity"] = paat.create_activity_column(data, columns=["SB", "MVPA", "Time in Bed", "Non Wear Time"])
data.loc[:, "Activity"] = paat.create_activity_column(
data,
columns=["SB", "MVPA", "Time in Bed", "Non Wear Time"]
)
# Remove the other columns after merging
data = data[["X", "Y", "Z", "Activity"]]
50 changes: 0 additions & 50 deletions docs/source/quickstart.rst

This file was deleted.

2 changes: 1 addition & 1 deletion paat/__init__.py
@@ -29,7 +29,7 @@

# Expose API functions
from .estimates import calculate_pa_levels, create_activity_column
from .features import calculate_actigraph_counts, calculate_vector_magnitude, calculate_brond_counts
from .features import calculate_actigraph_counts, calculate_vector_magnitude, calculate_brond_counts, calculate_enmo
from .io import read_gt3x, read_metadata
from .calibration import calibrate
from .sleep import detect_time_in_bed_weitz2024
49 changes: 48 additions & 1 deletion paat/calibration.py
@@ -9,11 +9,58 @@
import pandas as pd


def estimate_calibration_coefficents(data):
def estimate_calibration_coefficents(acc):
"""
.. warning::
This function is not implemented yet
Estimates the calibration correction coefficients based on the method proposed
by Van Hees et al. (2014)
References
----------
Van Hees, V. T., Fang, Z., Langford, J., Assah, F., Mohammad, A., da Silva, I. C. M.,
Trenell, M. I., White, T., Wareham, N. J., & Brage, S. (2014). Autocalibration of
accelerometer data for free-living physical activity assessment using local gravity
and temperature: An evaluation on four continents. *Journal of Applied Physiology*,
117(7), 738–744. https://doi.org/10.1152/japplphysiol.00421.2014
Parameters
----------
acc : array_like
numpy array with acceleration data
Returns
-------
scale : array_like
numpy array with the scale factors
offset : array_like
numpy array with the offset factors
"""
raise NotImplementedError("Autocalibration is not implemented yet. Please use GGIR to estimate the calibration coefficients.")


def calibrate(acc, scale, offset):
"""
Calibrates the acceleration data based on the `scale` and `offset` variables.
Parameters
----------
acc : array_like
numpy array with acceleration data
scale : array_like
numpy array with the scale factors
offset : array_like
numpy array with the offset factors
Returns
-------
acc : array_like
numpy array with calibrated acceleration data
"""
columns = ["Y", "X", "Z"]
index = acc.index.copy()
acc = (scale * acc[columns].values) + offset
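For reference, a brief usage sketch of the ``calibrate`` function. The coefficients
below are made-up placeholders; in practice they would come from an external
autocalibration, e.g. GGIR, as the docstring suggests:

import numpy as np
import paat

# Load raw acceleration data (hypothetical file name)
data, sample_freq = paat.read_gt3x("subject01.gt3x")

scale = np.array([0.998, 1.002, 0.999])    # made-up per-axis scale factors
offset = np.array([0.003, -0.001, 0.002])  # made-up per-axis offsets

# Apply the linear correction acc' = scale * acc + offset
data_calibrated = paat.calibrate(data, scale, offset)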
21 changes: 6 additions & 15 deletions paat/estimates.py
@@ -12,35 +12,26 @@
from . import features


def calculate_pa_levels(data, sample_freq, mvpa_cutpoint=.069, sb_cutpoint=.015, interval="1s"):
def calculate_pa_levels(data, sample_freq, mvpa_cutpoint, sb_cutpoint, interval="1s"):
"""
Calculate moderate-to-vigorous physical activity (MVPA) and sedentary behavior
based on cutpoints (mvpa_cutpoint and sb_cutpoint). On default, this procedure
uses the algorithm and values from Sanders et al. (2019). This means
based on cutpoints (mvpa_cutpoint and sb_cutpoint) on 1s resolution. The algorithm
works by
1. The Euclidean norm minus one (ENMO) is calculated from the triaxial signal
2. The ENMO is averaged over 1s epochs
3. These epochs are compared against the cutpoints MVPA = 69mg and SB = 15mg
References
----------
George J. Sanders, Lynne M. Boddy, S. Andy Sparks, Whitney B. Curry, Brenda Roe,
Axel Kaehne & Stuart J. Fairclough (2019) Evaluation of wrist and hip sedentary
behaviour and moderate-to-vigorous physical activity raw acceleration cutpoints
in older adults, Journal of Sports Sciences, 37:11, 1270-1279,
DOI: 10.1080/02640414.2018.1555904
3. These epochs are compared against the mvpa_cutpoint and the sb_cutpoint
Parameters
----------
data : DataFrame
a DataFrame containing the raw acceleration data
sample_freq : int
the sampling frequency in which the data was recorded
mvpa_cutpoint : float (optional)
mvpa_cutpoint : float
a float indicating the cutpoint between light physical activity and
moderate-to-vigorous activity
sb_cutpoint : float (optional)
sb_cutpoint : float
a float indicating the cutpoint between light physical activity and
sedentary behavior
interval : str (optional)
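For illustration, the procedure described in the new docstring boils down to roughly
the following (a simplified sketch, not the package's actual implementation; the column
names, the DatetimeIndex, and the comparison operators are assumptions):

import numpy as np
import pandas as pd

def classify_pa_levels_sketch(data, mvpa_cutpoint, sb_cutpoint):
    # data: DataFrame with raw acceleration in columns "X", "Y", "Z" and a DatetimeIndex
    # 1. Euclidean norm minus one (ENMO), truncated at zero
    enmo = np.maximum(np.linalg.norm(data[["X", "Y", "Z"]].values, axis=1) - 1, 0)
    # 2. Average the ENMO over 1s epochs
    enmo = pd.Series(enmo, index=data.index).resample("1s").mean()
    # 3. Compare each epoch against the cutpoints
    return pd.DataFrame({"MVPA": enmo >= mvpa_cutpoint, "SB": enmo <= sb_cutpoint})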
5 changes: 4 additions & 1 deletion paat/features.py
@@ -82,16 +82,19 @@ def calculate_enmo(data, dtype=np.float32):
"""
Calculate the Euclidean norm minus one from raw acceleration data.
This function is a wrapper of `calculate_vector_magnitude`.
Parameters
----------
data : array_like
numpy array with acceleration data
dtype : np.dtype (optional)
set the data type of the returned array. Defaults to float32, but can be set to a higher precision
Returns
-------
vector_magnitude : np.array (acceleration values, 1)(np.float)
numpy array with vector magnitude of the acceleration
numpy array with the Euclidean Norm Minus One (ENMO) of the acceleration
"""
if isinstance(data, pd.DataFrame):
data = data[["Y", "X", "Z"]].values
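Since this commit also exposes ``calculate_enmo`` at the package level (see the
``__init__.py`` change above), a quick usage sketch; the file name is made up, and the
``squeeze()`` is only needed if the returned array is two-dimensional:

import paat

data, sample_freq = paat.read_gt3x("subject01.gt3x")  # hypothetical file
data["ENMO"] = paat.calculate_enmo(data).squeeze()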
4 changes: 3 additions & 1 deletion paat/sleep.py
@@ -8,6 +8,8 @@
"""
import os

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

import pandas as pd
import numpy as np
import tensorflow as tf
@@ -66,7 +68,7 @@ def detect_time_in_bed_weitz2024(data, sample_freq, resampled_frequency="1min",
model_path = os.path.join(os.path.pardir, os.path.dirname(__file__), 'models', 'TIB_model.h5')
model = models.load_model(model_path)

predictions = (model.predict(X[np.newaxis], verbose=0).squeeze() > .5)
predictions = (model.predict(X[np.newaxis], verbose=0).squeeze() >= .5)

seconds = pd.Timedelta(resampled_frequency).seconds
predictions = np.repeat(predictions, seconds * sample_freq)
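The ``TF_CPP_MIN_LOG_LEVEL`` assignment above works because TensorFlow reads the
variable when its C++ backend is initialized, so it has to be set before the import.
A minimal standalone sketch of the same idea:

import os

# Must be set before tensorflow is imported; 1 hides INFO, 2 also WARNING, 3 also ERROR
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

import tensorflow as tf  # now imports without INFO/WARNING chatter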