From 64610b39d095cd31fd550deffb86cd90f4d02b13 Mon Sep 17 00:00:00 2001
From: ifeoluwaale <ifeoluwaale@gmail.com>
Date: Thu, 21 Mar 2024 19:25:20 -0700
Subject: [PATCH] Updates to user and developer documentation

---
 README.md                          |  46 ++++-
 dev-instructions.md                | 315 ++++++++++++++++++++++++++++-
 tests/integs/test_integ_autoreg.py |   0
 3 files changed, 349 insertions(+), 12 deletions(-)
 create mode 100644 tests/integs/test_integ_autoreg.py
diff --git a/README.md b/README.md
index ab3de7b..945e651 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,5 @@
  <p align="center">
- <img src="./docs/assets/dplpy.png" width="175"> 
+ <img src="https://github.com/OpenDendro/dplPy/blob/main/docs/assets/dplpy.png?raw=true" width="175"> 
 
 # dplPy -the Dendrochronology Program Library in Python
 The Dendrochronology Program Library (DPL) in Python has its roots in both the [original FORTRAN program](https://www.ltrr.arizona.edu/software.html) created by the [legendary Richard Holmes](https://arizona.aws.openrepository.com/handle/10150/262569?show=full) and the subsequent R Project package by Andy Bunn, [dplR](https://github.com/OpenDendro/dplR).  Our aim is to provide researchers working with tree-ring data the necessary tools in open-source environments, promoting open science practices, enhancing rigor and transparency in dendrochronology, and eventually allowing reproducible research entirely in a single programming language.
@@ -24,15 +24,17 @@ The Dendrochronology Program Library (DPL) in Python has its roots in both the [
     - [Windows](#windows)
   - [Functionalities and Usage](#functionalities-and-usage)
     - [Loading data using  `readers`](#loading-data-using--readers)
+    - [Loading data from online sources using `readers_url`](#loading-data-from-online-sources-using-readers_url)
     - [Data Summary from `summary`](#data-summary-from-summary)
     - [Data Stastics from `stats`](#data-stastics-from-stats)
     - [Data Report from `report`](#data-report-from-report)
-    - [Plotting](#plotting)
+    - [Plotting raw data with `plot`](#plotting-raw-data-with-plot)
     - [Detrending using `detrend`](#detrending-using-detrend)
     - [Autoregressive (AR) modeling](#autoregressive-ar-modeling)
     - [Build a chronology with `chron`](#build-a-chronology-with-chron)
     - [Build a variance stabilized chronology with `chron_stabilized`](#build-a-variance-stabilized-chronology-with-chron_stabilized)
     - [Crossdate with `xdate`](#crossdate-with-xdate)
+    - [Output data to files using `writers`](#output-data-to-files-using-writers)
 
 ---
 
@@ -168,6 +170,18 @@ This will load the package and its functions, allowing them to be accessed with
     >>> data = dpl.readers("/path/to/file.rwl", header=True)
     ```
 
+### Loading data from online sources using  `readers_url`
+**Note: This function is still in development and has only been tested so far with `rwl` raw data files from the [NCEI website](https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/)**
+
+- Description: reads `rwl` formatted data directly from online sources.
+- Options: 
+    - `header`: rwl input files often have a header present; Default is `False`, use `True` if input has a header.
+- Usage examples:
+    ```
+    >>> data = dpl.readers_url("http://link/to/file.rwl")
+    >>> data = dpl.readers_url("http://link/to/file.rwl", header=True)
+    ```
+
 ### Data Summary from `summary`
 
 - Description: generates a summary of each series recorded in `rwl`  and `csv` format files
@@ -198,7 +212,7 @@ This will load the package and its functions, allowing them to be accessed with
     >>> dpl.report(data)
     ```
 
-### Plotting
+### Plotting raw data with `plot`
 
 - Description: generates plots of tree ring with data from dataframes. Currently capable of generating `line`, `spag` (spaghetti) and `seg` (segment, default) plots.
 - Options:
@@ -309,5 +323,29 @@ This will load the package and its functions, allowing them to be accessed with
     - `show_flags`: default `True`, determines whether to show flags in the function output to the console.
 - Usage examples:
     ```
-    
+    >>> ca533_rwi = dpl.detrend(ca533, plot=False)
+
+    # Crossdating of detrended data with default args
+    >>> dpl.xdate(ca533_rwi)
+
+    # Crossdating with Pearson correlation and show flags 
+    # (other options set to defaults when not specified).
+    >>> dpl.xdate(ca533_rwi, corr="Pearson", show_flags=True)
+    ```
+
+### Output data to files using  `writers`
+
+- Description: writes data from dataframe to supported file types (`csv`, `rwl`, `crn`, `txt`).
+- Required parameters: 
+    - `data`: dataframe with ring widths (presumably one read from `readers` or `readers_url`)
+    - `label`: name (can include a file path) to give to the created file. **Should not include the file extension.**
+    - `format`: extension for file to be created. Can be `'csv'`, `'rwl'`, `'crn'` or `'txt'`.
+
+- Usage examples:
     ```
+    # Write data to file_name.csv in current working directory.
+    >>> dpl.writers(data, "file_name", "csv")
+
+    # Write data to file_name.csv in ./path/to/ directory.
+    >>> dpl.writers(data, "./path/to/file_name", "csv")
+    ```
\ No newline at end of file
diff --git a/dev-instructions.md b/dev-instructions.md
index e158ba9..ead5120 100644
--- a/dev-instructions.md
+++ b/dev-instructions.md
@@ -2,8 +2,13 @@
 
 Welcome to the dplPy developer manual. We welcome all code contributions, bug reports, bug fixes, documentation improvements, and suggestions.
 
-## Environment setup
+## Index
+- [Environment setup](#environment-setup)
+- [Making changes and submitting PRs](#making-changes-and-submitting-a-pull-request)
+- [API Reference](#api-reference)
+
 
+## Environment setup
 ### 1. GitHub setup
 
 #### 1.1 Create dplPy fork in github
@@ -39,7 +44,7 @@ git checkout -b {feature_name}
 
 ### 2. Conda environment
 
-The packages required to run dplPy are all specified in environment.yml. 
+The packages required to run dplPy are all specified in environment.yml, which can be used to install them in Conda ([Anaconda](https://docs.anaconda.com/anaconda/install/index.html) or [Miniconda](https://docs.conda.io/projects/continuumio-conda/en/latest/user-guide/install/index.html)) or [Mamba](https://mamba.readthedocs.io/en/latest/installation.html) environments.
 
 #### 2.1\. Create your environment with the required packages installed.
 
@@ -58,7 +63,7 @@ $ mamba env create -f environment.yml
 If prompted for permission to install requred packages, select y.
 
 #### 2.2\. Activate your environment. 
-You will need to have the conda environment activated anytime you want to test code from the package.
+You will need to have the conda environment activated anytime you want to run or test code from the package.
 
 ```
 conda activate dplpy
@@ -118,10 +123,7 @@ Go to the testing tab (on the left side of the VSCode display). With your enviro
 
 If `.vscode/settings.json` has not been created, create it and add the lines shown above.
 
-Go back to the testing tab and verify that the dplpy unit tests are showing. They should look like this:
-
-![Screenshot of tests tab in VSCode](image.png)
-
+Go back to the testing tab and verify that the dplpy unit tests are showing.
 
 Run the tests by clicking the play button to the right of `tests`.
 
@@ -179,4 +181,301 @@ Pull requests allow you to view a side-by-side diff comparison of all changed fi
 
 If you are satisfied with your changes, give the PR a descriptive title, and specify in the description what changes were made and what (if any) issues were addressed. Then, submit the pull request.
 
-Your request will be reviewed by the repository maintainers.
\ No newline at end of file
+Your request will be reviewed by the repository maintainers.
+
+## API Reference
+
+Here is a list of functions (in alphabetical order) with descriptions:
+
+| Function | Description |
+| --- | --- |
+| [`ar_func`](#ar_funcdata-max_lag5-source) | Fits series or dataframe to autoregressive (AR) models and performs other operations on data with best model fit. |
+| [`autoreg`](#autoregdata-max_lag5-source) | Fits series to autoregressive (AR) models and returns parameters of best model fit. |
+| [`chron`](#chronrwi_data-biweighttrue-prewhitenfalse-plottrue-source) | Creates a mean value chronology for a dataset, typically the ring width indices of a detrended series |
+| [`detrend`](#detrend) | Detrends a given series or data frame, first by fitting data to curve(s), with spline(s) as the default, and then by calculating residuals or differences compared to the original data. |
+| [`help`](#help) | Displays help (alpha). |
+| [`plot`](#plot) | Generates line, spaghetti or segment plots.|
+| [`rbar`](#rbar) | Finds the best interval of overlapping series over a period of years, and calculates the rbar constant for a dataset over the period of overlap. |
+| [`readers`](#readers) | Reads data from supported file types (`csv` and `rwl`) and stores them in a dataframe. |
+| [`readme`](#readme) | Goes to this website. |
+| [`report`](#report) | Generates a report about absent rings in the data set. |
+| [`series_corr`](#series_corr) |  Crossdating function that focuses on the comparison of one series to the master chronology. |
+| [`stats`](#stats) | Generates summary statistics for RWL and CSV format files. |
+| [`summary`](#summary) | Generates a summary for RWL and CSV format files. |
+| [`xdate`](#xdate) | Crossdating function for dplPy loaded datasets. |
+
+
+### `ar_func(data, max_lag=5)` [[source]](https://github.com/OpenDendro/dplPy/blob/480973dc5f09f748271fb62a5ebd8ff5c88ac2dd/dplpy/autoreg.py#L36)
+
+Fits the given data to the best-fit autoregressive (AR) model, then returns the residuals of the AR fit plus the mean of the original data.
+- **Required Parameters**:
+    - **data**  :   ***pandas.DataFrame or pandas.Series***, a pandas dataframe imported from dpl.readers() or a series extracted from such a dataframe.
+- **Optional Parameters**:
+    - **max_lag   :   _int_ default 5**, max lag to consider when selecting the best-fit AR model.
+- **Returns:**
+    -  **pandas.DataFrame or pandas.Series**, dataframe or series of AR-modeled data, depending on which was given as input.
+- **Usage Examples:**
+    ```
+    # ar_func with series
+    >>> dpl.ar_func(ca533["CAM191"], 10)
+    Year
+    1190    0.711307
+    1191   -0.232047
+    1192    0.521210
+    1193    0.575975
+    1194    0.901084
+            ...   
+    1966    0.296554
+    1967    0.384609
+    1968    0.397742
+    1969    0.427618
+    1970    0.383847
+    Name: CAM191, Length: 781, dtype: float64
+
+    # ar_func with dataframe
+    >>> dpl.ar_func(ca533, 10)
+            CAM011    CAM021    CAM031    CAM032    CAM041    CAM042    CAM051    CAM061    CAM062  ...  CAM152  CAM161  CAM162  CAM171  CAM172  CAM181  CAM191  CAM201  CAM211
+    Year                                                                                            ...                                                                        
+    626        NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
+    627        NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
+    628        NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
+    629        NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
+    630        NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
+    ...        ...       ...       ...       ...       ...       ...       ...       ...       ...  ...     ...     ...     ...     ...     ...     ...     ...     ...     ...
+    1979  0.424423  0.404787  0.142900  0.378733  0.640022  0.369773  0.369770  0.347996  0.535881  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
+    1980  0.486215  0.614051  0.658424  0.408298  0.898555  0.568861  0.440974  0.693782  0.661847  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
+    1981  0.498586  0.505126  0.436690  0.260786  0.419491  0.438934  0.345517  0.544592  0.382856  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
+    1982  0.455773  0.414212  0.485516  0.448526  0.792929  0.443559  0.261443  0.560291  0.510274  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
+    1983  0.666578  0.520679  0.223995  0.277267  0.755711  0.456165  0.252873  0.583766  0.320921  ...     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN     NaN
+
+    [1358 rows x 34 columns]
+    ```
+
+
+### `autoreg(data, max_lag=5)` [[source]](https://github.com/OpenDendro/dplPy/blob/480973dc5f09f748271fb62a5ebd8ff5c88ac2dd/dplpy/autoreg.py#L103)
+
+Selects the best AR model with a specified maximum order for the given data, and returns the parameters for the model. The best model is selected based on the AIC value.
+- **Required Parameters**:
+    - **data**  :   ***pandas.Series***, a pandas series extracted from a pandas dataframe containing tree ring widths (presumably imported from [`readers`](#loading-data-using--readers))
+- **Optional Parameters**:
+    - **max_lag   :   _int_ default 5**, max lag to consider when selecting the best-fit AR model.
+- **Returns:**
+    - **array** containing the parameters of best-fit AR model in order.
+- **Usage Examples:**
+    ```
+    >>> dpl.autoreg(ca533["CAM191"], 10)
+    const         0.022210
+    CAM191.L1     0.503373
+    CAM191.L2     0.087230
+    CAM191.L3     0.143716
+    CAM191.L4     0.020119
+    CAM191.L5    -0.027769
+    CAM191.L6    -0.010029
+    CAM191.L7     0.001373
+    CAM191.L8     0.025588
+    CAM191.L9     0.042340
+    CAM191.L10    0.136916
+    dtype: float64
+    ```
+
+### `chron(rwi_data, biweight=True, prewhiten=False, plot=True)` [[source]](https://github.com/OpenDendro/dplPy/blob/480973dc5f09f748271fb62a5ebd8ff5c88ac2dd/dplpy/chron.py#L44)
+
+Creates a mean value chronology for a dataset, typically the ring width indices of **a detrended series**.
+
+- **Required Parameters**:
+    - **rwi_data**  :   ***pandas.DataFrame***, a pandas dataframe containing tree ring widths (expected to be already detrended).
+- **Optional Parameters**:
+    - **biweight   :   _bool_ default True**; when `True`, means will be calculated using Tukey's biweight robust mean.
+    - **prewhiten   :   _bool_ default False**; when `True`, data is prewhitened by fitting to an AR model.
+    - **plot   :   _bool_ default True**; when `True`, results are plotted.
+- **Returns:**
+    - **pandas.DataFrame** of years, mean RWIs and sample depths for each year.
+- **Usage Examples:**
+    ```
+    >>> dpl.chron(ca533, plot=False)
+          Mean RWI  Sample depth
+    Year                        
+    626   0.170000             1
+    627   0.130000             1
+    628   0.140000             1
+    629   0.190000             1
+    630   0.220000             1
+    ...        ...           ...
+    1979  0.510581            21
+    1980  0.722784            21
+    1981  0.568495            21
+    1982  0.674211            21
+    1983  0.638166            21
+
+    [1358 rows x 2 columns]
+    ```
+
+
+
+
+
+
+### Loading data using  `readers`
+
+- Description: reads data from supported file types (`csv` and `rwl`) and stores them in a dataframe.
+- Options: 
+    - `header`: rwl input files often have a header present; Default is `False`, use `True` if input has a header.
+- Usage examples:
+    ```
+    >>> data = dpl.readers("/path/to/file.csv")
+    # or
+    >>> data = dpl.readers("/path/to/file.rwl", header=True)
+    ```
+
+### Loading data from online sources using  `readers_url`
+**Note: This function is still in development and has only been tested so far with `rwl` raw data files from the [NCEI website](https://www.ncei.noaa.gov/pub/data/paleo/treering/measurements/)**
+
+- Description: reads `rwl` formatted data directly from online sources.
+- Options: 
+    - `header`: rwl input files often have a header present; Default is `False`, use `True` if input has a header.
+- Usage examples:
+    ```
+    >>> data = dpl.readers_url("http://link/to/file.rwl")
+    >>> data = dpl.readers_url("http://link/to/file.rwl", header=True)
+    ```
+
+### Data Summary from `summary`
+
+- Description: generates a summary of each series recorded in `rwl`  and `csv` format files
+- Usage examples:
+    ```
+    >>> dpl.summary("/path/to/file.rwl")
+    # or
+    >>> dpl.summary(data)
+    ```
+
+### Data Statistics from `stats`
+
+- Description: generates summary statistics for `rwl`  and `csv` format files
+- Usage Example:
+    ```
+    >>> dpl.stats("/path/to/file.rwl")
+    # or
+    >>> dpl.stats(data)
+    ```
+
+### Data Report from `report`
+
+- Description: generates a report about ring measurements and absent rings in the data set
+- Usage Example:
+    ```
+    >>> dpl.report("/path/to/file.rwl")
+    # or
+    >>> dpl.report(data)
+    ```
+
+### Plotting raw data with `plot`
+
+- Description: generates plots of tree-ring data from dataframes. Currently capable of generating `line`, `spag` (spaghetti) and `seg` (segment, default) plots.
+- Options:
+    - `type="line"`: creates a line plot (default)
+    - `type="spag"`: creates a spaghetti plot
+    - `type="seg"`: creates a segment plot
+- Usage Example:
+    ```
+    >>> dpl.plot("/path/to/file.rwl")
+    # or 
+    >>> dpl.plot(data)
+
+    # User is able to select specific series of interests.
+    # In the example below, the user selects SERIES_1, SERIES_2, SERIES_3 
+    # from the "data" dataset and generates a spaghetti plot
+    >>> dpl.plot(data[[SERIES_1, SERIES_2, SERIES_3]], type="spag")
+    ```
+
+### Detrending using `detrend`
+ 
+- Description: Detrends a given series or data frame, first by fitting data to curve(s), and then by calculating residuals or differences compared to the original data.
+- Options:
+    - `fit="spline"`: default detrending method.
+    - `fit="ModNegEx"`: detrending using the modified negative exponential method.
+    - `fit="Hugershoff"`: detrending using the Hugershoff method.
+    - `fit="linear"`: detrending using the linear method.
+    - `fit="horizontal"`: detrending using the horizontal method.
+    - `method="residual"`: calculates residuals vs original data (default).
+    - `method="difference"`: calculates differences vs original data.
+    - `plot=True|False`: whether or not to plot results, default is `True`.
+- Usage Example:
+    ```
+    # detrend with default options
+    >>> dpl.detrend(data)
+    
+    # specify fit to Hugershoff curve and detrend with difference
+    >>> dpl.detrend(data, fit="Hugershoff", method="difference")
+
+    # detrend only SERIES_1, SERIES_2 and SERIES_3
+    >>> dpl.detrend(data[[SERIES_1, SERIES_2, SERIES_3]], fit="Hugershoff", method="difference")
+    ```
+
+
+
+
+
+
+### Build a variance stabilized chronology with `chron_stabilized`
+
+- Description: Builds a variance stabilized mean-value chronology for a dataset of **detrended** ring width indices, by multiplying the chronology with the square root of the effective independent sample size, $ Neff $.
+
+    Note: where n(t) is the number of series at time t, and rbar is the running interseries correlation, 
+
+    $$ Neff = { n(t) \over 1+(n(t)-1)rbar(t) } $$
+
+- Options:
+    - `win_length`: an integer for specifying the window lengths where interseries correlations will be calculated (default `50`). Should not be greater than the number of years in the dataset, recommended to be between 30% and 50% of the number of years.
+    - `min_seg_ratio`: the minimum ratio of non-NA values to the window length for a series to be considered in an Neff calculation (default `0.33`).
+    - `biweight`: boolean indicating whether or not to use Tukey's bi-weight robust mean when calculating the mean-value chronology; default `True`.
+    - `running_rbar`: boolean indicating whether or not to return the running interseries correlations as part of chronology output; default `False`.
+- Usage Example:
+    ```
+    # Detrend data first!
+    >>> rwi_data = dpl.detrend(data)
+
+    # Perform chronology with default args
+    >>> dpl.chron_stabilized(rwi_data)
+
+    # Specify win_length, min_seg_ratio and running_rbar
+    >>> dpl.chron_stabilized(rwi_data, win_length=60, min_seg_ratio=0.5, running_rbar=True)
+    ```
+
+### Crossdate with `xdate`
+- Description: This function calculates correlation serially between each tree-ring series and a master chronology built from all the other series in the dataset (leave-one-out principle).
+- Options:
+    - `prewhiten`: default `True`, determines whether or not to prewhiten series using AR modeling
+    - `corr`: default `'Spearman'`, the type of correlation to use. Can be `'Pearson'` or `'Spearman'`.
+    - `slide_period`: default `50`, the number of years to compare to the master chronology at a time.
+    - `bin_floor`: default `100`, determines the minimum bin year. The minimum bin year is calculated as $ \lceil (min\_yr/bin\_floor)\rceil*bin\_floor $ where `min_yr` is the first year in the dataset.
+    - `p_val`: default `0.05`, determines the critical value below which interseries correlations are flagged.
+    - `show_flags`: default `True`, determines whether to show flags in the function output to the console.
+- Usage examples:
+    ```
+    >>> ca533_rwi = dpl.detrend(ca533, plot=False)
+
+    # Crossdating of detrended data with default args
+    >>> dpl.xdate(ca533_rwi)
+
+    # Crossdating with Pearson correlation and show flags 
+    # (other options set to defaults when not specified).
+    >>> dpl.xdate(ca533_rwi, corr="Pearson", show_flags=True)
+    ```
+
+### Output data to files using  `writers`
+
+- Description: writes data from dataframe to supported file types (`csv`, `rwl`, `crn`, `txt`).
+- Required parameters: 
+    - `data`: dataframe with ring widths (presumably one read from `readers` or `readers_url`)
+    - `label`: name (can include a file path) to give to the created file. **Should not include the file extension.**
+    - `format`: extension for file to be created. Can be `'csv'`, `'rwl'`, `'crn'` or `'txt'`.
+
+- Usage examples:
+    ```
+    # Write data to file_name.csv in current working directory.
+    >>> dpl.writers(data, "file_name", "csv")
+
+    # Write data to file_name.csv in ./path/to/ directory.
+    >>> dpl.writers(data, "./path/to/file_name", "csv")
+    ```
\ No newline at end of file
diff --git a/tests/integs/test_integ_autoreg.py b/tests/integs/test_integ_autoreg.py
new file mode 100644
index 0000000..e69de29