Dev ife (#51)
* xdate works for overall series correlation

* Added code for creating bins and dividing series into segments

* Cleaning up and commenting related to xdate

* series_corr works but is inefficient

* WIP changes

* Added comments, updated working jupyter notebook

* Changes since start of fall semester

* variance stabilization produces accurate values

* Unit tests for readers, summary, stats and tbrm

* Added unit tests for detrend and chron

* Added tests for chron_stabilized, series_corr and writers

---------

Co-authored-by: Ifeoluwa Ale <ifeoluwaale@Cyverses-MacBook-Pro.local>
Co-authored-by: cosimichele <nismichele@gmail.com>
3 people authored Nov 3, 2023
1 parent 8f29cbd commit 8394f7c
Showing 33 changed files with 4,551 additions and 3,836 deletions.
7 changes: 7 additions & 0 deletions .gitignore
@@ -1,5 +1,12 @@
# Scripts for testing
src/test_*.py
src/*.txt
src/misc.py

# IDE stuff
.DS_Store
.vscode/
tests/data/.DS_Store

# Byte-compiled / optimized / DLL files
__pycache__/
106 changes: 106 additions & 0 deletions dev-instructions.md
@@ -0,0 +1,106 @@
# dplPy Developer Instructions (in progress)

Welcome to the dplPy developer manual.

## Environment setup
To contribute to dplPy, you will need to set up the tools described below.

### 1. GitHub setup

#### 1.1 Create dplPy fork in GitHub

You will need your own copy of dplPy to work on the code. Go to the dplPy GitHub page and click the Fork button. Make sure the option to copy only the main branch is unchecked.


#### 1.2 Create local repository
In your local terminal, clone the fork to your computer using the commands shown below. Replace {your-user} with your GitHub username.
```
$ git clone https://github.com/{your-user}/dplPy.git dplpy-{your-user}
$ cd dplpy-{your-user}
$ git remote add upstream https://github.com/OpenDendro/dplPy.git
$ git fetch upstream
```

This creates a local Git repository in dplpy-{your-user} on your computer, with your fork as the `origin` remote and the main dplPy repository as the `upstream` remote.

#### 1.3 Create feature branch

TBC


### 2. Conda environment

The packages required to run dplPy are all specified in environment.yml.

#### 2.1\. Create your environment with the required packages installed.

If you're using conda, run

```
$ conda env create -f environment.yml
```

If you're using mamba, run

```
$ mamba env create -f environment.yml
```

If prompted for permission to install required packages, enter y.

#### 2.2\. Activate your environment.
You will need to have the conda environment activated anytime you want to test code from the package.

```
conda activate dplpy
```

After running this command, you should see (dplpy) on the left of each new line in the terminal.

#### 2.3\. Run unit and integration tests to ensure that installation was successful.
TBA: Instructions for running tests

### 3. IDE setup

We recommend using VSCode for development. The following instructions show how to set up VSCode to recognize the conda environment and debug tests.

#### 3.1\. Open the dplpy folder in VSCode
In VSCode, open the folder containing your local dplpy repository. If you followed the instructions above, this should be a folder named `dplpy-{your-github-username}`. Then, open the file `src/dplpy.py`.

#### 3.2\. Change the python interpreter to use the conda environment's interpreter
In the bottom corner of your IDE display, select the language interpreter.

Choose the interpreter `Python 3.x ('dplpy')`, with a path that ends with `/envs/dplpy/python`.

Now you should be able to run any python files within the currently open folder with the run button in VSCode, instead of running them through the terminal.

Note: If the terminal is opened after the interpreter has been set to use the conda environment, `conda activate dplpy` will be run automatically and does not need to be run again.

#### 3.3\. Set up unit testing tools

Go to the testing tab (on the left side of the VSCode display). With your environment set, the tests should be discovered automatically. If they are not, open `.vscode/settings.json` and add the following lines inside the curly braces, so that your file looks like this:

```
{
// any pre-existing configurations (DO NOT ADD THIS, THIS REPRESENTS ANYTHING ALREADY IN THE FILE)
"python.testing.pytestArgs": [
"./src/unittests"
],
"python.testing.unittestEnabled": false,
"python.testing.pytestEnabled": true
}
```

If `.vscode/settings.json` has not been created, create it and add the lines shown above.

Go back to the testing tab and verify that the dplpy unit tests are showing. They should look like this:

TBA: Image


Run the tests by clicking the play button on src.
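
For reference, pytest collects files named `test_*.py` and functions whose names start with `test`. The sketch below shows the general shape of a test that would appear in the testing tab. The file name `src/unittests/test_example.py` and the helper `rescale` are hypothetical and are not part of dplPy; they only illustrate the pattern.

```python
# Hypothetical file: src/unittests/test_example.py
# pytest collects files named test_*.py and functions named test_*.


def rescale(values):
    """Illustrative helper (not a dplPy function): divide values by their mean."""
    if not values:
        raise ValueError("values must not be empty")
    mean = sum(values) / len(values)
    return [v / mean for v in values]


def test_rescale_has_unit_mean():
    # After rescaling, the mean of the values should be 1.
    result = rescale([1.0, 2.0, 3.0])
    assert abs(sum(result) / len(result) - 1.0) < 1e-12


def test_rescale_rejects_empty_input():
    # Plain try/except keeps the sketch free of pytest-specific helpers.
    try:
        rescale([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Each function shows up as a separate entry in the testing tab and can be run or debugged on its own.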


## Overview of dplPy functions

3 changes: 2 additions & 1 deletion environment.yml
@@ -15,4 +15,5 @@ dependencies:
- pip:
- csaps
- jupyterlab
- notebook
- notebook
- pytest
4 changes: 3 additions & 1 deletion src/__init__.py
@@ -2,4 +2,6 @@

__author__ = "Tyson Lee Swetnam"
__email__ = "tswetnam@arizona.edu"
__version__ = "0.1"
__version__ = "0.1"

from src import dplpy
29 changes: 18 additions & 11 deletions src/autoreg.py
@@ -44,23 +44,25 @@

def ar_func(data, max_lag=5):
if isinstance(data, pd.DataFrame):
res = {}
start_df = pd.DataFrame(index=pd.Index(data.index))
to_concat = [start_df]
for column in data.columns:
res[column] = ar_func_series(data[column], max_lag).tolist()
to_concat.append(ar_func_series(data[column], max_lag))
res = pd.concat(to_concat, axis=1)
return res
elif isinstance(data, pd.Series):
res = ar_func_series(data, max_lag)
return res
else:
return TypeError("argument should be either pandas dataframe or pandas series.")
raise TypeError("Data argument should be either pandas dataframe or pandas series.")

# This function returns residuals plus mean of the best fit AR
# model of the data
def ar_func_series(data, max_lag):
nullremoved_data = data.dropna()
pars = autoreg(nullremoved_data, max_lag)

y = nullremoved_data.to_numpy()
y = nullremoved_data

yi = fitted_values(y, pars)

@@ -70,13 +72,18 @@ def ar_func_series(data, max_lag):

# Add mean to the residuals
for i in range(len(res)):
res[i] += mean
res.iloc[i] += mean

return res

# This method selects the best AR model with a specified maximum order
# The best model is selected based on AIC value
def autoreg(data, max_lag=5):
def autoreg(data: pd.Series, max_lag=5):
# validate data?
if not isinstance(data, pd.Series):
raise TypeError("Data argument should be pandas series. Received " + str(type(data)) + " instead.")

# Need to change this to only ignore specific warnings instead of all
with warnings.catch_warnings():
warnings.filterwarnings("ignore")
ar_data = ar_select_order(data.dropna(), max_lag, ic='aic', old_names=False)
@@ -86,13 +93,13 @@ def autoreg(data, max_lag=5):
# This function calculates the in-sample predicted values of a series,
# given an array containing the original data and the parameters for
# the AR model
def fitted_values(data_array, params):
mean = np.mean(data_array)
def fitted_values(data_series, params):
mean = np.mean(data_series)
results = []

for i in range((len(params)-1), len(data_array)):
pred = params[0]
for i in range((len(params)-1), len(data_series)):
pred = params.iloc[0]
for j in range(1, len(params)):
pred += (params[j] * data_array[i-j])
pred += (params.iloc[j] * data_series.iloc[i-j])
results.append(pred)
return np.asarray(results)
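
Note on the `.iloc` changes above: the hunk switches `fitted_values` from NumPy arrays to pandas Series. On a Series indexed by calendar year, plain `series[0]` is a label lookup rather than a positional one, which is presumably why the new lines use `.iloc`. The following is a minimal sketch of the same loop under that assumption; `fitted_values_sketch`, the sample widths and the AR(1) parameters are all made up for illustration and are not taken from dplPy.

```python
import numpy as np
import pandas as pd


def fitted_values_sketch(series: pd.Series, params: pd.Series) -> np.ndarray:
    """In-sample AR predictions; params holds the constant first, then lag coefficients."""
    n_lags = len(params) - 1
    preds = []
    for i in range(n_lags, len(series)):
        # .iloc is positional, so it works regardless of the index labels.
        pred = params.iloc[0]
        for j in range(1, n_lags + 1):
            pred += params.iloc[j] * series.iloc[i - j]
        preds.append(pred)
    return np.asarray(preds)


# Made-up ring widths indexed by calendar year.
widths = pd.Series([1.0, 2.0, 3.0, 4.0], index=[1900, 1901, 1902, 1903])
# Made-up AR(1) parameters: constant 0.5, lag-1 coefficient 0.5.
ar_params = pd.Series([0.5, 0.5], index=["const", "y.L1"])

predictions = fitted_values_sketch(widths, ar_params)
print(predictions)
```

Because the index starts at 1900, `widths[0]` would look for a year labelled 0 and fail, while `widths.iloc[0]` returns the first observation.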
5 changes: 4 additions & 1 deletion src/chron.py
@@ -47,7 +47,10 @@

# Main function for creating chronology of series. Formats input, prewhitens if necessary
# and produces output mean value chronology in a dataframe.
def chron(rwi_data, biweight=True, prewhiten=False, plot=True):
def chron(rwi_data: pd.DataFrame, biweight=True, prewhiten=False, plot=True):
if not isinstance(rwi_data, pd.DataFrame):
raise TypeError("Expected pandas dataframe as input, got " + str(type(rwi_data)) + " instead")

chron_data = {}
for series in rwi_data:
series_data = rwi_data[series].dropna()
90 changes: 90 additions & 0 deletions src/chron_stabilized.py
@@ -0,0 +1,90 @@
from rbar import get_running_rbar, mean_series_intercorrelation
from chron import chron
import numpy as np
import pandas as pd
import warnings


def chron_stabilized(rwi_data: pd.DataFrame, win_length=50, min_seg_ratio=0.33, biweight=True, running_rbar=False):
    if not isinstance(rwi_data, pd.DataFrame):
        raise TypeError("Expected data input to be a pandas dataframe, not " + str(type(rwi_data)) + ".")


    num_years = rwi_data.shape[0]

    if win_length > num_years:
        raise ValueError("Window length should not be greater than the number of rows in the dataset")

    if min_seg_ratio <= 0 or min_seg_ratio > 1:
        raise ValueError("min_seg_ratio cannot be <= 0 or > 1")

    if win_length < 0.3*num_years or win_length >= 0.5*num_years:
        warnings.warn("We recommend using a window length greater than 30% but less than 50% of the chronology length\n")

    print("Generating variance stabilized chronology...\n")

    # give rbar function a range of years (window length) to calculate rbar for
    # calculate rbar for that window, using either osborn's or frank's or 67spline
    # get rbar for each relevant segment of the dataframe


    mean_val = rwi_data.mean().mean()

    zero_mean_data = rwi_data - mean_val

    rbar_array = np.zeros(zero_mean_data.shape[0])

    if win_length % 2 == 0:
        target = (win_length)/2
    else:
        target = (win_length-1)/2

    for i in range(num_years-win_length + 1):
        data_segment = zero_mean_data[i:i + win_length]
        if data_segment.shape[0] < win_length:
            continue
        target_index = int(i + target)
        rbar_array[target_index] = get_running_rbar(data_segment, min_seg_ratio)

    rbar_array = pad_rbar_array(rbar_array)

    reg_chron = chron(zero_mean_data, biweight=biweight, plot=False)

    mean_rwis = reg_chron["Mean RWI"].to_numpy()
    samp_deps = reg_chron["Sample depth"].to_numpy()
    denom = np.multiply(samp_deps-1, rbar_array) + 1

    n_eff = np.minimum(np.divide(samp_deps, denom), samp_deps)
    rbar_const = mean_series_intercorrelation(zero_mean_data, "pearson", min_seg_ratio)
    stabilized_means = np.multiply(mean_rwis, np.sqrt(n_eff * rbar_const))

    if running_rbar:
        stabilized_chron = pd.DataFrame(data={"Adjusted CRN": stabilized_means + mean_val, "Running rbar": rbar_array, "Sample depth": samp_deps}, index=reg_chron.index)
    else:
        stabilized_chron = pd.DataFrame(data={"Adjusted CRN": stabilized_means + mean_val, "Sample depth": samp_deps}, index=reg_chron.index)

    print("SUCCESS!\n")
    return stabilized_chron

def pad_rbar_array(rbar_array):
    # double check that rbar cannot be 0
    first = 0
    first_valid = 0
    for val in rbar_array:
        if val != 0 and not np.isnan(val):
            first = val
            break
        first_valid += 1

    last = 0
    last_valid = len(rbar_array) - 1
    for val in np.flip(rbar_array):
        if val != 0 and not np.isnan(val):
            last = val
            break
        last_valid -= 1

    rbar_array[:first_valid] = np.full(first_valid, first) #should be np.full(first_valid, 1)
    rbar_array[last_valid:] = np.full(len(rbar_array) - last_valid, last) #should be np.full(len(rbar_array) - last_valid, last)

    return rbar_array
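
Note on the scaling step in `chron_stabilized`: the file computes an effective sample size `n_eff = n / (1 + (n - 1) * rbar)`, caps it at `n`, and multiplies each zero-mean chronology value by `sqrt(n_eff * rbar_const)`. The sketch below isolates that arithmetic on made-up numbers. `stabilize_sketch` and its inputs are hypothetical stand-ins; the running rbar values here are simply assumed rather than computed by `get_running_rbar`.

```python
import numpy as np


def stabilize_sketch(mean_rwi, sample_depth, running_rbar, rbar_const):
    """Scale zero-mean chronology values by sqrt(n_eff * rbar_const)."""
    mean_rwi = np.asarray(mean_rwi, dtype=float)
    n = np.asarray(sample_depth, dtype=float)
    rbar = np.asarray(running_rbar, dtype=float)
    # Effective number of independent series: n / (1 + (n - 1) * rbar), capped at n.
    n_eff = np.minimum(n / ((n - 1.0) * rbar + 1.0), n)
    return mean_rwi * np.sqrt(n_eff * rbar_const)


# Made-up values: three years with sample depths 1, 4 and 9.
adjusted = stabilize_sketch(
    mean_rwi=[0.2, 0.2, 0.2],
    sample_depth=[1, 4, 9],
    running_rbar=[0.5, 0.5, 0.5],
    rbar_const=0.5,
)
print(adjusted)
```

With the running rbar equal to `rbar_const`, as in this example, the scaling factor stays below 1 and grows toward 1 as sample depth increases, so years backed by few series are damped the most.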
19 changes: 13 additions & 6 deletions src/detrend.py
@@ -40,17 +40,26 @@
from autoreg import ar_func
import curvefit

def detrend(data, fit="spline", method="residual", plot=True, period=None):
def detrend(data: pd.DataFrame | pd.Series, fit="spline", method="residual", plot=True, period=None):
if isinstance(data, pd.DataFrame):
<<<<<<< HEAD
res = pd.DataFrame(index=pd.Index(data.index))
to_add = [res]
for column in data.columns:
to_add.append(detrend_series(data[column], column, fit, method, plot, period=None))
output_df = pd.concat(to_add, axis=1)
return output_df.rename_axis(data.index.name)
=======
res = pd.DataFrame(index=data.index)
to_add = [res]
for column in data.columns:
to_add.append(detrend_series(data[column], column, fit, method, plot, period=None))
return pd.concat(to_add, axis=1)
>>>>>>> main
elif isinstance(data, pd.Series):
return detrend_series(data, data.name, fit, method, plot)
else:
return TypeError("argument should be either pandas dataframe or pandas series.")
raise TypeError("argument should be either pandas dataframe or pandas series.")

# Takes a series as input and by default fits it to a spline, then
# detrends it by calculating residuals
@@ -74,17 +83,15 @@ def detrend_series(data, series_name, fit, method, plot, period=None):
yi = curvefit.horizontal(x, y)
else:
# give error message for unsupported curve fit
print()
return ValueError("unsupported keyword for curve-fit type. See documentation for more info.")
raise ValueError("unsupported keyword for curve-fit type. See documentation for more info.")

if method == "residual":
detrended_data = residual(y, yi)
elif method == "difference":
detrended_data = difference(y, yi)
else:
# give error message for unsupported detrending method
print()
return ValueError("unsupported keyword for detrending method. See documentation for more info.")
raise ValueError("unsupported keyword for detrending method. See documentation for more info.")

if plot:
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(7,3))
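
Note on the `return` to `raise` changes in this file: returning an exception object hands it back to the caller like any other value, so invalid input can pass unnoticed, whereas raising it interrupts execution. The sketch below contrasts the two patterns. `check_returning` and `check_raising` are hypothetical functions written for this illustration and are not dplPy code.

```python
def check_returning(data):
    # Old pattern: the exception object is handed back like any other value.
    if not isinstance(data, (list, tuple)):
        return TypeError("argument should be a list or tuple.")
    return len(data)


def check_raising(data):
    # New pattern: execution stops here and the caller must handle the error.
    if not isinstance(data, (list, tuple)):
        raise TypeError("argument should be a list or tuple.")
    return len(data)


# No error surfaces; the caller silently receives a TypeError instance.
silent = check_returning("not a list")

try:
    check_raising("not a list")
    raised = False
except TypeError:
    raised = True

print(type(silent).__name__, raised)
```

Unit tests that expect an exception on bad input can only pass with the raising variant.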
19 changes: 19 additions & 0 deletions src/dplpy.py
@@ -185,6 +185,18 @@ def autoreg_from_parser(args):

def xdate_from_parser(args):
xdate(input=args.input)
<<<<<<< HEAD

def series_corr_from_parser(args):
series_corr(input=args.input)

def chron_stabilized_from_parser(args):
chron_stabilized(input=args.input)

def write_from_parser(args):
write(input=args.input)
=======
>>>>>>> main

def series_corr_from_parser(args):
series_corr(input=args.input)
@@ -209,9 +221,16 @@ def rbar_from_parser(args):
from detrend import detrend
from autoreg import ar_func, autoreg
from chron import chron
<<<<<<< HEAD
from chron_stabilized import chron_stabilized
from xdate import xdate, xdate_plot
from series_corr import series_corr
from writers import write
=======
from xdate import xdate, xdate_plot
from series_corr import series_corr
from rbar import rbar, common_interval
>>>>>>> main

def main(args=None):
parser = argparse.ArgumentParser(description="dplPy v0.1") # update version as we update packages