Functional performance #99
-
I did the same analysis for discharge. The x-axis in these plots is the difference in transfer entropy (modeled minus observed) from discharge (seg_upstream_inflow) to do_min, do_mean, and do_max. Again, x = 0 means the model is matching the patterns in the observed data. The interesting thing about discharge is that it is functionally an input to the metab and multitask models (through the Zt term), but not to the baseline model.
-
Cool, @galengorski. Thanks for putting this together. I like the approach; it's nice to have something to chew on. Thinking about the multitask model: for the train/val sites, the multitask model slightly decreased predictive performance, but it also decreased functional performance. This is very interesting! I'm very interested to see the functional performance at one (or all) of the validation sites, since the multitask model increased predictive performance there. How hard would it be for you to produce similar plots for a val site? Also, I just wanted to note that we shouldn't put much stock into the analysis of the …
-
Using functional performance to compare the baseline model to the metab_dense model. The baseline model predicts do_min, do_mean, and do_max directly, while the metab_dense model predicts GPP, ER, K, temperature, and depth, then uses a dense layer with tuned weights to convert those predictions into do_min, do_mean, and do_max.

The plot shows the transfer entropy from solar radiation to do_min, do_mean, and do_max for the baseline (blue) and metab_dense (red) models at different time lags, averaged across all sites. Transfer entropy (TE) is the amount of uncertainty in the DO variable that is reduced by knowledge of solar radiation at that time lag, independent of the DO variable's own history. The TE is normalized by the entropy of the DO variable, so a TE value of 0.05 at a time lag of 1 day means that knowing yesterday's solar radiation reduces uncertainty in today's DO by 5%.

Focusing first on the black line, at a time lag of 1 day do_max has the highest TE compared to do_mean and do_min. This makes sense, as do_max is more strongly influenced by solar radiation (through GPP) than the other variables. These plots show that both models are underutilizing solar radiation across all time lags; however, the metab_dense model does a better job of representing the relationship between solar radiation and DO. One interpretation is that the metab_dense model better represents the solar radiation to do_max relationship because it is explicitly trained on GPP, which encodes the linkage between solar radiation and do_max.
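For anyone wanting to reproduce this kind of calculation, a normalized transfer entropy like the one described above can be sketched with a simple histogram-based estimator. This is a minimal sketch, not the exact estimator used in the analysis; the function name, bin count, and one-step conditioning history are all assumptions:

```python
import numpy as np

def _entropy(counts):
    """Shannon entropy in bits from a (possibly multi-dim) histogram."""
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

def transfer_entropy(source, target, lag=1, bins=8):
    """Normalized transfer entropy TE(source -> target) at a given lag.

    Estimates TE = I(Y_t ; X_{t-lag} | Y_{t-1}) with fixed-width
    histogram bins, normalized by H(Y_t), so the result reads as the
    fraction of uncertainty in Y removed by the lagged source beyond
    what Y's own history explains. Simplified Ruddell-style sketch;
    the real analysis may differ in binning, conditioning history,
    and significance testing. Requires lag >= 1.
    """
    y_t = target[lag:]            # present value of the target
    y_hist = target[lag - 1:-1]   # 1-step history of the target
    x_lag = source[:-lag]         # lagged source

    # Joint histogram over (x_lag, y_hist, y_t) and its marginals
    h_xyz, _ = np.histogramdd(np.column_stack([x_lag, y_hist, y_t]),
                              bins=bins)
    h_xy = h_xyz.sum(axis=2)      # (x_lag, y_hist)
    h_yz = h_xyz.sum(axis=0)      # (y_hist, y_t)
    h_y = h_xyz.sum(axis=(0, 2))  # y_hist alone
    h_z = h_xyz.sum(axis=(0, 1))  # y_t alone

    # I(Y_t ; X | Y_hist) = H(Y_t,Y_hist) + H(X,Y_hist)
    #                       - H(Y_hist) - H(X,Y_hist,Y_t)
    te = _entropy(h_yz) + _entropy(h_xy) - _entropy(h_y) - _entropy(h_xyz)
    h_target = _entropy(h_z)      # H(Y_t), the normalizer
    return te / h_target if h_target > 0 else 0.0
```

With daily series, `transfer_entropy(solar, do_max, lag=1)` would give the "knowing yesterday's solar radiation reduces uncertainty in today's DO by X%" number from the discussion above.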
-
Now focusing on a lag of one day, where the TE for do_max peaks, we can calculate the functional performance as the difference between the modeled and the observed transfer entropy for each model. We calculate functional performance for each site individually to see where it improves the most.

Here functional performance is on the y-axis; 0 is optimal, meaning the modeled solar radiation to DO relationship exactly matches the observed relationship. On the x-axis is the NHD segment canopy coverage within a 100 meter buffer. I wouldn't necessarily expect a strong relationship between canopy coverage and the solar radiation to DO relationship, since solar radiation is measured above the canopy, but you can see that the metab_dense model improves functional performance across almost all sites for all DO variables. This suggests that by using solar radiation to predict GPP and then DO (red dots), instead of predicting DO directly (blue dots), the solar radiation to DO relationship is more accurately represented across all sites and DO variables.
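The per-site calculation above amounts to a simple difference and a check of which model's difference is closer to zero. A minimal sketch, with site names and TE values that are purely illustrative placeholders (not numbers from the study):

```python
# Hypothetical per-site normalized TE values (solar radiation -> do_max)
# at the 1-day lag. All numbers below are made up for illustration.
sites = {
    "site_A": {"observed": 0.12, "baseline": 0.07, "metab_dense": 0.11},
    "site_B": {"observed": 0.09, "baseline": 0.05, "metab_dense": 0.08},
    "site_C": {"observed": 0.15, "baseline": 0.10, "metab_dense": 0.14},
}

def functional_performance(modeled_te, observed_te):
    """Modeled minus observed TE. 0 is optimal; a negative value means
    the model under-uses the driver ("over-random"), a positive value
    means it leans on the driver too heavily ("over-deterministic")."""
    return modeled_te - observed_te

# A model improves functional performance at a site when |FP| shrinks.
for name, te in sites.items():
    fp_base = functional_performance(te["baseline"], te["observed"])
    fp_dense = functional_performance(te["metab_dense"], te["observed"])
    print(f"{name}: baseline FP={fp_base:+.2f}, "
          f"metab_dense FP={fp_dense:+.2f}, "
          f"improved={abs(fp_dense) < abs(fp_base)}")
```

Plotting `fp_base` and `fp_dense` for each site against canopy coverage reproduces the structure of the figure described above.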
-
To try to understand why different versions of the model were performing differently, I analyzed the models' functional performance. The analysis is based on this paper by Ruddell et al. I focused on a single site, 01481000, Chadds Ford on Brandywine Creek, which is a training and validation site. I used model results from 3 models:
0_baseline_lstm -- predicting DO
1a_lstm_metab_just_metab -- predicting metabolism and physical parameters and using deterministic equations to predict DO
2_metab_multitask -- predicting DO, metabolism, and physical parameters

The plot below shows the RMSE of the validation and training data together on the y-axis. For the x-axis, I calculated the transfer entropy between air temperature (seg_tave_air) and DO (min, mean, max), then took the difference between the modeled and the observed transfer entropy. An x value of 0 indicates that the model is extracting the same amount of information from air temperature as is present in the observations; a positive value indicates an "over-deterministic" model, and a negative value indicates "over-random" behavior. The models are color coded.