-
Notifications
You must be signed in to change notification settings - Fork 11
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Unknown
committed
Jan 10, 2025
0 parents
commit bed74c9
Showing
242 changed files
with
92,903 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,4 @@ | ||
# Sphinx build info version 1 | ||
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done. | ||
config: 084519760a157951f829b8a78b947ace | ||
tags: 645f666f9bcd5a90fca523b33c5a78b7 |
Empty file.
488 changes: 488 additions & 0 deletions
488
_downloads/36f2dcdf0e7d567ee811f431809ff4e5/04_setup_simple_example.ipynb
Large diffs are not rendered by default.
Oops, something went wrong.
376 changes: 376 additions & 0 deletions
376
_downloads/473f52568ea3ffaea5d9cdbd4e15c657/07_read_from_s3.ipynb
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,376 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Read an Example from S3" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Overview\n", | ||
"Similar to `05 Clone from S3`, we will again work with an Evaluation dataset that is located in an S3 bucket. Unlike in `05 Clone from S3`, this time we will read the data directly from the S3 bucket. We will run all the same commands against the dataset as in `05 Clone from S3`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Create an Evaluation\n", | ||
"First we will import TEEHR along with some other required libraries for this example. Then we create an Evaluation object that points tot he S3 bucket." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"import teehr\n", | ||
"\n", | ||
"# Tell Bokeh to output plots in the notebook\n", | ||
"from bokeh.io import output_notebook\n", | ||
"output_notebook()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from teehr.loading.s3.clone_from_s3 import list_s3_evaluations\n", | ||
"list_s3_evaluations()[\"url\"].values" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": { | ||
"tags": [ | ||
"hide-output" | ||
] | ||
}, | ||
"outputs": [], | ||
"source": [ | ||
"# Create an Evaluation object that points to the S3 location\n", | ||
"ev = teehr.Evaluation(\"s3a://ciroh-rti-public-data/teehr-data-warehouse/v0_4_evaluations/e0_2_location_example\")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now that we have created an Evaluation that points to the data in S3, lets query the `locations` table as a GeoPandas GeoDataFrame and then plot the gages on a map using the TEEHR plotting." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"locations_gdf = ev.locations.to_geopandas()\n", | ||
"locations_gdf.teehr.locations_map()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Lets also query the `primary_timeseries` and plot the timeseries data using the `df.teehr.timeseries_plot()` method." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pt_df = ev. primary_timeseries.to_pandas()\n", | ||
"pt_df.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"pt_df.teehr.timeseries_plot()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"And the `location_crosswalks` table." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"lc_df = ev.location_crosswalks.to_pandas()\n", | ||
"lc_df.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"And the `secondary_timeseries` table." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"st_df = ev.secondary_timeseries.to_pandas()\n", | ||
"st_df.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"st_df.teehr.timeseries_plot()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"And lastly, the `joined_timeseries` table." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"jt_df = ev.joined_timeseries.to_pandas()\n", | ||
"jt_df.head()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"### Metrics\n", | ||
"Now that we have confirmed that we have all the data tables and the `joined_timeseries` table, we can move on to analyzing the data. The user is encouraged to check out the documentation pages relating to filtering and grouping in the context of generating metrics. The short explanation is that `filters` can be used to select what values are used when calculating metrics, while the `group_by` determines how the values are grouped into populations before calculating metrics.\n", | ||
"\n", | ||
"The most basic way to evaluate simulation performance is to `group_by` `configuration_name` and `primary_location_id`, and generate some basic metrics. In this case it will be Nash-Sutcliffe Efficiency, Kling-Gupta Efficiency and Relative Bias, calculated at each location for each configuration. As we saw there are 2 locations and 1 configuration, so the total number of rows that are output is just 2. If there were more `locations` or more `configurations`, there would be more rows in the output for this query. TEEHR contains many more metrics that can be calculated by simply including them in the list of `include_metrics`, and there are also many other ways to look at performance besides the basic metrics." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"ev.metrics.query(\n", | ||
" group_by=[\"configuration_name\", \"primary_location_id\"],\n", | ||
" include_metrics=[\n", | ||
" teehr.DeterministicMetrics.NashSutcliffeEfficiency(),\n", | ||
" teehr.DeterministicMetrics.KlingGuptaEfficiency(),\n", | ||
" teehr.DeterministicMetrics.RelativeBias()\n", | ||
" ]\n", | ||
").to_pandas()" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"Now to demonstrate how filters work, if we add a filter to only select values where the `primary_location_id` is `usgs-14138800`. Accordingly, it will only include rows from the `join_timeseries` table where `primary_location_id` is `usgs-14138800` in the metrics calculations, and since we are grouping by `primary_location_id`, that means we can expect one row in the output. And that is what we see below." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"(\n", | ||
" ev.metrics\n", | ||
" .query(\n", | ||
" group_by=[\"configuration_name\", \"primary_location_id\"],\n", | ||
" filters=[\n", | ||
" {\n", | ||
" \"column\": \"primary_location_id\",\n", | ||
" \"operator\": \"=\",\n", | ||
" \"value\": \"usgs-14138800\"\n", | ||
" }],\n", | ||
" include_metrics=[\n", | ||
" teehr.DeterministicMetrics.NashSutcliffeEfficiency(),\n", | ||
" teehr.DeterministicMetrics.KlingGuptaEfficiency(),\n", | ||
" teehr.DeterministicMetrics.RelativeBias()\n", | ||
" ]\n", | ||
" )\n", | ||
" .to_pandas()\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"As another example, because the `joined_timeseries` table contains a `year` column which was added as a user defined field, we can also group by `year`. In this case we will get the metrics calculated for each `configuration_name`, `primary_location_id`, and `year`." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"(\n", | ||
" ev.metrics\n", | ||
" .query(\n", | ||
" group_by=[\"configuration_name\", \"primary_location_id\", \"year\"],\n", | ||
" filters=[\n", | ||
" {\n", | ||
" \"column\": \"primary_location_id\",\n", | ||
" \"operator\": \"=\",\n", | ||
" \"value\": \"usgs-14138800\"\n", | ||
" }],\n", | ||
" include_metrics=[\n", | ||
" teehr.DeterministicMetrics.NashSutcliffeEfficiency(),\n", | ||
" teehr.DeterministicMetrics.KlingGuptaEfficiency(),\n", | ||
" teehr.DeterministicMetrics.RelativeBias()\n", | ||
" ]\n", | ||
" )\n", | ||
" .to_pandas()\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"There are many ways that TEEHR can be used to \"slice and dice\" the data in the TEEHR dataset. One last example here before wrapping up this lesson. Lets say we wanted the \"annual peak relative bias\", so that is the relative bias of the annual peak values. Well, TEEHR can do this too by chaining the query methods together and overriding the `input_field_names` and the `output_field_name` as shown below. We will do this step by step to understand it. First run the following query where the second `query` is commented out then in the next cell run it with the second `query` uncommented. As you can see first we calculate the peak primary value (`max_primary_value`) and peak secondary value (`max_secondary_value`) for each year, then we calculate the relative bias across the yearly peaks (`annual_max_relative_bias`)." | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"(\n", | ||
" ev.metrics\n", | ||
" .query(\n", | ||
" group_by=[\"configuration_name\", \"primary_location_id\", \"year\"],\n", | ||
" filters=[\n", | ||
" {\n", | ||
" \"column\": \"primary_location_id\",\n", | ||
" \"operator\": \"=\",\n", | ||
" \"value\": \"usgs-14138800\"\n", | ||
" }],\n", | ||
" include_metrics=[\n", | ||
" teehr.SignatureMetrics.Maximum(\n", | ||
" input_field_names=[\"primary_value\"],\n", | ||
" output_field_name=\"max_primary_value\"\n", | ||
" ),\n", | ||
" teehr.SignatureMetrics.Maximum(\n", | ||
" input_field_names=[\"secondary_value\"],\n", | ||
" output_field_name=\"max_secondary_value\"\n", | ||
" )\n", | ||
" ]\n", | ||
" )\n", | ||
" # .query(\n", | ||
" # group_by=[\"configuration_name\", \"primary_location_id\"],\n", | ||
" # include_metrics=[\n", | ||
" # teehr.DeterministicMetrics.RelativeBias(\n", | ||
" # input_field_names=[\"max_primary_value\", \"max_secondary_value\"],\n", | ||
" # output_field_name=\"monthly_max_relative_bias\"\n", | ||
" # )\n", | ||
" # ]\n", | ||
" # )\n", | ||
" .to_pandas()\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"(\n", | ||
" ev.metrics\n", | ||
" .query(\n", | ||
" group_by=[\"configuration_name\", \"primary_location_id\", \"year\"],\n", | ||
" filters=[\n", | ||
" {\n", | ||
" \"column\": \"primary_location_id\",\n", | ||
" \"operator\": \"=\",\n", | ||
" \"value\": \"usgs-14138800\"\n", | ||
" }],\n", | ||
" include_metrics=[\n", | ||
" teehr.SignatureMetrics.Maximum(\n", | ||
" input_field_names=[\"primary_value\"],\n", | ||
" output_field_name=\"max_primary_value\"\n", | ||
" ),\n", | ||
" teehr.SignatureMetrics.Maximum(\n", | ||
" input_field_names=[\"secondary_value\"],\n", | ||
" output_field_name=\"max_secondary_value\"\n", | ||
" )\n", | ||
" ]\n", | ||
" )\n", | ||
" .query(\n", | ||
" group_by=[\"configuration_name\", \"primary_location_id\"],\n", | ||
" include_metrics=[\n", | ||
" teehr.DeterministicMetrics.RelativeBias(\n", | ||
" input_field_names=[\"max_primary_value\", \"max_secondary_value\"],\n", | ||
" output_field_name=\"annual_max_relative_bias\"\n", | ||
" )\n", | ||
" ]\n", | ||
" )\n", | ||
" .to_pandas()\n", | ||
")" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": null, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"ev.spark.stop()" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": ".venv", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.15" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
Oops, something went wrong.