Commit

Merge pull request #111 from hytest-org/dev
remove dev; merge to main
amsnyder authored Nov 7, 2022
2 parents 2909cfa + abbb504 commit 69d8ec7
Showing 3 changed files with 39 additions and 32 deletions.
@@ -6,28 +6,22 @@
"source": [
"# Preparing CONUS404 and reference data for aggregation and sampling\n",
"\n",
"Short paragraph describing what is about to happen\n",
"\n",
"<details>\n",
" <summary>Guide to pre-requisites and learning outcomes...&lt;click to expand&gt;</summary>\n",
" \n",
" <table>\n",
" <tr>\n",
" <td>Pre-Requisites\n",
" <td>To get the most out of this notebook, you should already have an understanding of these topics: \n",
" <ul>\n",
" <li>pre-req one\n",
" <li>pre-req two\n",
" </ul>\n",
" <tr>\n",
" <td>Expected Results\n",
" <td>At the end of this notebook, you should be able to: \n",
" <ul>\n",
" <li>outcome one\n",
" <li>outcome two\n",
" </ul>\n",
" </table>\n",
"</details>"
"<img src='./Eval_PreProc.svg' width=600>\n",
"\n",
"The data preparation step is needed in order to align the datasets for analysis. The specific \n",
"steps needed to prepare a given dataset may differ, depending on the source and the variable of\n",
"interest. \n",
"\n",
"Some steps might include: \n",
"\n",
"* Organizing the time-series index such that the time steps for both simulated and observed are congruent\n",
" * This may involve interpolation to estimate a more granular time-step than is found in the source data\n",
" * More often, an agregating function is used to 'down-sample' the dataset to a coarser time step (days vs hours).\n",
"* Coordinate aggregation units between simulated and observed \n",
" * Gridded data may be sampled per HUC-12, HUC-6, etc. to match modeled data indexed by these units. \n",
"* Re-Chunking the data to make time-series analysis more efficient (see [here](/dev/null) for a primer on re-chunking).\n",
"\n",
"## **Step 0: Importing libaries**"
]
},
{
@@ -2000,7 +1994,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Retrieving data from HPC or the Cloud\n",
"## **Step 2: Retrieving data from HPC or the Cloud**\n",
"#### The process varies based on where the notebook is being run but generally looks this:\n",
"1. (Done already) Connect to workspace (local, HPC, or QHUB) and open notebook\n",
"2. Start Dask client \n",
@@ -2013,7 +2007,7 @@
"metadata": {},
"source": [
"# Update to helper function after repo consolidation\n",
"## **Start a Dask client using an appropriate Dask Cluster** \n",
"### Start a Dask client using an appropriate Dask Cluster\n",
"This is an optional step, but can speed up data loading significantly, especially when accessing data from the cloud."
]
},
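As a minimal sketch of the optional Dask client described above (the worker counts are arbitrary, and the actual notebook selects a cluster type suited to the platform rather than a LocalCluster):

```python
from dask.distributed import Client, LocalCluster

# Illustrative local cluster; on HPC or QHub a platform-specific cluster
# (e.g. SLURM- or Kubernetes-backed) would typically be used instead.
cluster = LocalCluster(n_workers=4, threads_per_worker=2)
client = Client(cluster)
print(client.dashboard_link)  # dashboard URL for monitoring tasks and memory
```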
@@ -2191,7 +2185,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## **Retrieve CONUS404 from source and tranform it to a Dask array**"
"### Retrieve CONUS404 from source and tranform it to a Dask array"
]
},
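The general pattern behind "retrieve from source and transform to a Dask array" is roughly the sketch below; the object-store URL is a placeholder (the notebook resolves the actual CONUS404 location through its own catalog/configuration), and the variable name TK is simply one of the variables used later:

```python
import fsspec
import xarray as xr

# Placeholder path -- not the actual CONUS404 store location.
url = "s3://example-bucket/conus404.zarr"

# open_zarr returns Dask-backed (lazy) arrays, so nothing is read into memory
# until a computation or plot actually needs the values.
ds = xr.open_zarr(fsspec.get_mapper(url, anon=True), consolidated=True)
print(ds["TK"].data)  # a dask.array, not an in-memory numpy array
```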
{
@@ -2245,7 +2239,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## **Explore the data** \n",
"## **Step 3: Explore the data** \n",
"(sometimes called exploratory data analysis (EDA) or exploratory spatial data analysis (ESDA) when it contains cartographic data)"
]
},
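Continuing the sketch above (still assuming `ds` is the lazily opened dataset and PREC_ACC_NC one of its variables), a first exploratory pass typically just inspects metadata and plots a single slice; these calls are generic xarray usage rather than the notebook's exact cells:

```python
# Assumes `ds` from the previous sketch; the variable name is an assumption.
print(ds)                        # dimensions, coordinates, variables, chunking
print(ds["PREC_ACC_NC"].attrs)   # units and other attribute metadata

# Quick-look plot of one time slice; only this slice is pulled into memory.
ds["PREC_ACC_NC"].isel(time=0).plot()
```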
@@ -2345,7 +2339,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Importing geographic extents\n",
"## **Step 3: Importing geographic extents**\n",
"Sometimes geographic data is brought in and used to clip a larger dataset to an area of interest (AOI). \n",
"\n",
"Let's look at two ways this can be done: a user-defined polygon or using the pyNHD package. Data can also be brought in other ways such as a local file or an API request. These are covered in other tutorials such as [Reading and Writing Files (GeoPandas)](https://geopandas.org/en/stable/docs/user_guide/io.html) or using the Python [requests](https://requests.readthedocs.io/en/latest/) package for [Programattically Accesing Geospatial Data Using APIs](https://www.earthdatascience.org/courses/use-data-open-source-python/intro-to-apis/spatial-data-using-apis/).\n",
@@ -2521,7 +2515,7 @@
"tags": []
},
"source": [
"## **Putting it together: Process CONUS404 to variable and research spatial extent**\n",
"## **Step 4: Putting it together: Process CONUS404 to variable and spatial extent**\n",
"In this section we are going to put together some skills we have learned so far: bring in CONUS404, select our variables, then clip to our spatial extent. This assumes that the notebook is being run on the ESIP QHub. If being run on HPC then comment/uncomment the datasets as needed.\n",
"\n",
"Variables: Accumulated precipitation (PREC_ACC_NC), air temperature (TK), and (calculated) surface net radiation (RNET) <br>\n",
@@ -8876,7 +8870,7 @@
"tags": []
},
"source": [
"### Prepare reference data\n",
"## **Step 5: Prepare reference data**\n",
"\n",
"Now that the CONUS404 dataset has been preprocessed, we need to do the same with datasets used for comparison with the forcings data. In this section, data will be brought in from several sources and preprocessed in data-type-appropriate ways.\n",
"\n",
@@ -6,8 +6,21 @@
"source": [
"# Create zonal statistics and point extractions for comparing CONUS404 and reference datasets\n",
"\n",
"Short paragraph describing what is about to happen\n",
"<img src='./Eval_PreProc.svg' width=600>\n",
"\n",
"The pre-processing step is needed in order to align the two datasets for analysis. The specific \n",
"steps needed to prepare a given dataset may differ, depending on the source and the variable of\n",
"interest. \n",
"\n",
"Some steps might include: \n",
"\n",
"* Organizing the time-series index such that the time steps for both simulated and observed are congruent\n",
" * This may involve interpolation to estimate a more granular time-step than is found in the source data\n",
" * More often, an agregating function is used to 'down-sample' the dataset to a coarser time step (days vs hours).\n",
"* Coordinate aggregation units between simulated and observed \n",
" * Gridded data may be sampled per HUC-12, HUC-6, etc. to match modeled data indexed by these units. \n",
" * Index formats may be adjusted (e.g. a 'gage_id' may be 'USGS-01104200' in one data set, vs '01104200' in another)\n",
"* Re-Chunking the data to make time-series analysis more efficient (see [here](/dev/null) for a primer on re-chunking).\n",
"<details>\n",
" <summary>Guide to pre-requisites and learning outcomes...&lt;click to expand&gt;</summary>\n",
" \n",
@@ -593,7 +593,7 @@
"outputs": [],
"source": [
"#Data is loaded \n",
"DScore = pd.read_csv(r'./DScore_streamflow_benchmark.csv', dtype={'site_no':str} ).set_index('site_no', drop=False)\n",
"DScore = pd.read_csv(r'./DScore_streamflow_example.csv', dtype={'site_no':str} ).set_index('site_no', drop=False)\n",
"# Merge benchmarks with cobalt data to form a single table, indexed by site_no\n",
"metrics = DScore.columns.tolist()[1:] \n",
"DScore = DScore.merge(\n",
@@ -750,7 +750,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.10.6 ('hytest')",
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
