diff --git a/08-hydrology1_precip.Rmd b/08-hydrology1_precip.Rmd
index 93a1b72..9cf8652 100644
--- a/08-hydrology1_precip.Rmd
+++ b/08-hydrology1_precip.Rmd
@@ -2,6 +2,10 @@
 All of the earlier chapters of this book dealt with the behavior of water in different hydraulic systems, such as canals or pipes. Now we consider the bigger picture of where the water originates, and ultimately how we can estimate how much water is available for different uses, and how much excess (flood) water systems will need to be designed and built to accommodate.
 
+The _hydromisc_ package will need to be installed to access some of the data used below. If it is not installed, do so following the instructions on [the github site for the package](https://github.com/EdM44/hydromisc).
+
+## The hydrologic cycle
+
 A fundamental concept is the hydrologic cycle, depicted in Figure \@ref(fig:hydr-cycle).
 
 (ref:fighc1) The hydrologic cycle, from the [USGS](https://labs.waterdata.usgs.gov/visualizations/water-cycle)
@@ -10,9 +14,131 @@
 knitr::include_graphics('images/USGS_WaterCycle_anthropogenic.png')
 ```
 
-The primary variable in the hydrologic cycle from an engineering perspective is precipitation, since that is the source of the water used and managed in engineered systems.
+The primary variable in the hydrologic cycle from an engineering perspective is precipitation, since that is the source of the water used and managed in engineered systems. When precipitation hits the land surface it can take many paths: principally, it may infiltrate into the soil or form runoff. Water collected in ponds or depressions may evaporate, and additional water may evaporate from the soil surface or be drawn from the soil by vegetation to be transpired. In a simplified way, the water budget may be expressed as Equation \@ref(eq:hc-1).
+\begin{equation}
+  Q = P - E - \Delta S
+  (\#eq:hc-1)
+\end{equation}
+where _Q_ is outflow (as in streamflow volume), _P_ is precipitation, _E_ is evapotranspiration, and _ΔS_ is the change in storage (in snowpack, the soil, or groundwater). Over long time periods, _ΔS_ becomes very small compared to the other terms, so for what are called 'climatological' periods (often 30-year averages) the water budget can be simplified to Equation \@ref(eq:hc-2).
+\begin{equation}
+  Q = P - E
+  (\#eq:hc-2)
+\end{equation}
+All variables must have the same units, so typically _Q_ is expressed as a depth of water (or runoff) over the contributing drainage area. Example \@ref(exm:ex-hcx1) demonstrates a simple water budget over a basin.
 
-The _hydromisc_ package will need to be installed to access some of the data used below. If it is not installed, do so following the instructions on [the github site for the package](https://github.com/EdM44/hydromisc).
+::: {.example #ex-hcx1}
+Using only the generally available data P, T (air temperature), and Q, assess the water budget for the watershed contributing flow to the USGS stream gauge 11169500, Saratoga Creek, CA.
+:::
+First considering Q, streamflow for USGS stream gauges can be directly downloaded from the USGS either from [their website](https://dashboard.waterdata.usgs.gov/app/nwd/en/) or using the R package [dataRetrieval](https://cran.r-project.org/package=dataRetrieval). However, for many sites those streamflow observations will include effects of upstream impoundments (dams) or diversions, and so will not represent the response of the watershed to precipitation.
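+For illustration only, a direct download of daily flows might look like the sketch below; readNWISdv and its arguments are from the _dataRetrieval_ package, and parameter code 00060 is daily discharge in cfs.
+```{r message=FALSE, eval=FALSE}
+# Sketch: daily observed flows for the example gauge (includes any upstream regulation)
+Qobs <- dataRetrieval::readNWISdv(siteNumbers = "11169500", parameterCd = "00060")
+```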
+Another option is to use estimates of "naturalized" streamflows, which have been calculated by the USGS as part of its [National Hydrography Dataset (NHD)](https://www.usgs.gov/national-hydrography/about-national-hydrography-products) products. The R package [nhdplusTools](https://cran.r-project.org/package=nhdplusTools) provides one way to access this. The packages [sf](https://cran.r-project.org/package=sf) and [terra](https://cran.r-project.org/package=terra) will be useful for manipulating some spatial objects created along the way.
+
+First, the gauge and contributing basin are downloaded using [nhdplusTools](https://cran.r-project.org/package=nhdplusTools), and the river reach flowlines are identified.
+```{r message=FALSE}
+library(nhdplusTools)
+
+nwissite <- list(featureSource = "nwissite", featureID = "USGS-11169500")
+
+# Retrieve gauge location and basin boundary objects
+gauge <- get_nldi_feature(nwissite)
+basin <- get_nldi_basin(nwissite)
+nldi_feature <- list(featureSource = "comid", featureID = as.integer(gauge$comid)[1])
+flowline_nldi <- navigate_nldi(nldi_feature,
+                               mode = "upstreamTributaries",
+                               distance_km = 1000)
+```
+
+With the basin and reaches defined, the nhdplus data for the basin contributing to the outlet gauge can be downloaded as a geopackage (.gpkg file). Here the current working directory is used as the location where the file will be written.
+```{r message=FALSE, eval=FALSE}
+# Download the geopackage to the working directory
+output_file_download <- file.path(getwd(), "subset_download.gpkg")
+
+nhd_subset <- subset_nhdplus(comids = as.integer(flowline_nldi$UT$nhdplus_comid),
+                             output_file = output_file_download,
+                             flowline_only = FALSE,
+                             nhdplus_data = "download",
+                             return_data = TRUE, overwrite = TRUE)
+flowline <- nhd_subset$NHDFlowline_Network
+```
+
+```{r message=FALSE, echo=FALSE}
+nhdsubsetfile <- system.file("extdata", "subset_download.gpkg", package="hydromisc")
+flowline <- sf::st_read(nhdsubsetfile, "NHDFlowline_Network", quiet = TRUE)
+```
+
+In the example above, _nhd_subset_ has four layers of _sf_ objects, one of which is the _NHDFlowline_Network_ layer that is used in what follows. What is displayed in Figure \@ref(fig:wbfig) is a static image produced using the [mapview](https://cran.r-project.org/package=mapview) function _mapshot_.
+```{r, message=FALSE, eval=FALSE}
+mapview::mapview(basin, legend=FALSE) +
+  mapview::mapview(flowline, color = "cyan", col.regions = "white", lwd = 3, legend=FALSE) +
+  mapview::mapview(gauge, color = "red", col.regions = "white", cex = 3, legend=FALSE)
+```
+
+```{r wbfig, message=FALSE, echo=FALSE, fig.align = 'center', out.width = "70%", fig.cap = "Watershed boundary and nhdplus stream network for USGS gauge 11169500."}
+knitr::include_graphics('images/nhdplus_basin.png')
+```
+
+In this example the flows with the prefix *qc* (representing the QCMA flows described in the [NHDPlus User's Guide](https://pubs.usgs.gov/publication/ofr20191096)) are extracted to represent naturalized flows. These are also converted to _mm_ using the total basin drainage area and the number of days per month.
+```{r message=FALSE}
+# Extract the data for only the outlet reach
+outlet <- subset(flowline, flowline$comid == gauge$comid)
+drainage_area_km2 <- outlet$totdasqkm
+# 12 monthly flows for outlet - QC is natural flows adjusted using unimpaired gauges
+df.flows <- dplyr::select(sf::st_drop_geometry(outlet), starts_with("qc_"), -qc_ma)
+
+df.flows <- df.flows |>
+  tidyr::pivot_longer(cols = 1:12, names_to = "Month", values_to = "flow_cfs") |>
+  dplyr::mutate(Month = month.abb)
+# Add a column that converts monthly flow in cfs into a depth in mm over the watershed
+dpm <- lubridate::days_in_month(1:12)
+df.flows$runoff <- df.flows$flow_cfs * 86400 * dpm * 304.8^3 / (drainage_area_km2 * 10^12)
+```
+
+To obtain P and T there are many options. A simple one is to use an online tool such as [Model My Watershed](https://modelmywatershed.org/) and read in the csv file of P and T. This example will obtain 30-year normals for P and T using the [prism R package](https://cran.r-project.org/package=prism). The data come from the [PRISM](https://prism.oregonstate.edu/normals/) archive, a widely used product for P and T in the U.S. The [terra](https://cran.r-project.org/package=terra) package is used to extract the portion of each layer within the basin and find the mean.
+```{r message=FALSE, eval=FALSE}
+library(prism)
+
+prism_set_dl_dir(tempdir())
+# Download the monthly 30-year averages at 4km resolution
+get_prism_normals(type="tmean", resolution = "4km", mon = 1:12, keepZip = FALSE)
+flist <- prism_archive_subset("tmean", "monthly normals", mon = 1:12, resolution = "4km")
+flist_full <- pd_to_file(flist)
+rtmp <- terra::rast(flist_full)
+df.t_prism <- terra::extract(rtmp, basin, fun = mean) |>
+  tidyr::pivot_longer(cols = c(-ID), names_to = "Month", values_to = "tavg_C") |>
+  dplyr::mutate(Month = month.abb) |> dplyr::select(-c("ID"))
+
+get_prism_normals(type="ppt", resolution = "4km", mon = 1:12, keepZip = FALSE)
+flist <- prism_archive_subset("ppt", "monthly normals", mon = 1:12, resolution = "4km")
+flist_full <- pd_to_file(flist)
+rppt <- terra::rast(flist_full)
+df.p_prism <- terra::extract(rppt, basin, fun = mean) |>
+  tidyr::pivot_longer(cols = c(-ID), names_to = "Month", values_to = "precip") |>
+  dplyr::mutate(Month = month.abb) |> dplyr::select(-c("ID"))
+```
+
+```{r message=FALSE, echo=FALSE}
+df.t_prism <- hydromisc::df.t_prism
+df.p_prism <- hydromisc::df.p_prism
+```
+
+Finally, combine the runoff, P, and T into a single data frame and plot them. The potential evapotranspiration (PET) is added using a function from the [SPEI](https://cran.r-project.org/package=SPEI) package that requires only temperature and latitude (used to estimate day length).
+```{r wbprism, message=FALSE, fig.align = 'center', out.width = "70%", fig.cap = "Partial water budget for the basin contributing to USGS gauge 11169500."}
+df.wb <- list(df.flows, df.t_prism, df.p_prism) |> purrr::reduce(dplyr::left_join, by = "Month")
+# Find basin centroid
+pt1 <- sf::st_coordinates(sf::st_centroid(basin))
+df.wb$PET <- SPEI::thornthwaite(df.wb$tavg_C, lat = pt1[2], verbose=FALSE)
+
+df.plot <- df.wb |> dplyr::select(Month, runoff, PET, precip) |>
+  tidyr::pivot_longer(cols = c(runoff, PET), names_to = "Component", values_to = "Flux")
+
+suppressPackageStartupMessages(library(ggplot2))
+# Convert months to integers for plotting
+ggplot(data = df.plot, aes(x = match(Month,month.abb))) +
+  geom_col(aes(y = Flux, fill = Component)) +
+  geom_line(aes(y = precip, lty = "precip"), linewidth = 1.25) +
+  scale_x_discrete(limits = month.abb) +
+  labs(x="",y="Precip, Runoff or PET, mm", linetype = NULL) +
+  theme_bw()
+```
+
+While incomplete, this water budget illustrates months in which the basin is water-limited (runoff + PET > precipitation) or energy-limited (runoff + PET < precipitation). Water is more available to recharge the soil moisture and groundwater during energy-limited periods since actual evaporation, which cannot exceed PET, will leave excess water, following Equation \@ref(eq:hc-1). As another alternative, a simple water budget can be estimated from P and T and an estimate of the soil moisture holding capacity using the [bioclim](https://cran.r-project.org/package=bioclim) R package.
 
 ## Precipitation observations
 
@@ -93,7 +219,6 @@ knitr::kable(monavg, digits = 2) |>
 
 The winter of 2016-2017 (water year 2017) was a record wet year for much of California. Figure \@ref(fig:prcp-2) shows a **hyetograph** of the daily values for that year.
 
 ```{r prcp-2, message=FALSE, warning = FALSE, fig.align = 'center', out.width = "75%", fig.cap="Daily Precipitation for San Jose, CA for water year 2017"}
-library(ggplot2)
 ghcn_prcp2 <- data.frame(date = ghcn_data$DATE, wy = ghcn_data$wy, prcp = ghcn_data$PRCP )
 ggplot(subset(ghcn_prcp2, wy==2017), aes(x=date, y=prcp)) +
   geom_bar(stat="identity",color="red") +
@@ -111,7 +236,7 @@ avgrainrate <- ghcn_prcp2[ghcn_prcp2$prcp > 0.01,] |> group_by(wy) |>
 nraindays <- ghcn_prcp2[ghcn_prcp2$prcp > 0.01,] |> group_by(wy) |>
   summarise(nraindays = length(prcp))
 #Find length of consecutive dry and wet spells for the record
-days.dry.wet <- seas::interarrival(ghcn_prcp, var = "prcp", p.cut = 0.01, inv = FALSE)
+days.dry.wet <- seas::interarrival(ghcn_prcp2, var = "prcp", p.cut = 0.01, inv = FALSE)
 #add a water year column to the result
 days.dry.wet$wy <- sapply(days.dry.wet$date, wateryr)
 res <- days.dry.wet |> group_by(wy) |> summarise(cdd = median(dry, na.rm=TRUE), cwd = median(wet, na.rm=TRUE))
@@ -199,7 +324,6 @@ annual_sum <- data.frame(year = annual_data$Year,
                          sum_X = cumsum(annual_data$Station_X))
 
 #create scatterplot with a label on every point
-library(ggplot2)
 library(ggrepel)
 ggplot(annual_sum, aes(sum_X,sum_A, label = year)) +
   geom_point() +
diff --git a/09-hydrology2_processes.Rmd b/09-hydrology2_processes.Rmd
index 7ae54ed..1b574ab 100644
--- a/09-hydrology2_processes.Rmd
+++ b/09-hydrology2_processes.Rmd
@@ -207,7 +207,7 @@ Here $E_p$ is in mm/d, $\Delta$ and $\gamma$ are in $kPa~K^{-1}$, $R_n$ is in $M
 Open water evaporation can also be calculated using a modified version of the Penman-Monteith equation \@ref(eq:pm1).
 In this latter case, vegetation coefficients are not needed, so Equation \@ref(eq:pm1) can be used with $r_s=0$ and $r_a=251/(1+0.536~u_2)$, following [Thom & Oliver, 1977](https://doi.org/10.1002/qj.49710343610).
 
-The R package [Evaporation](https://cran.r-project.org/package=Evapotranspiration) has functions to calculate $ET_0$ using this and many other functions. This is especially useful when calculating PET over many points or through a long time series.
+The R package [Evapotranspiration](https://cran.r-project.org/package=Evapotranspiration) has functions to calculate $ET_0$ using this and many other methods. This is especially useful when calculating PET over many points or through a long time series. Satellite-observed ET has been estimated globally as part of the [OpenET project](https://etdata.org/); the data may be accessed using the [openet R package](https://github.com/codeswitching/openet).
 
 ## Snow
 
diff --git a/docs/designing-for-floods-flood-hydrology.html b/docs/designing-for-floods-flood-hydrology.html
index 6be2d40..fb8e8ce 100644
--- a/docs/designing-for-floods-flood-hydrology.html
+++ b/docs/designing-for-floods-flood-hydrology.html
@@ -256,10 +256,11 @@
Flood hydrology is generally the description of how frequently a flood of a certain level will be exceeded in a specified period. This was discussed briefly in the section on precipitation frequency, Section 8.3.
The hydromisc package will need to be installed to access some of the data used below. If it is not installed, do so following the instructions on the github site for the package.
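One common way to do that is sketched below, assuming the remotes package is available; the definitive instructions are on the github site.
# Sketch: install hydromisc from github (assumes the remotes package is installed)
remotes::install_github("EdM44/hydromisc")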
Example 10.1 A temporary dam is constructed while a repair is completed. It will be in place 5 years and is designed to protect against floods up to a 20-year recurrence interval (i.e., there is a \(p=\frac{1}{20}=0.05\), or 5% chance, that it will be exceeded in any one year). What is the probability of (a) no failure in the 5-year period, and (b) at least two failures in 5 years?
# (a)
ans1 <- dbinom(0, 5, 0.05)
cat(sprintf("Probability of exactly zero occurrences in 5 years = %.4f %%",100*ans1))
#> Probability of exactly zero occurrences in 5 years = 77.3781 %
# (b)
ans2 <- 1 - pbinom(1,5,.05) # or pbinom(1,5,.05, lower.tail=FALSE)
cat(sprintf("Probability of 2 or more failures in 5 years = %.2f %%",100*ans2))
#> Probability of 2 or more failures in 5 years = 2.26 %
While the next example uses normally distributed data, most data in hydrology are better described by other distributions.
Example 10.2 Annual average streamflows at some location are normally distributed with a mean annual flow of 20 m\(^3\)/s and a standard deviation of 6 m\(^3\)/s. Find (a) the probability of experiencing a year with less than (or equal to) 10 m\(^3\)/s, (b) greater than 32 m\(^3\)/s, and (c) the annual average flow that would be expected to be exceeded 10% of the time.
# (a)
ans1 <- pnorm(10, mean=20, sd=6)
cat(sprintf("Probability of less than 10 = %.2f %%",100*ans1))
#> Probability of less than 10 = 4.78 %
# (b)
ans2 <- pnorm(32, mean=20, sd=6, lower.tail = FALSE) #or 1 - pnorm(32, mean=20, sd=6)
cat(sprintf("Probability of greater than 32 = %.2f %%",100*ans2))
#> Probability of greater than 32 = 2.28 %
# (c)
ans3 <- qnorm(.1, mean=20, sd=6, lower.tail=FALSE)
cat(sprintf("flow exceeded 10%% of the time = %.2f m^3/s",ans3))
#> flow exceeded 10% of the time = 27.69 m^3/s
# plot to visualize answers
x <- seq(0,40,0.1)
y <- pnorm(x, mean=20, sd=6)
xlbl <- expression(paste(Flow, ",", ~ m^"3"/s))
plot(x, y, type="l", lwd=2, xlab = xlbl, ylab= "Prob. of non-exceedance")
abline(v=10,col="black", lwd=2, lty=2)
abline(v=32,col="blue", lwd=2, lty=2)
abline(h=0.9,col="green", lwd=2, lty=2)
legend("bottomright",legend=c("(a)","(b)","(c)"),col=c("black","blue","green"), cex=0.8, lty=2)
The USGS has developed many R packages, including one for retrieval of data, dataRetrieval. Since this resides on CRAN, the package can be installed as sketched below (the use of ‘!requireNamespace’ skips the installation if the package is already installed):
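# Sketch of the install step described above
if (!requireNamespace("dataRetrieval", quietly = TRUE)) {
  install.packages("dataRetrieval")
}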
Other USGS packages that are very helpful for peak flow analysis are not on CRAN, but rather housed in a USGS repository. The easiest way to install packages from that archive is to follow the installation instructions at that repository. For the exercises below, install smwrGraphs and smwrBase following the instructions at those sites. The prefix smwr refers to their use in support of the excellent reference Statistical Methods in Water Resources.
Lastly, the lmomco package has extensive capabilities to work with many forms of probability distributions, and has functions for calculating distribution parameters (like skew) that we will use.
While the data could be downloaded and saved locally through that link, it is convenient here to use the dataRetrieval command.
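A sketch of that download follows; readNWISpeak is the dataRetrieval function for annual peak flows, and the site number shown (gauge 11169500, used in an earlier chapter) is only a placeholder.
# Sketch: annual peak flows; the site number is illustrative
Qpeak_download <- dataRetrieval::readNWISpeak(siteNumbers = "11169500")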
The data used here are also available as part of the hydromisc package, and may be obtained by typing hydromisc::Qpeak_download.
It is always helpful to look at the downloaded data frame before doing anything with it. There are many columns that are not needed or that have repeated information. There are also some rows that have no data (‘NA’ values). It is also useful to change some column names to something more intuitive. We will need to define the water year (a water year begins October 1 and ends September 30).
Qpeak <- Qpeak_download[!is.na(Qpeak_download$peak_dt),c('peak_dt','peak_va')]
colnames(Qpeak)[colnames(Qpeak)=="peak_dt"] <- "Date"
colnames(Qpeak)[colnames(Qpeak)=="peak_va"] <- "Peak"
Qpeak$wy <- smwrBase::waterYear(Qpeak$Date)
The data have now been simplified so that they can be used more easily in the subsequent flood frequency analysis. Data should always be plotted, which can be done many ways. As a demonstration of highlighting specific years in a barplot, the strongest El Niño years (in 1930-2002) from NOAA Physical Sciences Lab can be highlighted in red.
-xlbl <- "Water Year"
-ylbl <- expression("Peak Flow, " ~ ft^{3}/s)
-nino_years <- c(1983,1998,1992,1931,1973,1987,1941,1958,1966, 1995)
-cols <- c("blue", "red")[(Qpeak$wy %in% nino_years) + 1]
-barplot(Qpeak$Peak, names.arg = Qpeak$wy, xlab = xlbl, ylab=ylbl, col=cols)
xlbl <- "Water Year"
+ylbl <- expression("Peak Flow, " ~ ft^{3}/s)
+nino_years <- c(1983,1998,1992,1931,1973,1987,1941,1958,1966, 1995)
+cols <- c("blue", "red")[(Qpeak$wy %in% nino_years) + 1]
+barplot(Qpeak$Peak, names.arg = Qpeak$wy, xlab = xlbl, ylab=ylbl, col=cols)
With those calculated, a defined return period can be chosen and the flood frequency factors, from Equation (10.1), calculated for that return period (the example here is for a 50-year return period). The qnorm function from base R and the qpearsonIII function from the smwrBase package make this straightforward, and K values for Equation (10.1) are obtained for a lognormal, Knorm, and LP-III, Klp3.
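A sketch of those computations, assuming mn, std, and g hold the mean, standard deviation, and skew (respectively) of the log10-transformed peak flows from the elided step above:
RP <- 50                 # return period, years
ne_prob <- 1 - 1/RP      # non-exceedance probability
Knorm <- qnorm(ne_prob)
Klp3 <- smwrBase::qpearsonIII(ne_prob, skew = g)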
Now the flood frequency equation (10.1) can be applied to calculate the flows associated with the 50-year return period for each of the distributions. Remember to take the anti-log of your answer to return to standard units.
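Continuing the sketch with the same assumed variables:
Q50_LN  <- 10^(mn + std*Knorm)   # lognormal estimate, cfs
Q50_LP3 <- 10^(mn + std*Klp3)    # LP-III estimate, cfs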
While not necessary, to add probabilities to the annual flow sequence we will create a new data frame consisting of the observed peak flows, sorted in descending order.
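A sketch of that step, using the Peak column created above:
df_pp <- data.frame(Obs_peak = sort(Qpeak$Peak, decreasing = TRUE))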
This can be done with fewer commands, but here is an example where first a rank column is created (1=highest peak in the record of N years), followed by adding columns for the exceedance and non-exceedance probabilities:
df_pp$rank <- as.integer(seq(1:length(df_pp$Obs_peak)))
df_pp$exc_prob <- (df_pp$rank/(1+length(df_pp$Obs_peak)))
df_pp$ne_prob <- 1-df_pp$exc_prob
For each of the non-exceedance probabilities calculated for the observed peak flows, use the flood frequency equation (10.1) to estimate the peak flow that would be predicted by a lognormal or LP-III distribution. This is the same thing that was done above for a specified return period, but now it will be “applied” to an entire column.
df_pp$LN_peak <- mapply(function(x) {10^(mn+std*qnorm(x))}, df_pp$ne_prob)
df_pp$LP3_peak <- mapply(function(x) {10^(mn+std*smwrBase::qpearsonIII(x, skew=g))},df_pp$ne_prob)
There are many packages that create probability plots (see, for example, the versatile scales package for ggplot2). For this example the USGS smwrGraphs package is used. First, for aesthetics, create x- and y-axis labels.
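For example (the y-axis label repeats the barplot label above; the x-axis wording is an assumption):
xlbl <- "Annual nonexceedance probability"   # assumed wording
ylbl <- expression("Peak Flow, " ~ ft^{3}/s)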
The smwrGraphs package works most easily if it writes output directly to a file, a PNG file in this case, using the setPNG command; the file name and its dimensions in inches are given as arguments, and the PNG device is opened for writing. This is followed by commands to plot the data on a graph. Technically, the data are plotted to an object, here called prob.pl. The probPlot command plots the observed peaks as points, where the alpha argument is the a in Equation (10.2). Additional points or lines are added with the addXY command, used here to add the LN and LP3 data as lines (one solid, one dashed). Finally, a legend is added (the USGS refers to that as an “Explanation”), and the output PNG file is closed with the dev.off() command.
smwrGraphs::setPNG("probplot_smwr.png",6.5, 3.5)
#> width height
#>   6.5    3.5
#> [1] "Setting up markdown graphics device: probplot_smwr.png"
prob.pl <- smwrGraphs::probPlot(df_pp$Obs_peak, alpha = 0.0, Plot=list(what="points",size=0.05,name="Obs"), xtitle=xlbl, ytitle=ylbl)
prob.pl <- smwrGraphs::addXY(df_pp$ne_prob,df_pp$LN_peak,Plot=list(what="lines",name="LN"),current=prob.pl)
prob.pl <- smwrGraphs::addXY(df_pp$ne_prob,df_pp$LP3_peak,Plot=list(what="lines",type="dashed",name="LP3"),current=prob.pl)
smwrGraphs::addExplanation(prob.pl,"ul",title="")
dev.off()
#> png
#>   2
Example 9.1 Fit a Horton infiltration model to measured infiltration data, then use the derived parameters to plot the infiltration rate over time.
First, for this example, read in the data sample included with the hydromisc package.
# path to sample spreadsheet
datafile <- system.file("extdata", "infilt_field_data.xlsx", package="hydromisc")
infilt_data <- readxl::read_excel(datafile)
Set up a function for the Horton equation and use the built-in stats package function nls to solve for the parameters. This works with column names of ‘Time’ and ‘Rate’ (case sensitive).
horton <- function(time, fc, f0, k) {
  fc + (f0 - fc) * exp(-k * time)
}
horton_fit <- stats::nls(Rate ~ horton(Time, fc, f0, k), data = infilt_data,
                         start = list(fc=0.2, f0=1, k=0.1))
coef(horton_fit)
#>         fc         f0          k 
#> 0.52816668 1.25076319 0.05102829
Using poor guesses as starting values can result in error messages like ‘singular gradient’, indicating the nls function could not find a solution. Trying better guesses for starting parameter values can help: \(f_c\) should be close to the final infiltration rate observed, and \(f_0\) near the initial rate. Another approach is illustrated below.
Plot the results of the fit with the observations.
infilt_data$Rate_Horton <- predict(horton_fit, list(Time = infilt_data$Time))
plot(infilt_data$Time,infilt_data$Rate, pch=1,
     xlab = "Time, min", ylab = "Infiltration rate, cm/hr")
lines(infilt_data$Time, infilt_data$Rate_Horton, lty = 1)
legend("topright", legend = c("Obs.","Horton"), pch = c(1, NA), lty = c(NA, 1))
Using the nlsr package provides a similar function to nls that can accommodate much worse starting parameter values. Using the starting values below in the nls command above fails. The use of nlsr functions is slightly different from nls, as shown below.
horton_form <- Rate ~ fc + (f0 - fc) * exp(-k * Time)
horton_fit2 <- nlsr::nlxb(formula = horton_form, data = infilt_data,
                          start = list(fc=1, f0=1, k=0.5))
predicted_values <- as.numeric(predict(horton_fit2, list(Time = infilt_data$Time)))
horton_fit2$coefficients
#>        fc        f0         k 
#> 0.5281621 1.2507571 0.0510268
Example 9.2 Fit a Green-Ampt infiltration model to measured infiltration data, then use the derived parameters to plot the infiltration rate over time.
For convenience, this uses the same infiltration data as in the prior example. Because the Green-Ampt formulation needs cumulative infiltration depth, F, first create a column of this in the data frame.
# Add a column for incremental time
infilt_data <- infilt_data |> dplyr::mutate(dt = Time - dplyr::lag(Time))
# Add value to first row
infilt_data$dt[1] <- infilt_data$Time[1]
# Add a column for cumulative infiltration
infilt_data$Cumul_Infilt <- cumsum(infilt_data$Rate * infilt_data$dt)
Because the Green-Ampt model comprises implicit, nonlinear equations, it cannot be solved directly, as could be done with the Horton equation. Combining the base uniroot function for solving an implicit equation with the non-linear fitting capabilities of the Flexible Modelling Environment FME package makes this possible in R.
First, define a function that finds cumulative infiltration, Fp, for a given time and the Green-Ampt parameters K and Ns in (9.5) (rearranged to solve for zero).
GA_infilt <- function(time, parms) {
  with(as.list(parms), {
    # function to solve for zero
    uniroot(function(Fp) Fp - K*time - Ns * log(1 + Fp/Ns),
            interval = c(0, 1e3),
            tol = 1.e-6)$root
  })
}
Now set up the function to optimize the parameters for all times, with the goal to minimize the square root of the squared differences. Any other function could be used to minimize the difference by changing the last line of the function. Then use the FME modFit function to find the optimal parameters.
GA_min <- function(ps) {
  K <- ps[1]
  Ns <- ps[2]
  GA_fit <- sapply(infilt_data$Time, GA_infilt, parms = c(K = K, Ns = Ns))
  sqrt((GA_fit - infilt_data$Cumul_Infilt)^2)
}

#Set initial parameter values and run minimization function for the best parameter set
parms_start <- c(1, 0.5)
GA_out <- FME::modFit(f = GA_min, p = parms_start, lower = c(0, 0),
                      upper = c(Inf, Inf))
fitted_parms <- setNames(GA_out$par, c("K","Ns"))
fitted_parms
#>         K        Ns 
#> 0.5656739 4.8610322
Finally, use the best-fit parameters to generate a sequence of cumulative infiltration values using (9.5) and infiltration rate using (9.4), plotting the results along with the constant rate (average) infiltration for the observation period.
fitted_F <- sapply(infilt_data$Time, GA_infilt, parms = fitted_parms)
infilt_data$Rate_GA <- fitted_parms[['K']] * (1 + fitted_parms[['Ns']]/fitted_F)

# Make a plot
plot(infilt_data$Time,infilt_data$Rate, pch=1, xlab = "Time, min", ylab = "Infiltration rate, cm/hr")
lines(infilt_data$Time, infilt_data$Rate_GA, lty = 2)
abline(h = tail(infilt_data$Cumul_Infilt, 1)/tail(infilt_data$Time, 1))
legend("topright", legend = c("Obs.","Green-Ampt","Constant"), pch = c(1, NA, NA), lty = c(NA, 2, 1))
Open water evaporation can also be calculated using a modified version of the Penman-Monteith equation (9.8). In this latter case, vegetation coefficients are not needed, so Equation (9.8) can be used with \(r_s=0\) and \(r_a=251/(1+0.536~u_2)\), following Thom & Oliver, 1977.
The R package Evapotranspiration has functions to calculate \(ET_0\) using this and many other methods. This is especially useful when calculating PET over many points or through a long time series. Satellite-observed ET has been estimated globally as part of the OpenET project; the data may be accessed using the openet R package.
Example 9.3 Manually calibrate an index snowmelt model for a SNOTEL site using one year of data.
Visit the SNOTEL website to select a site. In this example site 1050, Horse Meadow, located in California, is used. Next download the data using the snotelr package (install the package first, if needed).
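A sketch of that download using snotelr (internal = TRUE returns the data as a data frame rather than writing a file):
snow_data <- snotelr::snotel_download(site_id = 1050, internal = TRUE)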
Plot the data to assess the period available and how complete it is.
plot(as.Date(snow_data$date), snow_data$snow_water_equivalent,
     type = "l", xlab = "Date", ylab = "SWE (mm)")
Note the units are SI. If you download data directly from the SNOTEL web site the data would be in conventional US units. snotelr converts the data to SI units as it imports. The package includes a function snotel_metric that could be used to convert raw data downloaded from the SNOTEL website to SI units.
For this example, we will extract a single year, but set the dates to ensure the entire snow season is contained within a year. The snotel_phenology function makes this easy. By default it uses an offset of 180 days (so years begin 180 days before Jan 1) to accomplish this.
phenology <- snotelr::snotel_phenology(snow_data) # Key dates from the observed snow data
min(phenology$first_snow_acc) # Earliest snow accumulation in the record
#> [1] "2004-09-20"
max(phenology$last_snow_melt) # Latest snow melt in the record
#> [1] "2022-05-13"
mean(phenology$max_swe) # The mean of peak SWE for each year, mm
#> [1] 550.6222
Based on this, for this site we’ll subset the data using 1-Sep to 31-Aug to be sure an entire winter is in one year. In addition, create a data frame that includes only the columns that are needed and only a single year of data.
snow_data_subset <- subset(snow_data, as.Date(date) >= as.Date("2008-09-01") &
                             as.Date(date) <= as.Date("2009-08-31"))
snow_data_sel <- subset(snow_data_subset, select=c("date", "snow_water_equivalent", "precipitation", "temperature_mean", "temperature_min", "temperature_max"))
plot(as.Date(snow_data_sel$date),snow_data_sel$snow_water_equivalent,
     type = "l",xlab = "Date", ylab = "SWE (mm)")
Start with some assumed values and run the snow model.
Tmax_snow <- 3
Tmin_rain <- 2
kd <- 1
Tbase <- 0
snow_estim <- hydromisc::snow.sim(DATA = snow_data_sel,
                                  Tmax = Tmax_snow,
                                  Tmin = Tmin_rain,
                                  kd = kd,
                                  Tmelt = Tbase)
Now the simulated values can be compared to the observations. If not installed already, install the hydroGOF package, which has some useful functions for evaluating how well modeled output fits observations. In the plot that follows we specify three measures of goodness-of-fit: the mean absolute error (MAE), the root mean squared error (RMSE), and the percent bias (PBIAS). These are discussed in detail in other references, but the aim is to calibrate (change the input parameters) until these values are low.
obs <- snow_data_sel$snow_water_equivalent
sim <- snow_estim$swe_simulated
hydroGOF::ggof(sim, obs, na.rm = TRUE, dates=snow_data_sel$date,
               gofs=c("MAE", "RMSE", "PBIAS"),
               xlab = "", ylab="SWE, mm",
               tick.tstep="months", cex=c(0,0),lwd=c(2,2))
While manual model calibration can improve the fit, a more complete calibration involves optimization methods that search the parameter space for the optimal combination of parameter values. The R function optim, part of the stats package installed with base R, is efficient at finding a local minimum or maximum to a function. For most hydrologic models finding the global optimum means searching the entire parameter space, since many local minima and maxima usually exist. Some version of a Monte Carlo method is typically used to do this.
There are many R packages to perform different variations of Monte Carlo simulations, but a simple implementation can be done manually using the foreach package. In this example the four main model parameters are each varied over 10 evenly spaced values, producing \(10^4\) unique parameter sets that need to be run. For each run, the function will return a value for RMSE.
# package needed for efficiently looping through all simulations
library(foreach)

# Create the sets of values of each parameter within a defined range
p_Tmax <- seq(-1, 3, length.out = 10)
p_Tmin <- seq(-1, 2, length.out = 10)
p_kd <- seq(0.5, 5, length.out = 10)
p_Tbase <- seq(-2, 2, length.out = 10)

# Assemble them into a data frame with all parameter combinations
p_all <- expand.grid(Tmax = p_Tmax, Tmin = p_Tmin, kd = p_kd, Tmelt = p_Tbase)

# Define the function to return a single test statistic for each simulation
fcn <- function(datain, Tmax, Tmin, kd, Tmelt){
  snow_estim <- hydromisc::snow.sim(DATA=datain, Tmax=Tmax, Tmin=Tmin, kd=kd, Tmelt=Tmelt)
  calib.stats <- suppressWarnings(hydroGOF::gof(snow_estim$swe_simulated,obs,na.rm=TRUE))
  return(as.numeric(calib.stats['RMSE',]))
}
Now the function needs to be used to run the snow model for every parameter combination. What is shown below is how to do this using the foreach package in a brute force manner, running all simulations on a single CPU. This takes a while.
out <- foreach(i = 1:nrow(p_all), .combine='c') %do% {
  fcn(snow_data_sel, p_all[i, "Tmax"], p_all[i, "Tmin"], p_all[i, "kd"], p_all[i, "Tmelt"])
}
Because the number of simulations can become very large, this command could be adapted to run the simulations in parallel using multiple cores by using the R package doParallel, which loads the parallel package (part of the base R installation). By running the foreach simulations with multiple processes, the run time using 7 of 8 available cores dropped from 5.9 minutes for a single CPU to 1.1 minutes. Aside from setting up the cluster before the command and stopping it after (essential for freeing up resources), the only change is the use of %dopar% instead of %do%.
library(doParallel)
no_cores <- detectCores() - 1
cl <- makeCluster(no_cores, type="PSOCK") # On mac or linux you can change type to "FORK"
registerDoParallel(cl)
out <- foreach(i = 1:nrow(p_all), .combine='c') %dopar% {
  fcn(snow_data_sel, p_all[i, "Tmax"], p_all[i, "Tmin"], p_all[i, "kd"], p_all[i, "Tmelt"])
}
stopCluster(cl)
It can be helpful to look at the four parameters for the simulations that had the lowest RMSE. This presumes the tidyverse package has been installed.
# Combine results with parameters, sort by RMSE (ascending)
out_all <- cbind(p_all, RMSE=out) |> dplyr::arrange(RMSE)

library(ggplot2)
# Plot the parameters for the 10 best runs
dplyr::slice(out_all, 1:10) |>
  tidyr::pivot_longer(-RMSE, names_to = 'param', values_to = 'value') |>
  ggplot(aes(x = RMSE, y = value)) +
  geom_point() +
  facet_wrap(vars(param), scales = "free")
There are several occurrences of equal RMSE values, so some points are superimposed on one another. Some parameter values are consistent for all of the 10 best simulations: kd is 2.5 and Tmelt is 2\(^\circ\)C, but optimal values can be achieved with a variety of different combinations of Tmax and Tmin. The results using the optimal parameters can be plotted to visualize the simulation.
Tmax_opt <- max(out_all$Tmax[1],out_all$Tmin[1])
Tmin_opt <- min(out_all$Tmax[1],out_all$Tmin[1])
kd_opt <- out_all$kd[1]
Tmelt_opt <- out_all$Tmelt[1]

snow_estim_opt <- hydromisc::snow.sim(DATA=snow_data_sel,Tmax=Tmax_opt,
                                      Tmin=Tmin_opt, kd=kd_opt, Tmelt=Tmelt_opt)
sim_opt <- snow_estim_opt$swe_simulated
hydroGOF::ggof(sim_opt, obs, na.rm = TRUE, dates=snow_data_sel$date,
               gofs=c("MAE", "RMSE", "PBIAS"),
               xlab = "", ylab="SWE, mm",
               tick.tstep="months", cex=c(0,0),lwd=c(2,2))
Once a reasonable calibration is obtained, the effect of increasing temperatures on SWE can be simulated by including the deltaT argument in the hydromisc::snow.sim command. Here a 3\(^\circ\)C uniform temperature increase is imposed on the optimal parameterization above.
dT <- 3.0
snow_plus3 <- hydromisc::snow.sim(DATA=snow_data_sel,
                                  Tmax=Tmax_opt, Tmin=Tmin_opt, kd=kd_opt, Tmelt=Tmelt_opt,
                                  deltaT = dT)
simplusdT <- snow_plus3$swe_simulated
# plot the results
dTlegend <- expression("Simulated"*+3~degree*C)
plot(as.Date(snow_data_sel$date),obs,type = "l",xlab = "", ylab = "SWE (mm)")
lines(as.Date(snow_estim$date),sim_opt,lty=2,col="blue")
lines(as.Date(snow_estim$date),simplusdT,lty=3,col="red")
legend("topright", legend = c("Observed", "Simulated",dTlegend),
       lty = c(1,2,3), col=c("black","blue","red"))
grid()
An R package useful for solving linear programming problems is the lpSolveAPI package. Install that if necessary, and also install the knitr and kableExtra packages, since they are very useful for printing the many tables that linear programming involves.
Begin by creating an empty linear model. The (0,2) means zero constraints (they’ll be added later) and 2 decision variables. The next two lines just assign names to the decision variables. Because we will use many functions of the lpSolveAPI package, load the library first. Load the kableExtra package too.
library(lpSolveAPI)
library(kableExtra)
example.lp <- lpSolveAPI::make.lp(0,2) # 0 constraints and 2 decision variables
ColNames <- c("X1","X2")
colnames(example.lp) <- ColNames # Set the names of the decision variables
Now set up the objective function. Minimization is the default goal of this R function, but we’ll set it anyway to be clear. The second argument is the vector of coefficients for the decision variables, meaning X2 is minimized.
set.objfn(example.lp,c(0,1))
x <- lp.control(example.lp, sense="min") #save output to a dummy variable
The next step is to define the constraints. Four constraints were listed above. Additional constraints could be added that \(X1\ge 0\) and \(X2\ge 0\); however, variable ranges in this LP solver are [0,infinity] by default, so for this example we do not need to include constraints for positive results. If necessary, decision variable valid ranges can be set using set.bounds().
Constraints are defined with the add.constraint command. Figure 12.4 provides an annotated example of the use of an add.constraint command. Type ?add.constraint in the console for additional details.
The four constraints for this example are added with:
add.constraint(example.lp, xt=c(400,-300), type="<=", rhs=0, indices=c(1,2))
add.constraint(example.lp, xt=c(1,1), type=">=", rhs=7500)
add.constraint(example.lp, xt=c(1,0), type="<=", rhs=4000)
add.constraint(example.lp, xt=c(0,1), type="<=", rhs=7500)
That completes the setup of the linear model. You can view the model to verify the values you entered by typing the name of the model.
example.lp
#> Model name: 
#>             X1    X2          
#> Minimize     0     1          
#> R1         400  -300  <=     0
#> R2           1     1  >=  7500
#> R3           1     0  <=  4000
#> R4           0     1  <=  7500
#> Kind       Std   Std          
#> Type      Real  Real          
#> Upper      Inf   Inf          
#> Lower        0     0
If the model has a large number of decision variables it only prints a summary, but in that case you can use write.lp(example.lp, "example_lp.txt", "lp") to create a viewable file with the model.
Now the model can be solved.
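A sketch of that step, using lpSolveAPI's solve method for lp objects:
solve(example.lp)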
If the solver finds an optimal solution it will return a zero.
View the final value of the objective function by retrieving it and printing it:
optimal_solution <- get.objective(example.lp)
print(paste0("Optimal Solution = ",round(optimal_solution,2),sep=""))
#> [1] "Optimal Solution = 4285.71"
For more detail, recover the values of each of the decision variables.
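As a sketch, the get.variables function returns these, stored here in vars for use below:
vars <- get.variables(example.lp)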
Next you can print the sensitivity report – a vector of M constraints followed by N decision variables. It helps to create a data frame for viewing and printing the results. Nicer printing is achieved using the kable and kableExtra functions.
sens <- get.sensitivity.obj(example.lp)$objfrom
results1 <- data.frame(variable=ColNames,value=vars,gradient=as.integer(sens))
kbl(results1, booktabs = TRUE) %>% kable_styling(full_width = F)
The above shows decision variable values for the optimal solution. The Gradient is the change in the objective function for a unit increase in the decision variable. Here a negative gradient for decision variable \(X1\), the groundwater withdrawal, means that increasing the groundwater withdrawal will have a negative effect on the objective function (to minimize \(X2\)): that is intuitive, since increasing groundwater withdrawal can reduce reservoir supply on a one-to-one basis.
To look at which constraints are binding, retrieve the $duals part of the output.
m <- length(get.constraints(example.lp)) #number of constraints
duals <- get.sensitivity.rhs(example.lp)$duals[1:m]
results2 <- data.frame(constraint=c(seq(1:m)),multiplier=duals)
kbl(results2, booktabs = TRUE) %>% kable_styling(full_width = F)