Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Samples data #741

Open
wants to merge 34 commits into
base: develop
Choose a base branch
from
Open

Conversation

ldecicco-USGS
Copy link
Collaborator

No description provided.

seeing how readme gets rendered without image (we'll try to get it back though!)
…gered manually. I think there are some rate-limiting features going on in the CI job
Merge branch 'main' of https://github.com/DOI-USGS/dataRetrieval into samples_data

# Conflicts:
#	R/readWQPdata.R
Copy link
Contributor

@ehinman ehinman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool MR! Left some comments, but generally find the new functions and vignette easy to follow and use.

Weird observation, but I ran this:
test <- read_USGS_samples(pointLocationLatitude = 43.1206415, pointLocationLongitude = -89.3691616, pointLocationWithinMiles = 15, activityStartDateLower = "2024-10-01")

And it failed. Then I looked at the vignette and matched the vignette's example:
test <- read_USGS_samples(pointLocationLatitude = 43.1206415, pointLocationLongitude = -89.3691616, pointLocationWithinMiles = 15, activityStartDateLower = "2024-10-01", dataType = "Monitoring locations", dataProfile = "Site")

And it worked. Then I tried the first query again, and it worked. Not sure if the ordering of the queries matters at all, but I found it peculiar enough to share. Could have totally been coincidence.



context("samples-data samples")
test_that("samples-data samples working", {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
test_that("samples-data samples working", {
test_that("samples-data activities working", {


expect_true(all(c(
"Org_Identifier",
"Org_FormalName"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming you pick these columns because they are the first ones in the returned df if completed successfully/as expected?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. Could be more thorough, but that's it.

# no_sf = TRUE,
# warn = TRUE)
# )
# })
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why commented out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NLDI recently switched to api.gov (something like that at least). That means there's now rate limited...which means if I hit the same request a bunch of times, it errors out. We're going to have to figure out how to deal with this eventually (both for NLDI and all future new services)....but that's a problem for another day. In the meantime, Dave recommended commenting this out for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ldecicco-USGS can we create an Issue to track this? Just to make sure future selves remember.

#' dataType = "Projects")
#'
#' }
read_USGS_samples <- function(monitoringLocationIdentifier = NA,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but suggest naming file read_USGS_samples.R.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, decided on the snake case after starting the work. Renaming...

#' See also: \url{https://api.waterdata.usgs.gov/samples-data/docs}.
#'
#' @param monitoringLocationIdentifier A monitoring location identifier has two parts: the agency code
#' and the location number. Location identifiers should be separated with commas,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#' and the location number. Location identifiers should be separated with commas,
#' and the location number, separated by a dash (-). Location identifiers should be separated with commas,


## Return data types

There are 2 arguments that dictate what kind of data is returned: dataType and dataProfile. The "dataType" argument define what kind of data comes back, and the "dataProfile" define what columns from that type come back.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
There are 2 arguments that dictate what kind of data is returned: dataType and dataProfile. The "dataType" argument define what kind of data comes back, and the "dataProfile" define what columns from that type come back.
There are 2 arguments that dictate what kind of data is returned: dataType and dataProfile. The "dataType" argument defines what kind of data comes back, and the "dataProfile" defines what columns from that type come back.

#' with pointLocationLatitude and pointLocationLongitude
#' @param dataType Options include: "Results", "Monitoring locations", "Activities",
#' "Projects", and "Organizations".
#' @param dataProfile Profile depends on type. Options for "Results" dataType are:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing some? From Samples API: Available values: fullphyschem, basicphyschem, fullbio, basicbio, narrow, resultdetectionquantitationlimit, labsampleprep, count

I guess you're matching what's here: https://waterdata.usgs.gov/download-samples/#dataProfile=fullphyschem ?

#' "Site" and "Count". Options for "Activities" are "Sample Activities",
#' "Activity Metrics", "Activity Groups", "Count". Options for "Projects" are:
#' "Project" and "Project Monitoring Location Weight". Options for "Organizations" are:
#' "Organization" and "Count".
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate matching the services and profiles with the Samples download interface, but I also would prefer to simply match the actual service names as outlined here: https://api.waterdata.usgs.gov/samples-data/docs#/ It's sometimes onerous to remember what is capitalized, what is not, where there are spaces, etc. I admit some impatience here, but I'd prefer the inputs to match what actually shows up in the URL generated at the bottom of the Samples GUI. That also saves you a little bit of effort in the code as well.

#' @param characteristic Characteristic is a specific category describing the sample.
#' See available options by running
#' \code{check_param("characteristics")$characteristicName}.
#' @param stateFips State query parameter. To get a list of available state fips,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a nice example of this in the vignette, but would be helpful to show the general format here as well: "FIPs codes for states take the format: CountryAbbrev:StateNumber, like US:55 for Wisconsin." I got a little turned around trying to write my own function query. First I went to check_params("state") and then wasn't sure why I'd also need to do stateCdLookup(), but then realized I needed the correct fips format, which can be extracted using stateCdLookup(), but I needed to look at the help to understand that the correct function parameterization is stateCdLookup("WI", "fips").

#' \code{stateCdLookup}.
#' @param countyFips County query parameter. To get a list of available counties,
#' run \code{check_param("counties")}. The "Fips" can be created using the function
#' \code{countyCdLookup}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a nice example of this in the vignette, but same general comment as above, in case someone does not go to the vignette.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

p.s. in all my muddling through this, the errors were very helpful, and I eventually got to where I needed. So that is definitely a nice aspect of this whole system of helper functions and lookup tables!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants