handle excel sheets #190 #191

Merged
merged 11 commits into from
May 13, 2020
6 changes: 5 additions & 1 deletion NEWS.md
@@ -1,10 +1,14 @@
# bcdata (development version)

### IMPROVEMENTS
* Geometry predicates can now take a `bbox` object as well as an `sf*` object (#176)
* Rename `selectable` column from `bcdc_describe_feature` to `sticky` and modify corresponding docs and tests (#180)
* Fix `select`, `filter` and `mutate` roxygen so that bcdata specific documentation to these methods is available
* Add `head` and `tail` methods for `bcdc.promise` objects. Thanks to @hgriesbauer for the suggestion! (#182, #186)
* Provide `as_tibble` as an alias for `collect` in line with `dbplyr` behaviour (#166)
* When reading in excel files, `bcdc_get_data` now outputs a message indicating the presence and names of any sheets (#190)

### BUG FIXES
* Fix `select`, `filter` and `mutate` roxygen so that bcdata specific documentation to these methods is available

# bcdata 0.1.2

17 changes: 11 additions & 6 deletions R/get_data.R
@@ -26,14 +26,15 @@
 #' @param resource optional argument used when there are multiple data files
 #' within the same record. See examples.
 #' @param ... arguments passed to other functions. Tabular data is passed to a function to handle
-#' the import based on the file extension. `bcdc_read_functions()` provides details on which functions
+#' the import based on the file extension. [bcdc_read_functions()] provides details on which functions
 #' handle the data import. You can then use this information to look at the help pages of those functions.
 #' See the examples for a workflow that illustrates this process.
-#' For spatial Web Service data the `...` arguments are passed to `bcdc_query_geodata()`.
+#' For spatial Web Service data the `...` arguments are passed to [bcdc_query_geodata()].
 #' @param verbose When more than one resource is available for a record,
 #' should extra information about those resources be printed to the console?
 #' Default `TRUE`
 #'
+#'
 #' @return An object of a type relevant to the resource (usually a tibble or an sf object)
 #' @export
 #'
@@ -57,16 +58,20 @@
 #' bcdc_get_data('d7e6c8c7-052f-4f06-b178-74c02c243ea4')
 #'
 #' ## From bcdc_get_record we realize that the data is in xlsx format
-#' bcdc_get_record('d7e6c8c7-052f-4f06-b178-74c02c243ea4')
+#' bcdc_get_record('8620ce82-4943-43c4-9932-40730a0255d6')
 #'
 #' ## bcdc_read_functions lets us know that bcdata
 #' ## uses readxl::read_excel to import xlsx files
 #' bcdc_read_functions()
 #'
-#' ## If you read the help page for readxl::read_excel,
-#' ## it seems likely that we need to skip the first row:
-#' bcdc_get_data('d7e6c8c7-052f-4f06-b178-74c02c243ea4', skip = 1)
+#' ## bcdata lets you know that this resource has
+#' ## multiple worksheets
+#' bcdc_get_data('8620ce82-4943-43c4-9932-40730a0255d6')
 #'
+#' ## we can control what is read in from an excel file
+#' ## using arguments from readxl::read_excel
+#'
+#' bcdc_get_data('8620ce82-4943-43c4-9932-40730a0255d6', sheet = 'Regional Districts')
 #' }
 #'
 #' @export
23 changes: 20 additions & 3 deletions R/utils.R
@@ -197,7 +197,6 @@ read_from_url <- function(resource, ...){
 if (!reported_format %in% formats_supported()) {
 stop("Reading ", reported_format, " files is not currently supported in bcdata.")
 }
-
 cli <- bcdc_http_client(file_url)

 ## Establish where to download file
@@ -220,10 +219,13 @@ read_from_url <- function(resource, ...){
 # where that's not the case
 message("Reading the data using the ", fun$fun, " function from the ",
 fun$package, " package.")
+handle_excel(tmp, ...)
+
 tryCatch(do.call(fun$fun, list(tmp, ...)),
 error = function(e) {
-stop("Could not read data set. The file can be found here:\n '",
-tmp, "'\n if you would like to try to read it manually.\n\n",
+stop("Reading the data set failed with the following error message:\n\n ", e,
+"\nThe file can be found here:\n '",
+tmp, "'\nif you would like to try to read it manually.\n",
 call. = FALSE)
 })
 }
@@ -281,6 +283,21 @@ handle_zip <- function(x) {
 files
 }

+handle_excel <- function(tmp, ...) {
+if (!is_filetype(tmp, c("xls", "xlsx"))) {
+return(invisible(NULL))
+}
+
+sheets <- readxl::excel_sheets(tmp)
+if (length(sheets) > 1L) {
+message(paste0("\nThis .", tools::file_ext(tmp), " resource contains the following sheets: \n",
+paste0(" '", sheets,"'", collapse = "\n")))
+if (!methods::hasArg("sheet")) {
+message("Defaulting to the '", sheets[1], "' sheet. See ?bcdc_get_data for examples on how to specify a sheet.\n")
+}
+}
+}
+

 unique_temp_dir <- function(pattern = "bcdata_") {
 dir <- tempfile(pattern = pattern)
16 changes: 10 additions & 6 deletions man/bcdc_get_data.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

10 changes: 9 additions & 1 deletion tests/testthat/test-get-data.R
@@ -100,7 +100,7 @@ test_that("fails informatively when can't read a file", {
 skip_on_cran()
 expect_error(bcdc_get_data(record = '523dce9d-b464-44a5-b733-2022e94546c3',
 resource = '4cc98644-f6eb-410b-9df0-f9b2beac9717'),
-"Could not read data set")
+"Reading the data set failed with the following error message:")
 })

 test_that("bcdc_get_data can return the wms resource when it is specified by resource",{
@@ -156,3 +156,11 @@ test_that("bcdc_get_data fails when >1 resource not specified & noninteractive",
 "The record you are trying to access appears to have more than one resource.")
 })

+test_that("bcdc_get_data handles sheet name specification", {
+expect_message(bcdc_get_data('8620ce82-4943-43c4-9932-40730a0255d6'), 'This .xlsx resource contains the following sheets:')
+expect_error(bcdc_get_data('8620ce82-4943-43c4-9932-40730a0255d6', sheet = "foo"), "Error: Sheet 'foo' not found")
+out <- capture.output(bcdc_get_data('8620ce82-4943-43c4-9932-40730a0255d6', sheet = "Single Detached"), type = 'message')
+expect_false(any(grepl('This .xlsx resource contains the following sheets:', out)))
+
+})
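
The sheet reporting added in `handle_excel()` can be tried without network access. The sketch below is not the code from this PR: `report_sheets()` is a hypothetical stand-in that mirrors its logic, it checks the file extension directly instead of calling bcdata's internal `is_filetype()`, it inspects `...` by name rather than using `methods::hasArg()`, and it assumes the multi-sheet example workbook that ships with readxl (`datasets.xlsx`).

```r
# Hypothetical helper for illustration only; not part of bcdata.
report_sheets <- function(path, ...) {
  # Only Excel workbooks have sheets to report
  if (!tolower(tools::file_ext(path)) %in% c("xls", "xlsx")) {
    return(invisible(NULL))
  }

  sheets <- readxl::excel_sheets(path)
  if (length(sheets) > 1L) {
    message("This .", tools::file_ext(path),
            " resource contains the following sheets:\n",
            paste0("  '", sheets, "'", collapse = "\n"))
    # Only announce the default when the caller did not pass `sheet`
    if (!"sheet" %in% names(list(...))) {
      message("Defaulting to the '", sheets[1], "' sheet.")
    }
  }
  invisible(sheets)
}

# Example workbook bundled with readxl (assumed to contain several sheets)
path <- readxl::readxl_example("datasets.xlsx")

# Lists the sheets and notes that the first one is the default
report_sheets(path)

# Lists the sheets but skips the default note, since a sheet was chosen
report_sheets(path, sheet = 2)
dat <- readxl::read_excel(path, sheet = 2)
```

In `bcdc_get_data()` the same `sheet` argument travels through `...` to `readxl::read_excel`, which is why the message about the default sheet is suppressed once a sheet is named.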