Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

handle excel sheets #190 #191

Merged
merged 11 commits into from
May 13, 2020
Merged

handle excel sheets #190 #191

merged 11 commits into from
May 13, 2020

Conversation

boshek
Copy link
Collaborator

@boshek boshek commented May 11, 2020

Here is what happens at the moment:

library(bcdata)
#> Attaching package: 'bcdata'
#> The following object is masked from 'package:stats':
#> 
#>     filter

bcdc_get_data('8620ce82-4943-43c4-9932-40730a0255d6')
#> Reading the data using the read_xlsx function from the readxl package.
#> 
#> This .xlsx resource contains the following sheets: 
#>  'Notes'
#>  'Regional Districts'
#>  'Single Detached'
#>  'Multi Unit Homes'
#>  'Purpose Built Rental'
#> Defaulting to the 'Notes' sheet. See ?bcdc_get_data for examples on how to specify a sheet.
#> # A tibble: 0 x 0


bcdc_get_data('8620ce82-4943-43c4-9932-40730a0255d6', sheet = "Single Detached")
#> Reading the data using the read_xlsx function from the readxl package.
#> 
#> This .xlsx resource contains the following sheets: 
#>  'Notes'
#>  'Regional Districts'
#>  'Single Detached'
#>  'Multi Unit Homes'
#>  'Purpose Built Rental'
#> Defaulting to the 'Notes' sheet. See ?bcdc_get_data for examples on how to specify a sheet.
#> New names:
#> * `` -> ...2
#> * `` -> ...3
#> * `` -> ...4
#> * `` -> ...5
#> * `` -> ...6
#> # A tibble: 162 x 6
#>    `Single Detached Homes by~ ...2      ...3  ...4  ...5  ...6                  
#>    <chr>                      <chr>     <chr> <chr> <lgl> <chr>                 
#>  1 Community                  Single D~ <NA>  <NA>  NA    Note:                 
#>  2 <NA>                       2016      2017  2018  NA    * For privacy reasons~
#>  3 100 Mile House             25        16    19    NA    **                    
#>  4 Abbotsford                 336       252   270   NA    Central Saanich inclu~
#>  5 Alert Bay                  *         *     *     NA    Delta includes Tsawwa~
#>  6 Anmore                     18        18    6     NA    Kent includes Agassiz~
#>  7 Armstrong                  35        43    34    NA    Langley includes Lang~
#>  8 Ashcroft                   *         *     *     NA    North Cowichan includ~
#>  9 Barriere                   *         *     12    NA    North Vancouver inclu~
#> 10 Belcarra                   9         *     5     NA    West  Kelowna include~
#> # ... with 152 more rows

Created on 2020-05-11 by the reprex package (v0.3.0)

I think that if a user specifies the sheet argument that bcdc_get_data should be silent. We are trying to warn those who don't know about this. What do you think @ateucher ?

@boshek boshek added the enhancement New feature or request label May 11, 2020
@ateucher
Copy link
Collaborator

It's a thing of beauty - great work! Yes, I think for sure we want to suppress the warning if the sheet argument is passed

@boshek
Copy link
Collaborator Author

boshek commented May 12, 2020

I hadn't meant to use that language but maybe it should be a warning rather than a message?

@stephhazlitt
Copy link
Member

Is it still handy for the user to have have the other sheets printed to the console? e.g. everything else except Defaulting to ...

@boshek
Copy link
Collaborator Author

boshek commented May 12, 2020

My thinking is that if someone is already specifying a sheet they likely are aware that multiple sheets exist. That sheet in that resource in that record is the end point of their data discovery journey. At that point it feels we should keep quiet in the console since they don't need any more help. I am just conscious that bcdata is already quite chatty.

@ateucher
Copy link
Collaborator

I think both points are good... I think one point in favour of Steph's argument is that if you want a different sheet you need to run the function again without the sheet argument, so could be extra function calls just for that. We could think about adding the info to bcdc_get_record() but that might be one step too far into a bit of a corner case...

@boshek
Copy link
Collaborator Author

boshek commented May 12, 2020

I do like the idea of providing users some way of accessing the names of the sheets. Would a function be overkill? bcdc_excel_sheets? If folks have to iterate over a file like that it would be good to provide them with that vector.

@boshek boshek marked this pull request as ready for review May 12, 2020 21:06
@boshek
Copy link
Collaborator Author

boshek commented May 12, 2020

I think the conversation of providing a method to programmatically access sheet names could be deferred to a) the first time a user requests it or b) how the process more fully fleshed out during the writing of #192. With that in mind, maybe we could merge this to provide basic functionality?

@boshek boshek requested review from ateucher and stephhazlitt May 12, 2020 21:08
@boshek
Copy link
Collaborator Author

boshek commented May 12, 2020

/document

@stephhazlitt
Copy link
Member

I wonder about expanding bcdc_tidy_resources?

Copy link
Collaborator

@ateucher ateucher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one little comment, otherwise looks great!

R/utils.R Outdated Show resolved Hide resolved
Co-authored-by: Andy Teucher <andy.teucher@gov.bc.ca>
@boshek
Copy link
Collaborator Author

boshek commented May 13, 2020

@stephhazlitt I like that idea too. The problem is that there is no way to know which sheets are present without downloading each excel file. If we had a record like this we'd have to download each file just to generate that table. Then we would have to redownload when we choose the one we wanted. Seems like a lot of downloading. It really highlights why excel files with multiple sheets aren't ideal.

@ateucher
Copy link
Collaborator

Following from my last question, the original error message is swallowed by tryCatch, so I added it back in c1a1d80. Now looks like:

> bcdc_get_data('8620ce82-4943-43c4-9932-40730a0255d6', sheet = "foo")
Reading the data using the read_xlsx function from the readxl package.
Error: Reading the data set failed with the following error message:
  Error: Sheet 'foo' not found

The file can be found here:
  '/var/folders/2w/x5wq73f93yzgm7hjr_b_54q00000gp/T//RtmpVj3ASz/bcdata_cf8f2aea4253/filecf8f79a86d27.xlsx'
if you would like to try to read it manually.

@stephhazlitt
Copy link
Member

I agree with merging this in now @boshek, good call.

I still like the idea that bcdc_get_data() prints the sheet names even when the user passes a sheet name (and thus likely knows the .xlsx file design). But I am chatty, and especially enjoying chatting w/ my console these days. 🤣

@boshek boshek changed the title handle excel sheets WIP #190 handle excel sheets #190 May 13, 2020
@boshek boshek merged commit ad4c686 into master May 13, 2020
@boshek boshek deleted the xl_sheets branch May 13, 2020 20:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants