-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve documentation for tidy methods #1230
Conversation
#' | ||
#' \describe{ | ||
#' \item{terms}{character, the selectors or variables selected} | ||
#' \item{value}{character, the location of the cuts} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you would think this should be numeric. but it results characters. related a bit to #1229
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have not had a chance to review fully, so please forgive if I'm missing something, but it seems there's a lot of duplication here. With an eye for maintainability, would we consider writing this output programmatically?
I'd imagine a dictionary containing common definitions of output columns and a function that returns the Rd for a set of column names, with the option to override definitions for new columns or columns with exceptions to usual definitions. broom does this for tidier methods. This would allow us to iterate more quickly on column definitions.
there is some duplication, e.i. #> # A tibble: 22 × 2
#> value n
#> <chr> <int>
#> 1 character, the feature names 2
#> 2 numeric, the lambda estimate 2
#> 3 numeric, value of loading 2
#> 4 character, `rename` expression 1
#> 5 character, expression passed to `mutate()` 1
#> 6 character, the factor levels that is used for the new value 1
#> 7 character, the location of the cuts 1
#> 8 character, the mode value 1
#> 9 character, the value of `ref_level` 1
#> 10 list, a _list column_ with the conversion key 1
#> # ℹ 12 more rows
#> # ℹ Use `print(n = ...)` to see more rows
|
Would you be game to post the code you used to make that table? |
We could do something like this: #' ```{r, echo = FALSE, results="asis"}
#' args <- list(
#' statistic = "character, name of statistic, mean or sd",
#' value = "numeric, value statistic"
#' )
#' result <- tidy_documents(args)
#' cat(result)
#' ``` which would replace: #' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble returned with 4
#' columns `terms`, `statistic`, `value` and `id`:
#'
#' \describe{
#' \item{terms}{character, the selectors or variables selected}
#' \item{statistic}{character, name of statistic, mean or sd}
#' \item{value}{numeric, value statistic}
#' \item{id}{character, id of this step}
#' } What do you think? |
library(tidyverse)
fs::dir_ls("R") |>
map(read_lines) |>
unlist() |>
str_subset("\\item\\{value\\}") |>
str_remove(".{18}") |>
str_sub(end = -2) |>
tibble() |>
set_names("value") |>
count(value, sort = TRUE) |
Okay, this is convincing enough for me that we need not add any machinery here. :) fs::dir_ls("R") |>
map(read_lines) |>
unlist() |>
str_subset("\\item\\{") |>
str_sub(end = -2) |>
tibble() %>%
rename(text = 1) %>%
separate(text, c("column", "description"), "\\}\\{") %>%
mutate(column = str_replace(column, fixed("#' \\item{"), "")) %>%
group_by(column) %>%
count(description, sort = TRUE)
#> # A tibble: 81 × 3
#> # Groups: column [44]
#> column description n
#> <chr> <chr> <int>
#> 1 id character, id of this step 92
#> 2 terms character, the selectors or variables selected 88
#> 3 columns character, names of resulting columns 3
#> 4 component character, name of component 3
#> 5 class character, name of class variable 2
#> 6 value character, the feature names 2
#> 7 value numeric, the lambda estimate 2
#> 8 value numeric, value of loading 2
#> 9 base numeric, value for the base 1
#> 10 class character, name of the class 1
#> # ℹ 71 more rows Would be nice just for those two columns but cumbersome for the rest. I'll defer to you on whether you want to try |
lets try without the refactor 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bravo! These changes are definitely an improvement. Here's a first pass at review. :)
At first, I tried to navigate to these docs via e.g. ?tidy.step_harmonic
rather than ?step_harmonic
. The former goes to tidy.recipe
which seems less likely to be what the user is searching for than step_harmonic
's docs. Should we redirect that alias there?
Right now, all the tidy methods link to the same place, I just learned that you could do We could split out the tidy documentation to their own. It would be to be a page for each tidy method. And we should adequately get linked from the main step. Frankly the tidy documentation isn't great. I'm afraid that this would cause more duplication then what its worth. But I also don't think people use the |
Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wooo! Thumbs up from me to merge once you feel comments have been addressed. I think these are a big improvement.
Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>
This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue. |
To close #936
This PR does a couple of things:
id
column (missing for all steps!!!)