improve documentation for tidy methods #1230

EmilHvitfeldt · 2023-10-03T21:20:03Z

To close #936

This PR does a couple of things:

Standardizes the tidy section documentation; it now uses a list format
Documents the id column (missing for all steps!!!)
Fixes various mistakes in the spelling of column names, which columns are included, or what they describe
Adds documentation for missing steps (11 steps!!)

R/center.R

EmilHvitfeldt · 2023-10-03T21:22:07Z

R/cut.R

+#'
+#' \describe{
+#'   \item{terms}{character, the selectors or variables selected}
+#'   \item{value}{character, the location of the cuts}


You would think this should be numeric, but it results in characters. Somewhat related to #1229.

R/discretize.R

R/normalize.R

R/pca.R

R/scale.R

simonpcouch

Have not had a chance to review fully, so please forgive if I'm missing something, but it seems there's a lot of duplication here. With an eye for maintainability, would we consider writing this output programmatically?

I'd imagine a dictionary containing common definitions of output columns and a function that returns the Rd for a set of column names, with the option to override definitions for new columns or columns with exceptions to usual definitions. broom does this for tidier methods. This would allow us to iterate more quickly on column definitions.

EmilHvitfeldt · 2023-10-04T17:43:24Z

There is some duplication, i.e. id is always the same, and terms is almost always the same. But I worry the refactor won't be that nice. For example, looking at value: there are 25 instances of its use, with 22 unique descriptions.

#> # A tibble: 22 × 2
#> value                                                              n
#> <chr>                                                          <int>
#>  1 character, the feature names                                    2
#>  2 numeric, the lambda estimate                                    2
#>  3 numeric, value of loading                                       2
#>  4 character, `rename` expression                                  1
#>  5 character, expression passed to `mutate()`                      1
#>  6 character, the factor levels that is used for the new value     1
#>  7 character, the location of the cuts                             1
#>  8 character, the mode value                                       1
#>  9 character, the value of `ref_level`                             1
#> 10 list, a _list column_ with the conversion key                   1
#> # ℹ 12 more rows
#> # ℹ Use `print(n = ...)` to see more rows

simonpcouch · 2023-10-04T17:46:35Z

Would you be game to post the code you used to make that table?

EmilHvitfeldt · 2023-10-04T17:47:30Z

We could do something like this:

#' ```{r, echo = FALSE, results="asis"}
#' args <- list(
#'  statistic = "character, name of statistic, mean or sd",
#'  value = "numeric, value statistic"
#' )
#' result <- tidy_documents(args)
#' cat(result)
#' ```

which would replace:

#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble returned with 4
#' columns `terms`, `statistic`, `value` and `id`:
#'
#' \describe{
#'   \item{terms}{character, the selectors or variables selected}
#'   \item{statistic}{character, name of statistic, mean or sd}
#'   \item{value}{numeric, value statistic}
#'   \item{id}{character, id of this step}
#' }

What do you think?
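For illustration, here is a minimal sketch of what such a helper could look like. `tidy_documents()` does not exist in recipes; the body below, including the shared `terms` and `id` defaults, is an assumption built from the wording used in this PR. It only emits the `\describe{}` block, not the heading or the introductory sentence.

```r
# Hypothetical helper, not part of recipes: builds the Rd \describe{} block
# for a tidy method from a named list of step-specific column descriptions.
tidy_documents <- function(args) {
  # Definitions shared by almost every tidy method (wording taken from this PR).
  shared <- list(
    terms = "character, the selectors or variables selected",
    id = "character, id of this step"
  )

  # An entry in `args` with the same name overrides the shared definition.
  overrides <- intersect(names(args), names(shared))
  shared[overrides] <- args[overrides]

  # Step-specific columns go between `terms` and `id`.
  extra <- args[setdiff(names(args), names(shared))]
  cols <- c(shared["terms"], extra, shared["id"])

  items <- sprintf("  \\item{%s}{%s}", names(cols), unlist(cols))
  paste(c("\\describe{", items, "}"), collapse = "\n")
}

args <- list(
  statistic = "character, name of statistic, mean or sd",
  value = "numeric, value statistic"
)
cat(tidy_documents(args))
```

Whether the `asis` chunk output is then processed like hand-written roxygen text depends on the package's roxygen2 markdown setup, so that part would need checking before relying on it.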

EmilHvitfeldt · 2023-10-04T17:47:53Z

Would you be game to post the code you used to make that table?

library(tidyverse)

fs::dir_ls("R") |>
  map(read_lines) |>
  unlist() |>
  str_subset("\\item\\{value\\}") |>
  str_remove(".{18}") |>
  str_sub(end = -2) |>
  tibble() |>
  set_names("value") |>
  count(value, sort = TRUE)

simonpcouch · 2023-10-04T18:06:40Z

Okay, this is convincing enough for me that we need not add any machinery here. :)

fs::dir_ls("R") |>
  map(read_lines) |>
  unlist() |>
  str_subset("\\item\\{") |>
  str_sub(end = -2) |>
  tibble() %>%
  rename(text = 1) %>%
  separate(text, c("column", "description"), "\\}\\{") %>%
  mutate(column = str_replace(column, fixed("#'   \\item{"), "")) %>%
  group_by(column) %>%
  count(description, sort = TRUE)
#> # A tibble: 81 × 3
#> # Groups:   column [44]
#>    column    description                                        n
#>    <chr>     <chr>                                          <int>
#>  1 id        character, id of this step                        92
#>  2 terms     character, the selectors or variables selected    88
#>  3 columns   character, names of resulting columns              3
#>  4 component character, name of component                       3
#>  5 class     character, name of class variable                  2
#>  6 value     character, the feature names                       2
#>  7 value     numeric, the lambda estimate                       2
#>  8 value     numeric, value of loading                          2
#>  9 base      numeric, value for the base                        1
#> 10 class     character, name of the class                       1
#> # ℹ 71 more rows

Would be nice just for those two columns but cumbersome for the rest.

I'll defer to you on whether you want to try tidy_documents() before I review properly, but no pressure from me on making changes before I do so. What do you think?

EmilHvitfeldt · 2023-10-04T18:10:54Z

Let's try without the refactor 😄

simonpcouch

Bravo! These changes are definitely an improvement. Here's a first pass at review. :)

At first, I tried to navigate to these docs via e.g. ?tidy.step_harmonic rather than ?step_harmonic. The former goes to tidy.recipe, which seems less likely to be what the user is searching for than step_harmonic's docs. Should we redirect that alias there?

NEWS.md

R/BoxCox.R

R/classdist_shrunken.R

R/normalize.R

R/pls.R

R/profile.R

R/other.R

R/relevel.R

EmilHvitfeldt · 2023-10-04T19:17:58Z

At first, I tried to navigate to these docs via e.g. ?tidy.step_harmonic rather than ?step_harmonic. The former goes to tidy.recipe, which seems less likely to be what the user is searching for than step_harmonic's docs. Should we redirect that alias there?

Right now, all the tidy methods link to the same place, tidy.recipe, as you noted. We found it messy that the tidy methods were listed next to the steps on pkgdown, so now they live on their own.

I just learned that you can do ?tidy.step_center() without loading the namespace. Since it didn't autocomplete, I figured it didn't work.

We could split the tidy documentation out into its own pages. There would have to be a page for each tidy method, and each should be adequately linked from the main step. Frankly, the tidy documentation isn't great. I'm afraid that this would cause more duplication than it's worth.

But I also don't think people use the tidy() methods that much, and I don't know if it is because they don't need them or because they don't know they exist.

Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>

simonpcouch

Wooo! Thumbs up from me to merge once you feel comments have been addressed. I think these are a big improvement.

R/BoxCox.R

R/center.R

R/dummy_extract.R

R/filter.R

R/impute_mean.R

R/impute_median.R

R/relevel.R

R/slice.R

R/tidy.R

Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>

github-actions · 2023-10-20T00:24:46Z

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

EmilHvitfeldt added 2 commits October 3, 2023 14:16

improve documentation for tidy methods

dd4df83

add news

7b67556

EmilHvitfeldt commented Oct 3, 2023

EmilHvitfeldt and others added 2 commits October 3, 2023 14:34

Apply suggestions from code review

410cf38

reknit

d7ce127

EmilHvitfeldt requested a review from simonpcouch October 3, 2023 21:35

simonpcouch reviewed Oct 4, 2023

simonpcouch requested changes Oct 4, 2023

EmilHvitfeldt and others added 4 commits October 4, 2023 12:38

add missing is

a0e9d39

oxford comma

ba6b743

Apply suggestions from code review

88ac4af

Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>

reknit

701f5df

simonpcouch approved these changes Oct 5, 2023

EmilHvitfeldt and others added 4 commits October 5, 2023 13:21

dont document number of columns

48a2b4f

document terms in step_filter

3b61af0

Apply suggestions from code review

e5d5340

Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>

redocument

f001888

EmilHvitfeldt merged commit ece92e5 into main Oct 5, 2023
9 checks passed

EmilHvitfeldt deleted the document-tidy branch October 5, 2023 20:56

github-actions bot locked and limited conversation to collaborators Oct 20, 2023
