Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve documentation for tidy methods #1230

Merged
merged 12 commits into from
Oct 5, 2023
Merged

improve documentation for tidy methods #1230

merged 12 commits into from
Oct 5, 2023

Conversation

EmilHvitfeldt
Copy link
Member

@EmilHvitfeldt EmilHvitfeldt commented Oct 3, 2023

To close #936

This PR does a couple of things:

  • Standardize the tidy section documentation. it uses a list format now
  • Documents the id column (missing for all steps!!!)
  • Fixes various mistakes in spelling of column names, which columns are included or what they discribe
  • Adds documentation for missing steps (11 steps!!)

R/center.R Outdated Show resolved Hide resolved
#'
#' \describe{
#' \item{terms}{character, the selectors or variables selected}
#' \item{value}{character, the location of the cuts}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you would think this should be numeric. but it results characters. related a bit to #1229

R/discretize.R Outdated Show resolved Hide resolved
R/normalize.R Outdated Show resolved Hide resolved
R/pca.R Outdated Show resolved Hide resolved
R/scale.R Outdated Show resolved Hide resolved
R/scale.R Outdated Show resolved Hide resolved
Copy link
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have not had a chance to review fully, so please forgive if I'm missing something, but it seems there's a lot of duplication here. With an eye for maintainability, would we consider writing this output programmatically?

I'd imagine a dictionary containing common definitions of output columns and a function that returns the Rd for a set of column names, with the option to override definitions for new columns or columns with exceptions to usual definitions. broom does this for tidier methods. This would allow us to iterate more quickly on column definitions.

@EmilHvitfeldt
Copy link
Member Author

there is some duplication, e.i. id is always the same, and terms is also almost the same. But I worry the refactor won't be that nice. For example looking at the value. there are 25 instances of its use, with 22 unique values

#> # A tibble: 22 × 2                                                                                                    
#> value                                                              n
#> <chr>                                                          <int>
#>  1 character, the feature names                                    2
#>  2 numeric, the lambda estimate                                    2
#>  3 numeric, value of loading                                       2
#>  4 character, `rename` expression                                  1
#>  5 character, expression passed to `mutate()`                      1
#>  6 character, the factor levels that is used for the new value     1
#>  7 character, the location of the cuts                             1
#>  8 character, the mode value                                       1
#>  9 character, the value of `ref_level`                             1
#> 10 list, a _list column_ with the conversion key                   1
#> # ℹ 12 more rows
#> # ℹ Use `print(n = ...)` to see more rows

@simonpcouch
Copy link
Contributor

Would you be game to post the code you used to make that table?

@EmilHvitfeldt
Copy link
Member Author

We could do something like this:

#' ```{r, echo = FALSE, results="asis"}
#' args <- list(
#'  statistic = "character, name of statistic, mean or sd",
#'  value = "numeric, value statistic"
#' )
#' result <- tidy_documents(args)
#' cat(result)
#' ```

which would replace:

#' # Tidying
#'
#' When you [`tidy()`][tidy.recipe()] this step, a tibble returned with 4
#' columns `terms`, `statistic`, `value` and `id`:
#'
#' \describe{
#'   \item{terms}{character, the selectors or variables selected}
#'   \item{statistic}{character, name of statistic, mean or sd}
#'   \item{value}{numeric, value statistic}
#'   \item{id}{character, id of this step}
#' }

What do you think?

@EmilHvitfeldt
Copy link
Member Author

e to post the code you used to make that table?

library(tidyverse)

fs::dir_ls("R") |>
  map(read_lines) |>
  unlist() |>
  str_subset("\\item\\{value\\}") |>
  str_remove(".{18}") |>
  str_sub(end = -2) |>
  tibble() |>
  set_names("value") |>
  count(value, sort = TRUE)

@simonpcouch
Copy link
Contributor

Okay, this is convincing enough for me that we need not add any machinery here. :)

fs::dir_ls("R") |>
  map(read_lines) |>
  unlist() |>
  str_subset("\\item\\{") |>
  str_sub(end = -2) |>
  tibble() %>%
  rename(text = 1) %>%
  separate(text, c("column", "description"), "\\}\\{") %>%
  mutate(column = str_replace(column, fixed("#'   \\item{"), "")) %>%
  group_by(column) %>%
  count(description, sort = TRUE)
#> # A tibble: 81 × 3                                                                                                          
#> # Groups:   column [44]
#>    column    description                                        n
#>    <chr>     <chr>                                          <int>
#>  1 id        character, id of this step                        92
#>  2 terms     character, the selectors or variables selected    88
#>  3 columns   character, names of resulting columns              3
#>  4 component character, name of component                       3
#>  5 class     character, name of class variable                  2
#>  6 value     character, the feature names                       2
#>  7 value     numeric, the lambda estimate                       2
#>  8 value     numeric, value of loading                          2
#>  9 base      numeric, value for the base                        1
#> 10 class     character, name of the class                       1
#> # ℹ 71 more rows

Would be nice just for those two columns but cumbersome for the rest.

I'll defer to you on whether you want to try tidy_documents() before I review properly, but no pressure from me on making changes before I do so. What do you think?

@EmilHvitfeldt
Copy link
Member Author

lets try without the refactor 😄

Copy link
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bravo! These changes are definitely an improvement. Here's a first pass at review. :)

At first, I tried to navigate to these docs via e.g. ?tidy.step_harmonicrather than ?step_harmonic. The former goes to tidy.recipe which seems less likely to be what the user is searching for than step_harmonic's docs. Should we redirect that alias there?

NEWS.md Outdated Show resolved Hide resolved
R/BoxCox.R Outdated Show resolved Hide resolved
R/BoxCox.R Outdated Show resolved Hide resolved
R/classdist_shrunken.R Outdated Show resolved Hide resolved
R/classdist_shrunken.R Outdated Show resolved Hide resolved
R/normalize.R Outdated Show resolved Hide resolved
R/pls.R Outdated Show resolved Hide resolved
R/profile.R Outdated Show resolved Hide resolved
R/other.R Outdated Show resolved Hide resolved
R/relevel.R Show resolved Hide resolved
@EmilHvitfeldt
Copy link
Member Author

At first, I tried to navigate to these docs via e.g. ?tidy.step_harmonicrather than ?step_harmonic. The former goes to tidy.recipe which seems less likely to be what the user is searching for than step_harmonic's docs. Should we redirect that alias there?

Right now, all the tidy methods link to the same place, tidy.recipe as you noted. We found it messy that that the tidy methods were listed next to the steps on pkgdown. so now they live alone:

Screenshot 2023-10-04 at 12 11 20 PM

I just learned that you could do ?tidy.step_center() without loading the namespace, since it didn't autocomplete i figured it didn't work.

We could split out the tidy documentation to their own. It would be to be a page for each tidy method. And we should adequately get linked from the main step. Frankly the tidy documentation isn't great. I'm afraid that this would cause more duplication then what its worth.

But I also don't think people use the tidy() methods that much. and I don't know if it is because they don't need them, or because they don't know they exists

EmilHvitfeldt and others added 4 commits October 4, 2023 12:38
Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>
Copy link
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wooo! Thumbs up from me to merge once you feel comments have been addressed. I think these are a big improvement.

R/BoxCox.R Show resolved Hide resolved
R/center.R Outdated Show resolved Hide resolved
R/dummy_extract.R Show resolved Hide resolved
R/filter.R Show resolved Hide resolved
R/impute_mean.R Outdated Show resolved Hide resolved
R/impute_median.R Outdated Show resolved Hide resolved
R/relevel.R Outdated Show resolved Hide resolved
R/slice.R Outdated Show resolved Hide resolved
R/tidy.R Outdated Show resolved Hide resolved
R/tidy.R Outdated Show resolved Hide resolved
@EmilHvitfeldt EmilHvitfeldt merged commit ece92e5 into main Oct 5, 2023
9 checks passed
@EmilHvitfeldt EmilHvitfeldt deleted the document-tidy branch October 5, 2023 20:56
@github-actions
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Oct 20, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make sure all tidy.step_* are documented
2 participants