Improve docs for data_to_wide #506

Merged 30 commits on May 31, 2024
Changes from 12 commits
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,7 +1,7 @@
Type: Package
Package: datawizard
Title: Easy Data Wrangling and Statistical Transformations
-Version: 0.10.0.4
+Version: 0.10.0.5
Authors@R: c(
person("Indrajeet", "Patil", , "patilindrajeet.science@gmail.com", role = "aut",
comment = c(ORCID = "0000-0003-1995-6531", Twitter = "@patilindrajeets")),
2 changes: 2 additions & 0 deletions NEWS.md
@@ -23,6 +23,8 @@ CHANGES
If you recode into a numeric variable, and one of the recode values is `NA`,
you no longer need to use `NA_real_` for numeric `NA` values.

+* Improved documentation for some functions.
+
BUG FIXES

* `data_to_long()` did not work for data frame where columns had attributes
23 changes: 10 additions & 13 deletions R/data_read.R
@@ -161,7 +161,7 @@
# user may decide whether we automatically detect variable type or not
if (isTRUE(convert_factors)) {
if (verbose) {
msg <- "Variables where all values have associated labels are now converted into factors. If this is not intended, use `convert_factors = FALSE`."

Check warning on line 164 in R/data_read.R (GitHub Actions / lint and lint-changed-files): file=R/data_read.R,line=164,col=121,[line_length_linter] Lines should not be more than 120 characters. This line is 152 characters.
insight::format_alert(msg)
}
x[] <- lapply(x, function(i) {
@@ -178,20 +178,18 @@
if (is.character(i)) {
# we need this to drop haven-specific class attributes
i <- as.character(i)
-} else {
+} else if (!is.null(value_labels) && length(value_labels) == insight::n_unique(i)) {
# if all values are labelled, we assume factor. Use labels as levels
-if (!is.null(value_labels) && length(value_labels) == insight::n_unique(i)) {
-if (is.numeric(i)) {
-i <- factor(i, labels = names(value_labels))
-} else {
-i <- factor(as.character(i), labels = names(value_labels))
-}
-value_labels <- NULL
-attr(i, "converted_to_factor") <- TRUE
+if (is.numeric(i)) {
+i <- factor(i, labels = names(value_labels))
} else {
-# else, fall back to numeric
-i <- as.numeric(i)
+i <- factor(as.character(i), labels = names(value_labels))
}
+value_labels <- NULL
+attr(i, "converted_to_factor") <- TRUE
+} else {
+# else, fall back to numeric
+i <- as.numeric(i)
}

# drop unused value labels
@@ -310,9 +308,8 @@
)
}
return(out)
-} else {
-out <- tmp
}
+out <- tmp
}

.post_process_imported_data(out, convert_factors, verbose)
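
As a standalone illustration of the branching in the refactored `@@ -178,20 +178,18 @@` hunk above, the base-R sketch below applies the same three-way rule to a single vector. It is not the package's code: the helper name `convert_labelled()` is hypothetical, the layout of the `labels` attribute is an assumption, and `length(unique(...))` on non-missing values stands in for `insight::n_unique()`.

```r
# Hypothetical helper, for illustration only (not part of datawizard).
convert_labelled <- function(i, value_labels = attr(i, "labels")) {
  if (is.character(i)) {
    # character vectors stay character; as.character() drops extra attributes
    return(as.character(i))
  }
  n_values <- length(unique(i[!is.na(i)]))
  if (!is.null(value_labels) && length(value_labels) == n_values) {
    # every distinct value has a label: use the label names as factor levels
    return(factor(i, labels = names(value_labels)))
  }
  # otherwise fall back to numeric
  as.numeric(i)
}

# Assumed input: a numeric vector with a named "labels" attribute
x <- c(1, 2, 2, 1)
attr(x, "labels") <- c(low = 1, high = 2)
f <- convert_labelled(x)

# Only one of three values is labelled, so this stays numeric
y <- c(1, 2, 3)
attr(y, "labels") <- c(low = 1)
g <- convert_labelled(y)
```

Flattening the nested `if` into an `else if` chain, as the hunk does, keeps each of the three outcomes (character, factor, numeric) at the same nesting level.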
1 change: 1 addition & 0 deletions R/data_restoretype.R
@@ -1,5 +1,6 @@
#' Restore the type of columns according to a reference data frame
#'
+#' @param data A data frame for which to restore the column types.
#' @inheritParams data_to_long
#' @inheritParams data_rename
#' @param reference A reference data frame from which to find the correct
72 changes: 51 additions & 21 deletions R/data_to_long.R
@@ -4,65 +4,95 @@
#' the number of columns. This is a dependency-free base-R equivalent of
#' `tidyr::pivot_longer()`.
#'
-#' @param data A data frame to pivot.
-#' @param names_to The name of the new column that will contain the column
-#' names.
+#' @param data A data frame to convert to long format, so that it has more
+#' rows and fewer columns post-lengthening than pre-lengthening.
+#' @param names_to The name of the new column (variable) that will contain the
+#' _names_ from columns in `select` as values, to identify the source of the
+#' values.
#' @param names_prefix A regular expression used to remove matching text from
#' the start of each variable name.
#' @param names_sep,names_pattern If `names_to` contains multiple values, this
#' argument controls how the column name is broken up.
#' `names_pattern` takes a regular expression containing matching groups, i.e. "()".
-#' @param values_to The name of the new column that will contain the values of
-#' the pivoted variables.
+#' @param values_to The name of the new column that will contain the _values_ of
+#' the columns in `select`.
#' @param values_drop_na If `TRUE`, will drop rows that contain only `NA` in the
-#' `values_to` column. This effectively converts explicit missing values to
-#' implicit missing values, and should generally be used only when missing values
-#' in data were created by its structure.
+#' `values_to` column. This effectively converts explicit missing values to
+#' implicit missing values, and should generally be used only when missing values
+#' in data were created by its structure.
#' @param rows_to The name of the column that will contain the row names or row
-#' numbers from the original data. If `NULL`, will be removed.
+#' numbers from the original data. If `NULL`, will be removed.
#' @param ... Currently not used.
#' @inheritParams extract_column_names
#' @param cols Identical to `select`. This argument is here to ensure compatibility
-#' with `tidyr::pivot_longer()`. If both `select` and `cols` are provided, `cols`
-#' is used.
+#' with `tidyr::pivot_longer()`. If both `select` and `cols` are provided, `cols`
+#' is used.
#'
+#' @details
+#' Reshaping data into long format usually means that the input data frame is
+#' in _wide_ format, where multiple measurements taken on the same subject are
+#' stored in multiple columns (variables). The long format stores the same
+#' information in a single column, with each measurement per subject stored in
+#' a separate row. All variables that are not in `select` will be repeated for
+#' each row that is lengthened.
+#'
+#' The necessary information for `data_to_long()` is:
+#'
+#' - The columns that contain the repeated measurements (`select`).
+#' - The name of the newly created column that will contain the names of the
+#' columns in `select` (`names_to`), to identify the source of the values.
+#' - The name of the newly created column that contains the values of the
+#' columns in `select` (`values_to`).
+#'
+#' In other words: Repeated measurements that are spread across several columns
+#' will be gathered into a single column (`values_to`), with the original column
+#' names, that identify the source of the gathered values, stored in a new column
+#' (`names_to`).
+#'
#' @return If a tibble was provided as input, `reshape_longer()` also returns a
#' tibble. Otherwise, it returns a data frame.
#'
#' @examplesIf requireNamespace("psych") && requireNamespace("tidyr")
-#' wide_data <- data.frame(replicate(5, rnorm(10)))
+#' wide_data <- setNames(
+#' data.frame(replicate(2, rnorm(8))),
+#' c("Time1", "Time2")
+#' )
+#' wide_data$ID <- 1:8
+#' wide_data
#'
-#' # Default behaviour (equivalent to tidyr::pivot_longer(wide_data, cols = 1:5))
+#' # Default behaviour (equivalent to tidyr::pivot_longer(wide_data, cols = 1:3))
+#' # probably doesn't make much sense to mix "time" and "id"
#' data_to_long(wide_data)
#'
#' # Customizing the names
-#' data_to_long(wide_data,
-#' select = c(1, 2),
-#' names_to = "Column",
-#' values_to = "Numbers",
-#' rows_to = "Row"
+#' data_to_long(
+#' wide_data,
+#' select = c("Time1", "Time2"),
+#' names_to = "Timepoint",
+#' values_to = "Score"
#' )
#'
#' # Full example
#' # ------------------
#' data <- psych::bfi # Wide format with one row per participant's personality test
#'
#' # Pivot long format
-#' data_to_long(data,
+#' very_long_data <- data_to_long(data,
#' select = regex("\\d"), # Select all columns that contain a digit
#' names_to = "Item",
#' values_to = "Score",
#' rows_to = "Participant"
#' )
+#' head(very_long_data)
#'
-#' data_to_long(
+#' even_longer_data <- data_to_long(
#' tidyr::who,
#' select = new_sp_m014:newrel_f65,
#' names_to = c("diagnosis", "gender", "age"),
#' names_pattern = "new_?(.*)_(.)(.*)",
#' values_to = "count"
#' )
#'
+#' head(even_longer_data)
#' @inherit data_rename
#' @export
data_to_long <- function(data,
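
The reshaping described in the new `@details` section can be pictured with a small base-R sketch. This is only an illustration of the idea, not how `data_to_long()` is implemented: `gather_columns()` is a hypothetical helper, and its row order (stacked column by column) is not meant to match the package's output.

```r
# Hypothetical helper, for illustration only (not part of datawizard).
# Gathers the columns named in `select` into one value column, and records
# the source column name in a second new column.
gather_columns <- function(data, select, names_to = "name", values_to = "value") {
  # columns not in `select` are repeated for every gathered column
  keep <- setdiff(names(data), select)
  pieces <- lapply(select, function(col) {
    out <- data[, keep, drop = FALSE]
    out[[names_to]] <- col           # source of the values
    out[[values_to]] <- data[[col]]  # the values themselves
    out
  })
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

wide_data <- data.frame(
  ID = 1:3,
  Time1 = c(0.1, 0.2, 0.3),
  Time2 = c(1.1, 1.2, 1.3)
)

long_data <- gather_columns(
  wide_data,
  select = c("Time1", "Time2"),
  names_to = "Timepoint",
  values_to = "Score"
)
long_data
```

With three subjects and two time columns, the result has six rows: `ID` is repeated once per gathered column, `Timepoint` holds the original column names, and `Score` holds the measurements.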