Improve docs for data_to_wide #506

strengejacke · 2024-05-20T09:32:29Z

Background: https://x.com/stephenjwild/status/1792294171924967920
I also recently wanted to use data_to_wide() and couldn't get it to work at first; it took some time to figure out the logic behind reshaping to wide again. In particular, the documentation of id_cols was confusing.

etiennebacher

Thanks, those are nice docs improvements. However, I disagree with renaming id_cols to by. I don't think it makes things any clearer; instead it leads to confusion, since those two functions are clearly based on the tidyr::pivot_*() functions, so departing from them for only one argument is strange.

Regarding the tweet, I only have access to the first one since I don't have an X account, so maybe there are more explanations, but to me those pivot_*() functions made pivoting/unpivoting so much easier to understand than the reshape2 or data.table approaches.

R/data_read.R

R/data_to_wide.R

R/data_to_long.R

strengejacke · 2024-05-21T16:50:12Z

but to me those pivot_*() functions made pivoting/unpivoting so much easier to understand

True, but I think by makes it even clearer. Across our easystats packages, by refers to a variable/column that indicates "grouping". Reshaping to wide format commonly, but not always, identifies "repeated measurements" by an "ID", or group. And id_cols suggests plural, although often only one column is selected (that was also one of the criticisms on Twitter). Thus, id_cols is not even the best choice from a grammar perspective, too ;-)
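To make the "repeated measurements identified by an ID" case concrete, here is a minimal sketch. It assumes the datawizard package is installed and uses the id_cols, names_from and values_from arguments discussed in this thread; the data frame and its column names (id, time, score) are made up for illustration.

```r
library(datawizard)

# Hypothetical long-format data: two measurements per person
long <- data.frame(
  id = c(1, 1, 2, 2),
  time = c("t1", "t2", "t1", "t2"),
  score = c(10, 12, 20, 18)
)

# `id_cols` names the column that identifies which rows belong together;
# the values of `time` become new column names, filled with `score`
wide <- data_to_wide(
  long,
  id_cols = "id",
  names_from = "time",
  values_from = "score"
)

wide
```

The result should contain one row per id, with the two time points spread into separate columns. This is the sense in which id_cols acts like a grouping variable in the discussion above.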

etiennebacher · 2024-05-24T12:56:06Z

By @strengejacke in a review comment:

I personally prefer by, because it's in line with renaming the other arguments across functions to by. Especially since by refers to a "grouping" variable, and imho, by makes clearer that the specified variable is used for gathering those rows ("groups") that are spread as new columns.

However, since we already allow select (easystats-name) and cols (tidyverse-name), we could probably use both argument names here, too?

etiennebacher · 2024-05-24T13:03:22Z

However, since we already allow select (easystats-name) and cols (tidyverse-name), we could probably use both argument names here, too?

I think we should deprecate and then remove one of the two. Those pivot functions already have a lot of args so I suggest we avoid adding bloat with aliases.
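As an illustration of the "deprecate, then remove" route, here is a minimal sketch of how an argument alias can be phased out with a warning. The helper resolve_alias() is hypothetical and not datawizard's actual implementation; the names select and cols are used only as an example, since the thread does not say which of the two would be kept.

```r
# Hypothetical helper, not datawizard's actual code.
# `preferred` and `alias` hold the values passed to the two argument names.
resolve_alias <- function(preferred = NULL, alias = NULL,
                          preferred_name = "select", alias_name = "cols") {
  # Nothing passed via the deprecated name: use the preferred argument as is
  if (is.null(alias)) {
    return(preferred)
  }
  warning(
    sprintf(
      "Argument `%s` is deprecated. Please use `%s` instead.",
      alias_name, preferred_name
    ),
    call. = FALSE
  )
  # An explicitly supplied preferred argument wins over the alias
  if (is.null(preferred)) alias else preferred
}

# An old-style call still works during the deprecation period, but warns
cols_used <- suppressWarnings(resolve_alias(alias = c("a", "b")))
```

Once the deprecation period is over, the alias argument and this helper can be deleted, leaving a single argument name in the function signature.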

Thus, id_cols is not even the best choice from a grammar perspective, too ;-)

I don't think there is a best choice for this. One could use id_col, but that would suggest it must be a single column, which is not always the case. Just for comparison with other packages:

base::reshape(): idvar
data.table::melt(): id.vars
pandas and polars: index

Considering this, a by argument would be quite unique (that doesn't mean other packages always get things right, but it's quite indicative).
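For comparison, here is a minimal base R sketch of the same kind of reshape using the idvar argument from the list above. The data frame and its column names are made up for illustration; only base::reshape() is used.

```r
# Hypothetical long-format data: two measurements per person
long <- data.frame(
  id = c(1, 1, 2, 2),
  time = c("t1", "t2", "t1", "t2"),
  score = c(10, 12, 20, 18)
)

# `idvar` plays the role that `id_cols` plays in the pivot-style functions;
# `timevar` holds the values that are turned into new column names
wide_base <- reshape(
  long,
  idvar = "id",
  timevar = "time",
  direction = "wide"
)

names(wide_base)
```

Here base R builds the new column names from the measured variable and the time value (score.t1, score.t2), whereas the pivot-style functions derive them from names_from.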

Curious about what others think about renaming id_cols to by @easystats/core-team

strengejacke · 2024-05-24T16:15:20Z

We slice and reshape by the levels of that variable. It's still my personal favorite, and we now use this argument name with aggressive consistency in our packages. I don't mind deviating from other packages, especially since users are free to use those other packages if they prefer that syntax.

I just realized that when you teach R to beginners, it's really good if you can repeat yourself: "use select for..." or "use by for...". That is easier for newbies to understand and remember.

strengejacke · 2024-05-26T12:12:52Z

bump @DominiqueMakowski @IndrajeetPatil @bwiernik @rempsyc @mattansb

DominiqueMakowski · 2024-05-30T08:24:20Z

I think I agree with @etiennebacher; id_cols' purpose was to eventually specify an index, but i don't think it is conceptually different from what by is expected to do

Sorry @strengejacke 😅 - but I'm not convinced it should be replaced or made an alias

bwiernik · 2024-05-30T13:54:19Z

I don't quite understand the use of by as an alias for id_cols. Could you talk about the similarity there?

strengejacke · 2024-05-30T16:23:57Z

but i don't think it is conceptually different from what by is expected to do

Yes, that's why I thought we could use by, because we use it in conceptually similar ways across other functions now.

But I'm ok with sticking to id_cols, though I don't see the need to use that name. id_cols refers to a variable whose "levels" are used for grouping/stratification. In every other instance we now use by, just not here.

I think "re-thinking" reshaping data in this way makes it rather easy to use, especially since it would be in line with our other functions.

strengejacke · 2024-05-30T18:09:14Z

Btw, for data_to_long(), we have select for selecting columns, but also its alias cols for compatibility with pivot_longer()

bwiernik · 2024-05-31T03:47:34Z

I think the conceptual similarity of by to id_cols is very tenuous. I suggest we drop it

strengejacke · 2024-05-31T06:16:58Z

Ok, if nobody likes my suggestion (😢), let's stick to id_cols then.

etiennebacher

Thanks, the failures look unrelated to this PR.

RIP by in data_to_wide()

strengejacke · 2024-05-31T14:10:51Z

by:

strengejacke added 3 commits May 20, 2024 11:30

Improve docs for data_to_wide

da230ac

fix

8db3919

fix

32a0d6a

strengejacke requested a review from etiennebacher May 20, 2024 09:38

strengejacke added 9 commits May 20, 2024 11:48

lintr

03001ec

update docs, deprecate arg, update test

93b776d

update

a4b1e7c

update readme

49650ab

add examples

85ce67b

also improve data_to_long

125c7c9

update test

1ecf014

wordlist

e36b129

update docs

1cd68f7

etiennebacher requested changes May 20, 2024

strengejacke added 6 commits May 21, 2024 10:38

address comments

bd88056

apply suggestions

f40b50c

docs

9524aaf

update docs

071d72d

address suggestions

5231541

address comments

1277739

typo

12a2443

strengejacke requested a review from etiennebacher May 21, 2024 22:57

strengejacke and others added 3 commits May 22, 2024 08:57

Update NEWS.md

b0b976f

formatting news

42584e9

plural

69aee08

Merge branch 'main' into improve_docs

54c44f2

strengejacke added 3 commits May 31, 2024 08:10

by -> id_cols

fca2c71

news

67befdf

fix

8f21e31

strengejacke added 4 commits May 31, 2024 11:00

fix warning in test

5dc81f8

typo

fc9f564

lintr, whitespace

d905c4b

lintr (simplify else)

be32269

etiennebacher approved these changes May 31, 2024

etiennebacher merged commit a7d3c80 into main May 31, 2024
22 of 29 checks passed

etiennebacher deleted the improve_docs branch May 31, 2024 11:46
