depr(python,rust!): Rename parameter by
to group_by
in DataFrame.upsample/group_by_dynamic/rolling
#14840
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This hasn't been fully discussed, but it's related to #10989 and it was pretty easy to
F2
-replace these. Sometimes, just opening a PR is the fastest way to make a decision, so... here this is 😄In summary:
by
in Polars usually refers to which operation you do the operation by (e.g. insort
, it sorts by theby
columns), and sometimes to which columns you group by before applying the operation. For example:df.upsample('a', '3d', by='b')
means "group by 'b' and then upsample by column 'a' every 3 days"df.top_k(3, by='b')
means "take the 3 rows where column'b'
is largest"If an operation lets you group by certain columns before applying the operation, then I think it would be clearer to use
group_by
for that parameter.This would open up the doors for
df.top_k(3, by='b', group_by='a')
(theby
intop_k
is already taken)For readability, looking at the tests, I do think
is clearer to read than
The latter almost looks like it's rolling based on groups, whereas it's rolling based on
'times'
grouped by'groups'