-
-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs(python): Rewrite pivot
and melt
docstrings
#13693
base: main
Are you sure you want to change the base?
Conversation
pivot
and melt
docstrings
@stinodego love to get your thoughts on this one. I guess this can be the first "chunk" of the documentation rewrite strategy we were envisioning. |
@stinodego I wonder if another core developer might prefer to take a look at this and future documentation PRs? |
i'll take a look, but it would be good to get #14048 in first |
Thanks so much @MarcoGorelli! Feel free to edit my edits to reflect your new pivot behavior - you should have commit privileges on this branch. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is good information in here, but it's much too verbose for a docstring. Docstrings should be a one-line summary, maybe two or three sentences of extended summary, and then some examples with a one-sentence explanation.
We actually have a melt and pivot section in the user guide, and there currently is not much information in there. The information would be a welcome addition there.
So I would propose to update the user guide sections and keep only the most essential information the docstring. I can have another look then.
@stinodego Good idea! I moved most of the info to the user guide. I also updated my changes to account for @MarcoGorelli's fixes in #14048. Note that my changes to the user guide will need to be updated for 1.0 to account for #14491. |
@MarcoGorelli looks good? Once you're happy with it, I'll update to account for #14477 and fix the merge conflict. |
Overall, I think this is an improvement - in particular, the examples you provide for |
@MarcoGorelli thanks so much! ready for you to take a look :) |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #13693 +/- ##
==========================================
+ Coverage 80.94% 81.00% +0.05%
==========================================
Files 1327 1331 +4
Lines 172081 172786 +705
Branches 2453 2456 +3
==========================================
+ Hits 139290 139964 +674
- Misses 32320 32354 +34
+ Partials 471 468 -3 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've only really got 3 minor comments, it's clear you've spent quite some time looking into pivot and melt. LMK if you don't have bandwidth to address them
Thanks for doing this, it'll help users!
Thus, if there are `N` unique values in the `columns` column, there will be | ||
`N + 1` columns in the pivoted `DataFrame`: one for the row names, the | ||
remaining `N` for the values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is only true if there's a single index
column? Which is fine, you explain what happens with multiple index columns below, I just think it needs caveating here
py-polars/polars/dataframe/frame.py
Outdated
know these unique values in advance, you can perform a "lazy pivot", as shown in | ||
the "Examples" section below. | ||
|
||
Warnings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe "Notes" rather than "Warnings"?
{'min', 'max', 'first', 'last', 'sum', 'mean', 'median', 'len'} | ||
- An expression to do the aggregation. | ||
A function to aggregate multiple `values` with the same `index` and | ||
`columns` prior to pivoting, equivalent to using :func:`group_by` as a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does group_by
render correctly here? you may need to use
:meth:`DataFrame.group_by`
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes,
:func:`group_by`
works - see how https://github.com/pola-rs/polars/blob/py-0.20.13/py-polars/polars/dataframe/frame.py#L7164 renders correctly at https://docs.pola.rs/py-polars/html/reference/dataframe/api/polars.DataFrame.fill_nan.html.
Thanks so much @MarcoGorelli! I incorporated your suggestions. I'm glad you appreciate the amount of work that went into this. I put a similar amount of care into #13402; it would be great if at least some of these changes could be incorporated at some point. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made some little fixes, as there were some things which still needed sorting out
In particular, the last example from pivot
didn't render
With the regards to the other PR - may I offer some unsolicited advice?
- as you can see, reviews generally take a lot of effort. It's easy to miss things. That's why you're being asked to only focus on a single topic per PR, reviewing 74 docs changes in one go is practically impossible
- I'd really advise against tagging so many people in one message...that would only really be OK if you were reporting an absolutely critical issue which simply couldn't wait...
- I'd suggest merging conflicts before asking for reviews, else content can get out of date (and another review will be needed after you've merged conflicts anyway)
Having said that, I really appreciate the work you've done here!
@MarcoGorelli thanks for the edits! Re: the other PR, it would probably be best at this point if the core developers take that PR as a general guideline for things that could be improved in the docs, and manually incorporate whichever changes they like on their own timeframe. |
pivot and merge are among the more complicated dataframe manipulation functions, so I thought I'd improve the documentation on them.