Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: Improved docs on Transforms #2655

Open
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

tempdata73
Copy link
Contributor

Notes:

  • I didn't use vl's example of using argmax on the movies dataset because this whole page uses the cars dataset as a guide. I felt like using the former would break the flow of thought. Nonetheless, I can happily incorporate it if you guys prefer that one.

Sorry for messing up the other pull request (#2654), but I finally fixed my commits and branches.

@mattijn
Copy link
Contributor

mattijn commented Jul 11, 2022

Thanks for the PR! No problem of messing up the commits. Me or @joelostblom will do a review somewhere in coming days.

@dangotbanned
Copy link
Member

dangotbanned commented Dec 23, 2024

Thanks for the PR! No problem of messing up the commits. Me or @joelostblom will do a review somewhere in coming days.

@mattijn, @joelostblom I'm combing through old issues and came across this PR that apparently closes #2645

Obviously we'd need to get this branch up-to-date, but I wanted to check-in to see if this had actually resolved #2645?

Update

I think I've got this conflict-free now with main 😌

@dangotbanned dangotbanned changed the title Improved docs on Transforms docs: Improved docs on Transforms Dec 23, 2024
@dangotbanned dangotbanned requested a review from dsmedia December 23, 2024 19:46
@dangotbanned
Copy link
Member

@dsmedia don't feel obligated to review this, just curious if you had any thoughts - since you've done a few doc PRs before?

@dsmedia
Copy link
Contributor

dsmedia commented Dec 23, 2024

@dsmedia don't feel obligated to review this, just curious if you had any thoughts - since you've done a few doc PRs before?

Sure. Will have a look this evening.

Copy link
Contributor

@dsmedia dsmedia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great doc additions! I've made some recommendations / edits here for consideration.

Comment on lines +80 to +81
**Note:** As mentioned in :doc:`../data`, this approach of transforming the
data with Pandas is preferable if we already have the DataFrame at hand.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider 1) being more explicit about what exactly is meant by the term "at hand" and 2) being upfront in this sentence about the reason or reasons for Pandas transformations being preferable when the DataFrame is "at hand" (automatic type inference? something else also?)

Also, this suggests that data.html discusses these benefits of when a Pandas transformation is preferable, but it wasn't immediately obvious which part of this section of the docs it is referring to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this suggests that data.html discusses these benefits of when a Pandas transformation is preferable, but it wasn't immediately obvious which part of this section of the docs it is referring to.

I think it should be referencing data-transformations

doc/user_guide/transform/aggregate.rst Outdated Show resolved Hide resolved
doc/user_guide/transform/aggregate.rst Outdated Show resolved Hide resolved
doc/user_guide/transform/aggregate.rst Outdated Show resolved Hide resolved
argmin An input data object containing the minimum field value. N/A
argmax An input data object containing the maximum field value. :ref:`gallery_line_chart_with_custom_legend`
average The mean (average) field value. Identical to mean. :ref:`gallery_layer_line_color_rule`
count The total count of data objects in the group. :ref:`gallery_simple_heatmap`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vega-Lite docs also state

Note: ‘count’ operates directly on the input objects and return the same value regardless of the provided field.

Just mentioning in case it's worth adding here as well?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vega-Lite docs also state

Note: ‘count’ operates directly on the input objects and return the same value regardless of the provided field.

Just mentioning in case it's worth adding here as well?

Maybe that phrasing could replace

"... in the other axis" (#2655 (comment))

Comment on lines +171 to +173
========= =========================================================================== =====================================
Aggregate Description Example
========= =========================================================================== =====================================
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The vega-lite docs appear to list these in a more logical (if implicit) order, starting with count-related functions (including count, valid, values, missing, and distinct), moving to basic mathematical operations (sum, product), then to central tendency measures (mean/average, variance/variancep, stdev/stdevp, stderr, median), followed by distribution statistics (q1, q3, ci0, ci1), and finally ending with range functions (min/argmin, max/argmax). The ordering here appears to be in alphabetial order, though it's not strictly so (e.g. ci01). I would have a slight preference for the vega-lite-style functional organization scheme (and with explicit headings for the categories).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on changing the order.

I'd probably need to see the end result of adding categories though.
The naive approach of just adding a category field would add a lot of repetition

doc/user_guide/transform/aggregate.rst Outdated Show resolved Hide resolved
@dangotbanned
Copy link
Member

(#2655 (review))

@dsmedia thanks so much for reviewing so quickly!
I'll try my best to circle back to this over the next few days, if nobody else beats me to it

dangotbanned and others added 2 commits December 29, 2024 20:16
@dsmedia

Co-authored-by: Daniel Sorid <63077097+dsmedia@users.noreply.github.com>
Co-authored-by: Daniel Sorid <63077097+dsmedia@users.noreply.github.com>
Copy link
Member

@dangotbanned dangotbanned left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks again for doing the heavy lifting on this review @dsmedia

I think I've responded to/applied all of your suggestions and added a few I spotted

@@ -8,7 +8,7 @@ There are two ways to aggregate data within Altair: within the encoding itself,
or using a top level aggregate transform.

The aggregate property of a field definition can be used to compute aggregate
summary statistics (e.g., median, min, max) over groups of data.
summary statistics (e.g., :code:`median`, :code:`min`, :code:`max`) over groups of data.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do think these should have some markup, but since they aren't functions - median etc seems like the wrong choice.

Something like "median(...)" would link more closely to how you'd use it

doc/user_guide/transform/aggregate.rst Outdated Show resolved Hide resolved
doc/user_guide/transform/aggregate.rst Show resolved Hide resolved
Comment on lines +80 to +81
**Note:** As mentioned in :doc:`../data`, this approach of transforming the
data with Pandas is preferable if we already have the DataFrame at hand.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, this suggests that data.html discusses these benefits of when a Pandas transformation is preferable, but it wasn't immediately obvious which part of this section of the docs it is referring to.

I think it should be referencing data-transformations

Comment on lines +103 to +107
alt.Chart(cars).mark_bar().encode(
y='Origin:N',
# shorthand form of alt.Y(aggregate='count')
x='count()'
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment seems like it meant alt.X(aggregate='count'); but I think we can do without

Suggested change
alt.Chart(cars).mark_bar().encode(
y='Origin:N',
# shorthand form of alt.Y(aggregate='count')
x='count()'
)
alt.Chart(cars).mark_bar().encode(
x='count()',
y='Origin:N'
)

Comment on lines +109 to +111
**Note:** The :code:`count` aggregate function is of type
:code:`quantitative` by default, it does not matter if the source data is a
DataFrame, URL pointer, CSV file or JSON file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
**Note:** The :code:`count` aggregate function is of type
:code:`quantitative` by default, it does not matter if the source data is a
DataFrame, URL pointer, CSV file or JSON file.
.. note::
The :code:`count` aggregate function is of type :code:`quantitative` by default,
it does not matter if the source data is a DataFrame, URL pointer, CSV file or JSON file.

Comment on lines +171 to +173
========= =========================================================================== =====================================
Aggregate Description Example
========= =========================================================================== =====================================
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree on changing the order.

I'd probably need to see the end result of adding categories though.
The naive approach of just adding a category field would add a lot of repetition

doc/user_guide/transform/aggregate.rst Outdated Show resolved Hide resolved
argmin An input data object containing the minimum field value. N/A
argmax An input data object containing the maximum field value. :ref:`gallery_line_chart_with_custom_legend`
average The mean (average) field value. Identical to mean. :ref:`gallery_layer_line_color_rule`
count The total count of data objects in the group. :ref:`gallery_simple_heatmap`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vega-Lite docs also state

Note: ‘count’ operates directly on the input objects and return the same value regardless of the provided field.

Just mentioning in case it's worth adding here as well?

Maybe that phrasing could replace

"... in the other axis" (#2655 (comment))

@dangotbanned dangotbanned added this to the 5.6.0 milestone Jan 14, 2025
@dangotbanned dangotbanned mentioned this pull request Jan 17, 2025
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

improve documentation on aggregation
4 participants