docs: Improved docs on Transforms #2655

tempdata73 · 2022-07-10T15:26:31Z

Notes:

I didn't use Vega-Lite's example of using argmax on the movies dataset because this whole page uses the cars dataset as a guide, and I felt that switching datasets would break the flow. Nonetheless, I'm happy to incorporate it if you prefer that one.

Sorry for messing up the other pull request (#2654), but I finally fixed my commits and branches.

mattijn · 2022-07-11T10:29:36Z

Thanks for the PR! No problem about messing up the commits. @joelostblom or I will review it in the coming days.

dangotbanned · 2024-12-23T13:29:39Z

Thanks for the PR! No problem about messing up the commits. @joelostblom or I will review it in the coming days.

@mattijn, @joelostblom I'm combing through old issues and came across this PR, which apparently closes #2645.

Obviously we'd need to get this branch up to date, but I wanted to check in to see whether this had actually resolved #2645.

Update

I think I've got this conflict-free now with main 😌

From vega@dfb11f5#diff-3f8dbb48ec3017cd5b2722c66cd989b66c7832d8627e30474a20b0b6048f192b

Previous merge was super messy, due to 2 year old PR

dangotbanned · 2024-12-23T19:49:58Z

@dsmedia, don't feel obligated to review this; I'm just curious whether you had any thoughts, since you've done a few doc PRs before.

dsmedia · 2024-12-23T21:07:23Z

@dsmedia, don't feel obligated to review this; I'm just curious whether you had any thoughts, since you've done a few doc PRs before.

Sure. Will have a look this evening.

dsmedia

Great doc additions! I've made some recommendations / edits here for consideration.

dsmedia · 2024-12-24T03:39:31Z

doc/user_guide/transform/aggregate.rst

+**Note:** As mentioned in :doc:`../data`, this approach of transforming the
+data with Pandas is preferable if we already have the DataFrame at hand.


Consider 1) being more explicit about what exactly is meant by the term "at hand" and 2) being upfront in this sentence about the reason or reasons for Pandas transformations being preferable when the DataFrame is "at hand" (automatic type inference? something else also?)

Also, this suggests that data.html discusses these benefits of when a Pandas transformation is preferable, but it wasn't immediately obvious which part of this section of the docs it is referring to.
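For reference, a minimal sketch of the "transform with Pandas first" approach that the note describes, assuming a small hypothetical stand-in for the cars dataset (only `Origin` and `Horsepower`; the values are illustrative, not taken from the real dataset). The chart would then encode the precomputed column (e.g. `x='mean_horsepower:Q'`) rather than asking Vega-Lite to compute `mean(Horsepower)`.

```python
import pandas as pd

# Hypothetical stand-in for the cars dataset; values are illustrative only.
cars = pd.DataFrame(
    {
        "Origin": ["USA", "USA", "Europe", "Japan", "Japan"],
        "Horsepower": [130.0, 165.0, 90.0, 95.0, 88.0],
    }
)

# Aggregate up front in pandas: one row per Origin with its mean horsepower.
mean_hp = (
    cars.groupby("Origin", as_index=False)["Horsepower"]
    .mean()
    .rename(columns={"Horsepower": "mean_horsepower"})
)
print(mean_hp)
```

One consequence of this design: the chart only ever receives the summary rows, so any other encoding that needs the raw rows would have to be built from the original DataFrame.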


dsmedia · 2024-12-24T04:17:06Z

doc/user_guide/transform/aggregate.rst

+argmin     An input data object containing the minimum field value.                     N/A
+argmax     An input data object containing the maximum field value.                     :ref:`gallery_line_chart_with_custom_legend`
+average    The mean (average) field value. Identical to mean.                           :ref:`gallery_layer_line_color_rule`
+count      The total count of data objects in the group.                                :ref:`gallery_simple_heatmap`


Vega-Lite docs also state

Note: ‘count’ operates directly on the input objects and return the same value regardless of the provided field.

Just mentioning in case it's worth adding here as well?
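To make the quoted Vega-Lite note concrete, here is a plain-Python sketch contrasting `count` with the field-based `valid` and `missing` operations, following the definitions in the Vega-Lite aggregate docs (valid: not null, undefined, or NaN; missing: null or undefined). It models the semantics only and is not Vega's implementation; treating Python `None` as standing in for both JavaScript `null` and `undefined` is an assumption of the sketch.

```python
import math


def count_ops(rows, field):
    """Sketch of count/valid/missing semantics (not Vega's actual code)."""
    # An absent key yields None, standing in for JavaScript "undefined".
    values = [row.get(field) for row in rows]
    # count looks only at the data objects, so `field` is irrelevant here.
    count = len(rows)
    missing = sum(v is None for v in values)
    valid = sum(
        v is not None and not (isinstance(v, float) and math.isnan(v))
        for v in values
    )
    return {"count": count, "valid": valid, "missing": missing}


rows = [
    {"Horsepower": 130.0},
    {"Horsepower": None},          # null
    {},                            # field absent ("undefined")
    {"Horsepower": float("nan")},  # NaN: neither valid nor missing
]

print(count_ops(rows, "Horsepower"))
print(count_ops(rows, "Miles_per_Gallon"))
```

Whichever field is passed, `count` stays at 4, which is the behavior the quoted note describes.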


dsmedia · 2024-12-24T04:27:38Z

doc/user_guide/transform/aggregate.rst

+=========  ===========================================================================  =====================================
+Aggregate  Description                                                                  Example
+=========  ===========================================================================  =====================================


The Vega-Lite docs appear to list these in a more logical (if implicit) order, starting with count-related functions (including count, valid, values, missing, and distinct), moving to basic mathematical operations (sum, product), then to central tendency measures (mean/average, variance/variancep, stdev/stdevp, stderr, median), followed by distribution statistics (q1, q3, ci0, ci1), and finally ending with range functions (min/argmin, max/argmax). The ordering here appears to be alphabetical, though it's not strictly so (e.g. ci01). I would have a slight preference for the Vega-Lite-style functional organization (with explicit headings for the categories).
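The range functions mentioned here differ from `min`/`max` in what they return: per the table rows for `argmin`/`argmax`, they yield the whole input data object holding the extreme value. A plain-Python sketch of that idea, assuming hypothetical cars-like rows with no missing values; this is not Vega's implementation, and the tie-breaking rule (first row seen wins) is a property of Python's `max`/`min`, not something confirmed for Vega.

```python
def arg_extreme(rows, groupby, field, op="argmax"):
    """Per group, keep the whole row holding the max (or min) `field` value."""
    pick = max if op == "argmax" else min
    groups = {}
    for row in rows:
        groups.setdefault(row[groupby], []).append(row)
    # Python's max/min return the first of several equal extremes.
    return {key: pick(members, key=lambda r: r[field]) for key, members in groups.items()}


# Hypothetical rows; names and values are illustrative only.
rows = [
    {"Name": "car a", "Origin": "USA", "Horsepower": 130},
    {"Name": "car b", "Origin": "USA", "Horsepower": 165},
    {"Name": "car c", "Origin": "Japan", "Horsepower": 95},
    {"Name": "car d", "Origin": "Japan", "Horsepower": 88},
]

strongest = arg_extreme(rows, "Origin", "Horsepower", "argmax")
weakest = arg_extreme(rows, "Origin", "Horsepower", "argmin")
print(strongest["USA"]["Name"], weakest["Japan"]["Name"])
```

Because the result is a full row, a chart can then read a different field (such as `Name`) from the row selected by `Horsepower`.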


dangotbanned · 2024-12-29T17:58:10Z

(#2655 (review))

@dsmedia thanks so much for reviewing so quickly!
I'll try my best to circle back to this over the next few days, if nobody else beats me to it


dangotbanned

Thanks again for doing the heavy lifting on this review @dsmedia

I think I've responded to/applied all of your suggestions and added a few I spotted

dangotbanned · 2024-12-29T19:19:16Z

doc/user_guide/transform/aggregate.rst

@@ -8,7 +8,7 @@ There are two ways to aggregate data within Altair: within the encoding itself,
 or using a top level aggregate transform.

 The aggregate property of a field definition can be used to compute aggregate
-summary statistics (e.g., median, min, max) over groups of data.
+summary statistics (e.g., :code:`median`, :code:`min`, :code:`max`) over groups of data.


I do think these should have some markup, but since they aren't functions, median etc. seems like the wrong choice.

Something like "median(...)" would correspond more closely to how you'd use it.


dangotbanned · 2024-12-29T19:32:54Z

doc/user_guide/transform/aggregate.rst

+**Note:** As mentioned in :doc:`../data`, this approach of transforming the
+data with Pandas is preferable if we already have the DataFrame at hand.


Also, this suggests that data.html discusses these benefits of when a Pandas transformation is preferable, but it wasn't immediately obvious which part of this section of the docs it is referring to.

I think it should be referencing data-transformations

dangotbanned · 2024-12-29T19:43:00Z

doc/user_guide/transform/aggregate.rst

+   alt.Chart(cars).mark_bar().encode(
+      y='Origin:N',
+      # shorthand form of alt.Y(aggregate='count')
+      x='count()'
+   )


The comment seems like it meant alt.X(aggregate='count'), but I think we can do without it.

Suggested change

-   alt.Chart(cars).mark_bar().encode(
-      y='Origin:N',
-      # shorthand form of alt.Y(aggregate='count')
-      x='count()'
-   )
+   alt.Chart(cars).mark_bar().encode(
+      x='count()',
+      y='Origin:N'
+   )

dangotbanned · 2024-12-29T19:50:22Z

doc/user_guide/transform/aggregate.rst

+**Note:** The :code:`count` aggregate function is of type
+:code:`quantitative` by default, it does not matter if the source data is a
+DataFrame, URL pointer, CSV file or JSON file.


Suggested change

-**Note:** The :code:`count` aggregate function is of type
-:code:`quantitative` by default, it does not matter if the source data is a
-DataFrame, URL pointer, CSV file or JSON file.
+.. note::
+
+   The :code:`count` aggregate function is of type :code:`quantitative` by default,
+   it does not matter if the source data is a DataFrame, URL pointer, CSV file or JSON file.

dangotbanned · 2024-12-29T20:24:05Z

doc/user_guide/transform/aggregate.rst

+=========  ===========================================================================  =====================================
+Aggregate  Description                                                                  Example
+=========  ===========================================================================  =====================================


I agree on changing the order.

I'd probably need to see the end result of adding categories though.
The naive approach of just adding a category field would add a lot of repetition
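One way to get category headings without a repeated per-row category column would be to emit a separate table under each heading. A hypothetical sketch of a helper that renders reStructuredText simple tables from a category mapping; the helper name, the heading labels, and the `-` underline level are all assumptions, and only the four rows from the diff are used.

```python
def render_category_tables(categories):
    """Render one reST simple table per category heading (hypothetical helper)."""
    lines = []
    for heading, rows in categories.items():
        op_width = max([len("Aggregate")] + [len(op) for op, _ in rows])
        desc_width = max([len("Description")] + [len(desc) for _, desc in rows])
        border = "=" * op_width + "  " + "=" * desc_width
        # Section heading; the "-" underline level is an assumption.
        lines += [heading, "-" * len(heading), ""]
        lines += [border, "Aggregate".ljust(op_width) + "  Description", border]
        lines += [op.ljust(op_width) + "  " + desc for op, desc in rows]
        lines += [border, ""]
    return "\n".join(lines)


# Illustrative subset, ordered as suggested (counting, central tendency, range).
CATEGORIES = {
    "Counting": [("count", "The total count of data objects in the group.")],
    "Central tendency": [("average", "The mean (average) field value. Identical to mean.")],
    "Range": [
        ("argmin", "An input data object containing the minimum field value."),
        ("argmax", "An input data object containing the maximum field value."),
    ],
}

out = render_category_tables(CATEGORIES)
print(out)
```

Because each op appears once under its heading, adding a category costs one heading rather than one extra column on every row.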


dangotbanned · 2024-12-29T20:37:07Z

doc/user_guide/transform/aggregate.rst

+argmin     An input data object containing the minimum field value.                     N/A
+argmax     An input data object containing the maximum field value.                     :ref:`gallery_line_chart_with_custom_legend`
+average    The mean (average) field value. Identical to mean.                           :ref:`gallery_layer_line_color_rule`
+count      The total count of data objects in the group.                                :ref:`gallery_simple_heatmap`


Vega-Lite docs also state

Note: ‘count’ operates directly on the input objects and return the same value regardless of the provided field.

Just mentioning in case it's worth adding here as well?

Maybe that phrasing could replace

"... in the other axis" (#2655 (comment))

improved documentation on agg funcs

7ceec5a

betaigeuze mentioned this pull request Nov 30, 2022

improve documentation on aggregation #2645

Open

dangotbanned linked an issue Dec 23, 2024 that may be closed by this pull request

improve documentation on aggregation #2645

Open

dangotbanned requested review from joelostblom and mattijn December 23, 2024 13:27

dangotbanned changed the title ~~Improved docs on Transforms~~ docs: Improved docs on Transforms Dec 23, 2024

dangotbanned added the documentation label Dec 23, 2024

dangotbanned added 4 commits December 23, 2024 19:05

chore: copy encoding.rst rename

cb79d5d

From vega@dfb11f5#diff-3f8dbb48ec3017cd5b2722c66cd989b66c7832d8627e30474a20b0b6048f192b

Merge remote-tracking branch 'upstream/main' into pr/tempdata73/2655

7f82821

fix: apply changes on top of main

50ad1a5

Previous merge was super messy, due to 2 year old PR

revert: Undo removal of trailing comma

aa6b486

dangotbanned requested a review from dsmedia December 23, 2024 19:46

dsmedia reviewed Dec 24, 2024


dangotbanned and others added 2 commits December 29, 2024 20:16

Apply suggestions from code review

783a1f0

@dsmedia Co-authored-by: Daniel Sorid <63077097+dsmedia@users.noreply.github.com>

Update doc/user_guide/transform/aggregate.rst

baf808f

Co-authored-by: Daniel Sorid <63077097+dsmedia@users.noreply.github.com>

dangotbanned requested changes Dec 29, 2024


dangotbanned and others added 8 commits December 30, 2024 15:33

Merge branch 'main' into improve-agg-doc

0a94934

Merge branch 'main' into improve-agg-doc

5fdd170

docs: Add missing values description

be65149

Merge branch 'main' into improve-agg-doc

7f66e23

Merge branch 'main' into improve-agg-doc

78a07db

docs: fix grammar in aggregation introduction

97c036b

docs: improve phrasing in aggregate.rst

f0bbc8c

Merge branch 'main' into improve-agg-doc

d1fc997

dangotbanned added this to the 5.6.0 milestone Jan 14, 2025

dangotbanned mentioned this pull request Jan 17, 2025

Tracking: uv transition #3773

Open

6 tasks

dangotbanned added 2 commits January 17, 2025 12:28

Merge branch 'main' into improve-agg-doc

796a86e

Merge remote-tracking branch 'upstream/main' into pr/tempdata73/2655

85e9d95
