Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs(example): Adds Confidence Interval Ellipses #3747

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Jan 4, 2025

Will close #3715

Description

Adds an example inspired by (ggplot2|plotnine).stat_ellipse().

As can be seen in the first commit, this PR began by rebasing a closed PR from almost 7 years ago.

Relevant info from (#3715 (comment)):

I believe plotnine.stat_ellipse would be an example of implementing this with numpy, scipy.
Source code

I also found an old closed PR (#514 by @essicolo) that would have added an example for this.
The blocker at the time is no longer an issue as (#3202 by @joelostblom) added scipy as a docs dependency.

Example

image

Tasks

Future Work

I think a more generalized version of this would be a good fit for https://github.com/vega/altair_ally.
An issue might be the scipy dependency, which I really was hoping to be able to avoid here.
The dendrogram example shows some kind of inlining from scipy - but I have no idea if that is possible for:

@dangotbanned
Copy link
Member Author

Cannot express how relieved I am to see the CI finally green 😅
be087d2

@dangotbanned dangotbanned changed the title docs(DRAFT): Add Confidence Interval Ellipse example docs: Add Confidence Interval Ellipse example Jan 5, 2025
@dangotbanned dangotbanned changed the title docs: Add Confidence Interval Ellipse example docs(example): Adds Confidence Interval Ellipses Jan 5, 2025
Includes comment removal suggestion in (#3747 (comment))
- Fixed a type ignore (causes by incomplete stubs)
- Renamed variables
- Make replace the implicit `"index"` column with naming it `"order"`

#3747 (comment)
@dangotbanned dangotbanned marked this pull request as ready for review January 10, 2025 16:45
@dangotbanned dangotbanned requested a review from mattijn January 10, 2025 16:53
@mattijn
Copy link
Contributor

mattijn commented Jan 16, 2025

Thanks for this example! Can we add this to the category case studies instead of distributions? If the uncertainty was computed within Altair (not yet possible) than it was a good example for the distribution section.

On-the-fly computation of these confidence regions by doing selections would be really great though. Continue dreaming for vega/vega-lite#6043.

Can we remove this line regarding the import annotation from __future__? Probably also requires removing other types from the function, but that is fine for this example.

Can we rename the functions? Eg can we use confidence_region_2d and grouped_confidence_regions instead?

I also tried to reduce the usage of pandas by using dicts directly. I think that can work, but the readability is slightly less than current situation. But doing more with dicts as containers is always good 😊

@dangotbanned
Copy link
Member Author

Thanks for this example!

No problem @mattijn, although I feel my role in this was mostly making the connection between #3715, #514 and ggplot2.stat_ellipse.
It was interesting though brushing up on some scipy/numpy - as I don't usually interact with these packages directly.

Can we add this to the category case studies instead of distributions? If the uncertainty was computed within Altair (not yet possible) than it was a good example for the distribution section.
On-the-fly computation of these confidence regions by doing selections would be really great though. Continue dreaming for vega/vega-lite#6043.

Huh, I guess I'd never considered where the computation occured being a factor in the category.
Yeah I can recategorise if you think that would be a better fit

Can we remove this line regarding the import annotation from __future__? Probably also requires removing other types from the function, but that is fine for this example.

Yeah I think we might not need that line at all actually.
I added it out of habit when I used tuple[...] here, but that should be valid since https://docs.python.org/3.9/whatsnew/3.9.html#type-hinting-generics-in-standard-collections

I'll take a look today and see what works 👍

arr: np.ndarray[tuple[int, int], np.dtype[np.float64]],

Can we rename the functions? Eg can we use confidence_region_2d and grouped_confidence_regions instead?

Yeah those seem like much better names!
Bikeshedding a little here, but maybe confidence_region_2d and confidence_region(s?)_grouped?.
I don't feel too strongly about it since this is an example - but I'd usually try to match the prefix for related functionality

I also tried to reduce the usage of pandas by using dicts directly. I think that can work, but the readability is slightly less than current situation. But doing more with dicts as containers is always good 😊

Interesting, I hadn't thought to try this without a dataframe at all 🤔
I'm not a fan of pandas, but since we're already using it (vega_datasets) for iris - I think it is simpler just to keep the data in that container.
We'll also be able to easily adapt this example to penguins (#2231) after (#3631) and in #2213

@dangotbanned dangotbanned mentioned this pull request Jan 17, 2025
6 tasks
@dangotbanned dangotbanned marked this pull request as draft January 17, 2025 13:23
@dangotbanned dangotbanned marked this pull request as ready for review January 17, 2025 14:08
@dangotbanned
Copy link
Member Author

@mattijn hopefully I've addressed all of your points in (#3747 (commits))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support/document Confidence Interval Ellipse
3 participants