Version 5.1.0
What's Changed
- Update Vega-Lite from version 5.8.0 to version 5.14.1; see Vega-Lite Release Notes.
Enhancements
- The chart.transformed_data() method was added to extract transformed chart data
For example, given an Altair chart that includes aggregations:
    import altair as alt
    from vega_datasets import data

    cars = data.cars.url
    chart = alt.Chart(cars).mark_bar().encode(
        y='Cylinders:O',
        x='mean_acc:Q'
    ).transform_aggregate(
        mean_acc='mean(Acceleration)',
        groupby=["Cylinders"]
    )
    chart
It is now possible to call the chart.transformed_data method to extract a pandas DataFrame containing the transformed data:
    chart.transformed_data()
This method is dependent on VegaFusion with the embed extras enabled.
- Introduction of a new data transformer named vegafusion
VegaFusion is an external project that provides efficient Rust implementations of most of Altair's data transformations. Used as a data transformer, VegaFusion can overcome the Altair MaxRowsError by performing data-intensive aggregations in Python and pruning unused columns from the source dataset.
The data transformer can be enabled as follows:
    import altair as alt
    alt.data_transformers.enable("vegafusion")  # default is "default"
DataTransformerRegistry.enable('vegafusion')
A very large DataFrame can now be visualized as a histogram, with the binning done within VegaFusion:
    import pandas as pd
    import altair as alt

    # prepare dataframe with 1 million rows
    flights = pd.read_parquet(
        "https://vegafusion-datasets.s3.amazonaws.com/vega/flights_1m.parquet"
    )

    delay_hist = alt.Chart(flights).mark_bar(tooltip=True).encode(
        alt.X("delay", bin=alt.Bin(maxbins=30)),
        alt.Y("count()")
    )
    delay_hist
When the vegafusion data transformer is active, data transformations will be pre-evaluated when displaying, saving, and converting charts to a dictionary or JSON.
See a detailed overview on the VegaFusion Data Transformer in the documentation.
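To illustrate why pre-evaluating an aggregation helps with large inputs, here is a conceptual sketch in plain Python. It is not VegaFusion's implementation (which runs in Rust and derives bin widths from settings such as maxbins). The synthetic data, the fixed bin width, and the helper name bin_counts are assumptions made only for this illustration.

```python
import random

random.seed(0)

# Stand-in for a large dataset: 100,000 synthetic delay values (hypothetical data).
delays = [random.gauss(10, 30) for _ in range(100_000)]


def bin_counts(values, step):
    """Count values per fixed-width bin, keyed by each bin's lower edge."""
    counts = {}
    for value in values:
        lower_edge = (value // step) * step
        counts[lower_edge] = counts.get(lower_edge, 0) + 1
    return dict(sorted(counts.items()))


# Only the aggregated rows would need to reach the chart specification.
histogram = bin_counts(delays, step=20)

print(len(delays), "rows before aggregation")
print(len(histogram), "rows after aggregation")
```

The aggregated result has one row per bin instead of one row per record, which is the kind of reduction that keeps a chart specification below a row limit.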
- A JupyterChart class was added to support accessing params and selections from Python
The JupyterChart class makes it possible to update charts after they have been displayed and access the state of interactions from Python.
For example, given an Altair chart that includes a selection interval as a brush:
    import altair as alt
    from vega_datasets import data

    source = data.cars()
    brush = alt.selection_interval(name="interval", value={"x": [80, 160], "y": [15, 30]})

    chart = alt.Chart(source).mark_point().encode(
        x='Horsepower:Q',
        y='Miles_per_Gallon:Q',
        color=alt.condition(brush, 'Cylinders:O', alt.value('grey')),
    ).add_params(brush)

    jchart = alt.JupyterChart(chart)
    jchart
It is now possible to return the defined interval selection within Python using the JupyterChart:
    jchart.selections.interval.value
{'Horsepower': [80, 160], 'Miles_per_Gallon': [15, 30]}
The selection dictionary may be converted into a pandas query to filter the source DataFrame:
    filter = " and ".join([
        f"{v[0]} <= `{k}` <= {v[1]}"
        for k, v in jchart.selections.interval.value.items()
    ])
    source.query(filter)
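To make the resulting query string concrete, the following standalone sketch builds it from the selection value shown above without needing a running widget. The helper name selection_to_query is hypothetical and not part of Altair.

```python
# Selection value as reported by jchart.selections.interval.value in the example above.
selection = {"Horsepower": [80, 160], "Miles_per_Gallon": [15, 30]}


def selection_to_query(selection):
    """Build a pandas query string with one range condition per selected field."""
    return " and ".join(
        f"{low} <= `{field}` <= {high}"
        for field, (low, high) in selection.items()
    )


query = selection_to_query(selection)
print(query)
# 80 <= `Horsepower` <= 160 and 15 <= `Miles_per_Gallon` <= 30
```

The backticks around each field name let the query reference column names that are not valid Python identifiers. For an empty selection the helper returns an empty string, so callers may want to skip filtering in that case.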
Another possibility of the new JupyterChart class is to use IPyWidgets to control parameters in Altair. Here we use an ipywidget IntSlider to control the Altair parameter named cutoff.
    import altair as alt
    import pandas as pd
    import numpy as np
    from ipywidgets import IntSlider, link, VBox

    rand = np.random.RandomState(42)

    df = pd.DataFrame({
        'xval': range(100),
        'yval': rand.randn(100).cumsum()
    })

    cutoff = alt.param(name="cutoff", value=23)

    chart = alt.Chart(df).mark_point().encode(
        x='xval',
        y='yval',
        color=alt.condition(
            alt.datum.xval < cutoff,
            alt.value('red'), alt.value('blue')
        )
    ).add_params(
        cutoff
    )
    jchart = alt.JupyterChart(chart)

    slider = IntSlider(min=0, max=100, description='ipywidget')
    link((slider, "value"), (jchart.params, "cutoff"))

    VBox([slider, jchart])
The JupyterChart class is dependent on AnyWidget. See a detailed overview in the new documentation page on JupyterChart Interactivity.
- Support for field encoding inference for objects that support the DataFrame Interchange Protocol
We are maturing support for objects built upon the DataFrame Interchange Protocol in Altair.
Given the following pandas DataFrame with an ordered categorical column type:
    import altair as alt
    from vega_datasets import data

    # Clean Title column
    movies = data.movies()
    movies["Title"] = movies["Title"].astype(str)

    # Convert MPAA rating to an ordered categorical
    rating = movies["MPAA_Rating"].astype("category")
    rating = rating.cat.reorder_categories(
        ['Open', 'G', 'PG', 'PG-13', 'R', 'NC-17', 'Not Rated']
    ).cat.as_ordered()
    movies["MPAA_Rating"] = rating

    # Build chart using pandas
    chart = alt.Chart(movies).mark_bar().encode(
        alt.X("MPAA_Rating"),
        alt.Y("count()")
    )
    chart
We can convert the DataFrame to a PyArrow Table and observe that the types are now inferred identically when rendering the chart:
    import pyarrow as pa

    # Build chart using PyArrow
    chart = alt.Chart(pa.Table.from_pandas(movies)).mark_bar().encode(
        alt.X("MPAA_Rating"),
        alt.Y("count()")
    )
    chart
Vega-Altair support of the DataFrame Interchange Protocol is dependent on PyArrow.
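Objects advertise support for the DataFrame Interchange Protocol by exposing a __dataframe__ method. The sketch below shows one way a consumer might detect such objects. It is an illustration only, not Altair's actual detection logic, and both TinyTable and supports_interchange are hypothetical names.

```python
class TinyTable:
    """Hypothetical stand-in for a library object that implements the protocol."""

    def __dataframe__(self, nan_as_null=False, allow_copy=True):
        # A real library would return an interchange object here.
        raise NotImplementedError("placeholder for illustration only")


def supports_interchange(obj):
    """Return True if obj exposes a callable __dataframe__ attribute."""
    return callable(getattr(obj, "__dataframe__", None))


print(supports_interchange(TinyTable()))  # True
print(supports_interchange([1, 2, 3]))    # False
```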
- A new transform method transform_extent is available
The following example shows how this transform can be used:
    import pandas as pd
    import altair as alt

    df = pd.DataFrame(
        [
            {"a": "A", "b": 28},
            {"a": "B", "b": 55},
            {"a": "C", "b": 43},
            {"a": "D", "b": 91},
            {"a": "E", "b": 81},
            {"a": "F", "b": 53},
            {"a": "G", "b": 19},
            {"a": "H", "b": 87},
            {"a": "I", "b": 52},
        ]
    )

    base = alt.Chart(df, title="A Simple Bar Chart with Lines at Extents").transform_extent(
        extent="b",
        param="b_extent"
    )
    bars = base.mark_bar().encode(x="b", y="a")
    lower_extent_rule = base.mark_rule(stroke="firebrick").encode(
        x=alt.value(alt.expr("scale('x', b_extent[0])"))
    )
    upper_extent_rule = base.mark_rule(stroke="firebrick").encode(
        x=alt.value(alt.expr("scale('x', b_extent[1])"))
    )
    bars + lower_extent_rule + upper_extent_rule
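The extent transform binds the minimum and maximum of a field to a parameter, which the two rules above read as b_extent[0] and b_extent[1]. As a plain-Python sketch of that computation for the same data (the helper name extent is hypothetical, and this is not how Vega evaluates the transform internally):

```python
# The "b" column from the example above.
records = [
    {"a": "A", "b": 28}, {"a": "B", "b": 55}, {"a": "C", "b": 43},
    {"a": "D", "b": 91}, {"a": "E", "b": 81}, {"a": "F", "b": 53},
    {"a": "G", "b": 19}, {"a": "H", "b": 87}, {"a": "I", "b": 52},
]


def extent(rows, field):
    """Return [minimum, maximum] of the given field across all rows."""
    values = [row[field] for row in rows]
    return [min(values), max(values)]


b_extent = extent(records, "b")
print(b_extent)  # [19, 91]
```

With these values, the lower rule is drawn at b = 19 and the upper rule at b = 91.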
- It is now possible to add configurable pixels-per-inch (ppi) metadata to saved and displayed PNG images
    import altair as alt
    from vega_datasets import data

    source = data.cars()

    chart = alt.Chart(source).mark_boxplot(extent="min-max").encode(
        alt.X("Miles_per_Gallon:Q").scale(zero=False),
        alt.Y("Origin:N"),
    )
    chart.save("box.png", ppi=300)

    alt.renderers.enable("png", ppi=144)  # default ppi is 72
    chart
Bug Fixes
- Don't call len on DataFrame Interchange Protocol objects (#3111)
Maintenance
- Add support for new referencing logic in version 4.18 of the jsonschema package
Backward-Incompatible Changes
- Drop support for Python 3.7 which is end-of-life (#3100)
- Hard dependencies: Increase minimum required pandas version to 0.25 (#3130)
- Soft dependencies: Increase minimum required vl-convert-python version to 0.13.0 and increase minimum required vegafusion version to 1.4.0 (#3163, #3160)
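Before upgrading, it can be useful to check whether an environment already satisfies the new minimum versions. The following is a minimal sketch using only the standard library. The helper names are hypothetical, the comparison looks only at the leading numeric part of a version (so a pre-release such as 1.4.0rc1 is treated like 1.4.0), and it is not the check Altair itself performs.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions quoted in the list above, keyed by distribution name.
MINIMUM_VERSIONS = {
    "pandas": "0.25",
    "vl-convert-python": "0.13.0",
    "vegafusion": "1.4.0",
}


def release_tuple(version_string):
    """Return the leading numeric components, e.g. '1.4.0rc1' -> (1, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version_string)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two versions after padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_minimums(minimums=MINIMUM_VERSIONS):
    """Map each distribution to True/False, or None if it is not installed."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = None
        else:
            report[name] = meets_minimum(installed, minimum)
    return report


print(check_minimums())
```

A value of None for a soft dependency only means the optional package is absent, which is fine unless the corresponding feature is needed.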
New Contributors
- @thomend made their first contribution in #3086
- @NickCrews made their first contribution in #3155
Release Notes by Pull Request
All 52 PRs merged for this release:
- Explicitly specify arguments for to_dict and to_json methods for top-level chart objects by @binste in #3073
- Add Vega-Lite to Vega compiler registry and format arg to to_dict() and to_json() by @jonmmease in #3071
- Sanitize timestamps in arrow tables by @jonmmease in #3076
- Fix ridgeline example by @binste in #3082
- Support extracting transformed chart data using VegaFusion by @jonmmease in #3081
- Improve troubleshooting docs regarding Vega-Lite 5 by @binste in #3074
- Make transformed_data public and add initial docs by @jonmmease in #3084
- MAINT: Gitignore venv folders and use gitignore for black by @binste in #3087
- Fixed Wheat and Wages case study by @thomend in #3086
- Type hints: Parts of folders "vegalite", "v5", and "utils" by @binste in #2976
- Fix CI by @jonmmease in #3095
- Add VegaFusion data transformer with mime renderer, save, and to_dict/to_json integration by @jonmmease in #3094
- Unpin vl-convert-python in dev/ci dependencies by @jonmmease in #3099
- Drop support for Python 3.7 which is end-of-life by @binste in #3100
- Add support to transformed_data for reconstructed charts (with from_dict/from_json) by @binste in #3102
- Add VegaFusion data transformer documentation by @jonmmease in #3107
- Don't call len on DataFrame interchange protocol object by @jonmmease in #3111
- copied percentage calculation in example by @thomend in #3116
- Distributions and medians of likert scale ratings by @thomend in #3120
- Support for type inference for DataFrames using the DataFrame Interchange Protocol by @jonmmease in #3114
- Add some 5.1.0 release note entries by @jonmmease in #3123
- Add a code of conduct by @joelostblom in #3124
- master -> main by @jonmmease in #3126
- Handle pyarrow-backed columns in pandas 2 DataFrames by @jonmmease in #3128
- Fix accidental requirement of Pandas 1.5. Bump minimum Pandas version to 0.25. Run tests with it by @binste in #3130
- Add Roadmap and CoC to the documentation by @jonmmease in #3129
- MAINT: Use importlib.metadata and packaging instead of deprecated pkg_resources by @binste in #3133
- Add online JupyterChart widget based on AnyWidget by @jonmmease in #3119
- feat(widget): prefer lodash-es/debounce to reduce import size by @manzt in #3135
- Fix contributing descriptions by @thomend in #3121
- Implement governance structure based on GitHub's MVG by @binste in #3139
- Type hint schemapi.py by @binste in #3142
- Add JupyterChart section to Users Guide by @jonmmease in #3137
- Add governance page to the website by @jonmmease in #3144
- MAINT: Remove altair viewer as a development dependency by @binste in #3147
- Add support for new referencing resolution in jsonschema>=4.18 by @binste in #3118
- Update Vega-Lite to 5.14.1. Add transform_extent by @binste in #3148
- MAINT: Fix type hint errors which came up with new pandas-stubs release by @binste in #3154
- JupyterChart: Add support for params defined in the extent transform by @jonmmease in #3151
- doc: Add tooltip to Line example with custom order by @NickCrews in #3155
- docs: examples: add line plot with custom order by @NickCrews in #3156
- docs: line: Improve prose on custom ordering by @NickCrews in #3158
- docs: examples: remove connected_scatterplot by @NickCrews in #3159
- Refactor optional import logic and verify minimum versions by @jonmmease in #3160
- Governance: Mark @binste as committee chair by @binste in #3165
- Add ppi argument for saving and displaying charts as PNG images by @jonmmease in #3163
- Silence AnyWidget warning (and support hot-reload) in development mode by @jonmmease in #3166
- Update roadmap.rst by @mattijn in #3167
- Add return type to transform_extent by @binste in #3169
- Use import_vl_convert in _spec_to_mimebundle_with_engine for better error message by @jonmmease in #3168
- update example world projections by @mattijn in #3170
- Send initial selections to Python in JupyterChart by @jonmmease in #3172
Full Changelog: v5.0.1...v5.1.0