I'm finally coming back around to thinking about this, and wanted to jot down some more detailed plans.

Rust

Simplify transform evaluation

In the core Rust implementation, I would like to simplify how Vega transforms are converted into representations for evaluation. Currently, Vega transforms are implemented against our custom … The main implementation of this trait is the …

I developed this architecture in order to support generating SQL for non-DataFusion dialects. The only current application of this functionality is using DuckDB as an alternative SQL engine in Python. Testing the other dialects is a challenge, and currently the only tests are snapshot tests that I've manually validated using a Hex notebook with each data connection.

In the meantime, DataFusion has added support for unparsing LogicalPlans back to SQL in a few dialects (currently DataFusion, Postgres, MySQL, and SQLite). My assumption is that the DataFusion or Postgres dialects will be compatible with DuckDB for the subset of functionality that VegaFusion relies on, so it should be possible to maintain DuckDB support while dramatically simplifying the architecture with this approach.

The simplest option here is probably to keep our current DataFrame abstraction, but turn it from a trait back into a struct. The method implementations would use a DataFusion … With this approach, we can drop the …

Drop some DataFusion UDFs

DataFusion's feature set has increased dramatically across the board since I initially wrote VegaFusion, so it shouldn't be necessary to use as many custom UDFs. We'll still want an architecture that makes it easy to use UDFs/UDAFs, but we should be able to remove a bunch at this point. Looking at how DataFusion's unparser works, it appears that the names of custom UDFs that aren't otherwise intercepted are passed through to the generated SQL.
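To make the pass-through behavior concrete, here is a toy Python model of it. This is not DataFusion's Rust unparser API: the expression classes, the INTERCEPTED rewrite table, and the vf_double function are all hypothetical. The unparser rewrites the function names it knows about and emits every other name verbatim, and SQLite's create_function stands in for an engine (such as DuckDB) that already defines a function with that name.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Col:
    """Reference to a column by name."""
    name: str


@dataclass(frozen=True)
class Lit:
    """Numeric or string literal."""
    value: Union[int, float, str]


@dataclass(frozen=True)
class Call:
    """Scalar function call, e.g. a custom UDF."""
    func: str
    args: Tuple["Expr", ...]


Expr = Union[Col, Lit, Call]

# Hypothetical rewrite rules for functions the unparser "intercepts".
INTERCEPTED = {"character_length": "length"}


def unparse(expr: Expr) -> str:
    """Render an expression tree as SQL text."""
    if isinstance(expr, Col):
        return '"' + expr.name.replace('"', '""') + '"'
    if isinstance(expr, Lit):
        if isinstance(expr.value, str):
            return "'" + expr.value.replace("'", "''") + "'"
        return repr(expr.value)
    if isinstance(expr, Call):
        # Names without a rewrite rule (custom UDFs) pass through unchanged.
        name = INTERCEPTED.get(expr.func, expr.func)
        args = ", ".join(unparse(arg) for arg in expr.args)
        return f"{name}({args})"
    raise TypeError(f"unsupported expression: {expr!r}")


def run_on_sqlite(expr: Expr) -> List[tuple]:
    """Execute the unparsed SQL on an engine that defines vf_double."""
    conn = sqlite3.connect(":memory:")
    # The engine provides a native function with the UDF's name.
    conn.create_function("vf_double", 1, lambda v: v * 2)
    conn.execute("CREATE TABLE t (x INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
    sql = f'SELECT {unparse(expr)} FROM t ORDER BY "x"'
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    expr = Call("vf_double", (Col("x"),))
    print(unparse(expr))        # vf_double("x")
    print(run_on_sqlite(expr))  # [(2,), (4,), (6,)]
```

In the real pipeline the expression tree would be a DataFusion LogicalPlan and the rewrite rules would belong to the dialect; the sketch only shows that an unrecognized function name survives unparsing, so the SQL works whenever the target engine defines that name.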
Because the names of unrecognized UDFs pass through unchanged, if we were only targeting DuckDB to start, we could implement DuckDB functions as DataFusion UDFs/UDAFs and the generated SQL would work out.

Lift ChartState

I'd like to move the …

JavaScript / WASM

If we update …

This doesn't need to be part of 2.0, but unlike in past versions, DataFusion now supports evaluating queries when compiled to WASM, so it should also be possible to use VegaFusion entirely client-side with DataFusion and/or DuckDB. Due to package size, it may make sense to publish separate packages for the workflow of connecting VegaFusion to a runtime on the server and the workflow of running VegaFusion entirely in the browser. But we can see whether the package size difference is large enough to warrant this.

Python

Drop Altair Functionality

VegaFusion was initially designed to work with Altair entirely from the outside, and this is how it's still documented at vegafusion.io. Since then, we've integrated nearly all of VegaFusion's original Altair functionality into Altair itself, including …

I'd like to remove all of this functionality from the …

Merge vegafusion-python-embed

It has become more complicated than helpful to have … I'd like to merge these together into a single …

Drop vegafusion-jupyter, rework VegaFusionWidget

One use case that isn't handled by Altair's … With this change, we can drop the vegafusion-jupyter package.

Use Narwhals and PyCapsule API

To make VegaFusion a little lighter weight for use with Polars, I'd like to remove the hard … This would make it possible to use Altair+VegaFusion with Polars without pulling in pandas or pyarrow.

Java

The Java API is pretty incomplete, is broken on CI, and isn't used as far as I know, so I'd like to drop it for now. If someone has a use case for it in the future, it will be easy to restore from the git history.

Documentation

Content

The current VegaFusion docs focus primarily on the original Altair integration. This should all be removed and replaced by a few links to the Altair documentation.
Instead, the docs should primarily focus on the topics outlined in https://vegafusion.io/low_level.html. Some possible angles:
Location

I'd like to move the docs into the vegafusion repo and add a pixi task to sync them to the docs repo for GitHub Pages. I'd also like to move the integration demos to the main repo.

Next steps

I'm planning to create a v2 branch and then target it with PRs that implement the above.

After 2.0

After VegaFusion 2.0, I'm most interested in integrating Avenger into VegaFusion to support rendering select marks from a chart into images. My general idea is that VegaFusion should be able to replace a mark (like a symbol mark) with a Vega image mark containing the result of rendering the original mark with Avenger. My plan is to scale the image using the same scales as the original mark, so that things like pan and zoom still work. When displayed in an interactive context like JupyterChart, the image would re-render asynchronously during pan and zoom operations. The feel should be similar to panning and zooming in mapping software, where the map tiles fill in asynchronously.

This will provide a way to support scatter charts with millions of points, and it will be possible to create rect marks with millions of instances, which can be used to represent heatmaps and images.
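A rough sketch of the mark-replacement step, treating the Vega spec as a plain Python dict. Everything here is an assumption for illustration: replace_with_image and its extent parameter are hypothetical names, the rendered image is assumed to arrive as a URL (for example a PNG data URL), and no Avenger API is used. It also assumes the original mark encodes x and y through named scales, and it reuses those scale names so the image tracks the same domains.

```python
import copy
from typing import Any, Dict, Tuple

Spec = Dict[str, Any]


def _scale_name(mark: Spec, channel: str) -> str:
    """Find the scale an existing mark uses for a position channel."""
    for encode_set in ("update", "enter"):
        props = mark.get("encode", {}).get(encode_set, {})
        if isinstance(props.get(channel), dict) and "scale" in props[channel]:
            return props[channel]["scale"]
    raise ValueError(f"mark {mark.get('name')!r} has no scaled {channel!r} channel")


def replace_with_image(
    spec: Spec,
    mark_name: str,
    url: str,
    extent: Tuple[float, float, float, float],
) -> Spec:
    """Return a copy of spec with the named top-level mark swapped for an image mark.

    extent is (x_min, x_max, y_min, y_max) in data coordinates; the image is
    positioned through the original mark's scales.
    """
    new_spec = copy.deepcopy(spec)
    marks = new_spec.get("marks", [])
    for i, mark in enumerate(marks):
        if mark.get("name") != mark_name:
            continue
        x_scale, y_scale = _scale_name(mark, "x"), _scale_name(mark, "y")
        x_min, x_max, y_min, y_max = extent
        marks[i] = {
            "type": "image",
            "name": f"{mark_name}_image",
            "encode": {
                "update": {
                    "url": {"value": url},
                    "x": {"scale": x_scale, "value": x_min},
                    "x2": {"scale": x_scale, "value": x_max},
                    # y ranges are usually inverted, so y_max is the top edge.
                    "y": {"scale": y_scale, "value": y_max},
                    "y2": {"scale": y_scale, "value": y_min},
                    # Stretch to fill the box instead of preserving aspect ratio.
                    "aspect": {"value": False},
                }
            },
        }
        return new_spec
    raise KeyError(f"no top-level mark named {mark_name!r}")
```

Because the image's position channels go through the chart's scales, any signal that changes those scale domains (as pan and zoom do) re-runs the update encoding and moves the stretched image; re-rendering it at the new view would be a separate, asynchronous step.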
Wanted to share some general plans for what I'm picturing for VegaFusion 2.0.
Background
To recap, VegaFusion 1.0 marked several important milestones for the project:
- … vf.transformed_data function.
- Version 1.2 introduced a suite of save functions for exporting Altair charts to external file formats after performing VegaFusion's pre-transform process.

Since VegaFusion 1.2, I've been working on integrating these same features into Altair itself. Altair 5.1 includes the initial integration with VegaFusion for:

- … "vegafusion" data transformer causes the existing Altair renderers to use VegaFusion to pre-transform chart specifications before sending the results to the browser. It also causes the existing Altair chart.save and chart.to_json methods to use VegaFusion.

Drop Altair features and vegafusion-jupyter package

As of Altair 5.1, the only Altair feature of VegaFusion that's not possible with Altair directly is the VegaFusion widget renderer. Yesterday, I opened a PR (vega/altair#3281) to update Altair's JupyterChart to support the functionality of VegaFusionWidget, where interactive chart transformations are performed in Python.

Once this is merged into Altair and released, there will no longer be any reason for an end user to import vegafusion as vf, as all of the VegaFusion Altair functionality will be available directly in Altair. There will also be no need to install the vegafusion-jupyter Python package.

This is really exciting! It makes VegaFusion useful to a much larger user base. For VegaFusion 2.0, I'd like to remove all of the Altair functionality from the vegafusion Python package and remove the vegafusion-jupyter package from the VegaFusion repo.

Combine vegafusion and vegafusion-python-embed packages

vegafusion-python-embed is the native Python/Rust library, and vegafusion is a pure Python package that is intended to be the public interface to VegaFusion's functionality. These are separate packages because I was picturing supporting the scenario where the vegafusion package communicated with VegaFusion server over gRPC. I didn't see this all the way through, and I haven't run into any demand for this feature. Combining these packages would let us remove the nascent code for communicating with VegaFusion server, and it would remove the burden of making sure the versions of vegafusion and vegafusion-python-embed match.

I also want to fully type the pure Python API so it's easier to use from Altair.
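For context on the version-matching burden, this is the kind of runtime guard a two-package split tends to need. It is a hypothetical sketch (check_versions is not an existing VegaFusion function) built on the standard library's importlib.metadata; it takes the version lookup as a parameter so it can be exercised without either package installed, and it is fully type-annotated in the spirit of the typed API goal above.

```python
from importlib import metadata
from typing import Callable, Optional


def check_versions(
    pure_pkg: str = "vegafusion",
    native_pkg: str = "vegafusion-python-embed",
    get_version: Callable[[str], str] = metadata.version,
) -> Optional[str]:
    """Return an error message if the two packages disagree, else None."""
    try:
        pure_version = get_version(pure_pkg)
        native_version = get_version(native_pkg)
    except metadata.PackageNotFoundError as err:
        return f"required package not installed: {err}"
    if pure_version != native_version:
        # The pure Python package expects the native extension at its own version.
        return (
            f"{pure_pkg}=={pure_version} requires "
            f"{native_pkg}=={pure_version}, found {native_version}"
        )
    return None
```

With a single combined package, the native extension ships inside the same distribution, so a check like this has nothing left to compare.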
Documentation
The VegaFusion documentation would need a near total overhaul, as it's mostly focused on the Altair functionality. The new documentation should focus on VegaFusion's role as a collection of building blocks for scaling Vega systems.