Allow for "direct query" style queries #2333
Replies: 3 comments
-
Apart from the cases where the current flow is simply not feasible, it would also make reports on large tables more editor-friendly. For instance, imagine that the |
Beta Was this translation helpful? Give feedback.
-
I like this. At the moment we're focused on building out what you can do in the client side run time. For example, more interactive components like pivot tables and the recent dimension grid, as well as increasing the interactive performance from supporting source tables with ~1 million records to 10-20 million, but at some point we will likely explore pushing queries down to the data warehouse. |
Beta Was this translation helpful? Give feedback.
-
If I'm correctly following the logic described in the Universal SQL article, the browser-based DuckDB WASM currently is limited to querying ONLY the Parquet files created by `npm run source. Is that correct? Given that DuckDB WASM has the ability to directly query files in object storage (e.g., partitioned Parquet files on AWS S3) or remote tables in MotherDuck, what would prevent Evidence's DuckDB WASM from using the cached Parquet file(s) for some queries and, say, remote Parquet files on S3 for drill downs to more detail? I can imagine that making this local/remote Data Source setup would be more complicated for other databases such as Snowflake or BigQuery (e.g., pushing queries down to the data warehouse, per @mcrascal). However, it seems that the DuckDB and MotherDuck sources might be straightforward, given DuckDB's built-in ability to query files in cloud object storage. Please let me know if I'm missing something. 😁 Our use case involves Evidence reports built upon Parquet files (soon to be Iceberg formatted) hosted on S3, with each file (table) partitioned by day. We've built Evidence reports from this S3-hosted data that create cached Parquet files containing daily tables for the current day, as well as aggregated monthly tables for the most recent 6 months. These cached Parquet files make our reports render quite quickly, and give us the filter interactivity that we need. However, for example, we'd like to give users the ability to look at daily tables for alternative days, without having to create cached Parquet files for all of those days. Similarly, our users sometimes will be looking at a report based on a set of monthly tables that are cached, and then want to drill down into data from one or more daily tables that are not cached. In these circumstances, adding the interactivity delay of waiting for queries against tables on S3 would be no problem. So, we'd love to have a DuckDB-based Data Source setup so that it automatically caches the current daily tables to Parquet in the browser, yet pushes any queries to non-cached daily tables off to the partitioned-by-day Parquet files in object storage. Does that make sense for Evidence? |
Beta Was this translation helpful? Give feedback.
-
Feature Description
While the choice of pre-loading all the data together with DuckDB makes the experience of using evidence quite snappy, it also means that some types of dashboard are not viable - simply because they would require loading way too much data. It would be great if it were possible to allow some queries to hit the database directly.
While it might seems undesirable, the number of OLAP databases that could handle this type of workload has exploded, and it would make the deployment easier and solution space (much) larger.
Goal of Feature
Expand the solution space from pre-aggregated data that fits easily into memory to the full set of (ad-hoc) questions one could ask from the data. For instance, drill downs per product, country, membership status, ...
Current Solution / Workarounds
Currently we need to have a ton of queries that precompute the drill downs we think are interesting, and disallow more than a single variable in these drill downs (ie sales per brand, sales per country, sales per channel)
Alternatives
Use evidence as a nice looking (very basic) email report, and use another tool for the deep-dives. Super unsatisfying, because it would require multiple tools again (and those tools can usually also send emails)
Beta Was this translation helpful? Give feedback.
All reactions