add duckdb support #1398

Open
wants to merge 14 commits into
base: main

ahuang11
Collaborator

@ahuang11 ahuang11 commented Aug 22, 2024

Closes #1397

import hvplot.duckdb

import pandas as pd
import duckdb

df = pd.DataFrame({
    'x': [1, 2, 3],
    'y': [4, 5, 6]
})

df.to_parquet('test.parquet')

connection = duckdb.connect(database=':memory:', read_only=False)
connection.from_parquet("test.parquet").hvplot(
    x='x', y='y', kind='scatter'
)

codecov bot commented Aug 22, 2024

Codecov Report

Attention: Patch coverage is 93.20388% with 7 lines in your changes missing coverage. Please review.

Project coverage is 88.86%. Comparing base (efeda78) to head (9ed44f2).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| hvplot/converter.py | 33.33% | 2 Missing ⚠️ |
| hvplot/duckdb.py | 87.50% | 2 Missing ⚠️ |
| hvplot/tests/testpatch.py | 89.47% | 2 Missing ⚠️ |
| hvplot/plotting/core.py | 96.87% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1398      +/-   ##
==========================================
+ Coverage   88.73%   88.86%   +0.12%     
==========================================
  Files          51       52       +1     
  Lines        7592     7694     +102     
==========================================
+ Hits         6737     6837     +100     
- Misses        855      857       +2     
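
The percentages in this report can be reproduced from the hit and line counts in the diff above. The sketch below is only a cross-check: the 103-line patch size is inferred from "93.20388% with 7 lines missing" and is not stated in the report, and truncating (rather than rounding) is an assumption that happens to match the figures shown here.

```python
import math


def pct(hits, lines, digits=2):
    """Coverage percentage, truncated (not rounded) to `digits` decimals."""
    scale = 10 ** digits
    return math.floor(hits / lines * 100 * scale) / scale


# Hits and Lines for base (main) and head (#1398), taken from the diff table.
base = pct(6737, 7592)
head = pct(6837, 7694)

# Assumption: the patch touches 103 coverable lines, 7 of which are missed.
patch_lines = 103
patch = pct(patch_lines - 7, patch_lines, digits=5)
```

With these inputs, `base`, `head`, and `patch` agree with the 88.73%, 88.86%, and 93.20388% quoted in the report.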


@MarcSkovMadsen
Collaborator

Is the long-term goal to implement a real duckdb backend for more efficient usage?

@ahuang11
Collaborator Author

ahuang11 commented Aug 22, 2024

Can you elaborate on real duckdb backend?

This PR doesn't change the converter at all; only patches duckdb so DuckDBPyRelation.hvplot() is available.

It does not do DuckDBPyRelation.fetchdf().hvplot() AFAIK.

@MarcSkovMadsen
Collaborator

Yes.

Is the plan to implement a real HoloViz backend such that you only query the data needed for the plot? For example when using groupby.

@ahuang11
Collaborator Author

ahuang11 commented Aug 28, 2024

I'm not sure I follow; this PR allows calling hvplot on a DuckDBPyRelation, which is already the result of a query.

connection > relation (select * from table where ...) > hvplot

connection.execute("SELECT * FROM table WHERE col = 'A'").hvplot()

Ah I see the confusion:

connection.from_parquet("test.parquet").hvplot(
    x='x', y='y', kind='scatter'
)

This can be drilled down further:

connection.from_parquet("test.parquet").execute(...).hvplot(
    x='x', y='y', kind='scatter'
)

@ahuang11
Collaborator Author

ahuang11 commented Aug 28, 2024

Was slightly mistaken, but a little modification works:
https://github.com/holoviz/hvplot/pull/1398/files#diff-a47979cba5da76fbc1aec07d45b4403241da5cfbbbf6b36652f75c52fb644824R360


Not sure if we should manually add the optimization to subset the x/y columns (e.g. relation.select(x, y)), or let users do it themselves.
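
To make the column-subsetting idea concrete, here is a minimal sketch of how the columns a plot call refers to could be collected before the data is fetched. `columns_needed` and `projection` are hypothetical helpers written for illustration; they are not part of hvplot or DuckDB.

```python
def columns_needed(x=None, y=None, by=None, extra=()):
    """Collect the distinct column names a plot call refers to, in first-seen order."""
    names = []
    for value in (x, y, by, *extra):
        if value is None:
            continue
        # Each argument may be a single name or a list of names.
        for name in ([value] if isinstance(value, str) else value):
            if name not in names:
                names.append(name)
    return names


def projection(names):
    """Render names as a quoted, comma-separated SQL projection list."""
    return ", ".join('"' + name.replace('"', '""') + '"' for name in names)


cols = columns_needed(x="x", y=["y", "z"], by="x")
select_list = projection(cols)
# With a real relation, the subset could then be requested before converting,
# e.g. relation.select(*cols) or an equivalent SQL projection (assumption:
# the exact call depends on the installed DuckDB version).
```

Doing this inside hvplot would save users a step, at the cost of having to track every keyword that can name a column; leaving it to the user keeps the patch small.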

@ahuang11 ahuang11 marked this pull request as ready for review August 28, 2024 20:52
@ahuang11 ahuang11 requested a review from hoxbro August 28, 2024 20:53
@ahuang11
Collaborator Author

ahuang11 commented Aug 28, 2024

Oh, I guess it is converting to a pandas object first:
lambda self: hvPlotTabular(self.df())

https://github.com/holoviz/hvplot/pull/1398/files#diff-be8c4d86a0e1601a53aca277191c3a5d8e160fc39c2b4d75396c83b6a38ae610R15
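
The one-liner quoted above can be read as a property patched onto the relation class, which converts eagerly on access. A minimal sketch of that pattern follows; `Relation`, `PlotAccessor`, and `patch` are hypothetical stand-ins used for illustration, not the actual duckdb or hvplot classes.

```python
class Relation:
    """Hypothetical stand-in for a query-result object such as DuckDBPyRelation."""

    def __init__(self, rows):
        self._rows = rows

    def df(self):
        # Stand-in for materializing the full result as a DataFrame.
        return list(self._rows)


class PlotAccessor:
    """Hypothetical stand-in for a plotting accessor such as hvPlotTabular."""

    def __init__(self, data):
        self.data = data

    def __call__(self, x, y, kind="line"):
        # A real accessor would build a plot; this one only describes it.
        return f"{kind} plot of {y} vs {x} over {len(self.data)} rows"


def patch(cls, name="hvplot"):
    """Attach the accessor to cls as a property that converts on access."""
    setattr(cls, name, property(lambda self: PlotAccessor(self.df())))


patch(Relation)
relation = Relation([{"x": 1, "y": 4}, {"x": 2, "y": 5}, {"x": 3, "y": 6}])
description = relation.hvplot(x="x", y="y", kind="scatter")
```

Because `df()` runs as soon as the property is accessed, every column and row is materialized before the plot arguments are known, which is the behavior questioned earlier in the thread.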

Edit: Now it does what you proposed

Member

@maximlt maximlt left a comment


I'm OK with adding DuckDB support. It looks like you should update the diagram on the landing page too, unless you consider this experimental at first.

@maximlt maximlt added this to the 0.11.0 milestone Sep 13, 2024
@maximlt
Member

maximlt commented Sep 13, 2024

@ahuang11 can you also update this PR based on #1359 now that it has been merged?

@ahuang11
Collaborator Author

Okay I think this is ready. I added the diagrams + added changes from #1359

Member

Oh have you just replaced Intake by DuckDB? Have you recently removed Intake support?

Member

I think he has moved intake up.

Some notes:

  • Arrows are missing
  • Matplotlib is not straight.

Member

Gosh I'm blind!

table = df_duckdb.groupby(['origin', 'mfr'])['mpg'].mean().sort_values().tail(5)
table.hvplot.barh('mfr', 'mpg', by='origin', stacked=True)
```
```{image} ./_static/home/dask.gif
Member

dask.gif shows bokeh.sampledata.penguins.

Suggested change
```{image} ./_static/home/dask.gif
```{image} ./_static/home/pandas.gif

Development

Successfully merging this pull request may close these issues.

Implement a duckdb backend
4 participants