Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

R: rel_to_sql() generates invalid SQL for r_dataframe_scan() #7

Closed
2 tasks done
krlmlr opened this issue Apr 10, 2023 · 10 comments
Closed
2 tasks done

R: rel_to_sql() generates invalid SQL for r_dataframe_scan() #7

krlmlr opened this issue Apr 10, 2023 · 10 comments
Assignees
Labels
bug an unexpected problem or unintended behavior

Comments

@krlmlr
Copy link
Collaborator

krlmlr commented Apr 10, 2023

What happens?

To what extent is the SQL generated by rel_to_sql() supposed to be runnable? A simple example involving row_number() already gives parse errors.

To Reproduce

con <- DBI::dbConnect(duckdb::duckdb())
experimental <- FALSE
invisible(DBI::dbExecute(con, "CREATE MACRO \"==\"(a, b) AS a = b"))
df1 <- data.frame(a = 1)

rel1 <- duckdb:::rel_from_df(con, df1, experimental = experimental)
rel2 <- duckdb:::rel_project(
  rel1,
  list({
    tmp_expr <- duckdb:::expr_window(duckdb:::expr_function("row_number", list()), list(), list(), offset_expr = NULL, default_expr = NULL)
    duckdb:::expr_set_alias(tmp_expr, "___row_number")
    tmp_expr
  })
)
sql <- duckdb:::rel_to_sql(rel2)
writeLines(sql)
#> SELECT row_number(, NULL, NULL) OVER () AS ___row_number FROM (SELECT * FROM r_dataframe_scan(0x107deb930, (experimental = false))) AS dataframe_4427004208_1881778879
DBI::dbGetQuery(con, sql)
#> Error: Parser Error: syntax error at or near ","
#> LINE 1: SELECT row_number(, NULL, NULL) OVER () AS ___row_number ...
#>                           ^

Created on 2023-04-10 with reprex v2.0.2

OS:

macOS aarch64

DuckDB Version:

7bdaddf8e4504405218c8521c7c38d9c3abf33f6

DuckDB Client:

R

Full Name:

Kirill Müller

Affiliation:

cynkra GmbH

Have you tried this on the latest master branch?

  • I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • I agree
@hannes
Copy link
Member

hannes commented Apr 12, 2023

It's supposed to be runnable so this is a bug

@krlmlr krlmlr added the bug an unexpected problem or unintended behavior label Apr 16, 2023
@Tmonster
Copy link
Contributor

Tmonster commented May 9, 2023

@krlmlr can this be closed?

@krlmlr krlmlr changed the title R: rel_to_sql() generates invalid SQL for row_number() R: rel_to_sql() generates invalid SQL for r_dataframe_scan() May 11, 2023
@krlmlr
Copy link
Collaborator Author

krlmlr commented May 11, 2023

I now see:

con <- DBI::dbConnect(duckdb::duckdb())
experimental <- FALSE
invisible(DBI::dbExecute(con, "CREATE MACRO \"==\"(a, b) AS a = b"))
df1 <- data.frame(a = 1)

rel1 <- duckdb:::rel_from_df(con, df1, experimental = experimental)
rel2 <- duckdb:::rel_project(
  rel1,
  list({
    tmp_expr <- duckdb:::expr_window(duckdb:::expr_function("row_number", list()), list(), list(), offset_expr = NULL, default_expr = NULL)
    duckdb:::expr_set_alias(tmp_expr, "___row_number")
    tmp_expr
  })
)
sql <- duckdb:::rel_to_sql(rel2)
writeLines(sql)
#> SELECT row_number() OVER () AS ___row_number FROM (SELECT * FROM r_dataframe_scan(0x1282ed580, (experimental = false))) AS dataframe_4969125248_1395266360
DBI::dbGetQuery(con, sql)
#> Error: Parser Error: syntax error at or near "x1282ed580"
#> LINE 1: ... FROM (SELECT * FROM r_dataframe_scan(0x1282ed580, (experimental = false))) AS...
#>                                                   ^

Created on 2023-05-11 with reprex v2.0.2

I wonder if we can emit an inline table instead of r_dataframe_scan(), to generate something along the following lines:

SELECT row_number() OVER () AS ___row_number FROM (SELECT 1.0 AS a) AS dataframe_4969125248_1395266360

Bonus points if this supports data frames with zero or more than one row.

@Tmonster
Copy link
Contributor

So this is because the query executes on a raw pointer for the data frame scan. If this could execute this would be bad for multiple reasons. I think we can close this issue and make another one focused on changing the default table name for data frame scans? Might also be good to discuss in our meeting.

You can also still generate valid & runnable sql if you create the table in the duckdb instance.

con <- DBI::dbConnect(duckdb::duckdb())
experimental <- FALSE
invisible(DBI::dbExecute(con, "CREATE MACRO \"==\"(a, b) AS a = b"))
dbExecute(con, "CREATE TABLE t1 (a int, b varchar);")
rel1 <- duckdb:::rel_from_table(con, "t1");
rel2 <- duckdb:::rel_project(
  rel1,
  list({
    tmp_expr <- duckdb:::expr_window(duckdb:::expr_function("row_number", list()), list(), list(), offset_expr = NULL, default_expr = NULL)
    duckdb:::expr_set_alias(tmp_expr, "___row_number")
    tmp_expr
  })
)
sql <- duckdb:::rel_to_sql(rel2)
writeLines(sql)
# SELECT row_number() OVER () AS ___row_number FROM MAIN.t1
DBI::dbGetQuery(con, sql)
# [1] ___row_number
# <0 rows> (or 0-length row.names)

@krlmlr
Copy link
Collaborator Author

krlmlr commented May 16, 2023

This issue isn't about making r_dataframe_scan() work, it's about emitting alternative SQL for that call. Let's discuss.

@github-actions
Copy link
Contributor

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

@krlmlr
Copy link
Collaborator Author

krlmlr commented Sep 4, 2023

Still interested in a solution here.

@hannes hannes transferred this issue from duckdb/duckdb Sep 13, 2023
@hannes
Copy link
Member

hannes commented Sep 13, 2023

This is actually desired behaviour, we don't want to be able to create pointer values from SQL because of the security implications.

@hannes hannes closed this as completed Sep 13, 2023
@hannes
Copy link
Member

hannes commented Sep 13, 2023

What we could do here is 1) enable a replacement scan for R data frames that looks in the enclosing scope for data frames with the table name and 2) override the ToString of r_dataframe_scan to have the name. However, I think the workaround that Tom provided is sufficient for a debugging feature.

Copy link
Contributor

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue and link to this old issue if necessary.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 13, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
bug an unexpected problem or unintended behavior
Projects
None yet
Development

No branches or pull requests

3 participants