feat: Add `col.types` argument to `duckdb_read_csv()` #445

eli-daniels · 2024-09-30T17:18:20Z

This addresses 2 issues: #118 and #383

`col.types` can be given either as a named character vector that supplies both the column names and their types, e.g. `col.types = c(col0 = 'VARCHAR', col1 = 'DOUBLE')`, or as an unnamed character vector, in which case the column names are taken from the `read.csv` output or from the `col.names` argument. Column names given by `col.types` take precedence over `col.names`.

The types are specified as DuckDB data types, e.g. VARCHAR, DOUBLE, BIGINT.
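For illustration, here is a minimal sketch of the named-vector form described above. It assumes a duckdb R package version that already includes this PR; the table name `scores`, the temporary CSV file, and its columns are made up for the example.

```r
library(DBI)
library(duckdb)

# Write a small CSV to a temporary file (illustrative data only).
csv_path <- tempfile(fileext = ".csv")
write.csv(
  data.frame(id = 1:3, score = c(1.5, 2.5, 3.5), label = c("a", "b", "c")),
  csv_path,
  row.names = FALSE
)

con <- dbConnect(duckdb())

# Named vector: names are the column names, values are DuckDB data types.
duckdb_read_csv(
  con, "scores", csv_path,
  col.types = c(id = "BIGINT", score = "DOUBLE", label = "VARCHAR")
)

# Inspect the column types DuckDB recorded for the new table.
res <- dbGetQuery(
  con,
  "SELECT column_name, data_type
     FROM information_schema.columns
    WHERE table_name = 'scores'
    ORDER BY ordinal_position"
)
print(res)

dbDisconnect(con, shutdown = TRUE)
```

With an unnamed vector, the same call would rely on `col.names` or on the names detected by `read.csv`, as described above.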

As part of this, I changed `dbWriteTable` to `dbCreateTable`, so I could easily add the temporary parameter mentioned in #142, but I don't want to get ahead of myself.

I also made a minor addition to the docs to mention that the CSV files are appended to the table if it already exists. I hope that is okay.

Happy to modify anything that may be amiss!

Closes #383.

krlmlr · 2024-10-14T18:49:15Z

Thanks for the effort, good idea!

This function is getting too complex now. Is there a way to split it into more focused functions, one of which would be responsible for inferring the column types? Can you propose a refactoring that will later make it easy to add this functionality?

The branch now also conflicts with the main branch, sorry about that.

xiaodaigh · 2024-10-15T13:05:48Z

R/csv.R

@@ -12,6 +14,9 @@
 #' @param delim Which field separator should be used
 #' @param quote Which quote character is used for columns in the CSV file
 #' @param col.names Override the detected or generated column names
+#' @param col.types Character vector of column types in the same order as col.names,
+#' or a named character vector where names are column names and types pairs.
+#' Valid types are DuckDB data types, e.g. VARCHAR, DOUBLE, DATE, BIGINT, BOOLEAN, etc.


Instead of saying "etc.", can we point to the documentation somewhere?

xiaodaigh · 2024-10-15T13:12:05Z

tests/testthat/test-read.R

+      col.types = c(
+        Sepal.Length = "DOUBLE",
+        Sepal.Width = "DOUBLE",
+        Petal.Length = "DOUBLE",
+        Petal.Width = "DOUBLE",
+        Species = "DOUBLE"
+      )


There are no tests for date types, which feels risky.
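A test along the following lines would cover a DATE column. This is only a sketch, not the test that was eventually added to the PR: the helper `read_dates_fixture()`, the table name `dates`, and the CSV contents are hypothetical, and it assumes a duckdb version that supports `col.types`.

```r
library(DBI)
library(duckdb)
library(testthat)

# Hypothetical helper: writes a two-row CSV, loads it with explicit
# column types, and returns the resulting table as a data frame.
read_dates_fixture <- function() {
  csv_path <- tempfile(fileext = ".csv")
  writeLines(c("day,value", "2024-01-01,1.5", "2024-01-02,2.5"), csv_path)

  con <- dbConnect(duckdb())
  # Close the connection when the helper returns.
  on.exit(dbDisconnect(con, shutdown = TRUE), add = TRUE)

  duckdb_read_csv(
    con, "dates", csv_path,
    col.types = c(day = "DATE", value = "DOUBLE")
  )
  dbReadTable(con, "dates")
}

test_that("duckdb_read_csv() reads a DATE column declared in col.types", {
  res <- read_dates_fixture()
  expect_s3_class(res$day, "Date")
  expect_equal(res$day, as.Date(c("2024-01-01", "2024-01-02")))
  expect_type(res$value, "double")
})
```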

eli-daniels · 2024-10-16T12:51:47Z

Yep, I'll give it a shot and add some tests and better docs.

Is there any way to avoid recompiling DuckDB when running R CMD check?

krlmlr · 2024-10-17T05:33:23Z

I use ccache to speed up recompilation: https://stackoverflow.com/a/45512708/946850 and https://dirk.eddelbuettel.com/blog/2017/11/27/

eli-daniels · 2024-10-23T08:26:24Z

I've refactored it a bit and added a DATE type test. Ready for review.

If there is anything else I can do to make it better, let me know.

R CMD check gives: [ FAIL 0 | WARN 0 | SKIP 52 | PASS 4810 ]

krlmlr

Thanks, looks like CI/CD is failing.

eli-daniels · 2024-10-28T08:06:27Z

Thanks for the review @krlmlr

krlmlr · 2024-10-28T16:55:12Z

Thanks! This is a good start, but ideally, I'd like to proceed as outlined in #118 (comment). Merging for now, let's see.

eli-daniels · 2024-10-28T18:55:38Z

Agreed, but this adds the functionality asked for across a few issues. Hopefully it helps someone. I'm still thinking about how to approach the method discussed in #118.

eli-daniels and others added 2 commits September 30, 2024 19:10

add col.types to duckdb_read_csv

99aef28

Merge branch 'main' into main

63c80cf

eli-daniels and others added 2 commits October 14, 2024 22:27

Merge branch 'main' into resolve-conflict

f153a8b

Merge branch 'main' into resolve-conflict

d773adb

xiaodaigh reviewed Oct 15, 2024

Merge branch 'main' into resolve-conflict

ba5592a

krlmlr force-pushed the main branch from 640f15f to 169cc0a October 19, 2024 15:51

eli-daniels and others added 5 commits October 21, 2024 15:38

Merge branch 'main' into resolve-conflict

bc936dd

ref : duckdb_read_csv

25fa363

feat : add dates test to read_csv_duckdb

f98dca3

post check addition

652562d

Merge branch 'main' into resolve-conflict

9763564

refactor tests for more insightful test messages

618a163

krlmlr reviewed Oct 27, 2024

eli-daniels and others added 2 commits October 28, 2024 08:45

Merge branch 'main' into resolve-conflict

aa63c0f

Apply suggestions from code review

4317965

eli-daniels and others added 4 commits October 28, 2024 09:47

fix: name to names

a9c281f

doc: add link to duckdb data type docs

6ec79ad

Formatting

b60728c

Document

9319b63

krlmlr enabled auto-merge October 28, 2024 16:53

krlmlr changed the title ~~add col.types to duckdb_read_csv~~ feat: Add col.types argument to duckdb_read_csv() Oct 28, 2024

krlmlr added this pull request to the merge queue Oct 28, 2024

Merged via the queue into duckdb:main with commit 7c8f306 Oct 28, 2024
23 checks passed

eli-daniels deleted the resolve-conflict branch October 28, 2024 18:55
