Integer64 still remains numeric upon opening with read_fst #267
Hi @markdanese, thanks for reporting your issue! I think this is just the way that
library(bit64)
# write and read again
read_write_cycle <- function(x) {
  # fileext needs the leading dot to produce a ".fst" extension
  tmp_file <- tempfile(fileext = ".fst")
  x |>
    fst::write_fst(tmp_file)
  fst::read_fst(tmp_file)
}
# sample table with very large integers
x <- data.frame(
  LongInt = bit64::as.integer64(sample(1e10:(1e10 + 100), 1000, replace = TRUE))
)
typeof(x$LongInt)
#> [1] "double"
str(x$LongInt)
#> integer64 [1:1000] 10000000047 10000000005 10000000008 10000000083 10000000077 10000000054 10000000026 10000000039 ...
y <- read_write_cycle(x)
typeof(y$LongInt)
#> [1] "double"
str(y$LongInt)
#> integer64 [1:1000] 10000000047 10000000005 10000000008 10000000083 10000000077 10000000054 10000000026 10000000039 ...
I hope that helps!
@MarcusKlik It is more than just naming because it leads to my code failing. I have to load the bit64 package in order for my code to work. This never used to be the case (I have used this script for several years). But it is certainly my fault for not putting together a reproducible example. And it may be an issue with the bit64 package (or some strange interaction between it and fst). Or perhaps even something related to my M1 Mac since it started after I got a new computer. I will try to create an example and see if that helps. |
This looks to me like an rstudio issue. If you can't repro the creation - can you share a file demonstrating it? |
Hi @markdanese, I can't see anything funny about the returned
# write and read again
read_write_cycle <- function(x) {
  # fileext needs the leading dot to produce a ".fst" extension
  tmp_file <- tempfile(fileext = ".fst")
  x |>
    fst::write_fst(tmp_file)
  fst::read_fst(tmp_file)
}
# sample table with very large integers
x <- data.frame(
  LongInt = bit64::as.integer64(sample(1e10:(1e10 + 100), 1000, replace = TRUE))
)
y <- read_write_cycle(x)
attributes(x$LongInt)
#> $class
#> [1] "integer64"
attributes(y$LongInt)
#> $class
#> [1] "integer64"
Perhaps you can still add a reproducible example showing the exact problem? Thanks!
My apologies for not getting back to this. As I said above, it may not be an fst problem. But I wanted others to know about it, and how to work around it, in case someone else ran into this problem. Hopefully this is more helpful. Please let me know if you need anything else.
Here is the output on my computer from the script above:
I just updated all of my packages to R 4.3.2 and the results are the same. Just FYI.
Hi @markdanese, I think the issue is with how
library(bit64)
x <- data.frame(
  person_id = c(1346900019, 1348000031),
  age = c(80, 75)
)
# column types
typeof(x$person_id)
#> [1] "double"
typeof(x$age)
#> [1] "double"
# registered as integer64
is.integer64(x$person_id)
#> [1] FALSE
In this case, your columns are both stored as doubles. If you want to use integer64 and integer columns, you could use:
library(bit64)
x <- data.frame(
  person_id = as.integer64(c(1346900019, 1348000031)),
  age = c(80L, 75L)
)
# column types
typeof(x$person_id)
#> [1] "double"
typeof(x$age)
#> [1] "integer"
# registered as integer64
is.integer64(x$person_id)
#> [1] TRUE
You can see that although the integer64 column is registered as such, in fact it's just a double in memory (that works because a double has a byte length of 8, the same as integer64).
After a write/read cycle, the column types are preserved:
# write read cycle
write_fst(x, "test_fst.fst")
y <- read_fst("test_fst.fst")
# column types
typeof(y$person_id)
#> [1] "double"
typeof(y$age)
#> [1] "integer"
# registered as integer64
is.integer64(y$person_id)
#> [1] TRUE
print(y$person_id)
#> integer64
#> [1] 1346900019 1348000031
Does that answer your question?
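To make the point about in-memory storage a bit more concrete, here is a minimal sketch, assuming the bit64 package is installed. The variable names (`id`, `raw_value`, `restored`) are only illustrative. It shows that an integer64 value is a double vector with a class attribute, and that dropping the attribute exposes the same 8 bytes read as a double, which comes out as a tiny number on the order of 1e-315 for this id.

```r
library(bit64)

# The same person id used in the example above, stored as integer64
id <- as.integer64(1346900019)

# The underlying storage is a double vector that carries a class attribute
typeof(id)
class(id)

# Dropping the class attribute leaves the raw 8 bytes, now read as a double
raw_value <- unclass(id)
print(raw_value)

# Putting the class attribute back makes the value readable again;
# the stored bytes are never changed
restored <- raw_value
oldClass(restored) <- "integer64"
print(restored)
```

So values in scientific notation with very small exponents may simply be integer64 data shown without the bit64 methods, rather than corrupted data.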
Please feel free to close this issue if it isn't related to fst. I really don't understand how all of this works, and I don't follow your explanation. It is still a problem for me. I can't open my fst files and use them without the bit64 library call. As you can see above, if I don't call bit64 explicitly, I get person identifier numbers that look like "6.654570e-315" which are not usable. I don't "want" to use bit64. It is the only thing that seems to make my data usable when reading in an fst file.
Just in case my example was misleading, this shows the problem in selecting a record immediately after opening the fst file.
I apologize in advance for not having a reproducible example but I don't really know how to do it in this situation. Will amend this if I figure something out.
I use write_fst() to save a file that uses integer64 for the person id. When I open the file using read_fst(), it comes in as numeric with an 'integer64' label (which I have never seen before). When I look at the file, it is all scientific notation. See below for the person_id for the cohort object.
However, when I simply run bit64::is.integer64(cohort$person_id) (which returns TRUE), its class immediately becomes integer64 and all is fine. There is no resaving or any changes to the actual files other than this single line of code. See below for the status upon refreshing the view in RStudio.
So it seems that when restoring an integer64 file, something isn't quite completing the process to make the variable an integer64. This is all done in a fresh session.
If this should be directed elsewhere, please let me know.
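For anyone landing here with the same symptom, below is a minimal sketch of the workaround described above, assuming the fst and bit64 packages are installed. The temporary file and the two sample ids are made up for illustration. One plausible reading of the behaviour is that any `bit64::` call loads the bit64 namespace, which registers its print and format methods for the integer64 class. The sketch makes that step explicit with `loadNamespace()` instead of relying on a side effect.

```r
# Write a small table with an integer64 id column to a temporary fst file
tmp_file <- tempfile(fileext = ".fst")
x <- data.frame(
  person_id = bit64::as.integer64(c(1346900019, 1348000031))
)
fst::write_fst(x, tmp_file)

# Read it back; the integer64 class attribute should come back with the column
cohort <- fst::read_fst(tmp_file)
inherits(cohort$person_id, "integer64")

# Make sure the bit64 namespace is loaded so its methods are available
# (in a fresh session, this is the step that is easy to miss)
loadNamespace("bit64")
isNamespaceLoaded("bit64")

# With the methods available, the ids are shown as whole numbers
as.character(cohort$person_id)
```

Putting `library(bit64)` or `loadNamespace("bit64")` at the top of a script that reads such files avoids depending on an incidental `bit64::` call later on.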