You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I was looking into lib/factor/factor_v7.cpp and see code like if (*nrOfLevels < 128). In the comment it says
// use 1 byte per int (Na encoding takes 1 bit)
which seems to be "wasting" the other 7 bits once that one bit is used, technically can support up to 256 distinct values (including NA and NaN).
Without much background, I assume it's to do with how R encodes the values, so it's always stored as int instead of unsigned int. I know if it's too expensive in terms of performance to relax this to 256 by converting to unsigned int. I know Julia supports unsigned with its UInt8 type.
The text was updated successfully, but these errors were encountered:
Hi @xiaodaigh, thanks for your question. In an R factor, the value NA is encoded as a NA value in the value vector:
# some factorx<-factor(sample(LETTERS, 10), levels=LETTERS)
# set factor value to NAx[5] <-NA# underlying value is set to NA
as.integer(x)
#> [1] 23 22 16 18 NA 5 8 4 2 26
This could have been done better in R I guess, for example by coding the value 0 as the NA, but with the current implementation that leads to an error:
For performance reasons, fst takes bit 32 from the factor values and adds bit 0-7 to that to get a single byte. So these 7 bits can only be used for < 128 levels. I could also re-code value 0 as an NA, but that would require more processing and would reduce the speed of the filter...
I was looking into lib/factor/factor_v7.cpp and see code like
if (*nrOfLevels < 128)
. In the comment it sayswhich seems to be "wasting" the other 7 bits once that one bit is used, technically can support up to 256 distinct values (including NA and NaN).
Without much background, I assume it's to do with how R encodes the values, so it's always stored as
int
instead ofunsigned int
. I know if it's too expensive in terms of performance to relax this to 256 by converting tounsigned int
. I know Julia supports unsigned with its UInt8 type.The text was updated successfully, but these errors were encountered: