Part of #1706

`LEVELS()` was added in #1187. It is used to detect non-UTF-8 strings in `r_obj_encode_utf8()` and to optimise the default case where no encoding conversion is needed:

- Avoid cloning a `STRSXP` if no translation is needed.
- Avoid allocating a `CHARSXP` if no translation is needed (in case of mixed encodings in the vector; less important, I guess?).

We could just call `Encoding(x) <- "UTF-8"` from R. This will unconditionally clone the input vector unless it doesn't have any references. To preserve our optimisation, either of these would work:

- Add a predicate to the C API to determine if a `CHARSXP` is encoded in UTF-8.
- Improve `do_setencoding()` to only duplicate if needed, so that it's a no-op in the common case. Bonus points if exported on the C side, e.g. as `Rf_EnsureUtf8()`?

My sense is that these new unconditional allocs would be bad for performance in dplyr. @DavisVaughan you added these optimisations for vctrs, could you confirm please?
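To make the "only duplicate if needed" idea concrete, here is a minimal standalone sketch in plain C. It does not use R's C API: `toy_str`, `toy_vec`, `toy_is_utf8()` and `toy_ensure_utf8()` are hypothetical stand-ins for a `CHARSXP`, a `STRSXP`, the proposed predicate and the proposed copy-on-demand helper. It also assumes the only non-UTF-8 encoding is Latin-1, and it never frees the translated strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-ins for R's CHARSXP and STRSXP. None of this is R's C API. */
typedef enum { ENC_ASCII, ENC_UTF8, ENC_LATIN1 } toy_enc;

typedef struct {
    const char *bytes; /* NUL-terminated string data */
    toy_enc enc;       /* declared encoding of `bytes` */
} toy_str;

typedef struct {
    size_t n;
    toy_str *elts;
} toy_vec;

/* Allocate or abort, to keep the sketch free of error-handling noise. */
static void *xmalloc(size_t size) {
    void *p = malloc(size);
    if (p == NULL) abort();
    return p;
}

/* Hypothetical predicate: true when the element needs no translation. */
static bool toy_is_utf8(const toy_str *s) {
    return s->enc == ENC_ASCII || s->enc == ENC_UTF8;
}

/* Translate Latin-1 bytes into a freshly allocated UTF-8 string. */
static char *latin1_to_utf8(const char *in) {
    char *out = xmalloc(2 * strlen(in) + 1); /* at most 2 bytes per input byte */
    char *p = out;
    for (const unsigned char *q = (const unsigned char *)in; *q != 0; ++q) {
        if (*q < 0x80) {
            *p++ = (char)*q;
        } else {
            *p++ = (char)(0xC0 | (*q >> 6));
            *p++ = (char)(0x80 | (*q & 0x3F));
        }
    }
    *p = '\0';
    return out;
}

/* Returns `x` itself when every element is already UTF-8, so the common
   case allocates nothing. Otherwise returns a shallow clone of `x` in
   which only the non-UTF-8 elements are replaced by translated copies. */
static toy_vec *toy_ensure_utf8(toy_vec *x) {
    assert(x != NULL);
    toy_vec *out = x;
    for (size_t i = 0; i < x->n; ++i) {
        if (toy_is_utf8(&x->elts[i])) continue;
        if (out == x) {
            /* First element needing translation: clone the vector lazily. */
            out = xmalloc(sizeof *out);
            out->n = x->n;
            out->elts = xmalloc(x->n * sizeof *out->elts);
            memcpy(out->elts, x->elts, x->n * sizeof *out->elts);
        }
        out->elts[i].bytes = latin1_to_utf8(x->elts[i].bytes);
        out->elts[i].enc = ENC_UTF8;
    }
    return out;
}
```

With an all-ASCII or all-UTF-8 vector, `toy_ensure_utf8()` returns its argument unchanged, which mirrors the no-op behaviour requested above. With mixed encodings, the already-UTF-8 elements are shared with the input and only the Latin-1 ones are reallocated.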