Remove `LEVELS()` accessor #1726

lionel- · 2024-07-02T10:38:04Z

Part of #1706

`LEVELS()` was added in #1187.

It is used to detect non-UTF-8 strings in `r_obj_encode_utf8()` and to optimise the default case where no encoding conversion is needed:

- Avoid cloning a STRSXP if no translation is needed.
- Avoid allocating a CHARSXP for elements that need no translation (this only matters when the vector mixes encodings, so it is less important, I guess?).
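To make the two points concrete, here is a minimal, self-contained C sketch that does not use R's headers. The types `chr_t` and `str_vec_t`, the `ENC_*` marks and the function names are hypothetical stand-ins for CHARSXP, STRSXP and `r_obj_encode_utf8()`, and only Latin-1 input is modelled. The input is returned as-is when nothing needs translating; when a clone is needed, elements that are already UTF-8 are shared rather than reallocated.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for R's encoding marks; these are NOT the R API. */
typedef enum { ENC_ASCII, ENC_UTF8, ENC_LATIN1 } enc_t;

/* Models a CHARSXP: immutable bytes plus an encoding mark. */
typedef struct {
  enc_t enc;
  const char *bytes;
} chr_t;

/* Models a STRSXP: a vector of strings. */
typedef struct {
  size_t n;
  chr_t *elts;
} str_vec_t;

/* Plays the role LEVELS() played: does this element need translation
   to UTF-8? Only Latin-1 is modelled in this sketch. */
static int needs_translation(const chr_t *c) {
  return c->enc == ENC_LATIN1;
}

/* Latin-1 bytes 0x80-0xFF become two-byte UTF-8 sequences. */
static char *latin1_to_utf8(const char *src) {
  size_t len = strlen(src);
  char *out = malloc(2 * len + 1);
  if (out == NULL) return NULL;
  char *p = out;
  for (const unsigned char *s = (const unsigned char *) src; *s != '\0'; ++s) {
    if (*s < 0x80) {
      *p++ = (char) *s;
    } else {
      *p++ = (char) (0xC0 | (*s >> 6));
      *p++ = (char) (0x80 | (*s & 0x3F));
    }
  }
  *p = '\0';
  return out;
}

/* Returns x itself when nothing needs translation (no allocation at all).
   Otherwise returns a new vector whose UTF-8/ASCII elements share their
   bytes with x and whose Latin-1 elements are freshly converted.
   Returns NULL on allocation failure (partial results leak, for brevity). */
static str_vec_t *encode_utf8(str_vec_t *x) {
  size_t i = 0;
  while (i < x->n && !needs_translation(&x->elts[i])) ++i;
  if (i == x->n) return x;  /* Common case: no clone. */

  str_vec_t *out = malloc(sizeof *out);
  if (out == NULL) return NULL;
  out->n = x->n;
  out->elts = malloc(x->n * sizeof *out->elts);
  if (out->elts == NULL) { free(out); return NULL; }
  /* Shallow copy: untranslated elements keep pointing at the same bytes. */
  memcpy(out->elts, x->elts, x->n * sizeof *out->elts);

  for (; i < x->n; ++i) {
    if (!needs_translation(&x->elts[i])) continue;
    char *utf8 = latin1_to_utf8(x->elts[i].bytes);
    if (utf8 == NULL) return NULL;
    out->elts[i].bytes = utf8;
    out->elts[i].enc = ENC_UTF8;
  }
  return out;
}
```

In real R code the predicate is the piece that depended on `LEVELS()` and needs an API replacement; sharing untranslated elements is cheap there because CHARSXPs are cached and not modified in place.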

We could just call `Encoding(x) <- "UTF-8"` from R. This unconditionally clones the input vector unless it has no references. To preserve our optimisation, either of these would work:

- Add a predicate to the C API to determine whether a CHARSXP is encoded in UTF-8.
- Improve `do_setencoding()` to duplicate only when needed, so that it is a no-op in the common case. Bonus points if it's exported on the C side, e.g. as `Rf_EnsureUtf8()`?

My sense is that these new unconditional allocations would hurt performance in dplyr. @DavisVaughan, you added these optimisations for vctrs; could you please confirm?


lionel- · 2024-07-02T11:26:26Z

There's now wch/r-source@0c753e4, so we should be able to preserve our optimisations.

Closes #1726

lionel- mentioned this issue on Jul 2, 2024: Non-API calls conundrum #1706 (open)

lionel- added a commit that referenced this issue on Jul 2, 2024: Remove dependency on LEVELS() (a4bbdf2)

Closes #1726

lionel- linked a pull request on Jul 2, 2024 that will close this issue: Remove dependency on LEVELS() #1728 (open)
