-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Escape \0 in identifiers and literals #2900
base: master
Are you sure you want to change the base?
Conversation
This should probably throw an (early) error instead of silently changing the data, similar to what |
We can certainly do that, but it would be more expensive and given I doubt anyone has hit this issue so far (ever!) it feels undesirable to make everyone incur this performance cost just in case someone at some time ever tries to put |
CI failed for a reason seemingly unrelated to this PR, so I've pushed a bump commit. |
Can you quantify the performance cost? As an alternative, |
I was thinking about escapeIdentifier and adding a check before running the regex, which would mean looping over the string twice (both via regexp). But of course we can throw an error in the Almost 5 years ago, the escapeIdentifier logic was changed from a loop to a regexp: On that commit you'll see there's a comment from me in 2019 where I benchmarked to find that this was a 11x performance gain. Sadly jsperf has been broken for years, and of course JS performance has changed since then, so I've built a new benchmark for three implementations: https://jsbench.me/bblf6kjk4z/2 function escapeIdentifierLoop(str) {
var escaped = '"'
for (var i = 0; i < str.length; i++) {
var c = str[i]
if (c === '"') {
escaped += c + c
} else if (c === '\0') {
throw new Error("\\0 forbidden");
} else {
escaped += c
}
}
escaped += '"'
return escaped
}
function escapeIdentifierRegexpAndError(str) {
if (/\0/.test(str)) throw new Error("\\0 forbidden");
return '"' + str.replace(/"/g, '""') + '"'
}
function escapeIdentifierRegexpAndInvisibleSub(str) {
return '"' + str.replace(/["\0]/g, '""') + '"'
}
function escapeIdentifierUnsafe(str) {
return '"' + str.replace(/"/g, '""') + '"'
}
const strings = [];
const escaped = [];
for (let i = 0; i < 100_000; i++) {
strings.push(String(i) + "some_identifier" + String(i));
} (Note: none of the benchmarked strings contain The result in Chrome is that my original proposal is indeed fastest, but only by ~25% (6.9 Mops/s vs 5.5 Mops/s) - I was really expecting a prior check to double the time but of course it's ready only so I guess it has less overhead. Even two regexps is still significantly faster than replacing the regexps with a loop (~1.8 Mops/s). So I think the performance overhead of checking and throwing an error is probably reasonable, so I've switched over to that. Every call to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor style suggestions, mostly for consistency
Co-authored-by: Charmander <~@charmander.me>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing this!
Current escaping was ported from the PostgreSQL source code, wherein
\0
cannot occur in a string (because C). However, it can in JS.Currently if you
escapeIdentifier
with a\0
in the string, no error is thrown, but when you run the query an 'invalid message format' error is thrown:I don't think any legitimate code would ever trigger this, but it does feel like
escapeIdentifier
should ensure that independent of the input, no PostgreSQL protocol errors should be raised. To address this with minimal impact on performance, I've simply had\0
be replaced with"
. There are definitely other options for handling it, but I don't think any of them are worth the trade-off in performance. Since we already have a loop inescapeLiteral
I just had\0
stripped from there as if it didn't exist.Thanks for
pg
❤️