Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle Array(Bytes) encoding for query params containing non-Unicode text #290

Merged
merged 1 commit into from
Oct 25, 2024

Conversation

jgaskins
Copy link
Contributor

@jgaskins jgaskins commented Oct 13, 2024

Ran into #267 tonight trying to bulk-insert MessagePack-encoded data (via unnest) and can confirm that it's Array(Bytes) that it's choking on. Since all arrays are text-encoded, we need to ensure text encoding for Bytes can handle binary data.

I tried special-casing Array(Bytes) all the way down, but not only does that require a lot more understanding of the binary encoding format than I currently have, it would've been an incomplete solution that only handles linear bytea arrays. Anyone using nested arrays would run into the same problem. So I decided against that.

I don't like that this solution increases the payload size over the wire by encoding every byte as 2 hexadecimal characters. Arrays of strings are implemented in terms of this method, so this impacts text[] encoding, but it doesn't impact encoding of strings outside of arrays, so performance impact should be minimal for most use cases. And every other PQ implementation I've found so far, other than libpq itself, also uses the hex encoding so it seems that if we're slow we're at least in good company. I think my ideal solution would be to convert all param types to binary encoding, but this was a far easier step and I couldn't find the specification for the binary encoding and it's late and I'm sleeeeeeepy.

Please check my work on this. I did confirm that the test case deserializes into the expected string:

db.query_each "SELECT '\\x7468697320697320612022736c69636522'::bytea" do |rs|
  puts String.new rs.read(Bytes)
end
# => this is a "slice"

… and my MessagePack-encoded data unpacks correctly after bulk-inserting it now, but I'm not confident there isn't something I've overlooked.

Fixes #267

Since arrays are text-encoded, we need to text-encode them inside
arrays.
@jgaskins
Copy link
Contributor Author

Arrays of strings are implemented in terms of this method, so this impacts text[] encoding

I just realized, this part may be a lie because this spec still passes. I must've read the code wrong.

@will
Copy link
Owner

will commented Oct 21, 2024

Arrays of strings are implemented in terms of this method, so this impacts text[] encoding

I just realized, this part may be a lie because this spec still passes. I must've read the code wrong.

Ok yeah, I was mulling that line over a bit but I think it's not impacted.

I plan to merge this later this week thanks for sending it in!

@will will merged commit 2c7461a into will:master Oct 25, 2024
1 check passed
@will
Copy link
Owner

will commented Oct 25, 2024

Thanks!!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Exception sending query with bytea[] binary array type?
2 participants