Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conditionally validate names of Record, Enums and Fixed. #415

Merged
merged 6 commits into from
Jul 17, 2024

Conversation

papanikge
Copy link
Contributor

Apache Avro spec has some naming guidelines for Record, enums and fixed. This library correctly checks that and fails when a file has a badly named field.

However, the official Avro library does not 😢.
avro-tools tojson <badly-named-file.ocf> happily reads the file (same goes for AWS - I think that's because they use Java)

There is a discussion about it here in the mailing list, and furthermore Python seems to have apache/avro#1995

Relates and closes issue #414

Copy link
Member

@nrwiersma nrwiersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, just got back from vacation.

While I am generally against globals, I do wonder if in this case it would be a more sane option to have a single global flag given the circumstances. Thoughts?

schema_parse.go Outdated Show resolved Hide resolved
schema_parse.go Outdated Show resolved Hide resolved
@papanikge
Copy link
Contributor Author

Sorry for the delay, just got back from vacation.

Absolutely no worries.

While I am generally against globals, I do wonder if in this case it would be a more sane option to have a single global flag given the circumstances. Thoughts?

Ha! Yeah I thought of that, but I couldn't spoil another person's repo without their consent 😄 😛

I agree, these changes barely justify the cause, and a global solution could easily fit, if we are sure that there won't be another similar change in the future. Arguably though, if something else comes up, we can refactor the global into a DecoderConfig or something similar.

I'll do an attempt so we can see if it looks ok-ish

@papanikge papanikge force-pushed the conditional-validate-names branch from 58b6f09 to baeea99 Compare July 16, 2024 14:54
@papanikge
Copy link
Contributor Author

papanikge commented Jul 16, 2024

@nrwiersma I just pushed. Please let me know what you think.
Not sure of how the API should look like so I went ahead and kept the DecoderFunc generic.

I also added a unit test with an invalid file

@papanikge papanikge requested a review from nrwiersma July 16, 2024 14:59
Copy link
Member

@nrwiersma nrwiersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks way better to me, great work.

ocf/ocf.go Outdated
type DecoderFunc func(*decoderConfig)

// WithSkipNameValidation sets whether to skip validating of field names when decoding.
func WithSkipNameValidation(skip bool) DecoderFunc {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would urge caution here. It is unclear that this is setting a global value, so creating 2 decoders, one with this option and one without, it would be expected that they behave differently, however they will not. I would suggest leaving this out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, good callout. I didn't think of that :(
How would the usage would look like though?
Do you prefer the callers setting the var from outside the package?
I find this a bit weird, although I might be just me.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see 2 options here:

  1. Expose the global var (it is currently exposed) and let people set it directly.
  2. Do not expose the global var and provide a function to set it, like SkipNameValidation(true).

They both largely equate to the same thing, so either way is fine by me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps in the second option the name could be something like GlobalSkipNameValidation, but yeah, a global var setting would do the same semantically.

I'll remove it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Please take another look

@nrwiersma
Copy link
Member

Perhaps it would be good to document this in the README as well, to make it more discoverable.

@papanikge
Copy link
Contributor Author

Perhaps it would be good to document this in the README as well, to make it more discoverable.

Makes sense. I added it in the Avro Schema Validation section

Copy link
Member

@nrwiersma nrwiersma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉 Thanks.

@nrwiersma nrwiersma merged commit 8ea2833 into hamba:main Jul 17, 2024
2 checks passed
@papanikge
Copy link
Contributor Author

Thank you. Can we get a new version tag too? 😇

@papanikge papanikge deleted the conditional-validate-names branch July 17, 2024 08:55
@nrwiersma
Copy link
Member

Yes, will tag in a day or 2 to see there are no more changes coming.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants