Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #11811 insensitive header name set #11823
Fix #11811 insensitive header name set #11823
Changes from all commits
309f95c
e62a8af
a198651
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically Set implementations support random access or O(log(size)) and rarely have linear access. This may cause surprise performance issues in the calling code.
What about replacing the
List<String> list
variable with a hash set, while retaining all other logic?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The returned set will have "random access" via the set API, but it will be implemented by
AbstractSet
as an O(n) search. Moreover, a user is unlikely to be using the header name set for random access, as they have the directget
methods available to seek and individual field. The only realistic usage of the header set is to iterator over it's values to see all headers. The list is a perfect compromise in this case.Note also, that due to HTTP headers need to know about ordering, we have already determined with JMH benchmarks that for realistic sizes of HTTP header collections, it is more efficient to do an O(n) search than to have the expense of building a O(log(n)) map. Thus rather than creating an ordered map of HttpFields or a map and a list, we simple create a list of HttpFields. So the
get
methods are actually O(N) as well. This is something we have carefully considered and will continue to monitor.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on assumptions how the user uses this set. They may iterate over it (then we are good), or they may use
contains
and then we're less good. We do not need a sorted set here. HashSet would be faster to build than TreeSet and would have same semantics as the set returned here. If ordering of headers returned by this method is somehow important, LinkedHashSet could provide for that.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, from code maintainer perspective, the previous version of the code didn't look like super optimized -- it used streams' along with their overhead (for example in Trino project we area heavy stream users except for data runtime, where we avoid them). The new code doesn't look as super optimized to me because of AbstractSet. If it indeed is carefully optimized now, my personal preference would be to put a code comment informing the reader about the tradeoffs considered (like "this is optimal for requests that have no more than about x headers").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, but they are assumptions based on good experience and real world data. See the detailed benchmarks and discussions in #3682
Except that HashSet is not case insensitive, so additional work is need to change cases etc.
again, not suitable for case insensitive. Also, the order we want is not the natural sort order of the keys.
We totally agree that streams are not really optimal.
I would not say it is super optimized, but it is somewhat optimized as informed by #3682
We tend not to clutter the code with such comments, as the issues and PRs that have all the detail are both persistent and linked from source code. However, I'm not opposed to making the comment in the HttpFields class generally that it is implemented internally as a list. I'll reopen #11811 to remind myself to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
Set<String>
returned here is not case insensitive either, by virtue of inheriting fromAbstractSet
and defining only iterator and size.AbstractSet
usesObject.equals
semantics.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findepi said:
Good point! I'm working on fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #11846... still a work in progress.