-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix #11811 insensitive header name set #11823
Fix #11811 insensitive header name set #11823
Conversation
Fix #11811 insensitive header name set by: + Using a EnumSet and TreeSet to ensure no duplicates in the set + Using an ArrayList to preserve the ordering (not necessary, but useful).
jetty-core/jetty-http/src/main/java/org/eclipse/jetty/http/HttpFields.java
Show resolved
Hide resolved
jetty-core/jetty-http/src/main/java/org/eclipse/jetty/http/HttpFields.java
Outdated
Show resolved
Hide resolved
return new AbstractSet<>() | ||
{ | ||
@Override | ||
public Iterator<String> iterator() | ||
{ | ||
return list.iterator(); | ||
} | ||
|
||
@Override | ||
public int size() | ||
{ | ||
return list.size(); | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically Set implementations support random access or O(log(size)) and rarely have linear access. This may cause surprise performance issues in the calling code.
What about replacing the List<String> list
variable with a hash set, while retaining all other logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The returned set will have "random access" via the set API, but it will be implemented by AbstractSet
as an O(n) search. Moreover, a user is unlikely to be using the header name set for random access, as they have the direct get
methods available to seek and individual field. The only realistic usage of the header set is to iterator over it's values to see all headers. The list is a perfect compromise in this case.
Note also, that due to HTTP headers need to know about ordering, we have already determined with JMH benchmarks that for realistic sizes of HTTP header collections, it is more efficient to do an O(n) search than to have the expense of building a O(log(n)) map. Thus rather than creating an ordered map of HttpFields or a map and a list, we simple create a list of HttpFields. So the get
methods are actually O(N) as well. This is something we have carefully considered and will continue to monitor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on assumptions how the user uses this set. They may iterate over it (then we are good), or they may use contains
and then we're less good. We do not need a sorted set here. HashSet would be faster to build than TreeSet and would have same semantics as the set returned here. If ordering of headers returned by this method is somehow important, LinkedHashSet could provide for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, from code maintainer perspective, the previous version of the code didn't look like super optimized -- it used streams' along with their overhead (for example in Trino project we area heavy stream users except for data runtime, where we avoid them). The new code doesn't look as super optimized to me because of AbstractSet. If it indeed is carefully optimized now, my personal preference would be to put a code comment informing the reader about the tradeoffs considered (like "this is optimal for requests that have no more than about x headers").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is based on assumptions how the user uses this set.
Indeed, but they are assumptions based on good experience and real world data. See the detailed benchmarks and discussions in #3682
They may iterate over it (then we are good), or they may use
contains
and then we're less good.
Our evidence is that if they usecontains
we are still good. Only if they usecontains
many times does the list performance approach the cost of building a hashmap or other structure.
We do not need a sorted set here.
It is not needed, but HTTP headers are ordered, so maintaining that order can improve compliance with many SHOULD aspects of the RFCs that do specify some aspects of ordering.
HashSet would be faster to build than TreeSet and would have same semantics as the set returned here.
Except that HashSet is not case insensitive, so additional work is need to change cases etc.
If ordering of headers returned by this method is somehow important, LinkedHashSet could provide for that.
again, not suitable for case insensitive. Also, the order we want is not the natural sort order of the keys.
BTW, from code maintainer perspective, the previous version of the code didn't look like super optimized -- it used streams' along with their overhead (for example in Trino project we area heavy stream users except for data runtime, where we avoid them).
We totally agree that streams are not really optimal.
The new code doesn't look as super optimized to me because of AbstractSet.
I would not say it is super optimized, but it is somewhat optimized as informed by #3682
If it indeed is carefully optimized now, my personal preference would be to put a code comment informing the reader about the tradeoffs considered (like "this is optimal for requests that have no more than about x headers").
We tend not to clutter the code with such comments, as the issues and PRs that have all the detail are both persistent and linked from source code. However, I'm not opposed to making the comment in the HttpFields class generally that it is implemented internally as a list. I'll reopen #11811 to remind myself to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HashSet would be faster to build than TreeSet and would have same semantics as the set returned here.
Except that HashSet is not case insensitive, so additional work is need to change cases etc.
The Set<String>
returned here is not case insensitive either, by virtue of inheriting from AbstractSet
and defining only iterator and size. AbstractSet
uses Object.equals
semantics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findepi said:
The Set returned here is not case insensitive either
Good point! I'm working on fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #11846... still a work in progress.
Fix #11811 insensitive header name set by: