Fix #11811 insensitive header name set #11823

gregw · 2024-05-21T01:56:37Z

Fix #11811 insensitive header name set by:

Using a EnumSet and TreeSet to ensure no duplicates in the set
Using an ArrayList to preserve the ordering (not necessary, but useful).

Fix #11811 insensitive header name set by: + Using a EnumSet and TreeSet to ensure no duplicates in the set + Using an ArrayList to preserve the ordering (not necessary, but useful).

jetty-core/jetty-http/src/main/java/org/eclipse/jetty/http/HttpFields.java

…aderNameSet

findepi · 2024-05-23T11:57:33Z

jetty-core/jetty-http/src/main/java/org/eclipse/jetty/http/HttpFields.java

+        return new AbstractSet<>()
+        {
+            @Override
+            public Iterator<String> iterator()
+            {
+                return list.iterator();
+            }
+
+            @Override
+            public int size()
+            {
+                return list.size();
+            }
+        };


Typically Set implementations support random access or O(log(size)) and rarely have linear access. This may cause surprise performance issues in the calling code.
What about replacing the List<String> list variable with a hash set, while retaining all other logic?

The returned set will have "random access" via the set API, but it will be implemented by AbstractSet as an O(n) search. Moreover, a user is unlikely to be using the header name set for random access, as they have the direct get methods available to seek and individual field. The only realistic usage of the header set is to iterator over it's values to see all headers. The list is a perfect compromise in this case.

Note also, that due to HTTP headers need to know about ordering, we have already determined with JMH benchmarks that for realistic sizes of HTTP header collections, it is more efficient to do an O(n) search than to have the expense of building a O(log(n)) map. Thus rather than creating an ordered map of HttpFields or a map and a list, we simple create a list of HttpFields. So the get methods are actually O(N) as well. This is something we have carefully considered and will continue to monitor.

This is based on assumptions how the user uses this set. They may iterate over it (then we are good), or they may use contains and then we're less good. We do not need a sorted set here. HashSet would be faster to build than TreeSet and would have same semantics as the set returned here. If ordering of headers returned by this method is somehow important, LinkedHashSet could provide for that.

BTW, from code maintainer perspective, the previous version of the code didn't look like super optimized -- it used streams' along with their overhead (for example in Trino project we area heavy stream users except for data runtime, where we avoid them). The new code doesn't look as super optimized to me because of AbstractSet. If it indeed is carefully optimized now, my personal preference would be to put a code comment informing the reader about the tradeoffs considered (like "this is optimal for requests that have no more than about x headers").

This is based on assumptions how the user uses this set.

Indeed, but they are assumptions based on good experience and real world data. See the detailed benchmarks and discussions in #3682

They may iterate over it (then we are good), or they may use contains and then we're less good.
Our evidence is that if they use contains we are still good. Only if they use contains many times does the list performance approach the cost of building a hashmap or other structure.

We do not need a sorted set here.
It is not needed, but HTTP headers are ordered, so maintaining that order can improve compliance with many SHOULD aspects of the RFCs that do specify some aspects of ordering.

HashSet would be faster to build than TreeSet and would have same semantics as the set returned here.

Except that HashSet is not case insensitive, so additional work is need to change cases etc.

If ordering of headers returned by this method is somehow important, LinkedHashSet could provide for that.

again, not suitable for case insensitive. Also, the order we want is not the natural sort order of the keys.

BTW, from code maintainer perspective, the previous version of the code didn't look like super optimized -- it used streams' along with their overhead (for example in Trino project we area heavy stream users except for data runtime, where we avoid them).

We totally agree that streams are not really optimal.

The new code doesn't look as super optimized to me because of AbstractSet.

I would not say it is super optimized, but it is somewhat optimized as informed by #3682

If it indeed is carefully optimized now, my personal preference would be to put a code comment informing the reader about the tradeoffs considered (like "this is optimal for requests that have no more than about x headers").

We tend not to clutter the code with such comments, as the issues and PRs that have all the detail are both persistent and linked from source code. However, I'm not opposed to making the comment in the HttpFields class generally that it is implemented internally as a list. I'll reopen #11811 to remind myself to do that.

HashSet would be faster to build than TreeSet and would have same semantics as the set returned here.

Except that HashSet is not case insensitive, so additional work is need to change cases etc.

The Set<String> returned here is not case insensitive either, by virtue of inheriting from AbstractSet and defining only iterator and size. AbstractSet uses Object.equals semantics.

@findepi said:

The Set returned here is not case insensitive either

Good point! I'm working on fixes.

See #11846... still a work in progress.

Fix #11811 insensitive header name set

309f95c

Fix #11811 insensitive header name set by: + Using a EnumSet and TreeSet to ensure no duplicates in the set + Using an ArrayList to preserve the ordering (not necessary, but useful).

gregw requested review from sbordet, janbartel and lachlan-roberts May 21, 2024 01:57

lachlan-roberts requested changes May 21, 2024

View reviewed changes

jetty-core/jetty-http/src/main/java/org/eclipse/jetty/http/HttpFields.java Show resolved Hide resolved

jetty-core/jetty-http/src/main/java/org/eclipse/jetty/http/HttpFields.java Outdated Show resolved Hide resolved

updates from review

e62a8af

gregw requested a review from lachlan-roberts May 21, 2024 07:17

janbartel approved these changes May 21, 2024

View reviewed changes

lachlan-roberts approved these changes May 22, 2024

View reviewed changes

Merge branch 'jetty-12.0.x' into fix/jetty-12.0.x/11811/insensitiveHe…

a198651

…aderNameSet

sbordet approved these changes May 22, 2024

View reviewed changes

gregw merged commit 10d0898 into jetty-12.0.x May 22, 2024
9 checks passed

gregw deleted the fix/jetty-12.0.x/11811/insensitiveHeaderNameSet branch May 22, 2024 20:42

findepi reviewed May 23, 2024

View reviewed changes

gregw mentioned this pull request May 27, 2024

Fix contains in HttpFields name set and prove random access to HttpFields via EnumMap not worth it. #11846

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #11811 insensitive header name set #11823

Fix #11811 insensitive header name set #11823

gregw commented May 21, 2024

findepi May 23, 2024

gregw May 23, 2024

findepi May 24, 2024

findepi May 24, 2024

gregw May 27, 2024 •

edited

Loading

findepi May 27, 2024

gregw May 27, 2024

gregw May 27, 2024

Fix #11811 insensitive header name set #11823

Fix #11811 insensitive header name set #11823

Conversation

gregw commented May 21, 2024

findepi May 23, 2024

Choose a reason for hiding this comment

gregw May 23, 2024

Choose a reason for hiding this comment

findepi May 24, 2024

Choose a reason for hiding this comment

findepi May 24, 2024

Choose a reason for hiding this comment

gregw May 27, 2024 • edited Loading

Choose a reason for hiding this comment

findepi May 27, 2024

Choose a reason for hiding this comment

gregw May 27, 2024

Choose a reason for hiding this comment

gregw May 27, 2024

Choose a reason for hiding this comment

gregw May 27, 2024 •

edited

Loading