Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #11811 insensitive header name set #11823

Merged
merged 3 commits into from
May 22, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,7 @@

package org.eclipse.jetty.http;

import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Collections;
import java.util.EnumSet;
Expand All @@ -24,6 +25,7 @@
import java.util.NoSuchElementException;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.TimeUnit;
import java.util.function.BiFunction;
import java.util.function.BiPredicate;
Expand Down Expand Up @@ -596,7 +598,41 @@ default Enumeration<String> getFieldNames()
*/
default Set<String> getFieldNamesCollection()
{
return stream().map(HttpField::getName).collect(Collectors.toSet());
Set<HttpHeader> seenByHeader = EnumSet.noneOf(HttpHeader.class);
Set<String> seenByName = null;
List<String> list = new ArrayList<>(size());
lachlan-roberts marked this conversation as resolved.
Show resolved Hide resolved

for (HttpField f : this)
{
HttpHeader header = f.getHeader();
if (header == null)
{
if (seenByName == null)
seenByName = new TreeSet<>(String::compareToIgnoreCase);
if (seenByName.add(f.getName()))
list.add(f.getName());
}
else if (seenByHeader.add(header))
{
list.add(f.getName());
}
}

// use the list to retain a rough ordering
return new AbstractSet<>()
{
@Override
public Iterator<String> iterator()
{
return list.iterator();
}

@Override
public int size()
{
return list.size();
}
};
Comment on lines +622 to +635
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typically Set implementations support random access or O(log(size)) and rarely have linear access. This may cause surprise performance issues in the calling code.
What about replacing the List<String> list variable with a hash set, while retaining all other logic?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The returned set will have "random access" via the set API, but it will be implemented by AbstractSet as an O(n) search. Moreover, a user is unlikely to be using the header name set for random access, as they have the direct get methods available to seek and individual field. The only realistic usage of the header set is to iterator over it's values to see all headers. The list is a perfect compromise in this case.

Note also, that due to HTTP headers need to know about ordering, we have already determined with JMH benchmarks that for realistic sizes of HTTP header collections, it is more efficient to do an O(n) search than to have the expense of building a O(log(n)) map. Thus rather than creating an ordered map of HttpFields or a map and a list, we simple create a list of HttpFields. So the get methods are actually O(N) as well. This is something we have carefully considered and will continue to monitor.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is based on assumptions how the user uses this set. They may iterate over it (then we are good), or they may use contains and then we're less good. We do not need a sorted set here. HashSet would be faster to build than TreeSet and would have same semantics as the set returned here. If ordering of headers returned by this method is somehow important, LinkedHashSet could provide for that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, from code maintainer perspective, the previous version of the code didn't look like super optimized -- it used streams' along with their overhead (for example in Trino project we area heavy stream users except for data runtime, where we avoid them). The new code doesn't look as super optimized to me because of AbstractSet. If it indeed is carefully optimized now, my personal preference would be to put a code comment informing the reader about the tradeoffs considered (like "this is optimal for requests that have no more than about x headers").

Copy link
Contributor Author

@gregw gregw May 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is based on assumptions how the user uses this set.

Indeed, but they are assumptions based on good experience and real world data. See the detailed benchmarks and discussions in #3682

They may iterate over it (then we are good), or they may use contains and then we're less good.
Our evidence is that if they use contains we are still good. Only if they use contains many times does the list performance approach the cost of building a hashmap or other structure.

We do not need a sorted set here.
It is not needed, but HTTP headers are ordered, so maintaining that order can improve compliance with many SHOULD aspects of the RFCs that do specify some aspects of ordering.

HashSet would be faster to build than TreeSet and would have same semantics as the set returned here.

Except that HashSet is not case insensitive, so additional work is need to change cases etc.

If ordering of headers returned by this method is somehow important, LinkedHashSet could provide for that.

again, not suitable for case insensitive. Also, the order we want is not the natural sort order of the keys.

BTW, from code maintainer perspective, the previous version of the code didn't look like super optimized -- it used streams' along with their overhead (for example in Trino project we area heavy stream users except for data runtime, where we avoid them).

We totally agree that streams are not really optimal.

The new code doesn't look as super optimized to me because of AbstractSet.

I would not say it is super optimized, but it is somewhat optimized as informed by #3682

If it indeed is carefully optimized now, my personal preference would be to put a code comment informing the reader about the tradeoffs considered (like "this is optimal for requests that have no more than about x headers").

We tend not to clutter the code with such comments, as the issues and PRs that have all the detail are both persistent and linked from source code. However, I'm not opposed to making the comment in the HttpFields class generally that it is implemented internally as a list. I'll reopen #11811 to remind myself to do that.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HashSet would be faster to build than TreeSet and would have same semantics as the set returned here.

Except that HashSet is not case insensitive, so additional work is need to change cases etc.

The Set<String> returned here is not case insensitive either, by virtue of inheriting from AbstractSet and defining only iterator and size. AbstractSet uses Object.equals semantics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@findepi said:

The Set returned here is not case insensitive either

Good point! I'm working on fixes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #11846... still a work in progress.

}

/**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -335,6 +335,47 @@ public void testGet()
assertThrows(NoSuchElementException.class, () -> header.getField(2));
}

@Test
public void testCaseInsensitive()
{
HttpFields header = HttpFields.build()
.add("expect", "100")
.add("RaNdOm", "value")
.add("Accept-Charset", "UTF-8")
.add("accept-charset", "UTF-16")
.add("foo-bar", "one")
.add("Foo-Bar", "two")
.asImmutable();

assertThat(header.get("expect"), is("100"));
assertThat(header.get("Expect"), is("100"));
assertThat(header.get("EXPECT"), is("100"));
assertThat(header.get("eXpEcT"), is("100"));
assertThat(header.get(HttpHeader.EXPECT), is("100"));

assertThat(header.get("random"), is("value"));
assertThat(header.get("Random"), is("value"));
assertThat(header.get("RANDOM"), is("value"));
assertThat(header.get("rAnDoM"), is("value"));
assertThat(header.get("RaNdOm"), is("value"));

assertThat(header.get("Accept-Charset"), is("UTF-8"));
assertThat(header.get("accept-charset"), is("UTF-8"));
assertThat(header.get(HttpHeader.ACCEPT_CHARSET), is("UTF-8"));

assertThat(header.getValuesList("Accept-Charset"), contains("UTF-8", "UTF-16"));
assertThat(header.getValuesList("accept-charset"), contains("UTF-8", "UTF-16"));
assertThat(header.getValuesList(HttpHeader.ACCEPT_CHARSET), contains("UTF-8", "UTF-16"));

assertThat(header.get("foo-bar"), is("one"));
assertThat(header.get("Foo-Bar"), is("one"));
assertThat(header.getValuesList("foo-bar"), contains("one", "two"));
assertThat(header.getValuesList("Foo-Bar"), contains("one", "two"));

// We know the order of the set is deterministic
assertThat(header.getFieldNamesCollection(), contains("expect", "RaNdOm", "Accept-Charset", "foo-bar"));
}

@ParameterizedTest
@MethodSource("mutables")
public void testGetKnown(HttpFields.Mutable header)
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -118,6 +118,60 @@ public void destroy() throws Exception
LifeCycle.stop(_server);
}

@Test
public void testCaseInsensitiveHeaders() throws Exception
{
final AtomicReference<Boolean> resultIsSecure = new AtomicReference<>();

startServer(new HttpServlet()
{
@Override
protected void doGet(HttpServletRequest request, HttpServletResponse resp)
{
assertThat(request.getHeader("accept"), is("*/*"));
assertThat(request.getHeader("Accept"), is("*/*"));
assertThat(request.getHeader("ACCEPT"), is("*/*"));
assertThat(request.getHeader("AcCePt"), is("*/*"));

assertThat(request.getHeader("random"), is("value"));
assertThat(request.getHeader("Random"), is("value"));
assertThat(request.getHeader("RANDOM"), is("value"));
assertThat(request.getHeader("rAnDoM"), is("value"));
assertThat(request.getHeader("RaNdOm"), is("value"));

assertThat(request.getHeader("Accept-Charset"), is("UTF-8"));
assertThat(request.getHeader("accept-charset"), is("UTF-8"));

assertThat(Collections.list(request.getHeaders("Accept-Charset")), contains("UTF-8", "UTF-16"));
assertThat(Collections.list(request.getHeaders("accept-charset")), contains("UTF-8", "UTF-16"));

assertThat(request.getHeader("foo-bar"), is("one"));
assertThat(request.getHeader("Foo-Bar"), is("one"));
assertThat(Collections.list(request.getHeaders("foo-bar")), contains("one", "two"));
assertThat(Collections.list(request.getHeaders("Foo-Bar")), contains("one", "two"));

assertThat(Collections.list(request.getHeaderNames()),
contains("Host", "Connection", "Accept", "RaNdOm", "Accept-Charset", "Foo-Bar"));
}
});

String rawResponse = _connector.getResponse(
"""
GET / HTTP/1.1
Host: local
Connection: close
Accept: */*
RaNdOm: value
Accept-Charset: UTF-8
accept-charset: UTF-16
Foo-Bar: one
foo-bar: two

""");
HttpTester.Response response = HttpTester.parseResponse(rawResponse);
assertThat(response.getStatus(), is(HttpStatus.OK_200));
}

@Test
public void testIsSecure() throws Exception
{
Expand Down
Loading