Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Querystring Search Performance #1252

Closed
7 tasks
tisto opened this issue Oct 27, 2021 · 17 comments
Closed
7 tasks

Querystring Search Performance #1252

tisto opened this issue Oct 27, 2021 · 17 comments

Comments

@tisto
Copy link
Member

tisto commented Oct 27, 2021

Possible Solutions

Use GET instead of POST

  • Send GET request instead of POST (for @querystring-search)
  • Send query payload as body (query param contains arrays and nested structures)
  • The URL should contain a subtraversal path: (e.g. "@querystring-search/1669e4ae9c5258f973ea00a6c31578a9)
GET /plone/folder/@querystring-search/1669e4ae9c5258f973ea00a6c31578a9 HTTP/1.1
Accept: application/json
Authorization: Basic YWRtaW46c2VjcmV0
Content-Type: application/json

{
    "body": {
        "b_size": "2",
        "b_start": "0",
        "fullobjects": "False",
        "limit": "10",
        "query": [
            {
                "i": "Title",
                "o": "plone.app.querystring.operation.string.is",
                "v": "Welcome to Plone"
            },
            {
                "i": "path",
                "o": "plone.app.querystring.operation.string.path",
                "v": "/news"
            }
        ],
        "sort_on": "sortable_title",
        "sort_order": "reverse",
    },
    "id": "doc1",
    "method": "GET",
}

Problems to solve

  • plone.rest needs to be able to send a GET request with payload (see Get with payload plone.rest#125)
  • plone.rest needs to be able to encode/decode payload that contains arrays and nested structures (see Get with payload plone.rest#125)
  • plone.rest needs to be able to get/ignore the hash in the subtraversal (might just work because of acquisition?)
  • the frontend needs to be able to generate a unique hash value for a given query
  • the frontend needs to be able to send a GET request with payload in the body (that works with arrays and nested structures)
  • Varnish needs to be able to cache the URL "GET /plone/folder/@querystring-search/1669e4ae9c5258f973ea00a6c31578a9"
  • Cloudflare and other services need to be able to cache the URL "GET /plone/folder/@querystring-search/1669e4ae9c5258f973ea00a6c31578a9"
@tisto
Copy link
Member Author

tisto commented Oct 29, 2021

jMeter Performance Test Results

querystring-search performance tests

Jenkins (3)

respondingTimeGraphPerTestCaseMode

-> you can see that the results are actually pretty close to each other, at least when we hammer the instance a bit.
-> with less load, you can see that "fullobjects=1" takes double the time than the request without, when doing lots of request that does not matter that much any longer

spread of the results for one kind of request

Jenkins (2)

-> you can see that even for the exact same request (in this case, content object with fullobjects=1) the response times are pretty widespread, this looks like a load problem

cc @jensens @cekk @sneridagh

@jensens
Copy link
Member

jensens commented Oct 29, 2021

See linked PR at ZCatalog above here.

@tisto
Copy link
Member Author

tisto commented Oct 31, 2021

@jensens on our way back home from Sorrento, @jackahl and I came up with an idea of how to solve the querystring-search caching issue. We could, as discussed, use a GET request instead of POST and use subtraversal for a unique URL. Though, instead of either encoding the complex query in the URL as GET params (which has lots of possible side effects and the character limit), we could just send the query in the body and generate a unique hash in the URL, to make it unique.

I outlined the idea above:

#1252 (comment)

And I also started to play around with plone.rest, to see if that works:

plone/plone.rest#125

Though, the main question would be if we get away with bending the HTTP standard:

https://stackoverflow.com/questions/978061/http-get-with-request-body?answertab=votes#tab-top

and if Varnish/Cloudflare or any other system (Python requests lib, curl, etc. seem to work) just works or if we run into issues.

The other option would be to send the encoded querystring in the URL as subtraversal, with the problem that it is pretty opaque for devs and that it can run into the 2000 characters limit:

https://stackoverflow.com/questions/417142/what-is-the-maximum-length-of-a-url-in-different-browsers

@jackahl gave me the idea for the solution we came up with when he proposed to send the "clean" query within the HTTP body, just for reference and dev convenience.

I'd love to hear what you think! :)

cc @sneridagh @ericof

@lukasgraf @buchi did you ever had the need to cache POST requests in plone.restapi?

@jensens
Copy link
Member

jensens commented Oct 31, 2021

My overall reaction: Great idea, using a hash and sub-traversal .... but is this within the spec?

Specification wise (HTTP 1.1) in RFC 7231 https://www.rfc-editor.org/rfc/rfc7231#page-25

A payload within a GET request message has no defined semantics;
sending a payload body on a GET request might cause some existing
implementations to reject the request.

When we implement it this way we have no guarantee if webserver, caches, web application firewalls, and such operating between the client and the api-server are working correctly.

My conclusion: No, better not. We open pandoras box here.

I would propose to check the length of the encoded payload, and if the length exceeds the limit: go with a POST.
According to the last URL in the previous post here, this is for a JS-generated request something around 2k is save, which is a lot for a query. Browser detection and increasing the limit based on this could be an option.

Inspired by your idea to use the body, we could use the Headers to send the payload (combined with a hash). But this feels somehow wrong and hackish as well, even if we would kind of stay within the specs.

Overall this is not that satisfying

@tisto
Copy link
Member Author

tisto commented Oct 31, 2021

@jensens I share your doubts and I agree that this could cause problems. Though, Elastic Search is doing exactly this (GET with a query in the payload):

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-search.html#search-search-api-example

Therefore it might be worth a shot. If we would run into issues, this would be a deal-breaker of course.

Making the frontend decide to use either GET or POST could be a valid way to go. Though we still have to fix the nested querystring issue and this would mean that long queries can not be cached at all.

I thought about using headers as well. Though, as you said, this feels hacky and there is also a size limit on HTTP headers. The body is the only size-safe way.

@tisto
Copy link
Member Author

tisto commented Nov 1, 2021

Just a follow up: Solr also uses GET for search requests and it seems that GET is even the default here:

https://solr.apache.org/guide/8_10/json-request-api.html
https://dzone.com/articles/solr-select-query-get-vs-post

@tisto
Copy link
Member Author

tisto commented Nov 1, 2021

@sneridagh @tiberiuichim @plone/volto-team my main challenge right now it how to properly convert a nested dict with an array to a proper querystring. In Volto we are using the "query-string" lib which intentionally does not support nesting:

https://github.com/sindresorhus/query-string#nesting

After digging into this rabbit whole I can fully understand this decision. Though, this would mean that on Volto (or any other restapi request) needs to "stringify" an object within the request.

In JS this would look like this:

const queryString = require('query-string');

queryString.stringify({
	foo: 'bar',
	nested: JSON.stringify({
		unicorn: 'cake'
	})
});
//=> 'foo=bar&nested=%7B%22unicorn%22%3A%22cake%22%7D'

In Python like this:

        payload = {
            "query": json.dumps(
                [
                    {
                        "i": "Title",
                        "o": "plone.app.querystring.operation.string.is",
                        "v": "Welcome to Plone",
                    },
                    {
                        "i": "path",
                        "o": "plone.app.querystring.operation.string.path",
                        "v": "/news",
                    },
                ]
            ),
            "sort_on": "sortable_title",
            "sort_order": "reverse",
            "limit": "10",
            "fullobjects": "False",
            "b_start": "0",
            "b_size": "2",
        }

        response = requests.get(
            self.document.absolute_url(),
            headers={"Accept": "application/json"},
            params=payload,
            auth=(SITE_OWNER_NAME, SITE_OWNER_PASSWORD),
        )

(link to my playground in plone.rest: https://github.com/plone/plone.rest/pull/125/files)

Though, there are other libraries (like "qs") which seem to support nested querystrings:

> var qs = require('qs');
undefined
> query = {
...     "query": [
...         {
.....             "i": "Title",
.....             "o": "plone.app.querystring.operation.string.is",
.....             "v": "Welcome to Plone",
.....         },
...         {
.....             "i": "path",
.....             "o": "plone.app.querystring.operation.string.path",
.....             "v": "/news",
.....         },
...     ],
...     "sort_on": "sortable_title",
...     "sort_order": "reverse",
...     "limit": "10",
...     "fullobjects": "False",
...     "b_start": "0",
...     "b_size": "2",
... }
{
  query: [
    {
      i: 'Title',
      o: 'plone.app.querystring.operation.string.is',
      v: 'Welcome to Plone'
    },
    {
      i: 'path',
      o: 'plone.app.querystring.operation.string.path',
      v: '/news'
    }
  ],
  sort_on: 'sortable_title',
  sort_order: 'reverse',
  limit: '10',
  fullobjects: 'False',
  b_start: '0',
  b_size: '2'
}
> var str = qs.stringify(query)
undefined
> console.log(str)
query%5B0%5D%5Bi%5D=Title&query%5B0%5D%5Bo%5D=plone.app.querystring.operation.string.is&query%5B0%5D%5Bv%5D=Welcome%20to%20Plone&query%5B1%5D%5Bi%5D=path&query%5B1%5D%5Bo%5D=plone.app.querystring.operation.string.path&query%5B1%5D%5Bv%5D=%2Fnews&sort_on=sortable_title&sort_order=reverse&limit=10&fullobjects=False&b_start=0&b_size=2
var out = qs.parse(str)
undefined
> console.log(out)
{
  query: [
    {
      i: 'Title',
      o: 'plone.app.querystring.operation.string.is',
      v: 'Welcome to Plone'
    },
    {
      i: 'path',
      o: 'plone.app.querystring.operation.string.path',
      v: '/news'
    }
  ],
  sort_on: 'sortable_title',
  sort_order: 'reverse',
  limit: '10',
  fullobjects: 'False',
  b_start: '0',
  b_size: '2'
}
````

See https://github.com/ljharb/qs#parsing-objects for details.

Let me know what you think.

@sneridagh
Copy link
Member

At first sight, I guess it's better to use something that supports it, so no conversion whatsoever needed (in either side).

Fact: qs: 6k stars,query-string: 5k. Also, Sindre Sorhus is known to be quite opinionated.

If we choose to change to qs, how about the Python parser? Does it detects the nesting well?

@jensens
Copy link
Member

jensens commented Nov 2, 2021

Back to work after a day off and thinking further.

  • we can give the GET-with-Body approach a try, but should have a plan B if it does not work out. Probably in 99% of cases its fine.
  • query string encoding of a string should be possible as well. In Python there is https://pypi.org/project/jsonurl/ - if this is compatible with the way JS libs do send it?
  • doing POST for larger queries is IMO not a bad thing. Usually our problems queries with need of caching are more on the simpler side.
  • we also could introduce aliases for the crazy-long operations strings, which probably save us many bytes in larger queries (I guess so at least) (like paq.o.str.path - this is a registry thing anyway).

@sneridagh
Copy link
Member

@jensens +1 to all.

@tiberiuichim
Copy link
Contributor

Side info, but maybe good to know: we use 'paqo' in the serialized p.a.querystring query for the URL, in search block: https://github.com/plone/volto/blob/master/src/components/manage/Blocks/Search/hocs/withSearch.jsx#L177

@robgietema
Copy link
Member

The HTTP spec doesn't say you can't send data with a GET request but it all depends on the frameworks used if it is supported or not. Another issue with this approach is that the requests can't be cached. If you use query string parameters you are able to cache the requests. There are however things to take in to account when using the query string for this;

Complex data formats

JSON is not supported by default as a query string. There is also no "global" standard on how to do this. There are multiple methods of encoding a JSON object to a string, let's take the following query:

  query: [
    {
      i: 'Title',
      o: 'plone.app.querystring.operation.string.is',
      v: 'Welcome to Plone'
    },
    {
      i: 'path',
      o: 'plone.app.querystring.operation.string.path',
      v: '/news'
    }
  ],
  sort_on: 'sortable_title',
  sort_order: 'reverse',
  limit: '10',
  fullobjects: 'False',
  b_start: '0',
  b_size: '2'
};

Which results in the following transforms:

Encode URI: %7B%22query%22%3A%5B%7B%22i%22%3A%22Title%22%2C%22o%22%3A%22plone.app.querystring.operation.string.is%22%2C%22v%22%3A%22Welcome%20to%20Plone%22%7D%2C%7B%22i%22%3A%22path%22%2C%22o%22%3A%22plone.app.querystring.operation.string.path%22%2C%22v%22%3A%22%2Fnews%22%7D%5D%2C%22sort_on%22%3A%22sortable_title%22%2C%22sort_order%22%3A%22reverse%22%2C%22limit%22%3A%2210%22%2C%22fullobjects%22%3A%22False%22%2C%22b_start%22%3A%220%22%2C%22b_size%22%3A%222%22%7D (453)
Rison:      (b_size:'2',b_start:'0',fullobjects:False,limit:'10',query:!((i:Title,o:plone.app.querystring.operation.string.is,v:'Welcome to Plone'),(i:path,o:plone.app.querystring.operation.string.path,v:/news)),sort_on:sortable_title,sort_order:reverse) (242)
O-Rison:    b_size:'2',b_start:'0',fullobjects:False,limit:'10',query:!((i:Title,o:plone.app.querystring.operation.string.is,v:'Welcome to Plone'),(i:path,o:plone.app.querystring.operation.string.path,v:/news)),sort_on:sortable_title,sort_order:reverse (240)
JSURL:      ~(query~(~(i~'Title~o~'plone.app.querystring.operation.string.is~v~'Welcome*20to*20Plone)~(i~'path~o~'plone.app.querystring.operation.string.path~v~'*2fnews))~sort_on~'sortable_title~sort_order~'reverse~limit~'10~fullobjects~'False~b_start~'0~b_size~'2) (253)
QS:         query[0][i]=Title&query[0][o]=plone.app.querystring.operation.string.is&query[0][v]=Welcome to Plone&query[1][i]=path&query[1][o]=plone.app.querystring.operation.string.path&query[1][v]=/news&sort_on=sortable_title&sort_order=reverse&limit=10&fullobjects=False&b_start=0&b_size=2 (279)
URLON:      $query@$i=Title&o=plone.app.querystring.operation.string.is&v=Welcome%20to%20Plone;&$i=path&o=plone.app.querystring.operation.string.path&v=//news;;&sort_on=sortable_title&sort_order=reverse&limit=10&fullobjects=False&b_start=0&b_size=2 (236)

The encode uri method is in my opinion the safest and most straight forward implementation and shouldn't be hard to implement both on the frontend and the backend.

Length

Altough the HTTP spec doesn't limit the length of a querystring all browsers do. Currenty the following limits are in place:

  • Microsoft Edge: 81k characters
  • Chrome: 64k
  • Firefox: 64k
  • Safari: >80k
  • Opera: >190k
  • Microsoft Internet Explorer: 2k
  • Apache: configurable
  • Microsoft IIS: 16k
  • Nginx: configurable
  • haproxy: configurable
  • varnish: configurable

Looking at the above examples even the largest (url encoding) is "only" 453 characters so even if you want to support Internet Explorer almost all requests should work.

@sneridagh
Copy link
Member

LGTM, let's discuss this.

Also, FYI, I just remembered that this PR is in place: plone/volto#4159

No clue how they are encoding the state in the URL. I guess we should then sync before merging.

@tisto
Copy link
Member Author

tisto commented Mar 31, 2023

@robgietema thanks for the write-up. We have to take into account the F5 Big IP component:

https://my.f5.com/manage/s/article/K60450445 (default is 2048 bytes)

@jensens
Copy link
Member

jensens commented Mar 31, 2023

We could think about shorting strings like plone.app.querystring.operation.string.is.

@tiberiuichim
Copy link
Contributor

@jensens see #1252 (comment)

@ericof
Copy link
Member

ericof commented Apr 19, 2023

@tisto Should we consider this done?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants