Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Swift connection pool size is too small #509

Open
jd opened this issue Nov 22, 2017 · 4 comments
Open

Swift connection pool size is too small #509

jd opened this issue Nov 22, 2017 · 4 comments

Comments

@jd
Copy link
Member

jd commented Nov 22, 2017

When Swift is used as an incoming measures driver, the number of connections opened by httplib3 is too small to succeed. On large batches, the error occurs:

[Wed Nov 22 16:17:50.622612 2017] [:error] [pid 31427] 2017-11-22 16:17:50,622 [31427] WARNING  urllib3.connectionpool: Connection pool is full, discarding connection: 10.41.11.12
[Wed Nov 22 16:17:50.625786 2017] [:error] [pid 31427] [remote ::1:80] mod_wsgi (pid=31427): Exception occurred processing WSGI script '/bin/gnocchi-api'.
[Wed Nov 22 16:17:50.625846 2017] [:error] [pid 31427] [remote ::1:80] Traceback (most recent call last):
[Wed Nov 22 16:17:50.625898 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__
[Wed Nov 22 16:17:50.625950 2017] [:error] [pid 31427] [remote ::1:80]     resp = self.call_func(req, *args, **self.kwargs)
[Wed Nov 22 16:17:50.625961 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func
[Wed Nov 22 16:17:50.625975 2017] [:error] [pid 31427] [remote ::1:80]     return self.func(req, *args, **kwargs)
[Wed Nov 22 16:17:50.625991 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 131, in __call__
[Wed Nov 22 16:17:50.626008 2017] [:error] [pid 31427] [remote ::1:80]     response = req.get_response(self.application)
[Wed Nov 22 16:17:50.626016 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/webob/request.py", line 1327, in send
[Wed Nov 22 16:17:50.626031 2017] [:error] [pid 31427] [remote ::1:80]     application, catch_exc_info=False)
[Wed Nov 22 16:17:50.626039 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/webob/request.py", line 1291, in call_application
[Wed Nov 22 16:17:50.626052 2017] [:error] [pid 31427] [remote ::1:80]     app_iter = application(self.environ, start_response)
[Wed Nov 22 16:17:50.626060 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/paste/urlmap.py", line 216, in __call__
[Wed Nov 22 16:17:50.626075 2017] [:error] [pid 31427] [remote ::1:80]     return app(environ, start_response)
[Wed Nov 22 16:17:50.626109 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 131, in __call__
[Wed Nov 22 16:17:50.626127 2017] [:error] [pid 31427] [remote ::1:80]     resp = self.call_func(req, *args, **self.kwargs)
[Wed Nov 22 16:17:50.626145 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/webob/dec.py", line 196, in call_func
[Wed Nov 22 16:17:50.626159 2017] [:error] [pid 31427] [remote ::1:80]     return self.func(req, *args, **kwargs)
[Wed Nov 22 16:17:50.626168 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/oslo_middleware/base.py", line 131, in __call__
[Wed Nov 22 16:17:50.626182 2017] [:error] [pid 31427] [remote ::1:80]     response = req.get_response(self.application)
[Wed Nov 22 16:17:50.626190 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/webob/request.py", line 1327, in send
[Wed Nov 22 16:17:50.626204 2017] [:error] [pid 31427] [remote ::1:80]     application, catch_exc_info=False)
[Wed Nov 22 16:17:50.626212 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/webob/request.py", line 1291, in call_application
[Wed Nov 22 16:17:50.626226 2017] [:error] [pid 31427] [remote ::1:80]     app_iter = application(self.environ, start_response)
[Wed Nov 22 16:17:50.626235 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/webob/exc.py", line 1162, in __call__
[Wed Nov 22 16:17:50.626263 2017] [:error] [pid 31427] [remote ::1:80]     return self.application(environ, start_response)
[Wed Nov 22 16:17:50.626274 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/gnocchi/rest/app.py", line 71, in __call__
[Wed Nov 22 16:17:50.626290 2017] [:error] [pid 31427] [remote ::1:80]     return self.app(environ, start_response)
[Wed Nov 22 16:17:50.626300 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/pecan/middleware/recursive.py", line 56, in __call__
[Wed Nov 22 16:17:50.626316 2017] [:error] [pid 31427] [remote ::1:80]     return self.application(environ, start_response)
[Wed Nov 22 16:17:50.626325 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/pecan/core.py", line 840, in __call__
[Wed Nov 22 16:17:50.626339 2017] [:error] [pid 31427] [remote ::1:80]     return super(Pecan, self).__call__(environ, start_response)
[Wed Nov 22 16:17:50.626348 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/pecan/core.py", line 683, in __call__
[Wed Nov 22 16:17:50.626361 2017] [:error] [pid 31427] [remote ::1:80]     self.invoke_controller(controller, args, kwargs, state)
[Wed Nov 22 16:17:50.626376 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/pecan/core.py", line 574, in invoke_controller
[Wed Nov 22 16:17:50.626401 2017] [:error] [pid 31427] [remote ::1:80]     result = controller(*args, **kwargs)
[Wed Nov 22 16:17:50.626409 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/gnocchi/rest/api.py", line 1588, in post
[Wed Nov 22 16:17:50.626422 2017] [:error] [pid 31427] [remote ::1:80]     for metric in known_metrics))
[Wed Nov 22 16:17:50.626435 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/gnocchi/incoming/__init__.py", line 128, in add_measures_batch
[Wed Nov 22 16:17:50.626450 2017] [:error] [pid 31427] [remote ::1:80]     in six.iteritems(metrics_and_measures)))
[Wed Nov 22 16:17:50.626458 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 312, in parallel_map
[Wed Nov 22 16:17:50.626471 2017] [:error] [pid 31427] [remote ::1:80]     return list(executor.map(lambda args: fn(*args), list_of_args))
[Wed Nov 22 16:17:50.626479 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 605, in result_iterator
[Wed Nov 22 16:17:50.626493 2017] [:error] [pid 31427] [remote ::1:80]     yield future.result()
[Wed Nov 22 16:17:50.626507 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/concurrent/futures/_base.py", line 429, in result
[Wed Nov 22 16:17:50.626519 2017] [:error] [pid 31427] [remote ::1:80]     return self.__get_result()
[Wed Nov 22 16:17:50.626527 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/concurrent/futures/thread.py", line 62, in run
[Wed Nov 22 16:17:50.626540 2017] [:error] [pid 31427] [remote ::1:80]     result = self.fn(*self.args, **self.kwargs)
[Wed Nov 22 16:17:50.626548 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/gnocchi/utils.py", line 312, in <lambda>
[Wed Nov 22 16:17:50.626560 2017] [:error] [pid 31427] [remote ::1:80]     return list(executor.map(lambda args: fn(*args), list_of_args))
[Wed Nov 22 16:17:50.626567 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/gnocchi/incoming/swift.py", line 58, in _store_new_measures
[Wed Nov 22 16:17:50.626580 2017] [:error] [pid 31427] [remote ::1:80]     data)
[Wed Nov 22 16:17:50.626588 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1829, in put_object
[Wed Nov 22 16:17:50.626602 2017] [:error] [pid 31427] [remote ::1:80]     response_dict=response_dict)
[Wed Nov 22 16:17:50.626610 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1678, in _retry
[Wed Nov 22 16:17:50.626627 2017] [:error] [pid 31427] [remote ::1:80]     service_token=self.service_token, **kwargs)
[Wed Nov 22 16:17:50.626635 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 1317, in put_object
[Wed Nov 22 16:17:50.626647 2017] [:error] [pid 31427] [remote ::1:80]     conn.request('PUT', path, contents, headers)
[Wed Nov 22 16:17:50.626654 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 439, in request
[Wed Nov 22 16:17:50.626666 2017] [:error] [pid 31427] [remote ::1:80]     files=files, **self.requests_args)
[Wed Nov 22 16:17:50.626673 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/swiftclient/client.py", line 422, in _request
[Wed Nov 22 16:17:50.626685 2017] [:error] [pid 31427] [remote ::1:80]     return self.request_session.request(*arg, **kwarg)
[Wed Nov 22 16:17:50.626693 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 508, in request
[Wed Nov 22 16:17:50.626707 2017] [:error] [pid 31427] [remote ::1:80]     resp = self.send(prep, **send_kwargs)
[Wed Nov 22 16:17:50.626714 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/requests/sessions.py", line 618, in send
[Wed Nov 22 16:17:50.626726 2017] [:error] [pid 31427] [remote ::1:80]     r = adapter.send(request, **kwargs)
[Wed Nov 22 16:17:50.626733 2017] [:error] [pid 31427] [remote ::1:80]   File "/usr/lib/python2.7/site-packages/requests/adapters.py", line 490, in send
[Wed Nov 22 16:17:50.626746 2017] [:error] [pid 31427] [remote ::1:80]     raise ConnectionError(err, request=request)
[Wed Nov 22 16:17:50.626771 2017] [:error] [pid 31427] [remote ::1:80] ConnectionError: ('Connection aborted.', BadStatusLine("''",))
@jd jd added the help wanted label Nov 22, 2017
@sileht
Copy link
Member

sileht commented Nov 22, 2017

urllib3 connection pool size is handled by python-request and can be configured like.

a = requests.adapters.HTTPAdapter(pool_maxsize=N)
s = requests.Session()
s.mount('https://', a)
s.mount('https://', a)

But the request.Session() used by swiftclient is not customisable: https://github.com/openstack/python-swiftclient/blob/master/swiftclient/client.py#L384

swiftclient have to be fixed first...

@jd
Copy link
Member Author

jd commented Nov 28, 2017

There is an upstream bug opened here https://bugs.launchpad.net/python-swiftclient/+bug/1671949

@zaitcev
Copy link
Contributor

zaitcev commented Dec 12, 2017

Please pardon my lack of understanding, but I do not follow how the underlying symptom ended with a desire to make the connection pool larger. I say that because, I think, BadStatusLine('') means that the client opens the TCP connection, possibly establishes TLS, sends PUT down, but a read() from the socket ends with EOF. Either Swift's own proxy or its LB (HAproxy in case of RHOS) closed the connection without returning anything.

If we look into urllib3/connectionpool.py:HTTPConnectionPool.urlopen(), then indeed there's no telling who's raising the HTTPException - _get_conn() or _make_request(). Both of them end with "Connection aborted." But it seems to me that only _make_request() can produce BadStatusLine('').

But if so, sizing the pool isn't going to do Gnocchi any good. The Swift or HAproxy which it is using must be tuned for larger number of connections. Heck I'd have a look if someone's segfaulting. As long as it's only a traceback, Swift is good at returning at least 500. But in this case, see the scenario above.

Julien, is there something that am I missing here?

@jd
Copy link
Member Author

jd commented Dec 12, 2017

Either Swift's own proxy or its LB (HAproxy in case of RHOS) closed the connection without returning anything.

I did not have time to dig into that back then, but the swift CLI tool was working properly, and gnocchi-upgrade ran properly. So Swift was functioning.

I don't know what kind of deployment it was and @Prophidys could give us a hint here. We may be able to access the system to take a look

The thing is that as you say, there is a high number of connections opened by Gnocchi due to the way it parallelizes the tasks. Maybe one the source of the problem is that Swift proxy was not configured to handle that many connections (though I kinda doubt it), though the connection pool not being configurable is still a problem :)

sileht added a commit to sileht/gnocchi that referenced this issue Apr 27, 2018
By default, python-requests share a pool for the whole process and this
one have only 10 parallels connections. This change use the
keystoneauth1 adapter. So, we have a pool dedicated for swift and we can
configure it. And we also get all tcp keepalive workaround for free.

Closes: gnocchixyz#509
sileht added a commit to sileht/gnocchi that referenced this issue Apr 27, 2018
By default, python-requests share a pool for the whole process and this
one have only 10 parallels connections. This change use the
keystoneauth1 adapter. So, we have a pool dedicated for swift and we can
configure it. And we also get all tcp keepalive workaround for free.

This allows keep the default for 'retries'.

Closes: gnocchixyz#509
sileht added a commit to sileht/gnocchi that referenced this issue May 17, 2018
By default, python-requests share a pool for the whole process and this
one have only 10 parallels connections. This change use the
keystoneauth1 adapter. So, we have a pool dedicated for swift and we can
configure it. And we also get all tcp keepalive workaround for free.

This allows keep the default for 'retries'.

Closes: gnocchixyz#509
sileht added a commit to sileht/gnocchi that referenced this issue May 17, 2018
By default, python-requests share a pool for the whole process and this
one have only 10 parallels connections. This change use the
keystoneauth1 adapter. So, we have a pool dedicated for swift and we can
configure it. And we also get all tcp keepalive workaround for free.

This allows keep the default for 'retries'.

Closes: gnocchixyz#509
sileht added a commit to sileht/gnocchi that referenced this issue May 22, 2018
This change removes usage of threads with swift driver.

This avoids to get "Connection aborted" because a thread is stuck
and the server side decide to break the connection.

Related-bug: gnocchixyz#509
sileht added a commit to sileht/gnocchi that referenced this issue May 22, 2018
This change removes usage of threads with swift driver.

This avoids to get "Connection aborted" because a thread is stuck
and the server side decide to break the connection.

Related-bug: gnocchixyz#509
sileht added a commit to sileht/gnocchi that referenced this issue May 22, 2018
This change removes usage of threads with swift driver.

This avoids to get "Connection aborted" because a thread is stuck
and the server side decide to break the connection.

Related-bug: gnocchixyz#509
mergify bot pushed a commit that referenced this issue May 22, 2018
This change removes usage of threads with swift driver.

This avoids to get "Connection aborted" because a thread is stuck
and the server side decide to break the connection.

Related-bug: #509
sileht added a commit to sileht/gnocchi that referenced this issue May 22, 2018
This change removes usage of threads with swift driver.

This avoids to get "Connection aborted" because a thread is stuck
and the server side decide to break the connection.

Related-bug: gnocchixyz#509
(cherry picked from commit bc18ebd)
mergify bot pushed a commit that referenced this issue May 22, 2018
This change removes usage of threads with swift driver.

This avoids to get "Connection aborted" because a thread is stuck
and the server side decide to break the connection.

Related-bug: #509
(cherry picked from commit bc18ebd)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

3 participants