mqtt_node keeping large amount of connection details in memory #2688
-
As reported in the rabbitmq-users thread "High memory consumption (and message queue) in mqtt_node", we see RabbitMQ consuming more and more memory and never draining the Erlang mailbox. We've created a repo that reproduces the issue with high likelihood by creating a connection, channel, consumer and queue per connection. One can tune the connection count to, say, 10k, 20k, etc. until the situation is triggered. The repo is here: mqtt-connections. In addition to the screenshots attached in the rabbitmq-users mail thread, this is what the stack trace looks like:
-
Run `node churn.js ***@***.*** 1` and see how `ra_log_ets` grows indefinitely.
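For readers without the repo handy, here is a minimal sketch of what a churn script along these lines might look like, written against the `mqtt` npm client. It is an illustration only, not the actual churn.js: the broker URL, client-id prefix and topic names are assumptions.

```typescript
// Hypothetical sketch only -- not the actual churn.js from the mqtt-connections repo.
// Opens N MQTT connections (each with its own client id and one subscription,
// i.e. a consumer and a queue on the broker side), then closes them all.
import * as mqtt from "mqtt";

const BROKER_URL = "mqtt://guest:guest@localhost:1883"; // assumption: local broker
const CONNECTIONS = 10_000; // tune to 10k, 20k, ... until the issue appears

function connectOne(i: number): Promise<mqtt.MqttClient> {
  return new Promise((resolve, reject) => {
    const client = mqtt.connect(BROKER_URL, { clientId: `churn-${i}`, keepalive: 60 });
    client.once("error", reject);
    client.once("connect", () => {
      // one subscription per connection -> one consumer and one queue per connection
      client.subscribe(`churn/${i}`, { qos: 1 }, (err) =>
        err ? reject(err) : resolve(client)
      );
    });
  });
}

async function main(): Promise<void> {
  const clients: mqtt.MqttClient[] = [];
  for (let i = 0; i < CONNECTIONS; i++) {
    clients.push(await connectOne(i));
  }
  // closing everything again is the phase where the report sees mqtt_node's
  // mailbox and the ra_log_ets tables keep growing
  for (const client of clients) {
    client.end(true);
  }
}

main().catch((err) => {
  console.error(err);
  process.exit(1);
});
```

Each connection carries its own client id, which is the per-connection state tracked broker-side by the mqtt_node state machine referenced in this thread.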
-
What WAL size are you using?
See: https://www.rabbitmq.com/mqtt.html#consensus
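For context: the WAL discussed here is Ra's write-ahead log, which the MQTT client-id tracking Ra cluster shares with quorum queues. If I read the linked docs correctly, its size limit is controlled by `raft.wal_max_size_bytes` in rabbitmq.conf; a 64 MB value, as tried in the next reply, would look roughly like this:

```
# illustrative sketch; assumes the raft.wal_max_size_bytes key
# 64 MB WAL size limit
raft.wal_max_size_bytes = 64000000
```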
-
Tried with 64 MB, no difference.
-
RabbitMQ config, to make a high MQTT connection count possible; then run:

The stack trace indicates that it's mostly stuck in this function: rabbitmq-server/deps/rabbitmq_mqtt/src/mqtt_machine.erl, lines 72 to 83 at e3bbdfe
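The exact config used isn't shown above, so the following is only a hypothetical sketch of rabbitmq.conf settings typically needed to sustain a very high MQTT connection count; every value is illustrative.

```
# hypothetical example, not the poster's actual config

# allow more memory use before the memory alarm blocks publishers
vm_memory_high_watermark.relative = 0.8

# simplify the client side of a load test
mqtt.allow_anonymous = true

# accept large bursts of incoming connections
mqtt.tcp_listen_options.backlog = 4096
mqtt.tcp_listen_options.nodelay = true

# note: the OS file-descriptor limit (e.g. systemd LimitNOFILE) must also be
# raised well above the target connection count
```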
-
I do see a difference with a lower …
-
Attaching the crash.log.gz.
-
The multiple mqtt_node ets tables seem to be cleared up when the inbox for …
-
Now, restarting the server after having opened the 10000 connections and then closed them always results in a failed boot:

```
[error] <0.18031.1> CRASH REPORT Process <0.18031.1> with 0 neighbours exited with reason: {timeout,{gen_statem,
@carl-xps13-fedora'},trigger_election,5000]}} in application_master:init/4 line 138
[info] <0.44.0> Application rabbitmq_mqtt exited with reason: ***@***.***
election,5000]}}
[error] <0.17838.1> BOOT FAILED
[error] <0.17838.1> ===========
[info] <0.44.0> Application rabbitmq_mqtt exited with reason: ***@***.***
election,5000]}}
[error] <0.17838.1> Error during startup: {error,
[error] <0.17838.1> {rabbitmq_mqtt,
[error] <0.17838.1> {bad_return,
[error] <0.17838.1> {{rabbit_mqtt,start,[normal,[]]},
[error] <0.17838.1> {'EXIT',
[error] <0.17838.1> {timeout,
[error] <0.17838.1> {gen_statem,call,
[error] <0.17838.1> ***@***.***'},
[error] <0.17838.1> trigger_election,5000]}}}}}}}
```

Nothing but deleting the quorum directory seems to get it moving.
-
I will take a closer look next week. What kind of disks are you using?
-
We've tried with AWS GP2, local SSD and a RAM disk; no difference. There's no real disk usage anyway, just a single core very busy.
-
Awesome! Will try it out in a bit.
On Mon, Dec 21, 2020 at 1:16 PM, Karl Nilsson wrote:
> hah a bit too quick, just pushed a fix
-
Just tested it, my initial trials show a huge difference!
On Mon, Dec 21, 2020 at 1:45 PM, Karl Nilsson wrote:
> I also have an approach that can bring the same test down to a few ms, but it will require a bit more work and testing.
-
#2692 seems to address this.
-
We'll take a build with #2692 in it for a few spins.