Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

plugin might fail to shutdown and stalls the shutdown process #298

Closed
kares opened this issue Jun 14, 2021 · 4 comments
Closed

plugin might fail to shutdown and stalls the shutdown process #298

kares opened this issue Jun 14, 2021 · 4 comments

Comments

@kares
Copy link
Contributor

kares commented Jun 14, 2021

Logstash information: Observed under LS 7.13 - plugin version 4.2.4.

JVM (e.g. java -version): OpenJDK 64-Bit Server VM (11.0.10+9 mixed mode)

Description of the problem including expected versus actual behavior:

On SIGTERM Logstash initiates a shutdown process, this process will first stop all inputs.
In rare cases (multiple file inputs helps reproduce the issue) as the plugin attempts to stop it's identity codec, it ends up waking up the cleaner thread but the thread gets right back to (a 5 minute) sleep (when the cleaner.stop -> wakeup happens right before sleep on the other thread).

NOTE: might also be worth looking into reverting the stop sequence - quit the watcher first and than close the codec?

Steps to reproduce:

Unfortunately, haven't been able to reproduce this locally (likely needs more load/time).
Planning to attempt to follow-up with an "isolated" reproducer - attempting to close the codec manually.

Thread dump (parts):

"[main]-pipeline-manager" #57 daemon prio=5 os_prio=0 cpu=87233.16ms elapsed=633608.22s tid=0x00007fb70819a800 nid=0xe6e0 in Object.wait()  [0x00007fb7704d2000]
   java.lang.Thread.State: TIMED_WAITING (on object monitor)
	at java.lang.Object.wait(java.base@11.0.10/Native Method)
	- waiting on <no object reference available>
	at java.lang.Thread.join(java.base@11.0.10/Thread.java:1313)
	- waiting to re-lock in wait() <0x00000006b4711bc8> (a java.lang.Thread)
	at org.jruby.internal.runtime.NativeThread.join(NativeThread.java:75)
	at org.jruby.RubyThread.join(RubyThread.java:1134)
	at java.lang.invoke.LambdaForm$DMH/0x0000000800783040.invokeVirtual(java.base@11.0.10/LambdaForm$DMH)
	at java.lang.invoke.LambdaForm$MH/0x000000080107c440.invoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786840.reinvoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786c40.guard(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786840.reinvoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786c40.guard(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.Invokers$Holder.linkToCallSite(java.base@11.0.10/Invokers$Holder)
	at usr.share.logstash.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_codec_minus_multiline_minus_3_dot_0_dot_10.lib.logstash.codecs.identity_map_codec.RUBY$method$stop$0(/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-codec-multiline-3.0.10/lib/logstash/codecs/identity_map_codec.rb:71)
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@11.0.10/DirectMethodHandle$Holder)
	at java.lang.invoke.LambdaForm$MH/0x00000008007cd840.invoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786840.reinvoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786c40.guard(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786840.reinvoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786c40.guard(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.Invokers$Holder.linkToCallSite(java.base@11.0.10/Invokers$Holder)
	at usr.share.logstash.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_codec_minus_multiline_minus_3_dot_0_dot_10.lib.logstash.codecs.identity_map_codec.RUBY$method$close$0(/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-codec-multiline-3.0.10/lib/logstash/codecs/identity_map_codec.rb:204)
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@11.0.10/DirectMethodHandle$Holder)
	at java.lang.invoke.LambdaForm$MH/0x00000008007cd840.invoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786840.reinvoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786c40.guard(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786840.reinvoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x0000000800786c40.guard(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.Invokers$Holder.linkToCallSite(java.base@11.0.10/Invokers$Holder)
	at usr.share.logstash.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_4.lib.logstash.inputs.file.RUBY$method$stop$0(/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.4/lib/logstash/inputs/file.rb:390)
	at usr.share.logstash.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_2_dot_4.lib.logstash.inputs.file.RUBY$method$stop$0$__VARARGS__(/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-input-file-4.2.4/lib/logstash/inputs/file.rb)
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@11.0.10/DirectMethodHandle$Holder)
    ...	

	
"Ruby-0-Thread-294: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-codec-multiline-3.0.10/lib/logstash/codecs/identity_map_codec.rb:51" #334 daemon prio=5 os_prio=0 cpu=269.76ms elapsed=633599.68s tid=0x00007fb528007800 nid=0xea45 waiting on condition  [0x00007fb372aeb000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.10/Native Method)
	- parking to wait for  <0x00000006b6027860> (a java.util.concurrent.Semaphore$NonfairSync)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.10/LockSupport.java:234)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.10/AbstractQueuedSynchronizer.java:1079)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.10/AbstractQueuedSynchronizer.java:1369)
	at java.util.concurrent.Semaphore.tryAcquire(java.base@11.0.10/Semaphore.java:415)
	at org.jruby.RubyThread$SleepTask2.run(RubyThread.java:1541)
	at org.jruby.RubyThread$SleepTask2.run(RubyThread.java:1528)
	at org.jruby.RubyThread.executeTask(RubyThread.java:1585)
	at org.jruby.RubyThread.executeTask(RubyThread.java:1571)
	at org.jruby.RubyThread.sleep(RubyThread.java:1452)
	at org.jruby.RubyKernel.sleep(RubyKernel.java:691)
	at org.jruby.RubyKernel$INVOKER$s$0$1$sleep.call(RubyKernel$INVOKER$s$0$1$sleep.gen)
	at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodN.call(JavaMethod.java:815)
	at org.jruby.internal.runtime.methods.DynamicMethod.call(DynamicMethod.java:200)
	at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:172)
	at org.jruby.ir.interpreter.InterpreterEngine.processCall(InterpreterEngine.java:316)
	at org.jruby.ir.interpreter.StartupInterpreterEngine.interpret(StartupInterpreterEngine.java:72)
	at org.jruby.ir.interpreter.Interpreter.INTERPRET_BLOCK(Interpreter.java:116)
	at org.jruby.runtime.MixedModeIRBlockBody.commonYieldPath(MixedModeIRBlockBody.java:137)
	at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:60)
	at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:52)
	at org.jruby.runtime.Block.call(Block.java:139)
	at org.jruby.RubyProc.call(RubyProc.java:318)
	at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:105)
	at java.lang.Thread.run(java.base@11.0.10/Thread.java:834)

   Locked ownable synchronizers:
	- None

... from a different stalled shutdown dump it's visible where the identity_map_codec is stuck sleep-ing:

"Ruby-0-Thread-293: :1" #335 daemon prio=5 os_prio=0 cpu=33.86ms elapsed=12350.21s tid=0x00007f8058045000 nid=0xd33 waiting on condition  [0x00007f7e1bffd000]
   java.lang.Thread.State: TIMED_WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.10/Native Method)
	- parking to wait for  <0x00000006b677ef10> (a java.util.concurrent.Semaphore$NonfairSync)
	at java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.10/LockSupport.java:234)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedNanos(java.base@11.0.10/AbstractQueuedSynchronizer.java:1079)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.tryAcquireSharedNanos(java.base@11.0.10/AbstractQueuedSynchronizer.java:1369)
	at java.util.concurrent.Semaphore.tryAcquire(java.base@11.0.10/Semaphore.java:415)
	at org.jruby.RubyThread$SleepTask2.run(RubyThread.java:1541)
	at org.jruby.RubyThread$SleepTask2.run(RubyThread.java:1528)
	at org.jruby.RubyThread.executeTask(RubyThread.java:1585)
	at org.jruby.RubyThread.executeTask(RubyThread.java:1571)
	at org.jruby.RubyThread.sleep(RubyThread.java:1452)
	at org.jruby.RubyKernel.sleep(RubyKernel.java:691)
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@11.0.10/DirectMethodHandle$Holder)
	at java.lang.invoke.LambdaForm$MH/0x000000080080d440.invoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x000000080075e040.reinvoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x000000080075e440.guard(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x000000080075e040.reinvoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.LambdaForm$MH/0x000000080075e440.guard(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.Invokers$Holder.linkToCallSite(java.base@11.0.10/Invokers$Holder)
	at usr.share.logstash.vendor.bundle.jruby.$2_dot_5_dot_0.gems.logstash_minus_codec_minus_multiline_minus_3_dot_0_dot_10.lib.logstash.codecs.identity_map_codec.RUBY$block$start$1(/usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/logstash-codec-multiline-3.0.10/lib/logstash/codecs/identity_map_codec.rb:53)
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(java.base@11.0.10/DirectMethodHandle$Holder)
	at java.lang.invoke.LambdaForm$MH/0x00000008007a5040.invoke(java.base@11.0.10/LambdaForm$MH)
	at java.lang.invoke.Invokers$Holder.invokeExact_MT(java.base@11.0.10/Invokers$Holder)
	at org.jruby.runtime.CompiledIRBlockBody.callDirect(CompiledIRBlockBody.java:138)
	at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:58)
	at org.jruby.runtime.IRBlockBody.call(IRBlockBody.java:52)
	at org.jruby.runtime.Block.call(Block.java:139)
	at org.jruby.RubyProc.call(RubyProc.java:318)
	at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:105)
	at java.lang.Thread.run(java.base@11.0.10/Thread.java:834)

   Locked ownable synchronizers:
	- None
@kares
Copy link
Contributor Author

kares commented Jun 17, 2021

at a deeper (technical) level the issue is caused by the codec behavior: logstash-plugins/logstash-codec-multiline#68
once the codec issue is resolved this issue should no longer surface (updating logstash-codec-multiline should be enough).

@kares
Copy link
Contributor Author

kares commented Jun 23, 2021

to resolve the shutdown stall it's sufficient to update the multiline codec (to version >= 3.0.11) e.g.
bin/logstash-plugin update logstash-codec-multiline

@roaksoax
Copy link

@kares with the release of the multiline code, should this issue be closed?

@kares
Copy link
Contributor Author

kares commented Jun 29, 2021

good call, yes I thin it's a sufficient as a bug resolution.
initially thought more fixes on the input will be necessary,
while they are somehow related they can be seen as improvements

@kares kares closed this as completed Jun 29, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants