Hello Team,

I'm using the LMAX Disruptor with 3 producers and 30 consumers, with the busy-spin wait strategy. The ring buffer size is 65536.
The producers publish events at a high rate of about 20,000 events/second, and after a while (when the ring buffer gets full) the Disruptor hangs: the producers are blocked and the consumers stop processing events.
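For context, a setup along these lines would match the configuration described above. This is a sketch, not the reporter's actual code; Command and its no-arg constructor are assumed from the publishing snippet further down:

import com.lmax.disruptor.BusySpinWaitStrategy;
import com.lmax.disruptor.dsl.Disruptor;
import com.lmax.disruptor.dsl.ProducerType;
import com.lmax.disruptor.util.DaemonThreadFactory;

// 3 publishing threads require ProducerType.MULTI; the 30 handlers spin-wait.
Disruptor<Command> disruptor = new Disruptor<>(
        Command::new,                  // pre-allocates the events in the ring
        65536,                         // ring buffer size (must be a power of 2)
        DaemonThreadFactory.INSTANCE,  // thread factory for the event processors
        ProducerType.MULTI,
        new BusySpinWaitStrategy());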
I have added the thread dumps for the producers and consumers below.

Consumer stack traces:
"Thread-72" - Thread t@124
java.lang.Thread.State: RUNNABLE
at app//com.lmax.disruptor.BusySpinWaitStrategy.waitFor(BusySpinWaitStrategy.java:39)
at app//com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
at app//com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
at app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
at java.base@11.0.20/java.lang.Thread.run(Thread.java:829)
Locked ownable synchronizers:
- None
"Thread-73" - Thread t@125
java.lang.Thread.State: RUNNABLE
at app//com.lmax.disruptor.BusySpinWaitStrategy.waitFor(BusySpinWaitStrategy.java:39)
at app//com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
at app//com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
at app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
at java.base@11.0.20/java.lang.Thread.run(Thread.java:829)
Locked ownable synchronizers:
- None
Producer stack traces:
"thread-context-16" - Thread t@142
java.lang.Thread.State: TIMED_WAITING
at java.base@11.0.20/jdk.internal.misc.Unsafe.park(Native Method)
at java.base@11.0.20/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:357)
at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
at app//com.lmax.disruptor.RingBuffer.next(RingBuffer.java:263)
.............
Locked ownable synchronizers:
- None
...........
"pool-9-thread-1" - Thread t@144
java.lang.Thread.State: TIMED_WAITING
at java.base@11.0.20/jdk.internal.misc.Unsafe.park(Native Method)
at java.base@11.0.20/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:357)
at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
at app//com.lmax.disruptor.RingBuffer.next(RingBuffer.java:263)
.............
Locked ownable synchronizers:
- locked <15a0ebf2> (a java.util.concurrent.ThreadPoolExecutor$Worker)
.............
"DefaultQuartzScheduler_Worker-1" - Thread t@147
java.lang.Thread.State: TIMED_WAITING
at java.base@11.0.20/jdk.internal.misc.Unsafe.park(Native Method)
at java.base@11.0.20/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:357)
at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
at app//com.lmax.disruptor.RingBuffer.next(RingBuffer.java:263)
.............
Locked ownable synchronizers:
- None
The producers use the following code snippet to publish events:
var ringBuffer = disruptor.getRingBuffer();
final long sequence = ringBuffer.next();        // claim the next slot; blocks (spins) while the ring is full
try {
    Command command = ringBuffer.get(sequence); // fetch the pre-allocated event for that slot
    command.setType("SAMPLE");
    ....
} finally {
    ringBuffer.publish(sequence);               // always publish the claimed sequence, even on failure
}
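As an aside, the library also offers a publish path that pairs next() and publish() internally, which removes the risk of a claimed-but-never-published sequence. A sketch; the translator variable name is illustrative:

import com.lmax.disruptor.EventTranslatorOneArg;

// translateTo is invoked with the pre-allocated event for the claimed slot.
EventTranslatorOneArg<Command, String> setType =
        (command, sequence, type) -> command.setType(type);

ringBuffer.publishEvent(setType, "SAMPLE");

Note this does not change the blocking behaviour described next: publishEvent still claims the slot via next() internally.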
The producers end up blocked in an endless loop inside the ringBuffer.next() method.
The consumers use modulo load balancing: each handler processes only the events whose command index maps to it, as sketched below.
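A typical shape for that kind of modulo sharding (illustrative only; ShardedCommandHandler and getIndex() are not from the report):

import com.lmax.disruptor.EventHandler;

// One of numHandlers identical handlers; each processes only its own shard,
// so every event is handled by exactly one of the 30 consumers.
final class ShardedCommandHandler implements EventHandler<Command> {
    private final int ordinal;
    private final int numHandlers;

    ShardedCommandHandler(int ordinal, int numHandlers) {
        this.ordinal = ordinal;
        this.numHandlers = numHandlers;
    }

    @Override
    public void onEvent(Command command, long sequence, boolean endOfBatch) {
        if (command.getIndex() % numHandlers != ordinal) {
            return; // not this shard; the handler's sequence still advances
        }
        // ... process the command ...
    }
}

The property that matters for the hang is that a handler's sequence advances even for skipped events; if any one of the 30 handlers stalls inside onEvent, its sequence stops moving and eventually gates all producers.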
After the hang I can see the following values in the Disruptor:

Could you please help me to solve this issue?
Thank you in advance!
Bogdan
Wild guess: It looks like one of the consumers has failed to notify the ringbuffer that it has progressed past a particular sequence (the minimum gating sequence).
We'd need to see the code you're using to set up the consumers and publishers to know more.
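One way to test that hypothesis while the system is hung (a sketch, using the ringBuffer from the publishing snippet above):

// If a consumer has stopped advancing, the gap between the publish cursor
// and the slowest gating sequence grows to the full buffer size, and the
// producers can no longer claim a slot.
long cursor = ringBuffer.getCursor();                   // last published sequence
long minGating = ringBuffer.getMinimumGatingSequence(); // slowest consumer sequence
System.out.printf("cursor=%d minGating=%d lag=%d%n",
        cursor, minGating, cursor - minGating);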
I note the issue description shows the producers at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136), which is in effect a LockSupport.parkNanos(1) call. Was this issue produced on a RHEL 8+ environment, perhaps? If so, it sounds similar to #219 and the increased CPU cost that short LockSupport.parkNanos intervals incur on RHEL 8+.

I also recently found an issue where producers were consuming high CPU at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136) when the buffer was full. Once triggered, this could perpetuate the stuck state, as multiple producers spinning in that busy-wait could starve the consumers of the CPU they need to drain the buffer. It appears MultiProducerSequencer would need a larger or configurable park interval, like the one added in SleepingWaitStrategy.java, or it should use whatever larger interval its wait strategy provides, as noted in the TODO on that line.
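For reference, the claim loop in MultiProducerSequencer.next() looks roughly like the following (simplified from the library source; cursor, gatingSequences, bufferSize, and n are fields and parameters of the sequencer). The parkNanos(1) here is the line the producer threads above are parked on:

import java.util.concurrent.locks.LockSupport;

long current;
long next;
while (true) {
    current = cursor.get();
    next = current + n;
    long wrapPoint = next - bufferSize;

    // Claiming `next` would wrap onto a slot the slowest consumer has
    // not yet released: park for a fixed 1 ns and retry.
    if (wrapPoint > Util.getMinimumSequence(gatingSequences, current)) {
        LockSupport.parkNanos(1); // TODO in the source: spin based on the wait strategy?
        continue;
    }
    if (cursor.compareAndSet(current, next)) {
        break; // slot claimed
    }
}
return next;

When the buffer is full, every producer thread cycles through that parkNanos(1) call as fast as the OS allows, which is where the high CPU and the consumer starvation described above can come from.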