LMAX Disruptor version 4.0.0 with multi-producers blocks in RingBuffer.next() call #463

Open · oanesbogdan opened this issue Oct 10, 2023 · 2 comments

oanesbogdan commented Oct 10, 2023

Hello Team,

I'm using the LMAX Disruptor with 3 producers and 30 consumers, with the busy-spin wait strategy. The ring buffer size is 65536.
The producers publish events at a high rate of 20,000 events/second, and after a while (when the ring buffer gets full) the Disruptor hangs: the producers are blocked and the consumers stop processing events.
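
For context, the setup is roughly equivalent to the following sketch (simplified; `handlers` stands in for my 30 consumer handlers):

              import com.lmax.disruptor.BusySpinWaitStrategy;
              import com.lmax.disruptor.dsl.Disruptor;
              import com.lmax.disruptor.dsl.ProducerType;
              import com.lmax.disruptor.util.DaemonThreadFactory;

              // 65536-slot ring buffer, multiple producers, busy-spinning consumers
              Disruptor<Command> disruptor = new Disruptor<>(
                      Command::new,                  // event factory
                      65536,                         // ring buffer size
                      DaemonThreadFactory.INSTANCE,
                      ProducerType.MULTI,            // 3 producer threads publish concurrently
                      new BusySpinWaitStrategy());   // consumers busy-spin on the barrier

              disruptor.handleEventsWith(handlers);  // the 30 consumer handlers
              disruptor.start();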

I have included the thread dumps for the producers and consumers below.
Consumer stack traces:

                "Thread-72" - Thread t@124
                   java.lang.Thread.State: RUNNABLE
                        at app//com.lmax.disruptor.BusySpinWaitStrategy.waitFor(BusySpinWaitStrategy.java:39)
                        at app//com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
                        at app//com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
                        at app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
                        at java.base@11.0.20/java.lang.Thread.run(Thread.java:829)
        
                   Locked ownable synchronizers:
                        - None

                "Thread-73" - Thread t@125
                   java.lang.Thread.State: RUNNABLE
                        at app//com.lmax.disruptor.BusySpinWaitStrategy.waitFor(BusySpinWaitStrategy.java:39)
                        at app//com.lmax.disruptor.ProcessingSequenceBarrier.waitFor(ProcessingSequenceBarrier.java:56)
                        at app//com.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:159)
                        at app//com.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125)
                        at java.base@11.0.20/java.lang.Thread.run(Thread.java:829)
    
                   Locked ownable synchronizers:
                        - None

Producer stack traces:

              "thread-context-16" - Thread t@142
                 java.lang.Thread.State: TIMED_WAITING
                      at java.base@11.0.20/jdk.internal.misc.Unsafe.park(Native Method)
                      at java.base@11.0.20/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:357)
                      at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
                      at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
                      at app//com.lmax.disruptor.RingBuffer.next(RingBuffer.java:263)
              .............
              Locked ownable synchronizers:
                      - None
               ...........
              "pool-9-thread-1" - Thread t@144
                 java.lang.Thread.State: TIMED_WAITING
                      at java.base@11.0.20/jdk.internal.misc.Unsafe.park(Native Method)
                      at java.base@11.0.20/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:357)
                      at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
                      at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
                      at app//com.lmax.disruptor.RingBuffer.next(RingBuffer.java:263)
              .............
              Locked ownable synchronizers:
                      - locked <15a0ebf2> (a java.util.concurrent.ThreadPoolExecutor$Worker)
              .............
              "DefaultQuartzScheduler_Worker-1" - Thread t@147
                 java.lang.Thread.State: TIMED_WAITING
                      at java.base@11.0.20/jdk.internal.misc.Unsafe.park(Native Method)
                      at java.base@11.0.20/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:357)
                      at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136)
                      at app//com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:105)
                      at app//com.lmax.disruptor.RingBuffer.next(RingBuffer.java:263)
              .............
              Locked ownable synchronizers:
                      - None

The producers are using the following code snippet for publishing events:

              var ringBuffer = disruptor.getRingBuffer();
              final long sequence = ringBuffer.next();   // claims the next slot; blocks while the buffer is full
              try {
                  Command command = ringBuffer.get(sequence);
                  command.setType("SAMPLE");
                  ....
              } finally {
                  ringBuffer.publish(sequence);          // always publish the claimed slot, even on exception
              }

and the producers are blocked in an endless loop inside the ringBuffer.next() method.
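
(As I understand it, publishing in the finally block matters here: with a multi-producer sequencer, a sequence that is claimed but never published would leave a gap that stalls the consumers at the barrier.)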

The consumers use modulo load balancing: each consumer processes only the events whose command index, modulo the number of consumers, matches its own ordinal.
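
In sketch form (simplified; getIndex() stands in for however the command's index is read):

              import com.lmax.disruptor.EventHandler;

              // One instance per consumer, each with a distinct ordinal in [0, numConsumers)
              final class ModuloHandler implements EventHandler<Command> {
                  private final int ordinal;
                  private final int numConsumers;

                  ModuloHandler(int ordinal, int numConsumers) {
                      this.ordinal = ordinal;
                      this.numConsumers = numConsumers;
                  }

                  @Override
                  public void onEvent(Command command, long sequence, boolean endOfBatch) {
                      // Handle only this consumer's share; skipped events still count as
                      // processed, since BatchEventProcessor advances its sequence either way.
                      if (command.getIndex() % numConsumers == ordinal) {  // getIndex() is a stand-in
                          process(command);
                      }
                  }

                  private void process(Command command) { /* application logic */ }
              }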

After the hang, I can see the following values in the Disruptor:

          disruptor.getRingBuffer().getCursor() = 86053
          disruptor.getRingBuffer().getBufferSize() = 65536
          disruptor.getRingBuffer().getMinimumGatingSequence() = 20517 
          disruptor.getRingBuffer().remainingCapacity() = 0 
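
These values are self-consistent: 86053 − 20517 = 65536, so the cursor is exactly one full buffer ahead of the minimum gating sequence. That is why remainingCapacity() is 0 and next() parks until the slowest consumer's sequence advances.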

Could you please help me to solve this issue?

Thank you in advance!
Bogdan

grumpyjames (Member) commented:

Wild guess: It looks like one of the consumers has failed to notify the ringbuffer that it has progressed past a particular sequence (the minimum gating sequence).

We'd need to see the code you're using to set up the consumers and publishers to know more.
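
One common way that happens: an event handler throws, the default exception handler halts that BatchEventProcessor, and its sequence never advances again. As a diagnostic sketch (not a fix), you could install an exception handler that logs and keeps going, and see whether the hang still reproduces:

              import com.lmax.disruptor.ExceptionHandler;

              // If the hang disappears with this installed, a swallowed exception
              // was halting one of the consumers.
              disruptor.setDefaultExceptionHandler(new ExceptionHandler<Command>() {
                  @Override
                  public void handleEventException(Throwable ex, long sequence, Command event) {
                      System.err.println("Handler failed at sequence " + sequence + ": " + ex);
                  }

                  @Override
                  public void handleOnStartException(Throwable ex) {
                      System.err.println("Failed on start: " + ex);
                  }

                  @Override
                  public void handleOnShutdownException(Throwable ex) {
                      System.err.println("Failed on shutdown: " + ex);
                  }
              });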


aogburn commented Jun 20, 2024

I note in the issue description that the producers are at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136), which is in effect a LockSupport.parkNanos(1) call. Was this issue produced on a RHEL 8+ environment, perhaps? If so, it sounds similar to #219 and the increased CPU cost that short LockSupport.parkNanos intervals incur on RHEL 8+.

I also recently found an issue where producers were consuming high CPU at com.lmax.disruptor.MultiProducerSequencer.next(MultiProducerSequencer.java:136) when the buffer was full. Once triggered, this could perpetuate the stuck state: multiple producers spinning in that busy wait can starve the consumers of the CPU they need to drain the buffer. It appears MultiProducerSequencer would need a larger/configurable park interval, like the one added to SleepingWaitStrategy.java, or it should actually use any longer interval available from its wait strategy, as noted in the TODO on that line.
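
Until something like that lands, one application-side workaround (a sketch, assuming a coarser backoff is acceptable for your latency budget) is to claim slots with RingBuffer.tryNext(), which throws InsufficientCapacityException instead of parking in 1ns increments, and apply a longer backoff yourself:

              import java.util.concurrent.locks.LockSupport;

              import com.lmax.disruptor.InsufficientCapacityException;
              import com.lmax.disruptor.RingBuffer;

              // Avoid the internal parkNanos(1) loop: fail fast via tryNext() and
              // back off at a coarser interval (100µs here, tune to taste).
              static long claimWithBackoff(RingBuffer<Command> ringBuffer) {
                  while (true) {
                      try {
                          return ringBuffer.tryNext();
                      } catch (InsufficientCapacityException e) {
                          LockSupport.parkNanos(100_000L);  // 100µs, far cheaper than spinning at 1ns on RHEL 8+
                      }
                  }
              }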
