[QFJ-738] Deadlock on disconnect Created: 15/Apr/13  Updated: 02/Apr/15  Resolved: 13/Feb/14

Status: Closed
Project: QuickFIX/J
Component/s: Engine
Affects Version/s: 1.5.1
Fix Version/s: 1.6.0

Type: Bug Priority: Major
Reporter: Alexander Kormushin Assignee: Christoph John
Resolution: Fixed Votes: 0
Labels: None
Environment:

Windows 7, Java SE 1.6.0_30


Attachments: Text File quickfixj-deadlock.txt     Text File stacktrace.txt    
Issue Links:
Relates
relates to QFJ-664 Upgrade MINA from 1.1.8 to 2.0.x Closed
is related to QFJ-638 SocketSynchronousWrites=Y Hangs Threa... Closed

 Description   

I'm starting quickfixj acceptor and initiator in the same thread. For test purposes only. Don't know whether it is a valid use case. Will appreciate any thoughts on this.

From time to time following deadlock occurs:

Found one Java-level deadlock:
=============================
"QFJ Message Processor":
  waiting to lock monitor 0x000000000702f0b0 (object 0x00000007c39b1230, a java.lang.Object),
  which is held by "QFJ Timer"
"QFJ Timer":
  waiting to lock monitor 0x00000000070323e0 (object 0x00000007c39f11d0, a java.lang.String),
  which is held by "QFJ Message Processor"

Java stack information for the threads listed above:
===================================================
"QFJ Message Processor":
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.doWrite(VmPipeFilterChain.java:192)
	- waiting to lock <0x00000007c39b1230> (a java.lang.Object)
	at org.apache.mina.common.support.AbstractIoFilterChain$HeadFilter.filterWrite(AbstractIoFilterChain.java:631)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:445)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$1400(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:824)
	at org.apache.mina.filter.codec.ProtocolCodecFilter$ProtocolEncoderOutputImpl.doFlush(ProtocolCodecFilter.java:436)
	at org.apache.mina.filter.codec.support.SimpleProtocolEncoderOutput.flush(SimpleProtocolEncoderOutput.java:112)
	at org.apache.mina.filter.codec.ProtocolCodecFilter.filterWrite(ProtocolCodecFilter.java:237)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:445)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$1400(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterWrite(AbstractIoFilterChain.java:824)
	at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.filterWrite(AbstractIoFilterChain.java:727)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterWrite(AbstractIoFilterChain.java:445)
	at org.apache.mina.common.support.AbstractIoFilterChain.fireFilterWrite(AbstractIoFilterChain.java:436)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.fireEvent(VmPipeFilterChain.java:100)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.flushEvents(VmPipeFilterChain.java:65)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.pushEvent(VmPipeFilterChain.java:56)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.fireFilterWrite(VmPipeFilterChain.java:141)
	at org.apache.mina.transport.vmpipe.support.VmPipeSessionImpl.write0(VmPipeSessionImpl.java:146)
	at org.apache.mina.common.support.BaseIoSession.write(BaseIoSession.java:149)
	at org.apache.mina.common.support.BaseIoSession.write(BaseIoSession.java:135)
	at quickfix.mina.IoSessionResponder.send(IoSessionResponder.java:47)
	at quickfix.Session.send(Session.java:2322)
	- locked <0x00000007c39f11d0> (a java.lang.String)
	at quickfix.Session.sendRaw(Session.java:2253)
	at quickfix.Session.generateLogon(Session.java:2196)
	at quickfix.Session.nextLogon(Session.java:2001)
	at quickfix.Session.next(Session.java:958)
	at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:114)
	at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:77)
	at quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:94)
	at java.lang.Thread.run(Thread.java:662)
"QFJ Timer":
	at quickfix.Session.getResponder(Session.java:508)
	- waiting to lock <0x00000007c39f11d0> (a java.lang.String)
	at quickfix.Session.hasResponder(Session.java:518)
	at quickfix.mina.AbstractIoHandler.sessionClosed(AbstractIoHandler.java:95)
	at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.sessionClosed(AbstractIoFilterChain.java:677)
	at org.apache.mina.common.support.AbstractIoFilterChain.callNextSessionClosed(AbstractIoFilterChain.java:321)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$800(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.sessionClosed(AbstractIoFilterChain.java:781)
	at org.apache.mina.filter.codec.ProtocolCodecFilter.sessionClosed(ProtocolCodecFilter.java:299)
	at org.apache.mina.common.support.AbstractIoFilterChain.callNextSessionClosed(AbstractIoFilterChain.java:321)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$800(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.sessionClosed(AbstractIoFilterChain.java:781)
	at org.apache.mina.common.support.AbstractIoFilterChain$HeadFilter.sessionClosed(AbstractIoFilterChain.java:599)
	at org.apache.mina.common.support.AbstractIoFilterChain.callNextSessionClosed(AbstractIoFilterChain.java:321)
	at org.apache.mina.common.support.AbstractIoFilterChain.fireSessionClosed(AbstractIoFilterChain.java:313)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.fireEvent(VmPipeFilterChain.java:124)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.flushEvents(VmPipeFilterChain.java:65)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.pushEvent(VmPipeFilterChain.java:56)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.fireSessionClosed(VmPipeFilterChain.java:159)
	at org.apache.mina.common.support.IoServiceListenerSupport.fireSessionDestroyed(IoServiceListenerSupport.java:233)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.doClose(VmPipeFilterChain.java:240)
	- locked <0x00000007c39b1230> (a java.lang.Object)
	at org.apache.mina.common.support.AbstractIoFilterChain$HeadFilter.filterClose(AbstractIoFilterChain.java:644)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterClose(AbstractIoFilterChain.java:464)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$1500(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterClose(AbstractIoFilterChain.java:830)
	at org.apache.mina.common.IoFilterAdapter.filterClose(IoFilterAdapter.java:98)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterClose(AbstractIoFilterChain.java:464)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$1500(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterClose(AbstractIoFilterChain.java:830)
	at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.filterClose(AbstractIoFilterChain.java:732)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterClose(AbstractIoFilterChain.java:464)
	at org.apache.mina.common.support.AbstractIoFilterChain.fireFilterClose(AbstractIoFilterChain.java:456)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.fireEvent(VmPipeFilterChain.java:128)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.flushEvents(VmPipeFilterChain.java:65)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.pushEvent(VmPipeFilterChain.java:56)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.fireFilterClose(VmPipeFilterChain.java:135)
	at org.apache.mina.transport.vmpipe.support.VmPipeSessionImpl.close0(VmPipeSessionImpl.java:140)
	at org.apache.mina.common.support.BaseIoSession.close(BaseIoSession.java:119)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.doClose(VmPipeFilterChain.java:241)
	- locked <0x00000007c39b1230> (a java.lang.Object)
	at org.apache.mina.common.support.AbstractIoFilterChain$HeadFilter.filterClose(AbstractIoFilterChain.java:644)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterClose(AbstractIoFilterChain.java:464)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$1500(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterClose(AbstractIoFilterChain.java:830)
	at org.apache.mina.common.IoFilterAdapter.filterClose(IoFilterAdapter.java:98)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterClose(AbstractIoFilterChain.java:464)
	at org.apache.mina.common.support.AbstractIoFilterChain.access$1500(AbstractIoFilterChain.java:54)
	at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.filterClose(AbstractIoFilterChain.java:830)
	at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.filterClose(AbstractIoFilterChain.java:732)
	at org.apache.mina.common.support.AbstractIoFilterChain.callPreviousFilterClose(AbstractIoFilterChain.java:464)
	at org.apache.mina.common.support.AbstractIoFilterChain.fireFilterClose(AbstractIoFilterChain.java:456)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.fireEvent(VmPipeFilterChain.java:128)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.flushEvents(VmPipeFilterChain.java:65)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.pushEvent(VmPipeFilterChain.java:56)
	at org.apache.mina.transport.vmpipe.support.VmPipeFilterChain.fireFilterClose(VmPipeFilterChain.java:135)
	at org.apache.mina.transport.vmpipe.support.VmPipeSessionImpl.close0(VmPipeSessionImpl.java:140)
	at org.apache.mina.common.support.BaseIoSession.close(BaseIoSession.java:119)
	at quickfix.mina.IoSessionResponder.disconnect(IoSessionResponder.java:69)
	at quickfix.Session.disconnect(Session.java:1918)
	- locked <0x00000007c39a3358> (a java.lang.String)
	at quickfix.Session.next(Session.java:1790)
	at quickfix.mina.SessionConnector$SessionTimerTask.run(SessionConnector.java:283)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)

Found 1 deadlock.


 Comments   
Comment by Alexander Kormushin [ 15/Apr/13 ]

Full stack trace attached.

Comment by Alexander Kormushin [ 16/Apr/13 ]

It seems the issue is only reproducible with VM_PIPE transport.

Comment by Christoph John [ 17/Apr/13 ]

Could you please check if the problem exists with a newer version? E.g. there was QFJ-645. The stack trace looks different but the problem also appeared on disconnection.

Comment by Attila-Mihaly Balazs [ 10/Sep/13 ]

This is still happening with 1.5.3, see the attached stacktrace.

The sequence of events is the following (from the attached stacktrace):

  • T1 (FixMessageHandler-ExecThread) obtains responderSync in quickfix.Session.send
  • T2 (main) obtains session.getLock() (this isn't shown in the thread dump - apparently jstack still doesn't have very good suppport for explicit locks - it happens in VmPipeFilterChain$VmPipeIoProcessor.remove at line 253)
  • T1 tries to obtain the same lock (sssion.getLock()) at VmPipeFilterChain$VmPipeIoProcessor.flush (line 184)
  • T2 tries to obtain responderSync at Session.disconnect (line 1892).

This only happens with the VmPipe transport (used by me for fast integration testing) and is solved by the following patch (which calls responder.send outside of the lock). The patch is against trunk and all the unit tests pass with it.

Index: core/src/main/java/quickfix/Session.java
===================================================================
— core/src/main/java/quickfix/Session.java (revision 1114)
+++ core/src/main/java/quickfix/Session.java (working copy)
@@ -2475,13 +2475,15 @@

private boolean send(String messageString) {
getLog().onOutgoing(messageString);
+ Responder responder;
synchronized (responderSync) {
if (!hasResponder())

{ getLog().onEvent("No responder, not sending message: " + messageString); return false; }
  • return getResponder().send(messageString);
    + responder = getResponder();
    }
    + return responder.send(messageString);
    }

private boolean isCorrectCompID(String senderCompID, String targetCompID) {

Comment by Christoph John [ 10/Sep/13 ]

Thanks for the patch. Could you please attach it since the code formatting in comments is not very optimal. Thank you.

Comment by Christoph John [ 17/Dec/13 ]

Did some more tests with unit test MultiAcceptorTest and could reproduce this almost always. However, the suggested patch does not solve this issue at all times. One more problem is that on a disconnection the AbstractIoHandler gets notified by MINA via callback sessionClosed and also tries to get hands on the responderSync lock (via Session.hasResponder() and disconnect()).

I guess we need to analyse if it is always feasible to check the hasResponder() from outside or just call the necessary Session method and do the responder check inside the Session.
Moreover, since almost all deadlocks I saw (also from other QFJ issues) were based around the disconnect() method, maybe it would be better to switch on a initiateDisconnect-flag of some sort from outside and handle the disconnection from inside the next() method or similar. Don't know if this will solve all problems, though.

Generated at Sat Nov 23 08:49:22 UTC 2024 using JIRA 7.5.2#75007-sha1:9f5725bb824792b3230a5d8716f0c13e296a3cae.