[QFJ-215] QFJ deadlocks in Session.disconnect() code when a Windows client disconnects from Linux or Mac server Created: 26/Jul/07  Updated: 11/Feb/09  Resolved: 27/Jul/07

Status: Closed
Project: QuickFIX/J
Component/s: Engine
Affects Version/s: None
Fix Version/s: 1.3.0

Type: Bug Priority: Default
Reporter: Toli Kuznets Assignee: Steve Bate
Resolution: Fixed Votes: 0
Labels: None
Environment:

rev 709


Issue Links:
Relates
is related to QFJ-220 QFJ hangs in IoSessionResponder.disco... Closed

 Description   

I'm encountering a situation (in the HEAD code) where I get a deadlock when i disconnect from a QFJ acceptor running on Linux or Mac OS X from a Windows machine.
I can reproduce this using the sample executor/banai applications, so i don't think that it has anything to do with my application code.

Repro:
Run the executor app on a Mac machine
Setup Banzai to connect from a Windows machine to the acceptor on a Mac or a Linux box. this has to be a separate box, not localhost. VMWare Windows -> Linux host works.

The first time the connection will go through successfully. Now, press Ctrl-C to disconnect the Banzai process - the "acceptor" side registers the disconnect messages:
<20070726-02:30:40, FIX.4.0:EXEC->BANZAI, event> (Responding to logon request)
Jul 25, 2007 7:30:43 PM quickfix.mina.AbstractIoHandler exceptionCaught
SEVERE: socket exception (/10.1.9.204:1747): Connection reset by peer
<20070726-02:30:43, FIX.4.4:EXEC->BANZAI, event> (Disconnecting)

Now, try connecting Banzai again. on the Banzai side, you see the logon initiation messages:
<20070726-02:33:10, FIX.4.1:BANZAI->EXEC, event> (Initiated logon request)
<20070726-02:33:10, FIX.4.4:BANZAI->EXEC, outgoing> (8=FIX.4.4☺9=66☺35=A☺34=14☺4
9=BANZAI☺52=20070726-02:33:10.312☺56=EXEC☺98=0☺108=30☺10=007☺)

On the 2nd try, when you see Banzai trying to log on, there are no messages showing up on the Executor side. Doing a Ctrl-\ to get a stack trace yields the following:

"SocketAcceptorIoProcessor-0.0" prio=10 tid=0x905e1800 nid=0x2e7f in Object.wait() [0x8ffad000..0x8ffadf30]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        - waiting on <0xb1de2338> (a org.apache.mina.common.support.DefaultCloseFuture)
        at java.lang.Object.wait(Object.java:485)
        at org.apache.mina.common.support.DefaultIoFuture.join(DefaultIoFuture.java:86)
        - locked <0xb1de2338> (a org.apache.mina.common.support.DefaultCloseFuture)
        at quickfix.mina.IoSessionResponder.disconnect(IoSessionResponder.java:44)
        at quickfix.Session.disconnect(Session.java:1369)
        at quickfix.mina.AbstractIoHandler.exceptionCaught(AbstractIoHandler.java:82)
        at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.exceptionCaught(AbstractIoFilterChain.java:695)
        at org.apache.mina.common.support.AbstractIoFilterChain.callNextExceptionCaught(AbstractIoFilterChain.java:423)
        at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54)
        at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.exceptionCaught(AbstractIoFilterChain.java:794)
        at org.apache.mina.common.IoFilterAdapter.exceptionCaught(IoFilterAdapter.java:78)
        at org.apache.mina.common.support.AbstractIoFilterChain.callNextExceptionCaught(AbstractIoFilterChain.java:423)
        at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54)
        at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.exceptionCaught(AbstractIoFilterChain.java:794)
        at org.apache.mina.common.support.AbstractIoFilterChain$HeadFilter.exceptionCaught(AbstractIoFilterChain.java:611)
        at org.apache.mina.common.support.AbstractIoFilterChain.callNextExceptionCaught(AbstractIoFilterChain.java:423)
        at org.apache.mina.common.support.AbstractIoFilterChain.fireExceptionCaught(AbstractIoFilterChain.java:407)
        at org.apache.mina.transport.socket.nio.SocketIoProcessor.read(SocketIoProcessor.java:293)
        at org.apache.mina.transport.socket.nio.SocketIoProcessor.process(SocketIoProcessor.java:241)
        at org.apache.mina.transport.socket.nio.SocketIoProcessor.access$500(SocketIoProcessor.java:44)
        at org.apache.mina.transport.socket.nio.SocketIoProcessor$Worker.run(SocketIoProcessor.java:563)
        at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:43)
        at java.lang.Thread.run(Thread.java:619)

When you run this on a vanilla Quickfix/J-1.2.1 distribution everything works. However, it starts breaking using the head (as of 709) revision.
Linux->Mac and Linux->linux works, it seems that only when you enter Windows into the equation it starts to misbehave.

I'm inclined to think this is a QFJ and not a MINA issue, since it works fine with 1.2.1 but not with SVN revision.
Also, i've tried various versions of MINA - 1.0.3, 1.0.4 and 1.1.1, and the app still deadlocks in the same way.

I've looked at the code, and it seems that there are 2 different code paths that happen.
When a non-Windows initiator disconnects, the xxx.mina.xxx.DefaultCloseFuture.setValue() is hit, after which we get into the quickfix.mina.IoSessionResponder.disconnect() and sail right through the close().join() code.
However, with a Windows iniatiator, we go straight into the IoSessionResponder, and it seems that the close0() function is never hit so we never mark the future as closing and hang as a result.

I saw something related in https://issues.apache.org/jira/browse/DIRMINA-261 but since we are not specifying SO_LINGER (or SocketLinger) i don't think it applies. i tried setting SocketLinger to be 0, but that didn't change the behaviour.

I can reproduce this consistently, so would like any advice on how to go about further debugging or fixing this.



 Comments   
Comment by Steve Bate [ 27/Jul/07 ]

I've removed the CloseFuture.close() call. That shouldn't have been there because the threading model we are using will cause MINA to not be able to process the close completion event (and complete the join()).

Generated at Sat Nov 23 14:20:57 UTC 2024 using JIRA 7.5.2#75007-sha1:9f5725bb824792b3230a5d8716f0c13e296a3cae.