Details
Description
I'm encountering a situation (in the HEAD code) where I get a deadlock when i disconnect from a QFJ acceptor running on Linux or Mac OS X from a Windows machine.
I can reproduce this using the sample executor/banai applications, so i don't think that it has anything to do with my application code.
Repro:
Run the executor app on a Mac machine
Setup Banzai to connect from a Windows machine to the acceptor on a Mac or a Linux box. this has to be a separate box, not localhost. VMWare Windows -> Linux host works.
The first time the connection will go through successfully. Now, press Ctrl-C to disconnect the Banzai process - the "acceptor" side registers the disconnect messages:
<20070726-02:30:40, FIX.4.0:EXEC->BANZAI, event> (Responding to logon request)
Jul 25, 2007 7:30:43 PM quickfix.mina.AbstractIoHandler exceptionCaught
SEVERE: socket exception (/10.1.9.204:1747): Connection reset by peer
<20070726-02:30:43, FIX.4.4:EXEC->BANZAI, event> (Disconnecting)
Now, try connecting Banzai again. on the Banzai side, you see the logon initiation messages:
<20070726-02:33:10, FIX.4.1:BANZAI->EXEC, event> (Initiated logon request)
<20070726-02:33:10, FIX.4.4:BANZAI->EXEC, outgoing> (8=FIX.4.4☺9=66☺35=A☺34=14☺4
9=BANZAI☺52=20070726-02:33:10.312☺56=EXEC☺98=0☺108=30☺10=007☺)
On the 2nd try, when you see Banzai trying to log on, there are no messages showing up on the Executor side. Doing a Ctrl-\ to get a stack trace yields the following:
"SocketAcceptorIoProcessor-0.0" prio=10 tid=0x905e1800 nid=0x2e7f in Object.wait() [0x8ffad000..0x8ffadf30] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0xb1de2338> (a org.apache.mina.common.support.DefaultCloseFuture) at java.lang.Object.wait(Object.java:485) at org.apache.mina.common.support.DefaultIoFuture.join(DefaultIoFuture.java:86) - locked <0xb1de2338> (a org.apache.mina.common.support.DefaultCloseFuture) at quickfix.mina.IoSessionResponder.disconnect(IoSessionResponder.java:44) at quickfix.Session.disconnect(Session.java:1369) at quickfix.mina.AbstractIoHandler.exceptionCaught(AbstractIoHandler.java:82) at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.exceptionCaught(AbstractIoFilterChain.java:695) at org.apache.mina.common.support.AbstractIoFilterChain.callNextExceptionCaught(AbstractIoFilterChain.java:423) at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54) at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.exceptionCaught(AbstractIoFilterChain.java:794) at org.apache.mina.common.IoFilterAdapter.exceptionCaught(IoFilterAdapter.java:78) at org.apache.mina.common.support.AbstractIoFilterChain.callNextExceptionCaught(AbstractIoFilterChain.java:423) at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54) at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.exceptionCaught(AbstractIoFilterChain.java:794) at org.apache.mina.common.support.AbstractIoFilterChain$HeadFilter.exceptionCaught(AbstractIoFilterChain.java:611) at org.apache.mina.common.support.AbstractIoFilterChain.callNextExceptionCaught(AbstractIoFilterChain.java:423) at org.apache.mina.common.support.AbstractIoFilterChain.fireExceptionCaught(AbstractIoFilterChain.java:407) at org.apache.mina.transport.socket.nio.SocketIoProcessor.read(SocketIoProcessor.java:293) at org.apache.mina.transport.socket.nio.SocketIoProcessor.process(SocketIoProcessor.java:241) at org.apache.mina.transport.socket.nio.SocketIoProcessor.access$500(SocketIoProcessor.java:44) at org.apache.mina.transport.socket.nio.SocketIoProcessor$Worker.run(SocketIoProcessor.java:563) at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:43) at java.lang.Thread.run(Thread.java:619)
When you run this on a vanilla Quickfix/J-1.2.1 distribution everything works. However, it starts breaking using the head (as of 709) revision.
Linux->Mac and Linux->linux works, it seems that only when you enter Windows into the equation it starts to misbehave.
I'm inclined to think this is a QFJ and not a MINA issue, since it works fine with 1.2.1 but not with SVN revision.
Also, i've tried various versions of MINA - 1.0.3, 1.0.4 and 1.1.1, and the app still deadlocks in the same way.
I've looked at the code, and it seems that there are 2 different code paths that happen.
When a non-Windows initiator disconnects, the xxx.mina.xxx.DefaultCloseFuture.setValue() is hit, after which we get into the quickfix.mina.IoSessionResponder.disconnect() and sail right through the close().join() code.
However, with a Windows iniatiator, we go straight into the IoSessionResponder, and it seems that the close0() function is never hit so we never mark the future as closing and hang as a result.
I saw something related in https://issues.apache.org/jira/browse/DIRMINA-261 but since we are not specifying SO_LINGER (or SocketLinger) i don't think it applies. i tried setting SocketLinger to be 0, but that didn't change the behaviour.
I can reproduce this consistently, so would like any advice on how to go about further debugging or fixing this.
Attachments
Issue Links
- is related to
-
QFJ-220 QFJ hangs in IoSessionResponder.disconnect() while waiting for scheduled messages to be written
- Closed