Uploaded image for project: 'QuickFIX/J'
  1. QuickFIX/J
  2. QFJ-215

QFJ deadlocks in Session.disconnect() code when a Windows client disconnects from Linux or Mac server

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Default
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.3.0
    • Component/s: Engine
    • Labels:
      None
    • Environment:
      rev 709

      Description

      I'm encountering a situation (in the HEAD code) where I get a deadlock when i disconnect from a QFJ acceptor running on Linux or Mac OS X from a Windows machine.
      I can reproduce this using the sample executor/banai applications, so i don't think that it has anything to do with my application code.

      Repro:
      Run the executor app on a Mac machine
      Setup Banzai to connect from a Windows machine to the acceptor on a Mac or a Linux box. this has to be a separate box, not localhost. VMWare Windows -> Linux host works.

      The first time the connection will go through successfully. Now, press Ctrl-C to disconnect the Banzai process - the "acceptor" side registers the disconnect messages:
      <20070726-02:30:40, FIX.4.0:EXEC->BANZAI, event> (Responding to logon request)
      Jul 25, 2007 7:30:43 PM quickfix.mina.AbstractIoHandler exceptionCaught
      SEVERE: socket exception (/10.1.9.204:1747): Connection reset by peer
      <20070726-02:30:43, FIX.4.4:EXEC->BANZAI, event> (Disconnecting)

      Now, try connecting Banzai again. on the Banzai side, you see the logon initiation messages:
      <20070726-02:33:10, FIX.4.1:BANZAI->EXEC, event> (Initiated logon request)
      <20070726-02:33:10, FIX.4.4:BANZAI->EXEC, outgoing> (8=FIX.4.4☺9=66☺35=A☺34=14☺4
      9=BANZAI☺52=20070726-02:33:10.312☺56=EXEC☺98=0☺108=30☺10=007☺)

      On the 2nd try, when you see Banzai trying to log on, there are no messages showing up on the Executor side. Doing a Ctrl-\ to get a stack trace yields the following:

      "SocketAcceptorIoProcessor-0.0" prio=10 tid=0x905e1800 nid=0x2e7f in Object.wait() [0x8ffad000..0x8ffadf30]
         java.lang.Thread.State: WAITING (on object monitor)
              at java.lang.Object.wait(Native Method)
              - waiting on <0xb1de2338> (a org.apache.mina.common.support.DefaultCloseFuture)
              at java.lang.Object.wait(Object.java:485)
              at org.apache.mina.common.support.DefaultIoFuture.join(DefaultIoFuture.java:86)
              - locked <0xb1de2338> (a org.apache.mina.common.support.DefaultCloseFuture)
              at quickfix.mina.IoSessionResponder.disconnect(IoSessionResponder.java:44)
              at quickfix.Session.disconnect(Session.java:1369)
              at quickfix.mina.AbstractIoHandler.exceptionCaught(AbstractIoHandler.java:82)
              at org.apache.mina.common.support.AbstractIoFilterChain$TailFilter.exceptionCaught(AbstractIoFilterChain.java:695)
              at org.apache.mina.common.support.AbstractIoFilterChain.callNextExceptionCaught(AbstractIoFilterChain.java:423)
              at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54)
              at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.exceptionCaught(AbstractIoFilterChain.java:794)
              at org.apache.mina.common.IoFilterAdapter.exceptionCaught(IoFilterAdapter.java:78)
              at org.apache.mina.common.support.AbstractIoFilterChain.callNextExceptionCaught(AbstractIoFilterChain.java:423)
              at org.apache.mina.common.support.AbstractIoFilterChain.access$1100(AbstractIoFilterChain.java:54)
              at org.apache.mina.common.support.AbstractIoFilterChain$EntryImpl$1.exceptionCaught(AbstractIoFilterChain.java:794)
              at org.apache.mina.common.support.AbstractIoFilterChain$HeadFilter.exceptionCaught(AbstractIoFilterChain.java:611)
              at org.apache.mina.common.support.AbstractIoFilterChain.callNextExceptionCaught(AbstractIoFilterChain.java:423)
              at org.apache.mina.common.support.AbstractIoFilterChain.fireExceptionCaught(AbstractIoFilterChain.java:407)
              at org.apache.mina.transport.socket.nio.SocketIoProcessor.read(SocketIoProcessor.java:293)
              at org.apache.mina.transport.socket.nio.SocketIoProcessor.process(SocketIoProcessor.java:241)
              at org.apache.mina.transport.socket.nio.SocketIoProcessor.access$500(SocketIoProcessor.java:44)
              at org.apache.mina.transport.socket.nio.SocketIoProcessor$Worker.run(SocketIoProcessor.java:563)
              at org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:43)
              at java.lang.Thread.run(Thread.java:619)
      

      When you run this on a vanilla Quickfix/J-1.2.1 distribution everything works. However, it starts breaking using the head (as of 709) revision.
      Linux->Mac and Linux->linux works, it seems that only when you enter Windows into the equation it starts to misbehave.

      I'm inclined to think this is a QFJ and not a MINA issue, since it works fine with 1.2.1 but not with SVN revision.
      Also, i've tried various versions of MINA - 1.0.3, 1.0.4 and 1.1.1, and the app still deadlocks in the same way.

      I've looked at the code, and it seems that there are 2 different code paths that happen.
      When a non-Windows initiator disconnects, the xxx.mina.xxx.DefaultCloseFuture.setValue() is hit, after which we get into the quickfix.mina.IoSessionResponder.disconnect() and sail right through the close().join() code.
      However, with a Windows iniatiator, we go straight into the IoSessionResponder, and it seems that the close0() function is never hit so we never mark the future as closing and hang as a result.

      I saw something related in https://issues.apache.org/jira/browse/DIRMINA-261 but since we are not specifying SO_LINGER (or SocketLinger) i don't think it applies. i tried setting SocketLinger to be 0, but that didn't change the behaviour.

      I can reproduce this consistently, so would like any advice on how to go about further debugging or fixing this.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                admin Steve Bate
                Reporter:
                toli Toli Kuznets
              • Votes:
                0 Vote for this issue
                Watchers:
                0 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: