Uploaded image for project: 'QuickFIX/J'
  1. QuickFIX/J
  2. QFJ-54

Occasional NPE in block method during acceptor start.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Default
    • Resolution: Fixed
    • Affects Version/s: 1.0.1, 1.0.2
    • Fix Version/s: 1.0.3
    • Component/s: Networking
    • Labels:
      None

      Description

      From Toli Kuznets...

      I have a basic app that uses QFJ to communicate. I have a bunch of uni
      tests running continiously, and they've been passing with the app
      linked to 1.0.0-final of QuickFix/J.

      However, as soon as i tried switching to 1.0.2 i get the following error:
      Exception in thread "QFJ Socket Acceptor 126f75b" java.lang.NullPointerException
      at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:131)
      at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:70)
      at quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:86)
      at java.lang.Thread.run(Unknown Source)

      There's no nested exception, and it's a thread created by QFJ and not my app.

      Switching the code back to using 1.0.0-final fixes the problem. I
      never used 1.0.1 so i can't tell if the problem manifests itself there
      as well.

      Any ideas on what could be causing this?
      This is happening both on Ubuntu Linux and MacOSX. Not even sure how
      to debug this since the exception is coming from a QFJ/mina code.

      -------------------------------------------------------------------------------------------------------------------------------
      From Joerg Thoennes...

      Steve, the same happened to me. Today I got with version 1.0.1 this error:

      [2006-08-15 17:41:47,472] [INFO ] [quickfix.mina.acceptor.AcceptorIoHandler] (SocketAcceptor-0) MINA
      session created: /192.168.1.8:57611
      [2006-08-15 17:41:47,573] [DEBUG] [FixLogFactory] (AnonymousIoService-1-2)
      FIX.4.2:FixGateway->FixClient: incoming:
      8=FIX.4.2|9=62|35=5|34=1|49=FixClient|52=20060815-15:41:47.540|56=FixGateway|10=091|
      Exception in thread "QFJ Socket Acceptor 992bae" java.lang.NullPointerException
      at
      quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:131)
      at
      quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:70)
      at
      quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:86)
      at java.lang.Thread.run(Thread.java:595)

      If I can reproduce this error tomorrow, I will send you more details.

        Attachments

          Activity

          Hide
          admin Steve Bate added a comment -

          I've found a way that this error could occur. This is the scenario...

          1. An acceptor
          2. A new connection is accepted but not associated with a session
          3. A message, other than a logon, is received on the connection.

          It's easy to check for this condition and ignore the message with a warning, but
          is this scenario consistent with the context where you were seeing this error?

          Show
          admin Steve Bate added a comment - I've found a way that this error could occur. This is the scenario... 1. An acceptor 2. A new connection is accepted but not associated with a session 3. A message, other than a logon, is received on the connection. It's easy to check for this condition and ignore the message with a warning, but is this scenario consistent with the context where you were seeing this error?
          Hide
          jthoennes Jörg Thönnes added a comment -

          My scenaria is as follows:

          1. An initiator process continuously tries to connect to the acceptor.
          2. The acceptor is starting up, has been partially blocked in a debugger or is rejecting logons (throw RejectLogon in fromAdmin()).

          Up to now, I did no further analysis, but my guess is that this is somehow related to the RejectLogon. Does this make sense?
          I wonder what the message other than Logon could be in my scenario.

          Show
          jthoennes Jörg Thönnes added a comment - My scenaria is as follows: 1. An initiator process continuously tries to connect to the acceptor. 2. The acceptor is starting up, has been partially blocked in a debugger or is rejecting logons (throw RejectLogon in fromAdmin()). Up to now, I did no further analysis, but my guess is that this is somehow related to the RejectLogon. Does this make sense? I wonder what the message other than Logon could be in my scenario.
          Hide
          toli Toli Kuznets added a comment -

          Steve,

          I think what you described is similar to what is happening with me.
          I've collected some more information, and i'm attaching a bit more logging.

          In my unit test i created an Acceptor which starts up, and then i create an initiator which immediately tries to connect to the acceptor.

          The acceptor is basically a thin wrapper around quickix.SocketAcceptor, and the intiator is just a wrapper around quickfix.SocketInitiator (code is at http://trac.marketcetera.org/trac.fcgi/browser/platform/trunk/core/src/main/java/org/marketcetera/quickfix/QuickFIXInitiator.java).

          Looking at the log above the exception, i'm now seeing that it looks like Exchange (acceptor) stars up, but then rejects the incoming message:
          <20060816-09:01:00, FIX.4.2:test-exchange->test-sender, event> (Session FIX.4.2:test-exchange->test-sender schedule is daily, 00:00:00 UTC - 00:00:00 UTC)
          <20060816-09:01:00, FIX.4.2:test-exchange->test-sender, event> (Created session: FIX.4.2:test-exchange->test-sender)
          <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Session FIX.4.2:test-sender->test-exchange schedule is daily, 00:00:00 UTC - 00:00:00 UTC)
          <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Session state is not current; resetting FIX.4.2:test-sender->test-exchange)
          <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (No responder, not sending message)
          02:01:01,106 DEBUG [main] marketcetera.exchange.ExchangeMainTest$MyQFInitiator (QuickFIXInitiator.java:165) - new session created: FIX.4.2:test-sender->test-exchange
          <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Created session: FIX.4.2:test-sender->test-exchange)
          <20060816-09:01:01, FIX.4.2:test-exchange->test-sender, incoming> (8=FIX.4.29=6735=534=149=test-sender52=20060816-09:01:01.23756=test-exchange10=120)
          Exception in thread "QFJ Socket Acceptor 1e97f9f" java.lang.NullPointerException
          at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:131)
          at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:70)
          at quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:86)
          at java.lang.Thread.run(Unknown Source)

          For example, i don't see these 2 lines when the tests pass:
          <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Session state is not current; resetting FIX.4.2:test-sender->test-exchange)
          <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (No responder, not sending message)

          From the stacktrace at time of bug, it appears that my unit test is waiting for the onLogon() event to happen in the Initator.

          stack-trace attached, and more logging (above included) is also attached.

          But overall, it seems that it could be similar to what Steve is describing.
          We are using
          ResetOnLogout=Y
          ResetOnDisconnect=Y
          SendResetSeqNumFlag=Y

          if that makes any difference.

          Show
          toli Toli Kuznets added a comment - Steve, I think what you described is similar to what is happening with me. I've collected some more information, and i'm attaching a bit more logging. In my unit test i created an Acceptor which starts up, and then i create an initiator which immediately tries to connect to the acceptor. The acceptor is basically a thin wrapper around quickix.SocketAcceptor, and the intiator is just a wrapper around quickfix.SocketInitiator (code is at http://trac.marketcetera.org/trac.fcgi/browser/platform/trunk/core/src/main/java/org/marketcetera/quickfix/QuickFIXInitiator.java ). Looking at the log above the exception, i'm now seeing that it looks like Exchange (acceptor) stars up, but then rejects the incoming message: <20060816-09:01:00, FIX.4.2:test-exchange->test-sender, event> (Session FIX.4.2:test-exchange->test-sender schedule is daily, 00:00:00 UTC - 00:00:00 UTC) <20060816-09:01:00, FIX.4.2:test-exchange->test-sender, event> (Created session: FIX.4.2:test-exchange->test-sender) <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Session FIX.4.2:test-sender->test-exchange schedule is daily, 00:00:00 UTC - 00:00:00 UTC) <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Session state is not current; resetting FIX.4.2:test-sender->test-exchange) <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (No responder, not sending message) 02:01:01,106 DEBUG [main] marketcetera.exchange.ExchangeMainTest$MyQFInitiator (QuickFIXInitiator.java:165) - new session created: FIX.4.2:test-sender->test-exchange <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Created session: FIX.4.2:test-sender->test-exchange) <20060816-09:01:01, FIX.4.2:test-exchange->test-sender, incoming> (8=FIX.4.29=6735=534=149=test-sender52=20060816-09:01:01.23756=test-exchange10=120) Exception in thread "QFJ Socket Acceptor 1e97f9f" java.lang.NullPointerException at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:131) at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:70) at quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:86) at java.lang.Thread.run(Unknown Source) For example, i don't see these 2 lines when the tests pass: <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Session state is not current; resetting FIX.4.2:test-sender->test-exchange) <20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (No responder, not sending message) From the stacktrace at time of bug, it appears that my unit test is waiting for the onLogon() event to happen in the Initator. stack-trace attached, and more logging (above included) is also attached. But overall, it seems that it could be similar to what Steve is describing. We are using ResetOnLogout=Y ResetOnDisconnect=Y SendResetSeqNumFlag=Y if that makes any difference.
          Hide
          toli Toli Kuznets added a comment -

          Thread dump of the unit test that has a race-condition exhibiting the NPE

          Show
          toli Toli Kuznets added a comment - Thread dump of the unit test that has a race-condition exhibiting the NPE
          Hide
          toli Toli Kuznets added a comment -

          Log file for the unit test exhibiting the NPE

          Show
          toli Toli Kuznets added a comment - Log file for the unit test exhibiting the NPE
          Hide
          toli Toli Kuznets added a comment -

          Not sure if this helps, but i have another set of logs and a use case

          1. start my exchange simulator
          2. connect to it from 2 places (web app that displays the simulator) and an OMS (that sends a few sample order to seed the market).
          everything is peachy.
          3. connect with 2nd OMS (different senderCompID) and it generates the NPE.
          The simulator knows about that session (it's in simulator's config file).

          here's the log from sending (OMS) side:
          <20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Session FIX.4.2:sender-1501-OMS->MRKTC-EXCH schedule is daily, 00:00:00 UTC - 00:00:00 UTC)
          <20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Session state is not current; resetting FIX.4.2:sender-1501-OMS->MRKTC-EXCH)
          <20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (No responder, not sending message)
          11:28:00,712 DEBUG [main] marketcetera.quickfix.QuickFIXInitiator (QuickFIXInitiator.java:165) - new session created: FIX.4.2:sender-1501-OMS->MRKTC-EXCH
          <20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Created session: FIX.4.2:sender-1501-OMS->MRKTC-EXCH)
          Aug 18, 2006 11:28:00 AM quickfix.mina.initiator.InitiatorIoHandler sessionCreated
          INFO: MINA session created: /10.1.9.112:52297
          <20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Disconnecting)

          and on the simulator side, the request triggers the NPE and subsequently all the other previous 2 connections get dropped:
          Aug 18, 2006 11:28:01 AM quickfix.mina.acceptor.AcceptorIoHandler sessionCreated
          INFO: MINA session created: /66.92.9.46:52297
          <20060818-18:28:01, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, incoming> (8=FIX.4.29=6835=534=149=sender-1501-OMS52=20060818-18:28:00.92856=MRKTC-EXCH10=083)
          Exception in thread "QFJ Socket Acceptor ef137d" java.lang.NullPointerException
          at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:131)
          at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:70)
          at quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:86)
          at java.lang.Thread.run(Thread.java:595)
          <20060818-18:28:12, FIX.4.2:MRKTC-EXCH->MRKTC-DISPLAY, incoming> (8=FIX.4.29=6635=034=849=MRKTC-DISPLAY52=20060818-18:28:12.78856=MRKTC-EXCH10=141)
          <20060818-18:28:13, FIX.4.2:MRKTC-EXCH->MRKTC-DISPLAY, outgoing> (8=FIX.4.29=6735=034=1349=MRKTC-EXCH52=20060818-18:28:13.21356=MRKTC-DISPLAY10=170)
          Aug 18, 2006 11:28:16 AM quickfix.mina.acceptor.AcceptorIoHandler sessionCreated
          INFO: MINA session created: /66.92.9.46:52298
          <20060818-18:28:16, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, incoming> (8=FIX.4.29=8635=A34=149=sender-1501-OMS52=20060818-18:28:15.93056=MRKTC-EXCH98=0108=30141=Y10=164)
          <20060818-18:28:16, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, event> (Accepting session FIX.4.2:MRKTC-EXCH->sender-1501-OMS from /66.92.9.46:52298)
          <20060818-18:28:16, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, event> (Acceptor heartbeat set to 30 seconds)
          <20060818-18:28:25, FIX.4.2:MRKTC-EXCH->MRKTC-DEMO-LOADER-OMS, incoming> (8=FIX.4.29=7535=034=1849=MRKTC-DEMO-LOADER-OMS52=20060818-18:28:25.00156=MRKTC-EXCH10=187)
          <20060818-18:28:26, FIX.4.2:MRKTC-EXCH->MRKTC-DEMO-LOADER-OMS, outgoing> (8=FIX.4.29=7535=034=1949=MRKTC-EXCH52=20060818-18:28:26.21256=MRKTC-DEMO-LOADER-OMS10=193)
          <20060818-18:28:27, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, event> (Disconnecting)

          I think at this point i'm rolling back to 1.0.0

          steve, can you tell me what the patch may be? that way i can patch it locally and test

          Show
          toli Toli Kuznets added a comment - Not sure if this helps, but i have another set of logs and a use case 1. start my exchange simulator 2. connect to it from 2 places (web app that displays the simulator) and an OMS (that sends a few sample order to seed the market). everything is peachy. 3. connect with 2nd OMS (different senderCompID) and it generates the NPE. The simulator knows about that session (it's in simulator's config file). here's the log from sending (OMS) side: <20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Session FIX.4.2:sender-1501-OMS->MRKTC-EXCH schedule is daily, 00:00:00 UTC - 00:00:00 UTC) <20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Session state is not current; resetting FIX.4.2:sender-1501-OMS->MRKTC-EXCH) <20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (No responder, not sending message) 11:28:00,712 DEBUG [main] marketcetera.quickfix.QuickFIXInitiator (QuickFIXInitiator.java:165) - new session created: FIX.4.2:sender-1501-OMS->MRKTC-EXCH <20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Created session: FIX.4.2:sender-1501-OMS->MRKTC-EXCH) Aug 18, 2006 11:28:00 AM quickfix.mina.initiator.InitiatorIoHandler sessionCreated INFO: MINA session created: /10.1.9.112:52297 <20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Disconnecting) and on the simulator side, the request triggers the NPE and subsequently all the other previous 2 connections get dropped: Aug 18, 2006 11:28:01 AM quickfix.mina.acceptor.AcceptorIoHandler sessionCreated INFO: MINA session created: /66.92.9.46:52297 <20060818-18:28:01, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, incoming> (8=FIX.4.29=6835=534=149=sender-1501-OMS52=20060818-18:28:00.92856=MRKTC-EXCH10=083) Exception in thread "QFJ Socket Acceptor ef137d" java.lang.NullPointerException at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:131) at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:70) at quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:86) at java.lang.Thread.run(Thread.java:595) <20060818-18:28:12, FIX.4.2:MRKTC-EXCH->MRKTC-DISPLAY, incoming> (8=FIX.4.29=6635=034=849=MRKTC-DISPLAY52=20060818-18:28:12.78856=MRKTC-EXCH10=141) <20060818-18:28:13, FIX.4.2:MRKTC-EXCH->MRKTC-DISPLAY, outgoing> (8=FIX.4.29=6735=034=1349=MRKTC-EXCH52=20060818-18:28:13.21356=MRKTC-DISPLAY10=170) Aug 18, 2006 11:28:16 AM quickfix.mina.acceptor.AcceptorIoHandler sessionCreated INFO: MINA session created: /66.92.9.46:52298 <20060818-18:28:16, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, incoming> (8=FIX.4.29=8635=A34=149=sender-1501-OMS52=20060818-18:28:15.93056=MRKTC-EXCH98=0108=30141=Y10=164) <20060818-18:28:16, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, event> (Accepting session FIX.4.2:MRKTC-EXCH->sender-1501-OMS from /66.92.9.46:52298) <20060818-18:28:16, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, event> (Acceptor heartbeat set to 30 seconds) <20060818-18:28:25, FIX.4.2:MRKTC-EXCH->MRKTC-DEMO-LOADER-OMS, incoming> (8=FIX.4.29=7535=034=1849=MRKTC-DEMO-LOADER-OMS52=20060818-18:28:25.00156=MRKTC-EXCH10=187) <20060818-18:28:26, FIX.4.2:MRKTC-EXCH->MRKTC-DEMO-LOADER-OMS, outgoing> (8=FIX.4.29=7535=034=1949=MRKTC-EXCH52=20060818-18:28:26.21256=MRKTC-DEMO-LOADER-OMS10=193) <20060818-18:28:27, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, event> (Disconnecting) I think at this point i'm rolling back to 1.0.0 steve, can you tell me what the patch may be? that way i can patch it locally and test
          Hide
          admin Steve Bate added a comment -

          Toli, I will be away from my development computer until Tuesday. From memory, the patch was to quickfix.mina.acceptor.AcceptorIoHandler. Look for a section of code that processes logons before they are forwarded to the session. After that code the message is forwarded to the event handling strategy. The problem is that when a non-logon message is received before any session has been established, the quickfixSession is null. Check for the null session before forwarding it to even handling strategy. I don't have the code here so I can't give you more exact information. Looking at your log, it appears you are receiving a logout immediately after the network connection is established (before the logon). That would trigger the NPE.

          Show
          admin Steve Bate added a comment - Toli, I will be away from my development computer until Tuesday. From memory, the patch was to quickfix.mina.acceptor.AcceptorIoHandler. Look for a section of code that processes logons before they are forwarded to the session. After that code the message is forwarded to the event handling strategy. The problem is that when a non-logon message is received before any session has been established, the quickfixSession is null. Check for the null session before forwarding it to even handling strategy. I don't have the code here so I can't give you more exact information. Looking at your log, it appears you are receiving a logout immediately after the network connection is established (before the logon). That would trigger the NPE.
          Hide
          admin Steve Bate added a comment -

          I've added another task to test the RejectLogon issues that Joerg described to be sure the current changes address that problem.

          Show
          admin Steve Bate added a comment - I've added another task to test the RejectLogon issues that Joerg described to be sure the current changes address that problem.

            People

            • Assignee:
              admin Steve Bate
              Reporter:
              toli Toli Kuznets
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved: