[QFJ-54] Occasional NPE in block method during acceptor start. Created: 17/Aug/06  Updated: 18/Sep/08  Resolved: 04/Sep/06

Status: Closed
Project: QuickFIX/J
Component/s: Networking
Affects Version/s: 1.0.1, 1.0.2
Fix Version/s: 1.0.3

Type: Bug Priority: Default
Reporter: Toli Kuznets Assignee: Steve Bate
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File logFile.txt     Text File threadDump.txt    

 Description   

From Toli Kuznets...

I have a basic app that uses QFJ to communicate. I have a bunch of uni
tests running continiously, and they've been passing with the app
linked to 1.0.0-final of QuickFix/J.

However, as soon as i tried switching to 1.0.2 i get the following error:
Exception in thread "QFJ Socket Acceptor 126f75b" java.lang.NullPointerException
at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:131)
at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:70)
at quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:86)
at java.lang.Thread.run(Unknown Source)

There's no nested exception, and it's a thread created by QFJ and not my app.

Switching the code back to using 1.0.0-final fixes the problem. I
never used 1.0.1 so i can't tell if the problem manifests itself there
as well.

Any ideas on what could be causing this?
This is happening both on Ubuntu Linux and MacOSX. Not even sure how
to debug this since the exception is coming from a QFJ/mina code.

-------------------------------------------------------------------------------------------------------------------------------
From Joerg Thoennes...

Steve, the same happened to me. Today I got with version 1.0.1 this error:

[2006-08-15 17:41:47,472] [INFO ] [quickfix.mina.acceptor.AcceptorIoHandler] (SocketAcceptor-0) MINA
session created: /192.168.1.8:57611
[2006-08-15 17:41:47,573] [DEBUG] [FixLogFactory] (AnonymousIoService-1-2)
FIX.4.2:FixGateway->FixClient: incoming:
8=FIX.4.2|9=62|35=5|34=1|49=FixClient|52=20060815-15:41:47.540|56=FixGateway|10=091|
Exception in thread "QFJ Socket Acceptor 992bae" java.lang.NullPointerException
at
quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:131)
at
quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:70)
at
quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:86)
at java.lang.Thread.run(Thread.java:595)

If I can reproduce this error tomorrow, I will send you more details.



 Comments   
Comment by Steve Bate [ 17/Aug/06 ]

I've found a way that this error could occur. This is the scenario...

1. An acceptor
2. A new connection is accepted but not associated with a session
3. A message, other than a logon, is received on the connection.

It's easy to check for this condition and ignore the message with a warning, but
is this scenario consistent with the context where you were seeing this error?

Comment by Jörg Thönnes [ 17/Aug/06 ]

My scenaria is as follows:

1. An initiator process continuously tries to connect to the acceptor.
2. The acceptor is starting up, has been partially blocked in a debugger or is rejecting logons (throw RejectLogon in fromAdmin()).

Up to now, I did no further analysis, but my guess is that this is somehow related to the RejectLogon. Does this make sense?
I wonder what the message other than Logon could be in my scenario.

Comment by Toli Kuznets [ 18/Aug/06 ]

Steve,

I think what you described is similar to what is happening with me.
I've collected some more information, and i'm attaching a bit more logging.

In my unit test i created an Acceptor which starts up, and then i create an initiator which immediately tries to connect to the acceptor.

The acceptor is basically a thin wrapper around quickix.SocketAcceptor, and the intiator is just a wrapper around quickfix.SocketInitiator (code is at http://trac.marketcetera.org/trac.fcgi/browser/platform/trunk/core/src/main/java/org/marketcetera/quickfix/QuickFIXInitiator.java).

Looking at the log above the exception, i'm now seeing that it looks like Exchange (acceptor) stars up, but then rejects the incoming message:
<20060816-09:01:00, FIX.4.2:test-exchange->test-sender, event> (Session FIX.4.2:test-exchange->test-sender schedule is daily, 00:00:00 UTC - 00:00:00 UTC)
<20060816-09:01:00, FIX.4.2:test-exchange->test-sender, event> (Created session: FIX.4.2:test-exchange->test-sender)
<20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Session FIX.4.2:test-sender->test-exchange schedule is daily, 00:00:00 UTC - 00:00:00 UTC)
<20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Session state is not current; resetting FIX.4.2:test-sender->test-exchange)
<20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (No responder, not sending message)
02:01:01,106 DEBUG [main] marketcetera.exchange.ExchangeMainTest$MyQFInitiator (QuickFIXInitiator.java:165) - new session created: FIX.4.2:test-sender->test-exchange
<20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Created session: FIX.4.2:test-sender->test-exchange)
<20060816-09:01:01, FIX.4.2:test-exchange->test-sender, incoming> (8=FIX.4.29=6735=534=149=test-sender52=20060816-09:01:01.23756=test-exchange10=120)
Exception in thread "QFJ Socket Acceptor 1e97f9f" java.lang.NullPointerException
at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:131)
at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:70)
at quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:86)
at java.lang.Thread.run(Unknown Source)

For example, i don't see these 2 lines when the tests pass:
<20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (Session state is not current; resetting FIX.4.2:test-sender->test-exchange)
<20060816-09:01:01, FIX.4.2:test-sender->test-exchange, event> (No responder, not sending message)

From the stacktrace at time of bug, it appears that my unit test is waiting for the onLogon() event to happen in the Initator.

stack-trace attached, and more logging (above included) is also attached.

But overall, it seems that it could be similar to what Steve is describing.
We are using
ResetOnLogout=Y
ResetOnDisconnect=Y
SendResetSeqNumFlag=Y

if that makes any difference.

Comment by Toli Kuznets [ 18/Aug/06 ]

Thread dump of the unit test that has a race-condition exhibiting the NPE

Comment by Toli Kuznets [ 18/Aug/06 ]

Log file for the unit test exhibiting the NPE

Comment by Toli Kuznets [ 18/Aug/06 ]

Not sure if this helps, but i have another set of logs and a use case

1. start my exchange simulator
2. connect to it from 2 places (web app that displays the simulator) and an OMS (that sends a few sample order to seed the market).
everything is peachy.
3. connect with 2nd OMS (different senderCompID) and it generates the NPE.
The simulator knows about that session (it's in simulator's config file).

here's the log from sending (OMS) side:
<20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Session FIX.4.2:sender-1501-OMS->MRKTC-EXCH schedule is daily, 00:00:00 UTC - 00:00:00 UTC)
<20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Session state is not current; resetting FIX.4.2:sender-1501-OMS->MRKTC-EXCH)
<20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (No responder, not sending message)
11:28:00,712 DEBUG [main] marketcetera.quickfix.QuickFIXInitiator (QuickFIXInitiator.java:165) - new session created: FIX.4.2:sender-1501-OMS->MRKTC-EXCH
<20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Created session: FIX.4.2:sender-1501-OMS->MRKTC-EXCH)
Aug 18, 2006 11:28:00 AM quickfix.mina.initiator.InitiatorIoHandler sessionCreated
INFO: MINA session created: /10.1.9.112:52297
<20060818-18:28:00, FIX.4.2:sender-1501-OMS->MRKTC-EXCH, event> (Disconnecting)

and on the simulator side, the request triggers the NPE and subsequently all the other previous 2 connections get dropped:
Aug 18, 2006 11:28:01 AM quickfix.mina.acceptor.AcceptorIoHandler sessionCreated
INFO: MINA session created: /66.92.9.46:52297
<20060818-18:28:01, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, incoming> (8=FIX.4.29=6835=534=149=sender-1501-OMS52=20060818-18:28:00.92856=MRKTC-EXCH10=083)
Exception in thread "QFJ Socket Acceptor ef137d" java.lang.NullPointerException
at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:131)
at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:70)
at quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:86)
at java.lang.Thread.run(Thread.java:595)
<20060818-18:28:12, FIX.4.2:MRKTC-EXCH->MRKTC-DISPLAY, incoming> (8=FIX.4.29=6635=034=849=MRKTC-DISPLAY52=20060818-18:28:12.78856=MRKTC-EXCH10=141)
<20060818-18:28:13, FIX.4.2:MRKTC-EXCH->MRKTC-DISPLAY, outgoing> (8=FIX.4.29=6735=034=1349=MRKTC-EXCH52=20060818-18:28:13.21356=MRKTC-DISPLAY10=170)
Aug 18, 2006 11:28:16 AM quickfix.mina.acceptor.AcceptorIoHandler sessionCreated
INFO: MINA session created: /66.92.9.46:52298
<20060818-18:28:16, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, incoming> (8=FIX.4.29=8635=A34=149=sender-1501-OMS52=20060818-18:28:15.93056=MRKTC-EXCH98=0108=30141=Y10=164)
<20060818-18:28:16, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, event> (Accepting session FIX.4.2:MRKTC-EXCH->sender-1501-OMS from /66.92.9.46:52298)
<20060818-18:28:16, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, event> (Acceptor heartbeat set to 30 seconds)
<20060818-18:28:25, FIX.4.2:MRKTC-EXCH->MRKTC-DEMO-LOADER-OMS, incoming> (8=FIX.4.29=7535=034=1849=MRKTC-DEMO-LOADER-OMS52=20060818-18:28:25.00156=MRKTC-EXCH10=187)
<20060818-18:28:26, FIX.4.2:MRKTC-EXCH->MRKTC-DEMO-LOADER-OMS, outgoing> (8=FIX.4.29=7535=034=1949=MRKTC-EXCH52=20060818-18:28:26.21256=MRKTC-DEMO-LOADER-OMS10=193)
<20060818-18:28:27, FIX.4.2:MRKTC-EXCH->sender-1501-OMS, event> (Disconnecting)

I think at this point i'm rolling back to 1.0.0

steve, can you tell me what the patch may be? that way i can patch it locally and test

Comment by Steve Bate [ 19/Aug/06 ]

Toli, I will be away from my development computer until Tuesday. From memory, the patch was to quickfix.mina.acceptor.AcceptorIoHandler. Look for a section of code that processes logons before they are forwarded to the session. After that code the message is forwarded to the event handling strategy. The problem is that when a non-logon message is received before any session has been established, the quickfixSession is null. Check for the null session before forwarding it to even handling strategy. I don't have the code here so I can't give you more exact information. Looking at your log, it appears you are receiving a logout immediately after the network connection is established (before the logon). That would trigger the NPE.

Comment by Steve Bate [ 04/Sep/06 ]

I've added another task to test the RejectLogon issues that Joerg described to be sure the current changes address that problem.

Generated at Sat Nov 23 07:40:31 UTC 2024 using JIRA 7.5.2#75007-sha1:9f5725bb824792b3230a5d8716f0c13e296a3cae.