[QFJ-54] Occasional NPE in block method during acceptor start. Created: 17/Aug/06 Updated: 18/Sep/08 Resolved: 04/Sep/06 |
|
Status: | Closed |
Project: | QuickFIX/J |
Component/s: | Networking |
Affects Version/s: | 1.0.1, 1.0.2 |
Fix Version/s: | 1.0.3 |
Type: | Bug | Priority: | Default |
Reporter: | Toli Kuznets | Assignee: | Steve Bate |
Resolution: | Fixed | Votes: | 0 |
Labels: | None |
Attachments: | logFile.txt threadDump.txt |
Description |
From Toli Kuznets... I have a basic app that uses QFJ to communicate. I have a bunch of uni However, as soon as I tried switching to 1.0.2 I get the following error: There's no nested exception, and it's a thread created by QFJ and not my app. Switching the code back to using 1.0.0-final fixes the problem. I Any ideas on what could be causing this? ------------------------------------------------------------------------------------------------------------------------------- Steve, the same happened to me. Today I got this error with version 1.0.1: [2006-08-15 17:41:47,472] [INFO ] [quickfix.mina.acceptor.AcceptorIoHandler] (SocketAcceptor-0) MINA If I can reproduce this error tomorrow, I will send you more details. |
Comments |
Comment by Steve Bate [ 17/Aug/06 ] |
I've found a way that this error could occur. This is the scenario... 1. An acceptor It's easy to check for this condition and ignore the message with a warning, but |
Comment by Jörg Thönnes [ 17/Aug/06 ] |
My scenario is as follows: 1. An initiator process continuously tries to connect to the acceptor. Up to now, I did no further analysis, but my guess is that this is somehow related to the RejectLogon. Does this make sense? |
Comment by Toli Kuznets [ 18/Aug/06 ] |
Steve, I think what you described is similar to what is happening to me. In my unit test I create an Acceptor which starts up, and then I create an initiator which immediately tries to connect to the acceptor. The acceptor is basically a thin wrapper around quickfix.SocketAcceptor, and the initiator is just a wrapper around quickfix.SocketInitiator (code is at http://trac.marketcetera.org/trac.fcgi/browser/platform/trunk/core/src/main/java/org/marketcetera/quickfix/QuickFIXInitiator.java). Looking at the log above the exception, I now see that the Exchange (acceptor) starts up, but then rejects the incoming message: For example, I don't see these 2 lines when the tests pass: From the stack trace at the time of the bug, it appears that my unit test is waiting for the onLogon() event to happen in the Initiator. The stack trace is attached, and more logging (including the above) is also attached. But overall, it seems that it could be similar to what Steve is describing. if that makes any difference. |
Comment by Toli Kuznets [ 18/Aug/06 ] |
Thread dump of the unit test that has a race-condition exhibiting the NPE |
Comment by Toli Kuznets [ 18/Aug/06 ] |
Log file for the unit test exhibiting the NPE |
Comment by Toli Kuznets [ 18/Aug/06 ] |
Not sure if this helps, but I have another set of logs and a use case 1. start my exchange simulator here's the log from the sending (OMS) side: and on the simulator side, the request triggers the NPE and subsequently the 2 earlier connections also get dropped: I think at this point I'm rolling back to 1.0.0. Steve, can you tell me what the patch may be? That way I can patch it locally and test |
Comment by Steve Bate [ 19/Aug/06 ] |
Toli, I will be away from my development computer until Tuesday. From memory, the patch was to quickfix.mina.acceptor.AcceptorIoHandler. Look for a section of code that processes logons before they are forwarded to the session. After that code, the message is forwarded to the event handling strategy. The problem is that when a non-logon message is received before any session has been established, the quickfixSession is null. Check for the null session before forwarding the message to the event handling strategy. I don't have the code here so I can't give you more exact information. Looking at your log, it appears you are receiving a logout immediately after the network connection is established (before the logon). That would trigger the NPE. |
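The null check described in this comment can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual QuickFIX/J patch: `NullSessionGuard`, `Session`, `EventHandlingStrategy`, and `processMessage` are hypothetical stand-ins invented for this example, and the real code in quickfix.mina.acceptor.AcceptorIoHandler will differ.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the acceptor-side handler; not QuickFIX/J code.
public class NullSessionGuard {

    /** Hypothetical stand-in for an established FIX session. */
    static final class Session {
        final String id;
        Session(String id) { this.id = id; }
    }

    /** Hypothetical stand-in for the event handling strategy. */
    static final class EventHandlingStrategy {
        final List<String> forwarded = new ArrayList<>();
        void onMessage(Session session, String message) {
            // Dereferences the session, so a null session would throw an NPE here.
            forwarded.add(session.id + ":" + message);
        }
    }

    final List<String> warnings = new ArrayList<>();
    final EventHandlingStrategy strategy = new EventHandlingStrategy();

    /**
     * Forwards a message only when a session exists.
     * Returns true if the message was forwarded, false if it was ignored.
     */
    boolean processMessage(Session session, String message) {
        if (session == null) {
            // A non-logon message arrived before any session was established:
            // record a warning and drop it instead of forwarding it.
            warnings.add("Ignoring message received before logon: " + message);
            return false;
        }
        strategy.onMessage(session, message);
        return true;
    }

    public static void main(String[] args) {
        NullSessionGuard handler = new NullSessionGuard();
        // A logout (35=5) arriving before logon is ignored with a warning.
        System.out.println(handler.processMessage(null, "35=5"));
        // Once a session exists, messages are forwarded as usual.
        System.out.println(handler.processMessage(new Session("S1"), "35=0"));
    }
}
```

In this sketch the early message is dropped with a warning, so the thread carries on and other connections are unaffected. Whether the real fix logs, disconnects, or does something else at that point is not stated in this ticket.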
Comment by Steve Bate [ 04/Sep/06 ] |
I've added another task to test the RejectLogon issues that Joerg described to be sure the current changes address that problem. |