[QFJ-71] Initiator's application.onLogout() is not called if Acceptor connection is forcibly closed Created: 18/Sep/06  Updated: 12/Oct/06  Resolved: 22/Sep/06

Status: Closed
Project: QuickFIX/J
Component/s: Engine
Affects Version/s: 1.0.3
Fix Version/s: 1.0.4

Type: Bug Priority: Major
Reporter: Rob Gilliam Assignee: Steve Bate
Resolution: Fixed Votes: 0
Labels: None
Environment:

Java2 RE 1.4.2, Windows XP, Eclipse 3.1


Issue Links:
Relates
relates to QFJ-82 Reconnection not working in 1.0.3 be... Closed

 Description   

Set up two QuickFIX/J applications (one as acceptor, one as initiator) and connect them together, then terminate/kill the acceptor application (i.e. as if it had crashed). The initiator application gets the "Disconnecting" event, but its implementation of application.onLogout() is not called.

Diagnostic investigation suggests that Session.disconnect() is not returning from the line

responder.disconnect();

Terminating the initiator application in the same way doesn't have the same problem - the acceptor gets the "Disconnecting" event, followed by a call to its implementation of application.onLogout() as expected.

I tried re-building both applications on QFJ 1.0.2 and doing the same tests; this seems to work OK for both sides, so I guess something's changed between the two releases.



 Comments   
Comment by Christian Braeuner [ 19/Sep/06 ]

Hi, I've come across the same problem.
This seems to have been fixed in the MINA 0.9.5 library. Is there a chance that we can get a QuickFIX/J build that is compiled with the latest MINA release?
Thanks,
Christian

Comment by Steve Bate [ 20/Sep/06 ]

I want to upgrade to 0.9.5 (or even 1.0 if it's released in time) for the QFJ 1.1 release. I'm currently not sure whether there are compatibility issues between 0.9.3 and 0.9.5 that require changes to QFJ. Do either of you have time to investigate the needed changes (if any)? I'd basically like to be sure the unit and acceptance tests pass with the 0.9.5 MINA version.

Comment by Rob Gilliam [ 21/Sep/06 ]

I have an issue with the "upgrade to mina 0.9.5 and it'll go away" approach, which is that it risks addressing the symptom, rather than the cause.

As I said above, this problem is not evident when I run the same applications with QFJ 1.0.2 (also based on mina 0.9.3), so something has changed between the two releases of QuickFIX/J, and that's a concern.

As for doing the testing against mina 0.9.5: unfortunately I really don't have the time myself right now and can't see it happening in the immediate future. My project isn't live yet, and getting the last few features implemented and working is more pressing than handling the unexpected disconnects right now (sorry).

Comment by Steve Bate [ 21/Sep/06 ]

I've found the source of the problem. In the IoSessionResponder.disconnect() method, the MINA close operation is now followed by a join so that disconnect will not return until the asynchronous close operation is complete. Apparently, calling minaSession.close().join() blocks the calling thread and doesn't allow the asynchronous event to be processed. This isn't expected behavior from my perspective but I need to check with the MINA team to see if this is what they'd expect.

The problem still exists with 0.9.5 so I'm not sure if Christian is seeing a different issue or not.

To restore the 1.0.2 behavior you can remove the join() call after the close call in the disconnect method.
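
The blocking described in this comment can be illustrated without MINA. The following is a minimal sketch using only java.util.concurrent, not the actual QuickFIX/J or MINA code: a single-thread executor stands in for the I/O processor thread, a CompletableFuture stands in for the future returned by close(), and a bounded get() stands in for join() so that the demo terminates instead of hanging. The class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class JoinOnIoThreadDemo {

    /**
     * Returns true if waiting for the simulated close event succeeds
     * within the timeout, false if the wait times out.
     */
    public static boolean closeAndJoin(boolean callFromIoThread) throws Exception {
        // Stand-in for the I/O processor thread: the only thread that can
        // deliver the "session closed" event.
        ExecutorService ioThread = Executors.newSingleThreadExecutor();
        try {
            // Stand-in for disconnect(): request an asynchronous close,
            // then wait for it to complete.
            Callable<Boolean> disconnect = () -> {
                CompletableFuture<Void> closeFuture =
                        CompletableFuture.runAsync(() -> { }, ioThread);
                try {
                    // A real join() has no timeout and would wait forever here.
                    closeFuture.get(500, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false;
                }
            };
            if (callFromIoThread) {
                // The waiter occupies the very thread that must run the close.
                return ioThread.submit(disconnect).get();
            }
            return disconnect.call();
        } finally {
            ioThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("join from another thread completes: " + closeAndJoin(false));
        System.out.println("join from the I/O thread completes: " + closeAndJoin(true));
    }
}
```

In this sketch the wait succeeds when disconnect is called from any other thread and times out when it is called from the simulated I/O thread, which is the same shape as the hang seen in Session.disconnect().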

Comment by Steve Bate [ 22/Sep/06 ]

Although the MINA behavior related to close().join() is not considered a bug, it is error prone and can lead to unexpected thread blocking, depending on the thread calling the code. Although we'd prefer a synchronous return from close(), it's not strictly necessary, and given the danger of thread blocking I'm removing the join calls in the SVN branch and the trunk.

Comment by Brad Harvey [ 22/Sep/06 ]

I think the reason it happens on the initiator but not acceptor is because MINA's thread pool filter was removed from the initiator (QFJ-34).

I tried commenting out the ThreadModel.MANUAL line in IoSessionInitiator and it seemed to solve this problem - Responder.disconnect() was called in one of the AnonymousIoService threads instead of the main SocketConnectorIoProcessor thread.

The MINA issue causing the thread leak was resolved in 0.9.5 so potentially the default ThreadModel could be used again if join() is the more desirable behaviour.
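
The effect of a thread pool between the I/O thread and the handler can be sketched in plain java.util.concurrent. This is an analogy under stated assumptions, not MINA's ThreadModel implementation: the single-thread executor plays the I/O processor, the cached pool plays the worker threads, and all names are hypothetical. Because the handler waits on a worker thread, the I/O thread stays free to complete the close.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class HandlerThreadPoolDemo {

    /** Returns true if the simulated close-and-join completes within the timeout. */
    public static boolean disconnectFromWorker() throws Exception {
        ExecutorService ioThread = Executors.newSingleThreadExecutor(); // simulated I/O processor
        ExecutorService workers = Executors.newCachedThreadPool();      // simulated thread pool filter
        try {
            // Handler logic: request an asynchronous close on the I/O thread, then wait.
            Callable<Boolean> handler = () -> {
                CompletableFuture<Void> closeFuture =
                        CompletableFuture.runAsync(() -> { }, ioThread);
                try {
                    closeFuture.get(500, TimeUnit.MILLISECONDS);
                    return true;
                } catch (TimeoutException e) {
                    return false;
                }
            };
            // The I/O thread only dispatches the event and returns immediately;
            // the handler itself runs (and waits) on a worker thread.
            Callable<Future<Boolean>> dispatch = () -> workers.submit(handler);
            Future<Boolean> result = ioThread.submit(dispatch).get();
            return result.get();
        } finally {
            workers.shutdownNow();
            ioThread.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("join from a worker thread completes: " + disconnectFromWorker());
    }
}
```

The trade-off is the one discussed in this thread: a pool makes the blocking wait safe, at the cost of extra threads per connection.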

Comment by Steve Bate [ 22/Sep/06 ]

OK, that's good to know. It makes sense that if a thread pool is used then the close() and join() won't happen in the io processor thread. I added the joins because I thought it was logically more correct behavior. However, allowing the close to be asynchronous wasn't causing any known problems so it's probably ok to leave them out. Thanks again for the information.
