[QFJ-849] SocketInitiator does not stop properly Created: 10/Jun/15 Updated: 17/Jul/17 Resolved: 22/Mar/16 |
|
Status: | Closed |
Project: | QuickFIX/J |
Component/s: | Engine |
Affects Version/s: | 1.6.0 |
Fix Version/s: | 1.6.2 |
Type: | Bug | Priority: | Critical |
Reporter: | Or Ming Chun | Assignee: | Christoph John |
Resolution: | Fixed | Votes: | 4 |
Labels: | QuickfixJ, session | ||
Environment: |
Tested on Mac & Ubuntu |
Attachments: | jstack-162-SNAPSHOT-20160312.txt quickfixJ_1.6.1_TD.txt socket.initiator.stop(false).txt socket.initiator.stop(true)-original-before-and-after.txt socket.initiator.stop(true)-original.txt socket.initiator.stop(true).txt | ||||||||
Issue Links: |
|
Description |
When creating a simple Quickfixj Initiator app and after the session has successfully been created and Logon, I called initiator.stop() and exit the program. But the JVM hang and cannot exit after ~60s |
Comments |
Comment by Christoph John [ 11/Jun/15 ] |
Two things:
|
Comment by GARY LO [ 11/Jun/15 ] |
Hi Christoph John, 1. Same as stop(false), it will hangs about 60s (with no logging) before exit 2. Both stop(false) & stop(true) thread dump is attach during the hangs period |
Comment by Christoph John [ 11/Jun/15 ] |
Hmm, how do you exit the program? There is no trace of any QFJ-related code in the thread dump? For reference, could you create a thread dump while the program is running? |
Comment by GARY LO [ 11/Jun/15 ] |
Sorry that I provided wrong log that used my customized quickfixJ version that try to shun down all the executor service when I can initiator.stop() Now, fall back to 1.6.0 original version, problem still exist |
Comment by Christoph John [ 11/Jun/15 ] |
Still can't see nothing QFJ-related in the dump, not even user code. Only Java classes. Could you please provide a dump while the process is running? |
Comment by GARY LO [ 11/Jun/15 ] |
I am using "jstack -l [pid]" to get the thread dump, is that what you want? added thread dump of before & after initiator.stop(true) |
Comment by Christoph John [ 12/Jun/15 ] |
Yes, jstack is correct. If your program does exit after 60 seconds it seems that it waits for the thread in the ScheduledThreadPoolExecutor to die. But actually the QFJ Timer thread is a daemon thread and should not keep the JVM from shutting down. Could you try with a newer / older / just another JVM version? I have not encountered this problem yet. |
Comment by Or Ming Chun [ 12/Jun/15 ] |
we are working or Java 1.8 $ java -version is this information helpful? |
Comment by Christoph John [ 12/Jun/15 ] |
How do you exit your program? Could you try with a newer / older / just another JVM version? |
Comment by Or Ming Chun [ 12/Jun/15 ] |
does it matter on what jvm version I use to build the quickfixj jar? I am using Java 1.8 too build quickfixj too |
Comment by Christoph John [ 12/Jun/15 ] |
I don't think that it matters. IMHO it could be dependent on the JVM that is used to run the program. It is only a guess since I never saw this error. But it could be also dependent on other things in your code/environment/etc. But I assume it is a simple program that is run on the command line and not something that is running in Spring, an app server, an OSGi environment or the like??! |
Comment by Rohit [ 01/Sep/15 ] |
We see the same thing after we upgraded to QuickFix 1.6.1. |
Comment by Steve Nardone [ 03/Sep/15 ] |
I was able to resolve the problem by adding this statement to IoSessionInitiator.stop(): synchronized void stop() { // Shutdown executor created by ioConnector (indirectly). This disposes the executor created by the default constructor of NioSocketConnector (mina) instantiated in the IoConnector constructor. That executor (ExecutorService.newCachedThreadPool) has a 60s keepAliveTime. During shutdown that thread is blocked waiting until the next interval to check for more tasks. |
Comment by Martin Vrábel [ 09/Sep/15 ] |
Also spotted this issue both in version 1.6.1 and 1.7.0-SNAPSHOT |
Comment by Rohit [ 25/Sep/15 ] |
I was able to workaround this by avoiding the call to SocketInitiator.Stop() This extends to SocketAcceptors as well if (socketInitiator != null) { - socketInitiator.stop(); + Iterator<SessionID> sessionIds = socketInitiator.getSessions().iterator(); + while (sessionIds.hasNext()) { + SessionID sessionId = sessionIds.next(); + Session.lookupSession(sessionId).logout("user requested"); + } + while (socketInitiator.isLoggedOn()) { + try { + Thread.sleep(500); + } catch (InterruptedException e) { + //Obfuscated + } + } + acceptor = null; } - //Obfuscated + //Obfuscated + //Obfuscated + } //ObfuscatedApplication = null; |
Comment by Christoph John [ 25/Sep/15 ] |
Hi, what about Steve Nardone 's solution a few comments above? Looks cleaner to me. Although I have not tested it yet. |
Comment by Rohit [ 25/Sep/15 ] |
I haven't tried that out yet. I am pulling the QuickFixJ 1.6.1 dependency straight out of the maven repository and not building my own . |
Comment by Christoph John [ 29/Dec/15 ] |
Hi Steve Nardone, Rohit, Martin Vrábel: could you test if that works for you? Thanks, |
Comment by Colin DuPlantis [ 12/Mar/16 ] |
We have encountered this issue, too, in 1.6.1, but only on machines where the "is_production_machine" flag is set to true. We have not been able to reproduce the problem outside of production. The production machine is a 16-core RedHat distro based on 2.6. Our test machines tend to be 3.x based on other distros, and no more than 8 cores, generally 2 or 4. Not sure if this is relevant in the reproduction or not. We built and deployed the latest 1.6.2-SNAPSHOT to our (Marketcetera) repo, which has this fix included, but this did not solve the problem. The problem seems to occur when you're trying to shutdown the initiator while it's busy receiving messages. I've uploaded a relevant jstack output that shows the problem in 1.6.2-SNAPSHOT. |
Comment by Christoph John [ 14/Mar/16 ] |
Hi Colin, thanks for the stack trace. It helps a lot. It seems that this is a similar issue as described in this comment: http://www.quickfixj.org/jira/browse/QFJ-854?focusedCommentId=12737&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12737 Edit: I will try to integrate this into 1.6.2-SNAPSHOT in the next days and let you know. |
Comment by Colin DuPlantis [ 14/Mar/16 ] |
It looks like we were able to work around the problem by manually disconnecting each session in the initiator before stopping it. |
Comment by Christoph John [ 16/Mar/16 ] |
I have made the promised changes. Could you please check if it resolves your problem? |
Comment by Colin DuPlantis [ 16/Mar/16 ] |
The problem wasn't 100% reproducible. With that caveat, we were unable to reproduce the problem at all with the changes. |
Comment by Christoph John [ 22/Mar/16 ] |
Fair enough. Please feel free to report if problem appears again. |