[QFJ-700] QFJ Timer and QFJ message processor try to execute generateLogout() at the same time, resulting in QFJ Timer to stop Created: 16/Aug/12 Updated: 08/Apr/14 Resolved: 10/Oct/12 |
|
Status: | Closed |
Project: | QuickFIX/J |
Component/s: | Engine |
Affects Version/s: | 1.5.2 |
Fix Version/s: | None |
Type: | Bug | Priority: | Critical |
Reporter: | Martijn Stokhof | Assignee: | Christoph John |
Resolution: | Duplicate | Votes: | 0 |
Labels: | QFJ, SessionSchedule, Timer, session | ||
Environment: |
Microsoft Windows [Version 6.1.7601] java version "1.7.0_03" |
Attachments: | thread_dump_QFJ-700-1-after.mdd.txt thread_dump_QFJ-700-2-after.sfx.txt | ||||||||||||
Issue Links: |
|
Description |
Our application runs 24/7. We have started using a weekly SessionSchedule (Sunday 17:02 - Friday 16:59:50). In the out-of-schedule period, several clients try to connect, every 2 seconds, sending Logon messages to our application. As far as I know, the QFJ Timer regularly checks if the Session is already within schedule and if not, resets the Session. In this out-of-schedule period, it happens that both QFJ Timer and QFJ message processor try to execute Session.generateLogout(...) at the same time (within the same millisecond). QFJ timer does that as result from the regular Session-check, QFJ MessageProcessor as a result from the received Logon message. None of the two is able to generate the logout and QFJ Timer stops checking the Session(s) regularly. At the SessionSchedule start, the Session(s) is not valid because it has not been reset since the unsuccesful Session.generateLogout(...). No client can Logon because "Logon rejected: quickfix.RejectLogon: Logon attempt not within session time". This is a critical problem: we have to stop and restart our application to reset all Sessions. To investigate and pinpoint this, I have added several debug-statements to get to this conclusion. I would like to stress that we experience this every out-of-schedule-period where clients try to logon continously and the Session.generateLogout(...) is called within the same millisecond. It is reproducable, but you might have to wait for several hours to see this happening. I cannot oversee the whole issue yet, but it seems that some sort of synchronisation is lacking the generateLogout in this case. I am able to provide log files and help you investigating this. |
Comments |
Comment by Steve Bate [ 17/Aug/12 ] |
Are there any exceptions in the log? |
Comment by Martijn Stokhof [ 17/Aug/12 ] |
Hi Steve, No, I forgot to mention that, no exceptions in any of the log files. |
Comment by Christoph John [ 17/Aug/12 ] |
Hi Martijn, |
Comment by Martijn Stokhof [ 17/Aug/12 ] |
Hi Chris, I will do my best. As said, it takes some time to reproduce the problem, but eventually it will occur. When I have more information I will attach it here. Thank you. Martijn. |
Comment by Martijn Stokhof [ 17/Aug/12 ] |
Thread-dump 1 and 2, both from after the problem has occured. MDD and SFX are acronyms for two of our applications. |
Comment by Martijn Stokhof [ 17/Aug/12 ] |
I see a deadlock in both Thread-dumps. These follow (well, kind of ) what I described in the problem description at a first glance. |
Comment by Christoph John [ 17/Aug/12 ] |
thanks, looks promising. I wonder if this is the same issue as in |
Comment by Christoph John [ 11/Sep/12 ] |
Did you have a chance to reproduce this with a trunk build? |
Comment by Martijn Stokhof [ 11/Sep/12 ] |
Hi Chris, I have not tried. I am not connected to the Quickfix repository for a trunk build. Can you tell me where to download the latest trunk build or send it to me by email, preferrably with source code included so I can add some extra debug statements? I will then use that build for some extra tests. Thank you. Kind regards, Martijn Stokhof. |
Comment by Martijn Stokhof [ 17/Sep/12 ] |
Hi Chris, Good news: this does not happen anymore with quickfix-trunk build. I have used a quickfix-trunk build the last couple of days. The SessionSchedule ended on Friday and restarted on Sunday (48 hours later). As described in the original text of this issue, a client continuously tried to logon. Previously we have seen that this caused the QFJ Timer to stop. The latest test turned out to be successful: the QFJ Timer has not stopped and at the start of the SessionSchedule, the client logged on immediately, as where previously the client was refused to logon because the time was not in the SessionSchedule. We will continue to monitor this, using the Quickfix-trunk build. So far so good! Thank you for your help and suggestions so far. Kind regards, Martijn Stokhof. |
Comment by Christoph John [ 17/Sep/12 ] |
hi Martijn, |