[QFJ-668] Disconnecting: Timed out waiting for heartbeat, after exchanging the Test messages. Created: 15/Feb/12  Updated: 04/Nov/16  Resolved: 09/Aug/12

Status: Closed
Project: QuickFIX/J
Component/s: Engine
Affects Version/s: 1.5.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Abcedf12345 Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: QuickfixJ
Environment:

Linux and Windows


Issue Links:
Duplicate
duplicates QFJ-624 Invalid "Timed out waiting for heartb... Closed
is duplicated by QFJ-759 "Timed out waiting for heartbeat" aft... Closed

 Description   

Looks like many blogs are around this, but no resolution. Here is log.

<20120214-20:00:09, FIX.4.4:COMP1_OUT->COMP2_IN, outgoing> (8=FIX.4.4^A9=84^A35=1^A34=1154^A49=COMP1_OUT^A52=20120214-20:00:09.303^A56=COMP2_IN^A112=TEST^A10=098^A)
<20120214-20:00:09, FIX.4.4:COMP1_OUT->COMP2_IN, event> (Sent test request TEST)
<20120214-20:00:09, FIX.4.4:COMP1_OUT->COMP2_IN, incoming> (8=FIX.4.4^A9=83^A35=0^A34=998^A49=COMP2_IN^A52=20120214-20:00:08.908^A56=COMP1_OUT^A112=TEST^A10=073^A)
<20120214-20:00:10, FIX.4.4:COMP1_IN->COMP2_OUT, outgoing> (8=FIX.4.4^A9=74^A35=0^A34=942^A49=COMP1_IN^A52=20120214-20:00:10.303^A56=COMP2_OUT^A10=026^A)
<20120214-20:00:14, FIX.4.4:COMP1_IN->COMP2_OUT, incoming> (8=FIX.4.4^A9=74^A35=0^A34=976^A49=COMP2_OUT^A52=20120214-20:00:13.776^A56=COMP1_IN^A10=050^A)
<20120214-20:00:29, FIX.4.4:COMP1_IN->COMP2_OUT, outgoing> (8=FIX.4.4^A9=83^A35=1^A34=943^A49=COMP1_IN^A52=20120214-20:00:29.302^A56=COMP2_OUT^A112=TEST^A10=055^A)
<20120214-20:00:29, FIX.4.4:COMP1_IN->COMP2_OUT, event> (Sent test request TEST)
<20120214-20:00:29, FIX.4.4:COMP1_IN->COMP2_OUT, incoming> (8=FIX.4.4^A9=83^A35=0^A34=977^A49=COMP2_OUT^A52=20120214-20:00:28.906^A56=COMP1_IN^A112=TEST^A10=070^A)
<20120214-20:00:36, FIX.4.4:COMP1_OUT->COMP2_IN, error> (Disconnecting: Timed out waiting for heartbeat)



 Comments   
Comment by Christoph John [ 15/Feb/12 ]

Can you reproduce this issue reliably? If yes, also with the current version (1.5.2)? Can you supply a test case?

Moreover, I noticed there is already QFJ-624 with a similar issue so I will close this ticket as a duplicate.

Comment by Abcedf12345 [ 15/Feb/12 ]

Looks like QFJ-624 related to version 1.3.0.
However I don't think it is reproduciable, happens occasionally, following the exchange of "Test" messageses as depcited in log.

Comment by Christoph John [ 15/Feb/12 ]

Yes, QFJ-624 relates to version 1.3.0. But since the issue is still open I would guess that the problem was inherited since then.
Are you able to provide a little more context (e.g. the full log file)? What is your QFJ configuration? Does the problem go away after the Logout?

Comment by Abcedf12345 [ 15/Feb/12 ]

Correct, problem goes away in subsequent login.

QFJ configuration is:
[default]
ConnectionType=acceptor
StartTime=00:00:00
EndTime=00:00:00

[session]
BeginString=FIX.4.4
SenderCompID=COMP1_IN
TargetCompID=COMP2_OUT
SocketAcceptPort=XXXXXX
RefreshOnLogon=Y
UseDataDictionary=Y
DataDictionary=/apps/fix/fg/fix44.xml
FileStorePath=/apps/fix/msg_store/COMP1_COMP2

[session]
BeginString=FIX.4.4
SenderCompID=COMP2_IN
TargetCompID=COMP1_OUT
SocketAcceptPort=XXXXXX
RefreshOnLogon=Y
UseDataDictionary=Y
DataDictionary=/apps/fix/fg/fix44.xml
FileStorePath=/apps/fix/msg_store/COMP2_COMP1

Comment by Christoph John [ 16/Feb/12 ]

OK, again. Can you test if the problem also occurs with the most current QF/J version 1.5.2? Can you provide the full log file? Or at least all non-business messages?

Comment by Abcedf12345 [ 16/Feb/12 ]

Log trail after the above one:

===========================
15:00:36.715 [SocketAcceptorIoProcessor-0.0] INFO q.mina.acceptor.AcceptorIoHandler - MINA session created: /10.163.40.238:53725
<20120214-20:00:40, FIX.4.4:COMP1_OUT->COMP2_IN, incoming> (8=FIX.4.4^A9=86^A35=A^A34=999^A49=COMP2_IN^A52=20120214-20:00:40.231^A56=COMP1_OUT^A98=0^A108=30^A10=086^A)
<20120214-20:00:40, FIX.4.4:COMP1_OUT->COMP2_IN, event> (Accepting session FIX.4.4:COMP1_OUT->COMP2_IN from /10.163.40.238:53725)
<20120214-20:00:40, FIX.4.4:COMP1_OUT->COMP2_IN, event> (Acceptor heartbeat set to 30 seconds)
<20120214-20:00:41, FIX.4.4:COMP1_OUT->COMP2_IN, error> (quickfix.SessionException Logon state is not valid for message (MsgType=0))
<20120214-20:00:41, FIX.4.4:COMP1_OUT->COMP2_IN, error> (Disconnecting: Verifying message failed: quickfix.SessionException: Logon state is not valid for message (MsgType=0))
================

Comment by Abcedf12345 [ 17/Feb/12 ]

Unfortunately it is not reproduciable to try with new version

Comment by another day [ 21/Aug/12 ]

Hi, I am also looking for a solution on this. I consider using ThreadedSocketInitiator instead of SocketInitiator.
Any opinions on this ?

Any other ideas ? I have no clue how to reproduce this behaviour, it occurs with the latest version of QuickFix/J, albeit rarely.

Thanks!

Comment by Enrico Petracchi [ 07/May/13 ]

I have the same issue on one environment with about 10 fix session.
A few days ago all sessions detected an heartbeat missing condition while they were receiving about 10 messages/sec.
I added some logs to the code related to heartbeat missing detection and I found all sessions detecting a lastReceivedTime= 90 seconds plus some milliseconds. All log timestamps are correct (no drift observed); it seems the lastReceivedTime variable has been modified adding 90 seconds or the time read from System.currentTimeMillis was wrong. But the strange things is it happened on all session
Still investigating....

Comment by Andy Tai [ 04/Nov/16 ]

May I know if you have found any solution to this issue? We just encountered it again.
Why did QFJ send a test request in the first place?
It seemed that QFJ will not be able to accept reconnection request, after test request is sent out.
If QFJ cannot handle test request properly, can we disable it or delay it?

Thanks.

Comment by Christoph John [ 04/Nov/16 ]

Hi Andy Tai,
which version of QFJ are you using? How many FIX sessions are handled in your application? How often does this occur? Do you have log files that you could attach?

You could set DisableHeartBeatCheck=Y to completely disable heartbeat/test request checking. You could also set TestRequestDelayMultiplier=1 (default is 0.5) which means that the TestRequest will be sent some seconds later.

Chris.

Generated at Sat Nov 23 11:23:44 UTC 2024 using JIRA 7.5.2#75007-sha1:9f5725bb824792b3230a5d8716f0c13e296a3cae.