[QFJ-759] "Timed out waiting for heartbeat" after receiving and sending TestRequest message Created: 13/Nov/13  Updated: 04/Nov/16  Resolved: 13/Feb/14

Status: Closed
Project: QuickFIX/J
Component/s: Engine
Affects Version/s: 1.5.0
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Wongsakorn Chantrapornsyl Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: QuickfixJ, testRequest, timeout
Environment:

Microsoft Windows Server


Issue Links:
Duplicate
duplicates QFJ-624 Invalid "Timed out waiting for heartb... Closed
duplicates QFJ-668 Disconnecting: Timed out waiting for ... Closed

 Description   

The FIX application received the disconnection event from the FIX QuickJ after received and sent the TestRequest message so that the FIX application cannot reply the TestRequest message back to the FIX Server. Then the connection was disconnected.

From the FIX application log, it seems that FIX QuickJ received the HeartBeat message at sequence number 31934 (HeartBeat is not diaplayed in the log) but somehow FIX QuickJ does not reply the HeartBeat message back to the FIX Server.

Message log
----------------------------------------------------------------------------------------------
nformation 2013/11/12 10:04:12 Event Initiated logon request
Information 2013/11/12 10:04:11 Event Disconnecting
Warning 2013/11/12 10:04:11 Event Disconnecting: Verifying message failed: quickfix.SessionException: Logon state is not valid for message (MsgType=AE)
Warning 2013/11/12 10:04:11 Event quickfix.SessionException Logon state is not valid for message (MsgType=AE)
Information 2013/11/12 10:04:11 Event Sent test request TEST
Information 2013/11/12 10:04:11 Message <outgoing> (8=FIXT.1.19=15435=A34=3240849=TMSQ03226252=20131112-01:04:11.61456=TR MATCHING57=FXM142=TRFXMJP53776xxx98=0108=4553=DCUA554=*****789=319331137=91407=10010=055)
Information 2013/11/12 10:04:11 Event No responder, not sending message: 8=FIXT.1.19=10435=134=3240749=TMSQ03226252=20131112-01:04:11.11456=TR MATCHING57=FXM142=TRFXMJP53776xxx112=TEST10=167
Information 2013/11/12 10:04:11 Event Disconnecting
Warning 2013/11/12 10:04:11 Event Disconnecting: Timed out waiting for heartbeat
Information 2013/11/12 10:04:11 Message <outgoing> (8=FIXT.1.19=10435=134=3240749=TMSQ03226252=20131112-01:04:11.11456=TR MATCHING57=FXM142=TRFXMJP53776xxx112=TEST10=167)
Information 2013/11/12 10:04:10 Message <incoming> (8=FIXT.1.19=9035=134=319351128=949=TR MATCHING56=TMSQ03226252=20131112-01:04:10.409112=HBTO-3193510=223)
Information 2013/11/12 10:04:04 Message <incoming> (8=FIXT.1.19=65635=AE57=efx50=FXM34=319331128=949=TR MATCHING56=TMSQ03226252=20131112-01:04:04.6411003=267623544487=0573=0856=2325=Y828=54150=F880=29102046217=52d16b65-2909-0000-0000-0014c23f2fa5423=2075=2013111255=aud/usd167=FXSPOT60=20131112-01:04:04.51364=2013111415=AUD120=USD231=100000032=1854=131=0.9351056=9350001116=21117=BTMU TOKYO1118=C1119=131120=21121=TMSQ1122=251121=FTUA1122=21117=CITIBANK LONDON.1118=C1119=561120=11121=CITV1122=25552=154=21630=11631=01632=11633=5761634=USD1158=21164=5781=1782=SETTLEMENT THROUGH CLS783=C784=271164=4781=1782=CLS783=C784=271057=N37=10708811=1:298:1816938=110=117)
----------------------------------------------------------------------------------------------

We need to check with the FIX QuickJ why the FIX QuickJ does not response the HeartBeat message and notify the disconnection event improperly.



 Comments   
Comment by Wongsakorn Chantrapornsyl [ 13/Nov/13 ]

OS: Windows Server 2008 R2 (64-bit) SP1

Comment by Christoph John [ 15/Nov/13 ]

Is this behaviour reproducible? Looks to me like QFJ noticed the heartbeat timeout at the same time as the counterparty and dropped the connection and because of this did not reply back to the TestRequest. As you can see QFJ even did not send out its own TestRequest because the connection dropped before:

Information 2013/11/12 10:04:11 Event No responder, not sending message: 8=FIXT.1.19=10435=134=3240749=TMSQ03226252=20131112-01:04:11.11456=TRMATCHING57=FXM142=TRFXMJP53776xxx112=TEST10=167

Comment by Wongsakorn Chantrapornsyl [ 15/Nov/13 ]

We check the sniffer log. The server sent the heartbeat at 01:04:08 but there is no response from the QFJ. Then the server sent TestRequest at 01:04:10 and the disconnection is initiation by QFJ at 01:04:11.

Comment by Wongsakorn Chantrapornsyl [ 15/Nov/13 ]

It is not reproducible. It occur only once.

However, we worry that it may occur again anytime.

Comment by Christoph John [ 15/Nov/13 ]

Unfortunately, there are no milliseconds in the log. It might be that the testrequest was received at 01:04:10.999 and QFJ disconnected at 01:04.11.000. Since the disconnection is handled by a different thread than the message processing thread this might occur.

If this happened only once I would not really worry about it.

Generated at Sat Nov 23 07:21:19 UTC 2024 using JIRA 7.5.2#75007-sha1:9f5725bb824792b3230a5d8716f0c13e296a3cae.