[QFJ-334] New messages are interleaved with old ones during a counterparty replay request Created: 14/Aug/08  Updated: 15/Nov/12  Resolved: 09/Sep/08

Status: Closed
Project: QuickFIX/J
Component/s: Engine
Affects Version/s: 1.3.1
Fix Version/s: None

Type: Bug Priority: Major
Reporter: JamesM Assignee: Unassigned
Resolution: Won't Fix Votes: 0
Labels: None
Environment:

QuickFIX 1.3.1, Java 1.5.0_12, Linux RHEL 4.4.2



 Description   

We are testing a case where a counterparty disconnects from our acceptor while we are filling their execution requests. When they reconnect, we have a large number of messages for them (i.e. our sender sequence number is higher than they expected), and they initiate a resend request, as expected.

What surprised us was that if we are still sending new messages, those messages are interleaved with the resends. This causes our FIX client to drop the connection because it receives an out-of-sequence message during the replay.

From our log files (we are using a custom MessageStore implementation):

11:58:41,363 INFO Store - Get request from 20315 to 26359
11:58:41,501 INFO Store - set: sequence 26383 :: 8=FIX.4.1 [rest removed]

From the event log:

Thu Aug 14 11:58:11 JST 2008 Disconnecting
Thu Aug 14 11:58:40 JST 2008 Accepting session FIX.4.1:JPN10000XXXX->JPN123456789 from /192.168.1.2:33299
Thu Aug 14 11:58:40 JST 2008 Acceptor heartbeat set to 30 seconds
Thu Aug 14 11:58:40 JST 2008 Refreshing message/state store at logon
Thu Aug 14 11:58:40 JST 2008 Received logon request
Thu Aug 14 11:58:40 JST 2008 Responding to logon request
Thu Aug 14 11:58:41 JST 2008 Received ResendRequest FROM: 20315 TO: 999999
Thu Aug 14 11:58:41 JST 2008 Resending Message: 20315
Thu Aug 14 11:58:41 JST 2008 Resending Message: 20316
Thu Aug 14 11:58:41 JST 2008 Resending Message: 20317
Thu Aug 14 11:58:41 JST 2008 Resending Message: 20318
Thu Aug 14 11:58:41 JST 2008 Resending Message: 20319

However, the sequence of messages the counterparty received was: 20315, 26383, 20316. The counterparty disconnected immediately upon receiving 26383 because it was out of sequence; they were expecting 20316.

Doesn't this seem strange? I know that in the inbound direction (i.e. if we, the acceptor, were the one making the resend request) QuickFIX would queue 26383 and deliver it to us after the resend is complete. Or is this just something that all FIX clients have to cope with: receiving new messages interleaved with a resend, and queuing them until they are ready to deal with them?



 Comments   
Comment by JamesM [ 15/Aug/08 ]

I was able to prevent this problem by adding an additional event to SessionStateListener so that my application is notified when a resend starts, and when it finishes.

Using this, I was able to lock out the 'send' method of our application so that it doesn't attempt to send messages during this time.

There appears to be a race condition on the socket: new messages seem to be interleaved, between socket write() calls, with those sent by the resend. While the window for this race is very small, it is a real possibility if the message rate is high enough while the counterparty's resend request is being processed.
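
For anyone wanting to try something similar, here is a minimal sketch of the gating idea in plain Java. It is not the code from this ticket and not part of QuickFIX/J: the class ResendGate, its callback names (onResendStarted, onResendFinished), and the Consumer that stands in for the real send call are all hypothetical. It also defers new messages rather than blocking the caller, which is a variation on the lock described above.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not a QuickFIX/J class): holds back new outbound
 * messages while a resend is in progress and releases them afterwards.
 */
final class ResendGate {
    // Stand-in for whatever actually hands a message to the session.
    private final Consumer<String> transport;
    // New messages submitted while a resend is running, in submission order.
    private final Queue<String> deferred = new ArrayDeque<>();
    private boolean resendInProgress;

    ResendGate(Consumer<String> transport) {
        this.transport = transport;
    }

    /** Assumed to be called by a custom listener when the resend begins. */
    synchronized void onResendStarted() {
        resendInProgress = true;
    }

    /** Assumed to be called when the resend completes; flushes deferred messages. */
    synchronized void onResendFinished() {
        resendInProgress = false;
        while (!deferred.isEmpty()) {
            transport.accept(deferred.remove());
        }
    }

    /** Application-facing send: passes through, or defers during a resend. */
    synchronized void send(String message) {
        if (resendInProgress) {
            deferred.add(message);
        } else {
            transport.accept(message);
        }
    }
}
```

Because every method is synchronized on the same gate, a new message cannot slip in between the start and the end of a resend, provided the two callbacks really do bracket the resend.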

Comment by Steve Bate [ 15/Aug/08 ]

That's an interesting solution. Like you said, this is something the receiving FIX engine would usually handle. A sequence number that's too high should not cause a disconnect. If that high sequence number arrives during resend processing, then it would usually be queued until the gap is filled. A FIX engine doesn't necessarily require that the messages are received in order, but it will guarantee that it won't deliver the messages to the application out of order.
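
To illustrate the queuing behaviour described above, here is a minimal sketch in plain Java. It is not QuickFIX/J's implementation: the class InboundSequencer and its methods are hypothetical, message bodies are plain strings, and a sequence number that is too low is simply rejected, whereas a real engine has more involved rules for that case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical receiver-side buffer (not a QuickFIX/J class): accepts
 * messages in any order but delivers them strictly by sequence number.
 */
final class InboundSequencer {
    // Messages that arrived ahead of the gap, keyed by sequence number.
    private final TreeMap<Integer, String> pending = new TreeMap<>();
    // Messages handed to the application, in delivery order.
    private final List<String> delivered = new ArrayList<>();
    private int nextExpected;

    InboundSequencer(int firstExpected) {
        this.nextExpected = firstExpected;
    }

    /** Returns false for a sequence number below the expected one, true otherwise. */
    boolean onMessage(int seqNum, String body) {
        if (seqNum < nextExpected) {
            return false; // simplification: too-low messages are just rejected here
        }
        pending.put(seqNum, body);
        // Deliver every message that is now contiguous with what was delivered.
        while (!pending.isEmpty() && pending.firstKey() == nextExpected) {
            delivered.add(pending.pollFirstEntry().getValue());
            nextExpected++;
        }
        return true;
    }

    List<String> delivered() {
        return delivered;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

With the numbers from this ticket, a receiver expecting 20315 that gets 20315, 26383, 20316 would deliver 20315 and 20316 and hold 26383 until 20317 through 26382 have arrived.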
