[QFJ-252] Quickfix/J stops sending tag 141 on logon Created: 27/Sep/07 Updated: 09/Jan/08 Resolved: 22/Dec/07 |
|
Status: | Closed |
Project: | QuickFIX/J |
Component/s: | Engine |
Affects Version/s: | 1.2.1 |
Fix Version/s: | None |
Type: | Bug | Priority: | Default |
Reporter: | André Malenfant | Assignee: | Unassigned |
Resolution: | Cannot Reproduce | Votes: | 0 |
Labels: | None |
Description |
After several logon attemps, quickfix/J stops sending tag 141 and starts increasing the sequence number which has for effect to received a msg sequence error when the logon is eventually responded. See the log excerpt below and note that there are numerous logon attemps before and after it: 8=FIX.4.39=8035=A34=149=cgiwl352=20070905-10:12:44.51056=bofaCgiwlTest98=0108=30141=Y10=154 |
Comments |
Comment by Toli Kuznets [ 27/Sep/07 ] |
Andre, |
Comment by André Malenfant [ 27/Sep/07 ] |
ResetOnLogout not specified (N) |
Comment by André Malenfant [ 27/Sep/07 ] |
Sorry, ResetOnDisconnect=Y |
Comment by Steve Bate [ 06/Oct/07 ] |
This won't be enough information for me to effectively diagnose the issue. If you start the Banzai example program with a similar configuration and with no corresponding acceptor program, do you see the same behavior? |
Comment by André Malenfant [ 08/Jan/08 ] |
We can reproduce this problem almost systematically on a production environment and this is a big problem for us as we cannot rely on automatic reconnection if connections are dropped. Most of the time, the flag is not sent for a period of time and then, all of the suden, it is sent. Sometimes it can take serveral minutes and close to an hour for the flag to be sent. I understand that you don't have enough info to troubleshoot but can you give us ideas on what data to collect (I was about to say debugging level to help investigate? |
Comment by Steve Bate [ 08/Jan/08 ] |
André, Well, I did make a suggestion back in October. This doesn't appear to be a reconnect problem. You can't send a logon message if you aren't connected. Apparently, the reconnect is working fine. For the sequence reset, QFJ will only sent send 141=Y when the expected sequence numbers are 1 for both sender and target. Notice that in your original log example the 141=Y stopped being sent when the sequence number was greater than 1. The ResetOnDisconnect flag resets the sequence numbers every time there is a disconnect and this should cause the 141=Y to be sent. Are the logon attempts timing out? That should cause a disconnect, but I'm curious. In any case, it's seems /very/ strange to be doing a sequence reset on disconnect in a production environment. Are you doing anything in your code related to logon messages (for example, explicitly initiated them or modifying them in some way)? |
Comment by André Malenfant [ 08/Jan/08 ] |
Steve, I did not try your suggestion in October because I cannot reproduce the problem in my development environment with my app. I can only reproduce in prod. The only difference beeing that the remote application is different and I have no control over it. Sorry for the confusion I meant when a logout occurs (whether scheduled or because of a connection drop) quickfixj will eventually start to issue logon messages (after the connection is reestablished, of course) and will not send the flag in all cases. I cannot find any pattern that could help understand the conditions for it. In my original log, quickfixj sends the flag 3 times and then, all of the suden, starts incrementing the sequence number and stops sending the flag. Why is this so? Why would it start incrementing the sequence number without having received a logon response from the other party? Nothing happens in between except quickfixj retrying a logon every mintue. The other case we see is the opposite. Quickfixj does not send the sequence reset flag even tough the sequence number has been reset on disconnect. Which obviously creates an issue and the other party denies the login because of the numbers beeing desynchronized. Eventually, after some time, quickfixj resets the number and starts to send the flag. Sometimes, to speed up the process I restart my app until this occurs without having done anything else than restarting. The reason that we want to reset is because the data that is transmitted is transent and thus cannot be resent. There is no point in recovering lost messages and we simply want to reset the sequence numbers when we disconnect. We are not doing anything special with regard to the login. We let quickfixj handle that. The only message we create and send explicitely are application level messages. This is a real life production issue, and it is problematic for us. |
Comment by Steve Bate [ 08/Jan/08 ] |
> In my original log, quickfixj sends the flag 3 times and then, all of the suden, starts incrementing the Like I said, QFJ will only send the sequence resets when the sender/target sequence numbers are 1. The fourth and subsequent messages in your attached log has a sequence number greater than 1. Which message store are you using? Are you using the PersistMessages=N session option? This might eliminate the need to hack the sequence numbers when no messages should ever be resent (like with market data). Can you create a timeline that includes both the logon message events and connect/disconnect events? This might give some clues. |
Comment by André Malenfant [ 08/Jan/08 ] |
Agreed but why did quickfixj started incrementing the sequence between too logon attempt? The sender was never involved in the log above. I will document more clearly an occurance of the problem and get back to you. Here is my config (I scrambled some sensitive data...): [DEFAULT] [SESSION] Thanks |
Comment by André Malenfant [ 09/Jan/08 ] |
Steve, the config I sent previously is not valid, this was a test config I used to try different settings. You should read: SendResetSeqNumFlag=Y We had the issue on 17 apps yesterday night. Here is an example: (timestamps in Europe\London) 1- It starts with a scheduled logout coming from the other party which I changed here to DEF We know have set ResetOnDisconnect=Y since it reduces the number of restarts we need to do to correct the situation to one. 8=FIX.4.39=8335=549=DEF56=ABC34=288152=20080108-22:04:5958=Scheduled logout10=107 ... around an hour later ... 8=FIX.4.39=12135=534=281249=ABC52=20080108-23:16:45.88356=DEF58=MsgSeqNum too low, expecting 2882 but received 13510=009 |
Comment by André Malenfant [ 09/Jan/08 ] |
Here is a working logout/login sequence coming from the same app with the same config: 8=FIX.4.39=8235=549=DEF56=ABC34=37052=20071120-22:04:5958=Scheduled logout10=043 |
Comment by Steve Bate [ 09/Jan/08 ] |
First, I assume that you are using QFJ 1.2.1, given the "affects version". Please tell me if that's not correct. For future comments, please replace the SOH character in the FIX message log records with a delimiter (like '|'). The delimeters are currently being lost and it makes it much more difficult and time consuming to interpret the log entries. The SendResetSeqNumFlag has no effect in QFJ 1.2.1. It remained in the documentation from an earlier version but it should not have been in the 1.2.1 documentation. I didn't mention this earlier because you were also using ResetOnDisconnect and I didn't want to confuse the issue. I recommend you remove SendResetSeqNumFlag from your configuration file to avoid any further confusion. You also did not give me the timeline of logon messages and connection events. The logs of logon messages alone do not provide enough information to diagnose the problem since I don't know when the connection is established and broken. For example, I'm not sure if there is a disconnect between Please keep in mind that free support on QFJ depends on the availability of my spare time. There are organizations that can provide commercial support for QFJ and they are listed on the web site. A few of these are in Europe. |