[QFJ-947] Logout message with additional field is not logged Created: 26/Mar/18  Updated: 27/Mar/18

Status: Open
Project: QuickFIX/J
Component/s: Engine
Affects Version/s: 1.6.4
Fix Version/s: None

Type: Bug Priority: Default
Reporter: Nikolai Kulakov Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Linux



 Description   

I came across a situation where a Logout messages that have some additional field are not logged anywhere, so it seems that the connection is breaking before the Logout message receiving. Only from our counterparty I found out that they are sending a Logout messages indeed, and my own network traffic analyzing confirmed it.
Logged messages:

8=FIX.4.2^A9=80^A35=A^A34=1^A49=fxw2^A52=20180326-14:16:20.879^A56=JLQD^A98=0^A108=30^A554=password^A10=215^A
8=FIX.4.2^A9=80^A35=A^A34=2^A49=fxw2^A52=20180326-14:16:23.840^A56=JLQD^A98=0^A108=30^A554=password^A10=207^A
...

Logged events:

<2018-03-26 10:16:19.751 -0400> <Session FIX.4.2:fxw2->JLQD:MARKET_DATA schedule is weekly, SUN 21:30:00-UTC - SUN 21:00:00-UTC (weekly, SUN 17:30:00-EDT - SUN 17:00:00-EDT)>
<2018-03-26 10:16:19.752 -0400> <Created session: FIX.4.2:fxw2->JLQD:MARKET_DATA>
<2018-03-26 10:16:19.833 -0400> <Configured socket addresses for session: [/some_ip:port]>
<2018-03-26 10:16:19.863 -0400> <MINA session created: local=/172.20.30.34:33369, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=/some_ip:port>
<2018-03-26 10:16:20.882 -0400> <Initiated logon request>
<2018-03-26 10:16:20.885 -0400> <Disconnecting: Socket exception (/some_ip:port): java.io.IOException: Connection reset by peer>
<2018-03-26 10:16:22.888 -0400> <MINA session created: local=/172.20.30.34:33372, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=/some_ip:port>
<2018-03-26 10:16:23.842 -0400> <Initiated logon request>
<2018-03-26 10:16:23.845 -0400> <Disconnecting: Socket exception (/some_ip:port): java.io.IOException: Connection reset by peer>
...

I read a pure network traffic and had seen that immediately after out Logon we actually receive the following Logout messages:

8=FIX.4.2^A9=0109^A35=5^A34=88^A49=JLQD^A52=20180326-14:16:20.886^A56=fxw2^A789=51^A58=MsgSeqNum too low, expecting 51 but received 1^A10=075^A
8=FIX.4.2^A9=0109^A35=5^A34=89^A49=JLQD^A52=20180326-14:16:23.846^A56=fxw2^A789=51^A58=MsgSeqNum too low, expecting 51 but received 2^A10=076^A
...

I think, this incoming Logout messages must be logged as well as our outcoming Logon messages, but this does not happen. I set logging level to the minimal possible values, but this did not help: there are no this Logout messages anywhere.
The additional tag 789 was added to FIX dictionary, so I think it should be processed correctly. But at any case, whether this Logout message valid or not, it should be presented in logs.
Thanks!



 Comments   
Comment by Nikolai Kulakov [ 26/Mar/18 ]

Of course, it would be great to see this Logout messages not only in the logs, but and in the quickfix.Application#fromAdmin as it usually happens - it allows us to handle this Logout message and set the expected sequence number (I already did this with some other counterparties).

Comment by Christoph John [ 26/Mar/18 ]

I think the problem is that the session is disconnected at 2018-03-26 10:16:20.885 (see event log) but the Logout message is coming in at 20180326-14:16:20.886 (SendingTime on message in your network log). This means the session is disconnected 1ms before the counterparty sends the Logout message (given that both your and the counterparty's clock is in sync).

This can't be a general problem because this kind of disconnection (message seqnum too low) happens quite often with some counterparties and the messages show up in the log files. I don't know of any occurrences with current QuickFIX/J versions.

So unless we can reproduce this in a unit test we cannot do anything about it.

Sorry, but I can only repeat my statement from the other issues: please please please write to the mailing list before opening possible "bug" tickets. That way we can clarify if there really is a bug before issues are opened which in the end is saving your and our time.

Thanks,
Chris.

Comment by Nikolai Kulakov [ 27/Mar/18 ]

Please, take a look to this network dump of the incoming messages (on the our side):

10:16:19.846538 IP some_ip.port > our_ip.port: Flags [S.], seq 28365822, ack 2573233766, win 32000, options [mss 1380,nop,wscale 8], length 0
        0x0000:  000c 2909 731b 001f 9edf 3ebf 0800 4500  ..).s.....>...E.
        0x0010:  0030 0000 4000 3406 90c8 cfef 1bda ac14  [email protected].........
        0x0020:  1e22 6204 8259 01b0 d3fe 9960 6e66 7012  ."b..Y.....`nfp.
        0x0030:  7d00 8f83 0000 0204 0564 0103 0308       }........d....
10:16:20.885507 IP some_ip.port > our_ip.port: Flags [P.], seq 1:134, ack 103, win 4096, length 133
        0x0000:  000c 2909 731b 001f 9edf 3ebf 0800 4500  ..).s.....>...E.
        0x0010:  00ad 0000 4000 3406 904b cfef 1bda ac14  [email protected]......
        0x0020:  1e22 6204 8259 01b0 d3ff 9960 6ecc 5018  ."b..Y.....`n.P.
        0x0030:  1000 216b 0000 383d 4649 582e 342e 3201  ..!k..8=FIX.4.2.
        0x0040:  393d 3031 3039 0133 353d 3501 3334 3d38  9=0109.35=5.34=8
        0x0050:  3801 3439 3d4a 4c51 4401 3532 3d32 3031  8.49=JLQD.52=201
        0x0060:  3830 3332 362d 3134 3a31 363a 3230 2e38  80326-14:16:20.8
        0x0070:  3836 0135 363d 6678 7732 0137 3839 3d35  86.56=fxw2.789=5
        0x0080:  3101 3538 3d4d 7367 5365 714e 756d 2074  1.58=MsgSeqNum.t
        0x0090:  6f6f 206c 6f77 2c20 6578 7065 6374 696e  oo.low,.expectin
        0x00a0:  6720 3531 2062 7574 2072 6563 6569 7665  g.51.but.receive
        0x00b0:  6420 3101 3130 3d30 3735 01              d.1.10=075.
10:16:20.885543 IP some_ip.port > our_ip.port: Flags [R.], seq 134, ack 103, win 0, length 0
        0x0000:  000c 2909 731b 001f 9edf 3ebf 0800 4500  ..).s.....>...E.
        0x0010:  0028 0000 4000 3406 90d0 cfef 1bda ac14  .([email protected].........
        0x0020:  1e22 6204 8259 01b0 d484 9960 6ecc 5014  ."b..Y.....`n.P.
        0x0030:  0000 3711 0000 0000 0000 0000            ..7.........
10:16:22.887575 IP some_ip.port > our_ip.33372: Flags [S.], seq 4175399090, ack 3053759955, win 32000, options [mss 1380,nop,wscale 8], length 0
        0x0000:  000c 2909 731b 001f 9edf 3ebf 0800 4500  ..).s.....>...E.
        0x0010:  0030 0000 4000 3406 90c8 cfef 1bda ac14  [email protected].........
        0x0020:  1e22 6204 825c f8df 88b2 b604 add3 7012  ."b..\........p.
        0x0030:  7d00 878b 0000 0204 0564 0103 0308       }........d....
10:16:23.845393 IP some_ip.port > our_ip.33372: Flags [P.], seq 1:134, ack 103, win 4096, length 133
        0x0000:  000c 2909 731b 001f 9edf 3ebf 0800 4500  ..).s.....>...E.
        0x0010:  00ad 0000 4000 3406 904b cfef 1bda ac14  [email protected]......
        0x0020:  1e22 6204 825c f8df 88b3 b604 ae39 5018  ."b..\.......9P.
        0x0030:  1000 1b6f 0000 383d 4649 582e 342e 3201  ...o..8=FIX.4.2.
        0x0040:  393d 3031 3039 0133 353d 3501 3334 3d38  9=0109.35=5.34=8
        0x0050:  3901 3439 3d4a 4c51 4401 3532 3d32 3031  9.49=JLQD.52=201
        0x0060:  3830 3332 362d 3134 3a31 363a 3233 2e38  80326-14:16:23.8
        0x0070:  3436 0135 363d 6678 7732 0137 3839 3d35  46.56=fxw2.789=5
        0x0080:  3101 3538 3d4d 7367 5365 714e 756d 2074  1.58=MsgSeqNum.t
        0x0090:  6f6f 206c 6f77 2c20 6578 7065 6374 696e  oo.low,.expectin
        0x00a0:  6720 3531 2062 7574 2072 6563 6569 7665  g.51.but.receive
        0x00b0:  6420 3201 3130 3d30 3736 01              d.2.10=076.
10:16:23.845427 IP some_ip.port > our_ip.33372: Flags [R.], seq 134, ack 103, win 0, length 0
        0x0000:  000c 2909 731b 001f 9edf 3ebf 0800 4500  ..).s.....>...E.
        0x0010:  0028 0000 4000 3406 90d0 cfef 1bda ac14  .([email protected].........
        0x0020:  1e22 6204 825c f8df 8938 b604 ae39 5014  ."b..\...8...9P.
        0x0030:  0000 2f19 0000 0000 0000 0000            ../.........
...

As I see, there are constantly repeating 3 records:

  1. Probably, a connect message.
  2. Then, about one second later, the Logout message.
  3. Then, immediately, one more message, probably disconnect.

I think that the time between our and the counterparty sides is not so good synchronized to rely on this (in fact, we have some problems with time synchronization on the used machine).
I guess, the problem may be in the too little delay between receiving the Logout and the disconnect messages. Also, it can be in the using of non-standard tag 789. Of course, it can be in something else, but it exists - that is the reason because I wrote this in the issues tracker, not to bother you, of course.

Thanks and my apologize if I'm using not the best procedure of issues notifying,
Nikolai.

Comment by Christoph John [ 27/Mar/18 ]

Hi Nikolai,
there is no need to apologize. I just wanted to stress the fact that it is hard to fix issues that cannot be reproduced reliably. We would need some kind of reproducable unit test to look deeper into this issue. I must admit that I never heard of this problem before but of course this does not mean that it is not possible to happen. May be related to a couple of things.

Thanks,
Chris.

Generated at Sat Nov 23 01:03:22 UTC 2024 using JIRA 7.5.2#75007-sha1:9f5725bb824792b3230a5d8716f0c13e296a3cae.