[QFJ-916] Reconnection feature does not work under some circumstances Created: 18/Feb/17  Updated: 17/Oct/18  Resolved: 17/Oct/18

Status: Closed
Project: QuickFIX/J
Component/s: Engine, Networking
Affects Version/s: 1.6.3
Fix Version/s: 2.1.0

Type: Bug Priority: Default
Reporter: Arsentii Nerushev Assignee: Unassigned
Resolution: Fixed Votes: 0
Labels: QuickfixJ, Reconnect
Environment:

Java 1.8.0u66 x64, Windows 8.1, no network proxy


Issue Links:
Relates
relates to QFJ-942 Connectors sometimes hanging in call ... Closed

 Description   

Had an application running from morning but on the evening checkup I observed strange behavior of the reconnection feature. I believe there was some connectivity issue, the application missed a heartbeat so the session had been disconnected and logged out. I don't really understand what happened next, but it seems the connection itself (with the remote host) was restored but the messages weren't going anywhere as the session's state wasn't 'logged on' at the moment, and the Engine wasn't trying to log the session on again.
I tried to reproduce the issue by manually closing the connection slightly before the heartbeat check, but the Engine had actually reset the connection after I opened it again seeing the disconnection message.

I've got some logs from both cases (see below). reconnectionInterval = 30.
I've noticed that in the second ("OK") case there is a line

"d.q.m.i.IoSessionInitiator [QFJ Timer] - [...] - reset IoConnector"

whereas there's none such in the first case, but a couple of hours of digging the quickfix/j's sources haven't brought me anywhere. The line appears when the reconnectionTask (which runs at a schedule with 1 second interval) of quickfix.mina.initiator.IoSessionInitiator is at the 'resetIoConnector()' method and the condition

ioSession != null && Boolean.TRUE.equals(ioSession.getAttribute("QFJ_RESET_IO_CONNECTOR"))

is true by the moment. So, either ioSession is null or its "reset" attribute is not set to true. The 'disconnect' method of quickfix.Session has a line 'responder.disconnect();' which leads us to quickfix.mina.IoSessionResponder's lines:

ioSession.closeOnFlush();
ioSession.setAttribute("QFJ_RESET_IO_CONNECTOR", Boolean.TRUE);

So, the attribute is definitely set, but as for the nullifying the ioSession's reference I can't really say anything. All I know is that the ioSession's reference is being read & written in quickfix.mina.initiator.IoSessionInitiator :: resetIoConnector() method at these two possible sections:

private void pollConnectFuture() {
            try {
                connectFuture.awaitUninterruptibly(CONNECT_POLL_TIMEOUT);
                if (connectFuture.getSession() != null) { // probably true
                    ioSession = connectFuture.getSession();
                    connectionFailureCount = 0;
                    lastConnectTime = System.currentTimeMillis();
                    connectFuture = null;
                } else {
                    fixSession.getLog().onEvent(
                            "Pending connection not established after "
                                    + (System.currentTimeMillis() - lastReconnectAttemptTime)
                                    + " ms.");
                }
            } catch (Throwable e) {
                handleConnectException(e);
            }
        }
}

and

        private void resetIoConnector() {
            if (ioSession != null && Boolean.TRUE.equals(ioSession.getAttribute("QFJ_RESET_IO_CONNECTOR"))) {
                try {
                    setupIoConnector();
                    log.info("[" + fixSession.getSessionID() + "] - reset IoConnector");
                    if (connectFuture != null) {
                        connectFuture.cancel();
                    }
                    connectFuture = null;
                    ioSession = null;
                } catch (Throwable e) {
                    log.error("[" + fixSession.getSessionID() + "] - Exception during resetIoConnector call", e);
                }
            }
        }

The latter one is definitely not the case as this method have never been invoked in the issue (we don't see the

"d.q.m.i.IoSessionInitiator [QFJ Timer] - [...] - reset IoConnector"

line in the log).

It's also unlikely that ioSession is set to null in the former section, as the value comes from connectFuture.getSession() which, if being null, would lead to logging of a "Pending connection not established" line. I thought maybe connectFuture.getSession() at some circumstances returns another IoSession object reference, for which there's no "QFJ_RESET_IO_CONNECTOR" attribute set, but it would be too weird for MINA. So, it's likely not the case, and the only reason the resetIoConnector()'s check fails is that the reconnectionTask's run() method is no more being invoked by ScheduledExecutor.

ioSession have also never been closed as it would fire the public void sessionClosed(IoSession ioSession) callback in quickfix.mina.AbstractIoHandler, which in our case it wouldn't, even though quickfix.Session :: disconnect -> quickfix.mina.IoSessionResponder :: disconnect ->
ioSession.closeOnFlush() has been invoked.

I know that a bug is hard to fix when you can't reproduce the issue, so I've instrumented the quickfix/j source code in some places, and if this issue comes again, there will be more information available.

Here's a chunk of the log from the issue:

18:05:08.608 INFO  OUTGOING [QFJ Timer] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=60 35=0 34=987 49=mySenderCompID 52=20170216-15:04:44.955 56=FG 10=245 
18:05:08.608 INFO  INCOMING [NioProcessor-22] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=61 35=0 49=FG 56=mySenderCompID 34=1031 52=20170216-15:05:02.450 10=004 
18:05:48.648 INFO  OUTGOING [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=82 35=0 34=988 49=mySenderCompID 52=20170216-15:05:42.889 56=FG 112=20170216-15:03:21 10=049 
18:06:42.007 ERROR quickfixj.errorEvent [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Disconnecting: Timed out waiting for heartbeat
18:07:02.056 INFO  OUTGOING [QFJ Timer] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=69 35=1 34=989 49=mySenderCompID 52=20170216-15:06:18.777 56=FG 112=TEST 10=024 
18:07:07.852 INFO  c.r.c.c.f.m.s.SessionManager [QFJ Message Processor]: the session has been logged out.
18:07:07.852 INFO  quickfixj.event [QFJ Timer] - FIX.4.4:mySenderCompID->FG: No responder, not sending message: 8=FIX.4.4 9=69 35=1 34=989 49=mySenderCompID 52=20170216-15:06:18.777 56=FG 112=TEST 10=024 
18:07:24.587 INFO  quickfixj.event [NioProcessor-22] - FIX.4.4:mySenderCompID->FG: Already disconnected: Socket exception (/there-was-exchange-ip:port): java.io.IOException: Software caused connection abort
18:07:36.418 INFO  quickfixj.event [QFJ Timer] - FIX.4.4:mySenderCompID->FG: Sent test request TEST
18:10:45.383 ERROR quickfixj.errorEvent [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: quickfix.SessionException Logon state is not valid for message (MsgType=0)
18:11:53.890 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Already disconnected: Verifying message failed: quickfix.SessionException: Logon state is not valid for message (MsgType=0)
18:15:08.548 ERROR quickfixj.errorEvent [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: quickfix.SessionException Logon state is not valid for message (MsgType=1)
18:20:24.634 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Already disconnected: Verifying message failed: quickfix.SessionException: Logon state is not valid for message (MsgType=1)
18:27:38.224 ERROR quickfixj.errorEvent [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: quickfix.SessionException Logon state is not valid for message (MsgType=0)
18:29:40.972 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Already disconnected: Verifying message failed: quickfix.SessionException: Logon state is not valid for message (MsgType=0)

Here's a chunk of the log from the case when I manually switched off the connection:

00:57:35.829 INFO  q.DefaultSessionSchedule [main] - [FIX.4.4:mySenderCompID->FG] daily, 07:00:00-UTC - 07:00:00-UTC
00:57:35.858 INFO  quickfixj.event [main] - FIX.4.4:mySenderCompID->FG: Session FIX.4.4:mySenderCompID->FG schedule is daily, 07:00:00-UTC - 07:00:00-UTC
00:57:35.858 INFO  quickfixj.event [main] - FIX.4.4:mySenderCompID->FG: Created session: FIX.4.4:mySenderCompID->FG
00:57:35.858 INFO  c.r.c.c.f.m.s.SessionManager [main]: the session has been created.
00:57:35.865 INFO  q.mina.NetworkingOptions [main] - Socket option: SocketTcpNoDelay=true
00:57:35.870 INFO  q.mina.NetworkingOptions [main] - Socket option: SocketSynchronousWrites=false
00:57:35.870 INFO  q.mina.NetworkingOptions [main] - Socket option: SocketSynchronousWriteTimeout=30000
00:57:35.939 INFO  d.q.m.i.IoSessionInitiator [main] - [FIX.4.4:mySenderCompID->FG] [/there-was-exchange-ip:port]
00:57:35.940 INFO  quickfix.SocketInitiator [main] - SessionTimer started
00:57:35.943 INFO  quickfix.SocketInitiator [QFJ Message Processor] - Started QFJ Message Processor
00:57:35.988 INFO  q.m.i.InitiatorIoHandler [NioProcessor-2] - MINA session created for FIX.4.4:mySenderCompID->FG: local=/there-was-my-ip:60450, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=/there-was-exchange-ip:port
00:57:36.975 INFO  OUTGOING [QFJ Timer] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=71 35=A 34=35 49=mySenderCompID 52=20170218-21:57:36.965 56=FG 98=0 108=30 10=234 
00:57:36.980 INFO  quickfixj.event [QFJ Timer] - FIX.4.4:mySenderCompID->FG: Initiated logon request
00:57:37.003 DEBUG o.a.m.f.c.ProtocolCodecFilter [NioProcessor-2] - Processing a MESSAGE_RECEIVED for session 1
00:57:37.006 INFO  INCOMING [NioProcessor-2] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=71 35=A 49=FG 56=mySenderCompID 34=35 52=20170218-21:58:55.258 98=0 108=30 10=231 
00:57:37.007 INFO  INCOMING [NioProcessor-2] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=69 35=2 49=FG 56=mySenderCompID 34=36 52=20170218-21:58:55.258 7=33 16=0 10=119 
00:57:37.016 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Received logon
00:57:37.016 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: MsgSeqNum too high, expecting 34 but received 35: 8=FIX.4.4 9=71 35=A 34=35 49=FG 52=20170218-21:58:55.258 56=mySenderCompID 98=0 108=30 10=231 
00:57:37.016 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Enqueued at pos 35: 8=FIX.4.4 9=71 35=A 34=35 49=FG 52=20170218-21:58:55.258 56=mySenderCompID 98=0 108=30 10=231 
00:57:37.017 INFO  OUTGOING [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=69 35=2 34=36 49=mySenderCompID 52=20170218-21:57:37.017 56=FG 7=34 16=0 10=112 
00:57:37.017 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Sent ResendRequest FROM: 34 TO: 34
00:57:37.017 INFO  c.r.c.c.f.m.s.SessionManager [QFJ Message Processor]: the session has been logged on.
00:57:37.025 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Enqueued at pos 36: 8=FIX.4.4 9=69 35=2 34=36 49=FG 52=20170218-21:58:55.258 56=mySenderCompID 7=33 16=0 10=119 
00:57:37.025 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Received ResendRequest FROM: 33 TO: infinity
00:57:37.028 INFO  OUTGOING [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=102 35=4 34=33 43=Y 49=mySenderCompID 52=20170218-21:57:37.027 56=FG 122=20170218-21:57:37.027 36=37 123=Y 10=040 
00:57:37.028 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Sent SequenceReset TO: 37
00:57:37.033 DEBUG o.a.m.f.c.ProtocolCodecFilter [NioProcessor-2] - Processing a MESSAGE_RECEIVED for session 1
00:57:37.033 INFO  INCOMING [NioProcessor-2] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=98 35=4 49=FG 56=mySenderCompID 34=34 43=Y 52=20170218-21:58:55.294 122=20170218-21:58:55 123=Y 36=37 10=072 
00:57:37.033 INFO  INCOMING [NioProcessor-2] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=69 35=2 49=FG 56=mySenderCompID 34=37 52=20170218-21:58:55.294 7=33 16=0 10=120 
00:57:37.034 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: ResendRequest for messages FROM 34 TO 34 has been satisfied.
00:57:37.034 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Received SequenceReset FROM: 34 TO: 37
00:57:37.034 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Received ResendRequest FROM: 33 TO: infinity
00:57:37.035 INFO  OUTGOING [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=102 35=4 34=33 43=Y 49=mySenderCompID 52=20170218-21:57:37.035 56=FG 122=20170218-21:57:37.035 36=37 123=Y 10=038 
00:57:37.035 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Sent SequenceReset TO: 37
00:57:37.099 INFO  OUTGOING [FIX Gate Mail Thread] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=139 35=D 34=37 49=mySenderCompID 52=20170218-21:57:37.097 56=FG 1=U51 11=0 38=1 40=2 44=17000 54=1 55=SRH7 59=0 60=20170219-00:57:37.096 1300=F 10=244 
00:57:37.125 DEBUG o.a.m.f.c.ProtocolCodecFilter [NioProcessor-2] - Processing a MESSAGE_RECEIVED for session 1
00:57:37.125 INFO  INCOMING [NioProcessor-2] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=231 35=8 49=FG 56=mySenderCompID 34=38 52=20170218-21:58:55.377 37=0 198=F:0 526=$01$ 11=0 17=3617265603 150=8 39=8 103=99 55=SRH7 54=1 38=1 40=2 151=1 14=0 6=0 60=20170218-21:58:55.376 58=3;Session inactive. 20018=[51000-3617265603-0] 10=090 
00:57:42.123 DEBUG c.r.c.c.f.m.s.SessionManager [FIX Gate Mail Thread]: done processing an outgoing trading message.
00:57:42.131 INFO  OUTGOING [FIX Gate Mail Thread] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=117 35=q 34=38 49=mySenderCompID 52=20170218-21:57:42.122 56=FG 1=U51 11=1F 55=SRH7 60=20170219-00:57:42.122 530=1 1300=F 10=079 
00:57:42.149 DEBUG o.a.m.f.c.ProtocolCodecFilter [NioProcessor-2] - Processing a MESSAGE_RECEIVED for session 1
00:57:42.149 INFO  INCOMING [NioProcessor-2] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=186 35=r 49=FG 56=mySenderCompID 34=39 52=20170218-21:59:00.409 11=1F 37=exec-201702181824434 530=1 531=1 533=0 58=0;Operation successful. 60=20170218-21:59:00.408 20018=[51000-3617265614-0] 10=123 
00:58:11.992 DEBUG o.a.m.f.c.ProtocolCodecFilter [NioProcessor-2] - Processing a MESSAGE_RECEIVED for session 1
00:58:11.992 INFO  INCOMING [NioProcessor-2] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=59 35=0 49=FG 56=mySenderCompID 34=40 52=20170218-21:59:30.252 10=179 
00:58:12.948 INFO  OUTGOING [QFJ Timer] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=59 35=0 34=39 49=mySenderCompID 52=20170218-21:58:12.941 56=FG 10=191 
00:58:42.948 INFO  OUTGOING [QFJ Timer] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=59 35=0 34=40 49=mySenderCompID 52=20170218-21:58:42.941 56=FG 10=186 
00:58:57.948 INFO  OUTGOING [QFJ Timer] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=68 35=1 34=41 49=mySenderCompID 52=20170218-21:58:57.941 56=FG 112=TEST 10=212 
00:58:57.949 INFO  quickfixj.event [QFJ Timer] - FIX.4.4:mySenderCompID->FG: Sent test request TEST
00:59:01.895 ERROR quickfixj.errorEvent [NioProcessor-2] - FIX.4.4:mySenderCompID->FG: Disconnecting: Socket exception (/there-was-exchange-ip:port): java.io.IOException: Connection reset by peer
00:59:01.903 INFO  c.r.c.c.f.m.s.SessionManager [NioProcessor-2]: the session has been logged out.
00:59:02.563 INFO  d.q.m.i.IoSessionInitiator [QFJ Timer] - [FIX.4.4:mySenderCompID->FG] - reset IoConnector
00:59:02.565 ERROR quickfixj.errorEvent [QFJ Timer] - FIX.4.4:mySenderCompID->FG: java.net.NoRouteToHostException during connection to /there-was-exchange-ip:port: java.net.NoRouteToHostException: No route to host: no further information (Next retry in 30000 milliseconds)
00:59:32.713 INFO  q.m.i.InitiatorIoHandler [NioProcessor-12] - MINA session created for FIX.4.4:mySenderCompID->FG: local=/there-was-my-ip:60459, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=/there-was-exchange-ip:port
00:59:32.941 INFO  OUTGOING [QFJ Timer] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=71 35=A 34=42 49=mySenderCompID 52=20170218-21:59:32.941 56=FG 98=0 108=30 10=224 
00:59:32.949 INFO  quickfixj.event [QFJ Timer] - FIX.4.4:mySenderCompID->FG: Initiated logon request
00:59:32.984 INFO  quickfix.Session [QFJ Message Processor] - [FIX.4.4:mySenderCompID->FG] Disconnecting: Encountered END_OF_STREAM
00:59:32.992 INFO  c.r.c.c.f.m.s.SessionManager [QFJ Message Processor]: the session has been logged out.
00:59:33.724 INFO  d.q.m.i.IoSessionInitiator [QFJ Timer] - [FIX.4.4:mySenderCompID->FG] - reset IoConnector
01:00:02.879 INFO  q.m.i.InitiatorIoHandler [NioProcessor-22] - MINA session created for FIX.4.4:mySenderCompID->FG: local=/there-was-my-ip:60480, class org.apache.mina.transport.socket.nio.NioSocketSession, remote=/there-was-exchange-ip:port
01:00:02.941 INFO  OUTGOING [QFJ Timer] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=71 35=A 34=43 49=mySenderCompID 52=20170218-22:00:02.941 56=FG 98=0 108=30 10=209 
01:00:02.952 INFO  quickfixj.event [QFJ Timer] - FIX.4.4:mySenderCompID->FG: Initiated logon request
01:00:02.981 DEBUG o.a.m.f.c.ProtocolCodecFilter [NioProcessor-22] - Processing a MESSAGE_RECEIVED for session 3
01:00:02.991 INFO  INCOMING [NioProcessor-22] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=71 35=A 49=FG 56=mySenderCompID 34=43 52=20170218-22:01:21.219 98=0 108=30 10=209 
01:00:02.992 INFO  INCOMING [NioProcessor-22] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=69 35=2 49=FG 56=mySenderCompID 34=44 52=20170218-22:01:21.219 7=40 16=0 10=095 
01:00:02.992 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Received logon
01:00:03.002 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: MsgSeqNum too high, expecting 41 but received 43: 8=FIX.4.4 9=71 35=A 34=43 49=FG 52=20170218-22:01:21.219 56=mySenderCompID 98=0 108=30 10=209 
01:00:03.002 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Enqueued at pos 43: 8=FIX.4.4 9=71 35=A 34=43 49=FG 52=20170218-22:01:21.219 56=mySenderCompID 98=0 108=30 10=209 
01:00:03.003 INFO  OUTGOING [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=69 35=2 34=44 49=mySenderCompID 52=20170218-22:00:03.003 56=FG 7=41 16=0 10=086 
01:00:03.003 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Sent ResendRequest FROM: 41 TO: 42
01:00:03.003 INFO  c.r.c.c.f.m.s.SessionManager [QFJ Message Processor]: the session has been logged on.
01:00:03.003 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Enqueued at pos 44: 8=FIX.4.4 9=69 35=2 34=44 49=FG 52=20170218-22:01:21.219 56=mySenderCompID 7=40 16=0 10=095 
01:00:03.013 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Received ResendRequest FROM: 40 TO: infinity
01:00:03.014 INFO  OUTGOING [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=102 35=4 34=40 43=Y 49=mySenderCompID 52=20170218-22:00:03.014 56=FG 122=20170218-22:00:03.014 36=45 123=Y 10=249 
01:00:03.014 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Sent SequenceReset TO: 45
01:00:03.022 DEBUG o.a.m.f.c.ProtocolCodecFilter [NioProcessor-22] - Processing a MESSAGE_RECEIVED for session 3
01:00:03.022 INFO  INCOMING [NioProcessor-22] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=98 35=4 49=FG 56=mySenderCompID 34=41 43=Y 52=20170218-22:01:21.270 122=20170218-22:01:21 123=Y 36=45 10=027 
01:00:03.022 INFO  INCOMING [NioProcessor-22] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=69 35=2 49=FG 56=mySenderCompID 34=45 52=20170218-22:01:21.270 7=40 16=0 10=093 
01:00:03.022 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Received SequenceReset FROM: 41 TO: 45
01:00:03.023 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Received ResendRequest FROM: 40 TO: infinity
01:00:03.023 INFO  OUTGOING [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=102 35=4 34=40 43=Y 49=mySenderCompID 52=20170218-22:00:03.023 56=FG 122=20170218-22:00:03.023 36=45 123=Y 10=249 
01:00:03.023 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: Sent SequenceReset TO: 45
01:00:33.049 DEBUG o.a.m.f.c.ProtocolCodecFilter [NioProcessor-22] - Processing a MESSAGE_RECEIVED for session 3
01:00:33.049 INFO  INCOMING [NioProcessor-22] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=59 35=0 49=FG 56=mySenderCompID 34=46 52=20170218-22:01:51.258 10=182 
01:00:33.060 INFO  quickfixj.event [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: ResendRequest for messages FROM 41 TO 42 has been satisfied.
01:00:33.070 INFO  OUTGOING [QFJ Message Processor] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=59 35=0 34=45 49=mySenderCompID 52=20170218-22:00:33.070 56=FG 10=172 


 Comments   
Comment by Christoph John [ 20/Feb/17 ]

Hi Arsentii Nerushev,

thanks for the detailed description. I think it would also be sensible to have some stack traces when the issue occurs the next time.
One question regarding the chunk of log lines when the issue occurred: there were no more messages than what you pasted above? I.e. there was nothing going on except for the "Logon state is not valid" message every few minutes? To me it looks like the connection went down every couple of minutes.

Thanks,
Chris.

Comment by Arsentii Nerushev [ 20/Feb/17 ]

@Christoph John
No, there were no more messages besides those ones. I logged in on the machine around 6:20 PM to check how my app's doing and found this on the log. Maybe there was no connection to the remote FIX gate at the moment – sadly, I don't remember and I didn't restart the app to check it out. The Internet connection with the machine was definitely OK, as I could connect to it remotely.
So, I would understand if the log contained some lines about a broken connection and the Engine's heroic attempts to restore it ASAP, but, alas, – there were none.
If you have some recommendations about where I should place extra logging – I'm all ears.

Comment by Christoph John [ 21/Feb/17 ]

Hi Arsentii Nerushev,

what did you do to correct the issue in the problematic case? Did you have to restart the whole application? What are the memory settings for your application and how much memory do you have available?
It is just wild guesswork but from looking at the logs it seems to me that there might be something fishy with the whole java process. Maybe some heavy GC was going on, or the OS tried to swap something out to disk which took very long, or that a thread was completely stuck. The following message supports this assumption:

18:07:02.056 INFO  OUTGOING [QFJ Timer] - FIX.4.4:mySenderCompID->FG: 8=FIX.4.4 9=69 35=1 34=989 49=mySenderCompID 52=20170216-15:06:18.777 56=FG 112=TEST 10=024 

The log time stamp is 18:07:02 but the sending time on the message is 15:06:18, so almost a minute earlier. (I assume you are UTC+3)

When the problem occurs again I would like to take a look at the following:

  • several stack dumps with jstack <pid>
  • also output of jmap -heap <pid>
  • and output of jstat -gcutil <pid> 1000 10

Edit: I would also like to know how much memory, disk and swap is free at that moment. I don't know the commands for this on Windows, but I suppose you do.

Thanks,
Chris.

Comment by Christoph John [ 17/Oct/18 ]

I think this has been fixed with QFJ-942. Although I cannot tell without a stack trace.
Feel free to re-open when you have new information.

Generated at Sat Nov 23 01:30:55 UTC 2024 using JIRA 7.5.2#75007-sha1:9f5725bb824792b3230a5d8716f0c13e296a3cae.