[QFJ-892] Deadlock : quickfix.SessionState.setLastSentTime(SessionState.java:145)
Created: 16/May/16  Updated: 04/Oct/17  Resolved: 03/Oct/17

Status: Closed
Project: QuickFIX/J
Component/s: None
Affects Version/s: 1.5.3
Fix Version/s: None

Type: Bug
Priority: Default
Reporter: Yoni Touitou
Assignee: Christoph John
Resolution: Not a bug
Votes: 0
Labels: deadlock
Environment: Linux

 Description   

Hi team,

We have a production issue caused by a deadlock in QuickFIX/J.
I searched your JIRA and found QFJ-645, which describes a very similar issue, but I did not understand the solution.
Can someone advise?

2016-05-13 13:49:07,162 [FrequentSched-1] ERROR TaUtils detectDeadLocks - DeadLock detected "TA-CRX-JPMC-WL_trading" Id=447 WAITING on java.util.concurrent.locks.ReentrantLock$NonfairSync@366414ce owned by "TMS2InFCStoneUS_streaming" Id=364
at sun.misc.Unsafe.park(Native Method)
waiting on java.util.concurrent.locks.ReentrantLock$NonfairSync@366414ce
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
at quickfix.SessionState.lockSenderMsgSeqNum(SessionState.java:338)
...
2016-05-13 13:49:07,162 [FrequentSched-1] ERROR TaUtils detectDeadLocks - DeadLock detected "TMS2InFCStoneUS_streaming" Id=364 BLOCKED on quickfix.Session@2f7fe578 owned by "TA-CRX-JPMC-WL_trading" Id=447
at quickfix.SessionState.setLastSentTime(SessionState.java:145)
blocked on quickfix.Session@2f7fe578
at quickfix.Session.initializeHeader(Session.java:680)
at quickfix.Session.sendRaw(Session.java:2304)
at quickfix.Session.send(Session.java:2402)
at quickfix.Session.sendToTarget(Session.java:636)
at com.tradair.tnet.driver.fix.client.FixClient.sendRequest(FixClient.java:462)
at com.tradair.tnet.driver.Driver.handleMessages(Driver.java:408)
at com.tradair.tnet.driver.Driver.run(Driver.java:299)
...
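
Read together, the two traces describe a lock-order inversion: thread 447 owns the quickfix.Session monitor and waits for the ReentrantLock behind lockSenderMsgSeqNum, while thread 364 owns that ReentrantLock and waits for the Session monitor. The sketch below reproduces the same shape with two stand-in locks and reports it through the JDK's ThreadMXBean. It is only an illustration: the class, lock, and thread names are hypothetical, no QuickFIX/J code is involved, and TaUtils.detectDeadLocks is the reporter's own class whose source is not shown, so any similarity to it is an assumption.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

class LockInversionDemo {

    /** Starts two daemon threads that take the same two locks in opposite order. */
    static void startInvertedThreads() {
        // Hypothetical stand-ins for the two locks named in the traces.
        final Object sessionMonitor = new Object();             // plays the Session monitor
        final ReentrantLock seqNumLock = new ReentrantLock();   // plays the sequence number lock
        final CountDownLatch bothHoldFirstLock = new CountDownLatch(2);

        Thread trading = new Thread(() -> {
            synchronized (sessionMonitor) {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                seqNumLock.lock();   // parks forever: the other thread owns this lock
                seqNumLock.unlock();
            }
        }, "demo-trading");

        Thread streaming = new Thread(() -> {
            seqNumLock.lock();
            try {
                bothHoldFirstLock.countDown();
                awaitQuietly(bothHoldFirstLock);
                synchronized (sessionMonitor) {
                    // never reached: the other thread owns the monitor
                }
            } finally {
                seqNumLock.unlock();
            }
        }, "demo-streaming");

        // Daemon threads, so the stuck pair does not keep the JVM alive.
        trading.setDaemon(true);
        streaming.setDaemon(true);
        trading.start();
        streaming.start();
    }

    static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Polls for deadlocked threads; returns an empty array if none appear in time. */
    static long[] waitForDeadlock(long timeoutMillis) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            // Covers both object monitors and ownable synchronizers such as ReentrantLock.
            long[] ids = mx.findDeadlockedThreads();
            if (ids != null) {
                return ids;
            }
            try {
                Thread.sleep(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return new long[0];
    }

    public static void main(String[] args) {
        startInvertedThreads();
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : mx.getThreadInfo(waitForDeadlock(5_000), true, true)) {
            if (info != null) {
                System.out.println(info.getThreadName() + " waits for " + info.getLockName()
                        + " owned by " + info.getLockOwnerName());
            }
        }
    }
}
```

In the demo, taking both locks in the same order on every thread removes the cycle. Whether that is practical here depends on which application call paths end up holding the Session monitor, which the truncated traces do not show.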



 Comments   
Comment by Christoph John [ 13/Apr/17 ]

I don't think we can do anything about this without an exact description of how to reproduce it, or a unit test. Did you try to send a message to the same Session from two different threads?

Comment by Yoni Touitou [ 01/Oct/17 ]

Hi Christoph,

Sorry for the delay.

We disconnect the trading session when the streaming session disconnects.
Is that a problem? Is it bad practice?

If it is bad practice: we would like to disconnect all of a user's sessions when one of them is disconnected. What is the correct way to do this?

This is our current code for disconnecting all of a user's sessions. It is called from the onLogout method of the Application interface:

Session streaming = sessionData.getStreamingSession();
Session trading = sessionData.getTradingSession();
logger.info("--> logout Streaming....");
sendLogout(streaming, reason);

logger.info("--> logout Trading....");
sendLogout(trading, reason);

sleep(1000);
logger.info("--> disconnect Streaming....");
try {
	if (streaming != null) {
		streaming.disconnect(reason, true);
	}
} catch (IOException ex) {
	logger.error("error while disconnect streaming", ex);
}

logger.info("--> disconnect trading....");
try {
	if (trading != null) {
		trading.disconnect(reason, true);
	}
} catch (IOException ex) {
	logger.error("error while disconnect trading", ex);
}

Comment by Christoph John [ 01/Oct/17 ]

The "Session" that you are getting via getStreamingSession() or getTradingSession() is a quickfix.Session, right?
What does the code of the sendLogout() method that you are calling look like?

I noticed that you are using a rather old version of QFJ (1.5.3). At least that is what you entered into the issue description. Several concurrency fixes have been made since that version, e.g. QFJ-738 (which looks a little like the problem you are having) or QFJ-790. Maybe you could try a more recent version. 1.6.4 is the current version.

Apart from that, I would advise against calling Session.disconnect() from different threads, and against sending any session-level messages (e.g. Logon, Logout) yourself.
Actually, it should be enough to call Session.logout(). That should trigger a Logout and a disconnect of the FIX session. If you want to re-enable the Session, just call Session.logon().

Hope that helps,
Chris.

Comment by Yoni Touitou [ 02/Oct/17 ]

Hi Christoph,
Yes, this is a quickfix.Session instance.

The sendLogout() method above is called from the onLogout method of the quickfix.Application interface:

@Override
public void onLogout(SessionID arg0) {
	Thread.currentThread().setName(arg0.getTargetCompID());
	fixApi.handleUserDisconnect(arg0);
}

fixApi.handleUserDisconnect() disconnects the user in our system and disconnects all sessions of the current user (streaming and trading).
As you mentioned, I send a logout message to the streaming session from the trading thread. Is that a problem?

Thanks for your assistance

Comment by Christoph John [ 03/Oct/17 ]

Umm, yes, as I was suggesting in my earlier comment: you shouldn't call disconnect() from a separate thread. Actually, you shouldn't call it at all and leave the whole session management / connect / disconnect stuff to QFJ. Please use Session.logon() and logout().

Chris

Comment by Yoni Touitou [ 04/Oct/17 ]

Thanks Christoph,

So to clarify: if I must disconnect the streaming session from the trading thread, I need to remove the disconnect() call and use only the logout() method?
If I understand correctly, you think the deadlock is caused by calling disconnect() from another thread?

Thanks

Comment by Christoph John [ 04/Oct/17 ]

Yes, as I was suggesting in my earlier two comments, you should remove the disconnect() call and use the logout() method only. If you have an Initiator, do not forget to call logon() afterwards, otherwise it will not try to reconnect. If you have an Acceptor, then you do not need to call logon() afterwards.

The deadlock is probably caused by either the call to the disconnect() method from another thread or the call to sendLogout() while another message is still being sent or received.

If you have further questions please direct them to the mailing list, since JIRA is the bug tracker. Thank you.
https://lists.sourceforge.net/lists/listinfo/quickfixj-users

Regards,
Chris.
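
The advice in this thread (use logout() only, and call logon() afterwards on an Initiator) could be sketched roughly as follows. To keep the example runnable without QuickFIX/J on the classpath, FixSession is a hypothetical stand-in interface for the logout() and logon() methods of quickfix.Session that are named above. RecordingSession, logoutAll and logonAll are also invented for illustration, and the exact signatures should be checked against the QuickFIX/J version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LogoutOnlySketch {

    /** Hypothetical stand-in for the two quickfix.Session methods named in the thread. */
    interface FixSession {
        void logout();
        void logon();
    }

    /**
     * Ends all sessions of one user through logout() only:
     * no hand-built Logout message and no disconnect() call.
     */
    static void logoutAll(List<? extends FixSession> sessions) {
        for (FixSession session : sessions) {
            if (session != null) {
                session.logout();
            }
        }
    }

    /** For initiators only: re-enables the sessions so that they reconnect. */
    static void logonAll(List<? extends FixSession> sessions) {
        for (FixSession session : sessions) {
            if (session != null) {
                session.logon();
            }
        }
    }

    /** Test double that records which calls were made, in order. */
    static final class RecordingSession implements FixSession {
        private final String name;
        private final List<String> calls;

        RecordingSession(String name, List<String> calls) {
            this.name = name;
            this.calls = calls;
        }

        @Override
        public void logout() {
            calls.add(name + ".logout");
        }

        @Override
        public void logon() {
            calls.add(name + ".logon");
        }
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        List<FixSession> userSessions = Arrays.asList(
                new RecordingSession("streaming", calls),
                new RecordingSession("trading", calls));

        logoutAll(userSessions);   // replaces sendLogout(...) + sleep + disconnect(...)
        logonAll(userSessions);    // only needed on an Initiator, when it should reconnect
        System.out.println(calls);
    }
}
```

With the real library, the same two calls would be made on the quickfix.Session instances returned by getStreamingSession() and getTradingSession(). On an Acceptor the logon() step would be left out.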
