[QFJ-923] FileStore is leaking file handles Created: 20/Apr/17  Updated: 27/Jul/17  Resolved: 19/Jul/17

Status: Closed
Project: QuickFIX/J
Component/s: Engine
Affects Version/s: 1.6.2, 1.6.3
Fix Version/s: 1.6.4

Type: Bug Priority: Critical
Reporter: Constantin Florescu Assignee: Christoph John
Resolution: Fixed Votes: 0
Labels: Reconnect, logon, timeout
Environment:

All


Issue Links:
Relates
relates to QFJ-552 Message Stores expected to be thread ... Open
relates to QFJ-762 Message stores can become corrupted o... Open

 Description   

The FileStore like all the other stores are not synchronized.
The access to this store, at times, can be done by 2 threads concurrently.

When a session is about to be estabilished, logon request is sent to the server, if the request is not answered by the server in within the timeout period the connection is closed and the reconnect timer starts.

The problem is that when the Timeout Timer [QFJ Timer] fires it also releases [QFJ Message Processor] thread which tries to process EndOfFille/Stream event, both threads will try to reset the MessageStore on next().

During the reset() the FileStore is (re)initialized which causes the following actions: close existing files, open the files, read stores.

When done in parallel, one thread may try to open or read while the other is closing the files resulting in:
1. Read corrupted data from message stores
2. Leak file handles

Some proof:
16:18:28.763 [QFJ Timer] INFO quickfix.FileStore - initialize() called from.
at quickfix.FileStore.initialize(FileStore.java:111) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.FileStore.reset(FileStore.java:465) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.SessionState.reset(SessionState.java:382) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.Session.resetState(Session.java:2503) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.Session.disconnect(Session.java:1968) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.Session.next(Session.java:1808) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.mina.SessionConnector$SessionTimerTask.run(SessionConnector.java:278) [quickfixj-core-1.6.2.jar:1.6.2]

And from:
16:18:28.788 [QFJ Message Processor] INFO quickfix.FileStore - initialize() called from.
at quickfix.FileStore.initialize(FileStore.java:111) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.FileStore.reset(FileStore.java:465) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.SessionState.reset(SessionState.java:382) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.Session.resetState(Session.java:2503) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.Session.disconnect(Session.java:1968) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.Session.next(Session.java:882) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.Session.next(Session.java:1109) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.mina.SingleThreadedEventHandlingStrategy$SessionMessageEvent.processMessage(SingleThreadedEventHandlingStrategy.java:144) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.mina.SingleThreadedEventHandlingStrategy.block(SingleThreadedEventHandlingStrategy.java:91) [quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.mina.SingleThreadedEventHandlingStrategy$1.run(SingleThreadedEventHandlingStrategy.java:125) [quickfixj-core-1.6.2.jar:1.6.2]

Out of file handles:
ERROR [QFJ Timer] quickfix.SocketInitiator Error during timer processing
quickfix.RuntimeError: java.io.FileNotFoundException: [/..../Session].header (Too many open files)
at quickfix.SessionState.reset(SessionState.java:384) ~[quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.Session.resetState(Session.java:2499) ~[quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.Session.disconnect(Session.java:1967) ~[quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.Session.next(Session.java:1808) ~[quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.mina.SessionConnector$SessionTimerTask.run(SessionConnector.java:278) [quickfixj-core-1.6.2.jar:1.6.2]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_74]
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [?:1.8.0_74]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_74]
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_74]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_74]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_74]
Caused by: java.io.FileNotFoundException: [/.../Session].header (Too many open files)
at java.io.FileOutputStream.open0(Native Method) ~[?:1.8.0_74]
at java.io.FileOutputStream.open(FileOutputStream.java:270) ~[?:1.8.0_74]
at java.io.FileOutputStream.<init>(FileOutputStream.java:213) ~[?:1.8.0_74]
at java.io.FileOutputStream.<init>(FileOutputStream.java:133) ~[?:1.8.0_74]
at quickfix.FileStore.initializeMessageIndex(FileStore.java:198) ~[quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.FileStore.initializeCache(FileStore.java:121) ~[quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.FileStore.initialize(FileStore.java:116) ~[quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.FileStore.reset(FileStore.java:442) ~[quickfixj-core-1.6.2.jar:1.6.2]
at quickfix.SessionState.reset(SessionState.java:382) ~[quickfixj-core-1.6.2.jar:1.6.2]
... 11 more



 Comments   
Comment by Constantin Florescu [ 20/Apr/17 ]

It is related to:
http://www.quickfixj.org/jira/browse/QFJ-762
http://www.quickfixj.org/jira/browse/QFJ-552

Comment by Christoph John [ 21/Apr/17 ]

Hi, I'm wondering if this specific issue (leaking file handles) can be fixed like QFJ-645, i.e. only allowing one reset at a time. Of course, this is no solution for the missing synchronization.
But in the next release the synchronization should be introduced anyway, see https://github.com/quickfix-j/quickfixj/pull/75

Comment by Constantin Florescu [ 24/Apr/17 ]

Unfortunately QFJ-645 won't solve my problem.

The synchronized function reset() in class Session is not called in my case. (See stack traces)
In my case resetState() should be synchronized to avoid the problem.

Comment by Christoph John [ 24/Apr/17 ]

I know that QFJ-645 won't solve your problem. I was just trying to suggest that QFJ-923 could be fixed in a similar way (NB: without synchronization).

Comment by Constantin Florescu [ 24/Apr/17 ]

Sorry, I misunderstood you. "can be fixed like QFJ-645". Ofcourse it would.
BTW, compareAndSet is a way to synchronize, but let's not get into philosophical discussions.

The following updated resetState() method fixes my problem. I just hope there are no other places where the "Stores" are accessed concurrently.

    private void resetState() {
        if (!isResettingState.compareAndSet(false, true)) {
            return;
        }
        try {
            state.reset();
            stateListener.onReset();
        } finally {
            isResettingState.set(false);
        }
    }
Comment by Christoph John [ 24/Apr/17 ]

OK, I meant literally the keyword "synchronize"
As said, there is a pull request that should address the synchronization of the filestore at least.

Comment by Constantin Florescu [ 24/Apr/17 ]

For me it is not urgent because I have a work around.
It was for others to let them know there is an issue and what to do.

About the aforementioned pull request #75, it contains only one file, FileLog, however the issue is in FileStore (MessageStore implementors in general).

Is this the right pull request?

Comment by Christoph John [ 24/Apr/17 ]

Sorry, you are completely right. I did not remember correctly that the pull request is only about the FileLog.
Is your workaround to use a custom message store with synchronized methods?

Comment by Constantin Florescu [ 24/Apr/17 ]

No. It is just synchronizing the resetState as above.

The downside is that I do not know if that is the only concurrency issue when accessing a MessageStore.
The upside is that the MessageStore is mostly unsynchronized which means no performance hit.

Generated at Sat Nov 23 07:35:02 UTC 2024 using JIRA 7.5.2#75007-sha1:9f5725bb824792b3230a5d8716f0c13e296a3cae.