Capture file taken where network errors occurred:
Quickfix/MINA opens ports 63188, 63201, and 63216. The first two fail, but 63216 succeeds at the TCP level. There is no RST or FIN to indicate the end of any of the three connections.
Quickfix/MINA then opens another connection on 63226 (pck#33), which gets rejected by CME because port 63216 has already been opened successfully (pck#37) and CME does not allow more than one connection to their gateway.
|
Christian, have you identified the source of all the TCP protocol errors in the packet capture? Like I said before, these types of low-level TCP protocol problems are below the level of QFJ, MINA, or even the JVM (assuming the JVM is using the O/S network stack). In some cases (like "TCP checksum error") it might not really be an error but rather an artifact of interactions between the network interface, the operating system, and the packet capture software (you can Google for "ethereal tcp checksum error"). However, there are other potentially serious errors in the packet capture. Do you have an explanation for those, or for why you are having network instability problems? Some possibilities for the TCP errors are an operating system bug, a bad network interface, an incorrectly operating router, an incorrect firewall configuration, and so on. The network instability you describe might be aggravating a bug or incorrect configuration in some part of the networking hardware/software. However, I'm not an expert on low-level TCP protocol issues. I do know that QFJ and MINA do not have any direct control over whether FIN and RST messages are sent at the TCP level.
Here are some potentially related resources...
http://www.developerweb.net/forum/showthread.php?t=2941
http://www.ethereal.com/lists/ethereal-dev/200406/msg00090.html
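
To illustrate the point above that FIN and RST are emitted by the operating system's TCP stack and not built by application code, here is a minimal Python sketch over the loopback interface. It is not QFJ or MINA code, and the helper name `close_and_observe` is made up for this example. The application only calls `close()`; the OS decides what goes on the wire. The one thing an application can do is set the standard `SO_LINGER` socket option with a zero timeout, which asks the OS to abort the connection with RST instead of closing it with FIN. The `struct` layout used for that option assumes a Unix-like system.

```python
import socket
import struct


def close_and_observe(abortive):
    """Close a loopback client socket and report what the peer observes.

    Returns "FIN" if the peer sees an orderly end of stream and "RST" if
    the peer sees a connection reset. Hypothetical helper for illustration.
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    peer, _ = listener.accept()
    peer.settimeout(5)

    if abortive:
        # l_onoff=1, l_linger=0: the OS aborts the connection on close().
        # Assumption: two C ints, which is the layout on Unix-like systems.
        client.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.pack("ii", 1, 0))

    # The application never constructs a FIN or RST segment itself.
    client.close()

    try:
        data = peer.recv(1)
        # An empty read means the OS on the other side sent FIN.
        result = "FIN" if data == b"" else "data"
    except ConnectionResetError:
        result = "RST"
    finally:
        peer.close()
        listener.close()
    return result
```

Calling `close_and_observe(False)` exercises the normal close path and `close_and_observe(True)` the abortive one. Java exposes the same option as `Socket.setSoLinger`; whether MINA sets it is not something this sketch says anything about.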
|
Hi, sorry, you can ignore the checksum problems. Those are because we haven't got the capacity to capture full packets for an extended period. We only capture the first few bytes of any packet. The more interesting thing is which ports are opened and which ones are properly closed with FIN or RST.
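
As a rough way to answer "which ports are opened and which ones are properly closed", the check can be sketched in Python. This is only an illustration: the tuple format, the function names `connection_states` and `left_open`, and the sample packets (including the port numbers 50001 to 50004) are all made up and are not taken from the capture file. A real capture would first have to be exported to something like this form.

```python
def connection_states(packets):
    """Map each client port to 'attempted', 'established', or 'closed'.

    Each packet is a hypothetical (client_port, sender, flags) tuple, where
    sender is "client" or "server" and flags is a list of TCP flag names.
    """
    states = {}
    for client_port, sender, flags in packets:
        flags = set(flags)
        if flags & {"FIN", "RST"}:
            # Either side ending the connection counts as a visible close.
            states[client_port] = "closed"
        elif flags == {"SYN"} and sender == "client":
            states.setdefault(client_port, "attempted")
        elif flags == {"SYN", "ACK"} and sender == "server":
            if states.get(client_port) == "attempted":
                states[client_port] = "established"
    return states


def left_open(packets):
    """Client ports that were established but never saw a FIN or RST."""
    states = connection_states(packets)
    return sorted(port for port, state in states.items()
                  if state == "established")


# Made-up sample: two unanswered SYNs, one completed handshake that is
# never closed, and one attempt that the server resets.
packets = [
    (50001, "client", ["SYN"]),
    (50002, "client", ["SYN"]),
    (50003, "client", ["SYN"]),
    (50003, "server", ["SYN", "ACK"]),
    (50003, "client", ["ACK"]),
    (50004, "client", ["SYN"]),
    (50004, "server", ["RST", "ACK"]),
]

print(connection_states(packets))
print(left_open(packets))
```

In this sample only port 50003 ends up in `left_open`, which is the kind of connection that would still count as open on the far side.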
I agree that this is not likely to be a quickfix problem, but very likely to be either the MINA lib or even the JVM. Can you find out whether this is a known issue with MINA? They have made some changes, but I don't know whether they are related to our problem. I can log the problem on the MINA site, but I don't know how quickfix uses MINA in detail. If they say it is fixed, would you be able to build a new quickfix lib based on MINA 1.0?
|