[QFJ-956] Checksum validation of incoming messages is not correct in some cases Created: 04/Oct/18 Updated: 24/Nov/18 Resolved: 24/Nov/18 |
|
Status: | Closed |
Project: | QuickFIX/J |
Component/s: | None |
Affects Version/s: | 2.1.0 |
Fix Version/s: | None |
Type: | Bug | Priority: | Default |
Reporter: | Valery Fadeev | Assignee: | Unassigned |
Resolution: | Duplicate | Votes: | 0 |
Labels: | None |
Issue Links: |
|
Description |
This code validates the checksum of incoming messages (Message.java):

    private void validateCheckSum(String messageData) throws InvalidMessage {
        try {
            // Body length is checked at the protocol layer
            final int checksum = trailer.getInt(CheckSum.FIELD);
            if (checksum != MessageUtils.checksum(messageData)) {
                // message will be ignored if checksum is wrong or missing
                throw MessageUtils.newInvalidMessageException("Expected CheckSum=" + MessageUtils.checksum(messageData)
                        + ", Received CheckSum=" + checksum + " in " + messageData, this);
            }
        } catch (final FieldNotFound e) {
            throw MessageUtils.newInvalidMessageException("Field not found: " + e.field + " in " + messageData, this);
        }
    }

In MessageUtils the calculation ends up here:

    public static int checksum(Charset charset, String data, boolean isEntireMessage) {
        if (CharsetSupport.isStringEquivalent(charset)) { // optimization - skip charset encoding
            int sum = 0;
            int end = isEntireMessage ? data.lastIndexOf("\00110=") : -1;
            int len = end > -1 ? end + 1 : data.length();
            for (int i = 0; i < len; i++) {
                sum += data.charAt(i);
            }
            return sum & 0xFF; // better than sum % 256 since it avoids overflow issues
        }
        return checksum(data.getBytes(charset), isEntireMessage);
    }

The problem is that the calculation is performed NOT on the raw network bytes, but on a Java String that is later converted back to bytes using CharsetSupport's charset, which may not be the charset the original message producer used, and can therefore produce different bytes. In practice, a message with a correct checksum can be rejected because it was read by QFJ using the wrong charset. I appear to be hitting exactly this situation: the use case is a non-ASCII character in a message field combined with different charsets on the producer and consumer sides. |
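To make the failure mode concrete, here is a minimal sketch (not QFJ code; the class, variable names and the particular charsets are illustrative) of how a checksum computed over re-encoded bytes can diverge from the checksum over the raw wire bytes when the producer encoded with ISO-8859-1 but the consumer decodes and re-encodes with UTF-8:

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class ChecksumMismatchDemo {

        // FIX checksum: sum of all bytes modulo 256
        static int checksum(byte[] data) {
            int sum = 0;
            for (byte b : data) {
                sum += b & 0xFF;
            }
            return sum & 0xFF;
        }

        public static void main(String[] args) {
            // Field value containing a non-ASCII character, encoded by the producer as ISO-8859-1
            byte[] wireBytes = "Senderé".getBytes(StandardCharsets.ISO_8859_1);

            // Checksum over the raw network bytes (the value the producer puts into tag 10)
            int wireChecksum = checksum(wireBytes);

            // The consumer decodes the bytes with a different charset and later re-encodes them;
            // this mirrors the effect of a CharsetSupport charset that does not match the wire charset
            Charset consumerCharset = StandardCharsets.UTF_8;
            String decoded = new String(wireBytes, consumerCharset);
            int recomputedChecksum = checksum(decoded.getBytes(consumerCharset));

            System.out.println("wire checksum       = " + wireChecksum);
            System.out.println("recomputed checksum = " + recomputedChecksum); // differs for non-ASCII bytes
        }
    }

Because the byte 0xE9 is not valid UTF-8, it is replaced during the decode/re-encode round trip, so the recomputed value no longer matches the CheckSum (tag 10) the producer sent and the message is rejected.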
Comments |
Comment by Christoph John [ 08/Oct/18 ] |
Hmm, I am not 100% sure about this, but IIRC the only encoding that has to be supported by FIX is ISO-8859-1. Not sure to be honest. Cheers, |
Comment by amichair [ 22/Nov/18 ] |
IIRC the proper way is to support encoded fields, which many engines don't (including QFJ; I believe there's an issue open for that somewhere). Treating full messages as strings in a different charset, and setting this charset globally, is not per the spec, but it's a practical workaround used in several engines (including QFJ). You have to check with your counterparty what charset/encoding they use, make sure they also apply it globally to the full messages on their side, and set it in CharsetSupport on your side, and that should work. If they do use individual encoded fields, and you find yourself contributing an implementation of that for QFJ, that would be great.

By the way, the reason the global workaround works is that the other fields and the FIX message structure itself (tag numbers, delimiters etc.) are all ASCII, and nearly all charset encodings are backwards-compatible with ASCII, so those parts remain intact during the conversions and all is well. This also holds for most multibyte encodings (which use one or more bytes per character), such as UTF-8. However, it is not true of double-byte charsets, since by definition they use two bytes per character whereas ASCII uses one, so the conversions will break the parsing of the FIX message structure. If you search JIRA for the older bugs that mention charsets and encodings, and the issues they link to, you'll get a better picture of what works, what doesn't, and why (that's what I did at the time). |
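As a concrete illustration of the global workaround described in the comment above, here is a minimal sketch, assuming both sides have agreed to apply UTF-8 to whole messages. The surrounding class and the chosen charset are illustrative, and the exact exception declared by setCharset may differ between QFJ versions:

    import java.io.UnsupportedEncodingException;
    import quickfix.CharsetSupport;

    public class CharsetConfigExample {
        public static void main(String[] args) throws UnsupportedEncodingException {
            // Must match the charset the counterparty applies to the whole message
            CharsetSupport.setCharset("UTF-8");
            System.out.println("QFJ message charset: " + CharsetSupport.getCharset());
            // ... create SessionSettings and the acceptor/initiator as usual after this point
        }
    }

The key point is that the call happens once, globally, before any sessions are created, so that both encoding of outgoing messages and decoding (and checksum validation) of incoming messages use the same charset as the counterparty.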
Comment by Christoph John [ 24/Nov/18 ] |
Closing as duplicate of QFJ-789. |