[QFJ-956] Checksum validation of incoming messages is not correct in some cases Created: 04/Oct/18  Updated: 24/Nov/18  Resolved: 24/Nov/18

Status: Closed
Project: QuickFIX/J
Component/s: None
Affects Version/s: 2.1.0
Fix Version/s: None

Type: Bug Priority: Default
Reporter: Valery Fadeev Assignee: Unassigned
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates QFJ-789 Fully support alternate encodings (ch... Open

 Description   

This code validates the checksum of incoming messages:

Message.java:

private void validateCheckSum(String messageData) throws InvalidMessage {
        try {
            // Body length is checked at the protocol layer
            final int checksum = trailer.getInt(CheckSum.FIELD);
            if (checksum != MessageUtils.checksum(messageData)) {
                // message will be ignored if checksum is wrong or missing
                throw MessageUtils.newInvalidMessageException("Expected CheckSum=" + MessageUtils.checksum(messageData)
                        + ", Received CheckSum=" + checksum + " in " + messageData, this);
            }
        } catch (final FieldNotFound e) {
            throw MessageUtils.newInvalidMessageException("Field not found: " + e.field + " in " + messageData, this);
        }
    }

And in MessageUtils calculation ends up here:

public static int checksum(Charset charset, String data, boolean isEntireMessage) {
        if (CharsetSupport.isStringEquivalent(charset)) { // optimization - skip charset encoding
            int sum = 0;
            int end = isEntireMessage ? data.lastIndexOf("\00110=") : -1;
            int len = end > -1 ? end + 1 : data.length();
            for (int i = 0; i < len; i++) {
                sum += data.charAt(i);
            }
            return sum & 0xFF; // better than sum % 256 since it avoids overflow issues
        }
        return checksum(data.getBytes(charset), isEntireMessage);
    }

So the problem here is that the calculation happens NOT on the raw network bytes but on a Java String that is later converted back to bytes using CharsetSupport's charset. That charset may differ from the one the original message producer used, and can therefore produce different bytes.

In practice, this can lead to a situation where a message with a correct checksum is ignored because QFJ read it using the wrong charset.

It looks like I'm witnessing this situation. The use case is a non-unicode symbol in a message, with different charsets on the producer and consumer sides.
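
The effect described above can be reproduced outside QFJ. The following is a minimal sketch, not QFJ code: the class and method names are hypothetical, the message fragment is made up, and it assumes the receiver decodes the wire bytes into a String and re-encodes that String with its own configured charset before summing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ChecksumCharsetDemo {

    // FIX-style checksum: sum of all bytes (as unsigned values) modulo 256.
    public static int checksum(byte[] bytes) {
        int sum = 0;
        for (byte b : bytes) {
            sum += b & 0xFF;
        }
        return sum & 0xFF;
    }

    // Checksum a receiver would compute if it first decoded the wire bytes
    // into a String with readCharset and then re-encoded that String with
    // the same charset (assumption: this mirrors the String-based path).
    public static int checksumAfterRoundTrip(byte[] wire, Charset readCharset) {
        String decoded = new String(wire, readCharset);
        return checksum(decoded.getBytes(readCharset));
    }

    public static void main(String[] args) {
        // Hypothetical fragment with one non-ASCII character,
        // encoded by the producer as ISO-8859-1.
        byte[] wire = "58=caf\u00e9\u0001".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("on the wire:        " + checksum(wire));
        System.out.println("read as ISO-8859-1: "
                + checksumAfterRoundTrip(wire, StandardCharsets.ISO_8859_1));
        System.out.println("read as UTF-8:      "
                + checksumAfterRoundTrip(wire, StandardCharsets.UTF_8));
    }
}
```

When the receiver reads with the producer's charset, the round trip reproduces the wire bytes and the checksum matches. When it reads with UTF-8, the single byte for the accented character is not valid UTF-8, gets replaced during decoding, and re-encodes to different bytes, so the computed checksum no longer matches the one the producer sent.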



 Comments   
Comment by Christoph John [ 08/Oct/18 ]

Hmm, I am not 100% sure about this but IIRC the only encoding that has to be supported by FIX is ISO-8859-1. But not sure to be honest.
Of course most FIX engines support other charsets but I guess you'll have to check with your counterparty which charset to use. I guess you already discovered how you can change the charset: https://www.quickfixj.org/usermanual/2.1.0//usage/charset.html
Double-byte charsets should also have worked with QFJ for some time now, but I have never tested that myself.

Cheers,
Chris.

Comment by amichair [ 22/Nov/18 ]

iirc the proper way is to support encoded fields, which many engines don't (including QFJ, I believe there's an issue open for that somewhere). Treating full messages as being strings in a different charset, and setting this charset globally, is not by the spec, but it's a practical workaround that is used in several engines (including QFJ).

You have to check with your counterparty which charset/encoding they use, make sure they also apply it globally to the full messages on their side, and set the same one in CharsetSupport on your side; that should work. If they do use individual encoded fields, and you find yourself contributing an implementation of it for QFJ, that would be great.

btw the reason the global workaround works is because the other fields and the FIX messages themselves (tag numbers, delimiters etc.) are all in ASCII, and nearly all charset encodings are backwards-compatible with ASCII, so they remain intact during the conversions and all is well. This is true for most multibyte encodings (which use one or more bytes per character) such as UTF-8 as well. However, this is not true of double-byte charsets, since by definition they use two bytes per character whereas ASCII uses one byte per character, so the conversions will mess up the parsing of the FIX message structure.
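
A quick way to see the ASCII-compatibility point is to encode an ASCII-only FIX header in several charsets and compare the bytes. This is only an illustrative sketch: the class and method names are hypothetical, and UTF-16BE is used here as a stand-in for an encoding that spends two bytes on every character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class AsciiCompatibilityDemo {

    // True if encoding ASCII-only text in the given charset yields exactly
    // the same bytes as encoding it in US-ASCII.
    public static boolean preservesAscii(String asciiText, Charset charset) {
        byte[] ascii = asciiText.getBytes(StandardCharsets.US_ASCII);
        return Arrays.equals(ascii, asciiText.getBytes(charset));
    }

    public static void main(String[] args) {
        // ASCII-only structural part of a FIX message (tags, '=' and SOH).
        String header = "8=FIX.4.4\u00019=5\u000135=0\u0001";

        System.out.println("ISO-8859-1: " + preservesAscii(header, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8:      " + preservesAscii(header, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE:   " + preservesAscii(header, StandardCharsets.UTF_16BE));
    }
}
```

For the first two charsets the tag numbers, '=' signs and SOH delimiters keep their single-byte ASCII values, so byte-level parsing of the message structure still works. With the two-bytes-per-character encoding every structural character gains an extra byte, which is what breaks the parsing.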

If you search JIRA for all the older bugs that mention charsets and encodings and the ones they link to, you'll get a better picture of what works and what doesn't work and why (that's what I did at the time).

Comment by Christoph John [ 24/Nov/18 ]

Closing as duplicate of QFJ-789.
