[QFJ-382] Foreign Language Support - Multibyte Characters - Chinese Created: 09/Dec/08 Updated: 02/Nov/15 Resolved: 09/Jun/14 |
|
Status: | Closed |
Project: | QuickFIX/J |
Component/s: | Engine |
Affects Version/s: | 1.3.3 |
Fix Version/s: | 1.6.0 |
Type: | Improvement | Priority: | Default |
Reporter: | Jason Aubrey | Assignee: | amichair |
Resolution: | Fixed | Votes: | 3 |
Labels: | encoding |
Environment: |
All |
Attachments: | Changes.zip |
Issue Links: |
|
Description |
I need QFJ to support Chinese characters, so I modified my working copy to add this functionality and tests. I could simply commit the changes, but I don't have write access to the repository, so I'll just post the relevant changes here for now. It'd be nice if I could simply add all the diffs as attachments to this message.
Message.java
StringBuffer sb = new StringBuffer();
Field.java
FieldTest.java
} |
Comments |
Comment by Jason Aubrey [ 09/Dec/08 ] |
The revision number of my working copy is 892 (was head revision last week at least). |
Comment by Steve Bate [ 09/Dec/08 ] |
Hi Jason, Thanks for the patches. Have you verified that the checksum calculations work with these changes? The current calculation sums characters which are assumed to be 1-byte. This assumption is made to avoid the need to transcode the message string to bytes for the purpose of calculating the checksum. |
Comment by Jason Aubrey [ 09/Dec/08 ] |
Hi Steve, I think there may have been some checksum-related exceptions initially when sending multibyte characters, due to how the buffer was allocated (based on character count instead of byte count). However, I didn't modify the checksum code (shown below) since it still works in the same basic way.
private int checkSum(String s) { return (sum + 1) % 256;
The only difference in behavior is that each character's value can be much larger than simple ASCII values. For example in utf-8, "\u65E0\u6548\u7684\u7528", which is equivalent to "无效的用", has four characters that are each four hex digits long. So if each of these were FFFF then the sum would be 4 * FFFF = 3FFFC (262,140 in base 10). Given that the sum is stored as an integer, the only risk seems to be overflow, which would occur after 2,147,483,647. With four-byte character encoding, the overflow would only occur after 8,192 characters (i.e. 2,147,483,647 / 262,140), and this assumes each character is FFFF, which it would likely not be. I don't think this is a concern though. If it were a concern, 'sum' could be stored as a larger type. I didn't give any thought to the '% 256' logic since I figured it's unique enough. |
Comment by amichair [ 09/Jun/14 ] |
The above analysis is incorrect, since the checksum should be performed on the encoded bytes, not the source (UTF-16) characters. btw, to avoid an overflow you can use '& 0xFF' instead of '% 256'. In any case, this is now fixed - thanks for the patches, which helped along the way. Currently setting a charset via CharsetSupport should work with any charset that is a superset of ASCII, which luckily is most of them. |
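To illustrate the point about encoded bytes versus source characters, here is a minimal Java sketch. It is not the QuickFIX/J implementation; the class and method names are hypothetical and it uses only the JDK. It sums the same string once as UTF-16 code units and once as the bytes produced by a chosen charset. The two results agree for pure ASCII input and can diverge once multibyte characters appear.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not code from QuickFIX/J.
public class ChecksumSketch {

    /** Sums UTF-16 code units, i.e. treats every char as if it were one byte on the wire. */
    public static int checksumOverChars(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum & 0xFF; // same as modulo 256 for non-negative sums, and safe after overflow
    }

    /** Sums the bytes that would actually be sent when the string is encoded with the charset. */
    public static int checksumOverBytes(String s, Charset charset) {
        int sum = 0;
        for (byte b : s.getBytes(charset)) {
            sum += b & 0xFF; // Java bytes are signed; mask to read them as unsigned
        }
        return sum & 0xFF;
    }

    public static void main(String[] args) {
        String ascii = "35=D";
        String chinese = "\u65E0\u6548\u7684\u7528"; // the example string from the comment above
        System.out.println(checksumOverChars(ascii) + " vs " + checksumOverBytes(ascii, StandardCharsets.UTF_8));
        System.out.println(checksumOverChars(chinese) + " vs " + checksumOverBytes(chinese, StandardCharsets.UTF_8));
    }
}
```

A receiver only ever sees the encoded bytes, so only the byte-based sum can be verified on the other side.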
Comment by Kou Jun [ 02/Nov/15 ] |
Is there any sample code to send and receive Chinese characters? |
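As a rough, library-independent sketch of what sending and receiving involves at the byte level, the Java example below frames a message containing Chinese text, counting BodyLength (9) and CheckSum (10) over encoded bytes, and then decodes it again. The class and its methods (`frame`, `fieldValue`) are hypothetical and are not QuickFIX/J API. It assumes a charset that is a superset of ASCII and never uses the byte 0x01 inside a multibyte sequence, such as UTF-8. In QuickFIX/J itself the charset is set via CharsetSupport, as mentioned above, and the engine handles the framing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical illustration only; not code from QuickFIX/J.
public class FixEncodingSketch {
    private static final char SOH = '\u0001'; // FIX field delimiter

    /** Wraps a body (fields after tag 9, each terminated by SOH) with header and trailer. */
    public static byte[] frame(String beginString, String body, Charset charset) {
        // BodyLength counts encoded bytes, not Java chars.
        int bodyLength = body.getBytes(charset).length;
        String head = "8=" + beginString + SOH + "9=" + bodyLength + SOH + body;
        byte[] headBytes = head.getBytes(charset);

        // CheckSum is the sum of all preceding bytes, modulo 256, as three digits.
        int sum = 0;
        for (byte b : headBytes) {
            sum += b & 0xFF;
        }
        String trailer = String.format(Locale.ROOT, "10=%03d", sum & 0xFF) + SOH;
        byte[] trailerBytes = trailer.getBytes(charset);

        byte[] wire = Arrays.copyOf(headBytes, headBytes.length + trailerBytes.length);
        System.arraycopy(trailerBytes, 0, wire, headBytes.length, trailerBytes.length);
        return wire;
    }

    /** Decodes the received bytes and returns the first value for the tag, or null if absent. */
    public static String fieldValue(byte[] wire, int tag, Charset charset) {
        String message = new String(wire, charset);
        String wanted = Integer.toString(tag);
        for (String field : message.split(String.valueOf(SOH))) {
            int eq = field.indexOf('=');
            if (eq > 0 && field.substring(0, eq).equals(wanted)) {
                return field.substring(eq + 1);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Charset charset = StandardCharsets.UTF_8;
        String body = "35=D" + SOH + "58=\u4E2D\u6587" + SOH; // Text (58) = "中文"
        byte[] wire = frame("FIX.4.4", body, charset);
        System.out.println("BodyLength: " + fieldValue(wire, 9, charset));
        System.out.println("Text: " + fieldValue(wire, 58, charset));
    }
}
```

Both sides have to agree on the charset; decoding the same bytes with a different one would garble the text even though the length and checksum still verify.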