[JDEV] UTF 8 problems in jabber stream...

David Waite mass at akuma.org
Wed Sep 25 20:22:16 CDT 2002


kmq at gmx.at wrote:

>Hi,
>as most of you know, I am developing my new client using the wxWindows framework.
>App is working fine so far, even UTF-8 conversion worked.
>
>However, yesterday I noticed that some UTF-8 data that is sent from 
>ns at neutralstone.net to my client causes the wxString conversion functions to fail 
>(including a buffer overflow).
>
>First I thought there might be a bug in the wxWindows UTF-8 conversion functions, and I 
>mailed the developers of wxWindows.
>You can follow the discussion here:
>http://lists.wxwindows.org/pipermail/wx-users/2002-September/025284.html
>
>In his last mail the author wrote:
>
>  
>
>>>fails:
>>>[<display>2002年09月25日 07時39分57秒</display>] ==
>>>[<display>2002Õ¦¦09µ£ê25µùÑ 07µÖé39Õêå57þºÆ</display>]
>>>3c 64 69 73 70 6c 61 79 3e 32 30 30 32 e5 b9 b4 30 39 e6 9c 88 32 35 e6 97 a5 20 30
>>>37 e6 99 82 33 39 e5 88 86 35 37 e7 a7 92 3c 2f 64 69 73 70 6c 61 79
>>>      
>>>
>> The sequence "e5 b9" (i.e. the bytes following "2002") is invalid UTF-8 to
>>the best of my knowledge. It could still be nice if the resulting string
>>
The sequence is "e5 b9 b4", I believe. This works out to be 


byte #1: 1110 0101
byte #2: 10 111001
byte #3: 10 110100

Dropping the marker bits (1110 on the first byte, 10 on the other two) and 
joining the remaining payload bits gives the bit pattern 01011110 01110100, 
or U+5E74 (the CJK Unified Ideograph 年). The rest also appears to be a 
legal UTF-8 string.
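
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the same three-byte decode. The function name decode_three_byte is made up for this illustration; it is not part of wxWindows, expat, or any Jabber library, and it only handles the three-byte form discussed here.

```python
def decode_three_byte(b1: int, b2: int, b3: int) -> int:
    """Decode one three-byte UTF-8 sequence into a Unicode code point.

    Hypothetical helper for illustration only; it covers just the
    1110xxxx 10xxxxxx 10xxxxxx form.
    """
    # The lead byte must match 1110xxxx.
    if (b1 & 0xF0) != 0xE0:
        raise ValueError("not a three-byte lead byte")
    # Both continuation bytes must match 10xxxxxx.
    if (b2 & 0xC0) != 0x80 or (b3 & 0xC0) != 0x80:
        raise ValueError("bad continuation byte")
    # Keep 4 payload bits from the lead byte and 6 from each continuation byte.
    code_point = ((b1 & 0x0F) << 12) | ((b2 & 0x3F) << 6) | (b3 & 0x3F)
    # Reject overlong encodings and UTF-16 surrogates, which are not legal UTF-8.
    if code_point < 0x800 or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("overlong encoding or surrogate")
    return code_point


cp = decode_three_byte(0xE5, 0xB9, 0xB4)
print(f"U+{cp:04X}")  # U+5E74
```

The result agrees with what Python's built-in decoder gives for the same three bytes.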

>>was NUL-terminated though, but I'm not sure if it's worth doing this.
>>In any case, your real problem is that you get invalid UTF-8 input and I
>>don't know why it happens.
>>
>> Regards,
>>VZ/
>>
There is a bug in older versions of expat which allows some invalid 
UTF-8 through. I do not know if the open-source server has been upgraded 
to a proper version.
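
Since a server may let bad bytes through, a client can check incoming data itself before handing it to a conversion routine. Below is a rough Python sketch of such a check, run against the hex dump quoted above. The names HEX_DUMP and is_valid_utf8 are made up for this example, and Python's strict decoder stands in for whatever validation a real client would use; this is not how wxWindows or expat does it.

```python
# Bytes copied from the hex dump quoted earlier in this message.
HEX_DUMP = (
    "3c 64 69 73 70 6c 61 79 3e 32 30 30 32 e5 b9 b4 30 39 e6 9c 88 32 35 "
    "e6 97 a5 20 30 37 e6 99 82 33 39 e5 88 86 35 37 e7 a7 92 3c 2f 64 69 "
    "73 70 6c 61 79"
)


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as strict UTF-8.

    Hypothetical helper: it relies on Python's built-in decoder, which
    rejects truncated, overlong, and surrogate sequences.
    """
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


payload = bytes.fromhex(HEX_DUMP)  # fromhex ignores the spaces between bytes
print(is_valid_utf8(payload))                # the quoted dump
print(is_valid_utf8(bytes([0xE5, 0xB9])))    # "e5 b9" alone, with the third byte cut off
```

The full dump passes, while "e5 b9" on its own fails because the third byte of the sequence is missing.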

-David Waite



