[JDEV] Writings from the Journal of TCharron

Jon A. Cruz joncruz at geocities.com
Fri Aug 6 11:23:43 CDT 1999


Scott Robinson wrote:

> It also leaves problems for internationalization later on. That's been shown
> before. Either way, we already noted that UTF-8 and UTF-16 (as stated in the
> XML spec) will be our default.

Just remember, this problem is not limited to internationalization. We have it today, even with English, as
long as we want to support more than just MS Windows.   ;-)

For most Western systems, it would be best to stick with just UTF-8. For Far East languages it might be a little
more efficient data-wise to send UTF-16, but that would add real complexity (byte order issues and all). For
our initial work, let's use UTF-8. Then, for our immediate targets, here are the main charsets we'd have to be
working with:

MS-Windows: Code-Page 1252
Macintosh: Mac Roman
Unix/Linux: ISO-8859-1

ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/ROMAN.TXT
ftp://ftp.unicode.org/Public/MAPPINGS/ISO8859/8859-1.TXT
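
To make the points above concrete, here is a minimal sketch in Python (not part of the original message).
The codec names cp1252, mac_roman and latin_1 are Python's labels for the three charsets listed, and the
helper names are made up for illustration. It shows that one character can be different bytes on each
platform, and how UTF-8 and UTF-16 compare in size and byte order.

```python
# Illustrative only: the codec names are Python's, and the helper names are
# hypothetical; nothing here comes from the project's code.
NATIVE_CHARSETS = {
    "MS-Windows": "cp1252",
    "Macintosh": "mac_roman",
    "Unix/Linux": "latin_1",
}


def native_bytes(text):
    """Return the bytes each platform's default charset produces for text."""
    return {
        platform: text.encode(codec)
        for platform, codec in NATIVE_CHARSETS.items()
    }


def wire_sizes(text):
    """Return the encoded length of text in UTF-8 and both UTF-16 byte orders."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16-le": len(text.encode("utf-16-le")),
        "utf-16-be": len(text.encode("utf-16-be")),
    }


if __name__ == "__main__":
    print(native_bytes("\u00e9"))              # e-acute on each platform
    print("\u00e9".encode("utf-8"))            # one form for everybody
    print(wire_sizes("hello"))                 # ASCII-only text
    print(wire_sizes("\u65e5\u672c\u8a9e"))    # three CJK characters
    # Same character, two possible byte orders in UTF-16:
    print("A".encode("utf-16-le"), "A".encode("utf-16-be"))
```

With these inputs, e-acute is byte 0xE9 in Code-Page 1252 and ISO-8859-1 but 0x8E in Mac Roman, while its
UTF-8 form is the same two bytes everywhere. The size comparison goes the way described: ASCII text doubles
in UTF-16, and the three CJK characters take 9 bytes in UTF-8 against 6 in UTF-16.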

Should each client support three different encodings, or just one? Just one is the logical choice, and it also
avoids the need to rewrite the clients each time a new encoding is supported. I presented links to those mapping
tables, but I know for sure that on Windows and Linux there are system calls to convert from the current default
charset to Unicode. I'm not sure about the Mac.
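
A rough sketch of the "just one encoding" idea, again in Python and only as an illustration: each client
converts between its own default charset and UTF-8 at the network boundary, so no client needs to know
what the other end uses. The names to_wire and from_wire are hypothetical, and locale.getpreferredencoding
stands in for whatever platform call a real client would use.

```python
import locale


def to_wire(native, charset=None):
    """Convert bytes in the local charset to UTF-8 for sending.

    charset defaults to the system's preferred encoding (an assumption about
    where the local text came from).
    """
    if charset is None:
        charset = locale.getpreferredencoding(False)
    return native.decode(charset).encode("utf-8")


def from_wire(wire, charset=None, errors="replace"):
    """Convert received UTF-8 bytes into the local charset for display.

    Characters the local charset cannot represent are replaced with '?'
    by default rather than raising an error.
    """
    if charset is None:
        charset = locale.getpreferredencoding(False)
    return wire.decode("utf-8").encode(charset, errors=errors)


if __name__ == "__main__":
    # A Windows sender and a Mac receiver, each converting only once.
    sent = to_wire(b"caf\xe9", "cp1252")
    print(sent)
    print(from_wire(sent, "mac_roman"))
```

The point of the design is that adding a fourth platform means writing one more pair of conversions on that
platform, not touching the existing clients.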

But there is an extra problem with Macs. How the encoding is done depends on the system region code. The
originating Mac knows this, but the receiver does not. This is yet another reason to convert to UTF-8 while still
on the originating Mac. See ftp://ftp.unicode.org/Public/MAPPINGS/VENDORS/APPLE/README.TXT for details.
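
As a small illustration of why the sender has to do the conversion (a sketch only): the same byte means
different characters under different Mac encodings. Here Python's mac_roman and mac_cyrillic codecs stand
in for two differently configured Macs, which is a simplification of the region-code rules in the README,
and mac_to_utf8 is a made-up helper name.

```python
def mac_to_utf8(raw, script_codec):
    """Decode raw Mac bytes with the sender's encoding and return UTF-8.

    script_codec is an assumption: a Python codec name standing in for
    whatever encoding the originating Mac's settings select.
    """
    return raw.decode(script_codec).encode("utf-8")


if __name__ == "__main__":
    raw = b"\x80"  # one byte, as it would sit in memory on the sending Mac
    # Only the sender knows which of these readings is the intended one.
    print(mac_to_utf8(raw, "mac_roman").decode("utf-8"))
    print(mac_to_utf8(raw, "mac_cyrillic").decode("utf-8"))
```

Under mac_roman the byte 0x80 is A-umlaut (U+00C4); under mac_cyrillic it is the Cyrillic capital A
(U+0410). Once the text is UTF-8 the ambiguity is gone, so the receiver never has to guess.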

--
"My new computer's got the clocks, it rocks
But it was obsolete before I opened the box" - W.A.Y.






