[JDEV] Writings from the Journal of TCharron

Jon A. Cruz joncruz at geocities.com
Wed Aug 4 11:38:50 CDT 1999


(Note: my terms might not be the most technically accurate, but this is to convey
a good overview)

Basically, you can think of Unicode as having a character set that contains just
about all the characters you'd ever want to use, and maybe then some (there are
contingents working hard on getting Tolkien's Tengwar and Cirth, and Star Trek's
Klingon, in).

You can then think of actually storing this large character set using different
encodings. UTF-8 and UTF-16 would be the two most common of these. UTF-16 has the
advantage that nearly all characters in common use fit in a single 16-bit unit
(the rest take a surrogate pair of two units). UTF-8 is variable length, and has
the advantage that the 7-bit US-ASCII range is preserved as-is, one byte per
character.
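
To make the size trade-off concrete, here is a rough sketch using Python's
built-in codecs. The sample strings and the helper name encoded_sizes are my own
choices for illustration, not anything from this thread.

```python
# Compare how many bytes UTF-8 and UTF-16 need for a few sample strings.
# The samples are illustrative assumptions, not taken from the thread.
samples = {
    "ascii": "hello",
    "latin": "caf\u00e9",        # "cafe" with an acute accent on the e
    "korean": "\ud55c\uae00",    # two Hangul syllables
    "non_bmp": "\U0001D11E",     # a character outside the 16-bit range
}


def encoded_sizes(text):
    """Return (utf8_bytes, utf16_bytes) for text, without a byte-order mark."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-be"))


sizes = {name: encoded_sizes(text) for name, text in samples.items()}
for name, (u8, u16) in sizes.items():
    print(f"{name:8} utf-8={u8:2} bytes  utf-16={u16:2} bytes")
```

Plain ASCII text comes out byte-for-byte unchanged in UTF-8 and doubles in size
in UTF-16, while the Hangul sample is smaller in UTF-16. The last sample needs
four bytes in both encodings, because UTF-16 has to use a surrogate pair for it.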

Given that it would be handy to test commands and such via telnet, that
standard English stays one byte per character, etc., it is probably best to
standardize on UTF-8 as the one encoding used over the wire. Internally, clients
can be recommended to use UTF-16, or whatever is most efficient for them, but only
UTF-8 should be allowed to be exchanged. For UI input and output, the client might
convert to and from a platform-specific charset and encoding, but should go
straight to Unicode for all manipulation.
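
A minimal sketch of that flow, assuming a client whose platform charset happens
to be ISO-8859-1. The function names (from_ui, to_wire, from_wire, to_ui) are
hypothetical and only illustrate where each conversion would sit; they are not
part of any Jabber code.

```python
# Assumption for the example: the platform's UI hands us ISO-8859-1 bytes.
PLATFORM_CHARSET = "iso-8859-1"
WIRE_ENCODING = "utf-8"  # the single encoding allowed on the wire


def from_ui(raw: bytes) -> str:
    """Decode platform-specific input into a Unicode string."""
    return raw.decode(PLATFORM_CHARSET)


def to_wire(text: str) -> bytes:
    """Encode a Unicode string for sending."""
    return text.encode(WIRE_ENCODING)


def from_wire(data: bytes) -> str:
    """Decode received bytes; anything that is not valid UTF-8 raises."""
    return data.decode(WIRE_ENCODING)


def to_ui(text: str) -> bytes:
    """Encode for display; characters the platform lacks become '?'."""
    return text.encode(PLATFORM_CHARSET, errors="replace")


typed = b"caf\xe9"                 # what the Latin-1 platform delivers
wire = to_wire(from_ui(typed))     # what actually gets sent
shown = to_ui(from_wire(wire))     # what the receiving UI is handed
```

All string manipulation happens on the Unicode strings in the middle; the
platform charset only appears at the two UI edges, and the wire only ever sees
UTF-8.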

One side effect of standardizing the charset on Unicode would be that
security-related data such as passwords would be easier to handle consistently
across different systems, since every system would agree on how a given password
is encoded.
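
As a hedged illustration of the password point: the same password typed on two
platforms with different charsets is two different byte strings, but once each
client converts to Unicode and encodes as UTF-8, a digest of it agrees. The
password_digest helper and the use of plain SHA-256 are assumptions made only
for this sketch, not a proposal for how authentication should actually work.

```python
import hashlib


def password_digest(password: str) -> str:
    """Hash the UTF-8 bytes of a password (SHA-256 chosen only as an example)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


password = "p\u00e4ssw\u00f6rd"   # contains two non-ASCII letters

# The same text as two different platforms might hold it natively.
latin1_bytes = password.encode("iso-8859-1")
cp437_bytes = password.encode("cp437")
raw_bytes_differ = latin1_bytes != cp437_bytes

# Each client decodes from its own charset, then hashes the UTF-8 form.
digest_a = password_digest(latin1_bytes.decode("iso-8859-1"))
digest_b = password_digest(cp437_bytes.decode("cp437"))
digests_match = digest_a == digest_b

print(raw_bytes_differ, digests_match)
```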

On MS Windows, COM works by stating that all strings are Unicode. Period. Also,
MS Office does all its work internally as Unicode, and converts whenever it
needs to get data in or out of a Windows system call. (This is because Windows 9x
has all the Unicode versions of API calls present but stubbed to return errors.)
I mention this as an example of "gee, a company that mangles and avoids standards
as much as they do still complies in this area, so maybe we should too".


arh14 at cornell.edu wrote:

> On Tue, 3 Aug 1999, Scott Robinson wrote:
>
> > If both are bad, then what is the "correct" solution? In my mind, Jabber
> > _cannot_ be released without international support.
> >
> > Scott.
> >
> > * Thomas D. Charron translated into ASCII [Sat, Jul 31, 1999 at 11:49:53AM
> > -0700][<IJBOEKFFLEBPEAAA at my-deja.com>]
> > > >I would focus on the "must accept." I'm fine with accepting UTF-8 and
> > > >UTF-16, however (and this is the reason they included a standard for passing
> > > >encoding) we should also be able to handle internationalization. As the
>
> Weren't the UTF encodings designed for internationalization?  Can't
> Jabber be standardized to UTF-16 or something?  The size of messages is
> typically negligible (and on-the-fly compression would send that down
> even more).  Clients would be responsible for displaying the UTF-16 chars
> whatever way they want (or perhaps include a flag in the message), via
> plugins or something (like Winamp's language packages).
>
> > > >example was given, what would the Korean Jabber user think? Answer: they
> > > >wouldn't use Jabber...
> > >
> > >   The problem as far as I can see it is, unless we convert from charset
> > > to charset, we can't really provide for inter-charset communications..
>
> If the Jabber core (on the client) was standardized to some charset, then
> couldn't it translate the charset to the standard if the user insisted on
> using some input method with a non-standard charset?  If both outgoing
> messages (client-controlled), and all incoming messages to the client
> (server-controlled) are in the same charset there is no problem.
>
> Aaron
>
> > >   Switching charsets midstream = bad.  Throwing off a new expat object
> > > for each packet, IMHO, also = bad..
> > >
> > >   But I'm also not experienced at ALL in internationalization..  Heck,
> > > when I need to ./configure I always ./configure --disable-nls..  ;-P
> > > ---
> > > Thomas Charron
> > >
> > >
> > > >
> > > >[snap]
> > > >>   I know there's more that I'm forgetting..
> > > >
> > > >Everyone does! ;)
> > > >
> > > >> ---
> > > >> Thomas Charron
> > > >>
> > > >>
> > > >> --== Sent via Deja.com http://www.deja.com/ ==--
> > > >> Share what you know. Learn what you don't.
> > > >
> > >
> > >
> > > --== Sent via Deja.com http://www.deja.com/ ==--
> > > Share what you know. Learn what you don't.
> >
>
> _______________________________________________
> jdev mailing list
> jdev at jabber.org
> http://mailman.jabber.org/listinfo/jdev

--
"My new computer's got the clocks, it rocks
But it was obsolete before I opened the box" - W.A.Y.






