[JDEV] Writings from the Journal of TCharron

Scott Robinson scott at tranzoa.com
Wed Aug 4 12:07:42 CDT 1999


What you said is along the lines in my head. I'll spew my thoughts some
more, since they have been a bit more refined.

First off, since we love being able to debug manually with telnet, the C/S
MUST support ASCII. Moreover, since UTF-8 is a superset of ASCII and is the
default encoding for XML, the C/S should support UTF-8. There is really
nothing surprising here, but I'll just put that down.
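The telnet argument can be checked with a small Python sketch. Python is only an illustrative choice, and `encodings` is a hypothetical helper. The sketch shows three things. Plain-ASCII text is byte-for-byte identical in ASCII and UTF-8. A non-ASCII character becomes a multi-byte UTF-8 sequence. UTF-16 inserts NUL bytes even into plain English, which is what makes it awkward to type into a telnet session.

```python
# Sketch: compare how the same text goes over the wire in ASCII, UTF-8 and UTF-16.
# `encodings` is a hypothetical helper used only for illustration.


def encodings(text: str) -> dict:
    """Return the wire bytes for `text` in a few candidate encodings."""
    result = {
        "utf-8": text.encode("utf-8"),
        "utf-16-be": text.encode("utf-16-be"),  # big-endian, no byte-order mark
    }
    try:
        result["ascii"] = text.encode("ascii")
    except UnicodeEncodeError:
        result["ascii"] = None  # not representable in 7-bit ASCII
    return result


if __name__ == "__main__":
    plain = encodings("<message>hi</message>")
    # Pure ASCII text is byte-for-byte identical in UTF-8, so telnet still works.
    print(plain["ascii"] == plain["utf-8"])      # True
    # UTF-16 doubles the size and interleaves NUL bytes.
    print(plain["utf-16-be"][:4])                # b'\x00<\x00m'
    accented = encodings("café")
    print(accented["ascii"], accented["utf-8"])  # None b'caf\xc3\xa9'
```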

Second, I was waiting for the proper time to discuss UNICODE, which was to
be my suggestion. I'll admit I have not yet screwed around with expat, and
I've picked up the vibe that it is quite difficult to change charsets
mid-stream. Still, since the XML standard allows a document to declare a
charset other than UTF-8, I believe the C/S should be able to use that
feature. I would note that if the C/S cannot understand UNICODE (just an
example), there should be a way of saying so, a la HTTP's Accept-Charset:
something like "Accept: charset/ascii, charset/utf-8" and
"Deny: charset/unicode".
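Expat does honor a document's own encoding declaration. The following sketch uses Python's `xml.parsers.expat` binding; `parse_text` is a hypothetical helper. It parses the same text declared once as ISO-8859-1 and once as UTF-8, and it gets the same Unicode string back both times. The declaration applies to the whole document, which for Jabber means the whole stream. So this approach fixes one charset per session. It does not switch charsets mid-stream.

```python
# Sketch: expat decodes according to each document's <?xml encoding="..."?>.
# parse_text is a hypothetical helper, not part of any Jabber code.
import xml.parsers.expat


def parse_text(document: bytes) -> str:
    """Parse one complete XML document and return its character data."""
    parser = xml.parsers.expat.ParserCreate()  # no override: honor the declaration
    chunks = []
    parser.CharacterDataHandler = chunks.append  # handler receives str chunks
    parser.Parse(document, True)  # True marks this as the final chunk
    return "".join(chunks)


latin1_doc = b'<?xml version="1.0" encoding="ISO-8859-1"?><body>caf\xe9</body>'
utf8_doc = b'<?xml version="1.0" encoding="UTF-8"?><body>caf\xc3\xa9</body>'

if __name__ == "__main__":
    print(parse_text(latin1_doc) == parse_text(utf8_doc) == "café")  # True
```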

Standardizing on UNICODE, though, might be the way to go. I'm not sure, but
if the C/S receives or sends plain ASCII, it could just convert internally
and everyone would be happy.
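The "convert inside" idea can be sketched as two conversions at the edge of the C/S. The sketch assumes the peer's charset is already known, for example from the XML declaration or from some negotiated header. `to_internal` and `to_wire` are hypothetical names.

```python
# Sketch of "convert inside": decode whatever a peer sends into Python's
# internal Unicode str at the edge, and always send UTF-8 back out.
# to_internal and to_wire are hypothetical names used for illustration only.


def to_internal(data: bytes, charset: str = "utf-8") -> str:
    """Decode bytes received in `charset` into a Unicode string."""
    return data.decode(charset)  # raises UnicodeDecodeError on invalid input


def to_wire(text: str) -> bytes:
    """Encode a Unicode string for sending; the wire format is UTF-8."""
    return text.encode("utf-8")


if __name__ == "__main__":
    # A plain-ASCII peer is unaffected: ASCII bytes pass through unchanged.
    print(to_wire(to_internal(b"hello", "ascii")))         # b'hello'
    # A Latin-1 peer's bytes are normalized to UTF-8 on the way back out.
    print(to_wire(to_internal(b"caf\xe9", "iso-8859-1")))  # b'caf\xc3\xa9'
```

Keeping a single internal representation means nothing past the edge has to care which charset a given peer used.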

The following comments are certified weird.

Scott.

* Jon A. Cruz translated into ASCII [Wed, Aug 04, 1999 at 09:38:50AM -0700][<37A86C9A.426C8D66 at geocities.com>]
> (Note: my terms might not be the most technically accurate, but this is to convey
> a good overview)
> 
> Basically, you can think of Unicode as having a character set that contains just
> about all the characters you'd ever want to use, and maybe then some (there are
> contingents working hard on getting Tolkien's Tengwar and Cirth, and Star Trek's
> Klingon in).
> 
> You can then think of actually storing this large character set using different
> encodings. UTF-8 and UTF-16 would be the two most common of these. UTF-16 has the
> advantage that almost all characters are a single 16-bit unit (the rest take a
> pair of them). UTF-8 is variable length, and has the advantage that the 7-bit
> US-ASCII range is preserved as-is, one byte per character.
> 
> Given that it's handy to be able to test commands and such via telnet, that
> standard English stays one byte per character, etc., it's probably best to
> standardize on UTF-8 as the one encoding used over the wire. Internally, clients
> can be encouraged to use UTF-16, or whatever is most efficient for them, but only
> UTF-8 should be exchanged. For UI input and output, the client might convert to
> and from a platform-specific charset and encoding, but then go straight to
> Unicode for all manipulation.
> 
> One side effect of standardizing on Unicode would be that security-related
> strings such as passwords would be easy to handle consistently across systems.
> 
> On MS Windows, COM works by stating that all strings are Unicode. Period. Also,
> MS Office does all its work internally in Unicode, and converts whenever it
> needs to get data in or out of a Windows system call (this is because Windows 9x
> has the Unicode versions of the API calls present but stubbed to return errors).
> I mention this as an example of "gee, a company that mangles and avoids standards
> as much as they do still complies in this area, so maybe we should too".
> 
> 
> arh14 at cornell.edu wrote:
> 
> > On Tue, 3 Aug 1999, Scott Robinson wrote:
> >
> > > If both are bad, then what is the "correct" solution? In my mind, Jabber
> > > _cannot_ be released without international support.
> > >
> > > Scott.
> > >
> > > * Thomas D. Charron translated into ASCII [Sat, Jul 31, 1999 at 11:49:53AM -0700][<IJBOEKFFLEBPEAAA at my-deja.com>]
> > > > >I would focus on the "must accept." I'm fine with accepting UTF-8 and
> > > > >UTF-16, however (and this is the reason they included a standard for passing
> > > > >encoding) we should also be able to handle internationalization. As the
> >
> > Weren't the UTF encodings designed for internationalization?  Can't
> > Jabber be standardized to UTF-16 or something?  The size of messages is
> > typically negligible (and on-the-fly compression would bring that down
> > even more).  Clients would be responsible for displaying the UTF-16 chars
> > whatever way they want (or perhaps include a flag in the message), via
> > plugins or something (like Winamp's language packages).
> >
> > > > >example was given, what would the Korean Jabber user think? Answer: they
> > > > >wouldn't use Jabber...
> > > >
> > > >   The problem, as far as I can see it, is that unless we convert from
> > > > charset to charset, we can't really provide for inter-charset communications..
> >
> > If the Jabber core (on the client) was standardized to some charset, then
> > couldn't it translate the charset to the standard if the user insisted on
> > using some input method with a non-standard charset?  If both outgoing
> > messages (client-controlled), and all incoming messages to the client
> > (server-controlled) are in the same charset there is no problem.
> >
> > Aaron
> >
> > > >   Switching charsets midstream = bad.  Throwing off a new expat object for
> > > > each packet, IMHO, also = bad..
> > > >
> > > >   But I'm also not experienced at ALL in internationalization..  Heck, when
> > > > I need to ./configure I always ./configure --disable-nls..  ;-P
> > > > ---
> > > > Thomas Charron
> > > >
> > > >
> > > > >
> > > > >[snap]
> > > > >>   I know there's more that I'm forgetting..
> > > > >
> > > > >Everyone does! ;)
> > > > >
> > > > >> ---
> > > > >> Thomas Charron
> > > > >>
> > > > >>
> > > > >> --== Sent via Deja.com http://www.deja.com/ ==--
> > > > >> Share what you know. Learn what you don't.
> > > > >
> > >
> >
> > _______________________________________________
> > jdev mailing list
> > jdev at jabber.org
> > http://mailman.jabber.org/listinfo/jdev
> 
> --
> "My new computer's got the clocks, it rocks
> But it was obsolete before I opened the box" - W.A.Y.
> 