[JDEV] Writings from the Journal of TCharron

arh14 at cornell.edu arh14 at cornell.edu
Wed Aug 4 13:12:46 CDT 1999


I think I've deduced that we agree entirely.  *Letting* the XML doc be 
encoded in various formats, while it doesn't necessarily help us now, 
doesn't hurt anything (as long as everybody reads the encoding header on 
the doc and complies).  This is separate from the encoding of the 
actual messages, which should always be allowed to be variable, and is 
facilitated by a concise message 'encoding="foo"' attribute.
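
To make the two layers concrete, here is a minimal Python sketch (not anything from the Jabber code; the helper names `read_message` and `make_doc` and the sample text are made up for illustration). It assumes a parser that honors the XML declaration, as expat does, and shows that the same message comes out identically whether the document is sent as ISO-8859-1 or UTF-8, while the message's own 'encoding="foo"' attribute is just handed through to the client:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample body containing one non-ASCII character (e with acute).
BODY = "caf\u00e9"

# The 'encoding="foo"' attribute is the per-message hint from the discussion;
# the parser does not interpret it, it is only passed along to the client.
TEMPLATE = (
    '<?xml version="1.0" encoding="{doc}"?>'
    '<message encoding="foo">{body}</message>'
)


def make_doc(doc_encoding):
    """Serialize the sample message as bytes in the given document encoding."""
    return TEMPLATE.format(doc=doc_encoding, body=BODY).encode(doc_encoding)


def read_message(raw_doc):
    """Parse raw bytes; the parser picks the charset from the XML declaration."""
    root = ET.fromstring(raw_doc)
    return root.get("encoding"), root.text


latin1_result = read_message(make_doc("iso-8859-1"))
utf8_result = read_message(make_doc("utf-8"))
```

The bytes on the wire differ between the two documents, but both results are the same pair of attribute value and text, which is the sense in which a variable document encoding doesn't hurt anything.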

Aaron

----

On Wed, 4 Aug 1999, Scott Robinson wrote:

> Interleaved response.
> 
> Scott.
> 
> * arh14 at cornell.edu translated into ASCII [Wed, Aug 04, 1999 at 01:23:24PM -0400][<Pine.SOL.3.91.990804131428.3558D-100000 at travelers.mail.cornell.edu>]
> >
> > On Wed, 4 Aug 1999, Scott Robinson wrote:
> >
> > > What you said is along the lines in my head. I'll spew my thoughts some
> > > more, since they have been a bit more refined.
> > >
> > > First off, since we love being able to debug manually with telnet, the C/S
> > > MUST support ASCII. Moreover, since UTF-8 has ASCII and it is the XML
> > > standard, therefore the C/S should support UTF-8. There is really nothing
> > > surprising here, but I'll just put that down.
> > >
> > > Second, I was waiting for the proper time to discuss UNICODE... which was to
> > > be my suggestion. Personally, and I'll admit I have not yet screwed around
> > > with expat, although I've received the vibes it is quite difficult to change
> > > charsets in mid-stream, I believe that since the XML standard allows for a
> >
> > Sorry if I'm thick, but what would be the reason for switching
> > charsets in mid-stream of document parsing?  Wouldn't the entire XML doc be
> > normalized to one standard, and, given a message encoding parameter, the
> > client would decide what it wants to do with the normalized characters?  My
> > understanding is that the XML markup itself should never deviate from a
> > pre-stated charset, but the CDATA might (which, really, the parser doesn't
> > care about, right?).  If a standard is set, it will ultimately be the
> > client's responsibility to make sure all outgoing messages are
> > normalized, and all incoming messages are reconstituted in their favorite
> > Star Trek dialect.
> >
> 
> Hmm. Let me think a sec.
> 
> Ok, I'm about to make an idiotic comment, but it's only because I'm the
> kinda guy that thinks this way. I see no reason not to allow for alternate
> characters in XML. I'll allow the point that it would only cause confusion
> later on and gives no functionality; however, in some future bizarre
> universe everyone _could_ be sending data across whatever we use instead of
> sockets in some strange charset. I would build in the functionality for the
> _XML_ (not CDATA) being in alternate charsets.
> 
> Moving to the current CDATA topic... I believe many messages ago the
> suggestion for adding a package for specifying what charset the CDATA is in
> was made. There were arguments against, but they were
> anti-internationalization ones. The only alternative given was a tag. A
> <message charset="charset/unicode">...</message> solution is the nicest one
> in my mind.
> 
> > > charset different from UTF-8, that the C/S should be able to use that
> > > particular feature. I would note, that if the C/S cannot understand UNICODE
> > > (just an example) there should be a way of saying it. ala HTTP's "Accept:
> > > charset/ascii, charset/utf-8" and "Deny: charset/unicode".
> >
> > Should you really rely on the facility of XML to use different charsets?
> > Really the only thing that needs to change charsets is the CDATA of
> > users' messages.  The markup itself never needs to deviate from a set
> > standard encoding.  This standard encoding should be broad enough to be
> > able to store every other encoding clients might want to use.  You don't
> > want to change the nature of the messenger based on the characteristics
> > of the message (if that makes any sense).
> >
> 
> I believe my drivel was becoming overlapping. Let me separate. The
> "messenger" should be able to support different charsets and the "message"
> inside should be able to be completely different.
> 
> > >
> > > Standardizing on UNICODE, though, might be a way to go. I'm not sure, but if
> > > the C/S plain receives/sends ASCII, it could just convert inside and
> > > everyone could be happy.
> > >
> > > The following comments are certified weird.
> > >
> > > Scott.
> >
> > an interloper,
> > Aaron
> >
> > _______________________________________________
> > jdev mailing list
> > jdev at jabber.org
> > http://mailman.jabber.org/listinfo/jdev
> 



