[JDEV] Character Encodings and Languages thread

Lindsay.Marshall at newcastle.ac.uk Lindsay.Marshall at newcastle.ac.uk
Fri Jul 30 03:39:02 CDT 1999


> The problem is that we are dealing with an XML document here, the
> conversation between the client and server is just a normal streaming XML
> document.  I don't think it's possible to change character encodings on
> the fly within the document, it would be like changing byte-order randomly
> in any protocol stream.  This would cause problems implementing clients
> and most importantly, there aren't any XML parsers that would support this
> type of thing.  So adding an encoding="" to each message or tag wouldn't
> be feasible.

Sorry, but this is nonsense. We are *only* talking about CDATA here,
that is, 8-bit bytes. The XML parser simply pulls these out without
interpretation (apart from escaped characters) and gives them to you.
(Well, at least that's what my XML parser does!) I am free to
interpret those bytes in any way I choose. The encoding is only relevant
to the rendering software; it has nothing to do with the parser at
all. Implementing it is essentially trivial.
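
To make that concrete, here is a rough sketch of the client side. It
assumes a parser that hands over each message body as raw bytes together
with the value of a per-message encoding attribute, as described above.
The name render_text, the latin-1 fallback, and the sample messages are
my own illustration, not part of any existing client or of the protocol.

```python
# Hypothetical client-side helper: decode one message body using the
# encoding label that arrived with it. Nothing here touches the parser.
def render_text(payload: bytes, encoding: str = "utf-8") -> str:
    try:
        return payload.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec or undecodable bytes: fall back to latin-1, which
        # maps every byte to a character, so the client shows something
        # instead of failing. (The fallback choice is an assumption.)
        return payload.decode("latin-1")


# Sample (encoding, bytes) pairs standing in for what the parser returns.
messages = [
    ("euc-kr", "안녕하세요".encode("euc-kr")),
    ("shift_jis", "こんにちは".encode("shift_jis")),
    ("us-ascii", b"Hello"),
]

# Each message is decoded independently, so one stream can mix encodings.
rendered = [render_text(body, enc) for enc, body in messages]
```

A client that lacks a given codec simply takes the fallback path for
that one message; the rest of the stream is unaffected.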

> But, we can still support the required international charset functionality
> I believe.  When each client connects to the server, it identifies in the
> opening <?xml ?> tag its character encoding.  By default it's UTF-8, but
> the server will and should support a range of other common encodings for
> clients to specify.  The *entire stream* is then encoded in what was
> specified.  The server is normally going to be sending back the default
> UTF-8 encoding.  I'm not sure what it would take to support this, but
> there might be value in adding a server option to change the default
> outgoing encoding, so that servers that are primarily international can
> use the most common encoding for data sent back to those clients. 

This doesn't help at all. What I want to be able to do is to
communicate with my friends in Korea in Korean, my friends in Japan in
Japanese and use English here. I want to do this over a single message
stream. Not everyone will have UTF-8 support on their machines.

> What this means is that the server is going to have to translate
> internally between different encodings.  This is where things start to get
> a little fuzzy for me... Is there a library out there for doing this sort
> of thing, is it not common in other software?

The server need do nothing at all if you allow the encoding attribute -
it simply passes on the bytes and the client deals with it (or not).
Having the server do anything is just silly. Always remember that it is
just CDATA!!!
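
The server side of that can be sketched in a few lines. This is only an
illustration under the same assumption (an encoding label travelling
with each message); the Relay class, its forward method, and the
addresses are made up for the example and are not taken from any real
server code.

```python
from collections import defaultdict


class Relay:
    """Hypothetical pass-through server: routes messages, never decodes them."""

    def __init__(self):
        # One list of (encoding, payload) pairs per recipient.
        self.inboxes = defaultdict(list)

    def forward(self, recipient: str, encoding: str, payload: bytes) -> None:
        # The encoding label and the bytes are stored exactly as received;
        # no conversion library is needed on the server.
        self.inboxes[recipient].append((encoding, payload))


relay = Relay()
korean = "안녕".encode("euc-kr")
japanese = "こんにちは".encode("shift_jis")
relay.forward("friend@example.kr", "euc-kr", korean)
relay.forward("friend@example.jp", "shift_jis", japanese)
```

The bytes that come out of an inbox are identical to the bytes that went
in, so any interpretation is left entirely to the receiving client.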
 
> and part of XML:
> 	http://www.w3.org/TR/1998/REC-xml-19980210#sec-lang-tag

Thanks for the ref. I knew it was all well defined.

L.
-- 
http://catless.ncl.ac.uk/Lindsay



