[JDEV] Character Encodings and Languages thread
Lindsay.Marshall at newcastle.ac.uk
Fri Jul 30 03:39:02 CDT 1999
> The problem is that we are dealing with an XML document here, the
> conversation between the client and server is just a normal streaming XML
> document. I don't think it's possible to change character encodings on
> the fly within the document, it would be like changing byte-order randomly
> in any protocol stream. This would cause problems implementing clients
> and most importantly, there aren't any XML parsers that would support this
> type of thing. So adding an encoding="" to each message or tag wouldn't
> be feasible.
Sorry, but this is nonsense. We are *only* talking about CDATA here,
that is, 8-bit bytes. The XML parser simply pulls these out without
interpretation (apart from escaped characters) and gives them to you.
(Well, at least that's what my XML parser does!) I am free to
interpret those bytes in any way I choose. The encoding is only relevant
to the rendering software; it has nothing to do with the parser at
all. Implementing it is essentially trivial.
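
A minimal sketch of the client side in Python, assuming the parser hands the message body back as uninterpreted bytes and that each message carries an encoding label. The function name `render_body` and the fallback behaviour are illustrative assumptions, not part of any existing client:

```python
def render_body(raw: bytes, encoding: str = "utf-8") -> str:
    """Decode message bytes using the label the sender attached."""
    try:
        return raw.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown label, or bytes that do not fit it: show replacement
        # characters rather than dropping the message.
        return raw.decode("utf-8", errors="replace")


# Three messages on one stream, each labelled with its own encoding.
korean = "안녕하세요".encode("euc-kr")
japanese = "こんにちは".encode("shift_jis")
print(render_body(korean, "euc-kr"))
print(render_body(japanese, "shift_jis"))
print(render_body(b"Hello", "us-ascii"))
```

Only the rendering step looks at the label; nothing before it needs to know what the bytes mean.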
> But, we can still support the required international charset functionality
> I believe. When each client connects to the server, it identifies in the
> opening <?xml ?> tag its character encoding. By default it's UTF-8, but
> the server will and should support a range of other common encodings for
> clients to specify. The *entire stream* is then encoded in what was
> specified. The server is normally going to be sending back the default
> UTF-8 encoding. I'm not sure what it would take to support this, but
> there might be value in adding a server option to change the default
> outgoing encoding, so that servers that are primarily international can
> use the most common encoding for data sent back to those clients.
This doesn't help at all. What I want to be able to do is to
communicate with my friends in Korea in Korean, my friends in Japan in
Japanese and use English here. I want to do this over a single message
stream. Not everyone will have UTF-8 support on their machines.
> What this means is that the server is going to have to translate
> internally between different encodings. This is where things start to get
> a little fuzzy for me... Is there a library out there for doing this sort
> of thing, is it not common in other software?
The server need do nothing at all if you allow the encoding attribute -
it simply passes on the bytes and the client deals with it (or not).
Having the server do anything is just silly. Always remember that it is
just CDATA!!!
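
The server side of the same idea can be sketched as a pass-through relay. The `Message` tuple and the `relay` function are hypothetical names used only for illustration; the point is that the body bytes and the encoding label are forwarded untouched:

```python
from typing import Callable, Iterable, NamedTuple


class Message(NamedTuple):
    to: str
    encoding: str  # hypothetical per-message label set by the sender
    body: bytes    # opaque to the server


def relay(inbox: Iterable[Message], deliver: Callable[[Message], None]) -> None:
    """Forward each message as received; no decoding, no re-encoding."""
    for msg in inbox:
        deliver(msg)


stream = [
    Message("friend-in-korea", "euc-kr", "안녕하세요".encode("euc-kr")),
    Message("friend-in-japan", "shift_jis", "こんにちは".encode("shift_jis")),
    Message("colleague", "us-ascii", b"Hello"),
]
delivered = []
relay(stream, delivered.append)
```

Whether the receiving client can display a given encoding is then the client's problem, not the server's.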
> and part of XML:
> http://www.w3.org/TR/1998/REC-xml-19980210#sec-lang-tag
Thanks for the ref. I knew it was all well defined.
L.
--
http://catless.ncl.ac.uk/Lindsay