[JDEV] Writings from the Journal of TCharron

Thu Aug 5 11:49:31 CDT 1999

Well, letting any document be composed of mixed encodings might cause some
problems, especially where practice differs from theory. It adds an extra
layer of complexity that raises the chance of bugs, in processing and
elsewhere.

For a little hint of the complexity, just read this section of the XML spec:
http://www.w3.org/TR/1998/REC-xml-19980210#sec-guessing
and that's just for detecting the handful of known encodings that the encoding
declaration itself might be written in.
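
To give a feel for what that section asks of a parser, here is a rough sketch
in Python of the first-bytes guessing it describes. The function name and the
returned labels are made up for illustration, and it only covers the common
byte patterns, so treat it as a sketch and not as a complete detector:

```python
def guess_encoding_family(head: bytes) -> str:
    """Guess an encoding family from the first bytes of an XML document.

    Hypothetical helper, loosely following the byte patterns listed in the
    "guessing" appendix of the XML 1.0 spec. It only narrows things down
    far enough to read the encoding declaration.
    """
    if head.startswith(b"\x00\x00\x00\x3c"):
        return "UCS-4 big-endian"
    if head.startswith(b"\x3c\x00\x00\x00"):
        return "UCS-4 little-endian"
    if head.startswith(b"\xfe\xff"):
        return "UTF-16 big-endian (BOM)"
    if head.startswith(b"\xff\xfe"):
        return "UTF-16 little-endian (BOM)"
    if head.startswith(b"\x00\x3c\x00\x3f"):
        return "UTF-16 big-endian (no BOM)"
    if head.startswith(b"\x3c\x00\x3f\x00"):
        return "UTF-16 little-endian (no BOM)"
    if head.startswith(b"\x3c\x3f\x78\x6d"):
        # "<?xm" in an ASCII-compatible encoding: the declaration can be
        # read as ASCII to find the real label.
        return "ASCII-compatible (read the declaration)"
    if head.startswith(b"\x4c\x6f\xa7\x94"):
        return "EBCDIC"
    # No recognizable pattern: fall back to UTF-8 without a declaration.
    return "UTF-8 (no declaration)"
```

Even after all of that, the parser has only learned enough to read the
declaration; it still has to understand the label it finds there.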

One example is a document that declares an encoding the parser does not
recognize. Since encoding declarations are just plain-text labels, the parser
might not recognize a label even if it supports the underlying encoding (say,
because it knows it under a different name). In any case, if the parser hits an
unrecognized encoding, it can't handle the rest of the document and has to
throw an exception. This can be worked around with some form of content
negotiation, but that has problems of its own.
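
A minimal sketch of that failure mode, in Python. The names decode_body and
UnsupportedEncoding are invented for this example; codecs.lookup is the
standard library call that resolves a label (including known aliases) to a
codec and raises LookupError when it has no match:

```python
import codecs


class UnsupportedEncoding(Exception):
    """Raised when a declared encoding label is not recognized (hypothetical)."""


def decode_body(raw: bytes, label: str) -> str:
    """Decode raw bytes using the declared label, or give up entirely."""
    try:
        # The label is just text; it only works if the codec registry
        # happens to know this exact name or one of its aliases.
        codec = codecs.lookup(label)
    except LookupError:
        # Nothing sensible can be done with the remaining bytes.
        raise UnsupportedEncoding(label) from None
    return raw.decode(codec.name)
```

With this sketch, decode_body(data, "ISO-8859-1") and decode_body(data,
"latin1") both work because the registry knows both names, while a label it
has never heard of stops processing of the whole document.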

There are many other issues, but just keep in mind the extra complexity that
letting the XML doc be encoded in various formats will bring. Standardizing on
just UTF-8 would be similar to TCP/IP protocols standardizing on network byte
order. It just makes programming so much simpler and error-resistant.
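
To illustrate the byte order comparison, here is a small Python sketch. The
helper names are made up for the example; the "!" prefix in the struct module
selects network (big-endian) byte order:

```python
import struct


def pack_length(n: int) -> bytes:
    """Encode a 32-bit unsigned length in network byte order."""
    return struct.pack("!I", n)


def unpack_length(data: bytes) -> int:
    """Decode a 32-bit unsigned length; no byte-order negotiation needed."""
    (n,) = struct.unpack("!I", data[:4])
    return n
```

Because both ends agree on the order in advance, the receiver never has to
guess or be told how the sender laid out its bytes. A single agreed encoding
for the XML stream would give the same property.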

arh14 at cornell.edu wrote:

> I think I've deduced that we agree entirely.  *Letting* the XML doc be
> encoded in various formats, while it doesn't necessarily help us now,
> doesn't hurt anything (as long as everybody reads the encoding header on
> the doc and complies).  This is separate from the encoding of the
> actual messages, which should always be allowed to be variable, and is
> facilitated by a concise message 'encoding="foo"' attribute.
>

--
"My new computer's got the clocks, it rocks
But it was obsolete before I opened the box" - W.A.Y.