[JDEV] International Char Sets

Thomas D. Charron tcharron at my-deja.com
Thu Jul 29 12:42:50 CDT 1999


  I think these questions from the expat FAQ may shed some additional light on this conversation.

---
How can I get expat to deal with non-ASCII characters?

By default, expat assumes that documents are encoded in UTF-8. In UTF-8, ASCII characters are represented by a single byte, exactly as they would be in ASCII, but non-ASCII characters are represented by a sequence of two or more bytes, all with the 8th bit set. The encoding most widely used for European languages is ISO 8859-1, which is not compatible with UTF-8. To use this encoding, expat must be told about it, either by passing "iso-8859-1" as the argument to XML_ParserCreate, or by starting the document with <?xml version="1.0" encoding="iso-8859-1"?>.
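To make the incompatibility concrete: every ISO 8859-1 byte value equals its Unicode code point (U+0000 to U+00FF), so bytes below 0x80 are the same in both encodings, while each byte from 0x80 up becomes a two-byte UTF-8 sequence. The following is a minimal standalone sketch of that conversion (it does not use expat; `latin1_to_utf8` is a hypothetical helper name chosen for illustration):

```c
#include <stddef.h>

/* Hypothetical helper: convert ISO-8859-1 bytes to UTF-8.
   Because each Latin-1 byte equals its Unicode code point, bytes below
   0x80 are copied unchanged and the rest become two UTF-8 bytes.
   The caller must supply an output buffer of at least 2 * len bytes.
   Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len, unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        if (c < 0x80) {
            out[n++] = c;                                  /* ASCII: one byte, 8th bit clear */
        } else {
            out[n++] = (unsigned char)(0xC0 | (c >> 6));   /* lead byte 110xxxxx */
            out[n++] = (unsigned char)(0x80 | (c & 0x3F)); /* continuation byte 10xxxxxx */
        }
    }
    return n;
}
```

For example, the Latin-1 byte 0xE9 ("é") comes out as the UTF-8 pair 0xC3 0xA9, which is why a Latin-1 document fed to a parser expecting UTF-8 produces errors rather than the intended characters.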

What encodings does expat support?

expat has built-in support for the following encodings:

utf-8
utf-16
iso-8859-1
us-ascii

Additional encodings can be supported by registering a handler with XML_SetUnknownEncodingHandler.
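In expat, the handler registered with XML_SetUnknownEncodingHandler receives the encoding name and fills in an XML_Encoding structure whose `map` member has 256 entries; for a simple single-byte encoding, each entry is just the Unicode code point for that byte. The sketch below builds such a table for ISO-8859-15 (Latin-9), which is not one of the built-in encodings and differs from ISO-8859-1 in only eight positions. It is deliberately standalone and does not link against expat; `build_latin9_map` and `decode_single_byte` are hypothetical names, and wiring the table into a real handler (copying it into `info->map` and reporting success) is left as an assumption to check against the expat documentation.

```c
#include <stddef.h>

/* Hypothetical helper: fill a 256-entry byte-to-code-point table for
   ISO-8859-15. It starts from the ISO-8859-1 identity mapping and then
   overrides the eight positions where Latin-9 differs. */
void build_latin9_map(int map[256])
{
    for (int b = 0; b < 256; b++)
        map[b] = b;          /* Latin-1: byte value == code point */
    map[0xA4] = 0x20AC;      /* EURO SIGN */
    map[0xA6] = 0x0160;      /* LATIN CAPITAL LETTER S WITH CARON */
    map[0xA8] = 0x0161;      /* LATIN SMALL LETTER S WITH CARON */
    map[0xB4] = 0x017D;      /* LATIN CAPITAL LETTER Z WITH CARON */
    map[0xB8] = 0x017E;      /* LATIN SMALL LETTER Z WITH CARON */
    map[0xBC] = 0x0152;      /* LATIN CAPITAL LIGATURE OE */
    map[0xBD] = 0x0153;      /* LATIN SMALL LIGATURE OE */
    map[0xBE] = 0x0178;      /* LATIN CAPITAL LETTER Y WITH DIAERESIS */
}

/* Hypothetical helper: decode a single-byte-encoded buffer into Unicode
   code points by table lookup. `out` must hold `len` entries. */
size_t decode_single_byte(const int map[256], const unsigned char *in,
                          size_t len, unsigned long *out)
{
    for (size_t i = 0; i < len; i++)
        out[i] = (unsigned long)map[in[i]];
    return len;
}
```

With this table, the byte 0xA4 decodes to U+20AC (the euro sign) instead of U+00A4 as it would under ISO-8859-1, while all other bytes decode as in Latin-1.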


---
Thomas Charron

