[JDEV] XML Requirements for Parsing Jabber Messages

Tijl Houtbeckers thoutbeckers at splendo.com
Mon Nov 4 07:41:02 CST 2002


Tony Cheung <dragonman at asiayeah.com> wrote on 3-11-2002 1:27:20:
>
>Hi All,
>
>I would like to know if there is any specific requirement for parsing 
>Jabber messages in XML? I would either make my own XML parser or use a 
>third party XML parser.
>

Jabber uses XML-streams:

http://www.jabber.org/protocol/xmlstreams.html

>Specifically,
>
>1) Can I transmit BIG5-encoded strings in the XML messages? Or should
>   I only use Unicode with UTF-8 encoding?

From http://www.w3.org/TR/REC-xml:
All XML processors must accept the UTF-8 and UTF-16 encodings of 10646. 
In the XML-streams documentation there is no mention of UTF-16 not 
being supported. 

However, the open-source jabber.org server only accepts UTF-8. The 
jabber.com server does work with UTF-16. 
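
A minimal sketch of what that means for BIG5 text, assuming you stick to 
UTF-8 because it is the one encoding both servers accept: convert the 
BIG5 bytes to UTF-8 before they go into the stream. The function names 
and the example address are made up for illustration, they are not part 
of any Jabber library.

```python
from xml.sax.saxutils import escape, quoteattr


def big5_to_utf8(raw: bytes) -> bytes:
    """Decode Big5 bytes and re-encode the same text as UTF-8."""
    return raw.decode("big5").encode("utf-8")


def build_message(to: str, body_big5: bytes) -> bytes:
    """Build a <message/> element as UTF-8 bytes (hypothetical helper).

    The body arrives as Big5 bytes, is decoded to text, escaped with the
    predefined entities, and the whole element is encoded as UTF-8.
    """
    body = body_big5.decode("big5")
    stanza = "<message to=%s><body>%s</body></message>" % (
        quoteattr(to),
        escape(body),
    )
    return stanza.encode("utf-8")
```

Decoding to text first and encoding once at the end keeps the escaping 
step independent of the byte encoding.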

>
>2) Is there any requirement for named entities?
>3) Is there any requirement for handling DTDs?
>4) Does Jabber require processing instructions?

From the XML-streams documentation:

Restrictions

XML streams are used to transport a subset of XML. Specifically, XML 
streams SHOULD NOT contain processing instructions, non-predefined 
entities (as defined in Section 4.6 of the XML 1.0 specification, 
comments, or DTDs. Any such XML data SHOULD be ignored. 

(which is apparently missing a closing ")")
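
The restrictions above can be sketched with an off-the-shelf 
non-validating parser. This is only an illustration, assuming Python's 
expat binding: the class name and the event tuples are my own, and 
comments and processing instructions are ignored simply because no 
handler is registered for them. Data is fed in chunks, as it would 
arrive from a socket.

```python
import xml.parsers.expat


class StreamReader:
    """Collects start/end events from an XML stream fed in chunks."""

    def __init__(self):
        self.events = []
        self._text = []
        # No namespace_separator is passed, so prefixes such as
        # "stream:" are kept as part of the element name.
        self._parser = xml.parsers.expat.ParserCreate()
        self._parser.StartElementHandler = self._start
        self._parser.EndElementHandler = self._end
        self._parser.CharacterDataHandler = self._text.append
        # No CommentHandler or ProcessingInstructionHandler is set,
        # so comments and processing instructions are dropped.

    def _start(self, name, attrs):
        self._text.clear()
        self.events.append(("start", name, attrs))

    def _end(self, name):
        # expat may deliver text in several pieces; join them here.
        self.events.append(("end", name, "".join(self._text)))
        self._text.clear()

    def feed(self, chunk):
        # False: more data may follow, the stream is not finished.
        self._parser.Parse(chunk, False)
```

Usage: create one StreamReader per connection and call feed() with 
whatever was read from the socket. A chunk may end in the middle of a 
tag, the parser keeps its state between calls.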

>5) What about XML comments or CDATA sections?

With the open-source jabber.org server you can send CDATA sections to 
the server, which will "canonicalize" them before it passes them on to 
other clients, so they are replaced by their PCDATA equivalent. 
However, I have not been able to find anything about this in the docs, 
so other servers might not do the same. 
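
To show what "PCDATA equivalent" means here, a rough sketch that 
rewrites CDATA sections as escaped character data. This is not the 
server's code, just my own illustration of the equivalence; the 
function name is made up and the regular expression only covers plain 
CDATA sections in otherwise well-formed input.

```python
import re
from xml.sax.saxutils import escape

# Matches one CDATA section; DOTALL lets the content span lines.
_CDATA = re.compile(r"<!\[CDATA\[(.*?)\]\]>", re.DOTALL)


def cdata_to_pcdata(xml_text: str) -> str:
    """Replace each CDATA section by the same text with &, < and > escaped."""
    return _CDATA.sub(lambda m: escape(m.group(1)), xml_text)
```

A client that only ever receives the escaped form never has to parse a 
CDATA section, which is presumably the point of doing this on the 
server.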

I also haven't tested whether it does this for XML that comes from a 
component rather than from another client. I do vaguely remember 
something about a client crashing because of CDATA in groupchat; maybe 
it's in the archives somewhere. Or maybe it had something to do with 
the component itself receiving CDATA? 

Does anyone know more about how exactly CDATA is treated in the server 
and what the official handling should be? 

>
>6) Do we need strict XML validation?
>

The Jabber Server is not allowed to send you something that's not XML. 
It's not called *near* real time message delivery for nothing ;) It 
does (or should) validate. So you don't need a validating parser. 

Then there is one point you didn't touch: namespaces. While something 
called "namespaces" is being used in the protocol, it isn't entirely in 
line with the W3C recommendation. This actually makes things simpler, 
though, because your parser doesn't have to support them. 
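
One way to read this, sketched with Python's expat binding with 
namespace processing left off: "xmlns" is then just another attribute, 
and you can dispatch on the pair (element name, xmlns value). The 
function name is hypothetical; jabber:iq:roster is used only as a 
familiar example value.

```python
import xml.parsers.expat


def element_namespaces(xml_text):
    """Return (element name, xmlns attribute or None) for each start tag."""
    found = []
    # ParserCreate() without namespace_separator does no namespace
    # processing, so xmlns shows up in the attribute dictionary.
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        found.append((name, attrs.get("xmlns")))

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)  # True: this is the complete document
    return found
```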

>I understand Jabber wants Jabber clients to be extremely simple. But I 
>think XML parsing alone is not a simple thing, unless we are only 
>using a subset of XML syntax or we simply use a full-blown XML 
>parser. However, on some handheld devices, such as J2ME, it may not 
>always be feasible to get a full-blown XML parser.
>
>Thank you very much. Any idea?

As you can see, XML-streams are less complex than XML itself, so it's 
not that hard to write your own parser if you know what you're doing. 
However, I recommend you only do this if you have specific needs, since 
there are already many libraries out there written for this exact 
purpose. 

-- 
Tijl Houtbeckers
Java/J2ME/GPRS Software Engineer @ Splendo
The Netherlands




