[jdev] Help parsing incremental XML

Iain Shigeoka iain at jivesoftware.com
Mon Mar 29 12:53:46 CST 2004


On Mar 27, 2004, at 9:07 AM, Craig Hollabaugh wrote:

> Having intermediate callbacks is the main reason why people
> use a SAX parser. So that is an implementation issue with
> .NET's SAX parser.

It's actually an implementation "feature" of most SAX parsers. Java's
most popular SAX parsers follow the same pattern: Crimson, Xerces, and
the built-in SAX parser in 1.4+ (which is Crimson, isn't it?). In any
case, for efficiency, most SAX parsers read a buffer load of input
before parsing it and generating events. Until the buffer is filled,
the SAX parser blocks, even if there are complete event tokens already
in the buffer. With XMPP this is obviously a problem. The buffers are
almost always bigger than an XMPP packet, and XMPP relies on each
packet being processed before the next is sent. So you get 'stuck'
parsers.
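
For reference, here is a minimal sketch of the push (callback) style
being discussed, using the JAXP SAX API. The class and method names
(PushDemo, elementNames) are made up for illustration. It parses a
complete in-memory string, so it does not show the blocking itself,
only where the callbacks arrive.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of any library.
class PushDemo {

    /** Collects element names in the order the parser pushes them to us. */
    static List<String> elementNames(String xml) throws Exception {
        List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The parser decides when this runs; we only react.
                names.add(qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return names;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementNames("<stream><iq/><message/></stream>"));
    }
}
```

On a socket, whether startElement fires as soon as a stanza arrives
depends on how the parser fills its input buffer, which is the problem
described above.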

With open source parsers, you can dig into the source code and modify
the parser to use a 1-character buffer. You'll probably want to wrap
the underlying reader in a buffered reader before handing it to such a
parser, or your performance will go through the floor.
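
A related trick that avoids patching the parser is to wrap the input
in a Reader that never returns more than one character per read call.
This is a different technique from changing the parser's own buffer
size, and it only helps if the parser processes whatever a single read
returns; a parser that loops until its buffer is full would still
block. OneCharReader is a hypothetical name. A minimal sketch:

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper: hands the caller at most one character per read.
class OneCharReader extends FilterReader {

    OneCharReader(Reader in) {
        // Buffer underneath so each 1-char read is served from memory
        // rather than hitting the underlying stream every time.
        super(new BufferedReader(in));
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        // Ask for a single character regardless of how much room the
        // caller's buffer has.
        return super.read(cbuf, off, 1);
    }

    public static void main(String[] args) throws IOException {
        Reader r = new OneCharReader(new StringReader("<iq/>"));
        char[] buf = new char[8192];
        int n = r.read(buf, 0, buf.length);
        System.out.println(n + " char(s) read, first is " + buf[0]);
    }
}
```

You would pass new OneCharReader(socketReader) to the parser in place
of the raw reader.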

Java has pull parsers available that get around the problem, and I
find them much better suited to the streaming XML found in XMPP. Pull
parsers are the mirror image of push parsers like SAX: in pull parsing,
events are pulled from the parser by calling methods on it when you're
ready for the next token, as opposed to SAX, where the events are
pushed to you in callbacks. For XMPP, you can simplify your logic by
handing the parser over to specialized event consumers based on the
first tag (e.g. on seeing an iq tag, give the parser to an iq handler
to read and handle, etc.). The pull parser I've used quite a lot is at
http://www.xmlpull.org, which has the benefit of being open source,
small, and wicked fast. BEA is chairing a JCP committee to establish a
Java pull parsing standard (named StAX). You can find it at BEA's site
or the JCP site (search for StAX).
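
The "hand the parser to a handler based on the first tag" idea can be
sketched as follows. This uses the javax.xml.stream (StAX) API as it
later shipped in the JDK, not the xmlpull.org API, and the class and
handler names (PullDispatch, handleIq, handleMessage) are made up for
illustration. The sample input is simplified and leaves out XMPP
namespaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class, not part of any library.
class PullDispatch {

    /** Pulls events and hands the parser to a handler chosen by tag name. */
    static List<String> dispatch(String xml) throws XMLStreamException {
        XMLStreamReader parser = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> handled = new ArrayList<>();
        while (parser.hasNext()) {
            if (parser.next() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            String tag = parser.getLocalName();
            if (tag.equals("iq")) {
                handled.add(handleIq(parser));
            } else if (tag.equals("message")) {
                handled.add(handleMessage(parser));
            }
            // Anything else (e.g. the stream root): keep pulling.
        }
        parser.close();
        return handled;
    }

    /** Reads the type attribute, then consumes events up to the matching end tag. */
    static String handleIq(XMLStreamReader parser) throws XMLStreamException {
        String type = parser.getAttributeValue(null, "type");
        int depth = 1;
        while (depth > 0 && parser.hasNext()) {
            int event = parser.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        return "iq:" + type;
    }

    /** Collects the body text, then stops at the closing message tag. */
    static String handleMessage(XMLStreamReader parser) throws XMLStreamException {
        StringBuilder body = new StringBuilder();
        while (parser.hasNext()) {
            int event = parser.next();
            if (event == XMLStreamConstants.START_ELEMENT
                    && parser.getLocalName().equals("body")) {
                body.append(parser.getElementText());
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && parser.getLocalName().equals("message")) {
                break;
            }
        }
        return "message:" + body;
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(dispatch("<stream><iq type='get'><query/></iq>"
                + "<message><body>hi</body></message></stream>"));
    }
}
```

Each handler owns the parser until its stanza's end tag, so the main
loop never has to track per-stanza state the way a SAX handler would.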

Finally, since you're working with Java, I have to plug Smack. It's
Java, open source, simple, extensible, small, and uses XML pull parsing
under the covers. You may want to check it out; it makes working with
XMPP in Java a breeze: http://www.jivesoftware.com/xmpp/smack

-iain



