[jdev] Help parsing incremental XML

Dr. Craig Hollabaugh craig at hollabaugh.com
Mon Mar 29 13:03:27 CST 2004


Iain,

Thanks for the implementation/real-world explanation of SAX parser
operation. From what you've said here, real SAX implementations are far
from the simplistic coverage in many XML books.

Learned something new today, thanks!
Craig
 



On Mon, 2004-03-29 at 11:53, Iain Shigeoka wrote:
> On Mar 27, 2004, at 9:07 AM, Craig Hollabaugh wrote:
> 
> > Having intermediate callbacks is the main reason why people
> > use a SAX parser. So that is an implementation issue with
> > .NET's SAX parser.
> 
> It's actually an implementation "feature" of most SAX parsers. Java's
> most popular SAX parsers follow the same pattern: Crimson, Xerces, and
> the built-in SAX parser in 1.4+ (which is Crimson, isn't it?). In any
> case, for efficiency, most SAX parsers read in a bufferload of input
> before parsing it and generating events. Until the buffer is filled,
> the SAX parser blocks, even if there are complete event tokens already
> in the buffer. With XMPP this is obviously a problem. The buffers are
> almost always bigger than an XMPP packet, and XMPP relies on each
> packet being processed before the next is sent. So you get 'stuck'
> parsers.
> 
> On open source parsers, you can dig into the source code and modify the 
> parser to use a 1 character buffer. You'll probably want to buffer the 
> reader before handing it to such a parser or your performance will go 
> through the floor.
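
A rough sketch of the "buffer the reader, then hand it over one character at a time" idea, done from the outside rather than by patching the parser. OneCharReader is a made-up name, not part of any parser library. It only helps with parsers that process whatever a single read() call returns; a parser that loops until its own buffer is full would still need the source change described above.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper: the underlying reader is buffered for speed,
// but each bulk read() hands the caller at most one character.
class OneCharReader extends FilterReader {

    OneCharReader(Reader source) {
        // Buffer the real source so single-character reads stay cheap.
        super(new BufferedReader(source));
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int c = in.read();      // one character from the buffered source
        if (c == -1) {
            return -1;          // end of stream
        }
        cbuf[off] = (char) c;
        return 1;               // never more than one char per call
    }

    public static void main(String[] args) throws IOException {
        Reader r = new OneCharReader(new StringReader("<iq/>"));
        char[] buf = new char[64];
        int n;
        while ((n = r.read(buf, 0, buf.length)) != -1) {
            System.out.println(n + " char: " + buf[0]);
        }
    }
}
```

In real use the StringReader would be replaced by an InputStreamReader over the socket's input stream.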
> 
> Java has pull parsers available that get around the problem, and I find
> them much better suited to the streaming XML found in XMPP. Pull parsers
> are the mirror image of push parsers like SAX: in pull parsing,
> events are pulled from the parser by calling methods on the parser when
> you're ready for the next token, as opposed to SAX, where the events
> are pushed to you in callbacks. For XMPP, you can simplify your logic
> by handing the parser over to specialized event consumers based on the
> first tag (e.g. see an iq tag, give the parser to an iq handler to read
> and handle, etc). The pull parser I've used quite a lot is at
> http://www.xmlpull.org; it has the benefit of being open source,
> small, and wicked fast. BEA is chairing a JCP committee to establish a
> Java pull parsing standard (named StAX). You can find it at BEA's site
> or the JCP site (search for StAX).
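
A minimal sketch of the "hand the parser to a handler based on the first tag" pattern. It uses the StAX API (javax.xml.stream, which later shipped with the JDK) rather than the xmlpull.org API; the calls are named differently there but the loop has the same shape. StanzaDispatcher, its handler methods, and the log format are made up for illustration, and a String stands in for the socket stream.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical dispatcher: pulls events and passes the parser to a
// handler chosen by the element name of each top-level stanza.
class StanzaDispatcher {
    private final List<String> log = new ArrayList<>();

    void run(XMLStreamReader parser) throws XMLStreamException {
        while (parser.hasNext()) {
            if (parser.next() != XMLStreamConstants.START_ELEMENT) {
                continue;
            }
            String name = parser.getLocalName();
            if ("iq".equals(name)) {
                handleIq(parser);
            } else if ("message".equals(name)) {
                handleMessage(parser);
            }
            // Anything else (e.g. the stream root): keep pulling.
        }
    }

    // Each handler consumes events up to and including its own end tag.
    private void handleIq(XMLStreamReader parser) throws XMLStreamException {
        String type = parser.getAttributeValue(null, "type");
        int depth = 1;
        while (depth > 0) {
            int event = parser.next();
            if (event == XMLStreamConstants.START_ELEMENT) depth++;
            else if (event == XMLStreamConstants.END_ELEMENT) depth--;
        }
        log.add("iq type=" + type);
    }

    private void handleMessage(XMLStreamReader parser) throws XMLStreamException {
        String body = null;
        int depth = 1;
        while (depth > 0) {
            int event = parser.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                if ("body".equals(parser.getLocalName())) {
                    body = parser.getElementText(); // also consumes </body>
                } else {
                    depth++;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
        log.add("message body=" + body);
    }

    // Convenience wrapper for trying the dispatcher on an in-memory document.
    static List<String> dispatch(String xml) {
        try {
            XMLStreamReader parser = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            StanzaDispatcher dispatcher = new StanzaDispatcher();
            dispatcher.run(parser);
            return dispatcher.log;
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(dispatch(
            "<stream><iq type='get'><query/></iq>"
            + "<message><body>hi</body></message></stream>"));
    }
}
```

Because each handler asks for the next event only when it is ready, the control flow reads top to bottom instead of being spread across SAX callbacks.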
> 
> Finally, since you're working with Java, I have to push Smack: Java,
> open source, simple, extensible, small, and uses XML pull parsing under
> the covers. You may want to check it out. It makes working with XMPP in
> Java a breeze: http://www.jivesoftware.com/xmpp/smack
> 
> -iain
> 
> _______________________________________________
> jdev mailing list
> jdev at jabber.org
> https://jabberstudio.org/mailman/listinfo/jdev
-- 
------------------------------------------------------------
Dr. Craig Hollabaugh, craig at hollabaugh.com
Author of Embedded Linux: Hardware, Software and Interfacing
www.embeddedlinuxinterfacing.com




