[jdev] Help parsing incremental XML
Iain Shigeoka
iain at jivesoftware.com
Mon Mar 29 12:53:46 CST 2004
On Mar 27, 2004, at 9:07 AM, Craig Hollabaugh wrote:
> Having intermediate callbacks is the main reason why people
> use a SAX parser. So that is an implementation issue with
> .NET's SAX parser.
It's actually an implementation "feature" of most SAX parsers. Java's
most popular SAX parsers follow the same pattern: Crimson, Xerces, and
the built-in SAX parser in 1.4+ (which is Crimson, isn't it?). In any
case, for efficiency, most SAX parsers read a buffer-load of input
before parsing it and generating events. Until the buffer is filled,
the SAX parser blocks, even if there are complete event tokens already
in the buffer. With XMPP this is obviously a problem. The buffers are
almost always bigger than an XMPP packet, and XMPP relies on each
packet being processed before the next is sent. So you get 'stuck'
parsers.
With open source parsers, you can dig into the source code and modify the
parser to use a 1-character buffer. You'll probably want to buffer the
reader before handing it to such a parser, or your performance will go
through the floor.
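
To illustrate why buffering the reader matters, here is a rough sketch. It
does not patch a real parser: drainOneCharAtATime is a hypothetical stand-in
for a parser with a 1-character buffer, and CountingReader is a hypothetical
wrapper that counts how many read calls reach the underlying source (in real
life, the socket). Only java.io classes are used.

```java
import java.io.BufferedReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class BufferedFeedSketch {

    /** Hypothetical wrapper: counts read calls that reach the wrapped reader. */
    static class CountingReader extends FilterReader {
        int calls = 0;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(char[] buf, int off, int len) throws IOException {
            calls++;
            return super.read(buf, off, len);
        }
    }

    /** Stand-in for a parser with a 1-character buffer: one char per call. */
    static int drainOneCharAtATime(Reader reader) {
        try {
            int chars = 0;
            while (reader.read() != -1) {
                chars++;
            }
            return chars;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns how many read calls hit the underlying source. */
    static int underlyingCalls(String xml, boolean buffered) {
        CountingReader source = new CountingReader(new StringReader(xml));
        Reader feed = buffered ? new BufferedReader(source) : source;
        drainOneCharAtATime(feed);
        return source.calls;
    }

    public static void main(String[] args) {
        String xml = "<message><body>hello</body></message>";
        System.out.println("unbuffered calls: " + underlyingCalls(xml, false));
        System.out.println("buffered calls:   " + underlyingCalls(xml, true));
    }
}
```

Without the BufferedReader, every character costs one call on the source.
With it, the source is hit only a handful of times, and since a
BufferedReader hands over whatever has arrived rather than waiting for its
buffer to fill, it should not bring back the 'stuck' parser problem.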
Java has pull parsers available that get around the problem, and I find
they are much better suited to the streaming XML found in XMPP. Pull
parsers are the mirror image of push parsers like SAX: in pull parsing,
events are pulled from the parser by calling methods on the parser when
you're ready for the next token, as opposed to SAX, where the events
are pushed to you in callbacks. For XMPP, you can simplify your logic
by handing the parser over to specialized event consumers based on the
first tag (e.g. on seeing an iq tag, give the parser to an iq handler to
read and handle, etc.). The pull parser I've used quite a lot is
http://www.xmlpull.org, which has the benefit of being open source,
small, and wicked fast. BEA is chairing a JCP committee to establish a
Java pull parsing standard (named StAX). You can find it at BEA's site
or the JCP site (search for StAX).
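
As a minimal sketch of the "hand the parser to a handler based on the first
tag" idea, the following uses the StAX API (javax.xml.stream) as it later
shipped with the JDK, not the xmlpull.org API itself. The class and handler
names (PullDispatchSketch, handleIq, handleMessage) are made up for
illustration, the input is a simplified, un-namespaced stand-in for an XMPP
stream, and it reads from a string rather than a live socket.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullDispatchSketch {

    // Hypothetical iq handler: reads the id attribute, then pulls to </iq>.
    static String handleIq(XMLStreamReader parser) throws XMLStreamException {
        String id = parser.getAttributeValue(null, "id");
        skipToEndOfCurrentElement(parser);
        return "iq " + id;
    }

    // Hypothetical message handler: collects character data up to </message>.
    static String handleMessage(XMLStreamReader parser) throws XMLStreamException {
        StringBuilder text = new StringBuilder();
        int depth = 1;
        while (depth > 0) {
            int event = parser.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            } else if (event == XMLStreamConstants.CHARACTERS) {
                text.append(parser.getText());
            }
        }
        return "message " + text;
    }

    // Pulls events until the end tag matching the current start tag.
    static void skipToEndOfCurrentElement(XMLStreamReader parser) throws XMLStreamException {
        int depth = 1;
        while (depth > 0) {
            int event = parser.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                depth--;
            }
        }
    }

    // Main loop: we ask for the next event only when we are ready for it.
    static List<String> dispatch(String xml) {
        List<String> handled = new ArrayList<>();
        try {
            XMLStreamReader parser = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            boolean insideStream = false;
            while (parser.hasNext()) {
                if (parser.next() != XMLStreamConstants.START_ELEMENT) {
                    continue;
                }
                if (!insideStream) {
                    insideStream = true; // the root element opens the stream
                    continue;
                }
                // Each handler consumes its whole packet, so every start
                // tag seen here is the first tag of a new packet.
                String tag = parser.getLocalName();
                if ("iq".equals(tag)) {
                    handled.add(handleIq(parser));
                } else if ("message".equals(tag)) {
                    handled.add(handleMessage(parser));
                } else {
                    skipToEndOfCurrentElement(parser);
                }
            }
            parser.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
        return handled;
    }

    public static void main(String[] args) {
        String xml = "<stream><iq id=\"a1\" type=\"get\"><query/></iq>"
                + "<message id=\"m1\"><body>hello</body></message></stream>";
        System.out.println(dispatch(xml));
    }
}
```

The point of the design is that each handler owns the parser until its
packet's end tag, so the main loop never needs to track state for a
half-read packet the way a set of SAX callbacks would.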
Finally, since you're working with Java, I have to push Smack: Java,
open source, simple, extensible, small, and it uses XML pull parsing under
the covers. You may want to check it out. It makes working with XMPP in
Java a breeze: http://www.jivesoftware.com/xmpp/smack
-iain