[jdev] Re: Re: Re: Parsing XMPP/Jabber protocol
Heiner Wolf
wolf at bluehands.de
Mon Jan 3 12:49:16 CST 2005
Hi,
>You still have to manage the buffer/frame coming off the socket while
>building the DOM tree. The natural solution seems to be to layer a
>framing mechanism on top of SAX to manage the network i/o possibly
>exposing a pull style API. SAX may look like a streaming API but it is
>really designed for conserving memory footprint while crunching through
>documents that are "to hand" locally. It's not designed for network data
>streams. I imagine life would be simpler for your fragment approach if
>you were not modeling the entire message stream as a single XML document
>and were working with discrete XML documents instead of child nodes. I
>definitely would not want to be holding onto full DOM trees or some such
>while managing thousands of IM conversations.
In Jabber it is almost guaranteed that stanzas are small relative to the entire stream data volume. My fragment API simply parses all data that comes from the socket. There is no buffering between the socket and the parser; the SAX parser buffers anyway. When I get the SAX callbacks, my fragment parser creates DOM nodes. Once the fragment parser receives endElement() from SAX, it calls the fragment callback with the node. The callback implementation then decides whether the node is discarded or added to the node one level higher. For Jabber's first-level nodes (stanzas), it always discards the node after evaluating it, which keeps the memory footprint very small. The memory holds:
- the initial <stream:stream> as a data structure,
- the current stanza as a data structure,
- the raw XML that has not been parsed yet.
The XML flows through and creates small node structures, which are deleted soon after. Nothing accumulates in memory: certainly nothing that you could call full DOM trees, and nothing that would hurt with thousands of streams.
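To make the flow concrete, here is a rough sketch of the idea in Python, layered on the standard library's incremental SAX parser. It is not my actual fragment API: the names Node, FragmentHandler and on_fragment are made up for illustration, and the example stream is invented. The callback returns True to attach a node to its parent and False to discard it.

```python
import xml.sax
from xml.sax.handler import ContentHandler


class Node:
    """Small element structure; a hypothetical stand-in for a DOM node."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.children = []
        self.text = ""


class FragmentHandler(ContentHandler):
    """Builds nodes from SAX events and reports each completed element."""

    def __init__(self, on_fragment):
        super().__init__()
        # on_fragment(node, depth) -> bool: True keeps the node, False drops it
        self.on_fragment = on_fragment
        self.stack = []  # currently open elements, outermost first

    def startElement(self, name, attrs):
        self.stack.append(Node(name, attrs))

    def characters(self, content):
        if self.stack:
            self.stack[-1].text += content

    def endElement(self, name):
        node = self.stack.pop()
        depth = len(self.stack)  # 1 means a direct child of <stream:stream>
        if self.on_fragment(node, depth) and self.stack:
            self.stack[-1].children.append(node)


stanzas = []


def on_fragment(node, depth):
    if depth == 1:
        # A stanza: evaluate it, then discard it so the stream root never grows.
        stanzas.append((node.name, [child.name for child in node.children]))
        return False
    return True  # deeper nodes are attached to their parent


handler = FragmentHandler(on_fragment)
parser = xml.sax.make_parser()
parser.setContentHandler(handler)

# Data is fed exactly as it arrives from the socket, in arbitrary chunks.
chunks = [
    "<stream:stream xmlns:stream='http://etherx.jabber.org/streams' "
    "xmlns='jabber:client'>",
    "<message to='a@example.org'><bo",
    "dy>hi</body></message><presence/>",
]
for chunk in chunks:
    parser.feed(chunk)

print(stanzas)
```

After the loop the handler's stack holds only the still open <stream:stream> node, and that node has no children, because every stanza was dropped right after the callback looked at it.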
SAX is really odd for applications. SAX needs FAX to be useful. I just wondered why the decision is always between SAX and DOM, although SAX is not exactly what programmers want for stream parsing.
Regards
hw
--
Dr. Klaus H. Wolf
bluehands GmbH & Co.mmunication KG
http://www.bluehands.de/people/hw
+49 (0721) 16108 75
--
Jabber enabled Virtual Presence on the Web: http://www.lluna.de/
Open Source Future History: http://www.galactic-developments.com/