[jdev] Re: VTD-XML version 1.6

Sat May 20 04:19:19 CDT 2006

On Sat May 20 05:56:19 2006, Justin Karneges wrote:
> On Friday 19 May 2006 20:39, Peter Saint-Andre wrote:
> > But it turns out that streaming XML has some inherent benefits, one
> > of which is that you don't have to create a new parser instance
> > every time you want to send, receive, or route a message.
> 
> More importantly, XMPP-specific parsing code doesn't need to be
> written.  Any other wire protocol would require writing a parser,
> but with XMPP you can just throw SAX at it.

Ah, you see I approached XMPP looking for the framing for the 
messages, because every other protocol I deal with has explicit 
framing for the messages.

So, I do string matches to pull out the stanzas, turn them into 
complete XML documents by wrapping them in the real <stream> opening 
tag and a faked </stream>, and use DOM on the resulting documents. In 
other words, I treat them as framed messages to pull out and parse, 
where the framing depends on the opening bytes (up to the first space 
or >). Maybe I'm weird, but it seems to work well. :-)
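
A minimal sketch of that approach in Python, for illustration only. 
The stream header, the function names, and the use of xml.dom.minidom 
are assumptions, not anything from a real implementation, and the 
scan is deliberately naive: it assumes no ">" inside attribute values 
and no whitespace inside the closing tag.

```python
import re
from xml.dom import minidom

# Hypothetical stream header; a real one is whatever the peer actually sent.
STREAM_OPEN = (
    "<stream:stream xmlns='jabber:client' "
    "xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>"
)
# Faked close: the real stream is still open on the wire.
STREAM_CLOSE = "</stream:stream>"


def extract_stanza(buffer):
    """Return (stanza_text, rest), or (None, buffer) if no full stanza yet."""
    buffer = buffer.lstrip()
    # The framing key is the opening bytes, up to the first space or ">".
    match = re.match(r"<([^\s/>]+)", buffer)
    if match is None:
        return None, buffer
    name = match.group(1)
    head_end = buffer.find(">")
    if head_end == -1:
        return None, buffer
    if buffer[head_end - 1] == "/":
        # Self-closing stanza such as <presence/>.
        return buffer[:head_end + 1], buffer[head_end + 1:]
    closing = "</" + name + ">"
    end = buffer.find(closing)
    if end == -1:
        return None, buffer
    end += len(closing)
    return buffer[:end], buffer[end:]


def parse_stanza(stanza_text):
    """Wrap one stanza in the stream tags and return its DOM element."""
    doc = minidom.parseString(STREAM_OPEN + stanza_text + STREAM_CLOSE)
    return doc.documentElement.firstChild
```

Called in a loop on the receive buffer, extract_stanza() hands back 
one stanza at a time and leaves any partial data for the next read.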

There's a potential problem where you end up finding a closing tag 
that's actually not closing the stanza, because of namespace 
redefinitions or whatever, but that's relatively easy to deal with, 
you just find the next candidate end-of-stanza tag. You get similar 
problems if you want to isolate messages in IMAP, too, where the 
framing changes depending on the type of message.
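
The retry on a false end-of-stanza tag might look roughly like the 
following Python sketch. It is an illustration under assumptions: the 
stream header and the frame_stanza() helper are hypothetical, and a 
failed DOM parse is used here as the signal that the candidate closing 
tag was not the real one.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Hypothetical stream header and faked close, as used for wrapping.
STREAM_OPEN = (
    "<stream:stream xmlns='jabber:client' "
    "xmlns:stream='http://etherx.jabber.org/streams' version='1.0'>"
)
STREAM_CLOSE = "</stream:stream>"


def frame_stanza(buffer, name):
    """Return (element, rest), or (None, buffer) if the stanza is incomplete.

    Tries each candidate closing tag in turn until the wrapped text parses.
    """
    closing = "</" + name + ">"
    search_from = 0
    while True:
        end = buffer.find(closing, search_from)
        if end == -1:
            # No further candidates: wait for more data.
            return None, buffer
        end += len(closing)
        candidate = buffer[:end]
        try:
            doc = minidom.parseString(STREAM_OPEN + candidate + STREAM_CLOSE)
        except ExpatError:
            # This closing tag belonged to something nested; try the next.
            search_from = end
            continue
        return doc.documentElement.firstChild, buffer[end:]
```

For example, a <message> carrying a nested element that is also 
called <message> fails on the first candidate and succeeds on the 
second.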

My favourite benefit to XML streams over XML messages, though, is 
that namespace declarations can be moved out of the messages and into 
the root element. That's very cool for octet-obsessives like me.
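
To put a rough number on that, here is a small Python sketch. The 
message content and the helper names are made up for illustration. It 
builds the same stream twice, once with the namespace declared on 
every message and once with it hoisted to the root, and checks that a 
parser sees identical namespaces either way.

```python
from xml.dom import minidom

NS_DECL = " xmlns='jabber:client'"


def build_stream(count, hoist):
    """Build a toy stream of `count` messages (hypothetical content)."""
    root_decl = NS_DECL if hoist else ""
    msg_decl = "" if hoist else NS_DECL
    message = "<message" + msg_decl + "><body>hi</body></message>"
    return (
        "<stream:stream xmlns:stream='http://etherx.jabber.org/streams'"
        + root_decl + ">" + message * count + "</stream:stream>"
    )


def message_namespaces(text):
    """Return the namespace URI the parser assigns to each message."""
    doc = minidom.parseString(text)
    return [m.namespaceURI for m in doc.getElementsByTagName("message")]


repeated = build_stream(3, hoist=False)
hoisted = build_stream(3, hoist=True)
# One declaration instead of three: the saving grows with every message.
saved = len(repeated) - len(hoisted)
```

The saving is one declaration's worth of octets for every message 
after the first, with no change to what the parser reports.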

(For compression people: although moving the namespace declarations 
further toward the root of the document tree to remove repetitions is 
simply a representational change, a compressor can't easily win the 
same saving back. The repetitions are spread across the whole life of 
the stream, so they tend to fall outside the reference limit of 
Ziv/Lempel type compressors, and the namespace strings themselves are 
long enough that statistical modelling compressors won't have a good 
enough effect on them. Also, because the namespace declaration 
strings tend to be self-similar, putting them all together makes them 
compress better, too.)

> Granted, I'm also one of those guys that "wouldn't have designed it
> that way", but I still think XML streams are cool in that geeky
> sort of way.  Look mom, no parser.

I think I probably would have gone for explicit framing, but I put 
that down to reflex rather than any particularly sound principles. I 
treat the data as if it has explicit framing anyway, so it doesn't 
really matter to me, and since different parsing techniques are 
possible, there's an advantage in letting the XML do the framing for 
you in the protocol.

> I agree with Peter though, talking about the rationale in 2006 is 
> kind of pointless.

Well, it's pointless from the point of view of XMPP, certainly, but 
it's interesting in a more philosophical, protocol-design kind of 
way. Which could be pointless, but may not be.

Dave.
-- 
Dave Cridland - mailto:dave at cridland.net - xmpp:dwd at jabber.org
  - acap://acap.dave.cridland.net/byowner/user/dwd/bookmarks/
  - http://dave.cridland.net/
Infotrope Polymer - ACAP, IMAP, ESMTP, and Lemonade