[jdev] parsing xml (xmpp) with ruby

Eric Will rakaur at malkier.net
Wed Oct 1 13:17:19 CDT 2008


On Wed, Oct 1, 2008 at 11:49 AM, Michal 'vorner' Vaner <vorner at ucw.cz> wrote:
> You don't get it. Sax does not need to load the whole document in
> memory. But it needs some information from the parent nodes (like depth,
> namespace declarations, etc). You can't start parsing from the middle.

It was working just fine this way. I changed it though. See below.

> That is the "more mess" I talk about. You need to set up the parser so
> it does not expect to reach the end of document and will wait for next
> data feed.

I could not do this. I had to change the REXML classes to allow me to
change their source (well, add to the same source). These methods did
not exist, so I extended their classes. Now each stream only creates
one parser ever, and adds on to its internal buffer. The buffer
contains the items that haven't been processed (i.e., it removes them
from the buffer as it consumes them). If there's something already in
the buffer, I add to it.
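
Roughly, the append-and-consume idea looks like the sketch below. This
is not the actual REXML extension: StreamBuffer and feed are made-up
names, and the block is only a stand-in for the parser, assumed to
return how many characters it managed to process.

```ruby
# Hypothetical per-stream buffer; one instance lives as long as the stream.
class StreamBuffer
  def initialize
    @pending = +""   # holds only data that has not been processed yet
  end

  # Append freshly read data, then let the block (a stand-in for the
  # parser) consume from the front. The block must return the number of
  # characters it processed; those are removed from the buffer.
  def feed(data)
    @pending << data
    consumed = yield(@pending)
    @pending.slice!(0, consumed)
    @pending
  end
end
```

Whatever the block could not process stays in the buffer, and the next
feed() adds on to it.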

> If your parser can not do something like this, then you are doomed and
> it won't work. At all (if it sometimes pretends to work, you are unlucky
> enough not to give you straight evidence it is broken).

This was exactly the case; however, with my hack above in place, it
works fine. I still have an issue, though. Now when a stanza is
incomplete but well formed (e.g., a missing end tag, or something) the
parser leaves it in its buffer, and waits for it to be added on to.
This works if the next read() (or read()s) finish that stanza. If
someone is manually sending XML, and never sends an end tag, it will
keep on adding to the buffer forever. What should I do about this? Set
a limit on the buffer? If I limit my read()s to 8192 bytes, should I
limit my parser's buffer to four or five times that? I'm not sure.
Just letting it add forever is a bad thing, as in, DoS.
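
A cap like that could be sketched as follows. The four-times figure is
just the guess from above, not a recommended value, and BoundedBuffer
and BufferOverflow are hypothetical names, not anything REXML provides.

```ruby
READ_SIZE  = 8192
MAX_BUFFER = 4 * READ_SIZE   # assumed cap: four reads' worth of data

# Hypothetical error raised when a client exceeds the cap.
class BufferOverflow < StandardError; end

class BoundedBuffer
  attr_reader :data

  def initialize(limit = MAX_BUFFER)
    @limit = limit
    @data  = +""
  end

  # Refuse to grow past the limit instead of buffering forever.
  def append(chunk)
    if @data.bytesize + chunk.bytesize > @limit
      raise BufferOverflow, "unparsed data would exceed #{@limit} bytes"
    end
    @data << chunk
  end
end
```

The caller would rescue BufferOverflow and drop the connection, so a
client that never sends an end tag can only tie up a bounded amount of
memory.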

The problem of receiving a half-stanza that's not well formed is still
here. In that case, it raises an exception. I only have two options,
it seems to me. One is to kick off the client, and one is to ignore
the exception and save the bad xml to the buffer and hope the next
read() fixes it. This is also a DoS problem: if that stanza never gets
fixed, it'll keep raising the exception, which will keep adding onto the
buffer, which will keep raising the exception...
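
One possible middle ground between the two options is to tolerate a few
failed parses in a row and kick the client after that. A minimal
sketch, assuming the parser's exception can be rescued; MalformedInput
is a stand-in for whatever it really raises, and StrikePolicy and the
limit of 3 are made up for illustration.

```ruby
# Stand-in for the exception the real parser raises on bad XML.
class MalformedInput < StandardError; end

# Hypothetical policy: allow a few consecutive failures, then give up.
class StrikePolicy
  def initialize(max_failures = 3)
    @max_failures = max_failures
    @failures     = 0
  end

  # Runs one parse attempt (the block).
  # Returns :ok, :retry (keep the data and wait for the next read),
  # or :disconnect (too many failures in a row).
  def attempt
    yield
    @failures = 0   # a successful parse clears the strikes
    :ok
  rescue MalformedInput
    @failures += 1
    @failures >= @max_failures ? :disconnect : :retry
  end
end
```

This bounds how many times the same bad data can raise the exception,
though the buffer itself would still need its own size limit.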

> --
> If it works, fix it.
>
> Michal 'vorner' Vaner

-- Eric Will // rakaur --


