[jdev] parsing xml (xmpp) with ruby
Michal 'vorner' Vaner
vorner at ucw.cz
Wed Oct 1 10:49:27 CDT 2008
Hello
On Wed, Oct 01, 2008 at 11:33:44AM -0400, Eric Will wrote:
> On Wed, Oct 1, 2008 at 11:15 AM, Michal 'vorner' Vaner <vorner at ucw.cz> wrote:
>
> > If you take <stream thenamespace etc><first stanza/> and put it into
> > the first parser, then <second stanza/><third stanza> into a second and
> > </third stanza> into another, you get a mess and not data. Or do you
> > reuse it in some other way that I do not get?
>
> I'm using a SAX parser. It doesn't care about the structure of the
> overall document. I build the nodes by myself, a tag at a time.
You don't get it. SAX does not need to load the whole document into
memory, but it does need some information from the parent nodes (such as
depth and namespace declarations). You can't start parsing from the middle.
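To illustrate the point about namespace declarations, here is a minimal
sketch. It uses Python's standard xml.sax module as a stand-in, not REXML
or Qt, and the names TagRecorder and feed_one_parser are made up for the
example. It feeds the same stanza bytes to a parser that saw the stream
header and to fresh parsers that did not:

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces


class TagRecorder(ContentHandler):
    """Collects the (namespace URI, local name) pair of every start tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElementNS(self, name, qname, attrs):
        self.tags.append(name)


def feed_one_parser(chunks):
    """Feed every chunk to a single namespace-aware parser; return the tags."""
    recorder = TagRecorder()
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    parser.setContentHandler(recorder)
    for chunk in chunks:
        parser.feed(chunk)
    return recorder.tags


STREAM_OPEN = ("<stream:stream xmlns='jabber:client' "
               "xmlns:stream='http://etherx.jabber.org/streams'>")
MESSAGE = "<message><body>hi</body></message>"
FEATURES = "<stream:features/>"

# The parser that saw the stream header knows which namespace applies.
with_context = feed_one_parser([STREAM_OPEN, MESSAGE])

# A fresh parser accepts the same bytes but puts them in no namespace at all.
without_context = feed_one_parser([MESSAGE])

# A fresh parser cannot even read a stanza that uses the "stream:" prefix.
try:
    feed_one_parser([FEATURES])
    unbound_prefix_failed = False
except xml.sax.SAXParseException:
    unbound_prefix_failed = True
```

In with_context the message element is reported under jabber:client; in
without_context the very same bytes come out with no namespace, and the
prefixed stanza is rejected outright. That is the information a parser
started in the middle is missing.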
> > When a stanza gets split into two chunks, you get even more mess.
>
> I handle this at the moment, but not in the best way. When my parser
> gets to a partial stanza, it reads and processes up to the partial
> part and then does one of two bad things. The first is when I get half
> a tag or something, and it raises an exception saying it's invalid
> XML. The second is when it lands in the middle of an open element:
> everything is well-formed, but there's no closing tag. In this case it
> parses as far as it can, but without closing tags (which is where I
> fire my events) it doesn't DO anything, so it appears to ignore it...
> I'm not sure how to fix this.
That is the „more mess“ I was talking about. You need to set up the parser
so that it does not expect to reach the end of the document and instead
waits for the next data feed.
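A small sketch of the difference, again assuming Python's xml.sax rather
than REXML (EventLog is a made-up handler name). A one-shot parse treats
the end of the input as the end of the document and fails on a chunk cut
in mid-tag; an incremental parser given the same bytes just waits:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class EventLog(ContentHandler):
    """Records start and end tag events in the order they fire."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("start " + name)

    def endElement(self, name):
        self.events.append("end " + name)


PARTIAL = b"<stream><message><body>hi</bo"  # cut in the middle of a tag
REST = b"dy></message>"                     # arrives with the next read

# One-shot parsing: end of input is taken as end of document, so it fails.
try:
    xml.sax.parseString(PARTIAL, EventLog())
    one_shot_failed = False
except xml.sax.SAXParseException:
    one_shot_failed = True

# Incremental parsing: feed() never implies that the document is over.
log = EventLog()
parser = xml.sax.make_parser()
parser.setContentHandler(log)
parser.feed(PARTIAL)
events_after_first_chunk = list(log.events)  # only the start tags so far
parser.feed(REST)
events_after_second_chunk = list(log.events)  # end tags have now fired
```

No exception is raised by the first feed(), and the end-tag events fire
as soon as the second chunk completes the tags. The parser is never told
the document has ended, so close() is simply not called while the stream
is open.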
> > This is my code for when data arrives. It is C++ and Qt, but you might
> > get the idea:
> >
> > source.setData( text );
> > reader.parseContinue();
>
> REXML doesn't have this. There's no way to change the source except to
> make a new parser instance.
I do not change the source. I just fill the source with more data and
tell the parser it can continue (reader is the parser).
If your parser cannot do something like this, you are doomed: it won't
work at all. (If it sometimes appears to work, you are just unlucky that
it is not giving you clear evidence that it is broken.)
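For comparison, the same "fill it with more data and continue" pattern
can be sketched in Python's xml.sax, which is an assumption on my side
and says nothing about what REXML offers. StanzaHandler, StanzaStream and
on_data are hypothetical names. One parser lives as long as the
connection, and a stanza is reported when the end tag of a direct child
of the stream element arrives:

```python
import xml.sax
from xml.sax.handler import ContentHandler


class StanzaHandler(ContentHandler):
    """Reports each direct child of the stream root once its end tag arrives."""

    def __init__(self, on_stanza):
        super().__init__()
        self.on_stanza = on_stanza
        self.depth = 0   # 0 = outside the stream element, 1 = directly inside
        self.text = []

    def startElement(self, name, attrs):
        self.depth += 1
        if self.depth == 2:      # a new stanza begins
            self.text = []

    def characters(self, content):
        if self.depth >= 2:      # text may arrive in several pieces
            self.text.append(content)

    def endElement(self, name):
        if self.depth == 2:      # the stanza is complete, fire the event
            self.on_stanza(name, "".join(self.text))
        self.depth -= 1


class StanzaStream:
    """Hypothetical wrapper: one parser for the whole connection."""

    def __init__(self, on_stanza):
        self._parser = xml.sax.make_parser()
        self._parser.setContentHandler(StanzaHandler(on_stanza))

    def on_data(self, text):
        # Same parser every time: just hand it more data and let it continue.
        self._parser.feed(text)


received = []
progress = []
stream = StanzaStream(lambda name, text: received.append((name, text)))
for chunk in ["<stream><first/>", "<second/><third>hel", "lo</thi", "rd>"]:
    stream.on_data(chunk)
    progress.append(len(received))  # stanzas completed after each chunk
```

The chunk boundaries here are arbitrary on purpose: they fall between
stanzas, inside character data and inside a closing tag, and the third
stanza is still reported once, with its text intact.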
--
If it works, fix it.
Michal 'vorner' Vaner