[jdev] Re: Splitting the stream

Matthias Wimmer m at tthias.eu
Wed Nov 1 14:51:10 CST 2006


Michal 'vorner' Vaner wrote:
> Well, I just think I do not need to _parse_ it if I'm not interested in
> the information there. I only want to split it to parts and feed that to
> different program.

As Alexander already said, I think that what you plan to do (you wrote
in a private mail that you plan to use regular expressions) is parsing
as well.

I am not yet sure how you plan to write your regular expressions, but
it should not be possible to write a regular expression that matches
every valid stanza and nothing else.
With regular expressions you can only express regular languages
(Chomsky type 3). Stanzas can contain child elements nested to
arbitrary depth, and with the pumping lemma for regular languages you
can show that the set of all valid stanzas is not a regular language.
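
A sketch of that argument, assuming stanzas may contain child elements
nested to any depth (the element name a is only a placeholder, and
namespaces are ignored). It first intersects with a regular language
and then applies the pumping lemma:

```latex
% S: the set of all valid stanzas. Assume, for contradiction, that S is regular.
% R is regular, since it is given by a regular expression:
\[ R = \texttt{<message>}\,(\texttt{<a>})^{*}\,(\texttt{</a>})^{*}\,\texttt{</message>} \]
% Regular languages are closed under intersection, so S \cap R would be regular too.
% Well-formedness forces the open and close tags to balance:
\[ S \cap R = \{\, \texttt{<message>}\,(\texttt{<a>})^{n}\,(\texttt{</a>})^{n}\,\texttt{</message>} \mid n \ge 0 \,\} \]
% Let p be the pumping length and take the word with n = p:
\[ w = \texttt{<message>}\,(\texttt{<a>})^{p}\,(\texttt{</a>})^{p}\,\texttt{</message>} = xyz,
   \qquad |xy| \le p, \quad |y| \ge 1 \]
% y lies within the first p characters, i.e. before the first close tag.
% Pumping down gives xz, which still ends in exactly p close tags
% (the prefix contains no '/' character) but is shorter than w.
% The only word of S \cap R with p close tags is w itself, so
\[ xz \notin S \cap R \]
% which contradicts the pumping lemma. Hence S is not regular.
```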

> I need to make it many small independent programs. Which I think nobody
> yet did. It is just an experiment, maybe it wont work at all, it is
> possible something very flexible may become of it, I just do not know.

What do you plan to do if the component has split off an invalid stanza
and forwarded it to the small program? I think that might cause you
problems as well. You must not just expect that everything you get from
a server is valid. Doing so might make you more vulnerable to attacks.
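
One way to avoid forwarding broken input is to let an incremental
parser do the splitting itself. The following is only a minimal sketch
in Python using the standard library's XMLPullParser; the function name
split_stream and the un-namespaced <stream> root are assumptions made
for illustration, and real streams would also need namespace handling.

```python
import xml.etree.ElementTree as ET

def split_stream(chunks):
    """Yield each complete direct child of the stream root as a string.

    Raises ET.ParseError as soon as the input is no longer well-formed,
    so a malformed stanza is never handed on.
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
            else:
                depth -= 1
                if depth == 1:  # a direct child of the root just closed
                    yield ET.tostring(elem, encoding="unicode")

# Stanzas may arrive cut at arbitrary byte boundaries; the root stays open.
chunks = [
    "<stream><message to='bar@exa",
    "mple.com'><body>hi</body></message><presence/>",
    "<iq type='get' id='1'/>",
]
stanzas = list(split_stream(chunks))
for s in stanzas:
    print(s)
```

Each yielded string is a complete element, so the small programs only
ever see input that the parser has already accepted.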

The overhead of using a proven XML parser should not be very large, and
you can be sure that other people have already put thought into
handling all the special cases that can occur in XML documents.
E.g. with a naive approach to finding the start and end tags you might
get confused if you receive something like the following: <message
from='foo@example.com' to='bar@example.com' id='123"/>'>... and others.
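
To illustrate, here is a small Python sketch. The regular expression is
only my guess at what a naive splitter could look like, not anything
you wrote, and the <body> child is added just to complete the stanza:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical stanza: the id value legally contains the characters "/>
stanza = ("<message from='foo@example.com' to='bar@example.com' "
          "id='123\"/>'><body>hi</body></message>")

# Assumed naive splitter: take everything up to the first "/>" as one element.
naive_cut = re.match(r"<message[^>]*/>", stanza).group(0)

def is_well_formed(xml_text):
    """Return True if xml_text parses as a single XML element."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

print(repr(naive_cut))            # stops inside the id attribute value
print(is_well_formed(naive_cut))  # the cut-off piece is not well-formed
elem = ET.fromstring(stanza)      # a real parser keeps the attribute intact
print(repr(elem.get("id")), elem.find("body").text)
```

The regular expression ends the element in the middle of the id
attribute, while the parser reads the whole attribute value and still
finds the body.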

How do you plan to route the stanzas to the small programs? I guess you
will need some information out of the stanzas for that as well, no? If
you parsed the stream using an XML parser and generated a DOM or
DOM-like document, your programs could then use XPath expressions to
subscribe to the stanzas they are interested in, which might be very
handy as well.
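
A rough sketch of such a subscription scheme, in Python. The Router
class and its method names are made up for illustration; ElementTree
only supports a limited subset of XPath, and namespaces are left out to
keep it short:

```python
import xml.etree.ElementTree as ET

class Router:
    """Toy dispatcher: handlers subscribe with an ElementTree path."""

    def __init__(self):
        self.subscriptions = []  # list of (path, handler) pairs

    def subscribe(self, path, handler):
        self.subscriptions.append((path, handler))

    def dispatch(self, stanza_xml):
        """Parse one stanza and pass it to every matching handler."""
        stanza = ET.fromstring(stanza_xml)
        # Wrap the stanza so a path can also test the stanza element itself.
        wrapper = ET.Element("stream")
        wrapper.append(stanza)
        delivered = 0
        for path, handler in self.subscriptions:
            if wrapper.find(path) is not None:
                handler(stanza)
                delivered += 1
        return delivered

router = Router()
chats, presences = [], []
router.subscribe("./message[@type='chat']", chats.append)
router.subscribe("./presence", presences.append)

router.dispatch("<message type='chat' to='bar@example.com'>"
                "<body>hi</body></message>")
router.dispatch("<presence from='foo@example.com'/>")
unrouted = router.dispatch("<iq type='get' id='1'/>")  # nobody subscribed
```

In your setup the handlers would be the small programs, each receiving
only the stanzas that match its expression.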


See you
    Matthias

-- 
Matthias Wimmer      Fon +49-700 77 00 77 70
Züricher Str. 243    Fax +49-89 95 89 91 56
81476 München        http://ma.tthias.eu/


