[JDEV] Java XML Parsers

Al Sutton al at personalbuddy.com
Sun Dec 30 07:49:21 CST 2001


Daniel,

I'm basing my knowledge on the W3C XML 1.0 Spec as published on the 10th
of Feb 1998.

The spec talks about an XML processor and only mentions parsers in
passing towards the end of the document. The last paragraph of the
introduction reads

"A software module called an XML processor is used to read XML documents
and provide access to their content and structure. It is assumed that an
XML processor is doing its work on behalf of another module, called the
application. This specification describes the required behavior of an
XML processor in terms of how it must read XML data and the information
it must provide to the application.".

I can understand how you see the behaviour of my application as broken
if you are testing it against the spec's requirements for an XML
processor, but I'm not claiming it's an XML processor. What I am
claiming is that it's a simplistic system which parses XML documents and
provides a mechanism to access purely the data content (i.e. tags and
character data).

The issue of case sensitivity on tags comes from the parser's original
use in a system where the original data had been constructed in a case
insensitive way (I know this is against the spec, but it did happen in a
real world situation). I have uploaded the version of the parser which
can be forced to obey case sensitivity, but I have a couple of users who
prefer having the tags converted to lower case before they are passed to
the application (again, I know this is against the spec, but as my
development work is paid for by donations I am not in a strong position
to say no).
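The conflict described above can be illustrated with a short Java sketch. `tagName` is a hypothetical opt-in hook (not the parser's actual API) that preserves case unless the caller explicitly asks for lowercasing; `isWellFormed` uses the JDK's bundled SAX parser (`javax.xml.parsers.SAXParserFactory`) to show that XML itself treats `<Body>` and `<body>` as different names, so a mismatched pair is rejected.

```java
import java.io.StringReader;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

public class CaseSensitivityDemo {
    // Hypothetical opt-in hook: names pass through unchanged unless the
    // caller explicitly asks for the legacy lowercasing behaviour.
    static String tagName(String raw, boolean lowerCaseTags) {
        return lowerCaseTags ? raw.toLowerCase(Locale.ROOT) : raw;
    }

    // Returns true if the JDK's conformant SAX parser accepts the document.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXParseException e) {
            // DefaultHandler rethrows fatal errors such as a mismatched end tag.
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // XML names are case sensitive, so <Body>...</body> is not a matching pair.
        System.out.println(isWellFormed("<message><Body>hi</body></message>"));
        System.out.println(isWellFormed("<message><body>hi</body></message>"));
        // With lowercasing switched on, the two distinct names collapse into one.
        System.out.println(tagName("Body", true).equals(tagName("body", true)));
        System.out.println(tagName("Body", false).equals(tagName("body", false)));
    }
}
```

Keeping case preservation as the default and lowercasing as an explicit switch is one way to let documents that are valid XML pass through unchanged while still accommodating users with legacy case-insensitive data.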

I have had a couple of queries about supporting encodings other than ASCII
and UTF-8, but I have referred them to projects such as Xerces in order
to allow my parser to remain compact and simplistic. To deal with other
encodings, I rely on the user correctly setting up the data encoding on
their InputStream or Reader objects before passing them to my library.
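The division of labour described above, where the caller decodes the bytes and the parser only ever sees characters, might look like the following Java sketch. `decode` stands in for whatever the application does before handing a Reader to the library (the library's own entry point is not named here, so it is not called). For contrast, `textViaDeclaration` gives raw bytes to the JDK's bundled SAX parser, which, as a conformant XML processor, reads the `encoding` value in the XML declaration itself. The class and method names are illustrative only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingDemo {
    // Caller-side approach: the application chooses the charset and wraps the
    // stream in a Reader, so a markup parser receives already-decoded text.
    static String decode(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, charset)) {
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    // Processor-side approach: the SAX parser gets raw bytes and decodes them
    // using the encoding named in the <?xml ... ?> declaration.
    static String textViaDeclaration(byte[] xml) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new ByteArrayInputStream(xml),
            new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><m>\u00e9</m>";
        byte[] latin1 = doc.getBytes(StandardCharsets.ISO_8859_1); // U+00E9 becomes byte 0xE9

        // Correct charset on the Reader: the text survives intact.
        System.out.println(decode(new ByteArrayInputStream(latin1), StandardCharsets.ISO_8859_1).equals(doc));
        // Wrong charset: 0xE9 is malformed UTF-8 here and is replaced by U+FFFD.
        System.out.println(decode(new ByteArrayInputStream(latin1), StandardCharsets.UTF_8).contains("\uFFFD"));
        // The conformant parser recovers the character from the declaration alone.
        System.out.println(textViaDeclaration(latin1).equals("\u00e9"));
    }
}
```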

I hope this helps you understand where I'm coming from.

Al.



On Sun, 2001-12-30 at 13:03, Daniel Veillard wrote:
> On Sun, Dec 30, 2001 at 12:03:06PM +0000, Al Sutton wrote:
> > Daniel,
> 
>   Al,
> 
> > I think you may be a little confused. I think you'll find that there are
> 
>   Sorry, no, I don't think I am.
> 
> > specs for SAX and DOM parsers for XML, but XML itself is (or at least
> > was originally) purely a data representation format, and as such didn't
> 
>   Right, but the spec includes a lot of points that an XML parser MUST
> respect to be considered conformant to XML. Your code clearly is not,
> and you should not advocate using it as an XML parser. Call it a "markup
> parser" if you want, but not an XML parser, because it is not one.
> 
> > I fully accept it doesn't support processing directives (such as the
> > <?xml element which is used to detail encoding), and that enforcing all
> 
>   Which is an absolute requirement for an XML parser. How many times have
> you seen messages on this list like "the server disconnects because I use
> non-ASCII characters"? A server based on your parser would
> not have the same behaviour as the common jabberd using expat.
> 
> > tags are pushed into lower case isn't ideal (and is something that is
> 
> It's just plain broken, sorry. The Jabber protocol is expected to be extensible
> and the extensions are driven by XML (cf. the XML-RPC and XHTML ones, etc.),
> and all those are case sensitive because they are XML.
> 
> > Myself and others have used my parser in a number of products which
> > handle the jabber protocol and thought it may be of use to Matt.
> 
> It happens to work, to some extent. Your parser though will not generate
> the same output as something based on expat or another XML parser (it seems you
> miss the CR/LF normalization, which means you will not pass the same data
> as a conformant parser for some multiline messages, for example).
> 
> It's nothing personal; one should just respect the specifications when
> they happen to exist and use conformant code/products when they
> are available.
> 
> Daniel
> 
> 
> -- 
> Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
> veillard at redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
> http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/
> _______________________________________________
> jdev mailing list
> jdev at jabber.org
> http://mailman.jabber.org/listinfo/jdev
-- 
Al Sutton
Email: al at personalbuddy.com
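The CR/LF point in the quoted reply refers to XML 1.0's end-of-line handling rule (section 2.11): a processor must pass the application a single line feed wherever the input contains a CR LF pair or a lone CR. The Java sketch below is an illustrative implementation of just that rule (`LineEndNormalizer`, `normalize` and `saxText` are hypothetical names), compared against the JDK's bundled SAX parser, which applies the rule itself.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LineEndNormalizer {
    // Translate "\r\n" and any lone "\r" into a single "\n".
    static String normalize(String in) {
        StringBuilder out = new StringBuilder(in.length());
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '\r') {
                out.append('\n');
                // Skip the LF of a CR LF pair so only one line feed is emitted.
                if (i + 1 < in.length() && in.charAt(i + 1) == '\n') {
                    i++;
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Collect the character data a conformant SAX parser reports for a document.
    static String saxText(String xml) throws Exception {
        StringBuilder text = new StringBuilder();
        SAXParserFactory.newInstance().newSAXParser().parse(
            new InputSource(new StringReader(xml)),
            new DefaultHandler() {
                @Override
                public void characters(char[] ch, int start, int length) {
                    text.append(ch, start, length);
                }
            });
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // A multiline message body as it might arrive from a Windows or old Mac client.
        String body = "line one\r\nline two\rline three";
        System.out.println(normalize(body).equals("line one\nline two\nline three"));
        // The conformant parser delivers the same normalized text.
        System.out.println(saxText("<body>" + body + "</body>").equals(normalize(body)));
    }
}
```

A parser that skips this step would hand the application "\r\n" or "\r" where a conformant one hands over "\n", which is the kind of divergence on multiline messages described in the quoted reply.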



