[JDEV] Problem with accentuation...

Daniel Veillard veillard at redhat.com
Thu Mar 15 16:59:47 CST 2001


On Thu, Mar 15, 2001 at 06:13:12PM +0000, Michael Wilson wrote:
> Daniel Veillard wrote:
> > On Thu, Mar 15, 2001 at 01:24:44PM +0000, Michael Wilson wrote:
> >> I still don't know why Jabber insists on unescaping all character
> >> entities though.
> > 
> > Unless I misunderstood, there is a set of good reasons:
> >     - you don't want to have to load and use a DTD to check those
> >       entities (yes it's a major pain on top of SAX and would definitely
> >       be too costly on the servers !).
> 
> If entities aren't specified, why are they being parsed at all? Something
> like &blah; should just look like any other piece of text shouldn't it?

  No!  Because this is XML, and they fall under the production
[68] EntityRef of the spec 
  http://www.w3.org/TR/REC-xml#NT-EntityRef
and not
[14] CharData
  http://www.w3.org/TR/REC-xml#NT-CharData

  Since this is XML, &blah; is not like any other piece of text!
That entity could reference a 200-page document; the parser just doesn't know!
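
  To make the distinction between the two productions concrete, here is a
rough Python sketch (not taken from Jabber or libxml). The name classify
and the regular expression are illustrative assumptions: the pattern is an
ASCII-only approximation of the Name production, and a stray "&" that does
not form a reference is simply lumped in with the surrounding text here,
although real XML would reject it.

```python
import re

# ASCII-only approximation of "&" Name ";" (production [68] EntityRef).
# The real Name production allows many more characters.
ENTITY_REF = re.compile(r"&[A-Za-z_:][A-Za-z0-9._:-]*;")

def classify(text):
    """Split text into ("CharData", s) and ("EntityRef", s) pieces."""
    pieces, pos = [], 0
    for m in ENTITY_REF.finditer(text):
        if m.start() > pos:
            # Everything between two references is treated as character data.
            pieces.append(("CharData", text[pos:m.start()]))
        pieces.append(("EntityRef", m.group()))
        pos = m.end()
    if pos < len(text):
        pieces.append(("CharData", text[pos:]))
    return pieces

tokens = classify("see &blah; here")
```

  The point is only that &blah; is recognized by its syntax, before anyone
knows whether a declaration for it exists.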

> I definitely agree that there's no call for them at the program level,
> but I see no reason why I shouldn't be able to send HTML snippets
> containing entities in my messages without them mysteriously mutating.

  Because the parser sees an entity reference and asks for its value,
which can't be found (there is no DTD!). Processing goes on, but what
was found was definitely not character data!
What the server does with an undefined entity reference is
application dependent; in this case it seems to simply drop it, the
simplest solution.
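
  A small Python sketch of the same situation, assuming the standard
xml.etree.ElementTree module (expat-based), which is not what the Jabber
server uses. The helper parse_body is a hypothetical name. This parser
makes a different application choice: it rejects the whole document
instead of dropping the undefined reference. It also shows the usual way
out for HTML snippets, which is to escape the ampersand so that the text
is character data again.

```python
import xml.etree.ElementTree as ET

def parse_body(xml_text):
    """Return the element text, or None if the parser rejects the input."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError:
        return None

# Character references and the five predefined entities need no DTD.
ok = parse_body("<body>caf&#233; &amp; tea</body>")

# &eacute; is an HTML entity; with no DTD it is an undefined entity reference.
undefined = parse_body("<body>caf&eacute;</body>")

# Escaping the ampersand turns the snippet back into plain character data.
escaped = parse_body("<body>caf&amp;eacute;</body>")
```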

Daniel

-- 
Daniel Veillard      | Red Hat Network http://redhat.com/products/network/
veillard at redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/



