[JDEV] XML Conformance

Daniel Veillard veillard at redhat.com
Thu Jan 17 03:17:37 CST 2002


On Wed, Jan 16, 2002 at 03:57:27PM -0500, Julian Missig wrote:
> People have repeatedly brought up on JDEV the issue of Jabber's XML 
> conformance. I just wanted to make two quick notes about it. ALL 
> DISCUSSION SHOULD BE CONTINUED *ONLY* ON THE STANDARDS-JIG LIST. This 
> mail is being cc'd to jdev because quite a few jdev members who have 
> brought up these issues are unaware of the standards JIG.

  And you think shouting will get the people who didn't know about it to
volunteer for more work? Hmm, it doesn't work that way :-(

> First off, the id attribute: the id *MUST* start with an alphabetic 
> character, but can contain numbers after that.
> Reference: XML 1.0 Recommendation:
> "Values of type ID must match the Name production. A name must not 
> appear more than once in an XML document as a value of this type; i.e., 
> ID values must uniquely identify the elements which bear them."
> The definition of Name:
> Name    ::=    (Letter | '_' | ':') (NameChar)*
> definition of NameChar:
> NameChar    ::=    Letter | Digit | '.' | '-' | '_' | ':' | 
> CombiningChar | Extender
> definition of Letter: http://www.w3.org/TR/2000/REC-xml-20001006#NT-Letter
> So therefore, ids may start with a letter, an underscore, or a colon, 
> and then have all the numbers your pretty little heart desires. However, 
> '2' is not a valid id.

  NOTE: this is a validity error, not a well-formedness error! Does Jabber
        require validity-level conformance? And in that case, where is the
        DTD? There is no way in hell you have the right to define a validity
        constraint if you don't have a DTD >:-> . I found some at
        http://www.saint-andre.com/jabber/dtds/ but not on the Jabber site,
        so I'm tempted to say that so far Jabber conformance hasn't required
        validity. And without DTD loading you have no way (unless you start
        adding an internal subset, but I doubt anybody wants to go that
        route) to know that something is an ID.
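
  A minimal sketch of the point above, in Python with only the standard
library. A non-validating parser accepts id="2" without complaint, and
without a DTD the DOM has no idea which attribute is an ID. The sample
stanza, NAME_ASCII and looks_like_name() are made up for illustration;
NAME_ASCII is only an ASCII approximation of the Name production, which
also allows many non-ASCII letters, CombiningChar and Extender.

```python
import re
import xml.etree.ElementTree as ET
from xml.dom import minidom

# ASSUMPTION: ASCII-only approximation of the XML 1.0 Name production.
NAME_ASCII = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*\Z")

# Hypothetical stanza, no DTD and no internal subset.
STANZA = '<iq id="2" type="get"><query xmlns="jabber:iq:last"/></iq>'


def looks_like_name(value):
    """Return True if value matches the ASCII approximation of Name."""
    return NAME_ASCII.match(value) is not None


# Well-formedness only: the parser has no reason to reject id="2".
root = ET.fromstring(STANZA)
id_value = root.get("id")

# Without a DTD no attribute is declared as type ID, so the DOM lookup
# by ID finds nothing even though an attribute named "id" is present.
doc = minidom.parseString(STANZA)
found = doc.getElementById("2")

print("parsed id:", id_value)
print("matches Name (ASCII approximation):", looks_like_name(id_value))
print("getElementById result:", found)
```

  So a rule like "ids must match Name" has to be enforced by the Jabber
specification itself (or by a DTD that is actually loaded); a plain
well-formedness check will never catch it.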

  I'm all for clarity and cleanness w.r.t. the specifications, but make sure
you understand them really well before building rules on top of them.

> Second, namespaces. Contrary to what some people believe, Jabber's usage 
> of namespaces conforms with the specification. <x> and <query> are 
> actually a parent element of everything within in the same namespace. 
> Schemas will conform with this statement. The "problem" is that current 
> Jabber implementations do not fully support namespaces via Qualified 
> Names. (Such as <last:query xmlns:last="jabber:iq:last"> and then being 
> able to use last: thereafter) - However, there is NOTHING WRONG with 
> Jabber being even more restrictive than the XML Namespaces 
> Recommendation. I feel that we should continue to enforce the fact that 
> jabber:x: and jabber:iq: namespaces within jabber:client are only 
> allowed in certain places (<x> within <message> and <presence>, <query> 
> within <iq> and so on). If the protocol remains strict here, Jabber 
> implementations will not have as much to compensate for and can be much 
> better optimized. It's also much easier to program when you expect 
> namespaces to always use certain element names in certain places. Again 
> I stress that this does not break the XML Namespaces Recommendation in 
> any fashion, we are simply adding additional restrictions to Jabber.

  Right, but adding this restriction can pose a serious problem, depending
on the tools you are using. It all depends on what conformance level you
want to attach to it. If Jabber requires it, then any client based
on a DOM API may have trouble. The DOM serializer may remap the prefix:
more precisely, when one creates a node, only the namespace name and the
node's local name are really binding, and the prefixes may be "cleaned up"
when the documents/nodes are serialized. So this approach *can* have
drawbacks. And if you think that the documents are better produced with
just a bunch of printf() calls, you should not be surprised if clients tend
to break the encoding rules as soon as characters outside the ASCII range
start to get used.
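
  To make the prefix remapping concrete, here is a small sketch using
Python's xml.etree.ElementTree as one example of a namespace-aware toolkit
(it is not a DOM, but it shows the same effect). The two input strings are
made up for illustration. Both parse to the same expanded name, and the
serializer is free to pick its own prefix on the way out.

```python
import xml.etree.ElementTree as ET

# Hypothetical inputs: same namespace name, different prefix choices.
PREFIXED = '<last:query xmlns:last="jabber:iq:last"/>'
DEFAULTED = '<query xmlns="jabber:iq:last"/>'

a = ET.fromstring(PREFIXED)
b = ET.fromstring(DEFAULTED)

# Only (namespace name, local name) survives parsing; the prefix is gone.
same_name = a.tag == b.tag == "{jabber:iq:last}query"

# On output the serializer invents a prefix of its own for the namespace.
reserialized = ET.tostring(a, encoding="unicode")

print("same expanded name:", same_name)
print("reserialized:", reserialized)
```

  A receiver that matches on the literal string "<query xmlns=" instead of
on the namespace name plus local name would reject the reserialized form,
although it is the same element as far as the Namespaces Recommendation
is concerned.
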
  Contrary to what you seem to think, this kind of decision has a cost,
and may actually result in poorly conformant clients. Not something
I would argue is a Good Thing...
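
  As a sketch of how printf()-style output goes wrong, compare naive string
formatting with a real serializer (Python standard library again). The
helper names and the sample body are hypothetical; the naive version
neither escapes markup characters nor cares about the byte encoding.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data):
    """Return True if data (str or bytes) parses as well-formed XML."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


def naive_message(text):
    # printf-style: no escaping of & or <, no thought about encoding.
    return "<message><body>%s</body></message>" % text


def serialized_message(text):
    # Let the serializer escape the text and produce UTF-8 bytes.
    msg = ET.Element("message")
    ET.SubElement(msg, "body").text = text
    return ET.tostring(msg, encoding="utf-8")


BODY = "AT&T says caf\u00e9 < bar"  # hypothetical message text

naive_ok = is_well_formed(naive_message(BODY))
# Non-ASCII text written out as Latin-1 bytes with no encoding declaration:
latin1_ok = is_well_formed(naive_message("caf\u00e9").encode("latin-1"))
safe_ok = is_well_formed(serialized_message(BODY))

print("naive formatting well-formed:", naive_ok)
print("Latin-1 bytes well-formed:", latin1_ok)
print("serializer output well-formed:", safe_ok)
```

  The first two cases break as soon as the text contains "&", "<" or a
character outside ASCII in the wrong encoding, which is exactly the kind
of client bug a toolkit-based serializer avoids.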

Daniel

-- 
Daniel Veillard      | Red Hat Network https://rhn.redhat.com/
veillard at redhat.com  | libxml Gnome XML XSLT toolkit  http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/


