[standards-jig] Re: [JDEV] XML Conformance

Thu Jan 17 08:46:03 CST 2002

Daniel Veillard wrote:
*SNIP*

/me admits defeat and crossposts

> 
>>First off, the id attribute: the id *MUST* start with a letter (or an
>>underscore or a colon), but can contain digits after that.
>>Reference: XML 1.0 Recommendation:
>>"Values of type ID must match the Name production. A name must not 
>>appear more than once in an XML document as a value of this type; i.e., 
>>ID values must uniquely identify the elements which bear them."
>>The definition of Name:
>>Name    ::=    (Letter | '_' | ':') (NameChar)*
>>definition of NameChar:
>>NameChar    ::=    Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
>>definition of Letter: http://www.w3.org/TR/2000/REC-xml-20001006#NT-Letter
>>Therefore, ids may start with a letter, an underscore, or a colon,
>>and then have all the digits your pretty little heart desires. However,
>>'2' is not a valid id.
>>
> 
>   NOTE: this is a validity error, not a well-formedness error! Does Jabber
>         require validity-level conformance? (And in that case, where is the
>         DTD? There is no way in hell you have the right to define validity
>         constraints if you don't have a DTD >:-> .) I found some at
>         http://www.saint-andre.com/jabber/dtds/ but not on the Jabber site,
>         so I'm tempted to say that so far Jabber conformance hasn't required
>         validity conformance. And without DTD loading you have no way
>         (unless you start adding an internal subset, but I doubt anybody
>         wants to go that route) to know that something is an ID.
> 
>   I'm all for clarity and cleanness w.r.t. the specifications, but make sure
> you understand them really well before building rules on top of them.
> 

Right now, there's nothing to base validity on, but we do plan on
eventually having schemas, which is why I brought it up.
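If we do go the schema route, this is roughly what a validator would have to check for ID values. A minimal sketch in Python, assuming an ASCII-only subset of the Name production (the real Letter, CombiningChar and Extender classes cover many non-ASCII ranges that are ignored here); is_valid_id and ids_are_unique are hypothetical helpers, not part of any Jabber library, and as Daniel says the check only means something once validity is actually required:

```python
import re

# Hypothetical helper: ASCII-only approximation of the XML 1.0 Name production.
#   Name     ::= (Letter | '_' | ':') (NameChar)*
#   NameChar ::= Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
# Non-ASCII Letter/CombiningChar/Extender ranges are deliberately left out.
_NAME_ASCII = re.compile(r"[A-Za-z_:][A-Za-z0-9._:-]*")


def is_valid_id(value: str) -> bool:
    """True if value matches the (ASCII subset of the) Name production."""
    return _NAME_ASCII.fullmatch(value) is not None


def ids_are_unique(values: list[str]) -> bool:
    """Validity constraint: an ID value must not appear more than once."""
    return len(values) == len(set(values))


if __name__ == "__main__":
    for candidate in ["2", "a2", "_msg1", ":x", "id-3.b"]:
        print(candidate, is_valid_id(candidate))
```

Only "2" fails here, because a digit cannot be the first character of a Name.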

> 
>>Second, namespaces. Contrary to what some people believe, Jabber's usage
>>of namespaces conforms to the specification. <x> and <query> are
>>actually the parent elements of everything within them in the same
>>namespace, and schemas will reflect this. The "problem" is that current
>>Jabber implementations do not fully support namespaces via Qualified
>>Names (such as <last:query xmlns:last="jabber:iq:last"> and then being
>>able to use last: thereafter). However, there is NOTHING WRONG with
>>Jabber being even more restrictive than the XML Namespaces
>>Recommendation. I feel that we should continue to enforce the rule that
>>jabber:x: and jabber:iq: namespaces within jabber:client are only
>>allowed in certain places (<x> within <message> and <presence>, <query>
>>within <iq>, and so on). If the protocol remains strict here, Jabber
>>implementations will not have as much to compensate for and can be much
>>better optimized. It's also much easier to program when you can expect
>>namespaces to always use certain element names in certain places. Again,
>>I stress that this does not break the XML Namespaces Recommendation in
>>any fashion; we are simply adding further restrictions for Jabber.
>>
> 
>   Right, but adding this restriction can pose a serious problem, depending
> on the tools you are using. It all depends on what conformance level you
> want to attach to this. If Jabber requires it, then any client based
> on a DOM API may have trouble. The DOM serializer may remap the prefix;
> more precisely, when one creates a node, only the namespace name and the
> node's local name are really binding, and the prefixes may be "cleaned up"
> when the documents/nodes are serialized. So this approach *can* have
> drawbacks. And if you think that the documents are better produced with
> just a bunch of printf() calls, you should not be surprised if clients tend
> to break the encoding rules as soon as characters outside of the ASCII
> range start to get used.
>   Contrary to what you seem to think, this kind of decision has a cost,
> and may actually result in poorly conformant clients. Not something
> I would argue is a Good Thing...
>

Hrmm, I see what you're saying. So you feel there's nothing wrong with
requiring clients and servers to support prefixes? I'm just under the
impression that things like Palm clients will have more difficulty
supporting prefixes than they would with a rule that prefixes not be used...
Correct me if I'm wrong ;)
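To make the trade-off concrete: with a namespace-aware parser, the placement restriction I proposed can be checked on (namespace URI, local name) pairs, so the prefix never enters into it. A minimal sketch, assuming Python's standard xml.etree.ElementTree as the parser; placement_ok and its rule set are hypothetical illustrations of that restriction, not code from any Jabber server:

```python
import xml.etree.ElementTree as ET

CLIENT = "jabber:client"
# ElementTree reports tags in "{namespace-uri}local-name" form.
IQ_PARENTS = {f"{{{CLIENT}}}iq"}
X_PARENTS = {f"{{{CLIENT}}}message", f"{{{CLIENT}}}presence"}


def placement_ok(stanza_xml: str) -> bool:
    """Hypothetical rule check: a <query> in a jabber:iq:* namespace may only
    appear directly inside <iq>, and an <x> in a jabber:x:* namespace only
    inside <message> or <presence>. Prefixes play no part in the check."""
    root = ET.fromstring(stanza_xml)
    for parent in root.iter():
        for child in parent:
            if not child.tag.startswith("{"):
                continue  # element in no namespace: outside this rule
            ns, local = child.tag[1:].split("}", 1)
            if local == "query" and ns.startswith("jabber:iq:"):
                if parent.tag not in IQ_PARENTS:
                    return False
            elif local == "x" and ns.startswith("jabber:x:"):
                if parent.tag not in X_PARENTS:
                    return False
    return True


prefixed = '<iq xmlns="jabber:client" type="get"><last:query xmlns:last="jabber:iq:last"/></iq>'
unprefixed = '<iq xmlns="jabber:client" type="get"><query xmlns="jabber:iq:last"/></iq>'
misplaced = '<message xmlns="jabber:client"><query xmlns="jabber:iq:last"/></message>'

print(placement_ok(prefixed), placement_ok(unprefixed), placement_ok(misplaced))
# True True False
```

Of course this only helps implementations that already have a namespace-aware parser, which is exactly the open question for the Palm-class clients.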

Are there really that many XML Namespaces-conforming parsers which will
arbitrarily start using prefixes in a way that cannot be turned off? From my
point of view, it's more difficult for many parts of Jabber to support
prefixes. Servers will have to keep track of all the prefixes a client
uses, because a client could declare a prefix when sending to one person
and then use it in a message to another, so the server will have to tack
the xmlns:prefix declaration onto the packets sent to other people.
Prefixes also hurt stream compression and any routing optimizations, since
the tag names could be virtually anything. (There has already been some
work on routing optimization, since the basic structure of most Jabber
packets is very similar; it would be somewhat negated by having to support
namespace prefixes, and I get the feeling that SSL compression won't be as
effective when prefixes keep changing.) So from my point of view it seems
easier and more efficient not to allow prefixes.

I know that in Gabber it will take a major overhaul of Jabberoo and Gabber
to properly support prefixes. Also, are XML Namespace-conforming parsers
available everywhere we have Jabber clients right now? I know it could be
argued that the existing XML parsers should be made XML Namespace-conforming,
but the work that would take is one more argument against requiring
prefixes in Jabber.
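On the question of libraries inventing prefixes: at least one common one does this by default. A small sketch with Python's standard xml.etree.ElementTree (Python 3.8+ for the default_namespace argument of tostring), chosen only because it is readily available, not because any Jabber client is known to use it. The parsed element keeps only the namespace URI and local name, and re-serializing it produces a generated ns0 prefix unless a default namespace is requested, so the rewriting can be controlled but the original prefix is not preserved:

```python
import xml.etree.ElementTree as ET

# A jabber:iq:last query written by a client that chose the prefix "last".
incoming = '<last:query xmlns:last="jabber:iq:last"/>'
elem = ET.fromstring(incoming)

# Only the expanded name survives parsing; the prefix "last" is discarded.
print(elem.tag)  # {jabber:iq:last}query

# Default serialization invents its own prefix for the namespace.
print(ET.tostring(elem, encoding="unicode"))
# <ns0:query xmlns:ns0="jabber:iq:last" />

# Requesting a default namespace yields the unprefixed form used today.
print(ET.tostring(elem, encoding="unicode", default_namespace="jabber:iq:last"))
# <query xmlns="jabber:iq:last" />
```

Whether that argues for or against allowing prefixes is what we're debating; the point is only that prefix rewriting is real and, in this library at least, configurable.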

No, I don't think printf() is the way to go ;) I do want conforming parsers,
but it may be possible for us to get away with not requiring a fully
Namespace-conforming parser...

Julian
-- 
email: julian at jabber.org
jabber:julian at jabber.org