[JDEV] Jabber DOM Proposal

Fri Apr 30 16:34:45 CDT 1999

> I have been doing some serious analysis of the current DOM
> (jpair/xpt/xpt_pool) combo, and the W3C recommendation of the DOM. I've also
> been looking at etherx and the jabber transport, trying to understand how
> they use all of the nifty data structures.

Wonderbuns!

> Regarding the current DOM, it's unbelievably close to the actual W3C
> [smush]
> Things to keep in mind (will eventually be a proper introduction)
> 1.) The Jabber protocol is based on an XML *subset*. As such, the W3C DOM
> really doesn't apply, since we don't support all of XML anyway.

Correct.  Jabber is 100% XML but doesn't utilize all the fringe parts of
the XML spec, as they are not needed for a protocol format and are more
geared for a document type of context.

> 2.) The Jabber DOM is more interested in organizing the contents of the XML
> packet than in keeping the contents of packets in sequence. Parent-child
> relationships are preserved, but the order of multiple child tags (with the
> same name) within a parent is *not*.

Yup.

> DOM Public API types & functions
> --------------------------------------------------------------
> The Jabber DOM shall provide the following opaque data types:
> 1.) Tag: represents an XML tag; may have sub-tags, attributes, and one (1)
> datum
> 2.) Attribute: represents the XML attribute (of the form <name>=<value>)
> associated with a tag
> 3.) Datum: represents character data stored between tags
> 
> To traverse the DOM, the following operations are provided:
> 1.) hasTag(Tag t, String name) : Integer
>     Desc: Determines if <t> has any subtags which match <name>; returns
> number of matching tags
> 2.) hasAttribute(Tag t, String name) : Boolean
>     Desc: Determines if <t> has any attributes with name of <name>
> 3.) hasDatum(Tag t) : Boolean
>     Desc: Determines if <t> has any character data

These aren't really critical, but they fill out the dessert tray nicely :)

> 4.) getTag(Tag t, String name) : Tag
>     Desc: Attempt to retrieve first subtag in <t> which matches <name>;
> returns NULL if none are found
> 5.) getNextTagSibling(Tag t) : Tag
>     Desc: Returns any following tags (of same name, at this level in DOM
> tree); returns NULL if none exist
> 6.) getPrevTagSibling(Tag t) : Tag
>     Desc: Returns any previous tags (of same name, at this level in DOM
> tree); returns NULL if none exist

getTag() will be very handy!
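
For illustration, here is a rough sketch of how getTag(), getNextTagSibling()
and hasTag() could fit together. The struct layout is purely an assumption
(the proposal keeps Tag opaque and stores children in a NodeTree); a plain
child list is used here only to show the calling pattern.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout; the proposal leaves Tag opaque. */
typedef struct Tag {
    const char *name;
    struct Tag *first_child;  /* head of this tag's child list */
    struct Tag *next;         /* next child of the same parent */
} Tag;

/* First subtag of t whose name matches, or NULL if none is found. */
Tag *getTag(Tag *t, const char *name) {
    Tag *c;
    assert(name != NULL);
    for (c = t ? t->first_child : NULL; c != NULL; c = c->next)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}

/* Next tag at the same level with the same name as t, or NULL. */
Tag *getNextTagSibling(Tag *t) {
    Tag *c;
    if (t == NULL)
        return NULL;
    for (c = t->next; c != NULL; c = c->next)
        if (strcmp(c->name, t->name) == 0)
            return c;
    return NULL;
}

/* Number of subtags of t that match name, as hasTag() is described. */
int hasTag(Tag *t, const char *name) {
    int n = 0;
    Tag *c;
    for (c = getTag(t, name); c != NULL; c = getNextTagSibling(c))
        n++;
    return n;
}
```

With these, walking every <item> in a roster is just a for loop that starts
at getTag(roster, "item") and steps with getNextTagSibling().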

> 7.) getTagName(Tag t) : String
>     Desc: Returns name of a tag
> 8.) getTagDatum(Tag t) : Pointer
>     Desc: Returns pointer to tag <t>'s datum; *not* null terminated
> 9.) getTagDatumSz(Tag t) : Integer
>     Desc: Returns length of tag <t>'s datum segment

I'm curious about this: why wouldn't you just null-terminate the string
and avoid the getTagDatumSz method?  Also, no biggie, but in the XML world
these strings are usually called cdata... would it be more
consistent/understandable to refer to them the same way?
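
To make the question concrete, here is a minimal sketch of a datum buffer
that is always null-terminated but still tracks its length, so both styles
of access would work. The Datum struct and datumAppend() are hypothetical
names for illustration, not part of the proposal.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical datum buffer. */
typedef struct Datum {
    char  *data;  /* len payload bytes followed by one '\0' */
    size_t len;   /* payload length, not counting the terminator */
} Datum;

/* Append n bytes to d; returns 0 on success, -1 if allocation fails.
 * On failure the existing contents are left untouched. */
int datumAppend(Datum *d, const void *bytes, size_t n) {
    char *p;
    assert(d != NULL);
    p = realloc(d->data, d->len + n + 1);
    if (p == NULL)
        return -1;
    if (n > 0)
        memcpy(p + d->len, bytes, n);
    d->len += n;
    p[d->len] = '\0';  /* keep the buffer usable as a C string */
    d->data = p;
    return 0;
}
```

Since a NUL character is not legal in XML character data, the terminator
cannot collide with real content, and the stored length just saves callers
a strlen().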

> 10.) getAttribute(Tag t, String name) : String
>     Desc: Returns value of tag <t>'s attribute by name of <name>
> 11.) putAttribute(Tag t, String name, String value) : void
>     Desc: Adds/replaces attribute <name> with <value> on tag <t>
> 12.) addTag(Tag parent, Tag child): void
>     Desc: Adds <child> as subtag to <parent>; does *not* replace existing
> tags
> 13.) addDatum(Tag t, Pointer datum, Integer datum_sz) : void
>     Desc: Appends <datum> to end of <t>'s existing datum; increments <t>'s
> datum size accordingly
> 14.) deleteTag(Tag t) : void
>     Desc: Releases <t>, including all attributes, children and datum; use
> with care
> 15.) deleteAttribute(Tag t, String name) : void
>     Desc: Releases attribute <name> associated with <t>

Yummy!

> DOM Internal representations
> --------------------------------------------------------------
> The Jabber DOM shall use the following internal data structures for the
> representation of parsed XML:
> 1.) Node = the equivalent of an XML tag; contains:
>     1.2) Value : String

Are we missing?:
      1.1) Name : String
*g*

>     1.3) Attribs : AttribTree
>     1.4) Children : NodeTree
>     1.5) NextSibling : Node
>     1.6) PrevSibling : Node
> 
> 2.) Attrib = the equivalent of an XML tag attribute; contains:
>     2.1) Name : String
>     2.2) Value : String
> 
> 3.) AttribTree = a balanced binary tree (AVL, probably) containing Attribs
> keyed by Attrib.Name

I highly doubt we need this... we only have a couple of attributes at most
on any tag, doing this extra work here would be a shame :)
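
As a sketch of the simpler alternative, a plain linked list with a linear
scan would cover the couple-of-attributes case. The names attribListGet()
and attribListPut() are hypothetical helpers that getAttribute() and
putAttribute() could sit on top of; they take the list head directly
instead of a Tag to keep the example short.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct Attrib {
    char *name;
    char *value;
    struct Attrib *next;
} Attrib;

/* Heap copy of s, or NULL if allocation fails. */
static char *copyString(const char *s) {
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    if (p != NULL)
        memcpy(p, s, n);
    return p;
}

/* Value of the attribute called name, or NULL if it is not present. */
const char *attribListGet(const Attrib *head, const char *name) {
    for (; head != NULL; head = head->next)
        if (strcmp(head->name, name) == 0)
            return head->value;
    return NULL;
}

/* Add or replace an attribute; returns 0 on success, -1 on failure. */
int attribListPut(Attrib **head, const char *name, const char *value) {
    Attrib *a;
    char *v;
    assert(head != NULL);
    v = copyString(value);
    if (v == NULL)
        return -1;
    for (a = *head; a != NULL; a = a->next) {
        if (strcmp(a->name, name) == 0) {
            free(a->value);   /* replace the existing value */
            a->value = v;
            return 0;
        }
    }
    a = malloc(sizeof *a);
    if (a == NULL) {
        free(v);
        return -1;
    }
    a->name = copyString(name);
    if (a->name == NULL) {
        free(a);
        free(v);
        return -1;
    }
    a->value = v;
    a->next = *head;          /* push onto the front of the list */
    *head = a;
    return 0;
}
```

With two or three attributes per tag, the scan touches only a handful of
nodes, and there is no rebalancing code to maintain.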

> 
> 4.) NodeList = an unordered linked list of Nodes which all have the same
> name; contains:
>     4.1) Name : String
>     4.2) Nodes : Linked List
> 
> 5.) NodeTree = a balanced binary tree (AVL, again) containing NodeLists;
> keyed by the NodeList.Name

Again, the only place where we might have a ton of tags that might need
this would be in the roster packets, which will account for a very, very
low percentage of the overall packet count. I'm no AVL expert; will it
help much (or at all) for simple packets like message or status?

> [smush]
> current DOM (jpair/xpt/xpt_pool). However, I feel it is *much* more cohesive
> and maps closer to the actual format of the data. This is key to developing
> a good client library that is flexible and useful. :) I also feel quite
> strongly that the tradeoffs in additional memory consumption are well worth
> the ability to search and process large packets (should they ever occur).

I do like it (quite a bit, in fact), and it would make client
development (and the internal stuff) much easier to understand :)

> Let me know what you think. :)

So when are you going to be checking it in?  Hehe :)

Jer