[JDEV] Binary XML useful for Jabber?

David Waite dwaite at jabber.com
Tue May 22 19:39:49 CDT 2001


Jens Alfke wrote:

> I found the W3C spec for binary XML:
>
> http://www.w3.org/TR/wbxml/
>
> I've only spent a few minutes skimming it; here are my findings:
>
> * It is not hardwired to any particular DTD. It can be used for any XML
> document and preserves the full semantics of XML.
> * Most tag and attribute names get tokenized to single bytes. A set of token
> IDs can be defined for a particular DTD to avoid having to define them in
> the token table in every document. This clearly offers very high
> compression.
> * It's definitely possible for a particular document to include its own
> string table to define additional tokens.
> * It appears possible to define tokens inline, which would allow you to use
> a particular tag or attribute name without having to predeclare it at the
> start of the stream (but since the name has to appear inline every time it's
> used, you don't save any space.)
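
To make the tokenization point concrete, here is a minimal Python sketch of how a WBXML-style encoder turns known element names into single-byte tokens. The token values for the Jabber elements (`message`, `body`, `presence`) are hypothetical, since no Jabber code page exists. The flag bits (0x40 for content, 0x80 for attributes) and the END token (0x01) follow the WBXML note. Attributes and text content are left out for brevity.

```python
# Minimal sketch of WBXML-style tag tokenization.
# The Jabber token values are HYPOTHETICAL: no Jabber code page exists.

END = 0x01              # global token that closes an element with content
TAG_HAS_ATTRS = 0x80    # bit 7: an attribute list follows the tag token
TAG_HAS_CONTENT = 0x40  # bit 6: content (and a closing END) follows

# Hypothetical code page; application tag tokens start at 0x05 because
# 0x00-0x04 are global tokens (SWITCH_PAGE, END, ENTITY, STR_I, LITERAL).
JABBER_TAGS = {"message": 0x05, "body": 0x06, "presence": 0x07}


def tag_token(name: str, has_attrs: bool, has_content: bool) -> int:
    """Return the single-byte token for a known element plus its flag bits."""
    token = JABBER_TAGS[name]
    if has_attrs:
        token |= TAG_HAS_ATTRS
    if has_content:
        token |= TAG_HAS_CONTENT
    return token


def encode_element(name: str, children=()) -> bytes:
    """Encode an attribute-free element tree.

    `children` is a sequence of (name, children) pairs; text content is
    not handled in this sketch.
    """
    out = bytearray([tag_token(name, has_attrs=False, has_content=bool(children))])
    for child_name, grandchildren in children:
        out += encode_element(child_name, grandchildren)
    if children:
        out.append(END)  # only elements flagged as having content get an END
    return bytes(out)


print(encode_element("message", [("body", ())]).hex())  # 450601
```

With this hypothetical code page, `<message><body/></message>` (26 bytes of text) becomes the three bytes `45 06 01`.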

I don't think this last item is in the spec. The LITERAL token (for elements
which are not in the DTD) is followed by an offset into the string table, which
sits in the document header, so the name has to be defined there beforehand
rather than inline.
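
The LITERAL mechanism can be sketched in Python as follows. Unknown names go into a string table of null-terminated strings. The LITERAL token (0x04, plus the usual 0x40 content bit) is followed by the name's byte offset, encoded as a WBXML multi-byte integer. `frame_document` is a simplification that writes only the string-table part of the header and omits the other header fields (version, public identifier, and so on). All class and function names here are illustrative.

```python
# Minimal sketch of WBXML's LITERAL mechanism for names missing from the
# code page. Helper names are illustrative, not from any real library.

END = 0x01
LITERAL = 0x04  # unknown tag; a string-table offset follows


def mb_u_int32(value: int) -> bytes:
    """WBXML multi-byte integer: 7 bits per byte, most significant group
    first, continuation bit 0x80 set on every byte except the last."""
    if value < 0:
        raise ValueError("value must be non-negative")
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


class StringTable:
    """Null-terminated UTF-8 strings, referenced by byte offset."""

    def __init__(self):
        self.data = bytearray()
        self.offsets = {}  # name -> byte offset in self.data

    def offset_of(self, name: str) -> int:
        if name not in self.offsets:
            self.offsets[name] = len(self.data)
            self.data += name.encode("utf-8") + b"\x00"
        return self.offsets[name]


def literal_tag(table: StringTable, name: str, has_content: bool = False) -> bytes:
    """Encode an unknown element: LITERAL (0x44 if it has content) + offset."""
    token = LITERAL | (0x40 if has_content else 0)
    return bytes([token]) + mb_u_int32(table.offset_of(name))


def frame_document(table: StringTable, body: bytes) -> bytes:
    """Simplified header: only the string table is written. It precedes
    the body, so every LITERAL name must be known before sending starts."""
    return mb_u_int32(len(table.data)) + bytes(table.data) + body


# <x><item/></x>, where neither name is in the code page.
table = StringTable()
body = literal_tag(table, "x", has_content=True) + literal_tag(table, "item") + bytes([END])
doc = frame_document(table, body)
```

Because the table is written ahead of the body, an encoder streaming an open-ended session cannot introduce a new name partway through a document.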

You would need either an altered standard that allows this, or to make each
element in the stream a separate document, so that its extra element names can
be declared in that document's string table beforehand. If the goal is to save
as much space as possible, I would recommend an altered standard, so that
declarations of new namespaces can be kept for the entire session.
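
An altered standard of this kind might look like the following Python sketch. It is a hypothetical, non-standard scheme and not part of WBXML. Both ends keep one string table for the whole session, and each stanza carries only the table bytes added since the previous stanza. It also shows the per-session memory cost: every connection has to hold its own table on both ends.

```python
# HYPOTHETICAL, non-standard extension of WBXML: one string table per
# session instead of per document. Nothing here is part of the spec.

LITERAL = 0x04


def mb_u_int32(value):
    """WBXML multi-byte integer (7 bits per byte, high bit = more bytes)."""
    groups = [value & 0x7F]
    value >>= 7
    while value:
        groups.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(groups))


def read_mb_u_int32(buf, pos):
    """Decode a multi-byte integer at buf[pos]; return (value, next_pos)."""
    value = 0
    while True:
        byte = buf[pos]
        pos += 1
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            return value, pos


class SessionEncoder:
    """Sender side: the table grows over the session; each frame carries
    only the table bytes the peer has not seen yet."""

    def __init__(self):
        self.table = bytearray()
        self.offsets = {}
        self.sent = 0  # number of table bytes already delivered

    def literal(self, name):
        if name not in self.offsets:
            self.offsets[name] = len(self.table)
            self.table += name.encode("utf-8") + b"\x00"
        return bytes([LITERAL]) + mb_u_int32(self.offsets[name])

    def frame(self, body):
        delta = bytes(self.table[self.sent:])
        self.sent = len(self.table)
        return mb_u_int32(len(delta)) + delta + body


class SessionDecoder:
    """Receiver side: appends each delta to its own copy of the table,
    which it must keep for the life of the session (per-user memory)."""

    def __init__(self):
        self.table = bytearray()

    def accept(self, frame):
        length, pos = read_mb_u_int32(frame, 0)
        self.table += frame[pos:pos + length]
        return frame[pos + length:]

    def name_at(self, offset):
        end = self.table.index(0, offset)
        return self.table[offset:end].decode("utf-8")


enc, dec = SessionEncoder(), SessionDecoder()
first = enc.frame(enc.literal("presence"))   # ships "presence\0" once
second = enc.frame(enc.literal("presence"))  # reuses offset 0, empty delta
```

The first stanza that uses `presence` pays 9 bytes of table data plus a 1-byte length; later stanzas carry only an empty delta (one length byte) and the 2-byte LITERAL reference.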

> I think this is definitely worth considering for Jabber. It should allow us
> to make the stream data much, much smaller and considerably simplify
> parsing.
>

I really doubt this would simplify parsing, either in execution speed or in
lines of code. If you didn't "decompress" the binary format before passing it
into Jabber, supporting it would require substantial changes touching pretty
much every line of code. You would also need to retain much more per-session
state in order to convert to the binary format, especially with an 'evolving'
dictionary, and that would increase memory usage per user.

However, there is still plenty of merit in trying this. Someone providing IM to
a ton of users would probably love it, because it would greatly reduce their
bandwidth costs.

-David Waite