[JDEV] GZipping Jabber Messages

Michael F. March march at indirect.com
Sun Jan 6 00:06:58 CST 2002


After a longer session (about 20 minutes), I am now getting about 80%
compression in both directions.

> Doing compression with SSH I am getting about 70% compression
> outbound and 80% compression inbound.
>
> I have not investigated how OpenSSH implements compression on
> the TCP stream, though, so I am not sure how good a gauge this
> is.
>
> > Update. I am finding that you can get better compression ratios, up to
> > around 57%, by maintaining the LZ dictionary between packets. This also
> > reduces the processor cost per packet, which falls off asymptotically
> > (though it stays well above zero) as more packets are sent along.
> >
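To illustrate what maintaining the LZ dictionary between packets can look like, here is a minimal Python sketch using the standard zlib module. It is not the program from this thread; the sample packets, addresses, and function names are made up for illustration. It compares compressing each packet on its own against pushing all packets through one deflate stream that is sync-flushed after every packet.

```python
import zlib

def compress_independently(packets, level=9):
    """Compress each packet on its own; no history is shared between packets."""
    return [zlib.compress(p, level) for p in packets]

def compress_shared_stream(packets, level=9):
    """Compress all packets through ONE deflate stream.

    Z_SYNC_FLUSH emits everything buffered so far, so the receiver can
    decode each packet as it arrives, while the compressor keeps its
    sliding-window history for the packets that follow.
    """
    comp = zlib.compressobj(level)
    return [comp.compress(p) + comp.flush(zlib.Z_SYNC_FLUSH) for p in packets]

def decompress_shared_stream(chunks):
    """Inflate the chunks in order with one matching decompressor."""
    decomp = zlib.decompressobj()
    return [decomp.decompress(c) for c in chunks]

# Hypothetical sample traffic: 50 small, similar-looking <message/> packets.
packets = [
    b"<message to='a@example.org' from='b@example.org'>"
    b"<body>hello %d</body></message>" % i
    for i in range(50)
]

independent = compress_independently(packets)
shared = compress_shared_stream(packets)
restored = decompress_shared_stream(shared)

print("raw bytes:        ", sum(map(len, packets)))
print("per-packet bytes: ", sum(map(len, independent)))
print("shared-stream:    ", sum(map(len, shared)))
```

The shared stream only works if the receiver inflates every chunk, in order, with the same decompressor object, which is exactly the synchronization requirement discussed in this thread.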
> > This technique raises still other problems, though, most notably
> > reliability. For this to work the gzip deflater on one end and the
> > inflater on the other end must remain exactly in sync for the duration
> > of the connection (hours, days, ...). An error in the compressed stream
> > would be magnified many times over in the inflated stream. So for
> > reliability you had better hash or at least checksum all the data going
> > across. That means you have to have an envelope format.
> >
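As a rough sketch of what such an envelope might look like, the Python below prefixes each compressed packet with its length and a CRC-32 of the original packet. The 8-byte header layout and the wrap/unwrap names are assumptions made up for this example, not anything proposed in the thread or defined by Jabber.

```python
import struct
import zlib

# Hypothetical envelope header: payload length, then CRC-32 of the
# uncompressed packet, both as unsigned 32-bit big-endian integers.
HEADER = struct.Struct("!II")

def wrap(comp, packet):
    """Compress one packet on a shared stream and put it in an envelope."""
    payload = comp.compress(packet) + comp.flush(zlib.Z_SYNC_FLUSH)
    return HEADER.pack(len(payload), zlib.crc32(packet)) + payload

def unwrap(decomp, frame):
    """Check and inflate one envelope; raise ValueError if anything is off."""
    length, expected_crc = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    try:
        packet = decomp.decompress(payload)
    except zlib.error as exc:
        raise ValueError("corrupt deflate data") from exc
    if zlib.crc32(packet) != expected_crc:
        raise ValueError("checksum mismatch; the two ends are out of sync")
    return packet

comp = zlib.compressobj(9)
decomp = zlib.decompressobj()
frame = wrap(comp, b"<message to='a@example.org'><body>hi</body></message>")
print(unwrap(decomp, frame))
```

Once unwrap raises, the shared deflate state can no longer be trusted, so in this sketch the only sensible recovery would be to reset both ends (or drop the connection) rather than to skip the bad frame.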
> > So for bandwidth and processor usage, this does a lot better than I
> > expected compared to my original run, but now we are just a few steps
> > away (credential verification, key exchange, and stream encryption)
> > from re-doing SSL.
> >
> > -Mike
> >
> > ----- Forwarded by Michael F Lin/Cambridge/IBM on 01/05/2002 11:38 AM -----
> >
> > From:    Michael F Lin/Cambridge/IBM at IBMUS
> > To:      jdev at jabber.org
> > Date:    01/04/2002 09:26 PM
> > Subject: Re: [JDEV] GZipping Jabber Messages
> >
> > Hi Adam, I looked over some of the DotGNU mailing list archives at the
> > discussion you are referring to.
> >
> > One person from DotGNU says
> > ---
> > At the end of the day, it is easier to just gzip it and forget about
> > the problem.  No data loss, and roughly the same level of
> > compaction.  Highly redundant data like XML compresses
> > very well.  For example, the 6 Mb All.xml file for the C#
> > library specification compresses to ~630k using gzip: about
> > 10% of the original size.
> > ---
> > I believe this is misleading in the context of realtime XML streams
> > (e.g. Jabber; SOAP; presumably, whatever DotGNU will use) because you
> > are not compressing 6Mb of data at once. Rather you are compressing
> > small packets, a few hundred bytes in length in the case of Jabber, and
> > then transmitting them individually. I ran some tests to see how gzip
> > performs under these conditions.
> >
> > I wrote a program which generates random Jabber <message/> packets. The
> > body of each message is formed by randomly selecting between 1 and 25
> > words from a 10,000-word English language dictionary file. For each
> > test vector, the program runs zlib compress, level 9, on it (equivalent
> > [I think] to gzip with maximum compression), then records the
> > compressed size and the original size. It repeats this until at least
> > 1 million bytes of uncompressed data has been processed.
> >
> > The results from about a dozen runs of this program are very
> > consistent: a compression ratio of 17% in 7 seconds of runtime. A
> > typical result is 1,000,011 total bytes of raw data; 830,654 bytes of
> > compressed data.
> >
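The 17% figure corresponds to the space saved: 1 - 830,654 / 1,000,011 is roughly 0.169. For anyone who wants to repeat a test of this kind, here is a hedged Python sketch of the procedure described above. It is not the original program: the short WORDS list stands in for the 10,000-word dictionary file, the message template and addresses are invented, and the numbers it prints will differ from the ones reported here.

```python
import random
import zlib

# Stand-in vocabulary; the test described above used a 10,000-word
# English dictionary file, which is not reproduced here.
WORDS = ("server client stream packet message presence roster jabber "
         "compress bandwidth processor dictionary random english word "
         "hello thanks lunch meeting later today tomorrow maybe").split()

def random_message(rng):
    """Build one <message/> packet whose body has 1 to 25 random words."""
    body = " ".join(rng.choice(WORDS) for _ in range(rng.randint(1, 25)))
    xml = ("<message to='user%d@example.org' type='chat'>"
           "<body>%s</body></message>" % (rng.randrange(100), body))
    return xml.encode("utf-8")

def run(target_bytes=1_000_000, seed=0):
    """Compress packets one at a time until target_bytes of raw data are seen."""
    rng = random.Random(seed)
    raw = packed = 0
    while raw < target_bytes:
        packet = random_message(rng)
        raw += len(packet)
        packed += len(zlib.compress(packet, 9))  # each packet compressed alone
    return raw, packed

raw, packed = run(100_000)
saved = 100 * (1 - packed / raw)
print(f"{raw} raw bytes, {packed} compressed bytes, {saved:.1f}% saved")
```

Because the stand-in vocabulary is tiny, bodies repeat words far more often than they would with a real dictionary file, so this sketch is only useful for checking the method, not for reproducing the ratio.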
> > If I comment out the code to compress the test vectors, and leave the
> > code to generate the test vectors, the program runs in less than 1
> > second.
> >
> > [This was run on]
> > athena% uname -a
> > SunOS department-of-alchemy.mit.edu 5.8 Generic_108528-08 sun4u sparc
> > SUNW,Ultra-60
> > athena%
> >
> > Obviously these are preliminary and nonscientific results only, and
> > there are other factors to consider with Jabber, such as the likelihood
> > previously mentioned that the XML processing is going to be the
> > limiting factor in processor time. I find the topic quite interesting,
> > however, so I am going to fiddle around with it over the next few days
> > and see if I can get it to do better with custom deflate dictionaries
> > and such. Hopefully I will even find time to write something on the
> > topic and post it with my source code. However, based on these initial
> > results I am very wary of gzipping instant messaging XML because of the
> > apparent high processing cost and mediocre compression ratio. I will
> > continue to test but my hypothesis is that gzip or any generic
> > compression algorithm is going to be very mediocre for Jabber as
> > instant messaging.
> >
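One way to experiment with the custom deflate dictionaries mentioned above is zlib's preset-dictionary support, exposed in Python through the zdict argument. The sketch below is only an assumption about how such an experiment could be set up; the ZDICT contents and the helper names are invented for illustration, and both ends would have to agree on the exact same dictionary bytes in advance.

```python
import zlib

# Hypothetical preset dictionary: markup that is expected to show up in
# most packets. zlib recommends putting the most common strings last.
ZDICT = (b"<presence/><iq type='result'></iq>"
         b"<message to='' from='' type='chat'><body></body></message>")

def compress_with_dict(packet, level=9):
    """Compress one packet on its own, primed with the preset dictionary."""
    comp = zlib.compressobj(level, zdict=ZDICT)
    return comp.compress(packet) + comp.flush()

def decompress_with_dict(data):
    """Inflate a packet produced by compress_with_dict."""
    decomp = zlib.decompressobj(zdict=ZDICT)
    return decomp.decompress(data) + decomp.flush()

packet = (b"<message to='a@example.org' from='b@example.org' type='chat'>"
          b"<body>hi there</body></message>")
plain = zlib.compress(packet, 9)
primed = compress_with_dict(packet)

print(len(packet), len(plain), len(primed))
```

Unlike a shared stream, each packet here is still compressed independently, so a damaged packet does not affect the ones after it; the trade-off is that only the fixed markup in the dictionary benefits, not text repeated between packets.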
> > -Mike
> >
> > From:     Adam Theo <adamtheo at theoretic.com>
> > Sent by:  jdev-admin at jabber.org
> > To:       jdev <jdev at jabber.org>
> > Date:     01/04/2002 03:32 PM
> > Subject:  [JDEV] GZipping Jabber Messages
> > Please respond to jdev
> >
> > Hi, all. There's a good discussion going on over at the DotGNU Developer
> > list about gzip'ing the XML that is transmitted around on the DotGNU
> > platform.
> >
> > Was wondering if it would be possible to incorporate the same thing for
> > future versions of the Jabber server? Is it feasible, anyway? They are
> > saying the trade-offs for extra resource consumption would not be bad at
> > all if designed into the server properly, and would reduce bandwidth
> > very dramatically (by something like 80%, I think). This would be
> > useful for high-volume servers with enough processing power, I think...
> > --
> >     /\    -- Adam Theo, Age 22, Tallahassee FL USA --
> >    //\\   Theoretic Solutions (http://www.theoretic.com)
> >   /____\    "Software, Internet Services and Advocacy"
> > /--||--\ Personal Website (http://www.theoretic.com/adamtheo)
> >     ||    Jabber Open IM (http://www.jabber.org)
> >     ||    Email & Jabber: adamtheo at theoretic.com
> >     ||    AIM: AdamTheo2000   ICQ: 3617306   Y!: AdamTheo2
> >   "A free-market socialist computer geek patriotic American buddhist."
> >
> > _______________________________________________
> > jdev mailing list
> > jdev at jabber.org
> > http://mailman.jabber.org/listinfo/jdev
>



