[JDEV] GZipping Jabber Messages
Michael F. March
march at indirect.com
Sun Jan 6 13:13:43 CST 2002
More info:
I captured the XML from a 24-hour Jabber session. It came to
179601 bytes and compressed down to 6966 bytes, roughly 4% of
the original size.
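
For comparing the figures in this thread, here is a minimal sketch of the
arithmetic. The function name `savings` is only illustrative; the byte
counts are the ones quoted in the thread.

```python
def savings(raw_size: int, compressed_size: int) -> float:
    """Return the fraction of bytes removed by compression."""
    return 1.0 - compressed_size / raw_size

# 24-hour session capture, compressed as one stream
print(f"{savings(179601, 6966):.1%}")     # 96.1%
# Per-packet test quoted further down the thread
print(f"{savings(1000011, 830654):.1%}")  # 16.9%
```

The gap between the two numbers is the difference between compressing one
long stream and compressing each small packet on its own.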
>
> I was port forwarding a Jabber session...
>
>
> > What is the nature of the data you are transferring?
> >
> >
> > After doing a longer session (about 20 minutes), I am getting
> > about 80% in both directions now.
> >
> > > Doing compression with SSH I am getting about 70% compression
> > > outbound and 80% compression inbound.
> > >
> > > I have not investigated how OpenSSH implements compression on
> > > the TCP stream, though, so I am not sure how good a gauge this
> > > is.
> > >
> > > > > Update. I am finding that you can get better compression ratios, up to
> > > > > around 57%, by maintaining the LZ dictionary between packets. Also this
> > > > > reduces the processor hit asymptotically (but still quite nonzero) with
> > > > > more packets sent along.
> > > >
> > > > > This technique raises still other problems, though, most notably
> > > > > reliability. For this to work the gzip deflater on one end and the
> > > > > inflater on the other end must remain exactly in sync for the duration
> > > > > of the connection (hours, days, ...). An error in the compressed stream
> > > > > would be magnified many times over in the inflated stream. So for
> > > > > reliability you had better hash or at least checksum all the data going
> > > > > across. That means you have to have an envelope format.
> > > >
> > > > > So for bandwidth and processor usage, this does a lot better than I
> > > > > expected compared to my original run, but now we are just a few steps
> > > > > away (credential verification, key exchange, and stream encryption)
> > > > > from re-doing SSL.
> > > >
> > > > -Mike
> > > >
> > > > > ----- Forwarded by Michael F Lin/Cambridge/IBM on 01/05/2002 11:38 AM -----
> > > > >
> > > > > From: Michael F Lin/Cambridge/IBM at IBMUS
> > > > > To: jdev at jabber.org
> > > > > cc:
> > > > > Date: 01/04/2002 09:26 PM
> > > > > Subject: Re: [JDEV] GZipping Jabber Messages
> > > > >
> > > > > Hi Adam, I looked over some of the DotGNU mailing list archives at the
> > > > > discussion you are referring to.
> > > >
> > > > One person from DotGNU says
> > > > ---
> > > > At the end of the day, it is easier to just gzip it and forget about
> > > > the problem. No data loss, and roughly the same level of
> > > > compaction. Highly redundant data like XML compresses
> > > > very well. For example, the 6 Mb All.xml file for the C#
> > > > library specification compresses to ~630k using gzip: about
> > > > 10% of the original size.
> > > > ---
> > > > > I believe this is misleading in the context of realtime XML streams
> > > > > (e.g. Jabber; SOAP; presumably, whatever DotGNU will use) because you
> > > > > are not compressing 6Mb of data at once. Rather you are compressing
> > > > > small packets, a few hundred bytes in length in the case of Jabber, and
> > > > > then transmitting them individually. I ran some tests to see how gzip
> > > > > performs under these conditions.
> > > >
> > > > > I wrote a program which generates random Jabber <message/> packets. The
> > > > > body of each message is formed by randomly selecting between 1 and 25
> > > > > words from a 10,000-word English language dictionary file. For each test
> > > > > vector, the program runs zlib compress, level 9, on it (equivalent [I
> > > > > think] to gzip with maximum compression), then records the compressed
> > > > > size and the original size. It repeats this until at least 1 million
> > > > > bytes of uncompressed data has been processed.
> > > >
> > > > > The results from about a dozen runs of this program are very consistent:
> > > > > a compression ratio of 17% (the output is about 17% smaller than the
> > > > > input) in 7 seconds of runtime. A typical result is 1,000,011 total
> > > > > bytes of raw data; 830,654 bytes of compressed data.
> > > >
> > > > > If I comment out the code to compress the test vectors, and leave the
> > > > > code to generate the test vectors, the program runs in less than 1 second.
> > > >
> > > > [This was run on]
> > > > athena% uname -a
> > > > > SunOS department-of-alchemy.mit.edu 5.8 Generic_108528-08 sun4u sparc SUNW,Ultra-60
> > > > athena%
> > > >
> > > > > Obviously these are preliminary and nonscientific results only, and
> > > > > there are other factors to consider with Jabber, such as the likelihood
> > > > > previously mentioned that the XML processing is going to be the limiting
> > > > > factor in processor time. I find the topic quite interesting, however, so
> > > > > I am going to fiddle around with it over the next few days and see if I
> > > > > can get it to do better with custom deflate dictionaries and such.
> > > > > Hopefully I will even find time to write something on the topic and post
> > > > > it with my source code. However, based on these initial results I am very
> > > > > wary of gzipping instant messaging XML because of the apparent high
> > > > > processing cost and mediocre compression ratio. I will continue to test
> > > > > but my hypothesis is that gzip or any generic compression algorithm is
> > > > > going to be very mediocre for Jabber as instant messaging.
> > > >
> > > > -Mike
> > > >
> > > >
> > > >
> > > >
> > > > Adam Theo
> > > > <adamtheo at theoret To: jdev
> > > <jdev at jabber.org>
> > > > ic.com> cc:
> > > > Sent by: Subject: [JDEV]
> > GZipping
> > > Jabber Messages
> > > > jdev-admin at jabber
> > > > .org
> > > >
> > > >
> > > > 01/04/2002 03:32
> > > > PM
> > > > Please respond to
> > > > jdev
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > Hi, all. There's a good discussion going on over at the DotGNU Developer
> > > > > list about gzip'ing the XML that is transmitted around on the DotGNU
> > > > > platform.
> > > >
> > > > > Was wondering if it would be possible to incorporate the same thing for
> > > > > future versions of the Jabber server? Is it feasible, anyway? They are
> > > > > saying the trade-offs for extra resource consumption would not be bad at
> > > > > all if designed into the server properly, and would reduce bandwidth
> > > > > very dramatically (like by 80%, I think). This would be useful for
> > > > > high-volume servers with enough processing power, I think...
> > > > --
> > > > /\ -- Adam Theo, Age 22, Tallahassee FL USA --
> > > > //\\ Theoretic Solutions (http://www.theoretic.com)
> > > > /____\ "Software, Internet Services and Advocacy"
> > > > /--||--\ Personal Website (http://www.theoretic.com/adamtheo)
> > > > || Jabber Open IM (http://www.jabber.org)
> > > > || Email & Jabber: adamtheo at theoretic.com
> > > > || AIM: AdamTheo2000 ICQ: 3617306 Y!: AdamTheo2
> > > > "A free-market socialist computer geek patriotic American
buddhist."
> > > >
> > > > _______________________________________________
> > > > jdev mailing list
> > > > jdev at jabber.org
> > > > http://mailman.jabber.org/listinfo/jdev
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > _______________________________________________
> > > > jdev mailing list
> > > > jdev at jabber.org
> > > > http://mailman.jabber.org/listinfo/jdev
> > >
> >
> > _______________________________________________
> > jdev mailing list
> > jdev at jabber.org
> > http://mailman.jabber.org/listinfo/jdev
> >
> >
> >
> >
> > _______________________________________________
> > jdev mailing list
> > jdev at jabber.org
> > http://mailman.jabber.org/listinfo/jdev
>
> _______________________________________________
> jdev mailing list
> jdev at jabber.org
> http://mailman.jabber.org/listinfo/jdev
More information about the JDev
mailing list