[JDEV] GZipping Jabber Messages
Michael F Lin
MFLIN at us.ibm.com
Sun Jan 6 00:26:48 CST 2002
What is the nature of the data you are transferring?
-Mike
"Michael F. March" <march at indirect.com>
Sent by: jdev-admin at jabber.org
01/06/2002 01:06 AM
Please respond to jdev
To: <jdev at jabber.org>
Subject: Re: [JDEV] GZipping Jabber Messages
After doing a longer session (about 20 minutes), I am getting
about 80% in both directions now..
> Doing compression with SSH I am getting about 70% compression
> outbound and 80% compression inbound..
>
> I have not investigated how OpenSSH implements compression on
> the TCP stream, though, so I am not sure how good a gauge this
> is..
>
> > Update. I am finding that you can get better compression ratios, up to
> > around 57%, by maintaining the LZ dictionary between packets. Also this
> > reduces the processor hit asymptotically (but still quite nonzero) with
> > more packets sent along.
> >
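[Illustration, not from the original thread.] The difference between compressing each packet on its own and keeping the LZ dictionary alive across packets can be sketched with Python's zlib module. The stanza shape, the example.org addresses, and the helper name make_packet are assumptions made up for this sketch; it is not the program used for the numbers quoted here.

```python
import zlib


def make_packet(i):
    # Hypothetical stanza shape, for illustration only.
    return ('<message to="user%d@example.org" type="chat">'
            '<body>hello number %d</body></message>' % (i % 5, i)).encode("utf-8")


packets = [make_packet(i) for i in range(200)]
raw_total = sum(len(p) for p in packets)

# Independent compression: every packet starts with an empty dictionary.
independent_total = sum(len(zlib.compress(p, 9)) for p in packets)

# Shared stream: one deflater and one inflater live for the whole connection.
deflater = zlib.compressobj(9)
inflater = zlib.decompressobj()
stream_total = 0
for p in packets:
    # Z_SYNC_FLUSH pushes out everything buffered so far, so the peer can
    # decode this packet right away, while the dictionary is kept.
    chunk = deflater.compress(p) + deflater.flush(zlib.Z_SYNC_FLUSH)
    stream_total += len(chunk)
    assert inflater.decompress(chunk) == p

print("raw=%d independent=%d shared-stream=%d"
      % (raw_total, independent_total, stream_total))
```

The synthetic packets here are far more repetitive than real chat traffic, so the absolute savings will look better than anything measured on real messages; only the direction of the comparison is meant to carry over.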
> > This technique raises still other problems, though, most notably
> > reliability. For this to work the gzip deflater on one end and the
> > inflater on the other end must remain exactly in sync for the duration
> > of the connection (hours, days, ...). An error in the compressed stream
> > would be magnified many times over in the inflated stream. So for
> > reliability you had better hash or at least checksum all the data going
> > across. That means you have to have an envelope format.
> >
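[Illustration, not from the original thread.] One possible shape for such an envelope is a fixed header carrying the compressed length and a CRC-32 of the uncompressed payload, followed by the deflate output. The header layout and the names HEADER, wrap, and unwrap are assumptions invented for this sketch; no such format is defined in the thread.

```python
import struct
import zlib

# Hypothetical envelope: compressed length, then CRC-32 of the plaintext,
# both as unsigned 32-bit integers in network byte order.
HEADER = struct.Struct("!II")


def wrap(deflater, payload):
    """Compress one packet on the shared stream and prepend the header."""
    body = deflater.compress(payload) + deflater.flush(zlib.Z_SYNC_FLUSH)
    return HEADER.pack(len(body), zlib.crc32(payload) & 0xFFFFFFFF) + body


def unwrap(inflater, frame):
    """Inflate one frame and verify the checksum of the result."""
    length, crc = HEADER.unpack_from(frame)
    body = frame[HEADER.size:HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated frame")
    payload = inflater.decompress(body)
    if zlib.crc32(payload) & 0xFFFFFFFF != crc:
        raise ValueError("checksum mismatch: streams are out of sync")
    return payload
```

A receiver that sees the checksum fail cannot simply skip the frame, because its inflater state no longer matches the sender's; in practice both ends would have to reset the stream or drop the connection.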
> > So for bandwidth and processor usage, this does a lot better than I
> > expected compared to my original run, but now we are just a few steps
> > away (credential verification, key exchange, and stream encryption)
> > from re-doing SSL.
> >
> > -Mike
> >
> > ----- Forwarded by Michael F Lin/Cambridge/IBM on 01/05/2002 11:38 AM -----
> >
> > From: Michael F Lin/Cambridge/IBM at IBMUS
> > To: jdev at jabber.org
> > Date: 01/04/2002 09:26 PM
> > Subject: Re: [JDEV] GZipping Jabber Messages
> >
> > Hi Adam, I looked over some of the DotGNU mailing list archives at the
> > discussion you are referring to.
> >
> > One person from DotGNU says
> > ---
> > At the end of the day, it is easier to just gzip it and forget about
> > the problem. No data loss, and roughly the same level of
> > compaction. Highly redundant data like XML compresses
> > very well. For example, the 6 Mb All.xml file for the C#
> > library specification compresses to ~630k using gzip: about
> > 10% of the original size.
> > ---
> > I believe this is misleading in the context of realtime XML streams
> > (e.g. Jabber; SOAP; presumably, whatever DotGNU will use) because you
> > are not compressing 6 Mb of data at once. Rather you are compressing
> > small packets, a few hundred bytes in length in the case of Jabber, and
> > then transmitting them individually. I ran some tests to see how gzip
> > performs under these conditions.
> >
> > I wrote a program which generates random Jabber <message/> packets. The
> > body of each message is formed by randomly selecting between 1 and 25
> > words from a 10,000-word English language dictionary file. For each
> > test vector, the program runs zlib compress, level 9, on it (equivalent
> > [I think] to gzip with maximum compression), then records the
> > compressed size and the original size. It repeats this until at least
> > 1 million bytes of uncompressed data has been processed.
> >
> > The results from about a dozen runs of this program are very
> > consistent: a size reduction of about 17% in 7 seconds of runtime. A
> > typical result is 1,000,011 total bytes of raw data; 830,654 bytes of
> > compressed data.
> >
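[Illustration, not from the original thread.] The experiment described above can be approximated in a few lines of Python. This is a reconstruction under assumptions, not the original program: the short WORDS list stands in for the 10,000-word dictionary file, the stanza template and address are invented, and the default target is 100,000 bytes rather than 1 million to keep the run short.

```python
import random
import zlib

# Stand-in vocabulary; the original used a 10,000-word dictionary file.
WORDS = ("time person year way day thing man world life hand part child "
         "eye woman place work week case point government company number "
         "group problem fact be have do say get make go know take see come "
         "think look want give use find tell ask seem feel try leave call").split()


def random_message(rng):
    # Body is 1 to 25 randomly chosen words, as in the description above.
    body = " ".join(rng.choice(WORDS) for _ in range(rng.randint(1, 25)))
    stanza = ('<message to="friend@example.org" type="chat">'
              '<body>%s</body></message>' % body)
    return stanza.encode("ascii")


def run(target_bytes=100_000, seed=0):
    """Compress packets one at a time until target_bytes of raw data is seen."""
    rng = random.Random(seed)
    raw = packed = 0
    while raw < target_bytes:
        packet = random_message(rng)
        raw += len(packet)
        packed += len(zlib.compress(packet, 9))  # fresh dictionary per packet
    return raw, packed


raw, packed = run()
print("raw=%d compressed=%d saved=%.1f%%"
      % (raw, packed, 100.0 * (1 - packed / raw)))
```

Because the vocabulary is tiny and the template is fixed, the percentage printed by this sketch should not be compared directly with the 17% figure reported above.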
> > If I comment out the code that compresses the test vectors, and leave
> > the code that generates them, the program runs in less than 1 second.
> >
> > [This was run on]
> > athena% uname -a
> > SunOS department-of-alchemy.mit.edu 5.8 Generic_108528-08 sun4u sparc
> > SUNW,Ultra-60
> > athena%
> >
> > Obviously these are preliminary and nonscientific results only, and
> > there are other factors to consider with Jabber, such as the likelihood
> > previously mentioned that the XML processing is going to be the
> > limiting factor in processor time. I find the topic quite interesting,
> > however, so I am going to fiddle around with it over the next few days
> > and see if I can get it to do better with custom deflate dictionaries
> > and such. Hopefully I will even find time to write something on the
> > topic and post it with my source code. However, based on these initial
> > results I am very wary of gzipping instant messaging XML because of the
> > apparent high processing cost and mediocre compression ratio. I will
> > continue to test but my hypothesis is that gzip or any generic
> > compression algorithm is going to be very mediocre for Jabber as
> > instant messaging.
> >
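[Illustration, not from the original thread.] A custom deflate dictionary can be tried from Python through the zdict argument of zlib.compressobj and zlib.decompressobj. The PRESET contents below are a guess at strings likely to recur in stanzas, chosen only for this sketch; both ends must hold exactly the same dictionary bytes.

```python
import zlib

# Hypothetical preset dictionary: fragments expected to recur in stanzas.
PRESET = (b'<message to="" from="" type="chat"><body></body></message>'
          b'<presence/><iq type="get" id=""></iq>@example.org')


def deflate_with_dict(packet):
    # Each packet is still compressed independently, but the deflater
    # starts with PRESET already in its window.
    c = zlib.compressobj(level=9, zdict=PRESET)
    return c.compress(packet) + c.flush()


def inflate_with_dict(data):
    d = zlib.decompressobj(zdict=PRESET)
    return d.decompress(data) + d.flush()


packet = (b'<message to="friend@example.org" type="chat">'
          b'<body>lunch?</body></message>')
plain = zlib.compress(packet, 9)
primed = deflate_with_dict(packet)
assert inflate_with_dict(primed) == packet
print("raw=%d plain=%d primed=%d" % (len(packet), len(plain), len(primed)))
```

Unlike a long-lived shared stream, this keeps packets independent of each other, so a corrupted packet does not desynchronize later ones; the cost is that the dictionary has to be agreed on in advance.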
> > -Mike
> >
> > Adam Theo <adamtheo at theoretic.com>
> > Sent by: jdev-admin at jabber.org
> > 01/04/2002 03:32 PM
> > Please respond to jdev
> > To: jdev <jdev at jabber.org>
> > Subject: [JDEV] GZipping Jabber Messages
> >
> > Hi, all. There's a good discussion going on over at the DotGNU
> > Developer list about gzip'ing the XML that is transmitted around on the
> > DotGNU platform.
> >
> > I was wondering if it would be possible to incorporate the same thing
> > in future versions of the Jabber server? Is it feasible, anyway? They
> > are saying the trade-offs for extra resource consumption would not be
> > bad at all if designed into the server properly, and would reduce
> > bandwidth very dramatically (like by 80%, I think). This would be
> > useful for high-volume servers with enough processing power, I think...
> > --
> > /\ -- Adam Theo, Age 22, Tallahassee FL USA --
> > //\\ Theoretic Solutions (http://www.theoretic.com)
> > /____\ "Software, Internet Services and Advocacy"
> > /--||--\ Personal Website (http://www.theoretic.com/adamtheo)
> > || Jabber Open IM (http://www.jabber.org)
> > || Email & Jabber: adamtheo at theoretic.com
> > || AIM: AdamTheo2000 ICQ: 3617306 Y!: AdamTheo2
> > "A free-market socialist computer geek patriotic American buddhist."
> >
> > _______________________________________________
> > jdev mailing list
> > jdev at jabber.org
> > http://mailman.jabber.org/listinfo/jdev