[JDEV] GZipping Jabber Messages
mlin at mlin.net
Sun Jan 6 16:53:00 CST 2002
OK, I think this explains quite a bit, because even the uncompressed
bandwidth usage (about 200 KB in 24 hours) is essentially negligible. At
this rate any appreciable amount of server bandwidth would have the
capacity for many millions of connections, and other factors (such as
kernel limitations, XML parsing, memory constraints) will limit server
capacity long before bandwidth becomes an issue. Therefore, adding
compression to a heavily strained server will actually decrease its
capacity, because internal resources (such as CPU time and memory) will be
spent to save bandwidth, which is plentiful.
From the cost perspective, at this rate of transfer, the cost of
bandwidth per user is also negligible. Consider that if bandwidth costs
$10/GB (this is a number from a web hosting provider, and is probably much
higher than one pays for an actual pipe), then supporting one million
concurrent users each transferring 200 KB in 24 hours costs $2,000 per
day, or 0.2 cents per user per day. Certainly this figure decreases if
your bandwidth usage decreases, but either number is negligible when
compared to the secondary costs of supporting that many users.
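
The cost estimate above can be checked with a few lines of Python. This is
only a sketch of the arithmetic; it assumes decimal units (1 KB = 1,000
bytes, 1 GB = 1,000,000,000 bytes), and the function name is made up for
illustration.

```python
# Assumed inputs, taken from the estimate in the message above.
USERS = 1_000_000                 # concurrent users
BYTES_PER_USER_PER_DAY = 200_000  # roughly 200 KB in 24 hours
PRICE_PER_GB = 10.0               # dollars, web-hosting price quoted above


def daily_bandwidth_cost(users, bytes_per_user, price_per_gb):
    """Total dollars per day for the given traffic (decimal gigabytes)."""
    total_gb = users * bytes_per_user / 1e9
    return total_gb * price_per_gb


total = daily_bandwidth_cost(USERS, BYTES_PER_USER_PER_DAY, PRICE_PER_GB)
per_user_cents = total / USERS * 100

print(f"total: ${total:,.0f} per day")
print(f"per user: {per_user_cents:.2f} cents per day")
```

Changing BYTES_PER_USER_PER_DAY to a compressed figure shows how little the
total moves in absolute terms.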
The questions, then, are: (1) under what conditions is the bandwidth usage
for a client connection non-negligible? and (2) can you achieve the same
high compression ratios under these conditions?
I hypothesize that the answer to question (1) will imply that the data
being exchanged with the client is far less repetitive, and thus far less
compressible, than the 200 KB that crossed in 24 hours, and so the answer
to (2) will be no. But I will have to look into it further.
-Mike
"Michael F. March" <march at indirect.com>
Sent by: jdev-admin at jabber.org
01/06/2002 02:13 PM
Please respond to jdev
To: <jdev at jabber.org>
cc:
Subject: Re: [JDEV] GZipping Jabber Messages
More info:
I captured the XML from a 24-hour Jabber session: it was 179,601 bytes
and compressed down to 6,966 bytes.
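
The percentages quoted in this thread all appear to mean "fraction of the
original size removed", although the thread never says so explicitly. Under
that assumption, a small sketch puts the 24-hour capture and the per-packet
zlib test reported further down on the same scale: about 96% versus about
17%. The helper name is hypothetical.

```python
def space_saving(original_bytes, compressed_bytes):
    """Fraction of the original size removed by compression."""
    return 1.0 - compressed_bytes / original_bytes


# 24-hour capture reported in this message.
session = space_saving(179_601, 6_966)
# Per-packet zlib test reported earlier in the thread.
per_packet = space_saving(1_000_011, 830_654)

print(f"24-hour session: {session:.1%} smaller")
print(f"per-packet test: {per_packet:.1%} smaller")
```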
>
> I was port forwarding a Jabber session...
>
>
> > What is the nature of the data you are transferring?
> >
> >
> > After doing a longer session (about 20 minutes), I am getting
> > about 80% in both directions now..
> >
> > > Doing compression with SSH I am getting about 70% compression
> > > outbound and 80% compression inbound..
> > >
> > > I have not investigated how OpenSSH implements compression on
> > > the TCP stream though, so I am not sure how good a gauge this
> > > is.
> > >
> > > > Update. I am finding that you can get better compression ratios, up
> > > > to around 57%, by maintaining the LZ dictionary between packets. Also
> > > > this reduces the processor hit asymptotically (but still quite
> > > > nonzero) with more packets sent along.
> > > >
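
To illustrate what "maintaining the LZ dictionary between packets" can look
like, here is a minimal Python sketch. It is not the program used for the
numbers in this thread. It keeps one zlib compressor alive for the whole
connection and ends each packet with a sync flush, so the receiver can
decode every packet as it arrives. The packet contents and the
example.org addresses are invented for illustration.

```python
import zlib


def make_packets(n):
    """Build n small, similar-looking <message/> stanzas (made-up content)."""
    return [
        (f"<message to='user{i % 5}@example.org' type='chat'>"
         f"<body>hello number {i}</body></message>").encode()
        for i in range(n)
    ]


def per_packet_size(packets):
    """Compress every packet on its own; no state is shared between them."""
    return sum(len(zlib.compress(p, 9)) for p in packets)


def streamed_size(packets):
    """Compress all packets with one long-lived compressor."""
    comp = zlib.compressobj(9)
    decomp = zlib.decompressobj()
    total = 0
    for p in packets:
        # Z_SYNC_FLUSH emits everything buffered so far but keeps the
        # dictionary, so later packets can refer back to earlier ones.
        chunk = comp.compress(p) + comp.flush(zlib.Z_SYNC_FLUSH)
        # The peer must hold a matching decompressor for the whole session.
        assert decomp.decompress(chunk) == p
        total += len(chunk)
    return total


packets = make_packets(200)
raw = sum(len(p) for p in packets)
print("raw bytes:       ", raw)
print("per-packet bytes:", per_packet_size(packets))
print("streamed bytes:  ", streamed_size(packets))
```

The two ends stay usable only as long as the compressor and decompressor see
exactly the same byte stream.
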
> > > > This technique raises still other problems, though, most notably
> > > > reliability. For this to work the gzip deflater on one end and the
> > > > inflater on the other end must remain exactly in sync for the
> > > > duration of the connection (hours, days, ...). An error in the
> > > > compressed stream would be magnified many times over in the inflated
> > > > stream. So for reliability you had better hash or at least checksum
> > > > all the data going across. That means you have to have an envelope
> > > > format.
> > > >
> > > > So for bandwidth and processor usage, this does a lot better than I
> > > > expected compared to my original run, but now we are just a few steps
> > > > away (credential verification, key exchange, and stream encryption)
> > > > from re-doing SSL.
> > > >
> > > > -Mike
> > > >
> > > > ----- Forwarded by Michael F Lin/Cambridge/IBM on 01/05/2002 11:38 AM -----
> > > >
> > > > Michael F Lin/Cambridge/IBM at IBMUS
> > > > 01/04/2002 09:26 PM
> > > > To: jdev at jabber.org
> > > > cc:
> > > > Subject: Re: [JDEV] GZipping Jabber Messages
> > > >
> > > > Hi Adam, I looked over some of the DotGNU mailing list archives at
> > > > the discussion you are referring to.
> > > >
> > > > One person from DotGNU says
> > > > ---
> > > > At the end of the day, it is easier to just gzip it and forget about
> > > > the problem. No data loss, and roughly the same level of
> > > > compaction. Highly redundant data like XML compresses
> > > > very well. For example, the 6 Mb All.xml file for the C#
> > > > library specification compresses to ~630k using gzip: about
> > > > 10% of the original size.
> > > > ---
> > > > I believe this is misleading in the context of realtime XML streams
> > > > (e.g. Jabber; SOAP; presumably, whatever DotGNU will use) because you
> > > > are not compressing 6Mb of data at once. Rather you are compressing
> > > > small packets, a few hundred bytes in length in the case of Jabber,
> > > > and then transmitting them individually. I ran some tests to see how
> > > > gzip performs under these conditions.
> > > >
> > > > I wrote a program which generates random Jabber <message/> packets.
> > > > The body of each message is formed by randomly selecting between 1
> > > > and 25 words from a 10,000-word English language dictionary file. For
> > > > each test vector, the program runs zlib compress, level 9, on it
> > > > (equivalent [I think] to gzip with maximum compression), then records
> > > > the compressed size and the original size. It repeats this until at
> > > > least 1 million bytes of uncompressed data has been processed.
> > > >
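
The test described above could be sketched in Python roughly as follows.
This is not the original program, whose source is not in this thread. As an
assumption, the 10,000-word English dictionary file is replaced by a
synthetic list of random lowercase "words", so the sizes it reports will not
match the figures quoted here. The stanza layout and the example.org address
are also invented.

```python
import random
import zlib


def make_word_list(rng, count=10_000):
    """Stand-in for the dictionary file: random lowercase strings."""
    letters = "abcdefghijklmnopqrstuvwxyz"
    return [
        "".join(rng.choice(letters) for _ in range(rng.randint(3, 10)))
        for _ in range(count)
    ]


def random_message(rng, words):
    """One <message/> packet whose body holds between 1 and 25 words."""
    body = " ".join(rng.choice(words) for _ in range(rng.randint(1, 25)))
    return (f"<message to='someone@example.org' type='chat'>"
            f"<body>{body}</body></message>").encode()


def run(target_bytes=1_000_000, seed=0):
    """Compress packets one at a time until target_bytes of raw data."""
    rng = random.Random(seed)
    words = make_word_list(rng)
    raw = packed = 0
    while raw < target_bytes:
        packet = random_message(rng, words)
        raw += len(packet)
        # Level 9, each packet compressed independently of the others.
        packed += len(zlib.compress(packet, 9))
    return raw, packed


raw_total, packed_total = run(target_bytes=100_000)
print("raw bytes:       ", raw_total)
print("compressed bytes:", packed_total)
```
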
> > > > The results from about a dozen runs of this program are very
> > > > consistent: a compression ratio of 17% in 7 seconds of runtime. A
> > > > typical result is 1,000,011 total bytes of raw data; 830,654 bytes of
> > > > compressed data.
> > > >
> > > > If I comment out the code to compress the test vectors, and leave the
> > > > code to generate the test vectors, the program runs in less than 1
> > > > second.
> > > >
> > > > [This was run on]
> > > > athena% uname -a
> > > > SunOS department-of-alchemy.mit.edu 5.8 Generic_108528-08 sun4u sparc SUNW,Ultra-60
> > > > athena%
> > > >
> > > > Obviously these are preliminary and nonscientific results only, and
> > > > there are other factors to consider with Jabber, such as the
> > > > likelihood previously mentioned that the XML processing is going to
> > > > be the limiting factor in processor time. I find the topic quite
> > > > interesting, however, so I am going to fiddle around with it over the
> > > > next few days and see if I can get it to do better with custom
> > > > deflate dictionaries and such. Hopefully I will even find time to
> > > > write something on the topic and post it with my source code.
> > > > However, based on these initial results I am very wary of gzipping
> > > > instant messaging XML because of the apparent high processing cost
> > > > and mediocre compression ratio. I will continue to test but my
> > > > hypothesis is that gzip or any generic compression algorithm is going
> > > > to be very mediocre for Jabber as instant messaging.
> > > >
> > > > -Mike
> > > >
> > > > Adam Theo <adamtheo at theoretic.com>
> > > > Sent by: jdev-admin at jabber.org
> > > > 01/04/2002 03:32 PM
> > > > Please respond to jdev
> > > > To: jdev <jdev at jabber.org>
> > > > cc:
> > > > Subject: [JDEV] GZipping Jabber Messages
> > > >
> > > > Hi, all. There's a good discussion going on over at the DotGNU
> > > > Developer list about gzip'ing the XML that is transmitted around on
> > > > the DotGNU platform.
> > > >
> > > > Was wondering if it would be possible to incorporate the same thing
> > > > for future versions of the Jabber server? Is it feasible, anyway?
> > > > They are saying the trade-offs for extra resource consumption would
> > > > not be bad at all if designed into the server properly, and would
> > > > reduce bandwidth very dramatically (like by 80%, i think). This would
> > > > be useful for high-volume servers with enough processing power, i
> > > > think...
> > > > --
> > > > /\ -- Adam Theo, Age 22, Tallahassee FL USA --
> > > > //\\ Theoretic Solutions (http://www.theoretic.com)
> > > > /____\ "Software, Internet Services and Advocacy"
> > > > /--||--\ Personal Website (http://www.theoretic.com/adamtheo)
> > > > || Jabber Open IM (http://www.jabber.org)
> > > > || Email & Jabber: adamtheo at theoretic.com
> > > > || AIM: AdamTheo2000 ICQ: 3617306 Y!: AdamTheo2
> > > > "A free-market socialist computer geek patriotic American buddhist."
> > > >
> > > > _______________________________________________
> > > > jdev mailing list
> > > > jdev at jabber.org
> > > > http://mailman.jabber.org/listinfo/jdev