[JDEV] jabberd segfault under load

David Waite dwaite at jabber.com
Tue Nov 7 10:06:46 CST 2000


I don't know much about tuning machines for Jabber, but I at least know the
right questions to ask on behalf of those who do:

What kernel are you using? 2.4.x is *highly* recommended for this type of test
:)

Also, what does the traffic look like:
  How many messages are you sending a second?
  Do your test users have subscriptions to one another? If so, how many
subscriptions per user on average?
  Do the users go available (sending presence to one another)?
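
For the first question, a rough estimate is possible from the original report, which mentions one message every 10 seconds per simulated user. The sketch below only does that arithmetic; the function name is illustrative, and it ignores presence traffic and any per-message routing overhead inside the server.

```python
def messages_per_second(users, seconds_between_messages):
    """Aggregate send rate if every user sends one message per interval."""
    return users / seconds_between_messages

# User counts mentioned in the thread, with the 10-second interval from the report.
for users in (100, 1000, 1700, 10_000):
    rate = messages_per_second(users, 10)
    print(f"{users} users -> {rate:.0f} msg/s")
```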

I imagine you are using jpolld since you got above ~1020 clients :)
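
The ~1020 figure presumably reflects the common default of 1024 file descriptors per process (which is also the usual FD_SETSIZE for select()), minus a few descriptors for stdio, logs, and listening sockets. A minimal sketch, assuming a Unix-like system, for checking that limit; the helper name and the reserve of 4 descriptors are assumptions made for illustration.

```python
import resource

def max_client_sockets(soft_limit, reserved=4):
    """Descriptors left for client sockets after reserving a few
    (assumed: stdin, stdout, stderr, and one listening socket)."""
    if soft_limit == resource.RLIM_INFINITY:
        return None  # no per-process cap configured
    return max(soft_limit - reserved, 0)

# Query the per-process open-file limits of the current process.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print("soft limit:", soft, "hard limit:", hard)
print("approx. client sockets available:", max_client_sockets(soft))
```

Raising the soft limit (for example with `ulimit -n` before starting the server) does not help a select()-based loop beyond FD_SETSIZE, which is one reason to spread connections over several processes.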

Assuming that you are using a 2.4.x kernel, it almost looks like it is overflowing
the TCP buffers - there is a fixed number of them in the system, shared among all
sockets (because the old pre-2.4.x scheme of 8-16k kernel-side per socket was
really silly).

It is very possible that the server is getting overworked, causing it to simply
choke on the volume of messages being sent. Messages are queued when there isn't
enough processor time, so the queues grow, the machine will swap HARD, and it will
just get worse. I am also unsure of the server-side memory requirements, but I
believe there is at the very least an 8k buffer per user if karma is on - and
8k * 10,000 users is roughly 80MB, which is going to go over your available RAM :)
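
The buffer estimate can be checked with a quick calculation. This is a back-of-the-envelope sketch, assuming 8 KiB per user (the figure guessed at above, not a measured value) and the 64MB server described in the original report; the names are illustrative.

```python
BUFFER_BYTES_PER_USER = 8 * 1024        # assumed per-user buffer with karma on
SERVER_RAM_BYTES = 64 * 1024 * 1024     # the 64MB P3-500 from the report

def buffer_memory(users, per_user=BUFFER_BYTES_PER_USER):
    """Total bytes needed if every connected user holds one buffer."""
    return users * per_user

total = buffer_memory(10_000)
print(f"{total / 2**20:.1f} MiB of buffers vs "
      f"{SERVER_RAM_BYTES / 2**20:.0f} MiB of RAM")
print("fits in RAM:", total <= SERVER_RAM_BYTES)
```

This counts only the per-user buffers; the server process, the kernel, and any queued messages would come on top of that.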

-David Waite

Dennis Noordsij wrote:

> Hi,
>
> I have been doing some tests with jabber 1.2, using the binaries available
> because the CVS doesn't compile for me (some error in jabberd.h).
>
> I am running the simulator from the CVS to simulate a client load to a
> separate machine on the 100mbps network.
>
> The simulator runs on my laptop (P3-700, 128MB ram), the jabberd server on a
> dedicated machine (no X etc), a P3-500 with 64MB.
>
> I set up the simulator to send a message only once every 10 seconds, and
> connect every second.
>
> After about 100 users top claims that jabberd is taking up 90% of the CPU,
> network load is quite low, maybe 10-20 k/sec. I have been able to connect
> 1000 users to jabberd on my own machine (using localhost, bumped up the
> ip_local_port_range and fs/inode-max and fs/file-max), this time I expected
> more over the network. After 1700 users the jabberd server segfaulted, so I
> tried again logging everything, this time it segfaulted after 1021 users.
>
> I have attached the last 50 or so lines from the jabberd -D output, it is a
> stock 1.2 jabber server, no agents, standard spooling, etc.
>
> Anyone who can help me out? :)
>
> Regards
> Dennis
>
> PS - I think the load is quite high for so few users. I imagine that an
> IRC server would use more resources per client and still it handles many more
> clients, although you have reported jabberd to handle 20,000-40,000
> connections. What can I reasonably expect with a Linux system? It appears raw
> CPU power is much more important than memory, still I expect 10,000 clients
> on a P3 system :-)
>
> --------- START "jabberd -D" OUTPUT ---------------------
>
> Tue Nov  7 13:49:36 2000  deliver.c:344 delivering to instance 'sessions'
> Tue Nov  7 13:49:36 2000  deliver.c:84 (80B6238)incoming packet <route
> to='f0360 at 194.100.32.65/89DBD18' from='381 at c2s/89A76A0'><message id='360'
> to='f0139 at 194.100.32.65'><thread>asdf</thread><subject/><body>This is a long,
>         multiline message.</body></message></route>
> Tue Nov  7 13:49:36 2000  users.c:147
> js_user(f0360 at 194.100.32.65/89DBD18,8124428)
> Tue Nov  7 13:49:36 2000  mtqoverflow 8190 overflowing B7F7A80
> Tue Nov  7 13:49:36 2000  io_select.c:105 WRITE 381 len -1 of <message
> id='804' to='f0360 at 194.100.32.65'
> from='f0804 at 194.100.32.65/r973595128'><thread>asdf</thread><subject/><body>This
> is a long,         multiline message.</body></message>
>
> Tue Nov  7 13:49:36 2000  deliver.c:472 DELIVER 4:194.100.32.65 <route
> to='f0359 at 194.100.32.65/8940F20' from='380 at c2s/89BC768'><message id='359'
> to='f0534 at 194.100.32.65'><thread>asdf</thread><subject/><body>This is a long,
>         multiline message.</body></message></route>
> Tue Nov  7 13:49:36 2000  deliver.c:344 delivering to instance 'sessions'
> Tue Nov  7 13:49:36 2000  deliver.c:84 (80B6238)incoming packet <route
> to='f0359 at 194.100.32.65/8940F20' from='380 at c2s/89BC768'><message id='359'
> to='f0534 at 194.100.32.65'><thread>asdf</thread><subject/><body>This is a long,
>         multiline message.</body></message></route>
> Tue Nov  7 13:49:36 2000  users.c:147
> js_user(f0359 at 194.100.32.65/8940F20,8124428)
> Tue Nov  7 13:49:37 2000  mtqoverflow 8191 overflowing B7FA290
> Tue Nov  7 13:49:37 2000  deliver.c:472 DELIVER 4:194.100.32.65 <route
> to='f0359 at 194.100.32.65/8940F20' from='380 at c2s/89BC768'><message id='359'
> to='f0397 at 194.100.32.65'><thread>asdf</thread><subject/><body>This is another
> short message!</body></message></route>
> Tue Nov  7 13:49:37 2000  deliver.c:344 delivering to instance 'sessions'
> Tue Nov  7 13:49:37 2000  deliver.c:84 (80B6238)incoming packet <route
> to='f0359 at 194.100.32.65/8940F20' from='380 at c2s/89BC768'><message id='359'
> to='f0397 at 194.100.32.65'><thread>asdf</thread><subject/><body>This is another
> short message!</body></message></route>
> Tue Nov  7 13:49:37 2000  users.c:147
> js_user(f0359 at 194.100.32.65/8940F20,8124428)
> Tue Nov  7 13:49:37 2000  mtqoverflow 8192 overflowing B7FB648
> Tue Nov  7 13:49:37 2000  deliver.c:472 DELIVER 4:194.100.32.65 <route
> to='f0359 at 194.100.32.65/8940F20' from='380 at c2s/89BC768'><message id='359'
> to='f0969 at 194.100.32.65'><thread>asdf</thread><subject/><body>How are
> you?</body></message></route>
> Tue Nov  7 13:49:37 2000  deliver.c:344 delivering to instance 'sessions'
> Tue Nov  7 13:49:37 2000  deliver.c:84 (80B6238)incoming packet <route
> to='f0359 at 194.100.32.65/8940F20' from='380 at c2s/89BC768'><message id='359'
> to='f0969 at 194.100.32.65'><thread>asdf</thread><subject/><body>How are
> you?</body></message></route>
> Tue Nov  7 13:49:37 2000  users.c:147
> js_user(f0359 at 194.100.32.65/8940F20,8124428)
> Tue Nov  7 13:49:37 2000  mtqoverflow 8193 overflowing B7FCA00
> Tue Nov  7 13:49:37 2000  deliver.c:472 DELIVER 4:194.100.32.65 <route
> to='f0359 at 194.100.32.65/8940F20' from='380 at c2s/89BC768'><message id='359'
> to='f0225 at 194.100.32.65'><thread>asdf</thread><subject/><body>This is a long,
>         multiline message.</body></message></route>
> Tue Nov  7 13:49:37 2000  deliver.c:344 delivering to instance 'sessions'
> Tue Nov  7 13:49:37 2000  deliver.c:84 (80B6238)incoming packet <route
> to='f0359 at 194.100.32.65/8940F20' from='380 at c2s/89BC768'><message id='359'
> to='f0225 at 194.100.32.65'><thread>asdf</thread><subject/><body>This is a long,
>         multiline message.</body></message></route>
> Tue Nov  7 13:49:37 2000  users.c:147
> js_user(f0359 at 194.100.32.65/8940F20,8124428)
> Tue Nov  7 13:49:37 2000  mtqoverflow 8194 overflowing B7FDEA8
> Tue Nov  7 13:49:37 2000  io_select.c:105 WRITE 380 len 225 of <message
> id='359' to='f0359 at 194.100.32.65/r973595128' from='f2480 at 194.100.32.65'
> type='error'><thread>asdf</thread><subject/><body>This is a long,
> multiline message.</body><error code='404'>Not Fou
>
> segfault
>
> ----------------------------------------------------
>
> What I noticed after this was that in /var/log/messages on the jabberd server
> it said "eth0: can't fill rx buffer (force 1)!" "eth0: card reports no
> resources" etc. I am not sure what to think of this, the NIC is an Intel
> EtherExpress. Is it simply hardware? Did I screw up some tuning parameters?
> Still shouldn't jabberd spool messages if it can't send them? Anyone care to
> share their tuning tips to enable me to get 10,000 clients connected? :-)




