[JDEV] jabberd segfault under load

Dennis Noordsij dennis.noordsij at wiral.com
Tue Nov 7 06:13:30 CST 2000


Hi,

I have been doing some tests with jabber 1.2, using the available binaries 
because the CVS version doesn't compile for me (some error in jabberd.h).

I am running the simulator from CVS to simulate a client load against a 
separate machine on the 100 Mbps network.

The simulator runs on my laptop (P3-700, 128 MB RAM), the jabberd server on a 
dedicated machine (no X etc.), a P3-500 with 64 MB.

I set up the simulator to send a message only once every 10 seconds, and 
connect every second.
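
For reference, the load this configuration implies can be estimated with a little arithmetic. The sketch below is only a rough model: the function names are made up for illustration, and the 225-byte message size is an assumption taken from the one "len 225" WRITE line in the log further down, not a measured average.

```python
def users_connected(elapsed_s, connect_interval_s=1.0):
    """Users online after elapsed_s seconds, at one new connection per interval."""
    return int(elapsed_s // connect_interval_s)


def aggregate_rate(users, message_interval_s=10.0):
    """Messages per second reaching the server from all users combined."""
    return users / message_interval_s


def payload_bytes_per_s(users, avg_message_bytes, message_interval_s=10.0):
    """Inbound message payload per second (one direction only)."""
    return aggregate_rate(users, message_interval_s) * avg_message_bytes


if __name__ == "__main__":
    for users in (100, 1000, 10000):
        rate = aggregate_rate(users)
        # 225 bytes is an assumed message size, see the note above.
        load = payload_bytes_per_s(users, avg_message_bytes=225)
        print(users, "users:", rate, "msg/s,", load, "bytes/s inbound")
```

Each message is received once and delivered once, so the traffic on the wire is roughly twice the inbound figure, plus XML stream and TCP overhead.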

After about 100 users, top reports that jabberd is taking up 90% of the CPU, 
while network load is quite low, maybe 10-20 k/sec. I have been able to connect 
1000 users to jabberd on my own machine (using localhost, after bumping up 
ip_local_port_range, fs/inode-max and fs/file-max), so this time I expected 
more over the network. After 1700 users the jabberd server segfaulted, so I 
tried again with everything logged; this time it segfaulted after 1021 users.
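
A small sketch of how the limits mentioned above could be inspected on the server. The helper names (read_tunable, fd_headroom) and the "reserved" allowance are hypothetical. The per-process descriptor limit is included only as a guess: 1021 is close to 1024, a common default for that limit and the usual FD_SETSIZE for select(), but the first run got to 1700 users, so this is something to rule out rather than a diagnosis.

```python
import resource
from pathlib import Path

# Kernel tunables named in the text; fs/inode-max only exists on older kernels.
PROC_TUNABLES = [
    "/proc/sys/net/ipv4/ip_local_port_range",
    "/proc/sys/fs/file-max",
    "/proc/sys/fs/inode-max",
]


def read_tunable(path):
    """Return the tunable's contents as a string, or None if it cannot be read."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None


def fd_headroom(target_connections, reserved=32):
    """Soft per-process descriptor limit minus what the target would need.

    `reserved` is a guessed allowance for log files, listeners and so on.
    Returns None when the limit is unlimited.
    """
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft - (target_connections + reserved)


if __name__ == "__main__":
    for path in PROC_TUNABLES:
        print(path, "=", read_tunable(path))
    # A negative number means the limit is below the target.
    print("headroom for 10000 connections:", fd_headroom(10000))
```

This reports the limits of the process that runs the script, so it would need to be started from the same shell and user as jabberd to be meaningful.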

I have attached the last 50 or so lines from the jabberd -D output. It is a 
stock 1.2 jabber server: no agents, standard spooling, etc.

Can anyone help me out? :)

Regards
Dennis

PS - I think the load is quite high for so few users. I imagine that an 
IRC server would use more resources per client, and still it handles many more 
clients, although you have reported jabberd handling 20,000-40,000 
connections. What can I reasonably expect with a Linux system? It appears raw 
CPU power is much more important than memory; still, I expect 10,000 clients 
on a P3 system :-)


--------- START "jabberd -D" OUTPUT ---------------------

Tue Nov  7 13:49:36 2000  deliver.c:344 delivering to instance 'sessions'
Tue Nov  7 13:49:36 2000  deliver.c:84 (80B6238)incoming packet <route 
to='f0360@194.100.32.65/89DBD18' from='381@c2s/89A76A0'><message id='360' 
to='f0139@194.100.32.65'><thread>asdf</thread><subject/><body>This is a long, 
        multiline message.</body></message></route>
Tue Nov  7 13:49:36 2000  users.c:147 
js_user(f0360@194.100.32.65/89DBD18,8124428)
Tue Nov  7 13:49:36 2000  mtqoverflow 8190 overflowing B7F7A80
Tue Nov  7 13:49:36 2000  io_select.c:105 WRITE 381 len -1 of <message 
id='804' to='f0360@194.100.32.65' 
from='f0804@194.100.32.65/r973595128'><thread>asdf</thread><subject/><body>This 
is a long,         multiline message.</body></message>

Tue Nov  7 13:49:36 2000  deliver.c:472 DELIVER 4:194.100.32.65 <route 
to='f0359@194.100.32.65/8940F20' from='380@c2s/89BC768'><message id='359' 
to='f0534@194.100.32.65'><thread>asdf</thread><subject/><body>This is a long, 
        multiline message.</body></message></route>
Tue Nov  7 13:49:36 2000  deliver.c:344 delivering to instance 'sessions'
Tue Nov  7 13:49:36 2000  deliver.c:84 (80B6238)incoming packet <route 
to='f0359@194.100.32.65/8940F20' from='380@c2s/89BC768'><message id='359' 
to='f0534@194.100.32.65'><thread>asdf</thread><subject/><body>This is a long, 
        multiline message.</body></message></route>
Tue Nov  7 13:49:36 2000  users.c:147 
js_user(f0359@194.100.32.65/8940F20,8124428)
Tue Nov  7 13:49:37 2000  mtqoverflow 8191 overflowing B7FA290
Tue Nov  7 13:49:37 2000  deliver.c:472 DELIVER 4:194.100.32.65 <route 
to='f0359@194.100.32.65/8940F20' from='380@c2s/89BC768'><message id='359' 
to='f0397@194.100.32.65'><thread>asdf</thread><subject/><body>This is another 
short message!</body></message></route>
Tue Nov  7 13:49:37 2000  deliver.c:344 delivering to instance 'sessions'
Tue Nov  7 13:49:37 2000  deliver.c:84 (80B6238)incoming packet <route 
to='f0359@194.100.32.65/8940F20' from='380@c2s/89BC768'><message id='359' 
to='f0397@194.100.32.65'><thread>asdf</thread><subject/><body>This is another 
short message!</body></message></route>
Tue Nov  7 13:49:37 2000  users.c:147 
js_user(f0359@194.100.32.65/8940F20,8124428)
Tue Nov  7 13:49:37 2000  mtqoverflow 8192 overflowing B7FB648
Tue Nov  7 13:49:37 2000  deliver.c:472 DELIVER 4:194.100.32.65 <route 
to='f0359@194.100.32.65/8940F20' from='380@c2s/89BC768'><message id='359' 
to='f0969@194.100.32.65'><thread>asdf</thread><subject/><body>How are 
you?</body></message></route>
Tue Nov  7 13:49:37 2000  deliver.c:344 delivering to instance 'sessions'
Tue Nov  7 13:49:37 2000  deliver.c:84 (80B6238)incoming packet <route 
to='f0359@194.100.32.65/8940F20' from='380@c2s/89BC768'><message id='359' 
to='f0969@194.100.32.65'><thread>asdf</thread><subject/><body>How are 
you?</body></message></route>
Tue Nov  7 13:49:37 2000  users.c:147 
js_user(f0359@194.100.32.65/8940F20,8124428)
Tue Nov  7 13:49:37 2000  mtqoverflow 8193 overflowing B7FCA00
Tue Nov  7 13:49:37 2000  deliver.c:472 DELIVER 4:194.100.32.65 <route 
to='f0359@194.100.32.65/8940F20' from='380@c2s/89BC768'><message id='359' 
to='f0225@194.100.32.65'><thread>asdf</thread><subject/><body>This is a long, 
        multiline message.</body></message></route>
Tue Nov  7 13:49:37 2000  deliver.c:344 delivering to instance 'sessions'
Tue Nov  7 13:49:37 2000  deliver.c:84 (80B6238)incoming packet <route 
to='f0359@194.100.32.65/8940F20' from='380@c2s/89BC768'><message id='359' 
to='f0225@194.100.32.65'><thread>asdf</thread><subject/><body>This is a long, 
        multiline message.</body></message></route>
Tue Nov  7 13:49:37 2000  users.c:147 
js_user(f0359@194.100.32.65/8940F20,8124428)
Tue Nov  7 13:49:37 2000  mtqoverflow 8194 overflowing B7FDEA8
Tue Nov  7 13:49:37 2000  io_select.c:105 WRITE 380 len 225 of <message 
id='359' to='f0359@194.100.32.65/r973595128' from='f2480@194.100.32.65' 
type='error'><thread>asdf</thread><subject/><body>This is a long,         
multiline message.</body><error code='404'>Not Fou

segfault

----------------------------------------------------


What I noticed after this was that /var/log/messages on the jabberd server 
said "eth0: can't fill rx buffer (force 1)!", "eth0: card reports no 
resources", etc. I am not sure what to make of this; the NIC is an Intel 
EtherExpress. Is it simply hardware? Did I screw up some tuning parameters? 
Still, shouldn't jabberd spool messages if it can't send them? Would anyone 
care to share their tuning tips to help me get 10,000 clients connected? :-)
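
In case it helps with comparing runs, here is a rough sketch for pulling the two recurring signals out of a "jabberd -D" log: the mtqoverflow counter and WRITE lines with a negative length. The regular expressions are based only on the excerpt above, the summarize() helper is hypothetical, and treating the number after WRITE as a connection identifier and "len -1" as a write that did not go through are assumptions I have not checked against the jabberd source.

```python
import re

# Patterns inferred from the log excerpt only; other line formats are ignored.
MTQ_RE = re.compile(r"mtqoverflow (\d+) overflowing")
WRITE_RE = re.compile(r"io_select\.c:\d+ WRITE (\d+) len (-?\d+)")

SAMPLE = [
    "Tue Nov  7 13:49:36 2000  mtqoverflow 8190 overflowing B7F7A80",
    "Tue Nov  7 13:49:36 2000  io_select.c:105 WRITE 381 len -1 of <message ",
    "Tue Nov  7 13:49:37 2000  mtqoverflow 8194 overflowing B7FDEA8",
    "Tue Nov  7 13:49:37 2000  io_select.c:105 WRITE 380 len 225 of <message ",
]


def summarize(lines):
    """Collect the overflow counter range and the WRITE lines with len < 0."""
    overflow_counts = []
    negative_len_writes = []
    for line in lines:
        overflow = MTQ_RE.search(line)
        if overflow:
            overflow_counts.append(int(overflow.group(1)))
        write = WRITE_RE.search(line)
        if write and int(write.group(2)) < 0:
            # First number on the WRITE line (assumed connection identifier).
            negative_len_writes.append(int(write.group(1)))
    return {
        "overflow_first": overflow_counts[0] if overflow_counts else None,
        "overflow_last": overflow_counts[-1] if overflow_counts else None,
        "negative_len_writes": negative_len_writes,
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Running it over the full log of each test would show whether the overflow counter always climbs before the crash, or whether that is specific to this run.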
 



