[JDEV] Re: jabberd 1.4.3 release candidate again

Frank Seesink frank at mail.wvnet.edu
Mon Nov 10 15:14:10 CST 2003


More info regarding the segfault caused by using -D under Cygwin:

I have tracked things down to line 826 in ./jabberd/mio.c (indicated 
with <===):
____________________________________________________________

         log_debug(ZONE,"mio while loop top");

         /* if we are closing down, exit the loop */
         if(mio__data->shutdown == 1 && mio__data->master__list == NULL)
             break;

         /* wait for a socket event */
         FD_SET(mio__data->zzz[0],&rfds); /* include our wakeup socket */
         if(bcast > 0)
             FD_SET(bcast,&rfds); /* optionally include our announcements socket */
         retval = pth_select(maxfd+1, &rfds, &wfds, NULL, NULL);   <===
         /* if retval is -1, fd sets are undefined across all platforms */

         log_debug(ZONE,"mio while loop, working");
____________________________________________________________

!!!!!
Apparently this call to pth_select() is making Jabberd go BOOM! right on 
startup.
!!!!!

(Verified this by adding a few more log_debug() lines just before and 
after the offending call, and sure enough, got up to but not past 
pth_select()).

Did some Googling and best I could find was the following thread:

	http://www.mail-archive.com/pth-users@gnu.org/msg00052.html

which would seem to indicate that possibly enough data is being pushed 
onto the run-time stack to cause the "STACK OVERFLOW".  Not sure why 
simply enabling debug mode would do this, as all it does is throw out 
statements (and why does this happen under Cygwin but apparently not 
under Linux/etc.?).

As written in discussion thread listed above:
________________________________________
There are only one good reason I can think of which cause the stack
overflow in such a "simple thread": Some of your functions or functions
inside some other libraries (libc, etc.) use large variables on the
stack. In C, every variable not declared "static" in a function is per
default allocated from the run-time stack. So, if you have a simple
"char buf[SIZE]" somewhere and SIZE is a few KB in size, this noticably
fills the stack of the thread while the function's scope is active.
________________________________________

Looked at the code for debug_log() in ./jabberd/log.c, which is 
basically what's called.  log_debug is just a macro that resolves to a 
conditional check to see if debug_flag is set, in which case 
debug_log() is called (see ./jabberd/jabberd.h lines 109-113).

Only thing I see is the declarations at the beginning of debug_log():

     va_list ap;
     char message[MAX_LOG_SIZE];
     char *pos, c = '\0';
     int offset;

which might push a good bit of data on the stack depending on what the 
size of the va_list type is and the value of MAX_LOG_SIZE (which is 1024 
as seen on line 105 in jabberd.h).  But if that's the cause, I don't 
think I'd be seeing the last debug message ("mio while loop top") as the 
program should be bombing out as the code enters debug_log().  And 
considering this function is called, entered, run, and returned, any 
values it pushed on the stack are popped before continuing.
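
To put a rough number on those declarations, here is a minimal sketch 
that adds up the sizes of the four locals.  The helper name is made up 
for illustration, MAX_LOG_SIZE is copied from the value quoted above, 
and the result is only a lower bound: the compiler is free to add 
padding, saved registers, and a return address on top of it, and 
sizeof(va_list) differs between platforms.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Value quoted from jabberd.h in this message. */
#define MAX_LOG_SIZE 1024

/* Hypothetical helper (not part of jabberd): lower bound, in bytes, on
 * the stack space taken by the locals declared at the top of
 * debug_log().  Real frames are usually somewhat larger. */
size_t debug_log_locals_estimate(void)
{
    size_t total = sizeof(va_list)             /* va_list ap */
                 + sizeof(char[MAX_LOG_SIZE])  /* char message[MAX_LOG_SIZE] */
                 + sizeof(char *)              /* char *pos */
                 + sizeof(char)                /* char c */
                 + sizeof(int);                /* int offset */

    /* The message buffer dominates everything else. */
    assert(total >= MAX_LOG_SIZE);
    return total;
}
```

Whatever the exact figure, it is only on the stack while debug_log() 
is running, which fits the observation that the last debug message 
still gets printed before the crash.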

The only other thing I see that might affect the run-time stack is the 
calls to FD_SET(), and I'm not quite sure how they resolve.  All caps 
indicates a #define, but a grep through the code found nothing.  Looked 
at the GNU Pth docs, and nothing there except references to the 
lower-case 'fd_set' var type.  Googling makes me think this is some kind 
of Unix standard connected with the select() function (which appears to 
be superseded/replaced by GNU Pth where it's used), so not quite sure 
how one plays with the other.  But maybe FD_SET under Cygwin pushes more 
data onto the stack than it does under *nix?  But does turning on debug 
output really cause this?  Not sure they're connected when I look at the 
code.
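
For reference, FD_SET(), FD_ZERO(), and FD_ISSET() are specified by 
POSIX alongside select() and come from the system header 
<sys/select.h>, which would explain why a grep of the jabberd tree 
finds no definition.  They only set, clear, or test a bit inside an 
fd_set whose size is fixed at compile time.  The sketch below shows 
the usual pattern with plain select() rather than pth_select(); the 
helper names are made up for illustration and are not part of jabberd 
or Pth.

```c
#include <assert.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if fd is readable right now,
 * 0 if it is not, and -1 if select() fails. */
int fd_is_readable(int fd)
{
    fd_set rfds;                 /* one fixed-size object holds the whole set */
    struct timeval tv = {0, 0};  /* zero timeout: poll instead of blocking */
    int retval;

    /* FD_SET is only defined for descriptors below FD_SETSIZE. */
    assert(fd >= 0 && fd < FD_SETSIZE);

    FD_ZERO(&rfds);              /* clear every bit in the set */
    FD_SET(fd, &rfds);           /* mark fd as one we want to watch */

    retval = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (retval < 0)
        return -1;               /* on error the set contents are undefined */
    return FD_ISSET(fd, &rfds) ? 1 : 0;
}

/* Demonstration: a fresh pipe is not readable until a byte is written.
 * Returns 1 if both observations hold, 0 otherwise. */
int pipe_demo(void)
{
    int p[2];
    int before, after;

    if (pipe(p) != 0)
        return 0;
    before = fd_is_readable(p[0]);
    if (write(p[1], "x", 1) != 1) {
        close(p[0]);
        close(p[1]);
        return 0;
    }
    after = fd_is_readable(p[0]);
    close(p[0]);
    close(p[1]);
    return before == 0 && after == 1;
}
```

Nothing here grows with the number of FD_SET() calls; the only stack 
cost is the fd_set variable itself, i.e. sizeof(fd_set) bytes.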

Guess at this point I'm kind of at a loss.  Looks like serious reading 
time to try and get up to speed on all this.  But if anyone out there-- 
unlike me out on the fringes--has intimate knowledge of this code or 
just the whole pthread vs. GNU Pth function calls, I'd love to get some 
insight.  Thanks in advance for reading this far and for any help you 
can provide.

____________________________________________________________
ACCESSING VARIABLES FROM OUTSIDE COMPILED MODULE UNDER CYGWIN
AND MU-Conference

After noting lines 109-113 in ./jabberd/jabberd.h, it occurred to me 
that jabberd.exe is compiled slightly differently under Cygwin than it 
is under *nix.  *nix version just checks debug_flag var directly (which 
is declared in ./jabberd/log.c), whereas Cygwin version calls a trivial 
function to do same.  (NOTE:  Did a grep on all the jabberd code, and 
this is the ONLY reference to __CYGWIN__ I can find in the entire source 
tree!!  So is this really the only difference in code now?)

Not sure why that's necessary, but removing this conditional, using just 
the *nix version of the #define, and re-compiling gave a few hiccups. 
Had to add a line to the Makefile to add one more export variable for 
doing the non-*nix build of export lib.  But even then things weren't 
100% right, as running jabberd.exe gave issues.

I suspect this all ties in with the way dynamic libraries can hook back 
into variables exported from executables in *nix but trying to do 
something similar under Cygwin gives all kinds of headaches (see post 
from 6Nov2003 for more info).  And this simple "wrapper" function might 
be a trick, possibly because under Cygwin functions can be exported but 
variables cannot?  (That's a question, not a statement.)  I have no 
clue.  So I've left this alone for now.
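
A minimal sketch of the two variants described above, to make the 
difference concrete.  This is NOT copied from jabberd.h: the accessor 
name get_debug_flag(), the macro name LOG_DEBUG_SKETCH, and the stub 
logger are all assumptions made for illustration.  The idea is that a 
module only needs to import a function under Cygwin, while on *nix it 
reads the variable directly.

```c
#include <assert.h>

/* Stand-in for the flag that, per this message, lives in log.c. */
int debug_flag = 0;

/* Hypothetical accessor; the real function in jabberd may be named
 * differently.  A module calls this instead of touching debug_flag. */
int get_debug_flag(void)
{
    return debug_flag;
}

/* Stub logger that just counts calls, so the sketch can be checked
 * without producing any output. */
static int debug_log_calls = 0;
static void debug_log_stub(const char *msg)
{
    assert(msg);
    debug_log_calls++;
}

#ifdef __CYGWIN__
/* Cygwin variant: go through the function. */
#define LOG_DEBUG_SKETCH(msg) \
    do { if (get_debug_flag()) debug_log_stub(msg); } while (0)
#else
/* *nix variant: read the variable directly. */
#define LOG_DEBUG_SKETCH(msg) \
    do { if (debug_flag) debug_log_stub(msg); } while (0)
#endif
```

Either branch behaves the same from the caller's side: with 
debug_flag at 0 the logger is never entered, and with it set to 1 each 
LOG_DEBUG_SKETCH() call reaches the logger once.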

But this might explain why MU-Conference v0.52 blows up on me as well, 
whereas v0.3 does not.  MU-C v0.52 appears to try and connect back into 
a variable deliver__flag, which is defined in ./jabberd/deliver.c and 
compiled into jabberd.exe.  I added this variable to the export list via 
the Makefile, which allows MU-C v0.52 to compile/link against 
./jabberd/jabberd.a just fine, but MU-C still blows sky high when a room 
is created.  However, MU-C v0.3 suffers none of these issues, and 
compiles fine without that entry, implying MU-C v0.3 does NOT try to 
look at deliver__flag.  Anyway, just more observations.


Frank Seesink wrote:
...
> Ok, I admit it.  I'm kind of on a mission.  At this point Jabberd 
> 1.4.3CVS compiles/links/runs the same under Cygwin as it does on other 
> *nix platforms, with the one exception of running in debug mode (using 
> the -D switch).
> 
> So let me ask this, as I'm just starting to dig into the source code 
> itself.  Can anyone steer me in the right direction as to why, whenever 
> I attempt to fire up Jabberd in debug mode, I see the following:
> ____________________________________________________________
> $ ./jabberd/jabberd.exe -D
> Sat Nov  8 18:44:11 2003  mio.c:787 MIO is starting up
> Sat Nov  8 18:44:11 2003  mio.c:816 mio while loop top
> **Pth** STACK OVERFLOW: thread pid_t=0xa040750, name="unknown"
> Segmentation fault
> ____________________________________________________________
> 
> This happens regardless of whether I have configured/built jabberd with 
> (--enable-ssl) or without SSL support.  So I've ruled that out at least. 
>  It fails with the generic jabber.xml config.  Basically, I have not 
> been able to get Jabberd to fire up if I use the -D switch.
> 
> The actual pid_t number may vary (haven't been paying enough attention 
> to notice if it changes or if there's a pattern to be honest), but the 
> sequence of messages is always the same.  Jabberd starts and dies in the 
> blink of an eye.
> 
> However, simply NOT running in debug mode avoids ALL this, and I've had 
> a Jabber server running for weeks at a time in production (granted, low 
> user load, but still), usually only restarting when I reboot the Windows 
> XP Pro box it's running on.
> 
> Has anyone else experienced this kind of behavior on any other platform? 
>   Any insight into where to look?  I realize running Cygwin under 
> Windows, I'm working in a kludged environment at best.  But figured it 
> best to ask you good folks if you've ever seen this before, as you might 
> save me a great deal of time in finding the source of the problem...even 
> if the end result is just "It's a limitation of Cygwin/Windows.  Suck it 
> up." :-)
> 
> In the meantime, the hunt continues...




