[Users] memory leak in presence module?

Klaus Darilion klaus.mailinglists at pernau.at
Mon May 7 17:14:32 CEST 2007


Hi Bogdan!

I've attached strace to all openser threads and waited for the 
crash. Here is the strace log of the "attendant" process (ID=0):

Process 2340 attached - interrupt to quit
pause()                                 = ? ERESTARTNOHAND (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
waitpid(-1, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], WNOHANG) = 2344
time([1178548261])                      = 1178548261
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=801, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=801, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=801, ...}) = 0
send(3, "<134>May  7 16:31:01 /usr/sbin/o"..., 86, MSG_NOSIGNAL) = 86
time([1178548262])                      = 1178548262
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=801, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=801, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=801, ...}) = 0
send(3, "<134>May  7 16:31:02 /usr/sbin/o"..., 69, MSG_NOSIGNAL) = 69
waitpid(-1, 0xbfd58ecc, WNOHANG)        = 0
time([1178548262])                      = 1178548262
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=801, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=801, ...}) = 0
stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=801, ...}) = 0
send(3, "<134>May  7 16:31:02 /usr/sbin/o"..., 79, MSG_NOSIGNAL) = 79
kill(0, SIGTERM)                        = 0
--- SIGTERM (Terminated) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
rt_sigaction(SIGALRM, {0x8067830, [ALRM], SA_RESTART}, {SIG_DFL}, 8) = 0
alarm(60)                               = 0
wait4(-1, NULL, 0, NULL)                = 2350
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
wait4(-1, NULL, 0, NULL)                = 2345
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
wait4(-1, NULL, 0, NULL)                = 2349
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
wait4(-1, NULL, 0, NULL)                = 2341
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
wait4(-1, NULL, 0, NULL)                = 2347
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
wait4(-1, NULL, 0, NULL)                = 2346
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
wait4(-1, NULL, 0, NULL)                = 2348
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
wait4(-1, NULL, 0, NULL)                = 2342
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn()                             = ? (mask now [])
wait4(-1, NULL, 0, NULL)                = ? ERESTARTSYS (To be restarted)
--- SIGALRM (Alarm clock) @ 0 (0) ---
kill(0, SIGKILL)                        = 0
+++ killed by SIGKILL +++
Process 2340 detached


If I read it correctly, the SIGKILL is sent by this process after it has 
sent SIGTERM to all its children. The SIGTERM is sent because a child 
exited. But which child? And why?
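
The sequence visible in the trace (SIGTERM to the children, a 60 second
alarm, then SIGKILL once the alarm fires) can be sketched in Python as
follows. This is only an illustration under assumptions, not openser's
code: the helper names spawn_sleeper and shutdown_children are made up,
and the sketch signals each child PID individually, whereas the trace
shows kill(0, ...), which targets the whole process group.

```python
import os
import signal
import time


def spawn_sleeper(ignore_sigterm=False):
    """Fork a child that sleeps forever (hypothetical stand-in for a worker)."""
    ready_r, ready_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.close(ready_r)
            # Set the SIGTERM disposition before telling the parent we are ready.
            signal.signal(
                signal.SIGTERM,
                signal.SIG_IGN if ignore_sigterm else signal.SIG_DFL,
            )
            os.write(ready_w, b"x")
            while True:
                time.sleep(60)
        finally:
            os._exit(1)  # the child must never return into the parent's code
    os.close(ready_w)
    os.read(ready_r, 1)  # block until the child has set its signal disposition
    os.close(ready_r)
    return pid


def shutdown_children(pids, grace_seconds):
    """SIGTERM every child, wait up to grace_seconds, then SIGKILL survivors.

    Returns {pid: number of the signal that ended it, or 0 for a normal exit}.
    """
    for pid in pids:
        os.kill(pid, signal.SIGTERM)

    results = {}
    remaining = set(pids)
    deadline = time.monotonic() + grace_seconds
    while remaining and time.monotonic() < deadline:
        for pid in list(remaining):
            done, status = os.waitpid(pid, os.WNOHANG)
            if done == pid:
                results[pid] = os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
                remaining.discard(pid)
        time.sleep(0.05)

    # Grace period is over: escalate to SIGKILL, which cannot be ignored.
    for pid in remaining:
        os.kill(pid, signal.SIGKILL)
        _, status = os.waitpid(pid, 0)
        results[pid] = os.WTERMSIG(status) if os.WIFSIGNALED(status) else 0
    return results
```

A child that handles SIGTERM promptly is reaped inside the grace period;
one that ignores it ends up with signal 9, which is the same pattern as
the final "killed by SIGKILL" in the trace.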

The openser log says:
May  7 16:31:02 debian /usr/sbin/openser[2340]: child process 2344 
exited by a signal 9
May  7 16:31:02 debian /usr/sbin/openser[2340]: core was not generated
May  7 16:31:02 debian /usr/sbin/openser[2340]: INFO: terminating due to 
SIGCHLD

To me this looks like 2344 (a UDP thread) exited with signal 9. Thus, 
the main thread receives SIGCHLD and then sends SIGTERM and afterwards 
SIGKILL to all other threads and itself.
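
As a small illustration of how a parent can tell that a child was killed
by a signal (the WIFSIGNALED/WTERMSIG check shown in the waitpid line of
the trace), here is a Python sketch. The function names describe_exit
and kill_and_reap are hypothetical, and the message text only imitates
the openser log lines quoted above.

```python
import os
import signal
import time


def describe_exit(pid, status):
    """Turn a waitpid() status into a message similar to the log lines above."""
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        core = "generated" if os.WCOREDUMP(status) else "not generated"
        return f"child process {pid} exited by a signal {sig}, core was {core}"
    if os.WIFEXITED(status):
        return f"child process {pid} exited normally with status {os.WEXITSTATUS(status)}"
    return f"child process {pid} changed state (raw status {status})"


def kill_and_reap():
    """Fork a sleeping child, send it SIGKILL, and return (pid, wait status)."""
    pid = os.fork()
    if pid == 0:
        try:
            while True:
                time.sleep(60)
        finally:
            os._exit(1)  # the child must never return into the parent's code
    os.kill(pid, signal.SIGKILL)
    _, status = os.waitpid(pid, 0)
    return pid, status
```

Note that the wait status only carries the signal number and the core
dump flag. It does not say which process sent the signal.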

But why did thread 2344 receive a SIGKILL, and who sent it?
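
One candidate worth ruling out, stated purely as an assumption because
nothing in the logs above confirms it, is the Linux kernel's
out-of-memory killer, which ends processes with SIGKILL and reports this
in the kernel log. The Python sketch below scans log lines for such
reports. The function name find_oom_kills is hypothetical, and the
pattern "Killed process <pid> (<name>)" is an assumption about the
kernel's wording, which may differ between kernel versions.

```python
import re

# Assumed wording of the kernel's report; adjust it if your kernel differs.
OOM_KILL = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def find_oom_kills(lines):
    """Return a (pid, process name) tuple for every line that matches."""
    hits = []
    for line in lines:
        match = OOM_KILL.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2)))
    return hits
```

The lines could be taken from the output of dmesg or from the kernel log
file of the distribution. An empty result does not prove that the kernel
was not involved, since the message may have been rotated away or be
worded differently.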

I need some more debugging tips.
Bogdan, you mentioned gdb - how can I debug this with gdb?

regards
klaus

Bogdan-Andrei Iancu wrote:
> Hi Klaus,
> 
> I applied on SVN the fix for the TM memory leak - it should not happen 
> anymore now, even if you do not use t_release()...
> 
> regarding openser no longer reacting - can you attach with gdb to see 
> what the processes are doing?
> 
> regards,
> bogdan
> 
> Klaus Darilion wrote:
>> Hi Daniel!
>>
>> Summary:
>> - Without t_release() (no modifications to source code) openser leaks 
>> memory.
>> - with t_release() openser does not leak. But after some time there is 
>> strange behaviour, e.g.:
>>  -: openser stops reacting for some minutes and afterwards gets
>>     terminated with signal 9. When openser stops working the load
>>     increases to > 40. This has happened 3 times now.
>>  -: openser stops reacting for some minutes and the Linux PC
>>     where openser is running becomes unresponsive. No login. Open
>>     SSH sessions are unresponsive. I had to reboot the PC. Happened
>>     1 time.
>>
>> Maybe this is not pure openser related, but a problem with openser and 
>> Linux (as I had to reboot the server one time).
>>
>> Any hints how to debug this?
>>
>> regards
>> klaus
>>
> 



