[OpenSIPS-Users] opensips tm timer core dump

Kennard_White at logitech.com Kennard_White at logitech.com
Sat Oct 23 00:08:10 CEST 2010


Hi Bogdan,

I believe I found the problem. When sip_msg_cloner() within build_cell()
fails due to out-of-mem, and dangling pointer to the cell is left in the
global transaction pointer. Later on the post_cb() code attempts to clean
this up, and "resurrects" the now-free memory, and in particular puts it on
a wait timer. My guess is that later on this memory is allocated into a new
transaction which eventually uses the same wait timer link, and the two
threads fight it out.

Adding a set_t(0) fixes the problem. While looking, I believe I found a
memory leak in uac creation code that is also triggered by out of memory
conditions. Please see attached patch.

The "problem" also manifest itself as a core dump with insert_timer_unsafe.
Once I disabled optimization, I saw multiple variations within this
function. The most common was that ptr==tl after the search.  The sequence
of events for this is: new cell created, stored into global T, free'd,
added to wait-list by cleanup code from global T, memory re-allocated into
new cell, timer link zero'd, and then added again to wait list. At least
that is my best guess.

Regards,
Kennard

(See attached file: opensips-tm-cell.patch)



From:	Bogdan-Andrei Iancu <bogdan at voice-system.ro>
To:	OpenSIPS users mailling list <users at lists.opensips.org>
Date:	10/22/2010 07:02 AM
Subject:	Re: [OpenSIPS-Users] opensips tm timer core dump
Sent by:	users-bounces at lists.opensips.org



Hi Kennard,

I suppose the bt is the same ? do you still have the core file ?

Regards,
Bogdan

Kennard_White at logitech.com wrote:
>
> Hi Bodgen,
>
> I replicated the error. Unfortunately the entire insert_timer_unsafe
> and been in-lined and little is available:
>
> Program terminated with signal 11, Segmentation fault.
> #0 0x00007f8b8356c2c2 in insert_timer_unsafe (new_tl=0x7f8b7a54e310,
> list_id=WT_TIMER_LIST, ext_timeout=<value optimized out>) at timer.c:731
> 731 timer.c: No such file or directory.
> in timer.c
> (gdb) print tl
> $1 = <value optimized out>
> (gdb) print *tl
> Cannot access memory at address 0x0
> (gdb) print ptr
> $2 = <value optimized out>
> (gdb) print *ptr
> Cannot access memory at address 0x0
> (gdb) print *new_tl
> No symbol "new_tl" in current context.
> (gdb) up
> #1 set_1timer (new_tl=0x7f8b7a54e310, list_id=WT_TIMER_LIST,
> ext_timeout=<value optimized out>) at timer.c:904
> 904 in timer.c
> (gdb) print *new_tl
> $3 = {next_tl = 0x0, prev_tl = 0x0, ld_tl = 0x0, time_out = 0,
> timer_list = 0x0, deleted = 0}
> (gdb) print list
> $4 = <value optimized out>
> (gdb) print timeout
> $5 = 32
> (gdb) print new_tl
> $6 = (struct timer_link *) 0x7f8b7a54e310
>
> I'll keep the core for a while -- please let me know if there is
> anything else I can try.
>
> Thanks,
> Kennard
>
> Inactive hide details for Bogdan-Andrei Iancu ---10/08/2010 04:40:47
> AM---Hi Kennard, Ok, keep the core next time :)Bogdan-Andrei Iancu
> ---10/08/2010 04:40:47 AM---Hi Kennard, Ok, keep the core next time :)
>
> From: Bogdan-Andrei Iancu <bogdan at voice-system.ro>
> To: OpenSIPS users mailling list <users at lists.opensips.org>
> Date: 10/08/2010 04:40 AM
> Subject: Re: [OpenSIPS-Users] opensips tm timer core dump
> Sent by: users-bounces at lists.opensips.org
>
> ------------------------------------------------------------------------
>
>
>
> Hi Kennard,
>
> Ok, keep the core next time :)
>
> Regards,
> Bogdan
>
> Kennard_White at logitech.com wrote:
> >
> > Hi Bogden,
> >
> > Thanks for explaining the child processes involved -- I misunderstood
> > what was happening.
> >
> > Unfortunately, I don't have the core anymore. My recollection is that
> > I couldn't print anything useful due to compiler optimization. That
> > said, this should re-create pretty easily, and I'll get more dumps
> > next time it happens.
> >
> > Regards,
> > Kennard
> >
> > Inactive hide details for Bogdan-Andrei Iancu ---10/05/2010 01:41:38
> > AM---Hi Kennard, The core was generated by process 22255:Bogdan-Andrei
> > Iancu ---10/05/2010 01:41:38 AM---Hi Kennard, The core was generated
> > by process 22255:
> >
> > From: Bogdan-Andrei Iancu <bogdan at voice-system.ro>
> > To: OpenSIPS users mailling list <users at lists.opensips.org>
> > Date: 10/05/2010 01:41 AM
> > Subject: Re: [OpenSIPS-Users] opensips tm timer core dump
> > Sent by: users-bounces at lists.opensips.org
> >
> >
------------------------------------------------------------------------
> >
> >
> >
> > Hi Kennard,
> >
> > The core was generated by process 22255:
> >    [22238]: INFO:core:handle_sigs: child process 22255 exited by a
> > signal 11
> >
> > and this process also reported mem problems:
> >    [22255]: ERROR:tm:new_t: out of mem
> >
> > Can you print the "tl" or "ptr" variables in frame 0?
> >
> > Regards,
> > Bogdan
> >
> > Kennard_White at logitech.com wrote:
> > >
> > > Running against opensips HEAD, I got a segfault in the tm timer code.
> > > I believe this is triggered by running out of shared memory.
> > >
> > >
> > > The stack trace:
> > >
> > > (gdb) where
> > > #0 0x00007fe8f8d96212 in insert_timer_unsafe (new_tl=0x7fe8f66337b0,
> > > list_id=WT_TIMER_LIST, ext_timeout=<value optimized out>) at
> timer.c:731
> > > #1 set_1timer (new_tl=0x7fe8f66337b0, list_id=WT_TIMER_LIST,
> > > ext_timeout=<value optimized out>) at timer.c:904
> > > #2 0x00007fe8f8d78ac8 in t_release_transaction (trans=0x7fe8f6633730)
> > > at t_funcs.c:122
> > > #3 0x00007fe8f8d808e5 in t_unref (p_msg=<value optimized out>)
> > > at t_lookup.c:1152
> > > #4 0x0000000000483ae5 in exec_post_req_cb ()
> > > #5 0x000000000046c1e4 in receive_msg ()
> > > #6 0x00000000004bc77c in udp_rcv_loop ()
> > > #7 0x000000000042de9c in main ()
> > >
> > > The offending code (I believe):
> > > if (tl->time_out==ptr->time_out) {
> > > tl->ld_tl = ptr->ld_tl
> > > ptr->ld_tl = 0;
> > > tl->ld_tl->ld_tl = tl; <-- SEG FAULT HERE (according to trace)
> > > } else {
> > > tl->ld_tl = tl;
> > > }
> > >
> > > Unfortunately, due to optimization I cannot dump anything useful, and
> > > I'm not convinced the actual fault is on the line indicated. Note
that
> > > the core dump is not one of the processes that reported out of
memory.
> > > Maybe one of the other processes left the timer list in a corrupt
> state?
> > >
> > > The log file:
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]:
> > > ERROR:tm:sip_msg_cloner: no more share memory
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]:
> > > ERROR:tm:new_t: out of mem
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22255]:
> > > ERROR:tm:t_newtran: new_t failed
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:
> > > WARNING:core:fm_malloc: Not enough free memory, will atempt
> > defragmenation
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:
> > > ERROR:tm:sip_msg_cloner: no more share memory
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:
> > > ERROR:tm:new_t: out of mem
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22254]:
> > > ERROR:tm:t_newtran: new_t failed
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]:
> > > INFO:core:handle_sigs: child process 22255 exited by a signal 11
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]:
> > > INFO:core:handle_sigs: core was generated
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22238]:
> > > INFO:core:handle_sigs: terminating due to SIGCHLD
> > > Sep 29 11:43:36 org-sip01 /var/run/openser/opensips-pres[22256]:
> > > INFO:core:sig_usr: signal 15 received
> > >
> > >
> ------------------------------------------------------------------------
> > >
> > > _______________________________________________
> > > Users mailing list
> > > Users at lists.opensips.org
> > > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> > >
> >
> >
> > --
> > Bogdan-Andrei Iancu
> > OpenSIPS Bootcamp
> > 15 - 19 November 2010, Edison, New Jersey, USA
> > www.voice-system.ro
> >
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.opensips.org
> > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> >
> >
------------------------------------------------------------------------
> >
> > _______________________________________________
> > Users mailing list
> > Users at lists.opensips.org
> > http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> >
>
>
> --
> Bogdan-Andrei Iancu
> OpenSIPS Bootcamp
> 15 - 19 November 2010, Edison, New Jersey, USA
> www.voice-system.ro
>
>
> _______________________________________________
> Users mailing list
> Users at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
> ------------------------------------------------------------------------
>
> _______________________________________________
> Users mailing list
> Users at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>


--
Bogdan-Andrei Iancu
OpenSIPS Bootcamp
15 - 19 November 2010, Edison, New Jersey, USA
www.voice-system.ro


_______________________________________________
Users mailing list
Users at lists.opensips.org
http://lists.opensips.org/cgi-bin/mailman/listinfo/users
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.opensips.org/pipermail/users/attachments/20101022/74e306b0/attachment-0001.htm 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: graycol.gif
Type: image/gif
Size: 105 bytes
Desc: not available
Url : http://lists.opensips.org/pipermail/users/attachments/20101022/74e306b0/attachment-0001.gif 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: opensips-tm-cell.patch
Type: application/octet-stream
Size: 1860 bytes
Desc: not available
Url : http://lists.opensips.org/pipermail/users/attachments/20101022/74e306b0/attachment-0001.obj 


More information about the Users mailing list