[Users] Re: [Devel] "detached" timer

T.R. Missner trmissner at bandwidth.com
Thu Mar 29 23:22:21 CEST 2007


Is it possible that the locked state I am seeing with openser leads to the 
"detached" timer messages?
Since the "detached" timer is a race, it would make sense to see the 
race condition after openser locks up and messages buffer up in the stack.
When a bunch of messages are then processed all at once by multiple threads, 
the race condition would occur.
Does this make sense?
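To make the race concrete, here is a minimal C sketch of what Bogdan describes in the quoted reply below (an expire event from the timer racing with inserting the entry again on a timer list). It assumes a simplified design: the names timer_entry, timer_list, detach_expired, set_timer_sketch and simulate_late_reply are hypothetical and are NOT openser's actual timer.c code. It replays one interleaving in a single thread: the timer side unlinks an expired entry and flags it as detached, then a late reply tries to move it onto another list and is ignored with the same kind of warning.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical, simplified timer entry; NOT openser's struct timer_link. */
struct timer_entry {
    struct timer_entry *next, *prev;
    int list_id;   /* 1 = fr_timer, 3 = wt_timer (numbering from Bogdan's reply) */
    int detached;  /* set once the timer side has unlinked the entry to fire it */
};

/* A circular doubly linked list with a sentinel head. */
struct timer_list {
    int id;
    struct timer_entry head;
};

void list_init(struct timer_list *l, int id)
{
    l->id = id;
    l->head.next = l->head.prev = &l->head;
}

/* Append e at the tail of l and mark it as armed on that list. */
void list_insert(struct timer_list *l, struct timer_entry *e)
{
    e->prev = l->head.prev;
    e->next = &l->head;
    l->head.prev->next = e;
    l->head.prev = e;
    e->list_id = l->id;
    e->detached = 0;
}

/* Timer side: unlink an expired entry and flag it before running its handler. */
void detach_expired(struct timer_entry *e)
{
    assert(!e->detached && e->next != NULL); /* must still be linked */
    e->prev->next = e->next;
    e->next->prev = e->prev;
    e->next = e->prev = NULL;
    e->detached = 1;
}

/* Reply side: (re)arm e on list l. Returns 0 if armed, -1 if ignored. */
int set_timer_sketch(struct timer_list *l, struct timer_entry *e)
{
    if (e->detached) {
        /* The expiry already won the race: log and drop the request. */
        fprintf(stderr, "set_timer for %d list called on a \"detached\" timer"
                " -- ignoring\n", l->id);
        return -1;
    }
    if (e->next != NULL) { /* still linked somewhere: unlink first */
        e->prev->next = e->next;
        e->next->prev = e->prev;
    }
    list_insert(l, e);
    return 0;
}

/* Replays one interleaving: expiry first, then a late reply. */
int simulate_late_reply(void)
{
    struct timer_list fr, wt;
    struct timer_entry t = {0};

    list_init(&fr, 1);
    list_init(&wt, 3);
    list_insert(&fr, &t);             /* transaction armed on list 1 */
    detach_expired(&t);               /* timer side expires it first */
    return set_timer_sketch(&wt, &t); /* late reply loses the race */
}
```

In this sketch the guard just drops the late request and returns immediately; it never waits on anything, which fits Bogdan's point that the warning by itself should not cause a lock.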

Maybe I have been focusing on the wrong place.

Ignoring the "detached" timer, what could cause openser to hang for a 
couple of seconds and then clear, every 5-10 minutes?

Ideas?

We are seeing this on 3 different production servers.

Thanks

TR

Using openser 1.1.1.



T.R. Missner wrote:
> Bogdan,
>
> I have been chasing this for days and have done lots of debugging,
> using 1.1.1.
> While looking at the network trace at the time of these messages (I 
> usually see at least 5 in a row with differing hex values), I see many 
> incoming packets coming into the box and no response from the proxy 
> for somewhere between 5 and 10 seconds, then a flood of responses from 
> the proxy.
> I can email you a sample pcap file if you like.
> As part of my debugging I forced a 100 reply at the very top of my cfg 
> file.
> The forced 100 was not sent during the locked-up time, leading me to 
> believe openser was not processing incoming packets.
> I have now seen this on multiple servers in different locations. 
> Likely a particular customer call flow is causing this, but I have not 
> been able to pin it down to the exact customer. These proxies run 
> pretty fast during the day, so finding a pattern leading up to this 
> issue is difficult. What could I add to the log output to identify the 
> offending SIP Call-ID? Is the SIP Call-ID, branch tag, or anything 
> similar easily accessible in any of the data structs in timer.c?
>
> TR
>
> Bogdan-Andrei Iancu wrote:
>> Hi TR,
>>
>> it is a race between the expire event (from the timer) and inserting 
>> the timer again on a timer list.
>>    1 is the final response timer list (fr_timer)
>>    3 is the wait timer list (wt_timer)
>>
>> I would say there is no way this could lead to any kind of lock.
>>
>> What version are you using? What makes you say it locks?
>>
>> regards,
>> bogdan
>>
>> T.R. Missner wrote:
>>> Does anyone know what causes this?
>>>
>>> set_timer for 1 list called on a "detached" timer -- ignoring
>>>
>>> I also see
>>>
>>> set_timer for 3 list called on a "detached" timer -- ignoring
>>>
>>>
>>>
>>> When this happens, openser seems to lock up for 10 seconds or so.
>>>
>>> From searching, it appears this is caused by a race, but I am not 
>>> sure what the race is or why it results in an unresponsive openser 
>>> instance for multiple seconds.
>>>
>>> Transaction expiration racing reply?
>>>
>>>
>>> I desperately need to understand how this could be triggered so I can 
>>> get the customer to adjust their system.
>>>
>>> Is there any way to adjust this?
>>>
>>> I tried tweaking fr_inv_timer, but no joy.
>>>
>>>
>>>
>>> TR
>>> ------------------------------------------------------------------------ 
>>>
>>>
>>> _______________________________________________
>>> Devel mailing list
>>> Devel at openser.org
>>> http://openser.org/cgi-bin/mailman/listinfo/devel
>>>   
>>
>
> _______________________________________________
> Users mailing list
> Users at openser.org
> http://openser.org/cgi-bin/mailman/listinfo/users



