[OpenSIPS-Users] UDP workers consume 100% CPU

Vitalii Aleksandrov vitalik.voip at gmail.com
Wed Aug 14 10:42:05 EDT 2019


Absolute agree. No doubts it's the same issue.

> I suspect you are facing an issue similar to this one[1], so I think 
> we should move the discussion on that ticket. So please send us your 
> updates over there.
>
> [1] https://github.com/OpenSIPS/opensips/issues/1767
>
> Best regards,
> Răzvan
>
> On 8/14/19 12:58 PM, Vitalii Aleksandrov wrote:
>> Ok, I'll check when finish load testing with libssl-1.0.
>>
>>
>>> Hi, Vitalii!
>>>
>>> Just to double check you are using the right overwritten functions, 
>>> can you add some debugging in the tls_mgm code, recompile, and run 
>>> the test again? I'm thinking to add something like the snippet below 
>>> on line 1525[1]:
>>> LM_INFO("initialized mutex over OpenSIPS\n");
>>>
>>> [1] 
>>> https://github.com/OpenSIPS/opensips/blob/master/modules/tls_mgm/tls_mgm.c#L1525 
>>>
>>>
>>> Best regards,
>>> Răzvan
>>>
>>> On 8/12/19 7:20 PM, Vitalii Aleksandrov wrote:
>>>> Checked the tls_mgm.c code and found that I wasn't right. I've just 
>>>> downgraded libssl and restarted opensips without rebuilding it. 
>>>> Will correctly rebuild opensips with libssl-1.0 dev files and retest.
>>>> The workaround with pthread_mutex_init() and shm_malloc() wrapping 
>>>> theoretically should work and Kamailio went the same way. Anyhow 
>>>> something definitely wrong with this "workaround" and libssl-1.1
>>>>
>>>>
>>>>> Hi,
>>>>>
>>>>> Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and 
>>>>> still can reproduce this bug. You were right that this is a 
>>>>> deadlock in libssl. Found TCP worker process blocked in libcrypto 
>>>>> and a bunch of processes burning CPU cycles in sched_yield() 
>>>>> waiting for the lock. That's really strange that do completely 
>>>>> different libssl versions have such a dangerous bug. Found a few 
>>>>> opened issues related to libssl, but according to comments 
>>>>> downgrading to libssl-1.0 helped to make it stable.
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I'm using libssl-1.1.1-1ubuntu2.1~18.04.4
>>>>>>
>>>>>> Since I need SIP over TLS I can't just remove proto_tls. Will try 
>>>>>> to reproduce it again and collect more data.
>>>>>>
>>>>>> On 07.08.19 13:18, Bogdan-Andrei Iancu wrote:
>>>>>>> Hi Vitalii,
>>>>>>>
>>>>>>> As the backtraces show, it is a deadlock in TLS - if you remove 
>>>>>>> the proto_tls, you will not experience this issue anymore. The 
>>>>>>> root problem is not in OpenSIPS itself but in the 
>>>>>>> libssl/libcrypto - what versions are you using for these libs ?
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Bogdan-Andrei Iancu
>>>>>>>
>>>>>>> OpenSIPS Founder and Developer
>>>>>>>   https://www.opensips-solutions.com
>>>>>>> OpenSIPS Summit 2019
>>>>>>> https://www.opensips.org/events/Summit-2019Amsterdam/
>>>>>>>
>>>>>>> On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have a test opensips 2.4.6 (last week's checkout from 2.4 
>>>>>>>> branch) and after short load testing it started to consume all 
>>>>>>>> CPU resources.
>>>>>>>>
>>>>>>>> Here is what opensips floods to syslog:
>>>>>>>> Aug  3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker: 
>>>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now 
>>>>>>>> 176231430 ms), it may overlap..
>>>>>>>> Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
>>>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now 
>>>>>>>> 176231530 ms), it may overlap..
>>>>>>>> Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
>>>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now 
>>>>>>>> 176231630 ms), it may overlap..
>>>>>>>> Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
>>>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now 
>>>>>>>> 176231730 ms), it may overlap..
>>>>>>>> Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
>>>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now 
>>>>>>>> 176231830 ms), it may overlap..
>>>>>>>> Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
>>>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now 
>>>>>>>> 176231930 ms), it may overlap..
>>>>>>>> Aug  3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker: 
>>>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now 
>>>>>>>> 176232030 ms), it may overlap..
>>>>>>>> Aug  3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker: 
>>>>>>>> timer task <tm-timer> already scheduled for 114023090 ms (now 
>>>>>>>> 176232030 ms), it may overlap..
>>>>>>>>
>>>>>>>> Where 4774 is: "Process::  ID=7 PID=4774 Type=timer"
>>>>>>>>
>>>>>>>> Here is the TOP output for time consuming processes:
>>>>>>>>   PID USER         PR  NI    VIRT    RES      SHR S %CPU 
>>>>>>>> %MEM     TIME+ COMMAND
>>>>>>>>  4801 opensips  20   0  721040  28496  24240 R  52.9 0.7 
>>>>>>>> 1817:08 opensips
>>>>>>>>  4802 opensips  20   0  721040  27724  23588 R  47.1 0.7 
>>>>>>>> 1813:06 opensips
>>>>>>>>  4806 opensips  20   0  721040  27640  23532 R  47.1 0.7 
>>>>>>>> 1814:41 opensips
>>>>>>>>  4833 opensips  20   0  721040  28848  24812 R  47.1 0.7 
>>>>>>>> 1815:16 opensips
>>>>>>>>  4803 opensips  20   0  721044  27928  23732 R  41.2 0.7 
>>>>>>>> 1817:09 opensips
>>>>>>>>  4804 opensips  20   0  721040  28188  23996 R  41.2 0.7 
>>>>>>>> 1816:05 opensips
>>>>>>>>  4799 opensips  20   0  721040  28272  24172 R  35.3 0.7 
>>>>>>>> 1813:29 opensips
>>>>>>>>  4800 opensips  20   0  721040  28340  24204 R  29.4 0.7 
>>>>>>>> 1816:52 opensips
>>>>>>>>  4805 opensips  20   0  723256  29620  25300 R  29.4 0.8 
>>>>>>>> 1819:28 opensips
>>>>>>>>
>>>>>>>> And here is the "opensipsctl ps" for them:
>>>>>>>> Process::  ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070
>>>>>>>> Process::  ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070
>>>>>>>> Process::  ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070
>>>>>>>> Process::  ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070
>>>>>>>> Process::  ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070
>>>>>>>> Process::  ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070
>>>>>>>> Process::  ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070
>>>>>>>> Process::  ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070
>>>>>>>> Process::  ID=66 PID=4833 Type=TCP receiver
>>>>>>>>
>>>>>>>> "strace" output for those burning processes just floods with:
>>>>>>>> 11:07:04.042794 sched_yield()           = 0
>>>>>>>> 11:07:04.042869 sched_yield()           = 0
>>>>>>>> 11:07:04.042943 sched_yield()           = 0
>>>>>>>> 11:07:04.043016 sched_yield()           = 0
>>>>>>>> 11:07:04.043090 sched_yield()           = 0
>>>>>>>> 11:07:04.043164 sched_yield()           = 0
>>>>>>>>
>>>>>>>> Unfortunately this opensips is striped and gdb output is limited.
>>>>>>>> GDB bt of TCP receiver:
>>>>>>>> (gdb) bt
>>>>>>>> #0  0x00007f939211fe57 in sched_yield () at 
>>>>>>>> ../sysdeps/unix/syscall-template.S:78
>>>>>>>> #1  0x00007f936fffc14d in ?? () from 
>>>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so
>>>>>>>> #2  0x00007f93791698d5 in send_pr_buffer () from 
>>>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>>>> #3  0x00007f93791626b8 in relay_reply () from 
>>>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>>>> #4  0x00007f937919fe2a in timer_routine () from 
>>>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>>>> #5  0x000055e00034a58a in handle_timer_job ()
>>>>>>>> #6  0x000055e00041ae2a in tcp_worker_proc_loop ()
>>>>>>>> #7  0x000055e000427218 in tcp_start_processes ()
>>>>>>>> #8  0x000055e0002e6837 in main ()
>>>>>>>>
>>>>>>>> GDB bt of UDP receiver:
>>>>>>>> #0  0x00007f939211fe57 in sched_yield () at 
>>>>>>>> ../sysdeps/unix/syscall-template.S:78
>>>>>>>> #1  0x00007f936fffc14d in ?? () from 
>>>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so
>>>>>>>> #2  0x00007f93791698d5 in send_pr_buffer () from 
>>>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>>>> #3  0x00007f93791626b8 in relay_reply () from 
>>>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>>>> #4  0x00007f9379165015 in reply_received () from 
>>>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>>>> #5  0x000055e000322fed in forward_reply ()
>>>>>>>> #6  0x000055e000309d1d in receive_msg ()
>>>>>>>> #7  0x000055e00042b4be in ?? ()
>>>>>>>> #8  0x000055e000413dd8 in udp_start_processes ()
>>>>>>>> #9  0x000055e0002e6724 in main ()
>>>>>>>>
>>>>>>>> Those workers don't process any incoming SIP messages and just 
>>>>>>>> stay in this infinite loop.
>>>>>>>>
>>>>>>>> Any input is very appreciated.
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> Users at lists.opensips.org
>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opensips.org
>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at lists.opensips.org
>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>



More information about the Users mailing list