[OpenSIPS-Users] UDP workers consume 100% CPU
Vitalii Aleksandrov
vitalik.voip at gmail.com
Wed Aug 14 05:58:00 EDT 2019
Ok, I'll check when finish load testing with libssl-1.0.
> Hi, Vitalii!
>
> Just to double check you are using the right overwritten functions,
> can you add some debugging in the tls_mgm code, recompile, and run the
> test again? I'm thinking to add something like the snippet below on
> line 1525[1]:
> LM_INFO("initialized mutex over OpenSIPS\n");
>
> [1]
> https://github.com/OpenSIPS/opensips/blob/master/modules/tls_mgm/tls_mgm.c#L1525
>
> Best regards,
> Răzvan
>
> On 8/12/19 7:20 PM, Vitalii Aleksandrov wrote:
>> Checked the tls_mgm.c code and found that I wasn't right. I've just
>> downgraded libssl and restarted opensips without rebuilding it. Will
>> correctly rebuild opensips with libssl-1.0 dev files and retest.
>> The workaround with pthread_mutex_init() and shm_malloc() wrapping
>> theoretically should work and Kamailio went the same way. Anyhow
>> something definitely wrong with this "workaround" and libssl-1.1
>>
>>
>>> Hi,
>>>
>>> Downgraded libssl to 1.0.2n-1ubuntu5.3 available in repository and
>>> still can reproduce this bug. You were right that this is a deadlock
>>> in libssl. Found TCP worker process blocked in libcrypto and a bunch
>>> of processes burning CPU cycles in sched_yield() waiting for the
>>> lock. That's really strange that do completely different libssl
>>> versions have such a dangerous bug. Found a few opened issues
>>> related to libssl, but according to comments downgrading to
>>> libssl-1.0 helped to make it stable.
>>>
>>>> Hi,
>>>>
>>>> I'm using libssl-1.1.1-1ubuntu2.1~18.04.4
>>>>
>>>> Since I need SIP over TLS I can't just remove proto_tls. Will try
>>>> to reproduce it again and collect more data.
>>>>
>>>> On 07.08.19 13:18, Bogdan-Andrei Iancu wrote:
>>>>> Hi Vitalii,
>>>>>
>>>>> As the backtraces show, it is a deadlock in TLS - if you remove
>>>>> the proto_tls, you will not experience this issue anymore. The
>>>>> root problem is not in OpenSIPS itself but in the libssl/libcrypto
>>>>> - what versions are you using for these libs ?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Bogdan-Andrei Iancu
>>>>>
>>>>> OpenSIPS Founder and Developer
>>>>> https://www.opensips-solutions.com
>>>>> OpenSIPS Summit 2019
>>>>> https://www.opensips.org/events/Summit-2019Amsterdam/
>>>>>
>>>>> On 08/05/2019 09:13 PM, Vitalii Aleksandrov wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I have a test opensips 2.4.6 (last week's checkout from 2.4
>>>>>> branch) and after short load testing it started to consume all
>>>>>> CPU resources.
>>>>>>
>>>>>> Here is what opensips floods to syslog:
>>>>>> Aug 3 13:25:01 l /opensips[4774]: WARNING:core:utimer_ticker:
>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now
>>>>>> 176231430 ms), it may overlap..
>>>>>> Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker:
>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now
>>>>>> 176231530 ms), it may overlap..
>>>>>> Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker:
>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now
>>>>>> 176231630 ms), it may overlap..
>>>>>> Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker:
>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now
>>>>>> 176231730 ms), it may overlap..
>>>>>> Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker:
>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now
>>>>>> 176231830 ms), it may overlap..
>>>>>> Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker:
>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now
>>>>>> 176231930 ms), it may overlap..
>>>>>> Aug 3 13:25:02 l /opensips[4774]: WARNING:core:utimer_ticker:
>>>>>> utimer task <tm-utimer> already scheduled for 114023190 ms (now
>>>>>> 176232030 ms), it may overlap..
>>>>>> Aug 3 13:25:02 l /opensips[4774]: WARNING:core:timer_ticker:
>>>>>> timer task <tm-timer> already scheduled for 114023090 ms (now
>>>>>> 176232030 ms), it may overlap..
>>>>>>
>>>>>> Where 4774 is: "Process:: ID=7 PID=4774 Type=timer"
>>>>>>
>>>>>> Here is the TOP output for time consuming processes:
>>>>>> PID USER PR NI VIRT RES SHR S %CPU
>>>>>> %MEM TIME+ COMMAND
>>>>>> 4801 opensips 20 0 721040 28496 24240 R 52.9 0.7 1817:08
>>>>>> opensips
>>>>>> 4802 opensips 20 0 721040 27724 23588 R 47.1 0.7 1813:06
>>>>>> opensips
>>>>>> 4806 opensips 20 0 721040 27640 23532 R 47.1 0.7 1814:41
>>>>>> opensips
>>>>>> 4833 opensips 20 0 721040 28848 24812 R 47.1 0.7 1815:16
>>>>>> opensips
>>>>>> 4803 opensips 20 0 721044 27928 23732 R 41.2 0.7 1817:09
>>>>>> opensips
>>>>>> 4804 opensips 20 0 721040 28188 23996 R 41.2 0.7 1816:05
>>>>>> opensips
>>>>>> 4799 opensips 20 0 721040 28272 24172 R 35.3 0.7 1813:29
>>>>>> opensips
>>>>>> 4800 opensips 20 0 721040 28340 24204 R 29.4 0.7 1816:52
>>>>>> opensips
>>>>>> 4805 opensips 20 0 723256 29620 25300 R 29.4 0.8 1819:28
>>>>>> opensips
>>>>>>
>>>>>> And here is the "opensipsctl ps" for them:
>>>>>> Process:: ID=32 PID=4799 Type=SIP receiver udp:127.0.0.1:5070
>>>>>> Process:: ID=33 PID=4800 Type=SIP receiver udp:127.0.0.1:5070
>>>>>> Process:: ID=34 PID=4801 Type=SIP receiver udp:127.0.0.1:5070
>>>>>> Process:: ID=35 PID=4802 Type=SIP receiver udp:127.0.0.1:5070
>>>>>> Process:: ID=36 PID=4803 Type=SIP receiver udp:127.0.0.1:5070
>>>>>> Process:: ID=37 PID=4804 Type=SIP receiver udp:127.0.0.1:5070
>>>>>> Process:: ID=38 PID=4805 Type=SIP receiver udp:127.0.0.1:5070
>>>>>> Process:: ID=39 PID=4806 Type=SIP receiver udp:127.0.0.1:5070
>>>>>> Process:: ID=66 PID=4833 Type=TCP receiver
>>>>>>
>>>>>> "strace" output for those burning processes just floods with:
>>>>>> 11:07:04.042794 sched_yield() = 0
>>>>>> 11:07:04.042869 sched_yield() = 0
>>>>>> 11:07:04.042943 sched_yield() = 0
>>>>>> 11:07:04.043016 sched_yield() = 0
>>>>>> 11:07:04.043090 sched_yield() = 0
>>>>>> 11:07:04.043164 sched_yield() = 0
>>>>>>
>>>>>> Unfortunately this opensips is striped and gdb output is limited.
>>>>>> GDB bt of TCP receiver:
>>>>>> (gdb) bt
>>>>>> #0 0x00007f939211fe57 in sched_yield () at
>>>>>> ../sysdeps/unix/syscall-template.S:78
>>>>>> #1 0x00007f936fffc14d in ?? () from
>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so
>>>>>> #2 0x00007f93791698d5 in send_pr_buffer () from
>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>> #3 0x00007f93791626b8 in relay_reply () from
>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>> #4 0x00007f937919fe2a in timer_routine () from
>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>> #5 0x000055e00034a58a in handle_timer_job ()
>>>>>> #6 0x000055e00041ae2a in tcp_worker_proc_loop ()
>>>>>> #7 0x000055e000427218 in tcp_start_processes ()
>>>>>> #8 0x000055e0002e6837 in main ()
>>>>>>
>>>>>> GDB bt of UDP receiver:
>>>>>> #0 0x00007f939211fe57 in sched_yield () at
>>>>>> ../sysdeps/unix/syscall-template.S:78
>>>>>> #1 0x00007f936fffc14d in ?? () from
>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/proto_tls.so
>>>>>> #2 0x00007f93791698d5 in send_pr_buffer () from
>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>> #3 0x00007f93791626b8 in relay_reply () from
>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>> #4 0x00007f9379165015 in reply_received () from
>>>>>> /usr/lib/x86_64-linux-gnu/opensips/modules/tm.so
>>>>>> #5 0x000055e000322fed in forward_reply ()
>>>>>> #6 0x000055e000309d1d in receive_msg ()
>>>>>> #7 0x000055e00042b4be in ?? ()
>>>>>> #8 0x000055e000413dd8 in udp_start_processes ()
>>>>>> #9 0x000055e0002e6724 in main ()
>>>>>>
>>>>>> Those workers don't process any incoming SIP messages and just
>>>>>> stay in this infinite loop.
>>>>>>
>>>>>> Any input is very appreciated.
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users at lists.opensips.org
>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at lists.opensips.org
>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
More information about the Users
mailing list