[OpenSIPS-Users] CPU 100% with TCP
Bogdan-Andrei Iancu
bogdan at opensips.org
Wed Oct 24 05:16:29 EDT 2018
Hi Ben,
Could you run "opensipsctl trap" ?
Regards,
Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com
OpenSIPS Bootcamp 2018
http://opensips.org/training/OpenSIPS_Bootcamp_2018/
On 10/24/2018 12:56 AM, Ben Newlin wrote:
>
> Hi,
>
> We have implemented TCP recently and are performing TCP<->UDP
> translation on one of our proxy types. This proxy only exists for that
> purpose; there are no DB queries, REST calls, or anything like that.
> It is designed to be very fast and high throughput.
>
> We recently found that when the remote endpoint of a TCP connection
> is lost (i.e., the server goes down) while under moderate load,
> OpenSIPS quickly reaches 100% CPU and becomes unresponsive. When this
> occurs, the “top” command shows between 30% and 90% of CPU time in
> system (kernel) space, and each OpenSIPS TCP process uses many times
> its normal CPU. We are running OpenSIPS 2.4.2 on Amazon Linux.
>
> I obtained as much information as I could using ps, strace, and gdb
> here: https://pastebin.com/JP3DnCqs. We can reproduce the failure
> consistently by removing a server during call traffic.
>
> A few things I noticed:
>
> * The number of running threads reported by OpenSIPS doesn’t align
> with our configuration, copied here:
>
> ####### Global Parameters #########
>
> children=32
>
> #// Allow 503 to pass back to Control
> disable_503_translation=yes
>
> #// Even though we are not receiving HEP,
> #// this listener is required by OpenSIPS
> #// in order to use the proto_hep module. :/
> listen=hep_tcp:10.32.40.245:9061 use_children 1
>
> #// Configure the listeners
> listen=udp:10.32.40.245:5060 as XXX.XXX.XXX.XXX
> listen=tcp:10.32.40.245:5060 as XXX.XXX.XXX.XXX
>
> #// Transaction Module
> loadmodule "tm.so"
> modparam("tm", "restart_fr_on_each_reply", 0)
> modparam("tm", "timer_partitions", 8)
> modparam("tm", "onreply_avp_mode", 1)
> modparam("tm", "wt_timer", 10)
>
> According to the documentation, if “tcp_children” is not set, the
> value of “children” is used [1]. However, we have set “children” to 32
> and see only the default 8 TCP processes. We also appear to have only
> 1 timer process, although we set the number of timer partitions to 8.
>
> * The server that is terminated was using TCP connections
> exclusively, but all of the CPU seems to be in the UDP threads.
> The one I looked at appeared to be handling a CANCEL to one of the
> calls that was active and was attempting to send it out via TCP.
> I’m not sure why it would be trying to relay the CANCEL as no 100
> Trying had been received from the server. I have noticed that in
> 2.x OpenSIPS will now send CANCELs for transactions even when 100
> Trying was not received. Is that intentional? RFC 3261 states that
> no CANCEL should be sent unless a provisional response has been
> received.
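>
> On the first point above, a minimal sketch of setting the TCP worker
> count explicitly instead of relying on the documented fallback to
> “children”. The value 32 is an assumption chosen only to match
> “children” in the configuration above, and whether it affects the CPU
> behaviour described here has not been established.
>
> ```cfg
> #// UDP workers, as in the configuration above
> children=32
>
> #// Assumption for illustration: set the TCP worker count
> #// explicitly rather than relying on the fallback described in [1]
> tcp_children=32
> ```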
>
> Any assistance with this would be appreciated.
>
> [1] http://www.opensips.org/Documentation/Script-CoreParameters-2-4#toc66
>
> Ben Newlin
>