[OpenSIPS-Users] CPU 100% with TCP

Wed Oct 24 05:16:29 EDT 2018

Hi Ben,

Could you run "opensipsctl trap"?

Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
   http://www.opensips-solutions.com
OpenSIPS Bootcamp 2018
   http://opensips.org/training/OpenSIPS_Bootcamp_2018/

On 10/24/2018 12:56 AM, Ben Newlin wrote:
>
> Hi,
>
> We have implemented TCP recently and are performing TCP<->UDP 
> translation on one of our proxy types. This proxy only exists for that 
> purpose; there are no DB queries, REST calls, or anything like that. 
> It is designed to be very fast and high throughput.
>
> Recently we have found that when the remote endpoint of a TCP 
> connection is lost (i.e. the server goes down) under moderate load, 
> OpenSIPS quickly reaches 100% CPU and becomes unresponsive. When this 
> occurs, the “top” command shows 30-90% of CPU time in System (kernel) 
> space, and each OpenSIPS TCP process shows many times its normal CPU 
> usage. We are running OpenSIPS 2.4.2 on Amazon Linux.
>
> I obtained as much information as I could using ps, strace, and gdb 
> here: https://pastebin.com/JP3DnCqs. We can reproduce the failure 
> consistently by removing a server during call traffic.
>
> A few things I noticed:
>
>   * The number of running processes reported by OpenSIPS doesn’t align
>     with our configuration, copied here:
>
> ####### Global Parameters #########
> children=32
>
> #// Allow 503 to pass back to Control
> disable_503_translation=yes
>
> #// Even though we are not receiving HEP,
> #// this listener is required by OpenSIPS
> #// in order to use the proto_hep module. :/
> listen=hep_tcp:10.32.40.245:9061 use_children 1
>
> #// Configure the listeners
> listen=udp:10.32.40.245:5060 as XXX.XXX.XXX.XXX
> listen=tcp:10.32.40.245:5060 as XXX.XXX.XXX.XXX
>
> #// Transaction Module
> loadmodule "tm.so"
> modparam("tm", "restart_fr_on_each_reply", 0)
> modparam("tm", "timer_partitions", 8)
> modparam("tm", "onreply_avp_mode", 1)
> modparam("tm", "wt_timer", 10)
>
> According to the documentation, if “tcp_children” is not set then the 
> value of “children” will be used [1]. However, we have set “children” 
> to 32 and only have the default 8 TCP processes. We also appear to 
> have only 1 timer process, although we have set the number of timer 
> partitions to 8.
>
>   * The server that was terminated was using TCP connections
>     exclusively, but all of the CPU seems to be in the UDP processes.
>     The one I looked at appeared to be handling a CANCEL for one of the
>     active calls and was attempting to send it out via TCP. I’m not
>     sure why it would be trying to relay the CANCEL, as no 100 Trying
>     had been received from the server. I have noticed that OpenSIPS
>     2.x will now send CANCELs for transactions even when no 100 Trying
>     was received. Is that intentional? RFC 3261 (Section 9.1) states
>     that no CANCEL should be sent unless a provisional response has
>     been received.
>
> Any assistance with this would be appreciated.
>
> [1] http://www.opensips.org/Documentation/Script-CoreParameters-2-4#toc66
>
> Ben Newlin

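Regarding the process-count mismatch in the quoted message: a minimal sketch, assuming the fallback from “children” described in the documentation is not taking effect in this build, is to set the TCP worker count explicitly. “tcp_children” is the core parameter named in the quoted message. The value 32 is only an assumption chosen to match “children” and is not a tuning recommendation.

```
# Sketch only: set the TCP worker count explicitly instead of relying on
# the documented fallback to "children". The value 32 is an assumption.
children=32
tcp_children=32
```

Whether this changes the CPU behaviour described above is not known; it only makes the number of TCP processes independent of the fallback.
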