[OpenSIPS-Users] CPU 100% with TCP

Bogdan-Andrei Iancu bogdan at opensips.org
Fri Oct 26 03:06:19 EDT 2018


Hi Ben,

Thank you for the info.

It looks like the processes get stuck in a HEP-related internal lock - 
do you see any HEP-related errors in your logs prior to the deadlock?

Also, as a PoC, could you disable HEP tracing to see if the problem goes 
away?
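
A rough sketch of such a test, assuming tracing is driven by the siptrace module on top of proto_hep. The configuration excerpt quoted below does not show that part, so the `hep_dst` and `tid` names and the capture address are hypothetical placeholders, and `trace_on` is assumed to be the siptrace switch available in this version:

```
loadmodule "proto_hep.so"
loadmodule "siptrace.so"

#// Hypothetical tracing setup; names and address are placeholders
modparam("proto_hep", "hep_id", "[hep_dst] 10.0.0.2:9060; transport=tcp; version=3")
modparam("siptrace", "trace_id", "[tid]uri=hep:hep_dst")

#// PoC only: switch tracing off globally (assumed siptrace parameter)
modparam("siptrace", "trace_on", 0)
```

Leaving the rest of the tracing setup in place and changing only the switch keeps the test easy to revert.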

Thanks,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
   http://www.opensips-solutions.com
OpenSIPS Bootcamp 2018
   http://opensips.org/training/OpenSIPS_Bootcamp_2018/

On 10/24/2018 10:18 PM, Ben Newlin wrote:
>
> Bogdan,
>
> I have run the command but the output was too large for pastebin so I 
> have sent it to you directly.
>
> Ben Newlin
>
> *From: *Bogdan-Andrei Iancu <bogdan at opensips.org>
> *Date: *Wednesday, October 24, 2018 at 5:17 AM
> *To: *OpenSIPS users mailling list <users at lists.opensips.org>, Ben 
> Newlin <Ben.Newlin at genesys.com>
> *Subject: *Re: [OpenSIPS-Users] CPU 100% with TCP
>
> Hi Ben,
>
> Could you run "opensipsctl trap" ?
>
> Regards,
>
> Bogdan-Andrei Iancu
>
> On 10/24/2018 12:56 AM, Ben Newlin wrote:
>
>     Hi,
>
>     We have implemented TCP recently and are performing TCP<->UDP
>     translation on one of our proxy types. This proxy only exists for
>     that purpose; there are no DB queries, REST calls, or anything
>     like that. It is designed to be very fast and high throughput.
>
>     Recently we have found that when the remote endpoint of a TCP
>     connection is lost, i.e. the server goes down, while under
>     moderate load OpenSIPS quickly reaches 100% CPU and becomes
>     unresponsive. When this occurs, the “top” command shows that
>     30-90% of CPU time is spent in system (kernel) space, and each
>     OpenSIPS TCP process shows many times its normal CPU usage. We
>     are running OpenSIPS 2.4.2 on Amazon Linux.
>
>     I obtained as much information as I could using ps, strace, and
>     gdb here: https://pastebin.com/JP3DnCqs. We can reproduce the
>     failure consistently by removing a server during call traffic.
>
>     A few things I noticed:
>
>       * The number of running threads reported by OpenSIPS doesn’t
>         align with our configuration, copied here:
>
>     ####### Global Parameters #########
>     children=32
>
>     #// Allow 503 to pass back to Control
>     disable_503_translation=yes
>
>     #// Even though we are not receiving HEP,
>     #// this listener is required by OpenSIPS
>     #// in order to use the proto_hep module. :/
>     listen=hep_tcp:10.32.40.245:9061 use_children 1
>
>     #// Configure the listeners
>     listen=udp:10.32.40.245:5060 as XXX.XXX.XXX.XXX
>     listen=tcp:10.32.40.245:5060 as XXX.XXX.XXX.XXX
>
>     #// Transaction Module
>     loadmodule "tm.so"
>     modparam("tm", "restart_fr_on_each_reply", 0)
>     modparam("tm", "timer_partitions", 8)
>     modparam("tm", "onreply_avp_mode", 1)
>     modparam("tm", "wt_timer", 10)
>
>     According to the documentation if “tcp_children” is not set then
>     the value of “children” will be used [1], but we have set
>     “children” to 32 and only have the default 8 TCP processes. Also
>     we appear to only have 1 timer process, although we have set the
>     number of timer partitions to 8.
>
>       * The server that is terminated was using TCP connections
>         exclusively, but all of the CPU seems to be in the UDP
>         threads. The one I looked at appeared to be handling a CANCEL
>         to one of the calls that was active and was attempting to send
>         it out via TCP. I’m not sure why it would be trying to relay
>         the CANCEL as no 100 Trying had been received from the server.
>         I have noticed that in 2.x OpenSIPS will now send CANCELs for
>         transactions even when 100 Trying was not received. Is that
>         intentional? RFC 3261 states that no CANCEL should be sent
>         unless a provisional response has been received.
>
>     Any assistance with this would be appreciated.
>
>     [1] -
>     http://www.opensips.org/Documentation/Script-CoreParameters-2-4#toc66
>
>     Ben Newlin
>
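
On the process-count observation in the quoted message: a minimal sketch, assuming the `tcp_children` core parameter described in the referenced 2.4 documentation, is to set the number of TCP processes explicitly rather than rely on it being inherited from `children`. The value 32 only mirrors the quoted `children` setting and is not a tuning recommendation.

```
children=32

#// Set explicitly instead of relying on inheritance from "children"
tcp_children=32
```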
