[OpenSIPS-Users] CPU 100% with TCP
Bogdan-Andrei Iancu
bogdan at opensips.org
Fri Oct 26 03:06:19 EDT 2018
Hi Ben,
Thank you for the info.
It looks like the processes get stuck in a HEP-related internal lock.
Do you see any HEP-related errors in your logs prior to the deadlock?
Also, as a PoC, could you disable HEP tracing to see if the problem goes
away?
Thanks,
Bogdan-Andrei Iancu
OpenSIPS Founder and Developer
http://www.opensips-solutions.com
OpenSIPS Bootcamp 2018
http://opensips.org/training/OpenSIPS_Bootcamp_2018/
On 10/24/2018 10:18 PM, Ben Newlin wrote:
>
> Bogdan,
>
> I have run the command but the output was too large for pastebin so I
> have sent it to you directly.
>
> Ben Newlin
>
> *From: *Bogdan-Andrei Iancu <bogdan at opensips.org>
> *Date: *Wednesday, October 24, 2018 at 5:17 AM
> *To: *OpenSIPS users mailling list <users at lists.opensips.org>, Ben
> Newlin <Ben.Newlin at genesys.com>
> *Subject: *Re: [OpenSIPS-Users] CPU 100% with TCP
>
> Hi Ben,
>
> Could you run "opensipsctl trap"?
>
> Regards,
>
> Bogdan-Andrei Iancu
> OpenSIPS Founder and Developer
> http://www.opensips-solutions.com
> OpenSIPS Bootcamp 2018
> http://opensips.org/training/OpenSIPS_Bootcamp_2018/
>
> On 10/24/2018 12:56 AM, Ben Newlin wrote:
>
> Hi,
>
> We have implemented TCP recently and are performing TCP<->UDP
> translation on one of our proxy types. This proxy only exists for
> that purpose; there are no DB queries, REST calls, or anything
> like that. It is designed to be very fast and high throughput.
>
> Recently we have found that when the remote endpoint of a TCP
> connection is lost, i.e. the server goes down, while under
> moderate load, OpenSIPS quickly reaches 100% CPU and becomes
> unresponsive. When this occurs, the “top” command shows that
> between 30% and 90% of CPU time is in System (kernel) space, and
> each OpenSIPS TCP process shows many times its normal CPU usage.
> We are running OpenSIPS 2.4.2 on Amazon Linux.
>
> I obtained as much information as I could using ps, strace, and
> gdb here: https://pastebin.com/JP3DnCqs. We can reproduce the
> failure consistently by removing a server during call traffic.
>
> A few things I noticed:
>
> * The number of running threads reported by OpenSIPS doesn’t
> align with our configuration, copied here:
>
####### Global Parameters #########
children=32

#// Allow 503 to pass back to Control
disable_503_translation=yes

#// Even though we are not receiving HEP,
#// this listener is required by OpenSIPS
#// in order to use the proto_hep module. :/
listen=hep_tcp:10.32.40.245:9061 use_children 1

#// Configure the listeners
listen=udp:10.32.40.245:5060 as XXX.XXX.XXX.XXX
listen=tcp:10.32.40.245:5060 as XXX.XXX.XXX.XXX

#// Transaction Module
loadmodule "tm.so"
modparam("tm", "restart_fr_on_each_reply", 0)
modparam("tm", "timer_partitions", 8)
modparam("tm", "onreply_avp_mode", 1)
modparam("tm", "wt_timer", 10)
>
According to the documentation, if “tcp_children” is not set then
the value of “children” will be used [1], but we have set
“children” to 32 and have only the default 8 TCP processes. Also,
we appear to have only 1 timer process, although we have set the
number of timer partitions to 8.
>
> * The server that was terminated was using TCP connections
> exclusively, but all of the CPU seems to be in the UDP
> threads. The one I looked at appeared to be handling a CANCEL
> to one of the calls that was active and was attempting to send
> it out via TCP. I’m not sure why it would be trying to relay
> the CANCEL as no 100 Trying had been received from the server.
> I have noticed that in 2.x OpenSIPS will now send CANCELs for
> transactions even when 100 Trying was not received. Is that
> intentional? RFC 3261 states that no CANCEL should be sent
> unless a provisional response has been received.
>
> Any assistance with this would be appreciated.
>
> [1] -
> http://www.opensips.org/Documentation/Script-CoreParameters-2-4#toc66
>
> Ben Newlin
>