[OpenSIPS-Users] Autoscaler in 3.2.x

Bogdan-Andrei Iancu bogdan at opensips.org
Thu Sep 15 06:01:30 UTC 2022


Hi Yury,

For the crash -> is there any core file to check ?

For mem usage -> you should try to get a memory dump for further 
investigation [1].

[1] https://opensips.org/Documentation/TroubleShooting-OutOfMem

Best regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
   https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
   https://www.opensips.org/events/Summit-2022Athens/

On 9/14/22 10:13 PM, Yury Kirsanov wrote:
> Hi Bogdan,
> Thanks a lot for your help and support! The only question I now have 
> is why OpenSIPS was going into a crash if all TCP processes were 
> blocked waiting for connection? It was starting to consume more and 
> more memory and then it was crashing with a segfault upon reaching 
> the -m memory parameter. I do understand that TCP listeners were in a 
> blocking mode and were not able to do any work until the session could 
> be fully established, not being able to forward any SIP packets, but 
> isn't that a bug that OpenSIPS was starting to eat memory and then 
> crash? Do I need to open a bug report on this? Thanks!
>
> Best regards,
> Yury.
>
> On Wed, Sep 14, 2022 at 10:58 PM Bogdan-Andrei Iancu 
> <bogdan at opensips.org> wrote:
>
>     Hi Yury,
>
>     You need to check the TCP settings and make sure your OpenSIPS
>     will (1) not try to perform a TCP connect against destinations
>     known not to be able to accept one (like TCP/WS end points behind
>     NAT) - see tcp_no_new_conn_bflag [1] - or (2) not block for a long
>     time while attempting a connect - see tcp_connect_timeout [2] or
>     consider enabling async [3].
>
>     [1] https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_no_new_conn_bflag
>     [2] https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_connect_timeout
>     [3] https://opensips.org/html/docs/modules/3.2.x/proto_tcp.html#idp168992
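>
>     A minimal sketch of how these three settings might look in
>     opensips.cfg. The values, the TCP_NO_CONNECT flag name and the
>     $var(dst_behind_nat) variable are illustrative assumptions, not
>     taken from Yury's configuration; please check the exact syntax
>     and units against the 3.2 documentation linked above.
>
>     ```
>     # --- core parameters (global section) ---
>
>     # (2) abort a blocking connect attempt after this long
>     #     (assumed to be in milliseconds; the value is only an example)
>     tcp_connect_timeout = 100
>
>     # (1) name of the branch flag that forbids opening new TCP connections
>     #     (TCP_NO_CONNECT is an arbitrary, illustrative name)
>     tcp_no_new_conn_bflag = TCP_NO_CONNECT
>
>     # --- module section ---
>
>     # (3) non-blocking connect/write in the TCP transport module
>     loadmodule "proto_tcp.so"
>     modparam("proto_tcp", "tcp_async", 1)
>
>     # --- routing logic ---
>     route {
>         # $var(dst_behind_nat) is a hypothetical marker that your own
>         # script would set for TCP/WS end points known to be behind NAT
>         if ($var(dst_behind_nat) == 1)
>             setbflag("TCP_NO_CONNECT");
>
>         # ... rest of the routing logic ...
>     }
>     ```
>
>     The intent is that a worker either skips the connect entirely (1),
>     gives up on it quickly (2), or does not wait on it at all (3), so
>     it is not held in a blocking TCP connect.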
>
>     Regards,
>
>     Bogdan-Andrei Iancu
>
>     On 9/13/22 12:01 PM, Yury Kirsanov wrote:
>>     Hi Bogdan,
>>     Thanks for this update, but it looks like I can't check
>>     autoscaler because of this first issue with blocking TCP connect.
>>     Is there a way to resolve it? Am I doing something wrong? Or is
>>     that something to do with OpenSIPS code? And yes, you're right: as
>>     soon as I restart OpenSIPS with a lot of SIP devices trying to
>>     connect to it, it goes crazy, starts to consume memory and stops
>>     forwarding packets, sitting there at 100% load until it runs out
>>     of memory and segfaults. Sometimes I can't even restart it to
>>     get it back to a normal working state; it just loops into the
>>     same crash whatever I try to do.
>>
>>     I've compiled OpenSIPS 3.3.1 with your patch and was able to
>>     start it but not sure, maybe I was just lucky this time.
>>
>>     What should I do? Thanks!
>>
>>     Best regards,
>>     Yury.
>>
>>     On Tue, 13 Sept 2022, 18:56 Bogdan-Andrei Iancu,
>>     <bogdan at opensips.org> wrote:
>>
>>         Hi Yury,
>>
>>         It looks like you have multiple issues overlapping here. The
>>         traps you sent have nothing to do with the auto-scaling,
>>         but with a blocking TCP connect for SIP - most of the procs
>>         get blocked in a sync TCP connect.
>>
>>         Regards,
>>
>>         Bogdan-Andrei Iancu
>>
>>         On 9/12/22 4:39 PM, Yury Kirsanov wrote:
>>>         Hi Bogdan,
>>>         I've applied the patch (I had to work out where to apply it
>>>         manually for 3.2.8 downloaded from the Web page: line 1568
>>>         instead of 1652) and restarted the server with only about
>>>         300-350 SIP devices, and immediately ran into the same issue.
>>>         I'm attaching two GDB dumps made within several minutes of
>>>         each other. Autoscale was now OFF; please see my previous
>>>         message, as currently for some reason I'm experiencing
>>>         lockups even when it's off :(
>>>
>>>         Best regards,
>>>         Yury.
>>>
>>>         On Mon, Sep 12, 2022 at 7:48 PM Bogdan-Andrei Iancu
>>>         <bogdan at opensips.org> wrote:
>>>
>>>             Hi Yury,
>>>
>>>             Could you give this patch a try? It should fix the
>>>             blocking you are experiencing (it should apply on 3.2 too).
>>>
>>>             Best regards,
>>>
>>>             Bogdan-Andrei Iancu
>>>
>>>             On 9/7/22 2:54 PM, Bogdan-Andrei Iancu wrote:
>>>>             Hi Yury,
>>>>
>>>>             Thanks for the detailed info here - let me review
>>>>             some code and run some tests, as at this point I
>>>>             have a good idea of the direction to dig into.
>>>>
>>>>             I will update here.
>>>>
>>>>             Best regards,
>>>>             Bogdan-Andrei Iancu
>>>>
>>>>             On 9/6/22 11:24 AM, Yury Kirsanov wrote:
>>>>>             Hi Bogdan,
>>>>>             Yes, I'm listening on all types of sockets including
>>>>>             UDP, TCP and TLS on the outside public interface and
>>>>>             then forward traffic into internal LAN via UDP only.
>>>>>
>>>>>             Previously it was getting stuck quite easily, now I
>>>>>             had to wait for a while before this actually happened.
>>>>>             I've routed part of my customers to this server to
>>>>>             obtain this result so I will have to do that again.
>>>>>
>>>>>             As soon as I see one of the processes stuck I'll run
>>>>>             the trap command and send you all the details,
>>>>>             including process load, ps output and so on.
>>>>>
>>>>>             For now I had to switch autoscaling off and just
>>>>>             create many listeners. Do I understand correctly that
>>>>>             I need to restart OpenSIPS in order to apply
>>>>>             autoscaling profiles and reload-routes is not sufficient?
>>>>>
>>>>>             Also, do I need separate UDP profiles for public and
>>>>>             private interfaces? And do I need to apply the
>>>>>             autoscaling profile just to a socket, or do I need to
>>>>>             specify udp_workers or tcp_workers with the autoscaler too?
>>>>>
>>>>>             Thanks and best regards,
>>>>>             Yury.
>>>>>
>>>>>             On Tue, 6 Sept 2022, 18:18 Bogdan-Andrei Iancu,
>>>>>             <bogdan at opensips.org> wrote:
>>>>>
>>>>>                 Hi Yury,
>>>>>
>>>>>                 Thanks for the info. I see that the stuck process
>>>>>                 (24) is an auto-scaled one (based on its id). Do
>>>>>                 you have SIP traffic going from UDP to TCP, or are
>>>>>                 you doing some HEP capturing for SIP? I saw a recent
>>>>>                 similar report where a UDP auto-scaled worker got
>>>>>                 stuck when trying to communicate with the TCP
>>>>>                 main/manager process (in order to handle a TCP
>>>>>                 operation).
>>>>>
>>>>>                 BTW, any chance you could do an "opensips-cli -x trap"
>>>>>                 when you have that stuck process, just to see
>>>>>                 where it is stuck? And is it hard to reproduce? I
>>>>>                 may ask you to extract some information from the
>>>>>                 running process....
>>>>>
>>>>>                 Regards,
>>>>>
>>>>>                 Bogdan-Andrei Iancu
>>>>>
>>>>>                 On 9/3/22 6:54 PM, Yury Kirsanov wrote:
>>>>>
>>>>
>>>>
>>>
>>
>
