[OpenSIPS-Users] Autoscaler in 3.2.x

Yury Kirsanov y.kirsanov at gmail.com
Wed Sep 14 15:49:53 UTC 2022


Hi Bogdan,
It looks like my problem was quite complex, as I had the following issues:

1. tcp_async was off
2. TCP timeouts were set to be very high

I first tried just enabling tcp_async, but that didn't help: after a restart
and the resulting TCP SYN storm, OpenSIPS again started consuming memory and
its processes locked up. Then I started tuning other parameters. Here's how
it was before:

# Proto TCP
loadmodule "proto_tcp.so"
modparam("proto_tcp", "tcp_async", 1)
modparam("proto_tcp", "tcp_send_timeout", 5000)
modparam("proto_tcp", "tcp_async_local_connect_timeout", 5000)
modparam("proto_tcp", "tcp_async_local_write_timeout", 5000)
modparam("proto_tcp", "tcp_max_msg_chunks", 16)

I had a very high tcp_send_timeout because some of our customers connect
from across the globe and have high latency. Of course their latency is
nowhere near 5 seconds, but I set it that high just to make sure they would
be able to connect. Now I've ended up with this config:

# Proto TCP
loadmodule "proto_tcp.so"
modparam("proto_tcp", "tcp_async", 1)
modparam("proto_tcp", "tcp_send_timeout", 1000)
modparam("proto_tcp", "tcp_async_local_connect_timeout", 500)
modparam("proto_tcp", "tcp_async_local_write_timeout", 500)
modparam("proto_tcp", "tcp_max_msg_chunks", 16)
modparam("proto_tcp", "tcp_parallel_handling", 1)

And it looks like OpenSIPS is now able to survive restarts!

One more thing I had tried earlier was rate-limiting TCP connections with
iptables; that helped even with my incorrect configuration and blocking TCP
mode. Using the iptables rate-limit module, I limited TCP SYN packets on my
public interface, on the TCP ports that go to OpenSIPS, to 10 packets per
second with a burst of 50. These values can be adjusted depending on the
load of new connections. Hope this helps anyone who runs into the same
trouble!
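
The "10 packets per second, 50 packets burst" setting behaves like a token bucket: the bucket starts with 50 tokens, each accepted SYN uses one, and tokens are refilled at 10 per second up to the burst size. That is how the iptables `limit` match describes `--limit` and `--limit-burst`; assuming the plain `limit` match was used (rather than, say, `hashlimit`), the rule would look something like `-p tcp --syn -m limit --limit 10/second --limit-burst 50 -j ACCEPT` followed by a DROP for the remaining SYNs. The Python sketch below is a hypothetical model of that behaviour, not iptables code; `SynRateLimiter` and `simulate` are illustrative names. It shows how a reconnect storm after a restart is cut down to roughly the burst plus 10 new connections per second:

```python
# Hypothetical model of the iptables `limit` match as a token bucket.
# Names are illustrative only; this is not how iptables is implemented.
class SynRateLimiter:
    def __init__(self, rate_per_sec: float, burst: int) -> None:
        self.rate = rate_per_sec    # refill speed, like --limit 10/second
        self.burst = burst          # bucket size, like --limit-burst 50
        self.tokens = float(burst)  # the bucket starts full
        self.last = 0.0             # arrival time of the previous SYN, in seconds

    def accept(self, now: float) -> bool:
        # Refill for the time elapsed since the previous SYN, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the SYN would fall through to a DROP rule


def simulate(syns_per_sec: int, seconds: int, rate: float = 10, burst: int = 50) -> int:
    """Count how many evenly spaced SYNs are accepted during `seconds`."""
    limiter = SynRateLimiter(rate, burst)
    total = syns_per_sec * seconds
    return sum(limiter.accept(i / syns_per_sec) for i in range(total))


if __name__ == "__main__":
    # A reconnect storm of 1000 SYN/s for 3 s versus a quiet 5 SYN/s.
    print(simulate(1000, 3))  # 79: the 50-packet burst plus about 10 per second
    print(simulate(5, 3))     # 15: below the limit, every SYN is accepted
```

Note that a single `limit` rule keeps one bucket for all matching traffic, so legitimate reconnects share the same budget as the storm; the 10/50 values are the ones from this message and should be tuned to the expected rate of new connections.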

I will keep monitoring our OpenSIPS instances, and if everything works fine
after restarts I will enable the autoscaler to test it with the new patch.

Thanks a lot for your help, Bogdan, that's much appreciated!

Best regards,
Yury.

On Thu, Sep 15, 2022 at 1:22 AM Yury Kirsanov <y.kirsanov at gmail.com> wrote:

> Hi Bogdan,
> Thanks for your answer, I've checked my configs and yes, for some reason I
> had tcp_async off!!! I will definitely switch it on for now and then give
> it a try!!! Can't believe I missed that one!!!
>
> Best regards,
> Yury.
>
> On Wed, Sep 14, 2022 at 10:58 PM Bogdan-Andrei Iancu <bogdan at opensips.org>
> wrote:
>
>> Hi Yury,
>>
>> You need to check the TCP settings and make sure your OpenSIPS will (1)
>> not try to perform a TCP connect against destinations known to be unable
>> to accept one (like TCP/WS endpoints behind NAT; see tcp_no_new_conn_bflag
>> [1]), and (2) not block for a long time while attempting a connect (see
>> tcp_connect_timeout [2], or consider enabling async [3]).
>>
>> [1] https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_no_new_conn_bflag
>> [2] https://www.opensips.org/Documentation/Script-CoreParameters-3-2#tcp_connect_timeout
>> [3] https://opensips.org/html/docs/modules/3.2.x/proto_tcp.html#idp168992
>>
>> Regards,
>>
>> Bogdan-Andrei Iancu
>>
>> OpenSIPS Founder and Developer
>>   https://www.opensips-solutions.com
>> OpenSIPS Summit 27-30 Sept 2022, Athens
>>   https://www.opensips.org/events/Summit-2022Athens/
>>
>> On 9/13/22 12:01 PM, Yury Kirsanov wrote:
>>
>> Hi Bogdan,
>> Thanks for this update, but it looks like I can't test the autoscaler
>> because of this first issue with the blocking TCP connect. Is there a way
>> to resolve it? Am I doing something wrong, or is it something in the
>> OpenSIPS code? Yes, you're right: as soon as I restart OpenSIPS while a lot
>> of SIP devices are trying to connect to it, it goes crazy, starts consuming
>> memory and stops forwarding packets, sitting at 100% load until it runs out
>> of memory and segfaults. Sometimes I can't even restart it back into a
>> normal working state; it just falls into the same crash loop whatever I
>> try.
>>
>> I've compiled OpenSIPS 3.3.1 with your patch and was able to start it, but
>> I'm not sure; maybe I was just lucky this time.
>>
>> What should I do? Thanks!
>>
>> Best regards,
>> Yury.
>>
>> On Tue, 13 Sept 2022, 18:56 Bogdan-Andrei Iancu, <bogdan at opensips.org>
>> wrote:
>>
>>> Hi Yury,
>>>
>>> It looks like you have multiple overlapping issues here. The traps you
>>> sent have nothing to do with auto-scaling, but with a blocking TCP connect
>>> for SIP: most of the processes get blocked in a synchronous TCP connect.
>>>
>>> Regards,
>>>
>>> Bogdan-Andrei Iancu
>>>
>>> On 9/12/22 4:39 PM, Yury Kirsanov wrote:
>>>
>>> Hi Bogdan,
>>> I've applied the patch (for 3.2.8 downloaded from the web page I had to
>>> apply it manually, at line 1568 instead of 1652), restarted the server
>>> with only about 300-350 SIP devices, and immediately hit the same issue.
>>> I'm attaching two GDB dumps taken several minutes apart. Autoscaling was
>>> OFF this time; please see my previous message, as for some reason I'm
>>> currently experiencing lockups even when it's off :(
>>>
>>>
>>> Best regards,
>>> Yury.
>>>
>>> On Mon, Sep 12, 2022 at 7:48 PM Bogdan-Andrei Iancu <bogdan at opensips.org>
>>> wrote:
>>>
>>>> Hi Yuri,
>>>>
>>>> Could you give this patch a try? It should fix the blocking you are
>>>> experiencing (it should apply to 3.2 too).
>>>>
>>>> Best regards,
>>>>
>>>> Bogdan-Andrei Iancu
>>>>
>>>> On 9/7/22 2:54 PM, Bogdan-Andrei Iancu wrote:
>>>>
>>>> Hi Yury,
>>>>
>>>> Thanks for the detailed info here. Let me review some code and run some
>>>> tests, as at this point I have a good idea of which direction to dig
>>>> in.
>>>>
>>>> I will post an update here.
>>>>
>>>> Best regards,
>>>>
>>>> Bogdan-Andrei Iancu
>>>>
>>>> On 9/6/22 11:24 AM, Yury Kirsanov wrote:
>>>>
>>>> Hi Bogdan,
>>>> Yes, I'm listening on all socket types (UDP, TCP and TLS) on the
>>>> outside public interface and then forwarding traffic into the internal
>>>> LAN via UDP only.
>>>>
>>>> Previously it was getting stuck quite easily; now I had to wait a while
>>>> before it actually happened. I had routed some of my customers to this
>>>> server to obtain this result, so I will have to do that again.
>>>>
>>>> As soon as I see one of the processes stuck I'll run the trap command
>>>> and send you all the details, including process load, ps output and so
>>>> on.
>>>>
>>>> For now I had to switch autoscaling off and just create many listeners.
>>>> Do I understand correctly that I need to restart OpenSIPS to apply
>>>> autoscaling profiles, and that reload-routes is not sufficient?
>>>>
>>>> Also, do I need separate UDP profiles for the public and private
>>>> interfaces? And do I need to apply an autoscaling profile only to a
>>>> socket, or do I need to specify udp_workers or tcp_workers with the
>>>> autoscaler too?
>>>>
>>>> Thanks and best regards,
>>>> Yury.
>>>>
>>>> On Tue, 6 Sept 2022, 18:18 Bogdan-Andrei Iancu, <bogdan at opensips.org>
>>>> wrote:
>>>>
>>>>> Hi Yury,
>>>>>
>>>>> Thanks for the info. I see that the stuck process (24) is an
>>>>> auto-scaled one (based on its id). Do you have SIP traffic going from
>>>>> UDP to TCP, or are you doing some HEP capturing for SIP? I saw a similar
>>>>> recent report where an auto-scaled UDP worker got stuck while trying to
>>>>> communicate with the TCP main/manager process (in order to handle a TCP
>>>>> operation).
>>>>>
>>>>> BTW, any chance you could run "opensips-cli -x trap" while you have
>>>>> that stuck process, just to see where it is stuck? And is it hard to
>>>>> reproduce? I may ask you to extract some information from the running
>>>>> process...
>>>>>
>>>>> Regards,
>>>>>
>>>>> Bogdan-Andrei Iancu
>>>>>
>>>>> On 9/3/22 6:54 PM, Yury Kirsanov wrote:
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opensips.org
>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>
>>>>
>>>>
>>>
>>