[OpenSIPS-Users] Autoscaler in 3.2.x

Yury Kirsanov y.kirsanov at gmail.com
Mon Sep 12 08:12:03 UTC 2022

Hi Bogdan,
We've run into another issue. This time I was simply restarting the
OpenSIPS server during busy hours, when about 2500 SIP devices were
registering and making calls (the number of dialogs was only around
100-200, but there were a lot of packets), and I was unable to restart
OpenSIPS successfully. Some processes got stuck at 100% load almost
immediately, then started to consume more and more memory. After eating up
all the memory they died and OpenSIPS stopped processing SIP packets.

I believe it's similar to the autoscaler issue. In this case I only had
16 UDP workers and 16 TCP workers, and it took longer for OpenSIPS to run
into the issue. When I had the autoscaler on, it wasn't able to open that
many processes at once, so the currently active ones got stuck very
quickly and the crash happened almost immediately.

I'm running a localhost Redis cache to store where to proxy each SIP packet
to. If there's no record for a SIP device, I query a REST server and cache
its response. The REST server load was no more than 25% during the restart,
when all SIP devices were urgently trying to reconnect to OpenSIPS, so I
don't think they are the problem.
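
The lookup described above is a cache-aside pattern. Below is a minimal
Python sketch of that flow, not the actual OpenSIPS script: the names
resolve_destination and query_rest are hypothetical, and a plain dict
stands in for the Redis cache.

```python
from typing import Callable, Dict, Optional


def resolve_destination(
    device: str,
    cache: Dict[str, str],
    query_rest: Callable[[str], Optional[str]],
) -> Optional[str]:
    """Return the proxy destination for a device, checking the cache first.

    `cache` stands in for Redis and `query_rest` for the REST lookup;
    both are assumptions made for illustration only.
    """
    dest = cache.get(device)
    if dest is not None:
        # Cache hit: no REST call is needed.
        return dest

    # Cache miss: ask the REST server.
    dest = query_rest(device)
    if dest is not None:
        # Store the answer so later packets from this device hit the cache.
        cache[device] = dest
    return dest
```

With this shape, a restart that begins with an empty cache sends one REST
query per device until the cache warms up, which is why the REST server
load is worth checking in this scenario.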

I'm using async REST calls and believe there should be no issues with my
configuration script, even though it runs a lot of nested routes due to
the async REST requests. Hopefully I didn't forget an 'exit' statement
anywhere, but if that were the case the OpenSIPS service would be locking
up at any time, not just on restart.

OpenSIPS itself is running as a virtual machine on a VMware host, and I
could see it consuming up to 100% CPU of a 40-core host when it was
locking up. VMware readiness for the VM was also spiking to 1500ms during
these lock-ups, meaning that the VM was waiting for cores to free up
before it could get CPU time.

The only way out of this situation for me was to run multiple OpenSIPS VMs
and spread the load between them. No matter what I tried, I wasn't able to
get OpenSIPS running properly again, even though it had been working
perfectly well for more than a week in this configuration and under the
same load. However, I had been starting/restarting it only during night
hours, when there were no active calls.

I'm happy to share my configuration file with you privately if required.

Hope this helps!

Thanks and best regards,

On Wed, Sep 7, 2022 at 9:54 PM Bogdan-Andrei Iancu <bogdan at opensips.org>
wrote:

> Hi Yury,
> Thanks for the detailed info here - let me review some code and run
> some tests, as at this point I have a good idea of the direction to dig
> into.
> I will update here.
> Best regards,
> Bogdan-Andrei Iancu
> On 9/6/22 11:24 AM, Yury Kirsanov wrote:
> Hi Bogdan,
> Yes, I'm listening on all types of sockets including UDP, TCP and TLS on
> the outside public interface and then forward traffic into internal LAN via
> UDP only.
> Previously it was getting stuck quite easily; now I had to wait for a
> while before this actually happened. I've routed part of my customers to
> this server to obtain this result so I will have to do that again.
> As soon as I see one of the processes stuck I'll run the trap command and
> send you all the details including process load, ps output and so on.
> For now I had to switch autoscaling off and just create many listeners. Do
> I understand correctly that I need to restart OpenSIPS in order to apply
> autoscaling profiles, and that reload-routes is not sufficient?
> Also, do I need separate UDP profiles for public and private interfaces?
> And do I need to apply the autoscaling profile just to a socket, or do I
> also need to specify udp_workers or tcp_workers with the autoscaler?
> Thanks and best regards,
> Yury.
> On Tue, 6 Sept 2022, 18:18 Bogdan-Andrei Iancu, <bogdan at opensips.org>
> wrote:
>> Hi Yury,
>> Thanks for the info. I see that the stuck process (24) is an auto-scaled
>> one (based on its id). Do you have SIP traffic from UDP to TCP, or are you
>> doing some HEP capturing for SIP? I saw a recent similar report where a UDP
>> auto-scaled worker got stuck when trying to do some communication with the
>> TCP main/manager process (in order to handle a TCP operation).
>> BTW, any chance to do an "opensips-cli -x trap" when you have that stuck
>> process, just to see where it is stuck? And is it hard to reproduce? I
>> may ask you to extract some information from the running process....
>> Regards,
>> On 9/3/22 6:54 PM, Yury Kirsanov wrote:
