[OpenSIPS-Users] Autoscaler in 3.2.x

Bogdan-Andrei Iancu bogdan at opensips.org
Tue Sep 6 08:18:17 UTC 2022


Hi Yury,

Thanks for the info. I see that the stuck process (24) is an
auto-scaled one (based on its ID). Do you have SIP traffic going from
UDP to TCP, or are you doing any HEP capturing for SIP? I saw a recent,
similar report where a UDP auto-scaled worker got stuck while trying to
communicate with the TCP main/manager process (in order to hand over a
TCP operation).
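
For context, by "UDP to TCP" I mean any scenario where a request
received on a UDP socket ends up being relayed over TCP, which makes the
UDP worker signal the TCP main process. A minimal, hypothetical script
sketch (next.hop.example.com is just a placeholder):

    route {
        # request arrived over UDP, but the next hop forces TCP,
        # so t_relay() must hand the send over to the TCP layer
        $du = "sip:next.hop.example.com;transport=tcp";
        t_relay();
    }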

BTW, any chance you could run "opensips-cli -x trap" while you have that
stuck process, just to see where it is stuck? And is it hard to
reproduce? I may ask you to extract some more information from the
running process.
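
For reference, "trap" requires gdb to be installed and collects a
backtrace from every OpenSIPS process, which should show exactly which
call process 24 is blocked in (where the backtrace output lands may
differ between opensips-cli versions):

    # run this while the worker is spinning at 100%
    opensips-cli -x trap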

Regards,

Bogdan-Andrei Iancu

OpenSIPS Founder and Developer
   https://www.opensips-solutions.com
OpenSIPS Summit 27-30 Sept 2022, Athens
   https://www.opensips.org/events/Summit-2022Athens/

On 9/3/22 6:54 PM, Yury Kirsanov wrote:
> Hi Bogdan,
> This has finally happened: OpenSIPS is stuck again at 100% for one of 
> its processes. Here's the output of the load: command:
>
> opensips-cli -x mi get_statistics load:
> {
>     "load:load-proc-1": 0,
>     "load:load1m-proc-1": 0,
>     "load:load10m-proc-1": 0,
>     "load:load-proc-2": 0,
>     "load:load1m-proc-2": 0,
>     "load:load10m-proc-2": 0,
>     "load:load-proc-3": 0,
>     "load:load1m-proc-3": 0,
>     "load:load10m-proc-3": 0,
>     "load:load-proc-4": 0,
>     "load:load1m-proc-4": 0,
>     "load:load10m-proc-4": 0,
>     "load:load-proc-5": 0,
>     "load:load1m-proc-5": 0,
>     "load:load10m-proc-5": 8,
>     "load:load-proc-6": 0,
>     "load:load1m-proc-6": 0,
>     "load:load10m-proc-6": 6,
>     "load:load-proc-13": 0,
>     "load:load1m-proc-13": 0,
>     "load:load10m-proc-13": 0,
>     "load:load-proc-14": 0,
>     "load:load1m-proc-14": 0,
>     "load:load10m-proc-14": 0,
>     "load:load-proc-21": 0,
>     "load:load1m-proc-21": 0,
>     "load:load10m-proc-21": 0,
>     "load:load-proc-22": 0,
>     "load:load1m-proc-22": 0,
>     "load:load10m-proc-22": 0,
>     "load:load-proc-23": 0,
>     "load:load1m-proc-23": 0,
>     "load:load10m-proc-23": 0,
>     "load:load-proc-24": 100,
>     "load:load1m-proc-24": 100,
>     "load:load10m-proc-24": 100,
>     "load:load": 12,
>     "load:load1m": 12,
>     "load:load10m": 14,
>     "load:load-all": 10,
>     "load:load1m-all": 10,
>     "load:load10m-all": 11,
>     "load:processes_number": 13
> }
>
> As you can see, process 24 has been consuming 100% CPU for more than a 
> minute already.
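>
> A quick way to pick the pegged worker out of that output (assuming jq
> is installed and the JSON is exactly as printed above) is:
>
> opensips-cli -x mi get_statistics load: | jq -r 'to_entries[] | select(.value == 100) | .key'
>
> which here prints only the three load-proc-24 counters.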
>
> Here's the output of the process list; the one stuck at 100% load is a 
> UDP socket listener on the internal interface:
>
> opensips-cli -x mi ps
> {
>     "Processes": [
>         {
>             "ID": 0,
>             "PID": 5457,
>             "Type": "attendant"
>         },
>         {
>             "ID": 1,
>             "PID": 5463,
>             "Type": "HTTPD 10.x.x.x:8888"
>         },
>         {
>             "ID": 2,
>             "PID": 5464,
>             "Type": "MI FIFO"
>         },
>         {
>             "ID": 3,
>             "PID": 5465,
>             "Type": "time_keeper"
>         },
>         {
>             "ID": 4,
>             "PID": 5466,
>             "Type": "timer"
>         },
>         {
>             "ID": 5,
>             "PID": 5467,
>             "Type": "SIP receiver udp:10.x.x.x:5060"
>         },
>         {
>             "ID": 6,
>             "PID": 5470,
>             "Type": "SIP receiver udp:10.x.x.x:5060"
>         },
>         {
>             "ID": 13,
>             "PID": 5477,
>             "Type": "SIP receiver udp:103.x.x.x:7060"
>         },
>         {
>             "ID": 14,
>             "PID": 5478,
>             "Type": "SIP receiver udp:103.x.x.x:7060"
>         },
>         {
>             "ID": 21,
>             "PID": 5485,
>             "Type": "TCP receiver"
>         },
>         {
>             "ID": 22,
>             "PID": 5486,
>             "Type": "Timer handler"
>         },
>         {
>             "ID": 23,
>             "PID": 5487,
>             "Type": "TCP main"
>         },
>         {
>             "ID": 24,
>             "PID": 5759,
>             "Type": "SIP receiver udp:10.x.x.x:5060"
>         }
>     ]
> }
>
> opensips -V
> version: opensips 3.2.8 (x86_64/linux)
> flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, 
> Q_MALLOC, F_MALLOC, HP_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
> ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, 
> MAX_URI_SIZE 1024, BUF_SIZE 65535
> poll method support: poll, epoll, sigio_rt, select.
> git revision: d2496fed5
> main.c compiled on 16:17:53 Aug 24 2022 with gcc 9
>
> This time the server has some load, but it's still not heavy at all, 
> and I'm using async requests for my REST queries.
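>
> The async REST pattern I use is roughly the following
> (api.example.com stands in for my real endpoint):
>
> async(rest_get("https://api.example.com/check", $var(body), , $var(rcode)), handle_reply);
>
> route[handle_reply] {
>     # resumed here once the HTTP reply arrives, so the SIP worker
>     # is not blocked while waiting on the remote API
>     xlog("REST query returned code $var(rcode)\n");
> }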
>
> This is my autoscaling section:
>
> # Scaling section
> auto_scaling_profile = PROFILE_UDP_PUB
>      scale up to 16 on 70% for 4 cycles within 5
>      scale down to 2 on 20% for 5 cycles
>
> auto_scaling_profile = PROFILE_UDP_PRIV
>      scale up to 16 on 70% for 4 cycles within 5
>      scale down to 2 on 20% for 5 cycles
>
> auto_scaling_profile = PROFILE_TCP
>      scale up to 16 on 70% for 4 cycles within 5
>      scale down to 2 on 20% for 10 cycles
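>
> (As I read the profile syntax, "scale up to 16 on 70% for 4 cycles
> within 5" means: fork extra workers, up to 16 in total, once the load
> stays above 70% for at least 4 of the last 5 check cycles; "scale down
> to 2 on 20% for 5 cycles" means: terminate idle workers, down to a
> minimum of 2, after 5 consecutive cycles under 20%.)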
>
> And here's how I apply the profiles to the sockets; I'm not applying 
> them to the global UDP workers at all:
>
> socket=udp:10.x.x.x:5060 use_auto_scaling_profile PROFILE_UDP_PRIV
> socket=udp:103.x.x.x:7060 use_auto_scaling_profile PROFILE_UDP_PUB
>
> tcp_workers = 1 use_auto_scaling_profile PROFILE_TCP
>
> I can't get this process unstuck until I restart OpenSIPS.
>
> Just to add: if I turn off auto scaling, enable 16 UDP and 16 TCP 
> workers, and just specify the sockets without any parameters, the load 
> goes to 0. See the attached graph: the load sat at 25% the whole time 
> until I restarted OpenSIPS in normal (static) mode, after which it 
> immediately dropped to 0.
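>
> For completeness, the static configuration I switched to is along these
> lines (same sockets, no profiles attached to anything):
>
> udp_workers = 16
> tcp_workers = 16
>
> socket=udp:10.x.x.x:5060
> socket=udp:103.x.x.x:7060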
>
> [attached graph: image.png]
>
> Here's the output of the load: statistics in that state:
>
> opensips-cli -x mi get_statistics load:
> {
>     "load:load-proc-1": 0,
>     "load:load1m-proc-1": 0,
>     "load:load10m-proc-1": 0,
>     "load:load-proc-2": 0,
>     "load:load1m-proc-2": 0,
>     "load:load10m-proc-2": 0,
>     "load:load-proc-3": 0,
>     "load:load1m-proc-3": 0,
>     "load:load10m-proc-3": 0,
>     "load:load-proc-4": 0,
>     "load:load1m-proc-4": 0,
>     "load:load10m-proc-4": 0,
>     "load:load-proc-5": 0,
>     "load:load1m-proc-5": 0,
>     "load:load10m-proc-5": 2,
>     "load:load-proc-6": 0,
>     "load:load1m-proc-6": 0,
>     "load:load10m-proc-6": 0,
>     "load:load-proc-7": 0,
>     "load:load1m-proc-7": 0,
>     "load:load10m-proc-7": 1,
>     "load:load-proc-8": 0,
>     "load:load1m-proc-8": 0,
>     "load:load10m-proc-8": 1,
>     "load:load-proc-9": 0,
>     "load:load1m-proc-9": 0,
>     "load:load10m-proc-9": 1,
>     "load:load-proc-10": 0,
>     "load:load1m-proc-10": 0,
>     "load:load10m-proc-10": 0,
>     "load:load-proc-11": 0,
>     "load:load1m-proc-11": 0,
>     "load:load10m-proc-11": 3,
>     "load:load-proc-12": 0,
>     "load:load1m-proc-12": 0,
>     "load:load10m-proc-12": 2,
>     "load:load-proc-13": 0,
>     "load:load1m-proc-13": 0,
>     "load:load10m-proc-13": 1,
>     "load:load-proc-14": 0,
>     "load:load1m-proc-14": 0,
>     "load:load10m-proc-14": 3,
>     "load:load-proc-15": 0,
>     "load:load1m-proc-15": 0,
>     "load:load10m-proc-15": 2,
>     "load:load-proc-16": 0,
>     "load:load1m-proc-16": 0,
>     "load:load10m-proc-16": 1,
>     "load:load-proc-17": 0,
>     "load:load1m-proc-17": 0,
>     "load:load10m-proc-17": 4,
>     "load:load-proc-18": 0,
>     "load:load1m-proc-18": 0,
>     "load:load10m-proc-18": 2,
>     "load:load-proc-19": 0,
>     "load:load1m-proc-19": 0,
>     "load:load10m-proc-19": 3,
>     "load:load-proc-20": 0,
>     "load:load1m-proc-20": 0,
>     "load:load10m-proc-20": 2,
>     "load:load-proc-21": 0,
>     "load:load1m-proc-21": 0,
>     "load:load10m-proc-21": 0,
>     "load:load-proc-22": 0,
>     "load:load1m-proc-22": 0,
>     "load:load10m-proc-22": 0,
>     "load:load-proc-23": 0,
>     "load:load1m-proc-23": 0,
>     "load:load10m-proc-23": 0,
>     "load:load-proc-24": 0,
>     "load:load1m-proc-24": 0,
>     "load:load10m-proc-24": 0,
>     "load:load-proc-25": 0,
>     "load:load1m-proc-25": 0,
>     "load:load10m-proc-25": 0,
>     "load:load-proc-26": 0,
>     "load:load1m-proc-26": 0,
>     "load:load10m-proc-26": 0,
>     "load:load-proc-27": 0,
>     "load:load1m-proc-27": 0,
>     "load:load10m-proc-27": 0,
>     "load:load-proc-28": 0,
>     "load:load1m-proc-28": 0,
>     "load:load10m-proc-28": 0,
>     "load:load-proc-29": 0,
>     "load:load1m-proc-29": 0,
>     "load:load10m-proc-29": 0,
>     "load:load-proc-30": 0,
>     "load:load1m-proc-30": 0,
>     "load:load10m-proc-30": 0,
>     "load:load-proc-31": 0,
>     "load:load1m-proc-31": 0,
>     "load:load10m-proc-31": 0,
>     "load:load-proc-32": 0,
>     "load:load1m-proc-32": 0,
>     "load:load10m-proc-32": 0,
>     "load:load-proc-33": 0,
>     "load:load1m-proc-33": 0,
>     "load:load10m-proc-33": 0,
>     "load:load-proc-34": 0,
>     "load:load1m-proc-34": 0,
>     "load:load10m-proc-34": 0,
>     "load:load-proc-35": 3,
>     "load:load1m-proc-35": 0,
>     "load:load10m-proc-35": 0,
>     "load:load-proc-36": 0,
>     "load:load1m-proc-36": 0,
>     "load:load10m-proc-36": 0,
>     "load:load-proc-37": 0,
>     "load:load1m-proc-37": 0,
>     "load:load10m-proc-37": 0,
>     "load:load-proc-38": 0,
>     "load:load1m-proc-38": 0,
>     "load:load10m-proc-38": 0,
>     "load:load-proc-39": 0,
>     "load:load1m-proc-39": 0,
>     "load:load10m-proc-39": 0,
>     "load:load-proc-40": 0,
>     "load:load1m-proc-40": 0,
>     "load:load10m-proc-40": 0,
>     "load:load-proc-41": 0,
>     "load:load1m-proc-41": 0,
>     "load:load10m-proc-41": 0,
>     "load:load-proc-42": 0,
>     "load:load1m-proc-42": 0,
>     "load:load10m-proc-42": 0,
>     "load:load-proc-43": 0,
>     "load:load1m-proc-43": 0,
>     "load:load10m-proc-43": 0,
>     "load:load-proc-44": 0,
>     "load:load1m-proc-44": 0,
>     "load:load10m-proc-44": 0,
>     "load:load-proc-45": 0,
>     "load:load1m-proc-45": 0,
>     "load:load10m-proc-45": 0,
>     "load:load-proc-46": 0,
>     "load:load1m-proc-46": 0,
>     "load:load10m-proc-46": 0,
>     "load:load-proc-47": 0,
>     "load:load1m-proc-47": 0,
>     "load:load10m-proc-47": 0,
>     "load:load-proc-48": 0,
>     "load:load1m-proc-48": 0,
>     "load:load10m-proc-48": 0,
>     "load:load-proc-49": 0,
>     "load:load1m-proc-49": 0,
>     "load:load10m-proc-49": 0,
>     "load:load-proc-50": 0,
>     "load:load1m-proc-50": 0,
>     "load:load10m-proc-50": 0,
>     "load:load-proc-51": 0,
>     "load:load1m-proc-51": 0,
>     "load:load10m-proc-51": 0,
>     "load:load-proc-52": 0,
>     "load:load1m-proc-52": 0,
>     "load:load10m-proc-52": 0,
>     "load:load-proc-53": 0,
>     "load:load1m-proc-53": 0,
>     "load:load10m-proc-53": 0,
>     "load:load-proc-54": 0,
>     "load:load1m-proc-54": 0,
>     "load:load10m-proc-54": 0,
>     "load:load": 0,
>     "load:load1m": 0,
>     "load:load10m": 0,
>     "load:load-all": 0,
>     "load:load1m-all": 0,
>     "load:load10m-all": 0,
>     "load:processes_number": 55
> }
>
>
> Hope this is all the information you need! Thanks!
>
> Best regards,
> Yury.
>
> On Thu, Aug 25, 2022 at 8:24 PM Bogdan-Andrei Iancu 
> <bogdan at opensips.org> wrote:
>
>     Hi Yury,
>
>     And when that scaling up happens, do you actually have traffic, or
>     is your OpenSIPS idle?
>
>     Also, could you run `opensips-cli -x mi get_statistics load:` (note
>     the colon at the end).
>
>     Regards,
>
>     Bogdan-Andrei Iancu
>
>     OpenSIPS Founder and Developer
>        https://www.opensips-solutions.com
>     OpenSIPS Summit 27-30 Sept 2022, Athens
>        https://www.opensips.org/events/Summit-2022Athens/
>
>     On 8/25/22 10:57 AM, Yury Kirsanov wrote:
>>     Hi all,
>>     I've run into a strange issue: if I enable the autoscaler on
>>     OpenSIPS 3.2.x (tried 5, 6, 7 and now 8) on a server without any
>>     load, using a 'socket' statement like this:
>>
>>     auto_scaling_profile = PROFILE_UDP_PRIV
>>          scale up to 16 on 30% for 4 cycles within 5
>>          scale down to 2 on 10% for 5 cycles
>>
>>     udp_workers=4
>>
>>     socket=udp:10.x.x.x:5060 use_auto_scaling_profile PROFILE_UDP_PRIV
>>
>>     then after a while the OpenSIPS load climbs to some high number,
>>     the autoscaler starts to fork new processes up to the maximum
>>     specified in the profile, and then the load stays pinned at that
>>     number, for example:
>>
>>     opensips-cli -x mi get_statistics load
>>     {
>>         "load:load": 60
>>     }
>>
>>     It never changes and just looks 'stuck'.
>>
>>     Any ideas why this is happening in my case? Or should I file a
>>     bug report? Thanks.
>>
>>     Regards,
>>     Yury.
>>
>>     _______________________________________________
>>     Users mailing list
>>     Users at lists.opensips.org
>>     http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
