[OpenSIPS-Users] Fine tuning high CPS and MySQL queries

Calvin Ellison calvin.ellison at voxox.com
Thu Jun 4 19:14:51 EST 2020


The scenario is INVITE -> MySQL query -> non-200 final response. No
calls are connected here, only dipping things like LRN, Do Not Call,
and Wireless/Landline. A similar service runs on a second port,
specific to a different kind of traffic and dip. We're using async
avp_db_query and memcached, with about 3:1 cache hits.

Our target is up to 10,000 CPS across two opensips servers, which are
dual-CPU Xeon E5620 with 48G RAM. Both servers run memcached, and each
opensips instance uses both memcached instances as a shared,
distributed cache thanks to this:
'modparam("cachedb_memcached","cachedb_url","memcached:lrn://lrn-d,lrn-e/")'.
At a glance there are over 200 million total cached items, distributed
nearly equally between the two.
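
As a rough sizing aid, the figures above can be turned into an expected
MySQL query rate. This is only a back-of-the-envelope sketch: it assumes
"3:1 cache hits" means three hits for every miss, that every miss
becomes exactly one query, and that traffic is split evenly between the
two servers. The function name is made up for illustration.

```python
def db_queries_per_second(total_cps, hits_per_miss, servers):
    """Return (cluster-wide, per-server) MySQL queries per second.

    Assumes every cache miss triggers exactly one database query and
    that calls are spread evenly across the servers.
    """
    miss_ratio = 1 / (hits_per_miss + 1)   # 3 hits per miss -> 25% misses
    cluster = total_cps * miss_ratio
    return cluster, cluster / servers


# Figures taken from the post: 10,000 CPS target, 3:1 hits, 2 servers.
cluster_qps, per_server_qps = db_queries_per_second(10_000, 3, 2)
print(f"{cluster_qps:.0f} queries/s total, {per_server_qps:.0f} per server")
```

Under those assumptions the database would have to sustain roughly a
quarter of the call rate as queries at the full target load.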

The issue is that individual child processes start getting stuck at
100% CPU. Logs indicate connection failures to the MySQL database
causing children to run in sync mode, and there are warnings about
delayed timer jobs tm-timer and blcore-expire. Eventually, the service
becomes unresponsive. Restarting opensips restores service and the
children return to single-digit CPU utilization, but eventually,
children get stuck again.

I'm not certain if the issue is on the database server, or if the
opensips servers are overloaded, or if the config is just not right
yet.

Is there an established method for fine-tuning these things?
shared memory
process memory
children
db_max_async_connections
listen=... use_children
modparam("tm", "timer_partitions", ?)
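
One way to reason about children and db_max_async_connections together
is Little's law: queries in flight = query rate x query latency. The
sketch below compares that with the number of async connections a
server could hold open. It rests on assumptions that should be checked
against the OpenSIPS documentation: that db_max_async_connections is a
per-child, per-database limit, and that a child whose connections are
all busy falls back to a blocking (sync) query. The values of 16
children, 10 connections, and 1,250 queries/s are illustrative only,
not taken from the actual configuration.

```python
def in_flight_queries(queries_per_second, latency_ms):
    """Little's law: average number of queries outstanding at once."""
    return queries_per_second * latency_ms / 1000


def async_slots(children, db_max_async_connections):
    """Upper bound on concurrent async queries, assuming the limit
    applies per child process (an assumption, see the text above)."""
    return children * db_max_async_connections


def is_saturated(queries_per_second, latency_ms, children, max_conns):
    """True when demand exceeds the async connections available."""
    needed = in_flight_queries(queries_per_second, latency_ms)
    return needed > async_slots(children, max_conns)


# Hypothetical inputs: 1,250 queries/s per server, 16 children,
# 10 async connections per child.
for latency_ms in (2, 20, 200):
    needed = in_flight_queries(1250, latency_ms)
    state = "saturated" if is_saturated(1250, latency_ms, 16, 10) else "ok"
    print(f"{latency_ms:>4} ms latency: {needed:6.1f} in flight, "
          f"{async_slots(16, 10)} slots -> {state}")
```

The point of the sketch is that the required concurrency scales with
database latency, so a slow or failing database can exhaust the async
connections even when the call rate itself has not changed.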

What else is worth considering?
Does a child ever return to async mode after running in sync mode?
How do I know when my servers have reached their limit?
opensips.cfg is available on request.

version: opensips 2.4.7 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC,
F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll, sigio_rt, select.
git revision: 9e1fcc915
main.c compiled on  with gcc 7

*re-built using dpkg-buildpackage including the patch to support DB
floating point types:
https://opensips.org/pipermail/users/2020-March/042528.html

$ lsb_release -d
Description:    Ubuntu 18.04.4 LTS

$ uname -a
Linux TC-521 4.15.0-91-generic #92-Ubuntu SMP Fri Feb 28 11:09:48 UTC
2020 x86_64 x86_64 x86_64 GNU/Linux

$ free -mw
              total        used        free      shared     buffers       cache   available
Mem:          48281        1085         337          87        1729       45128       46551

$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):           2
NUMA node(s):        2
Vendor ID:           GenuineIntel
CPU family:          6
Model:               44
Model name:          Intel(R) Xeon(R) CPU           E5620  @ 2.40GHz
Stepping:            2
CPU MHz:             2527.029
BogoMIPS:            4788.05
Virtualization:      VT-x
L1d cache:           32K
L1i cache:           32K
L2 cache:            256K
L3 cache:            12288K
NUMA node0 CPU(s):   0,2,4,6,8,10,12,14
NUMA node1 CPU(s):   1,3,5,7,9,11,13,15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr
pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe
syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts
rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq
dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca
sse4_1 sse4_2 popcnt aes lahf_lm pti ssbd ibrs ibpb stibp tpr_shadow
vnmi flexpriority ept vpid dtherm ida arat flush_l1d

Regards,

Calvin Ellison
Senior Voice Operations Engineer
calvin.ellison at voxox.com


