[OpenSIPS-Users] Opensips TCP children deadlock

Yuval Dinari yuval.dinari at vonage.com
Thu Jul 12 09:07:19 EDT 2018

OpenSIPS gets into an unrecoverable bad state in which some of the TCP
child processes are stuck waiting to acquire a lock that they never get.
The issue occurs in the following load test scenario:

   1. About 25K clients register over TCP (it also happens with fewer)
   2. All the TCP connections become unresponsive (by blocking outgoing
   traffic on the test clients' machine)
   3. INVITEs are sent for each of those clients, putting their connections
   in retransmit mode
   4. After a few minutes OpenSIPS gets into a bad state: some TCP
   children run at 90-100% CPU, and no traffic is sent from the machine
   (including OPTIONS pings)
   5. After all the TCP connections die due to timeouts, OpenSIPS does not
   recover; the symptoms above persist
   6. After all the registered users are removed from the internal table,
   there is still no change

When attaching a debugger to the problematic processes (those with high CPU
usage), we see that they are all stuck trying to get a lock which they never
seem to get. Stack traces:

#0  0x00007fd6b72d1bb7 in sched_yield () at
#1  0x0000000000549e65 in get_lock (lock=<optimized out>) at
#2  _tcp_write_on_socket (len=<optimized out>, buf=<optimized out>,
fd=<optimized out>, c=<optimized out>) at net/proto_tcp/proto_tcp.c:724
#3  proto_tcp_send (send_sock=0x7ffd8e12c140, buf=0x0, len=399,
to=0x7fd5c7ccdcc0, id=1) at net/proto_tcp/proto_tcp.c:922
#4  0x00007fd5a5cb7b30 in msg_send (msg=<optimized out>, len=<optimized
out>, buf=<optimized out>, id=<optimized out>, to=<optimized out>,
proto=<optimized out>,
    send_sock=0x7fd6a7208168) at ../../forward.h:123
#5  send_pr_buffer (rb=0x7fd5c7ccdca0, buf=0x7fd6a76b4a50, len=0,
ctx=0xffffffffffffffff) at t_funcs.c:66


#0  0x00007fd6b72d1bb7 in sched_yield () at
#1  0x00000000005349b8 in get_lock (lock=<optimized out>) at
#2  handle_io (event_type=<optimized out>, idx=<optimized out>,
fm=<optimized out>) at net/net_tcp_proc.c:210
#3  io_wait_loop_epoll (repeat=287, t=<optimized out>, h=<optimized out>)
at net/../io_wait_loop.h:280

These traces look the same every time we attach.
The machine OpenSIPS runs on has 4 CPUs.
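
To illustrate why the stuck processes show 90-100% CPU rather than sleeping,
here is a minimal sketch of a spin lock that yields between attempts, which
is what frames #0 and #1 of both traces (sched_yield called from get_lock)
suggest. This is not the OpenSIPS implementation: demo_lock_t,
demo_get_lock_bounded, demo_release_lock and the max_spins limit are
hypothetical names invented for the example. The spin limit exists only so
the sketch can terminate; the get_lock in the traces appears to retry without
any such limit.

```c
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only (not the OpenSIPS one). */
typedef struct {
    atomic_int held; /* 0 = free, 1 = taken */
} demo_lock_t;

static void demo_lock_init(demo_lock_t *l)
{
    atomic_init(&l->held, 0);
}

/*
 * Try to take the lock, calling sched_yield() between failed attempts.
 * Gives up after max_spins failed attempts and returns false.
 * Without the limit, a waiter whose lock holder never releases would
 * loop here forever, staying runnable and consuming CPU the whole time.
 */
static bool demo_get_lock_bounded(demo_lock_t *l, unsigned long max_spins,
                                  unsigned long *spins_used)
{
    unsigned long spins = 0;

    while (atomic_exchange_explicit(&l->held, 1, memory_order_acquire) != 0) {
        if (++spins >= max_spins) {
            *spins_used = spins;
            return false; /* lock was never released within the budget */
        }
        sched_yield(); /* let other runnable processes go first, then retry */
    }
    *spins_used = spins;
    return true;
}

static void demo_release_lock(demo_lock_t *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}
```

With a lock of this kind, a process that dies or blocks while holding it
leaves every later caller spinning, which would match both the constant CPU
usage and the identical stack traces on each attach.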