[OpenSIPS-Users] tcp_async timeouts confusion
steve.brisson at librestream.com
Tue Jan 9 15:25:16 EST 2018
Thanks for the explanation! That matches what I was expecting to see so I think there is an issue worth examining here.
*** Server Details ***
version: opensips 2.3.2 (x86_64/linux)
flags: STATS: On, DISABLE_NAGLE, USE_MCAST, SHM_MMAP, PKG_MALLOC, F_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16, MAX_URI_SIZE 1024, BUF_SIZE 65535
poll method support: poll, epoll_lt, epoll_et, sigio_rt, select.
main.c compiled on 15:38:40 Nov 24 2017 with gcc 5.4.0
My main problem scenario is establishing a connection between an endpoint local to opensips, and an endpoint registered to a cisco vcs, using either TCP or TLS. Both transports fail in slightly different ways I will describe below. The opensips server is running on an AWS server so I have the advertised_address and listen aliases set to deal with IP translation. The only tcp timeout I had configured was tcp_connect_timeout=3000.
*** Using TCP ***
After the invite is sent to the vcs, tcpdump at the opensips server showed 100, 180, and 200 OK responses from the vcs arriving and ACK'd correctly at the opensips server. The 100 response arrived 185ms after the invite was sent. However, I don't see these responses in the branch's onreply_route, the global onreply_route, or in the log at DBG level. netstat -t shows the connection with data in the recv-q that never drains to 0. This implies to me that opensips is not polling that connection correctly for recv data.
If I disable tcp_async then the call is completed successfully. So in the case that works, I have tcp_connect_timeout=3000 and tcp_async=0.
*** Using TLS ***
Running tcpdump, I see the opensips server send a Client Hello then a FIN packet 100ms later. The vcs responds with a Server Hello 200ms after the Client Hello and this gets RST.
To work around this case, I set tls_handshake_timeout=3000 and tls_send_timeout=1000. Maybe this is the correct behavior; I'm still not 100% sure how the tls parameters function.
*** Conclusion ***
In both the TCP and TLS cases it seems like the tcp_connect_timeout isn't being used as expected.
So to work around this, I went from having only tcp_connect_timeout=3000 to:
modparam("proto_tcp", "tcp_async", 0)
modparam("proto_tcp", "tcp_send_timeout", 1000)
modparam("tls_mgm", "tls_handshake_timeout", 3000)
modparam("tls_mgm", "tls_send_timeout", 1000)
Please let me know if I am unclear in my description of the issue. There are a lot of details to go through.
Thanks again for the quick response.
From: Users [mailto:users-bounces at lists.opensips.org] On Behalf Of Liviu Chircu
Sent: Tuesday, January 9, 2018 3:20 AM
To: users at lists.opensips.org
Subject: Re: [OpenSIPS-Users] tcp_async timeouts confusion
The transport layer was heavily refactored roughly three years ago, see ,  and  for the relevant commits which, indeed, cut the default connect timeout sharply (10s -> 100ms). Although 100ms might seem unnecessarily low (it's async! let it sleep as long as it wants!), keep in mind that the TLS support isn't async at all, yet it also uses the same default "tcp_connect_timeout" - a 10s default here is quite bad for high-traffic TLS proxies, which often need to open lots of TCP/TLS connections.
All in all, the "tcp_connect_timeout" should not get ignored at all. The "tcp_async_local_connect_timeout" is the first one that hits, after which the connect waiting will be performed by a non-TCP worker, up to "tcp_connect_timeout" milliseconds. If it doesn't behave like this, let us know, and we'll look into it more.
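As a rough illustration of the two-stage behavior described above, a config fragment might look like the sketch below. The values are only examples: 3000 is the setting mentioned earlier in this thread and 100 is the default discussed here. The comments restate the intended behavior as explained above; they are not a guarantee of what 2.3.2 actually does.

```
# Core parameter: overall limit for establishing a TCP connection (ms).
# With async enabled, the wait beyond the local timeout is expected to be
# handled by a non-TCP worker, up to this value.
tcp_connect_timeout=3000

loadmodule "proto_tcp.so"

# Keep async mode on (the default in 2.3, per this thread).
modparam("proto_tcp", "tcp_async", 1)

# First timeout to hit: how long the worker itself waits for the connect
# before handing the connection off (ms).
modparam("proto_tcp", "tcp_async_local_connect_timeout", 100)
```

If the connect still fails well before "tcp_connect_timeout" with a setup like this, that would point to the hand-off not behaving as intended.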
On 08.01.2018 22:38, Steve Brisson wrote:
I've run into some issues related to tcp_async and tcp/tls timeouts since upgrading opensips from v1.8 to v2.3.
Based on my v1.8 config, I had the tcp_connect_timeout set to 3 secs but this gets ignored in v2.3 because tcp_async is enabled by default. As a result, calls made from a local opensips endpoint to a remote registered endpoint (through a cisco vcs) were failing. I then noticed that the tcp/tls timeouts were aggressively reduced from 10-30s to 100ms by default with the tcp_async feature.
My main questions are:
- How is the tcp_async feature supposed to function if the tcp_async_local_connect_timeout expires? The code seems to imply that the socket gets put onto a tcp main thread and handled.
- 100ms seems pretty short as a default for these timeouts, especially tls. Does a timeout result in the sip request getting cancelled, or is there still some processing that can occur afterwards because of handling on the tcp main thread?
In short, I'm confused about what the tcp_async feature does and how the timeouts should be set. Any explanations would be greatly appreciated.
Thanks for your time.