[OpenSIPS-Users] Rest Client Async operation

Ben Newlin Ben.Newlin at genesys.com
Tue Jun 25 11:41:04 EDT 2019


Liviu,

Thank you for the response and explanation, and for investigating the timing discrepancies. I must say, though, that this answer is very disappointing. We are not transferring large files or data sets during real-time call processing (I hope no one is!), so the transfer time is not really a concern; it is the connect time that can cause severe backlogs when the far end is not available, as in my test case. So as it stands, the async functionality for rest_client is mostly useless to us, as it does not prevent the most common cause of congestion.

I can appreciate that libcurl does not provide access during the connect, but then my question would be: why isn't the entire operation run asynchronously? Why must the connect be performed in the current process and only the transfer handed off to another? My expectation when using any async function was that all execution of the underlying function would be performed outside the calling process, not that the function could pick and choose which parts block and which do not.

Are other async function implementations similar to this? For example, does async_db_query establish the DB connection in the current process in blocking mode and only perform the actual query asynchronously?

I would also recommend updating the documentation for both async and rest_client specifically to make this limitation clear, as it was not at all clear to us.

Ben Newlin

From: Users <users-bounces at lists.opensips.org> on behalf of Liviu Chircu <liviu at opensips.org>
Reply-To: OpenSIPS users mailling list <users at lists.opensips.org>
Date: Tuesday, June 25, 2019 at 8:52 AM
To: "users at lists.opensips.org" <users at lists.opensips.org>
Subject: Re: [OpenSIPS-Users] Rest Client Async operation


Hi Ben,

Regarding the "blocking rest_get()" problem: due to the specifics of the libcurl API, there is
no direct access to the socket file descriptors during the initial TCP connect handshake.  Thus,
this phase will be synchronous (blocking), while all transfers (send HTTP req, receive reply) are
done asynchronously, regardless of how the data gets send or arrives (all at once, in chunks,
byte-by-byte, etc.).
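
As an illustration only (a minimal C sketch, not the actual rest_client code, and assuming the behavior described above: no pollable descriptor until the connect completes), the libcurl multi API leaves the caller with nothing to do during the connect phase except call curl_multi_perform() and sleep for a short interval, presumably what the connect_poll_interval modparam in Ben's config controls:

/* Minimal sketch of the libcurl multi API pattern described above.
 * While the TCP connect is still in progress, curl_multi_fdset() may
 * report no usable descriptor (max_fd == -1), so the caller can only
 * sleep and retry, i.e. the calling process stays busy for the whole
 * connect phase.  Once a descriptor appears, it could be handed to an
 * event loop and the transfer itself completed asynchronously.
 * Build with: gcc curl_connect_sketch.c -lcurl
 */
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>
#include <curl/curl.h>

int main(void)
{
    CURL *h = curl_easy_init();
    CURLM *multi = curl_multi_init();
    int running = 1;

    curl_easy_setopt(h, CURLOPT_URL, "http://0.0.0.0/");
    curl_easy_setopt(h, CURLOPT_CONNECTTIMEOUT, 5L);  /* like connection_timeout */
    curl_multi_add_handle(multi, h);

    while (running) {
        fd_set rd, wr, ex;
        int max_fd = -1;

        curl_multi_perform(multi, &running);

        FD_ZERO(&rd); FD_ZERO(&wr); FD_ZERO(&ex);
        curl_multi_fdset(multi, &rd, &wr, &ex, &max_fd);

        if (max_fd == -1) {
            /* still resolving/connecting: nothing to register with a
             * reactor, so block here and retry after a short interval */
            usleep(100000);
        } else {
            /* connected: from here on, the descriptor(s) could be
             * monitored asynchronously instead of blocking this process */
            break;
        }
    }

    curl_multi_remove_handle(multi, h);
    curl_easy_cleanup(h);
    curl_multi_cleanup(multi);
    return 0;
}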

So, indeed: in the specific test case where the remote TCP endpoint is either unresponsive or
blocked by a firewall, the throughput of your OpenSIPS may be significantly affected when talking HTTP
to such peers, even if you perform async HTTP requests.  Maybe this is just my opinion, but I believe
the average case is the one where the HTTP server/API is down and the TCP connect attempt simply gets
rejected, which fails fast rather than hanging.  See [1] for more on this topic.
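
To make that distinction concrete, here is a small, hypothetical C demo (not OpenSIPS code; the test addresses are made up): a closed port answers with a RST, so connect() returns "Connection refused" almost instantly, whereas a peer that silently drops the SYN keeps connect() blocked until the timeout expires, and the latter is the case that hurts a blocking connect phase the most.

/* Hypothetical demo: closed port vs. silently dropped SYN.
 * Build with: gcc connect_demo.c -o connect_demo
 */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

static void try_connect(const char *ip, int port)
{
    struct sockaddr_in sa;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    inet_pton(AF_INET, ip, &sa.sin_addr);

    if (connect(fd, (struct sockaddr *)&sa, sizeof sa) < 0)
        /* refused connections return here immediately; dropped SYNs
         * only return once the kernel gives up retransmitting */
        printf("%s:%d -> %s\n", ip, port, strerror(errno));
    else
        printf("%s:%d -> connected\n", ip, port);

    close(fd);
}

int main(void)
{
    try_connect("127.0.0.1", 1);      /* assumed closed port: fails fast */
    try_connect("10.255.255.1", 80);  /* assumed filtered/unroutable: hangs */
    return 0;
}

Against the closed port the error prints right away; against the filtered address the call typically blocks for tens of seconds (depending on the kernel's SYN retry settings), which is where a blocking connect backlog shows up.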

Regarding the 5s vs. 8s timeouts, I will have to confirm the behavior first and get back to you.
Async is heavily reliant on tm, so I wouldn't exclude the possibility that, in some cases, the
transaction timeout overrides the async logic's timeout.

Best regards,

[1]: https://serverfault.com/questions/521359/why-do-some-connections-time-out-and-others-get-refused

Liviu Chircu

OpenSIPS Developer

http://www.opensips-solutions.com
On 25.06.2019 00:33, Ben Newlin wrote:
Hello all,

During testing I have found behavior that I cannot explain based on our current understanding of the operation of the async and rest_client functionality. I have reproduced this behavior with a fairly simple script configuration, the relevant piece of which is below:

children=8

loadmodule "rest_client.so"
modparam("rest_client", "connection_timeout", 5)
modparam("rest_client", "connect_poll_interval", 1)
modparam("rest_client", "curl_timeout", 5)

route {
  xlog("L_ALERT", "before t_newtran()\r\n");
  t_newtran();
  xlog("L_ALERT", "after t_newtran()\r\n");
  async(rest_post("http://0.0.0.0"<http://0.0.0.0>, "{}", "application/json", "$var(body)", "$var(ctype)", "$var(rcode)"), test_resume);
  xlog("L_ALERT", "after async\r\n");
}
route[test_resume] {
  xlog("L_ALERT", "test_resume: $retcode\r\n");
  t_reply("503", "Service Unavailable");
}

The expected operation is to immediately trigger t_newtran(), which sends a 100 response, then time out on the rest_post call after 5 seconds and respond with a 503. For individual calls this does occur, although the timeout consistently takes several seconds too long, usually ~8s, as can be seen in the logs from the above code:

Jun 24 19:10:56 [337] before t_newtran()
Jun 24 19:10:56 [337] after t_newtran()
Jun 24 19:11:04 [337] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 19:11:04 [337] test_resume: -2

The extra time is troubling, but the bigger concern is that when I run multiple concurrent calls, I find that all 8 UDP children are still blocking on the rest_post call. Here are the logs for that test:

Jun 24 21:19:51 [337] before t_newtran()
Jun 24 21:19:51 [337] after t_newtran()
Jun 24 21:19:51 [338] before t_newtran()
Jun 24 21:19:51 [338] after t_newtran()
Jun 24 21:19:51 [335] before t_newtran()
Jun 24 21:19:51 [335] after t_newtran()
Jun 24 21:19:51 [333] before t_newtran()
Jun 24 21:19:51 [333] after t_newtran()
Jun 24 21:19:52 [331] before t_newtran()
Jun 24 21:19:52 [331] after t_newtran()
Jun 24 21:19:52 [336] before t_newtran()
Jun 24 21:19:52 [336] after t_newtran()
Jun 24 21:19:52 [332] before t_newtran()
Jun 24 21:19:52 [332] after t_newtran()
Jun 24 21:19:52 [334] before t_newtran()
Jun 24 21:19:52 [334] after t_newtran()
Jun 24 21:19:59 [337] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:19:59 [337] test_resume: -2
Jun 24 21:19:59 [337] before t_newtran()
Jun 24 21:19:59 [337] after t_newtran()
Jun 24 21:19:59 [338] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:19:59 [338] test_resume: -2
Jun 24 21:19:59 [338] before t_newtran()
Jun 24 21:19:59 [338] after t_newtran()
Jun 24 21:19:59 [335] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:19:59 [335] test_resume: -2
Jun 24 21:19:59 [335] before t_newtran()
Jun 24 21:19:59 [335] after t_newtran()
Jun 24 21:19:59 [333] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:19:59 [333] test_resume: -2
Jun 24 21:19:59 [333] before t_newtran()
Jun 24 21:19:59 [333] after t_newtran()
Jun 24 21:19:59 [331] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:19:59 [331] test_resume: -2
Jun 24 21:19:59 [331] before t_newtran()
Jun 24 21:19:59 [331] after t_newtran()
Jun 24 21:19:59 [336] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:19:59 [336] test_resume: -2
Jun 24 21:19:59 [336] before t_newtran()
Jun 24 21:19:59 [336] after t_newtran()
Jun 24 21:19:59 [332] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:19:59 [332] test_resume: -2
Jun 24 21:19:59 [332] before t_newtran()
Jun 24 21:19:59 [332] after t_newtran()
Jun 24 21:20:00 [334] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:20:00 [334] test_resume: -2
Jun 24 21:20:00 [334] before t_newtran()
Jun 24 21:20:00 [334] after t_newtran()
Jun 24 21:20:07 [337] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:20:07 [337] test_resume: -2
Jun 24 21:20:07 [338] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:20:07 [338] test_resume: -2
Jun 24 21:20:07 [335] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:20:07 [335] test_resume: -2
Jun 24 21:20:07 [333] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:20:07 [333] test_resume: -2
Jun 24 21:20:07 [331] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:20:07 [331] test_resume: -2
Jun 24 21:20:07 [336] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:20:07 [336] test_resume: -2
Jun 24 21:20:07 [332] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:20:07 [332] test_resume: -2
Jun 24 21:20:07 [334] ERROR:rest_client:start_async_http_req: connect timeout on http://0.0.0.0 (5s)
Jun 24 21:20:07 [334] test_resume: -2

You can clearly see that the first 8 INVITEs are processed immediately, but then no new messages are processed until the REST queries begin to time out. As each query times out, freeing a child process, a new inbound request is processed. Clearly the message processing is being blocked on the REST queries, which is not supposed to happen when using async. Am I missing something in my configuration, or am I misunderstanding how async is supposed to work?

Ben Newlin



_______________________________________________

Users mailing list

Users at lists.opensips.org

http://lists.opensips.org/cgi-bin/mailman/listinfo/users