Copyright © 2008 Dan Pascu
Revision History | |
---|---|
Revision $Revision: 5886 $ | $Date: 2009-07-16 12:54:34 +0200 (Thu, 16 Jul 2009) $ |
Table of Contents
List of Examples
keepalive_interval
parameterkeepalive_method
parameterkeepalive_from
parameterkeepalive_extra_headers
parameterkeepalive_state_file
parameterclient_nat_test
functionfix_contact
functionnat_keepalive
function$keepalive.socket
in multi-proxy environments$source_uri
to set the received AVP on registrars$source_uri
in multi-proxy environmentsThe nat_traversal module provides support for handling far-end NAT traversal for SIP signaling. The module includes functionality to detect user agents behind NAT, to modify SIP headers to allow user agents to work transparently behind NAT and to send keepalive messages to user agents behind NAT in order to preserve their visibility in the network. The module can handle user agents behind multiple cascaded NAT boxes as easily as user agents behind a single level of NAT.
The module is designed to work in complex environments where multiple SIP proxies may be involved in handling registration and routing and where the incoming and outgoing paths may not necessarily be the same, or where the routing path may even change between consecutive dialogs.
The nat_traversal module implements a very sophisticated keepalive mechanism, that is able to handle the most complex environments and use cases, including distributed environments with multiple proxies. Unlike existing keepalive solutions that only send keepalive messages to user agents that have registered (during their registration), the nat_traversal module can keepalive an user agent based on multiple conditions, making it not only more flexible and more efficient, but also able to work in environments and with use cases where a simple keepalive implementation based on keeping alive registrations alone cannot work.
The keepalive mechanism works by sending a SIP request to a user agent behind NAT to make that user agent send back a reply. The purpose is to have packets sent from inside the NAT to the proxy often enough to prevent the NAT box from timing out the connection. Many NAT boxes do not consider packets that travel from the outside to the inside of the NAT to reset the connection expiration timer, thus to keepalive a user agent we need to trigger an answer from it.
One of the major limitations of an implementation that only sends keepalive messages to registered user agents, is that it creates an artificial association between the concept of network visibility with the concept of user registration. The registration process only creates network visibility for incoming INVITE requests, in other words for incoming calls. However, there are other cases where a user agent needs to preserve its network visibility when behind NAT, that have nothing to do with receiving incoming calls. One of them is the ability of the user agent to keep receiving NOTIFY requests for a presence subscription it has made. Another situation is where the user agent should be able to receive all messages within a dialog it has initiated, even if it is not registered. In the first case, a presence agent is required to register to be able to receive notifications for its subscriptions and it has to keep the registration active the whole time. In the second case a user agent that wants to make an outgoing call has to register and keep the registration active during the call, otherwise it may not be able to receive future in-dialog messages, including the BYE that closes the dialog.
Not only we have this forced association shown above, that requires a user agent to register to be able to do anything, but a simple keepalive implementation based on sending keepalive messages only to registered user agents, will also fail to work in common cases, exactly because of this artificial association. For example lets assume that we have an user agent that is registered. If during an outgoing call initiated by this user agent, the agent stops registering, then it will not be able to receive further in-dialog messages after the NAT binding expires. The same is true for a presence agent, receiving notifications for its subscriptions.
In environments with multiple proxies handling the same domains, the problem gets even more acute. In this case the incoming and outgoing paths for a call may be completely different: the user agent may register using one proxy as an entry point to the network, but may make an outgoing call using a different proxy as the network entry point. Even more a registration may use a different proxy as the entry point to the network with each renewal of the registration, making it volatile and unreliable for anything else except incoming calls. A keepalive implementation that only sends keepalive messages to registered user agents will not be able to guarantee the delivery of in-dialog messages for outgoing calls even if it requires the user agent to register before making a call. In this case, even if we assume that the user agent would pick the same proxy for an outgoing call as the one it has used for the last registration, at the next registration it may pick another one (as returned by DNS), and will dissociate the incoming and outgoing paths rendering the outgoing path unusable (assuming the outgoing call takes longer than the registration period).
All this leads to the conclusion that a keepalive implementation based solely on sending keepalive messages to registered user agents can only work in single proxy environments and then only work reliably if it requires the user agent to register before doing anything else, even though some actions would not require a user agent to register.
To avoid the above mentioned issues, this implementation introduces the concept of network visibility for a given condition. This way we can keepalive a user agent for multiple independent conditions, thus avoiding all the problems presented above.
The conditions for which the module will send keepalive messages are:
Registration - for user agents that have registered to preserve their visibility for incoming calls. This is the result of triggering keepalive for a REGISTER request.
Subscription - for presence agents that have subscribed to some events to preserve their visibility for receiving back notifications. This is the result of triggering keepalive for a SUBSCRIBE request.
Dialogs - for user agents that have initiated an outgoing call to preserve their visibility for receiving further in-dialog messages. This is the result of triggering keepalive for an outgoing INVITE request.
A user agent's NAT entry point may be kept alive for one or multiple of the conditions listed above. Even when a NAT endpoint is kept alive for more than one condition, only one keepalive message is sent to that NAT endpoint. The presence of multiple conditions for a NAT endpoint, only guarantees that the network visibility for a user agent based on a certain condition will be available while that condition is true, independently of the other conditions. When all the conditions to keepalive a NAT endpoint will disappear, that endpoint will be removed from the list with the NAT endpoints that need to be kept alive.
The user interface for the keepalive functionality is very simple. It consists of a single function called nat_keepalive() that needs to be called only once for the requests that trigger the need for network visibility. These requests are: REGISTER, SUBSCRIBE and outgoing INVITEs. After such a request arrives it makes the user agent visible for the purpose of receiving back other messages. Thus, after a REGISTER the user agent may receive back incoming calls, after a SUBSCRIBE it may receive back notifications and after an outgoing INVITE it may receive back further in-dialog messages including the BYE that ends the dialog. The nat_keepalive() function needs to be called on the proxy that directly receives the request from the user agent, if it determines that the user agent making the request is behind NAT. The function needs to be called before the request gets either a stateless reply or it is relayed with t_relay(). Calling the nat_keepalive() function has no effect if the request gets no stateless reply or it is not relayed.
For environments with multiple proxies, where the proxy that acts as an entry point to the network for a given request is not the one that actually handles the request, then the nat_keepalive() function needs to be called on the proxy that is the entry point and after that the request must be sent to the proxy that actually handles the request using t_relay(). This is needed because the keepalive functionality detects from the stateless replies or the TM relayed replies if the NAT endpoint needs to be kept alive for the condition triggered by the request for which the nat_keepalive() function was called. For example assume a network where a proxy P1 receives a REGISTER from an user agent behind NAT. P1 will determine that the user agent is behind NAT so it needs keepalive functionality, but another proxy called P2 is actually handling the subscriber registrations. In this case P1 has to call nat_keepalive() even though it doesn't yet know the answer P2 will give to the REGISTER request (which may even be a negative reply) or if P2 will restrict the proposed expiration time in any way. Thus P1 calls nat_keepalive() after which it calls t_relay(). When the reply from P2 arrives, a callback is triggered which will determine if the request did get a positive reply, and if so it will extract the registration expiration time and enable the keepalive functionality for that endpoint for the registration condition for the time given by the registration expiration. For single proxy environments, or if P1 is the same as P2, then t_relay() is not called, instead save_location() is called if the registration is accepted. Then the same process described above happens only this time triggered by a stateless reply callback. In both cases, calling nat_keepalive() when the REGISTER is received has no other effect that to trigger some callbacks that will determine from the reply if the caller endpoint should be kept alive or not.
Below is described how nat_keepalive() should be called and what it does for each of the requests that need keepalive functionality (the function should only be called if it is determined that the user agent that generated the request is behind NAT):
REGISTER - called before save_location() or t_relay() (depending on whether the proxy that received the REGISTER is also handling registration for that subscriber or not). It will determine from either the stateless reply generated by save_location() or the TM relayed reply if the registration was successful and what is its expiration time. If the registration was successful it will mark the given NAT endpoint for keepalive for the registration condition using the detected expiration time. If the REGISTER request is discarded after nat_keepalive() was called or if it intercepts a negative reply it will have no effect and the registration condition will not be activated for that endpoint.
SUBSCRIBE - called before handle_subscribe() or t_relay() (depending on whether the proxy that received the SUBSCRIBE is also handling subscriptions for that subscriber or not). It will determine from either the stateless reply generated by handle_subscribe() or the TM relayed reply if the subscription was successful and what is its expiration time. If the subscription was successful it will mark the given NAT endpoint for keepalive for the subscription condition using the detected expiration time. If the SUBSCRIBE request is discarded after nat_keepalive() was called or if it intercepts a negative reply it will have no effect and the subscription condition will not be activated for that endpoint. It should be called for every SUBSCRIBE received, not only the ones that start a subscription (do not have a to tag), because it needs to update (extend) the expiration time for the subscription.
INVITE - called before t_relay() for the first INVITE in a dialog. It will automatically trigger dialog tracing for that dialog and will use the dialog callbacks to detect changes in the dialog state. It will add a keepalive entry with the dialog condition for the caller NAT endpoint as soon as the dialog is created (this happens when t_relay() is called). It will then keep that condition for the given endpoint until the dialog is destroyed (either terminated, failed or expired). If the INVITE request cannot be relayed after nat_keepalive() was called it will have no effect and the dialog condition will not be activated for that endpoint.
In addition an INVITE that starts a dialog will automatically trigger keepalive functionality for the destination endpoints if they are behind NAT. This is done by detecting if any of the destination endpoints already has a keepalive entry for the register condition. If so, a dialog condition will be added to that entry thus preserving that endpoint visibility even if the registration expires during the dialog or is moved to another proxy. During the call setup stage, multiple entries for the callee may be added with the dialog condition if parallel forking is used, however only the destination endpoints behind NAT will have the extra dialog condition set. Later when the dialog is confirmed, only the endpoint that answered the call will keep the dialog condition activated (if present), while all the endpoints from the unanswered branches will have it removed. This is done automatically without any need to call any function.
Considering the elements presented in this section, we can say that the nat_traversal module provides a flexible and efficient keepalive functionality that is very easy to use. Because only the border proxies send keepalive messages, the network traffic is minimized. For the same reason, message processing in the proxies is also minimized, as border proxies generate keepalive messages themselves and send them statelessly, instead of having to relay messages generated by the registrars. Network traffic is also minimized by only sending a single keepalive message for an endpoint no matter for how many reasons the endpoint is kept alive. Keepalive messages are also distributed over the keepalive interval to avoid overloading the proxy by generating too many messages at a time. The nat_traversal module keeps its internal state about endpoints that need keepalive, state that is build while messages are processed by the proxy and thus it doesn't need to transfer any information from the usrloc module, which should also improve its efficiency.
The following modules must be loaded before this module:
sl module - if keepalive is enabled.
tm module - if keepalive is enabled.
dialog module - if keepalive is enabled and keeping alive INVITE dialogs is needed.
The time interval (in seconds) required to send a keepalive message to all the endpoints that need being kept alive. During this interval, each endpoint will receive exactly one keepalive message. A negative value or zero will disable the keepalive functionality.
Default value is “60”.
Example 1.1. Setting the keepalive_interval
parameter
... modparam("nat_traversal", "keepalive_interval", 90) ...
What SIP method to use to send keepalive messages. Typical methods used for this purpose are NOTIFY and OPTIONS. NOTIFY generates smaller replies from user agents, but they are almost entirely negative replies. Apparently almost none of the user agents understand that the purpose of the NOTIFY with a “keep-alive” event is to keep NAT open, even though many user agents send such NOTIFY requests themselves. However this does not affect the result at all, since the purpose is to trigger a response from the user agent behind NAT, positive or negative replies having little relevance as they are discarded anyway. The OPTIONS method on the other hand has a much higher rate of positive replies, but at the same time those positive replies are much bigger, mostly because the OPTIONS method is used to inform about the user agent capabilities and thus it includes a lot of extra headers to indicate those capabilities. Many user agents also include a SDP body with a bogus media session, probably to indicate media capabilities. All of this makes that positive replies to OPTIONS requests are 2 to 3 times bigger than negative replies or replies to NOTIFY requests. For this reason the default value for the used method is NOTIFY.
Default value is “NOTIFY”.
Example 1.2. Setting the keepalive_method
parameter
... modparam("nat_traversal", "keepalive_method", "OPTIONS") ...
Indicates what SIP URI to use in the From header of the keepalive requests. If not specified it will use sip:keepalive@proxy_ip, where proxy_ip is the IP address of the outgoing interface used to send the keepalive message, which is the same interface on which the request that triggered keepalive functionality arrived.
Default value is “sip:keepalive@proxy_ip” with proxy_ip being the actual IP of the outgoing interface.
Example 1.3. Setting the keepalive_from
parameter
... modparam("nat_traversal", "keepalive_from", "sip:keepalive@my-domain.com") ...
Specifies extra headers that should be added to the keepalive messages that are sent by the proxy. The header specification must also include the CRLF (\r\n) line separator. Multiple headers can be specified by concatenating them and each of them must include the \r\n separator.
Default value is undefined (send no extra headers).
Example 1.4. Setting the keepalive_extra_headers
parameter
... modparam("nat_traversal", "keepalive_extra_headers", "User-Agent: OpenSIPS\r\nX-MyHeader: some_value\r\n") ...
Specifies a filename where information about the NAT endpoints and the conditions for which they are being kept alive is saved when OpenSIPS exits. The information in this file is then used when OpenSIPS starts to restore its internal state and continue to send keepalive messages to the NAT endpoints that have not expired in the meantime. This is useful when restarting OpenSIPS to avoid losing keepalive state information about the NAT endpoints. The internal keepalive state is guaranteed to be saved in this file on exit, even when OpenSIPS crashes.
The value of this parameter can be either a relative path, in which case it will store it in the OpenSIPS working directory, or an absolute path.
Default value is undefined “keepalive_state”.
Example 1.5. Setting the keepalive_state_file
parameter
... modparam("nat_traversal", "keepalive_state_file", "/var/run/opensips/keepalive_state") ...
Check if the client is behind NAT. What tests are performed is specified by the type parameter which is an integer given by the sum of the numbers corresponsing to the tests that one wishes to perform. The numbers corresponding to individual tests are shown below:
1 - tests if client has a private IP address (as defined by RFC1918) in the Contact field of the SIP message.
2 - tests if client has contacted OpenSIPS from an address that is different from the one in the Via field. Both the IP and port are compared by this test.
4 - tests if client has a private IP address (as defined by RFC1918) in the top Via field of the SIP message.
For example calling client_nat_test("3") will perform test 1 and test 2 and return true if at least one succeeds, otherwise false.
This function can be used from REQUEST_ROUTE, ONREPLY_ROUTE, FAILURE_ROUTE, BRANCH_ROUTE.
Will replace the IP and port in the Contact header with the IP and port the SIP message was received from. Usually called after a succesful call to client_nat_test(type)
This function can be used from REQUEST_ROUTE, ONREPLY_ROUTE, BRANCH_ROUTE.
Trigger keepalive functionality for the source address of the request. When called it only sets some internal flags, which will trigger later the addition of the endpoint to the keepalive list if a positive reply is generated/received (for REGISTER and SUBSCRIBE) or when the dialog is started/replied (for INVITEs). For this reason, it can be called early or late in the script. The only condition is to call it before replying to the request or before sending it to another proxy. If the request needs to be sent to another proxy, t_relay() must be used to be able to intercept replies via TM or dialog callbacks. If stateless forwarding is used, the keepalive functionality will not work. Also for outgoing INVITEs, record_route() should also be used to make sure the proxy that keeps the caller endpoint alive stays in the path. For multi-proxy setups, this function should always be called on the border proxies (the ones that received the request directly from the user agent). For more details about this function, see the Implementation subsection from the Keepalive functionality section.
This function can be used from REQUEST_ROUTE.
Example 1.8. Using the nat_keepalive
function
... if ((method=="REGISTER" || method=="SUBSCRIBE" || (method=="INVITE" && !has_totag())) && client_nat_test("3")) { nat_keepalive(); } ...
Indicates how many of the NAT endpoints are kept alive for registrations.
Indicates how many of the NAT endpoints are kept alive for subscriptions.
Returns the local socket used to send messages to the given NAT endpoint URI. The socket has the form proto:ip:port. The NAT endpoint URI is in the form: sip:ip:port[;transport=xxx] with transport missing if UDP. If the requested NAT endpoint URI is present in the internal keepalive table for any condition, it will return its associated local socket, else it will return null. The nat_endpoint can be a string or another pseudo-variable.
This can be useful to restore the sending socket when relaying messages to a given user agent in multi-proxy environments. Consider an example where 2 proxies are involved, P1 and P2. A user agent registers by sending a REGISTER request to P1. P1 will call nat_keepalive() but because it determines that P2 should actually handle the user registration will forward the request to P2. Now assume P2 receives an incoming INVITE for this user. It will determine that the registration came through P1 and will forward the request to P1. P2 should also include the NAT endpoint URI where this request is to be relayed. This information should have been provided by P1 when it relayed the REGISTER request to P2. The means to do this is out of the scope of this example, but one can either use the path extension or custom headers to do this. When P1 receives the INVITE it will use the NAT endpoint URI it has received along with the request to determine the socket to send out the request, which should be the same as the one where the registration request was originally received. In the example below lets assume that P2 provided the original NAT endpoint address in a custom header called X-NAT-URI and that it also provides a custom header called X-Scope to indicate that the message is sent to P1 for being relayed back to the user agent by P1 which has the NAT open with it.
Example 1.9. Using $keepalive.socket
in multi-proxy environments
... # This code runs on P1 which has received an INVITE from P2 to forward # it to the user agent behind NAT (because P1 has the NAT open with it). if (method=="INVITE" && $hdr(X-Scope)=="nat-relay") { $du = $hdr(X-NAT-URI); $fs = $keepalive.socket($du); t_relay(); exit; } ...
Returns the URI specification from where a request was received in the form sip:ip:port[;transport=xxx] with transport missing if UDP.
This pseudo-variable can be used to set the received AVP for the registrar module to indicate that a user agent is behind NAT. This is meant as a more flexible replacement for the fix_nated_register() function, because it allows one to modify the source uri by appending some extra parameters before saving it to the received AVP.
Another use for this pseudo-variable is in multi-proxy environments to indicate the NAT endpoint URI to the next proxy (if needed). Consider the previous example with two proxies P1 and P2. P1 receives the REGISTER request from a user agent and forwards it to P2 which does the actual registration. P1 needs to indicate the NAT endpoint URI to P2, so that P2 can include it later for incoming INVITE requests to this user agent.
Example 1.10. Using $source_uri
to set the received AVP on registrars
... modparam("registrar", "received_avp", "$avp(s:received_uri)") modparam("registrar", "tcp_persistent_flag", 10) ... # This code runs on the registrar, assuming it has received the # REGISTER request directly from the user agent. if (method=="REGISTER") { if (client_nat_test("3")) { if (proto==UDP) { nat_keepalive(); } else { # Keep TCP/TLS connections open until the registration # expires, by setting the tcp_persistent_flag setflag(10); } force_rport(); $avp(s:received_uri) = $source_uri; # or we could add some extra parameters to it if needed # $avp(s:received_uri) = $source_uri + ";relayed=false" } if (!www_authorize("", "subscriber")) { www_challenge("", "0"); return; } else if (!check_to()) { sl_send_reply("403", "Username!=To not allowed ($au!=$tU)"); return; } if (!save("location")) { sl_reply_error(); } exit; } ...
Example 1.11. Using $source_uri
in multi-proxy environments
... # This code runs on P1 which received the REGISTER request and has to # forward it to the registrar P2. if (method=="REGISTER") { if (client_nat_test("3")) { force_rport(); nat_keepalive(); append_hf("X-NAT-URI: $source_uri\r\n"); } $du = "sip:P2_ip:P2_port"; t_relay(); exit; } ...
In this case the usage is straight forward. The nat_keepalive() function needs to be called before save_location() for REGISTER requests, before handle_subscribe() for SUBSCRIBE requests and before t_relay() for the first INVITE of a dialog.
If the proxy receiving the REGISTER request is the same as the proxy handling it, then the case is reduced to the single proxy case. For this example, lets assume they are different. We have a user agent UA1 for which the registration is handled by the proxy P1. However UA1 sends the REGISTER to P0 which in turn forwards it to P1 like this: UA1 --> P0 --> P1. In this case P0 calls nat_keepalive(), adds the NAT endpoint URI to the request (for example using a custom header) and forwards the request to P1. P1 will save the user in the user location together with the NAT endpoint URI.
When an incoming INVITE request arrives on P1 for UA1, P1, will lookup the location and determine that it has to relay it to P0 because P0 has the NAT open with UA1. P1 will include the original NAT endpoint URI in the request and an indication that the only role P0 has in this transaction is to relay it to UA1. P0 will receive this request and determine that is has to act as a relay for it. It will extract the NAT endpoint URI, then based on it the corresponding local socket using $keepalive.socket(endpoint_uri). It will then set both $du and $fs to the values it has found, call record_route() to stay in the path and call t_relay() to send it to UA1.
Handling other type of requests (like for example SUBSCRIBE or MESSAGE) that arrive on P1 for UA1 is done the same way as with the first INVITE, on both P1 and P0.
If the proxy receiving the SUBSCRIBE request is the same as the proxy handling it, then the case is reduced to the single proxy case. For this example, lets assume they are different. We have a user agent UA1 for which subscriptions are handled by the proxy P1. However UA1 sends the SUBSCRIBE to P0 which in turn forwards it to P1 like this: UA1 --> P0 --> P1. In this case P0 calls nat_keepalive(), then calls record_route() to stay in the path and forwards the request to P1 using t_relay(). Further SUBSCRIBE and NOTIFY requests will follow the record route and use P0 as a NAT entry point to have access to UA1. Further in-dialog SUBSCRIBE requests should also call record_route().
If the proxy receiving the INVITE request is the same as the proxy handling it, then the case is reduced to the single proxy case. For this example, lets assume they are different. We have a user agent UA1 which is handled by the proxy P1 and UA2 which is handled by P2. UA2 has registered with P2 going through P3, while UA1 calls UA2 by sending the first INVITE to P0. The call flow for the first INVITE looks like this: UA1 --> P0 --> P1 --> P2 --> P3 --> UA2. In this case P0 calls nat_keepalive(), then calls record_route() to stay in the path and forwards the request to P1. P1 authenticates UA1 then forwards the request to P2, which is the home proxy for UA2. P1 doesn't have to use record_route to stay in the path, but it can do that if needed for other purposes. P2 will lookup UA2 and find out that it is reachable through P3. It will take the original NAT endpoint URI that is has saved in the user location when UA2 has registered and include it in the message along with an indication that P3 only has to relay the message to UA2. If P2 does accounting or starts a media relay, it should also call record_route() to stay in the path. Then it forwards the request to P3 using t_relay(). P3 will detect that it only has to relay the request to UA2 because it has the NAT open with it. It will extract the NAT endpoint URI from the message and the local sending socket using $keepalive.socket(endpoint_uri) and will set both $du and $fs. After that it will call record_route() to stay in the path, and forward the request to UA2 using t_relay(). Further in-dialog requests will follow the recorded route and use P0 and P3 as access points to UA1 respectively UA2. All the proxies that have used record_route() during the first INVITE should also call record_route() during further in-dialog requests to keep staying in the path.