[OpenSIPS-Users] Using Contact replication and HA

Wed May 10 11:52:54 EDT 2017

Hi Bogdan,

Thanks for your response and for investigating this.

The socket field on my backup server is *not* NULL; it contains the VIP:
udp:192.168.0.111:5060

The socket field contains identical data on both my test servers.

When you tried it, did you use the ip_nonlocal_bind setting and have both servers start with a listen statement for the VIP?
If not, perhaps the backup server discards the socket data from a replicated record when it does not match any of its own interface addresses.
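
A minimal sketch for checking whether non-local bind is actually in effect on a given server, assuming Linux and Python 3. The helper name `can_bind_udp` is hypothetical and is not part of OpenSIPS; it only attempts a UDP bind and reports the result.

```python
import errno
import socket


def can_bind_udp(ip: str, port: int = 0) -> bool:
    """Return True if a UDP socket can be bound to ip:port on this host.

    Port 0 lets the kernel pick a free port, so the check does not
    collide with a running OpenSIPS instance on 5060.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, port))
            return True
        except OSError as exc:
            # EADDRNOTAVAIL is the error Linux reports for an address that
            # is not assigned to any local interface while
            # net.ipv4.ip_nonlocal_bind is 0.
            if exc.errno == errno.EADDRNOTAVAIL:
                return False
            raise
```

Calling `can_bind_udp("192.168.0.111")` on the server that does not currently hold the VIP should return True only when net.ipv4.ip_nonlocal_bind is set to 1.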

I do not want the backup server to attempt to send NAT pings to registered devices; only the active server should do that. The NAT pings *must* come from the VIP if they are to be accepted by the firewall in front of the client device. Pings sent by the backup server from a different IP address would not get through the client's firewall, so they would fail and trigger removal of the Contact record if the "remove_on_timeout_bflag" option is set.

I will try some tests with the force_socket parameter in nathelper and let you know the results.

John Quick
Smartvox Limited
Tel:   01727-221221

-----Original Message-----
From: Bogdan-Andrei Iancu [mailto:bogdan at opensips.org]
Sent: 10 May 2017 11:07
To: john.quick at smartvox.co.uk
Cc: users at lists.opensips.org
Subject: Re: [OpenSIPS-Users] Using Contact replication and HA

Hi John,

What you did (with the "net.ipv4.ip_nonlocal_bind") is a good workaround for the problem.

Also, I investigated the original issue and here it is:
   1) the replicated contact (on backup) is saved with a NULL socket, as the received one is not valid (there is no err log for this, only a dbg log)
   2) when pinging the contact via nathelper, as the socket is NULL, nathelper tries to find a socket, but simply uses the first listener matching the protocol (UDP) and address family (IPv4) of the destination
   3) it looks like this first UDP listener is not compatible with the destination (localhost or a private network??)

Have you tried to use force_socket:
http://www.opensips.org/html/docs/modules/2.3.x/nathelper.html#idp5512752
(it takes effect only if the contact has no socket assigned).
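
The selection order described above can be sketched as a toy model. This is only an illustration of the behaviour as described in this message, not nathelper's actual code; `Listener`, `pick_ping_socket` and the localhost listener are hypothetical and assumed for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Listener:
    proto: str    # e.g. "udp"
    family: str   # "ipv4" or "ipv6"
    address: str  # e.g. "192.168.0.111:5060"


def pick_ping_socket(
    contact_socket: Optional[str],
    listeners: List[Listener],
    dest_proto: str,
    dest_family: str,
    force_socket: Optional[str] = None,
) -> Optional[str]:
    """Model which socket a NAT ping would be sent from."""
    # A socket stored with the contact is used as-is.
    if contact_socket is not None:
        return contact_socket
    # force_socket only matters when the contact has no socket.
    if force_socket is not None:
        return force_socket
    # Otherwise: first listener matching protocol and address family,
    # whether or not it can actually reach the destination.
    for listener in listeners:
        if listener.proto == dest_proto and listener.family == dest_family:
            return listener.address
    return None


# Assumed listener order: localhost first, then the VIP.
listeners = [
    Listener("udp", "ipv4", "127.0.0.1:5060"),
    Listener("udp", "ipv4", "192.168.0.111:5060"),
]
vip = "192.168.0.111:5060"

no_socket = pick_ping_socket(None, listeners, "udp", "ipv4")
forced = pick_ping_socket(None, listeners, "udp", "ipv4", force_socket=vip)
stored = pick_ping_socket(vip, listeners, "udp", "ipv4")
```

In this model a contact with a NULL socket ends up on the localhost listener unless force_socket is given, which matches the send failure seen on the backup.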

Regards,

Bogdan-Andrei Iancu
   OpenSIPS Founder and Developer
   http://www.opensips-solutions.com

OpenSIPS Summit May 2017 Amsterdam
   http://www.opensips.org/events/Summit-2017Amsterdam.html

On 05/09/2017 05:54 PM, John Quick wrote:
> Hi Bogdan,
>
> I tried different scenarios and eventually ended up with the backup server having a listen statement for the VIP address.
> Normally you cannot start OpenSIPS (or any other application) binding to an IP address that is not assigned on a local interface.
> However, after adding the line "net.ipv4.ip_nonlocal_bind = 1" to /etc/sysctl.conf, I was able to start OpenSIPS with that listen statement in place.
>
> The backup server also listens on its own static IP using the proto_bin mechanism so it can receive and send replications while it is in "standby" mode.
>
> That is the dilemma:
> Replicated Contacts can only be useful if the backup server is able to take over the same VIP that was used on the primary server.
> If the backup server does not use the VIP when it takes over as "active", then the replicated socket information in the location table will be wrong.
> If OpenSIPS only starts on the backup server *after* that server has acquired the VIP then it could not receive the replicated Contacts using proto_bin when it was in standby mode.
>
> John Quick
> Smartvox Limited
>
>
> -----Original Message-----
> From: Bogdan-Andrei Iancu [mailto:bogdan at opensips.org]
> Sent: 09 May 2017 14:45
> To: john.quick at smartvox.co.uk; OpenSIPS users mailling list 
> <users at lists.opensips.org>
> Subject: Re: [OpenSIPS-Users] Using Contact replication and HA
>
> Hi John,
>
> So, in your setup, on the backup server, OpenSIPS is not listening on the VIP address at all, right?
>
> Best regards,
>
> Bogdan-Andrei Iancu
>     OpenSIPS Founder and Developer
>     http://www.opensips-solutions.com
>
> On 05/03/2017 04:46 PM, John Quick wrote:
>> Hello,
>>
>> I am still working my way through some of the new features described 
>> at last year's Summit conference while you are all hopefully enjoying 
>> this year's Summit.
>>
>> I'm playing with the Clusterer module. It is a great idea but I am 
>> finding a few practical difficulties for contact replication in the USRLOC module.
>>
>> In my test rig, there are two almost identical OpenSIPS servers (A and B).
>> Contact replication is enabled between the two servers and each 
>> server has its own local database.
>>
>> Linux HA - Corosync and Pacemaker - is used to control a Virtual IP
>> (VIP) address resource. This allows UAs to register at the VIP 
>> address. HA decides which server has the virtual address at any given 
>> time, based on node availability. Currently, Server A is assigned the 
>> VIP and processes all UA registrations.
>>
>> Problem: The "socket" field in the location table contains the VIP 
>> address on both server A and B, but only Server A is bound to that 
>> address while both servers are up.
>> Unless I completely disable NAT Pings in the nathelper module, Server 
>> B reports a lot of errors like this:
>> 2017-05-03 14:15:51 CRITICAL:core:proto_udp_send: invalid 
>> sendtoparameters#012one possible reason is the server is bound to 
>> localhost and#012attempts to send to the net
>> 2017-05-03 14:15:51 ERROR:nathelper:msg_send: send() for proto 1 
>> failed
>> 2017-05-03 14:15:51 ERROR:nathelper:nh_timer: sip msg_send failed!
>>
>> Worse, if I also enable the "remove_on_timeout_bflag" option on 
>> Server B, it removes the registration on *both* servers after a short 
>> delay even though the UA is still available!
>>
>> Initially, I encountered problems with the HA IP Resource (or VIP) 
>> with respect to OpenSIPS not starting on server B because it was 
>> trying to bind to an address that was not currently assigned to any 
>> local interface. While it is possible to group the IP resource with 
>> the OpenSIPS service resource to overcome this problem, that would 
>> completely break USRLOC contact replication because the OpenSIPS 
>> service on Server B would not be running as long as Server A is up. I 
>> had to resort to an option in sysctl.conf that allows processes to 
>> start even if they are trying to bind to a non-local address.
>>
>> This makes me wonder what the purpose of USRLOC contact replication is.
>> Is there some other scenario that could use it and not have these problems?
>> I also wonder what difference the db_mode setting in USRLOC makes 
>> when using contact replication.
>>    
>> John Quick
>> Smartvox Limited