[OpenSIPS-Users] Using Contact replication and HA

John Quick john.quick at smartvox.co.uk
Wed May 3 09:46:46 EDT 2017


Hello,

I am still working my way through some of the new features described at last
year's Summit, while you are all hopefully enjoying this year's Summit.

I'm experimenting with the Clusterer module. It is a great idea, but I am
running into a few practical difficulties with contact replication in the
USRLOC module.

In my test rig, there are two almost identical OpenSIPS servers (A and B).
Contact replication is enabled between the two servers and each server has
its own local database.

Linux HA (Corosync and Pacemaker) is used to control a Virtual IP (VIP)
address resource. This allows UAs to register at the VIP address. HA
decides which server holds the virtual address at any given time, based on
node availability. Currently, Server A holds the VIP and processes all
UA registrations.

Problem: The "socket" field in the location table contains the VIP address
on both Server A and Server B, but while both servers are up, only Server A
is bound to that address.
Unless I completely disable NAT pings in the nathelper module, Server B
reports a lot of errors like this:
2017-05-03 14:15:51 CRITICAL:core:proto_udp_send: invalid sendtoparameters#012one possible reason is the server is bound to localhost and#012attempts to send to the net
2017-05-03 14:15:51 ERROR:nathelper:msg_send: send() for proto 1 failed
2017-05-03 14:15:51 ERROR:nathelper:nh_timer: sip msg_send failed!
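The core of the problem can be sketched in a few lines of Python (this is not OpenSIPS code). Because the replicated contacts carry the VIP in their socket field, only the node that currently owns the VIP can actually send from that socket. The sketch assumes the "proto:ip:port" socket format seen in the location table and IPv4 only; the `Contact` type, the `pingable()` helper and all addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Contact:
    aor: str
    socket: str  # e.g. "udp:192.0.2.100:5060", as stored in the location table


def socket_ip(socket_field: str) -> ipaddress.IPv4Address:
    """Extract the IP from a 'proto:ip:port' socket string (IPv4 only)."""
    _proto, host, _port = socket_field.split(":")
    return ipaddress.IPv4Address(host)


def pingable(contacts, local_addrs):
    """Hypothetical helper: split contacts into those whose socket IP is
    assigned to this node and those it cannot send from (e.g. the VIP
    on the standby server)."""
    local = {ipaddress.IPv4Address(a) for a in local_addrs}
    ok, skipped = [], []
    for c in contacts:
        (ok if socket_ip(c.socket) in local else skipped).append(c)
    return ok, skipped
```

With the VIP (here 192.0.2.100) held by Server A, every replicated contact is pingable on A and none are on B, which matches the send errors reported only on Server B.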

Worse, if I also enable the "remove_on_timeout_bflag" option on Server B, it
removes the registration on *both* servers after a short delay even though
the UA is still available!

Initially, I also had a problem with the HA IP resource (the VIP): OpenSIPS
would not start on Server B because it was trying to bind to an address that
was not currently assigned to any local interface. It is possible to group
the IP resource with the OpenSIPS service resource to overcome this, but that
would completely break USRLOC contact replication, because the OpenSIPS
service on Server B would not be running for as long as Server A is up. I had
to resort to an option in sysctl.conf that allows processes to bind to a
non-local address.
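A small Python sketch can show the startup issue on a Linux node. It assumes the sysctl option in question is `net.ipv4.ip_nonlocal_bind` (the IPv4 setting that permits binding to non-local addresses) and reads it from `/proc`; the VIP address and helper names are hypothetical. Without that setting, binding to an address not assigned to any local interface normally fails with `EADDRNOTAVAIL`, which is roughly what OpenSIPS runs into for its listen socket at startup.

```python
import errno
import socket
from pathlib import Path

# Assumed location of the IPv4 non-local bind setting on Linux.
SYSCTL = Path("/proc/sys/net/ipv4/ip_nonlocal_bind")


def nonlocal_bind_enabled(path: Path = SYSCTL):
    """Return True/False for the sysctl value, or None if it can't be read."""
    try:
        return path.read_text().strip() == "1"
    except OSError:
        return None


def try_bind(ip: str, port: int = 0) -> bool:
    """Try to bind a UDP socket to ip:port (port 0 = any free port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.bind((ip, port))
            return True
        except OSError as e:
            if e.errno == errno.EADDRNOTAVAIL:
                return False  # address not assigned to any local interface
            raise


if __name__ == "__main__":
    vip = "192.0.2.100"  # hypothetical VIP; substitute your own
    print("ip_nonlocal_bind:", nonlocal_bind_enabled())
    print("can bind", vip, ":", try_bind(vip))
```

Run on the standby node, the second line would be expected to print `False` unless the non-local bind setting is enabled.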

This makes me wonder: what is the purpose of USRLOC contact replication? Is
there some other scenario that could use it without these problems?
I also wonder what difference the db_mode setting in USRLOC makes when
contact replication is used.
 
John Quick
Smartvox Limited
