[OpenSIPS-Users] usrloc restart persistency on seed node

Liviu Chircu liviu at opensips.org
Thu Jan 3 05:33:50 EST 2019

Happy New Year John, Alexey and everyone else!

I just finished catching up with this thread, and I must admit that I now
concur with John's distaste for the asymmetric nature of cluster node restarts.

Although it is correct and gets the job done, the 2.4 "seed" mechanism requires
the admin to conditionally add an "opensipsctl fifo ul_cluster_sync" command
into the startup script of all "seed" nodes.  I think we can do better :)
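
To make it concrete, the workaround looks roughly like the snippet below
(purely illustrative, not part of OpenSIPS; the SEED_NODE flag is a made-up,
deployment-specific marker):

#!/usr/bin/env python
# Illustration only: roughly the kind of post-start hook an admin has to add
# today.  SEED_NODE is a hypothetical, deployment-specific flag, not an
# existing OpenSIPS setting.
import os
import subprocess

if os.environ.get("SEED_NODE") == "yes":
    # ask the freshly restarted seed node to pull usrloc data from the cluster
    subprocess.run(["opensipsctl", "fifo", "ul_cluster_sync"], check=True)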

What if we kept the "seed" concept, but tweaked it such that instead of 

"following a restart, always start in 'synced' state, with an empty dataset"

... it would now mean:

"following a restart or cluster sync command, fall back to a 'synced' state,
with an empty dataset if and only if we are unable to find a suitable sync
candidate within X seconds"

This solution seems to fit all requirements that I've seen posted so 
far.  It is:

* correct (a cluster with at least 1 "seed" node will still never deadlock)
* symmetric (with the exception of cluster bootstrapping, all node restarts
  are identical)
* autonomous (users need not even know about "ul_cluster_sync" anymore!
  Not saying this is necessarily good, but it brings down the learning curve)

The only downside could be that any cluster bootstrap will now last at least
X seconds.  But that seems such a rare event (in production, at least) that we
need not worry about it.  Furthermore, the X seconds will be configurable.
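
To make the proposal more concrete, here is a rough Python sketch of the
behaviour I have in mind (the Node class, its methods and the SYNC_TIMEOUT
value are all invented for illustration; none of this is actual OpenSIPS code):

import time

SYNC_TIMEOUT = 10.0   # the configurable "X seconds"

class Node:
    """Minimal stand-in for a cluster node, just enough to show the idea."""

    def __init__(self, is_seed):
        self.is_seed = is_seed
        self.state = "not synced"
        self.dataset = {}            # in-memory usrloc contacts, simplified

    def find_sync_candidate(self):
        """Return another node already in 'synced' state, or None.
        In reality this would ask the clusterer layer."""
        return None                  # stubbed out for the sketch

    def pull_dataset_from(self, donor):
        self.dataset = dict(donor.dataset)

    def startup_sync(self):
        """Proposed behaviour after a restart or a cluster sync command."""
        deadline = time.time() + SYNC_TIMEOUT
        while time.time() < deadline:
            donor = self.find_sync_candidate()
            if donor is not None:    # normal path, same as any other node
                self.pull_dataset_from(donor)
                self.state = "synced"
                return
            time.sleep(0.5)
        if self.is_seed:
            # only when no donor shows up within X seconds does a seed node
            # fall back to "synced" with an empty dataset (the old behaviour)
            self.dataset.clear()
            self.state = "synced"
        # a non-seed node simply stays "not synced" and keeps waiting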

What do you think?

PS: by "cluster bootstrap" I mean (re)starting all nodes simultaneously.

Best regards,

Liviu Chircu
OpenSIPS Developer

On 02.01.2019 12:24, John Quick wrote:
> Alexey,
> Thanks for your feedback. I acknowledge that, in theory, a situation may
> arise where a node is brought online and all the previously running nodes
> were not fully synchronised, so it is then a problem for the newly started
> node to know which data set to pull. In addition to the example you give -
> lost interconnection - I can also foresee difficulties when several nodes
> all start at the same time. However, I do not see how arbitrarily setting
> one node as "seed" will help to resolve either of these situations unless
> the seed node has more (or better) information than the others.
> I am trying to design a multi-node solution that is scalable. I want to be
> able to add and remove nodes according to current load. Also, to be able to
> take one node offline, do some maintenance, then bring it back online. For
> my scenario, the probability of any node being taken offline for maintenance
> during the year is 99.9% whereas I would say the probability of partial loss
> of LAN connectivity (causing the split-brain issue) is less than 0.01%.
> If possible, I would really like to see an option added to the usrloc module
> to override the "seed" node concept. Something that allows any node
> (including seed) to attempt to pull registration details from another node
> on startup. In my scenario, a newly started node with no usrloc data is a
> major problem - it could take it 40 minutes to get close to having a full
> set of registration data. I would prefer to take the risk of it pulling data
> from the wrong node rather than it not attempting to synchronise at all.
> Happy New Year to all.
> John Quick
> Smartvox Limited
>> Hi John,
>> The following is just my opinion, and I haven't explored the OpenSIPS
>> source code for data syncing.
>> The problem is a little bit deeper. As we have a cluster, we potentially
>> have split-brain.
>> We could disable the seed node altogether and just let the nodes work after
>> a disaster/restart. But that means we can't guarantee data consistency, so
>> the nodes must show this with a <Not in sync> state.
>> Usually clusters use a quorum as the trust point, but for OpenSIPS I think
>> this approach is too expensive. And of course, a quorum needs a minimum of
>> 3 hosts.
>> With 2 hosts, after losing/restoring the interconnection it is impossible
>> to say which host has consistent data. That's why OpenSIPS uses the seed
>> node as an artificial trust point. I think the <seed> node doesn't solve
>> the syncing problems, but it simplifies the overall work.
>> Let's imagine 3 nodes A, B, C. A is Active. A and B lose their
>> interconnection. C is down. Then C comes up and has 2 hosts to sync from.
>> But A already has 200 phones re-registered for some reason, so we have 200
>> conflicts (on node B the same phones are still in memory). Where to sync
>> from? The <seed> host will answer this question in 2 of the cases (A or B).
>> Of course, if C is the <seed>, it will just be happy from the start. And I
>> actually don't know what happens if we now run <ul_cluster_sync> on C. Will
>> it get all the contacts from A and B or not?
>> We operate with specific data which is temporary, so the syncing policy can
>> be more relaxed. Maybe it's a good idea to somehow tie the <seed> node to
>> the Active role in the cluster. But again, if the Active node restarts and
>> is still Active, we will have a problem.
>> -----
>> Alexey Vasilyev
