[OpenSIPS-Users] usrloc restart persistency on seed node

Wed Jan 2 05:24:41 EST 2019

Alexey,

Thanks for your feedback. I acknowledge that, in theory, a situation may
arise where a node is brought online while the previously running nodes
are not fully synchronised, so the newly started node cannot know which
data set to pull. In addition to the example you give -
lost interconnection - I can also foresee difficulties when several nodes
all start at the same time. However, I do not see how arbitrarily setting
one node as "seed" will help to resolve either of these situations unless
the seed node has more (or better) information than the others.

I am trying to design a multi-node solution that is scalable. I want to be
able to add and remove nodes according to current load, and also to take one
node offline, do some maintenance, then bring it back online. For my
scenario, the probability of any node being taken offline for maintenance
during the year is 99.9% whereas I would say the probability of partial loss
of LAN connectivity (causing the split-brain issue) is less than 0.01%.

If possible, I would really like to see an option added to the usrloc module
to override the "seed" node concept. Something that allows any node
(including seed) to attempt to pull registration details from another node
on startup. In my scenario, a newly started node with no usrloc data is a
major problem - it could take 40 minutes to get close to having a full
set of registration data. I would prefer to take the risk of it pulling data
from the wrong node rather than it not attempting to synchronise at all.
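
To make the request concrete, here is a minimal Python sketch of the startup
behaviour described above. It is not OpenSIPS code: the Node class, its
fields and pick_sync_donor() are hypothetical names invented for
illustration, and "pull from the reachable peer holding the most contacts"
is only one assumed policy among several possible ones.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical view of a cluster peer (not an OpenSIPS structure)."""
    name: str
    reachable: bool
    contact_count: int
    is_seed: bool = False


def pick_sync_donor(peers: List[Node]) -> Optional[Node]:
    """Return the peer a starting node should pull usrloc data from.

    Assumed policy: any starting node, seed or not, pulls from the
    reachable peer that holds the most contacts. Returns None when no
    peer is reachable or none has data, in which case the node starts
    empty, as it does today.
    """
    candidates = [p for p in peers if p.reachable and p.contact_count > 0]
    if not candidates:
        return None
    return max(candidates, key=lambda p: p.contact_count)
```

With this policy the seed flag plays no part in the decision, so a restarted
seed node would still attempt a pull whenever a populated peer is reachable.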

Happy New Year to all.

John Quick
Smartvox Limited

> Hi John,
>
> What follows is just my opinion; I have not explored the OpenSIPS source
> code for data syncing.
>
> The problem is a little bit deeper. As we have a cluster, we potentially
> have split-brain.
> We can disable the seed node altogether and just let nodes work after a
> disaster/restart. But it means that we can't guarantee consistency of
> data. So nodes must show this with a <Not in sync> state.
>
> Usually clusters rely on a quorum for trust. But for OpenSIPS I think
> this approach is too expensive. And of course for a quorum we need a
> minimum of 3 hosts.
> For 2 hosts, after losing and restoring the interconnection it is
> impossible to say which host has consistent data. That's why OpenSIPS
> uses the seed node as an artificial trust point. I think the <seed> node
> doesn't solve syncing problems, but it simplifies the overall work.
>
> Let's imagine 3 nodes A, B, C. A is Active. A and B lose their
> interconnection while C is down. Then C comes up and has 2 hosts to sync
> from. But A already has 200 phones re-registered for some reason. So we
> have 200 conflicts (on node B the same phones are still in memory). Where
> to sync from? A <seed> host answers this question in 2 cases (when it is
> A or B). Of course, if C is the <seed> it will just be happy from the
> start. And I actually don't know what happens if we now run
> <ul_cluster_sync> on C. Will it get all the contacts from A and B or not?
>
> We operate with specific data, which is temporary. So the syncing policy
> can be more relaxed. Maybe it's a good idea to somehow connect the <seed>
> node with the Active role in the cluster. But again, if the Active node
> restarts and is still Active, we will have a problem.
>
> -----
> Alexey Vasilyev
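
The three-node scenario in the quoted message can be sketched in a few lines
of Python. This is only an illustration under stated assumptions: the total
of 1000 phones and the timestamps are invented (only the 200 re-registered
phones come from the example), find_conflicts() and merge_latest() are
hypothetical helpers, and "keep the newest registration" is one assumed
reading of a "more relaxed" syncing policy, not what OpenSIPS actually does.

```python
from typing import Dict, List

# Contact table: phone identifier -> time of last registration (seconds).
Contacts = Dict[str, int]


def find_conflicts(a: Contacts, b: Contacts) -> List[str]:
    """Phones present on both nodes with different registration times."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


def merge_latest(a: Contacts, b: Contacts) -> Contacts:
    """Assumed relaxed policy: for each phone keep the newest registration."""
    merged = dict(a)
    for phone, ts in b.items():
        if phone not in merged or ts > merged[phone]:
            merged[phone] = ts
    return merged


# Before the interconnection is lost, A and B hold the same phones
# (1000 is an assumed figure).
node_b = {f"phone{i}": 100 for i in range(1000)}
node_a = dict(node_b)

# After the split, 200 phones re-register on A only.
for i in range(200):
    node_a[f"phone{i}"] = 500

# C starts up and sees both A and B.
conflicts = find_conflicts(node_a, node_b)
node_c = merge_latest(node_a, node_b)
print(len(conflicts))  # 200
```

Because registrations are temporary and carry their own timing, a merge of
this kind leaves C with the same data as A in this example, whichever node
happens to be flagged as seed.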