[OpenSIPS-Users] Serialforking failure, with lcr:parse_phostport: too many colons in udp:: 0

Fri Oct 29 12:22:41 CEST 2010

Hi again Bogdan,

I'm sorry its takes such a long time to reply, considering the lightning-
fast supportservice you are providing for all of us on this list :-)

Anyway, of course your suggestion helped, so I know have serialforking
working!

A few notes though. It seems like I need the serialize_branches() to return
a useful returncode as well. Otherwise my script cannot differentiate between
when serialforking really is being done, or when normal proxy or parallell fork
is in progress. (In which case I wanted normal timer C)
Returning 1 in the end of serialize.c, instead of 0 which is returned when nothing
is performed by the call to serialize_branches() took care of that.
This cause action.c do LOG_ERR though, so I changed that to only log error if
the return from serialize_branches was < 0.

When you have time, I am very interessted in your views on my other issues.

Regards
Taisto Qvist

Bogdan-Andrei Iancu skrev 2010-10-13 23:16:
> Hi Taisto,
>
> Your problem is not timer related or how serial forking is done in opensips (I will 
> comment on these in a later reply).
>
> Right now, the quick answer to fix your problem: failure route must be re-armed 
> after each branch -> this is why your failure route does not catches the end of the 
> second branch. Adding a t_on_failure("1") before t_relay() in failure route will 
> fix your problem.
>
> Regards,
> Bogdan
>
>
>
> Taisto Qvist wrote:
>> Hi Bogdan,
>>
>> I've now been trying with some tests, and I cant really get it to work,
>> since the transactionlayer on the server transaction returns a 408
>> back to the UAC before serial forking has ended.
>> This seems a little bit related to what I once commented on a long time
>> ago regarding handling of timer C and the fact that the timer c seems to
>> be quite "tied" to Timer B
>>
>> When the fr_timer pops, (causing the CANCEL to be sent so that we can move
>> on to the next serial-fork-target), the tm-layer seems store this timer-pop
>> as a 408 response
>>
>> 20:41:44 osips[4686]: DBG:tm:utimer_routine: timer routine:4,tl=0xb5b6770c 
>> next=(nil), timeout=649300000
>> 20:41:55 osips[4686]: DBG:tm:timer_routine: timer routine:1,tl=0xb5b67728 
>> next=(nil), timeout=660
>> 20:41:55 osips[4686]: DBG:tm:final_response_handler: stop retr. and send CANCEL 
>> (0xb5b675c0)
>> 20:41:55 osips[4686]: DBG:tm:t_should_relay_response: T_code=180, new_code=408
>> 20:41:55 osips[4686]: DBG:tm:t_pick_branch: picked branch 0, code 408 (prio=800)
>>
>> As the capture and log I've attached indicates, I am not able to perform a three
>> step serial fork. I have three Uas:es registered with 1.0, 0.9, and 0.8 in q-values.
>>
>> First timer pop causes a CANCEL, and a new INVITE towards UAS with q=0.9, but when
>> it pops the second time, TM still cancels the second target, but instead of 
>> continuing
>> with the third, it sends a 408 towards the UAC.
>>
>> It might be something with my script-handling in the failure_route, so here it is:
>>
>> failure_route[1]
>> {
>>     if ( t_was_cancelled() )
>>     {
>>         log(2, "transaction was cancelled by UAC\n");
>>     }
>>     xlog("(lab1) - In FailureRoute: branches=$(branch(uri)[*])\n");
>>     if ( isflagset(1) )
>>     {
>>         log(2,"(lab1) - 3++ Received, attempting serial fork!\n");
>>         next_branches();
>>         switch ( $retcode )
>>         {
>>             case 1:
>>                    log(2,"(lab1) - More branches left, rollOver timer set.");
>>                    $avp(s:timerC) = 12;
>>                    setflag(1);  # Do I need this? Should I use branchflags instead?
>>             break;
>>             case 2:
>>                    log(2,"(lab1) - Last branch, timerC set to 60 sec");
>>                    $avp(s:timerC) = 60;
>>             break;
>>
>>             default:
>>                    log(2,"(lab1) - No more serial fork targets.");
>>                    exit;
>>         }
>>         if ( !t_relay() )
>>         {
>>             log(2,"(lab1) - Error during relay for serial fork!\n");
>>         }
>>     }
>>     else
>>     {
>>         log(2,"(lab1) - 3++ result. Serialforking not available.\n");
>>     }
>>
>> }
>>
>> When I say that it seems related to another issue I commented on a long time
>> ago, I am referring to the general handling of Timer C, which doesn't seem to
>> be a separate timer, but is reusing the timerB.
>>
>> When the timer pops after the normal 180 seconds, the TM layer will *instantly*
>> generate a 408 response on the server txn, while at the same time generating
>> the CANCEL attempting to terminate the client txn.
>> To me, this is wrong, but maybe I am suppose to handle this in the failure_route?
>>
>> What I would expect is that the CANCEL will cause a 487 response from the UAS,
>> and this will be the final response sent to the UAC.
>> Also by behaving this way, we may cause a protocol violation even though the risk
>> is small.
>>
>> Once timer C pops we send the CANCEL hoping that it will cause a 487. BUT, it is
>> quite possible that before the cancel is received by the UAS, it sends a 200 to
>> the INVITE! Even IF the CANCEL receives a 2xx response, we may still get a 2xx
>> response to the INVITE.
>> But with the current behavior of opensips, this would cause opensips to proxy
>> TWO final responses on the server txn, once being the initial 408 sent by the
>> txn on timer C timeout, and then the real/actual 2xx sent by the uas.
>>
>> I've also seen a similar problem with 6xx responses received on a branch during
>> forking.
>> Opensips forwards the 6xx *before* the remaining client txns has completed, and
>> there is no guarantee that these client txns will all terminate with 487 even
>> if opensips tries to CANCEL all of them asap.
>> They may still return 2xx to the invite, which would cause a forwarding of both
>> a 6xx and a 2xx on the server txn. This scenario is even mentioned in rfc3261.
>>
>> So all these three problems have in common that the server txn seems to be
>> terminating a bit early, before the client side has fully completed, but as
>> I said, it might at least partially be something I should handle in my
>> failure_routes...?
>>
>> Thanks for all your help.
>> Regards
>> Taisto Qvist
>>
>>
>> Bogdan-Andrei Iancu skrev 2010-10-06 17:04:
>>> Hi Taisto,
>>>
>>> could you test the rev 7248 on trunk for solution 2) ? if ok, I will backport to 1.6
>>>
>>> Regards,
>>> Bogdan
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.opensips.org/pipermail/users/attachments/20101029/a79890a9/attachment.htm