[OpenSIPS-Users] crashing in 2.2.2

Richard Robson rrobson at greenlightcrm.com
Fri Mar 3 08:15:28 EST 2017


More cores

http://pastebin.com/MXW2VBhi
http://pastebin.com/T7JFAP2U
http://pastebin.com/u44aaVpWquit
http://pastebin.com/SFKKcGxE
http://pastebin.com/dwSgMsJi
http://pastebin.com/9HdGLm96

I've put 2.2.3 on the dev box now and will try to replicate the fault 
there over the weekend, but it's difficult to replicate the traffic 
artificially. I can't do it on the live gateways because it will affect 
customer traffic.

Regards,

Richard


On 03/03/2017 11:28, Richard Robson wrote:
> I've revisited the gateway failover mechanism I had developed in order 
> to reroute calls to the next gateway on 500s caused by capacity limits 
> on the gateways we are using.
>
> We have 3 gateways from one carrier and one from another. The 3 have a 
> 4 cps limit and will return a 503 or 500 if we breach this. The single 
> gateway from the other carrier has plenty of capacity and should not be 
> a problem, so we want to catch these responses and route to the next 
> gateway.
>
> We are counting the CPS and channel limits and are routing to the next 
> gateway if we exceed the limit set, but there are still occasions 
> where a 5XX is generated, which results in a rejected call.
>
>
> We want to stop these rejected calls and therefore want to implement 
> the failover mechanism for the 5XX responses. For 6 months we have 
> been failing over, without a problem, if we think the counts are too 
> high on any one gateway. But when I implement a failover on a 5XX 
> response, OpenSIPS starts crashing.
>
> It's difficult to generate artificial traffic to mimic the real 
> traffic, but I've not had a problem with the script in testing. Last 
> night I rolled out the new script, but by 09:15 this morning, as the 
> traffic ramped up, OpenSIPS had started crashing: 10 times in 5 
> minutes. I rolled back the script and it restarted OK and has not 
> crashed since. Therefore the failover mechanism in the script is where 
> the crash is happening.
>
> Core dump: http://pastebin.com/CqnESCm4
>
> I'll add more dumps later
>
> Regards,
>
> Richard
>
>
> This is the failure route catching the 5XX:
>
> failure_route[dr_fo] {
>         xlog("[dr] Received reply to method $rm From: $fd, $fn, $ft, $fu, $fU, $si, $sp, To: $ru");
>         if (t_was_cancelled()) {
>                 xlog("[dr]call cancelled by internal caller");
>                 rtpengine_manage();
>                 do_accounting("db", "cdr|missed");
>                 exit;
>         }
>
>         if ( t_check_status("[54]03")) {
>                 route(relay_failover);
>         }
>         if ( t_check_status("500")) {
>                 route(relay_failover);
>         }
>
>         do_accounting("db", "cdr|missed");
>         rtpengine_manage();
>         exit;
> }
>
> This is the route taken on the failure:
>
>
> route[relay_failover]{
>
>         if (use_next_gw()) {
>                 xlog("[relay_failover-route] Selected Gateway is $rd");
>                 $avp(trunkratelimit)=$(avp(attrs){s.select,0,:});
>                 $avp(trunkchannellimit)=$(avp(attrs){s.select,1,:});
>
>                 ####### check channel limit ######
>                 get_profile_size("outbound","$rd","$var(size)");
>                 xlog("[relay_failover-route] Selected Gateway is $rd var(size) = $var(size)");
>                 xlog("[relay_failover-route] Selected Gateway is $rd avp(trunkchannellimit) = $avp(trunkchannellimit)");
>                 xlog("[relay_failover-route] Selected Gateway is $rd result = ( $var(size) > $avp(trunkchannellimit))");
>                 if ( $(var(size){s.int}) > $(avp(trunkchannellimit){s.int})) {
>                         xlog("[relay_failover-route] Trunk $rd exceeded $avp(trunkchannellimit) concurrent calls $var(size)");
>                         route(relay_failover);
>                 }
>         } else {
>                send_reply("503", "Gateways Exhausted");
>                exit;
>         }
>
>         ##### We need to check Rate Limiting #######
>         if (!rl_check("$rd", "$(avp(trunkratelimit){s.int})", "TAILDROP")) { # Check Rate limit $avp needs changing
>                 rl_dec_count("$rd"); # decrement the counter since we've not "used" one
>                 xlog("[ratelimiter-route] [Max CPS: $(avp(trunkratelimit){s.int}) Current CPS: $rl_count($rd)] Call to: $rU from: $fU CPS exceeded, delaying");
>                 $avp(initial_time)=($Ts*1000)+($Tsm/1000);
>                 async(usleep("200000"),relay_failover_delay);
>                 xlog("Should not get here!!!! after async request");
>         } else {
>                 xlog("[relay_outbound-route] [Max CPS: $avp(trunkratelimit) Current CPS: $rl_count($rd)] Call to: $rU from: $fU not ratelimited");
>         }
>
>         t_on_failure("dr_fo");
>         do_accounting("db", "cdr|missed");
>         rtpengine_manage();
>         if (!t_relay()) {
>                 xlog("[relay-route] ERROR: Unable to relay");
>                 send_reply("500","Internal Error");
>                 exit;
>         }
> }
>
>
>
>
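
For anyone who wants to reason about the gateway selection without a running OpenSIPS, the channel-limit part of relay_failover can be modelled roughly as below. This is only a sketch in Python, not the script itself. It assumes, as the two {s.select,N,:} lines suggest, that the gateway attrs string is "ratelimit:channellimit". The names Gateway, parse_attrs and next_usable_gateway are hypothetical, and the active_calls dictionary stands in for what get_profile_size("outbound", ...) returns. It also walks the gateway list in a loop instead of calling the route recursively, which is a simplification and not a claim about where the crash occurs.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Gateway:
    host: str   # hypothetical stand-in for what the script sees as $rd
    attrs: str  # assumed format "ratelimit:channellimit"


def parse_attrs(attrs: str) -> Tuple[int, int]:
    """Split "ratelimit:channellimit" into two integers.

    Mirrors {s.select,0,:} and {s.select,1,:} followed by {s.int}.
    """
    fields = attrs.split(":")
    return int(fields[0]), int(fields[1])


def next_usable_gateway(
    gateways: List[Gateway],
    active_calls: Dict[str, int],
    start: int = 0,
) -> Optional[Tuple[int, Gateway]]:
    """Return (index, gateway) for the first gateway at or after `start`
    whose active call count does not exceed its channel limit.

    Returns None when every remaining gateway is over its limit, which is
    the case where the script replies 503.
    """
    for index in range(start, len(gateways)):
        gateway = gateways[index]
        _, channel_limit = parse_attrs(gateway.attrs)
        # Same comparison as the script: skip only when size > limit.
        if active_calls.get(gateway.host, 0) > channel_limit:
            continue
        return index, gateway
    return None
```

Calling next_usable_gateway again with start set to the returned index plus one models what happens when the chosen gateway answers with a 5XX and the failure route asks for the next one.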


-- 
Richard Robson
Greenlight Support
01382 843843
support at greenlightcrm.com



