[OpenSIPS-Users] crash after failover

microx acmicrox at gmail.com
Tue Apr 16 09:26:20 CEST 2013


Hi all, 

In my environment, I use two active SIP proxy servers A and B that share one
backend database for high performance and failover. When active proxy server
A goes down, proxy server B will fetch the dialog information (originally
processed by the proxy server A) from the database into its local memory
when receiving a BYE. Specifically, proxy server B executes
avp_db_query("update dialog set caller_sock='udp:127.0.0.1:5060',
callee_sock='udp:127.0.0.1:5060' where caller_sock='udp:10.100.13.12:5060'")
and then calls exec_msg("/usr/local/sbin/opensipsctl fifo dlg_db_sync").
Note that udp:10.100.13.12:5060 is the processing socket of proxy server A. 

With the above approach, proxy server B can handle BYE requests whose
associated dialogs are established by proxy server A in the beginning.
However, when a number of BYE requests reach proxy server B simultaneously
(60 BYES within one second), proxy server B may crash, resulting from an
ineligible query "Query: update dialog set
state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
where dlg_id=2972999775959". A part of the log is below.

Although the query is not my used avp_db_query, I don't know whether the
root cause is wrong setting of my config for failover. Any comment is
greatly appreciated. 

Best regards, 
Chen-Che 

: DBG:db_postgres:db_postgres_submit_query: 0x7f0dec096010
PQsendQuery(update dialog set
state=4,timeout=1366102836,caller_cseq='0',callee_cseq='1',caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles='O1S1R1#<null>|',script_flags=0,flags=144
where dlg_id=2822250379813) 
: DBG:core:db_new_result: allocate 48 bytes for result set at 0x7f0dec0961a8 
: DBG:db_postgres:db_postgres_store_result: 0x7f0dec096010
PQresultStatus(PGRES_COMMAND_OK) PQgetResult(0x10f0720) 
: DBG:db_postgres:free_query: PQclear(0x10f0720) result set 
: DBG:core:db_free_rows: freeing 0 rows 
: DBG:core:db_free_result: freeing result set at 0x7f0dec0961a8 
: DBG:dialog:dialog_update_db: updating existing dialog 0x7f0dc4271750 
: DBG:dialog:set_final_update_cols: DLG vals and profiles should not be
saved[0:0] 
: DBG:db_postgres:db_postgres_submit_query: 0x7f0dec096010
PQsendQuery(update dialog set
state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
where dlg_id=2972999775959) 
: DBG:db_postgres:db_postgres_submit_query: 0x7f0dec096010 PQsendQuery
failed: ERROR:  null value in column "caller_cseq" violates not-null
constraint#012DETAIL:  Failing row contains (2972999775959,
20-31706 at 192.168.8.222, sip:100000039 at thcloud.com, 31706SIPpTag00Caller20,
sip:100000040 at thcloud.com, 22109SIPpTag01Callee20, null, null, null, null,
0, 0, null, null, sip:100000039 at 192.168.8.222:6060,
sip:10005 at 192.168.8.225:6060;transport=UDP, udp:127.0.0.1:5060,
udp:127.0.0.1:5060, 4, 1366091036, 1366090793, null, null, 0, 0).#012 Query:
update dialog set
state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
where dlg_id=2972999775959 
: DBG:db_postgres:free_query: PQclear(0x10f1360) result set 
: ERROR:db_postgres:db_postgres_submit_query: 0x7f0dec096010 PQsendQuery
Error: ERROR:  null value in column "caller_cseq" violates not-null
constraint#012DETAIL:  Failing row contains (2972999775959,
20-31706 at 192.168.8.222, sip:100000039 at thcloud.com, 31706SIPpTag00Caller20,
sip:100000040 at thcloud.com, 22109SIPpTag01Callee20, null, null, null, null,
0, 0, null, null, sip:100000039 at 192.168.8.222:6060,
sip:10005 at 192.168.8.225:6060;transport=UDP, udp:127.0.0.1:5060,
udp:127.0.0.1:5060, 4, 1366091036, 1366090793, null, null, 0, 0).#012 Query:
update dialog set
state=4,timeout=1366090793,caller_cseq=NULL,callee_cseq=NULL,caller_ping_cseq=0,callee_ping_cseq=0,vars=NULL,profiles=NULL,script_flags=0,flags=0
where dlg_id=2972999775959 
: ERROR:core:db_do_update: error while submitting query 
: DBG:core:db_new_result: allocate 48 bytes for result set at 0x7f0dec0961a8 
: DBG:db_postgres:db_postgres_store_result: 0x7f0dec096010
PQresultStatus(PGRES_FATAL_ERROR) PQgetResult((nil)) 
: ERROR:db_postgres:db_postgres_store_result: 0x7f0dec096010 - invalid
query, execution aborted 
: ERROR:db_postgres:db_postgres_store_result: 0x7f0dec096010:
PGRES_FATAL_ERROR 
: ERROR:db_postgres:db_postgres_store_result: 0x7f0dec096010: 
: DBG:core:db_free_rows: freeing 0 rows 
: DBG:core:db_free_result: freeing result set at 0x7f0dec0961a8 
: WARNING:db_postgres:db_postgres_update: unexpected result returned 
: ERROR:dialog:dialog_update_db: could not update database info 
: DBG:core:pool_remove: connection still kept in the pool 
Apr 16 13:44:46  /usr/sbin/opensips[18220]: last message repeated 2 times 
: DBG:db_postgres:db_postgres_submit_query: 0x7f0dec095bc0
PQsendQuery(delete from location where expires<'2013-04-16 00:44:47' AND
expires!='1969-12-31 18:00:00') 
: DBG:core:db_new_result: allocate 48 bytes for result set at 0x7f0dec0961a8 
: DBG:db_postgres:db_postgres_store_result: 0x7f0dec095bc0
PQresultStatus(PGRES_COMMAND_OK) PQgetResult(0x10f0720) 
: DBG:db_postgres:free_query: PQclear(0x10f0720) result set 
: DBG:core:db_free_rows: freeing 0 rows 
: DBG:core:db_free_result: freeing result set at 0x7f0dec0961a8 
: DBG:core:pool_remove: removing connection from the pool 
: DBG:db_postgres:db_postgres_free_connection: PQfinish(0x10e7d50) 
: DBG:db_postgres:db_postgres_free_connection: pkg_free(0x7f0dec095cc0) 
: DBG:tm:tm_shutdown: tm_shutdown : start 
: DBG:tm:unlink_timer_lists: emptying DELETE list 
: DBG:tm:tm_shutdown: emptying hash table 
: DBG:tm:run_trans_callbacks: trans=0x7f0dc42f86f8, callback type 4096, id 1
entered 
: DBG:tm:run_trans_callbacks: trans=0x7f0dc42fcfb0, callback type 4096, id 1
entered 
: DBG:tm:run_trans_callbacks: trans=0x7f0dc42fe9f8, callback type 4096, id 1
entered 
: DBG:tm:run_trans_callbacks: trans=0x7f0dc42fa140, callback type 4096, id 1
entered 
: DBG:tm:run_trans_callbacks: trans=0x7f0dc4272290, callback type 4096, id 1
entered 
: DBG:tm:tm_shutdown: releasing timers 
: DBG:tm:tm_shutdown: removing semaphores 
: DBG:tm:tm_shutdown: destroying callback lists 
: DBG:tm:tm_shutdown: tm_shutdown : done 
Apr 16 13:45:46 qri-cdrc-application-node1 /usr/sbin/opensips[18220]:
CRITICAL:core:sig_alarm_abort: BUG - shutdown timeout triggered, dying... 



--
View this message in context: http://opensips-open-sip-server.1449251.n2.nabble.com/crash-after-failover-tp7585845.html
Sent from the OpenSIPS - Users mailing list archive at Nabble.com.



More information about the Users mailing list