[OpenSIPS-Users] Chasing down unreleased memory
Jock McKechnie
jock.mckechnie at gmail.com
Tue Oct 9 09:09:51 EDT 2018
Good morning Răzvan,
During a standard test, not from OpenSIPS itself - the perl
application that controls the calls makes an API request and in a
small percentage of cases it will relay an ERROR back if the API
failed, but we're talking well below 1% of calls.
The only time OpenSIPS will start throwing ERRORs is if I've let it
run past its ability to clear memory (the reserve of SHM has been used
up) and it gets very upset about not being able to free memory
fragments up for itself until eventually the Linux kernel will OOM
kill the process.
There's no trick to using mc_compact() I'm missing is there? Something
foolish I'm doing that's causing it?
In the main body of the routing script I have a
if (is_method("INVITE") && !has_totag()) {
record_route();
set_flag(22); # SIP tracing
sip_trace("tid", "d", "sip");
route(1);
}
And inside route(1) {} amongst the perl API calls, some AVP
manipulation and R-URI rewrites (as mentioned, $rU/$rd overwriting)
and setting up the callbacks (t_on_failure, t_on_reply, t_on_branch)
and some $acc_extra() setup I have:
mc_compact("P-Asserted-Identity|Remote-Party-ID|X-UCID");
And... that's it. One call in the route() block per call.
Is there any kind of additional debugging I could do? I could throw
OpenSIPS into full debug mode and write its log to a RAMdisk or
something, but I fear that might cause other problems due to the extra
load of writing logs - not to mention the insane amount of logging it
would make :) But I'm willing to try anything to help out and solve
this.
As always, I appreciate your time;
- Jock
On Mon, Oct 8, 2018 at 3:01 AM, Răzvan Crainea <razvan at opensips.org> wrote:
> Hi, Jock!
>
> Regarding the SHM issue, are you seeing any ERROR messages in your logs?
>
> Best regards,
> Razvan
>
> On 10/2/18 6:07 PM, Jock McKechnie wrote:
>>
>> Here's the package dump from the test I ran - I'm not able to identify
>> anything "Better" in this output, but I'll leave it up to a trained
>> eye.
>>
>> I git pulled opensips-2.4 and built that with all of the memory
>> debugging included and then ran the same tests I ran yesterday.
>> Hopefully this is the right release, if not, let me know.
>>
>> - Jock
>>
>> Memory status (pkg):
>> qm_status (0x7f4cf82d7010):
>> heap size= 134217728
>> used= 11682552, used+overhead=12043128, free=122174600
>> max used (+overhead)= 12067040
>> INFO:core:sig_usr: signal 15 received
>> Memory status (pkg):
>> qm_status (0x7f4cf82d7010):
>> heap size= 134217728
>> used= 11682552, used+overhead=12043128, free=122174600
>> max used (+overhead)= 12067040
>> dumping summary of all alloc'ed. fragments:
>> ----------------------------------------------------
>> total_bytes | num_allocations x [file: func, line]
>> ----------------------------------------------------
>> 8 : 1 x [compression.c: build_hdr_masks, line 229]
>> 128 : 1 x [compression_helpers.c: parse_whitelist, line 160]
>> 16 : 1 x [ut.h: pkg_str_extend, line 841]
>> 16 : 2 x [script_var.c: add_var, line 59]
>> 2097152 : 1 x [io_wait.c: init_io_wait, line 568]
>> 45056 : 351 x [cfg.lex: addstr, line 890]
>> 64 : 2 x [rr_cb.c: register_rrcb, line 57]
>> 39000 : 195 x [route_struct.c: mk_action, line 106]
>> 13552 : 105 x [pvar.c: pv_parse_format, line 4119]
>> 272 : 17 x [cfg.y: new_string, line 2703]
>> 496 : 12 x [mod_fix.c: fixup_spve, line 948]
>> 16 : 1 x [siptrace.c: sip_trace_fixup, line 1279]
>> 728 : 1 x [siptrace.c: parse_siptrace_id, line 525]
>> 24 : 1 x [compression_helpers.c: search_hdr, line 122]
>> 160 : 10 x [cfg.y: yyparse, line 999]
>> 864 : 18 x [evi/evi_params.c: evi_param_create, line 42]
>> 248 : 10 x [mem/shm_mem.c: init_shm_post_yyparse, line 516]
>> 6291456 : 1 x [io_wait.c: init_io_wait, line 559]
>> 128 : 1 x [compression.c: set_wh_param, line 420]
>> 96 : 2 x [script_var.c: add_var, line 52]
>> 48 : 2 x [sl_cb.c: register_slcb, line 61]
>> 368 : 9 x [sipmsgops.c: fixup_method, line 886]
>> 8 : 1 x [acc_logic.c: do_acc_fixup, line 1221]
>> 64 : 4 x [socket_info.c: fix_socket_list, line 641]
>> 48 : 1 x [sr_module_deps.c: alloc_module_dep, line 54]
>> 8 : 1 x [compression_helpers.c: parse_whitelist, line 167]
>> 704 : 6 x [route.c: fix_expr, line 189]
>> 48 : 1 x [ipc.c: ipc_register_handler, line 120]
>> 80 : 1 x [mi/mi_trace.c: try_load_trace_api, line 55]
>> 32 : 1 x [transformations.c: tr_parse_nparam, line 2490]
>> 1600 : 20 x [pvar.c: pv_add_extra, line 4690]
>> 8 : 1 x [compression.c: build_hdr_masks, line 242]
>> 1832 : 15 x [route.c: fix_actions, line 750]
>> 144 : 16 x [map.c: map_get, line 150]
>> 32 : 1 x [map.c: map_create, line 79]
>> 3145728 : 1 x [io_wait.c: init_io_wait, line 621]
>> 32 : 1 x [transformations.c: tr_parse_nparam, line 2514]
>> 8192 : 1 x [mi_writer.c: mi_writer_init, line 49]
>> 48 : 1 x [dlg_vals.c: fetch_dlg_value, line 180]
>> 40 : 1 x [flags.c: get_flag_id_by_name, line 202]
>> 64 : 4 x [cfg.y: yyparse, line 514]
>> 192 : 4 x [cfg.y: mk_listen_id, line 2685]
>> 16 : 1 x [acc_logic.c: do_acc_fixup, line 1184]
>> 240 : 6 x [acc_extra.c: add_extra, line 155]
>> 64 : 4 x [socket_info.c: new_sock_info, line 125]
>> 960 : 4 x [socket_info.c: new_sock_info, line 119]
>> 8320 : 80 x [cfg.y: yyparse, line 1456]
>> 24 : 1 x [dlg_hash.c: state_changed_event_init, line 863]
>> 144 : 3 x [transformations.c: parse_transformation, line 2376]
>> 304 : 5 x [script_cb.c: add_callback, line 60]
>> 3744 : 1 x [mi/mi.c: register_mi_cmd, line 146]
>> 864 : 18 x [sr_module.c: register_module, line 150]
>> 104 : 4 x [socket_info.c: fix_socket_list, line 720]
>> 6720 : 120 x [route_struct.c: mk_elem, line 70]
>> 96 : 4 x [cfg.y: yyparse, line 1175]
>> 56 : 7 x [cfg.y: yyparse, line 1236]
>> 24 : 1 x [acc.c: init_acc_evi, line 1094]
>> 768 : 7 x [mod_fix.c: fixup_regexp, line 310]
>> 208 : 2 x [context.c: register_context_destroy, line 66]
>> 32 : 4 x [socket_info.c: fix_socket_list, line 590]
>> 96 : 1 x [context.c: context_alloc, line 119]
>> 8 : 1 x [pvar.c: pv_init_extra_list, line 4626]
>> 104 : 1 x [transformations.c: tr_parse_nparam, line 2476]
>> 160 : 1 x [acc_extra.c: add_tag, line 111]
>> 4104 : 1 x [xlog.c: buf_init, line 69]
>> 1232 : 1 x [compression_helpers.c: wrap_realloc, line 444]
>> 16 : 1 x [io_wait.c: init_io_wait, line 576]
>> 4368 : 78 x [route_struct.c: mk_exp, line 54]
>> 896 : 16 x [map.c: map_get, line 139]
>> 80 : 2 x [pvar.c: new_pv_context, line 4765]
>> ----------------------------------------------------
>> dumping free list stats :
>> hash= 380. fragments no.: 1, unused: 0#012#011#011 bucket
>> size: 3040 - 3040 (first 3040)
>> hash= 462. fragments no.: 1, unused: 0#012#011#011 bucket
>> size: 3696 - 3696 (first 3696)
>> hash= 492. fragments no.: 1, unused: 0#012#011#011 bucket
>> size: 3936 - 3936 (first 3936)
>> hash= 1655. fragments no.: 1, unused: 0#012#011#011 bucket
>> size: 13240 - 13240 (first 13240)
>> hash= 2061. fragments no.: 1, unused: 0#012#011#011 bucket
>> size: 67108864 - 134217728 (first 122150688)
>> -----------------------------
>>
>> On Tue, Oct 2, 2018 at 8:19 AM, Jock McKechnie <jock.mckechnie at gmail.com>
>> wrote:
>>>
>>> Good morning Răzvan,
>>>
>>> I completely understand being busy, I was confident you weren't just
>>> ignoring me :)
>>>
>>> I will build the latest from git with the debugging bits and pieces
>>> and report back after testing.
>>>
>>> I had been running the latest 2.4.2 off yum.opensips.org when I
>>> originally identified the problem, but have since hand compiled it to
>>> enable all the debugging widgets:
>>> version: opensips 2.4.2 (x86_64/linux)
>>> flags: STATS: On, SHM_EXTRA_STATS, DISABLE_NAGLE, USE_MCAST, SHM_MMAP,
>>> PKG_MALLOC, QM_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
>>> ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
>>> MAX_URI_SIZE 1024, BUF_SIZE 65535
>>> poll method support: poll, epoll, sigio_rt, select.
>>> git revision: 66114a66c
>>> main.c compiled on 07:56:12 Sep 20 2018 with gcc 4.8.5
>>>
>>> I'll get back to you soon;
>>>
>>> - Jock
>>>
>>> On Tue, Oct 2, 2018 at 4:00 AM, Răzvan Crainea <razvan at opensips.org>
>>> wrote:
>>>>
>>>> Hi, Jock!
>>>>
>>>> Can you confirm exactly the opensips version you are running? (opensips
>>>> -V).
>>>>
>>>> Best regards,
>>>> Razvan
>>>>
>>>>
>>>> On 10/2/18 11:41 AM, Răzvan Crainea wrote:
>>>>>
>>>>>
>>>>> Hi, Jock!
>>>>>
>>>>> Good news :) I've just figured out the pkg memleak in mc_compact, and
>>>>> I've
>>>>> already pushed a fix upstream. Please pull the latest version and run a
>>>>> few
>>>>> tests, and let us know how they go.
>>>>> In the meantime, I will investigate the shm leak.
>>>>>
>>>>> Best regards,
>>>>> Razvan
>>>>>
>>>>> On 10/2/18 10:13 AM, Răzvan Crainea wrote:
>>>>>>
>>>>>>
>>>>>> Hi, Jock!
>>>>>>
>>>>>> I am pretty busy these days, so I couldn't pay too much attention to
>>>>>> this
>>>>>> thread. Nevertheless, I will try to prioritize it this week and get
>>>>>> back to
>>>>>> you.
>>>>>> I have also tracked down the pkg mem leak to the mc_compact() module,
>>>>>> but
>>>>>> I can't pinpoint for now exactly the issue. Although I have some
>>>>>> assumptions, I can't be sure until I run some tests.
>>>>>> If I need extra info during my debug, I will definitely let you know,
>>>>>> so
>>>>>> keep an eye on this thread.
>>>>>>
>>>>>> Best regards,
>>>>>> Răzvan
>>>>>>
>>>>>> On 10/1/18 9:14 PM, Jock McKechnie wrote:
>>>>>>>
>>>>>>>
>>>>>>> Greetings;
>>>>>>>
>>>>>>> I should have done this last week but, to be honest, it just didn't
>>>>>>> occur to me. I pulled the compression module out (and commented out
>>>>>>> mc_compact()) and re-ran my tests and the memory usage was massively
>>>>>>> lower (losing 100k of memory per minute vs 3-5MB/minute) and within a
>>>>>>> few minutes of calling ceasing almost all of the memory was released
>>>>>>> (went from 42MB down to 3.8MB within 6 minutes).
>>>>>>>
>>>>>>> So it appears that there's something leaking in the compression
>>>>>>> module.
>>>>>>>
>>>>>>> I'm going to poke around a few more tests (include compression.so
>>>>>>> without actually calling it, etc) but I wanted to pass this on as
>>>>>>> there appears to be a bug that needs addressing.
>>>>>>>
>>>>>>> Is there anything I can provide that might be useful in debugging
>>>>>>> this?
>>>>>>>
>>>>>>> As always, thank you for your help.
>>>>>>>
>>>>>>> - Jock
>>>>>>>
>>>>>>> On Fri, Sep 21, 2018 at 8:12 AM, Jock McKechnie
>>>>>>> <jock.mckechnie at gmail.com> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Thank you very much, Răzvan;
>>>>>>>>
>>>>>>>> We are using the compression module for mc_compact(). We use SIP
>>>>>>>> headers for passing around data (don't ask) and the compact headers
>>>>>>>> are mandatory or else we hit MTU/fragmentation problems. We're on
>>>>>>>> the
>>>>>>>> cusp of moving to SIP/TCP but haven't gotten there quite yet.
>>>>>>>> The mc_compact() lives in the main route[1] block and is simply
>>>>>>>> this,
>>>>>>>> in whole:
>>>>>>>> mc_compact("P-Asserted-Identity|Remote-Party-ID|X-UCID");
>>>>>>>>
>>>>>>>> And my apologies for not including the shm dump segment - please
>>>>>>>> find
>>>>>>>> it below.
>>>>>>>> https://pastebin.com/itSTER7Y
>>>>>>>>
>>>>>>>> Let me know what else I can provide, this is fairly easy to
>>>>>>>> replicate
>>>>>>>> (although it takes about an hour of runtime pummeling it to get nice
>>>>>>>> easy to read dumps). Thank you, again, very much.
>>>>>>>>
>>>>>>>> - Jock
>>>>>>>>
>>>>>>>> On Fri, Sep 21, 2018 at 3:01 AM, Răzvan Crainea
>>>>>>>> <razvan at opensips.org>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi, Jock!
>>>>>>>>>
>>>>>>>>> I think you are actually heating two memory leaks: one in private
>>>>>>>>> memory and
>>>>>>>>> one in shared memory.
>>>>>>>>> In the memory dump you sent I can spot a pkg mem related to (I
>>>>>>>>> believe) the
>>>>>>>>> compression module. Can you tell me what functions of the
>>>>>>>>> compression
>>>>>>>>> module
>>>>>>>>> you are using and what flags are you using for them?
>>>>>>>>>
>>>>>>>>> Besides that, you claim there is a shared memory leak as well. Can
>>>>>>>>> you
>>>>>>>>> send
>>>>>>>>> us the logs for the shared memory? They are dumped by the first
>>>>>>>>> process
>>>>>>>>> (attendant) and should start with the string "Memory status
>>>>>>>>> (shm):".
>>>>>>>>> Can you
>>>>>>>>> post those logs as well?
>>>>>>>>>
>>>>>>>>> Best regards,
>>>>>>>>> Razvan
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 9/20/18 9:43 PM, Jock McKechnie wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Greetings OpenSIPS gurus;
>>>>>>>>>>
>>>>>>>>>> I've run into an issue that I'm not making any real headway with
>>>>>>>>>> on
>>>>>>>>>> my
>>>>>>>>>> own. I have an OpenSIPS 2.4.2 running on CentOS7 that does not
>>>>>>>>>> appear
>>>>>>>>>> to be releasing the bulk of its shared memory when calls die off.
>>>>>>>>>> Eventually OpenSIPS runs itself out of shared memory and goes
>>>>>>>>>> beserk
>>>>>>>>>> when it can't reclaim memory for new transactions, or I've also
>>>>>>>>>> seen
>>>>>>>>>> the kernel oomkiller take it down.
>>>>>>>>>>
>>>>>>>>>> The setup is, as mentioned, a 2.4.2 which has a chunk of perl
>>>>>>>>>> living
>>>>>>>>>> on its back that handles some API directed call decisions. Because
>>>>>>>>>> of
>>>>>>>>>> the inevitable delay in the API lookups I am running a large
>>>>>>>>>> number
>>>>>>>>>> of
>>>>>>>>>> threads (100) to ensure that I don't start dropping packets under
>>>>>>>>>> high
>>>>>>>>>> levels of load (50 CPS+ - until I bumped children counts up I was
>>>>>>>>>> seeing packets being ignored which got the End User very
>>>>>>>>>> excitable).
>>>>>>>>>> I
>>>>>>>>>> appreciate having lots of children amplifies memory usage but my
>>>>>>>>>> belief is during regular load on our side (~6000 calls at ~30 -
>>>>>>>>>> 40CPS)
>>>>>>>>>> I should have a full day's runtime easily w/o running into
>>>>>>>>>> problems.
>>>>>>>>>> The issue is that because the memory is not released overnight it
>>>>>>>>>> starts the next day with half its shared memory gone, and then
>>>>>>>>>> things
>>>>>>>>>> go to pot the next day when it runs through all its remaining
>>>>>>>>>> shared
>>>>>>>>>> memory.
>>>>>>>>>>
>>>>>>>>>> I have it configured for a very large (I think) amount of memory
>>>>>>>>>> to
>>>>>>>>>> attempt to compensate short-term - 2048MB shared memory, 128MB
>>>>>>>>>> package
>>>>>>>>>> memory. This is not a well thought-out number, just me throwing
>>>>>>>>>> RAM
>>>>>>>>>> at
>>>>>>>>>> the problem. (For reference this is on a VM running with 4 cores
>>>>>>>>>> and
>>>>>>>>>> 4096MB of RAM)
>>>>>>>>>>
>>>>>>>>>> I have graphed the memory stats out of the FIFO, after enabling
>>>>>>>>>> memstats and DBG_MALLOC/etc. The graphs can be found here:
>>>>>>>>>> https://imgur.com/a/oapKYJW
>>>>>>>>>>
>>>>>>>>>> You'll see that the first graph, the main shmem:, shows the usage
>>>>>>>>>> gets
>>>>>>>>>> tied up and remains after calls die off (these examples run for 20
>>>>>>>>>> minutes after the calls stopped - but I can leave it overnight and
>>>>>>>>>> the
>>>>>>>>>> shmem: usage won't drop). This being said, the other graphed
>>>>>>>>>> modules
>>>>>>>>>> show that it releases memory correctly after calls quit like they
>>>>>>>>>> should. None of the other modules graphed (eg, _all_ of the
>>>>>>>>>> modules I
>>>>>>>>>> was using in OpenSIPS) showed no significant usage and I've
>>>>>>>>>> omitted
>>>>>>>>>> them.
>>>>>>>>>>
>>>>>>>>>> Interesting that the 'perl' module shows no stats ever which makes
>>>>>>>>>> me
>>>>>>>>>> wonder if the stats on this module works. I am not storing
>>>>>>>>>> anything
>>>>>>>>>> inside perl, besides lots of local session variables, and all
>>>>>>>>>> channel
>>>>>>>>>> variables are pushed back into OpenSIPS AVPs, but I assume there
>>>>>>>>>> must
>>>>>>>>>> be some memory overhead, although it may not significantly change.
>>>>>>>>>>
>>>>>>>>>> When I shut OpenSIPS down I get child dumps like the following:
>>>>>>>>>> (I've
>>>>>>>>>> included only one for brevity, but they're all much of a muchness)
>>>>>>>>>> https://pastebin.com/KG2qxxBa
>>>>>>>>>>
>>>>>>>>>> If I'm reading this right (and there's a strong possibility I'm
>>>>>>>>>> misunderstanding what I'm looking at) the memory is tied up in
>>>>>>>>>> io_wait
>>>>>>>>>> and the parser. I'm not sure what this signifies, however.
>>>>>>>>>>
>>>>>>>>>> So I guess what I'm asking is:
>>>>>>>>>> Am I understanding what I'm looking at correctly and that OpenSIPS
>>>>>>>>>> is
>>>>>>>>>> not releasing its shared memory which causes it to run out of
>>>>>>>>>> memory
>>>>>>>>>> quicker the following day? (Which lines up with the symptoms of
>>>>>>>>>> shorter runtimes before failure)
>>>>>>>>>>
>>>>>>>>>> Based on the dumps where should I look for the source of this
>>>>>>>>>> issue?
>>>>>>>>>>
>>>>>>>>>> Finally, and unrelated to the above, I attempted to mail the
>>>>>>>>>> official
>>>>>>>>>> OpenSIPS peeps to talk about paid support but none of the 'contact
>>>>>>>>>> us'
>>>>>>>>>> submits appear to work - when I click on the submit button it
>>>>>>>>>> changes
>>>>>>>>>> shade but does not appear to act. Certainly I never received any
>>>>>>>>>> response. If there was a direct eMail address I could send to that
>>>>>>>>>> would be preferable.
>>>>>>>>>>
>>>>>>>>>> As always, my thanks for your time.
>>>>>>>>>>
>>>>>>>>>> - Jock
>>>>>>>>>>
>>>>>>>>>> PS -
>>>>>>>>>> List of modules I'm using in this config:loadmodule "signaling.so"
>>>>>>>>>> loadmodule "sl.so"
>>>>>>>>>> loadmodule "tm.so"
>>>>>>>>>> loadmodule "rr.so"
>>>>>>>>>> loadmodule "maxfwd.so"
>>>>>>>>>> loadmodule "textops.so"
>>>>>>>>>> loadmodule "uri.so"
>>>>>>>>>> loadmodule "mi_fifo.so"
>>>>>>>>>> loadmodule "siptrace.so"
>>>>>>>>>> loadmodule "sipmsgops.so"
>>>>>>>>>> loadmodule "dialog.so"
>>>>>>>>>> loadmodule "uac_auth.so"
>>>>>>>>>> loadmodule "uac.so"
>>>>>>>>>> loadmodule "perl.so"
>>>>>>>>>> loadmodule "acc.so"
>>>>>>>>>> loadmodule "proto_hep.so"
>>>>>>>>>> loadmodule "proto_udp.so"
>>>>>>>>>> loadmodule "compression.so"
>>>>>>>>>>
>>>>>>>>>> _______________________________________________
>>>>>>>>>> Users mailing list
>>>>>>>>>> Users at lists.opensips.org
>>>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Răzvan Crainea
>>>>>>>>> OpenSIPS Core Developer
>>>>>>>>> http://www.opensips-solutions.com
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list
>>>>>>>>> Users at lists.opensips.org
>>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users at lists.opensips.org
>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Răzvan Crainea
>>>> OpenSIPS Core Developer
>>>> http://www.opensips-solutions.com
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opensips.org
>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>
>>
>> _______________________________________________
>> Users mailing list
>> Users at lists.opensips.org
>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>
>
> --
> Răzvan Crainea
> OpenSIPS Core Developer
> http://www.opensips-solutions.com
>
> _______________________________________________
> Users mailing list
> Users at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
More information about the Users
mailing list