[OpenSIPS-Users] Chasing down unreleased memory
Răzvan Crainea
razvan at opensips.org
Mon Oct 8 04:01:22 EDT 2018
Hi, Jock!
Regarding the SHM issue, are you seeing any ERROR messages in your logs?
Best regards,
Razvan
On 10/2/18 6:07 PM, Jock McKechnie wrote:
> Here's the package dump from the test I ran - I'm not able to identify
> anything "Better" in this output, but I'll leave it up to a trained
> eye.
>
> I git pulled opensips-2.4 and built that with all of the memory
> debugging included and then ran the same tests I ran yesterday.
> Hopefully this is the right release, if not, let me know.
>
> - Jock
>
> Memory status (pkg):
> qm_status (0x7f4cf82d7010):
> heap size= 134217728
> used= 11682552, used+overhead=12043128, free=122174600
> max used (+overhead)= 12067040
> INFO:core:sig_usr: signal 15 received
> Memory status (pkg):
> qm_status (0x7f4cf82d7010):
> heap size= 134217728
> used= 11682552, used+overhead=12043128, free=122174600
> max used (+overhead)= 12067040
> dumping summary of all alloc'ed. fragments:
> ----------------------------------------------------
> total_bytes | num_allocations x [file: func, line]
> ----------------------------------------------------
> 8 : 1 x [compression.c: build_hdr_masks, line 229]
> 128 : 1 x [compression_helpers.c: parse_whitelist, line 160]
> 16 : 1 x [ut.h: pkg_str_extend, line 841]
> 16 : 2 x [script_var.c: add_var, line 59]
> 2097152 : 1 x [io_wait.c: init_io_wait, line 568]
> 45056 : 351 x [cfg.lex: addstr, line 890]
> 64 : 2 x [rr_cb.c: register_rrcb, line 57]
> 39000 : 195 x [route_struct.c: mk_action, line 106]
> 13552 : 105 x [pvar.c: pv_parse_format, line 4119]
> 272 : 17 x [cfg.y: new_string, line 2703]
> 496 : 12 x [mod_fix.c: fixup_spve, line 948]
> 16 : 1 x [siptrace.c: sip_trace_fixup, line 1279]
> 728 : 1 x [siptrace.c: parse_siptrace_id, line 525]
> 24 : 1 x [compression_helpers.c: search_hdr, line 122]
> 160 : 10 x [cfg.y: yyparse, line 999]
> 864 : 18 x [evi/evi_params.c: evi_param_create, line 42]
> 248 : 10 x [mem/shm_mem.c: init_shm_post_yyparse, line 516]
> 6291456 : 1 x [io_wait.c: init_io_wait, line 559]
> 128 : 1 x [compression.c: set_wh_param, line 420]
> 96 : 2 x [script_var.c: add_var, line 52]
> 48 : 2 x [sl_cb.c: register_slcb, line 61]
> 368 : 9 x [sipmsgops.c: fixup_method, line 886]
> 8 : 1 x [acc_logic.c: do_acc_fixup, line 1221]
> 64 : 4 x [socket_info.c: fix_socket_list, line 641]
> 48 : 1 x [sr_module_deps.c: alloc_module_dep, line 54]
> 8 : 1 x [compression_helpers.c: parse_whitelist, line 167]
> 704 : 6 x [route.c: fix_expr, line 189]
> 48 : 1 x [ipc.c: ipc_register_handler, line 120]
> 80 : 1 x [mi/mi_trace.c: try_load_trace_api, line 55]
> 32 : 1 x [transformations.c: tr_parse_nparam, line 2490]
> 1600 : 20 x [pvar.c: pv_add_extra, line 4690]
> 8 : 1 x [compression.c: build_hdr_masks, line 242]
> 1832 : 15 x [route.c: fix_actions, line 750]
> 144 : 16 x [map.c: map_get, line 150]
> 32 : 1 x [map.c: map_create, line 79]
> 3145728 : 1 x [io_wait.c: init_io_wait, line 621]
> 32 : 1 x [transformations.c: tr_parse_nparam, line 2514]
> 8192 : 1 x [mi_writer.c: mi_writer_init, line 49]
> 48 : 1 x [dlg_vals.c: fetch_dlg_value, line 180]
> 40 : 1 x [flags.c: get_flag_id_by_name, line 202]
> 64 : 4 x [cfg.y: yyparse, line 514]
> 192 : 4 x [cfg.y: mk_listen_id, line 2685]
> 16 : 1 x [acc_logic.c: do_acc_fixup, line 1184]
> 240 : 6 x [acc_extra.c: add_extra, line 155]
> 64 : 4 x [socket_info.c: new_sock_info, line 125]
> 960 : 4 x [socket_info.c: new_sock_info, line 119]
> 8320 : 80 x [cfg.y: yyparse, line 1456]
> 24 : 1 x [dlg_hash.c: state_changed_event_init, line 863]
> 144 : 3 x [transformations.c: parse_transformation, line 2376]
> 304 : 5 x [script_cb.c: add_callback, line 60]
> 3744 : 1 x [mi/mi.c: register_mi_cmd, line 146]
> 864 : 18 x [sr_module.c: register_module, line 150]
> 104 : 4 x [socket_info.c: fix_socket_list, line 720]
> 6720 : 120 x [route_struct.c: mk_elem, line 70]
> 96 : 4 x [cfg.y: yyparse, line 1175]
> 56 : 7 x [cfg.y: yyparse, line 1236]
> 24 : 1 x [acc.c: init_acc_evi, line 1094]
> 768 : 7 x [mod_fix.c: fixup_regexp, line 310]
> 208 : 2 x [context.c: register_context_destroy, line 66]
> 32 : 4 x [socket_info.c: fix_socket_list, line 590]
> 96 : 1 x [context.c: context_alloc, line 119]
> 8 : 1 x [pvar.c: pv_init_extra_list, line 4626]
> 104 : 1 x [transformations.c: tr_parse_nparam, line 2476]
> 160 : 1 x [acc_extra.c: add_tag, line 111]
> 4104 : 1 x [xlog.c: buf_init, line 69]
> 1232 : 1 x [compression_helpers.c: wrap_realloc, line 444]
> 16 : 1 x [io_wait.c: init_io_wait, line 576]
> 4368 : 78 x [route_struct.c: mk_exp, line 54]
> 896 : 16 x [map.c: map_get, line 139]
> 80 : 2 x [pvar.c: new_pv_context, line 4765]
> ----------------------------------------------------
> dumping free list stats :
> hash= 380. fragments no.: 1, unused: 0#012#011#011 bucket
> size: 3040 - 3040 (first 3040)
> hash= 462. fragments no.: 1, unused: 0#012#011#011 bucket
> size: 3696 - 3696 (first 3696)
> hash= 492. fragments no.: 1, unused: 0#012#011#011 bucket
> size: 3936 - 3936 (first 3936)
> hash= 1655. fragments no.: 1, unused: 0#012#011#011 bucket
> size: 13240 - 13240 (first 13240)
> hash= 2061. fragments no.: 1, unused: 0#012#011#011 bucket
> size: 67108864 - 134217728 (first 122150688)
> -----------------------------
>
> On Tue, Oct 2, 2018 at 8:19 AM, Jock McKechnie <jock.mckechnie at gmail.com> wrote:
>> Good morning Răzvan,
>>
>> I completely understand being busy, I was confident you weren't just
>> ignoring me :)
>>
>> I will build the latest from git with the debugging bits and pieces
>> and report back after testing.
>>
>> I had been running the latest 2.4.2 off yum.opensips.org when I
>> originally identified the problem, but have since hand compiled it to
>> enable all the debugging widgets:
>> version: opensips 2.4.2 (x86_64/linux)
>> flags: STATS: On, SHM_EXTRA_STATS, DISABLE_NAGLE, USE_MCAST, SHM_MMAP,
>> PKG_MALLOC, QM_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
>> ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
>> MAX_URI_SIZE 1024, BUF_SIZE 65535
>> poll method support: poll, epoll, sigio_rt, select.
>> git revision: 66114a66c
>> main.c compiled on 07:56:12 Sep 20 2018 with gcc 4.8.5
>>
>> I'll get back to you soon;
>>
>> - Jock
>>
>> On Tue, Oct 2, 2018 at 4:00 AM, Răzvan Crainea <razvan at opensips.org> wrote:
>>> Hi, Jock!
>>>
>>> Can you confirm exactly the opensips version you are running? (opensips -V).
>>>
>>> Best regards,
>>> Razvan
>>>
>>>
>>> On 10/2/18 11:41 AM, Răzvan Crainea wrote:
>>>>
>>>> Hi, Jock!
>>>>
>>>> Good news :) I've just figured out the pkg memleak in mc_compact, and I've
>>>> already pushed a fix upstream. Please pull the latest version and run a few
>>>> tests, and let us know how they go.
>>>> In the meantime, I will investigate the shm leak.
>>>>
>>>> Best regards,
>>>> Razvan
>>>>
>>>> On 10/2/18 10:13 AM, Răzvan Crainea wrote:
>>>>>
>>>>> Hi, Jock!
>>>>>
>>>>> I am pretty busy these days, so I couldn't pay too much attention to this
>>>>> thread. Nevertheless, I will try to prioritize it this week and get back to
>>>>> you.
>>>>> I have also tracked down the pkg mem leak to the mc_compact() module, but
>>>>> I can't pinpoint for now exactly the issue. Although I have some
>>>>> assumptions, I can't be sure until I run some tests.
>>>>> If I need extra info during my debug, I will definitely let you know, so
>>>>> keep an eye on this thread.
>>>>>
>>>>> Best regards,
>>>>> Răzvan
>>>>>
>>>>> On 10/1/18 9:14 PM, Jock McKechnie wrote:
>>>>>>
>>>>>> Greetings;
>>>>>>
>>>>>> I should have done this last week but, to be honest, it just didn't
>>>>>> occur to me. I pulled the compression module out (and commented out
>>>>>> mc_compact()) and re-ran my tests and the memory usage was massively
>>>>>> lower (losing 100k of memory per minute vs 3-5MB/minute) and within a
>>>>>> few minutes of calling ceasing almost all of the memory was released
>>>>>> (went from 42MB down to 3.8MB within 6 minutes).
>>>>>>
>>>>>> So it appears that there's something leaking in the compression module.
>>>>>>
>>>>>> I'm going to poke around a few more tests (include compression.so
>>>>>> without actually calling it, etc) but I wanted to pass this on as
>>>>>> there appears to be a bug that needs addressing.
>>>>>>
>>>>>> Is there anything I can provide that might be useful in debugging this?
>>>>>>
>>>>>> As always, thank you for your help.
>>>>>>
>>>>>> - Jock
>>>>>>
>>>>>> On Fri, Sep 21, 2018 at 8:12 AM, Jock McKechnie
>>>>>> <jock.mckechnie at gmail.com> wrote:
>>>>>>>
>>>>>>> Thank you very much, Răzvan;
>>>>>>>
>>>>>>> We are using the compression module for mc_compact(). We use SIP
>>>>>>> headers for passing around data (don't ask) and the compact headers
>>>>>>> are mandatory or else we hit MTU/fragmentation problems. We're on the
>>>>>>> cusp of moving to SIP/TCP but haven't gotten there quite yet.
>>>>>>> The mc_compact() lives in the main route[1] block and is simply this,
>>>>>>> in whole:
>>>>>>> mc_compact("P-Asserted-Identity|Remote-Party-ID|X-UCID");
>>>>>>>
>>>>>>> And my apologies for not including the shm dump segment - please find
>>>>>>> it below.
>>>>>>> https://pastebin.com/itSTER7Y
>>>>>>>
>>>>>>> Let me know what else I can provide, this is fairly easy to replicate
>>>>>>> (although it takes about an hour of runtime pummeling it to get nice
>>>>>>> easy to read dumps). Thank you, again, very much.
>>>>>>>
>>>>>>> - Jock
>>>>>>>
>>>>>>> On Fri, Sep 21, 2018 at 3:01 AM, Răzvan Crainea <razvan at opensips.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi, Jock!
>>>>>>>>
>>>>>>>> I think you are actually heating two memory leaks: one in private
>>>>>>>> memory and
>>>>>>>> one in shared memory.
>>>>>>>> In the memory dump you sent I can spot a pkg mem related to (I
>>>>>>>> believe) the
>>>>>>>> compression module. Can you tell me what functions of the compression
>>>>>>>> module
>>>>>>>> you are using and what flags are you using for them?
>>>>>>>>
>>>>>>>> Besides that, you claim there is a shared memory leak as well. Can you
>>>>>>>> send
>>>>>>>> us the logs for the shared memory? They are dumped by the first
>>>>>>>> process
>>>>>>>> (attendant) and should start with the string "Memory status (shm):".
>>>>>>>> Can you
>>>>>>>> post those logs as well?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Razvan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/20/18 9:43 PM, Jock McKechnie wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Greetings OpenSIPS gurus;
>>>>>>>>>
>>>>>>>>> I've run into an issue that I'm not making any real headway with on
>>>>>>>>> my
>>>>>>>>> own. I have an OpenSIPS 2.4.2 running on CentOS7 that does not appear
>>>>>>>>> to be releasing the bulk of its shared memory when calls die off.
>>>>>>>>> Eventually OpenSIPS runs itself out of shared memory and goes beserk
>>>>>>>>> when it can't reclaim memory for new transactions, or I've also seen
>>>>>>>>> the kernel oomkiller take it down.
>>>>>>>>>
>>>>>>>>> The setup is, as mentioned, a 2.4.2 which has a chunk of perl living
>>>>>>>>> on its back that handles some API directed call decisions. Because of
>>>>>>>>> the inevitable delay in the API lookups I am running a large number
>>>>>>>>> of
>>>>>>>>> threads (100) to ensure that I don't start dropping packets under
>>>>>>>>> high
>>>>>>>>> levels of load (50 CPS+ - until I bumped children counts up I was
>>>>>>>>> seeing packets being ignored which got the End User very excitable).
>>>>>>>>> I
>>>>>>>>> appreciate having lots of children amplifies memory usage but my
>>>>>>>>> belief is during regular load on our side (~6000 calls at ~30 -
>>>>>>>>> 40CPS)
>>>>>>>>> I should have a full day's runtime easily w/o running into problems.
>>>>>>>>> The issue is that because the memory is not released overnight it
>>>>>>>>> starts the next day with half its shared memory gone, and then things
>>>>>>>>> go to pot the next day when it runs through all its remaining shared
>>>>>>>>> memory.
>>>>>>>>>
>>>>>>>>> I have it configured for a very large (I think) amount of memory to
>>>>>>>>> attempt to compensate short-term - 2048MB shared memory, 128MB
>>>>>>>>> package
>>>>>>>>> memory. This is not a well thought-out number, just me throwing RAM
>>>>>>>>> at
>>>>>>>>> the problem. (For reference this is on a VM running with 4 cores and
>>>>>>>>> 4096MB of RAM)
>>>>>>>>>
>>>>>>>>> I have graphed the memory stats out of the FIFO, after enabling
>>>>>>>>> memstats and DBG_MALLOC/etc. The graphs can be found here:
>>>>>>>>> https://imgur.com/a/oapKYJW
>>>>>>>>>
>>>>>>>>> You'll see that the first graph, the main shmem:, shows the usage
>>>>>>>>> gets
>>>>>>>>> tied up and remains after calls die off (these examples run for 20
>>>>>>>>> minutes after the calls stopped - but I can leave it overnight and
>>>>>>>>> the
>>>>>>>>> shmem: usage won't drop). This being said, the other graphed modules
>>>>>>>>> show that it releases memory correctly after calls quit like they
>>>>>>>>> should. None of the other modules graphed (eg, _all_ of the modules I
>>>>>>>>> was using in OpenSIPS) showed no significant usage and I've omitted
>>>>>>>>> them.
>>>>>>>>>
>>>>>>>>> Interesting that the 'perl' module shows no stats ever which makes me
>>>>>>>>> wonder if the stats on this module works. I am not storing anything
>>>>>>>>> inside perl, besides lots of local session variables, and all channel
>>>>>>>>> variables are pushed back into OpenSIPS AVPs, but I assume there must
>>>>>>>>> be some memory overhead, although it may not significantly change.
>>>>>>>>>
>>>>>>>>> When I shut OpenSIPS down I get child dumps like the following: (I've
>>>>>>>>> included only one for brevity, but they're all much of a muchness)
>>>>>>>>> https://pastebin.com/KG2qxxBa
>>>>>>>>>
>>>>>>>>> If I'm reading this right (and there's a strong possibility I'm
>>>>>>>>> misunderstanding what I'm looking at) the memory is tied up in
>>>>>>>>> io_wait
>>>>>>>>> and the parser. I'm not sure what this signifies, however.
>>>>>>>>>
>>>>>>>>> So I guess what I'm asking is:
>>>>>>>>> Am I understanding what I'm looking at correctly and that OpenSIPS is
>>>>>>>>> not releasing its shared memory which causes it to run out of memory
>>>>>>>>> quicker the following day? (Which lines up with the symptoms of
>>>>>>>>> shorter runtimes before failure)
>>>>>>>>>
>>>>>>>>> Based on the dumps where should I look for the source of this issue?
>>>>>>>>>
>>>>>>>>> Finally, and unrelated to the above, I attempted to mail the official
>>>>>>>>> OpenSIPS peeps to talk about paid support but none of the 'contact
>>>>>>>>> us'
>>>>>>>>> submits appear to work - when I click on the submit button it changes
>>>>>>>>> shade but does not appear to act. Certainly I never received any
>>>>>>>>> response. If there was a direct eMail address I could send to that
>>>>>>>>> would be preferable.
>>>>>>>>>
>>>>>>>>> As always, my thanks for your time.
>>>>>>>>>
>>>>>>>>> - Jock
>>>>>>>>>
>>>>>>>>> PS -
>>>>>>>>> List of modules I'm using in this config:loadmodule "signaling.so"
>>>>>>>>> loadmodule "sl.so"
>>>>>>>>> loadmodule "tm.so"
>>>>>>>>> loadmodule "rr.so"
>>>>>>>>> loadmodule "maxfwd.so"
>>>>>>>>> loadmodule "textops.so"
>>>>>>>>> loadmodule "uri.so"
>>>>>>>>> loadmodule "mi_fifo.so"
>>>>>>>>> loadmodule "siptrace.so"
>>>>>>>>> loadmodule "sipmsgops.so"
>>>>>>>>> loadmodule "dialog.so"
>>>>>>>>> loadmodule "uac_auth.so"
>>>>>>>>> loadmodule "uac.so"
>>>>>>>>> loadmodule "perl.so"
>>>>>>>>> loadmodule "acc.so"
>>>>>>>>> loadmodule "proto_hep.so"
>>>>>>>>> loadmodule "proto_udp.so"
>>>>>>>>> loadmodule "compression.so"
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list
>>>>>>>>> Users at lists.opensips.org
>>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Răzvan Crainea
>>>>>>>> OpenSIPS Core Developer
>>>>>>>> http://www.opensips-solutions.com
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> Users at lists.opensips.org
>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users at lists.opensips.org
>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Răzvan Crainea
>>> OpenSIPS Core Developer
>>> http://www.opensips-solutions.com
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opensips.org
>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
> _______________________________________________
> Users mailing list
> Users at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>
--
Răzvan Crainea
OpenSIPS Core Developer
http://www.opensips-solutions.com
More information about the Users
mailing list