[OpenSIPS-Users] Chasing down unreleased memory

Răzvan Crainea razvan at opensips.org
Tue Oct 2 11:32:20 EDT 2018


Hi, Jock!

There is actually something better - I am not seeing two of the 
fragments that were leaking before - nor the one in get_hdr_field() 
(8389 unreleased fragments) nor the one in parse_via(). So I'd argue 
that the pkg leak was fixed.
Still pending the shm leak, which we believe to be related to the 
local_route. Can you tell us whether you are using local_route, and what 
kind of operations you're doing inside it?

Best regards,
Răzvan

On 10/2/18 6:07 PM, Jock McKechnie wrote:
> Here's the package dump from the test I ran - I'm not able to identify
> anything "Better" in this output, but I'll leave it up to a trained
> eye.
> 
> I git pulled opensips-2.4 and built that with all of the memory
> debugging included and then ran the same tests I ran yesterday.
> Hopefully this is the right release, if not, let me know.
> 
>   - Jock
> 
> Memory status (pkg):
>   qm_status (0x7f4cf82d7010):
>    heap size= 134217728
>    used= 11682552, used+overhead=12043128, free=122174600
>    max used (+overhead)= 12067040
>   INFO:core:sig_usr: signal 15 received
>   Memory status (pkg):
>   qm_status (0x7f4cf82d7010):
>    heap size= 134217728
>    used= 11682552, used+overhead=12043128, free=122174600
>    max used (+overhead)= 12067040
>    dumping summary of all alloc'ed. fragments:
>   ----------------------------------------------------
>   total_bytes | num_allocations x [file: func, line]
>   ----------------------------------------------------
>             8 : 1 x [compression.c: build_hdr_masks, line 229]
>           128 : 1 x [compression_helpers.c: parse_whitelist, line 160]
>            16 : 1 x [ut.h: pkg_str_extend, line 841]
>            16 : 2 x [script_var.c: add_var, line 59]
>       2097152 : 1 x [io_wait.c: init_io_wait, line 568]
>         45056 : 351 x [cfg.lex: addstr, line 890]
>            64 : 2 x [rr_cb.c: register_rrcb, line 57]
>         39000 : 195 x [route_struct.c: mk_action, line 106]
>         13552 : 105 x [pvar.c: pv_parse_format, line 4119]
>           272 : 17 x [cfg.y: new_string, line 2703]
>           496 : 12 x [mod_fix.c: fixup_spve, line 948]
>            16 : 1 x [siptrace.c: sip_trace_fixup, line 1279]
>           728 : 1 x [siptrace.c: parse_siptrace_id, line 525]
>            24 : 1 x [compression_helpers.c: search_hdr, line 122]
>           160 : 10 x [cfg.y: yyparse, line 999]
>           864 : 18 x [evi/evi_params.c: evi_param_create, line 42]
>           248 : 10 x [mem/shm_mem.c: init_shm_post_yyparse, line 516]
>       6291456 : 1 x [io_wait.c: init_io_wait, line 559]
>           128 : 1 x [compression.c: set_wh_param, line 420]
>            96 : 2 x [script_var.c: add_var, line 52]
>            48 : 2 x [sl_cb.c: register_slcb, line 61]
>           368 : 9 x [sipmsgops.c: fixup_method, line 886]
>             8 : 1 x [acc_logic.c: do_acc_fixup, line 1221]
>            64 : 4 x [socket_info.c: fix_socket_list, line 641]
>            48 : 1 x [sr_module_deps.c: alloc_module_dep, line 54]
>             8 : 1 x [compression_helpers.c: parse_whitelist, line 167]
>           704 : 6 x [route.c: fix_expr, line 189]
>            48 : 1 x [ipc.c: ipc_register_handler, line 120]
>            80 : 1 x [mi/mi_trace.c: try_load_trace_api, line 55]
>            32 : 1 x [transformations.c: tr_parse_nparam, line 2490]
>          1600 : 20 x [pvar.c: pv_add_extra, line 4690]
>             8 : 1 x [compression.c: build_hdr_masks, line 242]
>          1832 : 15 x [route.c: fix_actions, line 750]
>           144 : 16 x [map.c: map_get, line 150]
>            32 : 1 x [map.c: map_create, line 79]
>       3145728 : 1 x [io_wait.c: init_io_wait, line 621]
>            32 : 1 x [transformations.c: tr_parse_nparam, line 2514]
>          8192 : 1 x [mi_writer.c: mi_writer_init, line 49]
>            48 : 1 x [dlg_vals.c: fetch_dlg_value, line 180]
>            40 : 1 x [flags.c: get_flag_id_by_name, line 202]
>            64 : 4 x [cfg.y: yyparse, line 514]
>           192 : 4 x [cfg.y: mk_listen_id, line 2685]
>            16 : 1 x [acc_logic.c: do_acc_fixup, line 1184]
>           240 : 6 x [acc_extra.c: add_extra, line 155]
>            64 : 4 x [socket_info.c: new_sock_info, line 125]
>           960 : 4 x [socket_info.c: new_sock_info, line 119]
>          8320 : 80 x [cfg.y: yyparse, line 1456]
>            24 : 1 x [dlg_hash.c: state_changed_event_init, line 863]
>           144 : 3 x [transformations.c: parse_transformation, line 2376]
>           304 : 5 x [script_cb.c: add_callback, line 60]
>          3744 : 1 x [mi/mi.c: register_mi_cmd, line 146]
>           864 : 18 x [sr_module.c: register_module, line 150]
>           104 : 4 x [socket_info.c: fix_socket_list, line 720]
>          6720 : 120 x [route_struct.c: mk_elem, line 70]
>            96 : 4 x [cfg.y: yyparse, line 1175]
>            56 : 7 x [cfg.y: yyparse, line 1236]
>            24 : 1 x [acc.c: init_acc_evi, line 1094]
>           768 : 7 x [mod_fix.c: fixup_regexp, line 310]
>           208 : 2 x [context.c: register_context_destroy, line 66]
>            32 : 4 x [socket_info.c: fix_socket_list, line 590]
>            96 : 1 x [context.c: context_alloc, line 119]
>             8 : 1 x [pvar.c: pv_init_extra_list, line 4626]
>           104 : 1 x [transformations.c: tr_parse_nparam, line 2476]
>           160 : 1 x [acc_extra.c: add_tag, line 111]
>          4104 : 1 x [xlog.c: buf_init, line 69]
>          1232 : 1 x [compression_helpers.c: wrap_realloc, line 444]
>            16 : 1 x [io_wait.c: init_io_wait, line 576]
>          4368 : 78 x [route_struct.c: mk_exp, line 54]
>           896 : 16 x [map.c: map_get, line 139]
>            80 : 2 x [pvar.c: new_pv_context, line 4765]
>   ----------------------------------------------------
>    dumping free list stats :
>   hash= 380. fragments no.:     1, unused:     0#012#011#011 bucket
> size:      3040 -      3040 (first      3040)
>   hash= 462. fragments no.:     1, unused:     0#012#011#011 bucket
> size:      3696 -      3696 (first      3696)
>   hash= 492. fragments no.:     1, unused:     0#012#011#011 bucket
> size:      3936 -      3936 (first      3936)
>   hash= 1655. fragments no.:     1, unused:     0#012#011#011 bucket
> size:     13240 -     13240 (first     13240)
>   hash= 2061. fragments no.:     1, unused:     0#012#011#011 bucket
> size:  67108864 - 134217728 (first 122150688)
>   -----------------------------
> 
> On Tue, Oct 2, 2018 at 8:19 AM, Jock McKechnie <jock.mckechnie at gmail.com> wrote:
>> Good morning Răzvan,
>>
>> I completely understand being busy, I was confident you weren't just
>> ignoring me :)
>>
>> I will build the latest from git with the debugging bits and pieces
>> and report back after testing.
>>
>> I had been running the latest 2.4.2 off yum.opensips.org when I
>> originally identified the problem, but have since hand compiled it to
>> enable all the debugging widgets:
>> version: opensips 2.4.2 (x86_64/linux)
>> flags: STATS: On, SHM_EXTRA_STATS, DISABLE_NAGLE, USE_MCAST, SHM_MMAP,
>> PKG_MALLOC, QM_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
>> ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
>> MAX_URI_SIZE 1024, BUF_SIZE 65535
>> poll method support: poll, epoll, sigio_rt, select.
>> git revision: 66114a66c
>> main.c compiled on 07:56:12 Sep 20 2018 with gcc 4.8.5
>>
>> I'll get back to you soon;
>>
>>   - Jock
>>
>> On Tue, Oct 2, 2018 at 4:00 AM, Răzvan Crainea <razvan at opensips.org> wrote:
>>> Hi, Jock!
>>>
>>> Can you confirm exactly the opensips version you are running? (opensips -V).
>>>
>>> Best regards,
>>> Razvan
>>>
>>>
>>> On 10/2/18 11:41 AM, Răzvan Crainea wrote:
>>>>
>>>> Hi, Jock!
>>>>
>>>> Good news :) I've just figured out the pkg memleak in mc_compact, and I've
>>>> already pushed a fix upstream. Please pull the latest version and run a few
>>>> tests, and let us know how they go.
>>>> In the meantime, I will investigate the shm leak.
>>>>
>>>> Best regards,
>>>> Razvan
>>>>
>>>> On 10/2/18 10:13 AM, Răzvan Crainea wrote:
>>>>>
>>>>> Hi, Jock!
>>>>>
>>>>> I am pretty busy these days, so I couldn't pay too much attention to this
>>>>> thread. Nevertheless, I will try to prioritize it this week and get back to
>>>>> you.
>>>>> I have also tracked down the pkg mem leak to the mc_compact() module, but
>>>>> I can't pinpoint for now exactly the issue. Although I have some
>>>>> assumptions, I can't be sure until I run some tests.
>>>>> If I need extra info during my debug, I will definitely let you know, so
>>>>> keep an eye on this thread.
>>>>>
>>>>> Best regards,
>>>>> Răzvan
>>>>>
>>>>> On 10/1/18 9:14 PM, Jock McKechnie wrote:
>>>>>>
>>>>>> Greetings;
>>>>>>
>>>>>> I should have done this last week but, to be honest, it just didn't
>>>>>> occur to me. I pulled the compression module out (and commented out
>>>>>> mc_compact()) and re-ran my tests and the memory usage was massively
>>>>>> lower (losing 100k of memory per minute vs 3-5MB/minute) and within a
>>>>>> few minutes of calling ceasing almost all of the memory was released
>>>>>> (went from 42MB down to 3.8MB within 6 minutes).
>>>>>>
>>>>>> So it appears that there's something leaking in the compression module.
>>>>>>
>>>>>> I'm going to poke around a few more tests (include compression.so
>>>>>> without actually calling it, etc) but I wanted to pass this on as
>>>>>> there appears to be a bug that needs addressing.
>>>>>>
>>>>>> Is there anything I can provide that might be useful in debugging this?
>>>>>>
>>>>>> As always, thank you for your help.
>>>>>>
>>>>>>    - Jock
>>>>>>
>>>>>> On Fri, Sep 21, 2018 at 8:12 AM, Jock McKechnie
>>>>>> <jock.mckechnie at gmail.com> wrote:
>>>>>>>
>>>>>>> Thank you very much, Răzvan;
>>>>>>>
>>>>>>> We are using the compression module for mc_compact(). We use SIP
>>>>>>> headers for passing around data (don't ask) and the compact headers
>>>>>>> are mandatory or else we hit MTU/fragmentation problems. We're on the
>>>>>>> cusp of moving to SIP/TCP but haven't gotten there quite yet.
>>>>>>> The mc_compact() lives in the main route[1] block and is simply this,
>>>>>>> in whole:
>>>>>>>       mc_compact("P-Asserted-Identity|Remote-Party-ID|X-UCID");
>>>>>>>
>>>>>>> And my apologies for not including the shm dump segment - please find
>>>>>>> it below.
>>>>>>> https://pastebin.com/itSTER7Y
>>>>>>>
>>>>>>> Let me know what else I can provide, this is fairly easy to replicate
>>>>>>> (although it takes about an hour of runtime pummeling it to get nice
>>>>>>> easy to read dumps). Thank you, again, very much.
>>>>>>>
>>>>>>>    - Jock
>>>>>>>
>>>>>>> On Fri, Sep 21, 2018 at 3:01 AM, Răzvan Crainea <razvan at opensips.org>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Hi, Jock!
>>>>>>>>
>>>>>>>> I think you are actually heating two memory leaks: one in private
>>>>>>>> memory and
>>>>>>>> one in shared memory.
>>>>>>>> In the memory dump you sent I can spot a pkg mem related to (I
>>>>>>>> believe) the
>>>>>>>> compression module. Can you tell me what functions of the compression
>>>>>>>> module
>>>>>>>> you are using and what flags are you using for them?
>>>>>>>>
>>>>>>>> Besides that, you claim there is a shared memory leak as well. Can you
>>>>>>>> send
>>>>>>>> us the logs for the shared memory? They are dumped by the first
>>>>>>>> process
>>>>>>>> (attendant) and should start with the string "Memory status (shm):".
>>>>>>>> Can you
>>>>>>>> post those logs as well?
>>>>>>>>
>>>>>>>> Best regards,
>>>>>>>> Razvan
>>>>>>>>
>>>>>>>>
>>>>>>>> On 9/20/18 9:43 PM, Jock McKechnie wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Greetings OpenSIPS gurus;
>>>>>>>>>
>>>>>>>>> I've run into an issue that I'm not making any real headway with on
>>>>>>>>> my
>>>>>>>>> own. I have an OpenSIPS 2.4.2 running on CentOS7 that does not appear
>>>>>>>>> to be releasing the bulk of its shared memory when calls die off.
>>>>>>>>> Eventually OpenSIPS runs itself out of shared memory and goes beserk
>>>>>>>>> when it can't reclaim memory for new transactions, or I've also seen
>>>>>>>>> the kernel oomkiller take it down.
>>>>>>>>>
>>>>>>>>> The setup is, as mentioned, a 2.4.2 which has a chunk of perl living
>>>>>>>>> on its back that handles some API directed call decisions. Because of
>>>>>>>>> the inevitable delay in the API lookups I am running a large number
>>>>>>>>> of
>>>>>>>>> threads (100) to ensure that I don't start dropping packets under
>>>>>>>>> high
>>>>>>>>> levels of load (50 CPS+ - until I bumped children counts up I was
>>>>>>>>> seeing packets being ignored which got the End User very excitable).
>>>>>>>>> I
>>>>>>>>> appreciate having lots of children amplifies memory usage but my
>>>>>>>>> belief is during regular load on our side (~6000 calls at ~30 -
>>>>>>>>> 40CPS)
>>>>>>>>> I should have a full day's runtime easily w/o running into problems.
>>>>>>>>> The issue is that because the memory is not released overnight it
>>>>>>>>> starts the next day with half its shared memory gone, and then things
>>>>>>>>> go to pot the next day when it runs through all its remaining shared
>>>>>>>>> memory.
>>>>>>>>>
>>>>>>>>> I have it configured for a very large (I think) amount of memory to
>>>>>>>>> attempt to compensate short-term - 2048MB shared memory, 128MB
>>>>>>>>> package
>>>>>>>>> memory. This is not a well thought-out number, just me throwing RAM
>>>>>>>>> at
>>>>>>>>> the problem. (For reference this is on a VM running with 4 cores and
>>>>>>>>> 4096MB of RAM)
>>>>>>>>>
>>>>>>>>> I have graphed the memory stats out of the FIFO, after enabling
>>>>>>>>> memstats and DBG_MALLOC/etc. The graphs can be found here:
>>>>>>>>> https://imgur.com/a/oapKYJW
>>>>>>>>>
>>>>>>>>> You'll see that the first graph, the main shmem:, shows the usage
>>>>>>>>> gets
>>>>>>>>> tied up and remains after calls die off (these examples run for 20
>>>>>>>>> minutes after the calls stopped - but I can leave it overnight and
>>>>>>>>> the
>>>>>>>>> shmem: usage won't drop). This being said, the other graphed modules
>>>>>>>>> show that it releases memory correctly after calls quit like they
>>>>>>>>> should. None of the other modules graphed (eg, _all_ of the modules I
>>>>>>>>> was using in OpenSIPS) showed no significant usage and I've omitted
>>>>>>>>> them.
>>>>>>>>>
>>>>>>>>> Interesting that the 'perl' module shows no stats ever which makes me
>>>>>>>>> wonder if the stats on this module works. I am not storing anything
>>>>>>>>> inside perl, besides lots of local session variables, and all channel
>>>>>>>>> variables are pushed back into OpenSIPS AVPs, but I assume there must
>>>>>>>>> be some memory overhead, although it may not significantly change.
>>>>>>>>>
>>>>>>>>> When I shut OpenSIPS down I get child dumps like the following: (I've
>>>>>>>>> included only one for brevity, but they're all much of a muchness)
>>>>>>>>> https://pastebin.com/KG2qxxBa
>>>>>>>>>
>>>>>>>>> If I'm reading this right (and there's a strong possibility I'm
>>>>>>>>> misunderstanding what I'm looking at) the memory is tied up in
>>>>>>>>> io_wait
>>>>>>>>> and the parser. I'm not sure what this signifies, however.
>>>>>>>>>
>>>>>>>>> So I guess what I'm asking is:
>>>>>>>>> Am I understanding what I'm looking at correctly and that OpenSIPS is
>>>>>>>>> not releasing its shared memory which causes it to run out of memory
>>>>>>>>> quicker the following day? (Which lines up with the symptoms of
>>>>>>>>> shorter runtimes before failure)
>>>>>>>>>
>>>>>>>>> Based on the dumps where should I look for the source of this issue?
>>>>>>>>>
>>>>>>>>> Finally, and unrelated to the above, I attempted to mail the official
>>>>>>>>> OpenSIPS peeps to talk about paid support but none of the 'contact
>>>>>>>>> us'
>>>>>>>>> submits appear to work - when I click on the submit button it changes
>>>>>>>>> shade but does not appear to act. Certainly I never received any
>>>>>>>>> response. If there was a direct eMail address I could send to that
>>>>>>>>> would be preferable.
>>>>>>>>>
>>>>>>>>> As always, my thanks for your time.
>>>>>>>>>
>>>>>>>>>     - Jock
>>>>>>>>>
>>>>>>>>> PS -
>>>>>>>>> List of modules I'm using in this config:loadmodule "signaling.so"
>>>>>>>>> loadmodule "sl.so"
>>>>>>>>> loadmodule "tm.so"
>>>>>>>>> loadmodule "rr.so"
>>>>>>>>> loadmodule "maxfwd.so"
>>>>>>>>> loadmodule "textops.so"
>>>>>>>>> loadmodule "uri.so"
>>>>>>>>> loadmodule "mi_fifo.so"
>>>>>>>>> loadmodule "siptrace.so"
>>>>>>>>> loadmodule "sipmsgops.so"
>>>>>>>>> loadmodule "dialog.so"
>>>>>>>>> loadmodule "uac_auth.so"
>>>>>>>>> loadmodule "uac.so"
>>>>>>>>> loadmodule "perl.so"
>>>>>>>>> loadmodule "acc.so"
>>>>>>>>> loadmodule "proto_hep.so"
>>>>>>>>> loadmodule "proto_udp.so"
>>>>>>>>> loadmodule "compression.so"
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Users mailing list
>>>>>>>>> Users at lists.opensips.org
>>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Răzvan Crainea
>>>>>>>> OpenSIPS Core Developer
>>>>>>>>     http://www.opensips-solutions.com
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> Users at lists.opensips.org
>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> Users mailing list
>>>>>> Users at lists.opensips.org
>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>
>>>>>
>>>>
>>>
>>> --
>>> Răzvan Crainea
>>> OpenSIPS Core Developer
>>>    http://www.opensips-solutions.com
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opensips.org
>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> 
> _______________________________________________
> Users mailing list
> Users at lists.opensips.org
> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
> 

-- 
Răzvan Crainea
OpenSIPS Core Developer
   http://www.opensips-solutions.com



More information about the Users mailing list