[OpenSIPS-Users] Chasing down unreleased memory
Jock McKechnie
jock.mckechnie at gmail.com
Tue Oct 2 11:07:15 EDT 2018
Here's the pkg memory dump from the test I ran - I'm not able to
identify anything "better" in this output, but I'll leave it up to a
trained eye.
I pulled the opensips-2.4 branch from git, built it with all of the
memory debugging enabled, and then re-ran yesterday's tests.
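For reference, the build itself was nothing exotic - roughly the
following, with the memory-debug flags shown in the "opensips -V"
output quoted below (QM_MALLOC, DBG_MALLOC, SHM_EXTRA_STATS) enabled
via menuconfig; the exact menu entries may differ slightly:

    git clone -b 2.4 https://github.com/OpenSIPS/opensips.git
    cd opensips
    make menuconfig     # enable the memory-debug compile flags
    make all && make install
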
Hopefully this is the right release; if not, let me know.
- Jock
Memory status (pkg):
qm_status (0x7f4cf82d7010):
heap size= 134217728
used= 11682552, used+overhead=12043128, free=122174600
max used (+overhead)= 12067040
INFO:core:sig_usr: signal 15 received
Memory status (pkg):
qm_status (0x7f4cf82d7010):
heap size= 134217728
used= 11682552, used+overhead=12043128, free=122174600
max used (+overhead)= 12067040
dumping summary of all alloc'ed. fragments:
----------------------------------------------------
total_bytes | num_allocations x [file: func, line]
----------------------------------------------------
8 : 1 x [compression.c: build_hdr_masks, line 229]
128 : 1 x [compression_helpers.c: parse_whitelist, line 160]
16 : 1 x [ut.h: pkg_str_extend, line 841]
16 : 2 x [script_var.c: add_var, line 59]
2097152 : 1 x [io_wait.c: init_io_wait, line 568]
45056 : 351 x [cfg.lex: addstr, line 890]
64 : 2 x [rr_cb.c: register_rrcb, line 57]
39000 : 195 x [route_struct.c: mk_action, line 106]
13552 : 105 x [pvar.c: pv_parse_format, line 4119]
272 : 17 x [cfg.y: new_string, line 2703]
496 : 12 x [mod_fix.c: fixup_spve, line 948]
16 : 1 x [siptrace.c: sip_trace_fixup, line 1279]
728 : 1 x [siptrace.c: parse_siptrace_id, line 525]
24 : 1 x [compression_helpers.c: search_hdr, line 122]
160 : 10 x [cfg.y: yyparse, line 999]
864 : 18 x [evi/evi_params.c: evi_param_create, line 42]
248 : 10 x [mem/shm_mem.c: init_shm_post_yyparse, line 516]
6291456 : 1 x [io_wait.c: init_io_wait, line 559]
128 : 1 x [compression.c: set_wh_param, line 420]
96 : 2 x [script_var.c: add_var, line 52]
48 : 2 x [sl_cb.c: register_slcb, line 61]
368 : 9 x [sipmsgops.c: fixup_method, line 886]
8 : 1 x [acc_logic.c: do_acc_fixup, line 1221]
64 : 4 x [socket_info.c: fix_socket_list, line 641]
48 : 1 x [sr_module_deps.c: alloc_module_dep, line 54]
8 : 1 x [compression_helpers.c: parse_whitelist, line 167]
704 : 6 x [route.c: fix_expr, line 189]
48 : 1 x [ipc.c: ipc_register_handler, line 120]
80 : 1 x [mi/mi_trace.c: try_load_trace_api, line 55]
32 : 1 x [transformations.c: tr_parse_nparam, line 2490]
1600 : 20 x [pvar.c: pv_add_extra, line 4690]
8 : 1 x [compression.c: build_hdr_masks, line 242]
1832 : 15 x [route.c: fix_actions, line 750]
144 : 16 x [map.c: map_get, line 150]
32 : 1 x [map.c: map_create, line 79]
3145728 : 1 x [io_wait.c: init_io_wait, line 621]
32 : 1 x [transformations.c: tr_parse_nparam, line 2514]
8192 : 1 x [mi_writer.c: mi_writer_init, line 49]
48 : 1 x [dlg_vals.c: fetch_dlg_value, line 180]
40 : 1 x [flags.c: get_flag_id_by_name, line 202]
64 : 4 x [cfg.y: yyparse, line 514]
192 : 4 x [cfg.y: mk_listen_id, line 2685]
16 : 1 x [acc_logic.c: do_acc_fixup, line 1184]
240 : 6 x [acc_extra.c: add_extra, line 155]
64 : 4 x [socket_info.c: new_sock_info, line 125]
960 : 4 x [socket_info.c: new_sock_info, line 119]
8320 : 80 x [cfg.y: yyparse, line 1456]
24 : 1 x [dlg_hash.c: state_changed_event_init, line 863]
144 : 3 x [transformations.c: parse_transformation, line 2376]
304 : 5 x [script_cb.c: add_callback, line 60]
3744 : 1 x [mi/mi.c: register_mi_cmd, line 146]
864 : 18 x [sr_module.c: register_module, line 150]
104 : 4 x [socket_info.c: fix_socket_list, line 720]
6720 : 120 x [route_struct.c: mk_elem, line 70]
96 : 4 x [cfg.y: yyparse, line 1175]
56 : 7 x [cfg.y: yyparse, line 1236]
24 : 1 x [acc.c: init_acc_evi, line 1094]
768 : 7 x [mod_fix.c: fixup_regexp, line 310]
208 : 2 x [context.c: register_context_destroy, line 66]
32 : 4 x [socket_info.c: fix_socket_list, line 590]
96 : 1 x [context.c: context_alloc, line 119]
8 : 1 x [pvar.c: pv_init_extra_list, line 4626]
104 : 1 x [transformations.c: tr_parse_nparam, line 2476]
160 : 1 x [acc_extra.c: add_tag, line 111]
4104 : 1 x [xlog.c: buf_init, line 69]
1232 : 1 x [compression_helpers.c: wrap_realloc, line 444]
16 : 1 x [io_wait.c: init_io_wait, line 576]
4368 : 78 x [route_struct.c: mk_exp, line 54]
896 : 16 x [map.c: map_get, line 139]
80 : 2 x [pvar.c: new_pv_context, line 4765]
----------------------------------------------------
dumping free list stats :
hash= 380. fragments no.: 1, unused: 0
      bucket size: 3040 - 3040 (first 3040)
hash= 462. fragments no.: 1, unused: 0
      bucket size: 3696 - 3696 (first 3696)
hash= 492. fragments no.: 1, unused: 0
      bucket size: 3936 - 3936 (first 3936)
hash= 1655. fragments no.: 1, unused: 0
      bucket size: 13240 - 13240 (first 13240)
hash= 2061. fragments no.: 1, unused: 0
      bucket size: 67108864 - 134217728 (first 122150688)
-----------------------------
On Tue, Oct 2, 2018 at 8:19 AM, Jock McKechnie <jock.mckechnie at gmail.com> wrote:
> Good morning Răzvan,
>
> I completely understand being busy; I was confident you weren't just
> ignoring me :)
>
> I will build the latest from git with the debugging bits and pieces
> and report back after testing.
>
> I had been running the latest 2.4.2 off yum.opensips.org when I
> originally identified the problem, but have since hand compiled it to
> enable all the debugging widgets:
> version: opensips 2.4.2 (x86_64/linux)
> flags: STATS: On, SHM_EXTRA_STATS, DISABLE_NAGLE, USE_MCAST, SHM_MMAP,
> PKG_MALLOC, QM_MALLOC, DBG_MALLOC, FAST_LOCK-ADAPTIVE_WAIT
> ADAPTIVE_WAIT_LOOPS=1024, MAX_RECV_BUFFER_SIZE 262144, MAX_LISTEN 16,
> MAX_URI_SIZE 1024, BUF_SIZE 65535
> poll method support: poll, epoll, sigio_rt, select.
> git revision: 66114a66c
> main.c compiled on 07:56:12 Sep 20 2018 with gcc 4.8.5
>
> I'll get back to you soon;
>
> - Jock
>
> On Tue, Oct 2, 2018 at 4:00 AM, Răzvan Crainea <razvan at opensips.org> wrote:
>> Hi, Jock!
>>
>> Can you confirm exactly which OpenSIPS version you are running? (opensips -V).
>>
>> Best regards,
>> Razvan
>>
>>
>> On 10/2/18 11:41 AM, Răzvan Crainea wrote:
>>>
>>> Hi, Jock!
>>>
>>> Good news :) I've just figured out the pkg memleak in mc_compact, and I've
>>> already pushed a fix upstream. Please pull the latest version and run a few
>>> tests, and let us know how they go.
>>> In the meantime, I will investigate the shm leak.
>>>
>>> Best regards,
>>> Razvan
>>>
>>> On 10/2/18 10:13 AM, Răzvan Crainea wrote:
>>>>
>>>> Hi, Jock!
>>>>
>>>> I am pretty busy these days, so I couldn't pay too much attention to this
>>>> thread. Nevertheless, I will try to prioritize it this week and get back to
>>>> you.
>>>> I have also tracked down the pkg mem leak to the mc_compact() function,
>>>> but I can't pinpoint the exact issue for now. Although I have some
>>>> assumptions, I can't be sure until I run some tests.
>>>> If I need extra info during my debug, I will definitely let you know, so
>>>> keep an eye on this thread.
>>>>
>>>> Best regards,
>>>> Răzvan
>>>>
>>>> On 10/1/18 9:14 PM, Jock McKechnie wrote:
>>>>>
>>>>> Greetings;
>>>>>
>>>>> I should have done this last week but, to be honest, it just didn't
>>>>> occur to me. I pulled the compression module out (and commented out
>>>>> mc_compact()) and re-ran my tests; the memory usage was massively
>>>>> lower (losing 100k of memory per minute vs 3-5MB/minute), and within
>>>>> a few minutes of calls ceasing almost all of the memory was released
>>>>> (it went from 42MB down to 3.8MB within 6 minutes).
>>>>>
>>>>> So it appears that there's something leaking in the compression module.
>>>>>
>>>>> I'm going to poke around with a few more tests (loading compression.so
>>>>> without actually calling it, etc.) but I wanted to pass this on, as
>>>>> there appears to be a bug that needs addressing.
>>>>>
>>>>> Is there anything I can provide that might be useful in debugging this?
>>>>>
>>>>> As always, thank you for your help.
>>>>>
>>>>> - Jock
>>>>>
>>>>> On Fri, Sep 21, 2018 at 8:12 AM, Jock McKechnie
>>>>> <jock.mckechnie at gmail.com> wrote:
>>>>>>
>>>>>> Thank you very much, Răzvan;
>>>>>>
>>>>>> We are using the compression module for mc_compact(). We use SIP
>>>>>> headers for passing around data (don't ask) and the compact headers
>>>>>> are mandatory or else we hit MTU/fragmentation problems. We're on the
>>>>>> cusp of moving to SIP/TCP but haven't gotten there quite yet.
>>>>>> The mc_compact() call lives in the main route[1] block and is simply
>>>>>> this, in full:
>>>>>> mc_compact("P-Asserted-Identity|Remote-Party-ID|X-UCID");
>>>>>>
>>>>>> And my apologies for not including the shm dump segment - please find
>>>>>> it below.
>>>>>> https://pastebin.com/itSTER7Y
>>>>>>
>>>>>> Let me know what else I can provide; this is fairly easy to replicate
>>>>>> (although it takes about an hour of runtime pummeling it to get nice,
>>>>>> easy-to-read dumps). Thank you, again, very much.
>>>>>>
>>>>>> - Jock
>>>>>>
>>>>>> On Fri, Sep 21, 2018 at 3:01 AM, Răzvan Crainea <razvan at opensips.org>
>>>>>> wrote:
>>>>>>>
>>>>>>> Hi, Jock!
>>>>>>>
>>>>>>> I think you are actually hitting two memory leaks: one in private
>>>>>>> memory and one in shared memory.
>>>>>>> In the memory dump you sent I can spot a pkg mem leak related to (I
>>>>>>> believe) the compression module. Can you tell me what functions of
>>>>>>> the compression module you are using and what flags you are using
>>>>>>> for them?
>>>>>>>
>>>>>>> Besides that, you claim there is a shared memory leak as well. Can
>>>>>>> you post the logs for the shared memory as well? They are dumped by
>>>>>>> the first process (attendant) and should start with the string
>>>>>>> "Memory status (shm):".
>>>>>>>
>>>>>>> Best regards,
>>>>>>> Razvan
>>>>>>>
>>>>>>>
>>>>>>> On 9/20/18 9:43 PM, Jock McKechnie wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> Greetings OpenSIPS gurus;
>>>>>>>>
>>>>>>>> I've run into an issue that I'm not making any real headway with on
>>>>>>>> my own. I have an OpenSIPS 2.4.2 running on CentOS 7 that does not
>>>>>>>> appear to be releasing the bulk of its shared memory when calls die
>>>>>>>> off. Eventually OpenSIPS runs itself out of shared memory and goes
>>>>>>>> berserk when it can't reclaim memory for new transactions; I've also
>>>>>>>> seen the kernel OOM killer take it down.
>>>>>>>>
>>>>>>>> The setup is, as mentioned, a 2.4.2 with a chunk of Perl living on
>>>>>>>> its back that handles some API-directed call decisions. Because of
>>>>>>>> the inevitable delay in the API lookups I am running a large number
>>>>>>>> of children (100) to ensure that I don't start dropping packets
>>>>>>>> under high load (50+ CPS - until I bumped the children count up I
>>>>>>>> was seeing packets being ignored, which got the end user very
>>>>>>>> excitable). I appreciate that having lots of children amplifies
>>>>>>>> memory usage, but my belief is that during regular load on our side
>>>>>>>> (~6000 calls at ~30-40 CPS) I should easily have a full day's
>>>>>>>> runtime without running into problems.
>>>>>>>> The issue is that because the memory is not released overnight, it
>>>>>>>> starts the next day with half its shared memory gone, and then
>>>>>>>> things go to pot when it runs through all of its remaining shared
>>>>>>>> memory.
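>>>>>>>>
>>>>>>>> For reference, that worker count is just the core "children" setting
>>>>>>>> - assuming here it's set in opensips.cfg rather than on the command
>>>>>>>> line - and each worker process carries its own private pkg pool:
>>>>>>>>
>>>>>>>>     children=100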
>>>>>>>>
>>>>>>>> I have it configured for a very large (I think) amount of memory to
>>>>>>>> attempt to compensate short-term - 2048MB shared memory, 128MB
>>>>>>>> package memory. This is not a well thought-out number, just me
>>>>>>>> throwing RAM at the problem. (For reference, this is on a VM running
>>>>>>>> with 4 cores and 4096MB of RAM.)
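>>>>>>>>
>>>>>>>> Those sizes come from the start-up memory options, something along
>>>>>>>> the lines of (the config path is just illustrative):
>>>>>>>>
>>>>>>>>     opensips -m 2048 -M 128 -f /etc/opensips/opensips.cfg
>>>>>>>>
>>>>>>>> where -m is the shared memory pool in MB and -M is the per-process
>>>>>>>> package memory in MB.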
>>>>>>>>
>>>>>>>> I have graphed the memory stats out of the FIFO, after enabling
>>>>>>>> memstats and DBG_MALLOC/etc. The graphs can be found here:
>>>>>>>> https://imgur.com/a/oapKYJW
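>>>>>>>>
>>>>>>>> The samples behind the graphs were pulled over the MI FIFO roughly
>>>>>>>> once a minute, along the lines of:
>>>>>>>>
>>>>>>>>     opensipsctl fifo get_statistics shmem: pkmem:
>>>>>>>>
>>>>>>>> which dumps the whole shared-memory and package-memory statistics
>>>>>>>> groups in one go.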
>>>>>>>>
>>>>>>>> You'll see that the first graph, the main shmem:, shows the usage
>>>>>>>> gets tied up and remains after calls die off (these examples run for
>>>>>>>> 20 minutes after the calls stopped - but I can leave it overnight
>>>>>>>> and the shmem: usage still won't drop). That said, the per-module
>>>>>>>> graphs show memory being released correctly after calls stop, as it
>>>>>>>> should be. None of the other modules graphed (i.e. _all_ of the
>>>>>>>> modules I was using in OpenSIPS) showed any significant usage, so
>>>>>>>> I've omitted them.
>>>>>>>>
>>>>>>>> Interestingly, the 'perl' module never shows any stats, which makes
>>>>>>>> me wonder whether the stats for this module work at all. I am not
>>>>>>>> storing anything inside Perl besides lots of local session
>>>>>>>> variables, and all channel variables are pushed back into OpenSIPS
>>>>>>>> AVPs, but I assume there must be some memory overhead, even if it
>>>>>>>> does not change significantly.
>>>>>>>>
>>>>>>>> When I shut OpenSIPS down I get child dumps like the following: (I've
>>>>>>>> included only one for brevity, but they're all much of a muchness)
>>>>>>>> https://pastebin.com/KG2qxxBa
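>>>>>>>>
>>>>>>>> These per-process dumps only show up because of the DBG_MALLOC
>>>>>>>> build; from memory the relevant globals are roughly the following
>>>>>>>> (the memdump log level has to be at or below the running debug
>>>>>>>> level - worth double-checking against the 2.4 core docs):
>>>>>>>>
>>>>>>>>     debug=6
>>>>>>>>     memdump=5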
>>>>>>>>
>>>>>>>> If I'm reading this right (and there's a strong possibility I'm
>>>>>>>> misunderstanding what I'm looking at) the memory is tied up in
>>>>>>>> io_wait and the parser. I'm not sure what this signifies, however.
>>>>>>>>
>>>>>>>> So I guess what I'm asking is:
>>>>>>>> Am I understanding what I'm looking at correctly - that OpenSIPS is
>>>>>>>> not releasing its shared memory, which causes it to run out of
>>>>>>>> memory more quickly the following day? (This lines up with the
>>>>>>>> symptom of shorter runtimes before each failure.)
>>>>>>>>
>>>>>>>> Based on the dumps where should I look for the source of this issue?
>>>>>>>>
>>>>>>>> Finally, and unrelated to the above, I attempted to mail the
>>>>>>>> official OpenSIPS peeps to talk about paid support, but none of the
>>>>>>>> 'contact us' submit forms appear to work - when I click on the
>>>>>>>> submit button it changes shade but does not appear to act, and I
>>>>>>>> never received any response. If there were a direct email address I
>>>>>>>> could send to, that would be preferable.
>>>>>>>>
>>>>>>>> As always, my thanks for your time.
>>>>>>>>
>>>>>>>> - Jock
>>>>>>>>
>>>>>>>> PS -
>>>>>>>> List of modules I'm using in this config:
>>>>>>>> loadmodule "signaling.so"
>>>>>>>> loadmodule "sl.so"
>>>>>>>> loadmodule "tm.so"
>>>>>>>> loadmodule "rr.so"
>>>>>>>> loadmodule "maxfwd.so"
>>>>>>>> loadmodule "textops.so"
>>>>>>>> loadmodule "uri.so"
>>>>>>>> loadmodule "mi_fifo.so"
>>>>>>>> loadmodule "siptrace.so"
>>>>>>>> loadmodule "sipmsgops.so"
>>>>>>>> loadmodule "dialog.so"
>>>>>>>> loadmodule "uac_auth.so"
>>>>>>>> loadmodule "uac.so"
>>>>>>>> loadmodule "perl.so"
>>>>>>>> loadmodule "acc.so"
>>>>>>>> loadmodule "proto_hep.so"
>>>>>>>> loadmodule "proto_udp.so"
>>>>>>>> loadmodule "compression.so"
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Users mailing list
>>>>>>>> Users at lists.opensips.org
>>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Răzvan Crainea
>>>>>>> OpenSIPS Core Developer
>>>>>>> http://www.opensips-solutions.com
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Users mailing list
>>>>>>> Users at lists.opensips.org
>>>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> Users at lists.opensips.org
>>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>>
>>>>
>>>
>>
>> --
>> Răzvan Crainea
>> OpenSIPS Core Developer
>> http://www.opensips-solutions.com
>>
>> _______________________________________________
>> Users mailing list
>> Users at lists.opensips.org
>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users