[OpenSIPS-Users] Chasing down unreleased memory

Jock McKechnie jock.mckechnie at gmail.com
Mon Oct 1 16:31:56 EDT 2018


To complete the thought process from my previous email,

I have now tested three times:
Restart OpenSIPS, ramp sharply at 50 calls per second up to 5500
channels, hold for ~20 minutes, stop new calls, let the existing calls
drop, and then stop OpenSIPS after 6 minutes.

With:
1) compression.so removed
2) compression.so included, but mc_compact() commented out
3) compression.so included, mc_compact() being called on all calls.

Usage is consistent between 1 and 2 - it is clearly calling mc_compact()
that sends memory astray. What has me puzzled is that while shmem shows
the drastic memory usage (peaking at 124MB vs 22MB at the same call
volume), none of the groups I have set up - which should cover pretty
much every module - show a corresponding spike in memory usage. Wherever
the memory is going, it is not being tracked by the statistics debug
feature and attributed to one of the groups.
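
(For reference, the groups are the per-module shm statistics, declared
in opensips.cfg with the mem-group core parameter, along these lines.
I'm quoting the syntax from memory and the group names are only
examples, so check the core parameters documentation; it also only
takes effect with the extra memory-statistics build flags enabled:

    mem-group = "sig": "tm" "sl"
    mem-group = "dialog": "dialog"
)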

Based on the memory dump it looks like everything is in shared
memory. There is a very, very significant difference between the two
dumps:

With compression loaded and mc_compact() called:
-----------------------------
Memory status (shm):
qm_status (0x7f347b9a7000):
 heap size= 2147483648
 used= 80347088, used+overhead=87638120, free=2059845528
 max used (+overhead)= 127204952
 dumping summary of all alloc'ed. fragments:
----------------------------------------------------
total_bytes | num_allocations x [file: func, line]
----------------------------------------------------
       2376 : 1 x [mem/module_info.c: alloc_group_stat, line 156]
       5200 : 321 x [statistics.c: build_stat_name, line 122]
         32 : 1 x [dlg_timer.c: init_dlg_reinvite_ping_timer, line 185]
        312 : 39 x [mi/mi.c: register_mi_cmd, line 174]
         80 : 1 x [siptrace.c: mod_init, line 798]
        320 : 36 x [mem/module_info.c: init_new_stat, line 82]
         80 : 10 x [map.c: map_get, line 150]
         32 : 1 x [map.c: map_create, line 79]
       5184 : 1 x [core_stats.c: init_pkg_stats, line 173]
          8 : 1 x [usr_avp.c: init_extra_avps, line 83]
         72 : 1 x [hep.c: parse_hep_id, line 693]
       1144 : 36 x [statistics.c: register_stat2, line 433]
          8 : 1 x [mem/shm_mem.c: shm_mem_init_mallocs, line 390]
         56 : 7 x [evi/event_interface.c: evi_publish_event, line 75]
        544 : 6 x [timer.c: new_os_timer, line 146]
   80329592 : 50214 x [msg_translator.c: build_req_buf_from_sip_req, line 2287]
          8 : 1 x [usr_avp.c: init_extra_avps, line 74]
        864 : 1 x [core_stats.c: init_pkg_stats, line 174]
          8 : 1 x [timer.c: init_timer, line 83]
        400 : 1 x [evi/event_interface.c: evi_publish_event, line 61]
        184 : 9 x [mem/module_info.c: alloc_group_stat, line 177]
          8 : 1 x [daemonize.c: set_osips_state, line 576]
          8 : 1 x [dlg_timer.c: init_dlg_reinvite_ping_timer, line 192]
        560 : 10 x [map.c: map_get, line 139]
          8 : 1 x [daemonize.c: create_status_pipe, line 92]
----------------------------------------------------


Without compression:
-----------------------------
Memory status (shm):
qm_status (0x7fd92c74b000):
 heap size= 2147483648
 used= 17496, used+overhead=339592, free=2147144056
 max used (+overhead)= 47411616
 dumping summary of all alloc'ed. fragments:
----------------------------------------------------
total_bytes | num_allocations x [file: func, line]
----------------------------------------------------
       2376 : 1 x [mem/module_info.c: alloc_group_stat, line 156]
       5200 : 321 x [statistics.c: build_stat_name, line 122]
         32 : 1 x [dlg_timer.c: init_dlg_reinvite_ping_timer, line 185]
        312 : 39 x [mi/mi.c: register_mi_cmd, line 174]
         80 : 1 x [siptrace.c: mod_init, line 798]
        320 : 36 x [mem/module_info.c: init_new_stat, line 82]
         80 : 10 x [map.c: map_get, line 150]
         32 : 1 x [map.c: map_create, line 79]
       5184 : 1 x [core_stats.c: init_pkg_stats, line 173]
          8 : 1 x [usr_avp.c: init_extra_avps, line 83]
         72 : 1 x [hep.c: parse_hep_id, line 693]
       1144 : 36 x [statistics.c: register_stat2, line 433]
          8 : 1 x [mem/shm_mem.c: shm_mem_init_mallocs, line 390]
         56 : 7 x [evi/event_interface.c: evi_publish_event, line 75]
        544 : 6 x [timer.c: new_os_timer, line 146]
          8 : 1 x [usr_avp.c: init_extra_avps, line 74]
        864 : 1 x [core_stats.c: init_pkg_stats, line 174]
          8 : 1 x [timer.c: init_timer, line 83]
        400 : 1 x [evi/event_interface.c: evi_publish_event, line 61]
        184 : 9 x [mem/module_info.c: alloc_group_stat, line 177]
          8 : 1 x [daemonize.c: set_osips_state, line 576]
          8 : 1 x [dlg_timer.c: init_dlg_reinvite_ping_timer, line 192]
        560 : 10 x [map.c: map_get, line 139]
          8 : 1 x [daemonize.c: create_status_pipe, line 92]
----------------------------------------------------


The obvious thing that stands out:
 80329592 : 50214 x [msg_translator.c: build_req_buf_from_sip_req, line 2287]

That's a big number, so I presume (with my limited understanding) this
is our culprit. I'm just not sure what to do about it.
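
For scale, that works out to roughly one buffer left behind per call
placed during the ~20-minute run at 50 CPS, each about the size of a
built outbound request - a back-of-the-envelope reading, not a confirmed
diagnosis:

    80329592 bytes / 50214 allocations ≈ 1600 bytes per allocation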

Cheers;

 - JP

On Mon, Oct 1, 2018 at 1:14 PM, Jock McKechnie <jock.mckechnie at gmail.com> wrote:
> Greetings;
>
> I should have done this last week but, to be honest, it just didn't
> occur to me. I pulled the compression module out (and commented out
> mc_compact()) and re-ran my tests. Memory usage was massively lower
> (losing ~100kB of memory per minute vs 3-5MB/minute), and within a few
> minutes of calls ceasing almost all of the memory was released (it
> went from 42MB down to 3.8MB within 6 minutes).
>
> So it appears that there's something leaking in the compression module.
>
> I'm going to poke around with a few more tests (include compression.so
> without actually calling it, etc.) but I wanted to pass this on, as
> there appears to be a bug that needs addressing.
>
> Is there anything I can provide that might be useful in debugging this?
>
> As always, thank you for your help.
>
>  - Jock
>
> On Fri, Sep 21, 2018 at 8:12 AM, Jock McKechnie
> <jock.mckechnie at gmail.com> wrote:
>> Thank you very much, Răzvan;
>>
>> We are using the compression module for mc_compact(). We use SIP
>> headers for passing around data (don't ask) and the compact headers
>> are mandatory or else we hit MTU/fragmentation problems. We're on the
>> cusp of moving to SIP/TCP but haven't gotten there quite yet.
>> The mc_compact() lives in the main route[1] block and is simply this, in whole:
>>     mc_compact("P-Asserted-Identity|Remote-Party-ID|X-UCID");
>>
>> And my apologies for not including the shm dump segment - please find it below.
>> https://pastebin.com/itSTER7Y
>>
>> Let me know what else I can provide; this is fairly easy to replicate
>> (although it takes about an hour of runtime pummeling it to get nice,
>> easy-to-read dumps). Thank you, again, very much.
>>
>>  - Jock
>>
>> On Fri, Sep 21, 2018 at 3:01 AM, Răzvan Crainea <razvan at opensips.org> wrote:
>>> Hi, Jock!
>>>
>>> I think you are actually hitting two memory leaks: one in private memory and
>>> one in shared memory.
>>> In the memory dump you sent I can spot a pkg memory leak related to (I
>>> believe) the compression module. Can you tell me what functions of the
>>> compression module you are using and what flags you are using for them?
>>>
>>> Besides that, you claim there is a shared memory leak as well. Can you send
>>> us the logs for the shared memory? They are dumped by the first process
>>> (attendant) and should start with the string "Memory status (shm):".
>>>
>>> Best regards,
>>> Razvan
>>>
>>>
>>> On 9/20/18 9:43 PM, Jock McKechnie wrote:
>>>>
>>>> Greetings OpenSIPS gurus;
>>>>
>>>> I've run into an issue that I'm not making any real headway with on my
>>>> own. I have an OpenSIPS 2.4.2 running on CentOS 7 that does not appear
>>>> to be releasing the bulk of its shared memory when calls die off.
>>>> Eventually OpenSIPS runs itself out of shared memory and goes berserk
>>>> when it can't reclaim memory for new transactions, and I've also seen
>>>> the kernel OOM killer take it down.
>>>>
>>>> The setup is, as mentioned, a 2.4.2 with a chunk of Perl living on
>>>> its back that handles some API-directed call decisions. Because of
>>>> the inevitable delay in the API lookups I am running a large number of
>>>> children (100) to ensure that I don't start dropping packets under
>>>> high load (50 CPS+; until I bumped the children count up I was seeing
>>>> packets being ignored, which got the end user very excited). I
>>>> appreciate that having lots of children amplifies memory usage, but my
>>>> belief is that under regular load on our side (~6000 calls at
>>>> ~30-40 CPS) I should easily get a full day's runtime without running
>>>> into problems. The issue is that because the memory is not released
>>>> overnight, OpenSIPS starts the next day with half its shared memory
>>>> gone, and then things go to pot when it runs through all of its
>>>> remaining shared memory.
>>>>
>>>> I have it configured with a very large (I think) amount of memory to
>>>> attempt to compensate short-term - 2048MB shared memory, 128MB package
>>>> memory. These are not well thought-out numbers, just me throwing RAM at
>>>> the problem. (For reference, this is on a VM with 4 cores and 4096MB of
>>>> RAM.)
>>>>
>>>> I have graphed the memory stats out of the FIFO, after enabling
>>>> memstats and DBG_MALLOC/etc. The graphs can be found here:
>>>> https://imgur.com/a/oapKYJW
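>>>>
>>>> (For anyone wanting to reproduce the graphs without an external FIFO
>>>> poller, something along these lines in opensips.cfg should log the same
>>>> counters periodically - just a sketch, assuming timer_route and the
>>>> $stat() variable behave on 2.4 the way I remember:
>>>>
>>>>     # log shm usage every 10 seconds for graphing
>>>>     timer_route[mem_watch, 10] {
>>>>         xlog("shm used=$stat(used_size) real=$stat(real_used_size) max=$stat(max_used_size)\n");
>>>>     }
>>>> )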
>>>>
>>>> You'll see that the first graph, the main shmem:, shows the usage gets
>>>> tied up and remains after calls die off (these examples ran for 20
>>>> minutes after the calls stopped - but I can leave it overnight and the
>>>> shmem: usage won't drop). That said, the other graphed modules show
>>>> that they release memory correctly after calls stop, as they should.
>>>> None of the remaining modules (i.e., _all_ of the other modules I was
>>>> using in OpenSIPS) showed any significant usage, so I've omitted them.
>>>>
>>>> Interestingly, the 'perl' module never shows any stats, which makes me
>>>> wonder whether the stats for this module work at all. I am not storing
>>>> anything inside Perl besides lots of local session variables, and all
>>>> channel variables are pushed back into OpenSIPS AVPs, but I assume
>>>> there must be some memory overhead, even if it doesn't change
>>>> significantly.
>>>>
>>>> When I shut OpenSIPS down I get child dumps like the following: (I've
>>>> included only one for brevity, but they're all much of a muchness)
>>>> https://pastebin.com/KG2qxxBa
>>>>
>>>> If I'm reading this right (and there's a strong possibility I'm
>>>> misunderstanding what I'm looking at) the memory is tied up in io_wait
>>>> and the parser. I'm not sure what this signifies, however.
>>>>
>>>> So I guess what I'm asking is:
>>>> Am I understanding what I'm looking at correctly - that OpenSIPS is not
>>>> releasing its shared memory, which causes it to run out of memory more
>>>> quickly the following day? (That would line up with the symptom of
>>>> shorter runtimes before failure.)
>>>>
>>>> Based on the dumps, where should I look for the source of this issue?
>>>>
>>>> Finally, and unrelated to the above, I attempted to mail the official
>>>> OpenSIPS peeps to talk about paid support, but none of the 'contact us'
>>>> forms appear to work - when I click on the submit button it changes
>>>> shade but does not appear to act. Certainly I never received any
>>>> response. If there were a direct email address I could send to, that
>>>> would be preferable.
>>>>
>>>> As always, my thanks for your time.
>>>>
>>>>   - Jock
>>>>
>>>> PS -
>>>> List of modules I'm using in this config:
>>>> loadmodule "signaling.so"
>>>> loadmodule "sl.so"
>>>> loadmodule "tm.so"
>>>> loadmodule "rr.so"
>>>> loadmodule "maxfwd.so"
>>>> loadmodule "textops.so"
>>>> loadmodule "uri.so"
>>>> loadmodule "mi_fifo.so"
>>>> loadmodule "siptrace.so"
>>>> loadmodule "sipmsgops.so"
>>>> loadmodule "dialog.so"
>>>> loadmodule "uac_auth.so"
>>>> loadmodule "uac.so"
>>>> loadmodule "perl.so"
>>>> loadmodule "acc.so"
>>>> loadmodule "proto_hep.so"
>>>> loadmodule "proto_udp.so"
>>>> loadmodule "compression.so"
>>>>
>>>> _______________________________________________
>>>> Users mailing list
>>>> Users at lists.opensips.org
>>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users
>>>>
>>>
>>> --
>>> Răzvan Crainea
>>> OpenSIPS Core Developer
>>>   http://www.opensips-solutions.com
>>>
>>> _______________________________________________
>>> Users mailing list
>>> Users at lists.opensips.org
>>> http://lists.opensips.org/cgi-bin/mailman/listinfo/users


