Memory usage problems in a server backend program

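A server backend program I am developing has the following problem. The program continuously sends and receives messages and keeps statistics on them. The statistics live in a nested STL map (a map whose values are themselves maps), and data is inserted into it constantly. After every statistics pass I call clear() to release the memory, but it does not seem to help: the machine still looks as if memory is leaking. From what I have read, map's clear() empties the map but does not return the memory to the system; the memory stays cached in a memory pool so it can be reused later. One suggestion was to create a new map, swap() it with the old one so the two exchange contents, and then delete the new map to release the memory, but that did not work. I also wondered whether, with nested maps, I need to clear() each inner map first and then clear() the outer map, but that did not work either. Does the STL really have this problem?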
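The swap suggestion above can be sketched as follows. This is only an illustration under stated assumptions, not the code from the server: `StatTable` and `release_stats` are hypothetical names, and `malloc_trim` is a glibc-specific extension that asks the allocator to hand free heap pages back to the kernel. Note that std::map keeps no spare capacity, so clear() already destroys every node (inner maps included) and passes the memory back to the allocator; whether the allocator then returns it to the operating system is a separate question, which is what the `malloc_trim` call probes.

```cpp
#include <cassert>
#include <map>
#include <malloc.h>   // malloc_trim(): glibc extension, not standard C++

// Hypothetical shape of the statistics table: outer key -> (inner key -> count).
typedef std::map<unsigned int, std::map<unsigned int, unsigned int> > StatTable;

// Empty the table with the swap idiom, then ask glibc to give free heap
// pages back to the kernel.
// Returns malloc_trim()'s result: 1 if some memory was released, 0 otherwise.
int release_stats(StatTable& stats)
{
    {
        StatTable empty;
        empty.swap(stats);   // stats is now empty; the old nodes belong to 'empty'
    }                        // 'empty' is destroyed here, freeing outer and inner nodes
    return malloc_trim(0);   // 0 = keep no extra padding at the top of the heap
}
```

A caller would run `release_stats(stats)` after each statistics pass. A return value of 0 does not indicate a leak; it only means glibc found nothing it could give back at that moment.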

Here is the output of cat /proc/meminfo on the machine:

MemTotal:       23929284 kB
MemFree:         1238556 kB
Buffers:          198708 kB
Cached:         16489584 kB
SwapCached:            0 kB
Active:          7130900 kB
Inactive:       10959804 kB
Active(anon):    1402564 kB
Inactive(anon):     2504 kB
Active(file):    5728336 kB
Inactive(file): 10957300 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      10485752 kB
SwapFree:       10485752 kB
Dirty:               312 kB
Writeback:             0 kB
AnonPages:       1402460 kB
Mapped:            19048 kB
Shmem:              2616 kB
Slab:            2303932 kB
SReclaimable:    1053828 kB
SUnreclaim:      1250104 kB
KernelStack:        1896 kB
PageTables:        15248 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    22450392 kB
Committed_AS:    2153752 kB
VmallocTotal:   34359738367 kB
VmallocUsed:       71416 kB
VmallocChunk:   34359661160 kB
HardwareCorrupted:     0 kB
AnonHugePages:         0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:    24576000 kB
DirectMap2M:           0 kB
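One rough way to read a dump like this is to add MemFree, Buffers and Cached, since buffer and page cache memory can usually be reclaimed when a process needs it. The sketch below is an illustration only: `parse_meminfo` and `roughly_available_kb` are hypothetical helper names, and the sum is an optimistic estimate (Cached also includes Shmem, which is not reclaimable in the same way). With the figures above it gives 1238556 + 198708 + 16489584 = 17926848 kB, about 17 GiB, even though MemFree alone is only about 1.2 GB.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Parse "Name:   12345 kB" lines into a name -> kB table.
// Lines without a colon or without a leading number are skipped.
std::map<std::string, long> parse_meminfo(const std::string& text)
{
    std::map<std::string, long> fields;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        std::string::size_type colon = line.find(':');
        if (colon == std::string::npos)
            continue;
        std::istringstream value(line.substr(colon + 1));
        long kb = 0;
        if (value >> kb)
            fields[line.substr(0, colon)] = kb;
    }
    return fields;
}

// Optimistic estimate of memory the kernel could still hand out:
// free pages plus buffer cache plus page cache. Missing fields count as 0.
long roughly_available_kb(const std::string& text)
{
    std::map<std::string, long> f = parse_meminfo(text);
    return f["MemFree"] + f["Buffers"] + f["Cached"];
}
```

On a live system the text would come from reading /proc/meminfo into a string; here the function takes the text as a parameter so it can be tried on a saved dump.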

Red Hat's explanation of the individual fields is quoted below: http://www.redhat.com/advice/tips/meminfo.html

"Free," "buffer," "swap," "dirty." What does it all mean? If you said, "something to do with the Summer of '68", you may need a primer on 'meminfo'.

The entries in the /proc/meminfo can help explain what's going on with your memory usage, if you know how to read it.

Example of "cat /proc/meminfo":

root:	total:    	used:    	free:  		shared:	buffers:	cached:
Mem:  	1055760384	1041887232	13873152	0	100417536 	711233536
Swap: 	1077501952  	8540160 	1068961792
						
MemTotal:		1031016 kB	
MemFree:		13548 kB
MemShared:		0 kB
Buffers:		98064 kB
Cached:			692320 kB
SwapCached:		2244 kB
Active:			563112 kB
Inact_dirty:		309584 kB
Inact_clean:		79508 kB
Inact_target:		190440 kB
HighTotal:		130992 kB
HighFree:		1876 kB
LowTotal:		900024 kB
LowFree:		11672 kB
SwapTotal:		1052248 kB
SwapFree:		1043908 kB
Committed_AS:		332340 kB
						

The information comes in the form of both high-level and low-level statistics. At the top you see a quick summary of the most common values people would like to look at. Below you find the individual values we will discuss. First we will discuss the high-level statistics.

High-Level Statistics

  • MemTotal: Total usable ram (i.e. physical ram minus a few reserved bits and the kernel binary code)
  • MemFree: Is sum of LowFree+HighFree (overall stat)
  • MemShared: 0; is here for compat reasons but always zero.
  • Buffers: Memory in buffer cache. mostly useless as metric nowadays
  • Cached: Memory in the pagecache (diskcache) minus SwapCache
  • SwapCache: Memory that once was swapped out, is swapped back in but still also is in the swapfile (if memory is needed it doesn't need to be swapped out AGAIN because it is already in the swapfile. This saves I/O)

Detailed Level Statistics
VM Statistics

VM splits the cache pages into "active" and "inactive" memory. The idea is that if you need memory and some cache needs to be sacrificed for that, you take it from inactive since that's expected to be not used. The vm checks what is used on a regular basis and moves stuff around.

When you use memory, the CPU sets a bit in the pagetable and the VM checks that bit occasionally, and based on that, it can move pages back to active. And within active there's an order of "longest ago not used" (roughly, it's a little more complex in reality). The longest-ago used ones can get moved to inactive. Inactive is split into two in the above kernel (2.4.18-24.8.0). Some have it three.

  • Active: Memory that has been used more recently and usually not reclaimed unless absolutely necessary.
  • Inact_dirty: Dirty means "might need writing to disk or swap." Takes more work to free. Examples might be files that have not been written to yet. They aren't written to memory too soon in order to keep the I/O down. For instance, if you're writing logs, it might be better to wait until you have a complete log ready before sending it to disk.
  • Inact_clean: Assumed to be easily freeable. The kernel will try to keep some clean stuff around always to have a bit of breathing room.
  • Inact_target: Just a goal metric the kernel uses for making sure there are enough inactive pages around. When exceeded, the kernel will not do work to move pages from active to inactive. A page can also get inactive in a few other ways, e.g. if you do a long sequential I/O, the kernel assumes you're not going to use that memory and makes it inactive preventively. So you can get more inactive pages than the target because the kernel marks some cache as "more likely to be never used" and lets it cheat in the "last used" order.

Memory Statistics

  • HighTotal: is the total amount of memory in the high region. Highmem is all memory above (approx) 860MB of physical RAM. Kernel uses indirect tricks to access the high memory region. Data cache can go in this memory region.
  • LowTotal: The total amount of non-highmem memory.
  • LowFree: The amount of free memory of the low memory region. This is the memory the kernel can address directly. All kernel datastructures need to go into low memory.
  • SwapTotal: Total amount of physical swap memory.
  • SwapFree: Total amount of swap memory free.
  • Committed_AS: An estimate of how much RAM you would need to make a 99.99% guarantee that there never is OOM (out of memory) for this workload. Normally the kernel will overcommit memory. That means, say you do a 1GB malloc, nothing happens, really. Only when you start USING that malloc memory you will get real memory on demand, and just as much as you use. So you sort of take a mortgage and hope the bank doesn't go bust. Other cases might include when you mmap a file that's shared only when you write to it and you get a private copy of that data. While it normally is shared between processes. The Committed_AS is a guesstimate of how much RAM/swap you would need worst-case.
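The overcommit behaviour described under Committed_AS (a large malloc costs almost nothing until the memory is actually used) can be observed with a small experiment. This is a minimal sketch, assuming Linux with /proc mounted; `resident_kb`, `RssSample` and `touch_demo` are hypothetical names, and the 4096-byte stride assumes pages are at least that large.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>

// Resident set size of this process in kB, read from /proc/self/status.
// Linux only; returns -1 if the VmRSS field cannot be found.
long resident_kb()
{
    std::ifstream in("/proc/self/status");
    std::string token;
    while (in >> token) {
        if (token == "VmRSS:") {
            long kb = -1;
            in >> kb;
            return kb;
        }
    }
    return -1;
}

struct RssSample {
    long before_kb;        // before the allocation
    long after_malloc_kb;  // after malloc(), memory not yet touched
    long after_touch_kb;   // after writing to every page
};

// Allocate 'bytes', then write one byte per 4096-byte step so that each
// page gets backed by real memory, sampling the resident size at each stage.
RssSample touch_demo(std::size_t bytes)
{
    RssSample s;
    s.before_kb = resident_kb();
    volatile char* p = static_cast<volatile char*>(std::malloc(bytes));
    s.after_malloc_kb = resident_kb();
    if (p) {
        for (std::size_t i = 0; i < bytes; i += 4096)
            p[i] = 1;      // volatile write, so the loop is not optimized away
    }
    s.after_touch_kb = resident_kb();
    std::free(const_cast<char*>(p));
    return s;
}
```

Calling `touch_demo(64u << 20)` should show the resident size barely moving after malloc() and rising by roughly the allocated amount only after the pages are written.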

I could not solve this problem and it has been bothering me. Any help would be appreciated.


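After several days of tracking, I think I can now explain the problem. Last night I estimated how the machine's 24 GB of memory was being consumed; at the rate I had observed, it should have been about to run out. Running out would trigger an alarm, and an alarm would get me scolded, so I decided to wait for it and watch the World Cup in the meantime. The match ended and free memory was still hovering at around 100 MB without ever running out, so I went to bed.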

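When I woke up this morning, it seemed the machine always had memory available and showed no sign of heading toward out of memory. Then it struck me: back when I first suspected a leak, my mentor had me write the simplest possible program using an STL map that keeps inserting data into the map, calls clear(), inserts again, calls clear() again, and watches how memory grows. The result then was that memory stopped growing after the clear(), but I paid no attention to it at the time. Today I thought: since the system apparently always has memory available, perhaps my server program has already returned the memory to the system, and Linux simply has not marked the pages as unallocated. If so, I could use such a test map program to grab the memory first, then kill it, and check in top whether the free figure goes up. I wrote the simplest possible program: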

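#include <iostream>
#include <map>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>	// sleep()
using namespace std;

// Was "#define unsigned int uint", which redefines the keyword "unsigned"
// instead of introducing the name "uint".
typedef unsigned int uint;

int main()
{
	map<uint, uint> t;
	uint td = 0;
	uint m = 0;
	// Keep inserting so the process grabs as much memory as it can.
	while(1)
	{
		t[td++] = 1;

		if(td == 0xffffffff)
			break;
		m ++;
		// Disabled: clear the map periodically to see whether memory stops growing.
//		if(m == 0xffffff)
//		{
//			m = 0;
//			t.clear();
//		}
//		printf("%u\n", td);
	}
	// Stay alive so the memory usage can be inspected in top before the kill.
	while(1)
		sleep(10);

}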

It simply keeps inserting data so we can see how much memory it takes. It is now using 4 GB.

Then kill it.

As you can see, the memory was returned.


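I described the situation to my mentor, who asked why I had concluded there was no memory leak. I said that, first, the system never reported out of memory, so it always had memory available; and memory was being consumed at a steady rate rather than shooting up as soon as the program started, so my program's allocations are under control. My guess was that the program keeps allocating memory in the way described in this post (http://blog.csdn.net/macky0668/article/details/4552289):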

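Next I want to talk about the problem of memory holes:

Consider this scenario: a block at the top of the heap is still in use, while a large contiguous region below it has already been freed. Can that region be released? Can the physical memory behind it be released?

Unfortunately, no.

In other words, as long as the allocation at the top of the heap is still in use, no matter how much memory I free below it, none of it is returned to the system, and it keeps occupying physical memory. Why is that?

The main reason is that the kernel handles the heap in an overly simple way: it can only adjust the linear region occupied by the program by moving the heap-top pointer, and it can only release memory by adjusting that linear region. So as long as the heap top does not come down, the occupied memory is not released.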
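The pinned heap top described in the quoted post can be observed by watching the program break with sbrk(0). The following is a minimal sketch, assuming Linux with the default glibc malloc and default tuning; `BreakSample` and `heap_hole_demo` are hypothetical names. It uses small blocks on purpose, because glibc serves large requests (by default above 128 KiB) with mmap, and those are returned to the system individually when freed. It also measures only the program break: glibc may still release pages in the middle of the heap by other means (for example via malloc_trim), so this shows the heap top staying put, not that the freed pages must stay resident.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>
#include <unistd.h>   // sbrk()

struct BreakSample {
    char* at_peak;             // program break with every block allocated
    char* after_partial_free;  // after freeing everything except the last block
    char* after_full_free;     // after freeing the last (topmost) block too
};

// Allocate 'blocks' chunks of 'block_size' bytes (blocks >= 1), then free
// them in two stages and record the program break after each stage.
BreakSample heap_hole_demo(std::size_t blocks, std::size_t block_size)
{
    std::vector<void*> ptrs;
    ptrs.reserve(blocks);              // allocate the bookkeeping up front
    for (std::size_t i = 0; i < blocks; ++i)
        ptrs.push_back(std::malloc(block_size));

    BreakSample s;
    s.at_peak = static_cast<char*>(sbrk(0));

    // Free everything below the last block: this leaves one big hole
    // underneath a block that is still in use at the top of the heap.
    for (std::size_t i = 0; i + 1 < blocks; ++i)
        std::free(ptrs[i]);
    s.after_partial_free = static_cast<char*>(sbrk(0));

    // Free the topmost block: now the allocator is able to lower the break.
    std::free(ptrs[blocks - 1]);
    s.after_full_free = static_cast<char*>(sbrk(0));
    return s;
}
```

With something like `heap_hole_demo(5000, 4096)` (about 20 MB), the break is expected to stay where it was after the partial free and to drop only once the last block is released.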

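I think my program falls into this category. The memory is probably not really being held, and may already have been returned to the system; the system just has not marked it as unallocated, which is why top shows less and less free memory. Most likely the program keeps allocating at the top of the heap, and either the memory is not handed back to the system, or it is handed back but the system has not marked it as reclaimed. Either way, when a new process keeps asking for memory, the system always manages to shuffle things around and find room for it, which is why I wrote the map test program. The test showed that memory could always be squeezed out for the new process, and that after killing a.out the memory really was returned to the system: the free figure in top went up a lot. In addition, the process's memory usage stayed stable for the whole time the program was running. So I believe the machine's shrinking free memory is not caused by a memory leak but by the STL's memory pool mechanism or by Linux's memory management. The system is also unlikely to hit an oops-level error; what may be affected is only its responsiveness and the number of memory allocations and reclaims.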

So I think I can now conclude that the program has no memory leak.

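However, two questions still need to be confirmed. First, is the steady growth in allocated memory caused by the STL's memory pool mechanism or by Linux's memory management (Linux tries to hand out as much memory as it can, unlike Windows, which sets its own ratio between virtual memory and physical memory)? Second, has this ever-growing memory actually not been returned to the system, or has it been returned while the system simply has not marked it as free? Answering these will take a read through the books STL源码剖析 and Understanding the Linux Kernel (深入理解Linux内核).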

Anyway, what looked like a bug-level problem is no longer so urgent.