 PID USER PR NI  VIRT  RES  SHR S %CPU %MEM      TIME+ SWAP COMMAND
5117 root 20  0 5121m 498m 1652 S   97  6.2  643:45.52 4.5g zfs-fuse
This makes no sense to me. I have 8G of real RAM, and the zfs-fuse
process seems to have no limits whatsoever that would cause this
problem:
$ sudo cat /proc/5117/limits
[sudo] password for dave:
Limit                     Soft Limit  Hard Limit  Units
Max cpu time              unlimited   unlimited   ms
Max file size             unlimited   unlimited   bytes
Max data size             unlimited   unlimited   bytes
Max stack size            8388608     unlimited   bytes
Max core file size        0           unlimited   bytes
Max resident set          unlimited   unlimited   bytes
Max processes             77824       77824       processes
Max open files            1024        1024        files
Max locked memory         32768       32768       bytes
Max address space         unlimited   unlimited   bytes
Max file locks            unlimited   unlimited   locks
Max pending signals       77824       77824       signals
Max msgqueue size         819200      819200      bytes
Max nice priority         0           0
Max realtime priority     0           0
Max realtime timeout      unlimited   unlimited   us
I haven't done anything significant other than install and run iozone
since I booted this system.
What in tarnation is going on here?!
--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com
That's not specific to zfs-fuse. The linux kernel memory manager is
keeping track of what's actually in use as well as what has been
requested. Many/most processes will have virtual sizes bigger than
their resident amount. With zfs-fuse, perhaps it's a bit dramatic, but
it's nothing to worry about. The EDA tools we use at work sometimes
look like that (except their virtual size can be >32GB, with 9-12G RSS).
Check vmstat and see if the system is actually swapping -- I bet it isn't.
Jonathan
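Following the vmstat suggestion above, a quick way to check for actual swap
activity (standard procps/procfs tools, nothing zfs-fuse-specific):

```shell
# The si/so columns show pages swapped in/out per interval; all zeros
# means the box isn't actually swapping, whatever top's SWAP column says.
vmstat 1 5

# The same counters, straight from the kernel, if vmstat isn't handy:
grep -E '^pswp(in|out) ' /proc/vmstat
```

If si/so stay at zero while iozone is hammering the pool, the big VIRT
figure never touched swap.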
> David Abrahams wrote:
>> I'm looking at my zfs-fuse process and noticing that it has a virtual
>> memory size of 4.5G of which only 500M is resident.
>>
>> What in tarnation is going on here?!
>
> That's not specific to zfs-fuse. The linux kernel memory manager is
> keeping track of what's actually in use as well as what has been
> requested. Many/most processes will have virtual sizes bigger than
> their resident amount. With zfs-fuse, perhaps it's a bit dramatic,
Yeah, a bit! It's my understanding that ZFS uses large amounts of
memory to gain speed. I've given it nearly 8G to work with and I'm
stressing it as hard as I can. Why is it only using 500M?
> but it's nothing to worry about. The EDA tools we use at work
> sometimes look like that (except their virtual size can be >32GB, with
> 9-12G RSS). Check vmstat and see if the system is actually swapping
> -- I bet it isn't.
There's no swap activity, but I wasn't really worried that there was.
I'm more concerned that I may not be getting the speed I deserve ;-)
zfs, yes.
It uses most available memory for ARC (i.e. cache).
> I've given it nearly 8G to work with and
> stressing it as hard as I can. Why is it only using 500M?
>
Because zfs-fuse by default only uses 128MB for the ARC, and that's a hard-coded limit.
--
FAN
Then I still wonder: what are the other 4G it has allocated, but which
are now swapped out?
Careful, you are making an assumption here. I would actually guess that
the other 4G was allocated and *never used*, rather than being swapped
out. Like I said, the Linux kernel pays attention to actual memory usage,
and won't necessarily give processes physical RAM pages just because they
ask for them.
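That asked-for-versus-actually-backed distinction is visible system-wide,
not just per process: Linux overcommits by default, so the total committed
address space can exceed physical RAM without any swapping. A quick look
(standard Linux procfs paths):

```shell
# Committed_AS is the combined address space all processes have asked
# for; under the default heuristic overcommit it may exceed MemTotal
# without a single page of swap being touched.
grep -E '^(MemTotal|Committed_AS):' /proc/meminfo
cat /proc/sys/vm/overcommit_memory   # 0 = heuristic overcommit (the default)
```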
For comparison, I summed up the processes running on my VNC server and
here's what I get (NOTE: no zfs-fuse running on this machine. Just
standard Linux software):
Virtual RSS
8394080 2850132
So the virtual size is just about 3x of what's actually resident. Feel
free to wonder what that extra 4GB is for, but don't let it worry you.
The kernel memory manager is good at keeping everyone happy.
Jonathan
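The same tally Jonathan did by hand can be reproduced with a one-liner
(ps and awk from standard procps/coreutils):

```shell
# Sum VSZ and RSS (both reported in KiB) over every process, the same
# Virtual-vs-RSS comparison made above for the VNC server.
ps -eo vsz=,rss= | awk '{ v += $1; r += $2 }
                        END { printf "Virtual %d  RSS %d\n", v, r }'
```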
>>>> Yeah, a bit! It's my understanding that ZFS uses large amounts of
>>>> memory to gain speed.
>>>
>>> zfs, yes.
>>> It uses most available memory for ARC (i.e. cache).
>>>
>>>> I've given it nearly 8G to work with and
>>>> stressing it as hard as I can. Why is it only using 500M?
>>>>
>>>
>>> Because zfs-fuse by default only uses 128MB for the ARC, and that's a
>>> hard-coded limit.
>>
>> Then I still wonder: what are the other 4G it has allocated, but which
>> are now swapped out?
>
> Careful, you are making an assumption here. I would actually guess that
> the other 4G was allocated and *never used*, rather than being swapped
> out.
I did consider that possibility. Thanks for suggesting that it might be
real.
> Like I said, the Linux kernel pays attention to actual memory usage,
> and won't necessarily give processes physical RAM pages just because they
> ask for them.
Ayup.
> For comparison, I summed up the processes running on my VNC server and
> here's what I get (NOTE: no zfs-fuse running on this machine. Just
> standard Linux software):
>
> Virtual RSS
> 8394080 2850132
>
> So the virtual size is just about 3x of what's actually resident. Feel
> free to wonder what that extra 4GB is for, but don't let it worry you.
> The kernel memory manager is good at keeping everyone happy.
Okay, thanks.
>>>> Yeah, a bit! It's my understanding that ZFS uses large amounts of
>>>> memory to gain speed.
>>>
>>> zfs, yes.
>>> It uses most available memory for ARC (i.e. cache).
>>>
>>>> I've given it nearly 8G to work with and
>>>> stressing it as hard as I can. Why is it only using 500M?
>>>>
>>>
>>> Because zfs-fuse by default only uses 128MB for the ARC, and that's a
>>> hard-coded limit.
>>
>> Then I still wonder: what are the other 4G it has allocated, but which
>> are now swapped out?
>
> Careful, you are making an assumption here. I would actually guess that
> the other 4G was allocated and *never used*, rather than being swapped
> out. Like I said, the Linux kernel pays attention to actual memory usage,
> and won't necessarily give processes physical RAM pages just because they
> ask for them.
Sure, but I still don't understand why ZFS would allocate 4G (even if it
never wires most of it down) if it is only going to use roughly 128M
because that's the ARC limit.
> This "ZFS-FUSE allocates much more memory than it needs" excuse is
> bullcrap.
Sorry, but I'm a little lost. Could you explain:
* who is making that excuse?
* what bad thing are they attempting to excuse?
> Here is my 4GB RAM system today:
>
> 1567 32 0 1109K 638.8M 197.7M 0K 0K 6% zfs-fuse
>
> May I remind you that those 638 MB VSS were ALMOST THREE GIGABYTES,
> before I made a single change:
>
> ulimit -v unlimited
> ulimit -c 512000
> ulimit -l unlimited
> ulimit -s unlimited
>
> remove the stack limit and you should see a STAGGERING decrease in
> memory usage.
Sooo... you *removed* the stack size limit and memory usage went down?
That is *really* confusing. Do you understand why it's happening?
>> Sooo... you *removed* the stack size limit and memory usage went down?
>> That is *really* confusing. Do you understand why it's happening?
>
> Probably related to this:
>
> http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory
Well, naturally ;-)
>> Sorry, but I'm a little lost. Could you explain:
>>
>> * who is making that excuse?
>> * what bad thing are they attempting to excuse?
>
> Someone above in this thread, saying that ZFS gobbles up memory like a
> professional callgirl gobbles up money. Truth be told, I used to
> think that, then some day I tinkered with the ulimits, removing them,
> and BLAM, "whoa, baby, have you lost weight?". And it really helped,
> man -- I was running 1 GB RAM back in those days.
>
> I honestly do not know how sharing that tip slipped my mind.
Maybe we need a wiki page of ZFS-Fuse tips; they're starting to
accumulate.
I read that article and I don't see anything about restricting stack size
causing increased memory usage (even virtual). I had thought that the
only thing affecting the virtual size of a process was how much memory it
allocates.
> ulimit -v unlimited
> ulimit -c 512000
> ulimit -l unlimited
> ulimit -s unlimited
>
> remove the stack limit and you should see a STAGGERING decrease in
> memory usage.
And exactly where do you make that change? If I do that in my init
script, it doesn't seem to affect the actual zfs-fuse process.
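One placement that should work, sketched below with illustrative paths
(the daemon's location and the init style vary by distribution): the
ulimit calls have to run in the same shell that execs zfs-fuse, before it
starts, because rlimits are inherited across fork/exec and can't be
injected into an already-running process from another terminal.

```shell
# Inside the init script, immediately before launching the daemon.
# Children inherit these limits; a daemon that is already running
# is unaffected, which would explain the behavior described above.
# (/usr/sbin/zfs-fuse is an assumed path -- adjust for your distro.)
ulimit -s unlimited     # the stack-limit trick from this thread
ulimit -v unlimited
/usr/sbin/zfs-fuse
```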
I challenge those assumptions. If it is true that most of the 4GB of
zfs-fuse's VSS is just the extra preallocated stack space for each thread,
then that immediately answers the question of "why does zfs-fuse allocate
memory that it doesn't use?" Answer: it doesn't, fair enough. However,
in that case, it will not be using any extra physical RAM, it won't swap
more, and each context switch will be the same speed.
I'm not claiming that it's not worth setting the stack limit to unlimited
to see the VSS drop, nor am I claiming that the enormous VSS hasn't caused
problems in some situations (see the guy who was running many pools and
having trouble). I do maintain that the kernel will do "the right
thing" in most situations, and that virtual process size is not a
finite resource that really gets used up.
Anyway, go ahead and fix it, but (I think) there are more productive uses
of everyone's time. This is really more of a discussion for the Linux
kernel VM folks. (Say, to ask why each new thread gets an 8MB
stack pre-allocated for it. Perhaps the answer would be "because it
doesn't cost anything.")
Jonathan
The right question, though, is "why does zfs-fuse need so many threads
even when it does nothing?" Right after startup, there are 156 of them
here... Most of them on __lll_lock_wait and pthread_cond_wait.
Mike
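The thread count, and the stack address space it implies, is easy to
measure from procfs. This sketch defaults to inspecting the current
shell, since a zfs-fuse PID is machine-specific; substitute
`$(pgrep -x zfs-fuse)` to check the daemon:

```shell
# Count a process's threads and estimate the address space their stacks
# reserve (threads x soft stack limit, both in KiB).
pid=$$                                  # or: pid=$(pgrep -x zfs-fuse)
threads=$(ls /proc/$pid/task | wc -l)
stack_kib=$(ulimit -s)                  # "unlimited" would need special-casing
echo "$threads threads x $stack_kib KiB stack = $((threads * stack_kib)) KiB reserved"
```

For 156 threads at the common 8192 KiB soft limit, that is roughly the
1.2GB of virtual space being discussed, none of it necessarily resident.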
Really? Does it matter? Honestly I doubt it. From what perspective
does that cause you alarm?
Jonathan
Well, that's 156 times whatever stack size you use, allocated for
not much... I know memory is cheap nowadays, but that's not a valid
enough reason IMHO.
Mike
There is no excuse for wasting memory just because it's cheap. However,
the valid reason in this case is the architecture/design of the
implementation. Obviously it was chosen to be heavy on the use of
threads. Keep in mind, threads are very lightweight entities. Ignore
their default virtual size -- all that matters is the amount of memory
they actually use. Do you think a less-threaded implementation could be
written that would perform similarly and use less memory? What about
scaling across large numbers of cores? Besides, you are talking about a
total of a handful of kB worth of potential overhead. I'll say again,
your time is better spent elsewhere.
> So today I tried the "unlimited" trick and memory went down again. I
> suspect that with stacksize unlimited, the kernel is using a different
> algorithm for computing default stack size. I'll have to investigate.
One more data point: I currently have no ZFS pools set up. Just
starting zfs-fuse without 'ulimit -s unlimited' reserves 431m (and uses
19m). With the ulimit setting, it reserves 133m (and still uses 19m).
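One plausible mechanism, offered as an assumption rather than something
established in this thread: glibc's NPTL takes the default pthread stack
size from the soft RLIMIT_STACK, and when that limit is "unlimited" a
different built-in default applies instead, which would match the "different
algorithm" suspicion above. Comparing what a child shell sees each way:

```shell
# Soft stack limit as inherited by children, in KiB.
sh -c 'ulimit -s'                        # distro default, commonly 8192
sh -c 'ulimit -s unlimited; ulimit -s'   # what the daemon sees after the trick
```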
  PID USER PR NI  VIRT  RES  SHR S %CPU %MEM      TIME+ COMMAND
11288 root 15  0 1186m 441m 1992 S   49  5.5  202:07.48 slashd