Huge virtual memory use; mostly swapped out


David Abrahams

Feb 8, 2009, 2:08:42 PM
to zfs-...@googlegroups.com

I'm looking at my zfs-fuse process and noticing that it has a virtual
memory size of 4.5G of which only 500M is resident.

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  SWAP COMMAND
 5117 root      20   0 5121m 498m 1652 S   97  6.2 643:45.52  4.5g zfs-fuse

This makes no sense to me. I have 8G of real RAM, and the zfs-fuse
process seems to have no limits whatsoever that would cause this
problem:

$ sudo cat /proc/5117/limits
[sudo] password for dave:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            ms
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             77824                77824                processes
Max open files            1024                 1024                 files
Max locked memory         32768                32768                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       77824                77824                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

I haven't done anything significant other than install and run iozone
since I booted this system.

What in tarnation is going on here?!

--
Dave Abrahams
BoostPro Computing
http://www.boostpro.com

Jonathan Schmidt

Feb 8, 2009, 2:32:29 PM
to zfs-...@googlegroups.com

That's not specific to zfs-fuse. The linux kernel memory manager is
keeping track of what's actually in use as well as what has been
requested. Many/most processes will have virtual sizes bigger than
their resident amount. With zfs-fuse, perhaps it's a bit dramatic, but
it's nothing to worry about. The EDA tools we use at work sometimes
look like that (except their virtual size can be >32GB, with 9-12G RSS).
Check vmstat and see if the system is actually swapping -- I bet it isn't.
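Both numbers are easy to check directly from /proc (a sketch; shown against the current shell's own PID so it runs anywhere -- substitute the zfs-fuse PID, e.g. 5117 from the top output above):

```shell
# VmSize is what the process has reserved, VmRSS what is actually
# resident in RAM; a large gap between the two is normal on Linux.
grep -E '^(VmSize|VmRSS):' /proc/$$/status

# System-wide swap-in/swap-out counters (the same activity vmstat
# reports in its si/so columns); if these aren't climbing, nothing
# is being actively swapped.
grep -E '^(pswpin|pswpout) ' /proc/vmstat
```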

Jonathan

David Abrahams

Feb 8, 2009, 2:38:55 PM
to zfs-...@googlegroups.com

on Sun Feb 08 2009, Jonathan Schmidt <jon-AT-jschmidt.ca> wrote:

> David Abrahams wrote:
>> I'm looking at my zfs-fuse process and noticing that it has a virtual
>> memory size of 4.5G of which only 500M is resident.

>> What in tarnation is going on here?!
>
> That's not specific to zfs-fuse. The linux kernel memory manager is
> keeping track of what's actually in use as well as what has been
> requested. Many/most processes will have virtual sizes bigger than
> their resident amount. With zfs-fuse, perhaps it's a bit dramatic,

Yeah, a bit! It's my understanding that ZFS uses large amounts of
memory to gain speed. I've given it nearly 8G to work with and I'm
stressing it as hard as I can. Why is it only using 500M?

> but it's nothing to worry about. The EDA tools we use at work
> sometimes look like that (except their virtual size can be >32GB, with
> 9-12G RSS). Check vmstat and see if the system is actually swapping
> -- I bet it isn't.

There's no swap activity, but I wasn't really worried that there was.
I'm more concerned that I may not be getting the speed I deserve ;-)

drewpca

Feb 9, 2009, 3:49:25 AM
to zfs-fuse

Here's another data point. I do not make any special effort to stress
this setup, but I'm pretty sure I've read more than 278MB of data in
the last 6 days :)


top - 00:09:05 up 6 days, 14:26, 2 users, load average: 0.33, 0.51, 0.49
Tasks: 115 total, 1 running, 114 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem:  5088156k total, 5001312k used, 86844k free, 136248k buffers
Swap: 3028244k total, 108k used, 3028136k free, 1895300k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
20383 root      20   0 1654m 278m 1612 S  0.0  5.6 622:23.16  zfs-fuse


NAME          STATE     READ WRITE CKSUM
stor3         ONLINE       0     0     0
  mirror      ONLINE       0     0     0
    sde       ONLINE       0     0     0
    sdf       ONLINE       0     0     0

NAME               USED  AVAIL  REFER  MOUNTPOINT
stor3              444G  12.8G   419G  /stor3
stor3@2008-12-22  25.1G      -   431G  -
stor3/c1              0  12.8G   431G  /stor3/c1


I catted a 178MB zfs file to /dev/null and it took 7.8 seconds, which
I think means my read rate is about 22MB/sec.
(timing result was 0.05s user 0.47s system 6% cpu 7.872 total)
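The arithmetic checks out against the wall-clock time (178 MB over 7.872 s):

```shell
# Throughput = data moved / elapsed wall-clock time
awk 'BEGIN { printf "%.1f MB/s\n", 178 / 7.872 }'   # prints 22.6 MB/s
```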

Fajar A. Nugraha

Feb 9, 2009, 4:11:21 AM
to zfs-...@googlegroups.com
On Mon, Feb 9, 2009 at 2:38 AM, David Abrahams <da...@boostpro.com> wrote:
> Yeah, a bit! It's my understanding that ZFS uses large amounts of
> memory to gain speed.

zfs, yes.
It uses most available memory for ARC (i.e. cache).

> I've given it nearly 8G to work with and
> stressing it as hard as I can. Why is it only using 500M?
>

Because zfs-fuse by default only uses 128MB for ARC, and that's a hardcoded limit.

--
FAN

David Abrahams

Feb 9, 2009, 11:00:05 AM
to zfs-...@googlegroups.com

Then I still wonder: what are the other 4G it has allocated, but which
are now swapped out?

Jonathan Schmidt

Feb 9, 2009, 1:40:12 PM
to zfs-...@googlegroups.com
>>> Yeah, a bit! It's my understanding that ZFS uses large amounts of
>>> memory to gain speed.
>>
>> zfs, yes.
>> It uses most available memory for ARC (i.e. cache).
>>
>>> I've given it nearly 8G to work with and
>>> stressing it as hard as I can. Why is it only using 500M?
>>>
>>
>> Because zfs-fuse by default only uses 128MB for ARC, and it's hardcoded
>> limit.
>
> Then I still wonder: what are the other 4G it has allocated, but which
> are now swapped out?

Careful, you are making an assumption here. I would actually guess that
the other 4G was allocated and *never used*, rather than being swapped
out. Like I said, the Linux kernel pays attention to actual memory usage,
and won't necessarily give processes physical RAM pages just because they
ask for them.

For comparison, I summed up the processes running on my VNC server and
here's what I get (NOTE: no zfs-fuse running on this machine. Just
standard Linux software):

Virtual RSS
8394080 2850132

So the virtual size is just about 3x of what's actually resident. Feel
free to wonder what that extra 4GB is for, but don't let it worry you.
The kernel memory manager is good at keeping everyone happy.
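What's at work here is Linux memory overcommit: the kernel hands out address space optimistically and only backs pages with physical RAM when they are first touched. The current policy is visible in /proc:

```shell
# 0 = heuristic overcommit (the usual default), 1 = always overcommit,
# 2 = strict accounting against CommitLimit
cat /proc/sys/vm/overcommit_memory
```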

Jonathan

David Abrahams

Feb 9, 2009, 2:42:42 PM
to zfs-...@googlegroups.com

on Mon Feb 09 2009, "Jonathan Schmidt" <jon-AT-jschmidt.ca> wrote:

>>>> Yeah, a bit! It's my understanding that ZFS uses large amounts of
>>>> memory to gain speed.
>>>
>>> zfs, yes.
>>> It uses most available memory for ARC (i.e. cache).
>>>
>>>> I've given it nearly 8G to work with and
>>>> stressing it as hard as I can. Why is it only using 500M?
>>>>
>>>
>>> Because zfs-fuse by default only uses 128MB for ARC, and it's hardcoded
>>> limit.
>>
>> Then I still wonder: what are the other 4G it has allocated, but which
>> are now swapped out?
>
> Careful, you are making an assumption here. I would actually guess that
> the other 4G was allocated and *never used*, rather than being swapped
> out.

I did consider that possibility. Thanks for suggesting that it might be
real.

> Like I said, the Linux kernel pays attention to actual memory usage,
> and won't necessarily give processes physical RAM pages just because they
> ask for them.

Ayup.

> For comparison, I summed up the processes running on my VNC server and
> here's what I get (NOTE: no zfs-fuse running on this machine. Just
> standard Linux software):
>
> Virtual RSS
> 8394080 2850132
>
> So the virtual size is just about 3x of what's actually resident. Feel
> free to wonder what that extra 4GB is for, but don't let it worry you.
> The kernel memory manager is good at keeping everyone happy.

Okey, thanks.

David Abrahams

Feb 10, 2009, 8:58:44 PM
to zfs-...@googlegroups.com

on Mon Feb 09 2009, "Jonathan Schmidt" <jon-AT-jschmidt.ca> wrote:

>>>> Yeah, a bit! It's my understanding that ZFS uses large amounts of
>>>> memory to gain speed.
>>>
>>> zfs, yes.
>>> It uses most available memory for ARC (i.e. cache).
>>>
>>>> I've given it nearly 8G to work with and
>>>> stressing it as hard as I can. Why is it only using 500M?
>>>>
>>>
>>> Because zfs-fuse by default only uses 128MB for ARC, and it's hardcoded
>>> limit.
>>
>> Then I still wonder: what are the other 4G it has allocated, but which
>> are now swapped out?
>
> Careful, you are making an assumption here. I would actually guess that
> the other 4G was allocated and *never used*, rather than being swapped
> out. Like I said, the Linux kernel pays attention to actual memory usage,
> and won't necessarily give processes physical RAM pages just because they
> ask for them.

Sure, but I still don't understand why ZFS would allocate 4G (even if it
never wires most of that down) if it is only going to use roughly 128M
because that's the ARC cache limit.

Rudd-O

Feb 12, 2009, 4:41:18 PM
to zfs-fuse
This "ZFS-FUSE allocates much more memory than it needs" excuse is
bullcrap. Here is my 4GB RAM system today:

1567 32 0 1109K 638.8M 197.7M 0K 0K 6% zfs-fuse

May I remind you that those 638 MB VSS were ALMOST THREE GIGABYTES,
before I made a single change:

ulimit -v unlimited
ulimit -c 512000
ulimit -l unlimited
ulimit -s unlimited

Remove the stack limit and you should see a STAGGERING decrease in
memory usage.

David Abrahams

Feb 12, 2009, 4:46:47 PM
to zfs-...@googlegroups.com, drago...@gmail.com

on Thu Feb 12 2009, Rudd-O <dragonfear-AT-gmail.com> wrote:

> This "ZFS-FUSE allocates much more memory than it needs" excuse is
> bullcrap.

Sorry, but I'm a little lost. Could you explain:

* who is making that excuse?
* what bad thing are they attempting to excuse?

> Here is my 4GB RAM system today:
>
> 1567 32 0 1109K 638.8M 197.7M 0K 0K 6% zfs-fuse
>
> May I remind you that those 638 MB VSS were ALMOST THREE GIGABYTES,
> before I made a single change:
>
> ulimit -v unlimited
> ulimit -c 512000
> ulimit -l unlimited
> ulimit -s unlimited
>
> remove the stack limit and you should see a STAGGERING decrease in
> memory usage.

Sooo... you *removed* the stack size limit and memory usage went down?
That is *really* confusing. Do you understand why it's happening?

Rudd-O

Feb 12, 2009, 4:59:31 PM
to zfs-fuse
> Sooo... you *removed* the stack size limit and memory usage went down?
> That is *really* confusing. Do you understand why it's happening?

Probably related to this:

http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory

> Sorry, but I'm a little lost. Could you explain:
>
> * who is making that excuse?
> * what bad thing are they attempting to excuse?

Someone above in this thread, saying that ZFS gobbles up memory like a
professional callgirl gobbles up money. Truth be told, I used to
think that, then some day I tinkered with the ulimits, removing them,
and BLAM, "whoa, baby, have you lost weight?". And it really helped,
man -- I was running 1 GB RAM back in those days.

I honestly do not know how sharing that tip slipped my mind.

David Abrahams

Feb 12, 2009, 5:21:00 PM
to zfs-...@googlegroups.com

on Thu Feb 12 2009, Rudd-O <dragonfear-AT-gmail.com> wrote:

>> Sooo... you *removed* the stack size limit and memory usage went down?
>> That is *really* confusing. Do you understand why it's happening?
>
> Probably related to this:
>
> http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory

Well, naturally ;-)

>> Sorry, but I'm a little lost. Could you explain:
>>
>> * who is making that excuse?
>> * what bad thing are they attempting to excuse?
>
> Someone above in this thread, saying that ZFS gobbles up memory like a
> professional callgirl gobbles up money. Truth be told, I used to
> think that, then some day I tinkered with the ulimits, removing them,
> and BLAM, "whoa, baby, have you lost weight?". And it really helped,
> man -- I was running 1 GB RAM back in those days.
>
> I honestly do not know how sharing that tip slipped my mind.

Maybe we need a wiki page of ZFS-Fuse tips; they're starting to
accumulate.

Jonathan Schmidt

Feb 12, 2009, 6:28:50 PM
to zfs-...@googlegroups.com
>> Sooo... you *removed* the stack size limit and memory usage went down?
>> That is *really* confusing. Do you understand why it's happening?
>
> Probably related to this:
>
> http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory

I read that article and I don't see anything about restricting stack size
causing increased memory usage (even virtual). I had thought that the
only thing that affects the virtual size of a process is when it allocates
memory.

David Abrahams

Feb 12, 2009, 8:30:11 PM
to zfs-...@googlegroups.com

on Thu Feb 12 2009, Rudd-O <dragonfear-AT-gmail.com> wrote:

> ulimit -v unlimited
> ulimit -c 512000
> ulimit -l unlimited
> ulimit -s unlimited
>
> remove the stack limit and you should see a STAGGERING decrease in
> memory usage.

And exactly where do you make that change? If I do that in my init
script, it doesn't seem to affect the actual zfs-fuse process.

Rudd-O

Feb 14, 2009, 9:21:24 PM
to zfs-fuse
The change goes a few lines before the zfs-fuse command, in your
zfs-fuse launcher script.
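For example, something like this (a sketch: the zfs-fuse path is an assumption, adjust it for your install; the limit values are the ones from the earlier post, not canonical):

```shell
#!/bin/sh
# zfs-fuse launcher wrapper: ulimits set here are inherited by the
# daemon that exec replaces this shell with.
ulimit -v unlimited
ulimit -c 512000
ulimit -l unlimited
ulimit -s unlimited          # the stack limit is the one that matters here
exec /usr/sbin/zfs-fuse      # hypothetical path; adjust for your system
```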

I have a wiki at http://software-libre.rudd-o.com/ -- feel free to add
a zfs-fuse tips and tricks page there, and I'll add mine. I think the
wiki requires registration though, because I had problems with
spammers in the past.

Jeffrey Schiller

Feb 15, 2009, 6:01:01 PM
to zfs-fuse
Here is what I have learned. ZFS uses *a lot* of threads. And the more
pools you have, the more threads you have, again by a lot.

I typically run with 4 or 5 pools. My zfs-fuse process used to crash
because it would fail to create threads. Virtual memory usage was
huge. It turns out that when I did a "ulimit -s 2048" before starting
zfs-fuse, memory usage went way down and I stopped having problems
creating threads. I suspect the default stack size of 8192 KB on my
system resulted in the kernel reserving the full 8 MB of VM for each of
MANY threads. Reducing this to 2048 (2 MB) made things manageable.
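That suspicion is easy to sanity-check with back-of-envelope arithmetic (using the 156-thread count reported elsewhere in this thread; the figures are illustrative):

```shell
# Virtual address space reserved for thread stacks alone, at the
# default 8 MB per-thread stack vs. a reduced 2 MB stack:
awk 'BEGIN {
    threads = 156
    printf "8 MB stacks: %4d MB of VM\n", threads * 8
    printf "2 MB stacks: %4d MB of VM\n", threads * 2
}'
```

That difference alone is on the order of the gigabyte-scale VIRT numbers seen in this thread.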

So today I tried the "unlimited" trick and memory went down again. I
suspect that with stacksize unlimited, the kernel is using a different
algorithm for computing default stack size. I'll have to investigate.

-Jeff

Rudd-O

Feb 16, 2009, 12:25:29 PM
to zfs-fuse
I think that with the unlimited stack option, the kernel no longer
preallocates the stack for each thread in the thread pool, or somehow
the stack and the heap are mixed or the stack itself is segmented in
the heap?

Anyway, I am happy that the trick worked for you. It's always nice to
use less RAM, to swap less, and (if unlimited stack is implemented as
I think it is) to help each context switch last just a tad less.

Jonathan Schmidt

Feb 16, 2009, 1:19:16 PM
to zfs-...@googlegroups.com

I challenge those assumptions. If it is true that most of the 4GB of
zfs-fuse's VSS is just the extra preallocated stack space for each thread,
then that immediately answers the question of "why does zfs-fuse allocate
memory that it doesn't use?" Answer: it doesn't, fair enough. However,
in that case, it will not be using any extra physical RAM, it won't swap
more, and each context switch will be the same speed.

I'm not claiming that it's not worth setting stack to unlimited to see the
VSS drop down, nor am I claiming that the enormous VSS hasn't caused
problems in some situations (see the guy that was running many pools, and
was having trouble). I do maintain that the kernel will do "the right
thing" in most situations and also that virtual process size is not a
finite resource that really gets used up.

Anyway, go ahead and fix it, but (I think) there are more productive uses
of everyone's time. This is really more of a discussion for the Linux
kernel VM people. (Say, to ask why each new thread gets an 8MB stack
pre-allocated for it. Perhaps the answer would be "because it doesn't
cost anything.")

Jonathan

Mike Hommey

Feb 17, 2009, 2:29:43 AM
to zfs-...@googlegroups.com
On Mon, Feb 16, 2009 at 10:19:16AM -0800, Jonathan Schmidt wrote:
>
> > I think that with the unlimited stack option, the kernel no longer
> > preallocates the stack for each thread in the thread pool, or somehow
> > the stack and the heap are mixed or the stack itself is segmented in
> > the heap?
> >
> > Anyway, I am happy that the trick worked for you. It's always nice to
> > use less RAM, to swap less, and (if I think unlimited stack is
> > implemented as I think it is) to help each context switch last just a
> > tad less.
>
> I challenge those assumptions. If it is true that most of the 4GB of
> zfs-fuse's VSS is just the extra preallocated stack space for each thread,
> then that immediately answers the question of "why does zfs-fuse allocate
> memory that it doesn't use?" Answer: it doesn't, fair enough. However,
> in that case, it will not be using any extra physical RAM, it won't swap
> more, and each context switch will be the same speed.

The right question, though, is "why does zfs-fuse need so many threads
even when it does nothing?" Right after startup, there are 156 of them
here... Most of them are sitting in __lll_lock_wait and pthread_cond_wait.
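A count like that is easy to verify (a sketch; shown against the current shell so it runs anywhere -- point it at the zfs-fuse PID in practice):

```shell
# Each thread appears as a directory under /proc/<pid>/task, and the
# kernel also keeps a running total in the status file.
ls /proc/$$/task | wc -l
grep '^Threads:' /proc/$$/status
```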

Mike

Jonathan Schmidt

Feb 17, 2009, 1:00:11 PM
to zfs-...@googlegroups.com

Really? Does it matter? Honestly I doubt it. From what perspective
does that cause you alarm?

Jonathan

Mike Hommey

Feb 17, 2009, 1:26:07 PM
to zfs-...@googlegroups.com

Well, that's 156 times whatever stack size you use allocated for
not much... I know memory is cheap nowadays but that's not a valid
enough reason IMHO.

Mike

Jonathan Schmidt

Feb 17, 2009, 1:54:08 PM
to zfs-...@googlegroups.com

There is no excuse for wasting memory just because it's cheap. However,
the valid reason in this case is the architecture/design of the
implementation. Obviously it was chosen to be heavy on the use of
threads. Keep in mind, threads are very lightweight entities. Ignore
their default virtual size -- all that matters is the amount of memory
they actually use. Do you think a less-threaded implementation could be
written that would perform similarly and use less memory? What about
scaling across large numbers of cores? Besides, you are talking about a
total of a handful of kB worth of potential overhead. I'll say again,
your time is better spent elsewhere.

David Abrahams

Feb 19, 2009, 8:26:32 PM
to zfs-...@googlegroups.com

on Sun Feb 15 2009, Jeffrey Schiller <Jeffrey.Schiller-AT-gmail.com> wrote:

> So today I tried the "unlimited" trick and memory went down again. I
> suspect that with stacksize unlimited, the kernel is using a different
> algorithm for computing default stack size. I'll have to investigate.

One more data point: I currently have no ZFS pools set up. Just
starting zfs-fuse without 'ulimit -s unlimited' reserves 431m (and uses
19m). With the ulimit setting, it reserves 133m (and still uses 19m).
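One way to take the same reserved-vs-used measurement (a sketch; run here against the current shell so the example is self-contained -- use the zfs-fuse PID in practice):

```shell
# VSZ = reserved virtual size (KB), RSS = resident set size (KB)
ps -o pid,vsz,rss,comm -p $$
```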

Paul Nowoczynski

Feb 19, 2009, 8:34:46 PM
to zfs-...@googlegroups.com
Here's a data point that may be helpful in this discussion. I have
decoupled the zfs/dmu from fuse; I'm essentially running a different
front-end to zfs. Anyway, what I have is a process which uses the dmu
only - no fuse. After processing several million zero-length files you
can see that the process uses 1.2GB virt mem and 441MB rss.
paul

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+  COMMAND
11288 root      15   0 1186m 441m 1992 S   49  5.5 202:07.48  slashd
