Superpages on amd64 FreeBSD 7.2-STABLE

Linda Messerschmidt

Nov 26, 2009, 10:15:27 AM

to freebsd...@freebsd.org

We have a squid proxy process with very large memory requirements (10
- 20 GB) on a machine with 24GB of RAM.

Unfortunately, we have to rotate the logs of this process once per
day. When we do, it fork()s and exec()s about 16-20 child processes
as helpers. Since it's got this multi-million-entry page table,
that's a disaster, because it has to copy all those page table entries
for each child, then throw them out. This takes a couple of minutes
of 100% CPU usage, during which time the machine is pretty much
unresponsive.

Someone on the squid list suggested we try the new superpages feature
(vm.pmap.pg_ps_enabled) in 7.2. We did, and after some tuning, we got
it to work.

Here's some "sysctl vm.pmap" output from a similar machine with 16GB of RAM
that does NOT have this setting enabled:

vm.pmap.pv_entry_count: 2307899
vm.pmap.pde.promotions: 0
vm.pmap.pde.p_failures: 0
vm.pmap.pde.mappings: 0
vm.pmap.pde.demotions: 0
vm.pmap.pv_entry_max: 4276871
vm.pmap.pg_ps_enabled: 0

Now, here is the machine that does have it, just prior to the daily
rotation mentioned above:

vm.pmap.pv_entry_count: 61361
vm.pmap.pde.promotions: 23123
vm.pmap.pde.p_failures: 327946
vm.pmap.pde.mappings: 1641
vm.pmap.pde.demotions: 17848
vm.pmap.pv_entry_max: 7330186
vm.pmap.pg_ps_enabled: 1

So obviously this feature makes a huge difference and is a
brilliant idea. :-)

My (limited) understanding is that one of the primary benefits of this
feature is to help situations like ours... a page table that's 512x
smaller can be copied 512x faster. However, in practice this doesn't
happen. It's like fork() breaks up the squid process into 4KB pages
again. Here's the same machine's entries just after rotation:

vm.pmap.pv_entry_count: 1908056
vm.pmap.pde.promotions: 23212
vm.pmap.pde.p_failures: 413171
vm.pmap.pde.mappings: 1641
vm.pmap.pde.demotions: 21470
vm.pmap.pv_entry_max: 7330186
vm.pmap.pg_ps_enabled: 1

So some 3,600 superpages spontaneously turned into 1,850,000 4KB pages.
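
The counters above are roughly consistent with each demoted 2MB superpage turning into 512 separate 4KB mappings. A minimal Python sketch of that arithmetic follows, using only numbers from the two snapshots quoted in this message. The helper name parse_pmap is hypothetical, and the idea that one demotion yields about 512 pv entries is an assumption made for illustration, not something checked against the kernel source.

```python
# Counters copied from the "sysctl vm.pmap" snapshots quoted in this message.
BEFORE = """\
vm.pmap.pv_entry_count: 61361
vm.pmap.pde.demotions: 17848
"""
AFTER = """\
vm.pmap.pv_entry_count: 1908056
vm.pmap.pde.demotions: 21470
"""

PAGES_PER_SUPERPAGE = (2 * 1024 * 1024) // (4 * 1024)  # 2MB / 4KB = 512


def parse_pmap(text):
    """Parse 'name: value' sysctl lines into a dict of ints (hypothetical helper)."""
    counters = {}
    for line in text.splitlines():
        name, _, value = line.partition(":")
        if value.strip():
            counters[name.strip()] = int(value)
    return counters


before, after = parse_pmap(BEFORE), parse_pmap(AFTER)
demoted = after["vm.pmap.pde.demotions"] - before["vm.pmap.pde.demotions"]
new_pv_entries = after["vm.pmap.pv_entry_count"] - before["vm.pmap.pv_entry_count"]

print("superpages demoted:     ", demoted)
print("expected 4KB mappings:  ", demoted * PAGES_PER_SUPERPAGE)
print("observed new pv entries:", new_pv_entries)
```

The expected and observed figures differ by well under one percent, which fits the estimate of some 3,600 superpages becoming about 1,850,000 4KB pages.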

Once this happens, squid seems reluctant to use more superpages until
it's restarted. We get a lot of p_failures and a slow-but-steady
stream of demotions. Here's the same machine just now:

vm.pmap.pv_entry_count: 2022786
vm.pmap.pde.promotions: 25281
vm.pmap.pde.p_failures: 996027
vm.pmap.pde.mappings: 1641
vm.pmap.pde.demotions: 21683
vm.pmap.pv_entry_max: 7330186
vm.pmap.pg_ps_enabled: 1

And a few minutes later:

vm.pmap.pv_entry_count: 2021556
vm.pmap.pde.promotions: 25331
vm.pmap.pde.p_failures: 1001773
vm.pmap.pde.mappings: 1641
vm.pmap.pde.demotions: 21684
vm.pmap.pv_entry_max: 7330186
vm.pmap.pg_ps_enabled: 1

(There were *no* p_failures or demotions in the several hours prior to
rotation.)

This trend continues... the pv_entry_count bounces up and down even
though memory usage is increasing, so it's like it's trying to recover
and convert things back (promotions), but it's having a lot of trouble
(p_failures).

It's not clear to me if this might be a problem with the superpages
implementation, or if squid does something particularly horrible to
its memory when it forks to cause this, but I wanted to ask about it
on the list in case somebody who understands it better might know
what's going on. :-)

This is on FreeBSD-STABLE 7.2 amd64 r198976M.

Thanks!

Ryan Stone

Nov 26, 2009, 10:35:14 AM

to Linda Messerschmidt, freebsd...@freebsd.org

Is squid multithreaded? My first guess would be that you have one
thread forking off all of these processes while other threads are
still doing work and writing to different parts of the address space.
I don't know the details of the superpages implementation but I could
definitely see that this could be a problem. When the process is
forked off, the VM layer has to mark all of the pages in the parent
process as copy-on-write. If there are other threads in squid writing
to memory during this time, the VM layer will have to allocate a new page
of memory for every page that squid writes to. This breaks up the
superpage (superpages must be physically contiguous in memory). I
don't know if the superpage implementation tries to alleviate this
situation at all.

Linda Messerschmidt

Nov 26, 2009, 10:47:58 AM

to Ryan Stone, freebsd...@freebsd.org

On Thu, Nov 26, 2009 at 10:34 AM, Ryan Stone <rys...@gmail.com> wrote:
> Is squid multithreaded?

No, it isn't:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
75086 squid       1   4    0 12571M 12584M kqread 6  31:31  0.68% squid

Thanks!

Dag-Erling Smørgrav

Nov 26, 2009, 10:51:14 AM

to Linda Messerschmidt, freebsd...@freebsd.org

Linda Messerschmidt <linda.mes...@gmail.com> writes:
> Unfortunately, we have to rotate the logs of this process once per
> day. When we do, it fork()s and exec()s about 16-20 child processes
> as helpers.

s/fork/vfork/ and you should be fine.

DES
--
Dag-Erling Smørgrav - d...@des.no

Dag-Erling Smørgrav

Nov 26, 2009, 10:52:14 AM

to Linda Messerschmidt, freebsd...@freebsd.org

Dag-Erling Smørgrav <d...@des.no> writes:
> Linda Messerschmidt <linda.mes...@gmail.com> writes:
> > Unfortunately, we have to rotate the logs of this process once per
> > day. When we do, it fork()s and exec()s about 16-20 child processes
> > as helpers.
> s/fork/vfork/ and you should be fine.

..and you should look into replacing Squid with Varnish, of course.

james toy

Nov 26, 2009, 11:16:37 AM

to Linda Messerschmidt, freebsd...@freebsd.org, Ryan Stone

Hi Linda,

vfork() should mitigate this -- I suggest replacing fork() with it.

respectfully,

=jt

Linda Messerschmidt

Nov 26, 2009, 12:11:48 PM

to freebsd...@freebsd.org

I think I was not clear in my message; I apologize.

I did not mean to suggest that we were asking for help solving a
problem with squid rotation. I provided that information as
background to discuss what we observed as a potential misbehavior in
the new VM superpages feature, in the hope that if there is a problem
with the new feature, we can help find/resolve it or, if this is
working as intended, hopefully gain some insight as to what's going
on.

Thanks!

krad

Nov 26, 2009, 12:39:17 PM

to Linda Messerschmidt, freebsd...@freebsd.org

2009/11/26 Linda Messerschmidt <linda.mes...@gmail.com>

I'm sure you will get a lot of lovely answers to this, but it's best to keep
things simple. Why not just syslog it off to another server and offload all
the compression to that box? You could even back it with ZFS and do
on-the-fly gzip compression at the file system level, or use syslog-ng to do
it. If you are worried about ZFS on BSD, use (Open)Solaris or another
filesystem with inline compression.

Daniel O'Connor

Nov 27, 2009, 12:42:51 AM

to freebsd...@freebsd.org, Linda Messerschmidt, krad

On Fri, 27 Nov 2009, krad wrote:
> I'm sure you will get a lot of lovely answers to this, but it's best to
> keep things simple. Why not just syslog it off to another server and
> offload all the compression to that box? You could even back it with
> ZFS and do on-the-fly gzip compression at the file system level, or
> use syslog-ng to do it. If you are worried about ZFS on BSD, use
> (Open)Solaris or another filesystem with inline compression.

Or send squid's logs to a small buffer process which you then HUP when
rotating logs.

Also, I don't really understand why squid would fork when you tell it to
rotate; it seems like a design defect.

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C

Adrian Chadd

Nov 27, 2009, 3:23:34 AM

to Daniel O'Connor, freebsd...@freebsd.org, Linda Messerschmidt, krad

There's a bunch of other random crap that may be going on relating to
the helper processes (eg rewriters, auth, etc) which may also be
restarted.

Anyway. The thread is about superpage demotion and copying, not what
Squid is or isn't doing in her configuration. :)

Adrian

2009/11/27 Daniel O'Connor <doco...@gsoft.com.au>:

Daniel O'Connor

Nov 27, 2009, 6:27:59 AM

to Adrian Chadd, freebsd...@freebsd.org, Linda Messerschmidt, krad

On Fri, 27 Nov 2009, Adrian Chadd wrote:
> There's a bunch of other random crap that may be going on relating to
> the helper processes (eg rewriters, auth, etc) which may also be
> restarted.

OK.

> Anyway. The thread is about superpage demotion and copying, not what
> Squid is or isn't doing in her configuration. :)

Yeah, I understand that, but if you can avoid the huge problem with a deft
rearrangement, that may help your production environment and give you
more time for a real solution :)

John Baldwin

Dec 9, 2009, 10:17:44 AM

to freebsd...@freebsd.org, Linda Messerschmidt

On Thursday 26 November 2009 10:14:20 am Linda Messerschmidt wrote:
> It's not clear to me if this might be a problem with the superpages
> implementation, or if squid does something particularly horrible to
> its memory when it forks to cause this, but I wanted to ask about it
> on the list in case somebody who understands it better might know
> what's going on. :-)

I talked with Alan Cox some about this off-list, and there is a case that can
cause this behavior: if the parent squid process takes write faults on a
superpage before the child process has called exec(), it can result in
superpages being fragmented and never reassembled. Using vfork() should
prevent this from happening. It is a known issue, but it will probably be
some time before it is addressed. There is lower-hanging fruit in other areas
in the VM that will probably be worked on first.

--
John Baldwin

Linda Messerschmidt

Dec 10, 2009, 9:03:59 AM

to John Baldwin, freebsd...@freebsd.org

On Wed, Dec 9, 2009 at 9:07 AM, John Baldwin <j...@freebsd.org> wrote:
> There is lower hanging fruit in other areas
> in the VM that will probably be worked on first.

OK, as long as somebody who knows more than me knows what's going on,
that's good enough for me. :)

Thanks!

Bernd Walter

Dec 10, 2009, 9:51:25 AM

to John Baldwin, freebsd...@freebsd.org, Linda Messerschmidt

On Wed, Dec 09, 2009 at 09:07:33AM -0500, John Baldwin wrote:
> On Thursday 26 November 2009 10:14:20 am Linda Messerschmidt wrote:
> > It's not clear to me if this might be a problem with the superpages
> > implementation, or if squid does something particularly horrible to
> > its memory when it forks to cause this, but I wanted to ask about it
> > on the list in case somebody who understands it better might know
> > what's going on. :-)
>
> I talked with Alan Cox some about this off-list, and there is a case that can
> cause this behavior: if the parent squid process takes write faults on a
> superpage before the child process has called exec(), it can result in
> superpages being fragmented and never reassembled. Using vfork() should
> prevent this from happening. It is a known issue, but it will probably be
> some time before it is addressed. There is lower-hanging fruit in other areas
> in the VM that will probably be worked on first.

The whole thread puzzles me, especially because vfork is so often
called a solution.

Scenario A
Parent with super page
fork/exec
This problem can happen because there is a race.
The parent now has its super pages fragmented permanently!?
The child throws away its pages because of the exec!?

Scenario B
Parent with super page
vfork/exec
This problem won't happen because the child has no pseudo copy of the
parent's memory and then starts with a completely new map.

Scenario C
Parent with super page
fork/no exec
The problem can happen because the child shares the same memory over
its complete lifetime.
The parent can get its super pages fragmented over time.

I don't see a use case for scenario A, because vfork has been around
for over 16 years.
I use fork myself, because it is easier sometimes, but people writing
big programs such as squid should know better.
If squid doesn't use vfork they likely have a reason.
With scenario C I don't see how vfork can help, since this is not a legal
case for vfork.
I use squid myself, but don't know how it handles its children.
But isn't the whole point of such slave children that they share memory
with the master? How can vfork be a solution for this case?
How can fragmentation of super pages be avoided at all?

I obviously don't have enough of a clue about this to understand those details.
I hope that someone can enlighten me.

--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.

Linda Messerschmidt

Dec 10, 2009, 10:24:26 AM

to freebsd...@freebsd.org

Also...

On Thu, Dec 10, 2009 at 9:50 AM, Bernd Walter <ti...@cicely7.cicely.de> wrote:
> I use fork myself, because it is easier sometimes, but people writing
> big programs such as squid should know better.
> If squid doesn't use vfork they likely have a reason.

Actually they are probably going to switch to vfork(). They were
previously not using it because they thought there was some ambiguity
about whether it was going to be around long term.

I actually am not a huge fan of vfork() since it stalls the parent
process until the child exec()'s.

To me, this case actually highlights why that's an issue. If the
explanation is that stuff happening in the parent process between
fork() and the child's exec() causes the fragmentation, that's stuff
that would be deferred in a vfork() regime, with unknown potential
consequences. (At a minimum, decreased performance.)

But that's personal and largely uninformed opinion. :)

Nate Eldredge

Dec 10, 2009, 10:43:47 AM

to Linda Messerschmidt, freebsd...@freebsd.org

On Thu, 10 Dec 2009, Linda Messerschmidt wrote:

> Also...
>
> On Thu, Dec 10, 2009 at 9:50 AM, Bernd Walter <ti...@cicely7.cicely.de> wrote:
>> I use fork myself, because it is easier sometimes, but people writing
>> big programs such as squid should know better.
>> If squid doesn't use vfork they likely have a reason.
>
> Actually they are probably going to switch to vfork(). They were
> previously not using it because they thought there was some ambiguity
> about whether it was going to be around long term.

Well, the worst that would likely happen to vfork() is it would become an
alias of fork(), and you'd be back to where you are now (or better if
fork() were fixed in the meantime). I'd be more worried about the
mysterious bugs which it's so easy to introduce with vfork() if you do
anything at all nontrivial before exec() and accidentally touch the
parent's memory.

What about using posix_spawn(3)? This is implemented in terms of
vfork(), so you'll gain the same performance advantages, but it avoids
many of vfork's pitfalls. Also, since it's a POSIX standard function, you
needn't worry that it will go away or change its semantics someday.
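
As a rough illustration of the posix_spawn() route, here is a minimal Python sketch using os.posix_spawn, which wraps the C library function of the same name. The helper command is a stand-in (the Python interpreter itself exiting with a chosen status), not one of squid's real helpers, and the function name spawn_helper is hypothetical. Whether the underlying C library builds posix_spawn() on vfork() depends on the platform.

```python
import os
import sys


def spawn_helper(argv):
    """Start a helper with posix_spawn() and return its exit status.

    The caller never runs any code of its own in the child, so there is
    no window in which the child could scribble on the parent's memory.
    """
    pid = os.posix_spawn(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -1  # helper was killed by a signal


# Stand-in helper: a fresh interpreter that exits with status 7.
exit_code = spawn_helper([sys.executable, "-c", "raise SystemExit(7)"])
print("helper exit status:", exit_code)
```

Anything that would normally be done between fork() and exec(), such as redirecting descriptors, has to be expressed through the spawn attributes and file actions instead of arbitrary code.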

> I actually am not a huge fan of vfork() since it stalls the parent
> process until the child exec()'s.

If you're doing so much work between vfork() and exec() that this delay is
significant, then I would think you're really abusing vfork().

> To me, this case actually highlights why that's an issue. If the
> explanation is that stuff happening in the parent process between
> fork() and the child's exec() causes the fragmentation, that's stuff
> that would be deferred in a vfork() regime, with unknown potential
> consequences. (At a minimum, decreased performance.)

Not necessarily. In the fork() case, presumably copy-on-write is to blame
for the fragmentation. In the vfork() case, there's no copy at all.

--

Nate Eldredge
na...@thatsmathematics.com

Christian Brueffer

Dec 10, 2009, 11:05:49 AM

to Mel Flynn, freebsd...@freebsd.org, Linda Messerschmidt

On Mon, Dec 07, 2009 at 04:20:03PM +0100, Mel Flynn wrote:

> On Thursday 26 November 2009 18:11:10 Linda Messerschmidt wrote:
>
> > I did not mean to suggest that we were asking for help solving a
> > problem with squid rotation. I provided that information as
> > background to discuss what we observed as a potential misbehavior in
> > the new VM superpages feature, in the hope that if there is a problem
> > with the new feature, we can help find/resolve it or, if this is
> > working as intended, hopefully gain some insight as to what's going
> > on.
>

> I tend to agree with this, though I don't know the nitty gritty of the
> implementation, it seems that:
> a) superpages aren't copied efficiently (at all?) on fork and probably other
> workloads
> b) vfork is encouraged for memory intensive applications, yet:
> BUGS
> This system call will be eliminated when proper system sharing mechanisms
> are implemented. Users should not depend on the memory sharing semantics
> of vfork() as it will, in that case, be made synonymous to fork(2).
>

FYI, this comment was removed a couple of weeks ago in HEAD and
the STABLE branches.

- Christian

--
Christian Brueffer ch...@unixpages.org brue...@FreeBSD.org
GPG Key: http://people.freebsd.org/~brueffer/brueffer.key.asc
GPG Fingerprint: A5C8 2099 19FF AACA F41B B29B 6C76 178C A0ED 982D

Adrian Chadd

Dec 10, 2009, 11:19:51 AM

to Linda Messerschmidt, freebsd...@freebsd.org

Depending upon the IPC method being used, the fork() may be followed
with calls to socket() and connect(), which may take a while.

If you have a busy proxy and there's some temporary shortage of
something which makes connect() take longer than usual, the main
process will stall, potentially causing the shortage to become worse.

2c,

Adrian
(With his (ex-, kinda) Squid hacker hat on.)

2009/12/10 Linda Messerschmidt <linda.mes...@gmail.com>:

John Baldwin

Dec 10, 2009, 1:34:53 PM

to ti...@cicely.de, freebsd...@freebsd.org, Linda Messerschmidt

Actually, the fact that vfork() doesn't let the parent execute until the
child has called exec() also closes the race, as it were, and that was the
primary reason in my mind for saying that vfork() would prevent it.

> Scenario C
> Parent with super page
> fork/no exec
> The problem can happen because the child shares the same memory over
> its complete lifetime.
> The parent can get its super pages fragmented over time.
>
> I don't see a use case for scenario A, because vfork has been around
> for over 16 years.
> I use fork myself, because it is easier sometimes, but people writing
> big programs such as squid should know better.
> If squid doesn't use vfork they likely have a reason.
> With scenario C I don't see how vfork can help, since this is not a legal
> case for vfork.
> I use squid myself, but don't know how it handles its children.
> But isn't the whole point of such slave children that they share memory
> with the master? How can vfork be a solution for this case?
> How can fragmentation of super pages be avoided at all?

In Linda's case it was a fork to run a log rotation binary, so it was A).
For C) I think you would want to map the pages MAP_SHARED (or use minherit(2)
with INHERIT_SHARE) in which case they would not be COW'd on fork() and you
would keep the superpages. Assuming that you explicitly want to share the
memory with your child processes you already have to do this to really do
sharing anyway.
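
A minimal Python sketch of the sharing semantics behind the MAP_SHARED suggestion, under the assumption that an anonymous mapping stands in for the memory to be shared. It only shows that a shared mapping is not copy-on-write across fork() while a private one is. It does not demonstrate anything about superpages, and minherit(2) is not covered because Python does not expose it. The function name child_write_visible is hypothetical.

```python
import mmap
import os


def child_write_visible(flags):
    """Return True if a byte written by a forked child is seen by the parent."""
    # fileno -1 asks for an anonymous mapping; flags selects shared or private.
    region = mmap.mmap(-1, mmap.PAGESIZE, flags=flags)
    region[0:1] = b"P"  # parent's initial value

    pid = os.fork()
    if pid == 0:
        # Child: write to the mapping, then leave without running parent code.
        try:
            region[0:1] = b"C"
        finally:
            os._exit(0)

    os.waitpid(pid, 0)
    seen = region[0:1] == b"C"
    region.close()
    return seen


shared_visible = child_write_visible(mmap.MAP_SHARED)
private_visible = child_write_visible(mmap.MAP_PRIVATE)
print("MAP_SHARED write visible to parent: ", shared_visible)
print("MAP_PRIVATE write visible to parent:", private_visible)
```

With the private mapping the child's write lands in its own copy of the page, which is the copy-on-write behavior under discussion. With the shared mapping both processes keep using the same physical page.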

--
John Baldwin

Robert Watson

Dec 12, 2009, 8:39:31 AM

to Nate Eldredge, freebsd...@freebsd.org, Linda Messerschmidt

On Thu, 10 Dec 2009, Nate Eldredge wrote:

> What about using posix_spawn(3)? This is implemented in terms of vfork(),
> so you'll gain the same performance advantages, but it avoids many of
> vfork's pitfalls. Also, since it's a POSIX standard function, you needn't
> worry that it will go away or change its semantics someday.

Just as a note here: while we do posix_spawn(3) as a library function, Mac OS
X does it as a system call. As a result, they can implement certain spawn
flags that we can't, among others, the ability to have the newly created
process/image be suspended before its first instruction executes. This would
be very useful when debugging the runtime linker, among other things. On the
other hand, it's quite a complex kernel code path...

Robert N M Watson
Computer Laboratory
University of Cambridge

Alan Cox

Dec 12, 2009, 2:51:53 PM

to ti...@cicely.de, freebsd...@freebsd.org, Linda Messerschmidt, a...@cs.rice.edu

I'm not sure how you are defining "problem". If we define "problem" as I
would, i.e., that "re-promotion can never occur", then Scenario C is not
a problem scenario, only Scenario A is.

The source of the problem in Scenario A is basically that we have two ways
of handling copy-on-write faults. Before the exec() occurs, copy-on-write
faults are handled as you might intuit from the name, a new physical copy is
made. If the entirety of the 2MB region is written to before the exec(), then
this region will be promoted to a superpage. However, once the exec() occurs,
copy-on-write faults are "optimized". Specifically, the kernel recognizes that
the underlying physical page is no longer shared with the child and simply
restores write access to it. It is the combination of these two methods that
effectively blocks re-promotion because the underlying 4KB physical pages
within a 2MB region are no longer contiguous.

In other words, once the first page within a region has been copied, you have
a choice to make: Do you perform avoidable copies or do you abandon the
possibility of ever creating a superpage? The former has a significant
one-time cost and the latter has a small recurring cost. Not knowing how
much the latter will add up to, I chose the former. However, that choice
may change in time, particularly if I find an effective heuristic for choosing
between the two options.
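
The contiguity argument can be sketched with a toy model in Python. This is a simplification for illustration only, not the kernel's algorithm. It assumes that copies made before exec() land at matching offsets in a fresh, aligned 2MB run of physical frames, and the frame numbers and function names are hypothetical.

```python
PAGES_PER_SUPERPAGE = 512  # 4KB pages per 2MB superpage on amd64


def promotable(frames):
    """A region can be a superpage only if its frames are aligned and contiguous."""
    base = frames[0]
    return base % PAGES_PER_SUPERPAGE == 0 and all(
        frame == base + i for i, frame in enumerate(frames)
    )


def parent_region_after_fork_exec(pages_written_before_exec):
    """Toy model of one 2MB region in the parent across fork() and exec()."""
    old_base = 0 * PAGES_PER_SUPERPAGE  # frames backing the original superpage
    new_base = 8 * PAGES_PER_SUPERPAGE  # assumed fresh, aligned run for copies
    frames = [old_base + i for i in range(PAGES_PER_SUPERPAGE)]

    # Before exec(): each write fault in the parent makes a real physical copy.
    for i in pages_written_before_exec:
        frames[i] = new_base + i

    # After exec(): the remaining pages simply regain write access and keep
    # their original frames, so nothing else moves.
    return promotable(frames)


untouched = parent_region_after_fork_exec([])
partly_copied = parent_region_after_fork_exec(range(10))
fully_copied = parent_region_after_fork_exec(range(PAGES_PER_SUPERPAGE))
print(untouched, partly_copied, fully_copied)
```

In this model only the partly copied region ends up with a mix of old and new frames, which is the case where re-promotion is blocked.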

Anyway, please keep trying superpages with large memory applications like
this. Reports like this help me to prioritize my efforts.

Regards,
Alan