Time-critical problem at Sun: exploding smbd memory usage

David Collier-Brown

Aug 20, 2001, 9:22:04 AM

I have a problem with 2.0.10, which we're getting ready
to roll out Sun-wide.
With 2.0.7 on either Solaris or the Qube, the second and all
subsequent smbds are small, and initially all the same
size:

PID USER PRI NI SIZE RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
6992 guest-sh 13 0 2624 2624 2348 S 0 0.4 4.1 0:00 smbd
6706 root 0 0 1312 1312 788 S 0 0.0 2.0 0:01 smbd

2.0.10 on Solaris, on the other hand, has the smbd's RSS (resident
set size) growing when they're first started, and then they grow
more during use. We're primarily concerned with the large initial
growth in RSS as each additional daemon is started, as we may have
some quite large number of clients per server...

root[csh]@huey[22]# uname -a
SunOS huey 5.8 Generic_108528-03 sun4u sparc SUNW,Ultra-60

On first starting the daemons:

PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
634 root 1 0 0 3112K 1552K sleep 0:00 0.00% smbd

When one connection is made:

PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
667 root 1 58 0 4584K 3536K sleep 0:00 0.23% smbd
634 root 1 40 0 3112K 1576K sleep 0:00 0.00% smbd

After 2 connections:

PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
671 root 1 58 0 4256K 1984K sleep 0:00 0.03% smbd
667 root 1 58 0 4584K 3640K sleep 0:00 0.02% smbd
634 root 1 47 0 3112K 1576K sleep 0:00 0.00% smbd

After 3 connections:

PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
667 root 1 58 0 4584K 3672K sleep 0:00 0.00% smbd
671 root 1 58 0 4584K 2656K sleep 0:00 0.00% smbd
681 root 1 58 0 4584K 2576K sleep 0:00 0.12% smbd
634 root 1 48 0 3112K 1576K sleep 0:00 0.03% smbd

This is a risk issue for us, because we sized the systems based
on measurements done with 2.0.7, then applied the bugfixes by rolling
forward to 2.0.10.
It looks rather as if the fix in 2.0.9/10 has a memory leak.

I've looked at 2.0.10, but I don't see why there should be a
problem... could the authors of the security fixes help us out
with this, please?

--dave
--
David Collier-Brown, | Always do right. This will gratify
Performance & Engineering Team | some people and astonish the rest.
Americas Customer Engineering | -- Mark Twain
(905) 415-2849 | dav...@canada.sun.com

Kris Desjardins

Aug 20, 2001, 2:11:14 PM

We ran samba 2.0.7 on Solaris 7 and had the size reach 28MB per process
(200+ processes) before I had to kill -9 the parent and let the children
eventually timeout and die. I upgraded to 2.0.10 and applied the patch
below from a previous discussion but had no apparent effect, the processes
are 10MB now and growing.

I also have another problem with the Windows 2000 backup utility and samba.
It seems when backing up 2GB+ of data to a file on a samba share with verify
on, the job will never stop and the "Remaining time" increases. Thanks.

BOOL reset_stat_cache( void )
{
        static BOOL initialised;

        /* Nothing to do if the stat cache is disabled in smb.conf. */
        if (!lp_stat_cache()) return True;

        if (!initialised) {
                /* First call: just build the hash table. */
                initialised = True;
                return hash_table_init( &stat_cache, INIT_STAT_CACHE_SIZE,
                                        (compare_function)(strcmp));
        }

        /* Later calls: clear the old table before re-initialising it. */
        hash_clear(&stat_cache);
        return hash_table_init( &stat_cache, INIT_STAT_CACHE_SIZE,
                                (compare_function)(strcmp));
} /* reset_stat_cache */

----- Original Message -----
From: "David Collier-Brown" <dav...@canada.sun.com>
To: "Gerald Carter" <gca...@valinux.com>
Cc: <David.Col...@sun.com>; "Jeremy Allison" <jer...@varesearch.com>;
<to...@aus.sun.com>; <cr...@aus.sun.com>; <all...@sun.com>;
<samba-t...@samba.org>
Sent: Monday, August 20, 2001 1:41 PM
Subject: Re: Time-critical problem at Sun: exploding smbd memory usage

> Gerald Carter wrote:
> > Strange. There were no major changes between 2.0.7 and 2.0.10. Only
> > security fixes. Does the memory usage flatline out after a while?
> > I normally see 4.5Mb for the RSS for running smbd on Solaris 2.6 - 8.
>
> Yes, it flattens out, with a claimed size of 17MB, and
> RSS of 15 (if prstat is to be believed!).

David Collier-Brown

Aug 20, 2001, 3:29:30 PM

Gerald Carter wrote:

>
> On Mon, 20 Aug 2001, David Collier-Brown wrote:
>
> > Gerald Carter wrote:
> > > Strange. There were no major changes between 2.0.7 and 2.0.10. Only
> > > security fixes. Does the memory usage flatline out after a while?
> > > I normally see 4.5Mb for the RSS for running smbd on Solaris 2.6 - 8.
> >
> > Yes, it flattens out, with a claimed size of 17MB, and
> > RSS of 15 (if prstat is to be believed!).
>

> Ewww.. Ummm....That's not good :-( Can you give any more details?
> smb.conf? Homes? # of shares? etc...

Four main shares: homes, printers, /net and /usr/dist

The globals are:
workgroup = AUS
netbios name = homer
server string = Samba %v, ITsamba v1.1 on %L
encrypt passwords = no
password level = 4
log level = 1
time server = Yes
os level = 200
max log size = 8000
preferred master = no
domain master = no
wins support = no
wins server = famine.aus
wins proxy = Yes
guest account = samba
local master = no
name resolve order = host wins bcast
dns proxy = yes
preserve case = yes
short preserve case = yes
default case = lower
printcap name = lpstat
printing = sysv
load printers = yes
security = user
socket options = TCP_NODELAY
lpq cache time = 0
map to guest = Bad User
interfaces = 127.0.0.1 hme0 hme1
bind interfaces only = yes

The process map looks like this:
# pmap 3886, where 3886 is the smbd serving my smbclient

3886: /opt/samba/sbin/smbd -D -l /var/log/samba/2001-08-17.log.smb
00010000 808K read/exec /opt/samba/sbin/smbd
the program's executables
000E8000 240K read/write/exec /opt/samba/sbin/smbd
the program's globals
00124000 13360K read/write/exec [ heap ]
this, as you might expect, is where the space
is used up.

FF000000 1024K read/write/exec/shared [ shmid=0x771 ]
a shared memory area

FF120000 24K read/exec /usr/lib/nss_files.so.1
FF136000 8K read/write/exec /usr/lib/nss_files.so.1
and a bunch of shared library code and global data...

FF140000 24K read/exec /usr/lib/nss_nis.so.1
FF156000 8K read/write/exec /usr/lib/nss_nis.so.1
FF160000 16K read/exec /usr/lib/nss_compat.so.1
FF174000 8K read/write/exec /usr/lib/nss_compat.so.1
FF180000 672K read/exec /usr/lib/libc.so.1
FF238000 24K read/write/exec /usr/lib/libc.so.1
FF23E000 8K read/write/exec /usr/lib/libc.so.1
FF250000 16K read/exec /usr/platform/sun4u/lib/libc_psr.so.1
FF260000 16K read/exec /usr/lib/libmp.so.2
FF274000 8K read/write/exec /usr/lib/libmp.so.2
FF280000 552K read/exec /usr/lib/libnsl.so.1
FF31A000 32K read/write/exec /usr/lib/libnsl.so.1
FF322000 32K read/write/exec /usr/lib/libnsl.so.1
FF330000 8K read/write/exec [ anon ]
FF350000 40K read/exec /usr/lib/libsocket.so.1
FF36A000 8K read/write/exec /usr/lib/libsocket.so.1
FF370000 8K read/exec /usr/lib/libsec.so.1
FF382000 8K read/write/exec /usr/lib/libsec.so.1
FF390000 8K read/exec /usr/lib/libdl.so.1
FF3A0000 8K read/write/exec [ anon ]
FF3B0000 136K read/exec /usr/lib/ld.so.1
FF3E2000 8K read/write/exec /usr/lib/ld.so.1
FFBEA000 24K read/write/exec [ stack ]
total 17136K

The parent smbd is almost the same, but without the
/usr/lib/nss_nis.so.1 and /usr/lib/nss_compat.so.1 libraries
and no shared memory area.

We have enough swap and real memory that a workaround is
possible, but the growth is disquieting (;-))
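
To make the point of the map explicit, the segment sizes can be added up by mapping name. What follows is only a rough sketch: the helper name summarize and the regular expression are assumptions for illustration, and the sample is a handful of lines copied from the listing above rather than the full map. In the full listing the heap accounts for 13360K of the 17136K total, roughly 78%.

```python
import re

# A few lines in the format printed by pmap in the message above.
SAMPLE = """\
3886: /opt/samba/sbin/smbd -D -l /var/log/samba/2001-08-17.log.smb
00010000 808K read/exec /opt/samba/sbin/smbd
000E8000 240K read/write/exec /opt/samba/sbin/smbd
00124000 13360K read/write/exec [ heap ]
FF000000 1024K read/write/exec/shared [ shmid=0x771 ]
FF180000 672K read/exec /usr/lib/libc.so.1
FFBEA000 24K read/write/exec [ stack ]
"""

# address, size in KB, permissions, mapping name
LINE = re.compile(r"^([0-9A-Fa-f]{8})\s+(\d+)K\s+(\S+)\s+(.+)$")


def summarize(pmap_text):
    """Sum segment sizes (in KB) by mapping name."""
    totals = {}
    for line in pmap_text.splitlines():
        m = LINE.match(line.strip())
        if not m:
            continue  # skip the header, annotations and the "total" line
        size_kb = int(m.group(2))
        name = m.group(4).strip()
        totals[name] = totals.get(name, 0) + size_kb
    return totals


totals = summarize(SAMPLE)
largest = max(totals, key=totals.get)
print(largest, totals[largest], "KB out of", sum(totals.values()), "KB listed")
```

Feeding it the complete pmap output for the parent and for a child smbd would show directly which segments differ between the two.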

Keith Farrar

Aug 20, 2001, 6:29:22 PM

It's not just the number of printers, it's the total number of shares. We
have no printers defined, but lots of disk shares (roughly 900 on one box
and 1500 on a second host). The servers are Sun E450s, but the same type
of growth pattern occurs on Linux (redhat 7.1 x86).

The smbd processes start up at 6 MB each, then grow until killed by
process limits (currently 20 MB). Max observed growth is 115 MB... within
an hour.

The growth was much slower under 2.0.7, but happens quickly under 2.0.10+
and 2.2 .

It's not too hard to test: create an smb.conf file with 2000 static shares
(squirt it out with a script & reuse the same directory path). Then watch
the memory growth. Someone with familiarity with the code, access to a
memory leak finder, and a good debugging environment should take a look at
this (i.e. not me :).
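
The test file described above could be generated along these lines. This is a minimal sketch, not a tested reproduction: the function name make_smb_conf, the share names, the default path and the [global] values are all placeholders chosen for illustration; only the idea of about 2000 static shares reusing one directory path comes from the description.

```python
def make_smb_conf(num_shares=2000, path="/var/tmp/smbshare"):
    """Return the text of an smb.conf with num_shares identical disk shares."""
    lines = [
        "[global]",
        "    workgroup = TEST",       # placeholder value
        "    security = user",
        "    load printers = no",     # keep printers out of the picture
        "",
    ]
    for i in range(num_shares):
        lines += [
            f"[share{i:04d}]",        # hypothetical share name
            f"    path = {path}",     # every share reuses the same directory
            "    writeable = yes",
            "    browseable = yes",
            "",
        ]
    return "\n".join(lines)
```

Writing the returned text to a scratch file and pointing a test smbd at it, then varying num_shares, should show whether process size scales with the number of shares.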

-kaf

> Date: Mon, 20 Aug 2001 12:18:17 PDT
> From: David Collier-Brown <dav...@canada.sun.com>
> Reply-To: David.Col...@sun.com
> To: Gerald Carter <gca...@valinux.com>
> Cc: Kris Desjardins <kris_de...@hotmail.com>,
> David.Col...@sun.com, Jeremy Allison <jer...@valinux.com>,
> to...@aus.sun.com, cr...@aus.sun.com, all...@sun.com,
> samba-t...@samba.org

> Subject: Re: Time-critical problem at Sun: exploding smbd memory usage
>

> Gerald Carter wrote:

> >
> > On Mon, 20 Aug 2001, Kris Desjardins wrote:
> >
> > > We ran samba 2.0.7 on Solaris 7 and had the size reach 28MB per
> > > process (200+ processes) before I had to kill -9 the parent and let
> > > the children eventually timeout and die. I upgraded to 2.0.10 and
> > > applied the patch below from a previous discussion but had no apparent
> > > effect, the processes are 10MB now and growing.
> >

> > Can you track this down to either printing or file sharing?
>
> Hmmn, good thought...
>
> This looks suspicious, we have a whole whack
> of printers! 294 of them????
>
> I remember some printer issues in 2.0, and
> the amount of space they take, I wonder if this
> is what's biting us today...
>
> --dave
>

Michael E Osborne

Aug 20, 2001, 8:30:59 PM

I don't know if this is related, but we once experienced a massive (10x)
increase in smbd (2.0.7) size which we eventually tracked down to a problem in
the smb.conf (a bad include statement) where we were causing the smb.conf to be
parsed over and over at smbd startup. This was under AIX 4.3.3. Eliminating the
re-parsing fixed the ballooning.

jer...@valinux.com on 08/20/2001 12:17:57 PM

To: far...@parc.xerox.com
cc: David.Col...@sun.com, Gerald Carter <gca...@valinux.com>, Kris
Desjardins <kris_de...@hotmail.com>, to...@aus.sun.com,
cr...@aus.sun.com, all...@sun.com, samba-t...@samba.org (bcc: Michael
E Osborne/JACADS/REC)

Subject: Re: Time-critical problem at Sun: exploding smbd memory usage

Keith Farrar wrote:
>
> It's not just the number of printers, it's the total number of shares. We
> have no printers defined, but lots of disk shares (roughly 900 on one box
> and 1500 on a second host). The servers are Sun E450s, but the same type
> of growth pattern occurs on Linux (redhat 7.1 x86).
>
> The smbd processes start up at 6 MB each, then grow until killed by
> process limits (currently 20 MB). Max observed growth is 115 MB... within
> an hour.
>
> The growth was much slower under 2.0.7, but happens quickly under 2.0.10+
> and 2.2 .
>
> It's not too hard to test: create an smb.conf file with 2000 static shares
> (squirt it out with a script & reuse the same directory path). Then watch
> the memory growth. Someone with familiarity with the code, access to a
> memory leak finder, and a good debugging environment should take a look at
> this (i.e. not me :).

Can you trigger growth by touching the smb.conf and
then hitting an smbd with a SIGHUP ?
If so, then it's smb.conf parse related.....

Jeremy.

Kevin (HxPro) Wheatley

Aug 21, 2001, 4:01:40 AM

Keith Farrar wrote:
>
> Nope. Growth occurs without any include statements.
>
> Try the sample smb.conf file (create /var/tmp/smbshare directory), and
> watch the results of polling with "smbclient -L " (11 MB smbd on redhat,
> kernel 2.4.2-2smp).
>
> The smbd process also grows slowly with cycles of 'touch smb.conf; kill
> HUP $pid'.
>

running 2.2.1a on IRIX appears to behave badly when you touch smb.conf.
Usually it fills up the disks; I think it's not limiting itself to the
max log size limit. So far it's only managed to happen during the middle
of the night, so I'm not 100% sure on this. If you don't touch the file then
it runs fine for weeks at a time.

This is on a PDC. I think the running smbds keep writing to the old
rotated logfile, while smbds created after the 'touch' write to the new file.
This is unconfirmed, but certainly the old file continues to grow.

Kevin

David Collier-Brown

Aug 21, 2001, 8:25:29 AM

Michael E Osborne wrote:
> I don't know if this is related, but we once experienced a massive (10x)
> increase in smbd (2.0.7) size which we eventually tracked down to a problem in
> the smb.conf (a bad include statement) where we were causing the smb.conf to be
> parsed over and over at smbd startup. This was under AIX 4.3.3. Eliminating the
> re-parsing fixed the ballooning.

The standard smb.conf file contains an include
directive for local-smb.conf, which is usually
a file which contains just a comment...

I tried touching the smb.conf file and sending a
HUP to the smbd process (thanks, Jeremy) while
watching it with prstat -p 26111, and the size
headed up to a total size of 29MB and an RSS of 27MB
over about 80-odd HUPs.

Exiting and restarting smbclient created an smbd
with a 29/27MB size, so the problem behavior hasn't
changed.

Commenting out the printers and repeating the touch/kill
loop caused no increase at all over about 500 touch/HUPs.

There was no effect with or without the include option.

Looks like a printers issue, which was a known problem
in the 2.0 timeframe. We had not expected this, as the
previous tests with 2.0.7, on both Solaris and Cobalt,
did not show this growth.
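
For anyone wanting to repeat the touch/HUP loop, it could be scripted roughly as below. This is a sketch under assumptions: the function name touch_and_hup and its parameters are made up for illustration, and the optional sample callable stands in for however memory is actually read on the platform (prstat was used by hand in the test above).

```python
import os
import signal
import time


def touch_and_hup(conf_path, pid, cycles, delay=0.5, sample=None):
    """Touch conf_path and send SIGHUP to pid, `cycles` times.

    `sample` is an optional callable taking the pid and returning a memory
    figure (for example by parsing prstat or ps output). Its results are
    collected and returned so growth can be compared across cycles.
    """
    readings = []
    for _ in range(cycles):
        os.utime(conf_path, None)      # update the timestamps, like `touch`
        os.kill(pid, signal.SIGHUP)    # ask the daemon to reload its config
        time.sleep(delay)              # give it time to re-parse the file
        if sample is not None:
            readings.append(sample(pid))
    return readings
```

Running the same number of cycles once with the printers configured and once with them commented out gives two series of readings that can be compared directly.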

Kris Desjardins

Aug 21, 2001, 1:40:50 PM

----- Original Message -----
From: "Gerald Carter" <gca...@valinux.com>
To: "Kris Desjardins" <kris_de...@hotmail.com>
Cc: <David.Col...@sun.com>; "Jeremy Allison" <jer...@valinux.com>;
<to...@aus.sun.com>; <cr...@aus.sun.com>; <all...@sun.com>;
<samba-t...@samba.org>

Sent: Monday, August 20, 2001 2:55 PM
Subject: Re: Time-critical problem at Sun: exploding smbd memory usage

> On Mon, 20 Aug 2001, Kris Desjardins wrote:
>
> > We ran samba 2.0.7 on Solaris 7 and had the size reach 28MB per
> > process (200+ processes) before I had to kill -9 the parent and let
> > the children eventually timeout and die. I upgraded to 2.0.10 and
> > applied the patch below from a previous discussion but had no apparent
> > effect, the processes are 10MB now and growing.
>
> Can you track this down to either printing or file sharing?
>

We have over 400 file shares and no printer shares setup.

# Global parameters
[global]
workgroup = TEST
netbios name = TESTER
netbios aliases = TESTER-46
server string = Tester
interfaces = zrl0 zrl1 hme0 lo0
bind interfaces only = YES
security = domain
encrypt passwords = Yes
password server = *
restrict anonymous = no
debug level = 1
log file = /usr/local/samba/var/log.%m
max log size = 1000
load printers = No
local master = No
dns proxy = No
wins server = xxx.xxx.xxx.xxx
create mask = 0700
hosts allow = xxx.xxx.xxx. xxx.xxx.xx. xxx.xxx.xxx.
strict locking = yes
deadtime = 900
keepalive = 3600

# Access to home directories

[homes]
comment = Home Directories
writeable = Yes
browseable = No

[im]
browseable = yes
guest ok = no
writable = yes
path = /im
comment = xxxx:/im

and many more like the following

[test$]
path = /test/udata1/test
valid users = test
writeable = yes
browseable = no

Wagner Guenter

Aug 21, 2001, 1:46:08 PM

We have a problem that I think may be related.

With Samba 2.0.7 AND ALSO with Samba 2.0.10,
after each touch of smb.conf smbd grows by about 2MB,
but ONLY if we configure printers (70 of them).

We use SuSE Linux 6.4 with Samba 2.0.7
and SuSE Linux 7.2 with Samba 2.0.10

Günter Wagner
g.wa...@mkg-bank.de

Gerald Carter

Aug 21, 2001, 9:55:24 PM

On Tue, 21 Aug 2001, Richard Bollinger wrote:

> Try the attached patch to fix the printer related leakage on Solaris.
> We also have about 300 printers.

Thanks! I've merged the missing lp_talloc_free() into
lp_add_one_printer() for 2.2 and HEAD.

cheers, jerry
---------------------------------------------------------------------
www.valinux.com VA Linux Systems gcarter_at_valinux.com
www.samba.org SAMBA Team jerry_at_samba.org
www.plainjoe.org jerry_at_plainjoe.org
--"I never saved anything for the swim back." Ethan Hawk in Gattaca--

David Collier-Brown

Aug 22, 2001, 8:01:02 AM

| post patch:
| PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
| 9097 root 3568K 1752K sleep 21 0 0:00.00 0.0% smbd/1

| I had done some earlier testing and found that before applying the
| patch, the size of the daemon was determined by the number of
| shares.

Thanks, team: this is much more sane than before!

Richard Bollinger

Aug 23, 2001, 8:48:01 AM

Funny... I don't see the change in 2_2 CVS... did you really apply it?

Ahhh I see... Jeremy took it back out. Nice of him to do so, but I think he's wrong. The "main
loop" doesn't clean things up after each printer is added when we're using a [printers] clause in
smb.conf. That loop occurs inside pcap_printer_fn. Per my testing, this only seems to make a
difference on Solaris. Maybe fragmentation is occurring inside their malloc()/free()?
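
The difference between freeing temporary memory after each printer and freeing it once back in the main loop can be pictured with a toy model. To be clear, this is not Samba's talloc code: peak_pool_bytes is a made-up function, the per-printer byte count is an invented figure, and the model says nothing about whether freeing inside the loop is safe. It only shows how the peak size of a temporary pool depends on where the free happens.

```python
def peak_pool_bytes(num_printers, temp_bytes_per_printer, free_each_iteration):
    """Peak size of a temporary pool while adding printers one by one."""
    pool = 0
    peak = 0
    for _ in range(num_printers):
        pool += temp_bytes_per_printer   # temporaries allocated for this printer
        peak = max(peak, pool)
        if free_each_iteration:
            pool = 0                     # analogous to freeing inside the loop
    # Without the per-iteration free, everything is released only after the loop.
    return peak


# 294 printers is the count quoted earlier in the thread; 8192 bytes of
# temporary data per printer is an invented figure for illustration.
late = peak_pool_bytes(294, 8192, free_each_iteration=False)
early = peak_pool_bytes(294, 8192, free_each_iteration=True)
print(late, early)
```

If the platform's allocator does not hand freed heap back to the operating system (an assumption, not something established in this thread), the process size would track the larger peak even after the late free.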

Rich B

----- Original Message -----
From: "Gerald Carter" <gca...@valinux.com>
To: "Richard Bollinger" <rabol...@home.com>
Cc: <David.Col...@sun.com>; "Michael E Osborne" <mosb...@jacads.com>; <jer...@valinux.com>;
<far...@parc.xerox.com>; "Kris Desjardins" <kris_de...@hotmail.com>; <to...@aus.sun.com>;
<cr...@aus.sun.com>; <all...@sun.com>; <samba-t...@samba.org>
Sent: Tuesday, August 21, 2001 9:56 PM
Subject: Re: Time-critical problem at Sun: exploding smbd memory usage

David Collier-Brown

Aug 23, 2001, 9:19:53 AM

Richard Bollinger wrote:
> Per my testing, this only seems to make a
> difference on Solaris. Maybe fragmentation is occurring inside their malloc()/free()?

Odd that it's Solaris-specific: probably Linux
does something Elegant and Simple (:-))

--dave (Solaris Bigot!) c-b

David Collier-Brown

Aug 23, 2001, 12:50:48 PM

Jeremy Allison wrote:

> The talloc delete in the main loop should free
> any memory allocated in the tallocs inside the
> printer allocation. Why does this cause the RSS
> to grow on Solaris ?
>
> insure on Linux does not flag this as an alloc
> bug (and believe me, it would....).

Ok, let's look at this when the conference
is over: it may be a subtle Solaris bug,
and I like to find those.

--dave

Jeremy Allison

Aug 23, 2001, 12:42:41 PM

Richard Bollinger wrote:
>
> Funny... I don't see the change in 2_2 CVS... did you really apply it?
>
> Ahhh I see... Jeremy took it back out. Nice of him to do so, but I think he's wrong. The "main
> loop" doesn't clean things up after each printer is added when we're using a [printers] clause in
> smb.conf. That loop occurs inside pcap_printer_fn. Per my testing, this only seems to make a
> difference on Solaris. Maybe fragmentation is occurring inside their malloc()/free()?

I took it out as it's not safe to do that free
inside the printer loop. It's only safe to
do that talloc delete in the main loop, outside
of any incoming smb processing.

The talloc delete in the main loop should free
any memory allocated in the tallocs inside the
printer allocation. Why does this cause the RSS
to grow on Solaris ?

insure on Linux does not flag this as an alloc
bug (and believe me, it would....).

Jeremy.

Richard Bollinger

Aug 27, 2001, 3:48:21 AM

Try the attached patch to fix the printer related leakage on Solaris. We also have about 300
printers.

Rich Bollinger, Elliott Company

Attachment: fixleaks.patch

*** ../samba-2.0.7/source.Linux/param/loadparm.c Fri Nov 17 09:24:27 2000
--- ./param/loadparm.c Sat Nov 18 00:46:50 2000
***************
*** 2711,2716 ****
--- 2711,2718 ----
if ((i=lp_servicenumber(name)) >= 0)
string_set(&iSERVICE(i).comment,comment);
}
+ /* free up temporary memory */
+ lp_talloc_free();
}


/***************************************************************************
*** ../samba-2.0.7/source.Linux/smbd/server.c Thu Mar 16 17:59:52 2000
--- ./smbd/server.c Fri Nov 17 22:58:15 2000
***************
*** 183,188 ****
--- 183,191 ----
fd_set lfds;
int num;

+ /* free up temporary memory */
+ lp_talloc_free();
+
memcpy((char *)&lfds, (char *)&listen_set,
sizeof(listen_set));

tony shepherd

Aug 27, 2001, 3:56:29 AM

Many thanks to all of you for help in tracking down the problem and
providing a fix. I have applied the patch and it appears to be
working. I will let you know if our testing over the next week or so
turns up any problems.

Again, thanks for this. I did not expect the fix to be so quick.

regards

tony

Richard Bollinger wrote:
>
> Try the attached patch to fix the printer related leakage on Solaris. We also have about 300
> printers.
>
> Rich Bollinger, Elliott Company
>

> ----------------------------------------------------------------------
> Name: fixleaks.patch
> fixleaks.patch Type: unspecified type (application/octet-stream)
> Encoding: quoted-printable