Re: ndsd is dead and ncp volumes no longer available

Thomas Reiß

Jan 30, 2010, 6:07:52 AM

andradac wrote:
> Hello,
>
> I have seen a couple of posts where ndsd crashes and ncp volumes are no
> longer available but I have not seen a post that came to a solution...
> Hopefully someone can help me out...
>
> I am going to apologize for the lengthy post ahead of time! :)
>
> We have a 3-node cluster consisting of three OES2 SP2 on SLES10 SP3
> servers. One of the cluster resources (ncs_data_home) is a resource
> that has all of our student home directories. Consistently, the node
> that hosts that resource will have ndsd crash. We then have to restart
> ndsd.
>
> The ncs_data_home cluster resource has over 30,000 home directories
> under a users folder.
>
> We started out with 4GB of memory in each cluster node and found that
> ndsd would crash at least once every 2 days on the node that serviced
> home directories.
>
> Increasing the memory to 8GB in each cluster node reduced the ndsd
> crashes to roughly once every 3-5 days.
>
> I have tried NCP tuning by increasing MAXIMUM_CACHED_FILES_PER_VOLUME,
> and MAXIMUM_CACHED_SUBDIRECTORIES.
>
> I have tried NSS tuning, as per the tuning guide from the Novell
> documentation.
>
> I have tried eDir tuning by increasing the database cache configuration
> to a hard limit that roughly matches the size of the DIB set.
>
> All to no avail.
>
> We can cause ndsd to crash by logging in with an admin-level account or an
> account that has RF rights to the directory that contains the home
> directories and using Windows Explorer to display the home directory listing.
>
> Sometimes I manage to bring up all the directories in Windows. It can
> take up to 25 minutes to bring up the Explorer window with all the
> directories.
>
> When I do this, ndsd and ndpapp processes and memory show high
> utilization. Monitoring the ndpapp process shows zombie statuses.
>
> Some other background info:
> -the cluster nodes are IBM HS21 blades with two quad-core 2.33 GHz Xeon
> procs connected to an IBM DS4700 using 4Gbps FC adapters (no
> multipathing).
> -even though there are 30000+ home directories, the file space usage is
> under 50GB total with a total file count of under 400,000, since
> not all students utilize their student computer account.
> -we have not LUM-enabled our student accounts.
> -we use Novell Storage Manager to manage student home directory
> quotas.
> -cluster nodes hold replicas of partitions that hold cluster, server,
> volume and user objects.
> -cluster nodes are LUM-enabled.
> -we were experiencing the same problems before we updated our servers
> from OES2 SP1 on SLES10.2.
>
> I did a test of migrating a copy of the student home directories to a
> single OES2 SP2 on SLES10.3 server and found ndsd still crashes.
> I performed a test whereby I gradually increased the number of student
> home directories under the \\servername\home\users directory and timed
> how long it took Windows Explorer to display the directory listing.
>
> 1000-2000 directories = a few seconds
> 5,000 directories = 30 seconds or so
> 10,000 directories = 5 minutes
> 15,000 directories = ~15 minutes
> 20,000+ directories = ~20+ minutes to complete, or I get a "not accessible"
> error message which says I do not have permissions to use the network
> resource / the specified network name is no longer available, or ndsd
> crashes.
>
[...]

Interesting... not really a solution, but some questions:

- Does this also happen when you list the directory on a DOS command line?
- Does this also happen when you map the \\servername\home directory to a
drive letter?
- Does this also happen when you map with this syntax:
MAP ROOT I:=.CN=cluster_server.O=ORGNAME:\home\users
- Are you using the latest Novell Client?

Sorry, only some ideas.
Thomas
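
The command-line comparison suggested above can also be run from a Linux shell on the server itself. The sketch below counts the entries in a directory without sorting them and reports the elapsed time. The function name time_listing and the path in the usage comment are assumptions for illustration; /media/nss/ is the usual NSS mount location on OES Linux, but the volume name here is a guess. A local listing should not go through NCP, so a fast result here would point toward the NCP/eDirectory side rather than the filesystem, though that is an inference and not something confirmed in this thread.

```shell
#!/bin/sh
# time_listing DIR
# Count the direct children of DIR (unsorted) and print the elapsed
# wall-clock time in whole seconds.
time_listing() {
    dir=$1
    start=$(date +%s)
    # find does not sort its output, so this measures enumeration only.
    # -mindepth 1 skips DIR itself; -maxdepth 1 avoids recursing.
    count=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
    end=$(date +%s)
    echo "$dir: $count entries in $((end - start)) s"
}

# Hypothetical usage (the path is an assumption, adjust to your volume):
#   time_listing /media/nss/HOME/users
```

Running it against growing subsets of the users directory would give numbers to set beside the Windows Explorer timings from the original post.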

Alex Warmerdam

Jan 30, 2010, 9:03:37 AM

On Sat, 30 Jan 2010 06:26:01 GMT, andradac
<andr...@no-mx.forums.novell.com> wrote:

Hi,

> Anyway, hopefully someone can lend some insight before I have to log a
> call with Novell.

Please take a look at your ndsd thread count; it seems that the
default of 64 can kill everything. We are using 192.

Next, you're running a cluster and have the nodes virtualized.
If you are using a NetApp, take a look at its cache. It
can't handle large numbers of small files very well.

Also, can you tell us more about the complete setup: 64-bit or 32-bit, patch
level, PCI bus configuration (shared bus or not), and so on?

warper2

Feb 1, 2010, 3:43:46 PM

Alex Warmerdam wrote:

Alex

Can you point me to where to set this thread count?

Thanks

Alex Warmerdam

Feb 1, 2010, 4:27:26 PM

On Mon, 01 Feb 2010 20:43:46 GMT, warper2
<rd...@rdbnetsolutionsdot.com> wrote:

Hi,

> Alex
>
> Can you point me to where to set this thread count?
>
> Thanks

To display current threads, use:

ndstrace -c threads

Maximum threads for the ndsd thread pool can be increased by adding
the following line to the /etc/nds.conf:

n4u.server.max-threads=##

or

ndsconfig set n4u.server.max-threads=##

##=The new number for the max-threads setting

A restart of ndsd is required for the setting to take effect.
Example on Solaris & Linux:
/etc/init.d/ndsd restart

If you are running cluster services, reboot the machine :(
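
For the manual-edit route described above, here is a minimal sketch of a helper that sets the key in a configuration file, replacing an existing line or appending a new one. The function name set_max_threads is made up for this example, and it assumes the file uses plain key=value lines and ends with a newline. The file path is passed in as an argument; a later reply in this thread gives /etc/opt/novell/eDirectory/conf/nds.conf as the location, and trying it on a copy first is the safer option.

```shell
#!/bin/sh
# set_max_threads FILE VALUE
# Set n4u.server.max-threads=VALUE in FILE.
# Replaces the existing line if the key is present, otherwise appends it.
set_max_threads() {
    conf=$1
    value=$2
    key='n4u.server.max-threads'
    key_re='n4u\.server\.max-threads'   # same key with dots escaped for regex use
    if grep -q "^${key_re}=" "$conf"; then
        # Key already present: rewrite its value via a temporary file.
        sed "s/^${key_re}=.*/${key}=${value}/" "$conf" > "${conf}.tmp" &&
            mv "${conf}.tmp" "$conf"
    else
        # Key absent: append a new line.
        printf '%s\n' "${key}=${value}" >> "$conf"
    fi
}

# Hypothetical usage on a copy of the file (192 is the value mentioned
# earlier in the thread):
#   cp /etc/opt/novell/eDirectory/conf/nds.conf /tmp/nds.conf.test
#   set_max_threads /tmp/nds.conf.test 192
```

This only changes the file; as noted above, ndsd still has to be restarted for the setting to take effect.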

Hans van den Heuvell

Feb 3, 2010, 2:42:06 AM

On Wed, 03 Feb 2010 04:16:01 +0000, andradac wrote:

>
> I increased UDEVD_MAX_CHILDS to 128 from 64 and UDEVD_MAX_CHILDS_RUNNING
> to 32 from 16
>
> This seemed to help a little.
>
> Cheers,
> Carlos

Carlos,

The higher UDEV settings should only be needed to satisfy specific
dependencies at server startup.

See TID 7004877 for information on how to properly calculate UDEV
settings.

Regards
Hans

Hans van den Heuvell

Feb 3, 2010, 3:35:47 AM

On Sat, 30 Jan 2010 06:26:01 +0000, andradac wrote:

> I have tried NCP tuning by increasing MAXIMUM_CACHED_FILES_PER_VOLUME,
> and MAXIMUM_CACHED_SUBDIRECTORIES.
>

Carlos,

Have you followed the NCP tuning suggestions from TID 7004888 for this?

Regards
Hans

Arjan Nolle

Feb 3, 2010, 5:27:21 AM

Hi *,

We have issues similar to the ones you described. Things are "looking" better
after we patched the cluster with the latest updates
(http://download.novell.com/Download?buildid=KH1eJeZOqDM) available in
the update channel.

But we keep fingers crossed ...

with regards,
Arjan Nolle

On 1/30/2010 7:26 AM, andradac wrote:
>
> Hello,
>
> I have seen a couple of posts where ndsd crashes and ncp volumes are no
> longer available but I have not seen a post that came to a solution...
> Hopefully someone can help me out...
>
> I am going to apologize for the lengthy post ahead of time! :)
>
> We have a 3-node cluster consisting of three OES2 SP2 on SLES10 SP3
> servers. One of the cluster resources (ncs_data_home) is a resource
> that has all of our student home directories. Consistently, the node
> that hosts that resource will have ndsd crash. We then have to restart
> ndsd.
>
> The ncs_data_home cluster resource has over 30,000 home directories
> under a users folder.
>
> We started out with 4GB of memory in each cluster node and found that
> ndsd would crash at least once every 2 days on the node that serviced
> home directories.
>
> Increasing the memory to 8GB in each cluster node reduced the ndsd
> crashes to roughly once every 3-5 days.
>

> I have tried NCP tuning by increasing MAXIMUM_CACHED_FILES_PER_VOLUME,
> and MAXIMUM_CACHED_SUBDIRECTORIES.
>

> This is on a test server that matches a running cluster node and
> has 6GB of memory, and I am the only user accessing it.
>
> I have ruled out the cluster being the problem. I am leaning towards
> eDir having a problem keeping track of the trustee info.

>
> Anyway, hopefully someone can lend some insight before I have to log a
> call with Novell.
>

> Cheers,
> Carlos
>
>

Alex Warmerdam

Feb 3, 2010, 5:22:40 PM

On Mon, 01 Feb 2010 20:43:46 GMT, warper2
<rd...@rdbnetsolutionsdot.com> wrote:

Hi,

The updates from last Sunday do help your case, by the looks of it.

Alex Warmerdam

Feb 3, 2010, 5:24:13 PM

On Tue, 02 Feb 2010 13:46:02 GMT, elagrew
<ela...@no-mx.forums.novell.com> wrote:

Yup.

But I'm always setting it with ndsconfig.
>
> Actually, don't you mean:
>
> /etc/opt/novell/eDirectory/conf/nds.conf ?
>
> --El