misc/50046: Slow rc multiuser boot in 7.0 and current

fr...@phoenix.owl.de

Jul 12, 2015, 6:00:11 PM
>Number: 50046
>Category: misc
>Synopsis: Slow rc multiuser boot in 7.0 and current
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: misc-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jul 12 22:00:00 +0000 2015
>Originator: Frank Wille
>Release: netbsd-7
>Organization:
NetBSD
>Environment:
NetBSD/amiga 7.0_RC1

>Description:
Since the workaround for PR 48714 the multiuser boot in 7.0 and current has become very slow, which especially affects smaller single-CPU platforms.

It has been reported on port-amiga (A3000 and A4000/68060) by myself:
http://mail-index.netbsd.org/port-amiga/2015/01/25/msg007942.html
On x68k (XM6i/68030) by Tetsuya Isaki:
http://mail-index.netbsd.org/port-amiga/2015/02/01/msg007951.html
And on Sparc (via qemu) by Arto Huusko:
http://mail-index.netbsd.org/netbsd-users/2015/07/12/msg016476.html

The above-mentioned workaround introduces a background process in rc that sends a "nop" every three seconds. Removing this passage from rc:

--- rc.orig 2015-01-16 22:17:31.000000000 +0100
+++ rc 2015-05-31 14:47:06.000000000 +0200
@@ -120,24 +120,6 @@
     kill -0 $RC_PID >/dev/null 2>&1 || RC_PID=$$

     #
-    # As long as process $RC_PID is still running, send a "nop"
-    # metadata message to the postprocessor every few seconds.
-    # This should help flush partial lines that may appear when
-    # rc.d scripts that are NOT marked with "KEYWORD: interactive"
-    # nevertheless attempt to print prompts and wait for input.
-    #
-    (
-        # First detach from tty, to avoid intercepting SIGINFO.
-        eval "exec ${_rc_original_stdout_fd}<&-"
-        eval "exec ${_rc_original_stderr_fd}<&-"
-        exec </dev/null >/dev/null 2>&1
-        while kill -0 $RC_PID ; do
-            print_rc_metadata "nop"
-            sleep 3
-        done
-    ) &

... makes booting noticeably faster. In this case the boot time drops from about 5 minutes to about 4 minutes.
From /var/run/rc.log:

Original:
[/etc/rc starting at Sun May 31 13:27:54 CEST 2015]
...
[/etc/rc finished at Sun May 31 13:32:55 CEST 2015]

Without "nop":
[/etc/rc starting at Sun May 31 14:48:48 CEST 2015]
...
[/etc/rc finished at Sun May 31 14:52:59 CEST 2015]


There are probably other factors that make booting much slower
than under netbsd-6, but this one is unnecessary.
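
The exact durations can be read off the two rc.log excerpts above. The following is a minimal sketch, not part of the original report: the helper name rc_elapsed is hypothetical, and it assumes both timestamps fall on the same day.

```shell
#!/bin/sh
# rc_elapsed START END
# Print the number of seconds between two HH:MM:SS times.
# ASSUMPTION: both times are on the same day (no midnight wrap).
# rc_elapsed is a hypothetical helper, not part of /etc/rc.
rc_elapsed() {
    printf '%s %s\n' "$1" "$2" | awk '{
        split($1, s, ":"); split($2, e, ":")
        print (e[1] * 3600 + e[2] * 60 + e[3]) - (s[1] * 3600 + s[2] * 60 + s[3])
    }'
}

rc_elapsed 13:27:54 13:32:55   # original rc
rc_elapsed 14:48:48 14:52:59   # rc without the "nop" loop
```

The timestamps are copied from the "starting at" and "finished at" lines of the two log excerpts; the difference between the two printed values is the time saved by removing the background loop.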

>How-To-Repeat:
Boot a slow single-CPU system into multiuser mode and compare the time it needs with netbsd-6.

>Fix:
Reverting the workaround for PR 48714 partly fixes it.


--
Posted automagically by a mail2news gateway at muc.de e.V.
Please direct questions, flames, donations, etc. to news-...@muc.de

Arto Huusko

Jul 13, 2015, 6:45:10 AM
The following reply was made to PR misc/50046; it has been noted by GNATS.

From: Arto Huusko <arm...@gmail.com>
To: gnats...@netbsd.org
Cc:
Subject: Re: misc/50046
Date: Mon, 13 Jul 2015 13:43:07 +0300

Reverting revision 1.94 of etc/rc.subr fixes the slow boot.
The change causes a considerable amount of forking.
On a test system the number of forks right after boot
with rc.subr 1.94: 1080
with rc.subr 1.94 reverted: 280
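
The message does not say how the forks were counted. One possible way to obtain such a number is to read the fork statistics from `vmstat -s` right after boot. The sketch below is an assumption, not the method used above: the helper name fork_count is hypothetical, and the pattern assumes the relevant line ends in "forks total" (as on NetBSD) or "forks" (as in Linux procps).

```shell
#!/bin/sh
# fork_count: read `vmstat -s` style output on stdin and print the
# total number of forks since boot.
# ASSUMPTION: the line of interest ends in "forks total" or "forks";
# adjust the pattern if your vmstat prints something different.
# fork_count is a hypothetical helper name.
fork_count() {
    awk '/ forks( total)?$/ { print $1; exit }'
}

# Typical use right after boot (commented out so the sketch only
# defines the helper):
# vmstat -s | fork_count
```

Running this once with each rc.subr revision, immediately after reaching the login prompt, should give two numbers that can be compared in the same way as the figures above.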

Arto Huusko

Christos Zoulas

Jul 15, 2015, 10:34:47 AM
On Jul 13, 10:45am, arm...@gmail.com (Arto Huusko) wrote:
-- Subject: Re: misc/50046

| From: Arto Huusko <arm...@gmail.com>
| To: gnats...@netbsd.org
| Cc:
| Subject: Re: misc/50046
| Date: Mon, 13 Jul 2015 13:43:07 +0300
|
| Reverting revision 1.94 of etc/rc.subr fixes the slow boot.
| The change causes a considerable amount of forking.
| On a test system the number of forks right after boot
| with rc.subr 1.94: 1080
| with rc.subr 1.94 reverted: 280

Can you please test the stdbuf change I posted on tech-userlevel (compile
and re-install libc), together with adding:

export STDBUF0=L
export STDBUF1=L
export STDBUF2=L

in the beginning of rc and reverting rc and rc.subr to pre-pinger changes?

Thanks,

christos

Christos Zoulas

Jul 15, 2015, 10:35:13 AM
The following reply was made to PR misc/50046; it has been noted by GNATS.

Frank Wille

Jul 21, 2015, 11:02:10 AM
Christos Zoulas wrote:

> Can you please test the stdbuf change I posted on tech-userlevel
> (compile and re-install libc), together with adding:
>
> export STDBUF0=L
> export STDBUF1=L
> export STDBUF2=L
>
> in the beginning of rc and reverting rc and rc.subr to pre-pinger
> changes?

I did some test runs today on an Amiga 1200 (68030/40MHz, 64MB). The
multiuser boot process takes 8 minutes and 44 seconds here; NetBSD 6
needed less than half that time.


In the first tests I reverted /etc/rc to 1.167, but kept the _rc_pid
definition on line 87; otherwise booting hangs, because rc.subr needs it.
Results ("new clib" includes your stdbuf changes):

old clib: 7:00 min.
new clib: 6:57 min.
new clib with STDBUFn=L: 6:58 min.
new clib with STDBUFn=U: 7:01 min.

The stdbuf change doesn't make a big difference here, but removing the "nop"
pinger code already makes a noticeable difference.


In the next tests I reverted /etc/rc to 1.167 and /etc/rc.subr to 1.93, as
suggested by Arto Huusko:

old clib: 3:44 min.
new clib: 3:38 min.

So I can confirm that reverting /etc/rc.subr is the key. It would be great
to find a solution here.
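
To put the figures above side by side, here is a minimal sketch that converts the "old clib" timings to seconds and expresses each as a saving relative to the unmodified 8:44 boot. The helper name mmss_to_s is hypothetical, and the percentages are truncated to integers.

```shell
#!/bin/sh
# mmss_to_s M:SS
# Convert a minutes:seconds duration to seconds.
# mmss_to_s is a hypothetical helper name.
mmss_to_s() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

base=$(mmss_to_s 8:44)    # unmodified boot time reported above

# 7:00 = rc reverted to 1.167; 3:44 = rc.subr additionally reverted to 1.93
for t in 7:00 3:44; do
    s=$(mmss_to_s "$t")
    # Integer percentage of boot time saved relative to the baseline.
    echo "$t: ${s}s, saves $(( (base - s) * 100 / base ))%"
done
```

The second line of output shows that most of the saving comes from reverting rc.subr rather than from removing the pinger loop alone.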

--
Frank Wille

Frank Wille

Jul 21, 2015, 11:05:12 AM
The following reply was made to PR misc/50046; it has been noted by GNATS.

From: Frank Wille <fr...@phoenix.owl.de>
To: Christos Zoulas <chri...@zoulas.com>,
gnats...@NetBSD.org,
misc-bu...@netbsd.org,
gnats...@netbsd.org,
netbs...@netbsd.org
Cc:
Subject: Re: misc/50046

Christos Zoulas

Jul 25, 2015, 10:30:08 AM
On Jul 25, 2:25pm, mar...@duskware.de (Martin Husemann) wrote:
-- Subject: Re: port-arm/50087: all threaded programs crash on arm

Something is probably calling malloc() before pthread is initialized.
Have you installed the debug sets? Why don't we see line numbers in libc?

christos

Christos Zoulas

Jul 27, 2015, 4:21:46 AM
On Jul 27, 8:10am, chri...@zoulas.com (Christos Zoulas) wrote:
-- Subject: Re: port-arm/50087: all threaded programs crash on arm

I can't reproduce the failure on amd64. Could this be an evbarm specific
issue?

Nick Hudson

Jul 27, 2015, 6:04:53 AM
On 07/27/15 09:21, Christos Zoulas wrote:
> On Jul 27, 8:10am, chri...@zoulas.com (Christos Zoulas) wrote:
> -- Subject: Re: port-arm/50087: all threaded programs crash on arm
>
> I can't reproduce the failure on amd64. Could this be an evbarm specific
> issue?
>
> christos
>
>
http://nxr.netbsd.org/xref/src/lib/libc/arch/arm/misc/arm_initfini.c#58

is this relevant?

Nick

Christos Zoulas

Jul 27, 2015, 6:31:11 AM
On Jul 27, 9:31am, sk...@netbsd.org (Nick Hudson) wrote:
-- Subject: Re: port-arm/50087: all threaded programs crash on arm

Could be, but I added the code to amd64 and I could not reproduce the
failure.

christos