tty problems in recent head?

Giorgos Keramidas

Nov 28, 2008, 1:55:15 AM

I just restored my laptop after a bit of 'fun' with a broken disk, and
rebuilt all my ports. Something in head/ @ svn rev 185370 seems to
cause problems for screen & xterm.

Exiting an xterm window causes xterm processes to be stuck in 'RUN' and
consume a lot of CPU:

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    CPU COMMAND
11211 keramida    1 106    0  7624K  4360K CPU0   0   0:46 51.86% xterm
11169 keramida    1 106    0  7624K  4504K RUN    1   1:12 49.66% xterm
11201 keramida    1 106    0  7624K  4360K RUN    1   0:47 49.07% xterm
11180 keramida    1 106    0  7624K  4360K RUN    1   1:07 48.88% xterm
...

_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

Giorgos Keramidas

Nov 28, 2008, 2:06:50 AM

On Fri, 28 Nov 2008 08:55:15 +0200, Giorgos Keramidas <kera...@ceid.upatras.gr> wrote:
> I just restored my laptop after a bit of 'fun' with a broken disk, and
> rebuilt all my ports. Something in head/ @ svn rev 185370 seems to
> cause problems for screen & xterm.
>
> Exiting an xterm window causes xterm processes to be stuck in 'RUN' and
> consume a lot of CPU:
>
>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    CPU COMMAND
> 11211 keramida    1 106    0  7624K  4360K CPU0   0   0:46 51.86% xterm
> 11169 keramida    1 106    0  7624K  4504K RUN    1   1:12 49.66% xterm
> 11201 keramida    1 106    0  7624K  4360K RUN    1   0:47 49.07% xterm
> 11180 keramida    1 106    0  7624K  4360K RUN    1   1:07 48.88% xterm
> ...

Nevermind. This seems to be a problem only with xterm processes started
under XFCE4. Running under startx and plain 'twm' doesn't have the same
problem, so I'll have to look a bit more into this...

Giorgos Keramidas

Nov 28, 2008, 11:27:55 PM

On Fri, 28 Nov 2008 09:06:50 +0200, Giorgos Keramidas <kera...@ceid.upatras.gr> wrote:
> On Fri, 28 Nov 2008 08:55:15 +0200, Giorgos Keramidas <kera...@ceid.upatras.gr> wrote:
>> I just restored my laptop after a bit of 'fun' with a broken disk, and
>> rebuilt all my ports. Something in head/ @ svn rev 185370 seems to
>> cause problems for screen & xterm.
>>
>> Exiting an xterm window causes xterm processes to be stuck in 'RUN' and
>> consume a lot of CPU:
>>
>>   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME    CPU COMMAND
>> 11211 keramida    1 106    0  7624K  4360K CPU0   0   0:46 51.86% xterm
>> 11169 keramida    1 106    0  7624K  4504K RUN    1   1:12 49.66% xterm
>> 11201 keramida    1 106    0  7624K  4360K RUN    1   0:47 49.07% xterm
>> 11180 keramida    1 106    0  7624K  4360K RUN    1   1:07 48.88% xterm
>> ...
>
> Nevermind. This seems to be a problem only with xterm processes started
> under XFCE4. Running under startx and plain 'twm' doesn't have the same
> problem, so I'll have to look a bit more into this...

The xterm processes that get stuck seem to be spinning near line 1854 of
sched_ule.c. Running `info threads' on a live kernel after xterm starts
spinning on a CPU shows:

129 Thread 100174 (PID=97493: xterm) sched_switch (td=0xc72fad80,
newtd=0xc7245000, flags=519) at /usr/src/sys/kern/sched_ule.c:1854

Since this part of sched_ule.c hasn't changed in a while

REV   CHANGE  AUTHOR
---------------------------------------------------------------------------------------------------
1848  171482  jeff        cpu_switch(td, newtd, mtx);
1849  171482  jeff        /*
1850  171482  jeff         * We may return from cpu_switch on a different cpu. However,
1851  171482  jeff         * we always return with td_lock pointing to the current cpu's
1852  171482  jeff         * run queue lock.
1853  171482  jeff         */
1854  171482  jeff        cpuid = PCPU_GET(cpuid);
1855  171482  jeff        tdq = TDQ_CPU(cpuid);
1856  174629  jeff        lock_profile_obtain_lock_success(
1857  174629  jeff            &TDQ_LOCKPTR(tdq)->lock_object, 0, 0, __FILE__, __LINE__);
1858  145256  jkoshy  #ifdef HWPMC_HOOKS
1859  145256  jkoshy      if (PMC_PROC_IS_USING_PMCS(td->td_proc))
1860  145256  jkoshy          PMC_SWITCH_CONTEXT(td, PMC_FN_CSW_IN);
1861  145256  jkoshy  #endif

any ideas why PCPU_GET() might spin like this?

Peter Wemm

Nov 28, 2008, 11:47:00 PM

On Fri, Nov 28, 2008 at 8:27 PM, Giorgos Keramidas
<kera...@ceid.upatras.gr> wrote:
> any ideas why PCPU_GET() might spin like this?

It isn't. The 'info' command is misleading you. It is merely the
next instruction after returning from cpu_switch(). Something is
effectively in a yield loop.
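
As a rough illustration of what a yield loop looks like from userspace, here is a minimal sketch. It is not the xterm or kernel code discussed in this thread; `yield_until` and its parameters are hypothetical names chosen for the example. A thread stuck in a loop like this keeps giving up the CPU and being rescheduled, which may be why a debugger snapshot tends to show it just after the context switch returns.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/*
 * Hypothetical helper: keep yielding the CPU until *flag becomes
 * non-zero or max_yields attempts have been made.  Returns the
 * number of times sched_yield() was called.
 *
 * Without the max_yields bound, and with nothing ever setting the
 * flag, this would show up as a runnable process that burns CPU
 * while making no progress.
 */
static long
yield_until(volatile int *flag, long max_yields)
{
	long yields = 0;

	assert(flag != NULL);
	while (!*flag && yields < max_yields) {
		sched_yield();	/* give up the CPU, get rescheduled later */
		yields++;
	}
	return (yields);
}
```

The bound exists only so that the sketch terminates; a real runaway loop has no such limit.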

--
Peter Wemm - pe...@wemm.org; pe...@FreeBSD.org; pe...@yahoo-inc.com; KI6FJV
"All of this is for nothing if we don't go to the stars" - JMS/B5
"If Java had true garbage collection, most programs would delete
themselves upon execution." -- Robert Sewell

Giorgos Keramidas

Nov 29, 2008, 10:51:31 AM

On Fri, 28 Nov 2008 20:47:00 -0800, "Peter Wemm" <pe...@wemm.org> wrote:
> On Fri, Nov 28, 2008 at 8:27 PM, Giorgos Keramidas
> <kera...@ceid.upatras.gr> wrote:
>> Since this part of sched_ule.c hasn't changed in a while
>>
>> REV   CHANGE  AUTHOR
>> ---------------------------------------------------------------------------------------------------
>> 1848  171482  jeff        cpu_switch(td, newtd, mtx);
>> 1849  171482  jeff        /*
>> 1850  171482  jeff         * We may return from cpu_switch on a different cpu. However,
>> 1851  171482  jeff         * we always return with td_lock pointing to the current cpu's
>> 1852  171482  jeff         * run queue lock.
>> 1853  171482  jeff         */
>> 1854  171482  jeff        cpuid = PCPU_GET(cpuid);
>> 1855  171482  jeff        tdq = TDQ_CPU(cpuid);
>> 1856  174629  jeff        lock_profile_obtain_lock_success(
>> 1857  174629  jeff            &TDQ_LOCKPTR(tdq)->lock_object, 0, 0, __FILE__, __LINE__);
>> 1858  145256  jkoshy  #ifdef HWPMC_HOOKS
>> 1859  145256  jkoshy      if (PMC_PROC_IS_USING_PMCS(td->td_proc))
>> 1860  145256  jkoshy          PMC_SWITCH_CONTEXT(td, PMC_FN_CSW_IN);
>> 1861  145256  jkoshy  #endif
>>
>> any ideas why PCPU_GET() might spin like this?
>
> It isn't. The 'info' command is misleading you. It is merely the
> next instruction after returning from cpu_switch(). Something is
> effectively in a yield loop.

Thanks. I went back to 2008-11-16 15:45 +0000 and I can still see xterm
processes stuck after a while. Two potentially useful bits are:

1. After the last restore from backups, my `/etc/make.conf' contained:

: # Are these two really safe?
: CFLAGS?= -O2 -fno-strict-aliasing -pipe
: COPTFLAGS?= -O -pipe
:
: #NO_CPU_CFLAGS= # Don't add -march=<cpu> to CFLAGS automatically
: #NO_CPU_COPTFLAGS= # Don't add -march=<cpu> to COPTFLAGS automatically

I commented both out, to see if it changes things. If the same happens
without -O2 optimizations, I'll keep going backwards to see if I can
locate the commit where this started happening.

Giorgos Keramidas

Nov 29, 2008, 11:54:26 AM

Heh, interesting... GENERIC from /head@185376 compiled with a
`make.conf' that disables optimizations works fine so far:

# CFLAGS?= -O2 -fno-strict-aliasing -pipe
# COPTFLAGS?= -O -pipe
NO_CPU_CFLAGS= # Don't add -march=<cpu> to CFLAGS automatically
NO_CPU_COPTFLAGS= # Don't add -march=<cpu> to COPTFLAGS automatically

Garrett Cooper

Nov 29, 2008, 7:02:03 PM

On Sat, Nov 29, 2008 at 8:54 AM, Giorgos Keramidas
<kera...@ceid.upatras.gr> wrote:
> On Sat, 29 Nov 2008 17:51:31 +0200, Giorgos Keramidas <kera...@ceid.upatras.gr> wrote:
>> On Fri, 28 Nov 2008 20:47:00 -0800, "Peter Wemm" <pe...@wemm.org> wrote:
>> without -O2 optimizations, I'll keep going backwards to see if I can
>> locate the commit where this started happening.
>
> Heh, interesting... GENERIC from /head@185376 compiled with a
> `make.conf' that disables optimizations works fine so far:
>
> # CFLAGS?= -O2 -fno-strict-aliasing -pipe
> # COPTFLAGS?= -O -pipe
> NO_CPU_CFLAGS= # Don't add -march=<cpu> to CFLAGS automatically
> NO_CPU_COPTFLAGS= # Don't add -march=<cpu> to COPTFLAGS automatically

Might be another compiler issue. There's an issue with
-fstrict-aliasing (at least) on all versions of g++ up to 4.2.3:
<http://lists.copyleft.no/pipermail/pyrex/2007-November/003071.html>.
Are there some compiler warnings when compiling kernel / screen?
Also, have you tried with just COPTFLAGS disabled?
-Garrett

Garrett Cooper

Nov 29, 2008, 7:03:40 PM

On Sat, Nov 29, 2008 at 4:02 PM, Garrett Cooper <yane...@gmail.com> wrote:
> On Sat, Nov 29, 2008 at 8:54 AM, Giorgos Keramidas
> <kera...@ceid.upatras.gr> wrote:
>> On Sat, 29 Nov 2008 17:51:31 +0200, Giorgos Keramidas <kera...@ceid.upatras.gr> wrote:
>>> On Fri, 28 Nov 2008 20:47:00 -0800, "Peter Wemm" <pe...@wemm.org> wrote:
>>> without -O2 optimizations, I'll keep going backwards to see if I can
>>> locate the commit where this started happening.
>>
>> Heh, interesting... GENERIC from /head@185376 compiled with a
>> `make.conf' that disables optimizations works fine so far:
>>
>> # CFLAGS?= -O2 -fno-strict-aliasing -pipe
>> # COPTFLAGS?= -O -pipe
>> NO_CPU_CFLAGS= # Don't add -march=<cpu> to CFLAGS automatically
>> NO_CPU_COPTFLAGS= # Don't add -march=<cpu> to COPTFLAGS automatically
>
> Might be another compiler issue. There's an issue with
> -fstrict-aliasing (at least) on all versions of g++ up to 4.2.3:
> <http://lists.copyleft.no/pipermail/pyrex/2007-November/003071.html>.
> Are there some compiler warnings when compiling kernel / screen?
> Also, have you tried with just COPTFLAGS disabled?
> -Garrett

Actually, this may be the culprit:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35643

Giorgos Keramidas

Nov 30, 2008, 9:08:49 AM

On Fri, 28 Nov 2008 20:47:00 -0800, "Peter Wemm" <pe...@wemm.org> wrote:

>> The xterm processes that get stuck seem to be spinning near line 1854 of
>> sched_ule.c. Running `info threads' on a live kernel after xterm starts
>> spinning on a CPU shows:

After a bit of help from kib@ and ed@ this seems to be wrong. The xterm
processes were spinning in userspace, calling ioctl(fd, FIONREAD, ...).
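
For readers unfamiliar with FIONREAD, here is a minimal sketch of how a userspace loop built around it can spin. Only the `ioctl(fd, FIONREAD, ...)` call is taken from the report above; the loop structure and the names `bytes_pending` and `poll_until_data` are assumptions made for illustration, not xterm's actual code.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Return the number of bytes waiting to be read on fd, or -1 on error. */
static int
bytes_pending(int fd)
{
	int n = 0;

	assert(fd >= 0);
	if (ioctl(fd, FIONREAD, &n) == -1)
		return (-1);
	return (n);
}

/*
 * Hypothetical polling loop: ask FIONREAD over and over until data
 * (or an error) shows up, without ever sleeping.  If the descriptor
 * keeps reporting zero bytes, the loop never blocks, so the process
 * stays runnable and uses as much CPU as the scheduler gives it.
 * max_iterations only exists so that this sketch terminates.
 * Returns the number of times the loop found nothing to read.
 */
static long
poll_until_data(int fd, long max_iterations)
{
	long spins = 0;

	while (spins < max_iterations && bytes_pending(fd) == 0)
		spins++;
	return (spins);
}
```

On an empty pipe `poll_until_data` runs all the way to `max_iterations`; once something has been written to the other end it returns immediately. A loop that blocks in select(2) or poll(2) instead would sleep until the descriptor becomes readable.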

On Sat, 29 Nov 2008 16:02:03 -0800, "Garrett Cooper" <yane...@gmail.com> wrote:
> Might be another compiler issue. There's an issue with
> -fstrict-aliasing (at least) on all versions of g++ up to 4.2.3:
> <http://lists.copyleft.no/pipermail/pyrex/2007-November/003071.html>.
> Are there some compiler warnings when compiling kernel / screen?
> Also, have you tried with just COPTFLAGS disabled?

Today I managed to reproduce this with a kernel that doesn't use
optimizations at all, so fortunately this is not a compiler bug :)

I'm testing a patch by Ed Schouten to see if it fixes this, and I will
post more details after a day or so of running with the patch.