Re: powerd and nvidia drivers not playing nicely together (Was: Re: Systems running hot?)

Bernd Walter

Dec 24, 2009, 5:46:26 AM
On Wed, Dec 23, 2009 at 04:44:35PM +0200, Gleb Kurtsou wrote:
> On (21/12/2009 19:18), Doug Barton wrote:
> > b. f. wrote:
> > > On 12/21/09, Doug Barton <do...@freebsd.org> wrote:
> > >> b. f. wrote:
> > >>>> no X! So I think to myself, what else did I change last night.... oh
> > >
> > >>> acpi_perf? acpi_throttle? acpi_thermal? acpi_video?
> > >> I haven't done anything special with the acpi stuff. The only thing
> > >> that looks relevant from dmesg is: acpi_tz0: <Thermal Zone> on acpi0
> > >>
> > >
> > > Yes, but which components show up in 'sysctl -a | grep -ie acpi' ?
> >
> > It's a long list, but here you go:
> > http://people.freebsd.org/~dougb/acpi-grep.txt
> >
> > >>> Which nvidia driver?
> > >> The latest.
> > >
> > > Which video card?
> >
> > nvidia0: <GeForce Go 7300>
> I had similar problems with GeForce 8400M. GPU temperature could get up
> to 100C in X, which increased CPU temperature in its turn. I use
> powerd, and had lockups with *_cx_lowest settings. I run amd64, i386 was
> just fine on the same notebook.

It is not just nvidia.
I'm using two plain old PCI Matrox G400 cards, and whenever I start X with
powerd enabled I get a full freeze within 24 hours.
Starting powerd once X is already running doesn't seem to be a problem.
Maybe something like a delay loop gets tuned at a reduced clock rate
and then isn't long enough once the speed increases.
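
To make that guess concrete, here is a minimal sketch of the arithmetic.
It is not taken from any driver: the function names and the 300 MHz and
2400 MHz clocks are made-up round numbers for illustration, and it assumes
a naive busy loop that costs a fixed number of cycles per iteration.

```python
def calibrate_loop(delay_us, clock_mhz, cycles_per_iteration=1):
    """Iterations needed to burn delay_us microseconds at clock_mhz.

    Hypothetical model: one MHz is one cycle per microsecond.
    """
    return int(delay_us * clock_mhz / cycles_per_iteration)


def actual_delay_us(iterations, clock_mhz, cycles_per_iteration=1):
    """How long the same iteration count really takes at another clock."""
    return iterations * cycles_per_iteration / clock_mhz


# Loop tuned for a 10 us delay while the CPU is clocked down (assumed 300 MHz).
iterations = calibrate_loop(10, 300)

# The same loop after the clock goes back up (assumed 2400 MHz).
print(iterations, actual_delay_us(iterations, 2400))
```

If something calibrated a loop like this while the clock was low, every
later delay at full speed would come out 8 times too short. That is only
a hypothesis, not something I have traced in the code.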

--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
_______________________________________________
freebsd...@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-curre...@freebsd.org"

Robert Noland

Dec 24, 2009, 9:25:58 AM

FWIW, I run powerd on any machine that is capable. I don't think that
it works on my box that has an older Opteron and the AGP mga in it, or at
least I haven't figured out the right settings for that one, i.e.
cpufreq doesn't attach...

I do run it on my core2duo's that I use for radeon, intel and nouveau
work and I've not seen any issues. One of those boxes just works, the
other doesn't use est since the BIOS doesn't provide P-State
information. powerd still runs reliably on both.

Also an atom 330 w/ intel gfx, which shows up as dual hyper-threaded
cores.

I have seen powerd hang a box, but it has been a while. IIRC, the box
that I had the issue on was a dual Xeon, and when the frequencies went
too low or perhaps got out of sync, it would lock up. I never
attributed the issue to X or drm. It has been well over a year since I
had access to that box, so I might be mis-remembering some details.

robert.

--
Robert Noland <rno...@FreeBSD.org>
FreeBSD

Kevin Oberman

Dec 24, 2009, 11:24:12 AM
> Date: Thu, 24 Dec 2009 11:46:26 +0100
> From: Bernd Walter <ti...@cicely7.cicely.de>
> Sender: owner-free...@freebsd.org

Quick question...are you using throttling/TCC? If so, either turn it off
or limit how low it can run the CPU. When I was running throttling on
systems with old Matrox and Radeon cards, they would freeze if the
throttling went too low.

As mav pointed out at http://wiki.freebsd.org/TuningPowerConsumption,
TCC does little to conserve power and was not designed for that. TCC is
the Thermal Control Circuit and is designed to keep the CPU from
over-temping. It works for that, but not for power management. I'd love to
see it off (for power management) by default.
hint.p4tcc.0.disabled=1
hint.acpi_throttle.0.disabled=1
--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: obe...@es.net Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751

Bernd Walter

Dec 24, 2009, 12:48:10 PM

I assume so - I'm not sure about all those modern fancy names.
In other words, dev.cpu.?.freq changes.

> As mav pointed out at http://wiki.freebsd.org/TuningPowerConsumption,
> TCC does little to conserve power and was not designed for that. TCC is
> the Thermal Control Circuit and is designed to keep the CPU from
> over-temping. It works for that, but not for power management. I'd love to
> see it off (for power management) by default.
> hint.p4tcc.0.disabled=1
> hint.acpi_throttle.0.disabled=1

What is the difference between the hints and disabling powerd?

My system is a C2 quad on an Intel board running i386/PAE.
Only C1 is supported, which - to my knowledge - doesn't require powerd
and should be active by default.

[20]cicely7# sysctl dev.cpu
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.CPU0
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.temperature: 60
dev.cpu.0.freq: 2394
dev.cpu.0.freq_levels: 2394/89000 2094/77875 1795/66750 1496/55625 1197/44500 897/33375 598/22250 299/11125
dev.cpu.0.cx_supported: C1/1
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00%
dev.cpu.1.%desc: ACPI CPU
dev.cpu.1.%driver: cpu
dev.cpu.1.%location: handle=\_PR_.CPU1
dev.cpu.1.%pnpinfo: _HID=none _UID=0
dev.cpu.1.%parent: acpi0
dev.cpu.1.temperature: 59
dev.cpu.1.cx_supported: C1/1
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_usage: 100.00%
dev.cpu.2.%desc: ACPI CPU
dev.cpu.2.%driver: cpu
dev.cpu.2.%location: handle=\_PR_.CPU2
dev.cpu.2.%pnpinfo: _HID=none _UID=0
dev.cpu.2.%parent: acpi0
dev.cpu.2.temperature: 54
dev.cpu.2.cx_supported: C1/1
dev.cpu.2.cx_lowest: C1
dev.cpu.2.cx_usage: 100.00%
dev.cpu.3.%desc: ACPI CPU
dev.cpu.3.%driver: cpu
dev.cpu.3.%location: handle=\_PR_.CPU3
dev.cpu.3.%pnpinfo: _HID=none _UID=0
dev.cpu.3.%parent: acpi0
dev.cpu.3.temperature: 52
dev.cpu.3.cx_supported: C1/1
dev.cpu.3.cx_lowest: C1
dev.cpu.3.cx_usage: 100.00%

--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.

Kevin Oberman

Dec 24, 2009, 3:22:23 PM
> Date: Thu, 24 Dec 2009 18:48:10 +0100

The hints simply disable throttling and TCC for power management.

These are ALMOST identical techniques for controlling high CPU
temperature. They were never intended to be used for power
management. Both work by skipping N of 8 CPU cycles. When a system using
ACPI exceeds the value of hw.acpi.thermal.tz0._PSV, it will engage
TCC. Older systems used throttling under software control for the same
purpose, but FreeBSD did not implement it, as far as I know.

SpeedStep and its relatives on both Intel and AMD chips are designed for
power management, and those are all I use on my systems. These are the
relevant sysctls:
dev.cpu.0.freq_levels: 2000/27000 1600/22600 1333/19666 1066/16733 800/13800
dev.cpu.0.cx_supported: C1/1 C2/1 C3/85 C4/185

I only have 5 "frequency" settings, but all work by actually slowing the
clock and reducing voltage, so they really save power. I also have 4 'C'
states which also can be a huge win as they allow the system to use far
less power when idle. Different systems have more or fewer available
states. C2 saves fairly little power. C3 (if available) is a big winner
and C4 and above are even better, but read mav's article for a better
description.

Now the bad news. As you note, you have only C1. At this time the
available frequencies are all from TCC, not SpeedStep. I thought all C2
chips supported EST. It should be listed in the CPU features2 at the
start of /var/run/dmesg.boot.
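
A rough way to see this from the numbers, offered as a sketch rather than
a definitive test: levels produced by skipping N of 8 cycles should sit at
k/8 of the top frequency, while SpeedStep levels generally will not. The
helper names and the 1 MHz tolerance are assumptions of mine; the two
sample strings are the freq_levels values quoted in this thread.

```python
def parse_freq_levels(value):
    """Split a dev.cpu.N.freq_levels string into (MHz, mW) integer pairs."""
    pairs = []
    for item in value.split():
        mhz, mw = item.split("/")
        pairs.append((int(mhz), int(mw)))
    return pairs


def looks_like_eighth_steps(value, tolerance_mhz=1):
    """Heuristic: True if every level is within tolerance of k/8 of the top."""
    freqs = [mhz for mhz, _ in parse_freq_levels(value)]
    top = max(freqs)
    for f in freqs:
        k = round(f * 8 / top)  # nearest eighth of the top frequency
        if k < 1 or abs(f - top * k / 8) > tolerance_mhz:
            return False
    return True


# freq_levels from the C2 quad that only reports C1
throttled = ("2394/89000 2094/77875 1795/66750 1496/55625 "
             "1197/44500 897/33375 598/22250 299/11125")
# freq_levels from a system where the levels come from SpeedStep
speedstep = "2000/27000 1600/22600 1333/19666 1066/16733 800/13800"

print(looks_like_eighth_steps(throttled))
print(looks_like_eighth_steps(speedstep))
```

This is only a heuristic: a real EST table could land on eighths by
coincidence, so treat a match as a hint, not proof.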

You should also have:
est0: <Enhanced SpeedStep Frequency Control> on cpu0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
est2: <Enhanced SpeedStep Frequency Control> on cpu2
est3: <Enhanced SpeedStep Frequency Control> on cpu3
in the dmesg, but I suspect that, for some reason, you don't, and I
don't know why.

Unfortunately, most servers and desktops are pretty poor at power
management compared to laptops, though they are getting better. My C2
Quad system does have C2, though no C3, but EST does work there.


--
R. Kevin Oberman, Network Engineer
Energy Sciences Network (ESnet)
Ernest O. Lawrence Berkeley National Laboratory (Berkeley Lab)
E-mail: obe...@es.net Phone: +1 510 486-8634
Key fingerprint:059B 2DDF 031C 9BA3 14A4 EADA 927D EBB3 987B 3751

Bernd Walter

Dec 24, 2009, 4:44:57 PM

Well I do have them:
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz (2419.30-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x6fb Stepping = 11
Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
AMD Features=0x20100000<NX,LM>
AMD Features2=0x1<LAHF>
Cores per package: 4
real memory = 9126805504 (8704 MB)
avail memory = 8125517824 (7749 MB)
ACPI APIC Table: <INTEL DG33FB >
FreeBSD/SMP: Multiprocessor System Detected: 4 CPUs
cpu0 (BSP): APIC ID: 0
cpu1 (AP): APIC ID: 1
cpu2 (AP): APIC ID: 2
cpu3 (AP): APIC ID: 3
ioapic0: Changing APIC ID to 2
ioapic0 <Version 2.0> irqs 0-23 on motherboard
acpi0: <INTEL DG33FB> on motherboard
acpi0: [ITHREAD]
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU> on acpi0
coretemp0: <CPU On-Die Thermal Sensors> on cpu0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
coretemp1: <CPU On-Die Thermal Sensors> on cpu1
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
cpu2: <ACPI CPU> on acpi0
coretemp2: <CPU On-Die Thermal Sensors> on cpu2
est2: <Enhanced SpeedStep Frequency Control> on cpu2
p4tcc2: <CPU Frequency Thermal Control> on cpu2
cpu3: <ACPI CPU> on acpi0
coretemp3: <CPU On-Die Thermal Sensors> on cpu3
est3: <Enhanced SpeedStep Frequency Control> on cpu3
p4tcc3: <CPU Frequency Thermal Control> on cpu3
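
For anyone who wants to repeat this check, here is a small sketch that
looks for EST in the Features2 line and lists which est and p4tcc units
attached. The function names are hypothetical, the sample text is trimmed
from the output above, and in real use the text would be read from
/var/run/dmesg.boot instead.

```python
import re


def cpu_features2(dmesg_text):
    """Return the flag names from the 'Features2=0x...<...>' line.

    The pattern is anchored at the start of a line, so the separate
    'AMD Features2=' line is not picked up.
    """
    match = re.search(r"^\s*Features2=0x[0-9a-fA-F]+<([^>]*)>",
                      dmesg_text, re.MULTILINE)
    if match is None:
        return set()
    return set(match.group(1).split(","))


def attached_units(dmesg_text, driver):
    """Return sorted unit numbers for attach lines like 'est0: <...> on cpu0'."""
    pattern = re.compile(r"^\s*" + re.escape(driver) + r"(\d+): <",
                         re.MULTILINE)
    return sorted(int(n) for n in pattern.findall(dmesg_text))


# Trimmed sample; normally: open("/var/run/dmesg.boot").read()
SAMPLE = """\
  Features2=0xe3bd<SSE3,RSVD2,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features2=0x1<LAHF>
cpu0: <ACPI CPU> on acpi0
est0: <Enhanced SpeedStep Frequency Control> on cpu0
p4tcc0: <CPU Frequency Thermal Control> on cpu0
cpu1: <ACPI CPU> on acpi0
est1: <Enhanced SpeedStep Frequency Control> on cpu1
p4tcc1: <CPU Frequency Thermal Control> on cpu1
"""

print("EST flag present:", "EST" in cpu_features2(SAMPLE))
print("est units:", attached_units(SAMPLE, "est"))
print("p4tcc units:", attached_units(SAMPLE, "p4tcc"))
```

Note that this only shows that est attached; it does not show whether
the frequency levels actually come from it.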

How would you know that the frequencies are from TCC and not SpeedStep?

Maybe I should mention that the system is running 7.0-stable, so it
is not running recent code.
But my server is running an almost identical board with 8.0-RC1 amd64
and has similar sysctl output:
[139]cicely14# sysctl dev.cpu
dev.cpu.0.%desc: ACPI CPU
dev.cpu.0.%driver: cpu
dev.cpu.0.%location: handle=\_PR_.CPU0
dev.cpu.0.%pnpinfo: _HID=none _UID=0
dev.cpu.0.%parent: acpi0
dev.cpu.0.temperature: 34.0C
dev.cpu.0.freq: 2394
dev.cpu.0.freq_levels: 2394/89000 2094/77875 1795/66750 1496/55625 1197/44500 897/33375 598/22250 299/11125
dev.cpu.0.cx_supported: C1/1
dev.cpu.0.cx_lowest: C1
dev.cpu.0.cx_usage: 100.00% last 500us
dev.cpu.1.%desc: ACPI CPU
dev.cpu.1.%driver: cpu
dev.cpu.1.%location: handle=\_PR_.CPU1
dev.cpu.1.%pnpinfo: _HID=none _UID=0
dev.cpu.1.%parent: acpi0
dev.cpu.1.temperature: 32.0C
dev.cpu.1.cx_supported: C1/1
dev.cpu.1.cx_lowest: C1
dev.cpu.1.cx_usage: 100.00% last 500us
dev.cpu.2.%desc: ACPI CPU
dev.cpu.2.%driver: cpu
dev.cpu.2.%location: handle=\_PR_.CPU2
dev.cpu.2.%pnpinfo: _HID=none _UID=0
dev.cpu.2.%parent: acpi0
dev.cpu.2.temperature: 30.0C
dev.cpu.2.cx_supported: C1/1
dev.cpu.2.cx_lowest: C1
dev.cpu.2.cx_usage: 100.00% last 500us
dev.cpu.3.%desc: ACPI CPU
dev.cpu.3.%driver: cpu
dev.cpu.3.%location: handle=\_PR_.CPU3
dev.cpu.3.%pnpinfo: _HID=none _UID=0
dev.cpu.3.%parent: acpi0
dev.cpu.3.temperature: 30.0C
dev.cpu.3.cx_supported: C1/1
dev.cpu.3.cx_lowest: C1
dev.cpu.3.cx_usage: 100.00% last 500us

> Unfortunately, most servers and desktops are pretty poor at power
> management compared to laptops, though they are getting better. My C2
> Quad system does have C2, though no C3, but EST does work there.

Yes - it is a desktop board and not the most modern - and not the
very best BIOS in my experience.

--
B.Walter <be...@bwct.de> http://www.bwct.de
Modbus/TCP Ethernet I/O Baugruppen, ARM basierte FreeBSD Rechner uvm.
