Linux 2.6.29-rc6

Linus Torvalds

Feb 22, 2009, 11:31:31 PM
to Linux Kernel Mailing List

This is mostly lots of small fixes, with the stats being dominated by some
DocBook movement and an ia64 defconfig addition:

20.4% Documentation/DocBook/
3.9% Documentation/
2.0% arch/arm/
30.2% arch/ia64/configs/
5.5% arch/x86/
2.4% arch/
3.8% drivers/gpu/drm/i915/
2.3% drivers/scsi/
12.6% drivers/
2.2% fs/btrfs/
5.5% fs/cifs/
2.3% fs/

(the above is the "non-cumulative" dirstat, which doesn't add up
subdirectories cumulatively, and thus highlights individual directories
that contain changes, rather than the top-level directories).

But most of the changes are really pretty small, and the shortlog gives a
feel for it. About 350 files changed, averaging roughly 20 lines of
changes per file - but the average is somewhat misleading, because most
changes are just a couple of lines, and then the "big" changes are about
moving a few hundred lines of documentation or the 1601 lines of
defconfig.

Regressions fixed, small cleanups, and some changes to help future
merging.

Linus

---
Adam Baker (1):
V4L/DVB (10619): gspca - main: Destroy the URBs at disconnection time.

Adam Lackorzynski (1):
jsm: additional device support

Al Viro (1):
Fix incomplete __mntput locking

Alan Jenkins (1):
PM/hibernate: fix "swap breaks after hibernation failures"

Alex Chiang (3):
PCI: Documentation: fix minor PCIe HOWTO thinko
[IA64] Revert "prevent ia64 from invoking irq handlers on offline CPUs"
[IA64] Remove redundant cpu_clear() in __cpu_disable path

Alexey Dobriyan (3):
kbuild: fix tags generation of config symbols
mfd: fix sm501 section mismatches
eeepc: should depend on INPUT

Alexey Starikovskiy (1):
ACPI: EC: Add delay for slow MSI controller

Alok N Kataria (1):
x86, vmi: TSC going backwards check in vmi clocksource

Andi Kleen (4):
kbuild: create the source symlink earlier in the objdir
x86, mce: reinitialize per cpu features on resume
x86, mce: use force_sig_info to kill process in machine check
x86, mce: fix ifdef for 64bit thermal apic vector clear on shutdown

Andrew Vasquez (3):
[SCSI] qla2xxx: Properly acknowledge IDC notification messages.
[SCSI] qla2xxx: Mask out 'reserved' bits while processing FLT regions.
[SCSI] qla2xxx: Update version number to 8.03.00-k3.

Andrew Victor (2):
[ARM] 5390/1: AT91: Watchdog fixes
[ARM] 5391/1: AT91: Enable GPIO clocks earlier

Andrey Borzenkov (1):
PM: Fix pm_notifiers during user mode hibernation

Aneesh Kumar K.V (3):
ext4: Fix lockdep warning
ext4: Initialize preallocation list_head's properly
ext4: Implement range_cyclic in ext4_da_writepages instead of write_cache_pages

Anirban Chakraborty (2):
[SCSI] qla2xxx: Remove interrupt request bit check in the response processing path in multiq mode.
[SCSI] qla2xxx: Correct slab-error overwrite during vport creation and deletion.

Anssi Hannula (1):
HID: move tmff and zpff devices from ignore_list to blacklist

Arjan van de Ven (4):
scripts: add x86 register parser to markup_oops.pl
scripts: add x86 64 bit support to the markup_oops.pl script
Consolidate driver_probe_done() loops into one place
PM/resume: wait for device probing to finish

Arve Hjønnevåg (2):
PM: Wait for console in resume
PM: Fix suspend_console and resume_console to use only one semaphore

Atsushi Nemoto (1):
atmel_serial might lose modem status change

Avi Kivity (2):
KVM: Avoid using CONFIG_ in userspace visible headers
KVM: VMX: Flush volatile msrs before emulating rdmsr

Benjamin Herrenschmidt (1):
vmalloc: add __get_vm_area_caller()

Bernhard Walle (1):
Bernhard has moved

Bill Nottingham (1):
vt: Declare PIO_CMAP/GIO_CMAP as compatbile ioctls.

Bjorn Helgaas (1):
ACPI: remove CONFIG_ACPI_SYSTEM

Boaz Harrosh (1):
bsg: Fix sense buffer bug in SG_IO

Brian King (3):
[SCSI] ibmvfc: Fix command timeout errors
[SCSI] ibmvfc: Fix rport relogin
[SCSI] ibmvfc: Increase cancel timeout

Chip Coldwell (1):
cciss: PCI power management reset for kexec

Chris Ball (1):
x86, olpc: fix model detection without OFW

Chris Mason (5):
Btrfs: process mount options on mount -o remount,
Btrfs: use larger metadata clusters in ssd mode
Btrfs: don't clean old snapshots on sync(1)
Btrfs: make a lockdep class for the extent buffer locks
Btrfs: check file pointer in btrfs_sync_file

Chris Wilson (16):
drm: Potential use-after-free on error path.
drm: Free the object ref on error.
drm/i915: Cleanup trivial leak on execbuffer error path.
drm/i915: hold mutex for unreference() in i915_gem_tiling.c
drm/i915: refleak along pin() error path.
drm: Do not leak a new reference for flink() on an existing name
drm/i915: Set framebuffer alignment based upon the fence constraints.
drm/i915: Release and unlock on mmap_gtt error path.
drm/i915: unpin for an invalid memory domain.
drm/i915: Unpin the ringbuffer if we fail to ioremap it.
drm/i915: Unpin the hws if we fail to kmap.
drm/i915: Unpin the fb on error during construction.
drm/i915: Cleanup the hws on ringbuffer constrution failure.
drm: Check for a NULL encoder when reverting on error path
drm: Propagate failure from setting crtc base.
drm/i915: Fix regression in 95ca9d

Christian Borntraeger (1):
[S390] Fix timeval regression on s390

Clemens Ladisch (2):
sound: usb-audio: fix uninitialized variable with M-Audio MIDI interfaces
sound: virtuoso: revert "do not overwrite EEPROM on Xonar D2/D2X"

Dan Carpenter (3):
ext4: Fix NULL dereference in ext4_ext_migrate()'s error handling
HID: unlock properly on error paths in hidraw_ioctl()
sx.c: avoid referencing freed memory if copy_from_user() fails

Dan Williams (1):
atmel-mci: fix initialization of dma slave data

Dave Hansen (1):
powerpc/mm: Fix numa reserve bootmem page selection

David Brownell (2):
omap_hsmmc: card detect irq bugfix
omap_hsmmc: only MMC1 allows HCTL.SDVS != 1.8V

David Howells (1):
mn10300: fix oprofile

David Vrabel (1):
wusb: whci-hcd: always lock whc->lock with interrupts disabled

David Woodhouse (2):
iommu: fix Intel IOMMU write-buffer flushing
Fix Intel IOMMU write-buffer flushing

Davide Libenzi (1):
timerfd: add flags check

Ed L. Cashin (1):
aoe: ignore vendor extension AoE responses

Eric Anholt (3):
drm/i915: Cut two args to set_to_gpu_domain that confused this tricky path.
drm/i915: Don't let a device flush to prepare buffers clear new write_domains.
drm/i915: Retire requests from i915_gem_busy_ioctl.

Eric Biederman (1):
seq_file: properly cope with pread

Felix Blyakher (2):
Revert "[XFS] use scalable vmap API"
Revert "[XFS] remove old vmap cache"

Frank Seidel (1):
MAINTAINERS: Switch hdaps to Frank Seidel

Frederic Weisbecker (1):
tracing/function-graph-tracer: trace the idle tasks

Geert Uytterhoeven (1):
m68k: atari - Rename "mfp" to "st_mfp"

Geoff Levand (1):
powerpc/ps3: Move ps3_mm_add_memory to device_initcall

Giuseppe Bilotta (2):
lis3lv02d: support both one- and two-byte sensors
lis3lv02d: add axes knowledge of HP Pavilion dv5 models

Gregory CLEMENT (1):
[ARM] 5400/1: Add support for inverted rdy_busy pin for Atmel nand device controller

H. Peter Anvin (1):
x86, mce: remove incorrect __cpuinit for mce_cpu_features()

Hannes Reinecke (1):
block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list

Hans Verkuil (2):
V4L/DVB (10625): ivtv: fix decoder crash regression
V4L/DVB (10626): ivtv: fix regression in get sliced vbi format

Hans de Goede (1):
hwmon: Fix ACPI resource check error handling

Hartley Sweeten (1):
[ARM] 5405/1: ep93xx: remove unused gesbc9312.h header

Heiko Carstens (1):
[S390] fix "mem=" handling in case of standby memory

Helmut Schaa (1):
sdhci: fix led naming

Herbert Xu (1):
crypto: lrw - Fix big endian support

Igor Mammedov (1):
[CIFS] Prevent OOPs when mounting with remote prefixpath.

Ilpo Järvinen (1):
sx.c: fix dbl statement if - add missing braces

Ingo Molnar (4):
sched: cpu hotplug fix
inotify: fix GFP_KERNEL related deadlock
x86: use the right protections for split-up pagetables
PM: Split up sysdev_[suspend|resume] from device_power_[down|up], fix

Isaku Yamahata (1):
[IA64] fixes configs and add default config for ia64 xen domU

James Smart (1):
[SCSI] scsi_scan: add missing interim SDEV_DEL state if slave_alloc fails

Jan Kara (3):
jbd2: Fix return value of jbd2_journal_start_commit()
Revert "ext4: wait on all pending commits in ext4_sync_fs()"
jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate()

Jean Delvare (2):
mfd: terminate pcf50633 i2c_device_id list
hwmon: (f71882fg) Hide misleading error message

Jean Pihet (2):
omap_hsmmc: recover from transfer failures
omap_hsmmc: Change while(); loops with finite version

Jeff Layton (3):
cifs: refactor new_inode() calls and inode initialization
cifs: properly handle case where CIFSGetSrvInodeNumber fails
cifs: posix fill in inode needed by posix open

Jeff Mahoney (2):
Btrfs: balance_level checks !child after access
Btrfs: remove btrfs_init_path

Jens Axboe (2):
block: fix bad definition of BIO_RW_SYNC
block: revert part of 18ce3751ccd488c78d3827e9f6bf54e6322676fb

Jeremy Fitzhardinge (2):
x86/cpa: make sure cpa is safe to call in lazy mmu mode
x86/paravirt: make arch_flush_lazy_mmu/cpu disable preemption

Jesse Barnes (4):
drm/i915: take struct mutex around fb unref
drm/i915: Keep refs on the object over the lifetime of vmas for GTT mmap.
drm/i915: suspend/resume GEM when KMS is active
drm/i915: fix WC mapping in non-GEM i915 code.

Jiri Slaby (3):
HID: fix bus endianity in file2alias
x86_64: acpi/wakeup_64 cleanup
x86_64: Fix S3 fail path

Johannes Weiner (3):
slab: introduce kzfree()
swsusp: dont fiddle with swappiness
swsusp: clean up shrink_all_zones()

John Stultz (1):
x86, hpet: fix for LS21 + HPET = boot hang

Joris van Rantwijk (1):
ALSA: usb-audio - Workaround for misdetected sample rate with CM6207

Josef Bacik (1):
Btrfs: make sure all pending extent operations are complete

Josh Hunt (1):
kbuild: add vmlinux to kernel rpm

Julia Lawall (3):
[SCSI] lpfc: introduce missing kfree
Btrfs: fs/btrfs/volumes.c: remove useless kzalloc
mfd: Fix egpio kzalloc return test

KAMEZAWA Hiroyuki (2):
mm: clean up for early_pfn_to_nid()
mm: fix memmap init for handling memory hole

Kristian Høgsberg (5):
drm: Release user fbs in drm_release
drm: Add locking around cursor gem operations.
drm: Bring PLL limits in sync with DDX values.
drm: Collapse identical i8xx_clock() and i9xx_clock().
drm: Use spread spectrum when the bios tells us it's ok.

Krzysztof Helt (1):
fbdev/drm: fix Kconfig submenu mess in "Graphics support"

Li Zefan (4):
cgroups: update documentation about css_set hash table
cgroups: fix possible use after free
README: fix a wrong filename
cpuset: various documentation fixes and updates

Linus Torvalds (2):
x86: Add IRQF_TIMER to legacy x86 timer interrupt descriptors
Linux 2.6.29-rc6

Luca Bigliardi (1):
uml: fix vde network backend in user mode linux

Makito SHIOKAWA (1):
[ARM] 5404/1: Fix condition in arm_elf_read_implies_exec() to set READ_IMPLIES_EXEC

Marcelo Tosatti (4):
KVM: mmu_notifiers release method
KVM: PIT: fix i8254 pending count read
KVM: x86: disable kvmclock on non constant TSC hosts
KVM: x86: fix LAPIC pending count calculation

Mark Brown (5):
mfd: Initialise WM8350 interrupts earlier
mfd: Improve diagnostics for WM8350 ID register probe
mfd: Mark WM835x USB_SLV_500MA bit as accessible
mfd: Fix TWL4030 build on some ARM variants
mfd: Ensure all WM8350 IRQs are masked at startup

Mark McLoughlin (1):
KVM: Fix assigned devices circular locking dependency

Markus Metzger (1):
x86, ptrace, mm: fix double-free on race

Martin Peschke (1):
[SCSI] sg: fix device number in blktrace data

Matthew Wilcox (1):
PCI/MSI: fix msi_mask() shift fix

Mauro Carvalho Chehab (3):
V4L/DVB (10527): tuner: fix TUV1236D analog/digital setup
V4L/DVB (10572): Revert commit dda06a8e4610757def753ee3a541a0b1a1feb36b
8250: fix boot hang with serial console when using with Serial Over Lan port

Michael Buesch (2):
spi-gpio: sanitize MISO bitvalue
spi_bitbang: add more lowlevel function documentation

Michael Neuling (2):
powerpc/vsx: Fix VSX alignment handler for regs 32-63
bootgraph: fix for use with dot symbols

Michael Tokarev (1):
HID: blacklist Powercom USB UPS

Mike Christie (1):
[SCSI] libiscsi: Fix scsi command timeout oops in iscsi_eh_timed_out

Mike Frysinger (1):
kbuild,setlocalversion: shorten the make time when using svn

Mike Murphy (2):
PATCH [1/2] Documentation/driver-model/device.txt: fix struct device_attribute
PATCH [2/2] Documentation/filesystems/sysfs.txt: fix descriptions of device attributes

Neil Brown (1):
block: fix booting from partitioned md array

Nick Piggin (1):
mm: task dirty accounting fix

Nicolas Pitre (2):
[ARM] 5401/1: Orion: fix edge triggered GPIO interrupt support
[ARM] 5402/1: fix a case of wrap-around in sanity_check_meminfo()

Paul E. McKenney (1):
x86, rcu: fix strange load average and ksoftirqd behavior

Paul Moore (2):
cipso: Fix documentation comment
selinux: Fix the NetLabel glue code for setsockopt()

Paul Turner (1):
vfs: separate FMODE_PREAD/FMODE_PWRITE into separate flags

Pavel Machek (2):
Pavel has moved
hp accelerometer: add freefall detection

Pekka Paalanen (3):
mmiotrace: count events lost due to not recording
trace: mmiotrace to the tracer menu in Kconfig
doc: mmiotrace.txt, buffer size control change

Peter Oberparleiter (1):
[S390] sclp: handle empty event buffers

Peter Zijlstra (3):
futex: fix reference leak
timers: more consistently use clock vs timer
fs/super.c: add lockdep annotation to s_umount

Philipp Zabel (1):
mfd: fix htc-egpio iomem resource handling using resource_size

Philippe De Muyter (1):
floppy: request and release only the ports we actually use

Philippe Gerum (1):
powerpc/mm: Fix _PAGE_CHG_MASK to protect _PAGE_SPECIAL

Pierre Ossman (1):
Revert "sdhci: force high speed capability on some controllers"

Pierre Willenbrock (1):
drm/i915: Add missing mutex_lock(&dev->struct_mutex)

Qinghuang Feng (1):
Btrfs: remove unused code in split_state()

Rabin Vincent (2):
kbuild: add sys_* entries for syscalls in tags
mmc_test: fix basic read test

Rafael J. Wysocki (4):
USB/PCI: Fix resume breakage of controllers behind cardbus bridges
pm: fix build for CONFIG_PM unset
PM: fix build for CONFIG_PM unset
PM: Split up sysdev_[suspend|resume] from device_power_[down|up]

Rakib Mullick (1):
mfd: Fix sm501_register_gpio section mismatch

Randy Dunlap (7):
PCI: fix rom.c kernel-doc warning
PCI: fix struct pci_platform_pm_ops kernel-doc
PCI: fix missing kernel-doc and typos
x86: dell-laptop: depends on POWER_SUPPLY
docsrc: use config instead of menuconfig
docbook: split kernel-api for device-drivers
acpi/doc: add missing param value

Richard Hughes (1):
battery: don't assume we are fully charged when not charging or discharging

Robert Jennings (1):
[SCSI] ibmvscsi: Correct DMA mapping leak

Robin Holt (1):
[IA64] bte_copy of BTE_MAX_XFER trips BUG_ON.

Roel Kluin (4):
mfd: wm8350 tries reaches -1
FRV: __pte_to_swp_entry doesn't expand correctly
paride/pg.c: xs(): &&/|| confusion
[ARM] 5403/1: pxa25x_ep_fifo_flush() *ep->reg_udccs always set to 0

Roland Dreier (1):
drm/i915: Fix potential AB-BA deadlock in i915_gem_execbuffer()

Russell King (3):
[ARM] omap: fix omap2_divisor_to_clksel() error return value
[ARM] omap: fix _omap2_clksel_get_src_field()
[ARM] omap: fix clock reparenting in omap2_clk_set_parent()

Rusty Russell (2):
cpumask: fix powernow-k8: partial revert of 2fdf66b491ac706657946442789ec644cc317e1a
cpumask: Use cpu_*_mask accessors code: alpha

Sergei Shtylyov (1):
libata-sff: fix 32-bit PIO ATAPI regression

Sheng Yang (4):
KVM: Add kvm_arch_sync_events to sync with asynchronize events
KVM: Fix racy in kvm_free_assigned_irq
KVM: MMU: Map device MMIO as UC in EPT
KVM: Fix INTx for device assignment

Shyam...@Dell.com (1):
[SCSI] qla2xxx: fix Kernel Panic with Qlogic 2472 Card.

Steve Aarnio (1):
drm/i915: Don't add panel_fixed_mode to the probed modes list at LVDS init.

Steve French (4):
[CIFS] ipv6_addr_equal for address comparison
[CIFS] Fix oops in cifs_strfromUCS_le mounting to servers which do not specify their OS
[CIFS] improve posix semantics of file create
[CIFS] Fix multiuser mounts so server does not invalidate earlier security contexts

Steven Rostedt (3):
tracing: disable tracing while testing ring buffer
tracing: have function trace select kallsyms
tracing: limit the number of loops the ring buffer self test can make

Subhash Peddamallu (1):
fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free

Suresh Siddha (1):
x86, pat: fix warn_on_once() while mapping 0-1MB range with /dev/mem

Takashi Iwai (3):
Revert "Sound: hda - Restore PCI configuration space with interrupts off"
ALSA: usb-audio - Fix non-continuous rate detection
ALSA: jack - Use card->shortname for input name

Tejun Heo (2):
sata_nv: give up hardreset on nf2
vmalloc: call flush_cache_vunmap() from unmap_kernel_range()

Thomas Gleixner (3):
x86: warn if arch_flush_lazy_mmu_cpu is called in preemptible context
x86: CPA avoid repeated lazy mmu flush
x86, vm86: fix preemption bug

Tobias Klauser (1):
drm/i915: Storage class should be before const qualifier

Tobias Lorenz (2):
V4L/DVB (10532): Correction of Stereo detection/setting and signal strength indication
V4L/DVB (10533): fix LED status output

Tony Luck (2):
[IA64] Build fix for __early_pfn_to_nid() undefined link error
[IA64] xen_domu build fix

Tony Vroon (1):
fujitsu-laptop: Use RFKILL support bitmask from firmware

Trent Piepho (1):
V4L/DVB (10516a): zoran: Update MAINTAINERS entry

Wei Yongjun (2):
ext4: Fix to read empty directory blocks correctly in 64k
mn10300: fix typo && -> || in arch/mn10300/unit-asb2305/pci.c

Wim Van Sebroeck (1):
[WATCHDOG] iTCO_wdt: fix SMI_EN regression 2

Yan Zheng (2):
Btrfs: Avoid using __GFP_HIGHMEM with slab allocator
Btrfs: hold trans_mutex when using btrfs_record_root_in_trans

Yang Hongyang (1):
atyfb: remove unused local variable `pwr_command'

Yang Zhang (1):
KVM: ia64: fix fp fault/trap handler

Yauhen Kharuzhy (1):
s3cmci: Fix hangup in do_pio_write()

Yi Li (1):
MMC: fix bug - SDHC card capacity not correct

Zachary Amsden (1):
MAINTAINERS: paravirt-ops maintainers update

Zlatko Calusic (1):
Add support for VT6415 PCIE PATA IDE Host Controller

etienne (1):
drm/radeon: update sarea copies of last_ variables on resume.

wanzongshun (1):
[ARM] 5398/1: Add Wan ZongShun to MAINTAINERS for W90P910
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Karsten Wiese

Feb 23, 2009, 9:08:24 AM
to Linus Torvalds, Eric Anholt, Linux Kernel Mailing List
Fix an oops in i915_gem_retire_requests()

dev_priv->hw_status_page can be NULL, if i915_gem_retire_requests()
is called from i915_gem_busy_ioctl().

Signed-off-by: Karsten Wiese <f...@wemgehoertderstaat.de>
---
drivers/gpu/drm/i915/i915_gem.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 25b3374..28b726d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1051,6 +1051,9 @@ i915_gem_retire_requests(struct drm_device *dev)
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	uint32_t seqno;
 
+	if (!dev_priv->hw_status_page)
+		return;
+
 	seqno = i915_get_gem_seqno(dev);
 
 	while (!list_empty(&dev_priv->mm.request_list)) {
--
1.6.0.6
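
The guard in the patch above can be illustrated outside the kernel. What follows is only a minimal userspace sketch, not the i915 code: `struct fake_dev_priv` and `retire_requests()` are hypothetical names, and the way the sequence number is read from the first word of the status page is an assumption made for the example. The point is the ordering: the NULL check has to come before anything that reads through `hw_status_page`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's private state. */
struct fake_dev_priv {
	uint32_t *hw_status_page;	/* NULL until the status page is set up */
	unsigned int pending;		/* requests waiting to be retired */
};

/* Returns the number of requests retired; 0 if there is no status page. */
unsigned int retire_requests(struct fake_dev_priv *priv)
{
	uint32_t seqno;
	unsigned int retired = 0;

	assert(priv != NULL);

	/* Guard first: reading the seqno dereferences hw_status_page. */
	if (!priv->hw_status_page)
		return 0;

	/* Assumption for this sketch: word 0 holds the completed count. */
	seqno = priv->hw_status_page[0];
	while (priv->pending > 0 && retired < seqno) {
		priv->pending--;
		retired++;
	}
	return retired;
}
```

Called with a NULL `hw_status_page`, the function returns immediately and leaves `pending` untouched, which mirrors the early `return` added by the patch.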

Jesper Krogh

Feb 26, 2009, 6:16:28 AM
to Linus Torvalds, linux-...@vger.kernel.org

Booting up 2.6.29-rc6 gave me this one in dmesg...

[ 21.136149] ck804xrom ck804xrom_init_one(): Unable to register
resource 0x00000000ff000000-0x00000000ffffffff - kernel bug?
[ 21.136258] resource map sanity check conflict: 0xff000000 0xffffffff
0xff700000 0xffffffff reserved
[ 21.136267] ------------[ cut here ]------------
[ 21.136269] WARNING: at arch/x86/mm/ioremap.c:208
__ioremap_caller+0x359/0x390()
[ 21.136271] Hardware name: Sun Fire X2200 M2 with Quad Core Processor
[ 21.136273] Info: mapping multiple BARs. Your kernel is fine.Modules
linked in: ck804xrom(+) mtd chipreg pcspkr(+) shpchp button pci_hotplug
i2c_nforce2 i2c_core map_funcs evdev ext3 jbd mbcache sg sd_mod usbhid
hid amd74xx sata_nv tg3 ata_generic libphy ehci_hcd libata ohci_hcd
forcedeth scsi_mod usbcore thermal processor fan thermal_sys fuse
[ 21.136289] Pid: 3843, comm: modprobe Not tainted 2.6.29-rc6 #2
[ 21.136291] Call Trace:
[ 21.136298] [<ffffffff8023d352>] warn_slowpath+0xf2/0x130
[ 21.136301] [<ffffffff8023d62a>] __call_console_drivers+0x6a/0x90
[ 21.136304] [<ffffffff8023e1fe>] printk+0x4e/0x60
[ 21.136306] [<ffffffff8023e1fe>] printk+0x4e/0x60
[ 21.136309] [<ffffffff8036b520>] match_pci_dev_by_id+0x0/0x60
[ 21.136313] [<ffffffff8024360e>] iomem_map_sanity_check+0xbe/0xd0
[ 21.136316] [<ffffffff80229799>] __ioremap_caller+0x359/0x390
[ 21.136320] [<ffffffffa01eb1f6>] init_ck804xrom+0x1f6/0x62c [ck804xrom]
[ 21.136322] [<ffffffffa01eb1f6>] init_ck804xrom+0x1f6/0x62c [ck804xrom]
[ 21.136326] [<ffffffff80275eac>] tracepoint_update_probe_range+0x1c/0xb0
[ 21.136329] [<ffffffffa01eb000>] init_ck804xrom+0x0/0x62c [ck804xrom]
[ 21.136332] [<ffffffff8020903b>] _stext+0x3b/0x160
[ 21.136335] [<ffffffff80359141>] __up_read+0x21/0xb0
[ 21.136340] [<ffffffff80256495>]
__blocking_notifier_call_chain+0x65/0x90
[ 21.136343] [<ffffffff80265604>] sys_init_module+0xb4/0x200
[ 21.136346] [<ffffffff8020c35b>] system_call_fastpath+0x16/0x1b
[ 21.136348] ---[ end trace f807e12658961c2d ]---


System is fully operational, but I didn't get it in 2.6.26.8 (most recent
kernel tried on this hardware).


--
Jesper

Marcin Slusarz

Feb 26, 2009, 12:17:53 PM
to Jesper Krogh, Linus Torvalds, linux-...@vger.kernel.org, Dave Olsen, Ryan Jackson, David.W...@intel.com, linu...@lists.infradead.org

This message comes from this code in drivers/mtd/maps/ck804xrom.c:
	/*
	 * Try to reserve the window mem region. If this fails then
	 * it is likely due to a fragment of the window being
	 * "reserved" by the BIOS. In the case that the
	 * request_mem_region() fails then once the rom size is
	 * discovered we will try to reserve the unreserved fragment.
	 */
	window->rsrc.name = MOD_NAME;
	window->rsrc.start = window->phys;
	window->rsrc.end = window->phys + window->size - 1;
	window->rsrc.flags = IORESOURCE_MEM | IORESOURCE_BUSY;
	if (request_resource(&iomem_resource, &window->rsrc)) {
		window->rsrc.parent = NULL;
		printk(KERN_ERR MOD_NAME
		       " %s(): Unable to register resource"
		       " 0x%.016llx-0x%.016llx - kernel bug?\n",
		       __func__,
		       (unsigned long long)window->rsrc.start,
		       (unsigned long long)window->rsrc.end);
	}
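
To illustrate why the request fails on this machine, here is a minimal userspace sketch. It is not the kernel's implementation: `struct range`, `ranges_overlap()` and `try_reserve()` are hypothetical names, and the reserved range is the one shown in the "resource map sanity check conflict" line of Jesper's dmesg output. The sketch assumes a request is refused whenever the new range overlaps a range that is already registered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for a resource range (inclusive). */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Two inclusive ranges overlap unless one ends before the other begins. */
int ranges_overlap(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/*
 * Returns 0 if 'req' overlaps none of the 'n' registered ranges,
 * -1 otherwise (the kernel would report -EBUSY here).
 */
int try_reserve(const struct range *registered, size_t n, struct range req)
{
	size_t i;

	assert(req.start <= req.end);

	for (i = 0; i < n; i++)
		if (ranges_overlap(registered[i], req))
			return -1;
	return 0;
}
```

With 0xff700000-0xffffffff already registered as reserved, a request for the whole 0xff000000-0xffffffff window is refused, while a request for only 0xff000000-0xff6fffff would be accepted. That is the "unreserved fragment" case the driver comment talks about.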

So it's probably harmless.
Adding CC's.

Marcin

Linus Torvalds

Feb 26, 2009, 12:53:43 PM
to Jesper Krogh, David Woodhouse, Dave Olsen, Ryan Jackson, linu...@lists.infradead.org, Linux Kernel Mailing List


On Thu, 26 Feb 2009, Jesper Krogh wrote:
>
>
> Booting up 2.6.29-rc6 gave me this one in dmesg...
>
> [ 21.136149] ck804xrom ck804xrom_init_one(): Unable to register resource 0x00000000ff000000-0x00000000ffffffff - kernel bug?

Well, it _is_ a kernel bug, but it's in that stupid driver. It does
everything wrong, including printing out a scary message.

Piece of sh*t driver, in other words.

I mean, it even has a _comment_ about how the request_region is likely to
not succeed, and then it prints out that scary message when it
then doesn't do so.

Not to mention that the driver is likely _wrong_ to just unconditionally
try to enable that resource without *first* checking whether the resource
can actually be enabled or whether there are other resources in that same
window.

Quite frankly, I find that whole thing scary. The driver should be deleted
or at least marked EXPERIMENTAL or BROKEN.

It has a "BE VERY CAREFUL" in the Kconfig _help_ text, but is not marked
as being dangerous any other way.

That said, I really don't see why you would get this message _now_. The
total braindamage of that driver in no way seems new. Did you perhaps not
notice before, or did you just not enable it before?

> [ 21.136269] WARNING: at arch/x86/mm/ioremap.c:208 __ioremap_caller+0x359/0x390()

This is a different, but related warning, since the driver is doing an
ioremap across different resources. The warning is directly related to the
fact that the resource wasn't actually valid to begin with.
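
As a rough illustration of "an ioremap across different resources", the sketch below checks whether a mapping touches a resource without lying entirely inside it. It is only loosely modeled on the idea behind the warning: `struct span` and `map_crosses_resource()` are hypothetical names, and the real kernel check has further conditions (page-granular comparison, resource flags) that are left out here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical inclusive physical address range. */
struct span {
	uint64_t start;
	uint64_t end;
};

/*
 * Returns 1 if mapping 'map' touches resource 'res' without lying
 * entirely inside it, i.e. the mapping crosses a resource boundary.
 */
int map_crosses_resource(struct span map, struct span res)
{
	assert(map.start <= map.end && res.start <= res.end);

	/* No overlap at all: nothing to complain about. */
	if (res.start > map.end || res.end < map.start)
		return 0;

	/* Mapping fully contained in the resource: fine. */
	if (res.start <= map.start && res.end >= map.end)
		return 0;

	return 1;
}
```

For the values in the report, the mapping 0xff000000-0xffffffff starts below the reserved resource 0xff700000-0xffffffff and runs into it, so the function returns 1. A mapping that stayed inside 0xff700000-0xffffffff would not be flagged.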

What does "cat /proc/iomem" say?

> System is fully operational, but I didn't get it in 2.6.26.8 (most recent
> kernel tried on this hardware).

The ioremap() warning is newish, and may be what made you notice the
previous (just one-line) crappy warning.

Quite frankly, having looked at that horrible driver, I would seriously
consider disabling it. Stuff like that should not be allowed to exist.

Linus

David Woodhouse

Feb 26, 2009, 2:22:26 PM
to Linus Torvalds, Jesper Krogh, Dave Olsen, Ryan Jackson, linu...@lists.infradead.org, Linux Kernel Mailing List
On Thu, 2009-02-26 at 17:53 +0000, Linus Torvalds wrote:
> Dave Olsen <dol...@lnxi.com>,
> Ryan Jackson <rjac...@lnxi.com>, David.W...@intel.com,
> linu...@lists.infradead.org
>
>
> On Thu, 26 Feb 2009, Jesper Krogh wrote:
> >
> >
> > Booting up 2.6.29-rc6 gave me this one in dmesg...
> >
> > [ 21.136149] ck804xrom ck804xrom_init_one(): Unable to register resource 0x00000000ff000000-0x00000000ffffffff - kernel bug?
>
> Well, it _is_ a kernel bug, but it's in that stupid driver. It does
> everything wrong, including printing out a scary message.
>
> Piece of sh*t driver, in other words.
>
> I mean, it even has a _comment_ about how the request_region is likely to
> not succeed, and then it prints out that scary message when it
> then doesn't do so.
>
> Not to mention that the driver is likely _wrong_ to just unconditionally
> try to enable that resource without *first* checking whether the resource
> can actually be enabled or whether there are other resources in that same
> window.
>
> Quite frankly, I find that whole thing scary. The driver should be deleted
> or at least marked EXPERIMENTAL or BROKEN.

It's giving you access to your BIOS flash so that you can overwrite it
from within Linux. It's _supposed_ to be scary :)

It's also always going to be a hack -- it's a PITA getting direct access
to that flash on most PeeCee chipsets. The driver operates on the
principle that it knows the hardware, and it can _make_ the flash appear
at the appropriate physical addresses. The theory, at least, is that it
knows better than the kernel does.

But yeah, it should probably at least look for other things which
already overlap with the region that it's trying to 'create'. Although
the comment leads me to believe that sometimes that's _expected_ and
shouldn't cause the driver to abort.

Dave, Ryan, are you still actively using this?

--
David Woodhouse Open Source Technology Centre
David.W...@intel.com Intel Corporation


Jesper Krogh

Feb 26, 2009, 2:32:09 PM
to Linus Torvalds, David Woodhouse, Dave Olsen, Ryan Jackson, linu...@lists.infradead.org, Linux Kernel Mailing List
Linus Torvalds wrote:
> Dave Olsen <dol...@lnxi.com>,
> Ryan Jackson <rjac...@lnxi.com>, David.W...@intel.com,
> linu...@lists.infradead.org
>
>
> On Thu, 26 Feb 2009, Jesper Krogh wrote:
>>
>> Booting up 2.6.29-rc6 gave me this one in dmesg...
>>
>> [ 21.136149] ck804xrom ck804xrom_init_one(): Unable to register resource 0x00000000ff000000-0x00000000ffffffff - kernel bug?
>
> Well, it _is_ a kernel bug, but it's in that stupid driver. It does
> everything wrong, including printing out a scary message.

I've seen that before (and even reported it before). It just "slipped"
into the cut'n'paste. It was the following stuff that I intended to report.

>> [ 21.136269] WARNING: at arch/x86/mm/ioremap.c:208 __ioremap_caller+0x359/0x390()
>
> This is a different, but related warning, since the driver is doing an
> ioremap across different resources. The warning is directly related to the
> fact that the resource wasn't actually valid to begin with.
>
> What does "cat /proc/iomem" say?

http://krogh.cc/~jesper/iomem.txt

>> System is fully operational, but I didn't get it in 2.6.26.8 (most recent
>> kernel tried on this hardware).
>
> The ioremap() warning is newish, and may be what made you notice the
> previous (just one-line) crappy warning.
>
> Quite frankly, having looked at that horrible driver, I would seriously
> consider disabling it. Stuff like that should not be allowed to exist.

Being a "stupid" user, I pick the easy way to build a fresh kernel:
1) pick the distro .config
2) make oldconfig
3) Let the kernel load what it thinks it needs.
4) Report if I see any strange stuff (warnings / bugs / oops) or
misbehaviour.

So I don't know if I need that driver for anything vital. Should I care?
Or shouldn't it "just work"?

--
Jesper

David Woodhouse

Feb 26, 2009, 2:37:45 PM
to Jesper Krogh, Linus Torvalds, Dave Olsen, Ryan Jackson, linu...@lists.infradead.org, Linux Kernel Mailing List
On Thu, 2009-02-26 at 19:31 +0000, Jesper Krogh wrote:
> 1) pick the distro .config
> 2) make oldconfig

So it should have been a module, not built-in?

> 3) Let the kernel load what it thinks it needs.

That part at least ought to be disabled -- we don't let this driver
autoload, because unless you _know_ you need it, you don't need it.

It's for overwriting your BIOS.

--
David Woodhouse Open Source Technology Centre
David.W...@intel.com Intel Corporation

--

Jesper Krogh

Feb 26, 2009, 2:47:21 PM
to David Woodhouse, Linus Torvalds, Dave Olsen, Ryan Jackson, linu...@lists.infradead.org, Linux Kernel Mailing List
David Woodhouse wrote:
> On Thu, 2009-02-26 at 19:31 +0000, Jesper Krogh wrote:
>> 1) pick the distro .config
>> 2) make oldconfig
>
> So it should have been a module, not built-in?

It is a module, and it somehow gets auto-loaded on my system (it is not
listed in /etc/modules).

$ grep -i ck804xrom /boot/config-2.6.29-rc6
CONFIG_MTD_CK804XROM=m

Same in the distro .config
$ grep -i ck804xrom /boot/config-2.6.24-23-server
CONFIG_MTD_CK804XROM=m


>> 3) Let the kernel load what it thinks it needs.
>
> That part at least ought to be disabled -- we don't let this driver
> autoload, because unless you _know_ you need it, you don't need it.
>
> It's for overwriting your BIOS.

Oh. Thanks for your time... I'll just make sure to disable it from now on.

--
Jesper

David Woodhouse

Feb 26, 2009, 2:50:32 PM
to Jesper Krogh, Linus Torvalds, Dave Olsen, Ryan Jackson, linu...@lists.infradead.org, Linux Kernel Mailing List
On Thu, 2009-02-26 at 20:46 +0100, Jesper Krogh wrote:
> It is a module, and it somehow gets auto-loaded on my system (it is not
> listed in /etc/modules).

Oops, we should have disabled that, but it still has a
MODULE_DEVICE_TABLE(). I'll remove that, for a start...

--
David Woodhouse Open Source Technology Centre
David.W...@intel.com Intel Corporation

--

Jesper Krogh

Feb 26, 2009, 2:55:44 PM
to Linus Torvalds, Linux Kernel Mailing List
2.6.29-rc6 seems to have trouble running ntpd reliably under load. My
nagios system has just alerted me to drifting time on the upgraded machine.

Feb 26 19:09:25 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
Feb 26 19:10:31 quad12 ntpd[4901]: synchronized to 10.194.133.12, stratum 4
Feb 26 19:25:21 quad12 ntpd[4901]: time reset -0.915488 s
Feb 26 19:29:11 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
Feb 26 19:31:21 quad12 ntpd[4901]: synchronized to 10.194.133.13, stratum 4
Feb 26 19:34:37 quad12 ntpd[4901]: synchronized to 10.194.133.12, stratum 4
Feb 26 19:37:53 quad12 ntpd[4901]: synchronized to 10.194.133.13, stratum 4
Feb 26 19:46:27 quad12 ntpd[4901]: synchronized to 10.194.133.12, stratum 4
Feb 26 19:46:27 quad12 ntpd[4901]: time reset -0.961386 s
Feb 26 19:50:30 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
Feb 26 19:51:34 quad12 ntpd[4901]: synchronized to 10.194.133.12, stratum 4
Feb 26 20:01:55 quad12 ntpd[4901]: synchronized to 10.194.133.13, stratum 4
Feb 26 20:06:18 quad12 ntpd[4901]: time reset -0.979177 s
Feb 26 20:10:15 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
Feb 26 20:11:21 quad12 ntpd[4901]: synchronized to 10.194.133.13, stratum 4
Feb 26 20:14:52 quad12 ntpd[4901]: synchronized to 10.194.133.12, stratum 4
Feb 26 20:19:10 quad12 ntpd[4901]: synchronized to 10.194.133.13, stratum 4
Feb 26 20:26:00 quad12 ntpd[4901]: time reset -0.923268 s
Feb 26 20:30:01 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
Feb 26 20:30:30 quad12 ntpd[4901]: synchronized to 10.194.133.13, stratum 4
Feb 26 20:45:36 quad12 ntpd[4901]: time reset -0.919609 s
Feb 26 20:49:49 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13

2.6.26.8 doesn't have this problem.

The "current_clocksource" is the same on both systems.

$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc
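
As a rough sketch of what the "time reset" entries in the log above imply, the size of each reset can be divided by the time since the previous reset. This assumes ntpd stepped the clock to roughly zero offset at each reset and that the clock mostly free-ran in between, which is only approximately true; the function name is illustrative.

```python
from datetime import datetime

# (time of day, reset offset in seconds), copied from the "time reset" lines above
RESETS = [
    ("19:25:21", -0.915488),
    ("19:46:27", -0.961386),
    ("20:06:18", -0.979177),
    ("20:26:00", -0.923268),
    ("20:45:36", -0.919609),
]


def implied_drift_ppm(resets):
    """Drift (ppm) implied by each pair of consecutive resets."""
    rates = []
    for (t_prev, _), (t_cur, offset) in zip(resets, resets[1:]):
        elapsed = (datetime.strptime(t_cur, "%H:%M:%S")
                   - datetime.strptime(t_prev, "%H:%M:%S")).total_seconds()
        # Accumulated offset divided by elapsed time, scaled to parts per million
        rates.append(abs(offset) / elapsed * 1e6)
    return rates


rates = implied_drift_ppm(RESETS)
print([round(r) for r in rates])
```

Under those assumptions each interval works out to somewhere between 700 and 900 ppm, i.e. a little under one second lost every twenty minutes.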


--
Jesper

Linus Torvalds

Feb 26, 2009, 3:32:31 PM
to Jesper Krogh, David Woodhouse, Dave Olsen, Ryan Jackson, linu...@lists.infradead.org, Linux Kernel Mailing List

On Thu, 26 Feb 2009, Jesper Krogh wrote:

> Linus Torvalds wrote:
> > On Thu, 26 Feb 2009, Jesper Krogh wrote:
> > >
> > > Booting up 2.6.29-rc6 gave me this one in dmesg...
> > >
> > > [ 21.136149] ck804xrom ck804xrom_init_one(): Unable to register resource
> > > 0x00000000ff000000-0x00000000ffffffff - kernel bug?
> >
> > Well, it _is_ a kernel bug, but it's in that stupid driver. It does
> > everything wrong, including printing out a scary message.
>
> I've seen that before.. (even reported it before). It just "slipped" into the
> cut'n'paste It was the following stuff that I intended to report.

Ok. They very much are related. The new warning is just that - a new
warning.

> > > [ 21.136269] WARNING: at arch/x86/mm/ioremap.c:208
> > > __ioremap_caller+0x359/0x390()
> >
> > This is a different, but related warning, since the driver is doing an
> > ioremap across different resources. The warning is directly related to the
> > fact that the resource wasn't actually valid to begin with.
> >
> > What does "cat /proc/iomem" say?
>
> http://krogh.cc/~jesper/iomem.txt

Ok, so the thing conflicts with

  ff700000-ffffffff : reserved
    ff700000-ffffffff : pnp 00:0b

and that probably _is_ somehow related to the whole flash thing.

I guess the driver could use "insert_resource()" and the problem would go
away. Except I do think it should be marked very dangerous some way, so
that you can't even enable it unless you really really know you want to
(eg something like EXPERIMENTAL). Because I don't think this driver is
appropriate in any other case..

> Being a "stupid" user, I pick the easy way to build a fresh kernel: 1)
> pick the distro .config 2) make oldconfig 3) Let the kernel load what it
> think it needs. 4) Report if I see and strange stuff (warnings / bugs /
> oops) or misbehaviour.
>
> So I dont know if I need that driver for anything vital. Should I care?
> Or shouldn't it "just work"?

You definitely don't need it, and everything will work without it.

Linus

Jesper Krogh

Feb 26, 2009, 3:43:40 PM
to Linus Torvalds, Linux Kernel Mailing List
Linus Torvalds wrote:
>
> On Thu, 26 Feb 2009, Jesper Krogh wrote:
>> 2.6.26.8 doesnt have this problem.
>>
>> The "current_clocsource" is the same on both systems.
>>
>> $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>> tsc
>
> What does the frequency calibrate to? It should be in the dmesg. Does it
> differ by a big amount?

Non-working:
$ dmesg | grep -i freq
[ 0.004007] Calibrating delay loop (skipped), value calculated using
timer frequency.. 4620.05 BogoMIPS (lpj=9240104)

2.6.26.8 doesn't have that information.

Carl-Daniel Hailfinger

Feb 26, 2009, 3:54:24 PM
to David Woodhouse, Jesper Krogh, Ryan Jackson, linu...@lists.infradead.org, Dave Olsen, Linus Torvalds, Linux Kernel Mailing List
On 26.02.2009 20:36, David Woodhouse wrote:
> It's for overwriting your BIOS.
>

There's a pure userspace replacement for it. That replacement is even
packaged in most distros. See http://www.coreboot.org/Flashrom .


Regards,
Carl-Daniel

--
http://www.hailfinger.org/

john stultz

Feb 26, 2009, 4:19:44 PM
to Jesper Krogh, Linus Torvalds, Linux Kernel Mailing List
On Thu, Feb 26, 2009 at 12:43 PM, Jesper Krogh <jes...@krogh.cc> wrote:
> Linus Torvalds wrote:
>>
>> On Thu, 26 Feb 2009, Jesper Krogh wrote:
>>>
>>> 2.6.26.8 doesnt have this problem.
>>>
>>> The "current_clocsource" is the same on both systems.
>>>
>>> $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>>> tsc
>>
>> What does the frequency calibrate to? It should be in the dmesg. Does it
>> differ by a big amount?
>
> Non-working:
> $ dmesg | grep -i freq
> [    0.004007] Calibrating delay loop (skipped), value calculated using
> timer frequency.. 4620.05 BogoMIPS (lpj=9240104)
>
> 2.6.26.8 doesn't have that information.

I'm surprised the clocksource watchdog isn't catching it.

What's the output from:
cat /sys/devices/system/clocksource/clocksource0/available_clocksource

Also mind sending the full dmesg for both kernels?

thanks
-john

Jesper Krogh

Feb 26, 2009, 4:35:59 PM
to john stultz, Linus Torvalds, Linux Kernel Mailing List
john stultz wrote:
> On Thu, Feb 26, 2009 at 12:43 PM, Jesper Krogh <jes...@krogh.cc> wrote:
>> Linus Torvalds wrote:
>>> On Thu, 26 Feb 2009, Jesper Krogh wrote:
>>>> 2.6.26.8 doesnt have this problem.
>>>>
>>>> The "current_clocsource" is the same on both systems.
>>>>
>>>> $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>>>> tsc
>>> What does the frequency calibrate to? It should be in the dmesg. Does it
>>> differ by a big amount?
>> Non-working:
>> $ dmesg | grep -i freq
>> [ 0.004007] Calibrating delay loop (skipped), value calculated using
>> timer frequency.. 4620.05 BogoMIPS (lpj=9240104)
>>
>> 2.6.26.8 doesn't have that information.
>
> I'm surprised the clocksource watchdog isn't catching it.
>
> What's the output from:
> cat /sys/devices/system/clocksource/clocksource0/available_clocksource

$ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
tsc acpi_pm jiffies

Same on both.

> Also mind sending the full dmesg for both kernels?

http://krogh.cc/~jesper/dmesg-2.6.29-rc6.txt
http://krogh.cc/~jesper/dmesg-2.6.26.8.txt

--
Jesper

john stultz

Feb 26, 2009, 4:47:38 PM
to Jesper Krogh, Linus Torvalds, Linux Kernel Mailing List, Thomas Gleixner
On Thu, 2009-02-26 at 22:35 +0100, Jesper Krogh wrote:
> john stultz wrote:
> > On Thu, Feb 26, 2009 at 12:43 PM, Jesper Krogh <jes...@krogh.cc> wrote:
> >> Linus Torvalds wrote:
> >>> On Thu, 26 Feb 2009, Jesper Krogh wrote:
> >>>> 2.6.26.8 doesnt have this problem.
> >>>>
> >>>> The "current_clocsource" is the same on both systems.
> >>>>
> >>>> $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> >>>> tsc
> >>> What does the frequency calibrate to? It should be in the dmesg. Does it
> >>> differ by a big amount?
> >> Non-working:
> >> $ dmesg | grep -i freq
> >> [ 0.004007] Calibrating delay loop (skipped), value calculated using
> >> timer frequency.. 4620.05 BogoMIPS (lpj=9240104)
> >>
> >> 2.6.26.8 doesn't have that information.
> >
> > I'm surprised the clocksource watchdog isn't catching it.
> >
> > What's the output from:
> > cat /sys/devices/system/clocksource/clocksource0/available_clocksource
>
> $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
> tsc acpi_pm jiffies

Hmm. Does booting w/ "clocksource=acpi_pm" also show the severe (~550ppm,
which NTP can't handle) drift?

From the dmesg, I don't see any major calibration difference right off.

So I'd suspect something like TSC halting in idle could be causing
problems, but the watchdog should catch that as well. My only guess at
this point is that the ACPI PM is halting in idle along with the TSC.

And you said this only happens under load?

-john

Linus Torvalds

Feb 26, 2009, 4:50:27 PM
to Jesper Krogh, john stultz, Linux Kernel Mailing List

On Thu, 26 Feb 2009, Jesper Krogh wrote:
>
> > Also mind sending the full dmesg for both kernels?
>
> http://krogh.cc/~jesper/dmesg-2.6.29-rc6.txt
> http://krogh.cc/~jesper/dmesg-2.6.26.8.txt

Try changing

#define QUICK_PIT_MS 15

in arch/x86/kernel/tsc.c into something bigger. Let's say just doubling
it to 30. Does that change anything?

Linus

john stultz

Feb 26, 2009, 4:54:55 PM
to Jesper Krogh, Linus Torvalds, Linux Kernel Mailing List, Thomas Gleixner, Len Brown
On Thu, 2009-02-26 at 22:35 +0100, Jesper Krogh wrote:
> > Also mind sending the full dmesg for both kernels?
>
> http://krogh.cc/~jesper/dmesg-2.6.29-rc6.txt
> http://krogh.cc/~jesper/dmesg-2.6.26.8.txt

So one interesting difference:
2.6.26.8: TSC calibrated against PM_TIMER
2.6.29-rc6: Fast TSC calibration using PIT

Thomas, any thoughts as to why we might be calibrating off the PIT
instead of the PM_TIMER w/ 2.6.29?

Maybe does this line provide a hint?
FADT: X_PM1a_EVT_BLK.bit_width (16) does not match PM1_EVT_LEN (4)


thanks
-john

Thomas Gleixner

Feb 26, 2009, 4:55:31 PM
to john stultz, Jesper Krogh, Linus Torvalds, Linux Kernel Mailing List

But why would it do that on 29-rc6 and not on 2.6.28.8 ? I'm not aware
of changes which might cause that.

Thanks,

tglx

Jesper Krogh

Feb 26, 2009, 5:04:29 PM
to Thomas Gleixner, john stultz, Linus Torvalds, Linux Kernel Mailing List

My comparison is 2.6.26.8 not 2.6.28.8 .. so fairly old.

It is a small cluster, so I'm slipping some test-kernels in when the
cluster is idle.

--
Jesper

Thomas Gleixner

Feb 26, 2009, 5:07:18 PM
to john stultz, Jesper Krogh, Linus Torvalds, Linux Kernel Mailing List, Len Brown
On Thu, 26 Feb 2009, john stultz wrote:
> On Thu, 2009-02-26 at 22:35 +0100, Jesper Krogh wrote:
> > > Also mind sending the full dmesg for both kernels?
> >
> > http://krogh.cc/~jesper/dmesg-2.6.29-rc6.txt
> > http://krogh.cc/~jesper/dmesg-2.6.26.8.txt
>
> So one interesting difference:
> 2.6.26.8: TSC calibrated against PM_TIMER
> 2.6.29-rc6: Fast TSC calibration using PIT
>
> Thomas, any thoughts as to why we might be calibrating off the PIT
> instead of the PM_TIMER w/ 2.6.29?

Yup, because we introduced the Fast PIT calibration in 2.6.28.

Is the delta anything NTP might get upset about:

2.6.26: time.c: Detected 2311.847 MHz processor.
2.6.29: Detected 2310.029 MHz processor.

If yes, then we need to fix NTP not the calibration code :)



> Maybe does this line provide a hint?
> FADT: X_PM1a_EVT_BLK.bit_width (16) does not match PM1_EVT_LEN (4)

Red herring.

Thanks,

tglx

Linus Torvalds

Feb 26, 2009, 5:25:20 PM
to Thomas Gleixner, john stultz, Jesper Krogh, Linux Kernel Mailing List, Len Brown

On Thu, 26 Feb 2009, Thomas Gleixner wrote:
>
> Is the delta anything NTP might get upset about:
>
> 2.6.26: time.c: Detected 2311.847 MHz processor.
> 2.6.29: Detected 2310.029 MHz processor.
>
> If yes, then we need to fix NTP not the calibration code :)

Well, that _is_ about 500ppm difference, and we claim that we _should_
have reached 150ppm with the 15ms delay. We clearly don't seem to have
done that. I'm not quite sure why - we _should_ be finding the edge of the
PIT events to within roughly a microsecond (assuming that's about as long
as an "inb" takes), and that should give us a pretty good fast
calibration, but maybe I'm overlooking something.

Or - and this may be more likely - there are chipsets that aren't very
good at reading the PIT in a tight loop. That may explain why it's a
problem on Jesper's hardware, but we haven't gotten tons of reports of
this from others.

I see that it's a SunFire X2200, which I think uses an nVidia HT
southbridge. I assume it's an nForce4 thing. There shouldn't be anything
odd there, and the PIT read shouldn't be taking any longer than on
anything else, but who knows?

Linus

john stultz

Feb 26, 2009, 5:31:53 PM
to Thomas Gleixner, Jesper Krogh, Linus Torvalds, Linux Kernel Mailing List, Len Brown
On Thu, 2009-02-26 at 23:06 +0100, Thomas Gleixner wrote:
> On Thu, 26 Feb 2009, john stultz wrote:
> > On Thu, 2009-02-26 at 22:35 +0100, Jesper Krogh wrote:
> > > > Also mind sending the full dmesg for both kernels?
> > >
> > > http://krogh.cc/~jesper/dmesg-2.6.29-rc6.txt
> > > http://krogh.cc/~jesper/dmesg-2.6.26.8.txt
> >
> > So one interesting difference:
> > 2.6.26.8: TSC calibrated against PM_TIMER
> > 2.6.29-rc6: Fast TSC calibration using PIT
> >
> > Thomas, any thoughts as to why we might be calibrating off the PIT
> > instead of the PM_TIMER w/ 2.6.29?
>
> Yup, because we introduced the Fast PIT calibration in 2.6.28.

Ah. Ok.

> Is the delta anything NTP might get upset about:
>
> 2.6.26: time.c: Detected 2311.847 MHz processor.
> 2.6.29: Detected 2310.029 MHz processor.

I wouldn't think so.

Although, I'm recalling on some systems here right after we deploy them
we'll see something similar to the originally reported ntpd "time reset"
noise for a period of time while ntpd tries to find the right freq. For
some reason, I've noticed, having multiple servers in your ntp.conf
seems to increase NTP's difficulty at picking a time and converging.

So this may just be the slight calibration change confusing ntp, or it
may be the NTP_INTERVAL_LENGTH change from a while back, which would cause
the drift value to change and could be doing the same thing (although I
thought that landed in the 2.6.24 timeframe, but I may be forgetting).

I'll kick up some of my own testing between these two releases to see if
I can't find something similar.

Jesper: How long was the box up for when you noticed the ntpd noise?

Also what's the output of the following under the different kernels:
ntpdc -c peers
ntpdc -c kerninfo

thanks
-john

Linus Torvalds

Feb 26, 2009, 5:32:26 PM
to Thomas Gleixner, john stultz, Jesper Krogh, Linux Kernel Mailing List, Len Brown

On Thu, 26 Feb 2009, Linus Torvalds wrote:

>
>
> On Thu, 26 Feb 2009, Thomas Gleixner wrote:
> >
> > Is the delta anything NTP might get upset about:
> >
> > 2.6.26: time.c: Detected 2311.847 MHz processor.
> > 2.6.29: Detected 2310.029 MHz processor.
> >
> > If yes, then we need to fix NTP not the calibration code :)
>
> Well, that _is_ about 500ppm difference

Doing the math rather than just eyeballing it, I think it's closer to
800ppm than 500ppm. But maybe I did that wrong too.
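
A quick sketch of that arithmetic, using the two "Detected ... MHz" values quoted above and treating the 2.6.26 figure as the reference (the function name is just for illustration):

```python
def ppm_difference(reference_mhz, measured_mhz):
    """Relative difference between two frequencies, in parts per million."""
    return (measured_mhz - reference_mhz) / reference_mhz * 1e6


# 2.6.26 (PM_TIMER calibration) vs. 2.6.29 (fast PIT calibration)
delta = ppm_difference(2311.847, 2310.029)
print(f"{delta:.0f} ppm")
```

The result is negative because the 2.6.29 estimate is the lower of the two, and its magnitude lands just under 800 ppm.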

Which is definitely pretty far out. The theory is that if we can catch the
edge of the PIT timer to 1us, and even if we get it maximally wrong at
beginning/end (ie the difference is off by 2us), a 2us error over 15ms
should be on the order of just a 133ppm error.
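
The bound can be sketched as follows, assuming the only error source is missing the PIT edge by up to the stated amount at both the start and the end of the window (the helper name is illustrative):

```python
def worst_case_ppm(edge_error_us, window_ms):
    """Worst-case calibration error when each edge is off by edge_error_us."""
    total_error_us = 2 * edge_error_us      # maximally wrong at both ends
    window_us = window_ms * 1000
    return total_error_us / window_us * 1e6


print(worst_case_ppm(1, 15))   # the current 15ms window
print(worst_case_ppm(1, 30))   # the doubled window
```

Doubling the window halves the bound, which is why a longer window is the obvious first thing to try.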

So 800ppm looks too big. We're clearly not getting to within 1us of the
PIT timer event edge. But it would be interesting to hear whether making
the 15ms be 30ms will get us to a better place, and make ntp happier.

And maybe my math is just wrong, and it's not the "within 1us" assumption
that was wrong.

Linus Torvalds

Feb 26, 2009, 5:41:33 PM
to john stultz, Thomas Gleixner, Jesper Krogh, Linux Kernel Mailing List, Len Brown

On Thu, 26 Feb 2009, john stultz wrote:
>
> I'll kick up some of my own testing between these two releases to see if
> I can't find something similar.

Since the PIT timer read is possibly hw-dependent, it might be that you
can't necessarily reproduce it on some random hardware.

How sensitive is ntpd to (stable) drift? IOW, if we get the calibration
wrong, the TSC should still hopefully be very _stable_, it's just that the
initial guesstimate for the frequency is off and ntp would have to correct
for that.

The easiest way to test might be to just force a 1000ppm estimation error
with something like this total hack (indented just so that nobody would
ever apply this by mistake):

     arch/x86/kernel/tsc.c |    4 ++++
     1 files changed, 4 insertions(+), 0 deletions(-)

    diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
    index 599e581..b80a0c4 100644
    --- a/arch/x86/kernel/tsc.c
    +++ b/arch/x86/kernel/tsc.c
    @@ -350,6 +350,10 @@ static unsigned long quick_pit_calibrate(void)
     	delta = (t2 - t1)*PIT_TICK_RATE;
     	do_div(delta, QUICK_PIT_ITERATIONS*256*1000);
     	printk("Fast TSC calibration using PIT\n");
    +
    +	/* HACK! */
    +	delta -= delta >> 10;
    +
     	return delta;
     }
     failed:

which wouldn't be hardware-dependent.
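
To see how close that shift gets to 1000ppm, here is a small sketch of the same integer arithmetic. It assumes the value returned by quick_pit_calibrate() is in kHz and plugs in the 2310.029 MHz figure from the 2.6.29 dmesg as an example input; the Python names are illustrative.

```python
def apply_hack(khz):
    """Mirror of 'delta -= delta >> 10' on a non-negative integer."""
    return khz - (khz >> 10)


tsc_khz = 2310029                 # example: 2310.029 MHz expressed in kHz
hacked = apply_hack(tsc_khz)
error_ppm = (tsc_khz - hacked) / tsc_khz * 1e6
print(hacked, round(error_ppm))
```

Since 1/1024 is slightly less than 1/1000, the forced error comes out a bit under 1000ppm, which is close enough for the experiment.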

Linus

john stultz

Feb 26, 2009, 6:00:05 PM
to Linus Torvalds, Thomas Gleixner, Jesper Krogh, Linux Kernel Mailing List, Len Brown
On Thu, 2009-02-26 at 14:40 -0800, Linus Torvalds wrote:
>
> On Thu, 26 Feb 2009, john stultz wrote:
> >
> > I'll kick up some of my own testing between these two releases to see if
> > I can't find something similar.
>
> Since the PIT timer read is possibly hw-dependent, it might be that you
> can't necessarily reproduce it on some random hardware.
>
> How sensitive is ntpd to (stable) drift? IOW, if we get the calibration
> wrong, the TSC should still hopefully be very _stable_, it's just that the
> initial guesstimate for the frequency is off and ntp would have to correct
> for that.

NTP can adjust the clock about +/-500ppm (so a 1000ppm range). Past that
it starts throwing errors.

Part of the issue is that if the drift value changes in between boots,
NTPd can take a while to settle down on the right freq. I suspect that's
what's happening here, and should the box be left alone for a few hours
(maybe overnight) NTPd will find the new drift correction and the issue
will go away.

Thomas tripped over this a little while back when the
NTP_INTERVAL_LENGTH change landed, but I think that was prior to 2.6.26,
so it's probably the calibration changes discussed, but I wanted to see
if there were any other slight changes that might be contributing to the
issue as well.

thanks
-john

Jesper Krogh

Feb 27, 2009, 1:31:32 AM
to john stultz, Linus Torvalds, Linux Kernel Mailing List, Thomas Gleixner
john stultz wrote:
> On Thu, 2009-02-26 at 22:35 +0100, Jesper Krogh wrote:
>> john stultz wrote:
>>> On Thu, Feb 26, 2009 at 12:43 PM, Jesper Krogh <jes...@krogh.cc> wrote:
>>>> Linus Torvalds wrote:
>>>>> On Thu, 26 Feb 2009, Jesper Krogh wrote:
>>>>>> 2.6.26.8 doesnt have this problem.
>>>>>>
>>>>>> The "current_clocsource" is the same on both systems.
>>>>>>
>>>>>> $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
>>>>>> tsc
>>>>> What does the frequency calibrate to? It should be in the dmesg. Does it
>>>>> differ by a big amount?
>>>> Non-working:
>>>> $ dmesg | grep -i freq
>>>> [ 0.004007] Calibrating delay loop (skipped), value calculated using
>>>> timer frequency.. 4620.05 BogoMIPS (lpj=9240104)
>>>>
>>>> 2.6.26.8 doesn't have that information.
>>> I'm surprised the clocksource watchdog isn't catching it.
>>>
>>> What's the output from:
>>> cat /sys/devices/system/clocksource/clocksource0/available_clocksource
>> $ cat /sys/devices/system/clocksource/clocksource0/available_clocksource
>> tsc acpi_pm jiffies
>
> Hmm. Does booting w/ "clocksource=acpi_pm" also show the severe (~550ppm,
> which NTP can't handle) drift?

I booted another server (identical hardware) with the same kernel and
the above clocksource line, it has run over night (8 hours) with full
load and ntp has not complained about anything on that server.

> From the dmesg, I don't see any major calibration difference right off.
>
> So I'd suspect something like TSC halting in idle could be causing
> problems, but the watchdog should catch that as well. My only guess at
> this point is that the ACPI PM is halting in idle along with the TSC.
>
> And you said this only happens under load?

I can't say that, but I've only observed it under load.

--
Jesper

Jesper Krogh

Feb 27, 2009, 1:47:58 AM
to john stultz, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown

The box was booted Feb 25 21:58 .. the first noise from ntp starts here:
Feb 25 22:09:53 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
Feb 25 22:09:56 quad12 ntpd[4901]: synchronized to 10.194.133.13, stratum 4
Feb 25 22:14:08 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
Feb 25 22:16:20 quad12 ntpd[4901]: synchronized to 10.194.133.13, stratum 4
Feb 25 22:32:25 quad12 ntpd[4901]: time reset -1.601641 s
Feb 25 22:36:18 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
Feb 25 22:36:45 quad12 ntpd[4901]: synchronized to 10.194.133.12, stratum 4
Feb 25 22:51:41 quad12 ntpd[4901]: time reset -0.922993 s
Feb 25 22:55:05 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13

> Also what's the output of the following under the different kernels:
> ntpdc -c peers
> ntpdc -c kerninfo

Working (clocksource=acpi_pm) 2.6.29-rc6
jk@quad02:~$ ntpdc -c kerninfo
pll offset: -0.001577 s
pll frequency: -45.787 ppm
maximum error: 0.066739 s
estimated error: 0.000768 s
status: 0001 pll
pll time constant: 6
precision: 1e-06 s
frequency tolerance: 500 ppm
jk@quad02:~$ ntpdc -c peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
*hal.nzcorp.net  10.194.132.81    4   64  377 0.00008  0.003752 0.04816
=svn.nzcorp.net  10.194.132.81    4   64  377 0.00009 -0.008724 0.04979
=LOCAL(0)        127.0.0.1       13   64  377 0.00000  0.000000 0.03082


Working (clocksource=tsc) 2.6.26.8
jk@quad03:~$ ntpdc -c kerninfo
pll offset: 0.003208 s
pll frequency: -25.070 ppm
maximum error: 0.833193 s
estimated error: 0.002787 s
status: 4001 pll
pll time constant: 10
precision: 1e-06 s
frequency tolerance: 500 ppm
jk@quad03:~$ ntpdc -c peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
*hal.nzcorp.net  10.194.132.82    4 1024  377 0.00781  0.006788 0.13666
=sal.nzcorp.net  10.194.132.82    4 1024  377 0.00018 -0.000541 0.12175
=LOCAL(0)        127.0.0.1       13   64  377 0.00000  0.000000 0.03041

Non-working (clocksource=tsc) 2.6.29-rc6
jk@quad12:~$ ntpdc -c kerninfo
pll offset: 0 s
pll frequency: -34.754 ppm
maximum error: 0.023514 s
estimated error: 0 s
status: 0001 pll
pll time constant: 6
precision: 1e-06 s
frequency tolerance: 500 ppm
jk@quad12:~$ ntpdc -c peers
     remote           local      st poll reach  delay   offset    disp
=======================================================================
=hal.nzcorp.net  10.194.132.91    4   64   17 0.00011 -0.069377 0.96895
=trac.nzcorp.net 10.194.132.91    4   64   17 0.00011 -0.096107 0.96904
*LOCAL(0)        127.0.0.1       13   64   17 0.00000  0.000000 0.96857

Ingo Molnar

Feb 27, 2009, 2:33:53 AM
to john stultz, Linus Torvalds, Thomas Gleixner, Jesper Krogh, Linux Kernel Mailing List, Len Brown

* john stultz <john...@us.ibm.com> wrote:

> On Thu, 2009-02-26 at 14:40 -0800, Linus Torvalds wrote:
> >
> > On Thu, 26 Feb 2009, john stultz wrote:
> > >
> > > I'll kick up some of my own testing between these two releases to see if
> > > I can't find something similar.
> >
> > Since the PIT timer read is possibly hw-dependent, it might be that you
> > can't necessarily reproduce it on some random hardware.
> >
> > How sensitive is ntpd to (stable) drift? IOW, if we get the calibration
> > wrong, the TSC should still hopefully be very _stable_, it's just that the
> > initial guesstimate for the frequency is off and ntp would have to correct
> > for that.
>
> NTP can adjust the clock about +/-500ppm (so a 1000ppm range).
> Past that it starts throwing errors.

Well, it will start throwing errors but still it will correct
the clock and find the frequency delta between the host clock
and the reference clock just fine, and converge in a couple of
hours, correct?

500ppm is 0.05% of a frequency drift which is awfully small -
thermal effects alone can cause such differences so it should
not be anything out of the ordinary for ntpd.

> Part of the issue is that if the drift value changes in
> between boots, NTPd can take a while to settle down on the
> right freq. I suspect that's whats happening here, and should
> the box be left alone for a few hours (maybe overnight) NTPd
> will find the new drift correction the issue will go away.

If the default poll interval of 64 seconds is used then it can
take that much time - so I'd suggest decreasing that to below 10
seconds.

It's not like the frequency is changing rapidly here. The
correction pattern to find is a very simple, very static and
reliable multiplier of ~1.000800 between the two frequencies.

Say the over-the-network reference clock ntpd follows has a 10
msecs of intrinsic observation noise. For that 10 msecs noise to
go down to the 10 ppm range [to the local but drifted time
source which has ~10 ppm precision straight away], we need
roughly 1000 samples. [simplified, fewer are enough in reality,
especially if you have some known-to-have-converged-before
cached value to start out with.]

1000 samples with 64 seconds intervals can take half a day to
converge. 1000 samples with 1 second intervals takes just 15
minutes to converge.
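
A back-of-the-envelope sketch of those two figures, assuming convergence time is simply the number of samples times the poll interval (a simplification; the function name is illustrative):

```python
def convergence_time_s(samples, poll_interval_s):
    """Time needed to collect the given number of samples, in seconds."""
    return samples * poll_interval_s


slow = convergence_time_s(1000, 64)   # default 64 second poll interval
fast = convergence_time_s(1000, 1)    # 1 second poll interval
print(f"{slow / 3600:.1f} hours vs {fast / 60:.1f} minutes")
```

With these inputs the slow case comes to a bit under 18 hours, so "half a day" is if anything on the optimistic side, and the fast case to a bit under 17 minutes.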

We'll improve in-kernel calibration but calibration noise in the
0.05% range should be expected in some cases.

Ingo

john stultz

Feb 27, 2009, 3:38:26 PM
to Jesper Krogh, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
On Fri, 2009-02-27 at 07:47 +0100, Jesper Krogh wrote:
> john stultz wrote:
> > Jesper: How long was the box up for when you noticed the ntpd noise?
>
> I was booted Feb 25 21:58 .. the first noice from ntp starts here:
> Feb 25 22:09:53 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
> Feb 25 22:09:56 quad12 ntpd[4901]: synchronized to 10.194.133.13, stratum 4
> Feb 25 22:14:08 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
> Feb 25 22:16:20 quad12 ntpd[4901]: synchronized to 10.194.133.13, stratum 4
> Feb 25 22:32:25 quad12 ntpd[4901]: time reset -1.601641 s
> Feb 25 22:36:18 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13
> Feb 25 22:36:45 quad12 ntpd[4901]: synchronized to 10.194.133.12, stratum 4
> Feb 25 22:51:41 quad12 ntpd[4901]: time reset -0.922993 s
> Feb 25 22:55:05 quad12 ntpd[4901]: synchronized to LOCAL(0), stratum 13

Ok, so that's not very long. I'd expect by now, if the box is still up,
the messages have stopped. Is that true, or is it still resetting?


> > Also what's the output of the following under the different kernels:
> > ntpdc -c peers
> > ntpdc -c kerninfo

[snip]


> Working (clocksource=tsc) 2.6.26.8
> jk@quad03:~$ ntpdc -c kerninfo
> pll offset: 0.003208 s
> pll frequency: -25.070 ppm

[snip]


> Non-working (clocksource=tsc) 2.6.29-rc6
> jk@quad12:~$ ntpdc -c kerninfo
> pll offset: 0 s
> pll frequency: -34.754 ppm


Ok, so it seems ntp hasn't really had a chance to settle down, it's only
made a 10ppm adjustment so far. NTPd will stop corrections at ~
+/-500ppm, so you're not at that bound yet, where things would be really
broken.

If the affected kernel isn't resetting in the logs anymore, I'd be
interested in what the new ppm value is.

thanks
-john

john stultz

Feb 27, 2009, 3:50:35 PM
to Ingo Molnar, Linus Torvalds, Thomas Gleixner, Jesper Krogh, Linux Kernel Mailing List, Len Brown
On Fri, 2009-02-27 at 08:33 +0100, Ingo Molnar wrote:
> * john stultz <john...@us.ibm.com> wrote:
>
> > On Thu, 2009-02-26 at 14:40 -0800, Linus Torvalds wrote:
> > >
> > > On Thu, 26 Feb 2009, john stultz wrote:
> > > >
> > > > I'll kick up some of my own testing between these two releases to see if
> > > > I can't find something similar.
> > >
> > > Since the PIT timer read is possibly hw-dependent, it might be that you
> > > can't necessarily reproduce it on some random hardware.
> > >
> > > How sensitive is ntpd to (stable) drift? IOW, if we get the calibration
> > > wrong, the TSC should still hopefully be very _stable_, it's just that the
> > > initial guesstimate for the frequency is off and ntp would have to correct
> > > for that.
> >
> > NTP can adjust the clock about +/-500ppm (so a 1000ppm range).
> > Past that it starts throwing errors.
>
> Well, it will start throwing errors but still it will correct
> the clock and find the frequency delta between the host clock
> and the reference clock just fine, and converge in a couple of
> hours, correct?

No, the NTP spec limits the freq correction to ~+/-500ppm. Once NTPd hits
that 500ppm wall, it will throw an error and stop trying to sync the
clock.
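
As a minimal sketch of that limit, the check below compares a frequency error against the 500 ppm "frequency tolerance" reported by ntpdc -c kerninfo earlier in the thread. The helper name is illustrative, and the ~786 ppm input is the difference between the two kernels' calibrations, not a measured absolute error.

```python
NTP_FREQ_TOLERANCE_PPM = 500.0   # "frequency tolerance: 500 ppm" from kerninfo


def within_ntp_tolerance(freq_error_ppm, limit_ppm=NTP_FREQ_TOLERANCE_PPM):
    """True if the required frequency correction fits inside the limit."""
    return abs(freq_error_ppm) <= limit_ppm


print(within_ntp_tolerance(-34.754))   # pll frequency on the affected box
print(within_ntp_tolerance(786.0))     # approximate calibration delta
```

If the real error were as large as the calibration delta, the correction needed would sit outside what NTPd is allowed to apply.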

> 500ppm is 0.05% of a frequency drift which is awfully small -
> thermal effects alone can cause such differences so it should
> not be anything out of the ordinary for ntpd.

Practically I've not seen boxes that vary that much. I've seen very poor
systems whose crystals are off by ~280ppm, but those don't vary that
much over time.


> > Part of the issue is that if the drift value changes in
> > between boots, NTPd can take a while to settle down on the
> > right freq. I suspect that's whats happening here, and should
> > the box be left alone for a few hours (maybe overnight) NTPd
> > will find the new drift correction the issue will go away.
>
> If the default poll interval of 64 seconds is used then it can
> take that much time - so i'd sugges to decrease that to below 10
> seconds.

Indeed. Shortening the maxpoll value in the ntp.conf greatly improves
how fast and how close the client will sync to the server, but take
caution, as that can cause undue load on public time servers.

thanks
-john

Jesper Krogh

Mar 1, 2009, 8:52:13 AM
to john stultz, Linus Torvalds, Linux Kernel Mailing List, Thomas Gleixner

That wasn't true.. I got some real Sunday testing done today. A fresh
2.6.28.7 has the same problem with a load of 0.00 0.00 0.00

2.6.27.19 doesn't have problems keeping time.

--
Jesper

Jesper Krogh

Mar 1, 2009, 10:05:31 AM
to Linus Torvalds, john stultz, Linux Kernel Mailing List
Linus Torvalds wrote:
>
> On Thu, 26 Feb 2009, Jesper Krogh wrote:
>>> Also mind sending the full dmesg for both kernels?
>> http://krogh.cc/~jesper/dmesg-2.6.29-rc6.txt
>> http://krogh.cc/~jesper/dmesg-2.6.26.8.txt
>
> Try changing
>
> #define QUICK_PIT_MS 15
>
> in arch/x86/kernel/tsc.c into something bigger. Let's say just doubling
> it to 30. Does that change anything?

It seems to "slow down" the process (time from bootup to first clock
reset).

Mar 1 15:38:41 quad01 ntpd[4603]: synchronized to LOCAL(0), stratum 13
Mar 1 15:38:41 quad01 ntpd[4603]: kernel time sync status change 0001
Mar 1 15:39:47 quad01 ntpd[4603]: synchronized to 10.194.133.13, stratum 4
Mar 1 15:43:02 quad01 ntpd[4603]: synchronized to 10.194.133.12, stratum 4
Mar 1 15:53:41 quad01 ntpd[4603]: time reset -0.352221 s
Mar 1 15:57:18 quad01 ntpd[4603]: synchronized to LOCAL(0), stratum 13
Mar 1 15:58:23 quad01 ntpd[4603]: synchronized to 10.194.133.13, stratum 4
jk@quad01:~$ w
16:03:29 up 28 min, 2 users, load average: 0.04, 0.01, 0.00

--
Jesper

Jesper Krogh

Mar 1, 2009, 10:09:24 AM
to Linus Torvalds, Linux Kernel Mailing List
Jesper Krogh wrote:
> The "current_clocsource" is the same on both systems.
>
> $ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> tsc

What selects the "current_clocksource"? I tried to boot one of the
kernels that have the problem on another piece of hardware and on that
system it ended up defaulting to "acpi_pm" instead of "tsc".

http://krogh.cc/~jesper/dmesg-2.6.28.7.txt

"acpi_pm" seems to be reliable all the time.

Sitsofe Wheeler

Mar 1, 2009, 10:44:55 AM3/1/09
to Jesper Krogh, Linus Torvalds, Linux Kernel Mailing List
On Sun, Mar 01, 2009 at 04:09:03PM +0100, Jesper Krogh wrote:
> Jesper Krogh wrote:
> >The "current_clocsource" is the same on both systems.
> >
> >$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
> >tsc
>
> What selects the "current_clocksource"? I tried to boot one of the
> kernels hat have the problem on another piece of hardware and on that
> system it ended up defaulting to "acpi_pm" instead of "tsc".

I believe different clock sources have different priorities based on
their resolution and behaviour. Clock sources that "go bad" because of
hardware interactions are hopefully detected and subsequent "best" clock
sources are then tried.
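
A minimal sketch for checking which clocksource a box ended up on and which
alternatives the kernel registered. It assumes the sysfs directory from the
cat command quoted above, and that an available_clocksource file sits next
to current_clocksource in that directory; read_clocksources is a name made
up for this example.

```python
from pathlib import Path

# Directory taken from the cat command quoted in this thread.
SYSFS_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=SYSFS_DIR):
    """Return (current, available) clocksource names.

    Returns (None, []) when the files are missing, e.g. on a non-Linux box.
    """
    base = Path(base)
    current_file = base / "current_clocksource"
    available_file = base / "available_clocksource"
    current = current_file.read_text().strip() if current_file.exists() else None
    available = available_file.read_text().split() if available_file.exists() else []
    return current, available
```

Calling read_clocksources() with no argument reads the live system; passing
a directory makes it easy to try against saved copies of the two files.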

There was a nice treatment of different clocksources in this
kernelnewbies thread:
http://www.mail-archive.com/kernel...@nl.linux.org/msg05164.html .

--
Sitsofe | http://sucs.org/~sits/

Jesper Krogh

Mar 1, 2009, 3:13:57 PM3/1/09
to john stultz, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
john stultz wrote:
>> Working (clocksource=tsc) 2.6.26.8
>> jk@quad03:~$ ntpdc -c kerninfo
>> pll offset: 0.003208 s
>> pll frequency: -25.070 ppm
> [snip]
>> Non-working (clocksource=tsc) 2.6.29-rc6
>> jk@quad12:~$ ntpdc -c kerninfo
>> pll offset: 0 s
>> pll frequency: -34.754 ppm
>
>
> Ok, so it seems ntp hasn't really had a chance to settle down, its only
> made a 10ppm adjustment so far. NTPd will stop corrections at ~
> +/-500ppm, so you're not at that bound yet, where things would be really
> broken.

But it should settle within a "reasonable" period of time (not hours)?

> If the affected kernel isn't resetting in the logs anymore, I'd be
> interested in what the new ppm value is.

It keeps resetting after 7 hours. Is there more information I can
provide?

Jesper

--
Jesper

Jesper Krogh

Mar 2, 2009, 4:54:10 AM3/2/09
to john stultz, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
john stultz wrote:
> Ok, so it seems ntp hasn't really had a chance to settle down, its only
> made a 10ppm adjustment so far. NTPd will stop corrections at ~
> +/-500ppm, so you're not at that bound yet, where things would be really
> broken.
>
> If the affected kernel isn't resetting in the logs anymore, I'd be
> interested in what the new ppm value is.

After 20 hours it's still resetting.
Mar 2 10:43:24 quad12 ntpd[4416]: synchronized to 10.194.133.12, stratum 4
Mar 2 10:50:37 quad12 ntpd[4416]: time reset -1.103654 s
jk@quad12:~$ uptime
10:51:36 up 20:46, 1 user, load average: 0.00, 0.00, 0.00

And it hasn't shifted clocksource either.

jk@quad12:~$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
tsc

--
Jesper

john stultz

Mar 2, 2009, 4:28:24 PM3/2/09
to Jesper Krogh, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
On Mon, 2009-03-02 at 10:53 +0100, Jesper Krogh wrote:
> john stultz wrote:
> > Ok, so it seems ntp hasn't really had a chance to settle down, its only
> > made a 10ppm adjustment so far. NTPd will stop corrections at ~
> > +/-500ppm, so you're not at that bound yet, where things would be really
> > broken.
> >
> > If the affected kernel isn't resetting in the logs anymore, I'd be
> > interested in what the new ppm value is.
>
> After 20 hours.. its still resetting.
> Mar 2 10:43:24 quad12 ntpd[4416]: synchronized to 10.194.133.12, stratum 4
> Mar 2 10:50:37 quad12 ntpd[4416]: time reset -1.103654 s

So what's the "ntpdc -c kerninfo" output now?

thanks
-john

Jesper Krogh

Mar 3, 2009, 1:04:28 AM3/3/09
to john stultz, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
john stultz wrote:
> On Mon, 2009-03-02 at 10:53 +0100, Jesper Krogh wrote:
>> john stultz wrote:
>>> Ok, so it seems ntp hasn't really had a chance to settle down, its only
>>> made a 10ppm adjustment so far. NTPd will stop corrections at ~
>>> +/-500ppm, so you're not at that bound yet, where things would be really
>>> broken.
>>>
>>> If the affected kernel isn't resetting in the logs anymore, I'd be
>>> interested in what the new ppm value is.
>> After 20 hours.. its still resetting.
>> Mar 2 10:43:24 quad12 ntpd[4416]: synchronized to 10.194.133.12, stratum 4
>> Mar 2 10:50:37 quad12 ntpd[4416]: time reset -1.103654 s
>
> So what's the "ntpdc -c kerninfo" output now?

Mar 3 06:41:10 quad12 ntpd[4416]: time reset -0.813957 s
Mar 3 06:45:20 quad12 ntpd[4416]: synchronized to LOCAL(0), stratum 13
Mar 3 06:45:36 quad12 ntpd[4416]: synchronized to 10.194.133.12, stratum 4
Mar 3 06:51:57 quad12 ntpd[4416]: synchronized to 10.194.133.13, stratum 4
Mar 3 07:00:29 quad12 ntpd[4416]: time reset -0.783390 s


jk@quad12:~$ ntpdc -c kerninfo
pll offset: 0 s
pll frequency: -28.691 ppm
maximum error: 1.0433 s
estimated error: 0 s
status: 0001 pll
pll time constant: 4
precision: 1e-06 s
frequency tolerance: 500 ppm

jk@quad12:~$ w
07:03:17 up 1 day, 16:59, 1 user, load average: 0.00, 0.00, 0.00

john stultz

Mar 3, 2009, 3:02:01 PM3/3/09
to Jesper Krogh, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
On Tue, 2009-03-03 at 07:04 +0100, Jesper Krogh wrote:
> john stultz wrote:
> > On Mon, 2009-03-02 at 10:53 +0100, Jesper Krogh wrote:
> >> john stultz wrote:
> >>> Ok, so it seems ntp hasn't really had a chance to settle down, its only
> >>> made a 10ppm adjustment so far. NTPd will stop corrections at ~
> >>> +/-500ppm, so you're not at that bound yet, where things would be really
> >>> broken.
> >>>
> >>> If the affected kernel isn't resetting in the logs anymore, I'd be
> >>> interested in what the new ppm value is.
> >> After 20 hours.. its still resetting.
> >> Mar 2 10:43:24 quad12 ntpd[4416]: synchronized to 10.194.133.12, stratum 4
> >> Mar 2 10:50:37 quad12 ntpd[4416]: time reset -1.103654 s
> >
> > So what's the "ntpdc -c kerninfo" output now?
>
> Mar 3 06:41:10 quad12 ntpd[4416]: time reset -0.813957 s
> Mar 3 06:45:20 quad12 ntpd[4416]: synchronized to LOCAL(0), stratum 13
> Mar 3 06:45:36 quad12 ntpd[4416]: synchronized to 10.194.133.12, stratum 4
> Mar 3 06:51:57 quad12 ntpd[4416]: synchronized to 10.194.133.13, stratum 4
> Mar 3 07:00:29 quad12 ntpd[4416]: time reset -0.783390 s
> jk@quad12:~$ ntpdc -c kerninfo
> pll offset: 0 s
> pll frequency: -28.691 ppm


This is baffling. You've only gone from -34.754ppm to -28.691ppm in over
a day? And you're still not syncing? If the calibration was so bad that
NTP couldn't sync, I'd expect the freq value to hit +/-500ppm before it
gave up. This just doesn't follow my expectations.

Could you provide:
/usr/sbin/ntpdc -c version

Do you see the same behavior if you drop all but one server (including
the local clock: 127.127.1.0)?

You might also add "minpoll 4 maxpoll 4" to the server line to speed up
testing.

Actually, if you could, I'd be interested if you could send your
ntp.conf

thanks
-john

Jesper Krogh

Mar 3, 2009, 3:20:03 PM3/3/09
to john stultz, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown

It's resetting. Without deep knowledge of ntp, doesn't that mean
"start over again"? I believe it hits +/-500ppm.

> Could you provide:
> /usr/sbin/ntpdc -c version

$ ntpdc -c version
ntpdc 4.2...@1.1520-o Tue Jan 6 15:51:00 UTC 2009 (1)

> Do you see the same behavior if you drop all but one server (including
> the local clock: 127.127.1.0)?
>
> You might also add "minpoll 4 maxpoll 4" to the server line to speed up
> testing.

Will try those options while debugging.

> Actually, if you could, I'd be interested if you could send your
> ntp.conf

http://krogh.cc/~jesper/ntp.conf

But this seems to be a "regression", since 2.6.27.19 doesn't misbehave.
Same NTP, same configuration, same hardware; the only change is the kernel
version. Or am I missing some parameter here?

Would it make sense to try to bisect it?

Jesper

--
Jesper

Jesper Krogh

Mar 3, 2009, 3:40:08 PM3/3/09
to john stultz, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
john stultz wrote:
> Do you see the same behavior if you drop all but one server (including
> the local clock: 127.127.1.0)?

Yes.
Mar 3 21:20:59 quad12 ntpd[2435]: ntpd 4.2...@1.1520-o Tue Jan 6 15:50:55 UTC 2009 (1)
Mar 3 21:20:59 quad12 ntpd[2436]: precision = 1.000 usec
Mar 3 21:20:59 quad12 ntpd[2436]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
Mar 3 21:20:59 quad12 ntpd[2436]: Listening on interface #1 wildcard, ::#123 Disabled
Mar 3 21:20:59 quad12 ntpd[2436]: Listening on interface #2 lo, ::1#123 Enabled
Mar 3 21:20:59 quad12 ntpd[2436]: Listening on interface #3 bond0, fe80::21e:68ff:fe57:8169#123 Enabled
Mar 3 21:20:59 quad12 ntpd[2436]: Listening on interface #4 lo, 127.0.0.1#123 Enabled
Mar 3 21:20:59 quad12 ntpd[2436]: Listening on interface #5 bond0, 10.194.132.91#123 Enabled
Mar 3 21:20:59 quad12 ntpd[2436]: kernel time sync status 0040
Mar 3 21:20:59 quad12 ntpd[2436]: frequency initialized -29.286 PPM from /var/lib/ntp/ntp.drift
Mar 3 21:21:58 quad12 ntpd[2436]: synchronized to 10.194.133.12, stratum 4
Mar 3 21:21:58 quad12 ntpd[2436]: time reset -6.148275 s
Mar 3 21:21:58 quad12 ntpd[2436]: kernel time sync status change 0001
Mar 3 21:25:01 quad12 ntpd[2436]: synchronized to 10.194.133.12, stratum 4
Mar 3 21:37:03 quad12 ntpd[2436]: time reset -0.664351 s

Only one server and the minpoll 4 maxpoll 4 options to the server line.

--
Jesper

john stultz

Mar 3, 2009, 5:24:45 PM3/3/09
to Jesper Krogh, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown

Well, it may still need a few hours to settle. :) Again, those time
resets are seen when NTPd doesn't have a good drift ppm at startup, and
it has to find it.

thanks
-john

john stultz

Mar 3, 2009, 5:25:04 PM3/3/09
to Jesper Krogh, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown

No, the "time reset" message means that when the offset is larger
than .125sec (the slew boundary), NTPd has corrected it by calling
settimeofday instead of slewing the clock.

Here's some background about how NTP and the kernel interact:
Every time NTPd calls adjtimex(), it provides the current offset from
the tracked ntp server. The kernel takes this offset and applies a
temporary correction factor to the clocksource frequency to converge
that offset. It also takes the provided offset, dampens it, and then
uses the result to adjust the frequency value. Once the freq value hits
the max adjustment value (+/- 500ppm), then NTP will start throwing
error messages and give up.

The part that is so odd with your data, is that the freq value isn't
changing very much. After a time reset, I'd expect to see adjustments in
the 100us, then multiple ms, and only once we get above 100ms to see
another time reset. All the while, these adjustment values should be
tweaking the freq value, causing the clocks to converge.

The case I can think of that could cause this, is if the drift is
somehow jumping above the slew boundary before NTPd actually makes any
adjtimex calls, so we end up with minimal correction to the freq value,
but that still doesn't completely vibe with the data.
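
To make the two limits in that description concrete, here is a small Python
sketch. It only encodes the numbers given above (a .125 s slew boundary and
a +/-500 ppm frequency limit); it is not how ntpd or the kernel implement
this, and correction_mode and clamp_freq are names invented for the example.

```python
SLEW_BOUNDARY_S = 0.125   # slew boundary as stated above
MAX_FREQ_PPM = 500.0      # max frequency adjustment as stated above


def correction_mode(offset_s):
    """Return 'step' (a "time reset") or 'slew' for a given clock offset."""
    return "step" if abs(offset_s) > SLEW_BOUNDARY_S else "slew"


def clamp_freq(freq_ppm):
    """Clamp a requested frequency correction to the +/-500 ppm limit."""
    return max(-MAX_FREQ_PPM, min(MAX_FREQ_PPM, freq_ppm))


# -0.352221 s is one of the "time reset" values logged earlier in the thread;
# 0.003208 s is the pll offset reported for the working 2.6.26.8 box.
for offset in (-0.352221, 0.003208):
    print(offset, correction_mode(offset))

# A hypothetical -600 ppm requirement cannot be met by frequency alone.
print(clamp_freq(-600.0))
```

Under this simplified model, any box whose natural drift is larger than the
frequency limit keeps accumulating offset until the next step.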


> > Could you provide:
> > /usr/sbin/ntpdc -c version
>
> $ ntpdc -c version
> ntpdc 4.2...@1.1520-o Tue Jan 6 15:51:00 UTC 2009 (1)
>
> > Do you see the same behavior if you drop all but one server (including
> > the local clock: 127.127.1.0)?
> >
> > You might also add "minpoll 4 maxpoll 4" to the server line to speed up
> > testing.
>
> Will try those option while debugging.
>
> > Actually, if you could, I'd be interested if you could send your
> > ntp.conf
>
> http://krogh.cc/~jesper/ntp.conf

Cool, I see you're collecting stats already. Depending on the results of
the tests above I may want to check those out as well.

> But this seems to be a "regression". Since 2.6.27.19 doesn't misbehave.
> Same NTP, same configuration, same hardware. only change is the kernel
> version. Or am I missing some parameter here?
>
> Would it make sense to try to bisect it?

Well, I suspect you'll just bisect it to the fast-pit TSC calibration
causing a different correction freq to be needed for synchronization.
The odd part is that the userland NTPd isn't behaving as I'd expect if
the TSC calibration was really so bad that NTP couldn't handle it.

Bisection may be something worth trying just to verify or disprove that
theory, so if you have the time, it would be interesting to see. But if
the theory is true then we're back to the same spot.

I guess something to test my idea above (that the drift is bad enough
that NTPd isn't making slew adjustments via adjtimex offset) is to
remove NTPd from the init.d startup.

Then after rebooting (into 2.6.29), run the attached python script for
10 minutes or so to get an idea of the ppm drift. Then repeat with
2.6.26.

To run:
./drift-test.py <ntp server>

It will give some wild ppm numbers, but after a few minutes it should
settle down to the "natural drift" of the system.

thanks
-john

drift-test.py

Jesper Krogh

Mar 4, 2009, 12:36:38 AM3/4/09
to john stultz, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown

With one server and the minpoll/maxpoll options, this one "settled" after a
bit more than 3 hours:
Mar 4 01:14:05 quad12 ntpd[2436]: time reset -0.381826 s
Mar 4 01:15:39 quad12 ntpd[2436]: synchronized to 10.194.133.12, stratum 4
jk@quad12:~$ uptime
06:35:40 up 15:55, 1 user, load average: 0.00, 0.00, 0.00
jk@quad12:~$ ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*bioinf.nzcorp.n 10.192.96.19     4 u    8   16  377    0.098  -80.184   0.673
jk@quad12:~$ ntpdc -c kerinfo
***Command `kerinfo' unknown

jk@quad12:~$ ntpdc -c kerninfo
pll offset: -0.06619 s
pll frequency: -500.000 ppm
maximum error: 0.130081 s
estimated error: 0.001201 s
status: 0001 pll
pll time constant: 4
precision: 1e-06 s
frequency tolerance: 500 ppm

--
Jesper

Jesper Krogh

Mar 4, 2009, 10:31:31 AM3/4/09
to john stultz, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
john stultz wrote:
> I guess something to test my idea above (that the drift is bad enough
> that NTPd isn't making slew adjustments via adjtimex offset) is to
> remove NTPd from the init.d startup.
>
> Then after rebooting (into 2.6.29), run the attached python script for
> 10 minutes or so to get an idea of the ppm drift. Then repeat with
> 2.6.26.
>
> To run:
> ./drift-test.py <ntp server>

>
> It will give some wild ppm numbers, but after a few minutes it should
> settle down to the "natural drift" of the system.

OK. I removed ntpd from the system. Here is the output from the "non-working"
2.6.28.7 kernel:
04 Mar 14:59:16 offset: -0.139829 drift: -656.0 ppm
04 Mar 15:00:16 offset: -0.175233 drift: -591.147540984 ppm
04 Mar 15:01:16 offset: -0.210637 drift: -590.611570248 ppm
04 Mar 15:02:16 offset: -0.246033 drift: -590.386740331 ppm
04 Mar 15:03:17 offset: -0.28144 drift: -587.880165289 ppm
04 Mar 15:04:17 offset: -0.31684 drift: -588.301324503 ppm
04 Mar 15:05:17 offset: -0.352247 drift: -588.602209945 ppm
04 Mar 15:06:17 offset: -0.387649 drift: -588.805687204 ppm
04 Mar 15:07:17 offset: -0.423046 drift: -588.94813278 ppm
04 Mar 15:08:17 offset: -0.458451 drift: -589.073800738 ppm
04 Mar 15:09:18 offset: -0.493856 drift: -588.1973466 ppm
04 Mar 15:10:18 offset: -0.529265 drift: -588.374057315 ppm
04 Mar 15:11:18 offset: -0.564661 drift: -588.503457815 ppm
04 Mar 15:12:18 offset: -0.600063 drift: -588.620689655 ppm
04 Mar 15:13:18 offset: -0.635458 drift: -588.712930012 ppm
04 Mar 15:14:18 offset: -0.040699 drift: 109.052048726 ppm
04 Mar 15:15:18 offset: -0.076098 drift: 65.4984423676 ppm
04 Mar 15:16:18 offset: -0.111495 drift: 27.0557184751 ppm
04 Mar 15:17:18 offset: -0.146885 drift: -7.12096029548 ppm
04 Mar 15:18:19 offset: -0.182285 drift: -37.6853146853 ppm
04 Mar 15:19:19 offset: -0.217688 drift: -65.2117940199 ppm
04 Mar 15:20:19 offset: -0.253085 drift: -90.1202531646 ppm
04 Mar 15:21:19 offset: -0.288479 drift: -112.768882175 ppm
04 Mar 15:22:19 offset: -0.323866 drift: -133.448699422 ppm
04 Mar 15:23:19 offset: -0.359259 drift: -152.414127424 ppm
04 Mar 15:24:20 offset: -0.394648 drift: -169.750830565 ppm
04 Mar 15:25:20 offset: -0.430047 drift: -185.861980831 ppm
04 Mar 15:26:20 offset: -0.46544 drift: -200.779692308 ppm
04 Mar 15:27:20 offset: -0.500835 drift: -214.63620178 ppm
04 Mar 15:28:20 offset: -0.536221 drift: -227.534670487 ppm
04 Mar 15:29:20 offset: -0.571605 drift: -239.574515235 ppm
04 Mar 15:30:21 offset: -0.606992 drift: -250.706859593 ppm
04 Mar 15:31:21 offset: -0.64241 drift: -261.286085151 ppm
04 Mar 15:32:21 offset: -0.677792 drift: -271.20795569 ppm
04 Mar 15:33:21 offset: -0.713187 drift: -280.554252199 ppm
04 Mar 15:34:21 offset: -0.040744 drift: 46.7374169041 ppm
04 Mar 15:35:21 offset: -0.076145 drift: 29.0987996307 ppm
04 Mar 15:36:21 offset: -0.111551 drift: 12.4088050314 ppm
04 Mar 15:37:21 offset: -0.146952 drift: -3.40288713911 ppm

And from working 2.6.27.19 kernel.

jk@quad12:~$ python drift-test.py 10.192.96.19
04 Mar 16:17:23 offset: -0.006929 drift: -62.0 ppm
04 Mar 16:18:24 offset: -0.010252 drift: -54.5967741935 ppm
04 Mar 16:19:24 offset: -0.013574 drift: -54.9754098361 ppm
04 Mar 16:20:24 offset: -0.016897 drift: -55.1098901099 ppm
04 Mar 16:21:24 offset: -0.020233 drift: -55.2314049587 ppm
04 Mar 16:22:24 offset: -0.023566 drift: -55.2947019868 ppm
04 Mar 16:23:24 offset: -0.026895 drift: -55.3259668508 ppm
04 Mar 16:24:24 offset: -0.030217 drift: -55.3317535545 ppm
04 Mar 16:25:24 offset: -0.033539 drift: -55.3360995851 ppm
04 Mar 16:26:24 offset: -0.036865 drift: -55.3468634686 ppm
04 Mar 16:27:25 offset: -0.038266 drift: -52.0713101161 ppm
04 Mar 16:28:25 offset: -0.039747 drift: -49.592760181 ppm
04 Mar 16:29:25 offset: -0.041331 drift: -47.6680497925 ppm

Jesper Krogh

Mar 4, 2009, 1:37:25 PM3/4/09
to john stultz, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
Jesper Krogh wrote:
> john stultz wrote:
>> I guess something to test my idea above (that the drift is bad enough
>> that NTPd isn't making slew adjustments via adjtimex offset) is to
>> remove NTPd from the init.d startup.
>>
>> Then after rebooting (into 2.6.29), run the attached python script for
>> 10 minutes or so to get an idea of the ppm drift. Then repeat with
>> 2.6.26.
>>
>> To run: ./drift-test.py <ntp server>
>>
>> It will give some wild ppm numbers, but after a few minutes it should
>> settle down to the "natural drift" of the system.
>
> Ok. I removed ntpd from the system... heres is from "non-working

Updated. I think I had NTPd running in the former "non-working" test. I
just tried to reproduce the numbers, and they look like this
(reproducible on 2.6.29-rc6).

jk@quad12:~$ python drift-test.py 10.192.96.19

04 Mar 19:27:10 offset: -0.157696 drift: -693.0 ppm
04 Mar 19:28:10 offset: -0.195134 drift: -625.098360656 ppm
04 Mar 19:29:10 offset: -0.232579 drift: -624.595041322 ppm
04 Mar 19:30:10 offset: -0.270021 drift: -624.408839779 ppm
04 Mar 19:31:11 offset: -0.307461 drift: -621.727272727 ppm
04 Mar 19:32:11 offset: -0.344903 drift: -622.185430464 ppm
04 Mar 19:33:11 offset: -0.382345 drift: -622.491712707 ppm
04 Mar 19:34:11 offset: -0.419794 drift: -622.727488152 ppm
04 Mar 19:35:11 offset: -0.457239 drift: -622.89626556 ppm

Still the same.
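
As a cross-check on the drift column, the drift can also be estimated
directly from the (time, offset) pairs above as a least-squares slope. This
is only a sketch: it is not the attached drift-test.py (whose source is not
shown here), the timestamps have one-second resolution, and drift_ppm is a
name made up for this example.

```python
from datetime import datetime

# (timestamp, offset in seconds) copied from the 2.6.29-rc6 run above.
SAMPLES = [
    ("19:27:10", -0.157696),
    ("19:28:10", -0.195134),
    ("19:29:10", -0.232579),
    ("19:30:10", -0.270021),
    ("19:31:11", -0.307461),
    ("19:32:11", -0.344903),
    ("19:33:11", -0.382345),
    ("19:34:11", -0.419794),
    ("19:35:11", -0.457239),
]


def drift_ppm(samples):
    """Least-squares slope of offset versus elapsed time, in ppm."""
    t0 = datetime.strptime(samples[0][0], "%H:%M:%S")
    xs = [(datetime.strptime(t, "%H:%M:%S") - t0).total_seconds()
          for t, _ in samples]
    ys = [offset for _, offset in samples]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx * 1e6  # seconds per second -> parts per million


print("estimated drift: %.1f ppm" % drift_ppm(SAMPLES))
```

The slope comes out at about -622 ppm, in line with the value the drift
column settles to.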

John Stultz

Mar 4, 2009, 1:59:48 PM3/4/09
to Jesper Krogh, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
On Wed, 2009-03-04 at 19:36 +0100, Jesper Krogh wrote:
> Jesper Krogh wrote:
> > john stultz wrote:
> >> I guess something to test my idea above (that the drift is bad enough
> >> that NTPd isn't making slew adjustments via adjtimex offset) is to
> >> remove NTPd from the init.d startup.
> >>
> >> Then after rebooting (into 2.6.29), run the attached python script for
> >> 10 minutes or so to get an idea of the ppm drift. Then repeat with
> >> 2.6.26.
> >>
> >> To run: ./drift-test.py <ntp server>
> >>
> >> It will give some wild ppm numbers, but after a few minutes it should
> >> settle down to the "natural drift" of the system.
> >
> > Ok. I removed ntpd from the system... heres is from "non-working
>
> Updated. I think I has NTPd running in the former "non-working" test. I
> just tried to reproduce the numbers, and they look like this
> (reproducible on 2.6.29-rc6).

Yea, the last numbers did look odd :)

> jk@quad12:~$ python drift-test.py 10.192.96.19
> 04 Mar 19:27:10 offset: -0.157696 drift: -693.0 ppm
> 04 Mar 19:28:10 offset: -0.195134 drift: -625.098360656 ppm
> 04 Mar 19:29:10 offset: -0.232579 drift: -624.595041322 ppm
> 04 Mar 19:30:10 offset: -0.270021 drift: -624.408839779 ppm
> 04 Mar 19:31:11 offset: -0.307461 drift: -621.727272727 ppm
> 04 Mar 19:32:11 offset: -0.344903 drift: -622.185430464 ppm
> 04 Mar 19:33:11 offset: -0.382345 drift: -622.491712707 ppm
> 04 Mar 19:34:11 offset: -0.419794 drift: -622.727488152 ppm
> 04 Mar 19:35:11 offset: -0.457239 drift: -622.89626556 ppm


Yea, so from this and the settled ntpdc -c kerninfo data before, we can
see that the drift is further out than the 500ppm NTP can handle.
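
A back-of-the-envelope Python sketch of what "further out" means in
practice. It assumes the frequency correction is pinned at the 500 ppm
limit and ignores the temporary offset correction the kernel also applies,
so it is a simplification and not a model of ntpd's real stepping logic.
seconds_until_boundary is a made-up name, and the .125 s figure is the slew
boundary mentioned earlier in the thread.

```python
def seconds_until_boundary(natural_drift_ppm, max_freq_ppm=500.0,
                           slew_boundary_s=0.125):
    """Time for the uncorrectable part of the drift to grow to the boundary.

    Returns None when the drift fits inside the frequency limit.
    """
    residual_ppm = abs(natural_drift_ppm) - max_freq_ppm
    if residual_ppm <= 0:
        return None
    return slew_boundary_s / (residual_ppm * 1e-6)


# -622.9 ppm is the settled drift from the 2.6.29-rc6 run above;
# -55.3 ppm is the settled drift from the working 2.6.27.19 run.
for drift in (-622.9, -55.3):
    print(drift, seconds_until_boundary(drift))
```

For the 2.6.29-rc6 numbers the leftover drift is roughly 123 ppm, which
under these assumptions crosses the boundary again after about 17 minutes;
for the 2.6.27.19 numbers there is nothing left over.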

So with that at least confirmed, we can focus back on to the fast-pit
tsc calibration code.

Ingo, Thomas: I'm missing a bit of the context to that patch. Other than
just speeding up boot times, was there other rationale for moving away
from the ACPI PM timer based calibration?

Could we maybe add a quick test that the pit reads actually take the
assumed 2us max? Doing this maybe via the HPET/ACPI PM?

thanks
-john

john stultz

Mar 4, 2009, 9:39:38 PM3/4/09
to Jesper Krogh, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown
On Wed, 2009-03-04 at 10:57 -0800, John Stultz wrote:
> On Wed, 2009-03-04 at 19:36 +0100, Jesper Krogh wrote:
> > jk@quad12:~$ python drift-test.py 10.192.96.19
> > 04 Mar 19:27:10 offset: -0.157696 drift: -693.0 ppm
> > 04 Mar 19:28:10 offset: -0.195134 drift: -625.098360656 ppm
> > 04 Mar 19:29:10 offset: -0.232579 drift: -624.595041322 ppm
> > 04 Mar 19:30:10 offset: -0.270021 drift: -624.408839779 ppm
> > 04 Mar 19:31:11 offset: -0.307461 drift: -621.727272727 ppm
> > 04 Mar 19:32:11 offset: -0.344903 drift: -622.185430464 ppm
> > 04 Mar 19:33:11 offset: -0.382345 drift: -622.491712707 ppm
> > 04 Mar 19:34:11 offset: -0.419794 drift: -622.727488152 ppm
> > 04 Mar 19:35:11 offset: -0.457239 drift: -622.89626556 ppm
>
>
> Yea, so from this and the settled ntpdc -c kerninfo data before, we can
> see that the drift is further out then the 500ppm NTP can handle.
>
> So with that at least confirmed, we can focus back on to the fast-pit
> tsc calibration code.
>
> Ingo, Thomas: I'm missing a bit of the context to that patch, other then
> just speeding up boot times, was there other rational for moving away
> from the ACPI PM timer based calibration?
>
> Could we maybe add a quick test that the pit reads actually take the
> assumed 2us max? Doing this maybe via the HPET/ACPI PM?

Hey Jesper,

Here's a very-hackish patch to see if the approach I'm considering
might fix the issue you're hitting. Could you apply it, boot the kernel
a few times and send me the following segments of the dmesg for each of
those boots (the example below is from my test box)?

tsc delta: 44418024
ref_freq: 3000100 pit_freq: 3000384
TSC: Fast PIT calibration matches PMTIMER.
TSC: PIT calibration matches PMTIMER. 1 loops
Detected 3000.045 MHz processor.

I'm trying to see how regular the mis-calculation is, as well as see how
well the alternate calibration method does to handle this on your
hardware.

It's likely the fast-pit calibration can be better integrated with the
other calibration methods, so this probably isn't anything close to what
the actual fix will look like.

Ingo, Thomas: On the hardware I'm testing the fast-pit calibration only
triggers probably 80-90% of the time. About 10-20% of the time, the
initial check to pit_expect_msb(0xff) fails (count=0), so we may need to
look more at this approach.
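
For reference, the pit_freq figure in the example dmesg above can be
reproduced from the printed tsc delta with the formula from the comment in
quick_pit_calibrate(), kHz = ((t2 - t1) * PIT_TICK_RATE) / (QPI * 256 * 1000).
This Python sketch assumes PIT_TICK_RATE is 1193182 Hz and that
QUICK_PIT_ITERATIONS is derived from QUICK_PIT_MS (15, as quoted earlier)
the way shown below; both are taken from memory of the kernel source of
that time rather than from this thread.

```python
PIT_TICK_RATE = 1193182   # PIT input clock in Hz (assumed kernel constant)
QUICK_PIT_MS = 15         # from the #define quoted earlier in the thread

# Number of PIT MSB decrements (256 ticks each) that fit in QUICK_PIT_MS.
# Assumed derivation; integer division mirrors C integer arithmetic.
QUICK_PIT_ITERATIONS = QUICK_PIT_MS * PIT_TICK_RATE // 1000 // 256


def pit_khz(tsc_delta):
    """TSC frequency in kHz from the cycles counted over the PIT window."""
    return tsc_delta * PIT_TICK_RATE // (QUICK_PIT_ITERATIONS * 256 * 1000)


def window_ms():
    """Length of the measurement window in milliseconds."""
    return QUICK_PIT_ITERATIONS * 256 / PIT_TICK_RATE * 1000


print(QUICK_PIT_ITERATIONS)   # 69
print(pit_khz(44418024))      # 3000384, matching pit_freq in the example
print(round(window_ms(), 2))  # 14.8
```

With such a short window, a small error in when the PIT MSB change is
observed becomes a large relative error in the computed frequency.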

john stultz

Mar 4, 2009, 9:52:30 PM3/4/09
to Jesper Krogh, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown

Err. Sorry, hit send before I included the patch.

-john

Not for inclusion.

Signed-off-by: John Stultz <john...@us.ibm.com>

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 599e581..2e16d30 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -317,15 +317,17 @@ static unsigned long quick_pit_calibrate(void)

 	if (pit_expect_msb(0xff)) {
 		int i;
-		u64 t1, t2, delta;
+		u64 t1, t2, delta, ref1, ref2;
+		u64 ref_freq = 0, pit_freq = 0;
+		int hpet = is_hpet_enabled();
 		unsigned char expect = 0xfe;

-		t1 = get_cycles();
+		t1 = tsc_read_refs(&ref1, hpet);
 		for (i = 0; i < QUICK_PIT_ITERATIONS; i++, expect--) {
 			if (!pit_expect_msb(expect))
 				goto failed;
 		}
-		t2 = get_cycles();
+		t2 = tsc_read_refs(&ref2, hpet);

 		/*
 		 * Make sure we can rely on the second TSC timestamp:
@@ -333,6 +335,13 @@ static unsigned long quick_pit_calibrate(void)
 		if (!pit_expect_msb(expect))
 			goto failed;

+
+		delta = (t2 - t1);
+		if (hpet)
+			ref_freq = calc_hpet_ref(delta*1000000LL, ref1, ref2);
+		else
+			ref_freq = calc_pmtimer_ref(delta*1000000LL, ref1, ref2);
+
 		/*
 		 * Ok, if we get here, then we've seen the
 		 * MSB of the PIT decrement QUICK_PIT_ITERATIONS
@@ -347,10 +356,32 @@ static unsigned long quick_pit_calibrate(void)
 		 * kHz = (t2 - t1) / (QPI * 256 / PIT_TICK_RATE) / 1000
 		 * kHz = ((t2 - t1) * PIT_TICK_RATE) / (QPI * 256 * 1000)
 		 */
-		delta = (t2 - t1)*PIT_TICK_RATE;
-		do_div(delta, QUICK_PIT_ITERATIONS*256*1000);
+		printk("tsc delta: %lld\n", t2-t1);
+
+		pit_freq = delta * PIT_TICK_RATE;
+		do_div(pit_freq, QUICK_PIT_ITERATIONS*256*1000);
+
+		printk("ref_freq: %lld pit_freq: %lld\n", ref_freq, pit_freq);
+
+		/* Check the reference deviation */
+		delta = ((u64) pit_freq) * 100;
+		do_div(delta, ref_freq);
+
+		/*
+		 * If both calibration results are inside a 10% window
+		 * then we can be sure, that the calibration
+		 * succeeded. We break out of the loop right away. We
+		 * use the reference value, as it is more precise.
+		 */
+		if (delta >= 90 && delta <= 110) {
+			printk(KERN_INFO
+			       "TSC: Fast PIT calibration matches %s.\n",
+			       hpet ? "HPET" : "PMTIMER");
+			return ref_freq;
+		}
+
 		printk("Fast TSC calibration using PIT\n");
-		return delta;
+		return pit_freq;
 	}
 failed:
 	return 0;
@@ -375,7 +406,7 @@ unsigned long native_calibrate_tsc(void)
 	local_irq_save(flags);
 	fast_calibrate = quick_pit_calibrate();
 	local_irq_restore(flags);
-	if (fast_calibrate)
+	if (0 && fast_calibrate)
 		return fast_calibrate;

 	/*
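
A small Python sketch of the deviation check in the patch above, to show
how coarse the 10% window is compared with NTP's 500 ppm limit. The
function names are made up for the example; the integer division stands in
for do_div(), and the 3000000/3001800 pair is a hypothetical 600 ppm
mismatch rather than measured data.

```python
def percent_of_ref(pit_freq, ref_freq):
    """Integer percentage, as do_div(pit_freq * 100, ref_freq) would give."""
    return pit_freq * 100 // ref_freq


def within_window(pit_freq, ref_freq, low=90, high=110):
    """The patch's check: delta >= 90 && delta <= 110."""
    return low <= percent_of_ref(pit_freq, ref_freq) <= high


def ppm_gap(pit_freq, ref_freq):
    """Relative difference between the two results, in parts per million."""
    return (pit_freq - ref_freq) / ref_freq * 1e6


# Values from the example dmesg earlier in the thread.
print(within_window(3000384, 3000100), round(ppm_gap(3000384, 3000100), 1))

# Hypothetical pair that is 600 ppm apart: still inside the 10% window.
print(within_window(3001800, 3000000), round(ppm_gap(3001800, 3000000), 1))
```

So the window works as a sanity check only; in the matching case the patch
returns ref_freq, which is what avoids carrying a few-hundred-ppm PIT error
forward.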

Ingo Molnar

Mar 5, 2009, 3:44:24 AM3/5/09
to john stultz, Jesper Krogh, Thomas Gleixner, Linus Torvalds, Linux Kernel Mailing List, Len Brown

* john stultz <john...@us.ibm.com> wrote:

> > Ingo, Thomas: On the hardware I'm testing the fast-pit
> > calibration only triggers probably 80-90% of the time. About
> > 10-20% of the time, the initial check to
> > pit_expect_msb(0xff) fails (count=0), so we may need to look
> > more at this approach.

We definitely need to improve calibration quality.

The question is - why does fast-calibration fail 10-20% of the
time on your test-system? Also, why exactly do we miscalibrate?
Could you please have a look at that?

One theory would be that the PIT readout is unreliable. Windows
does not make use of it, so it's not the most tested aspect of
the PIT. Is that what happens on your box?

Ingo

john stultz

Mar 5, 2009, 10:14:03 PM3/5/09