Linux 2.6.27-rc8

Linus Torvalds

Sep 29, 2008, 6:40:10 PM

So yet another week, another -rc. This one should be the last one: we're
certainly not running out of regressions, but at the same time, at some
point I just have to pick some point, and on the whole the regressions
don't look _too_ scary. And -rc8 obviously does fix more of them.

Most of the changes since -rc7 are pretty small, and there aren't even a
whole lot of them. The shortlog (appended) is just a couple of pages, and
the diffstat is even smaller, but since the dirstat is a dense overview,
I'll just put that here instead:

4.6% arch/m32r/kernel/
5.7% arch/m32r/
9.5% arch/mips/pci/
10.4% arch/mips/
4.2% arch/x86/kernel/
4.4% arch/x86/
26.0% arch/
3.5% drivers/usb/storage/
10.4% drivers/usb/
3.6% drivers/watchdog/
23.8% drivers/
11.5% fs/xfs/
13.5% fs/
3.7% kernel/
9.8% net/9p/
10.6% net/
5.4% scripts/kconfig/
5.9% scripts/
7.4% sound/soc/codecs/
8.4% sound/soc/
10.1% sound/

and it's actually more spread out than usual. Arch and drivers are just
half of the patch even when combined.

Give it a try,

Linus

---
Adrian Bunk (5):
m32r: remove the unused NOHIGHMEM option
m32r: don't offer CONFIG_ISA
m32r: export empty_zero_page
m32r: export __ndelay
m32r/kernel/: cleanups

Adrian Hunter (2):
UBIFS: TNC / GC race fixes
UBIFS: remove incorrect assert

Akinobu Mita (2):
[WATCHDOG] ibmasr: remove unnecessary spin_unlock()
ibmasr: remove unnecessary spin_unlock()

Alan Cox (1):
pcmcia: Fix broken abuse of dev->driver_data

Alan Stern (2):
USB: unusual_devs addition for RockChip MP3 player
USB: revert recovery from transient errors

Alex Chiang (1):
[IA64] Ski simulator doesn't need check_sal_cache_flush

Alexander Beregalov (1):
UBIFS: fix printk format warnings

Alexander Duyck (1):
netdev: simple_tx_hash shouldn't hash inside fragments

Andrea Righi (1):
x86, oprofile: BUG scheduling while atomic

Andreas Bombe (1):
usb-serial: Add Siemens EF81 to PL-2303 hack triggers

Andreas Herrmann (1):
x86: c1e_idle: don't mark TSC unstable if CPU has invariant TSC

Andrew Morton (2):
Documentation/sysctl/kernel.txt: fix softlockup_thresh description
USB: drivers/usb/musb/: disable it on SuperH

Andrew Vasquez (1):
[SCSI] qla2xxx: Defer enablement of RISC interrupts until ISP initialization completes.

Anti Sullin (1):
atmel_serial: update the powersave handler to match serial core

Atsuo Igarashi (1):
kgdb: could not write to the last of valid memory with kgdb

Aurelien Jarno (2):
[MIPS] BCM47xx: Fix build error due to missing PCI functions
[SSB] Initialise dma_mask for SSB_BUSTYPE_SSB devices

Balbir Singh (1):
mm owner: fix race between swapoff and exit

Ben Dooks (1):
[WATCHDOG] wdt285: fix sparse warnings

Boaz Harrosh (2):
[SCSI] qlogicpti: fix sg list traversal error in continuation entries
scsi: fix fall out of sg-chaining patch in qlogicpti

Borislav Petkov (1):
ide-tape: fix vendor strings

Bruno Randolf (2):
[MIPS] au1000: Fix gpio direction
[MIPS] au1000: Make sure GPIO value is zero or one

Chris Adams (1):
usb serial: ti_usb_3410_5052 obviously broken by firmware changes

Craig Shelley (1):
USB: SERIAL CP2101 add device IDs

Daisuke Nishimura (1):
memcg: check under limit at shrink_usage

David Almaroad (1):
usb: unusual devs patch for Nokia 5310 Music Xpress

David Brownell (3):
USB: ehci: fix some ehci hangs and crashes
usb gadget: fix omap_udc DMA regression
USB: fix EHCI periodic transfers

David Howells (3):
MN10300: Move asm-arm/cnt32_to_63.h to include/linux/
MN10300: Make sched_clock() report time since boot
ARM: Delete ARM's own cnt32_to_63.h

David S. Miller (2):
sparc64: Fix disappearing PCI devices on e3500.
sparc64: Fix missing devices due to PCI bridge test in of_create_pci_dev().

Eric Van Hensbergen (1):
9p: fix put_data error handling

Felipe Balbi (1):
usb: musb: fix include path

Filip Joelsson (1):
USB: Fixing Nokia 3310c in storage mode

Gaetan Carlier (1):
usb: ftdi_sio: add support for Domintell devices

Geoff Levand (1):
USB: fix hcd interrupt disabling

Greg Kroah-Hartman (1):
PCI: fix compiler warnings in pci_get_subsys()

Haavard Skinnemoen (1):
ALSA: ASoC: Fix at32-pcm build breakage with PM enabled

Henrik Rydberg (1):
Input: bcm5974 - switch back to normal mode when closing

Ingo Molnar (1):
timers: fix build error in !oneshot case

Jack Tan (1):
[MIPS] Fixe the definition of PTRS_PER_PGD

James Bottomley (1):
[SCSI] Fix hang with split requests

Jaroslav Kysela (1):
USB: ftdi_sio: Add 0x5050/0x0900 USB IDs (Papouch Quido USB 4/4)

Jason Wessel (4):
kgdb, x86, arm, mips, powerpc: ignore user space single stepping
kgdb, x86_64: gdb serial has BX and DX reversed
kgdb, x86_64: fix PS CS SS registers in gdb serial
kgdboc,tty: Fix tty polling search to use name correctly

Jay Lan (1):
[IA64] kexec fails on systems with blocks of uncached memory

Jean Delvare (2):
i2c: Fix mailing lists in two MAINTAINERS entries
ALSA: ASoC: Fix another cs4270 error path

Jeremy Katz (1):
x86: disable apm on the olpc

Joerg Roedel (2):
AMD IOMMU: set iommu sunc flag after command queuing
AMD IOMMU: protect completion wait loop with iommu lock

Jonathan Steel (1):
kexec: fix segmentation fault in kimage_add_entry

Julia Lawall (1):
9p: introduce missing kfree

Julien Brunel (1):
9p: use an IS_ERR test rather than a NULL test

Kevin Lloyd (3):
USB Storage: Sierra: Non-configurable TRU-Install
USB Serial: Sierra: Device addition & version rev
USB Serial: Sierra: Add MC8785 VID/PID

Kirill A. Shutemov (1):
smb.h: do not include linux/time.h in userspace

Kristoffer Ericson (1):
Input: jornada720_ts - fix build error ( LONG() usage )

Lachlan McIlroy (2):
[XFS] Fix extent list corruption in xfs_iext_irec_compact_full().
[XFS] Remove xfs_iext_irec_compact_full()

Liam Girdwood (1):
ALSA: ASoC: maintainers - update email address for Liam Girdwood

Linus Torvalds (2):
Fix NULL pointer dereference in proc_sys_compare
Linux 2.6.27-rc8

Luis R. Rodriguez (1):
ath9k: disable MIB interrupts to fix interrupt storm

Marc Dionne (1):
x86: prevent stale state of c1e_mask across CPU offline/online, fix

Marcel Holtmann (3):
[Bluetooth] Fix I/O errors on MacBooks with Broadcom chips
[Bluetooth] Fix wrong URB handling of btusb driver
[Bluetooth] Fix USB disconnect handling of btusb driver

Marin Mitov (1):
Documentation/DMA-mapping.txt: update for pci_dma_mapping_error() changes

Michael Kerrisk (1):
sys_paccept: disable paccept() until API design is resolved

Márton Németh (1):
cdrom: update ioctl documentation

Nick Piggin (1):
mm: tiny-shmem fix lock ordering: mmap_sem vs i_mutex

Oliver Neukum (1):
USB: update of Documentation/usb/anchors.txt

Otavio Salvador (1):
USB: serial: add ZTE CDMA Tech id to option driver

Peter Korsgaard (1):
USB: fsl_usb2_udc: fix VDBG() format string

Rakib Mullick (1):
sched: fix init_hrtick() section mismatch warning

Ralf Baechle (2):
[MIPS] IP27: Switch to dynamic interrupt routing avoding panic on error.
Swarm: Fix crash due to missing initialization

Randy Dunlap (1):
kernel-doc: allow structs whose members are all private

Ravikiran G Thirumalai (1):
x86: fix 27-rc crash on vsmp due to paravirt during module load

Richard Nauber (1):
USB: Fix the Nokia 6300 storage-mode.

Roland Dreier (1):
IPoIB: Fix crash when path record fails after path flush

Sebastian Siewior (1):
UBIFS: create the name of the background thread in every case

Senthil Balasubramanian (2):
ath9k: connectivity is lost after Group rekeying is done
ath9k: Fix IRQ nobody cared issue with ath9k

Sitsofe Wheeler (1):
PCI: Fix pcie_aspm=force

Sven Wegener (1):
i2c-dev: Return correct error code on class_create() failure

Takashi Iwai (2):
ALSA: fix locking in snd_pcm_open*() and snd_rawmidi_open*()
ALSA: remove unneeded power_mutex lock in snd_pcm_drop

Tejun Heo (7):
9p: implement proper trans module refcounting and unregistration
9p-trans_fd: fix trans_fd::p9_conn_destroy()
9p-trans_fd: clean up p9_conn_create()
9p-trans_fd: don't do fs segment mangling in p9_fd_poll()
9p-trans_fd: fix and clean up module init/exit paths
ide: note that IDE generic may prevent other drivers from attaching
sata_nv: reinstate nv_hardreset() for non generic controllers

Thomas Gleixner (6):
clockevents: prevent cpu online to interfere with nohz
x86: prevent stale state of c1e_mask across CPU offline/online
clockevents: prevent stale tick_next_period for onlining CPUs
clockevents: check broadcast device not tick device
clockevents: prevent mode mismatch on cpu online
x86: prevent C-states hang on AMD C1E enabled machines

Timur Tabi (1):
ALSA: make the CS4270 driver a new-style I2C driver

Tony Murray (1):
USB: Correct Sierra Wireless USB EVDO Modem Device ID

Uwe Kleine-König (1):
i2c-powermac: Fix section for probe and remove functions

Wim Van Sebroeck (1):
[WATCHDOG] unlocked_ioctl changes

Yasuyuki Kozakai (1):
netfilter: ip6t_{hbh,dst}: Rejects not-strict mode on rule insertion

born.int...@gmail.com (1):
wireless: zd1211rw: add device ID fix wifi dongle "trust nw-3100"

zip...@linux-m68k.org (2):
kconfig: fix silentoldconfig
kconfig: readd lost change count


da...@lang.hm

Sep 29, 2008, 7:10:10 PM

On Mon, 29 Sep 2008, Linus Torvalds wrote:

> So yet another week, another -rc. This one should be the last one: we're
> certainly not running out of regressions, but at the same time, at some
> point I just have to pick some point, and on the whole the regressions
> don't look _too_ scary. And -rc8 obviously does fix more of them.

unless there is news that I missed, the E1000 bricking bug is still out
there. that is a particularly nasty one.

David Lang

> M?rton N?meth (1):

> Uwe Kleine-K?nig (1):

Jiri Kosina

Sep 29, 2008, 7:40:05 PM

On Mon, 29 Sep 2008, Linus Torvalds wrote:

> So yet another week, another -rc. This one should be the last one: we're
> certainly not running out of regressions, but at the same time, at some
> point I just have to pick some point, and on the whole the regressions
> don't look _too_ scary. And -rc8 obviously does fix more of them.

If 2.6.27 is released with e1000e driver corrupting EEPROM contents on
many systems out there, rendering the cards unusable for most of the
i-am-not-a-hacker users (and remember, even Dave Airlie bricked his laptop
completely to death, when trying to restore eeprom contents), well, I
personally find that very scary.

Intel is working with us on tracking down and resolving the issue, but
this is not going as well as one would like to see (one attempt, one card
with completely hosed EEPROM contents ... and restoring the contents is
not *that* trivial).

Intel has some patches to mitigate the symptoms (even though we still
don't know who is causing the breakage, but Xorg is the biggest suspect in
my eyes), but they are neither in your tree nor in any other maintainer's
queue yet, as far as I know.

--
Jiri Kosina
SUSE Labs

Linus Torvalds

Sep 29, 2008, 10:00:18 PM

On Tue, 30 Sep 2008, Jiri Kosina wrote:
>
> Intel is working with us on tracking down and resolving the issue, but
> this is not going as well as one would like to see (one attempt, one card
> with completely hosed EEPROM contents ... and restoring the contents is
> not *that* trivial).

What's the magic to trigger it? I've got a laptop with that e1000e chip in
it, and am obviously running a recent kernel on it. Do people have a
handle on it? Is it actually verified to be kernel-related, and not
related to the X server etc?

Linus

Arjan van de Ven

Sep 29, 2008, 10:10:11 PM

On Tue, 30 Sep 2008 11:59:58 +1000
"Dave Airlie" <air...@gmail.com> wrote:

> On Tue, Sep 30, 2008 at 11:56 AM, Linus Torvalds
> <torv...@linux-foundation.org> wrote:
> >
> >
> > On Tue, 30 Sep 2008, Jiri Kosina wrote:
> >>
> >> Intel is working with us on tracking down and resolving the issue,
> >> but this is not going as well as one would like to see (one
> >> attempt, one card with completely hosed EEPROM contents ... and
> >> restoring the contents is not *that* trivial).
> >
> > What's the magic to trigger it? I've got a laptop with that e1000e
> > chip in it, and am obviously running a recent kernel on it. Do
> > people have a handle on it? Is it actually verified to be
> > kernel-related, and not related to the X server etc?
>

> If we had the magic we'd have fixed it by now, the current working
> theory is its X server related. This
> hasn't been proven, though my ATI GPU e1000e seems fine so it may have
> some legs.
>
> If it is X related then its both a kernel + X server issue, the e1000e
> driver opens the barn door, the X server drives the horses through it.
>
> Of course until someone produces a way to fix the hw after it breaks,
> reproducing this isn't something for the feint hearted. I'm hoping my
> laptop
> comes back today with a brand new motherboard in it.
>

we have a patch to save/restore now, in final testing stages
(obviously we want to be really careful with this)

Note that so far it seems to mostly hit with "new" distros, so both
new kernel and new X... ;(


--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Dave Airlie

Sep 29, 2008, 10:10:10 PM

On Tue, Sep 30, 2008 at 11:56 AM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
>
>
> On Tue, 30 Sep 2008, Jiri Kosina wrote:
>>
>> Intel is working with us on tracking down and resolving the issue, but
>> this is not going as well as one would like to see (one attempt, one card
>> with completely hosed EEPROM contents ... and restoring the contents is
>> not *that* trivial).
>
> What's the magic to trigger it? I've got a laptop with that e1000e chip in
> it, and am obviously running a recent kernel on it. Do people have a
> handle on it? Is it actually verified to be kernel-related, and not
> related to the X server etc?

If we had the magic we'd have fixed it by now. The current working
theory is that it's X server related. This hasn't been proven, though my
ATI GPU e1000e laptop seems fine, so it may have some legs.

If it is X related then it's both a kernel + X server issue: the e1000e
driver opens the barn door, the X server drives the horses through it.

Of course, until someone produces a way to fix the hw after it breaks,
reproducing this isn't something for the faint-hearted. I'm hoping my
laptop comes back today with a brand new motherboard in it.

Dave.

Linus Torvalds

Sep 29, 2008, 10:30:13 PM

On Tue, 30 Sep 2008, Dave Airlie wrote:
>
> If it is X related then its both a kernel + X server issue, the e1000e
> driver opens the barn door, the X server drives the horses through it.

Are you sure? There was a Mandriva report about NVM corruption on an e100
too (that one apparently just caused PXE failure, the networking worked
fine).

So I wonder if it's _purely_ X-server-related, and the reason people blame
2.6.27-rc1 is just timing of some X update and then people just look at
the kernel because the 'network card failed' looks so kernel-related.

The reason I mention that is right now it looks like the distros are just
running around disabling the e1000e module, or perhaps downgrading it.
Which may not even work!

The discussions in some of the bug-trackers seem to be full of people who
have no actual information, but are perfectly willing to flail around
wildly saying obviously crazy things.

The Ubuntu people are some of the crazier ones (should I be surprised?),
but that one also has Ben Collins claiming they use the same e1000e driver
for the 2.6.26/27 kernels (from Intel's sf.net project). That may be bogus,
but if true it would indicate that it's possibly not so kernel-related, or
at least not so e1000e-driver-related.

Linus

Linus Torvalds

Sep 29, 2008, 10:30:16 PM

On Mon, 29 Sep 2008, Arjan van de Ven wrote:
>
> we have a patch to save/restore now, in final testing stages
> (obviously we want to be really careful with this)

Btw, the _real_ bug is clearly in the hardware design that allows you to
brick those things without apparently even having a lock bit.

I'm hoping Intel doesn't treat this as just a software bug. Some hw
designer should be thinking hard about which orifice they put their head
up in.

It used to be that you could fry some monitors by feeding them
out-of-range signals. The _monitors_ got fixed.

Linus

Linus Torvalds

Sep 29, 2008, 10:30:16 PM

On Mon, 29 Sep 2008, Linus Torvalds wrote:
>
> It used to be that you could fry some monitors by feeding them
> out-of-range signals. The _monitors_ got fixed.

Mostly. I think you can still do bad things to internal LCD's on at least
some laptops. Although I hope I'm wrong.

Brandeburg, Jesse

Sep 29, 2008, 10:40:11 PM

Linus Torvalds wrote:
> What's the magic to trigger it? I've got a laptop with that e1000e
> chip in it, and am obviously running a recent kernel on it. Do people
> have a handle on it? Is it actually verified to be kernel-related,
> and not related to the X server etc?

my current status mail was posted earlier today to lkml from this
address, since then we've had a local reproduction and are going for
number two. The reproduction seems racy, i.e. it doesn't happen every
time, so we put it in a loop doing detect, check eeprom, detect, etc,
and we'll see if it fails.

Reproduction seems to consistently be around X probing time, no firm
leads yet. As for Intel we have keithp and jbarnes as well as arjan,
auke, myself and a few others involved.

We have some patches to lock the nvm down, we'll be posting those
tonight and tomorrow, I also have some debug logic (and fixes) to help
prove that we don't think it's a race in e1000e.
--
Jesse
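
The "check eeprom" step in the loop Jesse describes could be as simple as validating the NVM checksum. What follows is a minimal sketch, assuming the usual Intel convention that the first 64 16-bit words (0x00 through 0x3F, checksum word last) sum to 0xBABA. The function and macro names are illustrative and are not taken from the e1000e source, and the image is assumed to have been read into memory already, for example from an EEPROM dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative names; the values follow the common Intel NVM convention. */
#define NVM_WORDS_CHECKED 64u      /* words 0x00..0x3F, checksum word last */
#define NVM_EXPECTED_SUM  0xBABAu

/* Return 1 if the 16-bit sum of the first 64 words equals 0xBABA, else 0. */
static int nvm_checksum_ok(const uint16_t *words, size_t count)
{
	uint16_t sum = 0;
	size_t i;

	assert(words != NULL);
	if (count < NVM_WORDS_CHECKED)
		return 0;	/* image too short to validate */

	for (i = 0; i < NVM_WORDS_CHECKED; i++)
		sum = (uint16_t)(sum + words[i]);	/* wraps modulo 2^16 */

	return sum == NVM_EXPECTED_SUM;
}

/* Compute the value for word 0x3F that makes the image validate. */
static uint16_t nvm_checksum_word(const uint16_t *words)
{
	uint16_t sum = 0;
	size_t i;

	assert(words != NULL);
	for (i = 0; i < NVM_WORDS_CHECKED - 1; i++)
		sum = (uint16_t)(sum + words[i]);

	return (uint16_t)(NVM_EXPECTED_SUM - sum);
}
```

A checksum only detects corruption after the fact; it says nothing about who wrote to the part, which is why the thread keeps coming back to locking the NVM down.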

Dave Airlie

Sep 29, 2008, 10:40:09 PM

On Tue, Sep 30, 2008 at 12:21 PM, Linus Torvalds
<torv...@linux-foundation.org> wrote:
>
>
> On Tue, 30 Sep 2008, Dave Airlie wrote:
>>
>> If it is X related then its both a kernel + X server issue, the e1000e
>> driver opens the barn door, the X server drives the horses through it.
>
> Are you sure? There was a mandriva report abou NVM corruption on an e100
> too (that one apparently just caused PXE failure, the networking worked
> fine).

Well, from a purely empirical standpoint, I've been running new X against
that laptop for a long time, and others have the same laptop, so I think
it's a problem with the e1000e driver putting the card into a state which
allows X to do bad things. I think X may be causing issues on other hw,
like e100 and some realtek. Also, when we say X, I think it looks like
Intel driver interaction issues; as I said, I'm running the same stuff on
my ATI GPU laptop with e1000e and haven't had any problems.

But I'm leaving this up to Intel, I don't think HP will take it too
kindly if I keep returning my laptop.

Dave.

Arjan van de Ven

Sep 29, 2008, 11:30:15 PM

On Mon, 29 Sep 2008 19:21:02 -0700 (PDT)
Linus Torvalds <torv...@linux-foundation.org> wrote:

>
>
> On Tue, 30 Sep 2008, Dave Airlie wrote:
> >
> > If it is X related then its both a kernel + X server issue, the
> > e1000e driver opens the barn door, the X server drives the horses
> > through it.
>
> Are you sure? There was a mandriva report abou NVM corruption on an
> e100 too (that one apparently just caused PXE failure, the networking
> worked fine).
>
> So I wonder if it's _purely_ X-server-related, adn the reason people
> blame 2.6.27-rc1 is just timing of some X update and then people just
> look at the kernel beceuse the 'network card failed' looks so
> kernel-related.
>
> The reason I mention that is right now it looks like the distros are
> just running around disabling the e1000e module, or perhaps
> downgrading it. Which may not even work!


btw, we're also working on making some parts of the kernel more robust
against certain types of bugs; for example the ioremap checks and sysfs
resource checks. There's a set of checks and API changes we can do to
make it less likely that drivers end up doing bad stuff; but that's
obviously more for 2.6.28 than for .27

--
Arjan van de Ven Intel Open Source Technology Centre
For development, discussion and tips for power savings,
visit http://www.lesswatts.org

Brandeburg, Jesse

Sep 29, 2008, 11:50:10 PM

On Mon, 29 Sep 2008, Linus Torvalds wrote:
> Btw, the _real_ bug is clearly in the hardware design that allows you to
> brick those things without apparently even having a lock bit.

The hardware has a lock bit, and we're trying to figure out why the BIOS
writers guide doesn't say to set it. Probably because of the MAC address,
but who knows.



> I'm hoping Intel doesn't treat this as just a software bug. Some hw
> designer should be thinking hard about which orifice they put their head
> up in.

We will post a patch to e1000e tomorrow that sets a lock bit that prevents
the registers memory mapped by 0:19.0 BAR1 from causing flash write
cycles.

The patches I've just posted don't quite do that yet.

Jiri Kosina

Sep 30, 2008, 3:10:06 AM

On Mon, 29 Sep 2008, Linus Torvalds wrote:

> > Intel is working with us on tracking down and resolving the issue, but
> > this is not going as well as one would like to see (one attempt, one
> > card with completely hosed EEPROM contents ... and restoring the
> > contents is not *that* trivial).
> What's the magic to trigger it? I've got a laptop with that e1000e chip
> in it, and am obviously running a recent kernel on it. Do people have a
> handle on it? Is it actually verified to be kernel-related, and not
> related to the X server etc?

So far it seems to be that you need 1) something close to xorg 7.4 and
2) 2.6.27-rcX kernel to trigger it. Not every system having e1000e is
affected.

Apparently it is some kind of race, as it usually takes multiple cycles to
trigger (on one of our testing machines this took three attempts to
trigger for the first time, and then after unbricking the machine and
restarting testing, the reproduction tests have been running for several
hours).

It always seems to happen when X is probing/initializing the graphics
card. So it really seems to be some badness in Xorg intel driver
initialization code, and kernel/hardware allows bad things to happen.

Last time I heard, our X developers are suspecting vbeinit initialization
code in Intel driver and are looking into it.

Also, we are going to release next opensuse/SLES beta with patches that
should mitigate the problem (Jesse has posted a new version of them), so
hopefully we will then receive some stacktraces from the users who are
able to trigger the problem more easily.

--
Jiri Kosina
SUSE Labs

Jiri Kosina

Sep 30, 2008, 3:20:07 AM

On Mon, 29 Sep 2008, Linus Torvalds wrote:

> > If it is X related then its both a kernel + X server issue, the e1000e
> > driver opens the barn door, the X server drives the horses through it.
> Are you sure? There was a mandriva report abou NVM corruption on an e100
> too (that one apparently just caused PXE failure, the networking worked
> fine).

That is very probably a completely separate issue, and should have been
fixed already by 78566fecb.

> The Ubuntu people are some of the crazier ones (should I be surprised?),
> but that one also has Ben Collins claiming they use the same e1000e
> driver for the 2.6.26/27 kernels (from intels sf.net project). That may
> be bogus, but if true it would indicate that it's possibly not so
> kernel-related, or at least not so e1000e-driver-related.

I think that not many people are suspecting a bug in e1000e directly.
Rather a combination of X bug, kernel allowing X to do bad things (for
example the missing check in drivers/pci/pci-sysfs.c:pci_mmap_resource()
looks particularly suspicious) and a "bug-friendly" hardware behavior.

--
Jiri Kosina
SUSE Labs
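
The kind of check Jiri is pointing at can be sketched in a few lines: before a mapping is set up, verify that the requested page range lies entirely inside the resource. This is a userspace illustration with a hypothetical helper name and an assumed 4 KiB page size; it is not the code in drivers/pci/pci-sysfs.c.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Return 1 if a mapping of nr_pages pages starting at page offset pgoff
 * fits inside a resource of resource_len bytes, else 0.
 * The comparison is arranged so that pgoff + nr_pages is never computed
 * and therefore cannot overflow.
 */
static int mmap_fits(uint64_t resource_len, uint64_t pgoff, uint64_t nr_pages)
{
	uint64_t size_pages = resource_len >> SKETCH_PAGE_SHIFT;

	if (nr_pages == 0)
		return 0;
	return pgoff < size_pages && size_pages - pgoff >= nr_pages;
}
```

For a 128 KiB resource (32 pages), a one-page mapping at offset 31 fits, while a two-page mapping at the same offset would run past the end and is rejected.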

Ingo Molnar

Sep 30, 2008, 4:00:19 AM

* J.A. Magallón <jamag...@ono.com> wrote:

> Hi....


>
> On Mon, 29 Sep 2008 15:39:09 -0700 (PDT), Linus Torvalds <torv...@linux-foundation.org> wrote:
>
> >
> > So yet another week, another -rc. This one should be the last one: we're
> > certainly not running out of regressions, but at the same time, at some
> > point I just have to pick some point, and on the whole the regressions
> > don't look _too_ scary. And -rc8 obviously does fix more of them.
> >
> > Most of the changes since -rc7 are pretty small, and there aren't even a
> > whole lot of them. The shortlog (appended) is just a couple of pages, and
> > the diffstat is even smaller, but since the dirstat is a dense overview,
> > I'll just put that here instead:
> >
>
>

> Dealing with my Aspire One setup, I found this (so obvious I don't
> send a patch:)
>
>
> arch/x86/kernel/cpu/mtrr/main.c:
>
> static int __init disable_mtrr_cleanup_setup(char *str)
> {
> 	if (enable_mtrr_cleanup != -1)
> 		enable_mtrr_cleanup = 0;
> 	return 0;
> }
> early_param("disable_mtrr_cleanup", disable_mtrr_cleanup_setup);
>
> static int __init enable_mtrr_cleanup_setup(char *str)
> {
> 	if (enable_mtrr_cleanup != -1)
> 		enable_mtrr_cleanup = 1;
> 	return 0;
> }
> early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
>             ^^^^^^
>
> Nice ;)

heh. Could you send a patch with a changelog please?

Ingo

J.A. Magallón

Sep 30, 2008, 4:00:25 AM

Hi....

On Mon, 29 Sep 2008 15:39:09 -0700 (PDT), Linus Torvalds <torv...@linux-foundation.org> wrote:

>
> So yet another week, another -rc. This one should be the last one: we're
> certainly not running out of regressions, but at the same time, at some
> point I just have to pick some point, and on the whole the regressions
> don't look _too_ scary. And -rc8 obviously does fix more of them.
>
> Most of the changes since -rc7 are pretty small, and there aren't even a
> whole lot of them. The shortlog (appended) is just a couple of pages, and
> the diffstat is even smaller, but since the dirstat is a dense overview,
> I'll just put that here instead:
>

Dealing with my Aspire One setup, I found this (so obvious I don't send a patch:)


arch/x86/kernel/cpu/mtrr/main.c:

static int __init disable_mtrr_cleanup_setup(char *str)
{
	if (enable_mtrr_cleanup != -1)
		enable_mtrr_cleanup = 0;
	return 0;
}
early_param("disable_mtrr_cleanup", disable_mtrr_cleanup_setup);

static int __init enable_mtrr_cleanup_setup(char *str)
{
	if (enable_mtrr_cleanup != -1)
		enable_mtrr_cleanup = 1;
	return 0;
}
early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
            ^^^^^^

Nice ;)

--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2009.0 (Cooker) for i586
Linux 2.6.25-jam18 (gcc 4.3.1 20080626 (GCC) #1 SMP
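
Why the typo matters: a boot parameter only reaches its handler if the name given on the command line matches the registered string, so a handler registered as "enble_mtrr_cleanup" never runs when the user types "enable_mtrr_cleanup". The following is a simplified userspace sketch of that lookup. The table, the handlers, and `dispatch_param()` are stand-ins for illustration, not the kernel's actual `early_param()` machinery, and the `!= -1` guard from the original handlers is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch default; the real default depends on the kernel configuration. */
static int enable_mtrr_cleanup = 0;

static int enable_setup(const char *val)
{
	(void)val;
	enable_mtrr_cleanup = 1;
	return 0;
}

static int disable_setup(const char *val)
{
	(void)val;
	enable_mtrr_cleanup = 0;
	return 0;
}

struct param_entry {
	const char *name;
	int (*setup)(const char *val);
};

/* Registered names, with the typo from the thread reproduced on purpose. */
static const struct param_entry params[] = {
	{ "disable_mtrr_cleanup", disable_setup },
	{ "enble_mtrr_cleanup",   enable_setup  },
};

/* Run the handler whose name matches exactly; return 1 if one was found. */
static int dispatch_param(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(params) / sizeof(params[0]); i++) {
		if (strcmp(name, params[i].name) == 0) {
			params[i].setup(NULL);
			return 1;
		}
	}
	return 0;	/* unknown parameter: nothing happens */
}
```

With this table, `dispatch_param("enable_mtrr_cleanup")` finds no handler and leaves the flag untouched, while the misspelled name is the one that actually works.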

J.A. Magallón

Sep 30, 2008, 4:10:14 AM

Here it goes... I hope it's right.

==================

Correct typo for 'enable_mtrr_cleanup' early boot param name.

Signed-off-by: J.A. Magallon <jamag...@ono.com>

diff -p -up linux/arch/x86/kernel/cpu/mtrr/main.c.orig linux/arch/x86/kernel/cpu/mtrr/main.c
--- linux/arch/x86/kernel/cpu/mtrr/main.c.orig 2008-09-30 09:57:46.000000000 +0200
+++ linux/arch/x86/kernel/cpu/mtrr/main.c 2008-09-30 09:57:55.000000000 +0200
@@ -834,7 +834,7 @@ static int __init enable_mtrr_cleanup_se
 		enable_mtrr_cleanup = 1;
 	return 0;
 }
-early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
+early_param("enable_mtrr_cleanup", enable_mtrr_cleanup_setup);
 
 struct var_mtrr_state {
 	unsigned long range_startk;

--
J.A. Magallon <jamagallon()ono!com> \ Software is like sex:
\ It's better when it's free
Mandriva Linux release 2009.0 (Cooker) for i586
Linux 2.6.25-jam18 (gcc 4.3.1 20080626 (GCC) #1 SMP

Ingo Molnar

Sep 30, 2008, 4:10:16 AM

* J.A. Magallón <jamag...@ono.com> wrote:

> > heh. Could you send a patch with a changelog please?
>
> Here it goes...I hope its right.

applied to tip/x86/urgent, thanks!

Ingo

Eric Piel

Sep 30, 2008, 4:20:12 AM

Jiri Kosina wrote:

> On Mon, 29 Sep 2008, Linus Torvalds wrote:
>
>>> If it is X related then its both a kernel + X server issue, the e1000e
>>> driver opens the barn door, the X server drives the horses through it.
>> Are you sure? There was a mandriva report abou NVM corruption on an e100
>> too (that one apparently just caused PXE failure, the networking worked
>> fine).
>
> That is very probably completely separate issue, and shoudl have been
> fixed already by 78566fecb.
Likely not, you are mentioning a patch for e1000, while the Mandriva bug
report is about e100:
https://qa.mandriva.com/show_bug.cgi?id=44192

See you,
Eric

Alan Cox

Sep 30, 2008, 8:10:07 AM

> Mostly. I think you can still do bad things to internal LCD's on at least
> some laptops. Although I hope I'm wrong.

You still can in some cases. You can also erase many video card
firmwares, trash disks, brick DVD drives and the like fairly easily too
but you do tend to have to try to be evil in these cases, not just get an
address wrong.

Alan

Alan Cox

Sep 30, 2008, 8:10:15 AM

> I'm hoping Intel doesn't treat this as just a software bug. Some hw
> designer should be thinking hard about which orifice they put their head
> up in.

I am confident they will, because right now some more malicious virus
writers will be thinking 'whoopeee party time'.

Jiri Kosina

Sep 30, 2008, 10:20:08 AM

On Tue, 30 Sep 2008, Krzysztof Halasa wrote:

> > So far it seems to be that you need 1) something close to xorg 7.4 and
> > 2) 2.6.27-rcX kernel to trigger it. Not every system having e1000e is
> > affected.

> And this e1000e must be ICH*, right? I.e. not a separate e1000e
> chip/card?

So far all the affected systems I am aware of were ICH.

Krzysztof Halasa

Sep 30, 2008, 10:20:04 AM

Jiri Kosina <jko...@suse.cz> writes:

> So far it seems to be that you need 1) something close to xorg 7.4 and
> 2) 2.6.27-rcX kernel to trigger it. Not every system having e1000e is
> affected.

And this e1000e must be ICH*, right? I.e. not a separate e1000e
chip/card?
--
Krzysztof Halasa

Allan, Bruce W

Sep 30, 2008, 11:50:19 AM

Ditto here, i.e. we have no similar reports on other parts.

Luiz Fernando N. Capitulino

Sep 30, 2008, 12:30:18 PM

On Tue, 30 Sep 2008 09:58:56 +0200
Eric Piel <eric...@tremplin-utc.net> wrote:

| Jiri Kosina wrote:
| > On Mon, 29 Sep 2008, Linus Torvalds wrote:
| >
| >>> If it is X related then its both a kernel + X server issue, the e1000e
| >>> driver opens the barn door, the X server drives the horses through it.
| >> Are you sure? There was a mandriva report abou NVM corruption on an e100
| >> too (that one apparently just caused PXE failure, the networking worked
| >> fine).
| >
| > That is very probably completely separate issue, and shoudl have been
| > fixed already by 78566fecb.
| Likely not, you are mentioning a patch for e1000, while the Mandriva bug
| report is about e100:
| https://qa.mandriva.com/show_bug.cgi?id=44192

Yes, also the reporter has said that he has got the problem with -rc7 and
this fix is available since -rc6.

Jiri, doesn't e100 need that fix as well?

Anyway, it is not clear to us whether this is a kernel problem. We
could not reproduce it here and the reporter is now checking his network.

--
Luiz Fernando N. Capitulino

Herton Ronaldo Krzesinski

Sep 30, 2008, 2:30:21 PM

On Tuesday 30 September 2008 13:28:31 Luiz Fernando N. Capitulino wrote:
> On Tue, 30 Sep 2008 09:58:56 +0200
>
> Eric Piel <eric...@tremplin-utc.net> wrote:
> | Jiri Kosina wrote:
> | > On Mon, 29 Sep 2008, Linus Torvalds wrote:
> | >>> If it is X related then its both a kernel + X server issue, the
> | >>> e1000e driver opens the barn door, the X server drives the horses
> | >>> through it.
> | >>
> | >> Are you sure? There was a Mandriva report about NVM corruption on an
> | >> e100 too (that one apparently just caused PXE failure, the networking
> | >> worked fine).
> | >
> | > That is very probably a completely separate issue, and should have been
> | > fixed already by 78566fecb.
> |
> | Likely not, you are mentioning a patch for e1000, while the Mandriva bug
> | report is about e100:
> | https://qa.mandriva.com/show_bug.cgi?id=44192
>
> Yes, also the reporter has said that he has got the problem with -rc7 and
> this fix is available since -rc6.
>
> Jiri, doesn't e100 need that fix as well?
>
> Anyway, it is not clear for us whether this is a kernel problem. We
> could not reproduce it here and the reporter is now checking his network.

He finished his checks and discovered that the e100 issue was in reality a
hardware problem in the switch being used, which started to fail now,
coincidentally just as this e1000e issue was getting more attention. After
swapping the switch the problem stopped, so it was just a false alarm. I
closed https://qa.mandriva.com/show_bug.cgi?id=44192, which was the
original report.

--
[]'s
Herton

H. Peter Anvin

Sep 30, 2008, 2:50:11 PM
Ingo Molnar wrote:
>> early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
>> ^^^^^^
>>
>> Nice ;)
>
> heh. Could you send a patch with a changelog please?

These options are also named inconsistently with all other options.

The standard way to name a boolean option is "foo" versus "nofoo", in
this case, "mtrrcleanup" vs "nomtrrcleanup".

-hpa
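
The "foo" versus "nofoo" convention described above can be sketched in
plain userspace C. This is only an illustration and not the kernel's
early_param() machinery: the helper name parse_bool_option and the
space-separated command line handling are assumptions made for this
example.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper, not kernel code: scan a space-separated command
 * line for "<name>" or "no<name>" and set *value to match.
 * Returns true if either form was seen; the last occurrence wins.
 * *value is left untouched when neither form is present.
 */
bool parse_bool_option(const char *cmdline, const char *name, bool *value)
{
    size_t name_len = strlen(name);
    bool seen = false;
    const char *p = cmdline;

    while (*p) {
        /* Skip separators between tokens. */
        while (*p == ' ')
            p++;
        if (!*p)
            break;

        /* Length of the current token, up to the next space or the end. */
        size_t len = strcspn(p, " ");

        if (len == name_len && strncmp(p, name, name_len) == 0) {
            /* Plain "foo" enables the option. */
            *value = true;
            seen = true;
        } else if (len == name_len + 2 &&
                   strncmp(p, "no", 2) == 0 &&
                   strncmp(p + 2, name, name_len) == 0) {
            /* "nofoo" disables it. */
            *value = false;
            seen = true;
        }
        p += len;
    }
    return seen;
}
```

With this sketch, calling parse_bool_option(cmdline, "mtrrcleanup", &flag)
accepts both "mtrrcleanup" and "nomtrrcleanup" from a single option name,
which is the point of the convention: the negated spelling never has to be
invented separately.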

Yinghai Lu

Sep 30, 2008, 3:40:10 PM
On Tue, Sep 30, 2008 at 11:47 AM, H. Peter Anvin <h...@zytor.com> wrote:
> Ingo Molnar wrote:
>>>
>>> early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
>>> ^^^^^^
>>>
>>> Nice ;)
>>
>> heh. Could you send a patch with a changelog please?
>
> These options are also named inconsistently with all other options.
>
> The standard way to name an boolean option is "foo" versus "nofoo", in this
> case, "mtrrcleanup" vs "nomtrrcleanup".
>
ok, we could change it...

YH

H. Peter Anvin

Sep 30, 2008, 4:10:09 PM
Yinghai Lu wrote:
> On Tue, Sep 30, 2008 at 11:47 AM, H. Peter Anvin <h...@zytor.com> wrote:
>> Ingo Molnar wrote:
>>>> early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
>>>> ^^^^^^
>>>>
>>>> Nice ;)
>>> heh. Could you send a patch with a changelog please?
>> These options are also named inconsistently with all other options.
>>
>> The standard way to name an boolean option is "foo" versus "nofoo", in this
>> case, "mtrrcleanup" vs "nomtrrcleanup".
>>
> ok, we could change it...

If we're fixing a typo anyway I'd suggest so. We know we're not
breaking anyone's working setup...

-hpa

Yinghai Lu

Sep 30, 2008, 5:40:05 PM
On Tue, Sep 30, 2008 at 12:59 PM, H. Peter Anvin <h...@zytor.com> wrote:
> Yinghai Lu wrote:
>>
>> On Tue, Sep 30, 2008 at 11:47 AM, H. Peter Anvin <h...@zytor.com> wrote:
>>>
>>> Ingo Molnar wrote:
>>>>>
>>>>> early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
>>>>> ^^^^^^
>>>>>
>>>>> Nice ;)
>>>>
>>>> heh. Could you send a patch with a changelog please?
>>>
>>> These options are also named inconsistently with all other options.
>>>
>>> The standard way to name an boolean option is "foo" versus "nofoo", in
>>> this
>>> case, "mtrrcleanup" vs "nomtrrcleanup".
>>>
>> ok, we could change it...
>
> If we're fixing a typo anyway I'd suggest so. We know we're not breaking
> anyone's working setup...

mtrr_cleanup and no_mtrr_cleanup?

YH

H. Peter Anvin

Sep 30, 2008, 5:40:10 PM
Yinghai Lu wrote:
> On Tue, Sep 30, 2008 at 12:59 PM, H. Peter Anvin <h...@zytor.com> wrote:
>> Yinghai Lu wrote:
>>> On Tue, Sep 30, 2008 at 11:47 AM, H. Peter Anvin <h...@zytor.com> wrote:
>>>> Ingo Molnar wrote:
>>>>>> early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
>>>>>> ^^^^^^
>>>>>>
>>>>>> Nice ;)
>>>>> heh. Could you send a patch with a changelog please?
>>>> These options are also named inconsistently with all other options.
>>>>
>>>> The standard way to name an boolean option is "foo" versus "nofoo", in
>>>> this
>>>> case, "mtrrcleanup" vs "nomtrrcleanup".
>>>>
>>> ok, we could change it...
>> If we're fixing a typo anyway I'd suggest so. We know we're not breaking
>> anyone's working setup...
>
> mtrr_cleanup and no_mtrr_cleanup?
>

Dashes seem to be used more than underscores, so it probably should be
"mtrr-cleanup" and "nomtrr-cleanup" if you want a separator.

-hpa

Yinghai Lu

Sep 30, 2008, 5:50:10 PM
On Tue, Sep 30, 2008 at 2:37 PM, H. Peter Anvin <h...@zytor.com> wrote:
> Yinghai Lu wrote:
>>
>> On Tue, Sep 30, 2008 at 12:59 PM, H. Peter Anvin <h...@zytor.com> wrote:
>>>
>>> Yinghai Lu wrote:
>>>>
>>>> On Tue, Sep 30, 2008 at 11:47 AM, H. Peter Anvin <h...@zytor.com> wrote:
>>>>>
>>>>> Ingo Molnar wrote:
>>>>>>>
>>>>>>> early_param("enble_mtrr_cleanup", enable_mtrr_cleanup_setup);
>>>>>>> ^^^^^^
>>>>>>>
>>>>>>> Nice ;)
>>>>>>
>>>>>> heh. Could you send a patch with a changelog please?
>>>>>
>>>>> These options are also named inconsistently with all other options.
>>>>>
>>>>> The standard way to name an boolean option is "foo" versus "nofoo", in
>>>>> this
>>>>> case, "mtrrcleanup" vs "nomtrrcleanup".
>>>>>
>>>> ok, we could change it...
>>>
>>> If we're fixing a typo anyway I'd suggest so. We know we're not breaking
>>> anyone's working setup...
>>
>> mtrr_cleanup and no_mtrr_cleanup?
>>
>
> Dashes seem to be used more than underscores, so it probably should be
> "mtrr-cleanup" and "nomtrr-cleanup" if you want a separator.
>

I need to document mtrr_cleanup_debug too... change it to
mtrrcleanup_debug? Just like initcall_debug?

YH

Thomas Gleixner

Sep 30, 2008, 6:10:07 PM
On Mon, 29 Sep 2008, Brandeburg, Jesse wrote:
> Linus Torvalds wrote:
> > What's the magic to trigger it? I've got a laptop with that e1000e
> > chip in it, and am obviously running a recent kernel on it. Do people
> > have a handle on it? Is it actually verified to be kernel-related,
> > and not related to the X server etc?
>
> my current status mail was posted earlier today to lkml from this
> address, since then we've had a local reproduction and are going for
> number two. The reproduction seems racy, i.e. it doesn't happen every
> time, so we put it in a loop doing detect, check eeprom, detect, etc,
> and we'll see if it fails.
>
> Reproduction seems to consistently be around X probing time, no firm
> leads yet. As for Intel we have keithp and jbarnes as well as arjan,
> auke, myself and a few others involved.
>
> We have some patches to lock the nvm down, we'll be posting those
> tonight and tomorrow, I also have some debug logic (and fixes) to help
> prove that we don't think it's a race in e1000e.

Can we get the simple debug patches including the fixes which resulted
from them pushed upstream ASAP ?

Thanks,

tglx
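
The "detect, check eeprom" loop quoted above needs some way to decide
whether the NVM contents are still intact. A minimal sketch of such a
check follows, assuming the checksum convention used by Intel's
e1000-family drivers, where the first 64 16-bit words (0x00 to 0x3F) sum
to 0xBABA. The function name nvm_checksum_ok and the idea of running it
over a dumped image in userspace are assumptions for illustration; this
is not the debug patch discussed in the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NVM_CHECKSUM_WORDS 64       /* words 0x00..0x3F take part in the sum */
#define NVM_EXPECTED_SUM   0xBABA   /* value the 16-bit sum must equal */

/*
 * Hypothetical helper: validate a dumped NVM image held as host-order
 * 16-bit words. Returns false if the image is too short or if the
 * wrapping 16-bit sum of the first 64 words is not 0xBABA.
 */
bool nvm_checksum_ok(const uint16_t *words, size_t count)
{
    uint16_t sum = 0;

    if (count < NVM_CHECKSUM_WORDS)
        return false;

    /* Unsigned 16-bit addition wraps modulo 65536, as the check expects. */
    for (size_t i = 0; i < NVM_CHECKSUM_WORDS; i++)
        sum = (uint16_t)(sum + words[i]);

    return sum == NVM_EXPECTED_SUM;
}
```

Running a check like this after every detect cycle would flag the first
iteration in which the image changed, which is what makes a racy
reproduction easier to narrow down.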

H. Peter Anvin

Sep 30, 2008, 6:10:06 PM
Yinghai Lu wrote:
> On Tue, Sep 30, 2008 at 2:37 PM, H. Peter Anvin <h...@zytor.com> wrote:
>>>
>> Dashes seem to be used more than underscores, so it probably should be
>> "mtrr-cleanup" and "nomtrr-cleanup" if you want a separator.
>>
> i need to document the mtrr_cleanup_debug too...change it to
> mtrrcleanup_debug ? just like initcall_debug?
>

I would prefer "mtrr-cleanup-debug" if the main one is "mtrr-cleanup";
mixing dashes and underscores is a bit sick. Unfortunately we have had
very few attempts at consistency with command line options... some in
the early days were even StudlyCaps (yuck...)

-hpa

Jiri Kosina

Oct 1, 2008, 11:40:05 AM
On Tue, 30 Sep 2008, Allan, Bruce W wrote:

> > > > So far it seems to be that you need 1) something close to xorg 7.4
> > > > and 2) 2.6.27-rcX kernel to trigger it. Not every system having
> > > > e1000e is affected.
> > > And this e1000e must be ICH*, right? I.e. not a separate e1000e
> > > chip/card?
> > So far all the affected systems I am aware of were ICH.

> Ditto here, i.e. we have no similar reports on other parts.

We received another report [1] a few hours ago, which really looks very
much like the very same corruption, but it happened on a system with an
nVidia card, for the first time ever! So it now really looks like we
could rule out at least the xorg Intel driver, if this is really the
same bug. Now go guess.

I am trying to get more information about the system in question.

[1] https://bugzilla.novell.com/show_bug.cgi?id=425480#c105

Domenico Andreoli

Oct 1, 2008, 5:40:10 PM
On Tue, Sep 30, 2008 at 10:05:46AM +0200, Ingo Molnar wrote:

>
> * J.A. Magallón <jamag...@ono.com> wrote:
>
> > > heh. Could you send a patch with a changelog please?
> >
> > Here it goes...I hope its right.
>
> applied to tip/x86/urgent, thanks!

Ingo, why did you require a patch? Wouldn't it have been simpler and
easier for everyone to write it yourself? Since I am sure it was not only
a matter of laziness (really?), I am very curious to know the reason.

Thank you,
Domenico

-----[ Domenico Andreoli, aka cavok
--[ http://www.dandreoli.com/gpgkey.asc
---[ 3A0F 2F80 F79C 678A 8936 4FEE 0677 9033 A20E BC50

Willy Tarreau

Oct 2, 2008, 1:30:11 AM
On Wed, Oct 01, 2008 at 11:33:28PM +0200, Domenico Andreoli wrote:
> On Tue, Sep 30, 2008 at 10:05:46AM +0200, Ingo Molnar wrote:
> >
> > * J.A. Magallón <jamag...@ono.com> wrote:
> >
> > > > heh. Could you send a patch with a changelog please?
> > >
> > > Here it goes...I hope its right.
> >
> > applied to tip/x86/urgent, thanks!
>
> Ingo, why did you require a patch? Was not it really more simple and
> easy for everyone to write it yourself? Since I am sure it was not only
> a laziness matter (really?), I am very curious to know the reason.

I see two things:
- preserve authorship of the code
- "laziness" as you call it, is the only way to scale for a maintainer.

Willy

Ingo Molnar

Oct 2, 2008, 5:30:15 AM

* Willy Tarreau <w...@1wt.eu> wrote:

> On Wed, Oct 01, 2008 at 11:33:28PM +0200, Domenico Andreoli wrote:
> > On Tue, Sep 30, 2008 at 10:05:46AM +0200, Ingo Molnar wrote:
> > >
> > > * J.A. Magallón <jamag...@ono.com> wrote:
> > >
> > > > > heh. Could you send a patch with a changelog please?
> > > >
> > > > Here it goes...I hope its right.
> > >
> > > applied to tip/x86/urgent, thanks!
> >
> > Ingo, why did you require a patch? Was not it really more simple and
> > easy for everyone to write it yourself? Since I am sure it was not only
> > a laziness matter (really?), I am very curious to know the reason.
>
> I see two things :
> - preserve authorship of the code
> - "laziness" as you call it, is the only way to scale for a maintainer.

yeah, correct. Also, i asked (not required) J.A. Magallón whether he
could send a patch - if he didn't (no time, etc.) i'd have fixed it
myself (crediting him in the changelog).

But it's also a general principle: maintainers don't 'own' the code in
any way and there should be no asymmetry in the ability to modify the
code. So if people are willing to fix bugs they notice, i prefer that
far more than me doing it.

Ingo

Domenico Andreoli

Oct 2, 2008, 5:50:12 AM
On Thu, Oct 02, 2008 at 11:26:30AM +0200, Ingo Molnar wrote:
>
> * Willy Tarreau <w...@1wt.eu> wrote:
>
> > On Wed, Oct 01, 2008 at 11:33:28PM +0200, Domenico Andreoli wrote:
> > > On Tue, Sep 30, 2008 at 10:05:46AM +0200, Ingo Molnar wrote:
> > > >
> > > > * J.A. Magallón <jamag...@ono.com> wrote:
> > > >
> > > > > > heh. Could you send a patch with a changelog please?
> > > > >
> > > > > Here it goes...I hope its right.
> > > >
> > > > applied to tip/x86/urgent, thanks!
> > >
> > > Ingo, why did you require a patch? Was not it really more simple and
> > > easy for everyone to write it yourself? Since I am sure it was not only
> > > a laziness matter (really?), I am very curious to know the reason.
> >
> > I see two things :
> > - preserve authorship of the code
> > - "laziness" as you call it, is the only way to scale for a maintainer.
>
> yeah, correct. Also, i asked (not required) J.A. Magallón whether he
> could send a patch - if he didnt (no time, etc.) i'd have fixed it
> myself (crediting him in the changelog).

yes, asked. sorry.

> But it's also a general principle: maintainers dont 'own' the code in
> any way and there should be no assymetry in the ability to modify the
> code. So if people are willing to fix bugs they notice, i prefer that
> far more than me doing it.

I think I got the lesson, although the asymmetry matter is still not that
clear to me. Anyway, I also know that when you talk about code you prefer
patches to plain English, so I expect you'd like others to do the same ;)

Thank you,
Domenico

-----[ Domenico Andreoli, aka cavok
--[ http://www.dandreoli.com/gpgkey.asc
---[ 3A0F 2F80 F79C 678A 8936 4FEE 0677 9033 A20E BC50
