I'm afraid I've been quite a naughty boy this year because Santa
brought me what looks like a kernel bug for Christmas ;)
I use basic ACPI support to suspend my box, e.g.
# echo mem > /sys/power/state
to suspend to ram.
Resuming from a suspend on my ThinkPad T42 is broken in both -rc6 and
-rc7 releases. When X is not launched, everything goes fine, but when
resuming a running X, X looks frozen. I can ssh to my box and the
sysrq keys are still working, but I'm unable to kill the X process.
If I suspend from a vt (but still with an X running), the resume goes
fine until I switch back from the vt to X.
Nothing shows up in dmesg (anyway as I said, everything still works
fine when I ssh to my box, I just can't use my computer directly), nor
in the Xorg logs (I use Xorg 6.8.2).
Everything was ok with -rc5.
Please find my .config and the lspci output attached (my graphics card
is an AGP ATI Radeon Mobility 7500 and I use the "radeon"
driver from Xorg).
Best regards,
Jules
Attached.
Investigating a bit further, I found out that resume is quite innocent
about all this: what hangs X is switching from a vt to X. Moreover, when I
launch X only by typing "X" in a vt, switching back and forth makes
the box hang hard (ie no sysrq), so I had to do a startx to see a call
trace with sysrq-t (I know, it may sound like black art).
Regards,
Jules
PS: Sorry for messing up with the lkml's email...
First revert wasn't enough, but the second one made it! Everything is
working now.
Thanks,
Jules
That is not good. See my other mail. I need more info to understand
what's up.
Ben.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Definitely, and we need to figure out why the patch causes a regression.
Those patches fix serious issues with a number of machines.
The problem is very nasty as all the various parties involved (radeonfb,
X radeon driver, radeon DRM, etc...) all try to reconfigure the card
memory map in differently bogus ways...
Can you add printk's to the kernel to check the values in
CONFIG_MEMSIZE, CONFIG_APER_SIZE, priv->fb_location and the value
calculated for gart_vm_start? Then tell me what those printk's show on X
start and when switching consoles.
Thanks,
Do you still want me to try that now that reverting the two patches
did the job?
Jules
Also, does it work if you don't use radeonfb? radeonfb shouldn't touch
MC_AGP_LOCATION and the DRM change only affects that, so I'm a bit
surprised.
Ben.
Did you use the nVidia module? Several people reported machine hangs when
doing the vt <-> X switching. This, however, should be fixed with the latest
drivers.
I had the same problem some time ago. Though I knew I had reached a console
where I was logged in, the keyboard seemed to deny any service when coming from X.
Since I upgraded X to some CVS version and the nVidia driver 8174 (8178 works
as well), everything is OK.
Marc
> Did you use the nVidia module? Several people reported machine hangs when
> doing the vt <-> X switching. This, however, should be fixed with the latest
> drivers.
>
> I had the same problem some time ago. Though I knew I had reached a console
> where I was logged in, the keyboard seemed to deny any service when coming from X.
> Since I upgraded X to some CVS version and the nVidia driver 8174 (8178 works
> as well), everything is OK.
>
> Marc
Doh! Damn tooooo late over here... I just managed to find some 'radeonfb'-like
string in your mail. :)
Good night,
That might be good enough until I fully fix X and the DRM (work in progress
but there are other "issues").
Index: linux-work/drivers/char/drm/radeon_cp.c
===================================================================
--- linux-work.orig/drivers/char/drm/radeon_cp.c 2005-12-24 10:07:22.000000000 +1100
+++ linux-work/drivers/char/drm/radeon_cp.c 2005-12-27 12:48:02.000000000 +1100
@@ -1312,7 +1312,7 @@
 static int radeon_do_init_cp(drm_device_t * dev, drm_radeon_init_t * init)
 {
 	drm_radeon_private_t *dev_priv = dev->dev_private;
-	unsigned int mem_size;
+	unsigned int mem_size, aper_size;
 
 	DRM_DEBUG("\n");
 
@@ -1527,7 +1527,9 @@
 	mem_size = RADEON_READ(RADEON_CONFIG_MEMSIZE);
 	if (mem_size == 0)
 		mem_size = 0x800000;
-	dev_priv->gart_vm_start = dev_priv->fb_location + mem_size;
+	aper_size = max(RADEON_READ(RADEON_CONFIG_APER_SIZE), mem_size);
+
+	dev_priv->gart_vm_start = dev_priv->fb_location + aper_size;
 
 #if __OS_HAS_AGP
 	if (!dev_priv->is_pci)
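
For readers who want to compare the two calculations outside the kernel, here is a minimal user-space sketch of the removed and added lines. The function names and the FALLBACK_MEM_SIZE macro are hypothetical and exist only for this illustration; register reads are replaced by plain parameters, and it is not the driver code itself.

```c
#include <stdint.h>

/* Same fallback as the patch context: used when CONFIG_MEMSIZE reads as 0. */
#define FALLBACK_MEM_SIZE 0x800000u

/* Hypothetical helper mirroring the removed line: fb_location + mem_size. */
uint32_t gart_vm_start_memsize_only(uint32_t fb_location, uint32_t mem_size)
{
	if (mem_size == 0)
		mem_size = FALLBACK_MEM_SIZE;
	return fb_location + mem_size;
}

/* Hypothetical helper mirroring the added lines: the offset is the larger
 * of the aperture size and the memory size. */
uint32_t gart_vm_start_max(uint32_t fb_location, uint32_t mem_size,
			   uint32_t aper_size)
{
	if (mem_size == 0)
		mem_size = FALLBACK_MEM_SIZE;
	if (aper_size < mem_size)
		aper_size = mem_size;
	return fb_location + aper_size;
}
```

With the figures Jules reports later in the thread (fb_location e0000000, mem_size 2000000, aper_size 4000000), the first helper returns e2000000 and the second returns e4000000.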
Removing radeonfb from the kernel only makes things worse: the box
gets completely frozen when reproducing the bug (no more ssh access nor
sysrq).
>
> The problem is very nasty as all the various parties involved (radeonfb,
> X radeon driver, radeon DRM, etc...) all try to reconfigure the card
> memory map in differently bogus ways...
>
> Can you add printk's to the kernel to check the values in
> CONFIG_MEMSIZE, CONFIG_APER_SIZE, priv->fb_location and the values
> calculated for gart_vm_start ? Then tell me what that printk gets on X
> start and when switching consoles.
I get these figures when I first start X:
[ 104.399101] ### fb_location is now e0000000
[ 104.399104] ### mem_size is 2000000
[ 104.399107] ### aper_size is 4000000
[ 104.399109] ### gart_vm_start is e2000000
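
As an illustration of the kind of instrumentation that could produce lines like these, here is a user-space sketch that formats the four values in the same "### name is value" style. The function name format_memmap_debug is hypothetical, and snprintf() stands in for printk(); in the kernel each line would be its own printk() call, and the bracketed timestamps are added by the kernel log rather than by the format string.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for the requested printk's: writes the four
 * memory-map values into buf as lowercase hex, one per line.
 * Returns what snprintf() returns (number of characters needed).
 */
int format_memmap_debug(char *buf, size_t len, uint32_t fb_location,
			uint32_t mem_size, uint32_t aper_size,
			uint32_t gart_vm_start)
{
	return snprintf(buf, len,
			"### fb_location is now %" PRIx32 "\n"
			"### mem_size is %" PRIx32 "\n"
			"### aper_size is %" PRIx32 "\n"
			"### gart_vm_start is %" PRIx32 "\n",
			fb_location, mem_size, aper_size, gart_vm_start);
}
```

Calling it with 0xe0000000, 0x2000000, 0x4000000 and 0xe2000000 reproduces the four lines above without the timestamps.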
The sad thing is that it looks like the crash occurs *before* entering
the radeon_do_init_cp function, assuming it should enter it again when
I switch back from a tty to X (I've put some printk's at the
beginning of the function but didn't see them in dmesg although other
things showed up), so I don't know where to put the printk's in order
to get other figures...
Thanks,
Jules
> The sad thing is that it looks like the crash occurs *before* entering
> the radeon_do_init_cp function, assuming it should enter it again when
> I switch back from a tty to X (I've put some printk's at the
> beginning of the function but didn't see them in dmesg although other
> things showed up), so I don't know where to put the printk's in order
> to get other figures...
I think the problem is actually a bug in the X server that we are
triggering indirectly. It's very difficult to fix things properly
because of various bugs in X and the DRM that depend on each other's
side effects. I may have to back it all off for now and add some version test
to both DRM and X so that they only try to "do the right thing" once
they detect that the other side has been fixed too...
Let's see if the latest patch I posted that fixes things for you also
helps others though.
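
The "version test" idea mentioned above could look roughly like the following sketch. Everything here is hypothetical: the version threshold, the function names, and the way a peer version would be obtained are assumptions made for illustration, not an existing DRM or X interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: first peer version assumed to contain the memory-map fix. */
#define HYPOTHETICAL_FIXED_PEER_VERSION 2

/* True when the other side (X or DRM) reports a version with the fix. */
bool peer_has_memmap_fix(int peer_version)
{
	return peer_version >= HYPOTHETICAL_FIXED_PEER_VERSION;
}

/*
 * Use the corrected value only when the peer is known to be fixed;
 * otherwise keep the legacy value so both sides stay consistent.
 */
uint32_t choose_gart_vm_start(int peer_version, uint32_t legacy_value,
			      uint32_t corrected_value)
{
	return peer_has_memmap_fix(peer_version) ? corrected_value
						 : legacy_value;
}
```

The point of such a gate is that an old X server paired with a new DRM (or the reverse) keeps the previous, mutually consistent behaviour.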
On Wed, 28 Dec 2005, Benjamin Herrenschmidt wrote:
>
> Linus, please back out those 2 DRM patches of mine for 2.6.15. It seems
> that they cause more problems than they solve due to bugs in the X
> server. I need to rethink the solution.
Hmm.. How many other problem reports do we have? Jules reported that your
patch to use the max() of the aperture size and memsize fixed the problem
for him (and I merged it). Does it have other downsides?
Linus
It doesn't, but I've got one confirmed report of failure that isn't
fixed by the latest patch and 2 other ones still dubious.
I'm not entirely sure what's going on yet. On console switch (EnterVT()
in the X driver), it will restore the mode and set back the wrong value
in MC_AGP_LOCATION. It will then re-enable AGP and call the "resume"
ioctl to the DRM which should then "fix" MC_AGP_LOCATION to the
"correct" value we calculated. However, it's possible that the chip
dislikes those constant changes of these memory controller settings
especially while it's currently pumping pixels out.
Also, if using dual head, it's possible that the X server radeon driver
goes back writing the wrong value _again_ after the first head has been
re-initialized, and while the engine is actively pumping commands from
AGP, which would be deadly. The radeon driver in X is one of the worst
messes I've ever dealt with so far...
So I think at this point, the best option is to keep the old bogus code,
which at least is consistent with the bug in the server. I'm working on a
big patch to X that reworks the memory map stuff completely and fixes
those issues on the server side. I'll do a DRM patch matching this X fix
as well, so that the memory map is only ever set in one place and with
what I hope is a correct algorithm...
Ben.