SuSE 9.3 became highly unstable after latest YOU update

Marty

Mar 30, 2007, 6:27:02 PM

I've been running SuSE 9.3 (64-bit) for a long time now on my K8 box, and it
has been rock solid. My last uptime was over 90 days, and before that I only
took it down to update a driver.

Yesterday, I started to notice that my updatedb runs were hanging and the
processes were accumulating over time. That was my mistake: they were doing
finds across some remote filesystems that I hadn't told updatedb to exclude,
and those filesystems were not always available. I couldn't make the
processes go away, so I decided it was time to reboot to clean them out.

Just before deciding this, I had noticed that my SuSE Watcher hadn't sent
me any updates in a very long time, which I thought was strange. So I
fired up YOU and found that my server information had somehow gotten
corrupted: the server names were random, garbled fragments of XML (or some
such thing) describing the YOU servers. I reset them to use the main Novell
server and saw a stream of 6 or 7 new security updates come through. I
installed them without thinking, as usual, and everything looked OK. I
fired up YOU again to make sure, and my list of servers was restored (the
garbled entries were gone).

After the reboot I started noticing massive instability, the likes of which I
had never seen on this system before. Applications were dying seemingly at
random, to the point where even my stock ticker and weather monitor started
dying for no apparent reason. Once they died, they could not be
relaunched. Rebooting seems to clear it up, but only for a short time.

I started looking through my system log for a clue about what is happening,
and noticed two things that I had not seen before. First, I was getting
ssh-bombed by some clown, so to be safe I closed the SSH port in my
firewall. Second, whenever my applications died, there was some very
disturbing information in the log:

Mar 30 14:26:29 sittyfo kernel: Unable to handle kernel paging request at ffff8000020e1d00 RIP:
Mar 30 14:26:29 sittyfo kernel: <ffffffff8015dd44>{shrink_zone+420}
Mar 30 14:26:29 sittyfo kernel: PGD 0
Mar 30 14:26:29 sittyfo kernel: Oops: 0002 [1]
Mar 30 14:26:29 sittyfo kernel: CPU 0

Every single one of my app failures had one of these entries in the log. It
didn't make any sense to me, however, because I had plenty of paging space
available and really wasn't pushing the system any harder than I normally
do. In fact, in the past (before this problem), I had run processes that
exhausted all of the paging space (2GB RAM + 8GB swap), and the system
behaved very gracefully the whole time.
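To check that every application failure lines up with one of these oopses, the log can be scanned mechanically instead of by eye. The following is a minimal Python sketch, not a SuSE or kernel tool. It assumes the 2.6-era x86-64 oops format shown above (an "Unable to handle kernel paging request" line followed by a symbolised <address>{function+offset} line), and both the default path /var/log/messages and the function name summarize_oopses are illustrative assumptions.

```python
import re
import sys
from collections import Counter

# First line of a 2.6-era x86-64 paging-fault oops, e.g.
# "... kernel: Unable to handle kernel paging request at ffff8000020e1d00 RIP:"
OOPS_RE = re.compile(r"kernel: Unable to handle kernel paging request at [0-9a-f]+")
# Symbolised RIP line that follows it, e.g. "... kernel: <ffffffff8015dd44>{shrink_zone+420}"
RIP_RE = re.compile(r"kernel: <[0-9a-f]+>\{([\w.]+)\+(\d+)\}")


def summarize_oopses(lines):
    """Count paging-request oopses per faulting (function, offset)."""
    counts = Counter()
    pending = False  # set after an oops header, cleared once its RIP line is seen
    for line in lines:
        if OOPS_RE.search(line):
            pending = True
        elif pending:
            m = RIP_RE.search(line)
            if m:
                counts[(m.group(1), int(m.group(2)))] += 1
                pending = False
    return counts


if __name__ == "__main__":
    # Assumption: kernel messages go to /var/log/messages, as on a default SuSE setup.
    path = sys.argv[1] if len(sys.argv) > 1 else "/var/log/messages"
    try:
        with open(path, errors="replace") as log:
            summary = summarize_oopses(log)
    except OSError as exc:
        print(f"cannot read {path}: {exc}", file=sys.stderr)
        summary = Counter()
    for (func, offset), n in summary.most_common():
        print(f"{n:4d}  {func}+{offset}")
```

Pass the log path as the only argument (it falls back to /var/log/messages). Each output line is a count followed by function+offset, so if every failure shows the same entry (for example shrink_zone+420), they are all hitting the same crash site.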

I started suspecting hardware, so I made sure all my fans were clean and
running, the power supply voltages were nominal, the CPU temperature was OK,
etc. It all checked out.

Any ideas what I can do to get my stability back? I've been so pleased with
the system for so long and it just all turned to crap on me in a single
day.

The log goes on to describe what looks like failures in kswapd0:

Mar 30 14:26:29 sittyfo kernel: Pid: 185, comm: kswapd0 Tainted: P U 2.6.11.4-21.15-default
Mar 30 14:26:29 sittyfo kernel: RIP: 0010:[<ffffffff8015dd44>] <ffffffff8015dd44>{shrink_zone+420}
Mar 30 14:26:29 sittyfo kernel: RSP: 0000:ffff81007fa15b98 EFLAGS: 00010086
Mar 30 14:26:29 sittyfo kernel: RAX: ffffffff803e68a0 RBX: 0000000000000003 RCX: ffff8100020e1d38
Mar 30 14:26:29 sittyfo kernel: RDX: ffff8000020e1d00 RSI: ffffffff803e68a0 RDI: 0000000000000078
Mar 30 14:26:29 sittyfo kernel: RBP: 0000000000000003 R08: 0000000000000000 R09: 0000000000000000
Mar 30 14:26:29 sittyfo kernel: R10: 0000000000000000 R11: 0000000000000004 R12: ffffffff803e6720
Mar 30 14:26:29 sittyfo kernel: R13: ffff81007fa15e48 R14: ffff81007fa15dc8 R15: 0000000000000020
Mar 30 14:26:29 sittyfo kernel: FS: 00002aaaab3636e0(0000) GS:ffffffff804cae80(0000) knlGS:00000000566ccbb0
Mar 30 14:26:29 sittyfo kernel: CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
Mar 30 14:26:29 sittyfo kernel: CR2: ffff8000020e1d00 CR3: 000000007a955000 CR4: 00000000000006e0
Mar 30 14:26:29 sittyfo kernel: Process kswapd0 (pid: 185, threadinfo ffff81007fa14000, task ffff81007fa10e90)
Mar 30 14:26:29 sittyfo kernel: Stack: 0000000100000000 ffff8100432ff6e8 0000002000000000 ffffffff00000000
Mar 30 14:26:29 sittyfo kernel: 0000000000000020 0000000000000000 0000000000000000 000000000000002b
Mar 30 14:26:29 sittyfo kernel: 0000000000000000 0000000000000000
Mar 30 14:26:29 sittyfo kernel: Call Trace:<ffffffff80146b80>{autoremove_wake_function+0} <ffffffff8015ed5d>{balance_pgdat+557}
Mar 30 14:26:29 sittyfo kernel: <ffffffff8015f117>{kswapd+295} <ffffffff80146b80>{autoremove_wake_function+0}
Mar 30 14:26:29 sittyfo kernel: <ffffffff80146b80>{autoremove_wake_function+0} <ffffffff8010f1cb>{child_rip+8}
Mar 30 14:26:29 sittyfo kernel: <ffffffff8015eff0>{kswapd+0} <ffffffff8010f1c3>{child_rip+0}
Mar 30 14:26:29 sittyfo kernel:
Mar 30 14:26:29 sittyfo kernel:
Mar 30 14:26:29 sittyfo kernel: Code: 48 89 02 48 c7 41 08 00 02 20 00 48 c7 01 00 01 10 00 ff 41
Mar 30 14:26:29 sittyfo kernel: RIP <ffffffff8015dd44>{shrink_zone+420} RSP <ffff81007fa15b98>
Mar 30 14:26:29 sittyfo kernel: CR2: ffff8000020e1d00
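"Unable to handle kernel paging request" means the kernel touched a virtual address with no valid page-table mapping (note the "PGD 0"). It does not mean swap ran out, and the register values can be checked by hand. The faulting address (CR2, which is also the value in RDX) is ffff8000020e1d00. RCX (ffff8100020e1d38) and RSP (ffff81007fa15b98) both start with ffff81. The Python sketch below assumes the 2.6-era x86-64 layout, in which the kernel's direct mapping of physical memory begins at ffff810000000000 and the range just below it is an unmapped hole. That base address is an assumption about this kernel, not something the log states. The sketch finds which bits separate the faulting address from the direct-map address with the same low 40 bits. The helper names are hypothetical. A single differing bit is consistent with a flipped bit in memory, but it does not prove one.

```python
# Assumption: start of the x86-64 direct mapping of physical memory on
# 2.6-era kernels; the range just below it was left unmapped.
DIRECT_MAP_BASE = 0xFFFF810000000000
LOW_BITS = 40  # the base has its low 40 bits clear, so OR-ing stays in its first 1 TB


def nearest_direct_map(addr):
    """Keep addr's low 40 bits but force the high bits to the direct-map base."""
    return DIRECT_MAP_BASE | (addr & ((1 << LOW_BITS) - 1))


def differing_bits(a, b):
    """Return the bit positions (0 = least significant) where a and b differ."""
    x = a ^ b
    return [i for i in range(64) if (x >> i) & 1]


if __name__ == "__main__":
    cr2 = 0xFFFF8000020E1D00  # faulting address from the oops (CR2 / RDX)
    expected = nearest_direct_map(cr2)
    print(f"fault    {cr2:016x}")       # prints "fault    ffff8000020e1d00"
    print(f"expected {expected:016x}")  # prints "expected ffff8100020e1d00"
    print("differing bits:", differing_bits(cr2, expected))  # prints [40]
```

With these values the two addresses differ only in bit 40, which is exactly the bit that moves an address from the direct map into the hole below it.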

--
Reverse the parts of the e-mail address to reply by mail.
