
Tiac News Servers BOMB AGAIN


megab...@my-dejanews.com

Jan 22, 1999, 3:00:00 AM
Hi,

I'm sure many of you are wondering what happened to news and why it took
14 hours to fix it. Basically, a power failure caused a corruption of
the meta-data on the disk array which housed the non-alt groups. We were
unaware of this corruption until after we allowed the RAID controllers
to re-build the arrays, which took several hours. When we brought the
machine back online, it kernel panicked a short while later. After
testing the hardware, we tested the filesystems and found (by
elimination) the corrupt array. Although the data was sound, the
meta-data corruption was not recoverable.

In truth, it was possibly only one file that caused the array to fail,
but any access, including trying to delete it, would have caused a kernel
panic. Not knowing which file it was, we decided to back up only the
tiac hierarchy (hoping it was not one of the tiac messages) and
re-create a filesystem on the array.

So what happened to alt? In short, we nuked the alt spool and swapped
arrays. Most of the articles remaining in alt would expire by morning
anyway. The array alt sat on was considerably slower than the one
non-alt was on, and alt was suffering for it, as evidenced by a small
but consistent backlog of alt articles and a 99% wait state for I/O for
that array. Users should now see a significant improvement in alt
propagation, with little or no difference in the non-alts.

Obviously, existing articles were deleted, so you may see some 'article
no longer on server' messages until the current expire finishes. Sadly,
some articles that spooled during the outage will probably expire in the
spools and never make it to news.tiac.net, but I hope to see several
gigs of backfill overnight. On the bright side, at least no one has to
renumber.

Thanks for your patience,
Eric

--
Eric F Paul | The only prerequisite
Manager of Software Engineering | for completing any task is unwavering
The Internet Access Company | faith that it can be done.

-----------== Posted via Deja News, The Discussion Network ==----------
http://www.dejanews.com/ Search, Read, Discuss, or Start Your Own

joet

Jan 23, 1999, 3:00:00 AM
In article <AT8q2.122$h95....@news.shore.net>,
eskw...@SPAMBLOCK.shore.net says...

> megab...@my-dejanews.com wrote:
> | Hi,
>
> | I'm sure many of you are wondering what happened to news and why it took
> | 14 hours to fix it. Basically, a power failure caused a corruption of
>
> I assume that your UPS failed too. What a coincidence...

Funny you should mention this. I just had a power failure here the other
day.

Four or five times a year the power blips and my UPSes ride it out. Nice.
But at least twice a year, like Wednesday, a UPS will crap out on its own
and take something off the air.

Sure, APCs have hot-swappable batteries and all that, but when the
electronics freak out, you're SOL. I guess statistically I'm ahead of
the game, but it's not perfect.

--
+---- joe tomkowitz ----+ When something doesn't work, really force it
| jo...@jtcs.net | because maybe it'll go. And if it doesn't
| http://www.jtcs.net | or if it breaks, you needed to fix it anyway.
+-----------------------+ -my dad, ca. 1970

Charles Demas

Jan 23, 1999, 3:00:00 AM
In article <AT8q2.122$h95....@news.shore.net>,

<eskw...@SPAMBLOCK.shore.net> wrote:
>megab...@my-dejanews.com wrote:
>| Hi,
>
>| I'm sure many of you are wondering what happened to news and why it took
>| 14 hours to fix it. Basically, a power failure caused a corruption of
>
>I assume that your UPS failed too. What a coincidence...

Additional information has it that TIAC was doing some raised floor
area modifications, had moved the news-server machine, and that someone
tripped over the power cord, unplugging the server, and when they plugged
it in again, something was screwed up with the disks.

This is so comical, it's probably true.


Chuck Demas
Needham, Mass.

--
Eat Healthy | _ _ | Nothing would be done at all,
Stay Fit | @ @ | If a man waited to do it so well,
Die Anyway | v | That no one could find fault with it.
de...@tiac.net | \___/ | http://www.tiac.net/users/demas
