Saturday Update

Scott Alden

Mar 31, 2012, 1:14:50 PM
to bgg_down
Hey Y'all, I wanted to give an update as we're about to head back over
to the ISP and do some more work.

Last night we rebuilt the RAID system on our main file server (due to
a failed hard drive) - even though we have a full backup on another
server, I wanted to have another regular backup before rebuilding the
RAID. This took a bit longer than expected because of the way rsync
calculates checksums. However, this wasn't the problem that kept me
up till 6am.

In addition to replacing a failed drive on the main file server, we
wanted to finally retire one of our oldest servers (Sackson). Sackson
runs a database called MongoDB (www.mongodb.org). It's been growing
over time, and we intended to migrate it to one of our newer machines.
The process of backing up and importing a mongo database takes a long
time due to the sheer size of the data and the indices that this
database uses.

During the setup of mongo on a new server, I started noticing the OS
would "stall" several times when writing to disk. It turns out that a
combination of a virtual server and the particular hard drive
configuration on this machine doesn't seem to work properly - it takes
far too long to write the data to disk (this is a problem to solve for
a different time). Around 4am I decided to abandon the new server and
go back to a non-virtualized server.

We're going back in today and reinstalling a different OS and moving
things around.

I really apologize for the delay - nobody wants the site back up more than me.

--
Aldie

Mat Nowak

Mar 31, 2012, 1:19:59 PM
to bgg_...@googlegroups.com, bgg_down
For those of us that are not as technical, what does this mean in terms of an ETA?

AtraAngelis

Mar 31, 2012, 1:26:50 PM
to bgg_...@googlegroups.com, bgg_down


On Saturday, March 31, 2012 1:19:59 PM UTC-4, Mat Nowak wrote:
> For those of us that are not as technical, what does this mean in terms of an ETA?


This means it will be a few more hours. I would rather they rebuild a stable platform than force a setup that's not optimal. If that takes a day or two, so be it.

mister lunch

Mar 31, 2012, 1:36:11 PM
to BGG Down
Hang in there!

JoergH

Mar 31, 2012, 1:38:28 PM
to bgg_...@googlegroups.com, bgg_down
Hope you got some sleep after that night. I appreciate your work very much and am looking forward to when you're done!

Mathew H-E

Mar 31, 2012, 1:42:02 PM
to BGG Down
Thanks for keeping us in the loop! I can't say that I understand
everything, but working until 6 this morning shows a lot of dedication
to your site and to us--I really appreciate it.

Best of luck with your work this weekend!

Helljin

Mar 31, 2012, 1:43:37 PM
to BGG Down
Thanks for the update, the hard work is appreciated.

When BGG is back up, I'll quit procrastinating and donate for my 2012
micro badge.

malzspiele

Mar 31, 2012, 1:44:35 PM
to bgg_...@googlegroups.com
Good luck getting everything up and running again soon.

I'll spend the time playing some more games...  :-)

Jennifer Schlickbernd

Mar 31, 2012, 2:15:28 PM
to BGG Down
Get some sleep! You'll be a lot more effective and we can wait.

Keith Koene

Mar 31, 2012, 1:33:36 PM
to BGG Down
Thanks for the update...being an AIX/Linux admin I feel your pain with
upgrading and migrating databases...you never know what you get until
it doesn't work. Keep up the good fight!

Brian Dunkle

Mar 31, 2012, 1:57:54 PM
to BGG Down

Yike. Yeah, any kind of database server with decent traffic is likely
to struggle in a virtual environment...since it's going to eat the
disk I/O. You can do all sorts of prioritization and optimization,
but...much better not to make it share, if you can help it. And set up
the fastest RAID you can afford. My goal is to have a secondary/
failover server that is a VM, with the understanding that the
performance will suck but at least people won't be down. With the size
of BGG, that may not be possible - I imagine it would just thrash and
die if it wasn't powerful enough.

Rebuilding RAIDs is always nerve-wracking. Very wise to make a new
full backup first. I have had a couple of cases where the set just
crapped on me when rebuilding after replacing one disk. Once where it
proceeded to rebuild off of the wrong disks (if disks 1+2+3+4+5 were
the degraded set, and 6 was the new disk replacing a broken 6, it
rebuilt using 1+2+4+5+6 as if 3 were the new disk). And of course it
can take forever.

Anyway...as a 20-year server guy I feel your pain.

Sorry to geek out, have a nasty cold and of course nowhere to let my
dopiness out with BGG down. :)

CoolApBro

Mar 31, 2012, 3:52:47 PM
to bgg_...@googlegroups.com, bgg_down
Wow, thanks a lot for sharing all the details. As a Linux system admin I really appreciate knowing what's going on, so I can relate and learn from the info.

Best regards

batcut

Mar 31, 2012, 4:22:59 PM
to bgg_...@googlegroups.com, bgg_down
Thanks for the updates. Being an IT person, there is nothing more annoying than constantly being asked "when will you have it done". It will be done when it's done.

Marco Arnaudo

Mar 31, 2012, 4:29:55 PM
to BGG Down
thank you for the dedication and all the great work, Scott!!!


Wulf Corbett

Mar 31, 2012, 4:34:14 PM
to BGG Down
Usually it's my job to tell everyone why the servers are down...

Yes, we did say it would only be an hour.
Yes, it's been a lot longer.
No, I can't say exactly how long it'll be.
No, I can't tell you exactly what's wrong, the people who could tell
ME that are in the Data Centre trying to fix it...

I'm struggling not to become one of them...

peacmyer

Mar 31, 2012, 5:56:04 PM
to BGG Down
Back to ASL, then dinner and a movie! (Man, I could get Bounding Fire
Productions' _Blood and Jungle_ for the cost of the dinner and
movie....)

Mohrlock

Mar 31, 2012, 7:29:37 PM
to bgg_...@googlegroups.com
I hope someone creates a "Sackson server must die!" microbadge for when the server is back up!

Sorry you've got the weight of the world on your shoulders getting things up and running! Please make sure you take your time getting things back in order - we don't need burnt out admins!

Keep up the great work - the majority of us are all happy to wait while things get back in order. Take care Aldie & team :)

Brian Dunkle

Mar 31, 2012, 8:01:39 PM
to bgg_...@googlegroups.com, bgg_down
I beg to differ. More annoying are the people who feel the need to impress upon you just how important it is that the server comes back up, as if you're not already working on it.
Complaining is fine. Whining is even fine. But somehow the other is insulting.

And I DON'T mean anyone here.

Stasia Doster

Mar 31, 2012, 8:26:51 PM
to bgg_...@googlegroups.com, bgg_down
Thanks for all your hard work, Scott.  As much as I miss BGG when it is down, I will be glad that it is up and running again in top shape. I appreciate you and your team doing all this work to keep us BGGers satisfied.

SamNZed

Mar 31, 2012, 8:40:15 PM
to BGG Down
Keep up the good work and take it easy

Boze

Mar 31, 2012, 9:18:02 PM
to BGG Down
Go go power aldie!

Brian Cooksey

Mar 31, 2012, 10:54:15 PM
to BGG Down
Thanks for the update and for all your hard work.

superflat3000

Mar 31, 2012, 11:30:20 PM
to bgg_...@googlegroups.com, bgg_down
BGG isn't a Mom & Pop operation anymore and hasn't been for years. Its "Aw, shucks, folks" attitude doesn't reflect the reality of its use by the users or owners.
From IT to the Admins (apologies), BGG needs to grow out of the buddy system to something resembling a social network in the teens of the 21st Century.
The only reason this downtime is acceptable is that there's really nowhere else to go.
That's part of the problem. If there was somewhere else to go, this wouldn't happen.

Am I angry? You bet. I can't think of anywhere else on the internet where this would be acceptable.

cpf

Mar 31, 2012, 11:40:23 PM
to bgg_...@googlegroups.com, bgg_down
On Sunday, 1 April 2012 11:30:20 UTC+8, superflat3000 wrote:
> BGG isn't a Mom & Pop operation anymore and hasn't been for years. Its "Aw, shucks, folks" attitude doesn't reflect the reality of its use by the users or owners.
> From IT to the Admins (apologies), BGG needs to grow out of the buddy system to something resembling a social network in the teens of the 21st Century.
> The only reason this downtime is acceptable is that there's really nowhere else to go.
> That's part of the problem. If there was somewhere else to go, this wouldn't happen.

Why don't you pay some full-time admin a decent salary with a decent budget for hardware if you think it is unacceptable? Hell, go build a new site and compete with them; when they are down, people will use your site. Maybe.

Seriously, the admins are doing their best, and in the world of computers it's not like magic where I want this and I get this. No, computers are strange and temperamental. Every plan you have can go out the window when you execute it. To ensure safety, backups are needed and more time is needed. So stop whining if you can't afford to pay for a premium version of this site, or create one since there's none yet.

To the admins, thank you for your time on this. I hope you can find a long-term, stable solution.

bsm...@gmail.com

Mar 31, 2012, 11:40:59 PM
to bgg_...@googlegroups.com, bgg_down
Wow Scott, you gave me a cold shiver in all my IT bones. I don't envy you the task you have at hand. There's nothing worse than having to back-pedal because of an unanticipated behaviour in a software/hardware config.

As for a completion ETA, I am reminded of Rex Harrison and Charlton Heston in The Agony and the Ecstasy:

Pope Julius II: Buonarroti, when will you make an end?
Michelangelo: When I am finished!

I'm not a religious man, but I used to use this when people asked how long it would take to do anything to the computers I used to support.   

morlockhq

Mar 31, 2012, 11:46:55 PM
to bgg_...@googlegroups.com, bgg_down
You realize boardgames are a niche hobby, right? From all the interviews that I've heard and read, BGG makes just enough money to run and to hire a bare bones crew. That's it!

frumpish

Mar 31, 2012, 11:48:24 PM
to BGG Down
Huh, here I thought it was down for the UI redesign being implemented.


On Mar 31, 10:40 pm, "bsm...@gmail.com" <bsm...@gmail.com> wrote:
> Wow Scott, you gave me a cold shiver in all my IT bones. I don't envy you
> the task you have at hand. There's nothing worse than having
> to back-pedal because of an unanticipated behaviour in a software/hardware
> config.
>
> As for a completion ETA. I am reminded of Rex Harrison and Charlton Heston
> in the Agony and the Ecstasy:
>
> *Pope Julius II <http://www.imdb.com/name/nm0001322/>*: Buonarroti, when
> will you make an end?
> *Michelangelo <http://www.imdb.com/name/nm0000032/>*: When I am finished!

MSweazey

Mar 31, 2012, 11:59:32 PM
to bgg_...@googlegroups.com, bgg_down
I have to say that I don't understand the whole "this is unacceptable" thing! Nobody forces anybody here to pay money to enjoy the services that we use. My wife and I both donate every year just because we get a whole lot out of the site. However, it is not providing a life-or-death service (I know, blasphemy! :) ), nor is it something we are being taxed for. It's a website. If this is the worst thing that happens to me, well, this week, then it has been a darn good week! A little perspective would be helpful...deep breaths, maybe walk outside. I bet this is a much bigger pain for those working on the fix than it is for us. Whose weekend do you think it ruined more?

BitJam

Apr 1, 2012, 12:04:07 AM
to bgg_...@googlegroups.com, bgg_down
On Saturday, March 31, 2012 9:30:20 PM UTC-6, superflat3000 wrote:
> BGG isn't a Mom & Pop operation anymore and hasn't been for years. Its "Aw, shucks, folks" attitude doesn't reflect the reality of its use by the users or owners.
> From IT to the Admins (apologies), BGG needs to grow out of the buddy system to something resembling a social network in the teens of the 21st Century.
> The only reason this downtime is acceptable is that there's really nowhere else to go.
> That's part of the problem. If there was somewhere else to go, this wouldn't happen.
>
> Am I angry? You bet. I can't think of anywhere else on the internet where this would be acceptable.

I agree that after the current crisis has been averted, and the people doing the averting have been able to catch up on sleep and other things, it will be time for a rethink about what we as a group expect from BGG and what we as a group are willing to do to help make BGG meet our expectations.
 
From my point of view, BGG is still kind of like a Mom & Pop operation which is probably its main attraction for me.  If it becomes massively commercialized like Facebook and other major social networks then I'm outa here.  An outage this long was not intended.  I think it is clear to everyone that it is not a good situation.   I'm reminded of a quote by Ambrose Bierce:

    "You acted unwisely," I cried, "as you see
     By the outcome." He calmly eyed me:
    "When choosing the course of my action," said he,
    "I had not the outcome to guide me."


Greg Melhuish

Mar 31, 2012, 6:49:20 PM
to BGG Down
That annoyance is hardly limited to the IT world, sadly.

GreenDude

Mar 31, 2012, 7:34:38 PM
to BGG Down
Hey Aldie,

I hope that you take all this angst and whining as proof positive that
what you're doing really matters to a lot of people. That's gotta
feel good!

(Okay, so it's not so much "angst and whining" as disconsolate
muttering, but you know what I mean. ;) )

Dean.

winterplum

Mar 31, 2012, 6:28:09 PM
to BGG Down
Thank you for the update. You guys are absolutely tops!
...and don't overdo it. Stay healthy.

Tegs

Apr 1, 2012, 1:11:54 AM
to bgg_...@googlegroups.com, bgg_down
Funny Aldie didn't mention the fire!

Apocryphile

Apr 1, 2012, 1:29:03 AM
to bgg_...@googlegroups.com, bgg_down
I must be psychic - I knew when they originally said "intermittent" down time for two days that it would be an entire 48+ hours of uninterrupted blackout.  How did I know that?


Shadow Hexagram

Apr 1, 2012, 2:13:43 AM
to bgg_...@googlegroups.com
To Aldie & the Admins, my sincere thanks for your dedication in bringing us a good, up-and-running website. Of course I miss my RPGG fix of the weekend, but it can wait. Just wanted to show some support.

GROGnads

Apr 1, 2012, 2:34:26 AM
to bgg_...@googlegroups.com, bgg_down
THIS was due'd to a "Smoldering Office Dalliance" that got itself out of hand & clothes & control into a 'burning conflagration of desire' so after 4 hours the Paramedics were summoned, "true story!"