BGG runs on a shoestring budget and can't afford the sort of hardware
that would allow for continuous service during maintenance.
Today there are several things going on. First, one of our MongoDB
(database) servers, sackson, is being replaced with a new system. The
goal was to transfer the database to the new server AND pull the old
system from the rack on the same trip. (BGG has only one rack of
servers, and it is full--any time we install a new server, we have to
remove an old one.) To do that, we needed to make a full backup
of the database while the site was down--if it were up, we would be
constantly writing to the database. That full backup alone takes a
long time--a couple of hours, I believe. So we shut down the system
and started the backup before Aldie headed to the server room.
Another reason for the trip was that one of the hard drives had failed
on our main file server. To replace the drive, we again made a
full backup before shutting it down. That backup took longer than
expected.
An additional purpose of the trip was to increase the hard drive
capacity in three of our systems. This meant swapping out about 20
hard drives, and this also apparently did not go as smoothly as we
would have liked.
Finally, two additional servers were installed.
As you can see, there was a lot going on in this maintenance trip. If
we could afford to pay someone to be in charge of our hardware, and if
we could afford more redundancy in our servers, we could certainly do
our maintenance with less downtime. Considering our resources,
though, I think we do OK.
-Daniel