Back up! And, an explanation

Pam Sessoms

Feb 9, 2012, 1:02:38 PM

to libraryh3lp

Hey everyone,

The main US server is back up and running. Widget chats should be
fine, and all gateways have been re-enabled. Load looks good. If
you're still having any problems, please email us at
sup...@libraryh3lp.com with details.

We thought folks might appreciate a description of what caused today's
problem and also the similar short outage we had in January. Our
tweets mention hardware issues at Amazon, but that's not a complaint
about Amazon itself. It's just the shortest way to describe the root
of the problem. In fact, we're extremely pleased with Amazon, and
their infrastructure is wonderful. Basically, the main US server is
extremely busy, and any little underlying problem quickly gets
magnified by all the use. We have a plan to improve this.

Here is a slightly more technical description from Eric:

--------------------

The most recent problems with the main server (Jan 25 and Feb 8) were
resolved by moving libraryh3lp to new hardware at Amazon. I don't
want to imply that this is actually a hardware problem at Amazon (EC2
rocks!), so I thought it deserved some explanation.

The weak point in hosting with Amazon is their Elastic Block Store
(EBS) system, which is a kind of "virtual disk" over the network.
This has advantages in terms of robustness, but comes with a
significant reduction in speed compared to a local disk. Because
libraryh3lp is so busy (see our birthday blog post... and now that
we're getting into the semester, traffic is about twice that), if
there's any significant performance degradation our I/O just gets
behind and can't get caught up - at least not in the middle of the
day. The quickest way to fix this is to shut it all down and move to
new hardware. (And the great thing about EC2 and EBS is that I can do
this in 20 minutes with no loss of data.)
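
As a rough illustration of the backlog effect described above, here is
a minimal sketch. All of the numbers and names are invented for the
example; they are not measurements from libraryh3lp or EBS. It assumes
a server that normally has only a little spare I/O capacity, then loses
half of that capacity for two seconds.

```python
def simulate_backlog(arrivals, capacities):
    """Return the number of pending I/O requests after each time step.

    arrivals[i]   -- requests that arrive during step i (hypothetical)
    capacities[i] -- requests the disk can complete during step i (hypothetical)
    """
    backlog = 0
    history = []
    for arrived, capacity in zip(arrivals, capacities):
        # Work that cannot be completed in this step carries over to the next.
        backlog = max(0, backlog + arrived - capacity)
        history.append(backlog)
    return history


# A busy server: 900 requests/step against a normal capacity of 1000.
arrivals = [900] * 6
# Capacity drops to 500 for two steps, then returns to normal.
capacities = [1000, 1000, 500, 500, 1000, 1000]

history = simulate_backlog(arrivals, capacities)
print(history)  # [0, 0, 400, 800, 700, 600]
```

In this toy model the backlog builds at 400 requests per step during
the slowdown but drains at only 100 per step afterwards, because the
server has so little headroom. That is the sense in which a busy
server "can't get caught up" until traffic drops.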

The real fix is to redistribute the work among multiple instances and
multiple EBS volumes, which is what we're moving toward... but not
immediately. I need to study the traffic patterns a bit more in order
to decide the best way to go about this, but we'll probably be doing
some upgrades in the next 2-4 weeks.
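
One common way to spread load across several volumes is to assign each
unit of work to a volume by hashing a stable key. The sketch below is
only an illustration of that general idea, not necessarily the approach
that will be chosen here; the queue names and volume labels are
hypothetical.

```python
import hashlib


def pick_volume(key, volumes):
    """Map a key to one of the volumes, always the same one for the same key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer, then wrap it
    # into the range of available volumes.
    index = int.from_bytes(digest[:8], "big") % len(volumes)
    return volumes[index]


# Hypothetical volume labels and queue names, for illustration only.
volumes = ["volume-a", "volume-b", "volume-c"]
queues = ["queue-%d" % n for n in range(1000)]

counts = {name: 0 for name in volumes}
for queue in queues:
    counts[pick_volume(queue, volumes)] += 1
print(counts)
```

With a scheme like this each volume sees roughly a third of the keys,
so a slowdown on one volume affects only the work assigned to it.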

Eric

------------------

Best wishes to all,

Pam

On Thu, Feb 9, 2012 at 11:57 AM, Pam Sessoms <pses...@gmail.com> wrote:
> Hi everyone,
>
> Yes, it looks like we've had an underlying hardware problem at Amazon.
> We're currently moving the US instance to another node. Sit tight.
> Updates on Twitter:
>
> https://twitter.com/#!/libraryh3lp
>
> Pam
>
>
> On Thu, Feb 9, 2012 at 11:55 AM, Heiduschke, Victoria
> <Vict...@oregonstate.edu> wrote:
>> We were up and running earlier this morning (7:30am Pacific), but we've been kicked off and can't get back in.
>>
>> Victoria Heiduschke
>> Learning Commons Coordinator
>> Oregon State University Valley Library
>>
>> -----Original Message-----
>> From: libra...@googlegroups.com [mailto:libra...@googlegroups.com] On Behalf Of Matt
>> Sent: Thursday, February 09, 2012 8:47 AM
>> To: libraryh3lp
>> Subject: [libraryh3lp] library h3lp unavailable this morning?
>>
>> It seems that Library H3lp is unavailable this morning - anyone else having difficulty?
>> MATT
>>
