Why did it crash?
From: Martin Sweeney <martin.swee...@gmail.com>
Date: Thu, 7 May 2009 01:10:13 -0700 (PDT)
Local: Thurs, May 7 2009 4:10 am
Subject: Why did it crash?
So my farm decided to crash this morning. All backups and database
bundles worked fine, and another set of instances is in its place.
Hurrah!

What concerns me is why all four instances decided to crash within 3
minutes of each other. They're not connected by anything other than
connections to databases, memcache servers, etc., but they all went
down at once.

Instance 'i-46f94bxx' found in database but not found on EC2. Crashed.
Instance 'i-9a9014xx' found in database but not found on EC2. Crashed.
Instance 'i-29009axx' found in database but not found on EC2. Crashed.
Instance 'i-27a2ccxx' found in database but not found on EC2. Crashed.

Is there anywhere I can find more info on this other than my Event
log?

M.


 
From: Alex Kovalyov <alex.koval...@gmail.com>
Date: Thu, 7 May 2009 07:11:52 -0700 (PDT)
Local: Thurs, May 7 2009 10:11 am
Subject: Re: Why did it crash?
Martin, it was a user error on the Scalr.net side. A dev version of the
poller went nuts and selectively terminated instances on a few farms
before it was killed.

On May 7, 11:10, Martin Sweeney <martin.swee...@gmail.com> wrote:


 
From: Niv <nivsin...@gmail.com>
Date: Thu, 7 May 2009 07:22:20 -0700 (PDT)
Local: Thurs, May 7 2009 10:22 am
Subject: Re: Why did it crash?
ruined my day & upcoming weekend + major data loss + ~20 extra
instances running for several hours doing nothing. yay.

On May 7, 5:11 pm, Alex Kovalyov <alex.koval...@gmail.com> wrote:


 
From: Niv <nivsin...@gmail.com>
Date: Thu, 7 May 2009 07:30:16 -0700 (PDT)
Local: Thurs, May 7 2009 10:30 am
Subject: Re: Why did it crash?
And I have to add that the cause of the major data loss is your no-good
way of doing the snapshots: once a snapshot creation starts, the
older snapshot is immediately corrupt.
Your human error caused my instances to crash mid-snapshot creation,
and when restarted, the servers failed to download the snapshot and
kept terminating.
This bug was submitted more than six months ago, and you've done
absolutely nothing to fix it.

On May 7, 5:22 pm, Niv <nivsin...@gmail.com> wrote:


 
From: Donovan Bray <donno...@gmail.com>
Date: Thu, 7 May 2009 18:41:12 -0700
Local: Thurs, May 7 2009 9:41 pm
Subject: Re: Why did it crash?
We lost instances out of several farms; luckily, no known corruption.
But I agree the snapshots should use the same pattern the MySQL dumps
do. It's basic backup practice to never overwrite your last backup
with your next; it's like backing up to the same tape every night.
You are eventually going to get bitten, and bitten hard. We created a task
to grab the periodic snapshots and store them off to S3, but would still
like to have rolling snapshots.
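A minimal sketch of the rolling-snapshot pattern described in this message, assuming a hypothetical in-memory store. `RollingSnapshots`, `keep`, and `fail_midway` are illustrative names only, not Scalr's API. Each snapshot is written under a new id rather than over the previous one, and the oldest snapshots are pruned only after a new one has completed, so an interrupted snapshot cannot corrupt the last good one.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RollingSnapshots:
    """Hypothetical model of rolling snapshots (not Scalr's actual API).

    Each snapshot gets a new id instead of overwriting the previous one,
    and old snapshots are pruned only after a new one has completed.
    """
    keep: int = 3                      # how many completed snapshots to retain
    _store: Dict[int, bytes] = field(default_factory=dict)
    _next_id: int = 0

    def __post_init__(self) -> None:
        if self.keep < 1:
            raise ValueError("keep must be at least 1")

    def create(self, data: bytes, fail_midway: bool = False) -> int:
        snap_id = self._next_id
        self._next_id += 1
        if fail_midway:
            # Simulated crash during creation: nothing is committed and the
            # previously completed snapshots are left untouched.
            raise RuntimeError(f"snapshot {snap_id} interrupted")
        self._store[snap_id] = data    # commit the new snapshot first...
        self._prune()                  # ...then drop the oldest extras
        return snap_id

    def _prune(self) -> None:
        # Everything except the newest `keep` ids is deleted.
        for old_id in sorted(self._store)[:-self.keep]:
            del self._store[old_id]

    def latest(self) -> bytes:
        return self._store[max(self._store)]

    def ids(self) -> List[int]:
        return sorted(self._store)


if __name__ == "__main__":
    snaps = RollingSnapshots(keep=2)
    snaps.create(b"day1")
    snaps.create(b"day2")
    try:
        snaps.create(b"day3", fail_midway=True)   # crash mid-snapshot
    except RuntimeError:
        pass
    print(snaps.latest())   # b'day2' (last good snapshot survives)
    snaps.create(b"day4")
    print(snaps.ids())      # [1, 3]
```

The design choice is simply ordering: write-then-prune instead of overwrite-in-place, which is the same rotation idea as keeping several dated MySQL dumps.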

On May 7, 2009, at 7:30 AM, Niv <nivsin...@gmail.com> wrote:


 
From: Cole <coleflour...@gmail.com>
Date: Thu, 7 May 2009 12:08:00 -0700 (PDT)
Subject: Re: Why did it crash?
Whoa, this is kind of a deal breaker here! Did this really happen?
RightScale is seeming quite cost-effective now if this is the case!

On May 7, 10:30 am, Niv <nivsin...@gmail.com> wrote:


 
From: rainier2 <nick.stie...@gmail.com>
Date: Mon, 6 Jul 2009 11:39:15 -0700 (PDT)
Local: Mon, Jul 6 2009 2:39 pm
Subject: Re: Why did it crash?
Hey, just looking for a little closure here.

Was this a newly deployed production poller, or was it the dev poller
that broke out of the dev sandbox?

Has Scalr.net taken any actions to prevent a similar problem in the
future?

Thanks!

On May 7, 12:08 pm, Cole <coleflour...@gmail.com> wrote:


 
From: Esé <opusdpeng...@gmail.com>
Date: Mon, 13 Jul 2009 16:41:53 -0700 (PDT)
Local: Mon, Jul 13 2009 7:41 pm
Subject: Re: Why did it crash?
hey folks,

would love to get an update on this as well. it's a little terrifying
to hear about rogue dev Scalr processes killing production farms. are
there safeguards in place now to prevent this kind of thing from happening?

hoping for a speedy response, thanks!

E.

On Jul 6, 11:39 am, rainier2 <nick.stie...@gmail.com> wrote:


 
From: Nickolas Toursky <hin...@gmail.com>
Date: Tue, 14 Jul 2009 19:04:19 +0300
Local: Tues, Jul 14 2009 12:04 pm
Subject: Re: Why did it crash?
Hi guys,

Since this happened, we have developed a new staging environment.
It gives us the ability to test new features more accurately before
deploying them live.

Nick

2009/7/14 Esé <opusdpeng...@gmail.com>:


 
From: kenvogt <kenneth.v...@gmail.com>
Date: Tue, 14 Jul 2009 10:33:08 -0700 (PDT)
Local: Tues, Jul 14 2009 1:33 pm
Subject: Re: Why did it crash?
So I'm not clear as to where things stand now. Are there rolling
snapshots or not?

On Jul 14, 9:04 am, Nickolas Toursky <hin...@gmail.com> wrote:


 