Scalr terminates VMs after Ceph failure

Steffen

Jan 6, 2017, 10:55:43 AM
to scalr-discuss
Hi,

We are running a self-hosted Scalr 5.11.22 with OpenStack Mitaka and Ceph.
We had a Ceph problem. The VMs kept running, but when we rebooted them they were removed because of the Ceph problem. That is expected.

The problem is that Scalr "terminated" the instance, and that's not what I expected, because we can restore the VM image in Ceph.

Is that expected behavior from Scalr, or can we configure this somehow?

Many Thanks

Steffen

Marc O'Brien

Jan 6, 2017, 11:18:12 AM
to scalr-discuss
Hi Steffen,

Typically Scalr will only terminate running instances when autoscaling down, or when the cloud platform reports the instance as failed (or missing, in some cases).  Take a look at the System Log tab in the Scalr UI and /opt/scalr-server/var/log/service/*.log for more details about the termination process related to the servers in question.  This will give us more information about why Scalr is terminating the instance.
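
A minimal sketch of that log search, assuming the service logs are plain text at the path mentioned above and that the server's ID appears verbatim in the relevant lines. The function name `find_server_lines` is only illustrative and is not part of Scalr.

```python
import glob

# Log location quoted in this reply; adjust it if your install differs.
LOG_GLOB = "/opt/scalr-server/var/log/service/*.log"


def find_server_lines(server_id, pattern=LOG_GLOB):
    """Return (path, line_number, text) for each log line mentioning server_id.

    Hypothetical helper: it does a plain substring match and knows nothing
    about Scalr's log format.
    """
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a log has odd bytes in it.
        with open(path, errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if server_id in line:
                    hits.append((path, number, line.rstrip("\n")))
    return hits
```

Calling `find_server_lines("<server-id>")` with the ID of the affected server returns every matching line together with its file and line number, which is usually enough to line up the termination with what the cloud reported at that time.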

Many thanks,
Wm. Marc O'Brien
Scalr Technical Support

Steffen Wirth

Jan 6, 2017, 11:21:36 AM
to scalr-...@googlegroups.com
Hi Marc,

Autoscaling is disabled for the farm, but the instance was marked as ERROR in OpenStack and got removed. Is it possible to configure Scalr to show a warning that it can't find the VM in the cloud, rather than terminating "the rest of it" in Scalr?

Scalr logs: Jan 06 14:08:15 +00:00 - FarmLog@6456 - WARN - [FarmID: 388] Server '1e37700b-959e-4eb1-9141-9c39c338b0cc' (Platform: openstack) was terminated in cloud or from within an OS. Status: ERROR (Fault message: ).
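
For anyone who wants to pull these events out of the logs programmatically, here is a rough sketch of a parser for that message. The pattern is inferred from the single line quoted above, so the field layout is an assumption and other Scalr versions may word it differently. The names `TERMINATION_RE` and `parse_termination` are made up for this example.

```python
import re

# Pattern inferred from one sample line only; treat the layout as an assumption.
TERMINATION_RE = re.compile(
    r"^(?P<timestamp>\w{3} \d{2} \d{2}:\d{2}:\d{2} [+-]\d{2}:\d{2}) - "
    r"(?P<logger>\S+) - (?P<level>\w+) - "
    r"\[FarmID: (?P<farm_id>\d+)\] "
    r"Server '(?P<server_id>[0-9a-f-]+)' \(Platform: (?P<platform>\w+)\) "
    r"was terminated in cloud or from within an OS\. "
    r"Status: (?P<status>\w+) \(Fault message: (?P<fault>.*)\)\.$"
)


def parse_termination(line):
    """Return the fields of a termination warning as a dict, or None if no match."""
    match = TERMINATION_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["farm_id"] = int(fields["farm_id"])  # numeric in the sample line
    return fields
```

Run against the line above, this yields the farm ID, the server ID, the platform, and the status (`ERROR`), with an empty fault message. Filtering on `status` is one way to separate instances that OpenStack flagged as errored from ones that were simply deleted.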

Steffen

--
You received this message because you are subscribed to a topic in the Google Groups "scalr-discuss" group.
To unsubscribe from this topic, visit https://groups.google.com/d/topic/scalr-discuss/4TKnqUmNhPk/unsubscribe.
To unsubscribe from this group and all its topics, send an email to scalr-discuss+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.



--
Steffen Wirth
System Engineer

glispa GmbH
Sonnenburger Straße 73
10437 Berlin, Germany

tel: +49 30 6098483-0
fax: +49 30 6098483-99
skype: steffenoffice
steffen.wirth@glispamedia.com
www.glispa.com

Sitz Berlin, AG Charlottenburg HRB 114678B

Marc O'Brien

Jan 6, 2017, 11:41:42 AM
to scalr-...@googlegroups.com
Hi Steffen,

This is indeed expected behavior on this version. Scalr takes the cloud-native approach to resource management and will terminate failed instances to free up resources for a new replacement. We recognize that, although keeping failed instances around is not a cloud-native approach, in some cases users will need to prevent Scalr from terminating failed or errored instances so that they may complete their own investigations or issue remediation. This use case is planned to be covered in the next Enterprise Scalr release.

Steffen Wirth

Jan 6, 2017, 11:46:13 AM
to scalr-...@googlegroups.com
Hi Marc,

Thank you very much. Looking forward to it :)

Steffen

Brandon Newport

Apr 21, 2017, 3:47:36 PM
to scalr-discuss
Hello Steffen,

We are using the same version of Community Scalr and we ran into the same problem with OpenStack Kilo. See a workaround that was provided to us by another Scalr user here.

We have implemented this in our environment and it has been working great. When a system in OpenStack goes into an error state, Scalr will not terminate the instance and rebuild it, giving you the time to do any type of restore you need. Keep in mind this changes the built-in functionality that Scalr offers to rapidly rebuild your instance.

Hope this helps.

Brandon
