Unable to spin up new EC2 instances after upgrading to 5.11.22 (Community Edition)

76 views
Skip to first unread message

Jay Farschman

unread,
Jul 7, 2016, 1:32:38 PM7/7/16
to scalr-discuss
After upgrading to 5.11.22 (Community Edition), I am unable to spin up new instances. The instances become stuck in various states. For example, if I spin up 3 instances I'll get 2 stuck in "pending" status, and 1 will make it to "Initializing". Very rarely, perhaps one time in 10 and instance will properly complete all of the required steps and show a status of "Running". In all cases the instances start in Amazon Ec2 and I can log in.



# scalarizr -v
Scalarizr 4.7.14

# ps -ef |grep scal
root      1354     1  0 16:30 ?        00:00:00 /opt/scalarizr/embedded/bin/python /usr/bin/scalr-upd-client --daemonize --pid-file /var/run/scalr-upd-client.pid --log-file /var/log/scalarizr_update.log
root      1415     1  0 16:30 ?        00:00:11 /opt/scalarizr/embedded/bin/python /usr/bin/scalarizr -c /etc/scalr

ON A SERVER STUCK PENDING
The scalarizr_debug.log hangs at:
2016-07-07 16:29:27,764+00:00 - DEBUG - scalarizr.messaging.p2p.consumer - Waiting message acknowledge event: HostInit

ON A SERVER STUCK INITALIZING
The scalarizr_debug.log gets past this because it received and decodes a 'control' message. It does a number of other things, and then shortly after completing a dpkg-query it starts attempts to start some message handlers and the logs stop.

1167 2016-07-07 16:30:48,102+00:00 - DEBUG - scalarizr.messaging.p2p.consumer - Starting message subhandler thread: BeforeHostUpHandler
1168 2016-07-07 16:30:48,102+00:00 - DEBUG - scalarizr.messaging.p2p.consumer - Waiting message subhandler thread: BeforeHostUpHandler
1169 2016-07-07 16:30:48,102+00:00 - DEBUG - scalarizr.messaging.p2p.consumer - Starting message handler

I am struggling a bit with reasons this could happen, but I am leaning toward a load issue causing a timeout.

top - 17:29:08 up 130 days, 15:58,  2 users,  load average: 2.62, 3.27, 3.31

Tasks: 232 total,   3 running, 229 sleeping,   0 stopped,   0 zombie

%Cpu(s):  7.1 us,  5.5 sy,  0.0 ni, 79.4 id,  0.3 wa,  0.0 hi,  0.0 si,  7.7 st

KiB Mem:   3854816 total,  3436444 used,   418372 free,   103544 buffers

KiB Swap:        0 total,        0 used,        0 free.   213720 cached Mem


I'm currently testing with Ubuntu Images loaded with the 4.7.14 agent.  I'll try a different operating system and perhaps some different agent versions.

Jay Farschman

unread,
Jul 7, 2016, 2:20:58 PM7/7/16
to scalr-discuss
ON THE SERVER SIDE

 I dug into the server logs and get the following in my php-scalarizr_messaging.log:

Jul 07 11:07:39 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@23619 - ERROR - Cannot handle message 'HostInit' (message_id: 0d58b6be-d0a1-48af-9532-133bbaad76c1) from server '52.40.180.82' (server_id: 567174e9-0d52-4e06-b9ca-675f6be3a4e6). Could not find the service "environment" in the DI container.(/opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:91)

and when I did a grep "Could not find the service"  php-scalarizr_messaging.log I got a lot of error messages from my testing today with different messages that could not be handled (HostInit, RebootFinish, WinHostdown, etc.).

Igor Savchenko

unread,
Jul 7, 2016, 2:29:57 PM7/7/16
to scalr-...@googlegroups.com
Hi Jay,

We're looking into this.

Regards,
Igor

--
You received this message because you are subscribed to the Google Groups "scalr-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scalr-discus...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Jay Farschman

unread,
Jul 7, 2016, 4:01:12 PM7/7/16
to scalr-discuss
I'm gonna take a brief break and walk the dog but I can leave it like this for a few more hours.

If you want to do a google hangout, let me know.

I'm thinking of forcing the a reinstall of the packages

Vitaliy Demidov

unread,
Jul 7, 2016, 5:01:29 PM7/7/16
to scalr-discuss
Hi, Jay
May I ask you to replace "({$e->getFile()}:{$e->getLine()})" with $e->getTraceAsString()  in

app/src/Scalr/System/Zmq/Cron/Task/ScalarizrMessaging.php

$logger->error(sprintf("Cannot handle message '%s' (message_id: %s) " . "from server '%s' (server_id: %s). %s",
                        $message->getName(),
                        $message->messageId,
                        $dbserver->remoteIp ? $dbserver->remoteIp : '*no-ip*',
                        $dbserver->serverId,
                        $e->getMessage() . "({$e->getFile()}:{$e->getLine()})"
                    ));

Restart zmq_services and provide error messages from the log. It should include debug trace that can help a lot.

Jay Farschman

unread,
Jul 7, 2016, 5:25:48 PM7/7/16
to scalr-discuss
That did not seem to deliver any better information, but I'm not sure I did what you intended:

CHANGED app/src/Scalr/System/Zmq/Cron/Task/ScalarizrMessaging.php
$logger->error(sprintf("Cannot handle message '%s' (message_id: %s) " . "from server '%s' (server_id: %s). %s",
    $message->getName(),
    $message->messageId,
    $dbserver->remoteIp ? $dbserver->remoteIp : '*no-ip*',
    $dbserver->serverId,
    $e->getMessage() . "$e->getTraceAsString()"
));

OUTPUT of php-scalarizr_messaging.log
Jul 07 14:12:09 -07:00 - scalarizr_messaging@24491 - SERVICE - Client recieved termination SIGNAL:15
Jul 07 14:12:19 -07:00 - scalarizr_messaging@12760 - SERVICE - Starting scalarizr_messaging...
Jul 07 14:12:20 -07:00 - scalarizr_messaging@12760 - SERVICE - Memory usage: 20.93 MB
Jul 07 14:16:48 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@23926 - ERROR - Cannot handle message 'HostInit' (message_id: c48a0c07-ff9e-4a97-b1e0-8447eaff0ba8) from server '50.112.197.90' (server_id: 371c501c-61fe-4e81-8f0a-6fa82b83239b). Could not find the service "environment" in the DI container.(/opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:91)
Jul 07 14:16:48 -07:00 - scalarizr_messaging@23926 - SERVICE - Memory usage: 18.25 MB
Jul 07 14:17:23 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@23617 - ERROR - Cannot handle message 'HostInit' (message_id: 0c45e7b8-cb8a-46d6-80ff-19a57a1af31e) from server '50.112.32.218' (server_id: 0fc7c35f-6f1e-4d9c-a55b-40e6bbaf4f8e). Could not find the service "environment" in the DI container.(/opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:91)
Jul 07 14:17:23 -07:00 - scalarizr_messaging@23617 - SERVICE - Memory usage: 19.89 MB
Jul 07 14:17:49 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@23619 - ERROR - Cannot handle message 'HostInit' (message_id: c7198f93-abc4-46bb-8d71-a151d0657112) from server '52.38.110.229' (server_id: d84d3905-5bc9-4454-93e5-40281d4956b7). Could not find the service "environment" in the DI container.(/opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:91)
Jul 07 14:17:49 -07:00 - scalarizr_messaging@23619 - SERVICE - Memory usage: 18.65 MB
Jul 07 14:17:55 -07:00 - DBServer@23930 - INFO - [FarmID: 128] Sending message 'HostInitResponse' via REST to server '52.40.134.120' (server_id: 283b330e-f52a-4479-b1a7-56328bc5447d) completed
Jul 07 14:17:55 -07:00 - scalarizr_messaging@23930 - SERVICE - Memory usage: 36.08 MB
Jul 07 14:18:01 -07:00 - scalarizr_messaging@23949 - SERVICE - Memory usage: 24.59 MB
Jul 07 14:18:08 -07:00 - scalarizr_messaging@23962 - SERVICE - Memory usage: 25.00 MB
Jul 07 14:18:14 -07:00 - scalarizr_messaging@23717 - SERVICE - Memory usage: 22.81 MB
Jul 07 14:18:40 -07:00 - DBServer@7367 - INFO - [FarmID: 128] Sending message 'HostInitResponse' via REST to server '52.37.21.229' (server_id: f5532f40-b52c-4b40-a059-bec7c93fad44) completed
Jul 07 14:18:40 -07:00 - scalarizr_messaging@7367 - SERVICE - Memory usage: 33.80 MB
Jul 07 14:18:45 -07:00 - Scalr@12174 - FATAL - Exception thrown in Scalr::FireEvent(Scalr\Observer\MessagingEventObserver:BeforeHostUp, /opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:119): keychain() is not callable or does not exist.
Jul 07 14:18:45 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@12174 - ERROR - keychain() is not callable or does not exist.
Jul 07 14:18:45 -07:00 - scalarizr_messaging@12174 - SERVICE - Memory usage: 23.85 MB
Jul 07 14:19:11 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@23926 - ERROR - Cannot handle message 'HostInit' (message_id: dc60250e-92e3-4e21-aadf-917922f97eb0) from server '52.42.14.129' (server_id: 0cb11e57-fce4-44d0-93a6-2da95a6ac8e6). Could not find the service "environment" in the DI container.(/opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:91)

Igor Savchenko

unread,
Jul 7, 2016, 5:29:06 PM7/7/16
to scalr-...@googlegroups.com
Hi Jay,

Did you run: "scalr-server-ctl reconfigure" after upgrade?

Regards,
Igor

Jay Farschman

unread,
Jul 7, 2016, 5:54:49 PM7/7/16
to scalr-discuss
No, I use scale-server-manage and restarted just zmq_service there.  But I'm doing a reconfigure now.  Always a good idea.

Igor Savchenko

unread,
Jul 7, 2016, 5:56:39 PM7/7/16
to scalr-...@googlegroups.com
What exactly you did with scale-server-manage? Running scalr-server-ctl reconfigure - is a part of a standard upgrade process. Update itself is a pretty much install a new package and run "scalr-server-ctl reconfigure". 

Regards,
Igor


Jay Farschman

unread,
Jul 7, 2016, 6:09:51 PM7/7/16
to scalr-discuss
From the scalr-server-manage superviror> prompt I did a "restart zmq_service "

I did a reconfigure after the upgrade and just did another one after the code change.  

This last batch of system came up like this:


NEW OUTPUT - AFTER Reconfigure
Jul 07 14:47:58 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@12176 - ERROR - keychain() is not callable or does not exist.
Jul 07 14:47:58 -07:00 - scalarizr_messaging@12176 - SERVICE - Memory usage: 24.65 MB
Jul 07 14:49:06 -07:00 - DBServer@7367 - INFO - [FarmID: 128] Sending message 'HostInitResponse' via REST to server '52.42.43.86' (server_id: 3cac54a2-4f38-43f2-95e2-ae516ac86fc3) completed
Jul 07 14:49:06 -07:00 - scalarizr_messaging@7367 - SERVICE - Memory usage: 34.44 MB
Jul 07 14:49:16 -07:00 - Scalr@23705 - FATAL - Exception thrown in Scalr::FireEvent(Scalr\Observer\MessagingEventObserver:BeforeHostUp, /opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:119): keychain() is not callable or does not exist.
Jul 07 14:49:16 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@23705 - ERROR - keychain() is not callable or does not exist.
Jul 07 14:49:16 -07:00 - scalarizr_messaging@23705 - SERVICE - Memory usage: 23.57 MB
Jul 07 14:52:20 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@23926 - ERROR - Cannot handle message 'HostInit' (message_id: 433b089e-07d4-48cc-98f4-305b4d5cdb77) from server '52.38.189.90' (server_id: b28598e2-b367-42f9-a19e-0ba45767e834). Could not find the service "environment" in the DI container.(/opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:91)
Jul 07 14:52:20 -07:00 - scalarizr_messaging@23926 - SERVICE - Memory usage: 18.40 MB
Jul 07 14:52:35 -07:00 - scalarizr_messaging@12760 - SERVICE - Memory usage: 21.26 MB
Jul 07 14:59:30 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@23617 - ERROR - Cannot handle message 'HostInit' (message_id: 8cb38b7e-bd53-43ce-9cbb-cbb4092bcdbc) from server '50.112.212.4' (server_id: b44665fa-fb1a-4b6d-9c13-a17270ef0a5d). Could not find the service "environment" in the DI container.(/opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:91)
Jul 07 14:59:30 -07:00 - scalarizr_messaging@23617 - SERVICE - Memory usage: 19.90 MB
Jul 07 15:02:24 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@23619 - ERROR - Cannot handle message 'HostInit' (message_id: 6371df4c-6760-4332-ad2d-fa2a07dfd10e) from server '52.41.124.191' (server_id: ce467758-8bf5-4ebd-a538-cc059237f960). Could not find the service "environment" in the DI container.(/opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:91)
Jul 07 15:02:24 -07:00 - scalarizr_messaging@23619 - SERVICE - Memory usage: 18.99 MB
Jul 07 15:02:39 -07:00 - scalarizr_messaging@12760 - SERVICE - Memory usage: 21.26 MB
Jul 07 15:03:26 -07:00 - DBServer@23930 - INFO - [FarmID: 128] Sending message 'HostInitResponse' via REST to server '52.10.196.107' (server_id: 1edfabf6-509c-4550-bff7-8947394fef22) completed
Jul 07 15:03:26 -07:00 - scalarizr_messaging@23930 - SERVICE - Memory usage: 36.86 MB
Jul 07 15:03:32 -07:00 - scalarizr_messaging@23955 - SERVICE - Memory usage: 25.25 MB

Fatal error: require(): Failed opening required '/opt/scalr-server/embedded/scalr/app/src/Scalr/AuditLogger.php' (include_path='.::/opt/scalr-server/embedded/scalr/app/src:/opt/scalr-server/embedded/scalr/app/src') in /opt/scalr-server/embedded/scalr/app/src/autoload.inc.php on line 123
Jul 07 15:05:25 -07:00 - scalarizr_messaging@12760 - FATAL - Some worker failed!
Jul 07 15:05:30 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@23926 - ERROR - Cannot handle message 'HostInit' (message_id: 940a7c7c-67d2-4cf0-ab94-6f04de31daee) from server '52.42.56.102' (server_id: 8a5435a1-1e9e-459d-950e-27d271f5c002). Could not find the service "environment" in the DI container.(/opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:91)
Jul 07 15:05:30 -07:00 - scalarizr_messaging@12206 - SERVICE - Memory usage: 23.17 MB
Jul 07 15:05:30 -07:00 - scalarizr_messaging@23926 - SERVICE - Memory usage: 18.33 MB
Jul 07 15:05:30 -07:00 - DBServer@7367 - INFO - [FarmID: 128] Sending message 'HostInitResponse' via REST to server '52.42.54.72' (server_id: 1dbf28d0-a6ab-4505-bc74-52e26143c355) completed
Jul 07 15:05:31 -07:00 - scalarizr_messaging@7367 - SERVICE - Memory usage: 35.13 MB
Jul 07 15:05:37 -07:00 - Scalr@12172 - FATAL - Exception thrown in Scalr::FireEvent(Scalr\Observer\MessagingEventObserver:BeforeHostUp, /opt/scalr-server/embedded/scalr/app/src/Scalr/DependencyInjection/BaseContainer.php:119): keychain() is not callable or does not exist.
Jul 07 15:05:37 -07:00 - Scalr\System\Zmq\Cron\Task\ScalarizrMessaging@12172 - ERROR - keychain() is not callable or does not exist.
Jul 07 15:05:37 -07:00 - scalarizr_messaging@12172 - SERVICE - Memory usage: 23.78 MB

Igor Savchenko

unread,
Jul 7, 2016, 6:40:48 PM7/7/16
to scalr-...@googlegroups.com
Can you please provide an output of scalr-server-ctl reconfigure run.

Thanks,
Igor

Jay Farschman

unread,
Jul 7, 2016, 8:41:39 PM7/7/16
to scalr-discuss
Sure... attached

Thanks for looking at this.

scalr-server-ctl-reconfigure.lst

Jett

unread,
Jul 7, 2016, 9:12:25 PM7/7/16
to scalr-discuss
I'm having the same issue - hoping it can be resolved relatively quickly.  Can't seem to get any instances to start up.

Jett

unread,
Jul 8, 2016, 8:45:04 AM7/8/16
to scalr-discuss
Just to add a bit more info here.  I downgraded, ran the scalr reconfigure script, then upgraded again and ran the reconfigure script again and I seem to be able to launch instances (at least the first one I tried to launch actually ran).  I wonder if something failed during the update initially.

Jay Farschman

unread,
Jul 8, 2016, 8:49:32 AM7/8/16
to scalr-discuss
Very interesting, Jett.

What process did you use to downgrade?  I'll probably sit on this issue a bit more today and wait to hear back from Vitaliy and Igor, but if people grumble too much I like you backup plan.

Igor Savchenko

unread,
Jul 8, 2016, 9:24:39 AM7/8/16
to scalr-...@googlegroups.com
Jett, downgrade officialy not supported. Installing old package won't downgrade your db schema, so make sure you reverted it from the backup.

Jay, can you please provide a dump of upgrades table from the scalr db.

We've tested new deployments and ipgrades as well and didn't find any issues. So something went terribly wrong with your deployment, most likely db update failed for some reason. 

Jett

unread,
Jul 8, 2016, 9:36:59 AM7/8/16
to scalr-discuss
May not be officially supported, but I wasn't running the downgraded version - I just speculated that my upgrade broke and decided to try to re-do the process to see if I could figure out where it broke.

Jay, this is what I did:

sudo apt-get install scalr-server=5.10.21.oss~nightly.20160205115632.19.3f2b1f0.5d525b1-1
sudo /opt/scalr-server/bin/scalr-server-ctl reconfigure
sudo apt-get upgrade
sudo /opt/scalr-server/bin/scalr-server-ctl reconfigure

After that, things started working for me again.  I'm guessing something broke during the upgrade, and it somehow got fixed by re-doing the upgrade.

Dave
To unsubscribe from this group and stop receiving emails from it, send an email to scalr-discuss+unsubscribe@googlegroups.com.

Jay Farschman

unread,
Jul 8, 2016, 10:02:43 AM7/8/16
to scalr-discuss
Igor,

Apologies, but I've stayed out of the scalr DB up until today and I'm unsure how to get access.  Also, we are out of coffee this morning.

So, I examined the process found the port and location of the mysql binary. I examined the scalr-server-secrets.json and found a password for root and scalr-mysql, so I tried them both.  This is scalr-mysql:

# /opt/scalr-server/embedded/bin/mysql --port=6280 -uscalr-mysql -p
Enter password: 
ERROR 1045 (28000): Access denied for user 'scalr-mysql'@'localhost' (using password: YES)

I thought about starting in safe mode and skip-grant-tables and realized that would be a really bad idea.

Jay Farschman

unread,
Jul 8, 2016, 2:01:39 PM7/8/16
to scalr-discuss
Okay this is strange,

It's working now.

Because I didn't know how to properly get into the mySQL DB I decided to halt scalr and then make a copy of the DB, export it to another "Lab" location, hack in and extract the information Igor needed from the updates table.  So, I did the following:

 
scalr-server-manage
supervisor> stop all

cp -Ra /opt/scalr-server/var/lib/mysql ~/

supervisor> start all

I then checked functionality, by spinning up 4 instances. I did not expect this to work, because it did not work before.... however, everything works now.  I have since created and destroyed a dozen instances.

RESOLUTION
"Have you tried turning it off and on again?" 

Marc O'Brien

unread,
Jul 8, 2016, 2:08:23 PM7/8/16
to scalr-discuss
Hi Jay,

Very glad to hear that you are up and running again.  Just let us know if you run in to anything else moving forward.

Cheers,
Wm. Marc O'Brien
Scalr Technical Support
Reply all
Reply to author
Forward
0 new messages