Cloudbench Instance Failures after Successful Bootstrap

James Scollard

Feb 23, 2016, 12:26:10 PM
to cbtool-users
I am still trying to get a working cbtool setup and am getting a new error on my new Ubuntu Xenial images:

Feb 23 17:10:51 cloudbench-server-1 [2016-02-23 17:10:51,761] [DEBUG] shared_functions.py/OskCmds.wait_for_instance_boot TEST_root - Check if the VM "MYOPENSTACK" (vm_3) is booted by attempting to establish a TCP connection to port 22 on address 172.16.18.94
Feb 23 17:10:51 cloudbench-server-1 [2016-02-23 17:10:51,764] [DEBUG] shared_functions.py/OskCmds.wait_for_instance_boot TEST_root - vm_3 (cloud-assigned uuid fe0cb6c3-8e36-44e9-b5f1-29ff2b82307b) is network reachable (boot process finished successfully)
Feb 23 17:10:53 cloudbench-server-1 [2016-02-23 17:10:53,393] [DEBUG] osk_cloud_ops.py/OskCmds.vmcreate TEST_root - vm_3 (cloud-assigned uuid fe0cb6c3-8e36-44e9-b5f1-29ff2b82307b) was successfully created on OpenStack Cloud "MYOPENSTACK".
Feb 23 17:10:53 cloudbench-server-1 [2016-02-23 17:10:53,427] [DEBUG]  status: Checking ssh accessibility on vm_3 (ubu...@172.16.18.94)...
Feb 23 17:10:53 cloudbench-server-1 [2016-02-23 17:10:53,427] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_root - Checking ssh accessibility on vm_3 (ubu...@172.16.18.94)...
Feb 23 17:10:54 cloudbench-server-1 [2016-02-23 17:10:54,427] [DEBUG]  status: Bootstrapping vm_3 (creating file cb_os_paramaters.txt in "ubuntu" user's home dir on 172.16.18.94)...
Feb 23 17:10:54 cloudbench-server-1 [2016-02-23 17:10:54,427] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_root - Bootstrapping vm_3 (creating file cb_os_paramaters.txt in "ubuntu" user's home dir on 172.16.18.94)...
Feb 23 17:11:04 cloudbench-server-1 [2016-02-23 17:11:04,243] [DEBUG]  status: Sending a copy of the code tree to vm_3 (172.16.18.94)...
Feb 23 17:11:04 cloudbench-server-1 [2016-02-23 17:11:04,243] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_root - Sending a copy of the code tree to vm_3 (172.16.18.94)...
Feb 23 17:11:11 cloudbench-server-1 [2016-02-23 17:11:11,190] [DEBUG]  status: Performing generic VM post_boot configuration on vm_3 (172.16.18.94)...
Feb 23 17:11:11 cloudbench-server-1 [2016-02-23 17:11:11,190] [DEBUG] active_operations.py/ActiveObjectOperations.post_attach_vm TEST_root - Performing generic VM post_boot configuration on vm_3 (172.16.18.94)...
Feb 23 17:13:38 cloudbench-server-1 [2016-02-23 17:13:38,896] [ERROR] active_operations.py/ActiveObjectOperations.objattach TEST_root - VM object 9D6E7C77-5667-5FDA-A4D2-336236195B13 (named "vm_3") could not be attached to this experiment: VM post-attachment operations failure: Error while executing the command line "~/cbtool/scripts/common/cb_post_boot.sh" (returncode = 5001) :Warning: Permanently added '172.16.18.94' (ECDSA) to the list of known hosts.
Feb 23 17:13:39 cloudbench-server-1 [2016-02-23 17:13:39,177] [DEBUG]  status: Sending a termination request for Instance "vm_3" (cloud-assigned uuid fe0cb6c3-8e36-44e9-b5f1-29ff2b82307b)....
Feb 23 17:13:39 cloudbench-server-1 [2016-02-23 17:13:39,177] [DEBUG] osk_cloud_ops.py/OskCmds.vmdestroy TEST_root - Sending a termination request for Instance "vm_3" (cloud-assigned uuid fe0cb6c3-8e36-44e9-b5f1-29ff2b82307b)....
Feb 23 17:18:26 cloudbench-server-1 [2016-02-23 17:18:26,645] [DEBUG] osk_cloud_ops.py/OskCmds.vvdestroy TEST_root - Volume previously attached to the vm_3 (cloud-assigned uuid none) was successfully destroyed on OpenStack Cloud "MYOPENSTACK".
Feb 23 17:18:26 cloudbench-server-1 [2016-02-23 17:18:26,646] [DEBUG] osk_cloud_ops.py/OskCmds.vmdestroy TEST_root - vm_3 (cloud-assigned uuid fe0cb6c3-8e36-44e9-b5f1-29ff2b82307b) was successfully destroyed on OpenStack Cloud "MYOPENSTACK".
Feb 23 17:18:26 cloudbench-server-1 [2016-02-23 17:18:26,683] [ERROR] cbact/main  - Operation "vm-attach" failure: VM object 9D6E7C77-5667-5FDA-A4D2-336236195B13 (named "vm_3") could not be attached to this experiment: VM post-attachment operations failure: Error while executing the command line "~/cbtool/scripts/common/cb_post_boot.sh" (returncode = 5001) :Warning: Permanently added '172.16.18.94' (ECDSA) to the list of known hosts.

Unfortunately, CBTool deletes the instance on a failure, so I can't see what happened by logging into it and troubleshooting.

Michael R. Hines

Feb 23, 2016, 2:39:12 PM
to James Scollard, cbtool-users
The reason for the error is located in a different log file:
/var/log/cloudbench/XXXXX_remotescripts.log

This happens all the time when users make images from scratch. No biggie,
we can help you.

Open up that log file and scan through it quickly.

Then, to reproduce the problem, just CTRL-Z cbtool (stop the process)
when you see the "cb_post_boot.sh" log message, then log in to the VM
and try to run the script manually, like this:

$ ~/cbtool/scripts/common/cb_post_boot.sh

At that point, the culprit error should appear (if you didn't already
see it in the log file).
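
For scanning the remotescripts log quickly, a minimal sketch along these lines may help. It assumes that the interesting lines contain words such as "error", "failed", or "dying"; that keyword list is a guess based on the messages in this thread, not something CBTool defines, and `find_failure_lines` is a hypothetical helper rather than part of the tool.

```python
import re

# Keyword list is an assumption drawn from messages seen in this thread,
# not an official list of CBTool failure markers.
FAILURE_PATTERN = re.compile(r"error|failed|dying|not found|timed out", re.IGNORECASE)


def find_failure_lines(lines):
    """Return (line_number, text) pairs for lines that look like failures."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if FAILURE_PATTERN.search(line):
            hits.append((number, line.rstrip("\n")))
    return hits
```

To use it, open your own remotescripts log (the file name prefix depends on your setup) and pass the file object to `find_failure_lines`, then print the returned pairs.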

/*
* Michael R. Hines
* Platform Engineer, DigitalOcean.
*/

Michael R. Hines

Feb 23, 2016, 2:41:45 PM
to James Scollard, cbtool-users
You don't need to wait for the error before you CTRL-Z the command line. Do it as soon as you see something like this:

status: Performing generic VM post_boot configuration on vm_1 (10.9.0.42)...

That means the VM is already online and cloudbench has finished the rsync of all the code to the VM.

At that point, just CTRL-C cloudbench, log in to the VM, and run the command I mentioned.


On 02/23/2016 01:33 PM, James Scollard wrote:
I have tried several times and just don't get the notification that it hit an error in time to stop the service before the termination request is sent.

James Scollard

Feb 23, 2016, 3:23:39 PM
to cbtool-users, spyde...@gmail.com
Pulling Marcio's changes from the repo helps:

Client - Fixes Ubuntu Xenial (maybe others) package install commands for FIO

Server - Once cb is attached, vmdev shows the commands that would be executed. For example:

(MYOPENSTACK) vmdev
The global object "vm_defaults" on Cloud MYOPENSTACK was modified:
|"sub-attribute" (key)                |old value                          |new value                          
|check_boot_complete                  |tcp_on_22                          |wait_for_0                         
|transfer_files                       |True                               |false                              
|run_generic_scripts                  |True                               |false                              
|debug_remote_commands                |False                              |true                               

(MYOPENSTACK) vmattach MYOPENSTACK
Usage: vmattach <cloud_name> <role> [vm_location = auto] [meta_tags = empty] [size = default] [pause_step = none] [temp_attr_list = empty=empty] [mode] 
(MYOPENSTACK) vmattach MYOPENSTACK fio
 status: Starting an instance on OpenStack, using the imageid "cloudbench-client-base" (<Image: cloudbench-client-base> ) and size "GP3-Medium" (<Flavor: GP3-Medium>), connected to networks "cloudbench-shared", on VMC "ap-tokyo-1", under tenant "default" (ssh key is "root_default_cbtool_rsa" and userdata is "auto")
 status: Waiting for vm_1 (cloud-assigned uuid 27427e93-620c-47e2-9a30-c69de0047659) to start...
 status: Trying to establish network connectivity to vm_1 (cloud-assigned uuid 27427e93-620c-47e2-9a30-c69de0047659), on IP address 172.16.18.105...
 status: Checking ssh accessibility on vm_1 (ssh ubu...@172.16.18.105)...
 status: This is the command that would have been executed from the orchestrator : 
         ssh  -i /opt/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l ubuntu 172.16.18.105 "/bin/true"
 status: Bootstrapping vm_1 (creating file cb_os_paramaters.txt in "ubuntu" user's home dir on 172.16.18.105)...
 status: This is the command that would have been executed from the orchestrator : 
         ssh  -i /opt/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l ubuntu 172.16.18.105 "mkdir -p /home/ubuntu/cbtool;echo '#OSKN-redis' > /home/ubuntu/cb_os_parameters.txt;echo '#OSHN-172.16.18.89' >> /home/ubuntu/cb_os_parameters.txt;echo '#OSPN-6379' >>  /home/ubuntu/cb_os_parameters.txt;echo '#OSDN-0' >>  /home/ubuntu/cb_os_parameters.txt;echo '#OSTO-30000' >>  /home/ubuntu/cb_os_parameters.txt;echo '#OSCN-MYOPENSTACK' >>  /home/ubuntu/cb_os_parameters.txt;echo '#OSMO-controllable' >>  /home/ubuntu/cb_os_parameters.txt;echo '#OSOI-TEST_root:MYOPENSTACK' >>  /home/ubuntu/cb_os_parameters.txt;echo '#VMUUID-2550FA4D-202B-59C5-A7B8-67466948DF4B' >>  /home/ubuntu/cb_os_parameters.txt;sudo chown -R ubuntu:ubuntu /home/ubuntu/cb_os_parameters.txt;sudo chown -R ubuntu:ubuntu  /home/ubuntu/cbtool"
 status: Sending a copy of the code tree to vm_1 (172.16.18.105)...
 status: This is the command that would have been executed from the orchestrator : 
         rsync -e "ssh  -i /opt/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l ubuntu " --exclude-from '/opt/cbtool/lib/auxiliary//../../exclude_list.txt' -az --delete --no-o --no-g --inplace -O /opt/cbtool/lib/auxiliary//../../* 172.16.18.105:~/cbtool/
 status: Performing generic VM post_boot configuration on vm_1 (172.16.18.105)...
 status: This is the command that would have been executed from the orchestrator : 
         ssh  -i /opt/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l ubuntu 172.16.18.105 "~/cbtool/scripts/common/cb_post_boot.sh"
VM object 2550FA4D-202B-59C5-A7B8-67466948DF4B (named "vm_1") sucessfully attached to this experiment. It is ssh-accessible at the IP address 172.16.18.105 (cb-root-MYOPENSTACK-vm1-fio).
(MYOPENSTACK)

Executing these in a separate terminal showed me that I needed a secgroup allow rule when I got to the one that was failing:

root@cloudbench-server-1:~# ssh  -i /opt/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l ubuntu 172.16.18.105 "~/cbtool/scripts/common/cb_post_boot.sh"
Warning: Permanently added '172.16.18.105' (ECDSA) to the list of known hosts.
Unable to connect to tcp port None on host 172.16.18.89: timed out
closed
port checker: host 172.16.18.89 port not open yet...
Unable to connect to tcp port None on host 172.16.18.89: timed out
closed
port checker: host 172.16.18.89 port not open yet...
Unable to connect to tcp port None on host 172.16.18.89: timed out
closed
port checker: host 172.16.18.89 port not open yet...
Unable to connect to tcp port None on host 172.16.18.89: timed out
closed
port checker: host 172.16.18.89 port not open yet...
Unable to connect to tcp port None on host 172.16.18.89: timed out
closed
port checker: host 172.16.18.89 port 6379 could not be reached. Dying now.
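
The same reachability check can be repeated by hand from inside the instance before rerunning the script. The sketch below is a generic TCP probe, not CBTool's own port checker; `port_is_open` is a hypothetical helper, and the host and port to try (172.16.18.89 and 6379) are simply the ones shown in the output above.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `port_is_open("172.16.18.89", 6379)` on the instance should return False while the security group blocks the orchestrator's redis port and True once the allow rule is in place.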
After allowing hosts on the client network into redis with the secgroup change:

<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Service "ntp" failed to restart after 7 attempts
root@cloudbench-server-1:~# ssh  -i /opt/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l ubuntu 172.16.18.105 "~/cbtool/scripts/common/cb_post_boot.sh"
Warning: Permanently added '172.16.18.105' (ECDSA) to the list of known hosts.
open
port checker: host 172.16.18.89 is open.
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Starting generic VM post_boot configuration
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: VMs need to be able to perform passwordless SSH between each other. Updating ~/.ssh/id_rsa to be the same on all VMs..
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Relaxing all security configurations
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Stopping service "ufw" with command "sudo service ufw stop"...
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Disabling service "ufw" with command "sudo sh -c 'echo manual > /etc/init/ufw.override'"...
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Disabling Apparmor...
/home/ubuntu/cbtool/scripts/common/.//cb_common.sh: line 978: service_stop_disable_apparmor: command not found
 * Unloading AppArmor profiles
   ...done.
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Done
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Starting (AI) Log store...
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Local (AI) Log store started
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Starting (AI) Object store...
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Stopping service "redis-server" with command "sudo systemctl stop redis-server"...
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Disabling service "redis-server" with command "sudo systemctl disable redis-server"...
Synchronizing state of redis-server.service with SysV init with /lib/systemd/systemd-sysv-install...
Executing /lib/systemd/systemd-sysv-install disable redis-server
insserv: warning: current start runlevel(s) (empty) of script `redis-server' overrides LSB defaults (2 3 4 5).
insserv: warning: current stop runlevel(s) (0 1 2 3 4 5 6) of script `redis-server' overrides LSB defaults (0 1 6).

Now the only thing left that looks strange is that I am getting these:


Failed to restart ntp.service: Unit ntp.service not found.
<175> - cb-root-myopenstack-vm1-fio /home/ubuntu/cbtool/scripts/common/cb_post_boot.sh: Restarting service "ntp", with command "sudo service ntp restart", attempt 6 of 7...

Thanks.

Michael R. Hines

Feb 23, 2016, 3:37:28 PM
to James Scollard, cbtool-users
Excellent. Now all you need to do is install NTP inside your VM and the systemd ntp service should be available.

James Scollard

Feb 26, 2016, 9:40:24 AM
to cbtool-users, spyde...@gmail.com
Good to go.  Thanks!

James Scollard

Feb 29, 2016, 9:22:53 AM
to cbtool-users, spyde...@gmail.com
Apparently I spoke too soon.  After a reload (./cb -x) I am back to getting the ssh key errors:


Thanks.

Marcio A Silva

Feb 29, 2016, 10:26:29 AM
to James Scollard, cbtool-users

Hello James,

Just checked your pastebin output. Additional questions:

1) Can you confirm that the contents of OpenStack public key "root_default_cbtool_rsa" (checked with "nova keypair-show root_default_cbtool_rsa") and the contents of cbtool/credentials/cbtool_rsa.pub are the same?

2) Can you confirm, on a newly booted instance (please use the "vmdev" command on CB's CLI before attaching a new VM), that this public key was indeed injected (by cloud-init) into /home/ubuntu/.ssh/authorized_keys?
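
One way to make both comparisons less error-prone is to compare only the key type and key data, since the trailing comment on a public key line often differs between copies. The sketch below assumes you have pasted each public key line into a string yourself; `key_fields`, `same_public_key`, and `md5_fingerprint` are hypothetical helpers, and the colon-separated MD5 form is offered only on the assumption that your OpenStack client displays fingerprints that way. It does not handle authorized_keys entries that start with an options field.

```python
import base64
import hashlib


def key_fields(public_key_line):
    """Split an OpenSSH public key line into (key type, base64 data), dropping the comment."""
    parts = public_key_line.split()
    if len(parts) < 2:
        raise ValueError("not an OpenSSH public key line")
    return parts[0], parts[1]


def same_public_key(line_a, line_b):
    """True when two public key lines carry the same key, ignoring comments."""
    return key_fields(line_a) == key_fields(line_b)


def md5_fingerprint(public_key_line):
    """Colon-separated MD5 fingerprint of the decoded key data."""
    _, blob = key_fields(public_key_line)
    digest = hashlib.md5(base64.b64decode(blob)).hexdigest()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))
```

If `same_public_key` returns False for the keypair stored in OpenStack and the contents of cbtool/credentials/cbtool_rsa.pub, the instance was booted with a key that the orchestrator's private key cannot match.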

Regards,

Marcio

-------------------------------------------------------------
Marcio A. Silva, PhD.
Software Engineer
DataCenter Systems Software
IBM Thomas J. Watson Research Center
Yorktown Heights NY 10598-0218
phone: 1-914-945-2911, fax: 1-914-945-4254
e-mail: mar...@us.ibm.com

James Scollard

Feb 29, 2016, 1:37:39 PM
to cbtool-users, spyde...@gmail.com, mar...@us.ibm.com
Since I can no longer SSH to instances that CB is creating, using the key that it created and manages, there is no way to see what the current error is.  Why do SSH keys keep breaking?  Is this normal?  We haven't made any changes.

Thanks.

Michael R. Hines

Feb 29, 2016, 3:12:24 PM
to James Scollard, cbtool-users, mar...@us.ibm.com
No, it's not normal. Something had to change.

There's nothing in the tool that auto-generates keys, so if it was configured correctly before, it should remain configured correctly forever, even across resets.

(Also, be careful when you use ./cb -x, because that throws away the performance data that is in mongodb. If you just want to reset your cloud, then use ./cb -f # instead)

Would you mind please double-checking your environment and the key pairs that are being used?

Marcio A Silva

Feb 29, 2016, 4:12:08 PM
to James Scollard, cbtool-users

Hello James,

Is it possible for you to access the instances through the (VNC) console (e.g., in Horizon)?

Assuming that the contents of both "root_default_cbtool_rsa" and cbtool/credentials/cbtool_rsa.pub are the same, and that the instance was indeed booted with the "root_default_cbtool_rsa" pubkey, I can only conceive the problem as occurring on the instance initialization.

Since you asked, no, pubkey ssh problems are not normal at all. Usually, ssh pubkey injection through cloud-init "just works" in OpenStack and there isn't any need to further debug.


Regards,

Marcio

James Scollard

Feb 29, 2016, 4:17:08 PM
to cbtool-users, spyde...@gmail.com, mar...@us.ibm.com
In my config:

OSK_LOGIN = ubuntu                                         # The username that logins on the VMs
OSK_NETNAME = cloudbench-shared
OSK_KEY_NAME = cbtool_rsa

Everything works fine until after the bootstrap. Then, once it has successfully connected over ssh and reconfigured the instance, it can no longer connect.

Here is what it says on ai attachment:

(MYOPENSTACK) aiattach fio
 status: Starting an instance on OpenStack, using the imageid "cloudbench-client-base" (<Image: cloudbench-client-base> ) and size "GP3-Medium" (<Flavor: GP3-Medium>), connected to networks "cloudbench-shared", on VMC "ap-tokyo-1", under tenant "default" (ssh key is "root_default_cbtool_rsa" and userdata is "auto")
 status: Waiting for vm_1 (cloud-assigned uuid a7d5b2c6-77d3-4257-bc26-2bfda6f487d0) to start...
status: Trying to establish network connectivity to vm_1 (cloud-assigned uuid a7d5b2c6-77d3-4257-bc26-2bfda6f487d0), on IP address 172.16.18.117...
 status: Checking ssh accessibility on vm_1 (ssh ubu...@172.16.18.117)...
 status: Bootstrapping vm_1 (creating file cb_os_paramaters.txt in "ubuntu" user's home dir on 172.16.18.117)...
 status: Sending a copy of the code tree to vm_1 (172.16.18.117)...
 status: Performing generic application instance post_boot configuration on all VMs belonging to ai_1...
 status: Command "~/cbtool/scripts/common/cb_post_boot.sh" failed to execute on hostname 172.16.18.117 after attempt 0. Will try 3000 more times.
...

CBTool is able to ssh and then breaks ssh apparently?  I am able to ssh to the instance this time manually:

ssh  -i /opt/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o BatchMode=yes  -l ubuntu 172.16.18.117 "~/cbtool/scripts/common/cb_post_boot.sh"
...


I really don't understand what it is trying to do, so it's kind of hard to debug.  I have a presentation on this thing Thursday, so hopefully I can get results for at least 1 test by then.

:P

Michael R. Hines

Feb 29, 2016, 4:19:04 PM
to James Scollard, cbtool-users, mar...@us.ibm.com
The log shows that you're failing on NTP. Did you get NTP installed successfully inside the image?

- Michael

James Scollard

Feb 29, 2016, 4:28:17 PM
to cbtool-users, spyde...@gmail.com, mar...@us.ibm.com
I can definitely add that to the list, but after 2 months I still don't have any data, so unless you are asserting that not installing NTP is causing the instances to not accept SSH keys I'm not too concerned about it at the moment.  If I ever get a working test completed I plan to add NTP and spawn a few more images with other test suites on them to support the full battery of tests.

Thanks.

Michael R. Hines

Feb 29, 2016, 4:29:20 PM
to James Scollard, cbtool-users, mar...@us.ibm.com
No, your SSH keys have nothing to do with the problem, according to the log message that you showed on pastebin.

You're just missing NTP, and that's why it's failing.

James Scollard

Feb 29, 2016, 4:34:39 PM
to cbtool-users, spyde...@gmail.com, mar...@us.ibm.com
I will see what I need to do to run NTPd on these.  Shouldn't all required software installations be included in the ./install --wks <workload> command when images are created?  Is there a reason that NTP isn't installed when everything else is?

Thanks.

Michael R. Hines

Feb 29, 2016, 4:38:31 PM
to James Scollard, cbtool-users, mar...@us.ibm.com
It is in the dependency list, but it looks like there may be a bug there.

mrhines@mahler:~/do/cbtool$ grep -i ntp ./configs/templates/PUBLIC_dependencies.txt
ntp-order = 6
ntp-install = pm
ntp-configure = ntpd -V 2>&1 | grep Ver | cut -d ' ' -f 8
ntp-ver = ANY
rhel-ntp-install-pm = package_install ntp
fedora-ntp-install-pm = package_install ntp
ubuntu-ntp-install-pm = package_install ntpd

So, to remedy the issue, you'll need to install it anyway. (Or ask cloud-init to do it).
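
For the cloud-init route, a minimal sketch of generating the user data might look like the following. It assumes the Ubuntu package is named ntp and that your image runs cloud-init with its standard package handling; `build_cloud_config` is a hypothetical helper, and how the resulting text gets passed to the instance as user data through CBTool is not covered here.

```python
def build_cloud_config(packages):
    """Render a #cloud-config document that installs the given packages at first boot."""
    # "package_update" and "packages" are standard cloud-config keys.
    lines = ["#cloud-config", "package_update: true", "packages:"]
    lines.extend(f"  - {name}" for name in packages)
    return "\n".join(lines) + "\n"
```

Calling `build_cloud_config(["ntp"])` returns a small YAML document that can be supplied as user data when the instance boots. Baking the package into the image, as suggested above, avoids the extra boot-time install.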

Would you mind filing an issue on github?

James Scollard

Mar 1, 2016, 2:14:07 PM
to cbtool-users, spyde...@gmail.com, mar...@us.ibm.com
Manually installing ntp on the image cleared up the issue and images are functional now.  I should add that we didn't add the nullworkload workload to the image because we didn't think it was needed, but apparently if nullworkload had been installed it would have installed ntp as a dependency for us automatically.

Michael R. Hines

Mar 1, 2016, 2:28:39 PM
to James Scollard, cbtool-users, mar...@us.ibm.com
Ok, great. Yeah, using nullworkload shouldn't have been necessary; that's a bug.

