cb_restart_seed


Yuna

Aug 3, 2016, 5:27:31 PM
to cbtool-users
Error: "status: Command "~/cb_restart_seed.sh" failed to execute on hostname 172.31.41.202 after attempt 0. Will try 3 more times."

Remote scripts log (newest entries first):

Aug  3 21:16:55 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Service "cassandra" was successfully restarted
Aug  3 21:16:55 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Enabling service "cassandra", with command "sudo update-rc.d -f cassandra defaults"...
Aug  3 21:16:55 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Restarting service "cassandra", with command "sudo service cassandra restart", attempt 1 of 7...
Aug  3 21:16:55 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: The exit code of "check_cassandra_cluster_state 172.31.41.202 1 1" was 1. Starting Cassandra service on this seed...
Aug  3 21:16:54 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Service "cassandra" was successfully restarted
Aug  3 21:16:54 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Enabling service "cassandra", with command "sudo update-rc.d -f cassandra defaults"...
Aug  3 21:16:54 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Restarting service "cassandra", with command "sudo service cassandra restart", attempt 1 of 7...
Aug  3 21:16:54 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: The exit code of "check_cassandra_cluster_state 172.31.41.202 1 1" was 1. Starting Cassandra service on this seed...
Aug  3 21:16:54 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Nodes registered on the cluster: 0 out of 2
Aug  3 21:16:53 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Nodes registered on the cluster: 0 out of 2
Aug  3 21:16:52 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Obtaining the node list for this Cassandra cluster by running "nodetool -u cassandra -pw cassandra -h 172.31.41.202 status"...
Aug  3 21:16:52 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Obtaining the node list for this Cassandra cluster by running "nodetool -u cassandra -pw cassandra -h 172.31.41.202 status"...
Aug  3 21:16:51 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Nodes registered on the cluster: 0 out of 2
Aug  3 21:16:51 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Nodes registered on the cluster: 0 out of 2
Aug  3 21:16:50 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Performing a quick check from ip-172-31-35-227 in order to decide on Cassandra restart
Aug  3 21:16:50 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Obtaining the node list for this Cassandra cluster by running "nodetool -u cassandra -pw cassandra -h 172.31.41.202 status"...
Aug  3 21:16:50 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Waiting for all nodes to become available...
Aug  3 21:16:50 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: No VMs with the "cassandra" role have been found on this AI
Aug  3 21:16:50 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: The VMs with the "seed" role on this AI has the following IPs: 172.31.41.202,172.31.35.227
Aug  3 21:16:50 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Cassandra token is ""
Aug  3 21:16:50 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Performing a quick check from ip-172-31-41-202 in order to decide on Cassandra restart
Aug  3 21:16:50 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Waiting for all nodes to become available...
Aug  3 21:16:50 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Obtaining the node list for this Cassandra cluster by running "nodetool -u cassandra -pw cassandra -h 172.31.41.202 status"...
Aug  3 21:16:50 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: No VMs with the "cassandra" role have been found on this AI
Aug  3 21:16:50 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: The VMs with the "seed" role on this AI has the following IPs: 172.31.41.202,172.31.35.227
Aug  3 21:16:50 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Cassandra token is ""
Aug  3 21:16:43 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Updating vm application startup time with value 105
Aug  3 21:16:43 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Port 9160 on host 172.31.35.227 was NOT found open after 21 attempts!
Aug  3 21:16:43 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Cassandra service failed to start on this seed
Aug  3 21:16:43 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Updating vm application startup time with value 105
Aug  3 21:16:43 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Port 9160 on host 172.31.41.202 was NOT found open after 21 attempts!
Aug  3 21:16:43 ip-172-31-41-202.us-west-2.compute.internal  - ip-172-31-41-202 /home/ubuntu/cb_restart_seed.sh: Cassandra service failed to start on this seed
Aug  3 21:15:03 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Service "cassandra" was successfully restarted
Aug  3 21:15:03 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Enabling service "cassandra", with command "sudo update-rc.d -f cassandra defaults"...
Aug  3 21:15:03 ip-172-31-35-227.us-west-2.compute.internal  - ip-172-31-35-227 /home/ubuntu/cb_restart_seed.sh: Restarting service "cassandra", with command "sudo service cassandra restart", attempt 1 of 7...

Does this mean Cassandra isn't installed? I'm pretty sure it is, because my VM says it has it installed...
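The log line "Port 9160 on host ... was NOT found open after 21 attempts!" suggests the script polls Cassandra's Thrift port before giving up, and it appears right after "Service "cassandra" was successfully restarted". An init script reporting success can mean only that the start command ran, not that Cassandra is still up and listening. Below is a minimal Python sketch of that kind of polling check, assuming a plain TCP connect is a good-enough proxy; `wait_for_port` and its defaults are hypothetical and not taken from cb_restart_seed.sh or cb_common.sh. Running it from each seed helps tell "Cassandra is not installed" apart from "Cassandra is installed but not listening".

```python
import socket
import time


def wait_for_port(host: str, port: int, attempts: int = 21, delay: float = 5.0) -> bool:
    """Return True as soon as a TCP connection to host:port succeeds.

    Hypothetical helper: tries up to `attempts` times, sleeping `delay`
    seconds between failed tries. Success only proves that something
    accepts TCP connections on the port, not that Cassandra is healthy.
    """
    for attempt in range(1, attempts + 1):
        try:
            with socket.create_connection((host, port), timeout=2.0):
                print(f"Port {port} on host {host} was found open after {attempt} attempts")
                return True
        except OSError:
            # Connection refused, timed out, or unreachable: wait and retry.
            if attempt < attempts:
                time.sleep(delay)
    print(f"Port {port} on host {host} was NOT found open after {attempts} attempts!")
    return False
```

For example, `wait_for_port("172.31.41.202", 9160)` on the seed itself. If it fails while `sudo service cassandra status` claims the service is running, Cassandra's own log on that VM is the next place to look.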

Yuna

Aug 3, 2016, 5:28:24 PM
to cbtool-users
Also, this is running through SPEC Cloud, not just through cbtool.

Marcio Silva

Aug 4, 2016, 4:49:34 PM
to cbtool-users
Hello,

If you are having trouble with the deployment of the Cassandra cluster, you can attempt to debug it outside of SPEC Cloud first (i.e., directly on CBTOOL).

Basically, you can go back to the CBTOOL CLI, run the command "appdev", and then the command "aiattach cassandra_ycsb" (as explained in https://github.com/ibmcb/cbtool/wiki/HOWTO:-Debug-initial-setup).

This will print out all the commands that would otherwise be executed automatically on each VM, allowing you to run them manually and see what the actual errors are.

-------------------------------------------------------------
Marcio A. Silva, PhD.
Software Engineer
DataCenter Systems Software
IBM Thomas J. Watson Research Center
e-mail: mar...@us.ibm.com

Ivan Cuevas

Aug 18, 2016, 6:13:54 PM
to cbtool-users
Hi Marcio,

I'm having the same error as Yuna. In the CBTOOL CLI I ran "aiattach cassandra_ycsb", which creates 3 instances:
1 YCSB - 172.24.243.22
2 seeds - 172.24.243.21, 172.24.243.23
And it fails when running the application-specific "setup" with:

status: Command "~/cb_restart_seed.sh" failed to execute on hostname 172.24.243.21 after attempt 2. Will try 1 more times.
status: Command "~/cb_restart_seed.sh" failed to execute on hostname 172.24.243.23 after attempt 2. Will try 1 more times.

So I ran "appdev" in the CBTOOL CLI, then "aiattach cassandra_ycsb" again, and got the commands that are actually failing:

status: This is the command that would have been executed from the orchestrator on STEP 1 :
ssh  -i /home/cbuser/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l cbuser 172.24.243.21 "~/cb_restart_seed.sh"
status: This is the command that would have been executed from the orchestrator on STEP 1 :
ssh  -i /home/cbuser/osgcloud/cbtool/lib/auxiliary//../../credentials/cbtool_rsa  -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null  -o BatchMode=yes  -l cbuser 172.24.243.23 "~/cb_restart_seed.sh"

Then I ran those commands manually on my harness machine, and the script entered a loop that only looks for the Cassandra service on one seed node instead of both:

...
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: The VMs with the "seed" role on this AI has the following IPs: 172.24.243.21,172.24.243.23
...
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: Port 9160 on host 172.24.243.21 was found open after 20 attempts
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: Cassandra service running on seed 172.24.243.21
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: Waiting for all nodes to become available...
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: Obtaining the node list for this Cassandra cluster by running "nodetool -u cassandra -pw cassandra -h 172.24.243.21 status"...
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: Nodes registered on the cluster: 1 out of 2
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: Obtaining the node list for this Cassandra cluster by running "nodetool -u cassandra -pw cassandra -h 172.24.243.21 status"...
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: Nodes registered on the cluster: 1 out of 2
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: Obtaining the node list for this Cassandra cluster by running "nodetool -u cassandra -pw cassandra -h 172.24.243.21 status"...
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: Nodes registered on the cluster: 1 out of 2
<175> - cb-cbuser-myopenstack-vm2-seed-ai-1 /home/cbuser/cb_restart_seed.sh: Obtaining the node list for this Cassandra cluster by running "nodetool -u cassandra -pw cassandra -h 172.24.243.21 status"...
...

Any idea how I can make "./cb_restart_seed.sh" also consider my second seed node?

Thank you

Ivan
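The repeated "Nodes registered on the cluster: 1 out of 2" lines come from running `nodetool ... status` against the first seed. A sketch of how such a count could be derived is below, assuming the standard `nodetool status` layout where each node line starts with a two-letter code (U/D for Up/Down, then N/L/J/M for Normal/Leaving/Joining/Moving); `count_nodes` is a hypothetical helper, not the script's actual code. Running `nodetool status` by hand and feeding it through this shows whether the second seed is missing from the ring entirely or listed but down (`DN`). Those two cases usually point to different problems, for example a node configured to join a different cluster versus a service that is simply not running.

```python
import re
from typing import Tuple

# A node line starts with a two-letter status/state code such as "UN"
# (Up/Normal) or "DN" (Down/Normal), followed by the node's address.
NODE_LINE = re.compile(r"^([UD])([NLJM])\s+(\S+)")


def count_nodes(nodetool_status_output: str) -> Tuple[int, int]:
    """Return (nodes_up_and_normal, nodes_listed) from `nodetool status` text."""
    up_normal = 0
    listed = 0
    for line in nodetool_status_output.splitlines():
        match = NODE_LINE.match(line.strip())
        if match is None:
            continue  # datacenter header, legend, column titles, or blank line
        listed += 1
        if match.group(1) == "U" and match.group(2) == "N":
            up_normal += 1
    return up_normal, listed
```

If `listed` is 1 while two seeds exist, the second node never joined the ring; if `listed` is 2 but `up_normal` is 1, it joined but is not currently up.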

Ivan Cuevas

Aug 19, 2016, 10:11:36 AM
to cbtool-users
I think I found the error: it's a misconfiguration in my Cassandra/YCSB image.

Thank you

Ivan

Jean Renard

Jun 10, 2019, 12:03:20 PM
to cbtool-users
Hello,

I have the same issue. Can you help me with this?

Here is the output when I run the script cb_restart_seed.sh:

cbuser@ip-172-31-35-190:~$ ./cb_ycsb_common.sh 
open
port checker: host 172.31.5.149 is open.
cb_common.sh (7898): The JAVA_HOME was set to "auto". Attempting to find the most recent in /opt/ibm
ls: cannot access /opt/ibm/java-*: No such file or directory
cb_common.sh (7898): The JAVA_HOME was set to "auto". Attempting to find the most recent in /usr/lib/jvm
cb_common.sh (7898): JAVA_HOME determined to be "/usr/lib/jvm/java-7-openjdk-amd64/jre"
cb_common.sh (7898): Cassandra token is ""
cb_common.sh (7898): No VMs with the "cassandra" role have been found on this AI
cb_common.sh (7898): The VMs with the "seed" role on this AI has the following IPs: 18.191.77.187,3.14.10.101
cbuser@ip-172-31-35-190:~$ cassandra-cli -h 3.17.206.56 -f cassandra-cli_list_keyspace.cassandra
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.cli.CliMain.connect(CliMain.java:66)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:239)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:580)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
... 4 more
Exception connecting to 3.17.206.56/9160. Reason: Connection refused (Connection refused).
Not connected to a cassandra instance.
cbuser@ip-172-31-35-190:~$ ./cb_restart_seed.sh 
open
port checker: host 172.31.5.149 is open.
cb_common.sh (9032): The JAVA_HOME was set to "auto". Attempting to find the most recent in /opt/ibm
ls: cannot access /opt/ibm/java-*: No such file or directory
cb_common.sh (9032): The JAVA_HOME was set to "auto". Attempting to find the most recent in /usr/lib/jvm
cb_common.sh (9032): JAVA_HOME determined to be "/usr/lib/jvm/java-7-openjdk-amd64/jre"
cb_common.sh (9032): Cassandra token is ""
cb_common.sh (9032): No VMs with the "cassandra" role have been found on this AI
cb_common.sh (9032): The VMs with the "seed" role on this AI has the following IPs: 18.191.77.187,3.14.10.101
/etc/cassandra/jmxremote.password
cb_common.sh (9032): Performing a quick check from ip-172-31-35-190 in order to decide on Cassandra restart
cb_common.sh (9032): Waiting for all nodes to become available...
cb_common.sh (9032): Successfully contacted Cassandra with command "cassandra-cli -h 18.191.77.187 -f cassandra-cli_list_keyspace.cassandra" through node 18.191.77.187
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.cli.CliMain.connect(CliMain.java:66)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:239)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:580)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
... 4 more
Exception connecting to 18.191.77.187/9160. Reason: Connection refused (Connection refused).
cb_common.sh (9032): Obtaining the node list for this Cassandra cluster by running "nodetool -u cassandra -pw cassandra -h 3.17.206.56 status"...
nodetool: Failed to connect to '3.17.206.56:7199' - ConnectException: 'Connection refused (Connection refused)'.
cb_common.sh (9032): Nodes registered on the cluster: 0 out of 2
cb_common.sh (9032): Make sure that Keyspace "system" is present
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.cli.CliMain.connect(CliMain.java:66)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:239)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:580)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
... 4 more
Exception connecting to 18.191.77.187/9160. Reason: Connection refused (Connection refused).
cb_common.sh (9032): Obtaining the node list for this Cassandra cluster by running "nodetool -u cassandra -pw cassandra -h 3.17.206.56 status"...
nodetool: Failed to connect to '3.17.206.56:7199' - ConnectException: 'Connection refused (Connection refused)'.
cb_common.sh (9032): Nodes registered on the cluster: 0 out of 2
cb_common.sh (9032): Make sure that Keyspace "system" is present
org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.thrift.transport.TSocket.open(TSocket.java:187)
at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
at org.apache.cassandra.cli.CliMain.connect(CliMain.java:66)
at org.apache.cassandra.cli.CliMain.main(CliMain.java:239)
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:580)
at org.apache.thrift.transport.TSocket.open(TSocket.java:182)
... 4 more
Exception connecting to 18.191.77.187/9160. Reason: Connection refused (Connection refused).
cb_common.sh (9032): The exit code of "check_cassandra_cluster_state 3.17.206.56 1 1" was 1. Starting Cassandra service on this seed...
cb_common.sh (9032): Restarting service "cassandra", with command "sudo service cassandra restart", attempt 1 of 7...
 * Restarting Cassandra cassandra [ OK ]
cb_common.sh (9032): Service "cassandra" was successfully restarted
cb_common.sh (9032): Enabling service "cassandra", with command "sudo update-rc.d -f cassandra defaults"...
 System start/stop links for /etc/init.d/cassandra already exist.



Marcio Silva

Jun 10, 2019, 8:28:02 PM
to cbtool-users
Hello,

From the messages, it looks like CB is trying to configure Cassandra over a network where port 9160 is not open/allowed (hence the "Connection refused")... A couple of questions: a) which cloud are you using? b) do you have multiple vNICs on each instance (e.g., a "public" and a "private" network)?
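The output above involves two ports: `cassandra-cli` goes to 9160 (Thrift) and `nodetool` goes to 7199 (JMX). A quick way to answer "which network can reach which port" is to probe every seed address, public and private, from each VM. Below is a minimal sketch, assuming Cassandra's default ports have not been changed. `probe` and `probe_hosts` are hypothetical helpers. As a rule of thumb, "refused" means the packet reached a host that had nothing accepting on that port, while "timeout" usually means the packet was silently dropped, which is how AWS security groups behave.

```python
import socket
from typing import Dict, Iterable

# Default Cassandra ports (assumed unchanged in the image's configuration).
CASSANDRA_PORTS = {
    7000: "inter-node (gossip)",
    7199: "JMX (nodetool)",
    9042: "CQL native transport",
    9160: "Thrift (cassandra-cli)",
}


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify one TCP connection attempt as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"  # host answered, but nothing accepted on that port
    except socket.timeout:
        return "timeout"  # packets probably dropped (e.g. by a security group)
    except OSError as exc:
        return f"error: {exc}"


def probe_hosts(hosts: Iterable[str],
                ports: Dict[int, str] = CASSANDRA_PORTS) -> Dict[str, Dict[int, str]]:
    """Probe every (host, port) pair and return {host: {port: result}}."""
    return {host: {port: probe(host, port) for port in ports} for host in hosts}
```

For example, `probe_hosts(["18.191.77.187", "3.14.10.101"])` run from the VM that printed the seed list, then again with the seeds' private addresses, shows which network actually carries Cassandra traffic.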

Jean Renard

Jun 11, 2019, 7:59:21 AM
to cbtool-users
Hello,

I'm using AWS EC2, so my instances have both a private IP and a public IP.

Edit: The same thing happens when I try to attach my Hadoop AI: the master can't reach the 3 other slaves. I think this is a network issue, but my security groups are OK and I have no firewall running.

Here is my cbuser_cloud_definitions.txt; maybe I made a mistake:

[VM_TEMPLATES : CLOUDOPTION_MYEC2]
CASSANDRA = size:t2.micro, imageid1:ami-0203885b365c106fb
HADOOPMASTER = size: t2.micro, imageid1:ami-0c6fc27f6ff1638fa
HADOOPSLAVE = size: t2.micro, imageid1:ami-0c6fc27f6ff1638fa
YCSB = size:t2.micro, imageid1:ami-0203885b365c106fb
SEED = size:t2.micro, imageid1:ami-0203885b365c106fb
[VM_DEFAULTS : EC2_CLOUDCONFIG ]
RUN_NETNAME = public
PROV_NETNAME = public


Thank you!



- Jean -

Marcio Silva

Jun 11, 2019, 2:35:08 PM
to cbtool-users
Good, good... I would recommend using the "private" network for "RUN" (i.e., `RUN_NETNAME = private`, as per the example in https://github.com/ibmcb/cbtool/wiki/FAQ-S#sq6), and making sure the security groups for the private network allow access on all TCP/UDP ports.

Regards,

Marcio
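With `RUN_NETNAME = public`, the seed list printed by cb_common.sh in the output above (`18.191.77.187,3.14.10.101`) consists of public addresses, while the VMs themselves sit on `172.31.x.x`. A simple sanity check after switching to `RUN_NETNAME = private` is to confirm that the seed IPs reported in the log are private addresses. The sketch below uses only Python's standard `ipaddress` module. The helper names are hypothetical, and the sample seed string is copied from the log.

```python
import ipaddress
from typing import Dict, Iterable


def classify_ips(ips: Iterable[str]) -> Dict[str, str]:
    """Label each address as "private" (e.g. RFC 1918 ranges) or "public"."""
    return {
        ip: "private" if ipaddress.ip_address(ip).is_private else "public"
        for ip in ips
    }


def seed_list_uses_private_network(seed_csv: str) -> bool:
    """True if the comma-separated seed list is non-empty and entirely private."""
    ips = [ip.strip() for ip in seed_csv.split(",") if ip.strip()]
    return bool(ips) and all(ipaddress.ip_address(ip).is_private for ip in ips)
```

Applied to the seed list from the output above, `seed_list_uses_private_network("18.191.77.187,3.14.10.101")` is `False`, which is consistent with Cassandra traffic being routed over the public network.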

Jean Renard

Jun 11, 2019, 2:46:00 PM
to cbtool-users

You're right, that was it. Thank you!
Now I have an error with cb_start_load_manager.sh. It never stops, haha.

Marcio Silva

Jun 11, 2019, 10:38:56 PM
to cbtool-users
Hmmm... interesting (and aggravating, hehe)... What kind of error?

Jean Renard

Jun 12, 2019, 8:07:23 AM
to cbtool-users
I get this kind of error:

WARNING measure Problem creating the application: AI object 6589E52A-B7EF-5327-83D7-D7BE542752D2 (named "ai_5") could not be attached to this experiment: AI post-attachment operations failure: Parallel VM configuration for ai_5 failure (81717): Failure while executing application-specific configuration on on all VMs beloging to ai_5 (6589E52A-B7EF-5327-83D7-D7BE542752D2):
 
Parallel run os command operation failure: Giving up on executing command "~/cb_start_load_manager.sh" on hostname 52.14.181.57. Too many attempts (3).
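The orchestrator's message reports only that "~/cb_start_load_manager.sh" failed three times. It does not say why. When re-running the command by hand, as in the `appdev` workflow described earlier in the thread, keeping each attempt's exit code and stderr usually gets to the real error faster. Below is a minimal sketch of a bounded-retry runner that does this. `run_with_retries` is hypothetical and not cbtool's implementation. The ssh options in the usage note are copied from the command printed earlier in this thread, and the key path is a placeholder.

```python
import subprocess
import time
from typing import List, Optional


def run_with_retries(cmd: List[str], attempts: int = 3, delay: float = 10.0,
                     timeout: Optional[float] = None) -> subprocess.CompletedProcess:
    """Run `cmd` up to `attempts` times and return the first successful result.

    Hypothetical helper: every failed attempt's exit code and the tail of its
    stderr are kept, so the final error explains *why* it gave up.
    """
    failures = []
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            failures.append(f"attempt {attempt}: timed out after {timeout} s")
        else:
            if result.returncode == 0:
                return result
            # The last part of stderr is usually where the real error is.
            failures.append(f"attempt {attempt}: exit code {result.returncode}: "
                            f"{result.stderr.strip()[-500:]}")
        if attempt < attempts:
            time.sleep(delay)
    raise RuntimeError(f"Giving up on {cmd!r} after {attempts} attempts:\n" + "\n".join(failures))
```

Usage, with `KEY` standing in for your own cbtool private key path: `run_with_retries(["ssh", "-i", KEY, "-o", "StrictHostKeyChecking=no", "-o", "BatchMode=yes", "-l", "cbuser", "52.14.181.57", "~/cb_start_load_manager.sh"])`.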



Marcio Silva

Jun 17, 2019, 10:26:39 AM
to cbtool-users
Ah, I see... Apologies for the long hiatus (I've been out on paternity leave). Can you take a look at https://github.com/ibmcb/cbtool/wiki/HOWTO:-Debug-initial-setup and check whether the `appnoload` command (the very last outer bullet on that page) helps in debugging the problem?

Jean Renard

Jul 2, 2019, 10:01:43 AM
to cbtool-users
Hello,

I resolved the problem in cloud_definitions.txt; cbtool is working properly now.
I have one more question: are you familiar with SPEC Cloud 2018?
When I launch it, I get this:

Creating new K-Means Experiment ID of test020719-KMEANS-BASELINE-0-20190702124529UTC...
2019-07-02 12:45:34,577 INFO measure ...Success.
2019-07-02 12:45:34,577 INFO measure Setting application parameters for workloadhadoop...
2019-07-02 12:45:34,577 INFO setKMeansConfig ...Setting parameter load_level to value 1
2019-07-02 12:45:34,595 INFO setKMeansConfig ...Setting parameter hadoopslave_data_dir to value /hadoopstore/hdfs/datanode
2019-07-02 12:45:34,614 INFO setKMeansConfig ...Setting parameter num_of_clusters to value 5
2019-07-02 12:45:34,632 INFO setKMeansConfig ...Setting parameter load_factor to value 1000000
2019-07-02 12:45:34,650 INFO setKMeansConfig ...Setting parameter hadoopmaster_data_dir to value /hadoopstore/hdfs/datanode
2019-07-02 12:45:34,669 INFO setKMeansConfig ...Setting parameter dfs_name_dir to value /hadoopstore/hdfs/namenode
2019-07-02 12:45:34,688 INFO setKMeansConfig ...Setting parameter regenerate_data to value True
2019-07-02 12:45:34,706 INFO setKMeansConfig ...Setting parameter java_home to value /usr/lib/jvm/java-8-openjdk-amd64
2019-07-02 12:45:34,724 INFO setKMeansConfig ...Setting parameter load_profile to value kmeans
2019-07-02 12:45:34,743 INFO setKMeansConfig ...Setting parameter workload to value hadoop
2019-07-02 12:45:34,760 INFO setKMeansConfig ...Setting parameter dimensions to value 20
2019-07-02 12:45:34,778 INFO setKMeansConfig ...Setting parameter samples_per_inputfile to value 500000
2019-07-02 12:45:34,796 INFO setKMeansConfig ...Setting parameter vapp_pattern to value simplehd
2019-07-02 12:45:34,814 INFO setKMeansConfig ...Setting parameter num_maps to value 8
2019-07-02 12:45:34,833 INFO setKMeansConfig ...Setting parameter sut to value hadoopmaster->5_x_hadoopslave
2019-07-02 12:45:34,851 INFO setKMeansConfig ...Setting parameter max_iteration to value 5
2019-07-02 12:45:34,869 INFO setKMeansConfig ...Setting parameter hadoop_home to value /usr/local/hadoop
2019-07-02 12:45:34,888 INFO setKMeansConfig ...Setting parameter num_reds to value 4
2019-07-02 12:45:34,906 INFO setKMeansConfig ...Setting parameter load_duration to value 5
2019-07-02 12:45:34,924 INFO setKMeansConfig Setting Virtual Application Submitter for parameters hadoop
2019-07-02 12:45:34,935 INFO measure ...Success.
2019-07-02 12:45:34,935 INFO measure Baseline run number0
2019-07-02 12:45:34,935 INFO measure Creating kmeans application instance...
2019-07-02 12:49:35,214 INFO measure ...Success. Application instance name: ai_1
2019-07-02 12:49:35,215 INFO measure ...List of Virtual Machines for this application instance:
2019-07-02 12:49:35,215 INFO measure ...Name | Role | Harness UUID | Deploy Time
2019-07-02 12:49:35,249 INFO measure ...Instance vm_1 | role = hadoopmaster | UUID = 6BC52739-FB69-5C0B-8E49-0D58B49B21B5 | Instance DeployTime = 33 | DeployTimeIncludingApp = 223
2019-07-02 12:49:35,274 INFO measure ...Instance vm_5 | role = hadoopslave | UUID = 6BEA7DA2-2354-5857-AA9A-827A646F0336 | Instance DeployTime = 37 | DeployTimeIncludingApp = 226
2019-07-02 12:49:35,298 INFO measure ...Instance vm_2 | role = hadoopslave | UUID = C22E9F49-E165-5F79-99D1-B4092874CA6E | Instance DeployTime = 43 | DeployTimeIncludingApp = 232
2019-07-02 12:49:35,324 INFO measure ...Instance vm_4 | role = hadoopslave | UUID = 25D666AB-154F-5B81-B97C-3D5299F331D7 | Instance DeployTime = 44 | DeployTimeIncludingApp = 232
2019-07-02 12:49:35,348 INFO measure ...Instance vm_3 | role = hadoopslave | UUID = FDC5D86E-018C-5132-A331-B9A2613E9AD7 | Instance DeployTime = 45 | DeployTimeIncludingApp = 235
2019-07-02 12:49:35,373 INFO measure ...Instance vm_6 | role = hadoopslave | UUID = 581FFB0F-D6C1-5EB3-BC37-F2AEA09AE4B7 | Instance DeployTime = 51 | DeployTimeIncludingApp = 240
2019-07-02 12:49:35,373 INFO measure Waiting forever minutes for application to complete.  CTRL-C to abort.
2019-07-02 12:49:35,424 INFO measure ...No results after 1 minutes. Sleeping for 60 seconds.
2019-07-02 12:50:35,535 INFO measure ...No results after 2 minutes. Sleeping for 60 seconds.
2019-07-02 12:51:35,643 INFO measure ...No results after 3 minutes. Sleeping for 60 seconds.
2019-07-02 12:52:35,752 INFO measure ...No results after 4 minutes. Sleeping for 60 seconds.
2019-07-02 12:53:35,859 INFO measure ...No results after 5 minutes. Sleeping for 60 seconds.
2019-07-02 12:54:35,920 INFO measure ...No results after 6 minutes. Sleeping for 60 seconds.
2019-07-02 12:55:36,005 INFO measure ...No results after 7 minutes. Sleeping for 60 seconds.
2019-07-02 12:56:36,115 INFO measure ...No results after 8 minutes. Sleeping for 60 seconds.
2019-07-02 12:57:36,222 INFO measure ...No results after 9 minutes. Sleeping for 60 seconds.
2019-07-02 12:58:36,326 INFO measure ...No results after 10 minutes. Sleeping for 60 seconds.
...
2019-07-02 13:59:41,577 INFO measure ...No results after 71 minutes. Sleeping for 60 seconds.
...



And I have this in cbuser_remotescripts.log:

Jul  2 13:50:58 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): Command line is: /home/cbuser/HiBench/kmeans/bin/run.sh. Output file is /tmp/tmp.wEua5a00o1 (LOAD_ID=2, AI_UUID=BD3261C3-9750-5213-992B-DC9C0B2EF0AA, VM_UUID=6BC52739-FB69-5C0B-8E49-0D58B49B21B5)
Jul  2 13:50:58 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): Command output will be shown
Jul  2 13:50:58 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): ========== running kmeans bench ==========
Jul  2 13:50:59 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): HADOOP_EXECUTABLE=/usr/local/hadoop/bin/hadoop
Jul  2 13:50:59 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): HADOOP_CONF_DIR=/usr/local/hadoop/
Jul  2 13:50:59 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): HADOOP_EXAMPLES_JAR=/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.5.jar
Jul  2 13:51:02 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): rm: `/HiBench/KMeans/Output-comp': No such file or directory
Jul  2 13:51:04 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): du: `/HiBench/KMeans/Input-comp/cluster': No such file or directory
Jul  2 13:51:05 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): /home/cbuser/HiBench/kmeans/bin/run.sh: line 39: +: syntax error: operand expected (error token is "+")
Jul  2 13:51:05 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Jul  2 13:51:05 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=/usr/local/hadoop/
Jul  2 13:51:05 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): MAHOUT-JOB: /home/cbuser/HiBench/common/mahout-distribution-0.7-hadoop2/examples/target/mahout-examples-0.7-job.jar
Jul  2 13:51:07 ip-172-31-40-166_172-31-40-166 cloudbench test020719KMEANSBASELINE020190702124529UTC cb_common.sh (25618): 2019-07-02 13:51:07,789 WARN  [main] driver.MahoutDriver (MahoutDriver.java:addClass(239)) - Unable to add class: org.apache.mahout.clustering.kmeans.GenKMeansDataset



Thank you!


PS: Sorry for the delay, I was traveling. (And congratulations, Dad :D)
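In the remotescripts log above, the kmeans `run.sh` reports several problems within its first seconds: `du` cannot find `/HiBench/KMeans/Input-comp/cluster`, bash hits an arithmetic `syntax error` at line 39, and Mahout warns that it is "Unable to add class" `GenKMeansDataset`. Meanwhile, `measure` keeps printing "No results after N minutes". The thread does not establish that these errors are why no results arrive, but they are the first things worth checking. Below is a small sketch for pulling such lines out of a long `cbuser_remotescripts.log`. The pattern list is a hypothetical starting point, not an official list of cbtool error markers.

```python
import re
from typing import Iterable, List, Sequence

# Hypothetical error markers; extend this list for your own workloads.
ERROR_PATTERNS = [
    re.compile(r"No such file or directory"),
    re.compile(r"syntax error"),
    re.compile(r"Unable to add class"),
    re.compile(r"\bException\b"),
]


def find_errors(log_lines: Iterable[str],
                patterns: Sequence[re.Pattern] = ERROR_PATTERNS) -> List[str]:
    """Return the log lines that match any error pattern, in their original order."""
    return [line.rstrip("\n") for line in log_lines
            if any(p.search(line) for p in patterns)]
```

For example, `with open("cbuser_remotescripts.log") as f: print("\n".join(find_errors(f)))`, run wherever that log lives on your orchestrator.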


Michael R. Hines

Jul 2, 2019, 10:30:36 AM
to Jean Renard, cbtool-users

I can take over helping, Marcio. Go take care of your baby. =)

Both Marcio and I work directly with SPEC Cloud 2018. We have a support channel dedicated to SPEC Cloud on the SPEC website, via a Slack clone called "MatterMost".

Would you like to move the conversation over there? I can send you an invite to the channel.

/*
 * Michael R. Hines
 * Staff Engineer, DigitalOcean.
 */
--
You received this message because you are subscribed to the Google Groups "cbtool-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cbtool-users...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cbtool-users/06d3fb78-166f-4de2-9398-9f374317e0ba%40googlegroups.com.

Jean Renard

Jul 2, 2019, 10:47:47 AM
to cbtool-users
Hello,

Yes, you can send me an invitation, please.

Thank you.

Michael R. Hines

Jul 2, 2019, 12:07:19 PM
to Jean Renard, cbtool-users

Invitation sent. Once you've logged into MatterMost,
we'll get you into the support channel.

If you continue to have cbtool-specific issues that are strictly
related to CB, that's fine too and we can continue the discussion here.

- Michael
