ipmitool not called after running razor reboot-node


Mike Citrix

Mar 18, 2015, 6:07:53 PM
to puppet...@googlegroups.com
Reboot via IPMI was working at some point (we are using Razor version 0.15.0).

However, now when we run razor reboot-node, it never calls ipmitool.

The command line call and output:

razor reboot-node --name=node1

   result: reboot request queued


If you check the command status at http://192.168.1.1:8080/api/collections/commands/2171:

{"spec":"http://api.puppetlabs.com/razor/v1/collections/commands/member","id":"http://192.168.1.1:8080/api/collections/commands/2171","name":"2171","command":"reboot-node","params":{"name":"node1"},"errors":[],"status":"finished","submitted_at":"2015-03-18T17:15:40-04:00","finished_at":"2015-03-18T17:15:40-04:00"}
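For anyone reading along, here is a minimal Python sketch that summarizes a command status document like the one above. The payload is copied from this post; the function name `summarize_command` is only illustrative and is not part of Razor.

```python
import json
from datetime import datetime

# Status document copied from the response quoted above.
SAMPLE = (
    '{"spec":"http://api.puppetlabs.com/razor/v1/collections/commands/member",'
    '"id":"http://192.168.1.1:8080/api/collections/commands/2171",'
    '"name":"2171","command":"reboot-node","params":{"name":"node1"},'
    '"errors":[],"status":"finished",'
    '"submitted_at":"2015-03-18T17:15:40-04:00",'
    '"finished_at":"2015-03-18T17:15:40-04:00"}'
)


def summarize_command(payload):
    """Pull out the fields that matter when checking a Razor command record."""
    doc = json.loads(payload)
    submitted = datetime.fromisoformat(doc["submitted_at"])
    finished = datetime.fromisoformat(doc["finished_at"])
    return {
        "command": doc["command"],
        "status": doc["status"],
        "error_count": len(doc["errors"]),
        "seconds_to_finish": (finished - submitted).total_seconds(),
    }


print(summarize_command(SAMPLE))
```

The record is marked finished in the same second it was submitted, with no errors. That suggests (though this is an inference, not something the API documents here) that "finished" only means the request was accepted and queued, not that ipmitool actually ran.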


But the server does not reboot

On the same razor system, if I run:
ipmitool -I lanplus  -H 192.168.2.2 -U admin -P password chassis power reset

Then the target server reboots immediately

I created a shell script called ipmitool to use as a wrapper and to log every call to ipmitool

I can see that razor is periodically checking the power state of some servers, but I never see our calls to reboot any server
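The wrapper described above was a shell script that is not shown in this thread. As a rough equivalent, here is a Python sketch of the same idea: log every invocation, then hand off to the real binary. The paths `REAL_IPMITOOL` and `LOG_PATH` are assumptions, and the script has to be installed under the name `ipmitool` ahead of the real one on the PATH that the Razor server process uses.

```python
#!/usr/bin/env python3
import os
import sys
import time

REAL_IPMITOOL = "/usr/bin/ipmitool"          # assumption: location of the real binary
LOG_PATH = "/var/log/ipmitool-wrapper.log"   # assumption: any writable log file


def redact(argv):
    """Return a copy of argv with the value following -P masked."""
    out = []
    hide_next = False
    for arg in argv:
        if hide_next:
            out.append("****")
            hide_next = False
        else:
            out.append(arg)
            hide_next = (arg == "-P")
    return out


def log_call(argv, log_path=LOG_PATH):
    """Append one timestamped line per ipmitool invocation."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    with open(log_path, "a") as fh:
        fh.write(f"{stamp} pid={os.getpid()} {' '.join(redact(argv))}\n")


# Only act as a wrapper when the script is actually invoked as "ipmitool".
if __name__ == "__main__" and os.path.basename(sys.argv[0]) == "ipmitool":
    log_call(sys.argv[1:])
    # Replace this process with the real ipmitool, passing arguments through.
    os.execv(REAL_IPMITOOL, [REAL_IPMITOOL] + sys.argv[1:])
```

With something like this in place, a reboot request that reaches ipmitool would leave a line ending in a power command, while the periodic checks show up as `power status` lines.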

Scott McClellan

Mar 19, 2015, 1:24:35 AM
to puppet...@googlegroups.com
Hi Mike,

Thanks for the report. Could you check for me if there are any errors in server.log or console.log when you run reboot-node? The other possibility is that the Torquebox/HornetQ queue is bogged down with something, preventing the asynchronous ipmitool call from occurring. Either way, those log files will definitely help us track this down.

Scott

--
You received this message because you are subscribed to the Google Groups "puppet-razor" group.
To unsubscribe from this group and stop receiving emails from it, send an email to puppet-razor...@googlegroups.com.
To post to this group, send email to puppet...@googlegroups.com.
Visit this group at http://groups.google.com/group/puppet-razor.
For more options, visit https://groups.google.com/d/optout.




Mike Citrix

Mar 19, 2015, 11:04:14 AM
to puppet...@googlegroups.com
Hi Scott,

Thanks for your reply!   Here's an excerpt from the server.log after issuing reboot-node:

10:22:06,364 INFO  [razor.web.log] (http-/0.0.0.0:8080-2) 127.0.0.1 - - [19/Mar/2015 10:22:06] "GET /api " 200 4962 0.0670
10:22:06,806 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) CREATE TABLE IF NOT EXISTS "schema_info" ("version" integer DEFAULT 0 NOT NULL)
10:22:06,814 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) SELECT * FROM "schema_info" LIMIT 1
10:22:06,820 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.000000s) SELECT 1 AS "one" FROM "schema_info" LIMIT 1
10:22:06,828 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.000000s) SELECT count(*) AS "count" FROM "schema_info" LIMIT 1
10:22:06,836 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) SELECT "version" FROM "schema_info" LIMIT 1
10:22:06,843 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) SELECT "version" FROM "schema_info" LIMIT 1
10:22:06,854 INFO  [razor.web.log] (http-/0.0.0.0:8080-2) 127.0.0.1 - - [19/Mar/2015 10:22:06] "GET /api/commands/reboot-node " 200 2308 0.0540
10:22:06,889 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.000000s) CREATE TABLE IF NOT EXISTS "schema_info" ("version" integer DEFAULT 0 NOT NULL)
10:22:06,897 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) SELECT * FROM "schema_info" LIMIT 1
10:22:06,906 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.000000s) SELECT 1 AS "one" FROM "schema_info" LIMIT 1
10:22:06,913 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.000000s) SELECT count(*) AS "count" FROM "schema_info" LIMIT 1
10:22:06,921 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) SELECT "version" FROM "schema_info" LIMIT 1
10:22:06,928 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.000000s) SELECT "version" FROM "schema_info" LIMIT 1
10:22:06,938 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.000000s) SELECT * FROM "nodes" WHERE ("name" = 'node15') LIMIT 1
10:22:06,951 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) SELECT * FROM "nodes" WHERE ("name" = 'node15') LIMIT 1
10:22:06,975 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) BEGIN
10:22:06,983 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) INSERT INTO "commands" ("command", "submitted_by", "submitted_at", "status", "finished_at", "params") VALUES ('reboot-node', NULL, '2015-03-19 10:22:06.949000-0400', 'finished', '2015-03-19 10:22:06.973000-0400', '{"name":"node15"}') RETURNING *
10:22:06,999 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) COMMIT
10:22:07,007 INFO  [razor.web.log] (http-/0.0.0.0:8080-2) 127.0.0.1 - - [19/Mar/2015 10:22:07] "POST /api/commands/reboot-node " 202 98 0.1220
10:22:19,883 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) CREATE TABLE IF NOT EXISTS "schema_info" ("version" integer DEFAULT 0 NOT NULL)
10:22:19,895 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) SELECT * FROM "schema_info" LIMIT 1
10:22:19,905 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) SELECT 1 AS "one" FROM "schema_info" LIMIT 1
10:22:19,912 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) SELECT count(*) AS "count" FROM "schema_info" LIMIT 1
10:22:19,919 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.000000s) SELECT "version" FROM "schema_info" LIMIT 1
10:22:19,926 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.000000s) SELECT "version" FROM "schema_info" LIMIT 1
10:22:19,936 INFO  [razor.web.log] (http-/0.0.0.0:8080-2) 10.108.81.78 - - [19/Mar/2015 10:22:19] "GET /svc/mk/extension.zip " 404 40 0.0600
10:22:20,139 INFO  [razor.sequel] (http-/0.0.0.0:8080-2) (0.001000s) CREATE TABLE IF NOT EXISTS "schema_info" ("version" integer DEFAULT 0 NOT NULL)

----
I don't see anything else related to the nodes we are trying to reboot. We have some other systems that are undergoing maintenance, and I see that razor is repeatedly (and unsuccessfully) attempting to get the power state for 4 of them:

10:37:49,285 INFO  [razor.messaging.sequel] (Thread-3 (HornetQ-client-global-threads-1414145431)) retry message ID:9f353111-ccf3-11e4-9a14-bb950502efae after 1.02: executing ["ipmitool", "-I", "lanplus", "-H", "192.168.2.33", "-U", "user", "-f", "/tmp/ipmitool-password20150319-52745-11tcjfo", "power", "status"] failed: #<Process::WaitThread:0x53022940>
Error: Unable to establish IPMI v2 / RMCP+ session
Error: Unable to establish IPMI v2 / RMCP+ session
Error: Unable to establish IPMI v2 / RMCP+ session
Unable to get Chassis Power Status


Maybe this is clogging up the queue?   I'm not sure how to manage the queue.
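To see which BMC addresses are behind the retries, the retry lines in server.log can be tallied by host. Below is a small Python sketch of that. The regular expressions are written against the single retry line pasted above, so treat the exact log format as an assumption; `failing_bmc_hosts` is an illustrative name.

```python
import re
from collections import Counter

# Matches the retry entries and captures the quoted argv list.
RETRY_RE = re.compile(r"retry message ID:\S+ after .*?executing \[(?P<argv>.*?)\] failed")
# Picks the value that follows "-H" inside that argv list.
HOST_RE = re.compile(r'"-H", "(?P<host>[^"]+)"')

# Retry line copied from the log excerpt above.
SAMPLE_LINE = (
    '10:37:49,285 INFO  [razor.messaging.sequel] '
    '(Thread-3 (HornetQ-client-global-threads-1414145431)) '
    'retry message ID:9f353111-ccf3-11e4-9a14-bb950502efae after 1.02: '
    'executing ["ipmitool", "-I", "lanplus", "-H", "192.168.2.33", "-U", "user", '
    '"-f", "/tmp/ipmitool-password20150319-52745-11tcjfo", "power", "status"] '
    'failed: #<Process::WaitThread:0x53022940>'
)


def failing_bmc_hosts(lines):
    """Count failed ipmitool retries per BMC address."""
    counts = Counter()
    for line in lines:
        retry = RETRY_RE.search(line)
        if not retry:
            continue
        host = HOST_RE.search(retry.group("argv"))
        if host:
            counts[host.group("host")] += 1
    return counts


print(failing_bmc_hosts([SAMPLE_LINE]))
```

To run it over the whole file, pass an open file handle for server.log instead of the one-element list.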

Thanks for your help,
Mike

Scott McClellan

Mar 23, 2015, 7:29:55 PM
to puppet...@googlegroups.com
Hi Mike,

Apologies for the late reply. Can you check `razor commands` to see if there are any failed or pending commands there?

As for the queue, we're looking at adding debugging commands to view its state, which we expect to be added in a future release. If you need to dig deeper for now, read on . . .

TorqueBox has a gem for inspecting and managing its queues, `torquebox-backstage`. For now, the easiest way is to deploy this gem; it is a self-contained application that needs to know the TORQUEBOX_HOME directory. From there, you should be able to query the queues to see if that is the issue.

Scott


Mike Citrix

Mar 24, 2015, 1:40:57 PM
to puppet...@googlegroups.com
Hi Scott,

Thanks for getting back to me! I checked `razor commands`. All commands are 'finished' with 0 errors.

I installed torquebox-backstage, browsed to /backstage/queues, and saw 0 scheduled messages for /queues/razor/sequel-instance. There were over 73,000 messages in the queue. I hit clear, and now I'm just waiting for that to finish.

When I issue razor reboot-node, it states that the job was scheduled, but when I look at /backstage/queues the scheduled count stays at 0.

Thanks again for your help.

Regards,
Mike




Michael Conn

Jul 31, 2015, 1:48:13 PM
to puppet-razor, mikem....@gmail.com
Hi Mike,

I was having this exact problem. In my setup, the machines are all Dell servers with iDRAC controllers. One server wasn't set up to allow IPMI over LAN, which caused the same error you highlighted above:

"Error: Unable to establish IPMI v2 / RMCP+ session
Error: Unable to establish IPMI v2 / RMCP+ session
Error: Unable to establish IPMI v2 / RMCP+ session
Unable to get Chassis Power Status"

It was, in fact, clogging up the queue. As soon as I enabled IPMI over LAN in the problem machine's iDRAC settings, the Razor server was able to get the power status. As soon as that happened, all the queued reboot commands I had sent for various nodes kicked off.
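For anyone hunting for the one machine that is blocking the queue, here is a rough Python sketch that runs the same `power status` check Razor was retrying earlier in the thread and reports which BMCs do not answer. The host list, user, and password file are placeholders you would supply; the `run` parameter exists only so the function can be exercised without real hardware.

```python
import subprocess


def ipmi_power_status(host, user, password_file, run=subprocess.run, timeout=30):
    """Return (ok, output) for an ipmitool power status query against one BMC."""
    # Same argument shape as the call that appears in the server.log retry message.
    argv = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
            "-f", password_file, "power", "status"]
    try:
        proc = run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # ipmitool missing, not executable, or the BMC never answered in time.
        return False, str(exc)
    output = (proc.stdout or proc.stderr).strip()
    return proc.returncode == 0, output


def unreachable_bmcs(hosts, user, password_file, run=subprocess.run):
    """Return {host: error text} for every BMC whose power status query fails."""
    failures = {}
    for host in hosts:
        ok, output = ipmi_power_status(host, user, password_file, run=run)
        if not ok:
            failures[host] = output
    return failures
```

Any host that comes back in the failure map with "Unable to establish IPMI v2 / RMCP+ session" is a candidate for the IPMI over LAN setting described above.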

I don't know if you ever solved the issue on your end, but this thread pointed me to the right place to figure it out. I thought I'd post my solution for anyone running into the same issue.

-Michael 