needs to be running before executing a script on it. current state: PENDING


Tom Barber

Oct 24, 2012, 11:57:05 AM
to jcl...@googlegroups.com
Hi guys,

I'm testing on the Rackspace nextgen stuff, and my code that runs fine on EC2 gives me:
Caused by: java.lang.IllegalStateException: node DFW/073e9685-cd25-407d-a7b7-d477d8a40148 needs to be running before executing a script on it. current state: PENDING

I know the server was up; I was ssh'd into it from a console as well.

I'm using the client.submitScriptOnNode() method.

Anyone seen similar?

Thanks

Tom

Everett Toews

Oct 24, 2012, 12:10:49 PM
to jcl...@googlegroups.com
I've done similar using the ComputeService.runScriptOnNode() without issues on the Rackspace NextGen stuff, see [1] and [2]. Maybe something in there will help.

Is it possible for you to isolate the issue in a small app and share it with us in a gist?


Regards,
Everett


Chris Strand

Oct 24, 2012, 5:31:59 PM
to jcl...@googlegroups.com
Hey Tom & Everett,

I experienced this regularly on the old Rackspace.

The problem for me was that the Rackspace API often reports the node as pending even after it has booted up. Due to the way my code was set up, I was polling port 22 myself to see when the server was up rather than using the jclouds blocking methods. I then went on to use ComputeService.runScriptOnNode() which, from my understanding, checks the API to see the status of the server. Since the status was sometimes still pending rather than running/active, the call could fail.

If you aren't using the jclouds ways of blocking until a server comes up, or if you aren't polling its state from the API, then this could be the reason. I haven't actually upgraded to next gen yet, so I could be barking up the wrong tree of course!

I fixed this by polling the API and then trying to run commands. This was implemented with a Predicate to check if the NodeState was Running and using that within a RetryablePredicate (a super awesome jclouds class). If you think it will help you I can dig out the code for this - just ask!

Hope this can shed some light,

Chris
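
For readers following along, the polling approach Chris describes can be sketched in plain Java as below. This is not the jclouds RetryablePredicate class and not Chris's code; `RetryUntil` and `await` are hypothetical names, and the status lookup is passed in as a supplier so the sketch makes no assumptions about the provider API.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical helper: re-evaluates a condition until it holds or a timeout elapses. */
final class RetryUntil {
    private RetryUntil() {}

    /**
     * Calls lookup repeatedly, testing each result against condition.
     * Returns true as soon as the condition holds, or false once
     * timeoutMillis has elapsed or the thread is interrupted.
     */
    static <T> boolean await(Supplier<T> lookup, Predicate<T> condition,
                             long timeoutMillis, long periodMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (true) {
            if (condition.test(lookup.get())) {
                return true;
            }
            long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
            if (remainingMillis <= 0) {
                return false;
            }
            try {
                // Never sleep past the deadline.
                Thread.sleep(Math.min(periodMillis, remainingMillis));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
    }
}
```

The caller would supply whatever fetches the node's current status as `lookup` and a "status is running" test as `condition`, and only submit the script once `await` returns true.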

Everett Toews

Oct 24, 2012, 6:48:42 PM
to jcl...@googlegroups.com
Great point Chris. I've done something similar to that too and I literally just happened to have the code on hand. :)

Tom, have a look at [1] ServerStatusPredicate.java and [2] IterableServersStatusPredicate.java for some ideas on how to do this.


Cheers,
Everett

Tom Barber

Oct 26, 2012, 4:40:33 AM
to jcl...@googlegroups.com
Thanks guys,

Looks like what's needed. Wouldn't it make sense for jclouds to check this internally?

Cheers

Tom

Everett Toews

Oct 26, 2012, 9:22:32 AM
to jcl...@googlegroups.com
jclouds does, but only in the portable ComputeService API.

Adrian Cole

Oct 26, 2012, 9:39:33 AM
to jcl...@googlegroups.com, jclou...@googlegroups.com

From this thread, it sounds like Rackspace incorrectly reports accessible servers as being in the PENDING state.

The only solutions I can think of are:
* remove the server status check and instead only rely on TCP port accessibility
* double-check status on PENDING

Any other thoughts?
-A
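
As an illustration of the first option, a TCP accessibility check can be written with the standard library alone. This is a minimal sketch, not jclouds code; `PortProbe` and `isOpen` are hypothetical names. Note that an open port only shows that something is listening there, not that a login will succeed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: tests whether a TCP port accepts connections. */
final class PortProbe {
    private PortProbe() {}

    /**
     * Returns true if a TCP connection to host:port is established
     * within timeoutMillis, and false otherwise.
     */
    static boolean isOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, host unreachable, or connect timed out.
            return false;
        }
    }
}
```

The connect timeout bounds how long each probe can block, so it would need tuning for nodes that never answer.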


Chris Strand

Oct 30, 2012, 5:11:56 AM
to jcl...@googlegroups.com
Relying on TCP port accessibility seems to make sense to me in this case since I can't think of any cases where this wouldn't be accurate.

Would it make sense to do this just for Rackspace or for other providers too?

Chris

Adrian Cole

Oct 30, 2012, 10:33:01 AM
to jcl...@googlegroups.com, jclou...@googlegroups.com

We should talk about it, as right now Rackspace is the only one with this bug.

Even if the socket test is tuned to something we consider just long enough to be valid, it will cause an unnecessary delay on pending or terminated nodes. If someone has hundreds of nodes, this could add up quite a bit.

What I'd suggest is that we:
* raise this bug to Rackspace, hopefully in a way they can see the glitch for themselves
* make the health check pluggable and default to the current logic
* make a temporary change to Rackspace (or a module in Rackspace) to double-check pending or skip the state check

I suppose we could alternatively introduce a check strategy flag.

Wdyt?
-A
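
One way to picture the pluggable health check proposed above is sketched below. None of this is jclouds API: `ReportedStatus`, `ReadinessCheck`, `STATUS_ONLY`, and `DOUBLE_CHECK_PENDING` are hypothetical names, and the port probe is injected as a supplier so it is only invoked when a strategy asks for it.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical status values; not the jclouds enum. */
enum ReportedStatus { PENDING, RUNNING, TERMINATED }

/** Hypothetical pluggable check deciding whether a script may run on a node. */
interface ReadinessCheck {
    boolean isReady(ReportedStatus status, BooleanSupplier sshPortOpen);

    /** Default strategy: trust the status reported by the provider API. */
    ReadinessCheck STATUS_ONLY =
            (status, sshPortOpen) -> status == ReportedStatus.RUNNING;

    /**
     * Provider-specific strategy: when the API says PENDING, double-check
     * by probing the port. RUNNING and TERMINATED nodes are never probed,
     * so they incur no socket delay.
     */
    ReadinessCheck DOUBLE_CHECK_PENDING =
            (status, sshPortOpen) -> status == ReportedStatus.RUNNING
                    || (status == ReportedStatus.PENDING && sshPortOpen.getAsBoolean());
}
```

With this shape, a provider affected by the stale PENDING status could opt into `DOUBLE_CHECK_PENDING` while every other provider keeps the default, and a TERMINATED node never pays for a socket probe. PENDING nodes that really are still booting would still wait out the probe timeout.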

Everett Toews

Oct 30, 2012, 11:05:16 AM
to jcl...@googlegroups.com, jclou...@googlegroups.com, Tom Barber
I haven't run into this bug yet but I agree it needs to be raised if it's an issue. 

Tom Barber came across it originally. Tom, is it possible for you to isolate the issue in a small app and share it with us in a gist?

If not, we can work to isolate it ourselves.

Thanks,
Everett