How to debug pipelines not starting?


stevem

Mar 31, 2015, 11:55:51 AM
to go...@googlegroups.com
Hi,

I am defining a lot of new pipelines in GO at the moment, and I sometimes see this frustrating behavior: I manually trigger a pipeline, an hourglass shows up for a couple of seconds beside the Label area (I presume it is trying to compute the next label number), and then nothing happens, as if I had never clicked the manual trigger.

What could explain this? I know this can happen if the pipeline is not associated with a defined environment, or if the materials cannot be fetched (e.g., wrong credentials for SVN). But what else? I have already checked those two possibilities, and now I have no idea why my pipeline does not run anymore.

Any idea? Maybe some tricks as to how to debug GO agents?

Thanks a lot,

Steve

Aravind SV

Mar 31, 2015, 1:42:33 PM
to stevem, go...@googlegroups.com
On Tue, Mar 31, 2015 at 11:55 AM, stevem <steeve....@gmail.com> wrote:
> Any idea? Maybe some tricks as to how to debug GO agents?

Usually, if that happens, you should see a server health message (bottom-right corner, red box). You shouldn't need to look at the agents: if no build was scheduled, the problem is on the server side, not the agent side. In other words, the work hasn't reached the agents at all yet.

stevem

Mar 31, 2015, 2:06:14 PM
to go...@googlegroups.com, steeve....@gmail.com
All right, I looked into that, thanks. 

It seems that a pipeline has been locked for more than 24 hours in the "building" state (but nothing is really building), and I cannot cancel the job from the Web UI. The GO server has been restarted twice, and the agents were restarted as well. How can I perform some sort of cleanup in GO if restarting the server is not enough to let me cancel jobs? This job is preventing all the other builds from starting, so essentially my complete build system is not working now ;-(

Thanks for any help as to how to recover from a stalled GO system.

Steve

Marius Ciotlos

Mar 31, 2015, 2:16:48 PM
to go...@googlegroups.com, steeve....@gmail.com
You should be able to cancel the job from the UI. You can try clicking on the Agents tab, clicking on the "building" link, and then cancelling the job from the details (clicking one level up in the breadcrumb navigation).
I think Go has some queue system that it publishes jobs on, so your job might still be in the queue until it completes (this is just a guess). 

If I am not mistaken, the way I did it was to disable the agent that is building, then delete the agent. 
If the agent is up it will re-register, so you only need to enable it again and re-assign the resources to it. This might be better cleanup.

You might also want to look at the agent and job APIs to do this more easily than through the interface (http://www.go.cd/documentation/user/current/api/agent_api.html and http://www.go.cd/documentation/user/current/api/job_api.html). In theory the API should be more robust than the UI.

Marius

stevem

Mar 31, 2015, 2:35:46 PM
to go...@googlegroups.com, steeve....@gmail.com
Hi Marius, 

Thanks for the quick answer. I cannot cancel from the UI: the request seems to time out and fail, and the UI still offers the X for cancellation. I was able to disable the agent, but not to delete it; it gives me the following error:

Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request POST /go/agents/edit_agents.

Reason: Error reading from remote server

I will look into the API as you suggest and see if I have more success.

Steve

Aravind SV

Mar 31, 2015, 2:40:11 PM
to stevem, go...@googlegroups.com
That timeout error seems to be coming from a proxy in front of Go. Doesn't look like a message from Go to me. Do you have Apache or something in front of the Go server?
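
One way to tell the two apart in a script is to look for the proxy's own wording in the response. The sketch below is only an illustration: the function name is made up, and it assumes the proxy answers with HTTP 502 and the "Proxy Error" / "upstream server" wording quoted above. A different proxy would need different markers.

```python
def looks_like_proxy_error(status, body):
    """Guess whether an error page was produced by a proxy rather than by Go.

    Assumption: the proxy replies with HTTP 502 and a page mentioning
    "Proxy Error" and an "upstream server", as in the message quoted above.
    """
    text = body.lower()
    return status == 502 and "proxy error" in text and "upstream server" in text
```

If this returns True, the request probably never got a proper answer from Go itself, and the proxy configuration is the first place to look.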

--
You received this message because you are subscribed to the Google Groups "go-cd" group.
To unsubscribe from this group and stop receiving emails from it, send an email to go-cd+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

stevem

Mar 31, 2015, 2:50:40 PM
to go...@googlegroups.com, steeve....@gmail.com
Hi,

Yes, here's the detailed message from the curl commands I used (same message for both agent delete and stage cancel):

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>502 Proxy Error</title>
</head><body>
<h1>Proxy Error</h1>
<p>The proxy server received an invalid
response from an upstream server.<br />
The proxy server could not handle the request <em><a href="/go/api/agents/d4f29eae-c732-414a-a41b-06b838ce80b0/delete">POST&nbsp;/go/api/agents/d4f29eae-c732-414a-a41b-06b838ce80b0/delete</a></em>.<p>
Reason: <strong>Error reading from remote server</strong></p></p>
<hr>
<address>Apache/2.2.15 (CentOS) Server at srv-go Port 80</address>
</body></html>

Any idea what I should ask my IT people to look for?

Thanks again!

Steve

Aravind SV

Mar 31, 2015, 2:53:29 PM
to stevem, go...@googlegroups.com
It's probably just taking longer than the timeout configured in the proxy. If you can hit the Go server (port 8153) directly, you should try that. If not, ask them to increase the proxy timeout and see if it helps.
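
For what it is worth, the direct call suggested above could look roughly like this in Python. Treat it as a sketch under assumptions: the host name srv-go and the delete path are taken from the Apache error page earlier in the thread, while the helper names, the credentials, and the 15-minute timeout are placeholders. Whether the server needs basic authentication at all depends on the setup.

```python
import base64
import urllib.request

# Assumption: the Go server is reachable directly on port 8153,
# bypassing the Apache proxy that listens on port 80.
GO_SERVER = "http://srv-go:8153"


def build_post(path, user, password):
    """Build (but do not send) an empty-body POST with basic authentication."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        GO_SERVER + path,
        data=b"",
        method="POST",
        headers={"Authorization": "Basic " + token},
    )


def delete_agent_request(uuid, user, password):
    """Request for the agent delete path shown in the proxy error page."""
    return build_post(f"/go/api/agents/{uuid}/delete", user, password)


def send(request, timeout_seconds=900):
    """Send the request with a generous timeout and return (status, body)."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return response.status, response.read().decode("utf-8", "replace")
```

Calling `send(delete_agent_request("<agent-uuid>", "user", "password"))` would then talk to Go without the proxy timeout in the way, so a slow response shows up as a slow response instead of a 502.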

stevem

Mar 31, 2015, 3:26:16 PM
to go...@googlegroups.com, steeve....@gmail.com
I tried directly on the server: I issued a delete for the agent using curl and the REST API, waited more than 15 minutes, and nothing happened. It looks like the internal GO DB is in a bad state. Is there any way to recover a GO DB besides restoring from a backup?

Thanks,

Steve

Aravind SV

Mar 31, 2015, 3:38:11 PM
to stevem, go...@googlegroups.com
Not knowing what's going on inside there, I can't say what it would take to "recover" the DB. I've never even really seen a DB get into a bad state, except in cases where someone did a kill -9 on a server or something of that sort.

I'd be interested in seeing what's happening in those 15 minutes. If you can send the output of /go/api/support while it is stuck, I might be able to help. I'll probably take a look at it later in the evening, though.
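
If it helps, grabbing that output while the server is stuck can be scripted. This is a minimal sketch, assuming the server is reachable directly on port 8153 and that /go/api/support can be read without extra authentication (add credentials if your setup needs them). The function names and the file name pattern are made up for illustration.

```python
import urllib.request
from datetime import datetime, timezone


def support_url(base_url):
    """Return the support page URL for a Go server base URL."""
    return base_url.rstrip("/") + "/go/api/support"


def snapshot_filename(now):
    """Name the file after the capture time so repeated dumps do not collide."""
    return now.strftime("go-support-%Y%m%dT%H%M%SZ.txt")


def save_support_snapshot(base_url, timeout_seconds=120):
    """Fetch /go/api/support and write the raw response to a timestamped file."""
    filename = snapshot_filename(datetime.now(timezone.utc))
    with urllib.request.urlopen(support_url(base_url), timeout=timeout_seconds) as response:
        body = response.read()
    with open(filename, "wb") as handle:
        handle.write(body)
    return filename
```

Running `save_support_snapshot("http://srv-go:8153")` a few times during the hang would give several snapshots to compare.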

Cheers,
Aravind

Marius Ştefan Ciotloş

Mar 31, 2015, 4:04:01 PM
to Aravind SV, stevem, go...@googlegroups.com
How is your server set up? Is the storage local or remote? Do you have enough free space on your storage?

stevem

Mar 31, 2015, 4:09:42 PM
to go...@googlegroups.com, steeve....@gmail.com
Hi, 

OK, I finally managed to use the API to delete the faulty agent and cancel the blocked stage while the server was restarting (and before it went into some sort of infinite loop trying to build). The server seems to be back to normal and stable; I took the opportunity to perform a backup as soon as I could ;-)

Thanks to all for your support. Weird issue; never got anything like that in the past.

Steve

stevem

Mar 31, 2015, 4:10:57 PM
to go...@googlegroups.com, arv...@thoughtworks.com, steeve....@gmail.com, marius....@gmail.com
Hi, please see my other answer below. It seems the server is back to normal now. I had to be precise in interacting with the starting server before it went into some sort of infinite loop. I will certainly revise our backup strategy right now!

Steve

Aravind SV

Mar 31, 2015, 4:14:23 PM
to stevem, go...@googlegroups.com
On Tue, Mar 31, 2015 at 4:10 PM, stevem <steeve....@gmail.com> wrote:
> Hi, please see my other answer below. It seems the server is back to normal now. I had to be precise in interacting with the starting server before it went into some sort of infinite loop. I will certainly revise our backup strategy right now!

Happy it's back. I'm ruing the lost opportunity to find out why it was happening.

I promise I'm not hoping it will happen again. Not much anyway. :)