Pipeline intermittently not triggered with GitHub PR plugin

ju...@thoughtworks.com

May 23, 2016, 5:57:29 PM
to go-cd
We are using GoCD version 15.2 with GitHub Pull Request Builder plugin version 1.2.4.


An issue we are seeing is that, at least once a day, the pipeline that uses the PR plugin stops being triggered even though new PRs are being created. The only workaround we've found is to restart the Go server, and even that only fixes things temporarily.

In "Pipeline" view, the play buttons are grayed out, while in "Environment" view, the pipeline is "preparing to schedule".

Pipeline tab: (screenshot not preserved)

Here is what we see in the plugin logs:
2016-05-23 11:03:14,752  WARN [92@MessageListener for MaterialUpdateListener] GitHubPRBuildPlugin:67 - get latest revisions since:
java.lang.RuntimeException: Exception Occurred: [git, fetch, origin, +refs/pull/*/head:refs/remotes/origin/pull-request/*] - /home/go-server/pipelines/flyweight/2d857165-df0a-4f33-b466-0b62b4d54b68
        at com.tw.go.plugin.cmd.Console.runOrBomb(Console.java:33)
        at com.tw.go.plugin.git.GitCmdHelper.runAndGetOutput(GitCmdHelper.java:397)
        at com.tw.go.plugin.git.GitCmdHelper.runOrBomb(GitCmdHelper.java:385)
        at com.tw.go.plugin.git.GitCmdHelper.fetch(GitCmdHelper.java:195)
        at com.tw.go.plugin.GitHelper.fetchAndReset(GitHelper.java:109)
        at com.tw.go.plugin.GitHelper.fetchAndResetToHead(GitHelper.java:99)
        at com.tw.go.plugin.GitHelper.cloneOrFetch(GitHelper.java:41)
        at in.ashwanthkumar.gocd.github.GitHubPRBuildPlugin.handleLatestRevisionSince(GitHubPRBuildPlugin.java:190)
        at in.ashwanthkumar.gocd.github.GitHubPRBuildPlugin.handle(GitHubPRBuildPlugin.java:88)
        at com.thoughtworks.go.plugin.infra.DefaultPluginManager$1.execute(DefaultPluginManager.java:172)
        at com.thoughtworks.go.plugin.infra.DefaultPluginManager$1.execute(DefaultPluginManager.java:167)
        at com.thoughtworks.go.plugin.infra.FelixGoPluginOSGiFramework.executeActionOnTheService(FelixGoPluginOSGiFramework.java:315)
        at com.thoughtworks.go.plugin.infra.FelixGoPluginOSGiFramework.doOn(FelixGoPluginOSGiFramework.java:245)
        at com.thoughtworks.go.plugin.infra.DefaultPluginManager.submitTo(DefaultPluginManager.java:167)
        at com.thoughtworks.go.plugin.access.PluginRequestHelper.submitRequest(PluginRequestHelper.java:32)
        at com.thoughtworks.go.plugin.access.scm.SCMExtension.latestModificationSince(SCMExtension.java:156)
        at com.thoughtworks.go.server.service.materials.PluggableSCMMaterialPoller.modificationsSince(PluggableSCMMaterialPoller.java:77)
        at com.thoughtworks.go.server.service.materials.PluggableSCMMaterialPoller.modificationsSince(PluggableSCMMaterialPoller.java:46)
        at com.thoughtworks.go.server.service.MaterialService.modificationsSince(MaterialService.java:110)
        at com.thoughtworks.go.server.materials.ScmMaterialUpdater.insertLatestOrNewModifications(ScmMaterialUpdater.java:52)
        at com.thoughtworks.go.server.materials.PluggableSCMMaterialUpdater.insertLatestOrNewModifications(PluggableSCMMaterialUpdater.java:61)
        at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.insertLatestOrNewModifications(MaterialDatabaseUpdater.java:155)
        at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.updateMaterialWithNewRevisions(MaterialDatabaseUpdater.java:147)
        at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater$2.doInTransaction(MaterialDatabaseUpdater.java:110)
        at com.thoughtworks.go.server.transaction.TransactionCallback.doWithExceptionHandling(TransactionCallback.java:24)
        at com.thoughtworks.go.server.transaction.TransactionTemplate$3.doInTransaction(TransactionTemplate.java:53)
        at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:130)
        at com.thoughtworks.go.server.transaction.TransactionTemplate.executeWithExceptionHandling(TransactionTemplate.java:49)
        at com.thoughtworks.go.server.materials.MaterialDatabaseUpdater.updateMaterial(MaterialDatabaseUpdater.java:107)
        at com.thoughtworks.go.server.materials.MaterialUpdateListener.onMessage(MaterialUpdateListener.java:48)
        at com.thoughtworks.go.server.materials.MaterialUpdateListener.onMessage(MaterialUpdateListener.java:29)
        at com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.runImpl(JMSMessageListenerAdapter.java:65)
        at com.thoughtworks.go.server.messaging.activemq.JMSMessageListenerAdapter.run(JMSMessageListenerAdapter.java:50)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.commons.exec.ExecuteException: Process exited with an error: 128 (Exit value: 128)
        at org.apache.commons.exec.DefaultExecutor.executeInternal(DefaultExecutor.java:404)
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:166)
        at org.apache.commons.exec.DefaultExecutor.execute(DefaultExecutor.java:153)
        at com.tw.go.plugin.cmd.Console.runOrBomb(Console.java:25)
        ... 33 more
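
If it helps with debugging, the failing command from the exception above corresponds to roughly the following when run by hand (the flyweight path is copied verbatim from the log):

    cd /home/go-server/pipelines/flyweight/2d857165-df0a-4f33-b466-0b62b4d54b68
    # Quote the refspec so the shell doesn't expand the * globs.
    git fetch origin '+refs/pull/*/head:refs/remotes/origin/pull-request/*'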

Has anyone come across this issue? 

We're also curious why a restart fixes the issue (even if only temporarily), given that the error happens when a git command is executed in the flyweight directory. Any pointers on this, perhaps?

sta...@thoughtworks.com

Jun 27, 2016, 11:39:35 AM
to go-cd
Having worked with Julia on this issue, I can share some findings:

We tracked this down to git processes intermittently hanging when fetching material updates on the GoCD server:
  • In the go/api/support endpoint, we saw MaterialUpdateListener threads hanging (the issue was affecting more than one pipeline)
  • ps aux | grep git on the server showed that a few git processes had been running for hours (a rough version of these checks is sketched below)
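
Roughly, those checks look like this (a sketch; exact ps columns and strace flags may vary by distro):

    # List git processes with their elapsed running time; a fetch that has
    # been running for hours is suspect.
    ps -eo pid,etime,args | grep '[g]it'

    # Attach strace to a suspect PID to see what it is blocked on
    # (in our case it sat in a read() on a network socket).
    sudo strace -p <PID> -e trace=network,read
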
A restart fixes the issue temporarily because the server keeps an in-memory data structure tracking material updates, and an entry is only cleared (for a particular material) once its corresponding MaterialUpdateListener thread has completed its run. In our case, those threads kept waiting on the hung git processes, so the entries were never cleared; restarting the GoCD server wipes this data.

We suspected that the hanging git processes were related to some network instability. Per Aravind's suggestion, we wrapped the git binary with strace, and indeed found that the hung process keeps waiting on a socket read.
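
The wrapper was essentially a shell script dropped in place of the git binary; a minimal sketch, assuming the real binary has been moved to /usr/bin/git.real and /tmp is writable (paths are placeholders, not our exact setup):

    #!/bin/sh
    # Trace network syscalls and reads of every git invocation, writing one
    # log per process, so a hung fetch shows up as a process blocked in
    # read()/recvfrom() in its trace file.
    exec strace -f -tt -e trace=network,read -o /tmp/git-trace.$$.log \
        /usr/bin/git.real "$@"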

Another possible factor is the git version used on the server, which is 1.8.3.1 (the latest version available in our internal repository, but really old). Our agents use the same old version, and we have seen the same hanging issue there as well.

A temporary workaround that has worked for us is to wrap the git binary used by GoCD with timeout. We are also looking into upgrading git and into understanding the observed network instability.
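
For anyone wanting to try the same thing, the wrapper is a one-liner; a sketch, again assuming the real binary sits at /usr/bin/git.real, with a 15-minute cap picked as a placeholder (set it above your slowest legitimate fetch):

    #!/bin/sh
    # Kill any git command that runs longer than 15 minutes (SIGTERM first,
    # SIGKILL 30 seconds later) so a hung fetch fails instead of blocking
    # the MaterialUpdateListener thread indefinitely.
    exec timeout --kill-after=30s 15m /usr/bin/git.real "$@"

When the timeout fires, git exits non-zero (124), the plugin records a failed poll, and the next polling cycle gets a fresh chance instead of waiting forever.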

Matt Richardson

Jun 29, 2016, 9:21:56 AM
to go-cd
Are you running on Windows or Linux?

We found (a few years back, admittedly) that git on Windows didn't work well under load (something about kernel file locks), so we had to move all our Go instances to Linux, and the issue went away.

It was a slightly different issue - git fetches were taking an awfully long time on Windows (17 minutes rings a bell), but once on Linux they were back to ~5 seconds. I've no idea whether the root cause has ever been fixed, but I thought I'd add my 2 cents.

cheers,
Matt

Sofia Tania

Jun 29, 2016, 9:46:13 AM
to go...@googlegroups.com
Hi Matt,

Thanks for chiming in. We're using Linux (RHEL), and our git processes seem to be stuck waiting on a socket read, so it's likely not the same issue.

Cheers,
Tania
