--
Best regards,
Fredrik Luthander
Sony Ericsson
Mobile Communications
sonyericsson.com
Hi Fredrik,
It'd be very helpful to see some info from your replication.config and to know
which version of Gerrit you're running.
Chances are all you need to do is set:
remote.<name>.timeout (described fully here:
http://gerrit.googlecode.com/svn/documentation/2.0/config-replication.html)
It defaults to waiting indefinitely, but it seems you're getting stuck with that.
That said, we also do replication, but I've never seen it get stuck, even on a
host with a very high load.
Nasser
FYI, timeout is busted due to a nasty race condition bug inside the JSch client library we use for replication over SSH.
Unfortunately there also isn't a way to kill or restart the replication to a particular destination once it gets stuck. Java doesn't really provide a good way to safely abort a thread that is running third-party code (JSch) which was not written to be abortable.
I need to go back into JGit here and rework how timeout is implemented. I (and several others, according to the mailing list archives) tried reporting the issues to the JSch developers, but they say it's working fine and don't want to fix it.
The best you can do right now is put each replication server into its own remote block. IIRC this creates one job queue per remote, so at least the other sites will remain current when one site gets stuck.
On Oct 21, 2009 10:19 AM, "Nasser Grainawi" <nas...@codeaurora.org> wrote:Luthander, Fredrik wrote: > Hello all Gerrit users! > > My team and I run a configuration of gerr...
--~--~---------~--~----~------------~-------~--~----~ To unsubscribe, email repo-discuss+unsubscrib...
Check the replication docs: there is a threads parameter per remote (remote.<name>.threads) that permits more than one project to replicate at a time. Each thread works independently, so a restart is only necessary once all threads are stuck.
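For illustration, a replication.config along the lines suggested in this thread might look like the fragment below. This is only a sketch: the remote names, host names, paths, and thread counts are made up, and whether the threads setting is available depends on the Gerrit version, so check the replication docs for your release.

```
# Hypothetical example: one remote block per server, so a stuck
# destination does not hold up the others.
[remote "mirror-a"]
  url = gerrit2@mirror-a.example.com:/srv/git/${name}.git
  threads = 3

[remote "mirror-b"]
  url = gerrit2@mirror-b.example.com:/srv/git/${name}.git
  threads = 3
```

With a layout like this, each remote gets its own queue, and with several threads a single hung push should not block every other project going to the same site.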
On Oct 23, 2009 12:07 PM, "Fredrik Luthander" <fredrik....@sonyericsson.com> wrote:
Hi everyone, and thanks for your prompt support!
Here we're currently running 2.0.22, eagerly waiting for 2.0.24 any
day now. :-)
I'd like to thank Nasser for pointing us to the timeout option. We
tried it, but it didn't work very well, as Shawn pointed out.
We've had our fair share of gerrit restarts during the day, hehe.
The git config already has one section per server, so, as suggested,
replication only hangs for the server in question and not for all of them.
Can you have several threads but only one server in a section, and
thus have several threads sync to the same site? (I'm guessing no..)
Right now we're investigating other services on the misbehaving server.
Only one server has these problems, and we have not been able to
identify whether a configuration or service on that machine is the
cause. That is the current theory though, so we'll try disabling
services one by one for as long as we see the problems. Maybe we'll
get lucky with that.
Again, any debug info I can extract for you I'll be happy to give you.
Our current fix is to have a script monitor the show-queue command and
restart the service automatically as soon as the queue is filling up and
not emptying as it should.
--
Best regards,
Fredrik Luthander
Sony Ericsson
Mobile Communications
sonyericsson.com
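A watchdog of the kind described above (poll show-queue, restart when the queue fills up and stops draining) could be sketched roughly as follows. This is not the script used at Sony Ericsson. The names StallDetector, count_tasks, and watch are hypothetical, the thresholds are arbitrary, and the parser assumes that show-queue prints its task rows between two dashed separator lines, which may not hold for every Gerrit version.

```python
import subprocess
import time
from collections import deque


def count_tasks(show_queue_output):
    """Count task rows in show-queue output.

    Assumption: task rows sit between two dashed separator lines.
    """
    inside = False
    count = 0
    for line in show_queue_output.splitlines():
        if line.startswith("-----"):
            inside = not inside
            continue
        if inside and line.strip():
            count += 1
    return count


class StallDetector:
    """Flags a queue that is large and has not shrunk over a sample window."""

    def __init__(self, min_tasks=5, window=3):
        self.min_tasks = min_tasks
        self.samples = deque(maxlen=window)

    def observe(self, queue_len):
        self.samples.append(queue_len)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet
        s = list(self.samples)
        never_shrank = all(b >= a for a, b in zip(s, s[1:]))
        return s[0] >= self.min_tasks and never_shrank


def watch(show_queue_cmd, restart_cmd, interval=60):
    """Poll the queue forever and run restart_cmd when it looks stuck."""
    detector = StallDetector()
    while True:
        result = subprocess.run(show_queue_cmd, capture_output=True,
                                text=True, check=False)
        if detector.observe(count_tasks(result.stdout)):
            subprocess.run(restart_cmd, check=False)
            detector.samples.clear()  # start fresh after the restart
        time.sleep(interval)
```

It might be started with something like watch(["ssh", "-p", "29418", "gerrit.example.com", "gerrit", "show-queue"], ["/etc/init.d/gerrit", "restart"]), where the host name and the restart command are placeholders for whatever the local setup uses. Requiring several consecutive non-shrinking samples is meant to avoid restarting during an ordinary burst of replication work.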
On Oct 22, 10:12 pm, Shawn Pearce <s...@google.com> wrote: > FYI, timeout is busted due to a nasty r...