Ubuntu 15.04
java version "1.8.0_45" (Oracle)
On AWS
We're getting this exception:
[2016-06-09 13:59:22,156] [SSH git-receive-pack '/product-configurator' (<uname>)] ERROR com.google.gerrit.server.git.ReceiveCommitsAdvertiseRefsHook : Cannot list open changes of <repo-name>
com.google.gwtorm.server.OrmException: java.io.IOException: null: NIOFSIndexInput(path="/mnt/ebs2/services/gerrit/index/changes_0025/open/_4g_Lucene50_0.tim")
at com.google.gerrit.lucene.LuceneChangeIndex$QuerySource.read(LuceneChangeIndex.java:465)
at com.google.gerrit.server.index.IndexedChangeQuery.read(IndexedChangeQuery.java:106)
at com.google.gerrit.server.query.change.AndSource.readImpl(AndSource.java:115)
at com.google.gerrit.server.query.change.AndSource.read(AndSource.java:99)
at com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:162)
at com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:103)
at com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:87)
at com.google.gerrit.server.query.change.InternalChangeQuery.query(InternalChangeQuery.java:245)
at com.google.gerrit.server.query.change.InternalChangeQuery.byProjectOpen(InternalChangeQuery.java:207)
at com.google.gerrit.server.git.ReceiveCommitsAdvertiseRefsHook.advertiseOpenChanges(ReceiveCommitsAdvertiseRefsHook.java:101)
at com.google.gerrit.server.git.ReceiveCommitsAdvertiseRefsHook.advertiseRefs(ReceiveCommitsAdvertiseRefsHook.java:90)
at org.eclipse.jgit.transport.AdvertiseRefsHookChain.advertiseRefs(AdvertiseRefsHookChain.java:85)
at org.eclipse.jgit.transport.BaseReceivePack.sendAdvertisedRefs(BaseReceivePack.java:1042)
at org.eclipse.jgit.transport.ReceivePack.service(ReceivePack.java:179)
at org.eclipse.jgit.transport.ReceivePack.receive(ReceivePack.java:161)
....
Caused by: java.io.IOException: null: NIOFSIndexInput(path="/mnt/ebs2/services/gerrit/index/changes_0025/open/_4g_Lucene50_0.tim")
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:190)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
at org.apache.lucene.store.DataInput.readVInt(DataInput.java:125)
at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:221)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnumFrame.loadBlock(SegmentTermsEnumFrame.java:157)
at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.seekExact(SegmentTermsEnum.java:507)
...
Caused by: java.nio.channels.ClosedByInterruptException
at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:202)
at sun.nio.ch.FileChannelImpl.readInternal(FileChannelImpl.java:713)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:694)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:180)
... 47 more
I suspect that the problem is triggered by this scenario:
- Start with a loaded server (in our case the load is between 8 and 9). The load usually comes from Gerrit running a Lucene reindex after a restart.
- Try to push a change and impatiently interrupt the push (in our case, Eclipse EGit automatically aborts the push after 30 s without a reply from the server, because of the load).
The exception above then shows up in Gerrit's error_log.
When pushing from the command line (no timeout), there is no exception.
It seems to me that an interrupted connection should not break the connection to the Lucene database.
Any comments?
Kresimir
On 9 Jun 2016, at 17:15, 'Shawn Pearce' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:
No way. You've got to be kidding me.
Gerrit had this exact bug in 2010. We fixed it by moving off NIO and using "old school" IO to read files from disk, because the SSH daemon inside of Gerrit absolutely sends interrupts to threads when users do things like disconnect.
In this case it looks like someone aborted a `git push` by pressing Ctrl-C on their workstation, the SSH daemon interrupted the work thread, the work thread was reading from the Lucene index, and the interrupt shot the index in the head and killed the system.
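The mechanism described above can be reproduced with plain JDK classes, without Gerrit or Lucene. The sketch below is only an illustration: the class name `InterruptClosesChannel`, the helper `readOnce`, and the temporary file are made up for the example. It shows that a thread whose interrupt flag is set while it reads from a shared `FileChannel` gets a `ClosedByInterruptException` and closes the channel, so every later reader of that same channel sees a `ClosedChannelException`, which matches the two different root causes in the stack traces in this thread.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical demo class, not part of Gerrit or Lucene.
class InterruptClosesChannel {

  /** Returns the outcome seen by the interrupted reader [0] and by a later reader [1]. */
  static String[] demo() {
    String[] seen = new String[2];
    try {
      Path file = Files.createTempFile("segment", ".bin");
      Files.write(file, new byte[4096]);
      try (FileChannel shared = FileChannel.open(file, StandardOpenOption.READ)) {
        Thread worker = new Thread(() -> {
          // Simulate the SSH daemon interrupting the work thread.
          Thread.currentThread().interrupt();
          seen[0] = readOnce(shared);
        });
        worker.start();
        worker.join();
        // A different, never-interrupted thread now reads the same channel.
        seen[1] = readOnce(shared);
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException | InterruptedException e) {
      throw new IllegalStateException(e);
    }
    return seen;
  }

  /** Performs one positional read and reports "ok" or the exception's simple name. */
  static String readOnce(FileChannel ch) {
    try {
      ch.read(ByteBuffer.allocate(16), 0);
      return "ok";
    } catch (IOException e) {
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    String[] seen = demo();
    System.out.println("interrupted reader: " + seen[0]);
    System.out.println("later reader:       " + seen[1]);
  }
}
```

Because the channel is shared by every reader of that file, one interrupted request is enough to break reads for all subsequent requests until the file is reopened.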
Thanks guys. If I understand correctly, the change will make sure that there is a pool of connections to the Lucene database, and if one connection dies another one will be used? Will there be an auto-reconnect option so that the pool does not get drained?
[index]
type = LUCENE
threads = 2
[2016-06-11 08:58:21,306] [SSH git-receive-pack '/<repo name>' (kresimir.tonkovic)] ERROR com.google.gerrit.server.git.ReceiveCommitsAdvertiseRefsHook : Cannot list open changes of <repo name>
com.google.gwtorm.server.OrmException: java.io.IOException: null: NIOFSIndexInput(path="/mnt/ebs2/services/gerrit/index/changes_0025/open/_4g_Lucene50_0.tim")
at com.google.gerrit.lucene.LuceneChangeIndex$QuerySource.read(LuceneChangeIndex.java:465)
at com.google.gerrit.server.index.IndexedChangeQuery.read(IndexedChangeQuery.java:106)
at com.google.gerrit.server.query.change.AndSource.readImpl(AndSource.java:115)
at com.google.gerrit.server.query.change.AndSource.read(AndSource.java:99)
at com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:162)
at com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:103)
at com.google.gerrit.server.query.change.QueryProcessor.queryChanges(QueryProcessor.java:87)
at com.google.gerrit.server.query.change.InternalChangeQuery.query(InternalChangeQuery.java:245)
at com.google.gerrit.server.query.change.InternalChangeQuery.byProjectOpen(InternalChangeQuery.java:207)
at com.google.gerrit.server.git.ReceiveCommitsAdvertiseRefsHook.advertiseOpenChanges(ReceiveCommitsAdvertiseRefsHook.java:101)
at com.google.gerrit.server.git.ReceiveCommitsAdvertiseRefsHook.advertiseRefs(ReceiveCommitsAdvertiseRefsHook.java:90)
...
Caused by: java.io.IOException: null: NIOFSIndexInput(path="/mnt/ebs2/services/gerrit/index/changes_0025/open/_4g_Lucene50_0.tim")
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:190)
at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:342)
at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:54)
at org.apache.lucene.store.DataInput.readVInt(DataInput.java:125)
at org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:221)
...
Caused by: java.nio.channels.ClosedChannelException
at sun.nio.ch.FileChannelImpl.ensureOpen(FileChannelImpl.java:109)
at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:688)
at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:180)
... 47 more
I interrupted 5 connections. It seems to me that the interrupt still reaches Lucene, but at least it now looks like it recovers, and I can continue working without having to restart Gerrit.
Thank you very much for your help!
Kresimir
https://gerrit-review.googlesource.com/#/c/77380/ seems to address this problem?
On Fri, Jun 10, 2016 at 10:58 AM, 'Björn Pedersen' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:
> https://gerrit-review.googlesource.com/#/c/77380/ seems to address this problem?
It protects from a user canceling a push, but it doesn't protect when the worker thread is killed after reaching receive.timeout. See my comment in that change.
On Fri, Jun 17, 2016 at 7:42 AM, Saša Živkov <ziv...@gmail.com> wrote:
> On Fri, Jun 10, 2016 at 10:58 AM, 'Björn Pedersen' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:
>> https://gerrit-review.googlesource.com/#/c/77380/ seems to address this problem?
> It protects from a user canceling a push, but it doesn't protect when the worker thread is killed after reaching receive.timeout. See my comment in that change.
(Run all Lucene queries on background threads)
This may help, as it addresses the case from the OP when a push is Ctrl-C'd part way through. Almost any Git wire protocol operation (including fetch and push) needs to search the index for changes in that project, in order to filter the refs/changes/ namespace when not all refs are readable by the client.
However, I agree with Saša that this is still not enough. The indexAsync() futures can be cancelled by a calling thread, which may cause an interrupt in the wrong place within Lucene. We need to do more work to isolate Lucene from the interrupts.
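To make the distinction concrete, here is a minimal sketch of the general pattern of keeping interrupts away from a background thread. It is not Gerrit's code and not the content of change 77380; the class `IsolatedQuery` and its methods are hypothetical. The query runs on a dedicated pool thread, and when the waiting caller is interrupted it gives up with `cancel(false)`, which never interrupts a task that is already running. Calling `cancel(true)` instead would deliver the interrupt to the pool thread, which is the hazard described above.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical helper, for illustration only.
class IsolatedQuery {
  private final ExecutorService pool = Executors.newFixedThreadPool(2);

  /** Runs the query on a pool thread; an interrupt of the caller never reaches that thread. */
  <T> T run(Callable<T> query) throws InterruptedException, ExecutionException {
    Future<T> future = pool.submit(query);
    try {
      return future.get();
    } catch (InterruptedException e) {
      // Stop waiting, but do not interrupt the running task:
      // cancel(false) only prevents a task that has not started yet.
      future.cancel(false);
      throw e;
    }
  }

  void shutdown() {
    pool.shutdown();
  }

  /** Returns {callerWasInterrupted, workerWasInterrupted}. */
  static boolean[] demo() {
    IsolatedQuery iq = new IsolatedQuery();
    CountDownLatch started = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);
    CountDownLatch finished = new CountDownLatch(1);
    AtomicBoolean callerInterrupted = new AtomicBoolean(false);
    AtomicBoolean workerInterrupted = new AtomicBoolean(false);

    Thread caller = new Thread(() -> {
      try {
        iq.run(() -> {
          started.countDown();
          try {
            release.await(); // stands in for a slow index read
          } catch (InterruptedException e) {
            workerInterrupted.set(true);
          }
          finished.countDown();
          return "done";
        });
      } catch (InterruptedException e) {
        callerInterrupted.set(true);
      } catch (ExecutionException e) {
        throw new IllegalStateException(e);
      }
    });
    caller.start();
    try {
      started.await();
      caller.interrupt(); // e.g. the client disconnected
      caller.join();
      release.countDown(); // let the background read finish normally
      finished.await();
    } catch (InterruptedException e) {
      throw new IllegalStateException(e);
    } finally {
      iq.shutdown();
    }
    return new boolean[] {callerInterrupted.get(), workerInterrupted.get()};
  }

  public static void main(String[] args) {
    boolean[] r = demo();
    System.out.println("caller interrupted: " + r[0] + ", worker interrupted: " + r[1]);
  }
}
```

The trade-off in this sketch is that an abandoned query keeps running to completion and its result is discarded, so a burst of aborted requests still consumes pool threads until the reads finish.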