I've also had a similar problem recently on a repository here: a
change ref in the Gerrit repository, refs/changes/88/1199/1, was
pointing to a bad object. This was causing errors with a similar
message for users when they tried to upload. You could try running
`git fsck --full` on the repository and see if it reports any errors.
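For reference, here's a minimal self-contained sketch (throwaway repository, and the change ref name is just borrowed from the example above) of the kind of breakage `git fsck --full` will catch: a ref left pointing at an object that no longer exists on disk.

```shell
# Build a throwaway repository whose change ref points at a missing object.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'initial commit'

# Store a scratch blob, point a Gerrit-style change ref at it, then
# delete the loose object file to simulate the corruption.
sha=$(echo bogus | git hash-object -w --stdin)
git update-ref refs/changes/88/1199/1 "$sha"
rm ".git/objects/$(echo "$sha" | cut -c1-2)/$(echo "$sha" | cut -c3-)"

# fsck now flags the broken ref and exits non-zero.
out=$(git fsck --full 2>&1 || true)
echo "$out"
```

On a healthy repository the same command reports nothing alarming and exits 0.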
It's also quite possible there is a bug in the branch access code, I
was not very familiar with the JGit code I modified and I might have
overlooked something.
Sounds like that might be the case. If the problem seems to disappear
when the client creates a new clone, it might be issue 390 on the
server, or a reference that is pointing at something that really is
gone and that `git fsck --full` would complain about.
During a push the server tells the client which objects it has. If
the client has similar content, it can send deltas that assume the
server really has the objects it advertised. If the server doesn't
actually have one of them (e.g. it lied in that initial advertisement
to the client), then it can't unpack the delta and errors out.
In a fresh clone, it's possible the client no longer has something the
older clone had. That means the client may base its deltas on
different objects, and so produce a pack the server can actually
unpack.
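That advertisement step is easy to observe. A small sketch (local throwaway repositories, hypothetical `demo` branch name) using Git's `GIT_TRACE_PACKET` switch shows the server listing the refs it has at the start of a push:

```shell
tmp=$(mktemp -d)
git init -q --bare "$tmp/server.git"
git init -q "$tmp/client"
cd "$tmp/client"
git checkout -qb demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'first'
git push -q "$tmp/server.git" demo

# Second push: the server now advertises the ref (and commit) it has,
# and the client builds its pack against that claim.
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m 'second'
trace=$(GIT_TRACE_PACKET=1 git push "$tmp/server.git" demo 2>&1)
echo "$trace" | grep 'refs/heads/demo'
```

If the server advertised an object it cannot actually produce, a client delta built against that claim is exactly what fails to unpack on the receiving side.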
Issue 390 caused the server to temporarily lose access to an object on
disk, while possibly still claiming to the client that it had that
object. Nico recently found a different bug where the server might
advertise something that is legitimately missing from disk. This
latter case can at least be discovered with `git fsck --full`.
> It's also quite possible there is a bug in the branch access code, I
> was not very familiar with the JGit code I modified and I might have
> overlooked something.
Also possible, but I know I went back and fixed some issues in there.
It seemed reasonably correct to me when I left it. :-)
Dangling commits are OK. It's odd to see them on the server, but they
aren't corruption and wouldn't be causing this. A future `git gc` will
remove them from the filesystem when it's safe to do so.
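As a quick illustration (throwaway repository), the dangling-commit case really is harmless and distinct from the missing-object errors above: `git fsck` merely reports it, and `git gc` with pruning makes it go away.

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m 'first'
git commit -q --allow-empty -m 'second'

# Abandon the second commit; expire reflogs so nothing still names it.
git reset -q --hard HEAD~1
git reflog expire --expire=now --all

before=$(git fsck --full 2>&1)
echo "$before"          # mentions a dangling commit, but is not an error

# gc prunes the unreachable commit once told it is safe to do so.
git gc --quiet --prune=now
after=$(git fsck --full 2>&1)
```

The `--prune=now` forces immediate removal here only for demonstration; a normal `git gc` waits out a grace period, which is what makes dangling objects safe to leave alone on a live server.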
If you aren't yet on 2.1.2.5 (or a recent master snapshot that uses
the newer JGit), I have to say that I still suspect issue 390 here.
I don't think any version that starts with 1.5 is supported.
Thanks
Nico
On Friday, June 4, 2010, Fredrik Luthander wrote:
We don't publish this, because in theory any Git client newer than the
earliest 0.99 can talk to Gerrit Code Review over SSH. I haven't
tested something that old, but the basic protocol is the same. Any
Git client newer than 1.5.0 can most certainly talk to Gerrit over SSH
as most protocol advances came between 0.99 and 1.5.
To use the http:// protocols you need 1.6.6 or later.
I always recommend the latest Git release, because it's pretty well
tested, and the latest release always contains more bug fixes and
features than the prior releases. Sure, there's a risk of breakage or
regression, but I think the folks who build Git do an amazing job of
keeping the system stable with each new version. I wish we were that
good with Gerrit Code Review. Of course, testing is easier when there
are literally hundreds of contributors per release. :-)
Can you humor me and cherry-pick
2366d781b35de7e2e5740560a5877a01e24e44b1 into your running version?
It's the commit that fixes issue 390. According to what you have told
me above, you don't have it in your build, which means this may be
just another symptom of the same fundamental bug in issue 390: JGit
loses access to objects on disk because a file was closed behind its
back.
Also, I think these are both red herrings. The client version (I
assume you meant Git 1.7.x.x above?) shouldn't have an impact on
whether or not the server can properly scan the pack that was sent to
it. No known client version of Git has ever produced an invalid pack
on the network. Some versions (1.5.4?) have had bugs while scanning a
pack fetched over the network in particular circumstances, but none
has ever sent a bad pack.
The read access controls add an extra level of processing during the
receive code path, which causes more objects to be checked inside the
server. Because the server is checking more objects, it increases the
odds that issue 390 will cause an object to be looked up and not
found.
If I'm right about the root cause of issue 390, it's pretty random.
The failure typically starts when a client uses Ctrl-C to kill a
running command; that project may then start to show failures if the
server was accessing Git data when the Ctrl-C interrupt was received.
If you have enough users, you can't really predict who will Ctrl-C
when or where; it just happens. Humans are fallible creatures who hit
up-arrow-enter in the wrong terminal window and restart a command
they didn't mean to run.
So please give commit 2366d781b35de7e2e5740560a5877a01e24e44b1 a try
and let us know if it resolves the problem.
We must have made some mistake when looking at the SHA-1s then; Ulrik
specifically delivered a binary to us that has the 390 fix, but
without the schema upgrade.
Ulrik, comments on this? Can you upload our commit to the local Gerrit
repo so I can run a second set of eyes over the commit tree? :)
BR,
Fredrik
Arrrgh.
> However, we've made the error go away a few times, by just fiddling
> with things like caches and rights and stuff. However, we have so far
> not been able to spot any pattern, so this is still a long shot from
> our side.
The only thing that should have an impact here is READ +1 permissions.
And the only cache that matters is the "projects" cache, as that is
what holds a cached copy of the permissions records from the database.
> We had one user where we added full rights on the project,
> plussing everything as much as possible. That didn't help, but
> when we then consequently flushed all caches on the upload server, it
> suddenly worked. However, we have not been able to repeat that
> behaviour a second time. Using the latest version of Gerrit on master
> doesn't seem to help either. Currently we are quite confused, to be
> honest.
Yikes.
Can we get more information about a particular occurrence of this problem?
When the unpack fails with a missing object, I'd like to know:
- Message(s) on the client side.
- Full stack trace(s) from the server when the upload failed.
- Is the missing SHA-1 actually in the server repository? (`git
cat-file -t SHA1`)
- Are there branch level READ permissions in the project?
- Does the user have READ on all branches, or just a subset?
- If it's a subset, is the following true? `git rev-list --objects
refs/heads/branch-user-can-see refs/heads/other-branch-user-can-see
... | grep SHA1`
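That last check can be rehearsed end to end in a scratch repository. The sketch below (hypothetical `visible` and `hidden` branch names) shows how `git rev-list --objects` answers whether a given SHA-1 is reachable from only the branches a user can read:

```shell
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git checkout -qb visible
echo shared >file.txt && git add file.txt && git commit -qm 'base'

git checkout -qb hidden
echo secret >secret.txt && git add secret.txt && git commit -qm 'restricted change'
sha=$(git rev-parse :secret.txt)   # a blob that exists only on 'hidden'

# Reachable from the branches the user is allowed to read?
if git rev-list --objects refs/heads/visible | grep -q "$sha"; then
    echo "reachable from the readable branches"
else
    echo "not reachable: the server should not have advertised it"
fi
```

If the missing SHA-1 only shows up when the hidden branches are added to the `rev-list` command line, that points at the access-control filtering rather than on-disk corruption.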