Want not valid during cloning


Nguyen Tuan Khang Phan

Jan 9, 2023, 1:49:16 PM
to Repo and Gerrit Discussion
On Gerrit versions 3.1 and 3.4 we are encountering:

`stderr: fatal: remote error: want <sha1> not valid`

We found that a full aggressive gc fixes the issue; however, the build still fails. Our Jenkins job pushes lots of tags to the server; can this be mitigated somehow?

Luca Milanesio

Jan 10, 2023, 6:35:54 PM
to Nguyen Tuan Khang Phan, Luca Milanesio, Repo and Gerrit Discussion
Are these “want <sha1> not valid” happening *whilst* you are pushing the tags?

Luca.

Nguyen Tuan Khang Phan

Jan 10, 2023, 6:59:00 PM
to Repo and Gerrit Discussion
> Are these “want <sha1> not valid” happening *whilst* you are pushing the tags?
>
> Luca.

It's after pushing a lot of tags that the clone operation for the next Jenkins run fails.  The job creates and deletes close to 600 tags every 5 mins.

Anthony Wallace

Jan 10, 2023, 7:07:09 PM
to Repo and Gerrit Discussion
> we found out that a full aggressive gc fixes the issue
Is that aggressive gc run on the Gerrit server or on the Jenkins build machine? - thanks
/Anthony

Nguyen Tuan Khang Phan

Jan 10, 2023, 7:45:25 PM
to Repo and Gerrit Discussion
> Is that aggressive gc run on the Gerrit server or on the Jenkins build machine? - thanks
> /Anthony

We run it on that repo through a separate server (gc-conductor + gc-executor plugins).

Sven Selberg

Jan 11, 2023, 4:46:51 AM
to Repo and Gerrit Discussion
I think there's your issue.

First of all, you should never delete tags!

Secondly, we have discovered that when deleting refs for a highly requested repository you can end up in a timing situation where (something like) this happens:
1. Client: "these are my refs, show me yours [...]"
2. Server: "these are my refs [..., {refs/tags/tag-that-will-be-deleted, abc123}, ...]"
3. Server: ref refs/tags/tag-that-will-be-deleted is deleted.
4. Client: "I want [..., {refs/tags/tag-that-will-be-deleted, abc123}, ...]"
5. Server: "Want not valid"

We fixed this issue by starting to use protocol version 2 [1] where this situation is mitigated.
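
The sequence Sven describes can be modelled with a toy sketch in Python. It is not Gerrit or JGit code: `ToyServer`, `advertise`, `delete_ref`, `upload_pack`, and the id `def456` are made up for illustration, and the check only looks at current ref tips, which simplifies whatever validation the real server performs.

```python
"""Toy model of the advertise/delete/want race; not Gerrit or JGit code."""


class ToyServer:
    def __init__(self, refs):
        # ref name -> object id currently stored on the server
        self.refs = dict(refs)

    def advertise(self):
        # Steps 1-2: the server answers with a snapshot of its refs.
        return dict(self.refs)

    def delete_ref(self, name):
        # Step 3: another client (for example a CI cleanup job) deletes a ref.
        self.refs.pop(name, None)

    def upload_pack(self, wants):
        # Steps 4-5: in this toy model every wanted id must still be
        # the tip of a ref that exists right now.
        tips = set(self.refs.values())
        for object_id in wants:
            if object_id not in tips:
                return f"want {object_id} not valid"
        return "ok"


def run_race():
    server = ToyServer({
        "refs/heads/master": "def456",
        "refs/tags/tag-that-will-be-deleted": "abc123",
    })
    advertised = server.advertise()
    server.delete_ref("refs/tags/tag-that-will-be-deleted")
    # The client still asks for everything it was shown a moment ago.
    return server.upload_pack(advertised.values())


if __name__ == "__main__":
    print(run_race())
```

The client did nothing wrong in this model: its request was built from an advertisement that was accurate when it was sent, and only became stale because of the deletion in between.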

Luca Milanesio

Jan 11, 2023, 6:39:50 PM
to Repo and Gerrit Discussion, Luca Milanesio, Sven Selberg

On 11 Jan 2023, at 09:46, Sven Selberg <sven.s...@axis.com> wrote:

> On Wednesday, January 11, 2023 at 12:59:00 AM UTC+1 phan....@gmail.com wrote:
>>> Are these “want <sha1> not valid” happening *whilst* you are pushing the tags?
>>>
>>> Luca.
>>
>> It's after pushing a lot of tags that the clone operation for the next Jenkins run fails. The job creates and deletes close to 600 tags every 5 mins.
>
> I think there's your issue.
>
> First of all, you should never delete tags!

+1, but the same would happen with any ref, wouldn’t it?

> Secondly, we have discovered that when deleting refs for a highly requested repository you can end up in a timing situation where (something like) this happens:
> 1. Client: "these are my refs, show me yours [...]"
> 2. Server: "these are my refs [..., {refs/tags/tag-that-will-be-deleted, abc123}, ...]"
> 3. Server: ref refs/tags/tag-that-will-be-deleted is deleted.
> 4. Client: "I want [..., {refs/tags/tag-that-will-be-deleted, abc123}, ...]"
> 5. Server: "Want not valid"

Yes, that’s why I asked whether they see this concurrently with pushes or ref updates.

We have seen this behaviour with many users, and the typical cause is the “temporary refs” created by CI systems that tag what they build so that it can be rebuilt in the future.
Then, of course, the accumulating refs would just bloat the Git ref advertisement, which is why they are typically removed in bulk, creating this issue.

Often people ask “why don’t I have the same issue with GitHub or GitLab?”
Well, I recall when we found a security issue and fixed it in Gerrit by checking whether the requested SHA1 was actually valid and reachable: before that fix, the problem did not happen with Gerrit either.

IMHO serving a SHA1 that isn’t reachable or visible is incorrect and dangerous: therefore the “wants not valid” during cloning is the expected behaviour.

> We fixed this issue by starting to use protocol version 2 [1] where this situation is mitigated.

True, but by narrowing down what is advertised, we run less risk of including something “volatile”.
We have also successfully used the ‘git-refs-filter’ [2], which also works with earlier Gerrit versions, before Git protocol v2 support.

HTH

Luca.


--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/repo-discuss/d746cefb-955d-4212-afad-828d2d82d95an%40googlegroups.com.

euph...@gmail.com

Jan 23, 2023, 2:26:37 AM
to Repo and Gerrit Discussion
I suspect there is a concurrency issue in here on the Gerrit server side.  I have a busy repository that I've started seeing want not valid errors with while doing a git remote update on a client.  When I go look at the Gerrit server, I find that the ref that the error is being reported on was pushed to the server shortly before the git remote update was run by the client (it's still there, and hasn't been deleted).  I would think that if the receive-pack is done and Gerrit is advertising that new ref to another client, that ref should be valid when that client asks for it.

It's pretty easy for me to reproduce.  I have a fairly active repo on my Gerrit server.  I did a git clone --mirror onto a client and set up a job to do a git remote update on it once a minute.  I started seeing want not valid errors immediately.  The largest lag I've measured so far was a failure on a commit that had been pushed in 19 seconds before the git remote update was run.

This is with Gerrit 3.7.0 on a CentOS 7 server, with git 2.39.0 on a Rocky 8 client.


--Andrew
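
Andrew's reproduction (a mirror clone polled with `git remote update` once a minute) could be scripted roughly as follows. This is a sketch in Python rather than the cron-style job he describes; `is_want_not_valid`, `poll`, and their parameters are hypothetical names, and the string match is based only on the error text quoted at the top of this thread.

```python
import subprocess
import time


def is_want_not_valid(stderr_text):
    """Return True if git's stderr looks like the error discussed in this thread."""
    text = stderr_text.lower()
    return "remote error: want" in text and "not valid" in text


def poll(mirror_dir, interval_seconds=60, rounds=10):
    """Run `git remote update` in mirror_dir repeatedly; count matching failures."""
    failures = 0
    for _ in range(rounds):
        result = subprocess.run(
            ["git", "remote", "update"],
            cwd=mirror_dir,
            capture_output=True,
            text=True,
        )
        if result.returncode != 0 and is_want_not_valid(result.stderr):
            failures += 1
            # Log a timestamp so the failure can be compared with push times.
            print(time.strftime("%H:%M:%S"), result.stderr.strip())
        time.sleep(interval_seconds)
    return failures
```

Calling `poll("/path/to/mirror.git")` (a placeholder path) against a clone made with `git clone --mirror` would then report how many of the polling rounds hit the error.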

Luca Milanesio

Jan 23, 2023, 5:26:33 AM
to Repo and Gerrit Discussion, Luca Milanesio, euph...@gmail.com

On 23 Jan 2023, at 07:26, euph...@gmail.com <euph...@gmail.com> wrote:

> I suspect there is a concurrency issue in here on the Gerrit server side. I have a busy repository that I've started seeing want not valid errors with while doing a git remote update on a client. When I go look at the Gerrit server, I find that the ref that the error is being reported on was pushed to the server shortly before the git remote update was run by the client (it's still there, and hasn't been deleted). I would think that if the receive-pack is done and Gerrit is advertising that new ref to another client, that ref should be valid when that client asks for it.
>
> It's pretty easy for me to reproduce. I have a fairly active repo on my Gerrit server. I did a git clone --mirror onto a client and set up a job to do a git remote update on it once a minute. I started seeing want not valid errors immediately. The largest lag I've measured so far was a failure on a commit that had been pushed in 19 seconds before the git remote update was run.

I’d say that it is expected, not really a bug here.

Luca.

Anthony Wallace

Jan 23, 2023, 12:45:35 PM
to Repo and Gerrit Discussion
> > I would think that if the receive-pack is done and Gerrit is advertising that new ref to another client, that ref should be valid when that client asks for it
> > I did a git clone --mirror onto a client and set up a job to do a git remote update on it once a minute. I started seeing want not valid errors immediately

> I’d say that it is expected, not really a bug here.

Amping up the git remote updates to once a minute and seeing these intermittent errors sooner:  expected.  
If the same error occurs less often under milder conditions, but it does occur, then it sounds like a bug.  Unless the Git client requests were invalid or incorrect.   

What should the system administrator or CI-script-writer change to improve future outcomes?  

Thanks! 

anna.fr...@gmail.com

Aug 18, 2023, 8:37:18 AM
to Repo and Gerrit Discussion
Hi,

We have a repo where this happens several times a day causing Jenkins jobs to fail. I would not call the repo busy, I see 50 changes today. Is there anything I can do to mitigate this?

BR Anna

euph...@gmail.com

Aug 18, 2023, 6:27:23 PM
to Repo and Gerrit Discussion
On Friday, August 18, 2023 at 5:37:18 AM UTC-7 anna.fr...@gmail.com wrote:
> Hi,
>
> We have a repo where this happens several times a day causing Jenkins jobs to fail. I would not call the repo busy, I see 50 changes today. Is there anything I can do to mitigate this?
>
> BR Anna


If you're hitting the same problem I was (Gerrit advertising a newly pushed ref, then giving a want not valid to a client after advertising that new ref to the client) on a less-busy repo, then there's a couple of things you could try.  If you're using the Gerrit Trigger plugin in Jenkins and triggering on Patchset Created, then your odds of hitting this are higher, because your Jenkins job is starting right after the new ref came in (and probably starts by fetching from your repo).  You could increase the Quiet Period setting in your job from the default 5 seconds to 30.  Or you could try adding a sleep at the beginning of the job.  If you're using scripted pipelines, you could also wrap your checkout steps and sh steps that run git commands in a try/catch block, and retry when they fail (which is what I've been doing).


--Andrew
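
The retry approach Andrew mentions can be sketched outside Jenkins as well. He uses try/catch in a scripted pipeline; the version below swaps in plain Python instead, and `run_git`, `fetch_with_retry`, the attempt count, and the delay are all illustrative assumptions rather than anything taken from his setup.

```python
import subprocess
import time


def run_git(args, cwd=None):
    """Run a git command and return (exit_code, stderr)."""
    result = subprocess.run(["git", *args], cwd=cwd, capture_output=True, text=True)
    return result.returncode, result.stderr


def fetch_with_retry(args, attempts=5, delay_seconds=2.0, runner=run_git, sleep=time.sleep):
    """Retry a git command while it fails with 'want ... not valid'.

    Any other failure is raised immediately; the attempt count and delay
    are arbitrary illustration values. Returns the attempt that succeeded.
    """
    for attempt in range(1, attempts + 1):
        code, stderr = runner(args)
        if code == 0:
            return attempt
        transient = "want" in stderr and "not valid" in stderr
        if not transient or attempt == attempts:
            raise RuntimeError(f"git {' '.join(args)} failed: {stderr.strip()}")
        sleep(delay_seconds * attempt)  # linear backoff before the next try
```

A job would call something like `fetch_with_retry(["fetch", "origin"])`. The `runner` and `sleep` parameters exist only so the retry logic can be exercised without a real repository or real waiting.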

Martin Fick

Aug 18, 2023, 6:41:25 PM
to Repo and Gerrit Discussion
On Friday, 18 August 2023 at 16:27:23 UTC-6 euph...@gmail.com wrote:
> On Friday, August 18, 2023 at 5:37:18 AM UTC-7 anna.fr...@gmail.com wrote:
>> Hi,
>>
>> We have a repo where this happens several times a day causing Jenkins jobs to fail. I would not call the repo busy, I see 50 changes today. Is there anything I can do to mitigate this?
>>
>> BR Anna
>
> If you're hitting the same problem I was (Gerrit advertising a newly pushed ref, then giving a want not valid to a client after advertising that new ref to the client) on a less-busy repo, then there's a couple of things you could try. If you're using the Gerrit Trigger plugin in Jenkins and triggering on Patchset Created, then your odds of hitting this are higher, because your Jenkins job is starting right after the new ref came in (and probably starts by fetching from your repo). You could increase the Quiet Period setting in your job from the default 5 seconds to 30. Or you could try adding a sleep at the beginning of the job. If you're using scripted pipelines, you could also wrap your checkout steps and sh steps that run git commands in a try/catch block, and retry when they fail (which is what I've been doing).
>
> --Andrew


Could this issue be because of asynchronous writes to Lucene?  Would using ES prevent this?

-Martin

Dmitry P

Sep 25, 2024, 4:06:38 AM
to Repo and Gerrit Discussion
What to do if the suggested workaround doesn't help (in 3.4)? Does this issue happen in versions beyond 3.4?

Luca Milanesio

Sep 25, 2024, 4:18:05 AM
to Repo and Gerrit Discussion, Luca Milanesio
The solution is to use uploadpack.allowAnySHA1InWant in jgit.config, which isn’t supported on Gerrit v3.4 because both that Gerrit release and its JGit release are EOL.

I would suggest that you:
1. Upgrade to Gerrit v3.10.1
2. In the meantime, if you know how to do it, apply the fix to the JGit code in stable-5.13 (see [1]).

HTH

Luca.
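
For readers unfamiliar with the file, the setting Luca names would look roughly like this in Git config syntax. This is a sketch under assumptions: the section and key are derived from the dotted name `uploadpack.allowAnySHA1InWant`, and the file location (`etc/jgit.config` under the Gerrit site directory) is assumed rather than stated in the thread. A later message in this thread also reports a JGit issue with the capabilities advertised once this option is enabled, so test it before relying on it.

```
[uploadpack]
	allowAnySHA1InWant = true
```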




Matthias Sohn

Sep 25, 2024, 5:47:58 AM
to Luca Milanesio, Repo and Gerrit Discussion
We found that also with protocol v2 there is a race which can cause WantNotValidException.
 

Luca Milanesio

Sep 25, 2024, 6:18:24 AM
to Repo and Gerrit Discussion, Luca Milanesio
Good catch @Matthias and @Thomas !

Would it make sense to push the fixes also to older branches?

Luca.

Dmitry P

Sep 25, 2024, 7:49:18 AM
to Repo and Gerrit Discussion
So just more retries? 

Matthias Sohn

Sep 25, 2024, 7:59:13 AM
to Luca Milanesio, Repo and Gerrit Discussion
As soon as the fix is approved and submitted on stable-6.10, we can cherry-pick it to older branches.
The oldest Gerrit release which isn't EOL is currently 3.8, and it already uses JGit 6.10.
Are there older versions you need the fix for?

Dmitry P

Sep 25, 2024, 9:12:18 AM
to Repo and Gerrit Discussion
It would be nice to have it for 3.4, since upgrading to 3.10 is a long road for us.

As of now, can you tell me what users can do if they face this issue? I assume just more retries, but even that sometimes doesn't help.

Matthias Sohn

Sep 25, 2024, 10:36:40 AM
to Dmitry P, Repo and Gerrit Discussion
Gerrit 3.4 uses JGit 5.13, which is 12 JGit releases before 6.10. If you could update to Gerrit 3.5, that uses JGit 6.6, which is much closer ...

> As of now, can you tell me what users can do if they face this issue? I assume just more retries, but even that sometimes doesn't help.

If they face the issue fixed in this change, it's caused by a race with another thread.
This means retrying may help.


Matthias Sohn

Sep 25, 2024, 10:52:05 AM
to Dmitry P, Repo and Gerrit Discussion
I moved the fix to the stable-6.6 branch

Dmitry P

Sep 25, 2024, 10:55:08 AM
to Repo and Gerrit Discussion
> by a race with another thread

Does that mean Gerrit has performance issues and needs more threads?

Matthias Sohn

Sep 25, 2024, 11:07:33 AM
to Dmitry P, Repo and Gerrit Discussion
No, it means AdvertisedRequestValidator.checkWants() fails because another thread updates a ref that UploadPack had already read, which causes the check to fail.
 


Nguyen Tuan Khang Phan

Sep 25, 2024, 6:48:28 PM
to Repo and Gerrit Discussion
We can try to port the fix to an older JGit; the cherry-pick was clean. We plan to upgrade Gerrit to 3.8 as soon as we can, hopefully while it's still supported.

Piotr Szlązak

Sep 26, 2024, 4:36:47 AM
to Repo and Gerrit Discussion
On Wednesday, September 25, 2024 at 10:18:05 AM UTC+2 Luca Milanesio wrote:
>> What to do if the suggested workaround doesn't help (in 3.4)? Does this issue happen in versions beyond 3.4?
>
> The solution is to use uploadpack.allowAnySHA1InWant in jgit.config, which isn’t supported on Gerrit v3.4 because both that Gerrit release and its JGit release are EOL.

Hello,
Could you please check the following issue reported for JGit [1]?
I noticed that once uploadpack.allowAnySHA1InWant is enabled, JGit no longer reports the allow-tip-sha1-in-want and/or allow-reachable-sha1-in-want capabilities.
This leads to 'Server does not allow request for unadvertised object'. More details can be found in [2].

[1] https://github.com/eclipse-jgit/jgit/issues/68

Regards,
Piotr Szlazak

Dmitry P

Nov 6, 2024, 10:28:23 AM
to Repo and Gerrit Discussion
Two questions: how can the issue be mitigated for those who are still on 3.4, and why does it happen on a host where users can't push (and so can't remove tags)?

Luca Milanesio

Nov 6, 2024, 10:38:25 AM
to Repo and Gerrit Discussion
Use the git-refs-filter (see [3]) and you’ll be able to reduce the number of refs advertised and therefore reduce the chances of “wants-not-valid”.
And, as usual, upgrade :-) as you know v3.4 is EOL and unsupported by the community: you are 6 versions behind and, in 1 month, 7 versions behind as soon as v3.11 is out.

HTH

Luca.




Dmitry P

Nov 6, 2024, 10:52:24 AM
to Repo and Gerrit Discussion
Is there anything users can do on their side, apart from fetching again or doing a fresh clone?

vlad...@gmail.com

May 11, 2025, 4:27:50 AM
to Repo and Gerrit Discussion
Recently I noticed a few cases of "want not valid" in 3.10.4.
I thought this had been solved a long time ago.
Is anyone else seeing it? Any chance it was "even better fixed" in more recent versions?
