Git LFS and Gerrit?


Doug Kelly

Oct 5, 2015, 4:44:15 PM
to Repo and Gerrit Discussion
Hey all,

I've been toying with the idea of adding Git LFS support to Gerrit for a while now -- possibly through the plugins API.  Most of this shouldn't be too bad, since you can easily register the new SSH command (although due to the limitations on how commands can be bound, the plugin would have to be named "git-lfs-authenticate")... and you can bind a new servlet to handle the Git LFS download/upload. There are a few key points with this:
* Obviously, the file storage part would have to be written.
* A new capability for LFS upload (by project?) would probably be worth adding (possible via the existing plugin extension points).
* The existing capability for project read is probably good enough, but it may be worthwhile adding a new capability, especially if not all users have access to refs/* for a project.  This would come with a caveat that there's no way to enforce branch permissions with Git LFS.
* I'd imagine the storage could be organized by project to isolate objects from different projects (along with making permissions per project).
* Another "nice to have" would be the ability for plugins to bind top-level commands differing from the name of the plugin.  This would just increase clarity for those using SSH (although HTTP authentication would need to be configured for LFS to work at all).

The trickier part would be interfacing the actual git lfs client.  It auto-configures from https://server/path/to/repo/info/lfs -- there's no way within GitOverHttpServlet (currently) to extend what is exposed per-project (in part, this is a limitation of the GitServlet/GitFilter as implemented in JGit, but it looks like we should be able to add a new path within Gerrit without too much fuss).  Additionally, you would probably want the entire system authenticating using the same auth filter as configured by GitOverHttpModule -- but these filters aren't exposed publicly.  So, I think this is something that could be extended, but I'm not confident on what would be the cleanest way to do this.
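To make the path handling concrete, here is a minimal sketch (not Gerrit or JGit code) of how a server could recognize requests under a per-project info/lfs path. It assumes the endpoint is simply the repository URL with "/info/lfs" appended, as described above; the function names are hypothetical.

```python
LFS_MARKER = "/info/lfs"


def lfs_endpoint(repo_url):
    """Return the LFS endpoint a client would derive from a repository URL."""
    return repo_url.rstrip("/") + LFS_MARKER


def split_lfs_path(path):
    """Split a request path into (project, lfs_subpath).

    Returns None when the path is not an LFS request, so the caller can
    fall through to the normal Git-over-HTTP handling.
    """
    head, marker, tail = path.partition(LFS_MARKER)
    if not marker or (tail and not tail.startswith("/")):
        return None
    project = head.strip("/")
    if not project:
        return None
    return project, tail.lstrip("/")


print(lfs_endpoint("https://server/path/to/repo"))
print(split_lfs_path("/path/to/repo/info/lfs/objects/batch"))
print(split_lfs_path("/path/to/repo/info/refs"))
```

Knowing the project name at this point is what would allow a per-project permission check before any object is served.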

Does anyone else have some thoughts on this?  Currently, we don't use Git LFS at the day job, but I'd imagine this could be a really nice extension for users.

Thanks!

--Doug Kelly

Dave Borowitz

Oct 5, 2015, 5:29:02 PM
to Doug Kelly, Repo and Gerrit Discussion
From what I recall of the spec (which, caveat, I haven't read closely since the announcement), one of the biggest incompatibilities is that objects are just looked up by ID, without passing a refname context in which they are being viewed. This doesn't work with Gerrit's permission system.

--
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en

---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to repo-discuss...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

evlacan

Oct 5, 2015, 5:36:00 PM
to Repo and Gerrit Discussion
Hi Doug,

Even though I see some value in having Git LFS integrated with Gerrit, we took a more practical approach to this topic.
Artifactory [1], which we currently use in our organization, offers integration for Git LFS [2]. All you need to do is install (or have access to) the Git LFS client and configure it to use Artifactory to store binaries. The Git LFS server information is stored in .gitconfig, and all you have to do is run "git lfs init".

In fact, I prefer that large files are stored in a system that is designed and optimized to handle binary files, and Artifactory does a good job at this. Plus, I get a bunch of other features that Artifactory offers by default, which would otherwise have to be developed by the Gerrit community.
My opinion at this point is that instead of making Gerrit handle everything, it is better to integrate Gerrit with other systems and let them provide nice features. GitHub has a business case for having its own dedicated binary storage, but I don't see why that would apply in our case.


Doug Kelly

Oct 5, 2015, 11:28:41 PM
to Repo and Gerrit Discussion, doug...@gmail.com


On Monday, October 5, 2015 at 4:29:02 PM UTC-5, Dave Borowitz wrote:
From what I recall of the spec (which, caveat, I haven't read closely since the announcement), one of the biggest incompatibilities is that objects are just looked up by ID, without passing a refname context in which they are being viewed. This doesn't work with Gerrit's permission system.

Yep.  That would be correct (why I suggested a special capability for the plugin)... so, it would work, just not by refname.

As for why I'm considering such a plugin: if nothing else, it could be possible to expose the info/lfs special path, which would allow us to pass a reference to the actual Git LFS server (for example, if using Artifactory).  This would eliminate the manual configuration of Git LFS.

As far as actually storing the objects and accessing them from this hypothetical LFS plugin?  This could be a neat extension for some -- since it could work within the existing confines of Gerrit (credentials would match, no need for a separate server, etc.) -- the underlying storage could be anything (even putting it on NFS would possibly work for some users).

So, my thought is not to extend core Gerrit (other than to the extent necessary to make such features possible), but to provide a plugin as an alternative to standalone systems like Artifactory for those who would like an all-in-one solution (or, in the case of an existing Artifactory deployment, being able to point to it).

--Doug

luca.mi...@gmail.com

Oct 6, 2015, 2:07:03 AM
to Doug Kelly, Repo and Gerrit Discussion
Hi Doug,
Thanks for sharing your plans :-) Your idea looks really appealing!

Luca


Saša Živkov

Oct 6, 2015, 4:18:38 AM
to Dave Borowitz, Doug Kelly, Repo and Gerrit Discussion
On Mon, Oct 5, 2015 at 11:28 PM, 'Dave Borowitz' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:
From what I recall of the spec (which, caveat, I haven't read closely since the announcement), one of the biggest incompatibilities is that objects are just looked up by ID, without passing a refname context in which they are being viewed. This doesn't work with Gerrit's permission system.

Some months ago we discussed exactly this issue.
If I remember correctly, the URL used to put a large object is composed by the LFS server.
Assuming the LFS server is a Gerrit plugin, it could build such a URL so that it contains the project and branch.
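As a rough illustration of that idea, a server-composed object URL could carry the project and branch as path segments. The URL layout and the base URL below are purely hypothetical; nothing in the LFS spec or in Gerrit prescribes them.

```python
from urllib.parse import quote, unquote


def object_url(base_url, project, branch, oid):
    """Compose an object URL that embeds project and branch (hypothetical layout)."""
    return "{}/{}/{}/objects/{}".format(
        base_url.rstrip("/"),
        quote(project, safe=""),  # encode "/" so each name stays one segment
        quote(branch, safe=""),
        oid,
    )


def parse_object_url(base_url, url):
    """Recover (project, branch, oid) from a URL built by object_url."""
    prefix = base_url.rstrip("/") + "/"
    if not url.startswith(prefix):
        raise ValueError("URL is not under the LFS base URL")
    project, branch, marker, oid = url[len(prefix):].split("/")
    if marker != "objects":
        raise ValueError("unexpected URL layout")
    return unquote(project), unquote(branch), oid


url = object_url("https://gerrit.example.com/lfs", "tools/build", "refs/heads/master", "abc123")
print(url)
print(parse_object_url("https://gerrit.example.com/lfs", url))
```

With the branch recoverable from the URL, the server side would at least know which ref to check permissions against when the upload arrives.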

Edwin Kempin

Oct 6, 2015, 4:37:05 AM
to Saša Živkov, Dave Borowitz, Doug Kelly, Repo and Gerrit Discussion
On Mon, Oct 5, 2015 at 4:44 PM, Doug Kelly <doug...@gmail.com> wrote:
* Another "nice to have" would be the ability for plugins to bind top-level commands differing from the name of the plugin.  This would just increase clarity for those using SSH (although HTTP authentication would need to be configured for LFS to work at all).
You should be able to bind plugin SSH commands to top-level SSH commands by defining SSH aliases in the gerrit.config [1].



Matthias Sohn

Oct 6, 2015, 4:45:52 AM
to Doug Kelly, Repo and Gerrit Discussion
We started working on LFS support in JGit over the summer. This effort stalled because another project got higher priority, but we intend to take it up again later this year.

We are considering providing LFS client support and a simple LFS server in JGit, and integrating the LFS server part into Gerrit as a new Gerrit plugin.

-Matthias

Dave Borowitz

Oct 6, 2015, 9:03:40 AM
to Doug Kelly, Repo and Gerrit Discussion
On Mon, Oct 5, 2015 at 11:28 PM, Doug Kelly <doug...@gmail.com> wrote:


Yep.  That would be correct (why I suggested a special capability for the plugin)... so, it would work, just not by refname.

Personally, I think this is very dangerous. People spend all this time setting up complex branch level ACLs and then you give them a plugin that throws those ACLs out the window. It's only a matter of time until something is leaked because someone thought their branch-level ACLs would also apply to large files but they don't.

But I don't think the branch ACL problem is unsolvable. Two things come to mind:

1. Can we do a visibility check efficiently? Doing it naively (walk all visible refs looking for the particular large file ID) is easy. Throwing a cache in front of that may help. Doing something more sophisticated, like git does with reachability bitmaps, may help even more. (But note that since large file IDs are not git object IDs, existing reachability bitmaps don't actually help.)

2. Can we add a branch parameter to the API to give Gerrit a hint for what branch to check reachability from? I mentioned this in passing to Rick shortly after the launch, and he didn't seem opposed.
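A toy sketch of the two points above: a naive per-ref lookup with a cache in front, plus an optional branch hint that is tried first. The in-memory table stands in for a real repository walk, and every name here is hypothetical.

```python
from functools import lru_cache

# Toy data: which LFS object IDs are reachable from each ref.
# In a real server this would come from walking the ref's history.
LFS_OIDS_BY_REF = {
    "refs/heads/master": frozenset({"oid-a", "oid-b"}),
    "refs/heads/secret": frozenset({"oid-c"}),
}


@lru_cache(maxsize=1024)
def oids_reachable_from(ref):
    """Stand-in for the expensive walk; the cache must be dropped when refs move."""
    return LFS_OIDS_BY_REF.get(ref, frozenset())


def can_read_lfs_object(oid, visible_refs, branch_hint=None):
    """Return True if the object is reachable from any ref the user can see."""
    # The hint only speeds up the common case; it is never trusted by itself.
    if branch_hint is not None and branch_hint in visible_refs:
        if oid in oids_reachable_from(branch_hint):
            return True
    # Naive fallback: check every visible ref.
    return any(oid in oids_reachable_from(ref) for ref in visible_refs)


print(can_read_lfs_object("oid-a", {"refs/heads/master"}))
print(can_read_lfs_object("oid-c", {"refs/heads/master"}, branch_hint="refs/heads/secret"))
```

The second call returns False because the hinted branch is not among the refs the user can see, so a client cannot widen its access by naming a protected branch.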
 

Doug Kelly

Oct 6, 2015, 10:58:45 AM
to Repo and Gerrit Discussion, doug...@gmail.com


On Tuesday, October 6, 2015 at 8:03:40 AM UTC-5, Dave Borowitz wrote:


1. Can we do a visibility check efficiently? Doing it naively (walk all visible refs looking for the particular large file ID) is easy. Throwing a cache in front of that may help. Doing something more sophisticated, like git does with reachability bitmaps, may help even more. (But note that since large file IDs are not git object IDs, existing reachability bitmaps don't actually help.)

2. Can we add a branch parameter to the API to give Gerrit a hint for what branch to check reachability from? I mentioned this in passing to Rick shortly after the launch, and he didn't seem opposed.

Interesting idea -- actually Sasa's idea isn't that bad (include the branch in the URL) -- but the client would have to be updated to support some sort of substitution from the info/lfs reference (i.e. %b expands to the "current branch" and %c expands to the current commit when fetching an object, or even specifying them in headers, such as X-Git-Branch and X-Git-Commit).

The complexity comes in with computing the reachability on the fly, as you point out.  The pointer format isn't terribly difficult to parse [1], but there's nothing short of opening the file that could tell you.  About the quickest thing is to open the file and read the first line, seeing if it matches a "version" key for Git LFS.  Maybe objects that match this on-upload could be tagged in some special way to quickly find them again, then it's just a matter of determining standard Git object reachability (especially if we can build key/value pairs for LFS OID to git object IDs referencing it)?  There might still have to be a bit of the naive scan for cases where whatever caching we do falls out of sync.
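For reference, the "open the file and check the first line" test could look roughly like the sketch below. It assumes the pointer layout of one "key value" pair per line, with "version" first, an "oid sha256:..." line, and a "size" line; the function name is made up for illustration.

```python
def parse_lfs_pointer(text):
    """Return {'oid': ..., 'size': ...} if text looks like an LFS pointer, else None."""
    lines = text.splitlines()
    # Cheap rejection: a pointer starts with the "version" key.
    if not lines or not lines[0].startswith("version "):
        return None
    fields = {}
    for line in lines:
        key, sep, value = line.partition(" ")
        if not sep:
            return None  # not a "key value" line, so not a pointer
        fields[key] = value
    oid = fields.get("oid", "")
    size = fields.get("size", "")
    if not oid.startswith("sha256:") or not size.isdigit():
        return None
    return {"oid": oid[len("sha256:"):], "size": int(size)}


pointer = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393\n"
    "size 12345\n"
)
print(parse_lfs_pointer(pointer))
print(parse_lfs_pointer("#!/bin/sh\necho hello\n"))
```

Running this on upload is one way the LFS OID to Git object mapping mentioned above could be recorded as pointers arrive.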

Also, we'd have to be kind to users that may already be using LFS with a 3rd party server: this could be accomplished through project.config and/or server.config, by having a parameter to set the lfs reference.  If this is set, we disable any and all object reachability checks on that project, since it's hosted elsewhere. :)

Maybe the first round of this can be to enhance support for those with 3rd party servers, by setting up the info/lfs URI and the git-lfs-authenticate SSH commands to return info about the LFS server (the extension points to change the GitHttpServlet would still be needed, unless this ends up in core -- which may or may not be the right place -- and the ssh alias configuration is nice, but being able to do this programmatically from a plugin would also be good).  As the remaining details are hashed out with how to handle the LFS implementation, that could be added later.  Having support in JGit to handle the backend storage would also be very nice -- so seeing that work was at least started on that is pretty cool.

Edwin Kempin

Oct 6, 2015, 11:07:15 AM
to Doug Kelly, Repo and Gerrit Discussion
The idea of having the SSH alias in the gerrit.config is to avoid any conflicts between plugins trying to bind to the same top-level command.
You may still do this programmatically, e.g. the plugin may have an init step that writes the SSH alias configuration into gerrit.config.
 

Sven Selberg

Oct 7, 2015, 2:58:06 AM
to Repo and Gerrit Discussion, doug...@gmail.com
Great initiative Doug!

We also contemplated a Git LFS plugin to enable Git LFS with Gerrit authorization, but never had time to implement it.
We thought about taking advantage of the visibility check from the initial clone/fetch from Gerrit and propagating that to the Git LFS plugin. The plugin would then list all LFs visible from the authorized and fetched ref; when the SSH call to authorize the LF comes from the client, the plugin would respond with short-lived URLs for the LF, or authenticate the user in some other way.
But I never attempted to implement any of this, so it might as well be crazy talk.

I think one problem with the branch/LO mapping in the URL is that it's not a 1-1 mapping:
Say you have a protected branch (prot) and upload a LF (LF-a) to that branch. Later on you create a new branch (open) that is not protected but reaches 'LF-a'. Then everyone can access the 'open' branch, but the visibility of 'LF-a' is determined by the visibility of the 'prot' branch. And in most cases you don't have a relationship where the visibility of 'prot' is a subset of the visibility of 'open', so it doesn't make sense to set the URL to <something>~open~<something> either.
But I might have misunderstood your intentions.

/Sven

Sven Selberg

Oct 7, 2015, 3:00:10 AM
to Repo and Gerrit Discussion, doug...@gmail.com


On Wednesday, October 7, 2015 at 08:58:06 UTC+2, Sven Selberg wrote:
Then the plugin would list all LF visible from the authorized and fetched ref...
Correction: "visible" should read "reachable".

Doug Kelly

Oct 7, 2015, 8:31:28 AM
to Sven Selberg, Repo and Gerrit Discussion
On Wed, Oct 7, 2015 at 1:58 AM Sven Selberg <sven.s...@sonymobile.com> wrote:
I think one problem with the branch/LO mapping in the URL is that it's not 1-1 mapping:
Say you have a protected branch (prot) and upload a LF (LF-a)to that branch, later on you create a new branch (open) that is not protected but reaches 'LF-a', then everyone can access the 'open' branch but the visibility of 'LF-a' is determined by the visibility of the 'prot' branch. And in most cases you don't have a relationship where visibility of 'prot'  is a subset of visibility of 'open' so it doesn't make sense to set the URL to <something>~open~<something> either.
But I might have misunderstood your intentions.
Right, I think that was what Dave was referring to, and I was trying to come up with some ideas off the top of my head.  The idea of passing a branch name would not be to store some information permanently about the objects, but to try to hint to the server where to start a reachability check.  If you know something about the commit the user's currently on, you may be able to more quickly find the LFS object and the associated branch(es) it's currently referenced from.  That was why I was also thinking a key-value store might be useful: you could determine the Git objects that reference the LFS object, and from that, see if any of the Git objects are reachable by the current user.  The worst case is a dumb scan of the repository, but this should at least clear up the common cases.

Saša Živkov

Oct 7, 2015, 9:25:51 AM
to Doug Kelly, Sven Selberg, Repo and Gerrit Discussion
On Wed, Oct 7, 2015 at 2:31 PM, Doug Kelly <doug...@gmail.com> wrote:


Right, I think that was what Dave was referring to, and I was trying to come up with some ideas off the top of my head.  The idea of passing a branch name would not be to store some information permanently about the objects, but to try to hint to the server where to start a reachability check.
+1
 
If you know something about the commit the user's currently on

A client can never fetch a commit by its SHA-1. Instead it fetches via reference(s).
Therefore, I think, we always know "something about the commit the user is on".

Doug Kelly

Oct 7, 2015, 9:39:25 AM
to Saša Živkov, Sven Selberg, Repo and Gerrit Discussion
On Wed, Oct 7, 2015 at 8:25 AM Saša Živkov <ziv...@gmail.com> wrote:
A client can never fetch a commit by its SHA1. Instead it fetches via reference(s).
Therefore, I think, we always know "something about the commit the user's is on".

Ah, what I was referring to is that in the LFS protocol, the communication is very basic -- it happens outside the normal git channel, and instead is just an HTTP GET request for "/objects/<OID>" where <OID> is the LFS object ID.  No information about the branch/commit in Git the user is on is contained within this request. (If using the batch API, it could be a little different, requesting /objects/batch with the OID posted in a JSON request, but otherwise it is the same idea.)  This returns the reference to the actual storage link for the client, though.
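To show how little context the server receives, here is a sketch of the two request shapes. The paths come from the description above; the JSON field names ("operation", "objects", "oid", "size") are my understanding of the batch API and should be treated as assumptions, as should the helper names.

```python
import json


def object_request_path(oid):
    """Path of the basic per-object request: just the LFS object ID."""
    return "/objects/" + oid


def batch_request(objects, operation="download"):
    """Return (path, JSON body) for a batch request.

    `objects` is a list of (oid, size) pairs. Note that neither the path
    nor the body carries a branch or commit.
    """
    body = {
        "operation": operation,
        "objects": [{"oid": oid, "size": size} for oid, size in objects],
    }
    return "/objects/batch", json.dumps(body, sort_keys=True)


print(object_request_path("4d7a2146"))
path, body = batch_request([("4d7a2146", 12345)])
print(path)
print(body)
```

Any branch or commit hint would therefore have to be added on top, for example via extra headers or URL substitution as suggested earlier in the thread.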

Shawn Pearce

Oct 7, 2015, 11:27:41 AM
to Saša Živkov, Doug Kelly, Sven Selberg, Repo and Gerrit Discussion
On Wed, Oct 7, 2015 at 6:25 AM, Saša Živkov <ziv...@gmail.com> wrote:

A client can never fetch a commit by its SHA1. Instead it fetches via reference(s).
Therefore, I think, we always know "something about the commit the user's is on".

Just as a matter of correction, post-1.8.3 git-core clients can try to fetch any commit. It's now valid in the wire protocol to ask for something by SHA-1. Servers usually disallow this, as evaluating "is commit reachable" is very time consuming. It's more feasible with bitmap indexes, but not implemented in Gerrit Code Review at this time.

git-core 2.5.0 servers can opt-in with uploadpack.allowReachableSHA1InWant set to true.

Shawn Pearce

Oct 7, 2015, 11:34:11 AM
to Doug Kelly, Saša Živkov, Sven Selberg, Repo and Gerrit Discussion
On Wed, Oct 7, 2015 at 6:39 AM, Doug Kelly <doug...@gmail.com> wrote:
Ah, what I was referring to was in the LFS protocol, the communication is very basic -- it happens outside the normal git channel, and instead is just a HTTP GET request for "/objects/<OID>" where <OID> is the LFS object ID.  No information about the branch/commit in Git the user is on is contained within this request. (If using the batch API, it could be a little different, requesting /objects/batch with the OID posted in a JSON request, but otherwise is the same idea.)  This returns the reference to the actual storage link for the client, though.

You could take an approach like bitmap indexes and use a Lucene index to support lookups.

Build an index mapping each LFS OID to the Git commit SHA-1 that introduced that LFS OID to the repository, and store this in Lucene so that the OID is an indexed term. Building this is just a git log over the history of each project, doing a tree diff against each ancestor to see if it introduces a new LFS OID. Updating this index is trivial: walk the list of "new" commits since the last index update and store them too.

When an OID is requested by LFS look it up in this index and you get back a set of commit SHA-1s. For each commit SHA-1, see if it is reachable in a repository by doing branch checks. If you use bitmap indexes this can be fast especially for OIDs that don't change often as they will be inside the bitmap index.

For commits too recent to be in the bitmap index something like the existing TagVisibilityCache could be reimplemented to track some more recent commit mappings before they go into the bitmap. 
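A toy version of this lookup, using a plain dictionary in place of Lucene and a tiny in-memory commit graph in place of a repository. "Introduced" is approximated as "present in the commit but in none of its parents"; all names and data are made up for illustration.

```python
# Toy commit graph: each commit lists its parents and the LFS OIDs in its tree.
COMMITS = {
    "c1": {"parents": [], "lfs": {"a.bin": "oid-a"}},
    "c2": {"parents": ["c1"], "lfs": {"a.bin": "oid-a", "b.bin": "oid-b"}},
    "c3": {"parents": ["c1"], "lfs": {"a.bin": "oid-a", "c.bin": "oid-c"}},
}


def build_index(commits):
    """Map each LFS OID to the set of commits that introduced it."""
    index = {}
    for sha, commit in commits.items():
        inherited = set()
        for parent in commit["parents"]:
            inherited |= set(commits[parent]["lfs"].values())
        for oid in set(commit["lfs"].values()) - inherited:
            index.setdefault(oid, set()).add(sha)
    return index


def ancestors(commits, tip):
    """Return every commit reachable from tip, including tip itself."""
    seen, stack = set(), [tip]
    while stack:
        sha = stack.pop()
        if sha not in seen:
            seen.add(sha)
            stack.extend(commits[sha]["parents"])
    return seen


def can_read(oid, index, commits, visible_branches):
    """True if any introducing commit is reachable from a branch the user can see."""
    introducing = index.get(oid, set())
    return any(introducing & ancestors(commits, tip) for tip in visible_branches.values())


index = build_index(COMMITS)
print(can_read("oid-a", index, COMMITS, {"refs/heads/master": "c2"}))
print(can_read("oid-c", index, COMMITS, {"refs/heads/master": "c2"}))
```

The ancestors() walk is the part that bitmap indexes (or a cache for recent commits) would replace in a real implementation.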

Edwin Kempin

Oct 7, 2015, 11:36:57 AM
to Shawn Pearce, Saša Živkov, Doug Kelly, Sven Selberg, Repo and Gerrit Discussion
On Wed, Oct 7, 2015 at 5:27 PM, 'Shawn Pearce' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:
Just as a matter of correction, post 1.8.3 git-core clients can try to fetch any commit. Its now a valid in the wire protocol to ask for something by SHA-1. Servers usually disallow this, as evaluating "is commit reachable" is very time consuming. Its more feasible with bitmap indexes, but not implemented in Gerrit Code Review at this time.
As discussed in issue 175 [1] you can also configure

[uploadpack] allowTipSha1InWant = true

on your repo and then fetch tips by SHA-1.
This works with Gerrit.

 


Doug Kelly

Oct 7, 2015, 11:44:03 AM
to Shawn Pearce, Saša Živkov, Sven Selberg, Repo and Gerrit Discussion
You read my mind -- I was thinking of the Lucene index as well.  Good point about the updates; I was wondering how best to handle someone changing the repository directly on the server, outside of Gerrit.

Saša Živkov

Oct 8, 2015, 11:14:51 AM
to Edwin Kempin, Shawn Pearce, Doug Kelly, Sven Selberg, Repo and Gerrit Discussion
On Wed, Oct 7, 2015 at 5:32 PM, Edwin Kempin <eke...@google.com> wrote:


On Wed, Oct 7, 2015 at 5:27 PM, 'Shawn Pearce' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:
On Wed, Oct 7, 2015 at 6:25 AM, Saša Živkov <ziv...@gmail.com> wrote:

A client can never fetch a commit by its SHA1. Instead it fetches via reference(s).
Therefore, I think, we always know "something about the commit the user is on".

Just as a matter of correction, post 1.8.3 git-core clients can try to fetch any commit. It's now valid in the wire protocol to ask for something by SHA-1. Servers usually disallow this, as evaluating "is commit reachable" is very time consuming. It's more feasible with bitmap indexes, but not implemented in Gerrit Code Review at this time.

As discussed in issue 175 [1] you can also configure

[uploadpack]
    allowTipSha1InWant = true

on your repo and then fetch tips by SHA-1.

Can one then fetch any SHA-1 from that repository?
Are reachability/readability checks done for the SHA1 being fetched?

Saša Živkov

Oct 8, 2015, 11:15:39 AM
to Shawn Pearce, Doug Kelly, Sven Selberg, Repo and Gerrit Discussion
Thanks for the info! 

Edwin Kempin

Oct 8, 2015, 11:18:54 AM
to Saša Živkov, Shawn Pearce, Doug Kelly, Sven Selberg, Repo and Gerrit Discussion
On Thu, Oct 8, 2015 at 5:14 PM, Saša Živkov <ziv...@gmail.com> wrote:


On Wed, Oct 7, 2015 at 5:32 PM, Edwin Kempin <eke...@google.com> wrote:


On Wed, Oct 7, 2015 at 5:27 PM, 'Shawn Pearce' via Repo and Gerrit Discussion <repo-d...@googlegroups.com> wrote:
On Wed, Oct 7, 2015 at 6:25 AM, Saša Živkov <ziv...@gmail.com> wrote:

A client can never fetch a commit by its SHA1. Instead it fetches via reference(s).
Therefore, I think, we always know "something about the commit the user is on".

Just as a matter of correction, post 1.8.3 git-core clients can try to fetch any commit. It's now valid in the wire protocol to ask for something by SHA-1. Servers usually disallow this, as evaluating "is commit reachable" is very time consuming. It's more feasible with bitmap indexes, but not implemented in Gerrit Code Review at this time.

As discussed in issue 175 [1] you can also configure

[uploadpack]
    allowTipSha1InWant = true

on your repo and then fetch tips by SHA-1.

Can one then fetch any SHA-1 from that repository?
With this setting you can only fetch the tips/HEAD of branches (this includes the tips of the change refs).
 
Are reachability/readability checks done for the SHA1 being fetched?
I haven't verified this.
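
To make the difference between the two settings mentioned in this thread concrete, here is a toy model of the policy. It only models the intended meaning of allowTipSha1InWant (the wanted SHA-1 must itself be a ref tip) versus allowReachableSHA1InWant (the wanted SHA-1 must be reachable from some ref tip). The function name, the dictionary-based graph and the flag are assumptions made up for illustration; this is not how git-core, JGit or Gerrit implement the check, and it does not verify what Gerrit actually does.

```python
def want_allowed(want, ref_tips, parents, allow_reachable=False):
    """Decide whether a fetch asking for commit `want` is accepted.

    ref_tips        -- set of commit SHA-1s that refs point at
    parents         -- dict: commit -> list of parent commits (toy graph)
    allow_reachable -- models allowReachableSHA1InWant when True,
                       otherwise only the tip check applies
    """
    # Tip check: cheap, a simple set lookup.
    if want in ref_tips:
        return True
    if not allow_reachable:
        return False

    # Reachability check: walk history from every tip (the expensive part).
    stack = list(ref_tips)
    seen = set()
    while stack:
        commit = stack.pop()
        if commit == want:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False
```

With a history r <- m <- t and a single ref pointing at t, asking for m is rejected under the tip-only policy and accepted only when the reachable policy is switched on.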

Matthias Sohn

Jan 30, 2016, 7:59:43 PM
to Doug Kelly, Repo and Gerrit Discussion
We pushed a first working LFS implementation for review, which includes:

JGit:
- protocol support for the LFS batch API [1]
- two storage implementations for storing the large objects:
  - simple file system storage [1]
  - Amazon S3 storage [2]

Gerrit integration:
- integration of the protocol implementation into gerrit core [3]
- plugin lfs-storage-fs [4] integrating the file system storage based implementation into Gerrit

The implementation doesn't yet check permissions for LFS objects and needs more testing.
We will push another similar Gerrit plugin for the S3 storage soon.
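
For readers who have not looked at the protocol yet, here is a rough sketch of the client-side data involved. It assumes the public Git LFS specification as I understand it: objects are identified by the SHA-256 of their content, the pointer file stored in Git carries "version", "oid" and "size" lines, and a batch request is a JSON document with an "operation" and a list of objects. The helper names are made up for this example and nothing here is taken from the JGit or Gerrit changes.

```python
import hashlib
import json


def lfs_oid(content: bytes) -> str:
    """LFS object ID: hex SHA-256 of the file content."""
    return hashlib.sha256(content).hexdigest()


def pointer_file(content: bytes) -> str:
    """Text of the small pointer file committed to Git in place of the blob."""
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{lfs_oid(content)}\n"
        f"size {len(content)}\n"
    )


def batch_request(operation: str, blobs) -> str:
    """JSON body for a batch API call; operation is "download" or "upload"."""
    if operation not in ("download", "upload"):
        raise ValueError("operation must be 'download' or 'upload'")
    body = {
        "operation": operation,
        "objects": [{"oid": lfs_oid(b), "size": len(b)} for b in blobs],
    }
    return json.dumps(body)
```

The client would POST such a body to the batch endpoint under the repository's info/lfs URL, and the server answers per object with where (and whether) it may be transferred, which is where a permission check would have to hook in.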


-Matthias

David Pursehouse

Jul 11, 2016, 4:42:10 AM
to Matthias Sohn, Doug Kelly, Repo and Gerrit Discussion
On Sun, Jan 31, 2016 at 9:59 AM Matthias Sohn <matthi...@gmail.com> wrote:

We pushed a first working LFS implementation for review, which includes:

JGit:
- protocol support for the LFS batch API [1]
- two storage implementations for storing the large objects:
  - simple file system storage [1]
  - Amazon S3 storage [2]

Gerrit integration:
- integration of the protocol implementation into gerrit core [3]
- plugin lfs-storage-fs [4] integrating the file system storage based implementation into Gerrit


In the implementation so far, LFS is enabled by adding a plugin that implements the extension point. By doing so, LFS is enabled globally for all projects.

Was there any specific reason not to allow it to be configurable (enabled or not) on a per-project level?


Saša Živkov

Jul 11, 2016, 5:37:36 AM
to David Pursehouse, Matthias Sohn, Doug Kelly, Repo and Gerrit Discussion
No.
Actually, I just wanted to start working on that feature: have a (white) list of projects for which LFS is enabled.
However, if you already started working on that then I will await your change and review it :-)
Let me know.

Jacek Centkowski

Jul 11, 2016, 6:19:24 AM
to Repo and Gerrit Discussion, david.pu...@gmail.com, matthi...@gmail.com, doug...@gmail.com
There are two ideas behind it:
1. Make it configurable in the UI under "Project Options", as an option called something like "Enable Git LFS" with 3 values: "disabled" (default), "read-only" and "active", where:
- "disabled" applies when no plugin is loaded or the project is hidden,
- "read-only" applies when the project is "read-only", or when the user selects it for an "active" project,
- "active" applies only when the project is active and the user sets the feature to "active".

2. And/or (as these ideas don't exclude each other ;)) have an extension point in LfsPluginServlet that would basically be called for validation; again, it would have to reach into project.config to get the setting.
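
The rules in idea 1 can be sketched as a small function. This is only an illustration: the function and parameter names are invented, the state strings are the ones used in this thread, and one case the rules leave open (the user picks "disabled" on a read-only project) is resolved here by letting the user's "disabled" win, which is my assumption.

```python
PROJECT_STATES = ("active", "read-only", "hidden")
LFS_CHOICES = ("disabled", "read-only", "active")


def effective_lfs_state(plugin_loaded, project_state, selected="disabled"):
    """Combine plugin availability, project state and the user's choice.

    plugin_loaded -- True if some LFS plugin is installed
    project_state -- one of PROJECT_STATES
    selected      -- the value chosen in "Project Options" (default "disabled")
    """
    if project_state not in PROJECT_STATES:
        raise ValueError(f"unknown project state: {project_state}")
    if selected not in LFS_CHOICES:
        raise ValueError(f"unknown LFS choice: {selected}")

    # No plugin or a hidden project: LFS is always off.
    if not plugin_loaded or project_state == "hidden":
        return "disabled"
    # Assumption: an explicit "disabled" always wins.
    if selected == "disabled":
        return "disabled"
    # A read-only project can never be more than read-only.
    if project_state == "read-only":
        return "read-only"
    # Active project: the user's choice ("read-only" or "active") applies.
    return selected
```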

Regards
Jacek

David Pursehouse

Jul 11, 2016, 6:48:23 AM
to Saša Živkov, Matthias Sohn, Doug Kelly, Repo and Gerrit Discussion
On Mon, Jul 11, 2016 at 6:37 PM Saša Živkov <ziv...@gmail.com> wrote:
On Mon, Jul 11, 2016 at 10:41 AM, David Pursehouse <david.pu...@gmail.com> wrote:
On Sun, Jan 31, 2016 at 9:59 AM Matthias Sohn <matthi...@gmail.com> wrote:

We pushed a first working LFS implementation for review, which includes:

JGit:
- protocol support for the LFS batch API [1]
- two storage implementations for storing the large objects:
  - simple file system storage [1]
  - Amazon S3 storage [2]

Gerrit integration:
- integration of the protocol implementation into gerrit core [3]
- plugin lfs-storage-fs [4] integrating the file system storage based implementation into Gerrit


In the implementation so far, LFS is enabled by adding a plugin that implements the extension point. By doing so, LFS is enabled globally for all projects.

Was there any specific reason not to allow it to be configurable (enabled or not) on a per-project level?

No.
Actually, I just wanted to start working on that feature: have a (white) list of projects for which LFS is enabled.
However, if you already started working on that then I will await your change and review it :-)
Let me know.

No, I have not started working on it.  Please go ahead :)


Saša Živkov

Jul 11, 2016, 6:52:08 AM
to Jacek Centkowski, Repo and Gerrit Discussion, David Pursehouse, Matthias Sohn, Doug Kelly
On Mon, Jul 11, 2016 at 12:19 PM, Jacek Centkowski <geminica...@gmail.com> wrote:
There are two ideas behind it:
1. Make it configurable in the UI under "Project Options", as an option called something like "Enable Git LFS" with 3 values: "disabled" (default), "read-only" and "active", where:
- "disabled" applies when no plugin is loaded or the project is hidden,
- "read-only" applies when the project is "read-only", or when the user selects it for an "active" project,
- "active" applies only when the project is active and the user sets the feature to "active".

2. And/or (as these ideas don't exclude each other ;)) have an extension point in LfsPluginServlet that would basically be called for validation; again, it would have to reach into project.config to get the setting.

I propose 1+2. 
With 2, Gerrit admins can control which projects are allowed to use LFS.
With 1, project owners can activate LFS for their project (if they are on the allowed list).
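
A small sketch of how 1+2 could combine, purely as an illustration: the admin side is modelled as a list of project-name patterns (the use of shell-style patterns is my assumption, a plain list of names would work the same way), and the owner side as the per-project setting from idea 1. All names are hypothetical.

```python
from fnmatch import fnmatchcase


def lfs_state_for(project, admin_allowed_patterns, owner_setting="disabled"):
    """Return the LFS state that applies to `project`.

    admin_allowed_patterns -- patterns maintained by Gerrit admins (idea 2)
    owner_setting          -- value chosen by the project owner (idea 1)
    """
    allowed = any(fnmatchcase(project, p) for p in admin_allowed_patterns)
    if not allowed:
        # Not on the admins' list: the owner's choice has no effect.
        return "disabled"
    return owner_setting
```

So the admin list acts as an upper bound, and the owner's setting only matters for projects inside it.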

Jacek Centkowski

Jul 11, 2016, 9:34:15 AM
to Repo and Gerrit Discussion, geminica...@gmail.com, david.pu...@gmail.com, matthi...@gmail.com, doug...@gmail.com
I am currently working on having it read from project.config - will probably send something for review this week ;)

Regards
Jacek

Jacek Centkowski

Jul 13, 2016, 8:19:32 AM
to Repo and Gerrit Discussion, geminica...@gmail.com, david.pu...@gmail.com, matthi...@gmail.com, doug...@gmail.com
I have just published 2 changes https://gerrit-review.googlesource.com/#/q/topic:LFS-ProjectValidation that:
- retrieve the LfsRequestSpec (contains the project name and operation type)
- validate the request against the project.lfsState variable read from the project configuration

These are an MVP for validation against the project config. I can imagine that we could have this value derived from the parent project, like other inheritable properties...
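
Roughly, the validation plus the possible inheritance could look like the sketch below. Note the assumptions: the field names of LfsRequestSpec, the operation names "download" and "upload", the in-memory PROJECTS table and the project names are all invented for this example; only the idea (look up lfsState, fall back to the parent, then allow or reject the operation) comes from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsRequestSpec:
    """Hypothetical stand-in: project name plus operation type."""
    project: str
    operation: str  # "download" or "upload"


# Toy project hierarchy; a missing "lfsState" means "inherit from parent".
PROJECTS = {
    "All-Projects": {"parent": None, "lfsState": "disabled"},
    "games": {"parent": "All-Projects", "lfsState": "active"},
    "games/assets": {"parent": "games"},
    "docs": {"parent": "All-Projects", "lfsState": "read-only"},
}


def resolve_lfs_state(project, projects=PROJECTS):
    """Walk up the parent chain until some project defines lfsState."""
    while project is not None:
        config = projects[project]
        if "lfsState" in config:
            return config["lfsState"]
        project = config["parent"]
    return "disabled"


def validate(spec, projects=PROJECTS):
    """Return True if the request is allowed for the project's LFS state."""
    state = resolve_lfs_state(spec.project, projects)
    if state == "active":
        return spec.operation in ("download", "upload")
    if state == "read-only":
        return spec.operation == "download"
    return False
```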

Looking forward to hearing from you ;)