Hi Andreas,
If you wish to move to a fully working replicated LFS solution based on filesystem storage, we have one available on GitHub that you could use as a starting point.
We run many multi-master Gerrit environments using the WANdisco Gerrit Multisite product, in which every node acts as a full-time read-write master.
We therefore needed to implement LFS replication to match our standard git repository replication.
We wanted an FS-based storage solution so that replication could be done on systems which couldn't, or didn't want to, use S3 for file storage.
In our own case, replication of the LFS data is handled by a server running locally on the same box; we communicate with it via a REST API and let it do the replication for us.
You will not have access to that server, but you can still use all the same endpoints and slot in your own code instead.
I was going to implement a basic version of this local server as a Gerrit plugin for LFS replication, which would either rsync to the other servers or transfer content to the remote nodes using the LFS protocol. Either way, it would let someone use this as a POC/demo, outside of a full WANdisco replication solution, with just one other plugin.
Our solution allows any repository to be replicated or not. A replicated repository can itself be selectively replicated, to one server or all servers, depending on how the system is set up. So it is entirely up to you to decide what suits your environment best.
The good thing is that the standard LFS plugin isn't changed very much at all to support replication; there are simply some hooks at the read and create points to allow for replicated checks or updates.
All the replication itself happens outside of this plugin, which allows the plugin to move forward easily. Hopefully we can then contribute this back instead of maintaining a fork, and users can simply plug in whatever replication mechanism (plugin or resource) they wish to do the actual replication for them.
If you wish to work on this, here are some quick pointers:
The content itself is replicated to another location by a single class: LfsReplicateContent->replicateLfsData
This is a REST API request to our local server for replication, but you could move this into a separate Gerrit plugin to keep the logic isolated, or just make the call inside this plugin using something like rsync. I wanted to keep the method of replication as separate from the LFS data layer as possible, as we use this replication for more than just LFS content.
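To make the separation concrete, here is a minimal sketch of that idea: the LFS data layer talks only to a small replication interface, and the transport behind it (REST call to a local server, rsync, etc.) can be swapped freely. All names here are illustrative assumptions, not the real plugin API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// The LFS data layer depends only on this interface, never on the transport.
interface LfsReplicator {
  /** Ask the replication layer to push the object at objectPath to the group. */
  boolean replicateLfsData(String repoName, Path objectPath);
}

/** Stand-in transport that just records calls; a real implementation might
 *  POST to a local replication service or shell out to rsync. */
class RecordingReplicator implements LfsReplicator {
  final List<String> calls = new ArrayList<>();

  @Override
  public boolean replicateLfsData(String repoName, Path objectPath) {
    calls.add(repoName + ":" + objectPath);
    return true; // pretend the replication layer acknowledged the transfer
  }
}

public class ReplicatorSketch {
  public static void main(String[] args) {
    LfsReplicator replicator = new RecordingReplicator();
    boolean ok = replicator.replicateLfsData(
        "my-repo", Paths.get("lfs/objects/ab/cd/abcd1234"));
    System.out.println(ok);
  }
}
```

With this shape, swapping WANdisco replication for an rsync-based demo is just a different implementation of the same interface.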
Next, the interception of the LFS calls is already in place in the LfsFsContentServlet class.
This will be looking for our replication information being present on the LargeFileRepository object in order to see if the repository is replicated and what its replication identity is.
You will want to update this jgit object with whatever will represent your replication information grouping.
For us a repository can have a replication grouping, and you would need this so that, on a request, you can check whether the item is already replicated to the other servers:
a) It's already replicated to all: do nothing and return with the size.
b) It's locally available: use the local copy, replicate to all servers in the group, and continue.
c) It's not locally available: take the LFS content from the request, store it locally at your FS destination, and replicate it to all other servers in the group.
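The three push-time cases above can be sketched as a small decision function. This is a hypothetical illustration of the branching only; the real plugin wires this logic into the upload path.

```java
// Sketch of the push-time cases (a), (b), (c) described above.
// Names and the Action enum are illustrative assumptions.
public class PushDecision {
  enum Action {
    RETURN_SIZE,           // (a) already everywhere, nothing to do
    REPLICATE_LOCAL_COPY,  // (b) fan the local copy out to the group
    STORE_THEN_REPLICATE   // (c) accept the upload, store it, then fan out
  }

  static Action onPush(boolean replicatedToAll, boolean locallyAvailable) {
    if (replicatedToAll) {
      return Action.RETURN_SIZE;
    }
    if (locallyAvailable) {
      return Action.REPLICATE_LOCAL_COPY;
    }
    return Action.STORE_THEN_REPLICATE;
  }

  public static void main(String[] args) {
    System.out.println(onPush(true, true));   // RETURN_SIZE
    System.out.println(onPush(false, true));  // REPLICATE_LOCAL_COPY
    System.out.println(onPush(false, false)); // STORE_THEN_REPLICATE
  }
}
```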
The last thing needed for a successful replication integration is the doGet method, which I have integrated with.
I changed the repository.getSize() method, which by default checks only the local file system for the presence of the item and returns its size if it's there, or -1 if not.
Instead, on replicated repositories, this request will check whether the item is present on all the replicated servers.
What you return here relates to the push operation list above:
1) The code here checks that the content is on all replication group nodes; if so, and it's locally available, it returns the item's size. Anything else returns -1 to force the client to do a push, meaning it will all be fixed up in the push operation just as if it were a new file being seen for the first time.
Although you could do any of the following:
1) If it isn't available locally, you must return -1 to let the client do a push.
2) If the item isn't local but is present remotely on the rest of the group, you could repair this location from another replicated server which has the content, then return the size.
3) If the item is local but not in all remote locations, you could heal the remote locations and then return the size.
The included code currently does option 1, as the push operation has a better progress reporter, whereas the get operation is expected to return only the size of the file, and quickly.
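As a sketch, option 1 reduces to a simple rule: report the size only when the object is both locally present and replicated to every node in the group; otherwise return -1 so the client falls back to a push, which repairs any missing copies. The method name and parameters here are illustrative assumptions, not the actual jgit signature.

```java
// Hypothetical sketch of option 1: -1 means "push it", anything else is the size.
public class ReplicatedSize {
  static long getSize(long localSize, boolean presentOnAllGroupNodes) {
    boolean locallyAvailable = localSize >= 0;
    if (locallyAvailable && presentOnAllGroupNodes) {
      return localSize; // safe for the client to skip the upload
    }
    return -1; // force a push; the push path heals missing replicas
  }

  public static void main(String[] args) {
    System.out.println(getSize(1024, true));  // 1024
    System.out.println(getSize(1024, false)); // -1
    System.out.println(getSize(-1, true));    // -1
  }
}
```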
I hope all of this helps, and potentially together we could have a simple replicated LFS storage solution for everyone to use.
Trevor.