Hi repo-discuss, & Luca specifically,

Luca, during today's GerritMeets, I asked about Gerrit's future LFS support, and you countered with other approaches to managing the impact of large files in git, mentioning git-sparse-checkout [1] & git-refs-filter [2] as possible tools.

Could you elaborate further on the topic, perhaps giving an example sequence from a developer's perspective, for limiting the impact of one commit's large files on the workspaces of the rest of the team?

I'm also interested in the community's experience with the above tools.
In particular, I'm scared by this all-caps message in the "git-sparse-checkout" manpage:

"THIS COMMAND IS EXPERIMENTAL. ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN THE FUTURE."

To explain where I'm coming from: my team's main code repo is 6.6 GB (.git only), going back about 9 years [3], and each year we add about 5 Data submodules of 2 GB to 28 GB each, for ~50 GB/year, currently totaling 220 GB. These Data submodules are each relevant for about 18 months, after which they are all but archived.

I am very reluctant to let all that "short-lived" data be stored in our main repo, forever paying the clone and storage cost for each developer. That said, submodule usage is full of sharp edges that my developers keep cutting themselves on [4], so I'm actively looking for alternatives. In particular, I value simple setups that are close to those widely used in the outside world, and I'm wary of custom scripts or configs that each developer must apply.

[3] Only 9 years because the team previously had the habit of cutting off and restarting the repo every year, specifically to limit the size impact of history!

[4] Just last week I had to fix a broken submodule reference: a junior developer updated the commit message of a submodule commit in Gerrit, but didn't know to update the main repo commit to refer to the updated submodule commit. "Oh, I didn't know changing the commit message changed the SHA!"
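For reference, here is the kind of developer flow I understand these tools would enable. This is only a sketch, assuming we folded the data into the main repo under hypothetical Data/<year>/ directories (the repo URL and paths below are made up):

```shell
# Hypothetical layout: yearly data lives under Data/<year>/ in the
# main repo instead of in submodules.
# Partial clone defers blob download until checkout actually needs them:
git clone --filter=blob:none https://gerrit.example.org/main-repo
cd main-repo

# Cone-mode sparse checkout: materialize only the directories we need.
git sparse-checkout init --cone
git sparse-checkout set src tools Data/2025

# An older data set can still be pulled in on demand later:
git sparse-checkout add Data/2024
```

If that's roughly the intended workflow, I'd still like to hear how it behaves in practice given the manpage warning above.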
--
To unsubscribe, email repo-discuss...@googlegroups.com
More info at http://groups.google.com/group/repo-discuss?hl=en
---
You received this message because you are subscribed to the Google Groups "Repo and Gerrit Discussion" group.
To view this discussion visit https://groups.google.com/d/msgid/repo-discuss/9b5393b3-ae51-4970-b302-590bf9011e31n%40googlegroups.com.
On Thu, Feb 20, 2025 at 12:59 AM John de Largentaye <jlarg...@gmail.com> wrote:

> Luca, during today's GerritMeets, I asked about Gerrit LFS future support, and you countered with other approaches to managing the impact of large files in git, mentioning git-sparse-checkout [1] & git-refs-filter [2] as possible tools.
>
> Could you elaborate further on the topic, perhaps giving an example sequence from a developer's perspective, for limiting the impact of one commit's large files on the workspaces of the rest of the team?
>
> I'm also interested in the community's experience with the above tools.

We developed both LFS support in JGit and the first version of the Gerrit lfs plugin, using the filesystem or AWS S3 for storing the large files on the server side. After a trial phase piloting LFS usage at SAP, we decided not to go productive with LFS, since we think it adds quite some complexity and risk to run it large-scale in production.

In addition, replacing the actual blobs with placeholder files in git makes the decision of which objects are stored where non-transparent, which is pretty intrusive, since reconsidering this decision requires rewriting history. And LFS doesn't have built-in transport support for the large files, which means it cripples a distributed version control system into a centralized one. And as you can see in the GitHub price list, storing many large binary files also doesn't come for free.

We taught our users not to store large binary files in git/gerrit, and implemented the uploadvalidator plugin to help them block large binary files.
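For completeness, enforcing a hard size cap at push time doesn't even need a plugin; Gerrit core supports it in the project's project.config (on refs/meta/config). The 10m limit below is just an example value:

```ini
# project.config on refs/meta/config
# Rejects pushes containing any single object larger than 10 MiB.
[receive]
	maxObjectSizeLimit = 10m
```

The uploadvalidator plugin complements this with content-based rules, e.g. blocking files by extension or detected binary type.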
> To explain where I'm coming from, my team's main code repo is 6.6 GB (.git only), going back about 9 years [3], and each year we add about 5 Data submodules between 2 GB - 28 GB, for ~50 GB / year, currently totaling 220 GB. These Data submodules are each relevant for about 18 months, after which they are all but archived.
>
> That said, submodule usage is full of sharp edges that my developers keep cutting themselves on [4], so I'm actively looking for alternatives.
>
> [4] just last week I had to fix a broken submodule reference, when a junior developer updated in Gerrit the commit message of a submodule commit, but didn't know to update the main repo commit to refer to the updated submodule commit! "oh I didn't know changing the commit message changed the SHA!"

In Gerrit you can configure a submodule subscription to update the superproject automatically when the corresponding submodule branch is updated. Maybe this helps. See https://gerrit-review.googlesource.com/Documentation/user-submodules.html
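Roughly, based on that documentation, the setup has two halves; the project names, URL, and branch below are hypothetical, and the exact refspec syntax for `matching` is described in the linked page:

```ini
# .gitmodules in the superproject, on the branch that should follow
# the submodule; "branch = ." means "track the same-named branch".
[submodule "data-2025"]
	path = data-2025
	url = https://gerrit.example.org/data-2025
	branch = .
```

```ini
# project.config on refs/meta/config of the *submodule* project,
# allowing the superproject to subscribe to its branch updates.
[allowSuperproject "platform/main"]
	matching = refs/heads/*:refs/heads/*
```

With this in place, Gerrit updates the superproject's gitlink commit itself whenever the submodule branch advances, so no developer has to remember to bump it by hand.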