This meets my wish to keep the original artifact name whenever possible. Only
if we cannot keep or recover the original artifact name (though I fail to come
up with a good reason why that would happen) would I use a synthetic one.
I proposed the following:
${artifacts}/group/<original-file-name>-<hash>.<original-file-type>
Of course, the group is specific to the artifact metadata. I would be OK with
that, but I am open to better suggestions, as long as they also maintain some
structure a developer can easily glance at.
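Just to make the layout concrete, here is a small sketch in Scala of how such
a path could be assembled. All names are made up for this mail; nothing of
this exists in SBuild or Adept.

  import java.io.File

  object ArtifactLayout {

    // ${artifacts}/group/<original-file-name>-<hash>.<original-file-type>
    def artifactFile(artifactsDir: File, group: String,
                     originalFileName: String, hash: String): File = {
      val dotIndex = originalFileName.lastIndexOf('.')
      val (baseName, fileType) =
        if (dotIndex > 0)
          (originalFileName.substring(0, dotIndex), originalFileName.substring(dotIndex))
        else (originalFileName, "")
      new File(new File(artifactsDir, group), s"$baseName-$hash$fileType")
    }
  }

  // artifactFile(new File("/home/me/.artifacts"), "org.slf4j", "slf4j-api-1.7.5.jar", "a1b2c3")
  //   -> /home/me/.artifacts/org.slf4j/slf4j-api-1.7.5-a1b2c3.jar
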
| Hypothetical situation (I'm sure there are holes), but I hope it gets
| across the idea. I think tying module information to artifacts limits
| our ability to cache/re-use. I feel copying/symlinking "machine-readable"
| files into "human-readable" files is the way to go.
Of course there are a lot of pitfalls one can run into with symlinks: equality
checks, up-to-date-ness, missing symlink support on some filesystems/platforms.
And falling back to copying the artifacts is exactly the situation you want to
avoid, right?
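For illustration, a rough sketch of what I mean by that fallback, again in
Scala with made-up names (this is not how SBuild or Adept actually
materializes artifacts):

  import java.nio.file.{Files, Path, StandardCopyOption}

  object ResolutionCache {

    // Prefer a symlink from the human-readable location to the cached
    // artifact; fall back to a plain copy where symlinks are not available.
    def linkOrCopy(cachedArtifact: Path, humanReadable: Path): Unit = {
      Files.createDirectories(humanReadable.getParent)
      Files.deleteIfExists(humanReadable)
      try {
        Files.createSymbolicLink(humanReadable, cachedArtifact)
      } catch {
        case _: UnsupportedOperationException | _: java.io.IOException =>
          // exactly the copy we would like to avoid
          Files.copy(cachedArtifact, humanReadable, StandardCopyOption.REPLACE_EXISTING)
      }
    }
  }
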
| Yes we are humans, and we need nicely formatted stuff. But that's a layer
| on top of what the machine needs. The machine needs nice hashes to be
| effective. Let's not try to make the storage format solve ALL the needs;
| let's add a layer of indirection so we can solve both "well". I.e. if we
| copy/symlink files into some local resolution cache for projects, then we
| can just outright DROP the HASH we use internally, and keep only
| human-readable info. When re-resolving things, we can just run SHA-diffs
| (they're pretty fast), or wipe out the project-local resolution cache and
| pull out of our artifact cache again.
|
| In any case, I think it's important to note this idea: Resolution should
| *NOT* be occurring during every freaking build. Resolution is something we
| do when asking for new dependencies or checking for new dependencies.
| While integration builds can opt in to doing this all the time, it should
| not be an all-the-time thing.
Sidenote: SBuild always checks all dependencies for each target it has to run.
"Dependencies" here refers to any kind of dependency, including other targets.
If a dependency is handled by a scheme handler, which might be e.g. a
transitive dependency resolver like Adept, Aether, or Ivy, then it will be
checked too. It is up to the configuration of the scheme handler whether it
caches some decisions. Of course, SBuild also has a mechanism to generically
cache results based on unchanged input. So far, all of this is pretty fast,
and orders of magnitude faster than e.g. firing up the Scala compiler.
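For illustration, a SHA-based up-to-date check of the kind you mention could
boil down to something like the following. This is a sketch with made-up
names, not SBuild's actual caching code:

  import java.nio.file.{Files, Path}
  import java.security.MessageDigest

  object UpToDateCheck {

    // Hex-encoded SHA-1 of a file's contents.
    def sha1(file: Path): String = {
      val digest = MessageDigest.getInstance("SHA-1")
      digest.digest(Files.readAllBytes(file)).map("%02x".format(_)).mkString
    }

    // The project-local copy is up to date when it exists and hashes to the
    // same value as the artifact in the (hash-addressed) cache.
    def isUpToDate(localCopy: Path, cachedArtifact: Path): Boolean =
      Files.exists(localCopy) && sha1(localCopy) == sha1(cachedArtifact)
  }
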
So, forcing a technically motivated, huge, cryptic artifacts directory of
thousands of files, which one cannot easily prune short of deleting everything,
is something I want to question.
| - We want to pull metadata local to make resolving deps faster, but we
| don't really want to be hitting servers for metadata during our
| day-to-day dev (maybe 1x a day) or more frequently if doing integrations.
| - We want to give users the maximum flexibility in how they define
| modules, and the machine the optimal mechanism for caching
| artifacts/metadata
| - Builds should be 100% reproducible for given git revisions. This
| means some aspects of version control may need to be committed in a repo,
| like the git SHA of the metadata used, or some kind of intermediate
| "these are the artifacts we're using, based on this dependency
| requirements list" file.
| - Making something efficient for a machine makes it inefficient for a
| human. We should optimize the core for the machine, and put a
| "porcelain" layer on top for humans that is so easy to use, I don't want
| to kick my repository server every day. However, the machine should be
| able to do the tricksy stuff, like parallel downloads, avoiding cache
| corruption, etc.
I think we agree in terms of always-reproducible builds that avoid unnecessary
work. I'm not convinced that we should optimize for the machine in this case.
I think that a clear structure a human understands can also be processed
programmatically in a timely fashion. After all, the artifact download
directory is only the output of the dependency resolver. Besides being checked
for up-to-date-ness, it should not impact the resolution process in any way.
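To make the intermediate "these are the artifacts we're using" file you
mention above a bit more concrete, here is a rough sketch of what a
committable resolution record could look like. The names and the one-line
format are purely illustrative:

  import java.nio.charset.StandardCharsets.UTF_8
  import java.nio.file.{Files, Path}

  object ResolutionRecord {

    // One line per resolved artifact: "<coordinate> <hash>".
    def write(file: Path, resolved: Map[String, String]): Unit = {
      val text = resolved.toSeq.sorted
        .map { case (coordinate, hash) => s"$coordinate $hash" }
        .mkString("", "\n", "\n")
      Files.write(file, text.getBytes(UTF_8))
    }

    def read(file: Path): Map[String, String] =
      new String(Files.readAllBytes(file), UTF_8)
        .split("\n")
        .filter(_.nonEmpty)
        .map { line =>
          val Array(coordinate, hash) = line.split(" ", 2)
          coordinate -> hash
        }
        .toMap
  }
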
If you are talking about a repository server, I agree that such a system should
work with a technically motivated directory/file structure. (In my vision of a
good dependency management system, a repository server is not needed at all.)
| So yeah, interested to hear feedback. I could be a grumpy-old-man here (or
| just avoiding anything that reminds me of the maven repo format). In any
| case, a lot of repository tool folks I talk with take a similar approach.
| Artifacts are stored 100% by SHA/hash, and you reconstitute a friendly
| name from the metadata on demand.
Again: we are not discussing the repository format here, right? We are talking
about the directory where downloaded artifacts end up.
If I understand correctly, you cannot reconstitute a friendly name for such an
artifacts directory you just found in your home directory without knowing the
metadata repository(s) used to produce/fetch all these files. And even then,
you need to fire up some tool to get that information. And remember that
awkward moment when you once again need that specific Oracle driver that is
download-protected. You know you have it in your artifacts directory somewhere.
I, for one, am very bad at guessing hashes and would probably be faster
re-downloading it from the website.
In a more human-friendly artifacts directory, I can easily find a specific
artifact. I can also glance through all the groups I have not used for half a
year and wipe them out with a simple "rm -r".