Does that apply to the remote cache as well? I'd like to fetch files myself out of the cache, so I'd need to know how to create the same hash. Would that just be a SHA-256 of the file contents alone? Or do I need to also hash a timestamp and other things to get the correct hash?
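In other words, would something like the following compute the right key? This is only a sketch under my assumptions: that CAS blobs are keyed by the SHA-256 of the file contents alone and served under a /cas/ path, while action results live under /ac/ (the helper names and URL layout here are mine):

```python
import hashlib

def content_digest(path):
    """Hex-encoded SHA-256 of the raw file bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large binaries don't need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def cas_url(cache_base, digest):
    # Assumption: content blobs live under /cas/<sha256-of-contents>,
    # separate from the /ac/ namespace keyed by the action, not the output.
    return "%s/cas/%s" % (cache_base, digest)
```

If that assumption holds, fetching a file back would just be a GET on `cas_url(cache, content_digest(path))`.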
With the caveat that, from what my testing shows, the active_env is /not/ consumed by all rules, which means that the env variables aren't always included.
Hope this helps!
Robin
--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/37e4f5ee-e1a4-4267-b94a-80a406165bef%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I thought that once Bazel builds something, it then hashes the resulting binary (or whatever) and stores it in the remote cache under that hash, not the hashes of the inputs it used to create the binary. Is that wrong?
Either I had a fundamental misunderstanding of how this works, or my question was worded poorly.
Maybe that was wishful thinking on my part. I was hoping to write a FUSE that retrieved files (both source and binaries) via hash. I figured I could hijack the Bazel remote cache for the source files right alongside the binaries. For that, I was assuming the hash was a hash of the content itself.
Is there a way for me to get the hash from a given binary? That way I can remember that hash and retrieve the content later with it?
So I was thinking that the FUSE will store a list of hashes and retrieve files from the remote cache by hash on demand. If somebody modifies a file, it would store a local copy of it. However, if somebody wrote a file and the hash for that file is already in the remote cache, I wanted to delete the local copy and just store it as a hash again (for example, if they check out another branch). That means that I will need to be able to hash the content (without knowing how it was created) and use HTTP HEAD to see if that hash already exists. But if the hashes are totally different, then I can't do it that way.
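As a sketch of that check (the cache endpoint is a placeholder, and the /cas/<hash> layout is my assumption about the HTTP cache's namespace):

```python
import hashlib
import urllib.error
import urllib.request

CACHE = "http://cache.example.com:8080"  # placeholder cache endpoint

def digest(data):
    """Hex SHA-256 of the raw content."""
    return hashlib.sha256(data).hexdigest()

def in_remote_cache(data):
    """HEAD the CAS entry for this content; a 200 means the blob
    already exists, so the local copy could be dropped and replaced
    by just the hash."""
    url = "%s/cas/%s" % (CACHE, digest(data))
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return resp.status == 200
    except urllib.error.URLError:
        return False
```

This only works if the CAS key really is the content hash alone, which is exactly what I'm asking about above.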
Jakob Buchgraber
Software Engineer
Google Germany GmbH
Erika-Mann-Straße 33
80636 München
Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.
I was thinking mostly of source files (I was hoping to use the cache for both source and binaries). If a user checked out one branch, then switched to another branch, and so on, I didn't want the number of local files to grow in perpetuity and never get "reclaimed".
I should have said that it would reserve the right to delete the local copy. If I had no way to hash the content and compare it to the cache, then I would never have a way of doing it. You are right that it would be faster to keep files local that get accessed often.
I looked into GVFS, and the problem is that it is currently only available for Windows, while my project does most of our development on Linux. Microsoft says that they will support Mac and then Linux at some point, but there is no timeframe for that.
I was thinking of using the remote cache to provide file content for our FUSE, but it sounds like you are encouraging me to not use remote caching at all and to use a FUSE instead. How would that work? Wouldn't Bazel, not knowing the files are on FUSE, access every file to see what has been modified? If so, wouldn't that be slower than using a native FS?
Another problem I have is that even if I were to get Bazel seamlessly integrated with my FUSE, there are other apps we use, such as git. If it takes an hour for git to clone/check out the repo (as it tries to write a gazillion files to my FUSE), that would defeat half the purpose of a mono-repository.
So I was thinking of writing the FUSE in a way where it tracks modified files and provides that info to anybody who wants it. Any app, including Bazel and Git, could use that information to optimize its tasks (if it had some future --useFuse flag activated).
(BTW, I'm aware of inotify, but I'm not sure how resource intensive that is if events were registered for so many files. Maybe I should look into this more.)
The way I was thinking of doing it was to expose a virtual .fuse directory within my file system. Within that there would be a changeLog directory. Git would create an empty "git.log" file in that directory after every commit, and Bazel would create an empty "bazel.log" file after every build (any app could do the same). Every time a file was modified, the FUSE would append the path of the modified file to every log file within that directory. Any app can then see which files changed since the time it created its file.
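A minimal sketch of that protocol, assuming the .fuse/changeLog layout described above (all names here are from my proposal, not from any existing tool):

```python
import os

def record_change(changelog_dir, modified_path):
    # Called by the FUSE layer on every write: append the modified
    # path to every app's log file in the changeLog directory.
    for name in os.listdir(changelog_dir):
        with open(os.path.join(changelog_dir, name), "a") as log:
            log.write(modified_path + "\n")

def changes_since_last_mark(changelog_dir, app):
    # An app reads its own log to learn what changed since its
    # last commit/build.
    with open(os.path.join(changelog_dir, app + ".log")) as log:
        return [line.strip() for line in log]

def mark_seen(changelog_dir, app):
    # After a commit/build, the app re-creates its log as empty,
    # marking "I have seen everything up to now."
    open(os.path.join(changelog_dir, app + ".log"), "w").close()
```

So git would call mark_seen after a commit, Bazel after a build, and each would only re-hash the paths its own log reports.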
Ideally (maybe fantasyland) I would convince my company to allow me to open source the whole thing, and a bunch of apps would adopt the --useFuse flag and become "fuse-aware" on their own. But what will probably happen is that I would merely modify these apps myself and try to submit patches.
Obviously, before I start any of this, I need to make sure the idea is sound and not idiotic. I expected a FUSE implementation for this to exist already, but GVFS is the only one I can find, and we need Linux...
On Mar 13, 2018 2:06 AM, "Jakob Buchgraber" <buc...@google.com> wrote:
Hi Duane,
thanks for the details. Here's a link to how Google does that [1]. There's also [2], about which I have heard good things. I know of Bazel users who implemented their own FUSE for remote caching. The big advantage of using a FUSE file system instead of remote caching via HTTP is that Bazel will only fetch what it needs, lazily. We have been thinking about baking this lazy-fetch behavior into Bazel's filesystem abstraction, and I think it will happen in Bazel eventually, as it can lead to huge performance gains for remote execution, but it's just not a priority at the moment.
I am happy to answer any specific questions about remote caching.
Best,
Jakob
[1] https://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
[2] https://github.com/Microsoft/GVFS
On Mon, Mar 12, 2018 at 8:37 PM Duane Knesek <duane....@gmail.com> wrote:
Okay... I appreciate your time...
In short, I'm investigating what it would take to move my team to a mono-repository. I was thinking of prototyping a FUSE implementation that allows us to work in a large working directory without using a ton of space. Most "files" would be virtual, and only the ones in common use would be stored locally.
I wanted to hijack the remote cache to store all files (including source), not just binaries produced by Bazel. I also wanted my FUSE to provide info on what files have been changed, and possibly modify git & Bazel to use that information to speed up commits, builds, and so on, rather than reading and hashing every file every time.
Is that enough context? I could provide more if you like.