Jakob Buchgraber
Software Engineer
Google Germany GmbH
Erika-Mann-Straße 33
80636 München
Managing Directors: Paul Manicle, Halimah DeLaine Prado
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg
This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/CAJ5fxHJ5M3b8xmbB8PRLxCN9feMxg%3DdpCp_mmQeGGpa-GU1VfQ%40mail.gmail.com.
This isn't supported well. For example, genrule(..., srcs=["subdir/"])
won't traverse "subdir/", nor will it stage the contents in the
sandbox. Bazel won't keep track of dependencies on the files under
"subdir/", so modifying the directory's contents won't trigger
rebuilds. Bazel warns about this:
WARNING: [redacted]/BUILD:4:5: input 'venv' to //:compile-46.compiled
is a directory; dependency checking of directories is unsound
Your example BUILD file does the same: genrule(..., srcs=["venv"]).
Are you sure it works correctly?
I'm wondering what the inputs of the Merkle tree computation are.
Could you attach a debugger here (assuming you run Bazel 2.1.0):
https://github.com/bazelbuild/bazel/blob/2.1.0/src/main/java/com/google/devtools/build/lib/remote/merkletree/MerkleTree.java#L122
In addition to what Laszlo wrote, consider taking a JSON profile of the build [1]. The Merkle tree building shows up there as a separate node, and it's quite possible that it is the problem.

As Laszlo wrote, it's strongly recommended not to specify directories but to use a glob() instead. With directory inputs, the Merkle tree builder has to traverse the directory recursively by actually calling readdir(). Besides the correctness concerns, this isn't cached between actions, so every action runs the readdir() and the hashing again. If you use a glob(), you pay this cost only once, at the beginning.
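
To make the directory-versus-glob() point concrete, here is a minimal BUILD sketch. Only the "venv" directory name comes from this thread; the target names, outs, and the placeholder cmd are made up for illustration, so adapt them to the real rule.

```starlark
# BUILD: illustrative sketch only. Target names, outs, and cmd are
# hypothetical; "venv" is the directory mentioned earlier in the thread.

# Discouraged: "venv" is a directory input. Bazel does not track the files
# underneath it, so changes inside venv/ may not trigger a rebuild, and
# Bazel warns that dependency checking of directories is unsound.
genrule(
    name = "compile_from_dir",
    srcs = ["venv"],
    outs = ["compile_from_dir.compiled"],
    cmd = "touch $@",  # placeholder command
)

# Preferred: glob() expands to individual files when the package is loaded,
# so every file under venv/ becomes a tracked input.
genrule(
    name = "compile_from_glob",
    srcs = glob(["venv/**"]),
    outs = ["compile_from_glob.compiled"],
    cmd = "touch $@",  # placeholder command
)
```

With the second form each file is an ordinary, tracked input, which is what should let Bazel avoid re-walking the directory for every action.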
Chris, from your example it seems that you generate 5000 copies of the rule, each depending on 50000 source files. I wonder if you could introduce a single intermediate rule (py_library comes to mind) that depends on your 50000 input files, so that the 5000 other rules depend only on that one intermediate rule. That intermediate rule may play the role of the sentinel you were considering. Checksumming and symlinking of 50000 files will still be needed, but I hope it would only be done once instead of 5000 times.

Konstantin
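
A rough sketch of the intermediate-rule idea, using a filegroup rather than py_library to keep it short. All target names, the placeholder cmd, and the use of a list comprehension to stamp out the 5000 rules are assumptions for illustration; the original BUILD file may generate them differently.

```starlark
# BUILD: illustrative sketch only. All names and the cmd are hypothetical.

# One intermediate target owns the large file set.
filegroup(
    name = "venv_files",
    srcs = glob(["venv/**"]),
)

# Each generated rule lists a single label instead of 50000 files.
[
    genrule(
        name = "compile-%d" % i,
        srcs = [":venv_files"],
        outs = ["compile-%d.compiled" % i],
        cmd = "touch $@",  # placeholder command
    )
    for i in range(5000)
]
```

This only shortens what each rule declares. Every action still needs all of the files staged in its sandbox, so the per-action symlinking cost is not reduced by this change alone.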
--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/559a8ea7-4748-4ce5-9068-ed15f31e22c8%40googlegroups.com.
For the slow symlinking problem, try:
--experimental_sandbox_base=/dev/shm
This puts the symlinks in a tmpfs, which is significantly faster.
Austin
On Tue, Feb 25, 2020 at 1:49 PM 'Chris Kuehl' via bazel-discuss
<bazel-...@googlegroups.com> wrote:
>
> Hey Konstantin,
>
> You're right -- if I group all of these 50k input files into one rule (e.g. via filegroup) and reference that instead, it is indeed significantly faster during the loading (?) phase. This is really helpful, thanks for the tip!
>
> Unfortunately the symlinking required to set up a separate sandbox for each of the 5k rules still ends up being prohibitively slow for my use-case (as far as I can tell, each sandbox is only used once).