operating on the contents of .runfiles from within a skylark rule?

ms...@dropbox.com

Jan 10, 2016, 9:01:16 PM
to bazel-discuss
Hi all,

Similar questions have been asked about referencing the contents of runfiles, but I am having a bit of trouble figuring out the best way to handle the problem.

I've defined a set of skylark rules that create python binaries (the default py_* rules aren't hermetic enough for my use cases).

The issue is that I need to run over the finished .runfiles directory to perform a link-like step to process the python source code and any libraries.

I don't see a way of expressing that I have a dependency on the runfiles being materialized before running a final step.

I thought one way of handling this might be to create two rules: one to collect the runfiles and another to modify them in place. At a high level, something like this:

py_collect_runfiles(
  name = "my_runfiles",
  srcs = [...],
  deps = [...],
)

py_link_runfiles(
  name = "my_bin",
  deps = [":my_runfiles"],
)

Unfortunately, when I run "bazel build :my_bin" there is just an empty "my_runfiles.runfiles" directory, and the files that should be in my_runfiles.runfiles are actually in my_bin.runfiles. This seems like a bug in Bazel, but perhaps I've botched something in how I reference the runfiles provider.

I want to avoid writing a wrapper script because I'd like this to be composable. I'd like to be able to pass this to a pkg_* rule and make a functional .deb or tarball.

Suggestions welcome.

Thanks,
-Mike


Brian Silverman

Jan 10, 2016, 11:33:21 PM
to ms...@dropbox.com, bazel-discuss
Hi Mike,

You can get at the runfiles using something like "[dep.runfiles for dep in ctx.attr.deps]". Once you have the files you want, you can list them as inputs to an action to do your processing with them.

If you're getting the files from your own rules anyways, it might make more sense to collect up the source files in a more convenient form using transitive info providers in the individual rules instead. For example, the builtin Python rules have "transitive_sources".
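The transitive-provider idea can be illustrated outside Bazel. This is plain Python, not Skylark: the `Library` class and `transitive_sources` function are hypothetical stand-ins for a rule and the provider field it would return, and `server/main.py` is a made-up path (the other two paths come from the Artifact output quoted later in this thread). Each target merges its own sources with what its deps already collected, visiting each target once and keeping each path once.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Library:
    """Hypothetical stand-in for a py_library-like target."""
    name: str
    srcs: List[str]
    deps: List["Library"] = field(default_factory=list)


def transitive_sources(root: Library) -> List[str]:
    """Collect root's sources plus those of every transitive dep.

    Deps are visited before the target itself, so a dependency's files come
    first; each target is visited once and each path is kept once.
    """
    visited: Set[str] = set()
    seen: Set[str] = set()
    ordered: List[str] = []

    def visit(lib: Library) -> None:
        if lib.name in visited:
            return
        visited.add(lib.name)
        for dep in lib.deps:
            visit(dep)
        for src in lib.srcs:
            if src not in seen:
                seen.add(src)
                ordered.append(src)

    visit(root)
    return ordered


six = Library("six", ["pip/six.pip.zip"])
check = Library("import_check", ["build_tools/vpip/import_check.py"], [six])
app = Library("my_bin", ["server/main.py"], [check, six])
print(transitive_sources(app))
# ['pip/six.pip.zip', 'build_tools/vpip/import_check.py', 'server/main.py']
```

Inside a real rule you would typically lean on Skylark's built-in set types for the merging rather than hand-rolled deduplication; the sketch only shows the shape of the data flow.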

To get the output files into the pkg_* rules, you'll want to list the output files from your action(s) in the "files" member of the struct returned by the implementation function of your rule. This will cause any rules which expect files as inputs to use those files when you list your rules as inputs (i.e. in "srcs"). That is also the list of files Bazel will print paths to when you do `bazel build //:my_target` (subject to --show_result).

Brian

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/a89255e1-9929-4217-a3ce-3c2c211daca6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ms...@dropbox.com

Jan 11, 2016, 12:27:40 AM
to bazel-discuss, ms...@dropbox.com
Thanks Brian. I am actually already doing what you suggest. I haven't made it an explicit provider because thus far iterating runfiles was enough.

The issue is that traversing runfiles gives me something like this:

Artifact:[[/home/.../.cache/bazel/_bazel_.../.../server]bazel-out/local_linux-fastbuild/genfiles]pip/six.pip.zip,

Artifact:[/home/.../src/server[source]]build_tools/vpip/import_check.py,

What I need to be able to do is unzip six.pip.zip from genfiles into the runfiles of my target.

I can find the runfiles directory like this:

bin_runfiles = ctx.configuration.bin_dir.path + '/' + ctx.outputs.executable.short_path + '.runfiles'

Unfortunately, even if my rule populates the runfiles directory, it does not work deterministically. I suspect this is because the outputs I declare for the action are incomplete (I can't easily tell Bazel the full list of files inside the zip).

Basically, I want the target to be fully constructed, with a complete runfiles directory, before the final step of my rule runs.

-Mike

Brian Silverman

Jan 11, 2016, 3:22:25 AM
to ms...@dropbox.com, bazel-discuss
Is the part you're missing how to set the runfiles of your custom rule? Returning struct(runfiles = ctx.runfiles(...)) from the implementation function does that. Not doing that will definitely cause problems with determinism.

However, you can't put all the files from a zip file in the runfiles if you don't know all the names at rule evaluation time. Bazel doesn't really support that... Could you write a rule to extract the runfiles and create a tarball to pass to the pkg_* rules directly? I'm pretty sure the pkg_* rules ignore actual runfiles anyways.
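A minimal sketch of the kind of action script such a rule could run, assuming the rule hands it (file path, short_path) pairs derived from its runfiles. It writes a single tarball with every file at its short_path under a `<name>.runfiles/` prefix and expands zip inputs such as six.pip.zip in place, so the rule only has to declare the one tarball as its output instead of every file inside the zip. The names `pack_runfiles` and `_add_bytes`, the prefix layout, and the policy of expanding each zip into the directory of its short_path are all hypothetical choices for illustration.

```python
import io
import os
import tarfile
import zipfile
from typing import Iterable, Tuple


def _add_bytes(tar: tarfile.TarFile, arcname: str, data: bytes, mode: int) -> None:
    """Add one in-memory file with a fixed mtime so output is reproducible."""
    info = tarfile.TarInfo(arcname)
    info.size = len(data)
    info.mtime = 0
    info.mode = mode
    tar.addfile(info, io.BytesIO(data))


def pack_runfiles(entries: Iterable[Tuple[str, str]], out_tar: str,
                  prefix: str) -> None:
    """Write out_tar with each (path, short_path) placed under prefix/.

    Hypothetical policy: a .zip input is expanded into the directory of its
    short_path instead of being stored as a single file.
    """
    with tarfile.open(out_tar, "w") as tar:
        # Sort by short_path so the archive order does not depend on input order.
        for path, short_path in sorted(entries, key=lambda e: e[1]):
            if path.endswith(".zip"):
                base = os.path.join(prefix, os.path.dirname(short_path))
                with zipfile.ZipFile(path) as zf:
                    for name in sorted(zf.namelist()):
                        if name.endswith("/"):
                            continue  # directory entries carry no data
                        _add_bytes(tar, os.path.join(base, name),
                                   zf.read(name), 0o644)
            else:
                with open(path, "rb") as f:
                    data = f.read()
                mode = os.stat(path).st_mode & 0o777  # keep the executable bit
                _add_bytes(tar, os.path.join(prefix, short_path), data, mode)
```

Because the tarball is one declared output, Bazel can track it deterministically even though the zip's member list is only known when the action runs.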

Bazel only constructs runfiles trees before running targets, which means there's no good way to force it to construct the tree so something else can operate on it.

If you want, constructing a fake runfiles tree in a temporary directory isn't very hard. file.short_path gives you the path a file will be copied to under the runfiles directory, so you can just copy all the files to their corresponding short_path under another directory.
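Brian's fake-runfiles suggestion is small enough to sketch. This assumes the rule has already produced a list of (path, short_path) pairs, for example by writing them to a file for an action to read (that hand-off is hypothetical, as is the name `build_runfiles_tree`). Each file is copied to `<dest>/<short_path>`, giving a tree that exists independently of when Bazel materializes the real one.

```python
import os
import shutil
from typing import Iterable, Tuple


def build_runfiles_tree(entries: Iterable[Tuple[str, str]], dest: str) -> None:
    """Copy each (path, short_path) pair to dest/short_path.

    Copies rather than symlinks, so the tree stays valid if it is moved or
    packaged elsewhere.
    """
    for path, short_path in entries:
        target = os.path.join(dest, short_path)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(path, target)  # copy2 also preserves permission bits
```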

Han-Wen Nienhuys

Jan 11, 2016, 8:48:27 AM
to Mike Solomon, bazel-discuss
On Mon, Jan 11, 2016 at 3:01 AM, <ms...@dropbox.com> wrote:
> I've defined a set of skylark rules that create python binaries (the default
> py_* rules aren't hermetic enough for my use cases).

interesting; in what way?


--
Han-Wen Nienhuys
Google Munich
han...@google.com

Mike Solomon

Jan 11, 2016, 1:00:40 PM
to Han-Wen Nienhuys, bazel-discuss
On Mon, Jan 11, 2016 at 5:48 AM, Han-Wen Nienhuys <han...@google.com> wrote:
> On Mon, Jan 11, 2016 at 3:01 AM, <ms...@dropbox.com> wrote:
>> I've defined a set of skylark rules that create python binaries (the default
>> py_* rules aren't hermetic enough for my use cases).
>
> interesting; in what way?

They use the system interpreter and don't disable path manipulation. You need to start python with -Ss (-S skips importing the site module, -s skips the user site-packages directory) and precisely control the PYTHONPATH that is passed in.
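A minimal sketch of that launch discipline, written as a plain Python launcher rather than a Skylark rule: it runs an interpreter with `-S -s`, strips inherited `PYTHON*` environment variables, and sets PYTHONPATH to exactly the directories passed in. The name `run_hermetic` and the default of `sys.executable` are assumptions; a fully hermetic setup would point `interpreter` at a Python that is itself a build input rather than whatever happens to run the launcher.

```python
import os
import subprocess
import sys
from typing import List, Sequence


def run_hermetic(args: Sequence[str], pythonpath: List[str],
                 interpreter: str = sys.executable) -> subprocess.CompletedProcess:
    """Run `interpreter -S -s *args` with a PYTHONPATH we fully control."""
    # Drop inherited PYTHONHOME, PYTHONSTARTUP, etc. so they cannot leak in.
    env = {k: v for k, v in os.environ.items() if not k.startswith("PYTHON")}
    env["PYTHONPATH"] = os.pathsep.join(pythonpath)
    return subprocess.run([interpreter, "-S", "-s", *args], env=env,
                          capture_output=True, text=True, check=True)


result = run_hermetic(
    ["-c", "import sys; print(sys.flags.no_site, sys.flags.no_user_site)"],
    pythonpath=["/tmp/my_bin.runfiles"],
)
print(result.stdout.strip())  # prints "1 1"
```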

Also, since py_library can only depend on other py_library targets, you end up having to pass a lot of stuff as "data" which isn't ideal.

Mike Solomon

Jan 11, 2016, 1:33:26 PM
to Brian Silverman, bazel-discuss
On Mon, Jan 11, 2016 at 12:21 AM, Brian Silverman <bsilve...@gmail.com> wrote:
> Is the part you're missing how to set the runfiles of your custom rule? Returning struct(runfiles = ctx.runfiles(...)) from the implementation function does that. Not doing that will definitely cause problems with determinism.

I have that part working correctly.

> However, you can't put all the files from a zip file in the runfiles if you don't know all the names at rule evaluation time. Bazel doesn't really support that... Could you write a rule to extract the runfiles and create a tarball to pass to the pkg_* rules directly? I'm pretty sure the pkg_* rules ignore actual runfiles anyways.

This is the crux of the problem. I could probably do as you suggest, but then the issue is that I have two workflows to maintain and debug. One is a pkg_* release and the other is the use case where someone is working incrementally and wants to rely on the symlinking of runfiles.

> Bazel only constructs runfiles trees before running targets, which means there's no good way to force it to construct the tree so something else can operate on it.

I don't quite understand this part. When I type "bazel build //my_bin", does that "run the target"? What if the target is referenced as a direct dependency? Does building //my_bin then also run //my_runfiles? If so, shouldn't there be a before-after relationship here I can leverage?

Is there a bad way to force this?

> If you want, constructing a fake runfiles tree in a temporary directory isn't very hard. file.short_path gives you the path a file will be copied to under the runfiles directory, so you can just copy all the files to their corresponding short_path under another directory.

The wrinkle is that I need this type of target to be composable so that more complex targets can rely on these binaries to run correctly, for instance in an integration test scenario. I guess I get your point. I could move my output to something like:

my_bin
my_bin.runfiles
my_bin.Xrunfiles

and manually manage the contents of .Xrunfiles. I don't relish this option, but it might work.

Brian Silverman

Jan 11, 2016, 5:46:44 PM
to Mike Solomon, bazel-discuss
On Mon, Jan 11, 2016 at 10:33 AM, Mike Solomon <ms...@dropbox.com> wrote:


> On Mon, Jan 11, 2016 at 12:21 AM, Brian Silverman <bsilve...@gmail.com> wrote:
>> Is the part you're missing how to set the runfiles of your custom rule? Returning struct(runfiles = ctx.runfiles(...)) from the implementation function does that. Not doing that will definitely cause problems with determinism.
>
> I have that part working correctly.
>
>> However, you can't put all the files from a zip file in the runfiles if you don't know all the names at rule evaluation time. Bazel doesn't really support that... Could you write a rule to extract the runfiles and create a tarball to pass to the pkg_* rules directly? I'm pretty sure the pkg_* rules ignore actual runfiles anyways.
>
> This is the crux of the problem. I could probably do as you suggest, but then the issue is that I have two workflows to maintain and debug. One is a pkg_* release and the other is the use case where someone is working incrementally and wants to rely on the symlinking of runfiles.

It's not very different. They both use the set of runfiles Bazel keeps track of. The only difference is normally Bazel does the symlinking, but for your release you symlink/copy/etc the files into place yourself.

Also, where do you want the symlinks to point for your pkg_* release? Are you planning to install everything into the same absolute path it was built under? If not, you need something to edit the symlinks anyways because when Bazel builds a .runfiles tree it uses absolute symlinks.
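Brian's point about absolute symlinks suggests one more step before packaging: dereference them. The sketch below walks a tree (for example a copied `.runfiles` directory) and replaces every symlink to a regular file with a copy of the file it points to, so the result no longer depends on absolute paths on the build machine. The name `materialize_symlinks` is hypothetical, and the sketch deliberately leaves links to directories and dangling links alone.

```python
import os
import shutil


def materialize_symlinks(root: str) -> int:
    """Replace every symlink to a regular file under root with a real copy.

    Returns the number of links replaced. Links to directories are not
    followed (os.walk lists them as directories) and are left untouched.
    """
    replaced = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # isfile() follows the link, so dangling links are skipped.
            if os.path.islink(path) and os.path.isfile(path):
                target = os.path.realpath(path)
                os.unlink(path)
                shutil.copy2(target, path)
                replaced += 1
    return replaced
```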


>> Bazel only constructs runfiles trees before running targets, which means there's no good way to force it to construct the tree so something else can operate on it.
>
> I don't quite understand this part. When I type "bazel build //my_bin", does that "run the target"? What if the target is referenced as a direct dependency? Does building //my_bin then also run //my_runfiles? If so, shouldn't there be a before-after relationship here I can leverage?

True, `bazel build` also creates the runfiles tree (forgot about that). Most targets (like *_library and *_binary) collect runfiles from their direct dependencies and also support adding additional runfiles through "data" attributes. However, Bazel only creates the actual runfiles tree at certain times, and I haven't found a good way to force it to do so in the middle.

> Is there a bad way to force this?

The only one I can think of is to tell Bazel you're going to run the target (i.e. pass it in as a tool or something). However, that will cause issues with host vs target configurations, and it won't work for *_library targets.

Andrew Allen

Jan 13, 2016, 10:40:05 PM
to Mike Solomon, mittal...@gmail.com, bazel-discuss, Brian Silverman
Mike,

Have you seen the change from Sailesh Mittal (I think he is at Twitter) that adds PEX support to Bazel? In case you're not familiar with PEX, it is a way to produce a deployable "Python EXecutable", like a deploy jar for Python. It's a big change, and Damien Martin-Guillerez commented on it that it would probably be better to use the native rules, but it might also be a place you can work from. Reading a bit between the lines, it sounds like you really just want a deployable version of your Python program, and this might be another path to go down.

Just my unsolicited $0.02, but having a chat with some of these people might be a good way to achieve your goal (unless I totally misunderstand things).


/** ~Andrew Z Allen */

Mike Solomon

Jan 13, 2016, 10:45:59 PM
to Andrew Allen, mittal...@gmail.com, bazel-discuss, Brian Silverman
Yup, I'm familiar with most of that. I've had a version of a PEX/PAR
linker for a long time. The main issue is wanting the workflow to be
identical for PEX/PAR and incremental dev when reading out of
.runfiles.
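Mike's goal of one layout serving both incremental development and a single-file bundle can be illustrated with the standard library's `zipapp` module. This is a swapped-in technique: `zipapp` is not PEX or PAR and is much simpler than either (for instance, extension modules cannot be imported directly from a zip). It only shows that a directory laid out for PYTHONPATH can also be zipped into one executable that imports from the same layout. `bundle_runfiles` and the `pkg.app:main` entry point used below are hypothetical.

```python
import zipapp
from typing import Optional


def bundle_runfiles(runfiles_dir: str, out_path: str, entry_point: str,
                    interpreter: Optional[str] = "/usr/bin/env python3") -> None:
    """Zip a runfiles-style tree into one executable archive.

    entry_point uses zipapp's "package.module:function" form; zipapp then
    generates a __main__.py inside the archive (the source tree is untouched).
    The same directory still works directly during development by putting
    it on PYTHONPATH.
    """
    zipapp.create_archive(runfiles_dir, target=out_path,
                          interpreter=interpreter, main=entry_point)
```

For example, `bundle_runfiles("my_bin.runfiles", "my_bin.pyz", "pkg.app:main")` produces an archive that `python my_bin.pyz` can run, while `PYTHONPATH=my_bin.runfiles python -c "import pkg.app; pkg.app.main()"` exercises the same code from the unpacked tree.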

I ended up restructuring my rules a bit and I'm handling higher level
packaging outside of Bazel since the built in pkg_tar/pkg_deb rules
also seem to suffer from the inability to capture the contents of the
.runfiles.

This is workable for now, but I'll keep my eye on how that shakes out.

Ming Zhao

Jan 14, 2016, 1:42:49 AM
to Mike Solomon, Andrew Allen, mittal...@gmail.com, bazel-discuss, Brian Silverman
Mike,

If you're just looking for PAR + Bazel, I have integrated your plink
into the native Bazel py rules here:
https://github.com/mzhaom/bazel/commit/98501323a3a625527bf281bcacadf69fcbef8f5c

Though it was rejected when I first attempted to send it for review,
so we simply maintain it in our own local branch.

Best,
Ming

Mike Solomon

Jan 14, 2016, 1:46:19 AM
to Ming Zhao, Andrew Allen, mittal...@gmail.com, bazel-discuss, Brian Silverman
Thanks Ming, that's definitely tempting.