Bucklets: How to share and reuse BUCK recipes


david.o...@gmail.com

Feb 21, 2014, 7:35:34 AM
to buck-...@googlegroups.com

When a build process for a non-trivial project is implemented in Buck, a lot of custom
scripts/recipes are created. It turns out that a substantial part of these scripts/recipes is
very useful and urgently needed by other projects.

Currently there is no built-in way in Buck to share/reuse scripts/recipes between projects.
One option would be to use bucklets: a number of build defs and Python script files.

Project foo defines, for example, a bucklet local_jar() [1] and publishes it with deploy_bucklet('local_jar').

Project bar (unrelated to project foo) reuses that bucklet with import_bucklet('local_jar').

Question: how can deploy_bucklet() and import_bucklet() be implemented?

1. upload/download it to/from a common bucklets-central repository
2. use git submodule(s) to push/pull it into one's own repository

Other options?
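
Whichever transport is used, the consuming side could boil down to today's include_defs()
once the bucklet file is on disk (checked out via a submodule or downloaded into a
well-known directory). A minimal sketch, assuming a hypothetical bucklets/ checkout; the
jar path is copied from the local_jar() example in [1] below:

# BUCK file in project bar; the bucklets/ directory and file name are assumptions.
include_defs('//bucklets/local_jar.bucklet')

local_jar(
  name = 'jgit',
  jar = '/home/<user>/projects/jgit/org.eclipse.jgit/target/org.eclipse.jgit-3.3.0-SNAPSHOT.jar',
)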

[1]

#
# If a dependent library is undergoing active development, it must be
# recompiled and the change must be reflected in the Buck build process; for
# example, testing Gerrit against a changed JGit snapshot version. After
# building the JGit library, the artifacts are created in the local Maven
# build directory.
#
# To shorten that workflow and take installing the artifacts into the local
# Maven repository and fetching them again from there out of the picture, the
# `local_jar()` method is used:
#
# local_jar(
#   name = 'jgit',
#   jar = '/home/<user>/projects/jgit/org.eclipse.jgit/target/org.eclipse.jgit-3.3.0-SNAPSHOT.jar',
#   src = '/home/<user>/projects/jgit/org.eclipse.jgit/target/org.eclipse.jgit-3.3.0-SNAPSHOT-sources.jar',
#   deps = [':ewah']
# )

def local_jar(
    name,
    jar,
    src = None,
    deps = [],
    visibility = ['PUBLIC']):
  binjar = name + '.jar'
  srcjar = name + '-src.jar'
  genrule(
    name = name + '__local_bin',
    cmd = 'ln -s %s $OUT' % jar,
    out = binjar)
  if src:
    genrule(
      name = name + '__local_src',
      cmd = 'ln -s %s $OUT' % src,
      out = srcjar)
    prebuilt_jar(
      name = name + '_src',
      deps = [':' + name + '__local_src'],
      binary_jar = genfile(srcjar),
      visibility = visibility,
    )
  else:
    srcjar = None

  prebuilt_jar(
    name = name,
    deps = deps + [':' + name + '__local_bin'],
    binary_jar = genfile(binjar),
    source_jar = genfile(srcjar) if srcjar else None,
    visibility = visibility,
  )

david.o...@gmail.com

Feb 23, 2014, 5:19:00 PM
to buck-...@googlegroups.com

On Friday, February 21, 2014 at 1:35:34 PM UTC+1, david.o...@gmail.com wrote:

When a build process for a non-trivial project is implemented in Buck, a lot of custom
scripts/recipes are created. It turns out that a substantial part of these scripts/recipes is
very useful and urgently needed by other projects.

Currently there is no built-in way in Buck to share/reuse scripts/recipes between projects.
One option would be to use bucklets: a number of build defs and Python script files.

Project foo defines, for example, a bucklet local_jar() [1] and publishes it with deploy_bucklet('local_jar').

Project bar (unrelated to project foo) reuses that bucklet with import_bucklet('local_jar').

Question: how can deploy_bucklet() and import_bucklet() be implemented?

1. upload/download it to/from a common bucklets-central repository
2. use git submodule(s) to push/pull it into one's own repository


An initial Bucklets repository was set up [1], and the Gitiles project was migrated from Maven
to Buck on top of it [2].


Simon Stewart

Aug 13, 2014, 9:38:58 AM
to david.o...@gmail.com, buck-build
Hi,

This post was pointed out to me recently. Sorry for not replying before.

One of the options we've discussed on the team before has been the addition of a "buck fetch" command that would walk the Target Graph and download "stuff" identified by "remote_artifact" targets to the local disk. This initially came up as a mechanism for doing something similar to what shaven-maven[1] does. We'd include an md5 as part of the rule too.

When "buck build" was run, you'd be able to treat a "remote_artifact" in much the same way you'd treat "export_file".

The important thing here is that "remote_artifact" would fail the build if "buck fetch" hadn't already grabbed the file. We want your builds to be as fast as possible, and randomly pausing to meander across the Internet for files doesn't seem to really fit with that goal.

Assuming we had that feature, "import_bucklet" becomes relatively easy to implement (provided the build files containing those rules didn't themselves depend on bucklets, for obvious reasons). "deploy_bucklet" also ends up being lightweight: you just upload the bucklets to wherever you think is best. If they were hosted in a git repo on Google Code (for example), you'd get a versioned URL per artifact, which would seem ideal.

What are your thoughts?

Regards,

Simon




Shawn Pearce

Aug 13, 2014, 10:58:10 AM
to Simon Stewart, David Ostrovsky, buck-build
On Wed, Aug 13, 2014 at 6:38 AM, Simon Stewart <simon.m...@gmail.com> wrote:
Hi,

This post was pointed out to me recently. Sorry for not replying before.

One of the options we've discussed on the team before has been the addition of a "buck fetch" command that would walk the Target Graph and download "stuff" identified by "remote_artifact" targets to the local disk. This initially came up as a mechanism for doing something similar to what shaven-maven[1] does. We'd include an md5 as part of the rule too.

Use SHA1 not MD5. Most sites like search.maven.org export the SHA1 right now.

If Buck wants an MD5-shaped hash internally, rebuild the hash yourself after you have fetched the payload, or (*shudder*) hash the SHA1 using MD5...

When "buck build" was run, you'd be able to treat a "remote_artifact" in much the same way you'd treat "export_file".

Nice, but a super common way to use these is to have them be a prebuilt_jar(). An easy way to also say this is a JAR would be great:

  remote_artifact(
    name = "guava",
    mvn = "com.google.guava:guava:jar:18.0-rc1",
    sha1 = "9bd0d5bc8a4269bb2b5584d5498e281633c677eb",
    export_jar = True,
)

Or some such, with remote_artifact accepting either url or mvn to identify the source on the network and export_jar being optionally set to export the result as a prebuilt_jar(), without having to write another prebuilt_jar() target. 
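
A url form could then cover artifacts that are not published to a Maven repository;
purely illustrative, with a made-up name, URL, and checksum placeholder:

  remote_artifact(
    name = "codemirror",
    url = "http://example.com/downloads/codemirror-4.5.zip",
    sha1 = "<sha1 of the zip>",
  )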

The important thing here is that "remote_artifact" would fail the build if "buck fetch" hadn't already grabbed the file. We want your builds to be as fast as possible, and randomly pausing to meander across the Internet for files doesn't seem to really fit with that goal.

That would be annoying. Every time the project rolls its deps you would need to:

  git pull
  if ! buck build; then buck fetch && buck build; fi

arrgh. That is annoying.

Gerrit has been working with Buck auto-downloading missing remote artifacts during the build ever since our conversion to Buck. It works as well as Buck caching does pulling prebuilt rules from a cache server. So long as you are well connected and the work is done in parallel, it's fast.

Maven remote resources suck because Maven searches all over the Internet for something at many different URLs, and it does so without any threading. Eliminate the searching part by forcing the remote_artifact() to say exactly one URL to attempt, and fix the parallelization by using Buck's native rule parallelization.



Simon Stewart

Aug 14, 2014, 10:21:13 AM
to Shawn Pearce, David Ostrovsky, buck-build
Inline

On Wed, Aug 13, 2014 at 3:57 PM, Shawn Pearce <s...@google.com> wrote:
On Wed, Aug 13, 2014 at 6:38 AM, Simon Stewart <simon.m...@gmail.com> wrote:
Hi,

This post was pointed out to me recently. Sorry for not replying before.

One of the options we've discussed on the team before has been the addition of a "buck fetch" command that would walk the Target Graph and download "stuff" identified by "remote_artifact" targets to the local disk. This initially came up as a mechanism for doing something similar to what shaven-maven[1] does. We'd include an md5 as part of the rule too.

Use SHA1 not MD5. Most sites like search.maven.org export the SHA1 right now.

Even better. We use SHA1 throughout, but the last time I was trying to download random stuff off the Net, MD5 appeared to be the preferred way of doing things. Glad that's changed :)
 
If Buck wants an MD5-shaped hash internally, rebuild the hash yourself after you have fetched the payload, or (*shudder*) hash the SHA1 using MD5...

Ha. Let's not do that.
 
When "buck build" was run, you'd be able to treat a "remote_artifact" in much the same way you'd treat "export_file".

Nice, but a super common way to use these is to have them be a prebuilt_jar(). An easy way to also say this is a JAR would be great:

  remote_artifact(
    name = "guava",
    mvn = "com.google.guava:guava:jar:18.0-rc1",
    sha1 = "9bd0d5bc8a4269bb2b5584d5498e281633c677eb",
    export_jar = True,
)

Or some such, with remote_artifact accepting either url or mvn to identify the source on the network and export_jar being optionally set to export the result as a prebuilt_jar(), without having to write another prebuilt_jar() target. 

I can imagine each language having something similar (Python would be the obvious one, but anything that has a centralized distribution point for grabbing packages or their equivalents would run into the same issue. Ruby? NodeJS?). It may be easier, if we were to do this, to allow a "remote_jar" target or perhaps add a "url" and "sha1" field to prebuilt_jar.
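
For the Java case, that variant might read roughly as follows; neither field exists on
prebuilt_jar today, so this is only a sketch of the proposed shape (the URL follows the
standard Maven Central layout for the coordinate quoted above, and is illustrative):

  prebuilt_jar(
    name = "guava",
    url = "http://repo1.maven.org/maven2/com/google/guava/guava/18.0-rc1/guava-18.0-rc1.jar",
    sha1 = "9bd0d5bc8a4269bb2b5584d5498e281633c677eb",
  )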
 
The important thing here is that "remote_artifact" would fail the build if "buck fetch" hadn't already grabbed the file. We want your builds to be as fast as possible, and randomly pausing to meander across the Internet for files doesn't seem to really fit with that goal.

That would be annoying. Every time the project rolls its deps you would need to:

  git pull
  if ! buck build; then buck fetch && buck build; fi

arrgh. That is annoying.

I guess I do more disconnected work than most. My general habit before a long flight is to do a git pull and nothing else. A separate "fetch" command would mean that I'd be able to build anything in the target graph, not just what I happened to touch with a build command that may have failed.

My experience with central points holding deps hosted by a company is that the central server has an uptime that can be (generously speaking) erratic. I'd much rather download things ahead of time than mid-build in that case. Maybe things have improved since the last time I was in The Real World? That way, a CI server can highlight that the CI target's failure was caused by third-party infra choking, and not because someone has checked in something that's causing "buck build" to choke.

A possible compromise would be to have both options, and allow mid-build downloads to be a failure state via a config option.
 
Gerrit has been working with Buck auto-downloading missing remote artifacts during the build ever since our conversion to Buck. It works as well as Buck caching does pulling prebuilt rules from a cache server. So long as you are well connected and the work is done in parallel, it's fast.

Interesting data point.
 
Maven remote resources suck because Maven searches all over the Internet for something at many different URLs, and it does so without any threading. Eliminate the searching part by forcing the remote_artifact() to say exactly one URL to attempt, and fix the parallelization by using Buck's native rule parallelization.

That's what shaven-maven does. Transitive dependency resolution and only specifying one URL might help.
 
Simon

Shawn Pearce

Aug 14, 2014, 11:07:50 AM
to Simon Stewart, David Ostrovsky, buck-build
On Thu, Aug 14, 2014 at 7:21 AM, Simon Stewart <simon.m...@gmail.com> wrote:
On Wed, Aug 13, 2014 at 3:57 PM, Shawn Pearce <s...@google.com> wrote:
On Wed, Aug 13, 2014 at 6:38 AM, Simon Stewart <simon.m...@gmail.com> wrote:
Hi,

This post was pointed out to me recently. Sorry for not replying before.

One of the options we've discussed on the team before has been the addition of a "buck fetch" command that would walk the Target Graph and download "stuff" identified by "remote_artifact" targets to the local disk. This initially came up as a mechanism for doing something similar to what shaven-maven[1] does. We'd include an md5 as part of the rule too.

Use SHA1 not MD5. Most sites like search.maven.org export the SHA1 right now.

Even better. We use SHA1 throughout, but the last time I was trying to download random stuff off the Net, MD5 appeared to be the preferred way of doing things. Glad that's changed :)
 
If Buck wants an MD5-shaped hash internally, rebuild the hash yourself after you have fetched the payload, or (*shudder*) hash the SHA1 using MD5...

Ha. Let's not do that.
 
When "buck build" was run, you'd be able to treat a "remote_artifact" in much the same way you'd treat "export_file".

Nice, but a super common way to use these is to have them be a prebuilt_jar(). An easy way to also say this is a JAR would be great:

  remote_artifact(
    name = "guava",
    mvn = "com.google.guava:guava:jar:18.0-rc1",
    sha1 = "9bd0d5bc8a4269bb2b5584d5498e281633c677eb",
    export_jar = True,
)

Or some such, with remote_artifact accepting either url or mvn to identify the source on the network and export_jar being optionally set to export the result as a prebuilt_jar(), without having to write another prebuilt_jar() target. 

I can imagine each language having something similar (Python would be the obvious one, but anything that has a centralized distribution point for grabbing packages or their equivalents would run into the same issue. Ruby? NodeJS?). It may be easier, if we were to do this, to allow a "remote_jar" target or perhaps add a "url" and "sha1" field to prebuilt_jar.

Agreed, it would be handy on prebuilt_jar(), but we also download .zips of JS, so it would be handy to also have a generic remote_artifact() rule.
 
The important thing here is that "remote_artifact" would fail the build if "buck fetch" hadn't already grabbed the file. We want your builds to be as fast as possible, and randomly pausing to meander across the Internet for files doesn't seem to really fit with that goal.

That would be annoying. Every time the project rolls its deps you would need to:

  git pull
  if ! buck build; then buck fetch && buck build; fi

arrgh. That is annoying.

I guess I do more disconnected work than most. My general habit before a long flight is to do a git pull and nothing else. A separate "fetch" command would mean that I'd be able to build anything in the target graph, not just what I happened to touch with a build command that may have failed.

I am not arguing against a fetch command. For all the reasons you state, a fetch command is useful; `git pull && buck fetch` *slam lid, board plane* is a super useful workflow.

I was trying to suggest that we allow incomplete or missing fetches to run during the build itself, so the user doesn't always have to run fetch first.

My experience with central points holding deps hosted by a company is that the central server has an uptime that can be (generously speaking) erratic. I'd much rather download things ahead of time than mid-build in that case. Maybe things have improved since the last time I was in The Real World? That way, a CI server can highlight that the CI target's failure was caused by third-party infra choking, and not because someone has checked in something that's causing "buck build" to choke.

A possible compromise would be to have both options, and allow mid-build downloads to be a failure state via a config option.
 
Gerrit has been working with Buck auto-downloading missing remote artifacts during the build ever since our conversion to Buck. It works as well as Buck caching does pulling prebuilt rules from a cache server. So long as you are well connected and the work is done in parallel, it's fast.

Interesting data point.
 
Maven remote resources suck because Maven searches all over the Internet for something at many different URLs, and it does so without any threading. Eliminate the searching part by forcing the remote_artifact() to say exactly one URL to attempt, and fix the parallelization by using Buck's native rule parallelization.

That's what shaven-maven does. Transitive dependency resolution and only specifying one URL might help.

We don't do transitive dependency resolution in Gerrit. We chase the dep graph and rewrite it in BUCK files. This improves parallelism significantly during download because you don't have to wait to load a *.pom before you can start loading a library's deps.

If you want to support auto-discovery of Maven transitive dependencies, I would suggest doing this as a subcommand that spits out BUCK build rules you append to a build file. Don't do it magically in the build.
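
Purely for illustration, the emitted rules might look like the following, with the
transitive deps already flattened into explicit targets; the rule shape, coordinates,
and checksums are placeholders, not real tool output:

  remote_artifact(
    name = "commons-dbcp",
    mvn = "commons-dbcp:commons-dbcp:jar:1.4",
    sha1 = "<sha1 reported by the repository>",
    export_jar = True,
    deps = [":commons-pool"],
  )

  remote_artifact(
    name = "commons-pool",
    mvn = "commons-pool:commons-pool:jar:1.5.4",
    sha1 = "<sha1 reported by the repository>",
    export_jar = True,
  )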

Simon Stewart

Aug 14, 2014, 11:55:39 AM
to Shawn Pearce, David Ostrovsky, buck-build
Pruning this a little, like we did back in the 90s. :)

On Thu, Aug 14, 2014 at 4:07 PM, Shawn Pearce <s...@google.com> wrote:
On Thu, Aug 14, 2014 at 7:21 AM, Simon Stewart <simon.m...@gmail.com> wrote:
On Wed, Aug 13, 2014 at 3:57 PM, Shawn Pearce <s...@google.com> wrote:
On Wed, Aug 13, 2014 at 6:38 AM, Simon Stewart <simon.m...@gmail.com> wrote:
 
Agreed, it would be handy on prebuilt_jar(), but we also download .zips of JS, so it would be handy to also have a generic remote_artifact() rule.

Noted. Sounds like a plan to me.

I am not arguing against a fetch command. For all the reasons you state, a fetch command is useful; `git pull && buck fetch` *slam lid, board plane* is a super useful workflow.

I was trying to suggest that we allow incomplete or missing fetches to run during the build itself, so the user doesn't always have to run fetch first.

That's why I later suggested optionally allowing this, but putting in a flag to stop the build if an artifact was missing. I think we're on the same page here.
 
If you want to support auto-discovery of Maven transitive dependencies, I would suggest doing this as a subcommand that spits out BUCK build rules you append to a build file. Don't do it magically in the build.

Absolutely 100% agreed. If there was some way to be more than 100%, I'd be that too.

Simon

david.o...@gmail.com

Aug 18, 2014, 6:55:55 AM
to buck-...@googlegroups.com, Shawn Pearce

On Wednesday, August 13, 2014 at 3:38:58 PM UTC+2, Simon Stewart wrote:
Hi,

This post was pointed out to me recently. Sorry for not replying before.

One of the options we've discussed on the team before has been the addition of a "buck fetch" command that would walk the Target Graph and download "stuff" identified by "remote_artifact" targets to the local disk. This initially came up as a mechanism for doing something similar to what shaven-maven[1] does. We'd include an md5 as part of the rule too.

When "buck build" was run, you'd be able to treat a "remote_artifact" in much the same way you'd treat "export_file".

As much as I can see the value of native support for remote_jar() and remote_artifact() rules in Buck,
it doesn't solve my problem. The question was: how can we extend Buck's rule set and make it pluggable
and reusable, so that new recipes can be easily deployed, found, and used, with all of this natively supported in Buck?

Currently Buck provides only basic building blocks. To become a universal build system it must be extensible.
Recently one domain-specific rule (for GWT libraries), gwt_binary(), was added natively to Buck, so it can be used à la:

  gwt_binary(
    name = 'ui_gerrit',
    modules = [MODULE],
    style = 'OBF',
    optimize = 9,
    module_deps = [':ui_module'],
    deps = ['//lib/gwt:dev'],
    local_workers = cpu_count(),
    strict = True,
    experimental_args = GWT_COMPILER_ARGS,
    vm_args = GWT_JVM_ARGS,
 )

How can Buck be extended in such a way that such rules can be added without extending the Buck core,
and not only added, but also published and discovered by end users?
The idea is to be able to define gwt_binary.bucklet, with, say:

define_bucklet(group="facebook", name="gwt_binary", version="1.0")
def gwt_binary(...):
  [...]

And the consumer can include it in a natively supported .bucklets file, e.g.

[providers]

[bucklets]
  facebook:gwt_binary:1.0

And then it can be used as if this rule were natively built in.

The implementation could implicitly fetch (cache aware, obviously) the referenced bucklet from the provided
location into, say, a $HOME/.buck/bucklets directory, and "mount" this directory so that it just works.

To be a Swiss Army Knife build system, Buck should be extensible in a reusable way.
Published/deployed recipes must be easily discoverable. Always patching the core or copying/pasting (or including)
recipes between projects/repositories doesn't scale.

Iain Merrick

Sep 5, 2014, 10:54:37 AM
to buck-...@googlegroups.com, s...@google.com
On Monday, 18 August 2014 11:55:55 UTC+1, david.o...@gmail.com wrote:
[...]

As much as I can see the value of native support for remote_jar() and remote_artifact() rules in Buck,
it doesn't solve my problem. The question was: how can we extend Buck's rule set and make it pluggable
and reusable, so that new recipes can be easily deployed, found, and used, with all of this natively supported in Buck?

There are two separate issues here. The first one is code reuse:
 
Currently Buck provides only basic building blocks. To become a universal build system it must be extensible.
Recently one domain-specific rule (for GWT libraries), gwt_binary(), was added natively to Buck, so it can be used à la:

  gwt_binary(
    name = 'ui_gerrit',
    modules = [MODULE],
    style = 'OBF',
    optimize = 9,
    module_deps = [':ui_module'],
    deps = ['//lib/gwt:dev'],
    local_workers = cpu_count(),
    strict = True,
    experimental_args = GWT_COMPILER_ARGS,
    vm_args = GWT_JVM_ARGS,
 )

How can Buck be extended in such a way that such rules can be added without extending the Buck core,

You can do this by wrapping a genrule() in some custom Python code, like this:

def gwt_binary(name, modules, ...[etc]):
    genrule(name=name, cmd=[complicated shell script])

Put that in a common build file (I call mine DEFS), and use include_defs() to import it.

It's not perfect because there's no namespace management, but it solves the basic code reuse problem within a single project.
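
To make the reuse step concrete, a consuming build file would then look roughly like this
(the DEFS path and the rule arguments are illustrative, not from a real project):

# BUCK file elsewhere in the same project; //tools/DEFS is an assumed location.
include_defs('//tools/DEFS')

gwt_binary(
  name = 'ui',
  modules = ['com.example.client.Ui'],  # made-up GWT module name
)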

The second issue is how to share that code among multiple projects:

and not only added, but also published and discovered by end users?
The idea is to be able to define gwt_binary.bucklet, with, say:

define_bucklet(group="facebook", name="gwt_binary", version="1.0")
def gwt_binary(...):
  [...]

And the consumer can include it in a natively supported .bucklets file, e.g.

[providers]

[bucklets]
  facebook:gwt_binary:1.0

And then it can be used as if this rule were natively built in.

The implementation could implicitly fetch (cache aware, obviously) the referenced bucklet from the provided
location into, say, a $HOME/.buck/bucklets directory, and "mount" this directory so that it just works.

To be a Swiss Army Knife build system, Buck should be extensible in a reusable way.
Published/deployed recipes must be easily discoverable. Always patching the core or copying/pasting (or including)
recipes between projects/repositories doesn't scale.

I don't think Buck needs to solve this problem at all. If it includes a way to download additional stuff from the network, won't it end up fighting the source control system? I'll have to put a bunch of extra stuff into .gitignore, whereas right now I only need to ignore "buck-out" and a couple of other top-level files. I think it would be bad for Buck to go down the Maven route of magically downloading stuff during a build, as that's one of the reasons many people dislike Maven.

How do you share Java code across projects right now? Why not just use the same method to share custom Buck build defs? (I use git submodules. Other people don't and that's great. Choice is good!)

If Buck *does* try to solve this problem, it should be kept separate from the build defs reuse problem. For dependency management, it should use the same mechanism for everything -- build defs, libraries, source code. There's no need for build defs to be handled differently from anything else.