Controlling import prefixes in proto_library

3,280 views
Skip to first unread message

Lukács T. Berki

unread,
Nov 27, 2018, 7:14:02 AM11/27/18
to bazel-...@googlegroups.com, Dmitry Lomov, Josh Haberman, Jay Conrod, Dmitry Babkin, jmil...@stripe.com
Hey there,

I'm planning to add two attributes to proto_library:
  • strip_import_prefix: if specified, this path is removed from the import paths of the .proto files in the proto_library, e.g. "a/b/c.proto" can be included as "b/c.proto" if strip_import_prefix is set to "a".
  • import_prefix: if specified, this path is prepended to the import paths of the .proto files in the proto_library, e.g. "x/y.proto" can be included as "z/x/y.proto" if import_prefix is set to "z".
  • If both are set, strip_import_prefix is applied first and then import_prefix.
All this mirrors include_prefix / strip_include_prefix in cc_library except that strip_import_prefix will default to the package name if import_prefix is specified. This is to support the common use case of having a proto_library in a package and wanting to tell which directory it should be imported under.


And a Bazel source tree is available for experimenting here: https://github.com/lberki/bazel/

This will fix #3867 if all goes well. I'm hoping to land this this week so that it gets into Bazel 0.21, so if you see any horrible flaws in the idea, speak now or keep your peace forever.

--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

John Millikin

unread,
Nov 27, 2018, 1:16:29 PM11/27/18
to lbe...@google.com, bazel-...@googlegroups.com, dsl...@google.com, habe...@google.com, jayc...@google.com, dba...@google.com
All this mirrors include_prefix / strip_include_prefix in cc_library except that strip_import_prefix will default to the package name if import_prefix is specified. This is to support the common use case of having a proto_library in a package and wanting to tell which directory it should be imported under.

Does this mean if I have a file `myservice/proto/main.proto` and I give its target `import_prefix = "github.com/company/team"`, that the resulting proto would be importable at (a) `github.com/company/team/main.proto` or (b) `github.com/company/team/myservice/proto/main.proto` ?

If (a), that would be pretty surprising behavior and contrary to what I want in ~all cases.

Given that `package_name()` exists and can be used from BUILD files, it seems like the better approach would be to not change the default behavior of strip_import_prefix, and ask users to explicitly write something like this if they want that behavior:

```
proto_library(
  name = "foo_proto",
  srcs = ["foo.proto"],
  import_prefix = "completely/different/prefix",
  strip_import_prefix = package_name(),
)
```

Lukács T. Berki

unread,
Nov 27, 2018, 1:49:26 PM11/27/18
to John Millikin, bazel-...@googlegroups.com, Dmitry Lomov, Josh Haberman, Jay Conrod, Dmitry Babkin
On Tue, Nov 27, 2018 at 7:16 PM, John Millikin <jmil...@stripe.com> wrote:
All this mirrors include_prefix / strip_include_prefix in cc_library except that strip_import_prefix will default to the package name if import_prefix is specified. This is to support the common use case of having a proto_library in a package and wanting to tell which directory it should be imported under.

Does this mean if I have a file `myservice/proto/main.proto` and I give its target `import_prefix = "github.com/company/team"`, that the resulting proto would be importable at (a) `github.com/company/team/main.proto` or (b) `github.com/company/team/myservice/proto/main.proto` ?

If (a), that would be pretty surprising behavior and contrary to what I want in ~all cases.
Depends on where your BUILD file is. If it's myservice/proto/BUILD , you'd get (a). If it's in the grandparent directory, you'd get (b). Dmitry argued that (a) is more intuitive and frankly, I don't have enough data to decide and I valued landing this more than being consistent for cc_library... (one could argue that that's not how API design should work...)

I'll let him argue his case :)


Given that `package_name()` exists and can be used from BUILD files, it seems like the better approach would be to not change the default behavior of strip_import_prefix, and ask users to explicitly write something like this if they want that behavior: 

```
proto_library(
  name = "foo_proto",
  srcs = ["foo.proto"],
  import_prefix = "completely/different/prefix",
  strip_import_prefix = package_name(),
)
```

On Tue, Nov 27, 2018 at 4:14 AM Lukács T. Berki <lbe...@google.com> wrote:
Hey there,

I'm planning to add two attributes to proto_library:
  • strip_import_prefix: if specified, this path is removed from the import paths of the .proto files in the proto_library, e.g. "a/b/c.proto" can be included as "b/c.proto" if strip_import_prefix is set to "a".
  • import_prefix: if specified, this path is prepended to the import paths of the .proto files in the proto_library, e.g. "x/y.proto" can be included as "z/x/y.proto" if import_prefix is set to "z".
  • If both are set, strip_import_prefix is applied first and then import_prefix.
All this mirrors include_prefix / strip_include_prefix in cc_library except that strip_import_prefix will default to the package name if import_prefix is specified. This is to support the common use case of having a proto_library in a package and wanting to tell which directory it should be imported under.


And a Bazel source tree is available for experimenting here: https://github.com/lberki/bazel/

This will fix #3867 if all goes well. I'm hoping to land this this week so that it gets into Bazel 0.21, so if you see any horrible flaws in the idea, speak now or keep your peace forever.

--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | GermanyGeschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

Dmitry Lomov

unread,
Nov 27, 2018, 1:52:34 PM11/27/18
to Lukács T. Berki, John Millikin, bazel-...@googlegroups.com, habe...@google.com, Jay Conrod, Dmitry Babkin
I would like to understand:

1. what is our recommended layout for .proto files and their respective proto_library targets and BUILD files?
2. what are the most common layouts in the wild that people want supported?

On both counts, we could use community feedback.

This calls for a design doc (I feel this conversation is now scattered across no less that 3 different threads, some of which are internal, even though the feature affects exclusively external users)
--
Google Germany GmbH
Erika-Mann-Straße 33, 80636 München, Germany

John Millikin

unread,
Nov 27, 2018, 3:01:00 PM11/27/18
to dsl...@google.com, lbe...@google.com, bazel-...@googlegroups.com, habe...@google.com, jayc...@google.com, dba...@google.com
For repos with native Bazel builds, the most common layout is for BUILD files to be placed in the same directory as their .proto sources. Typically it's one target per .proto file, but I sometimes see targets where a single "logical .proto" has been split into proto2 and proto3 syntaxes because proto3 can't define extensions.

Sometimes .proto files are mixed in with the implementations (my_project/main.cc and my_project/service.proto), sometimes they're in proto/ directories (my_project/main.cc and my_project/proto/service.proto). This seems to be driven by cultures around the main implementation language (e.g. Python vs Go vs C++).

For non-native Bazel projects, the standard approach is a single BUILD.bazel file at the root of the external workspace defined by new_http_repository.

---

Note that it's easy to support both models if `strip_import_prefix` is explicit, but if it's implicit then the "native Bazel build" model becomes more difficult.

I think we should optimize Bazel for projects / repos that have native Bazel builds, and accept a small usability tradeoff when adapting repos build with other tooling.

Dmitry Lomov

unread,
Nov 28, 2018, 12:42:58 AM11/28/18
to John Millikin, Lukács T. Berki, bazel-...@googlegroups.com, habe...@google.com, Jay Conrod, Dmitry Babkin
That is useful, thanks.
What are the import paths in both of those situations?

Lukács T. Berki

unread,
Nov 28, 2018, 7:56:51 AM11/28/18
to Dmitry Lomov, John Millikin, bazel-...@googlegroups.com, Josh Haberman, Jay Conrod, Dmitry Babkin
To recap an different thread: Josh prefers to

1. define a prescriptive convention that works well across languages, is well-supported by the tooling, and is recommended for new projects. 
2. support the more flexible attributes also, with caveats attached.

@Josh: do you have an idea of what that convention would be? As familiar as I am with Bazel, my experience with protobuf is pretty limited so I can't offer much wisdom here.

In addition to this, at least Josh and Jay and me agree that it should be the responsibility LANG_proto_library rules to make sure that they handle whatever combination of attributes proto_library itself has. However, proto_library should provide tools for the language-specific libraries to do so. FWIW, my longer-term plan was to have a nested set of (source root exec path, descriptor set, set of .proto files) for every proto_library in the transitive closure, but I *think* the current somewhat-clunky .proto. provider should do for the time being.

As for proto_source_root, I was planning to deprecate it in favor of strip_import_prefix so that proto_library is consistent with cc_library. I don't have a strong opinion, though. proto_source_root is pretty limited now: it can only be empty or the name of the package proto_library is in, so at least it's not hard to make sure that users of it can be migrated...

So far, there are no hard arguments against strip_import_prefix / import_prefix as proposed by me but it feels a bit like they allow too much freedom.

John Millikin

unread,
Nov 28, 2018, 12:32:25 PM11/28/18
to dsl...@google.com, lbe...@google.com, bazel-...@googlegroups.com, habe...@google.com, jayc...@google.com, dba...@google.com
On Tue, Nov 27, 2018 at 9:42 PM Dmitry Lomov <dsl...@google.com> wrote:
That is useful, thanks.
What are the import paths in both of those situations?
 
Language-specific, matching conventions of library imports for whatever the "main language" is. Protobufs written by Go devs have URL-ish prefixes:


Protobufs in other languages tend to follow Google's example, where the first couple components are abstract namespaces:

import "company/project/proto/whatever.proto";

I've also seen some degenerate cases where there is no prefix (thankfully this is rare):

import "whatever.proto";


Josh Haberman

unread,
Nov 28, 2018, 2:03:55 PM11/28/18
to jmil...@stripe.com, Dmitry Lomov, Lukács T. Berki, bazel-...@googlegroups.com, Jay Conrod, Dmitry Babkin
On Wed, Nov 28, 2018 at 9:32 AM John Millikin <jmil...@stripe.com> wrote:
On Tue, Nov 27, 2018 at 9:42 PM Dmitry Lomov <dsl...@google.com> wrote:
That is useful, thanks.
What are the import paths in both of those situations?
 
Language-specific, matching conventions of library imports for whatever the "main language" is. Protobufs written by Go devs have URL-ish prefixes:


Protobufs in other languages tend to follow Google's example, where the first couple components are abstract namespaces:

import "company/project/proto/whatever.proto";

This is what I mean wrt. conventions. If I see a github.com-based import like the above, I will wonder if the tooling considers that significant, either at the protoc stage or at the Go stage. I will wonder if I need to use imports like that for my protos to be usable by Go.

Protobuf schemas are their own language and should follow their own conventions. Those conventions are designed to map well into all languages. A good example of this is field names. Field names in protobufs are named using snake_case. The protobuf compiler knows how to map this to each language's convention. If we generate code for Java, a field called foo_bar_baz will become getFooBarBaz()/setFooBarBaz().

If Java devs started writing their protos using Java conventions, using field names like fooBarBaz, the .proto file might look more friendly and familiar to Java developers. But for C++, all field names are lower-cased, so fooBarBaz would generate set_foobarbaz(), which isn't very readable.

For this reason I want to gently push back on the notion of a "main language" for a .proto file. Protobuf files should file protobuf conventions, so idiomatic code can be generated for any language. Starting imports with github.com isn't an established protobuf convention.

Are the github.com imports here just an aesthetic convention, or are they providing some functional benefit? If it's the latter, I'd like to explore how we can get these github.com imports into the Go generated code without having to put them in the actual .proto import lines. Does the go_package option contain the information we need to do this?

John Millikin

unread,
Nov 28, 2018, 2:26:09 PM11/28/18
to habe...@google.com, dsl...@google.com, lbe...@google.com, bazel-...@googlegroups.com, jayc...@google.com, dba...@google.com
The proto package import paths have no impact on how the generated code is imported by the target language. I'm just describing how people's habits from the implementation language will transfer to their Protobuf schemas.

You'll find that the idea of "protobuf is language-independent" is not widespread outside of Google. Even multiple Google projects assume their .proto will only ever be used by a single language, for example Kubernetes protos depend on extensions defined by a third-party Go protoc plugin.

If you want Protobuf schemas to start following a particular convention then I think that's an admirable goal, but you should not try to enforce those future conventions in today's build system.

Josh Haberman

unread,
Nov 28, 2018, 2:57:09 PM11/28/18
to jmil...@stripe.com, Dmitry Lomov, Lukács T. Berki, bazel-...@googlegroups.com, Jay Conrod, Dmitry Babkin
On Wed, Nov 28, 2018 at 11:26 AM John Millikin <jmil...@stripe.com> wrote:
If you want Protobuf schemas to start following a particular convention then I think that's an admirable goal, but you should not try to enforce those future conventions in today's build system.

I agree with that in principle, and this mirrors my earlier words that Lucacs quoted above.

One thing I am trying to understand is why the Go conventions require special build system support. Even without new Bazel attributes, imports like "github.com/foo/bar.proto" will work fine if there is actually a directory called "github.com/foo". Why is import_prefix needed? I am not understanding the background of https://github.com/bazelbuild/bazel/issues/3867.

John Millikin

unread,
Nov 28, 2018, 3:12:58 PM11/28/18
to habe...@google.com, dsl...@google.com, lbe...@google.com, bazel-...@googlegroups.com, jayc...@google.com, dba...@google.com
Lets say that github.com/company/project contains these files:

Makefile
main.go
proto/
  types.proto
  service.proto

service.proto contains this Go-style import:


Today, this repo can't be built as-is with Bazel, because Bazel assumes the import path for all protos is workspace relative. It wants:

import "proto/types.proto";

The most notable repo this affects is Kubernetes. For example, in https://github.com/kubernetes/api/blob/master/batch/v1/generated.proto :


So what we'd like to do for Bazel+Go is let its BUILD file generator take "k8s.io" as an option, and have the generated proto_library targets contain that import prefix in addition to their repo-relative path.

proto_library(..., import_prefix = "k8s.io/api")

Jay Conrod

unread,
Nov 28, 2018, 3:27:51 PM11/28/18
to jmil...@stripe.com, Josh Haberman, Dmitry Lomov, Lukács T. Berki, bazel-...@googlegroups.com, Dmitry Babkin
> One thing I am trying to understand is why the Go conventions require special build system support.

The standard Go build system doesn't have any support for protoc. Developers typically add support using some combination of make and bash scripts which are run ahead of time (generated files are typically checked in). Since this is all ad hoc, developers define their own conventions.

Having the directory path match the Go import path is convenient for Go's GOPATH (where packages are stored in directories that match their import paths) and vendor directories (same, in a project subdirectory named "vendor"). For example, if you've copied part of Kubernetes into vendor/k8s.io/api, it's easy to just add -Ivendor on the protoc command line. Additional projects can be copied into vendor with no extra fuss.

Lukács T. Berki

unread,
Nov 29, 2018, 4:49:21 AM11/29/18
to Jay Conrod, John Millikin, Josh Haberman, Dmitry Lomov, bazel-...@googlegroups.com, Dmitry Babkin
It seems that we agree that proto_library should support the status quo of protocol buffers and not the platonic ideal of their usage. Good!

I can see three use cases:
  1. A project has .proto files (e.g. foo/bar.proto) and it wants them to be imported with a prefix "github.com/project/foo/bar.proto"). This is the most salient in Go.
  2. A project is vendored into someone else's source tree e.g. at "vendor/foo" and BUILD files are written for them because they don't (yet, hopefully!) have any. The "vendor/foo" parts shouldn't be required in import statements.
  3. A project and its BUILD files are vendored into someone else's source tree
(1) calls for adding a prefix to import paths (i.e. import_prefix) (2) calls for stripping a prefix from import paths (i.e. strip_import_prefix). (3) Just Works as long as the proto_library rules have either a package-relative strip_import_prefix or an import_prefix attribute.

Unfortunately, we can't make (3) work without breaking existing uses of proto_library because if Bazel sees a proto_library it can't tell if it's a vendored one or not and if it's a vendored one, which prefix to strip. If one wants to prepare their source tree for being vendored, they can always say "import_prefix=<package name>" and let strip_import_prefix default to the package name as per Dmitry Lomov's proposal or we can make it so that strip_import_prefix="" mean "strip the package path" and require them to say "proto_library(strip_import_prefix='' import_prefix=<package name>"). 



John Millikin

unread,
Nov 29, 2018, 12:30:48 PM11/29/18
to lbe...@google.com, jayc...@google.com, habe...@google.com, dsl...@google.com, bazel-...@googlegroups.com, dba...@google.com
For (3), we've always used a local_repository to make the vendored workspace be treated as an external dependency with "standard" package resolution rules. This means that vendored BUILD files don't need to be modified, and any .proto import paths are resolved relative to the vendored subdir rather than the top-level workspace.

Lukács T. Berki

unread,
Nov 30, 2018, 4:17:39 AM11/30/18
to John Millikin, Jay Conrod, Josh Haberman, Dmitry Lomov, bazel-...@googlegroups.com, Dmitry Babkin
On Thu, Nov 29, 2018 at 6:30 PM, John Millikin <jmil...@stripe.com> wrote:
For (3), we've always used a local_repository to make the vendored workspace be treated as an external dependency with "standard" package resolution rules. This means that vendored BUILD files don't need to be modified, and any .proto import paths are resolved relative to the vendored subdir rather than the top-level workspace.
...aaand, this is how this dovetails with the "how should vendoring be done"  question with Bazel. Not that we have a good policy on that. In fact the source tree of Bazel does use both the "copy the sources" and the "use new_local_repository" approaches. 

Tony Aiuto

unread,
Nov 30, 2018, 12:59:00 PM11/30/18
to Lukács T. Berki, jmil...@stripe.com, Jay Conrod, Josh Haberman, Dmitry Lomov, bazel-...@googlegroups.com, Dmitry Babkin
 "how should vendoring be done"  .... you had my curiosity, now you have my attention.

I can offer some insight from being a long time owner of some third party C++ code vendored into Google.
Vendored C++ code generally has these pain points, but I am sure there are analogies in rule sets.
  • #include paths are written in a variety of styles
    • relative to the top of the source tree
    • relative to the directory of the source file
    • relative to the place where the *installed* .h files go
  • configure creates config.h at bootstrap compile
    • the rest of the code #inclues as "config.h"
    • the fun happens when there are two config.h's, each reached by a different -I directives, but only one can be the winner
  • configure does not support cross platform well
    • really hard to generate config.h for 32 bit CPU when your desktop is 64bit
    • or for darwin when your desktop OS is linux
I ended up as an advocate of "extreme import".
  • I pull in fresh sources.
  • I (mechanically) change all of the #include statements from relative paths to absolute ones
  • If there are extra patches which are needed I examine each and reapply it.
    • Noe hand edits. Patches must be scripts or diffs and checked in.
This sounds invasive. Yes it is. But it gets absolute predictability at a small cost if you maintain
a discipline about the importing.



--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/CAOu%2B0LVXUFb7e15YGLNmxgARxKUBVvzkWd-16A755JBa%2B%2Bt55A%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Lukács T. Berki

unread,
Dec 3, 2018, 5:15:39 AM12/3/18
to Tony Aiuto, John Millikin, Jay Conrod, Josh Haberman, Dmitry Lomov, bazel-discuss, Dmitry Babkin
I think this is perfectly reasonable for C++ in google3. However, proto imports are a bit less of a wild west than C++ includes so we hopefully don't have to contend with the config.h silliness. In exchange, however, we also have to deal with Bazel where mandating import statements to be rewritten isn't a nice thing to do.

I was hoping that we can get to the point where it's obvious from a BUILD file what .proto files are available at what import path (this *is* the case with the above proposal if import_prefix is specified and strip_import_prefix gets a value that's not an ancestor of the package directory) and then suddenly don't have this problem at all when vendoring protocol buffers along with their BUILD statements.

To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discuss+unsubscribe@googlegroups.com.

Lukács T. Berki

unread,
Dec 4, 2018, 7:43:26 AM12/4/18
to Tony Aiuto, John Millikin, Jay Conrod, Josh Haberman, Dmitry Lomov, bazel-discuss, Dmitry Babkin
Update: it just occurred to me that if one is vendoring code, the recommended way is to vendor it as a repository and use local_repository(). This is because references to other packages cannot be relative, so if label //a:b references //c:d and you vendor them under third_party/foo, there is no way to make //third_party/foo/a:b automatically reference //third_party/foo/c:d without changing a/BUILD.

After an in-person discussion with Josh, we decided to stay consistent with cc_library and avoid the magic of "if this attribute is set to this, that attribute defaults to that". That is, if import_prefix is specified without strip_import_prefix, not automatic stripping will be done. I'll update my code accordingly.

We also decided to try to enact a check that makes it so that in any proto_library rule, import paths must not be ambiguous, that is, any particular import path should refer only to one .proto file. We must tread softly there because we currently don't enforce that and i don't know whether anything important will be broken with this extra constraint.

To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.



--
Lukács T. Berki | Software Engineer | lbe...@google.com | 

Google Germany GmbH | Erika-Mann-Str. 33  | 80636 München | Germany | Geschäftsführer: Paul Manicle, Halimah DeLaine Prado | Registergericht und -nummer: Hamburg, HRB 86891

John Millikin

unread,
Dec 4, 2018, 11:06:34 AM12/4/18
to lbe...@google.com, ai...@google.com, jayc...@google.com, habe...@google.com, dsl...@google.com, bazel-...@googlegroups.com, dba...@google.com
Would that non-ambiguity be enforced by Bazel, or by protoc? I agree that it's a reasonable thing to check, but it feels like that behavior belongs in the compiler rather than the build system. Bazel won't warn if a C #include path or Java class import is ambiguous, right?

Lukács T. Berki

unread,
Dec 5, 2018, 3:01:23 AM12/5/18
to John Millikin, Tony Aiuto, Jay Conrod, Josh Haberman, Dmitry Lomov, bazel-discuss, Dmitry Babkin
By Bazel, I hope. I'd also enforce non-ambiguity with C++ includes if that was not a common pattern in C++. Josh says that this is feasible and desired for protocol buffers, and I'd rather start out strict (or quickly migrate to being strict, in this case...) then relax the constraints if desired.

(we also enforce non-ambiguity of Java classes internally)

Josh Haberman

unread,
Dec 5, 2018, 4:09:13 AM12/5/18
to Lukács T. Berki, jmil...@stripe.com, Tony Aiuto, Jay Conrod, Dmitry Lomov, bazel-...@googlegroups.com, Dmitry Babkin
I agree that Bazel would be a better place to check this. Bazel has a complete list of all files that are considered inputs to the protoc invocation, so it can validate this constraint using information it already has in memory. It would be theoretically possible for protoc to validate this constraint by attempting to open an import in every include path, even after it's already found the file it is looking for. However this would imply extra filesystem operations (O(N*M), where N=imports and M=include paths), and it seems outside the realm of what we would generally expect a compiler to do.

I also agree with the principle of being more strict now, and relaxing later if necessary.

John Millikin

unread,
Dec 5, 2018, 12:12:36 PM12/5/18
to habe...@google.com, lbe...@google.com, ai...@google.com, jayc...@google.com, dsl...@google.com, bazel-...@googlegroups.com, dba...@google.com
protoc also has the ability to check that constraint without any filesystem IO, doesn't it? No need to go open every file, you can check for duplicate import paths using the content of argv.

For example, a typical protoc invocation from our internal go_proto_library rule is:

protoc \
  -Igoogle/protobuf/any.proto=external/com_google_protobuf/src/google/protobuf/any.proto \
  -Igoogle/rpc/status.proto=external/com_google_googleapis/google/rpc/status.proto \
  -Ik8s.io/api/core/v1/generated.proto=external/com_github_kubernetes_api/core/v1/generated.proto \
[...]

If further down the argv there was a second assignment to `google/protobuf/any.proto` then that could be detected without IO at either the protoc or Bazel layer. Doing the check in protoc has an advantage of being consistent across build systems.
Reply all
Reply to author
Forward
0 new messages