Repository layout


rob...@gmail.com

Jul 14, 2016, 10:15:18 AM
to bazel-discuss
Hi all,

Our codebase has ~250 services written in a mix of Java (primarily), Go, JS (using Closure Tools), and I was interested in views on how to best organize it for the purpose of migrating to Bazel, enabling sharing, and controlling dependencies.

Presently we have something like this:

//src/com/corp/$service/... # Java service client and server, ad-hoc organization
//src/com/corp/$service/app/views/... # Closure templates
//src/com/corp/common/... # application-agnostic libraries
//js/... # js code + tests, built using plovr
//gocode/src/corp/... # go code + tests, built using the go tool

Our Java code is built using a custom tool that symlinks relevant Java files into a build tree and runs ant / javac on the whole thing.

Questions:
- It seems like the right organization for a Java service package is to have a clear separation between RPC Client / Client library & the service implementation. Any particular naming conventions? What does Google use?
- What are good guidelines for services' public interface being allowed to depend upon other services? In particular, this happens for us when a service uses protocol buffers defined by another (e.g. there are a bunch of services that operate on the same proto data structure). Is this a possible source of cyclic dependencies?
- Are there any tips for taking a Java codebase and migrating it to bazel / removing cyclic dependencies? The best thing I've come up with is to use dependency visualization tools and Eclipse "Move" refactorings.
- Have people found it to be a good idea to have a single tree of templates to aid sharing between applications and making goog.require('templatepath') work?

Any thoughts (or links to resources)?

Thank you!
Rob

Paul Johnston

Jul 14, 2016, 12:42:05 PM
to bazel-discuss, rob...@gmail.com
I'm in the middle of that process now and, FWIW, can share my experience. The application I am working with has a (mostly) Java backend of about 20 services built with Maven. The frontend is all Closure, implemented as a SPA, and does not mirror the service-based architecture of the backend, so there are no dependencies between the frontend project layout (in src/main/js; the filesystem layout mirrors the component architecture) and the backend (where the filesystem mirrors the service architecture).

Some component-based soy js templates are used in the frontend (mostly just native goog.dom construction though), and a few Soy java templates for batch-based tasks and utility views served from a servlet and for email templating.  Frontend is built and tested with plovr currently.  

Here are the conventions I'm using:

* Retained the Maven directory layout in case the Bazel migration hits a full stop (hasn't happened yet).

* WORKSPACE has a bunch of maven_jar rules.  https://github.com/pgr0ss/bazel-deps has been useful to make that process a little easier (thanks Paul!).  

* //third_party/BUILD targets that group @dep//jar into relevant codependent compile-time or runtime sets.  I've seen 3rdparty used but prefer to have these entries sorted down rather than up by buildifier.

* BUILD file in each service dir with:
  (1) a java_library rule to build the interface, hopefully with as few library dependencies as possible.  Target name matches directory name (e.g. 'sms').
  (2) a java_library rule to build the implementation, with necessary additional dependencies. Target name matches short name for impl (e.g. 'twilio').
  (3) a java_test rule to test the implementation, with additional runtime dependencies.  Target name is '-test' appended to implementation rule name.  

Since I'm not using any file globbing, I have been moving each test source file into the same directory as the code it tests, similar to how one would test a Closure component. I have not done this before with a Java project, but it seems like the right thing to do. It may be the wrong choice. As has been mentioned previously on this list, Google uses the //javatests directory for this.

I'm not using 'exports' effectively nor visibility declarations intelligently at the moment.

Here's an example:


# src/main/java/org/pubref/k8a/service/sms/BUILD
package(default_visibility = ["//visibility:public"])

java_library(
    name = "sms",
    srcs = [
        "SmsListener.java",
        "SmsMessage.java",
        "SmsMessageBuilder.java",
        "SmsService.java",
    ],
    deps = [
        "//src/main/java/org/pubref/app",
    ],
)

java_library(
    name = "twilio",
    srcs = [
        "TwilioSmsService.java",
    ],
    deps = [
        ":sms",
        "//src/main/java/org/pubref/app",
        "//third_party:twilio",
    ],
)

java_test(
    name = "twilio-test",
    size = "small",
    srcs = [
        "TwilioSmsServiceTest.java",
    ],
    deps = [
        ":sms",
        ":twilio",
        "//src/main/java/org/pubref/app",
        "//third_party:junit4",
    ],
    test_class = "org.pubref.k8a.service.sms.TwilioSmsServiceTest",
    runtime_deps = [
        "//third_party:twilio_runtime",
    ],
)

The process has also been one of refactoring, detaching unnecessary dependencies, and simplifying the code. Since Bazel is good about not redoing work and compiles quickly, I start migrating a service to Bazel with a single Java file in the srcs attribute, then keep recompiling, cleaning up, and adding source files until it compiles cleanly. Along the way you'll recognize natural boundaries and prior bad decisions, and fix those. Mark and sweep. And of course test it. With a large codebase of 250 services this could be hard (understatement) unless you've stayed rigid about your dependencies. Not sure there is a magic bullet here.

Some service dependencies are so ubiquitous that they are arguably not services, but core to the application. You can refactor those into a common trunk. I'm guessing that many of your protocol buffer classes are similarly part of the core model of your application, so you'll need either a common trunk for them or a common rule that bundles them together. Note that because the presence of a BUILD file in a directory defines it as a package, it can be tricky to cherry-pick files from a child package in a parent BUILD file.

HTH,
Paul

Rob Figueiredo

Jul 15, 2016, 7:15:51 AM
to Paul Johnston, bazel-discuss
Thank you for sharing! It's good to hear the experience of someone else that has gone through the same thing.

Your strategy sounds good, but I'd like to follow the "1:1:1" rule [link], even though that would mean moving code rather than just creating more fine-grained BUILD files. I'm thinking something like com/corp/$service/service/... (or com/corp/$service/*.java) & com/corp/$service/internal/.... One big benefit is that there are tools to work with java dependencies at the package level (e.g. structure101) that would not be very helpful without doing that.

My migration strategy is to create BUILD files at the top level of each service (rather than one per package) that encompasses all java code within that subtree, so I don't have to eliminate cyclic dependencies within a service - only between services. I suppose I could measure progress by number of cyclic dependencies when things are topologically sorted.

Moving protobufs into a common trunk sounds like it could be a good option!


Justine Tunney

Jul 15, 2016, 10:50:44 AM
to rob...@gmail.com, bazel-discuss
I just want to start by saying that I think Paul has been offering some good solid advice for a project moving away from Maven.

Google uses a little bit of a different directory structure internally. For example, pretty much all .java files are under //java/com/google, //javatests/com/google, and //third_party/java_src for open source projects that have things like Maven directory structures. //javatests has a special meaning in Bazel, where all rules are testonly=True by default. //third_party also has a special meaning, where licence declarations are required.

Source files in languages that aren't Java, pretty much go wherever at Google. Usually something related to the project name (since the whole company shares the same repository.) For example: //domain/registry and //domain/registrar.

On Thu, Jul 14, 2016 at 10:15 AM, <rob...@gmail.com> wrote:
> Our codebase has ~250 services written in a mix of Java (primarily), Go, JS (using Closure Tools), and I was interested in views on how to best organize it for the purpose of migrating to Bazel, enabling sharing, and controlling dependencies.

I hope Closure Rules has been serving your company well. If not, let me know if there's anything I can do.

> Our Java code is built using a custom tool that symlinks relevant Java files into a build tree and runs ant / javac on the whole thing.

You're probably going to be so happy when that isn't necessary anymore.

> - It seems like the right organization for a Java service package is to have a clear separation between RPC Client / Client library & the service implementation. Any particular naming conventions? What does Google use?

You mentioned you're using protocol buffers. Are you also using gRPC? It allows you to define services that are obviously disjoint from their implementations. For example: helloworld.proto. That way, the client only needs to depend on gRPC, the service definitions, and the protos it references (assuming the client code statically links any of their classes.)

Typically at Google, protobuf / service definitions don't go under //java if they're used by multiple languages. If the service is for instance, //java/com/google/doodle then the protobuf might go under //doodle/proto.

For services that don't use protobuf, I commonly see people at google using /impl/ subdirectories.

> - What are good guidelines for services' public interface being allowed to depend upon other services? In particular, this happens for us when a service uses protocol buffers defined by another (e.g. there are a bunch of services that operate on the same proto data structure). Is this a possible source of cyclic dependencies?

Just have your proto_library() rule depend on the other proto_library() rules it needs. I don't see why that would create cycles. Unless there was a cycle between two .proto source files. If not, the solution is usually to add more rules.
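
As a rough sketch of that advice: a shared message can live in its own proto_library that other proto_library rules list in deps. All file and target names here are hypothetical, the //doodle/proto location is borrowed from the example above, and depending on the Bazel version these rules may need to be loaded from the protobuf rules rather than being built in.

```starlark
# doodle/proto/BUILD  (hypothetical file and target names)

# Messages shared by several services.
proto_library(
    name = "common_proto",
    srcs = ["common.proto"],
    visibility = ["//visibility:public"],
)

# A service-specific proto that is assumed to import common.proto.
proto_library(
    name = "doodle_proto",
    srcs = ["doodle.proto"],
    visibility = ["//visibility:public"],
    deps = [":common_proto"],
)

# Java bindings generated from the proto_library above.
java_proto_library(
    name = "doodle_java_proto",
    visibility = ["//visibility:public"],
    deps = [":doodle_proto"],
)
```

The dependency edges between the rules follow the import statements in the .proto files, so a cycle between rules can only appear when the .proto files themselves import each other or when unrelated files are lumped into one rule.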
 
> - Are there any tips for taking a Java codebase and migrating it to bazel / removing cyclic dependencies? The best thing I've come up with is to use dependency visualization tools and Eclipse "Move" refactorings.

When I'm porting Java projects, I usually start off by having a BUILD file in each package, with a single java_library() rule. Then I write a script that determines what gets imported in each package. If I notice cycles between packages, it can usually be solved by having multiple java_library() rules in a single package. If the cycles get too intense, particularly between a package and its subpackages, I might just glob(["**/*.java"]) all the subpackages into the parent package.
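
The two workarounds described above might look roughly like this. The package paths, file names, and the assumption about which classes form the cycle are all hypothetical; this is a sketch of the shape, not a recipe.

```starlark
# java/com/corp/billing/BUILD  (hypothetical package)
# Option 1: split one Java package into two rules to break a cycle with
# //java/com/corp/ledger. This assumes the ledger code only needs the
# model classes, and that the model classes do not import ledger.
java_library(
    name = "model",
    srcs = [
        "Invoice.java",
        "LineItem.java",
    ],
    visibility = ["//java/com/corp:__subpackages__"],
)

java_library(
    name = "billing",
    srcs = glob(
        ["*.java"],
        exclude = [
            "Invoice.java",
            "LineItem.java",
        ],
    ),
    visibility = ["//java/com/corp:__subpackages__"],
    deps = [
        ":model",
        "//java/com/corp/ledger",  # ledger itself depends only on :model
    ],
)

# java/com/corp/reports/BUILD  (hypothetical package)
# Option 2: when a package and its subpackages are too entangled, build
# the whole subtree as one rule. The subdirectories must not contain BUILD
# files of their own, because glob() does not cross package boundaries.
java_library(
    name = "reports",
    srcs = glob(["**/*.java"]),
)
```

Option 2 trades build granularity for speed of migration: any change under the subtree rebuilds the whole rule, which is the technical debt to pay down later.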

> - Have people found it to be a good idea to have a single tree of templates to aid sharing between applications and making goog.require('templatepath') work?

You can put all your Soy templates in a separate directory structure from your JS if you want. Directory structure doesn't matter so much from a sharing perspective, because Bazel allows you to customize visibility on a per-rule basis.

Paul Johnston

Jul 15, 2016, 11:35:13 AM
to bazel-discuss, rob...@gmail.com

> Google uses a little bit of a different directory structure internally. For example, pretty much all .java files are under //java/com/google, //javatests/com/google, and //third_party/java_src for open source projects that have things like Maven directory structures. //javatests has a special meaning in Bazel, where all rules are testonly=True by default. //third_party also has a special meaning, where licence declarations are required.



I didn't realize //javatests was a bazel primitive with special meaning.  Thanks for pointing that out.  

rob...@gmail.com

Jul 15, 2016, 12:42:27 PM
to bazel-discuss, rob...@gmail.com
On Friday, July 15, 2016 at 10:50:44 AM UTC-4, Justine Tunney wrote:
> I just want to start by saying that I think Paul has been offering some good solid advice for a project moving away from Maven.
>
>
> Google uses a little bit of a different directory structure internally. For example, pretty much all .java files are under //java/com/google, //javatests/com/google, and //third_party/java_src for open source projects that have things like Maven directory structures. //javatests has a special meaning in Bazel, where all rules are testonly=True by default. //third_party also has a special meaning, where licence declarations are required.
>
>
> Source files in languages that aren't Java, pretty much go wherever at Google. Usually something related to the project name (since the whole company shares the same repository.) For example: //domain/registry and //domain/registrar.

Question

Interesting, so you don't use GOPATH?

>
>
> On Thu, Jul 14, 2016 at 10:15 AM, <rob...@gmail.com> wrote:
> Our codebase has ~250 services written in a mix of Java (primarily), Go, JS (using Closure Tools), and I was interested in views on how to best organize it for the purpose of migrating to Bazel, enabling sharing, and controlling dependencies.
>
>
>
> I hope Closure Rules has been serving your company well. If not, let me know if there's anything I can do.

Hopefully soon. We don't use any bazel just yet -- we build using plovr presently.


> - It seems like the right organization for a Java service package is to have a clear separation between RPC Client / Client library & the service implementation. Any particular naming conventions? What does Google use?
>
>
>
> You mentioned you're using protocol buffers. Are you also using gRPC? It allows you to define services that are obviously disjoint from their implementations. For example: helloworld.proto. That way, the client only needs to depend on gRPC, the service definitions, and the protos it references (assuming the client code statically links any of their classes.)
>
>
> Typically at Google, protobuf / service definitions don't go under //java if they're used by multiple languages. If the service is for instance, //java/com/google/doodle then the protobuf might go under //doodle/proto.
>
>
> For services that don't use protobuf, I commonly see people at google using /impl/ subdirectories.
>

Our majority case is all-java services communicating with protobufs. We are not using gRPC, due to everything being on proto2. (It sounded like there was no upgrade path, and we didn't want to be stuck using a mix of proto2 and proto3 forever)

To clarify, is it common to have a mix of private and public code in the same java package along with separate BUILD rules, or is //java/com/google/doodle/*.java public and //java/com/google/doodle/impl/**/*.java private? (Assuming that doodle is a java service) Or do most packages follow the "1:1:1 rule"?

How do you organize javascript and javascript tests?


> - Are there any tips for taking a Java codebase and migrating it to bazel / removing cyclic dependencies?  The best thing I've come up with is to use dependency visualization tools and Eclipse "Move" refactorings.
>
>
>
>
> When I'm porting Java projects, I usually start off by having a BUILD file in each package, with a single java_library() rule. Then I write a script that determines what gets imported in each package. If I notice cycles between packages, it can usually be solved by having multiple java_library() rules in a single package. If the cycles get too intense, particularly between a package and its subpackages, I might just glob(["**/*.java"]) all the subpackages into the parent package.
>

Interesting! So you find that you can often avoid refactoring code when porting by writing the right rules? I didn't really consider that approach (I was following 1:1:1)


>
> - Have people found it to be a good idea to have a single tree of templates to aid sharing between applications and making goog.require('templatepath') work?
>
>
>
> You can put all your Soy templates in a separate directory structure from your JS if you want. Directory structure doesn't matter so much from a sharing perspective, because Bazel allows you to customize visibility on a per-rule basis.

Do you mind providing an example? If doodle service has templates to be shared with registrar service as well as with javascript, where would you put them? Leave them wherever they were initially used/added and reference them from the subsequent spots using BUILD rules?

Thanks very much!
Rob

Justine Tunney

Jul 15, 2016, 1:50:52 PM
to rob...@gmail.com, bazel-discuss
On Fri, Jul 15, 2016 at 12:42 PM, <rob...@gmail.com> wrote:
> Interesting, so you don't use GOPATH?

We do not use GOPATH in our internal repository. We also don't use @external_workspace//. Everything is in a single Bazel repository. All imports are relative to the base of the repository. 

This poses a challenge for Go projects, because external paths are different. Some projects like Kubernetes adopt the following convention:

import "third_party/cloud/kubernetes/pkg/api/resource/resource"       // "k8s.io/kubernetes/pkg/api/resource"

A tool will then replace the package name when exporting to GitHub. We also have tools like MOE that can just do a regex replace on sources when exporting to GitHub.

> Our majority case is all-java services communicating with protobufs. We are not using gRPC, due to everything being on proto2. (It sounded like there was no upgrade path, and we didn't want to be stuck using a mix of proto2 and proto3 forever)

I imagine you're writing a thin client library by hand for each one? If so, one option to consider is separating them as follows:
  • //java/com/yext/service/proto
  • //java/com/yext/service/server
  • //java/com/yext/service/client
Each of those directories might have a lot of subpackages and a lot of rules. So the next thing you're going to want to do is guarantee that the client library never links against the server code, because you don't want every app that talks to the service to be bloated with the server-side code. The following script, if added to your CI system, will guarantee that they stay disjoint:

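# With set -e the script fails as soon as bazel query prints any dependency path between client and server.
set -e
[[ -z "$(bazel query 'somepath(//java/com/yext/service/client/...,//java/com/yext/service/server/...)')" ]]
[[ -z "$(bazel query 'somepath(//java/com/yext/service/server/...,//java/com/yext/service/client/...)')" ]]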

Keeping paths disjoint is very important for the long term maintainability of codebases with many developers. It's something Googlers actually struggle with. We sometimes find ourselves in silly situations, where the build for Haskell breaks, and it somehow ends up cascading into many things you'd think have no business having Haskell as a transitive dependency. So please enjoy that script.
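
Visibility can complement that check at the BUILD level. A minimal sketch, assuming one hypothetical top-level target in each of the three directories listed above (real trees would have many more rules):

```starlark
# java/com/yext/service/proto/BUILD  (hypothetical targets)
java_library(
    name = "proto",
    srcs = glob(["*.java"]),
    visibility = ["//visibility:public"],
)

# java/com/yext/service/client/BUILD
java_library(
    name = "client",
    srcs = glob(["*.java"]),
    visibility = ["//visibility:public"],
    deps = ["//java/com/yext/service/proto"],
)

# java/com/yext/service/server/BUILD
java_library(
    name = "server",
    srcs = glob(["*.java"]),
    # Only packages under the server tree may depend on this directly,
    # so a client rule that lists it in deps fails at analysis time.
    visibility = ["//java/com/yext/service/server:__subpackages__"],
    deps = ["//java/com/yext/service/proto"],
)
```

Visibility is only checked on direct dependencies, so a path through some third, more widely visible target could still connect the two trees. The somepath queries remain the actual guarantee.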

> To clarify, is it common to have a mix of private and public code in the same java package along with separate BUILD rules, or is //java/com/google/doodle/*.java public and //java/com/google/doodle/impl/**/*.java private? (Assuming that doodle is a java service) Or do most packages follow the "1:1:1 rule"?

It depends what you mean by private and public. If private means, this code is going to run as an entirely separate program that other stuff talks to over the network, strong enforceable separation is probably a good idea.

Most Java code at Google follows the 1:1:1 rule. It makes deps so much simpler. Breaking it is usually a sign of technical debt. Or a sign that you've got a bloated utils package that depends on a lot of large libraries, and other packages that only need a subset of utils don't want those big dependencies schlepped in. But that's still technical debt. Because many people at Google consider utils packages harmful. There's usually no penalty to having lots of little packages.

> How do you organize javascript and javascript tests?

There's no consensus at Google. JavaScript core libraries tend to go under //javascript. For any foo.js there is a foo_test.js in the same directory. Teams that use primarily Java, will oftentimes have their JS files under the //java source tree. As a result, they put their foo_test.js files under //javatests. That's what my team does.

> Interesting! So you find that you can often avoid refactoring code when porting by writing the right rules? I didn't really consider that approach (I was following 1:1:1)

Absolutely. Starting off with the complicated rules is technical debt, don't get me wrong. But it'll help your organization get standardized on Bazel quickly. Then you can refactor Java code over time to not need the complicated rules.

> Do you mind providing an example? If doodle service has templates to be shared with registrar service as well as with javascript, where would you put them? Leave them wherever they were initially used/added and reference them from the subsequent spots using BUILD rules?

Yeah pretty much. But try to minimize visibility by default. Googlers tend to behave very openly in the internal repository. If something is marked visibility = ["//visibility:public"], other engineers in the farthest corners of the company will show no compunction in depending on it. Even if it's in the darkest deepest depths of another team's codebase.

Also consider that it's possible to restrict visibility to just one other team. For example, doodle service could say visibility = ["//java/com/yext/registrar:__subpackages__"] to give the whole registrar codebase access. There's also package_group() and package(default_visibility) for creating centralized visibility policies.
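
To make those visibility options concrete, here is a small sketch. The doodle template directory, the //javascript/yext consumer, and all target names are hypothetical, and filegroup stands in for whatever template rule is actually in use.

```starlark
# java/com/yext/doodle/templates/BUILD  (hypothetical package)

# Nothing in this package is visible elsewhere unless a rule says so.
package(default_visibility = ["//visibility:private"])

# A named, reusable list of packages allowed to use the templates.
package_group(
    name = "template_users",
    packages = [
        "//java/com/yext/registrar/...",
        "//javascript/yext/...",
    ],
)

# Shared with every package listed in the group above.
filegroup(
    name = "shared_soy",
    srcs = glob(["shared/*.soy"]),
    visibility = [":template_users"],
)

# Shared with the registrar codebase only, without a package_group.
filegroup(
    name = "registrar_soy",
    srcs = glob(["registrar/*.soy"]),
    visibility = ["//java/com/yext/registrar:__subpackages__"],
)
```

A package_group is worth the extra rule once the same list of consumers would otherwise be repeated on several targets.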

ittai zeidman

Jul 17, 2016, 3:03:37 PM
to bazel-discuss, rob...@gmail.com
Hi,
FWIW I'm in the very early stages of migrating our code to Bazel.
We have 200+ micro-services (the vast majority in Scala), and we have a lot of client code which I'm not trying to tackle right now (though my clear intent is to migrate it to Bazel).
I'm currently working on a tool meant to automate the move from Maven to Bazel. It will use bytecode analysis and free-text search to give a highly accurate answer about dependencies between source files, and the current intent is to aggregate those to the package level (to lower maintenance costs). In case of cycles I just combine the two sub-packages. Let's say that "com/wix/security/something" and "com/wix/rpc/other" have a cyclic dependency; then I'll have one target called "security/something_rpc/other" sitting in a package at "com/wix". This lets me live with existing cycles without any refactoring, since I know our cycles are pretty limited.
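
A combined target of that kind could be sketched as follows. Java sources are used purely for illustration (the codebase described is mostly Scala), and the sketch assumes no BUILD files exist under the two merged directories.

```starlark
# com/wix/BUILD  (illustrative; assumes security/something and rpc/other
# contain no BUILD files, since glob() stops at package boundaries)
java_library(
    name = "security/something_rpc/other",
    srcs = glob([
        "security/something/**/*.java",
        "rpc/other/**/*.java",
    ]),
    visibility = ["//visibility:public"],
)
```

Other packages would then depend on //com/wix:security/something_rpc/other until the cycle is refactored away and the two directories can become packages of their own.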

"- What are good guidelines for services' public interface being allowed to depend upon other services?" Good question. I've asked myself a similar (same?) question about splitting to an foo_interface target and a foo_impl target with different visibility. Don't know yet...

Justine Tunney

Jul 17, 2016, 7:17:34 PM
to ittai zeidman, bazel-discuss, Rob Figueiredo
[Disclaimer: I know next to nothing about Scala.]

> I've asked myself a similar (same?) question about splitting to an foo_interface target and a foo_impl target with different visibility. Don't know yet...

Let's say for example you have a Java interface called FooService. It's implemented by two things: FooServiceImpl and FooServiceFake for testing. In that case, it would be a good idea for all three to be in separate build targets.
  1. You definitely want to have a separate java_library() rule for FooServiceFake so it can be testonly=True. That way Bazel can guarantee it'll never end up in a production binary. Possibly put in a testing/ subdirectory. Then unit tests just depend on that rather than the implementation.

  2. The interface rule could be named //java/com/doodle/fooservice (a.k.a. //java/com/doodle/fooservice:fooservice) since it provides a clean name that the majority of the code will use as a dep.

  3. As for the impl rule, it would be better if it could be named //java/com/doodle/fooservice/impl rather than //java/com/doodle/fooservice:fooservice_impl. But either technically works.

  4. Whether or not the impl rule should be visibility restricted depends. Chances are it might not be worth the effort, since it's going to end up in the resulting test or production binary at some point or another. But depending on your architecture, it might be appropriate. For example, my team uses Dagger 2 dependency injection. We have lots and lots of packages. They mostly just depend on interfaces. Then we have a separate set of packages for each production environment, e.g. com.foo.app.module.frontend, com.foo.app.module.backend. Those packages basically contain nothing other than a @Component that lists all the modules that provide the implementations, along with the various entrypoints into the application. Seriously, if anyone reading is using Bazel with Java, I can't recommend Dagger 2 enough. So if we wanted to, we could have restricted the visibility of our module implementation packages to just the component packages. But we didn't bother.
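
Put together, the four points above could be laid out like this. The source file names follow the FooService example; the testing/ directory name and the visibility choices are assumptions made for the sketch.

```starlark
# java/com/doodle/fooservice/BUILD
# The interface: the clean name most code depends on.
java_library(
    name = "fooservice",
    srcs = ["FooService.java"],
    visibility = ["//visibility:public"],
)

# java/com/doodle/fooservice/impl/BUILD
# The production implementation, in its own package.
java_library(
    name = "impl",
    srcs = ["FooServiceImpl.java"],
    # Left public here; restrict it only if the architecture warrants it.
    visibility = ["//visibility:public"],
    deps = ["//java/com/doodle/fooservice"],
)

# java/com/doodle/fooservice/testing/BUILD
# The fake: testonly, so only tests and other testonly targets may depend on it.
java_library(
    name = "testing",
    testonly = True,
    srcs = ["FooServiceFake.java"],
    visibility = ["//visibility:public"],
    deps = ["//java/com/doodle/fooservice"],
)
```

Unit tests would then list //java/com/doodle/fooservice/testing in deps instead of the impl package.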

--
You received this message because you are subscribed to the Google Groups "bazel-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to bazel-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/60dbfa8b-de3a-4bb8-99f4-7eebf1dfaec4%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.
