Recommended process for adding new workflows?

b...@sudoforge.com

Mar 22, 2020, 10:58:31 PM
to Copybara OSS
Hey there,

I'm exploring Copybara as a way to both integrate the source code of open-source external dependencies into a monorepo and to manage the integration with internal projects that have been open sourced. Thank you for building (and releasing) it! I've been playing around with basic git <-> GitHub workflows and feel like I have a decent understanding of how to accomplish various tasks (https://github.com/google/copybara/issues/54 was helpful), but I'm still trying to wrap my head around one thing that seems fairly basic:

What is the recommended method for integrating new workflows? Let's say I'm adding a new external SoT with a (fairly straightforward) workflow:

```
core.workflow(
    name = "import",
    mode = "SQUASH",

    origin = git.github_origin(
      url = sot_url,
      ref = "master",
    ),

    destination = git.github_pr_destination(
      url = dest_url_http,
      destination_ref = "master",
      pr_branch = "chore/update-foo-digest",
      title = "chore(tools): import foo at digest ${COPYBARA_CURRENT_REV}",
      update_description = True,
    ),

    origin_files = glob([
      "some/path/with/files/**",
      "another_path/**",
    ]),

    destination_files = glob(["third_party/foo/**"], exclude = ["copy.bara.sky"]),

    authoring = authoring.pass_thru("copybara <tools-c...@domain.com>"),
    transformations = [
        core.move("", "third_party/foo"),
        metadata.replace_message("chore(tools): import foo at digest ${COPYBARA_CURRENT_REV}"),
    ],
)
```

The primary goal of this workflow is to keep the source code for an external dependency in my internal monorepo and build it with Bazel. This type of unidirectional migration is likely to be the most common pattern written in my repository, and I have a few questions around it.


Who should actually run the workflow?

1. The developer, before submitting
This can create an odd scenario where the PR adding the dependency is created before the PR adding the workflow to the root, or even before the workflow is added to the repository (in its own ref). This also skips the review and testing process, which can lead to other issues. What is the recommended process for handling this with workflows that are located within the destination tree (e.g. the above workflow would sit at `//:third_party/foo/copy.bara.sky`)?

2. The developer, after submitting
This would be preferred, but is difficult to test. How does one test a copybara workflow? In the example workflow above, how might a test be written to ensure that the workflow completes successfully (an "end to end" test, so to speak)?

3. A bot, after submitting
I can see this being useful for first-time imports, but it has the same issue regarding testing.
 
4. A bot, after submitting and on a schedule
Same issues as (3). Additionally, it may not be safe to run certain migrations on a schedule, as doing so can unknowingly introduce new code. This can potentially be mitigated by requiring a review of the changes (i.e. using `git.github_pr_destination` or `git.gerrit_destination`), but in doing so we'd need to build additional processes around execution of the workflow -- e.g. run daily, but not if a broken submission has already been made, etc.

5. ???
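To make option (4) concrete: since the destination here is GitHub, I imagine the scheduling piece could be as small as a cron-triggered CI job that runs the workflow and lets `git.github_pr_destination` gate the result behind review. Everything in this sketch (the use of GitHub Actions, the secret name, the config path, and having `copybara` preinstalled on the runner) is an assumption on my part:

```
# Hypothetical GitHub Actions job; all names and paths are illustrative.
name: import-foo
on:
  schedule:
    - cron: "0 6 * * *"  # daily at 06:00 UTC
jobs:
  import:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run copybara
        env:
          GITHUB_TOKEN: ${{ secrets.COPYBARA_TOKEN }}
        run: copybara migrate third_party/foo/copy.bara.sky import
```

But this still leaves open the "don't run if a broken submission already exists" problem from (4).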

How do I build a workflow for submitting contributions back upstream?

In the above workflow, I'm using the `SQUASH` mode and modifying the tree in place (this could alternatively be done via patch files). If I plan on submitting patches back upstream, I'd need to use `ITERATIVE` mode and rewrite each commit -- would it be a safe bet to always use `ITERATIVE` mode? I see potential issues when performing complex (or many) modifications, not to mention the load on a build server when performing fresh imports -- how can we batch commits together in `ITERATIVE` mode before sending a change up? Copybara itself, for example, has over 1900 commits -- sending those in quick succession may generate a lot of noise at best, and eat up build resources that developers need at worst.
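For reference, here is roughly what I imagine the reverse (upstreaming) workflow would look like. The name `export`, the reversed `core.move`, and the choice of flags are my own guesses; in particular, I'm assuming the documented `--iterative-limit-changes` flag is the intended batching mechanism, which is part of my question:

```
# Hypothetical reverse workflow: internal monorepo -> upstream SoT.
core.workflow(
    name = "export",
    mode = "ITERATIVE",
    origin = git.origin(
        url = dest_url_http,
        ref = "master",
    ),
    destination = git.github_pr_destination(
        url = sot_url,
        destination_ref = "master",
    ),
    origin_files = glob(
        ["third_party/foo/**"],
        exclude = ["third_party/foo/copy.bara.sky"],
    ),
    authoring = authoring.pass_thru("copybara <tools-c...@domain.com>"),
    transformations = [
        # Undo the import-time move so files land at the upstream paths.
        core.move("third_party/foo", ""),
    ],
)
```

and then invoke it with something like `copybara migrate copy.bara.sky export --iterative-limit-changes=20` to cap how many internal commits are migrated per run. Is that the recommended way to keep the noise and build load down?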

 
I can appreciate that Google likely doesn't have these same concerns and/or has internal tools that compensate for these issues; any direction that you are able to provide would be much appreciated.