Re: [bazel-discuss] Re: TurboCache - A new remote execution / cache server


Steven Bergsieker

May 12, 2022, 2:27:49 PM5/12/22
to Fredrik Medley, Remote Execution APIs Working Group, bazel-discuss
Hi Nathan,

Thanks for posting this! You may want to add TurboCache to the list of known server implementations of the Remote Execution API: https://github.com/bazelbuild/remote-apis#servers

You might also be interested in joining remote-exe...@googlegroups.com and attending our monthly sync for all things remote execution-related. (Joining the group should invite you to the meeting automatically.)

Thanks,
Steven

On Sun, Apr 24, 2022 at 4:58 PM Fredrik Medley <fredrik...@gmail.com> wrote:
Can you mention, at a high level, some specific pain points in the other remote systems, thinking specifically of Bazel Remote, Buildbarn and Buildfarm? I'm curious because I don't want to make the same mistakes myself.

Also, don't forget to add TurboCache to the remote-apis-testing repository: https://remote-apis-testing.gitlab.io/remote-apis-testing/

Best regards,
Fredrik

On Friday, April 15, 2022 at 05:04:02 UTC+2, thegr...@gmail.com wrote:

Hi team,

TL;DR

There's a new bazel remote execution / cache project: https://github.com/allada/turbo-cache

I thought I'd put this out there and see if there's any early feedback on a new project I've been working on on-and-off for the last couple of years, and now actually have lots of time to bring to completion. It is currently unlicensed because I have not yet chosen what to do with it, but it is open source and will likely get at least an LGPL, or an even more permissive, license in the near future.

I'd love any feedback any of you have.

My Background

A few years ago at my previous job I built out our remote execution farm (BuildBarn) that serviced about:

  • ~1.2 million unit tests per day (seconds to run)

  • ~20k integration tests per day (median test duration was ~8 mins)

  • ~300k build jobs per day.

  • ~1 petabyte of cache / month

We spent a huge amount of time trying to keep things stable and to keep infra-related issues to a minimum. Over the winter break of 2020 I had some free time, so I decided to start a new project from scratch and build a remote execution / cache server myself, with all the hindsight available to me.

I decided to write the entire thing in Rust to help with stability (and I wanted to try the brand new Async/Await [which is awesome btw]). I also wanted to implement some cool features that I thought this space was lacking.

Current State

There are 2 main parts of the project: Remote Cache and Remote Execution.

Remote Cache

Remote cache is in the alpha stage. It currently supports:

  • Memory store - Data lives in the machine's own memory (with eviction policies)

  • S3 store - Serves objects from any service that supports the S3 API

  • Compression store - Will compress data (lz4) then forward on to another store

  • Dedup (de-duplication) store - Uses a rolling-hash algorithm to find the parts of files that are the same, and only processes and stores the parts that have changed (a similar algorithm to the one rsync & bup use). Very efficient when large files have only a few changes in them.

  • FastSlow store - Tries one store first, then if not found tries the slow store (and then populates the fast store)

  • Filesystem store - Stores objects on disk

  • SizePartitioning store - Chooses a store to place objects based on the size field of the digest.

  • Retry logic - For some stores (like S3), an error may be retryable. Retry & recovery are supported in these cases (without the client knowing).

  • Heavily tested - over 100 unit tests so far. Any bug detected always gets a regression test.

  • Extremely small memory footprint & no garbage collection

  • GRPC-only endpoint
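The composable-store design described above can be sketched roughly as follows. Note that the trait and struct names here are illustrative, not TurboCache's actual API: the idea is that every store implements one common interface, so a FastSlow store is just a wrapper around two other stores.

```rust
use std::collections::HashMap;

// Illustrative sketch only: these names are hypothetical, not the
// project's real API. Stores share one interface and compose freely.
trait Store {
    fn get(&mut self, key: &str) -> Option<Vec<u8>>;
    fn put(&mut self, key: &str, data: Vec<u8>);
}

// Memory store: data lives in a map on the same machine.
#[derive(Default)]
struct MemoryStore {
    map: HashMap<String, Vec<u8>>,
}

impl Store for MemoryStore {
    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn put(&mut self, key: &str, data: Vec<u8>) {
        self.map.insert(key.to_string(), data);
    }
}

// FastSlow store: try the fast store first; on a miss, fall back to
// the slow store and back-fill the fast one with whatever was found.
struct FastSlowStore<F: Store, S: Store> {
    fast: F,
    slow: S,
}

impl<F: Store, S: Store> Store for FastSlowStore<F, S> {
    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        if let Some(data) = self.fast.get(key) {
            return Some(data);
        }
        let data = self.slow.get(key)?;
        self.fast.put(key, data.clone()); // populate the fast store
        Some(data)
    }
    fn put(&mut self, key: &str, data: Vec<u8>) {
        self.fast.put(key, data.clone());
        self.slow.put(key, data);
    }
}
```

The same wrapping pattern would cover the Compression, Dedup and SizePartitioning stores: each one implements the common trait and forwards to an inner store after transforming or routing the data.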
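To illustrate the rolling-hash idea behind the Dedup store, here is a hypothetical sketch of content-defined chunking in the spirit of rsync/bup (this is not TurboCache's actual code, and the window/mask parameters are made up): a fixed-size window hash is updated byte by byte, and a chunk boundary is cut wherever the low bits of the hash are zero. Because cut points depend only on the bytes themselves, an edit early in a file shifts the data but leaves later chunk boundaries, and hence their stored chunks, unchanged.

```rust
// Hypothetical content-defined chunking sketch (not TurboCache's code).
// Returns the byte offsets at which chunk boundaries are cut.
fn chunk_boundaries(data: &[u8], window: usize, mask: u32) -> Vec<usize> {
    let mut boundaries = Vec::new();
    let mut hash: u32 = 0;
    // Contribution of the byte that leaves the window: b * 31^window.
    let pow = 31u32.wrapping_pow(window as u32);
    for (i, &b) in data.iter().enumerate() {
        // Polynomial rolling hash: shift previous bytes, add the new one.
        hash = hash.wrapping_mul(31).wrapping_add(b as u32);
        if i >= window {
            // Drop the byte that just left the window.
            hash = hash.wrapping_sub((data[i - window] as u32).wrapping_mul(pow));
        }
        // Only cut once a full window is seen; the mask width controls
        // the average chunk size.
        if i + 1 >= window && (hash & mask) == 0 {
            boundaries.push(i + 1);
        }
    }
    boundaries
}
```

A useful property of this scheme: prepending or inserting bytes shifts every later boundary by the same amount but does not change where they fall relative to the content, so unchanged regions hash to the same chunks and need no re-upload.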

Remote Execution

Remote execution is still a work in progress. I estimate it will reach the alpha stage sometime in May/June. Currently Bazel talks properly to the scheduler & CAS, the scheduler appears to schedule jobs correctly with priorities, and the worker API for interacting with the scheduler is fully implemented. The next stage is to implement the workers.
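For readers curious what priority scheduling looks like at its core, here is a minimal, hypothetical sketch (names and the "higher number = more urgent" convention are this sketch's assumptions, not the project's actual scheduler): queued actions sit in a max-heap ordered by priority, with ties broken in favor of earlier submissions, and workers pop the top action.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Hypothetical scheduler sketch, not the project's real implementation.
#[derive(Eq, PartialEq)]
struct QueuedAction {
    priority: i32, // higher = more urgent, in this sketch's convention
    seq: u64,      // submission order; lower = submitted earlier
    action_digest: String,
}

impl Ord for QueuedAction {
    fn cmp(&self, other: &Self) -> Ordering {
        self.priority
            .cmp(&other.priority)
            // On equal priority, the earlier submission wins.
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

impl PartialOrd for QueuedAction {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

struct Scheduler {
    queue: BinaryHeap<QueuedAction>, // max-heap: highest priority on top
    next_seq: u64,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { queue: BinaryHeap::new(), next_seq: 0 }
    }

    fn submit(&mut self, priority: i32, action_digest: &str) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.queue.push(QueuedAction {
            priority,
            seq,
            action_digest: action_digest.to_string(),
        });
    }

    // A worker asks for the next action to run.
    fn next_for_worker(&mut self) -> Option<String> {
        self.queue.pop().map(|a| a.action_digest)
    }
}
```

Note that the Remote Execution API itself defines its own priority semantics; this sketch only shows the queueing mechanics.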

To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/fe9051ff-8ee1-496b-8732-7bc76dd65079n%40googlegroups.com.