Re: [bazel-discuss] Re: TurboCache - A new remote execution / cache server


Steven Bergsieker

May 12, 2022, 2:27:49 PM5/12/22
to Fredrik Medley, Remote Execution APIs Working Group, bazel-discuss
Hi Nathan,

Thanks for posting this! You may want to add TurboCache to the list of known server implementations of the Remote Execution API: https://github.com/bazelbuild/remote-apis#servers

You might also be interested in joining remote-exe...@googlegroups.com and attending our monthly sync for all things remote execution-related. (Joining the group should invite you to the meeting automatically.)

Thanks,
Steven

On Sun, Apr 24, 2022 at 4:58 PM Fredrik Medley <fredrik...@gmail.com> wrote:
Can you mention, at a high level, some specific pain points in the other remote systems, thinking specifically of Bazel Remote, Buildbarn and Buildfarm? I'm curious because I don't want to make the same mistakes myself.

Also, don't forget to add TurboCache to the remote-apis-testing repository: https://remote-apis-testing.gitlab.io/remote-apis-testing/

Best regards,
Fredrik

On Friday, April 15, 2022 at 05:04:02 UTC+2, thegr...@gmail.com wrote:

Hi team,

TL;DR

There's a new bazel remote execution / cache project: https://github.com/allada/turbo-cache

I thought I'd put this out there and see if there's any early feedback on a new project I've been working on on-and-off for the last couple of years, and now actually have lots of time to bring to completion. It is currently unlicensed because I have not yet chosen what to do with it, but it is open source and will likely get at least an LGPL, or an even more permissive, license in the near future.

I'd love any feedback any of you have.

My Background

A few years ago at my previous job I built out our remote execution farm (BuildBarn) that serviced about:

  • ~1.2 million unit tests per day (seconds to run)

  • ~20k integration tests per day (median test duration was ~8 mins)

  • ~300k build jobs per day.

  • ~1 petabyte of cache / month

We spent a huge amount of time trying to keep things stable and to keep infra-related issues to a minimum. Over the winter break of 2020 I had some free time, so I decided to start a new project from scratch and build a remote execution / cache server myself, with all the hindsight available to me.

I decided to write the entire thing in Rust to help with stability (and I wanted to try the brand new Async/Await [which is awesome btw]). I also wanted to implement some cool features that I thought this space was lacking.

Current State

There are 2 main parts of the project: Remote Cache and Remote Execution.

Remote Cache

Remote cache is in the alpha stage. It currently supports:

  • Memory store - Data lives in the machine's own memory (with eviction policies)

  • S3 store - Serves objects from any service that supports the S3 API

  • Compression store - Will compress data (lz4) then forward on to another store

  • Dedup (de-duplication) store - Uses a rolling-hash algorithm to find the parts of files that are the same, and only processes and stores the parts that have changed (a similar algorithm to the one rsync & bup use). Very efficient when large files have only a few changes in them.

  • FastSlow store - Tries one store first, then if not found tries the slow store (and then populates the fast store)

  • Filesystem store - Stores objects on disk

  • SizePartitioning store - Chooses a store to place objects based on the size field of the digest.

  • Retry logic - For some stores (like S3), an error may be retryable. Retry & recovery are supported in these cases (without the client knowing).

  • Heavily tested - over 100 unit tests so far. Any bug detected always gets a regression test.

  • Extremely small memory footprint & no garbage collection

  • GRPC-only endpoint
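The composable-store design described above can be sketched roughly as follows. Note that the trait and struct names here are illustrative, not TurboCache's actual API: the idea is that every store implements one common interface, so a FastSlow store is just a wrapper around two other stores.

```rust
use std::collections::HashMap;

// Illustrative sketch only: these names are hypothetical, not the
// project's real API. Stores share one interface and compose freely.
trait Store {
    fn get(&mut self, key: &str) -> Option<Vec<u8>>;
    fn put(&mut self, key: &str, data: Vec<u8>);
}

// Memory store: data lives in a map on the same machine.
#[derive(Default)]
struct MemoryStore {
    map: HashMap<String, Vec<u8>>,
}

impl Store for MemoryStore {
    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        self.map.get(key).cloned()
    }
    fn put(&mut self, key: &str, data: Vec<u8>) {
        self.map.insert(key.to_string(), data);
    }
}

// FastSlow store: try the fast store first; on a miss, fall back to
// the slow store and back-fill the fast one with whatever was found.
struct FastSlowStore<F: Store, S: Store> {
    fast: F,
    slow: S,
}

impl<F: Store, S: Store> Store for FastSlowStore<F, S> {
    fn get(&mut self, key: &str) -> Option<Vec<u8>> {
        if let Some(data) = self.fast.get(key) {
            return Some(data);
        }
        let data = self.slow.get(key)?;
        self.fast.put(key, data.clone()); // populate the fast store
        Some(data)
    }
    fn put(&mut self, key: &str, data: Vec<u8>) {
        self.fast.put(key, data.clone());
        self.slow.put(key, data);
    }
}
```

The same wrapping pattern would cover the Compression, Dedup and SizePartitioning stores: each one implements the common trait and forwards to an inner store after transforming or routing the data.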
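To illustrate the rolling-hash idea behind the Dedup store, here is a hypothetical sketch of content-defined chunking in the spirit of rsync/bup (this is not TurboCache's actual code, and the window/mask parameters are made up): a fixed-size window hash is updated byte by byte, and a chunk boundary is cut wherever the low bits of the hash are zero. Because cut points depend only on the bytes themselves, an edit early in a file shifts the data but leaves later chunk boundaries, and hence their stored chunks, unchanged.

```rust
// Hypothetical content-defined chunking sketch (not TurboCache's code).
// Returns the byte offsets at which chunk boundaries are cut.
fn chunk_boundaries(data: &[u8], window: usize, mask: u32) -> Vec<usize> {
    let mut boundaries = Vec::new();
    let mut hash: u32 = 0;
    // Contribution of the byte that leaves the window: b * 31^window.
    let pow = 31u32.wrapping_pow(window as u32);
    for (i, &b) in data.iter().enumerate() {
        // Polynomial rolling hash: shift previous bytes, add the new one.
        hash = hash.wrapping_mul(31).wrapping_add(b as u32);
        if i >= window {
            // Drop the byte that just left the window.
            hash = hash.wrapping_sub((data[i - window] as u32).wrapping_mul(pow));
        }
        // Only cut once a full window is seen; the mask width controls
        // the average chunk size.
        if i + 1 >= window && (hash & mask) == 0 {
            boundaries.push(i + 1);
        }
    }
    boundaries
}
```

A useful property of this scheme: prepending or inserting bytes shifts every later boundary by the same amount but does not change where they fall relative to the content, so unchanged regions hash to the same chunks and need no re-upload.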

Remote Execution

Remote execution is still a work in progress. I estimate it will reach the alpha stage sometime in May/June. Currently Bazel talks properly to the scheduler & CAS, the scheduler appears to schedule jobs correctly with priorities, and the worker API for interacting with the scheduler is fully implemented. The next stage is to implement the workers.
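For readers curious what priority scheduling looks like at its core, here is a minimal, hypothetical sketch (names and the "higher number = more urgent" convention are this sketch's assumptions, not the project's actual scheduler): queued actions sit in a max-heap ordered by priority, with ties broken in favor of earlier submissions, and workers pop the top action.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

// Hypothetical scheduler sketch, not the project's real implementation.
#[derive(Eq, PartialEq)]
struct QueuedAction {
    priority: i32, // higher = more urgent, in this sketch's convention
    seq: u64,      // submission order; lower = submitted earlier
    action_digest: String,
}

impl Ord for QueuedAction {
    fn cmp(&self, other: &Self) -> Ordering {
        self.priority
            .cmp(&other.priority)
            // On equal priority, the earlier submission wins.
            .then_with(|| other.seq.cmp(&self.seq))
    }
}

impl PartialOrd for QueuedAction {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

struct Scheduler {
    queue: BinaryHeap<QueuedAction>, // max-heap: highest priority on top
    next_seq: u64,
}

impl Scheduler {
    fn new() -> Self {
        Scheduler { queue: BinaryHeap::new(), next_seq: 0 }
    }

    fn submit(&mut self, priority: i32, action_digest: &str) {
        let seq = self.next_seq;
        self.next_seq += 1;
        self.queue.push(QueuedAction {
            priority,
            seq,
            action_digest: action_digest.to_string(),
        });
    }

    // A worker asks for the next action to run.
    fn next_for_worker(&mut self) -> Option<String> {
        self.queue.pop().map(|a| a.action_digest)
    }
}
```

Note that the Remote Execution API itself defines its own priority semantics; this sketch only shows the queueing mechanics.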

To view this discussion on the web visit https://groups.google.com/d/msgid/bazel-discuss/fe9051ff-8ee1-496b-8732-7bc76dd65079n%40googlegroups.com.