Hello All,
I am wondering if there is something in the pipeline for managing test services in an optimal manner.
The best strategy for native services in testing currently is to provide binaries for multiple platforms (Linux/Mac) in Bazel via a repo rule, and then write a service launcher --e.g., a JUnit 4 rule (we use a JUnit 5 @RegisterExtension).
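As a rough sketch of the launcher half of this (class name, binary path handling, and readiness check are all hypothetical -- this is the shape a JUnit 5 @RegisterExtension would wrap, not our actual implementation):

```java
// Hypothetical minimal native-service launcher of the kind a JUnit 5
// extension would delegate to. A real launcher also waits for the
// service to report readiness (e.g. by polling a port) before tests run.
final class NativeServiceLauncher {
  private Process process;

  // Start the platform-specific binary that a Bazel repo rule resolved.
  public void start(String binaryPath) throws Exception {
    process = new ProcessBuilder(binaryPath).start();
  }

  public boolean isRunning() {
    return process != null && process.isAlive();
  }

  public void stop() {
    if (process != null) {
      process.destroyForcibly();
    }
  }
}
```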
# Example: How we test with Postgres.
We are using the strategy mentioned above for Postgres, and all of our persistence tests are now hermetic. We can run them with zero setup on Mac and Linux. JVM embedded DBs are not an option for us as we have our own ORM layer and we use many advanced Postgres features.
The existing JVM embedded Postgres solutions were downright terrible: 10-20 seconds of wall-clock time to get an instance migrated and ready. This is for various reasons (artifact size, zipped artifacts embedded in jars, unoptimised launchers).
I recompiled Postgres from scratch for Mac and Linux (a 7 MB artifact for Linux vs the 50-100+ MB artifacts other solutions ship). We have repo rules to select the binaries for the platform, and an optimised Postgres test manager for the JVM that starts up and migrates inside the Bazel sandbox in 2.5-4 seconds.
We have 40 test suites that use Postgres in this way. Moving to hermetic testing for our integration tests has really driven home the benefits of using Bazel in my team.
However, things could be improved further: the startup cost of Postgres is incurred repeatedly. For developers on laptops with standard specs (15" MacBooks, low spec) the cost of startup is much higher. For Postgres, a separate isolated physical DB server instance is not needed for each test suite; we just need an isolated logical named database within a shared server instance.
If we had a test service manager that ensured a database was created for the test, we would not incur the startup costs. Migrating a database for a test suite only takes ~500 ms of wall time, and most of this is JVM JIT warm-up, as migration is the first step that occurs when bringing up the microservice under test.
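The shared-server piece is small: the manager only needs to hand each suite a fresh logical database name and create it on the shared instance. A sketch of that allocation logic (naming scheme entirely hypothetical; the actual JDBC connection handling is elided):

```java
import java.util.UUID;

// Sketch: a shared Postgres server hands each test suite an isolated
// logical database rather than a whole server instance. The identifier
// scheme below is a made-up example, not what we actually use.
final class LogicalDatabaseAllocator {

  // Produce a unique, Postgres-legal database name per test suite.
  static String allocateName(String suiteLabel) {
    String suffix = UUID.randomUUID().toString().replace("-", "").substring(0, 12);
    return "test_" + suiteLabel.toLowerCase().replaceAll("[^a-z0-9]", "_") + "_" + suffix;
  }

  // The statement the manager would run against the shared server
  // (via JDBC in practice) before handing the name to the test.
  static String createStatement(String dbName) {
    return "CREATE DATABASE " + dbName;
  }
}
```

Because creating a logical database is cheap, only the ~500 ms migration cost remains per suite, not the multi-second server startup.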
# Example: Minikube
Another example is Minikube -- most services do not require a separate Kubernetes cluster for each test suite, but rather an isolated namespace. We could start a single Minikube cluster and share it across multiple test suites. Minikube is probably the best example to drive this discussion.
The approach I used above is not doable with Minikube: startup is very slow, the startup procedure is brittle (on a Mac at least, where it is still quite experimental), and it is not realistic to have more than one instance running at a time.
# Possible solution
Whilst tighter integration with Docker could meet a lot of test-service needs, it's not a universal solution -- i.e., first-class Docker support for test services would cover the Postgres case above, but Minikube can't run inside Docker.
Tests could perhaps declare their requirements as follows:
java_test(
    ...
    test_requirements = [
        "@postgres_test_services//:database_instance",
        "@minikube_test_services//:namespace",
    ],
    ...
)
`@postgres_test_services//:database_instance` could be a rule of type `*_test_service_template`; the declaration and mechanics of such rules could work like toolchains. It would declare the worker process that needs to start, the attributes supported, etc.
When the worker starts up, it receives lifecycle events and responds via stdin and stdout, like the existing worker protocol. The worker could write out a `map<string,string>` with elements needed by the test cases --e.g., db name and db port for Postgres, and namespace name for Minikube. This could then be populated as env variables for the tests.
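To make that handoff concrete, here is a sketch of turning a worker's resource map into the env map a test would consume (the key names and the line-based `KEY=VALUE` wire format are purely illustrative -- a real design would likely reuse the protobuf-based worker protocol):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of how a test-service worker's response could surface to tests.
// The worker emits one KEY=VALUE pair per line; the test runner parses
// them and exports each pair as an environment variable for the test.
final class TestServiceResponse {

  static Map<String, String> parse(String body) {
    Map<String, String> env = new LinkedHashMap<>();
    for (String line : body.split("\n")) {
      int eq = line.indexOf('=');
      if (eq > 0) {
        env.put(line.substring(0, eq), line.substring(eq + 1));
      }
    }
    return env;
  }
}
```

A Postgres-backed test would then read, say, `System.getenv("POSTGRES_DB_NAME")` and `System.getenv("POSTGRES_PORT")` instead of launching and managing the server itself.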
In the example above, a persistent test worker would be spun up at the start of a large test-suite run. For each test rule that declares requirements, it would receive start and stop events. At the end of each test the created resources are released, and at the end of the entire run the server could optionally be torn down.
This approach also gets rid of a heavy service launcher component. In our Postgres example we have a heavily optimised Postgres service manager that took a lot of time to get right. It would no longer be part of the test bootstrap but would move out to the persistent worker, where it could be simplified -- part of the reason the launcher is so complex is that it needs to be fast.
Hope this monologue made some sense.
Cheers
Hassan