Is the roachprod cluster automatically wiped in between siblings? I thought it wasn't and that various subtests do that manually. Perhaps I'm mistaken.
One item to be aware of is the special setup that the jepsen tests do. That setup is time consuming and the subtests want to share an already set up cluster. I had thoughts on adding some sort of tag to clusters to indicate that some initialization had been performed, but never got around to doing it as the existing solution for the jepsen tests was sufficient. If you're revisiting how subtests work you might need to do something in this area. We definitely don't want every jepsen subtest running `initJepsen` from scratch.
On Fri, Oct 12, 2018, 10:54 AM Peter Mattis <pe...@cockroachlabs.com> wrote:
> Is the roachprod cluster automatically wiped in between siblings? I thought it wasn't and that various subtests do that manually. Perhaps I'm mistaken.

Today subtests do the wiping manually, I think, but I would like to make that automatic, both between what today are sibling subtests and between root tests. That would be a good thing, wouldn't it?
> One item to be aware of is the special setup that the jepsen tests do. That setup is time consuming and the subtests want to share an already set up cluster. I had thoughts on adding some sort of tag to clusters to indicate that some initialization had been performed, but never got around to doing it as the existing solution for the jepsen tests was sufficient. If you're revisiting how subtests work you might need to do something in this area. We definitely don't want every jepsen subtest running `initJepsen` from scratch.

Yes, I'm planning on dealing with Jepsen by introducing a cluster tagging mechanism, where a test applies a tag once it has performed things like `initJepsen()`. Separately, tests would use the same tag in their cluster spec. One way or another, these features in concert would make the scheduler strongly prefer running sequences of similarly tagged tests on the same cluster.
This would all be orthogonal to the wiping; clusters would still be wiped with `roachprod wipe` between tests, but that doesn't destroy the `initJepsen()` stuff (which is package installs and such).
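To make the scheduling idea above concrete, here is a rough sketch in Go of ordering tests so that similarly tagged ones run back to back and can share a cluster. Everything in it is an assumption for illustration: the `testSpec` type, the `ClusterTag` field, the `orderByTag` helper, and the test names are made up and are not roachtest's actual API.

```go
package main

import (
	"fmt"
	"sort"
)

// testSpec is a hypothetical stand-in for a registered test. The real
// roachtest types are not shown in this thread.
type testSpec struct {
	Name       string
	ClusterTag string // e.g. "jepsen"; empty means no special setup is needed
}

// orderByTag returns a copy of tests sorted so that tests sharing a cluster
// tag are adjacent. The sort is stable, so the original order is preserved
// among tests with the same tag.
func orderByTag(tests []testSpec) []testSpec {
	out := append([]testSpec(nil), tests...)
	sort.SliceStable(out, func(i, j int) bool {
		return out[i].ClusterTag < out[j].ClusterTag
	})
	return out
}

func main() {
	// Made-up test names, for illustration only.
	tests := []testSpec{
		{Name: "jepsen/a", ClusterTag: "jepsen"},
		{Name: "other/x"},
		{Name: "jepsen/b", ClusterTag: "jepsen"},
	}
	for _, t := range orderByTag(tests) {
		fmt.Printf("%-10s tag=%q\n", t.Name, t.ClusterTag)
	}
}
```

A real scheduler would presumably treat the tag as a preference rather than a strict ordering, but grouping by tag is the simplest way to see why the expensive setup would run only once per group.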
--
You received this message because you are subscribed to the Google Groups "CockroachDB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to cockroach-db...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/cockroach-db/CANKgOKh74mRrZ032Yf7Ye1fo%3DBTEd-1A0eo6aJgUuJvwOJT7kQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.
~ time roachprod create -c aws -n 1 peter-test
...
real 0m36.694s
user 0m6.361s
sys 0m1.390s
~ roachprod destroy peter-test
...
real 0m3.954s
user 0m1.961s
sys 0m0.363s

~ time roachprod create -c gce -n 1 peter-test
...
real 0m49.709s
user 0m5.965s
sys 0m1.331s
~ time roachprod destroy peter-test
...
real 1m29.029s
user 0m2.097s
sys 0m0.409s
+ Ben
Hmmm so then what do you reckon would be the right abstraction here?
It seems to me that either we want Jepsen clusters to be reused by both Jepsen and non-Jepsen tests alike, or not reused by anybody. Does it make sense to say that they can be reused only by other "Jepsen tests"?
> It seems to me that either we want Jepsen clusters to be reused by both Jepsen and non-Jepsen tests alike, or not reused by anybody. Does it make sense to say that they can be reused only by other "Jepsen tests"?

Yes, I think the policy we want is that jepsen clusters can only be reused by other jepsen tests. Generally, I'd give tests a "cluster reuse tag" and only reuse clusters across tests with the same tag.
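For illustration, the "same tag only" reuse policy could look roughly like the Go sketch below. The `cluster` type and the `canReuse` and `pickCluster` helpers are hypothetical names invented for this example; they are not part of roachtest or roachprod.

```go
package main

import "fmt"

// cluster is a hypothetical record of an existing, idle cluster.
type cluster struct {
	Name string
	Tag  string // set by a test after one-time setup such as initJepsen
}

// canReuse reports whether a test asking for wantTag may run on c.
// Under this policy tags must match exactly: untagged tests only get
// untagged clusters, and tagged clusters only serve tests with that tag.
func canReuse(c cluster, wantTag string) bool {
	return c.Tag == wantTag
}

// pickCluster returns the first idle cluster the test may reuse. The second
// result is false when none qualifies and a new cluster has to be created.
func pickCluster(idle []cluster, wantTag string) (cluster, bool) {
	for _, c := range idle {
		if canReuse(c, wantTag) {
			return c, true
		}
	}
	return cluster{}, false
}

func main() {
	idle := []cluster{{Name: "c1"}, {Name: "c2", Tag: "jepsen"}}
	if c, ok := pickCluster(idle, "jepsen"); ok {
		fmt.Println("reuse", c.Name)
	}
	if _, ok := pickCluster(idle, "zfs"); !ok {
		fmt.Println("no match, create a new cluster")
	}
}
```

Exact matching is the conservative choice; a more liberal policy would also let untagged tests run on tagged clusters after a wipe.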
> > It seems to me that either we want Jepsen clusters to be reused by both Jepsen and non-Jepsen tests alike, or not reused by anybody. Does it make sense to say that they can be reused only by other "Jepsen tests"?
>
> Yes, I think the policy we want is that jepsen clusters can only be reused by other jepsen tests. Generally, I'd give tests a "cluster reuse tag" and only reuse clusters across tests with the same tag.

But why? (My message was implying that I don't really see how this policy makes sense). Are Jepsen tests generally somehow more resilient to the kind of mucking that some of them do to the machines than other, more naive, tests?
> (I'd probably also destroy the clusters of any failing tests to start clean on the next one). Be conservative about cluster reuse.
I'd rather be more liberal :). You can imagine that a change causes a lot of tests to fail. We don't want to do away with all the benefits of cluster reuse for these runs.
I believe the vast majority of tests are fine with cluster reuse after a wipe. The wipe destroys everything cockroach-related. Only "special" tests do things outside of cockroach.
`--parallelism` affects only the test runner, right? Real perf tests wouldn't want to use the test runner's CPU for anything. They start workloads on a remote node. (Maybe I'm misunderstanding your point here).
My point of mentioning performance tests is that they want to control their environment which typically includes the file system, whether it's local ssd or not, whether the device is mounted nobarrier or not, etc. But I may not have a big point here because lots of that needs to be specified when creating the node. Filesystems etc aren't (but perhaps that will change), but the tests can assert that they're in the right environment (and we're likely going to have N workloads to be tested in each environment, so reuse makes sense once you've set up the right one).
You're right that with `--parallelism=10` the streamlining goes away to some extent. But now I'm not so sure that cluster reuse is really that pressing a problem. If provisioning time is what counts, can't `--parallelism=auto` just run the tests as fast as it can? Sure, it's a little less than optimal, but how much less? Or are we re-doing various apt-gets and disk reformats many times if we don't keep the hardware around?
> (let me know if you know of more funkies).

Anything that reformats (search for zfs) or calls RemountNoBarrier, which notably includes TPCC.