Hi Andy,
Those queries are the ones that read-side processors use to poll for events.
I have a few questions and a few suggestions for things you can check:
Is it the Lagom process or the Cassandra process that is burning CPU? In case you weren't aware, Lagom does not run Cassandra in the same JVM; it forks a separate process. You can use 'jps -l' and look for a process with "Cassandra" in the name to find the pid.
I think scan and park indicate that the ForkJoinPool is waiting for work. Are you sure it's CPU time and not wall clock time? In either case, this would indicate a mostly idle service, so that also suggests that it is Cassandra consuming CPU.
As I said before, these queries are normal behavior for read-side processors. There are a few things that determine the number and frequency of the queries:
When you are using sharded event tags, you'll have one polling loop per shard for each read-side processor, and each of those will issue the query every three seconds by default.
So if you're using 11 services * ~2-3 read-side processors per service * 4 shards per event tag, then the rate of queries you're seeing seems to be in the right ballpark.
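To make that estimate concrete, here is a rough back-of-the-envelope calculation using the figures from this thread. It assumes exactly one query per polling loop per refresh interval; the class and method names are made up for illustration and are not part of Lagom.

```java
import java.util.Locale;

public class PollRateEstimate {

    /**
     * Estimated polling queries per second across all services, assuming
     * one polling loop per shard per read-side processor and one query
     * per loop per refresh interval.
     */
    static double queriesPerSecond(int services, double processorsPerService,
                                   int shardsPerTag, double refreshIntervalSeconds) {
        double pollingLoops = services * processorsPerService * shardsPerTag;
        return pollingLoops / refreshIntervalSeconds;
    }

    public static void main(String[] args) {
        // 11 services, 2 to 3 processors each, 4 shards, default 3s interval.
        double low = queriesPerSecond(11, 2, 4, 3.0);   // 88 polling loops
        double high = queriesPerSecond(11, 3, 4, 3.0);  // 132 polling loops
        System.out.println(String.format(Locale.ROOT,
            "Roughly %.0f to %.0f queries per second", low, high));
    }
}
```

That works out to somewhere around 29 to 44 queries per second in total, even when no events are being written.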
You can adjust the number of event tag shards to change the amount of parallelism in your read-side processors. It's likely that the number of shards you'll want might differ between development and production, in which case I'd recommend using a config property. Be aware, however, that changing the number of shards after events have been written will mean that you can lose ordering consistency. Ordinarily, all events for a given persistence ID will be written to the same sharded tag, but if you change the number of shards, this will cause the shard assigned to a persistence ID to change. It's OK to change in development/test if you're wiping the data every time.
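To illustrate why changing the shard count moves persistence IDs around, here is a small sketch that assumes a simple hash-modulo assignment. The shardFor function and the "entity-N" ids are hypothetical and only stand in for whatever function Lagom actually uses; the point is just that any scheme based on the number of shards will reassign IDs when that number changes.

```java
public class ShardAssignmentSketch {

    /** Illustrative hash-modulo shard choice; an assumption, not necessarily Lagom's exact function. */
    static int shardFor(String persistenceId, int numShards) {
        // floorMod keeps the result in [0, numShards) even for negative hash codes.
        return Math.floorMod(persistenceId.hashCode(), numShards);
    }

    /** Counts how many of the ids entity-0 .. entity-(n-1) land on a different shard. */
    static int countMoved(int oldNumShards, int newNumShards, int n) {
        int moved = 0;
        for (int i = 0; i < n; i++) {
            String id = "entity-" + i;
            if (shardFor(id, oldNumShards) != shardFor(id, newNumShards)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        System.out.println(countMoved(4, 5, 100)
            + " of 100 ids changed shard going from 4 to 5 shards");
    }
}
```

Any ID that moves will have its older events under one tag and its newer events under another, which is where the ordering guarantee breaks down.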
You can also adjust the cassandra-query-journal.refresh-interval property to poll less frequently than every 3s. The trade-off here is that there will potentially be a longer delay before a read-side processor picks up a new event.
You can mitigate this by setting the cassandra-journal.pubsub-minimum-interval property (https://github.com/akka/akka-persistence-cassandra/blob/v0.58/core/src/main/resources/reference.conf#L255-L264). This causes Akka Persistence Cassandra to send an internal pub-sub broadcast message when it writes an event, which the read-side processors will automatically subscribe to. When they receive this message, they will issue a new poll immediately. This allows you to set a much longer refresh-interval without seeing a huge delay when there is a new event. It won't help much if you are actually writing a high volume of events, but in development, where things are mostly idle, it will reduce the amount of background activity.
Cheers,
Tim