Akka Persistence, Cassandra and compaction strategies (DTCS)


Anders Båtstrand

Jul 23, 2015, 8:52:32 AM
to Akka User List
Dear group

I am quite new to Akka Persistence, and very new to Cassandra. The combination is working fine, except for one problem: mass deletions and compaction.

It seems, if I understood Cassandra correctly, that my usage is an anti-pattern (*). I have actors that read a lot of messages and persist a hash of them (for duplicate detection). Every hour I take a snapshot and delete old messages. That is about 20,000 messages each hour (per actor), which does not seem like much to me. But I guess this way of using Akka Persistence is quite typical?
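The hash-based duplicate detection part of this pattern, stripped of Akka, can be sketched in plain Java. The class and method names below are my own illustration, not any Akka Persistence API; in the real actor each new hash would be persisted as an event, and the hourly snapshot would capture the whole set:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.HashSet;
import java.util.Set;

// Sketch of hash-based duplicate detection: instead of keeping whole
// messages, keep only a digest of each one seen so far.
public class DuplicateDetector {
    private final Set<String> seenHashes = new HashSet<>();

    private static String hash(String message) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(message.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns true the first time a message is seen, false on duplicates.
    public boolean firstTimeSeen(String message) {
        return seenHashes.add(hash(message));
    }

    public static void main(String[] args) {
        DuplicateDetector d = new DuplicateDetector();
        System.out.println(d.firstTimeSeen("order-42")); // true
        System.out.println(d.firstTimeSeen("order-42")); // false
    }
}
```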

After a while, the deletes start to time out, and compaction is taking a really long time.

As I understood it, DTCS (Date-Tiered Compaction Strategy) (+) should be better suited, since data is inserted in order and deleted from the end.
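For concreteness, switching a journal table to DTCS is a one-statement schema change. This is a hedged sketch: the `akka.messages` keyspace/table names assume the akka-persistence-cassandra defaults, and the option values are illustrative, so verify both against your own schema and Cassandra version before running anything:

```sql
-- Switch the journal table to Date-Tiered Compaction Strategy.
-- base_time_seconds: size of the initial time window (here 1 hour,
-- matching the hourly snapshot/delete cycle described above).
-- max_sstable_age_days: stop compacting SSTables older than this.
ALTER TABLE akka.messages
  WITH compaction = {
    'class': 'DateTieredCompactionStrategy',
    'base_time_seconds': 3600,
    'max_sstable_age_days': 10
  };
```

Note that deletes in Cassandra only write tombstones; the space is reclaimed when compaction runs after `gc_grace_seconds` has elapsed, which is part of why mass deletions interact badly with compaction in the first place.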

Is this correct? Have any of you had problems with compaction, or done some testing? Any experience on the matter?

Best regards,

Anders Båtstrand

Links:

* http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
+ http://www.datastax.com/dev/blog/datetieredcompactionstrategy

Vaughn Vernon

Jul 23, 2015, 11:25:38 PM
to Akka User List, ande...@gmail.com
As a rule you should never delete messages/events from the store. I am not sure of the original motivation for allowing deletes, but for event sourcing it should never happen.

Vaughn

Anders Båtstrand

Jul 24, 2015, 4:24:06 AM
to akka...@googlegroups.com
I only delete after taking a snapshot, so the functionality is intact.

What do you do about disk usage and startup time? These actors never die, so the event log would grow by 20,000 events every hour, forever...

Anders

Viktor Klang

Jul 24, 2015, 4:33:27 AM
to Akka User List

20k/h is about 5.5 events per second. Given about 1 KB per event, that is about 20 MB per hour and 480 MB per day, which is about 171 GB per year, and which, according to this, means it will cost you about $5.13 per year (going down).

Given compression, you could most probably get it down to about 25% of that.

Are you sure you are optimizing for the right thing here?
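Viktor's back-of-the-envelope figures can be sanity-checked with a few lines of arithmetic. The 1 KB/event figure is his stated assumption; the small gap between the ~175 GB computed here and the ~171 quoted above comes down to rounding and KB-vs-KiB units:

```java
// Sanity-check the storage estimate: 20,000 events/hour at ~1 KB each.
public class StorageEstimate {
    public static void main(String[] args) {
        double eventsPerHour = 20_000;
        double kbPerEvent = 1.0; // assumption from the thread

        double eventsPerSecond = eventsPerHour / 3600.0;        // ~5.56
        double mbPerHour = eventsPerHour * kbPerEvent / 1000.0; // 20 MB
        double mbPerDay = mbPerHour * 24;                       // 480 MB
        double gbPerYear = mbPerDay * 365 / 1000.0;             // ~175 GB

        System.out.println(eventsPerSecond);
        System.out.println(mbPerHour);
        System.out.println(mbPerDay);
        System.out.println(gbPerYear);
    }
}
```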



Anders Båtstrand

Jul 24, 2015, 4:47:45 AM
to akka...@googlegroups.com
But this is just one single actor in a much bigger system... In total the event log is growing by 4 GB each day. This is, however, not optimized at all, and we could probably save a lot of space by being smarter about what we persist...

I see no reason to leave the data in the event log, as we have snapshots. Also, when we change the classes we persist, the old events in the store will probably be incompatible with the new actors. The only reason I can think of for not deleting anything is that it is an expensive operation.

Are you saying you generally never delete anything from the event store, at all? Or do you delete manually when needed?

Anders
Viktor Klang

Jul 24, 2015, 4:53:44 AM
to Akka User List
My PoV is that if an event doesn't have any current or potential future value, why persist it in the first place (i.e. "am I being too granular here?")? Otherwise I assume that storing it indefinitely will be valuable, since it can be used for all sorts of analytics down the line.

Anders Båtstrand

Jul 27, 2015, 4:04:33 AM
to Akka User List, viktor...@gmail.com
Some events might not be allowed to be stored beyond a certain time (they contain personal information, positions, etc.).

For various other reasons as well (not related to Akka at all), I will have to show my team I am able to control the disk usage, and delete messages. At a later point, I might be able to argue that we will have value in storing most of the events forever...

I am still interested in whether anyone has tried different compaction strategies with Akka Persistence and Cassandra!

Best regards,

Anders

Daniel Schröter

Jul 31, 2015, 9:50:13 AM
to Akka User List, viktor...@gmail.com, ande...@gmail.com
Hmm, it looks like you would be better off using Kafka for your persistence:
-> Kafka will automatically delete events after 7 days
-> you should create a snapshot every week
-> and keep the latest snapshots...
see https://github.com/krasserm/akka-persistence-kafka/

Best regards
Daniel
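For reference, Daniel's 7-day figure matches Kafka's default broker retention setting. A minimal config sketch (broker-wide in `server.properties`; it can also be overridden per topic):

```properties
# server.properties - delete log segments older than 7 days (Kafka default)
log.retention.hours=168
```

With time-based retention, old journal segments are dropped by the broker itself, so no explicit delete traffic (and no tombstone/compaction pressure, as with Cassandra) is needed from the application.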