Understanding Lagom


Mangesh Deshmukh

Dec 2, 2016, 2:22:49 PM
to Lagom Framework Users
Hi,

I am trying to understand the Lagom Framework with the Chirper application. Pardon my newbie questions; as far as possible, I tried to check whether these have been answered before on the forum or in the documentation.

- Where are the entities stored after the Persist directive? I understand that they are stored in the event log and that the state is updated with the latest event. From the documentation, I understand that active entities are kept in memory. But I guess they are also persisted permanently somewhere (perhaps Cassandra). If it is Cassandra, which part of the code does this? Also, what are the default connection settings for Cassandra if I were to look at the data?

- In a production scenario, if we were to use PersistentEntity, does it mean that all the service instances (nodes) are active and must be taking requests? In other words, is it possible to keep some nodes up that do not take external requests but are open to internal requests?


Thanks,
Mangesh

James Roper

Dec 4, 2016, 6:44:15 PM
to Mangesh Deshmukh, Lagom Framework Users
On 3 December 2016 at 06:22, Mangesh Deshmukh <mang...@gmail.com> wrote:
Hi,

I am trying to understand the Lagom Framework with the Chirper application. Pardon my newbie questions; as far as possible, I tried to check whether these have been answered before on the forum or in the documentation.

- Where are the entities stored after the Persist directive? I understand that they are stored in the event log and that the state is updated with the latest event. From the documentation, I understand that active entities are kept in memory. But I guess they are also persisted permanently somewhere (perhaps Cassandra). If it is Cassandra, which part of the code does this? Also, what are the default connection settings for Cassandra if I were to look at the data?

The entities are not persisted anywhere - that's the point of event sourcing: you just store the events, which can be implemented very simply, distributed easily, and done with very high performance, since it's just an append operation. When an entity needs to be loaded, the events for that entity are loaded, and the event handlers that you've declared then process each event to produce the current entity state.
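
To make that concrete, here's a rough sketch of what an entity can look like with Lagom's Java PersistentEntity API. PostCommand, PostEvent, PostState, AddPost and PostAdded are made-up placeholder classes, not Chirper code, and AddPost is assumed to reply with Done:

import java.util.Optional;
import akka.Done;
import com.lightbend.lagom.javadsl.persistence.PersistentEntity;

// Sketch only - the command/event/state classes are hypothetical placeholders.
public class PostEntity extends PersistentEntity<PostCommand, PostEvent, PostState> {

    @Override
    public Behavior initialBehavior(Optional<PostState> snapshotState) {
        BehaviorBuilder b = newBehaviorBuilder(snapshotState.orElse(PostState.EMPTY));

        // Command handler: persist an event, then reply to the caller.
        b.setCommandHandler(AddPost.class, (cmd, ctx) ->
            ctx.thenPersist(new PostAdded(entityId(), cmd.content),
                evt -> ctx.reply(Done.getInstance())));

        // Event handler: runs when the event is first persisted and again when
        // the event log is replayed, producing the current in-memory state.
        b.setEventHandler(PostAdded.class,
            evt -> state().addPost(evt.content));

        return b.build();
    }
}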

As an optimisation, the entities do hang around in memory for a limited amount of time, as you've pointed out. It's important to note that this is just an optimisation; it doesn't strictly need to be done. Your entity could be reloaded from the persisted events in the database every time you needed to handle a command; it's just that this can be a fairly expensive operation, and since the same entity often tends to be used multiple times in a short timespan, it can be a great benefit to keep entities around in memory for a limited amount of time.

As a further optimisation, Lagom supports something called snapshotting. This is where, every so many events (I think we default to 100), the entity state itself is stored to a database. This means that from then on, to load the entity, Lagom will first load the snapshot, then load all the events since the snapshot and replay them - this ensures that loading an entity doesn't become prohibitively expensive as the number of events grows. But once again, this is just an optimisation, and not something that your application code should have any knowledge of. In fact, at any time you can drop the snapshots table and not lose any data, since it's all still there in the events.
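
Both of those optimisations are driven purely by configuration. Assuming the standard Lagom defaults, the relevant settings look roughly like this in application.conf (do check the key names against the reference.conf of the Lagom version you're on):

# How long an idle entity stays in memory before being passivated.
lagom.persistence.passivate-after-idle-timeout = 120s
# Snapshot the entity state after this many persisted events.
lagom.persistence.snapshot-after = 100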

In dev mode, by default, Cassandra is run on port 4000.  The events are stored, by default, in a table called "messages".
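
As a quick sketch, you can point cqlsh at that port to browse the data. Each service gets its own keyspace, so the keyspace name below is just a guess - list them first and substitute yours:

$ cqlsh localhost 4000
cqlsh> DESCRIBE KEYSPACES;
cqlsh> SELECT persistence_id, sequence_nr FROM friendservice.messages LIMIT 10;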
 
- In a production scenario, if we were to use PersistentEntity, does it mean that all the service instances (nodes) are active and must be taking requests? In other words, is it possible to keep some nodes up that do not take external requests but are open to internal requests?

This question isn't really related to persistent entities. Do you mean HTTP requests? That all depends on the configuration of your service gateway (with ConductR, that would be haproxy). Out of the box there's nothing that really lets you configure this, but there is certainly no reason why you couldn't create two different configurations and tell haproxy to only route to one of them.
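
As a rough illustration only (this isn't something ConductR generates for you, and every name and address below is made up), a hand-written haproxy configuration that sends internal traffic to a separate group of nodes might look like this:

# Route requests from the internal network to the canary nodes,
# everything else to the live nodes.
frontend public
    bind *:9000
    acl internal_net src 10.0.0.0/8
    use_backend canary_nodes if internal_net
    default_backend live_nodes

backend live_nodes
    server node1 10.0.1.1:9000 check
    server node2 10.0.1.2:9000 check

backend canary_nodes
    server node3 10.0.1.3:9000 check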
 
Thanks,
Mangesh




--
James Roper
Software Engineer

Lightbend – Build reactive apps!
Twitter: @jroper

Mangesh Deshmukh

Dec 5, 2016, 12:36:59 PM
to Lagom Framework Users, mang...@gmail.com
Hi James,

Thanks for the detailed response. Please see comments inline.


On Sunday, December 4, 2016 at 3:44:15 PM UTC-8, James Roper wrote:
On 3 December 2016 at 06:22, Mangesh Deshmukh <mang...@gmail.com> wrote:
Hi,

I am trying to understand the Lagom Framework with the Chirper application. Pardon my newbie questions; as far as possible, I tried to check whether these have been answered before on the forum or in the documentation.

- Where are the entities stored after the Persist directive? I understand that they are stored in the event log and that the state is updated with the latest event. From the documentation, I understand that active entities are kept in memory. But I guess they are also persisted permanently somewhere (perhaps Cassandra). If it is Cassandra, which part of the code does this? Also, what are the default connection settings for Cassandra if I were to look at the data?

The entities are not persisted anywhere - that's the point of event sourcing: you just store the events, which can be implemented very simply, distributed easily, and done with very high performance, since it's just an append operation. When an entity needs to be loaded, the events for that entity are loaded, and the event handlers that you've declared then process each event to produce the current entity state.

[MD] Got it. I assume the events are stored in Cassandra as soon as they are generated, and the state resulting from the latest event is kept in memory. I also remember reading somewhere that a request is automatically routed to the service instance that hosts the entity. If so, is the request routing decided by the gateway? Or is it more of a Cassandra functionality?
 
As an optimisation, the entities do hang around in memory for a limited amount of time, as you've pointed out. It's important to note that this is just an optimisation; it doesn't strictly need to be done. Your entity could be reloaded from the persisted events in the database every time you needed to handle a command; it's just that this can be a fairly expensive operation, and since the same entity often tends to be used multiple times in a short timespan, it can be a great benefit to keep entities around in memory for a limited amount of time.

As a further optimisation, Lagom supports something called snapshotting. This is where, every so many events (I think we default to 100), the entity state itself is stored to a database. This means that from then on, to load the entity, Lagom will first load the snapshot, then load all the events since the snapshot and replay them - this ensures that loading an entity doesn't become prohibitively expensive as the number of events grows. But once again, this is just an optimisation, and not something that your application code should have any knowledge of. In fact, at any time you can drop the snapshots table and not lose any data, since it's all still there in the events.

[MD]  Very good explanation. Thanks.

In dev mode, by default, Cassandra is run on port 4000.  The events are stored, by default, in a table called "messages".

[MD] I was able to connect to Cassandra and browse the data. It helps to understand what is happening under the hood.
 
 
- In a production scenario, if we were to use PersistentEntity, does it mean that all the service instances (nodes) are active and must be taking requests? In other words, is it possible to keep some nodes up that do not take external requests but are open to internal requests?
 
This question isn't really related to persistent entities. Do you mean HTTP requests? That all depends on the configuration of your service gateway (with ConductR, that would be haproxy). Out of the box there's nothing that really lets you configure this, but there is certainly no reason why you couldn't create two different configurations and tell haproxy to only route to one of them.

[MD] Correct. It was a totally separate question, unrelated to persistent entities. I should have clarified that.
Yes, I was talking about HTTP requests. My question was based on my assumption that a request gets automatically routed to the service instance hosting the persistent entity. In that case, I thought that if the node is up and there is a request for an entity on that node, it will have to serve the request (internal or external).
The reason one would want to keep some nodes up without taking any live traffic is when we make changes to the service and deploy them. In that case, one would want to test the changes internally first (on a few nodes) before opening them up for external use. Hope I am making sense.
 
 
Thanks,
Mangesh


James Roper

Dec 5, 2016, 9:18:07 PM
to Mangesh Deshmukh, Lagom Framework Users
On 6 December 2016 at 04:36, Mangesh Deshmukh <mang...@gmail.com> wrote:
Hi James,

Thanks for the detailed response. Please see comments inline.

On Sunday, December 4, 2016 at 3:44:15 PM UTC-8, James Roper wrote:
On 3 December 2016 at 06:22, Mangesh Deshmukh <mang...@gmail.com> wrote:
Hi,

I am trying to understand the Lagom Framework with the Chirper application. Pardon my newbie questions; as far as possible, I tried to check whether these have been answered before on the forum or in the documentation.

- Where are the entities stored after the Persist directive? I understand that they are stored in the event log and that the state is updated with the latest event. From the documentation, I understand that active entities are kept in memory. But I guess they are also persisted permanently somewhere (perhaps Cassandra). If it is Cassandra, which part of the code does this? Also, what are the default connection settings for Cassandra if I were to look at the data?

The entities are not persisted anywhere - that's the point of event sourcing: you just store the events, which can be implemented very simply, distributed easily, and done with very high performance, since it's just an append operation. When an entity needs to be loaded, the events for that entity are loaded, and the event handlers that you've declared then process each event to produce the current entity state.

[MD] Got it. I assume the events are stored in Cassandra as soon as they are generated, and the state resulting from the latest event is kept in memory. I also remember reading somewhere that a request is automatically routed to the service instance that hosts the entity. If so, is the request routing decided by the gateway? Or is it more of a Cassandra functionality?

It's neither - it's Akka persistence functionality. Lagom's persistence API is essentially a thin layer on top of Akka persistence and Akka clustering. It's Akka clustering that distributes the entities across your Lagom nodes, using a feature called Akka cluster sharding, and communication between the nodes is done using Akka remoting.

So what happens is that a request comes in from somewhere and hits one of your Lagom service nodes. Which node it hits, and how that is decided, is not relevant at this point; it's a concern of the load balancer/gateway/whatever, and has nothing to do with the persistent entities. The request may have arrived on the node where the entity lives, or it may not. It doesn't matter.

Then, in handling that request, you may issue a command on a persistent entity.  You call:

PersistentEntityRef<MyCommand> ref =
    persistentEntityRegistry.refFor(MyEntity.class, someEntityId);  // refFor takes the entity class and the entity id
CompletionStage<Done> reply = ref.ask(someCommand);  // assuming someCommand declares ReplyType<Done>

Lagom (implemented underneath by Akka persistence) knows which node the entity for someEntityId lives on. When you call ref.ask, if we're already on the node where the entity lives, then that command is just passed in memory to the entity. Otherwise, if it's on a different node, the command will be sent over the network, using Akka's remoting protocol, to the node where the entity lives. There it will be handled, and once handled, the reply will be sent back over the network. This, by the way, is why all your commands, not just your events and state, should implement Jsonable, so that they can be serialized and sent over the network (if they don't, Java serialization will be used, which is slow and has big security problems).
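
A command written that way might look roughly like this - the class and field names are placeholders; the important parts are Jsonable and ReplyType:

import akka.Done;
import com.fasterxml.jackson.annotation.JsonCreator;
import com.lightbend.lagom.javadsl.persistence.PersistentEntity;
import com.lightbend.lagom.serialization.Jsonable;

// Placeholder command: Jsonable means it is serialized as JSON when it has to
// cross the network to reach the entity's node, and ReplyType<Done> declares
// what the caller gets back from ask().
public final class AddItem implements Jsonable, PersistentEntity.ReplyType<Done> {
    public final String item;

    @JsonCreator
    public AddItem(String item) {
        this.item = item;
    }
}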

You also mentioned Cassandra - Cassandra itself also shards the data it stores over many nodes, so if an event gets persisted as a result of handling the command, the sharding of that data is handled by Cassandra.
 
 
As an optimisation, the entities do hang around in memory for a limited amount of time, as you've pointed out. It's important to note that this is just an optimisation; it doesn't strictly need to be done. Your entity could be reloaded from the persisted events in the database every time you needed to handle a command; it's just that this can be a fairly expensive operation, and since the same entity often tends to be used multiple times in a short timespan, it can be a great benefit to keep entities around in memory for a limited amount of time.

As a further optimisation, Lagom supports something called snapshotting. This is where, every so many events (I think we default to 100), the entity state itself is stored to a database. This means that from then on, to load the entity, Lagom will first load the snapshot, then load all the events since the snapshot and replay them - this ensures that loading an entity doesn't become prohibitively expensive as the number of events grows. But once again, this is just an optimisation, and not something that your application code should have any knowledge of. In fact, at any time you can drop the snapshots table and not lose any data, since it's all still there in the events.

[MD]  Very good explanation. Thanks.

In dev mode, by default, Cassandra is run on port 4000.  The events are stored, by default, in a table called "messages".

[MD] I was able to connect to Cassandra and browse the data. It helps to understand what is happening under the hood.
 
 
- In a production scenario, if we were to use PersistentEntity, does it mean that all the service instances (nodes) are active and must be taking requests? In other words, is it possible to keep some nodes up that do not take external requests but are open to internal requests?
 
This question isn't really related to persistent entities. Do you mean HTTP requests? That all depends on the configuration of your service gateway (with ConductR, that would be haproxy). Out of the box there's nothing that really lets you configure this, but there is certainly no reason why you couldn't create two different configurations and tell haproxy to only route to one of them.

[MD] Correct. It was a totally separate question, unrelated to persistent entities. I should have clarified that.
Yes, I was talking about HTTP requests. My question was based on my assumption that a request gets automatically routed to the service instance hosting the persistent entity. In that case, I thought that if the node is up and there is a request for an entity on that node, it will have to serve the request (internal or external).
The reason one would want to keep some nodes up without taking any live traffic is when we make changes to the service and deploy them. In that case, one would want to test the changes internally first (on a few nodes) before opening them up for external use. Hope I am making sense.

You are making sense.  As explained above, the HTTP routing is orthogonal to the persistent entity command routing.

There are possibly ways that what you're describing could be achieved; however, it may not make sense to do so. The way Akka cluster sharding works, either a node is hosting entities or it isn't, and you can't have entities that live on multiple nodes (otherwise changes to those entities wouldn't be transactionally serializable, i.e. two things could make conflicting modifications to them at the same time). So testing things internally doesn't really make sense in that context, unless you segregate some entities so that those entities themselves aren't available externally and are only interacted with internally - but then the exact same thing could be achieved in a test system.

Not quite related to your question, but certainly related to dealing with bad deployments: when you use event sourcing, you have a lot more power in terms of what you can do if a bad deployment happens and things get corrupted. Since you've persisted every event, it's actually quite straightforward to rewind your entire system back to a known good state, just by deleting (or moving to another location for further audit and replay) the events since that point. It's also quite straightforward to audit everything that happened since the bad deploy: you have a list of all the events, and you can analyse them to work out which ones may be illegal or may have caused an inconsistency that needs to be rectified. If you have a problem caused by a deployment bug in a read side, a simple solution is often to drop the read side and let it be rebuilt from scratch. And while these features of ES/CQRS are certainly no substitute for good QA practices, they do let you deploy to production with far more confidence, since you have that safety net if things do go wrong.

 
 
Thanks,
Mangesh




--
James Roper
Software Engineer

Lightbend – Build reactive apps!
Twitter: @jroper


Mangesh Deshmukh

Dec 8, 2016, 1:08:41 PM
to Lagom Framework Users, mang...@gmail.com
Hi James,

Thanks again for the detailed response. It would be good to hear about approaches (however bad they may be) for testing newly released code changes in production internally, before making them publicly available.
I will read up more on Akka clustering. I am in the process of trying out the framework and will come back here if I have any questions.

Thanks,
Mangesh

dheerajk

May 11, 2018, 11:25:06 AM
to Lagom Framework Users [deprecated]
Hi All,

Thank you James and Mangesh. This discussion helped me a lot.
Thanks Again!

Regards,
DheerajK