ClusterSharding and node address/port

Brice Figureau

Feb 27, 2015, 10:16:45 AM2/27/15
to akka...@googlegroups.com
Hi,

I was experimenting with the ClusterSharding system (the one in master
with rememberEntries set to true) and persistence, in a simplistic test
that does the following (a rough sketch of the setup follows the steps):

1) Start a ClusterSharding region on one node only
2) Start an entry and communicate with it
3) Shut down the ActorSystem
4) Start a new ActorSystem (same name)
5) Start the same ClusterSharding region
6) Expect the entry to be recreated by virtue of persistence and
'rememberEntries'
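
For reference, starting the region in my test looks roughly like this.
Take it as a sketch only: the exact start signature kept moving between
2.3.x and master at the time, and Counter, Get, the system name and the
shard count are placeholders of mine:

    import akka.actor.{ Actor, ActorSystem, Props }
    import akka.cluster.Cluster
    import akka.contrib.pattern.{ ClusterSharding, ShardRegion }

    // Placeholder protocol: every command carries the target entry id
    case class Get(counterId: String)

    // Placeholder entry actor
    class Counter extends Actor {
      var count = 0
      def receive = {
        case Get(_) =>
          count += 1
          sender() ! count
      }
    }

    val idExtractor: ShardRegion.IdExtractor = {
      case msg @ Get(id) => (id, msg)
    }

    val shardResolver: ShardRegion.ShardResolver = {
      case Get(id) => (math.abs(id.hashCode) % 10).toString
    }

    val system = ActorSystem("ShardingTest")
    Cluster(system).join(Cluster(system).selfAddress) // single-node cluster

    val region = ClusterSharding(system).start(
      typeName = "Counter",
      entryProps = Some(Props[Counter]),
      roleOverride = None,
      rememberEntries = true, // recreate known entries after a restart
      idExtractor = idExtractor,
      shardResolver = shardResolver)

    region ! Get("counter-1") // step 2: talk to an entry through the region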

To my surprise, it failed. The reason is that I had netty.tcp.port=0 in
my configuration, which assigns a random port to each ActorSystem.

Of course, in 4) the new ActorSystem gets a different port from the one
created at the start of the test. This means that when the
ShardCoordinator events are replayed, the deserialized ActorRef points
to a node other than the current one, which later creates issues since
that node doesn't exist anymore.
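
For reference, the remoting part of my configuration looks roughly like
this (the hostname is hypothetical):

    akka.remote.netty.tcp {
      hostname = "127.0.0.1"
      # port = 0 picks a random free port on every start, so each new
      # ActorSystem gets a different address; a fixed port (e.g. 2551)
      # keeps the node's address stable between restarts
      port = 0
    }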

Now, the problem is not this simplistic test (which I can fix by
assigning a proper fixed port). It's that when this system is used in a
production cluster, there's no guarantee that the same nodes will be
present between restarts of the ShardCoordinator. For instance, if I
completely shut down the production cluster and recreate it on
different EC2 instances with completely different IP addresses, the
ShardCoordinator wouldn't be able to recover properly.

Is it a bug or did I miss something?
--
Brice Figureau <bric...@daysofwonder.com>

Björn Antonsson

Mar 3, 2015, 4:44:54 AM3/3/15
to akka...@googlegroups.com
Hi Brice,

Are you sure that it is the Sharding that is the issue, and not
something in the messages that you send to the sharded actors? As far
as I can see, the sharding itself only persists the string IDs of the
entities. If you don't include any address-specific information in
those IDs, and don't persist actor refs in your sharded entries, you
should be fine.
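
To make that concrete, here is a sketch of my own (an illustration, not
something from your test) of what is safe and unsafe to persist inside
a sharded entry:

    import akka.actor.ActorRef

    // Safe: the event carries only plain data and a string id
    case class Incremented(counterId: String, amount: Int)

    // Unsafe: a serialized ActorRef embeds the node's host, port and
    // uid, which go stale if the cluster comes back on other machines
    case class SubscriberAdded(subscriber: ActorRef)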

B/
--
Björn Antonsson
Typesafe Inc. – Reactive Apps on the JVM
twitter: bantonsson

Björn Antonsson

Mar 3, 2015, 5:37:49 AM3/3/15
to akka...@googlegroups.com
Hi Brice.

I just noticed the other discussion. You are right, the Region that is persisted contains an actor ref.

B/

Patrik Nordwall

Mar 3, 2015, 6:16:40 AM3/3/15
to akka...@googlegroups.com
On Tue, Mar 3, 2015 at 11:37 AM, Björn Antonsson <bjorn.a...@typesafe.com> wrote:
> Hi Brice.
>
> I just noticed the other discussion. You are right, the Region that is persisted contains an actor ref.

and that serialized ref contains full address information and uid, so eventually it will be removed by the watch
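
For illustration (system name, host and uid are hypothetical), such a
serialized ref expands to a full path along the lines of

    akka.tcp://ShardingTest@10.0.0.12:2552/user/sharding/Counter#1234567890

so replayed coordinator state can end up pointing at an incarnation
that no longer exists.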



--

Patrik Nordwall
Typesafe Reactive apps on the JVM
Twitter: @patriknw


Brice Figureau

Mar 4, 2015, 10:10:45 AM3/4/15
to akka...@googlegroups.com
On Tue, 2015-03-03 at 12:16 +0100, Patrik Nordwall wrote:
> On Tue, Mar 3, 2015 at 11:37 AM, Björn Antonsson
> <bjorn.a...@typesafe.com> wrote:
> > Hi Brice.
> >
> > I just noticed the other discussion. You are right, the Region
> > that is persisted contains an actor ref.
>
> and that serialized ref contains full address information and uid, so
> eventually it will be removed by the watch

Yes, that's what I said in the other thread. Still, you might ping a
machine that you don't own anymore (especially in an elastic cloud
system), and worse, you incur a restart delay until the remote system
understands that there's no ActorSystem running there.

I'm not sure there is a solution, though.

Does this deserve a bug report?
--
Brice Figureau <bric...@daysofwonder.com>

Patrik Nordwall

Mar 4, 2015, 10:30:20 AM3/4/15
to akka...@googlegroups.com
No, it works as designed, and based on this I see no reason to change that. Thanks anyway.
/Patrik