[eq-dev] RSP / multicast details

Petros.Kataras

Nov 20, 2014, 12:40:55 PM
to eq-...@equalizergraphics.com
Hello,

I am trying to set up a cluster with 1 master and 9 slaves that will use RSP
for mapping and delta distribution, but I am not sure exactly how the config
file should look on the master side, i.e. the appNode.

Here is the config file that I have created so far:

#Equalizer 1.2 ascii

global
{
EQ_WINDOW_IATTR_HINT_FULLSCREEN ON
EQ_WINDOW_IATTR_HINT_SWAPSYNC OFF
}

server
{
connection { hostname "172.18.57.11" }
config
{
appNode
{
name "appNode"
host "sw-master"
connection{ type TCPIP }
connection
{
#type RSP
#port 12333
#interface "172.18.57.11"
#hostname "239.172.18.57"
hostname "172.18.57.11"
}
pipe
{
device 0
window
{
name "master"
attributes { hint_fullscreen OFF }
viewport [ .0 .0 1920 1080 ]
channel { name "master-channel" }
}
}
}
node
{
name "node1"
host "sw-renderer01"
connection{ type TCPIP }
connection
{
type RSP
port 12333
interface "172.18.57.12"
hostname "239.172.18.57"
}
pipe
{
device 0
window
{
name "node1"
viewport [ 0 0 1 1 ]
channel { name "node1-channel" }
}
}
}
node
{
name "node2"
host "sw-renderer02"
connection{ type TCPIP }
connection
{
hostname "239.172.18.57"
interface "172.18.57.13"
type RSP
port 12333
}
pipe
{
device 0
window
{
name "node2"
viewport [ 0 0 1 1 ]
channel { name "node2-channel" }
}
}
}
node
{
name "node3"
host "sw-renderer03"
connection{ type TCPIP }
connection
{
hostname "239.172.18.57"
interface "172.18.57.14"
type RSP
port 12333
}
pipe
{
device 0
window
{
name "node3"
viewport [ 0 0 1 1 ]
channel { name "node3-channel" }
}
}
}
node
{
name "node4"
host "sw-renderer04"
connection{ type TCPIP }
connection
{
hostname "239.172.18.57"
interface "172.18.57.15"
type RSP
port 12333
}
pipe
{
device 0
window
{
name "node4"
viewport [ 0 0 1 1 ]
channel { name "node4-channel" }
}
}
}
node
{
name "node5"
host "sw-renderer05"
connection{ type TCPIP }
connection
{
hostname "239.172.18.57"
interface "172.18.57.16"
type RSP
port 12333
}
pipe
{
device 0
window
{
name "node5"
viewport [ 0 0 1 1 ]
channel { name "node5-channel" }
}
}
}
node
{
name "node6"
host "sw-renderer06"
connection{ type TCPIP }
connection
{
hostname "239.172.18.57"
interface "172.18.57.17"
type RSP
port 12333
}
pipe
{
device 0
window
{
name "node6"
viewport [ 0 0 1 1 ]
channel { name "node6-channel" }
}
}
}
node
{
name "node7"
host "sw-renderer07"
connection{ type TCPIP }
connection
{
hostname "239.172.18.57"
interface "172.18.57.18"
type RSP
port 12333
}
pipe
{
device 0
window
{
name "node7"
viewport [ 0 0 1 1 ]
channel { name "node7-channel" }
}
}
}
node
{
name "node8"
host "sw-renderer08"
connection{ type TCPIP }
connection
{
hostname "239.172.18.57"
interface "172.18.57.19"
type RSP
port 12333
}
pipe
{
device 0
window
{
name "node8"
viewport [ 0 0 1 1 ]
channel { name "node8-channel" }
}
}
}
node
{
name "node9"
host "sw-renderer09"
connection{ type TCPIP }
connection
{
hostname "239.172.18.57"
interface "172.18.57.20"
type RSP
port 12333
}
pipe
{
device 0
window
{
name "node9"
viewport [ 0 0 1 1 ]
channel { name "node9-channel" }
}
}
}

layout { view { }}
canvas
{
layout 0
wall
{
bottom_left [ -1.6 -.5 -1 ]
bottom_right [ 1.6 -.5 -1 ]
top_left [ -1.6 .5 -1 ]
}
swapbarrier {}

segment { viewport [ 0 0 1 1 ] channel "master-channel" }
segment { viewport [ 0.00000000000000000000 0 0.11111111111111111111 1 ] channel "node1-channel" }
segment { viewport [ 0.11111111111111111111 0 0.11111111111111111111 1 ] channel "node2-channel" }
segment { viewport [ 0.22222222222222222222 0 0.11111111111111111111 1 ] channel "node3-channel" }
segment { viewport [ 0.33333333333333333333 0 0.11111111111111111111 1 ] channel "node4-channel" }
segment { viewport [ 0.44444444444444444444 0 0.11111111111111111111 1 ] channel "node5-channel" }
segment { viewport [ 0.55555555555555555555 0 0.11111111111111111111 1 ] channel "node6-channel" }
segment { viewport [ 0.66666666666666666666 0 0.11111111111111111111 1 ] channel "node7-channel" }
segment { viewport [ 0.77777777777777777777 0 0.11111111111111111111 1 ] channel "node8-channel" }
segment { viewport [ 0.88888888888888888888 0 0.11111111111111111111 1 ] channel "node9-channel" }
}
}
}
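
The nine render-node segments in the config above each take one ninth of the canvas width. As a convenience, the lines can be generated instead of typed by hand. The following is only a sketch in Python; it is not part of Equalizer, and the function name and the number of printed digits are assumptions made for illustration.

```python
from fractions import Fraction


def segment_lines(num_nodes, digits=10):
    """Return one 'segment' line per render node, ordered left to right."""
    width = Fraction(1, num_nodes)  # exact width of one segment
    lines = []
    for i in range(num_nodes):
        x = float(i * width)  # left edge of segment i
        lines.append(
            'segment { viewport [ %.*f 0 %.*f 1 ] channel "node%d-channel" }'
            % (digits, x, digits, float(width), i + 1)
        )
    return lines


if __name__ == "__main__":
    print("\n".join(segment_lines(9)))
```

The values are rounded rather than truncated, so the last digit can differ from the hand-written fractions; this only matters if the exact text of the config has to match.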

Now, the appNode section is the one that I am not sure about. Going
through older posts, I see that people were also subscribing the appNode to
the multicast group, but if I try to do that by uncommenting the relevant
lines in the config file, everything hangs.

On the other hand, if I just run the config file as it is, I can see
that the slaves properly subscribe to the appropriate multicast group and
the sample app runs properly, but I am not sure whether the traffic is then
routed through the multicast address or not.

The clients are auto-launched by the server btw and I haven't tried manually
pre-starting everything.

So, does the appNode need to subscribe to the multicast group, or is it
sufficient to leave it like this?

In case the appNode actually has to subscribe to the multicast group, do
you see anything wrong with the above config file that would cause it to
hang?

Thanks for any insights,
Petros



--
View this message in context: http://software.1713.n2.nabble.com/RSP-mutlicast-details-tp7586748.html
Sent from the Equalizer - Parallel Rendering mailing list archive at Nabble.com.

_______________________________________________
eq-dev mailing list
eq-...@equalizergraphics.com
http://www.equalizergraphics.com/cgi-bin/mailman/listinfo/eq-dev
http://www.equalizergraphics.com

ROHN Carsten

Nov 21, 2014, 2:26:00 AM
to Equalizer Developer List
Hey Petros,

The config you posted will not send any RSP packets. Clients listen to RSP, but the appNode doesn't send anything. If you uncomment the commented stuff on the appNode, RSP starts working, and that's probably why it hangs ;)

There could be a lot of reasons for hanging. I recommend experimenting with coNetPerf first to check whether multicast really works in your network with your setup.

Server1: coNetPerf -s RSP#102400#239.172.18.57#172.18.57.12#12333##
Server2: coNetPerf -s RSP#102400#239.172.18.57#172.18.57.13#12333##
Serverx: ...
Client: coNetPerf -c RSP#102400#239.172.18.57#172.18.57.11#12333##

Does this test show packets being sent to the servers as you expect it? If yes, your problem is not multicast, but how Collage handles multicast. And that's where the debugging starts, I'm afraid. Where exactly does it hang?

Cheers,
Carsten

PS: A general note: We found multicast to be working pretty poorly for mapping, so we had to use the *CM::push() functions to distribute data and mapped the created objects to VERSION_NONE afterwards. This scales pretty well.
This email and any attachments are intended solely for the use of the individual or entity to whom it is addressed and may be confidential and/or privileged.

If you are not one of the named recipients or have received this email in error,

(i) you should not read, disclose, or copy it,

(ii) please notify sender of your receipt by reply email and delete this email and all attachments,

(iii) Realtime Technology does not accept or assume any liability or responsibility for any use of or reliance on this email.

For other languages, go to http://www.3ds.com/terms/email-disclaimer

Petros Kataras

Nov 21, 2014, 4:12:35 AM
to Equalizer Developer List
Hi Rohn,

thanks for your reply!

I am pretty sure that multicast works properly on the network setup since I tested with a simple node.js client/server script and I am able to receive multicast in this case..

The only problem that I faced in that case was that I had to listen on 0.0.0.0 in order to actually receive the multicast, i.e. if I tried to listen for multicast explicitly on the 172 interface, I was not receiving anything.

I am not sure though if this is expected behavior with node.js or not..
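
For what it's worth, this behaviour is common for UDP multicast in general and not specific to node.js: on Linux, a socket bound to a unicast interface address typically does not receive datagrams addressed to the group, so the usual pattern is to bind to the wildcard address and select the interface in the group membership request instead. A minimal Python sketch of that pattern follows, reusing the group, port and one interface address from the config; the function names are hypothetical, and nothing here reflects how Collage implements RSP.

```python
import socket

GROUP = "239.172.18.57"  # multicast group from the config
PORT = 12333             # RSP port from the config
IFACE = "172.18.57.12"   # local address of the interface to join on


def membership_request(group, iface):
    """Build the 8-byte ip_mreq value: group address + local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(iface)


def open_receiver(group=GROUP, port=PORT, iface=IFACE):
    """Bind to the wildcard address, then join the group on one interface."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))  # wildcard bind, equivalent to 0.0.0.0
    sock.setsockopt(
        socket.IPPROTO_IP,
        socket.IP_ADD_MEMBERSHIP,
        membership_request(group, iface),
    )
    return sock

# Usage on a machine that owns IFACE:
#   receiver = open_receiver()
#   data, sender = receiver.recvfrom(65535)
```

The interface is chosen by the second half of the membership request, not by the bind address, which is why listening on 0.0.0.0 can still receive traffic from the intended subnet only.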

Also, something that I didn't mention in my previous post is that the appNode has two network interfaces: one to communicate with the external world and one for the local subnet (of course with the multicast route configured for the appropriate interface). As I said though, this seems to work in general, judging from the small test that I did with node.js.

I will also try with coNetPerf today and come back to you with more details, including where exactly the appNode hangs.

Thanks again,
Petros

Petros Kataras

Nov 21, 2014, 11:14:33 AM
to Equalizer Developer List
Ok, looking again at the logs on the slave side, I realized that they were trying to resolve the hostname of the master, so I added it to the known hosts and RSP seems to work now.
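
A quick way to catch this kind of name-resolution problem before launching is to check, on every machine, that all host names used in the config resolve. The sketch below is a plain Python helper with a hypothetical function name; the host list is taken from the config in the first post.

```python
import socket


def unresolvable(hostnames, resolve=socket.gethostbyname):
    """Return the host names that fail to resolve on this machine."""
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            failed.append(name)
    return failed


# Host names from the config: the master plus sw-renderer01..sw-renderer09.
CLUSTER_HOSTS = ["sw-master"] + ["sw-renderer%02d" % i for i in range(1, 10)]

# Usage, run on each node:
#   print(unresolvable(CLUSTER_HOSTS))
```

An empty list on every node means name resolution is at least not the reason for a hang; it says nothing about whether multicast itself is routed correctly.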

I'm having other issues though that do not manifest when running the app normally without RSP.

I have a shared object that actually seems to map and work fine but I get an LBUNREACHABLE from objectStore.cpp line 700 . Reading through the source I can't figure exactly where the issue might lie and as I mentioned the object seems to register and map properly on the master and client side respectively..

Besides that I am also getting a segfault when exiting the app inside EventConnection::getNotifier() when running the RSP config. This segfault happens also with eqPly / seqPly and so on but I don't really have time to debug it right now...

Anyway, I think for now I am going to stick with unicast and maybe give it another try later on.

Thanks for the help anyways..

Cheers,
Petros

ROHN Carsten

Nov 24, 2014, 2:04:43 AM
to Equalizer Developer List
I don't know what your line 700 looks like, unfortunately. But the segfault we resolved locally by a simple null pointer check.

Carsten

Petros Kataras

Nov 24, 2014, 10:04:36 AM
to Equalizer Developer List
Hi Rohn,

Thanks for your reply and the hints. I didn't step into the function to
see where exactly it crashes, so it's good to know that the fix is as
simple as that.

The second problem, on the other hand, was my fault (some old
code not properly cleaned up), so RSP seems to actually work fine now,
although from purely qualitative tests it seems to me that the unicast
approach performs more smoothly.

For example, with 1000 shared objects, just hitting a button and updating
one bool flag seems to block for a fraction of a second or so with RSP,
whereas with the standard approach I don't seem to experience this issue.

Mapping also seems faster without RSP, although from what I've read in
previous posts this can be expected.

Anyways, thanks for the insights and all the best,

Petros

Petros Kataras

Nov 24, 2014, 10:09:26 AM
to Equalizer Developer List
I also just realized that I've been mixing up your surname and your first
name in the last emails :|

Sorry for that Carsten !

Cheers,
Petros

Stefan Eilemann

Nov 26, 2014, 10:58:54 AM
to Equalizer Developer List

On 21. Nov 2014, at 17:12, Petros Kataras <petros....@aec.at> wrote:

> I have a shared object that actually seems to map and work fine but I get an LBUNREACHABLE from objectStore.cpp line 700 . Reading through the source I can't figure exactly where the issue might lie and as I mentioned the object seems to register and map properly on the master and client side respectively..

When does this happen exactly? The node crashing receives an object command for a specific instance which can’t be found. This is most unusual. I’ve added more debug output, can you try with ?

>
> Besides that I am also getting a segfault when exiting the app inside EventConnection::getNotifier() when running the RSP config. This segfault happens also with eqPly / seqPly and so on but I don't really have time to debug it right now...
>
> Anyways I think for now I am gonna stick with unicast and maybe later on I ll give it a try again..

My recommendation would also be to only use multicast if you have a serious bottleneck in distributing data. The technology is simply not mature.



Cheers,

Stefan.



Stefan Eilemann

Nov 26, 2014, 11:00:29 AM
to Equalizer Developer List

On 21. Nov 2014, at 17:12, Petros Kataras <petros....@aec.at> wrote:

> I have a shared object that actually seems to map and work fine but I get an LBUNREACHABLE from objectStore.cpp line 700 . Reading through the source I can't figure exactly where the issue might lie and as I mentioned the object seems to register and map properly on the master and client side respectively..

When does this happen exactly? The node crashing receives an object command for a specific instance which can’t be found. This is most unusual. I’ve added more debug output, can you try with '[master f941a79] Increase error output'?

>
> Besides that I am also getting a segfault when exiting the app inside EventConnection::getNotifier() when running the RSP config. This segfault happens also with eqPly / seqPly and so on but I don't really have time to debug it right now...
>
> Anyways I think for now I am gonna stick with unicast and maybe later on I ll give it a try again..


Petros Kataras

Nov 26, 2014, 4:44:09 PM
to Equalizer Developer List
Hi Stefan,

there was a follow-up on this post: the weird object command issue was my fault (a leftover from not carefully resolving a merge conflict in a specific file). This is fixed and I am not getting that error anymore.

I was able to run the RSP configuration properly, but as I mentioned in the previous post, purely qualitatively (I didn't do any actual measurements) the unicast approach still seems smoother, so I am sticking with it for now.

Right now I am dropping from 60fps to around 40fps if I go with more than 200 distributed objects that get updated every frame (e.g. syncing a position). This is locally, with 1 master and 1 slave running on the same machine.

I would be interested in any experiences people might have in dealing with a large number of shared objects that need to be synced very often, and of course any suggestions for a more optimized path are always welcome.

That said, I understand that in these cases application-specific parameters and requirements also play a critical role.

I will try out some stuff now that we have this setup here and report back any findings.

Best,
Petros

ROHN Carsten

Nov 27, 2014, 2:31:16 AM
to Equalizer Developer List
Hey Petros,

I don't have experience with a large number of objects getting updated, but I do with big object updates (like a texture update every frame). Multicast performs a lot better than unicast in this scenario, especially with a higher number of clients.

We have customers where multicast is not performing well, similar to what you describe. I assume this is due to network parameters or the software environment (traffic sniffers, aka virus scanners!). That's why, in my hard-earned experience, multicast might be a viable solution where you have full control over the network and computers, but not if you want to roll it out to different (customer) clusters.

There are also quite a number of parameters to tweak (co::global), which optimize the behavior of the protocol regarding packet loss. In my experience it makes sense to set the MTU smaller (~10k, on Windows), the ack timeout to 5 or even 10, and the scale-down parameter bigger than the scale-up parameter (2-3x). You might get smoother behavior.

In general, if you have a lot of object updates, it might also be worth moving some logic to the clients and calculating object changes there. In other words, distribute more high-level data (less data, fewer objects) instead of the lowest-level data. Unfortunately, that's not possible in every case.
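
As a rough illustration of distributing higher-level data, the sketch below sends only a seed and a time value per frame and lets every node derive all particle positions from them, instead of syncing one position object per particle. It is plain Python with made-up names and a made-up motion model, and it does not use the Collage API.

```python
import math
import random


def particle_positions(seed, time, count):
    """Derive `count` 2D positions from a shared seed and the frame time."""
    rng = random.Random(seed)  # same seed gives the same sequence on every node
    positions = []
    for _ in range(count):
        radius = rng.uniform(0.1, 1.0)
        speed = rng.uniform(0.5, 2.0)
        phase = rng.uniform(0.0, 2.0 * math.pi)
        angle = phase + speed * time
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions


# Per frame, the master would distribute only (seed, time), i.e. two numbers,
# rather than `count` separate position updates.
master = particle_positions(seed=42, time=1.5, count=1000)
client = particle_positions(seed=42, time=1.5, count=1000)
assert master == client
```

This only works when the simulation is deterministic given its inputs; anything driven by local state (user input, physics with accumulated error) still needs the results to be distributed. Floating-point results may also differ in the last bits between platforms, which is usually invisible in rendering.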

Cheers,
Carsten


Petros Kataras

Nov 27, 2014, 4:07:00 PM
to Equalizer Developer List
Hi Carsten,

thank you for sharing your experiences and suggestions.

In our case we usually have control over network settings and computer configurations (99.9% of the time running some Linux flavor), so the issue, at least here, doesn't seem to be related to some weird configuration.

Really good to know that there is co::global where I can play around with the settings. I will definitely give it a try and come back with any results I might have.

As for the suggestion to move some logic to the clients, this is something that I am already doing quite a lot where I can, but as you also mentioned, unfortunately it is not always possible. Right now, for example, I am implementing some particle streams for generative visuals on a media installation and I am hitting these kinds of limitations.

I will try out some stuff though and will report back any findings.

Cheers,
Petros

Stefan Eilemann

Nov 28, 2014, 1:57:21 AM
to eq-...@equalizergraphics.com

On 26. Nov 2014, at 22:44, Petros.Kataras [via Software] <ml-node+s17...@n2.nabble.com> wrote:

> Right now I am dropping from 60fps to around 40fps if I go with more than 200 distributed objects that get updated every frame ( i.e syncing a position for example ).. This is locally with 1 master and 1 slave running on the same machine..

I would profile (vtune) it to see the hot spot. There are multiple possible optimisations:

* Multi-thread the commit/syncs
* Optimize your serializers
* Use static objects
* Disable compression if you have a fast network

All of them are just guesses, hence the need for profiling and benchmarking.
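
One possible reading of "optimize your serializers" is to send only the fields that changed since the last commit. The sketch below shows the general dirty-bit idea in Python; the class, the bit names and the wire format are invented for illustration and are not the Collage serialization API.

```python
import struct

DIRTY_POSITION = 0x1
DIRTY_VISIBLE = 0x2


class SharedParticle:
    """Toy shared object that tracks which fields changed since the last pack."""

    def __init__(self):
        self.position = (0.0, 0.0, 0.0)
        self.visible = True
        self._dirty = 0

    def set_position(self, x, y, z):
        self.position = (x, y, z)
        self._dirty |= DIRTY_POSITION

    def set_visible(self, flag):
        self.visible = flag
        self._dirty |= DIRTY_VISIBLE

    def pack_delta(self):
        """Serialize only the changed fields, prefixed by the dirty mask."""
        data = struct.pack("<B", self._dirty)
        if self._dirty & DIRTY_POSITION:
            data += struct.pack("<3f", *self.position)
        if self._dirty & DIRTY_VISIBLE:
            data += struct.pack("<?", self.visible)
        self._dirty = 0  # everything is clean after a commit
        return data

    def apply_delta(self, data):
        """Apply a delta produced by pack_delta() on another instance."""
        (mask,) = struct.unpack_from("<B", data, 0)
        offset = 1
        if mask & DIRTY_POSITION:
            self.position = struct.unpack_from("<3f", data, offset)
            offset += 12  # three 4-byte floats
        if mask & DIRTY_VISIBLE:
            (self.visible,) = struct.unpack_from("<?", data, offset)
```

With this layout, flipping a single bool costs two bytes on the wire instead of the full object. Whether that is the actual hot spot is exactly what the profiling is meant to show.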


HTH,

Stefan.


