How to disable FIFO and retransmit protocols in JGroups?

Gautam Saxena

Nov 16, 2018, 8:46:43 AM
to jgroups-dev
I'm a newbie to JGroups, but based on my understanding of the documentation, one of its key advantages is that one can disable the protocol elements one does not need or want (to achieve better performance). However, when I tried to disable everything that had to do with "FIFO" order of delivery and "guaranteeing delivery", I got the following error:

    Exception in thread "main" java.lang.Exception: events [GET_DIGEST SET_DIGEST ] are required by GMS, but not provided by any of the protocols below it
        at org.jgroups.stack.Configurator.sanityCheck(Configurator.java:320)
        at org.jgroups.stack.Configurator.connectProtocols(Configurator.java:197)
        at org.jgroups.stack.Configurator.setupProtocolStack(Configurator.java:115)
        at org.jgroups.stack.Configurator.setupProtocolStack(Configurator.java:49)
        at org.jgroups.stack.ProtocolStack.setup(ProtocolStack.java:475)
        at org.jgroups.JChannel.init(JChannel.java:965)
        at org.jgroups.JChannel.<init>(JChannel.java:148)
        at org.jgroups.JChannel.<init>(JChannel.java:130)
        at RpcDispatcherTest.start(RpcDispatcherTest.java:29)
        at RpcDispatcherTest.main(RpcDispatcherTest.java:83)
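
(For context, the message above comes from a start-up check that every protocol's requirements are met by some protocol further down the stack. The following is a simplified, JDK-only model of such a check. All names in it are invented for illustration, it is not the JGroups Configurator, and it assumes that NAKACK2 is the protocol supplying GET_DIGEST and SET_DIGEST, which is at least consistent with the stock tcp.xml starting without this error.)

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustration only: a simplified model of a stack sanity check, not JGroups' Configurator. */
final class StackCheckSketch {

    /** A protocol, the events it needs from layers below, and the events it offers to layers above. */
    static final class Proto {
        final String name;
        final Set<String> requiredFromBelow;
        final Set<String> provided;

        Proto(String name, List<String> requiredFromBelow, List<String> provided) {
            this.name = name;
            this.requiredFromBelow = new TreeSet<>(requiredFromBelow);
            this.provided = new TreeSet<>(provided);
        }
    }

    /** Walks the stack bottom-up and reports each requirement that no lower protocol provides. */
    static List<String> check(List<Proto> bottomToTop) {
        List<String> errors = new ArrayList<>();
        Set<String> providedBelow = new TreeSet<>();
        for (Proto p : bottomToTop) {
            Set<String> missing = new TreeSet<>(p.requiredFromBelow);
            missing.removeAll(providedBelow);
            if (!missing.isEmpty())
                errors.add("events " + missing + " are required by " + p.name
                        + ", but not provided by any of the protocols below it");
            providedBelow.addAll(p.provided);
        }
        return errors;
    }

    static final List<String> NONE = Collections.emptyList();
    static final List<String> DIGEST = Arrays.asList("GET_DIGEST", "SET_DIGEST");

    // Assumption: NAKACK2 is the protocol that provides the digest events GMS asks for.
    static final Proto TCP = new Proto("TCP", NONE, NONE);
    static final Proto NAKACK2 = new Proto("NAKACK2", NONE, DIGEST);
    static final Proto GMS = new Proto("GMS", DIGEST, NONE);

    public static void main(String[] args) {
        System.out.println(check(Arrays.asList(TCP, GMS)));          // one error naming GMS
        System.out.println(check(Arrays.asList(TCP, NAKACK2, GMS))); // []
    }
}
```

In this model, a stack with GMS but without NAKACK2 fails the check, while adding NAKACK2 below GMS makes it pass.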


My XML config file looks like this:

    <config xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xmlns="urn:org:jgroups"
            xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/jgroups.xsd">
        <TCP bind_addr="127.0.0.1"
             bind_port="7800"
             recv_buf_size="${tcp.recv_buf_size:130k}"
             send_buf_size="${tcp.send_buf_size:130k}"
             max_bundle_size="64K"
             sock_conn_timeout="300"
             enable_diagnostics="true"
             thread_pool.min_threads="10"
             thread_pool.max_threads="20"
             thread_pool.keep_alive_time="30000"
             stats="false"/>
        <TCPPING initial_hosts="127.0.0.1[7800]"
                 port_range="0" stats="false"/>
        <MERGE3 min_interval="10000"
                max_interval="30000" stats="false"/>
        <FD_SOCK stats="false"/>
        <FD timeout="3000" max_tries="3" stats="false"/>
        <VERIFY_SUSPECT timeout="1500" stats="false"/>

        <pbcast.GMS print_local_addr="true" join_timeout="2000"
                    view_bundling="true" stats="false"/>
    </config>


If I comment out the last protocol (the pbcast.GMS one), I do NOT get errors and it "seems" to work on a single Windows VM (on Google Cloud), but if I start up a 2nd JVM (still on the same Windows machine), then I notice that each JVM is in a "separate" cluster and doesn't see the other. With the "normal tcp.xml" config, which includes the NAKACK2 and UNICAST3 protocols, e.g.

    <pbcast.NAKACK2 use_mcast_xmit="false"
                    discard_delivered_msgs="true"
                    stats="false"/>
    <UNICAST3 stats="false"/>
    <!--<pbcast.STABLE desired_avg_gossip="50000"-->
    <!--max_bytes="4M"/>-->

everything works "as expected", i.e. if I start a 2nd JVM on the same Windows machine, the 2nd JVM does appear to join the 1st JVM's cluster, so messages sent on the 2nd JVM appear in the 1st JVM and vice versa.

So, is there a way to disable UNICAST3 and NAKACK2 (essentially, anything that has to do with FIFO ordering or guaranteeing message delivery) but still include the logic needed to ensure a "working complete cluster" that also tracks which nodes leave/join the cluster (e.g. the pbcast.GMS logic)? I couldn't figure out how.

(Background info: I'm trying to improve performance, and I suspect the somewhat slow performance is because of the "guaranteed message delivery" and "FIFO" protocols, which I do not think I need because a) I'm using TCP and b) the messages can be sent in any order. (That said, I'm assuming that TCP, almost by definition, guarantees message delivery, since that's critical.) I'm also on Google Cloud, where I think the "guaranteeing" aspect of TCP logic runs on highly optimized routers and multicast is not allowed anyway, which suppresses one of the main advantages of UDP multicast.)

Finally (though I do *NOT* think this is needed), here's my test code (which is just a slight modification of the demo that comes with JGroups 4.0):

    import org.jgroups.Address;
    import org.jgroups.JChannel;
    import org.jgroups.Message;
    import org.jgroups.blocks.*;
    import org.jgroups.util.RspList;
    import org.jgroups.util.Util;

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.stream.IntStream;

    public class RpcDispatcherTest {
        JChannel channel;
        RpcDispatcher disp;
        RspList rsp_list;
        String props = "gs-tcp.xml"; // set by application

        // The method invoked remotely; it simply echoes its argument
        public static int print(int number) throws Exception {
            return number;
        }

        public void start() throws Exception {
            RequestOptions opts = new RequestOptions(ResponseMode.GET_FIRST, 1000);
            channel = new JChannel(props);
            disp = new RpcDispatcher(channel, this);
            channel.connect("RpcDispatcherTestGroup");

            final Address myCurAddress = channel.getAddress();
            System.out.println("Current address is " + myCurAddress + " all members address are " + channel.getView().getMembers().toString());

            final long t1 = System.currentTimeMillis();
            final IntStream x = IntStream.range(0, 1_000_000);
            final AtomicInteger cnt = new AtomicInteger();
            x.asLongStream().parallel().forEach(l -> {
                try {
                    final int i = (int) l;
                    if (i % (100) == 0) {
                        System.out.println("At " + i + " on thread " + Thread.currentThread().getId());
                    }

                    // Call print(i) on our own address and handle the result asynchronously
                    final MethodCall call = new MethodCall(getClass().getMethod("print", int.class));
                    call.setArgs(i);
                    final CompletableFuture<Integer> response = disp.<Integer>callRemoteMethodWithFuture(myCurAddress, call, opts);
                    response.thenAccept(integer -> {
                        if (integer % (1024*8) == 0) {
                            System.out.println("At " + cnt.incrementAndGet() + " Execution time for " + integer + " is " + (System.currentTimeMillis() - t1)/1000f);
                        }
                    });
                } catch (Exception e) {
                    e.printStackTrace();
                }
            });

            //   Util.close(disp, channel);
        }

        public static void main(String[] args) throws Exception {
            new RpcDispatcherTest().start();
        }
    }

Bela Ban

Nov 16, 2018, 9:07:23 AM
to jgrou...@googlegroups.com
If you remove GMS, then you won't have group membership, which renders
JGroups pretty much unusable...

There are two ways you can achieve what you want:

#1 Run a full stack (incl GMS, NAKACK2 and UNICAST3) and - when the
cluster has formed (and GMS has done its work) - remove the protocols
you don't need, e.g. 'probe.sh remove-protocol=UNICAST3'. Then run your
test.

#2 Keep the default configuration (including GMS,NAKACK2 and UNICAST3),
but add message flags to your messages/RPCs:
- OOB: this means that a message is delivered in parallel to other
messages from the same sender, breaking FIFO semantics (you mentioned
this is fine).
- NO_RELIABILITY: this bypasses UNICAST3 / NAKACK2

You can also remove a few protocols from the default config, e.g. BARRIER,
STABLE, MFC/UFC etc.
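
To make the OOB trade-off concrete, here is a small JDK-only model. It is not JGroups code, and the names DeliverySketch, deliverRegular and deliverOob are invented for illustration. Regular messages from one sender pass through a single worker and are therefore handled in send order, while OOB-style messages are handed to a shared pool and may be handled concurrently in any order. It does not model NO_RELIABILITY. In JGroups itself the flags are set on the Message or on the RequestOptions of an RPC, as described in the manual's section on message flags.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Illustration only: a JDK model of ordered vs. out-of-band delivery, not JGroups code. */
final class DeliverySketch {

    /** Regular delivery: one worker per sender, so messages are handled in send order. */
    static List<Integer> deliverRegular(int count) throws InterruptedException {
        return deliver(count, Executors.newSingleThreadExecutor());
    }

    /** OOB-style delivery: a shared pool handles messages concurrently; order is not guaranteed. */
    static List<Integer> deliverOob(int count) throws InterruptedException {
        return deliver(count, Executors.newFixedThreadPool(4));
    }

    private static List<Integer> deliver(int count, ExecutorService pool) throws InterruptedException {
        List<Integer> delivered = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < count; i++) {
            final int msg = i;
            pool.execute(() -> delivered.add(msg)); // "deliver" message msg to the application
        }
        pool.shutdown();
        if (!pool.awaitTermination(10, TimeUnit.SECONDS))
            throw new IllegalStateException("delivery did not finish in time");
        return delivered;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("regular: " + deliverRegular(10)); // elements 0..9 in send order
        System.out.println("oob:     " + deliverOob(10));     // same elements, order may vary
    }
}
```

If the application really does tolerate reordering, as stated in the question, the second mode removes the per-sender serialization without giving up any delivered message.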


--
Bela Ban | http://www.jgroups.org

Bela Ban

Nov 16, 2018, 9:14:19 AM
to jgrou...@googlegroups.com
Info on message flags:
http://www.jgroups.org/manual4/index.html#MessageFlags

I've run tests with minimal protocol stacks before and removing
protocols such as UNICAST3 or NAKACK2 made almost no difference.

A much bigger performance increase could be had by
- Sizing the thread pool correctly
- Using OOB RPCs/messages
- Handling MessageBatches at the application level
(Receiver.receive(MessageBatch))
- Using async invocations [1] etc

[1] http://www.jgroups.org/manual4/index.html#AsyncInvocation
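
As a rough illustration of why handling batches at the application level can help, the JDK-only sketch below compares a receiver that takes a lock once per message with one that takes it once per batch. The names (BatchReceiverSketch, receiveOne, receiveBatch) are invented and this is not the JGroups Receiver or MessageBatch API. The work done is the same in both cases; only the per-message overhead differs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Illustration only: hypothetical receiver, not the JGroups Receiver/MessageBatch API. */
final class BatchReceiverSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private long total;            // application state guarded by lock
    private int lockAcquisitions;  // counts how often the lock was taken

    /** Per-message path: one lock acquisition for every message. */
    void receiveOne(int payload) {
        lock.lock();
        try {
            lockAcquisitions++;
            total += payload;
        } finally {
            lock.unlock();
        }
    }

    /** Batch path: one lock acquisition for the whole batch. */
    void receiveBatch(List<Integer> batch) {
        lock.lock();
        try {
            lockAcquisitions++;
            for (int payload : batch)
                total += payload;
        } finally {
            lock.unlock();
        }
    }

    long total() { return total; }
    int lockAcquisitions() { return lockAcquisitions; }

    public static void main(String[] args) {
        List<Integer> batch = Arrays.asList(1, 2, 3, 4, 5);

        BatchReceiverSketch perMessage = new BatchReceiverSketch();
        for (int payload : batch)
            perMessage.receiveOne(payload);

        BatchReceiverSketch batched = new BatchReceiverSketch();
        batched.receiveBatch(batch);

        // Same total (15) either way, but 5 lock acquisitions versus 1
        System.out.println(perMessage.total() + " / " + perMessage.lockAcquisitions());
        System.out.println(batched.total() + " / " + batched.lockAcquisitions());
    }
}
```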


Bela Ban

Nov 16, 2018, 9:18:07 AM
to jgrou...@googlegroups.com
Sorry, last email :-)

The way you create your method call is wasteful and will affect perf; I
suggest using MethodLookup instead. This replaces "print" with a short
(a numeric method ID), and therefore completely bypasses reflection on
the receiver.
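
The idea behind this suggestion can be sketched with the JDK alone: resolve each remotely callable method once, register it under a short ID, and send only the ID, so that the receiver does a map lookup instead of a search by method name on every call. The names below (MethodIdLookupSketch, register, invoke, PRINT_ID) are made up for illustration and are not the JGroups API; the real MethodLookup and MethodCall interfaces are described in the JGroups documentation.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

/** Illustration only: hypothetical names, not the JGroups MethodLookup API. */
final class MethodIdLookupSketch {
    static final short PRINT_ID = 1; // hypothetical ID standing in for "print"

    // ID -> Method table, filled once at startup
    private final Map<Short, Method> methods = new HashMap<>();

    // Same shape as the print(int) method in the test program
    public static int print(int number) {
        return number;
    }

    /** Resolves the method by name once and stores it under a short ID. */
    void register(short id, String name, Class<?>... paramTypes) throws NoSuchMethodException {
        methods.put(id, MethodIdLookupSketch.class.getMethod(name, paramTypes));
    }

    /** Receiver side: a map lookup replaces the per-call search by method name. */
    Object invoke(short id, Object... args) throws Exception {
        Method m = methods.get(id);
        if (m == null)
            throw new IllegalArgumentException("no method registered for id " + id);
        return m.invoke(null, args); // static method, so no target instance
    }

    public static void main(String[] args) throws Exception {
        MethodIdLookupSketch lookup = new MethodIdLookupSketch();
        lookup.register(PRINT_ID, "print", int.class);
        System.out.println(lookup.invoke(PRINT_ID, 42)); // prints 42
    }
}
```

In the test program, the equivalent change would be to stop calling getClass().getMethod("print", int.class) inside the loop and build the call from the pre-registered ID instead.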
