HZ using RDMA (for NIC scale across IP) and /dev/shm as transport (on HPC Linux supercomputing scale)


Ben Cotton

Jul 19, 2017, 5:34:52 PM
to Hazelcast

Hi HZ,

I am considering an employment opportunity with a firm that uses HZ (on both HPC Linux supercomputers and clusters of Exalogic-scale Linux servers connected entirely by fiber InfiniBand networks) to render and aggregate quantitative market/liquidity/credit risk interactively and in real time.

I have two questions:

1.  Does HZ support /dev/shm as a transport when all HZ JVM instances run on the same supercomputer (where scale is achieved physically by adding processing blades on the same HPC backbone)?  Or does HZ necessarily require (TCP/UDP)/IP whenever multiple JVMs are used to scale?

2.  Does HZ formally support Java over RDMA (via the Sockets Direct Protocol), empowering HZ grids to scale JVM instances without (TCP/UDP)/IP when the physical NICs are fiber InfiniBand (necessary for Java SDP to work)?

Thanks!
Ben

ihsan demir

Jul 20, 2017, 6:19:09 AM
to Hazelcast
To my knowledge, Hazelcast clusters only work with TCP/IP today.
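For what it's worth, that TCP/IP transport is configured explicitly in the member XML. A minimal sketch of a static TCP/IP join for Hazelcast 3.x (member addresses and port are placeholders, not from this thread):

```xml
<hazelcast>
  <network>
    <port auto-increment="true">5701</port>
    <join>
      <!-- Cluster discovery also runs over IP: disable multicast
           and list the member addresses explicitly. -->
      <multicast enabled="false"/>
      <tcp-ip enabled="true">
        <member>10.0.0.1</member>
        <member>10.0.0.2</member>
      </tcp-ip>
    </join>
  </network>
</hazelcast>
```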

Regards,
ihsan

On Thursday, July 20, 2017 at 00:34:52 UTC+3, Ben Cotton wrote:

Peter Veentjer

Jul 20, 2017, 6:32:55 AM
to haze...@googlegroups.com
On Thu, Jul 20, 2017 at 12:34 AM, Ben Cotton <bendc...@gmail.com> wrote:

Hi HZ,

I am considering an employment opportunity with a firm that uses HZ (on both HPC Linux supercomputers and clusters of Exalogic-scale Linux servers connected entirely by fiber InfiniBand networks) to render and aggregate quantitative market/liquidity/credit risk interactively and in real time.

I have two questions:

1.  Does HZ support /dev/shm as a transport when all HZ JVM instances run on the same supercomputer (where scale is achieved physically by adding processing blades on the same HPC backbone)?  Or does HZ necessarily require (TCP/UDP)/IP whenever multiple JVMs are used to scale?

Currently it is always TCP.

However, the network implementation is no longer as reliant on TCP as it used to be. A month or two ago I made a UDP-based version of Hazelcast, so it is possible, but it is not trivial to plug in.
 

2.  Does HZ formally support Java over RDMA (via the Sockets Direct Protocol), empowering HZ grids to scale JVM instances without (TCP/UDP)/IP when the physical NICs are fiber InfiniBand (necessary for Java SDP to work)?

We do not formally support it.
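For anyone exploring this independently: the JDK's own SDP support (on the platforms where it exists) is switched on per JVM with a system property and a rules file, with no application changes required in principle. A sketch, with placeholder addresses and port; the rule syntax follows the JDK networking guide:

```
# sdp.conf -- rules mapping socket operations onto SDP
# (addresses/ports below are illustrative placeholders)
bind    192.0.2.*     5701
connect 192.0.2.0/24  5701
```

The JVM would then be started with `-Dcom.sun.sdp.conf=sdp.conf` (and typically `-Djava.net.preferIPv4Stack=true`), so that `java.net` sockets matching the rules transparently use SDP instead of TCP. Whether Hazelcast behaves correctly over such a transport is exactly the "formal support" question here.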
 

Thanks!
Ben

--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hazelcast+unsubscribe@googlegroups.com.
To post to this group, send email to haze...@googlegroups.com.
Visit this group at https://groups.google.com/group/hazelcast.
To view this discussion on the web visit https://groups.google.com/d/msgid/hazelcast/dd17da9f-7e15-418d-ad42-921e31706053%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ben.c...@alumni.rutgers.edu

Jul 20, 2017, 2:33:01 PM
to ben.c...@alumni.rutgers.edu, haze...@googlegroups.com
Thank you both, Ihsan and Peter (hey Peter, special thanks to you ... how are things?)

I would encourage HZ to start prototyping future versions of the product to at least support RDMA.  Intel recently gave its iWARP technology away (for free) to OFED, so the notion of RDMA over Ethernet (not just InfiniBand) will soon empower all grids to remove their dependency on TCP entirely.  The implication?  Yep, TCP/IP is going away ... it served us well.

Thanks again for your responses.  Carry on.

Ben


Enes Akar

Jul 21, 2017, 7:15:27 PM
to haze...@googlegroups.com, ben.c...@alumni.rutgers.edu
Agree with Ben. I will copy the discussion to our product management group. 

Greg Luck

Jul 24, 2017, 5:23:39 PM
to Hazelcast
Ben

Both of these things are done by Speedus (Torus). You can buy Speedus and run Hazelcast over RDMA today. 

When we tested in 2014 it was a large benefit. But since then we have improved networking speed immensely in Hazelcast. We retested in December 2015 against Hazelcast 3.6 and got the following results:

        INTRANODE                      INTERNODE
        BASELINE   Gigabit Ethernet   Infiniband   Solarflare-base   Solarflare-onload
GET     40654      18606              45119        43574             80087
PUT     10177      4663               11281        10898             19990

So we think users can go faster with Solarflare cards than with RDMA, and therefore we have reduced interest in RDMA.

ben.c...@alumni.rutgers.edu

Jul 25, 2017, 8:28:20 AM
to haze...@googlegroups.com

Thanks Greg.  Are those numbers the artifact of a {Cache.put(k,v); v = Cache.get(k);} 'ping-pong' style test?  If so, those Solarflare numbers blowing away RDMA/IB are staggeringly impressive.  Did the Solarflare people provide the HZ process some kind of JVM boot agent (one that hot-wired java.net.Socket and bypassed TCP/IP, SDP, RDMA, etc. in favor of proprietary Solarflare tactics) to achieve these numbers?  If appropriate, I would like to talk with the HZ folks who made these numbers happen. Thanks, Ben
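The ping-pong pattern I mean can be sketched against any Map. With Hazelcast, the cache below would be an IMap obtained from hz.getMap(...); the ConcurrentHashMap here is only a local stand-in so the sketch runs, and all names are illustrative rather than taken from the actual benchmark:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PingPong {

    // Measure the average nanoseconds per put+get round trip on a
    // Map-like cache. In a real Hazelcast test this would be an IMap
    // (e.g. hz.getMap("risk")), so each op crosses the network.
    static long avgNanosPerOp(Map<String, String> cache, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            cache.put("k", "v" + i);    // ping
            String v = cache.get("k");  // pong: read the value back
            if (!v.equals("v" + i)) throw new AssertionError("lost update");
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        long ns = avgNanosPerOp(new ConcurrentHashMap<>(), 100_000);
        System.out.println("avg " + ns + " ns per put/get round trip");
    }
}
```

Against a local map this measures almost nothing; the point of running it against a distributed IMap is that the round-trip time is dominated by the transport, which is what the Solarflare vs. InfiniBand comparison is probing.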


Greg Luck

Jul 25, 2017, 9:42:08 AM
to haze...@googlegroups.com
Ben

Here is the full analysis.

See guillermo.lopez AT orusware.com, who ran the tests, if you want more. He might also retest for your scenario, which could be worthwhile.


After reevaluating the latest version of Hazelcast with Radargun and several high-speed networks we have seen:

- Hazelcast has significantly improved its network performance, and our product (at least in its current state) no longer improves Hazelcast performance. You have removed TCP/IP overhead as a bottleneck. Congratulations!

- Hazelcast can achieve good throughput on Solarflare when using onload: almost 5X the performance of regular 1 Gig Ethernet and 2X that of InfiniBand (and of Solarflare base, which is Solarflare without onload). However, if Hazelcast stresses the communications (multiple communicating processes), then there is no real gain from Solarflare/InfiniBand over regular 1 Gig Ethernet. Something similar happens with our product. You have removed the socket overhead from the critical path in most scenarios.
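(Editorial note: "using onload" above refers to Solarflare's OpenOnload kernel-bypass library, which wraps an unmodified process so its sockets bypass the kernel TCP stack. A sketch of how a member might be launched under it; the jar name is a placeholder and the exact flags should be checked against the OpenOnload documentation:)

```
# Run a Hazelcast member under OpenOnload kernel bypass,
# using its latency-tuned profile
onload --profile=latency java -jar hazelcast-member.jar
```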

Congratulations and I wish you all the best in 2016!!!

Guillermo


RADARGUN RESULTS with the latest Hazelcast version:

2 Nodes – 2 threads per node

Throughput:

        INTRANODE                      INTERNODE
        BASELINE   Gigabit Ethernet   Infiniband   Solarflare-base   Solarflare-onload
GET     40654      18606              45119        43574             80087
PUT     10177      4663               11281        10898             19990

2 Nodes – 20 threads per node

Throughput:

        INTRANODE                      INTERNODE
        BASELINE   Gigabit Ethernet   Infiniband   Solarflare-base   Solarflare-onload
GET     107400     111219             140203       145030            107367
PUT     26840      27830              34992        36198             26887
