HZ using RDMA (for NIC scale across IP) and /dev/shm as transport (on HPC Linux supercomputing scale)


Ben Cotton

Jul 19, 2017, 5:34:52 PM
to Hazelcast

Hi HZ,

I am considering an employment opportunity with a firm that uses HZ (on both HPC Linux supercomputers and clusters of Exalogic-scale Linux servers connected entirely by fiber InfiniBand networks) to render and aggregate quantitative market/liquidity/credit risk interactively and in real time.

I have two questions:

1.  Does HZ support /dev/shm as a transport when all HZ JVM instances run on the same supercomputer (where scale is achieved physically by adding processing blades on the same HPC backbone)?  Or does HZ necessarily require (TCP/UDP)/IP whenever multiple JVMs are used to scale?

2.  Does HZ formally support Java over RDMA (via the Sockets Direct Protocol), empowering HZ grids to scale JVM instances without (TCP/UDP)/IP when the physical NICs are fiber InfiniBand (necessary for Java SDP to work)?

Thanks!
Ben

ihsan demir

Jul 20, 2017, 6:19:09 AM
to Hazelcast
To my knowledge, Hazelcast clusters only work with TCP/IP today.
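For what it's worth, that TCP/IP transport is configured explicitly in the member XML. A minimal sketch of a static TCP/IP join for Hazelcast 3.x (member addresses and port are placeholders, not from this thread):

```xml
<hazelcast>
  <network>
    <port auto-increment="true">5701</port>
    <join>
      <!-- Cluster discovery also runs over IP: disable multicast
           and list the member addresses explicitly. -->
      <multicast enabled="false"/>
      <tcp-ip enabled="true">
        <member>10.0.0.1</member>
        <member>10.0.0.2</member>
      </tcp-ip>
    </join>
  </network>
</hazelcast>
```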

Regards,
ihsan

On Thursday, July 20, 2017 at 00:34:52 UTC+3, Ben Cotton wrote:

Peter Veentjer

Jul 20, 2017, 6:32:55 AM
to haze...@googlegroups.com
On Thu, Jul 20, 2017 at 12:34 AM, Ben Cotton <bendc...@gmail.com> wrote:

Hi HZ,

I am considering an employment opportunity with a firm that uses HZ (on both HPC Linux supercomputers and clusters of Exalogic-scale Linux servers connected entirely by fiber InfiniBand networks) to render and aggregate quantitative market/liquidity/credit risk interactively and in real time.

I have two questions:

1.  Does HZ support /dev/shm as a transport when all HZ JVM instances run on the same supercomputer (where scale is achieved physically by adding processing blades on the same HPC backbone)?  Or does HZ necessarily require (TCP/UDP)/IP whenever multiple JVMs are used to scale?

Currently it is always TCP.

However, the network implementation is no longer as reliant on TCP as it used to be. A month or two ago I made a UDP-based version of Hazelcast, so it is possible, but it is not trivial to plug in.
 

2.  Does HZ formally support Java over RDMA (via the Sockets Direct Protocol), empowering HZ grids to scale JVM instances without (TCP/UDP)/IP when the physical NICs are fiber InfiniBand (necessary for Java SDP to work)?

We do not formally support it.
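For anyone exploring this independently: the JDK's own SDP support (on the platforms where it exists) is switched on per JVM with a system property and a rules file, with no application changes required in principle. A sketch, with placeholder addresses and port; the rule syntax follows the JDK networking guide:

```
# sdp.conf -- rules mapping socket operations onto SDP
# (addresses/ports below are illustrative placeholders)
bind    192.0.2.*     5701
connect 192.0.2.0/24  5701
```

The JVM would then be started with `-Dcom.sun.sdp.conf=sdp.conf` (and typically `-Djava.net.preferIPv4Stack=true`), so that `java.net` sockets matching the rules transparently use SDP instead of TCP. Whether Hazelcast behaves correctly over such a transport is exactly the "formal support" question here.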
 

Thanks!
Ben

--
You received this message because you are subscribed to the Google Groups "Hazelcast" group.
To unsubscribe from this group and stop receiving emails from it, send an email to hazelcast+unsubscribe@googlegroups.com.
To post to this group, send email to haze...@googlegroups.com.
Visit this group at https://groups.google.com/group/hazelcast.
To view this discussion on the web visit https://groups.google.com/d/msgid/hazelcast/dd17da9f-7e15-418d-ad42-921e31706053%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ben.c...@alumni.rutgers.edu

Jul 20, 2017, 2:33:01 PM
to ben.c...@alumni.rutgers.edu, haze...@googlegroups.com
Thank you both, Ihsan and Peter (hey Peter, special thanks to you ... how are things?)

I would encourage HZ to start prototyping future versions of the product to at least support RDMA.  Intel recently gave its iWARP technology away (for free) to OFED, so the notion of RDMA over Ethernet (not just InfiniBand) will soon empower all grids to remove their dependency on TCP entirely.  The implication?  Yep, TCP/IP is going away ... it served us well.

Thanks again for your responses.  Carry on.

Ben


Enes Akar

Jul 21, 2017, 7:15:27 PM
to haze...@googlegroups.com, ben.c...@alumni.rutgers.edu
Agree with Ben. I will copy the discussion to our product management group. 

Greg Luck

Jul 24, 2017, 5:23:39 PM
to Hazelcast
Ben

Both of these things are done by Speedus (Torus). You can buy Speedus and run Hazelcast over RDMA today. 

When we tested in 2014 it was a large benefit. But since then we have improved networking speed immensely in Hazelcast. We retested in December 2015 against Hazelcast 3.6 and got the following results:

        INTRANODE                      INTERNODE
        BASELINE   Gigabit Ethernet   Infiniband   Solarflare-base   Solarflare-onload
GET     40654      18606              45119        43574             80087
PUT     10177      4663               11281        10898             19990

So we think users can go faster with Solarflare cards than with RDMA, and therefore we have reduced interest in RDMA.

ben.c...@alumni.rutgers.edu

Jul 25, 2017, 8:28:20 AM
to haze...@googlegroups.com

Thanks Greg.  Are those numbers the artifact of a {Cache.put(k,v); v = Cache.get(k);} 'ping-pong' style test?  If so, those Solarflare numbers blowing away RDMA/IB are staggeringly impressive.  Did the Solarflare people provide the HZ process some kind of JVM boot agent (one that hot-wired java.net.Socket and bypassed TCP/IP, SDP, RDMA, etc. in favor of proprietary Solarflare tactics) to achieve these numbers?  If appropriate, I would like to talk with the HZ folks who made these numbers happen. Thanks, Ben
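The ping-pong pattern I mean can be sketched against any Map. With Hazelcast, the cache below would be an IMap obtained from hz.getMap(...); the ConcurrentHashMap here is only a local stand-in so the sketch runs, and all names are illustrative rather than taken from the actual benchmark:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PingPong {

    // Measure the average nanoseconds per put+get round trip on a
    // Map-like cache. In a real Hazelcast test this would be an IMap
    // (e.g. hz.getMap("risk")), so each op crosses the network.
    static long avgNanosPerOp(Map<String, String> cache, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            cache.put("k", "v" + i);    // ping
            String v = cache.get("k");  // pong: read the value back
            if (!v.equals("v" + i)) throw new AssertionError("lost update");
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        long ns = avgNanosPerOp(new ConcurrentHashMap<>(), 100_000);
        System.out.println("avg " + ns + " ns per put/get round trip");
    }
}
```

Against a local map this measures almost nothing; the point of running it against a distributed IMap is that the round-trip time is dominated by the transport, which is what the Solarflare vs. InfiniBand comparison is probing.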


Greg Luck

Jul 25, 2017, 9:42:08 AM
to haze...@googlegroups.com
Ben

Here is the full analysis.

See guillermo.lopez AT orusware.com, who ran the tests, if you want more. He might also retest for your scenario, which could be worthwhile.


After reevaluating the latest version of Hazelcast with Radargun and several high-speed networks we have seen:

- Hazelcast has significantly improved its network performance, and our product (at least in its current state) no longer improves Hazelcast performance. You have removed TCP/IP overhead as a bottleneck. Congratulations!

- Hazelcast can achieve good throughput on Solarflare when using onload: almost 5X the performance of regular 1 Gig Ethernet and 2X that of InfiniBand (and of Solarflare base, which is Solarflare without onload). However, if Hazelcast stresses the communications (multiple communicating processes), then there is no real gain from Solarflare/InfiniBand over regular 1 Gig Ethernet. Something similar happens with our product. You have removed the socket overhead from the critical path in most scenarios.
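(Editorial note: "using onload" above refers to Solarflare's OpenOnload kernel-bypass library, which wraps an unmodified process so its sockets bypass the kernel TCP stack. A sketch of how a member might be launched under it; the jar name is a placeholder and the exact flags should be checked against the OpenOnload documentation:)

```
# Run a Hazelcast member under OpenOnload kernel bypass,
# using its latency-tuned profile
onload --profile=latency java -jar hazelcast-member.jar
```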

Congratulations and I wish you all the best in 2016!!!

Guillermo


RADARGUN RESULTS with the latest Hazelcast version:

2 Nodes – 2 threads per node

Throughput:

        INTRANODE                      INTERNODE
        BASELINE   Gigabit Ethernet   Infiniband   Solarflare-base   Solarflare-onload
GET     40654      18606              45119        43574             80087
PUT     10177      4663               11281        10898             19990

2 Nodes – 20 threads per node

Throughput:

        INTRANODE                      INTERNODE
        BASELINE   Gigabit Ethernet   Infiniband   Solarflare-base   Solarflare-onload
GET     107400     111219             140203       145030            107367
PUT     26840      27830              34992        36198             26887
