Getting Crail to work over TCP


Ben Sidhom

Aug 20, 2019, 6:35:05 PM
to d...@crail.apache.org, zrlio...@googlegroups.com
I've been experimenting with getting Crail over TCP to work with the crail-spark-io shuffle extensions.

It seems to work fine for small shuffle sizes (up to about 10 gigabytes), but anything larger seems to hang. I've investigated, and the hangs seem to happen for a few reasons, mostly confined to the NaRPC layer.

The benchmark numbers here seem to imply that this has worked for shuffles of at least 200 gigabytes (I'm not certain, because that second experiment does not explicitly give the test parameters). Has anybody had success with Crail over TCP, or were pretty much all of the tests run over RDMA/NVMe?

--
-Ben

Patrick Stuedi

Aug 21, 2019, 1:43:06 AM
to Ben Sidhom, d...@crail.apache.org, zrlio-users
There is currently a bug in NaRPC that increases the likelihood of hangs in Crail/TCP as data sizes increase. We have identified the actual problem in NaRPC but haven't gotten around to fixing it yet. I can look into this.

-Patrick
