Haskell bindings for CCI - hurray!

Peter Braam

Jan 17, 2012, 4:22:13 PM
to cci_dev...@email.ornl.gov, cloudh...@googlegroups.com
Hi -

Facundo implemented Haskell bindings for CCI and we tested the IB verbs driver, with Paul Monday's help, on our IB QDR cluster.  The results are amazingly close to C performance - just a few % off.  Below are some event-only and RDMA ping pong tests.

Hopefully everyone finds this good news!

Peter

---------- Forwarded message ----------
From: Facundo Domínguez <facundo....@parsci.com>
Date: 2012/1/14
Subject: [dev] Perhaps pingpong definitive numbers
To: Dev <d...@parsci.com>


Hi all,

Test results follow. The hard part of making these tests was
imitating the C pingpong behavior (messy program) in Haskell. Once
that was achieved, no optimizations other than the ones performed by
ghc automatically were required.

Cheers!
Facundo

Servers and clients are executed in different nodes.

== C pingpong implementation with a reliable ordered connection
sending active messages ==

[facundo.dominguez@pg73-v3 tests]$ CCI_CONFIG=../../../../cci.ini ./pingpong -h verbs://10.155.90.37:35480  -c RO
Using RO connection
Opened verbs://10.155.90.13:46154
Bytes   Latency (one-way)       Throughput
      0            2.28 us                 0.00 MB/s
      1            2.95 us                 0.34 MB/s
      2            2.96 us                 0.67 MB/s
      4            2.95 us                 1.36 MB/s
      8            2.93 us                 2.73 MB/s
     16            3.01 us                 5.32 MB/s
     32            3.03 us                10.58 MB/s
     64            3.00 us                21.36 MB/s
    128            3.08 us                41.62 MB/s
    256            3.24 us                79.05 MB/s
    512            3.48 us               146.95 MB/s
   1024            4.02 us               255.02 MB/s
   2048            4.91 us               417.27 MB/s

== Haskell pingpong implementation with a reliable ordered connection
sending active messages ==

[facundo.dominguez@pg155-n17 cci-haskell]$ CCI_CONFIG=../cci.ini dist/build/ex-pingpong/ex-pingpong -h verbs://10.155.90.13:46163
verbs://10.155.90.37:35483
Bytes           Latency (one-way)       Throughput
      0            2.31 us                 0.00 MB/s
      1            2.93 us                 0.34 MB/s
      2            2.95 us                 0.68 MB/s
      4            2.96 us                 1.35 MB/s
      8            2.93 us                 2.73 MB/s
     16            2.93 us                 5.47 MB/s
     32            2.94 us                10.87 MB/s
     64            2.91 us                21.96 MB/s
    128            3.01 us                42.52 MB/s
    256            3.21 us                79.76 MB/s
    512            3.43 us               149.08 MB/s
   1024            3.91 us               261.96 MB/s
   2048            4.84 us               422.95 MB/s

== C pingpong implementation with a reliable ordered connection making
RMA writes ==

[facundo.dominguez@pg73-v3 tests]$ CCI_CONFIG=../../../../cci.ini ./pingpong -h verbs://10.155.90.37:35491 -c RO -w -m 4194304
Using RO connection
Opened verbs://10.155.90.13:46179
server RMA handle is 0x15759d0
local_rma_handle is 0x12f9530
Bytes           Latency (round-trip)    Throughput
      1            3.04 us                 0.33 MB/s
      2            3.03 us                 0.66 MB/s
      4            3.03 us                 1.32 MB/s
      8            3.03 us                 2.64 MB/s
     16            3.16 us                 5.07 MB/s
     32            3.25 us                 9.83 MB/s
     64            3.32 us                19.30 MB/s
    128            3.38 us                37.84 MB/s
    256            3.47 us                73.69 MB/s
    512            3.57 us               143.62 MB/s
   1024            4.02 us               254.85 MB/s
   2048            4.86 us               421.37 MB/s
   4096            5.38 us               760.69 MB/s
   8192            6.69 us              1225.22 MB/s
  16384            9.13 us              1794.02 MB/s
  32768           14.01 us              2338.28 MB/s
  65536           23.72 us              2762.39 MB/s
 131072           43.17 us              3036.47 MB/s
 262144           81.99 us              3197.28 MB/s
 524288          159.64 us              3284.23 MB/s
 1048576          314.96 us              3329.23 MB/s
 2097152          625.55 us              3352.51 MB/s
 4194304         1247.76 us              3361.46 MB/s


== Haskell pingpong implementation with a reliable ordered connection
making RMA writes ==

[facundo.dominguez@pg155-n17 cci-haskell]$ CCI_CONFIG=../cci.ini dist/build/ex-pingpong/ex-pingpong -h verbs://10.155.90.13:46171 -r 4194304
verbs://10.155.90.37:35487
Bytes           Latency (one-way)       Throughput
      1            3.09 us                 0.32 MB/s
      2            3.13 us                 0.64 MB/s
      4            3.11 us                 1.29 MB/s
      8            3.10 us                 2.58 MB/s
     16            3.09 us                 5.18 MB/s
     32            3.14 us                10.19 MB/s
     64            3.15 us                20.33 MB/s
    128            3.24 us                39.46 MB/s
    256            3.23 us                79.37 MB/s
    512            3.45 us               148.58 MB/s
   1024            3.87 us               264.53 MB/s
   2048            4.66 us               439.10 MB/s
   4096            5.30 us               772.11 MB/s
   8192            6.53 us              1255.38 MB/s
  16384            8.94 us              1832.89 MB/s
  32768           13.82 us              2371.12 MB/s
  65536           23.52 us              2786.72 MB/s
 131072           42.98 us              3049.52 MB/s
 262144           81.61 us              3212.22 MB/s
 524288          158.95 us              3298.41 MB/s
 1048576          313.33 us              3346.56 MB/s
 2097152          622.05 us              3371.34 MB/s
 4194304         1239.41 us              3384.12 MB/s
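
As a quick sanity check on the tables above, the throughput column appears to be the message size divided by the reported latency. The sketch below recomputes it for a few rows copied from the tables. The formula and the convention 1 MB = 10^6 bytes are assumptions inferred from the numbers, not something the pingpong programs document in this thread, and the helper name `throughput_mb_s` is hypothetical.

```python
# Recompute the throughput column from the size and latency columns.
# Assumption (not stated in the thread): throughput = bytes / latency,
# with 1 MB = 10**6 bytes, so bytes per microsecond equals MB/s.

def throughput_mb_s(size_bytes: int, latency_us: float) -> float:
    """Return throughput in MB/s for a message size and a latency in microseconds."""
    return size_bytes / latency_us

# (label, bytes, latency in us, reported MB/s) rows copied from the tables above.
rows = [
    ("C active message", 2048, 4.91, 417.27),
    ("Haskell active message", 2048, 4.84, 422.95),
    ("C RMA write", 4194304, 1247.76, 3361.46),
    ("Haskell RMA write", 4194304, 1239.41, 3384.12),
]

for label, size, latency, reported in rows:
    computed = throughput_mb_s(size, latency)
    print(f"{label:24s} computed {computed:8.2f} MB/s, reported {reported:8.2f} MB/s")
```

The computed values land within a fraction of a percent of the reported ones. The small gaps are consistent with the latency column being rounded to two decimals before printing.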

Facundo Domínguez

Jan 17, 2012, 5:03:41 PM
to Atchley, Scott, Peter Braam, cci_dev...@email.ornl.gov, cloudh...@googlegroups.com
> Interestingly, after 8 bytes, the Haskell latencies are better than the C latencies. I would be curious to understand the differences between the Haskell pingpong and the C version.

I can think of a couple of sources for the small differences:
* Network performance fluctuates slightly between runs.
* The ghc compiler for Haskell uses a native code generator, so the
binaries produced by ghc and gcc must be different. Note also that,
because of temporary build issues, we are using ghc to compile the ORNL
CCI implementation statically rather than building a shared library with
gcc. I didn't dig into how ghc compiles C source code, though.
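
To put a number on how small these differences are, here is a minimal sketch that computes the relative latency difference per message size for the two active-message runs. The latencies are copied from the tables earlier in the thread. The helper name `relative_diff_percent` is hypothetical, and since each table comes from a single run, the result says nothing about run-to-run variance.

```python
# Relative difference between the Haskell and C one-way latencies
# (active messages, reliable ordered connection), in percent.
# Negative values mean the Haskell run was faster for that size.

# Message size in bytes -> latency in microseconds, copied from the tables.
c_latency_us = {
    0: 2.28, 1: 2.95, 2: 2.96, 4: 2.95, 8: 2.93, 16: 3.01, 32: 3.03,
    64: 3.00, 128: 3.08, 256: 3.24, 512: 3.48, 1024: 4.02, 2048: 4.91,
}
haskell_latency_us = {
    0: 2.31, 1: 2.93, 2: 2.95, 4: 2.96, 8: 2.93, 16: 2.93, 32: 2.94,
    64: 2.91, 128: 3.01, 256: 3.21, 512: 3.43, 1024: 3.91, 2048: 4.84,
}

def relative_diff_percent(haskell_us: float, c_us: float) -> float:
    """Return (Haskell - C) / C as a percentage."""
    return (haskell_us - c_us) / c_us * 100.0

diffs = {
    size: relative_diff_percent(haskell_latency_us[size], c_latency_us[size])
    for size in c_latency_us
}

for size, diff in diffs.items():
    print(f"{size:5d} bytes: {diff:+.2f} %")

# Largest gap in either direction, and whether Haskell is faster above 8 bytes.
largest_gap = max(abs(d) for d in diffs.values())
haskell_faster_above_8 = all(d < 0 for size, d in diffs.items() if size > 8)
print(f"largest gap: {largest_gap:.2f} %, Haskell faster above 8 bytes: {haskell_faster_above_8}")
```

Every gap in these two runs stays within a few percent, which is small enough that either explanation above could plausibly account for it.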

If you are interested, we could provide you with the Haskell bindings and
the pingpong draft so you can dig into it further.

Cheers!
Facundo


On Tue, Jan 17, 2012 at 7:33 PM, Atchley, Scott <atch...@ornl.gov> wrote:
> Peter,
>
> Excellent!
>
> Interestingly, after 8 bytes, the Haskell latencies are better than the C latencies. I would be curious to understand the differences between the Haskell pingpong and the C version.
>
> Scott

>> _______________________________________________
>> CCI_Developers mailing list
>> CCI_Dev...@email.ornl.gov
>> https://email.ornl.gov/mailman/listinfo/cci_developers
>> To unsubscribe, send a blank email to cci_developer...@email.ornl.gov

Ryan N

Jan 17, 2012, 11:42:49 PM
to CloudHaskell
Hi Facundo,

Excellent!

> If you are interested we could provide you the Haskell bindings and
> pingpong draft for you to dig it further.

I'd love to run this on some of our InfiniBand systems here. Please sign
me up for getting this code as well.

Best,
-Ryan

Rob Stewart

Jan 18, 2012, 8:25:05 AM
to rrne...@gmail.com, CloudHaskell
Hi Facundo,

I'm looking at fault-tolerant distributed-memory Haskell for my PhD. I
tried using mpich2, which offers only limited fault-tolerant behaviour
(v1.4.1 has a flag to withstand node failure), and so instead focused on
silent failure semantics using sockets. I'd love to use a
higher-performance communication layer to investigate the fault-tolerant
semantics of CCI. So I'd also very much like to sign up to get this
code.

Thanks,

--
Rob Stewart
Heriot Watt University
