Homemade parallel computing benchmark: come test your speedup (bring it on)

C.D.Luminate

Mar 22, 2016, 4:53:16 AM
to xidian_linux
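Note, however, that this Makefile is the version brute-force accelerated with GNU parallel; if you don't have GNU parallel installed, you will need to adjust the Makefile.

My laptop is an i5-2430M, 2C4T @ 2.5 GHz, and the speedup (parallel speed divided by serial speed, i.e. serial time over parallel time)
is roughly 2. ******** Come and crush my CPU *********

Also, since the float type blew up on me again while I was writing the program, with the "large number swallows small number" bug, all the routines implemented
are double precision.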

Wellcome to Lumin's serial/parallel benchmark, init ... [OK]
I: [initialization] time cost is 0.811117 seconds.
--------------------------------------------------------------------------------
I: [dcopy in serial] time cost is 0.187758 seconds.
     A 1.000000 1.000000 C 1.000000 1.000000 
I: [dcopy in parallel] time cost is 0.110634 seconds.
     A 1.000000 1.000000 C 1.000000 1.000000 
--------------------------------------------------------------------------------
I: [dasum serial] time cost is 0.244707 seconds.
     resA 67108864.000000
I: [dasum parallel] time cost is 0.126894 seconds.
     resB 67108864.000000
--------------------------------------------------------------------------------
I: [ddot in serial] time cost is 0.249027 seconds.
     resA 67108864.000000
I: [ddot in parallel] time cost is 0.119332 seconds.
     resB 67108864.000000
--------------------------------------------------------------------------------
I: [dscal in serial] time cost is 0.244976 seconds.
     A 0.500000 0.500000
I: [dscal in parallel] time cost is 0.102829 seconds.
     A 0.250000 0.250000
--------------------------------------------------------------------------------
I: [daxpby in serial] time cost is 0.300344 seconds.
     A 0.250000 0.250000 C 1.625000 1.625000 
I: [daxpby in parallel] time cost is 0.178659 seconds.
     A 0.250000 0.250000 C 2.562500 2.562500 
--------------------------------------------------------------------------------
I: [dgemv in serial] time cost is 0.315916 seconds.
     Y 0.250000 0.250000 DEST 2048.250000 2048.250000 
I: [dgemv in parallel] time cost is 0.212380 seconds.
     Y 0.250000 0.250000 DEST 2048.250000 2048.250000 
I: [dgemv in parallelv2] time cost is 0.205021 seconds.
     Y 0.250000 0.250000 DEST 2048.250000 2048.250000 
--------------------------------------------------------------------------------
I: [dgemm in serial] time cost is 1.151119 seconds.
     X 1.000000 1.000000 Y 1.000000 1.000000 DEST 512.000000 512.000000 
I: [dgemm in parallel] time cost is 0.700665 seconds.
     X 1.000000 1.000000 Y 1.000000 1.000000 DEST 512.000000 512.000000 
--------------------------------------------------------------------------------
I: [All benchmark] time cost is 4.450948 seconds.
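
For reference, the per-kernel speedups implied by the log above can be worked out as serial time over parallel time. This is only a small sketch: the timings are copied from the log, while the names `timing`, `speedup` and `print_speedups` are made up for illustration and are not part of the benchmark.

```c
#include <stdio.h>

/* One serial/parallel timing pair, in seconds. */
struct timing {
    const char *name;
    double serial;
    double parallel;
};

/* Timings copied from the log output above. */
static const struct timing timings[] = {
    {"dcopy",              0.187758, 0.110634},
    {"dasum",              0.244707, 0.126894},
    {"ddot",               0.249027, 0.119332},
    {"dscal",              0.244976, 0.102829},
    {"daxpby",             0.300344, 0.178659},
    {"dgemv",              0.315916, 0.212380},
    {"dgemv (parallelv2)", 0.315916, 0.205021},
    {"dgemm",              1.151119, 0.700665},
};

/* Speedup = serial time / parallel time (larger is better). */
double speedup(double serial_s, double parallel_s)
{
    return serial_s / parallel_s;
}

/* Print one line per kernel with its speedup. */
void print_speedups(void)
{
    size_t n = sizeof timings / sizeof timings[0];
    for (size_t i = 0; i < n; i++)
        printf("%-20s %.2fx\n", timings[i].name,
               speedup(timings[i].serial, timings[i].parallel));
}
```

Calling `print_speedups()` from a `main` prints the ratios, which range from roughly 1.5 (dgemv) to roughly 2.4 (dscal) for this run.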

---
Regards,
C.D.Luminate

C.D.Luminate

Mar 22, 2016, 5:07:48 AM
to xdl
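An interesting point: back when I was still using the float type,
accumulating a large number of 1s really did trigger the classic IEEE 754 single-precision bug, so the serial algorithm, which should have summed to
67108864, only got as far as 16777216. (16777216 is 2^24; float has a 24-bit significand, so 16777216 + 1 rounds back to 16777216.)

This bug is easy to verify with this program: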
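
The program referred to here is not preserved in this thread. As a stand-in, here is a minimal sketch of the same experiment, assuming IEEE 754 single precision with round-to-nearest; the function names `float_sum_ones` and `double_sum_ones` are hypothetical and not taken from the original code.

```c
#include <stddef.h>

/* Sum n copies of 1.0f serially in a single float accumulator.
 * `volatile` forces every intermediate result to be stored as a float,
 * so extended-precision registers cannot hide the effect. */
float float_sum_ones(size_t n)
{
    volatile float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum = sum + 1.0f;  /* once sum == 2^24, sum + 1.0f rounds back to sum */
    return sum;
}

/* Same loop with a double accumulator, for comparison. */
double double_sum_ones(size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += 1.0;
    return sum;
}
```

With `n = (size_t)1 << 26` (67108864 additions), `float_sum_ones` stops growing at 16777216, while `double_sum_ones` returns 67108864.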

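However, the parallel version of the same routine produced the correct result [1]. The parallel routine uses
reduction (+:sum)
This piece of black magic is equivalent to:
1. Split the for loop and map it onto multiple threads, with each thread having its own private sum variable
2. reduce: add each thread's private sum into the original sum
Because each thread's private sum is large in magnitude, float can no longer ignore any of the
addends in the reduce step, which produces the correct result. (Actually, this is not the only coincidence involved.)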
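
The two steps above can be sketched in C. `sum_ones_omp` shows how the clause is typically attached to a loop, and `sum_ones_chunked` emulates the map and reduce steps in plain C so the arithmetic can be followed without threads. Both function names are hypothetical, and the emulation assumes the iterations are split into equal contiguous chunks, one per thread, which OpenMP does not guarantee in general.

```c
#include <stddef.h>

/* OpenMP version: each thread accumulates into a private copy of sum,
 * and the private copies are combined into sum when the loop ends.
 * Build with -fopenmp; without it the pragma is ignored and the loop
 * simply runs serially. */
float sum_ones_omp(size_t n)
{
    float sum = 0.0f;
    long long i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < (long long)n; i++)
        sum += 1.0f;
    return sum;
}

/* Plain-C emulation with `nchunks` pretend threads (assumed: equal
 * contiguous chunks, and n * nchunks does not overflow size_t). */
float sum_ones_chunked(size_t n, size_t nchunks)
{
    float total = 0.0f;
    for (size_t t = 0; t < nchunks; t++) {
        size_t begin = n * t / nchunks;
        size_t end = n * (t + 1) / nchunks;

        /* Step 1 (map): this chunk gets its own private accumulator. */
        float partial = 0.0f;
        for (size_t i = begin; i < end; i++)
            partial += 1.0f;

        /* Step 2 (reduce): fold the private sum into the shared total. */
        total += partial;
    }
    return total;
}
```

With `n = (size_t)1 << 26`, `sum_ones_chunked(n, 4)` gives 67108864, because each chunk holds exactly 2^24 ones and so lands exactly on the largest count a float accumulator can reach by adding 1. With only 2 chunks, each private sum saturates at 16777216 and the total comes out as 33554432, so the correct result does depend on how the work happens to be split.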

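Learned something new.

[1] Usually it is the other way around: the serial version is reliable and the parallel one tends to blow up. Here it is reversed.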
 
---
Regards,
C.D.Luminate