Performance of pageable memory more desirable?

Johannes Gilger

Apr 6, 2011, 9:37:37 AM
to engine-cuda
Hi Paolo,

I just got my other benchmark box back (I was testing on my workstation at
home before that) and immediately restarted my benchmarks. FYI, the
test box has four GTX 295 cards (of which I'm using one die of one card).
What I discovered was that OpenCL performed better than CUDA for
larger blocks.

So I disabled page-locked memory by configuring with --enable-pageable
and re-ran the benchmark, which yielded much better figures.

Have you tried running your latest version with pageable memory instead
of page-locked? It might be slower for small chunks of memory, but it
will most certainly perform better at 8MB.

Have a look at these:
http://avalon.hoffentlich.net/~heipei/tmp/cuda_locked.pdf
http://avalon.hoffentlich.net/~heipei/tmp/cuda_pageable.pdf

Interestingly enough, the curves right up to 1MB are pretty much the
same, but after that the behaviour changes profoundly.

Greetings,
Jojo

--
Johannes Gilger <hei...@hackvalue.de>
http://heipei.net
GPG-Key: 0xD47A7FFC
GPG-Fingerprint: 5441 D425 6D4A BD33 B580 618C 3CDC C4D0 D47A 7FFC

Johannes Gilger

Apr 6, 2011, 10:27:15 AM
to engine-cuda

Hi Paolo,

quick followup: I've rewritten the copy functions to use page-locked
memory up until 1MB and pageable memory afterwards. This gives the
nicest average speed with a small dip at 1MB. I will further investigate
this, but it would be cool if you could do just one run of AES-128 with
and without page-locked memory to see if this phenomenon is limited to
my test-machine or not. On my workstation (8600 GT), page-locked memory
seems to perform better for all sizes.
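
The size-based switch described above could be sketched roughly as follows.
This is only an illustration of the idea, not the actual engine-cuda code:
the names pick_host_buf, host_buf_kind and PINNED_LIMIT_BYTES are
hypothetical, and the 1MB limit is simply the crossover seen on the GTX 295
box. Whether a transfer of exactly 1MB counts as "small" is also an
assumption here.

```c
#include <stddef.h>

/* Hypothetical names; nothing here is taken from the engine-cuda sources. */
enum host_buf_kind {
    HOST_BUF_PINNED,   /* page-locked host memory, e.g. from cudaHostAlloc() */
    HOST_BUF_PAGEABLE  /* ordinary host memory, e.g. from malloc() */
};

/* Assumed crossover point: 1MB, as observed on the GTX 295 test box.
 * Other boards (such as the 8600 GT) may want a different value, or no
 * switch at all. */
#define PINNED_LIMIT_BYTES ((size_t)1 << 20)

/* Choose which kind of host buffer to stage a transfer through.
 * Transfers up to and including the limit use page-locked memory,
 * anything larger uses pageable memory. */
static enum host_buf_kind pick_host_buf(size_t nbytes)
{
    return nbytes <= PINNED_LIMIT_BYTES ? HOST_BUF_PINNED : HOST_BUF_PAGEABLE;
}
```

The copy functions would call pick_host_buf() with the chunk size and then
copy from the matching buffer. Keeping the limit in one constant makes it
easy to tune per machine.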

Paolo Margara

Apr 8, 2011, 4:04:31 AM
to engine-...@googlegroups.com
Hi Johannes,
I'm sorry for the long wait, but time is not on my side.
The curve I obtained is similar in shape to yours, but the values are
very different. I can only suppose that this is due to the different
kind of memory (and memory bus width) on your board. Here are the graphs
from my test:

http://engine-cuda.googlecode.com/svn/wiki/pageable-vs-pinned/aes-cbc-decrypt-pageable-vs-pinned-memory.png
http://engine-cuda.googlecode.com/svn/wiki/pageable-vs-pinned/aes-ecb-decrypt-pageable-vs-pinned-memory.png
http://engine-cuda.googlecode.com/svn/wiki/pageable-vs-pinned/aes-ecb-encrypt-pageable-vs-pinned-memory.png

The engine was built with the following command lines:
./configure --prefix=/opt --disable-ttableconstant --enable-pageable
./configure --prefix=/opt --disable-ttableconstant

I think the better option is to leave the --enable-pageable option
available for those who need it, but keep pinned memory as the
default, as it is now. To counter the performance degradation
that usually appears when the size of the transferred data is above
1MB/2MB, I think it's better to look for other solutions.

Out of curiosity, you could try running the "bandwidthTest" program
provided in the NVIDIA CUDA SDK with different options on different
boards.

Greetings,
Paolo Margara
