openssl speed time measurements

Johannes Gilger

Feb 17, 2011, 6:46:29 AM

to engine-cuda

Hi Paolo,

while developing engine_opencl (still sucks so far) I discovered that we
should use the -elapsed flag with openssl speed.

Without it, openssl speed measures the user time spent by the process.
Since the computation does not take place on the CPU, not many CPU
cycles are used, even though the computation is running constantly in
the background on the GPU.

It might not have been that noticeable with CUDA, since the CPU was 100%
busy as well (so user time roughly corresponded to elapsed wall-clock
time), but when I use OpenCL, the CPU is suddenly idle, and openssl
speed reports 0.00s for encrypting blocks. CUDA using the CPU to that
extent is something else to be investigated ;)

See times(2) for more details.

Greetings,
Jojo

--
Johannes Gilger <hei...@hackvalue.de>
http://heipei.net
GPG-Key: 0xD47A7FFC
GPG-Fingerprint: 5441 D425 6D4A BD33 B580 618C 3CDC C4D0 D47A 7FFC

Paolo Margara

Feb 18, 2011, 9:52:58 AM

to engine-...@googlegroups.com

Hi Johannes,
when I wrote the first version of 'test-speed.sh' I considered whether
or not to use the '-elapsed' option, for the reasons you described. In
the end I chose to keep the default behaviour (option not enabled).
It might be a good idea to add an option to the script so that a user
can enable '-elapsed' if needed.

I don't currently have much time to run tests, but if you want, you can
try changing how the CPU thread interacts with the OS scheduler while
it waits for results from the device.
You can do this with the 'cudaSetDeviceFlags()' function, changing the
'cudaDeviceScheduleAuto' flag (the default) to 'cudaDeviceScheduleSpin'
or 'cudaDeviceScheduleYield', and see how that changes the result.
See section 4.4.2.6 of the CUDA Toolkit Reference Manual for more details.
If this isn't the cause, then there is something else to investigate.

Very interesting that you've started working on an OpenCL port. Is your
work on the CUDA version finished? Can we start thinking about how to
integrate your work into the main development branch?