I thought I'd give you an update on streams, since I recently
implemented them in engine-cuda.
First off, implementing streams is not that straightforward and does
_not_ work with the existing transferDtoH and transferHtoD functions.
That's because each call to one of the encrypt/decrypt functions of the
CUDA modules (bf, des, etc.) copies the input to the device, executes
the kernel, and then copies the result back from the device. With
page-locked memory, these copy calls also include a host-side memcpy,
which is always blocking, so no call to a crypt function can start
before the previous one has finished, i.e. has done its own
transferDtoH memcpy.
What I did was to create an array with one entry per stream, each entry
holding a pointer to host memory and a pointer to device memory. I
initialize that array as usual and then manually memcpy the input data
into the individual host buffers in a for-loop in e_cuda.c. After this,
all the host-side data resides in page-locked buffers, and the
non-blocking calls to transferHtoD and the subsequent kernel calls can
begin. Once all the streams have finished (I wait for them by calling
cudaThreadSynchronize), I simply memcpy the output from page-locked
memory to the output area given by OpenSSL.
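
To make the staging steps above concrete, here is a minimal host-only
sketch in C. It is not the code from e_cuda.c: NUM_STREAMS,
struct stream_slot, fake_crypt and crypt_with_slots are made-up names,
malloc stands in for the page-locked allocations, and fake_crypt stands
in for the non-blocking transferHtoD, kernel launch and transferDtoH
that would be queued on each stream.

```c
#include <stdlib.h>
#include <string.h>

#define NUM_STREAMS 4 /* hypothetical stream count */

/* One staging slot per stream. In the engine, `host` would be
 * page-locked memory with a matching device pointer next to it;
 * plain malloc keeps this sketch host-only. */
struct stream_slot {
    unsigned char *host;
    size_t off; /* offset of this chunk in the caller's buffer */
    size_t len; /* chunk length in bytes, may be 0 */
};

/* Stand-in for the per-stream GPU work (async copy in, kernel,
 * async copy out). It just XORs every byte with 0x5a. */
static void fake_crypt(struct stream_slot *s)
{
    for (size_t i = 0; i < s->len; i++)
        s->host[i] ^= 0x5a;
}

/* Returns 0 on success, -1 if an allocation fails. */
int crypt_with_slots(const unsigned char *in, unsigned char *out, size_t n)
{
    struct stream_slot slots[NUM_STREAMS] = {{0}};
    size_t chunk = (n + NUM_STREAMS - 1) / NUM_STREAMS; /* ceil(n / streams) */
    int rc = 0;

    /* Step 1: blocking memcpy of each chunk into its staging slot. */
    for (int i = 0; i < NUM_STREAMS; i++) {
        size_t off = (size_t)i * chunk;
        size_t len = 0;
        if (off < n)
            len = (n - off < chunk) ? n - off : chunk;
        slots[i].off = off;
        slots[i].len = len;
        slots[i].host = malloc(len ? len : 1);
        if (slots[i].host == NULL) {
            rc = -1;
            goto cleanup;
        }
        if (len)
            memcpy(slots[i].host, in + off, len);
    }

    /* Step 2: all input is staged, so the per-stream work can be
     * queued back to back without a host memcpy in between. */
    for (int i = 0; i < NUM_STREAMS; i++)
        fake_crypt(&slots[i]);

    /* Step 3: the real code would wait for all streams here. */

    /* Step 4: copy the results to the caller's output buffer. */
    for (int i = 0; i < NUM_STREAMS; i++)
        if (slots[i].len)
            memcpy(out + slots[i].off, slots[i].host, slots[i].len);

cleanup:
    for (int i = 0; i < NUM_STREAMS; i++)
        free(slots[i].host); /* free(NULL) is a no-op */
    return rc;
}
```

The point of the layout is that every blocking host copy happens either
before the first or after the last piece of per-stream work, so nothing
blocking sits between two streams.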
Now, another approach I tried was simply mlocking the memory pages
supplied by OpenSSL. This requires super-user privileges and did not
turn out to be any faster.
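
For reference, pinning a caller-supplied buffer with mlock might look
roughly like the sketch below. try_pin and unpin are made-up helper
names, not functions from engine-cuda. On Linux the call fails without
sufficient privileges once the RLIMIT_MEMLOCK limit is exceeded, so the
helper reports failure and leaves the fallback to the caller.

```c
#include <stddef.h>
#include <sys/mman.h>

/* Try to lock the pages backing buf into RAM.
 * Returns 1 if they are now locked, 0 if the kernel refused
 * (for example because of missing privileges or a low
 * RLIMIT_MEMLOCK). Linux rounds the address down to a page
 * boundary itself; other systems may want an aligned address. */
int try_pin(const void *buf, size_t len)
{
    return mlock(buf, len) == 0;
}

/* Undo try_pin once the buffer is no longer in use. */
void unpin(const void *buf, size_t len)
{
    (void)munlock(buf, len);
}
```

A caller would try try_pin on the OpenSSL buffer first and fall back to
copying into its own staging memory whenever it returns 0.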
So, I ran some tests using streams and have uploaded the corresponding
graphs to http://avalon.hoffentlich.net/~heipei/tmp/engine-cuda/. The
00_streams plot shows the state before, using page-locked memory. As you
can see, the performance gain is tiny at best. The results are averaged
over 5 runs on a GTX 295.
Since I don't really see any benefit in using streams, I'm gonna store
that commit in a dormant branch and not pursue the idea any further.
Implementing multi-GPU support would make more sense for usability, and
the only reason I'm not doing it is that I'm only going to measure
using a single GPU anyway.
So far,
greetings,
Jojo
--
Johannes Gilger <hei...@hackvalue.de>
http://heipei.net
GPG-Key: 0xD47A7FFC
GPG-Fingerprint: 5441 D425 6D4A BD33 B580 618C 3CDC C4D0 D47A 7FFC