Engine OpenCL ;)

Johannes Gilger

Feb 22, 2011, 10:29:06 AM

to engine-cuda

Hi Paolo,

just wanted to let you know that I've started work on an OpenCL-version
of engine-cuda, for now just called engine-OpenCL or whatever.

The way I'm doing it right now is by having a completely separate set of
files (e_opencl.c etc), so it does not interfere at all with the
existing engine-cuda code. Obviously, building it yields a second .so
file, and openssl has to be called using -engine opencl. I determined
this to be the cleanest approach.

I don't know how interested you are in OpenCL, but I have to include it
for my thesis. Personally I'd rather stick with CUDA since it is a lot
easier to use and I only own NVIDIA devices.

For now I've got Blowfish ECB and DES ECB functional, with Blowfish
still lagging behind because of some unresolved compiler-issues. DES
however works fine, and using openssl speed has shown that it almost
matches the speed of the CUDA version. This is not that surprising if
one looks at the PTX generated by both versions (even register use is
the same). The only real advantage of OpenCL for owners of NVIDIA cards
might be that using multiple devices is supported out of the box,
without having to use OpenMPI or something similar manually, which might
come in handy if this is ever to be supported.
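
To illustrate why multiple devices are attractive for ECB in particular: every block is encrypted independently, so an input buffer can simply be cut into block-aligned slices, one per device. The following is only a minimal sketch of that splitting step in plain C, without any OpenCL calls. The names chunk_t and split_ecb_work are made up for illustration and are not taken from engine-cuda.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: one contiguous slice of the input buffer. */
typedef struct {
    size_t offset; /* byte offset of the slice within the buffer */
    size_t length; /* slice length in bytes, a multiple of block_size */
} chunk_t;

/*
 * Split total_len bytes (a whole number of cipher blocks) into
 * num_devices block-aligned slices. The first (blocks % num_devices)
 * devices receive one extra block, so slice sizes differ by at most
 * one block. 'out' must have room for num_devices entries.
 */
static void split_ecb_work(size_t total_len, size_t block_size,
                           size_t num_devices, chunk_t *out)
{
    assert(block_size > 0 && num_devices > 0);
    assert(total_len % block_size == 0); /* ECB needs whole blocks */

    size_t blocks = total_len / block_size;
    size_t base = blocks / num_devices;
    size_t extra = blocks % num_devices;
    size_t offset = 0;

    for (size_t i = 0; i < num_devices; i++) {
        size_t n = base + (i < extra ? 1 : 0);
        out[i].offset = offset;
        out[i].length = n * block_size;
        offset += out[i].length;
    }
}
```

Each slice could then be handed to its own device queue. Whether the transfer overhead makes this worthwhile is a separate question that this sketch does not address.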

I'm still getting strange results on one of my test-hosts (constant
number of blocks encrypted for each blocksize), but I hope to resolve
this soon.

Greetings,
Jojo

--
Johannes Gilger <hei...@hackvalue.de>
http://heipei.net
GPG-Key: 0xD47A7FFC
GPG-Fingerprint: 5441 D425 6D4A BD33 B580 618C 3CDC C4D0 D47A 7FFC

Paolo Margara

Feb 23, 2011, 4:00:03 AM

to engine-...@googlegroups.com

Hi Johannes,

Congratulations on porting engine_cudamrg to OpenCL; sooner or later a
port of this engine to that API would have entered my roadmap.
I also find the approach you have used to be the cleanest possible.

We could consider whether to start a parallel project or to include the
new version of the engine in the current project as a parallel branch.

I'm not convinced that out-of-the-box support for multiple devices
would be a real advantage of OpenCL for NVIDIA users; when they use two
or more graphics cards they probably drive them through SLI, which is
managed transparently in CUDA.

I'd be more curious to see how the OpenCL engine behaves with other
devices that support that API, such as Cell processors or AMD GPUs.

Have you thought about when we could plan a merge of the changes you
made to engine_cudamrg into the main development branch?

Greetings,
Paolo

Johannes Gilger

Feb 23, 2011, 4:24:24 AM

to engine-...@googlegroups.com

On 23/02/11 10:00, Paolo Margara wrote:
> We could consider whether to start a parallel project or to include the
> new version of the engine in the current project as a parallel branch.

Yes, that would certainly be possible. The nice thing about having
disjoint sets of files and using git is that I can rewrite the history
at any point and have two trees (one for OpenCL and one for CUDA). The
only thing that might be duplicated in the course of a separation are
common header files (which declare block sizes for different algorithms,
or the EVP-declaration functions etc.).
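
As a rough idea of what such a shared header might contain, here is a small sketch. The block sizes are the standard ones (8 bytes for DES and Blowfish, 16 bytes for AES). All identifiers are hypothetical and are not taken from the engine-cuda sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical common header shared by the CUDA and OpenCL engines. */
#define GPU_DES_BLOCK_SIZE 8  /* bytes */
#define GPU_BF_BLOCK_SIZE  8  /* bytes */
#define GPU_AES_BLOCK_SIZE 16 /* bytes */

enum gpu_cipher { GPU_CIPHER_DES, GPU_CIPHER_BLOWFISH, GPU_CIPHER_AES };

/* Block size in bytes for a cipher, or 0 for an unknown value. */
static size_t gpu_cipher_block_size(enum gpu_cipher c)
{
    switch (c) {
    case GPU_CIPHER_DES:      return GPU_DES_BLOCK_SIZE;
    case GPU_CIPHER_BLOWFISH: return GPU_BF_BLOCK_SIZE;
    case GPU_CIPHER_AES:      return GPU_AES_BLOCK_SIZE;
    }
    return 0;
}

/* Number of cipher blocks in len bytes; ECB input must be whole blocks. */
static size_t gpu_num_blocks(enum gpu_cipher c, size_t len)
{
    size_t bs = gpu_cipher_block_size(c);
    assert(bs > 0 && len % bs == 0);
    return len / bs;
}
```

Keeping these declarations free of any CUDA or OpenCL types is what would let both trees include the same file.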

> I'm not convinced that out-of-the-box support for multiple devices
> would be a real advantage of OpenCL for NVIDIA users; when they use two
> or more graphics cards they probably drive them through SLI, which is
> managed transparently in CUDA.

I'm not that keen on multi-GPU support myself. Probably the only reason
I considered it is that my fastest test-machine is equipped with four
GTX 295s. The GTX 295 is one of those rare cards which has two PCBs and
presents them to the system as two devices. So, although the GTX 295
blows your GTX 275 out of the water in overall FLOPS, a single processor
on the card is somewhat slower (see clock speed etc.) than the GTX 275.
See this table for reference:
http://en.wikipedia.org/wiki/GeForce_200_Series#Technical_summary

> I'd be more curious to see how the OpenCL engine behaves with other
> devices that support that API, such as Cell processors or AMD GPUs.

Yeah, me too. Maybe I can get my hands on one of these devices, even if
it's just an AMD GPU.

> Have you thought about when we could plan a merge of the changes you
> made to engine_cudamrg into the main development branch?

The way upstream (your repo) is progressing at the moment, and the
amount of trash in my repo, make me think the best approach would be to
1. wait until my thesis is finished (early May)
2. clean up the repo, rewriting the history where necessary, maybe
   squashing some commits
3. then do one of the following:
   a. rebase it onto yours, OR
   b. keep using git as a backend and move to github (just the code, the
      google project is nice apart from svn), OR
   c. delete the svn history and replace it with my merged timeline

Another thing I discovered which kind of spoiled my day was that the
apparent superiority of the GPU for block ciphers vanishes if you max
out your CPU. Try this for example (on your Core2Duo E8400) and see what
happens:

openssl speed -evp aes-128-ecb -multi 4

My machine with the GTX 295s for example is equipped with a Core i7 960
(4 cores, HT), and using -multi 8 I almost reach the speed of one GTX
295 processor :(
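
One back-of-the-envelope way to read such results is to ask how many CPU workers it would take to match one GPU processor. The sketch below assumes that throughput scales linearly with the number of workers, which is optimistic (hyper-threaded workers in particular will not each deliver full single-core speed). The function name is hypothetical, and the rates would have to come from your own openssl speed runs.

```c
#include <assert.h>

/*
 * Smallest number of parallel CPU workers whose combined throughput
 * reaches gpu_rate, assuming (optimistically) linear scaling.
 * Both rates are in bytes per second.
 */
static unsigned long long workers_to_match(unsigned long long gpu_rate,
                                           unsigned long long per_worker_rate)
{
    assert(per_worker_rate > 0);
    /* Integer ceiling of gpu_rate / per_worker_rate. */
    return (gpu_rate + per_worker_rate - 1) / per_worker_rate;
}
```

If the result is at or below the number of hardware threads available, the CPU alone is already in the same league as the GPU for that cipher.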
