Harlan does not detect nVidia GPU (also make check fails)

bozhkov...@gmail.com

Dec 5, 2013, 10:46:33 PM
to harla...@googlegroups.com
I am trying to compile Harlan on a 2013 MacBook Pro.

I have installed the nVidia CUDA toolkit (cuda-mac-5.5.28) and Petite Chez Scheme (threaded, 64-bit for Mac).
When I run `make check` on the current repo version (as of 6 Dec 2013), I get:

Some tests failed:
test/triangle-vector-kernel-reduce.kfc
test/test-set.kfc
test/issue-61.kfc
Successes: 134; Failures: 3; Ignored: 20; Total: 157
make: *** [check] Error 1

Even more bizarre, during the tests Harlan prints this out:
`Creating queue for Intel(R) Core(TM) i7-4850HQ CPU @ 2.30GHz`

Correct me if I'm wrong, but isn't this the wrong device?
I haven't installed Intel's OpenCL SDK, because I would like Harlan to run on the nVidia card instead of the Intel Iris one. (By the way, the Intel OpenCL SDK link in the repo's README is broken.)

Hardware Overview:
Mac OS: 10.9
Model Name: MacBook Pro
Model Identifier: MacBookPro11,3
Processor Name: Intel Core i7
Processor Speed: 2.3 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 6 MB
Memory: 16 GB
Boot ROM Version: MBP112.0138.B02
SMC Version (system): 2.19f3

Graphics/Displays:

Intel Iris Pro:

Chipset Model: Intel Iris Pro
Type: GPU
Bus: Built-In
VRAM (Total): 1024 MB
Vendor: Intel (0x8086)
Device ID: 0x0d26
Revision ID: 0x0008
gMux Version: 4.0.8 [3.2.8]

NVIDIA GeForce GT 750M:

Chipset Model: NVIDIA GeForce GT 750M
Type: GPU
Bus: PCIe
PCIe Lane Width: x8
VRAM (Total): 2048 MB
Vendor: NVIDIA (0x10de)
Device ID: 0x0fe9
Revision ID: 0x00a2
ROM Revision: 3765
gMux Version: 4.0.8 [3.2.8]

Any help would be appreciated.

Thanks,
Atanas

Eric Holk

Dec 6, 2013, 1:10:43 AM
to harla...@googlegroups.com, bozhkov...@gmail.com
What happens if you do `HARLAN_DEVICE=gpu make check`?

Since you're running on a Mac, my guess is it's using Apple's OpenCL implementation instead of the NVIDIA one that you installed. The HARLAN_DEVICE variable will at least force it to use a GPU, although it might still use Apple's version instead of the NVIDIA one.
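To make the device question concrete, here is a toy C++ model of environment-driven device selection. It assumes the runtime walks platforms and devices in enumeration order and takes the first match; the types are simplified stand-ins rather than the OpenCL API, and this is not Harlan's actual selection code. Only the variable name HARLAN_DEVICE and the value `gpu` come from this thread; accepting `cpu` as well is an assumption.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-ins for what a host program learns from
// clGetPlatformIDs / clGetDeviceIDs; these are NOT the OpenCL types.
enum class DeviceType { CPU, GPU };

struct Device {
    std::string name;
    DeviceType type;
};

struct Platform {
    std::string name;
    std::vector<Device> devices;
};

struct Choice {
    std::string platform;
    std::string device;
};

// Map a HARLAN_DEVICE-style value to a device type. An unset or
// unrecognized value yields nullopt, meaning "accept any device".
std::optional<DeviceType> requested_type(const char* value) {
    if (value == nullptr) return std::nullopt;
    const std::string v(value);
    if (v == "gpu") return DeviceType::GPU;
    if (v == "cpu") return DeviceType::CPU;  // assumed value, see lead-in
    return std::nullopt;
}

// Return the first device, in enumeration order, whose type matches the
// request (or simply the first device when no type was requested), along
// with the platform that exposed it.
std::optional<Choice> pick_device(const std::vector<Platform>& platforms,
                                  std::optional<DeviceType> want) {
    for (const auto& p : platforms)
        for (const auto& d : p.devices)
            if (!want || d.type == *want)
                return Choice{p.name, d.name};
    return std::nullopt;
}
```

In this model an unset variable takes whatever device happens to be enumerated first (a CPU, if the platform lists it first), which would match the "Creating queue for Intel(R) Core(TM) i7-4850HQ CPU" message above, while `gpu` skips ahead to the first GPU. Note that the choice is made per device type, so by itself it says nothing about which vendor's platform was used.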

For the tests that fail, would you mind sharing the output from those programs? The test binaries are saved in the test.bin directory, so if you run, for example, `test.bin/test-set.kfc.bin`, what is the output from that?

Thanks for pointing out the broken link in the readme.

-Eric

Eric Holk

Dec 6, 2013, 10:53:38 AM
to Atanas Bozhkov, harla...@googlegroups.com
Thanks, this is helpful.

Unfortunately, I'm not seeing this issue on any machine I have access to. It seems like either Harlan is making some OpenCL calls that are technically illegal but tolerated by most implementations, or the size of some value doesn't agree between the Harlan code and the OpenCL code.

If it's not too much trouble, would you mind uncommenting some debug statements and running `make check` again? The statements are in `rt/harlan.cpp`, in the `map_region` function around lines 90-132. There are a couple of `printf` statements in there that are currently commented out, but uncommenting these should give us a little better idea what's going on.

Thanks,
Eric


On Fri, Dec 6, 2013 at 1:27 AM, Atanas Bozhkov <bozhkov...@gmail.com> wrote:
Thank you Eric.

The HARLAN_DEVICE variable did the trick and now `make check` is forcing the nVidia card.
There are 5 tests that fail:
    Some tests failed:
        test/nbody2.kfc
        test/nbody.kfc
        test/lambda6.kfc
        test/kernel-fact-acc.kfc
        test/interp-lambda3.kfc

Here are the respective error logs:
nbody2.kfc - http://pastebin.com/1LpcFZES
nbody.kfc - http://pastebin.com/FWtev3B1
lambda6.kfc - http://pastebin.com/qKrAkduu
kernel-fact-acc.kfc - http://pastebin.com/p6KrPP6T
interp-lambda3.kfc - http://pastebin.com/rrBmUjHm

Hope this is useful. Tell me if you need anything else. 

Thanks, 
Atanas

Eric Holk

Dec 12, 2013, 4:59:05 PM
to Atanas Bozhkov, harla...@googlegroups.com
How recent was the version you used before? The only thing I can think of is the change to box complex kernel arguments [1], but that was at least a week ago at this point.

I'm not really sure what's going on here. Perhaps it's timing related. It seems to have something to do with different OpenCL implementations (unfortunately, they vary wildly in quality). I think the best thing for now is to keep an eye on this issue and see if we can get more data to fix it in the future.

I've opened an issue to track this: https://github.com/eholk/harlan/issues/111


On Thu, Dec 12, 2013 at 12:16 AM, Atanas Bozhkov <bozhkov...@gmail.com> wrote:
Just an update: I pulled the latest changes and recompiled the runtime with the change you mentioned, and now all tests seem to run fine. I am not sure what caused this; have you pushed any related fixes to the repo?

Successes: 137; Failures: 0; Ignored: 21; Total: 158

All tests succeeded.


Thanks,
Atanas 



On Dec 10, 2013, at 9:15 PM, Eric Holk <eric...@gmail.com> wrote:

Just to let you know, I was able to repro this issue on a machine here. I didn't get all the failures you did, but nbody and nbody2 failed in the same way. I'll keep investigating.

-Eric


On Tue, Dec 10, 2013 at 12:23 AM, Eric Holk <eric...@gmail.com> wrote:
I just pushed some code out to the repo that will have Harlan programs report the platform they use as well as the device. This should tell you whether you're using NVIDIA's or Apple's.

Thanks for your patience and willingness to help!


On Tue, Dec 10, 2013 at 12:13 AM, Atanas Bozhkov <bozhkov...@gmail.com> wrote:
How can I check which distribution is currently being used? 

I have in my System Preferences the CUDA pref pane which says that I have the latest driver.
Is there anything else I can run in order to be certain?

As for the rest - I will recompile tomorrow and report back by the end of the day.

Atanas
On 10 Dec 2013, at 04:54, Eric Holk <eric...@gmail.com> wrote:

Well, those look like the numbers I'd expect to see.

Do you know for sure whether this run is using the OpenCL that ships with Mac OS or the one that comes with the CUDA SDK?

Unfortunately I haven't been able to duplicate this on any of the machines I regularly test on, but someone in my lab has a machine that's closer to your configuration. I will give it a try there tomorrow and see if I can get a local repro.

The part about "Some previous asynchronously enqueued event on this queue returned an error" makes me think the kernel execution may have failed, and that this isn't reported until we try to read the kernel's results back.
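The deferred-error behaviour described here can be illustrated with a toy in-order queue. In OpenCL, an enqueue call such as clEnqueueNDRangeKernel reports the problems it can detect when the command is queued, so a kernel that fails while executing may only be reported by a later blocking call such as clFinish or a blocking read; exactly where it surfaces varies by implementation. The ToyQueue class below is a hypothetical C++ model, not OpenCL: enqueue() always succeeds, and the first failure is reported by whichever later call drains the queue.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Toy in-order command queue (NOT OpenCL). Commands are only recorded when
// enqueued; they run when the queue is drained.
class ToyQueue {
public:
    // Enqueueing never fails in this model, mirroring a launch call that
    // returns success even though execution has not happened yet.
    bool enqueue(std::string label, std::function<bool()> work) {
        pending_.push_back(Command{std::move(label), std::move(work)});
        return true;
    }

    // Analogue of clFinish: run every queued command and return the label of
    // the first one that failed, or an empty string if all succeeded.
    std::string finish() {
        std::string first_failure;
        while (!pending_.empty()) {
            Command cmd = std::move(pending_.front());
            pending_.pop_front();
            // Later commands still run, but only the first failure is kept.
            if (!cmd.work() && first_failure.empty()) first_failure = cmd.label;
        }
        return first_failure;
    }

    // Analogue of a blocking read: it has to wait for earlier commands, so it
    // reports their failure even though the read itself did nothing wrong.
    std::string blocking_read() { return finish(); }

private:
    struct Command {
        std::string label;
        std::function<bool()> work;
    };
    std::deque<Command> pending_;
};
```

Draining the queue immediately after the launch (in real code, a clFinish right after clEnqueueNDRangeKernel, kept only while debugging) attributes the error to the kernel rather than to the read that follows it.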

Assuming I can get a repro tomorrow, probably the next thing I'll try is commenting out the kernel calls and seeing how things change. If you want to give it a try too, here's what changes to make:

- comment out the call to clEnqueueNDRangeKernel on line 269 of rt/cl++.cpp
- rebuild the runtime with `make -C rt`
- run just the failing test with `./run-tests nbody2.kfc`, etc. You could run all the tests, but I'm guessing not executing the kernels will cause many of them to fail, so there's not much point in running them.
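The experiment above comments the launch out by hand. As a sketch only, the same experiment could be toggled without editing the source by gating the launch on an environment variable. HARLAN_SKIP_KERNELS is a hypothetical name, not an existing Harlan option, and maybe_launch is a hypothetical wrapper standing in for the code around the clEnqueueNDRangeKernel call in rt/cl++.cpp.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <functional>

// Parses the value of the HYPOTHETICAL HARLAN_SKIP_KERNELS variable:
// only the exact string "1" requests skipping.
bool skip_requested(const char* value) {
    return value != nullptr && std::strcmp(value, "1") == 0;
}

// Hypothetical wrapper around whatever actually enqueues the kernel.
// When skipping, it reports 0 (the value of CL_SUCCESS) without launching,
// so the rest of the program, including buffer transfers, still runs.
// The default argument reads the environment at each call.
int maybe_launch(const std::function<int()>& enqueue_kernel,
                 bool skip = skip_requested(std::getenv("HARLAN_SKIP_KERNELS"))) {
    if (skip) return 0;
    return enqueue_kernel();
}
```

If a test stops producing the asynchronous error once launches are skipped, that points at kernel execution rather than at the transfers around it.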

Other than that, I'm trying to think of what those tests have in common the others don't that might cause only these to fail...

-Eric



On Fri, Dec 6, 2013 at 12:00 PM, Atanas Bozhkov <bozhkov...@gmail.com> wrote:
Hey Eric,

I’d be happy to do the debugging!
I’ve uncommented the lines you mentioned.
Here are the results from running the tests again:

nbody2.kfc - http://pastebin.com/MVnbYZ6i
nbody.kfc - http://pastebin.com/U54RVxBf
lambda6.kfc - http://pastebin.com/Z4Cxn2k9
kernel-fact-acc.kfc - http://pastebin.com/71FzqeA9
interp-lambda3.kfc - http://pastebin.com/bDFF3AB0

Keep me posted. 