mac os x version for 1008.nonac codes


AviP

Aug 2, 2010, 3:24:08 PM
to VSCSE Many-core Processors 2010
Hi,
has anyone adapted the nonac version of this workshop for Mac OS X? I see
that the common dir is different from the previous one (nonac.1007),
which some folks were able to adapt for OS X; similarly, the lib dir
does not exist here.

Thanks

-- Avi

John Stratton

Aug 2, 2010, 3:52:54 PM
to vscse-many-core...@googlegroups.com
We're working on it now.  Should have an update before the afternoon lab session, or before tomorrow's labs at the latest. 
================
John Stratton
217-621-9501
507 W Green St Apt 10
Champaign, IL 61820

Richard Spencer

Aug 4, 2010, 3:19:30 PM
to vscse-many-core...@googlegroups.com
Has anyone successfully ported the nonac/online package to Mac OS X yet?  There are many Linux dependencies to convert.  If so, how?  If not, has the effort been given up?
 
Many thanks

--
Richard M. Spencer




Xiao-Long Wu

Aug 4, 2010, 4:16:06 PM
to vscse-many-core...@googlegroups.com
Hi Richard,

Could you provide more information about the compilation problems, such as the build log and what you suspect is going wrong?

We checked the package and could not identify the cause of your issue. You should not have dependency problems as long as you are using an environment similar to the one you use for other CUDA programs.

Thanks.

Xiao-Long

Gerard Richardson

Aug 4, 2010, 4:29:50 PM
to vscse-many-core...@googlegroups.com

I haven't really been running on my Mac, but here are the changes I had to make to get things to compile:


--- a/benchmarks/stencil/src/cuda/file.cc Mon Aug 02 15:00:14 2010 -0700
+++ b/benchmarks/stencil/src/cuda/file.cc Mon Aug 02 15:40:03 2010 -0700
@@ -6,9 +6,18 @@
  *cr
  ***************************************************************************/
 
+#if defined(__APPLE__)
+#include <architecture/byte_order.h>
+#else
 #include <endian.h>
+#endif
+
 #include <stdlib.h>
+
+#if not defined(__APPLE__)
 #include <malloc.h>
+#endif
+
 #include <stdio.h>
 #include <inttypes.h>



And, I run .parboil (in the top-level directory from the AC tarfile), instead of parboil:
python .parboil compile stencil cuda

(Note that the stencil code seeds the random number generator and compares against a default output set, so it reports a match on AC.)
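
On the header change in the diff above: judging by the diff, <endian.h> is not available on these Mac OS X versions, which is why the include has to be guarded. If the byte-order helpers are only needed to read or write fixed-endian binary data (an assumption; the thread does not show how file.cc uses them), shift-based helpers avoid the platform header altogether. A minimal sketch with hypothetical function names, not the code from the workshop package:

```c
#include <stdint.h>

/* Decode a 32-bit little-endian value from four bytes.
   Gives the same result on any host byte order and needs neither
   <endian.h> nor <architecture/byte_order.h>. (Hypothetical helper.) */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Encode a 32-bit value as four little-endian bytes. (Hypothetical helper.) */
static void write_le32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xFFu);
    p[1] = (unsigned char)((v >> 8) & 0xFFu);
    p[2] = (unsigned char)((v >> 16) & 0xFFu);
    p[3] = (unsigned char)((v >> 24) & 0xFFu);
}
```

The trade-off is a few shifts per value instead of a possible no-op macro; for file I/O that cost is normally negligible.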



Macbook Pro
Mac OS X 10.5.8
cuda 3.1
python 2.6.5


Gerard

Xiao-Long Wu

Aug 4, 2010, 4:59:26 PM
to vscse-many-core...@googlegroups.com
Thanks, Gerard, for the complete solution.

I guess you probably didn't know that we also released another package for non-AC cluster machines. It's available at https://hub.vscse.org/resources/103/ and at ~xiaolong/CUDA_WORKSHOP_UIUC1008.online.tgz. The environment setup in the AC cluster package was not developed for other environments.

Thanks.
Xiao-Long

Richard Spencer

Aug 4, 2010, 5:14:54 PM
to vscse-many-core...@googlegroups.com
Your suggestion helped me get past that problem, but I suspect the build and run problems I am still having are due to Snow Leopard 64-bit mode.  I suppose I need to give some build FLAGS for arch=i386 and choose the 32-bit version of Python.  A "$make all" in common only builds the first target.

For example, running stencil:
.
.
ld: warning: in /Volumes/Projects/CUDA/vscsecourse/UIUC1008/CUDA_WORKSHOP_UIUC1008.online/common/lib/libparboil.a, file was built for unsupported file format which is not the architecture being linked (i386)
Undefined symbols:
  "_pb_FreeParameters", referenced from:
      _main in main.o
  "outputData(char*, float*, int, int, int)", referenced from:
      _main in main.o
  "_pb_SwitchToTimer", referenced from:
.
.

I want to get this working on a Mac because that is what I use now and will keep using.  We got the builds to work for the "Introduction to CUDA" course previously.

Thanks again

MacBook Pro 6,2 (Also Mac Pro)
OS X 10.6.4
Darwin Kernel Version 10.4.0: Fri Apr 23 18:27:12 PDT 2010; root:xnu-1504.7.4~1/RELEASE_X86_64 x86_64
CUDA 3.1
Python 2.6.1
--

Richard M. Spencer



Richard Spencer

Aug 4, 2010, 5:18:48 PM
to vscse-many-core...@googlegroups.com
BTW.  I am using the latest .online version.  Gerard's fix is helpful and valid for that one as well.
 
--

Richard M. Spencer





Joshua A. Anderson

Aug 4, 2010, 5:21:31 PM
to vscse-many-core...@googlegroups.com
> ld: warning: in /Volumes/Projects/CUDA/vscsecourse/UIUC1008/CUDA_WORKSHOP_UIUC1008.online/common/lib/libparboil.a, file was built for unsupported file format which is not the architecture being linked (i386)


Check for -m32 command line options in the makefile. Also, I presume you have CUDA 3.1 on your Mac? It can produce 64-bit builds, and those are the default if neither -m32 nor -m64 is on the command line.
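
A quick way to see which mode a given set of flags produces is to compile a tiny function with exactly those flags and look at the pointer width. This is only a sketch with hypothetical function names; it does not inspect libparboil.a itself, it just shows what the compiler emits for -m32 versus -m64.

```c
#include <limits.h>
#include <stdio.h>

/* Pointer width in bits for this build: 32 when compiled with -m32,
   64 when compiled with -m64. (Hypothetical helper.) */
static int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}

/* Write a one-line description such as "64-bit build" into buf. */
static void describe_build(char *buf, size_t buflen)
{
    snprintf(buf, buflen, "%d-bit build", pointer_bits());
}
```

Calling describe_build from a small main, built once with the flags used for the common library and once with the flags used for a benchmark, shows whether the two agree. If they do not, a linker warning like the one quoted above is the expected result.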

- Josh

Xiao-Long Wu

Aug 4, 2010, 5:24:36 PM
to vscse-many-core...@googlegroups.com
Did you "source env.sh" before you started working on the labs?

In the package for online users, the library is rebuilt when you "source env.sh". Hence, libparboil.a should be compatible with your compilation environment.

Xiao-Long

Xiao-Long Wu

Aug 4, 2010, 5:25:34 PM
to vscse-many-core...@googlegroups.com
Yes, Josh's suggestion should work.

João Barbosa

Aug 4, 2010, 5:31:46 PM
to VSCSE Many-core Processors 2010
If I may help,

In common/mk/common.mk

Search for these variables and change them to match:

CFLAGS=-m32 $(GCCSTD) $(INCLUDEFLAGS) -O3 $(EXTRA_CFLAGS)
CXXFLAGS=-m32 $(INCLUDEFLAGS) -O3 $(EXTRA_CXXFLAGS)
LDFLAGS=-m32 -L$(PARBOIL_ROOT)/common/lib $(EXTRA_LDFLAGS)
LIBS=-lparboil $(EXTRA_LIBS)

In cuda.mk, make sure to remove the 64 from the lib path.

Delete the libs in common and run source env.sh again.
That solved the problem, at least for me.




Gerard Richardson

Aug 4, 2010, 5:38:46 PM
to vscse-many-core...@googlegroups.com
Xiao-Long,

  I downloaded the non-AC version and looked at it.  I certainly appreciate you taking the time to put it together.  However, I took it as a ... challenge ... to try something off the beaten path.

  I'm using mercurial for version control, and to push and pull changesets between AC and my local machine (via ssh).  So I started with the AC package, and then folded in the diffs from the non-AC package.  It's not perfect (I'm wrestling with symlinks on a case-by-case basis), but it makes keeping track of my edits between the two machines easier.  For what it's worth, I still do most (all ?) of my runs on AC.

Richard Spencer

Aug 4, 2010, 5:45:42 PM
to vscse-many-core...@googlegroups.com
1) Yes, I source env.sh
2) I apply Gerard's fix
3) I run "make all" in common/src, which only builds libparboil.a
4) I try stencil, for example, and get the error

It looks like my make (in 3) builds x86_64 but the run builds and expects i386 (despite below)

I did not previously have to alter my build or boot environment for "Intro to CUDA"

As Josh suggested, I removed -m32 from the common/mk/cuda.mk.

Still NoGo.

Thanks
 
--

Richard M. Spencer





AviP

Aug 4, 2010, 5:48:19 PM
to VSCSE Many-core Processors 2010
Just to reiterate, João's suggestion fixes the link errors, and I was
able to run.

Thanks a lot.

-- Avi

Heeseok Kim

Aug 4, 2010, 5:50:17 PM
to vscse-many-core...@googlegroups.com
In 3), it should also build some other files, such as libsol2bcpu.a. If you don't see such a file, I recommend fetching the updated package, which contains the fixes we have made so far. Hopefully some of your problems will go away with it.

ffoe...@gmail.com

Aug 4, 2010, 6:24:50 PM
to vscse-many-core...@googlegroups.com
Oh, you should leave -m32 in place and instead change the following (I have attached my makefiles for you):


CUDALDFLAGS=$(LDFLAGS)                                    \
-lcuda -L$(CUDAHOME)/lib64 -L$(PARBOIL_ROOT)/common/lib $(EXTRA_CUDALDFLAGS)

to 

CUDALDFLAGS=$(LDFLAGS)                                    \
-lcuda -L$(CUDAHOME)/lib -L$(PARBOIL_ROOT)/common/lib $(EXTRA_CUDALDFLAGS)

makefiles-osx.zip

AviP

Aug 4, 2010, 10:00:02 PM
to VSCSE Many-core Processors 2010
After the changes above, my build works fine on Mac OS X, but my
run fails. Has anyone seen this?

make[1]: Nothing to be done for `all'.
nvcc -m32 -L/Users/apurkaya/NREL/sw/CUDA/UIUC1008/CUDA_WORKSHOP_UIUC1008.nonac/common/lib -lm -lpthread -lcuda -L/lib -L/usr/local/cuda/lib -L/Users/apurkaya/NREL/sw/CUDA/UIUC1008/CUDA_WORKSHOP_UIUC1008.nonac/common/lib build/cuda_default/main.o build/cuda_default/file.o /Users/apurkaya/NREL/sw/CUDA/UIUC1008/CUDA_WORKSHOP_UIUC1008.nonac/common/lib/parboil_cuda.o -o build/cuda_default/stencil -lcuda -lparboil -lm -lstdc++
ld: warning: directory '/lib' following -L not found
CUDA accelerated 7 points stencil codes****
Original version by Li-Wen Chang <lcha...@illinois.edu>
This version maintained by Chris Rodrigues ***********
CUDA error: invalid argument, line 115
Run failed!

This is with the original shared memory version which is not altered.
So I suspect something in my environment is not quite right.

Any suggestions?

Thanks

-- Avi

Richard Spencer

Aug 4, 2010, 11:14:42 PM
to vscse-many-core...@googlegroups.com
Following João's suggestions exactly, in addition to previous and following ones, including Gerard's file.cc suggestion, got me working, almost.  There are still some unexpected residual problems like stat.st_size returning 0 in the lbm test.

$ ./parboil run stencil cuda default
.
CPU/GPU Overlap: 0.000085
-2354.47851562 -25.9408454895
Mismatch
Output checking tool detected a mismatch

$ ./parboil run lbm cuda short
.
MAIN_parseCommandLine:
size of file 'input/short/100_100_130_ldc.of' is 0 bytes
expected size is 1313130 bytes
Run failed!
 

Thanks again
--

Richard M. Spencer





ffoe...@gmail.com

Aug 5, 2010, 5:11:39 AM
to vscse-many-core...@googlegroups.com
I can confirm this as well.  I suspected the 0 was really the value overflowing the %i, so I changed %i to %lld in main.cu in the lbm src folder:
       "\tsize of file '%s' is %lld bytes\n"

But now I get differing sizes...

LBM_allocateGrid: allocated 105.3 MByte
MAIN_parseCommandLine:
        size of file 'input/short/100_100_130_ldc.of' is 5639850405396480 bytes
        expected size is 234881025 bytes
Run failed!
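
One observation on the numbers above: 5639850405396480 is exactly 1313130 * 2^32. On a 32-bit little-endian build, that is what %lld prints when the size argument is only 32 bits wide and zero, and the next argument (the expected size, 1313130) gets read as the upper half. So the %lld change probably shifted the arguments rather than revealing the true size, and the 0 may be real. A minimal sketch of a check where the conversion always matches the argument, assuming the size comes from stat() (an assumption; the lbm source is not shown in this thread) and using hypothetical function names:

```c
#include <stdio.h>
#include <sys/stat.h>

/* Store the size of `path` in *size_out.
   Returns 0 on success, -1 if stat() fails. (Hypothetical helper.) */
static int file_size_bytes(const char *path, long long *size_out)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;                       /* do not trust st_size if stat() failed */
    *size_out = (long long)st.st_size;   /* off_t width varies, so widen explicitly */
    return 0;
}

/* Format actual vs. expected size into buf.
   Returns 0 on a match, 1 on a mismatch, -1 if the file cannot be examined. */
static int report_size(const char *path, long long expected,
                       char *buf, size_t buflen)
{
    long long actual;
    if (file_size_bytes(path, &actual) != 0) {
        snprintf(buf, buflen, "cannot stat '%s'", path);
        return -1;
    }
    /* Both arguments are long long, so both conversions are %lld. */
    snprintf(buf, buflen,
             "size of file '%s' is %lld bytes, expected size is %lld bytes",
             path, actual, expected);
    return actual == expected ? 0 : 1;
}
```

Casting every size to long long before printing keeps the format string correct no matter how wide off_t is on a given platform or build mode.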

AviP

Aug 5, 2010, 2:46:40 PM
to VSCSE Many-core Processors 2010
I took another stab at this problem. First I built in a new dir with
minimal changes, such as those Gerard and João suggested. The
deviceQuery sample builds and runs fine.

1) With the stencil code, initially I was getting the same runtime
error for both "naive" and "shared", where the file contents are not
altered:

CUDA error: invalid argument, line 115
Run failed!

I took out the CUERR macro call and the run proceeded; so there was a
conflict of that error check on the Mac

However, now I get an error similar to what some other users have
already seen:

:
CPU/GPU Overlap: 0.000104
-2354.47851562 0.0
Mismatch
Output checking tool detected a mismatch

Is this a data-type recognition problem on the mac env? If so, are we
missing a header file that needs to be included?

2) On the lbm code..

:
MAIN_parseCommandLine:
size of file 'input/short/100_100_130_ldc.of' is 0 bytes
expected size is 1313130 bytes
Run failed!

Again, same as what others have already seen!

sigh :(

-- Avi

Joshua A. Anderson

Aug 5, 2010, 2:48:39 PM
to vscse-many-core...@googlegroups.com
On Aug 5, 2010, at 2:46 PM, AviP wrote:

> I took out the CUERR macro call and the run proceeded; so there was a
> conflict of that error check on the Mac

An "invalid argument" error indicates that the kernel is not even running - hence the garbage output. Perhaps you should check the arguments specified in the CUDA call just prior to the CUERR check and determine which one of them is invalid.

- Josh

AviP

Aug 5, 2010, 4:32:25 PM
to VSCSE Many-core Processors 2010
Further digging revealed that I was using the GeForce 9400M, the
power-saving GPU. When I switched to the GeForce 9600M GT, which has
double the amount of global memory, I was no longer writing garbage.

However, I am still getting a mismatch. Since I am calling the stencil
"naive" and "shared" functions, which have no code edits, the question
is whether the binary comparison is done the right way on Macs.

Comments?

Thanks

-- Avi

Gerard Richardson

Aug 5, 2010, 4:38:05 PM
to vscse-many-core...@googlegroups.com

For the stencil (and many of the other labs), the input is generated
by seeding the random number generator. When you skip off to a Mac
(obviously without the same random number generator as AC's libc),
you'll get mismatches if you compare to the AC-provided outputs.

If you're convinced the naive implementation is doing the right thing
on your Mac, save off that output file, and use that as your reference.

(at least, that's what I did for lab 2b)
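
To make the point concrete: the C standard does not fix the sequence that srand()/rand() produce, so the same seed gives different inputs with different C libraries. One way to get identical inputs everywhere is to carry a small generator in the code itself. A minimal sketch, assuming a simple linear congruential generator is good enough for test data; this is not the generator the workshop package uses, and all names are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* State of a tiny, fully specified generator (hypothetical type). */
typedef struct {
    uint32_t state;
} portable_rng;

static void portable_seed(portable_rng *r, uint32_t seed)
{
    r->state = seed;
}

/* Linear congruential step: state = state * 1664525 + 1013904223 (mod 2^32).
   Unsigned 32-bit arithmetic wraps, so every platform gets the same value. */
static uint32_t portable_next(portable_rng *r)
{
    r->state = r->state * 1664525u + 1013904223u;
    return r->state;
}

/* Uniform float in [0, 1) built from the top 24 bits, which a float
   can represent exactly. */
static float portable_uniform(portable_rng *r)
{
    return (float)(portable_next(r) >> 8) * (1.0f / 16777216.0f);
}

/* Fill an input array reproducibly from a seed. */
static void fill_input(float *a, size_t n, uint32_t seed)
{
    portable_rng r;
    size_t i;
    portable_seed(&r, seed);
    for (i = 0; i < n; i++)
        a[i] = portable_uniform(&r);
}
```

With something like this, a reference output generated on one machine would stay comparable on another, as long as the floating-point results of the kernel itself also agree closely enough.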

Gerard

AviP

Aug 5, 2010, 4:47:42 PM
to VSCSE Many-core Processors 2010
Ah, that makes sense.

Thanks Gerard.


Xiao-Long Wu

Aug 5, 2010, 5:42:17 PM
to vscse-many-core...@googlegroups.com
Thanks, Gerard. You're right.

We were wondering what the cause could be until you posted this possibility. The input data set is generated each time you run the program in order to save I/O time and disk space. Hence, as long as you are running on the AC cluster, the output results will be the same as the ones we provided.

We're working on providing a consistent, randomly distributed input data set on all systems. Hopefully this will resolve the mismatches on some systems.

Xiao-Long

Prakashan Korambath

Aug 5, 2010, 6:19:53 PM
to vscse-many-core...@googlegroups.com
This is what I did in benchmarks/stencil/build/cuda_default directory.

./stencil 512 512 64 -o out

cd  to this directory benchmarks/stencil/output/default

(Copied the existing out file to some other directory)

cp ../../build/cuda_default/out 512x512x64.out


CUDA accelerated 7 points stencil codes****
Original version by Li-Wen Chang <lcha...@illinois.edu>
This version maintained by Chris Rodrigues  ***********
IO:      0.883505
GPU:     0.085878
Copy:    0.216833
Driver:  0.000044
Compute: 0.280596
CPU/GPU Overlap: 0.000090
Pass

I couldn't test it on my MacBook so far because it was tied up showing the WebEx.  :-)


Prakashan