About generating GPU traces using gpuocelot


evani...@gmail.com

Jul 16, 2015, 10:03:49 AM
to macsi...@googlegroups.com
Hi,
I am generating GPU traces using the CUDA SDK and gpuocelot.
I believe my program links against libocelot.so and libocelotTrace.so successfully (I checked this with ldd).
However, when I run my program a.out, it fails to produce a complete trace and prints the following error messages:

solim@ubuntu:~/gpuocelot$ ./a.out
all enabled
(0.014526) X86TraceGenerator.cpp:773:  New kernel launched
(0.014679) X86TraceGenerator.cpp:774:  compute version:2.0
(0.014750) X86TraceGenerator.cpp:775:  grid  1 x 1 x 1
(0.014803) X86TraceGenerator.cpp:776:  block 20 x 1 x 1
(0.014842) X86TraceGenerator.cpp:777:  number of warps per block:1
(0.014879) X86TraceGenerator.cpp:778:  number of total warps:1
(0.014915) X86TraceGenerator.cpp:779:  # threads per block : 32
(0.014952) X86TraceGenerator.cpp:780:  number of register per thread:0
(0.015036) X86TraceGenerator.cpp:781:  number of shared memory per thread:0
(0.015082) X86TraceGenerator.cpp:819:  max blocks per core : 8
(0.021116) X86TraceGenerator.cpp:1080: mkdir -p /home/solim/gpuocelot/GPUtrace//_Z11helloKernelv_0/ (status 0)

(0.021212) X86TraceGenerator.cpp:1081: errno is 2 message is No such file or directory

a.out: traces/implementation/X86TraceGenerator.cpp:1619: virtual void trace::X86TraceGenerator::event(const trace::TraceEvent&): Assertion `0' failed.
Aborted (core dumped)

What kind of problem is this?
Or is there a standard tutorial on generating GPU traces with gpuocelot that I can refer to?
Thanks for your kind help!
 

Ramyad H

Jul 16, 2015, 2:58:32 PM
to macsi...@googlegroups.com
Can you execute the program when it is linked only with libocelot.so?

Evania Tsai

Jul 16, 2015, 3:20:42 PM
to macsi...@googlegroups.com
Yes, I can.
The program runs to completion when it is linked only with libocelot.so.



Ramyad Hadidi

Jul 16, 2015, 3:35:25 PM
to macsi...@googlegroups.com
Which version of ocelot are you using? Did you get it from github?


Ramyad Hadidi

Jul 16, 2015, 4:25:15 PM
to macsi...@googlegroups.com
Evania,

You can follow these instructions:

Prepare a machine with a CUDA-enabled GPU and install Ubuntu 10.04 (11.10 also works, but nothing newer).
     1. sudo apt-get install gcc-4.6 g++-4.6 build-essential
     2. sudo /etc/init.d/gdm stop && sudo ./devdriver_4.2_linux_64_295.41.run
     3. sudo ./cudatoolkit_4.2.9_linux_64_ubuntu10.04.run
     4. ./gpucomputingsdk_4.2.9_linux.run
     5. sudo /etc/init.d/gdm start
     6. cd /home/hparch/NVIDIA_GPU_Computing_SDK/C/common && make
     7. cd /home/hparch/NVIDIA_GPU_Computing_SDK/C && make
     8. If everything goes well, then you should see Finished building all.

Install CUDA 4.2 (not higher)
     1. export PATH=/usr/local/cuda/bin:$PATH
     2. export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH
   (3). sudo ldconfig

Install Boost Library 1.49 (not higher)
     1. sudo apt-get install bzip2 libbz2-dev zip unzip python-dev zlib1g-dev
     2. mkdir -p ~/src && cd ~/src && wget http://softlayer.dl.sourceforge.net/project/boost/boost/1.49.0/boost_1_49_0.tar.gz && tar xzf boost_1_49_0.tar.gz && cd boost_1_49_0
     3. ./bootstrap.sh
     4. ./b2 
     5. Only if nothing goes wrong, sudo ./b2 install (to default location: /usr/local/lib and /usr/local/include)

Install Hydrazine
     1. cd ~/src && git clone https://github.com/gtcasl/hydrazine.git

Install GPUOcelot
     1. sudo apt-get install scons flex bison freeglut3-dev libglew1.5-dev
     2. cd ~/src
     3. git clone https://github.com/gtcasl/gpuocelot.git
     4. cd gpuocelot/ocelot && rm -rf hydrazine && ln -s /home/hparch/src/hydrazine
     5. In SConscript, change (libboost_filesystem-mt, boost_system-mt) to (libboost_filesystem, boost_system)
     6. sudo ./build.py --install --no_llvm --thread 4 && sudo ldconfig
     7. export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
     8. cd /home/hparch/src/gpuocelot/trace-generators
     9. ln -s /home/hparch/src/hydrazine
     10. In SConstruct, change (libboost_filesystem-mt, boost_system-mt, boost_serialization-mt) to (libboost_filesystem, boost_system, boost_serialization)
     11. scons
     12. Only if nothing goes wrong, sudo scons install
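
Steps 5 and 10 can also be scripted instead of edited by hand. The following is a minimal sketch, assuming the -mt suffix appears in the build files exactly as written in those steps; strip_boost_mt is a hypothetical helper name, not part of gpuocelot.

```shell
# Hypothetical helper (not part of gpuocelot): drop the "-mt" suffix from the
# Boost library names in an SCons build file, as steps 5 and 10 describe.
strip_boost_mt() {
    build_file="$1"
    # Write to a temporary file first so a failed sed leaves the original intact.
    sed -E 's/(boost_(filesystem|system|serialization))-mt/\1/g' \
        "$build_file" > "$build_file.tmp" && mv "$build_file.tmp" "$build_file"
}
```

It would be run as strip_boost_mt SConscript inside gpuocelot/ocelot and as strip_boost_mt SConstruct inside gpuocelot/trace-generators. Keeping a backup copy of each file first is a sensible precaution.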

Regression Tests
     1. cd /home/hparch/src/gpuocelot/tests/cuda4.1sdk
     2. comment out line 164 - env.Append(LIBPATH = [os.path.join(env['install_path'], 'lib')])
     3. scons
     4. If everything goes well, then you should see scons: done building targets.
     5. execute any application as a test: for instance, .release_build/Transpose

Trace Generation
     Prerequisite (Python 2.7 - gpu_tracegen.py requires Python 2.7 or later because it uses argparse)
          1. mkdir -p ~/src && cd ~/src && wget https://www.python.org/ftp/python/2.7.9/Python-2.7.9.tgz
          2. tar xvf Python-2.7.9.tgz && cd Python-2.7.9
          3. ./configure && make && sudo make install
          4. sudo rm /usr/bin/python /usr/bin/python2 /usr/bin/python-config and create a link to /usr/local/bin/python2.7 …

Notes:  If you open a new terminal, export PATH and LD_LIBRARY_PATH again and rerun sudo ldconfig.
        For trace generation with Ocelot, see the MacSim documentation as well as Ocelot's.
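
To avoid re-exporting the variables in every new terminal, the exports can be appended to a shell startup file. This is a minimal sketch, assuming the default install locations used in the steps above (/usr/local/cuda and /usr/local/lib) and a Bash login; persist_ocelot_env is a hypothetical helper name.

```shell
# Hypothetical helper: append the exports from the steps above to a startup
# file (default: ~/.bashrc). Paths assume the default install locations.
persist_ocelot_env() {
    rc_file="${1:-$HOME/.bashrc}"
    # Do nothing if the marker is already present, so repeated runs
    # do not duplicate the exports.
    if grep -q '# gpuocelot environment' "$rc_file" 2>/dev/null; then
        return 0
    fi
    cat >> "$rc_file" <<'EOF'
# gpuocelot environment
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/lib:$LD_LIBRARY_PATH
EOF
}
```

Calling persist_ocelot_env with no argument targets ~/.bashrc; new terminals then pick up the settings automatically.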
        

References
- https://github.com/gtcasl/gpuocelot/wiki/Installation
- https://code.google.com/p/gpuocelot/

-L/usr/local/cuda/lib64 -L../../lib -L../../common/lib/linux -L../../../shared//lib -lcudart

Evania Tsai

Jul 22, 2015, 3:19:17 AM
to Macsim Developer
Hi Ramyad,
I recall that gpuocelot lets a machine without a GPU run CUDA programs. Is that still the case?
If so, can GPU traces also be generated on a machine without a CUDA-enabled GPU?
Or is a CUDA-enabled GPU required for trace generation?




Ramyad Hadidi

Jul 22, 2015, 8:46:33 AM
to macsi...@googlegroups.com

Gpuocelot does not need a CUDA-enabled GPU on the machine. You can execute programs and generate traces as long as you have the required libraries and gpuocelot supports the program.

If you link libocelot instead of cudart, the program will execute without a GPU. If you link both libocelot and libocelotTrace and set some parameters, it will generate the trace.
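
One way to check which of these cases a given binary falls into is to inspect its ldd output. The following is a minimal sketch; classify_link is a hypothetical helper, and it only looks at library names, so it cannot tell whether the trace parameters are actually set.

```shell
# Hypothetical helper: classify a binary from its `ldd` output (read on stdin).
# libocelotTrace is tested first because its name also contains "libocelot".
classify_link() {
    ldd_output=$(cat)
    case "$ldd_output" in
        *libocelotTrace*) echo "trace" ;;      # libocelotTrace present: trace generation possible
        *libocelot*)      echo "emulation" ;;  # libocelot only: runs without a GPU
        *libcudart*)      echo "cudart" ;;     # stock CUDA runtime: needs a GPU
        *)                echo "unknown" ;;
    esac
}
```

Typical use would be ldd ./a.out | classify_link.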

Evania Tsai

Jul 28, 2015, 2:03:37 PM
to Macsim Developer, ramyad...@gmail.com
Hi,
Thanks for your help; the regression test now passes.
However, when I generate a GPU trace, I encounter this error:

Transpose: traces/implementation/X86TraceGenerator.cpp:836: virtual void trace::X86TraceGenerator::initialize(const executive::ExecutableKernel&): Assertion `ir::PTXInstruction::Nop == 82' failed.
Aborted

I found a solution earlier in the gpuocelot group, adding two NoP instructions (MAD64, SHFL64), but it didn't work.
Can you tell me the standard way to fix this?


Ramyad Hadidi

Jul 28, 2015, 2:15:32 PM
to Evania Tsai, Macsim Developer

We have updated MacSim and gpuocelot for a newer trace version, which will possibly solve that. However, your solution is fine as well.

Evania Tsai

Jul 29, 2015, 12:10:06 AM
to Macsim Developer, ramyad...@gmail.com

Hi,
I got gpuocelot and MacSim from GitHub. The GitHub versions are the latest, right?
But I keep hitting the same error (Nop == 82 failed...) during trace generation after rebuilding gpuocelot.
Is there a detailed tutorial on trace generation that I can refer to?
Thanks for the help!

Hyesoon Kim

Jul 29, 2015, 12:17:50 AM
to macsi...@googlegroups.com, ramyad...@gmail.com
Can you share the application whose trace generation causes the error?

Hyojong Kim

Jul 29, 2015, 12:36:02 AM
to macsi...@googlegroups.com, Hyojong Kim, Ramyad Hadidi
Hi Evania, 

MacSim on GitHub was a bit outdated (sorry for the inconvenience). Please download the v2.1.1 release that I have just uploaded.

Sincerely
-Hyojong





Evania Tsai

Jul 29, 2015, 7:26:03 AM
to Macsim Developer, ramyad...@gmail.com, hye...@cc.gatech.edu
Hi,
I am using the cuda4.1sdk applications now.

Almost every cuda4.1sdk application I tried (Clock, Transpose, etc.) hits the following error:

Clock: traces/implementation/X86TraceGenerator.cpp:838: virtual void trace::X86TraceGenerator::initialize(const executive::ExecutableKernel&): Assertion `ir::PTXInstruction::Nop == 82' failed.

I tried adding MADC and SHFL, but it still doesn't work. Then I simply commented out this line: assert(ir::PTXInstruction::Nop == 82);
The trace generator then proceeds and generates some traces; however, it does not print a success message, so I can't tell whether it is working correctly.

What can I do to generate traces successfully?
 


Hyesoon Kim

Jul 29, 2015, 8:21:18 AM
to Evania Tsai, Macsim Developer, Ramyad Hadidi
Since we were in the middle of upgrading the trace version over the last few days,
could you confirm whether you are using the latest Ocelot trace generation (trace version 1.4) and MacSim v2.1.1?

Evania Tsai

Jul 29, 2015, 8:30:12 AM
to Macsim Developer, ramyad...@gmail.com, hye...@cc.gatech.edu

Hi,
I got Ocelot via git clone https://github.com/gtcasl/gpuocelot.git; sorry, I don't know which version that is.
I am generating traces using gpuocelot only.
Should I be generating traces with both Ocelot and MacSim?

Hyesoon Kim

Jul 29, 2015, 8:34:07 AM
to Evania Tsai, Macsim Developer, ramyad...@gmail.com
The Ocelot trace generator was updated 2 days ago. Did you get the latest version after that?
Sorry for the confusion. Only Ocelot is used for generating traces; MacSim is used for reading them.
Hyesoon 

Evania Tsai

Jul 29, 2015, 11:55:16 PM
to Macsim Developer, ramyad...@gmail.com, hye...@cc.gatech.edu

Hi,
I downloaded the latest version and built it.
Then I tried generating traces with the cuda4.1sdk applications MergeSort and VectorAdd.
MergeSort almost succeeds.
However, the traces exceed 100M in size and the run then aborts.
How can I set an appropriate size so that trace generation completes?
VectorAdd, on the other hand, aborts very quickly with the following error messages:
[VectorAdd] test results...
PASSED
> exiting in 3 seconds: 3...2...1...done!
VectorAdd: Passes.cpp:468: virtual void llvm::TargetPassConfig::addMachinePasses(): Assertion `TPI && IPI && "Pass ID not registered!"' failed.
Aborted


Evania Tsai

Aug 3, 2015, 11:13:41 AM
to Macsim Developer, ramyad...@gmail.com, hye...@cc.gatech.edu
Hi,

I would like to know whether it matters that the trace generator aborted because of the trace size.
Aborted means the code did not finish executing, right?
Would that affect the MacSim simulation? (Does each kernel trace correspond to a different part of the code?)

Also, I found that without a GPU, some programs run when linked only with libocelot.so but hit a segmentation fault when also linked with libocelotTrace.so.
Does the trace generator require a GPU?

Anyway, Thanks for help!!!

Best,
Evania
 


Paul sdf

Aug 17, 2018, 3:42:06 AM
to Macsim Developer
Hi, Hyesoon, 

I am new to gpuocelot and have installed it on my Ubuntu PC. I want to obtain GPU memory trace information for my kernel. I have compiled the ocelot and trace-generators folders and obtained libocelot and libocelotTrace, but I still don't know what the next step is, especially regarding the configure.ocelot file. For simplicity, I am testing with a simple addition kernel.
nvcc -ccbin /usr/bin/g++-4.4 -arch=sm_20 add.cu -locelot -locelotTrace -o add_trace

 
When I use the configure.ocelot file as follows, 

{
    ocelot: "ocelot",
    version: "",
    trace: {
        database: "traces/database.trace",
        x86Trace: True,
        TRACE_PATH: "/trace",
        memoryChecker: {
            enabled:             True,
            checkInitialization: False
        },
        raceDetector: {
            enabled:                False,
            ignoreIrrelevantWrites: False
        },
        kernelTimer: {
            enabled:    True,
            outputFile: "kernel-times.json"
        },
        debugger: {
            enabled:      False,
            kernelFilter: "",
            alwaysAttach: True
        }
    },
    cuda: {
        implementation: "CudaRuntime",
        tracePath:      "trace/CudaAPI.trace"
    },
    executive: {
        devices:                  ["emulated"],
        preferredISA:             "nvidia",
        optimizationLevel:        "full",
        reconvergenceMechanism:   "ipdom",
        defaultDeviceID:          0,
        required:                 False,
        asynchronousKernelLaunch: True,
        port:                     2011,
        host:                     "127.0.0.1",
        workerThreadLimit:        8,
        warpSize:                 32
    },
    checkpoint: {
        enabled:  False,
        path:   ".",
        prefix: "kernel_trace_",
        suffix: ".trace",
        verify:   False
    },
    optimizations: {
        subkernelSize:            10000,
        simplifyCFG:              True,
        structuralTransform:      False,
        predicateToSelect:        False,
        linearScanAllocation:     False,
        mimdThreadScheduling:     False,
        syncElimination:          False,
        hoistSpecialValues:       False,
        enforceLockStepExecution: False
    }
}

Paul sdf

Aug 17, 2018, 3:51:04 AM
to Macsim Developer
And when I run ./add_trace, it always prints: (0.001687) X86TraceGenerator.cpp:859:  TRACE_PATH not set up, followed by a segmentation fault. Even when I don't specify any TRACE_PATH, I still get the same error (the description reads "TRACE_PATH : directory to store traces (default: current dir)"). Do you have any suggestions? Also, is there a tutorial that briefly describes the steps to obtain a GPU trace and analyze it? Thank you very much.
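
A minimal sketch for ruling out directory problems, under two assumptions that this thread does not confirm: that the trace generator reads TRACE_PATH from the process environment rather than from configure.ocelot, and that the target directory must already exist and be writable. run_with_trace_path is a hypothetical helper name. Note that a path such as "/trace" sits at the filesystem root, where an unprivileged user normally cannot create directories.

```shell
# Hypothetical helper. ASSUMPTION (unconfirmed in this thread): the trace
# generator picks up TRACE_PATH from the environment.
run_with_trace_path() {
    trace_dir="$1"
    shift
    # Create the directory up front so a missing path is caught here,
    # not inside the trace generator.
    mkdir -p "$trace_dir" || return 1
    if [ ! -w "$trace_dir" ]; then
        echo "trace directory is not writable: $trace_dir" >&2
        return 1
    fi
    # Set TRACE_PATH only for this one command.
    TRACE_PATH="$trace_dir" "$@"
}
```

For example: run_with_trace_path "$PWD/traces" ./add_trace. If the error persists with a directory that is known to exist and be writable, the cause lies elsewhere.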

