Problem running rodinia/bfs benchmark


"Stamatis Kavvadias (Σταμάτης Καββαδιας)"

Mar 1, 2015, 4:04:43 PM
to gem5-gpu developers
Hello all,

I have been trying to run the Rodinia BFS benchmark, but I hit an unexpected
error that seems unrelated to gem5-gpu. Here is the m5_term output I get:


==== m5 slave terminal: Terminal 0 ====
Loading new script...
gem5 + GPGPU-Sim CUDA RT: __cudaRegisterFatBinary2(*fatCubin = 0x5017a0, size = 6752)
gem5 + GPGPU-Sim CUDA RT: Touching parts/pages of the binary...
gem5 + GPGPU-Sim CUDA RT: magic: 518347265
gem5 + GPGPU-Sim CUDA RT: ident: bfs.cu
gem5 + GPGPU-Sim CUDAserial8250: too much work for irq4
 RT: elf: ELF 3
gem5 + GPGPU-Sim CUDA RT: ptx[0] code hash = 19
gem5 + GPGPU-Sim CUDA RT: ptx[0]->gpuProfileName: compute_20
Reading File
gem5_fusion_bfs[976]: segfault at 0 ip 0000000000000000 sp 00007fff6cacad48 error 14 in gem5_fusion_bfs[400000+ff000]
/tmp/runscript: line 9:   976 Segmentation fault      ./gem5_fusion_bfs



I have tried to locate the problem, and it seems to come from the call to ceil(..) at line 99 of the benchmark.
I also observed that ceil is one of the symbols passed to the linker with the -u option. I tried moving the -u options
around in the link command by modifying the makefile code included from common.mk. Specifically, I moved $(LDUFLAGS) from the
end of the command to before all the library specs, and even before the object file name (i.e., before $(SIM_OBJS)), with no
change in behavior.

The default makefile does not use the -u options or the -lm -lc (and -lm5op_x86) options, and the resulting binary runs fine on real HW.

I am using g++-4.6 and nvcc release 3.2, V0.2.1221.


Any ideas what is wrong here?

Thanks,

Stamatis

-- 


Stamatis Kavvadias, PhD

Research Associate
TEI of Crete, Greece

Konstantinos Koukos

Mar 1, 2015, 4:58:34 PM
to gem5-g...@googlegroups.com
Hello Stamati,

I had various problems with bfs in the past. I would advise you to compile the application
using gcc 4.4 (in my case it works fine). There were also some issues when I was trying to
model a realistic GPU MMU with TLB translation, but that is another story.

Best Regards,
Konstantinos.

Joel Hestness

Mar 1, 2015, 6:36:26 PM
to Konstantinos Koukos, gem5-gpu developers
Hi Stamatis,
  This is definitely a problem with linking the benchmark. Hopefully, I can give some pointers, but let me clarify some things first:
   1) You must be running in full-system mode, since you're seeing IRQs and the kernel prints the segmentation fault.
   2) I'll need to assume that you are using the common.mk version that is included with the gem5-gpu benchmark repo. This would mean that the benchmark is statically linked, which is important.
   3) The segfault indicates that the CPU code is trying to execute instruction pointer (PC) 0x0, which is obviously an invalid instruction address.

  Alright, in addition to Konstantinos' recommendation of using gcc-4.4 (which we also recommend), there are a few things you can try here:
   1) For those taking notes, if you run this benchmark in SE mode, you will likely run into a panic: "Tried to execute unmapped address 0x0". SE mode may be quicker for tracking down such bugs.

   2) Look at the benchmark's CPU assembly code with "objdump -d -C <binary>". I suspect that you will find something like
         "4002f0:     e8 0b fd bf ff          callq  0 <__libc_tsd_LOCALE>"
    near where the ceil() function should be called. This is an indicator that the linker didn't properly set up the linkage for the ceil() function. It basically just gave up and used 0x0 as the function address. You can check the binary's symbol table to establish this. You should see something like the following:

 $ objdump -t <binary> | grep UND
 ...
 0000000000000000  w      *UND* 0000000000000000 ceil
 ...

  Use this to your advantage to avoid re-running the benchmark each time you try to recompile to get the linking right.

  The problem here is that gcc (all versions) confuses cmath/math.h functions with GPU math functions. The GPU functions are defined in CUDA include files and declared __device__ __host__, which means they can execute on either the GPU or the CPU. Unfortunately, when linking statically, gcc cannot decide between the cmath/math.h version and the CUDA version, so it chooses the worst possible option: neither.

  To fix this you'll need to play around with unmapping functions from the symbol table that cause the linker conflict. One route is to use the '-u <func>' compiler option to specifically define the function to be unmapped. Sometimes this works, because gcc will choose the CUDA version to unmap, leaving the cmath/math.h version for the CPU to use. Other times (as I believe is the case for you), it will unmap the cmath/math.h version, but not use the CUDA version. You can try reordering the include files in the benchmark source, reordering '-I' compiler options when calling gcc, or reordering the compilation steps to get gcc to select a version (your mileage may vary).
  The root problem is that both versions of ceil are globally defined for your benchmark. Another option to fix this is by modifying the benchmark to include the cmath/math.h library inside of a namespace. Then, where ceil is called by CPU code, add the namespace ahead of the call. For example:

...
// At head of file, include math:
namespace cmath {
    #include <cmath>
}
...
cmath::ceil(<params>)
...


  Hope this helps,
  Joel

--
  Joel Hestness
  PhD Candidate, Computer Architecture
  Dept. of Computer Science, University of Wisconsin - Madison
  http://pages.cs.wisc.edu/~hestness/

"Stamatis Kavvadias (Σταμάτης Καββαδίας)"

Mar 2, 2015, 11:17:28 AM
to gem5-g...@googlegroups.com
Hi Joel, Konstantinos and all,

I have tried your advice and nothing works. Of course, as mentioned before, I am using the common.mk makefile (I also tried reverting to the original before writing this e-mail); and I am using full-system mode, but now I avoid running the simulation and use objdump instead.

objdump shows a line
  40196d:    e8 8e e6 bf ff           callq  0 <__preinit_array_end>
for the offending call to ceil; I have surrounded it with two invocations of gem5_gpu 'magic' instructions, so that it is easy to locate it.

I have switched to g++-4.4. I tried several orderings of the compilation flags with no effect. I cannot change the order of the
compilation steps, since a single .o file is produced and then linked (two steps). I tried removing the -u options and/or -lm; no progress. I also tried adding the -u options to the .o compilation step; no change.

I have searched the executable with objdump, as Joel advised, and see 30+ undefined symbols, most of which appear to come from pthreads. In addition, all the symbols passed with -u options (pow, log2, log1p, remquo, exp, sin, exp2, cos, floor, ceil, sqrt, log) are *not* present anywhere in the benchmark code, with the exception of ceil, yet they all appear as undefined in the objdump output.

I have also used nm to search for the symbol ceil in all the libraries mentioned in the link command, plus others that may be implicitly used. I tried:

nm /usr/lib/x86_64-linux-gnu/libm.a --print-armap --demangle | grep ceil
nm /usr/lib/x86_64-linux-gnu/libc.a --demangle --print-armap | grep ceil
nm /usr/lib/x86_64-linux-gnu/libz.a --demangle --print-armap | grep ceil
nm <path-to>/NVIDIA_GPU_Computing_SDK/C/lib/libcutil_x86_64.a --demangle --print-armap | grep ceil
nm <path-to>/GPGPU-Benchmarks/libcuda/libcuda.a --demangle --print-armap | grep ceil
nm <path-to>/GPGPU-Benchmarks/libcuda/libm5op_x86_64.a --demangle --print-armap | grep ceil
nm /usr/lib/gcc/x86_64-linux-gnu/4.4/libstdc++.a --demangle --print-armap | grep ceil
nm /usr/lib/gcc/x86_64-linux-gnu/4.4/libgcc.a --demangle --print-armap | grep ceil

Only the first and the last produce any output for ceil, and the last prints only symbols that contain the word ceil, like __bid64_to_int32_ceil, but not ceil itself.

I ran the link command with 'g++-4.4 -v' to get the final link command used, which is:

 /usr/lib/gcc/x86_64-linux-gnu/4.4.7/collect2 --build-id -m elf_x86_64 --hash-style=gnu -static -o gem5_fusion_bfs -z relro -u pow -u log2 -u log1p -u remquo -u exp -u sin -u exp2 -u cos -u floor -u ceil -u sqrt -u log /usr/lib/gcc/x86_64-linux-gnu/4.4.7/../../../x86_64-linux-gnu/crt1.o /usr/lib/gcc/x86_64-linux-gnu/4.4.7/../../../x86_64-linux-gnu/crti.o /usr/lib/gcc/x86_64-linux-gnu/4.4.7/crtbeginT.o -L../../libcuda -L<path-to>/NVIDIA_GPU_Computing_SDK/C/lib -L/usr/lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.4.7 -L/usr/lib/gcc/x86_64-linux-gnu/4.4.7 -L/usr/lib/gcc/x86_64-linux-gnu/4.4.7/../../../x86_64-linux-gnu -L/usr/lib/gcc/x86_64-linux-gnu/4.4.7/../../../../lib -L/lib/x86_64-linux-gnu -L/lib/../lib -L/usr/lib/x86_64-linux-gnu -L/usr/lib/../lib -L/usr/lib/gcc/x86_64-linux-gnu/4.4.7/../../.. bfs.cu_o -lcuda -lz -lcutil_x86_64 -lm5op_x86 -lstdc++ -lm -lc --start-group -lgcc -lgcc_eh -lc --end-group /usr/lib/gcc/x86_64-linux-gnu/4.4.7/crtend.o /usr/lib/gcc/x86_64-linux-gnu/4.4.7/../../../x86_64-linux-gnu/crtn.o

I have searched the rest of the .o files and libraries for the ceil symbol. It does not appear anywhere
except in /usr/lib/libm.a.

I also tried using #include <cmath> instead of #include <math.h> and replacing the reference to ceil with std::ceil. Same problem, no change.

I have put some effort into modifying the bfs benchmark and would like to use it (I just moved things around to separate tasks that could run either on the CPU or on the GPU, with no real change in the version I am compiling above, which only contains the GPU-path code). I have also tried the original benchmark. Same thing:
  4017df:    e8 1c e8 bf ff           callq  0 <__preinit_array_end>
where the ceil invocation would appear.

Any other ideas? What can be going wrong here? In my view this is not reasonable linker behavior: it should complain or warn about the unresolved symbol, not silently emit a call to NULL.

Thanks in advance,

Stamatis


"Stamatis Kavvadias (Σταμάτης Καββαδίας)"

Mar 2, 2015, 4:28:05 PM
to gem5-g...@googlegroups.com
Hi all,

I have managed to reproduce the bug with a minimal program:

#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#define MAX_THREADS_PER_BLOCK 512
int no_of_nodes;
int
main()
{
    int num_of_blocks = 1;
    if ( no_of_nodes < MAX_THREADS_PER_BLOCK ) {
        num_of_blocks = (int) ceil( no_of_nodes/(double)MAX_THREADS_PER_BLOCK );
        printf( "num_of_blocks=%i\n", num_of_blocks );
    }
    return 0;
}

I use a version of libcuda that does not have the gem5_gpu calls. This program, when linked statically, reproduces the bug:

nvcc -c -arch sm_20 --keep --compiler-options -fno-strict-aliasing -O3 test.cu -o test.intermediate
python /spare/gem5-work/benchmarks/GPGPU-Benchmarks/common/sizeHack.py -f test.cu.cpp -t sm_20
g++-4.4 -g -c test.cu.cpp -o test.cu_o
g++-4.4 -g -static test.cu_o -L/spare/gem5-work/benchmarks/GPGPU-Benchmarks/libcuda -lcuda -o test


Removing -static removes the problem. (I have also tried removing -g and/or adding the -u options.)

I checked (with g++-4.4 -v) what differences there are in the link command. The differences are:

Static                                                Dynamic
------                                                -------
n/a                                                   --eh-frame-hdr

-static                                               -dynamic-linker
n/a                                                   /lib64/ld-linux-x86-64.so.2

/usr/lib/gcc/x86_64-linux-gnu/4.4.7/crtbeginT.o       /usr/lib/gcc/x86_64-linux-gnu/4.4.7/crtbegin.o

--start-group                                         -lgcc_s
-lgcc                                                 -lgcc
-lgcc_eh                                              -lc
-lc                                                   -lgcc_s
--end-group                                           -lgcc
The rest of the collect2 options are the same.

/usr/lib/gcc/x86_64-linux-gnu/4.4/libgcc_s.so,
/lib64/ld-linux-x86-64.so.2, and /usr/lib/gcc/x86_64-linux-gnu/4.4.7/crtbegin.o do not contain the ceil symbol. I could not easily find documentation for the --eh-frame-hdr flag, so I gave up there.

Since other people seem to have been able to run this benchmark, I conclude there must be something broken with static linking on my system (Kubuntu 12.04).

Thanks anyway,

Stamatis


Jason Power

Mar 2, 2015, 4:33:37 PM
to "Stamatis Kavvadias (Σταμάτης Καββαδίας)", gem5-g...@googlegroups.com
Hi Stamatis, 

Just to double check, you've specifically tried the following?
g++-4.4 -g -static test.cu_o -L/spare/gem5-work/benchmarks/GPGPU-Benchmarks/libcuda -lcuda -o test -u ceil
or
g++-4.4 -g -static test.cu_o -L/spare/gem5-work/benchmarks/GPGPU-Benchmarks/libcuda -lcuda -o test -Wl,-u,ceil

If so, I would tend to agree with you that it's a bug in the version of gcc that you have.

Jason

Konstantinos Koukos

Mar 2, 2015, 4:57:39 PM
to gem5-g...@googlegroups.com, kava...@cs.teicrete.gr
Hello,

Stamati, I don't know if this will help you. I am also using gcc 4.4.7 on Kubuntu 12.04 with CUDA 3.2.
I just had a look in my Makefile and found the following flags:

LDUFLAGS        := -u pow -u log2 -u log1p -u remquo -u exp -u sin -u exp2 -u cos -u floor -u ceil -u sqrt -u log

Please give them a try and let me know if they work. If they don't, I can easily send you my source
files and makefiles for further investigation (diffing the source code, etc.).
Please let me know if that would help.

Best Regards,
Konstantinos.

"Stamatis Kavvadias (Σταμάτης Καββαδίας)"

Mar 2, 2015, 4:59:58 PM
to Jason Power, gem5-g...@googlegroups.com
Hi Jason,

    I was using either all of the -u options or none. Now that you ask, I tried both of your alternatives, and the result is the same. But I do not think it is the compiler: I also tried the last option with g++-4.6 (which must use different versions of the static libraries) and the result remains the same (I left libcuda built with g++-4.4). So it must be something in the distribution, or in my setup of installed software, though I cannot see how that could affect the static libraries of two compiler versions...

Anyway, thank you for the effort to help.

Regards,

Stamatis


"Stamatis Kavvadias (Σταμάτης Καββαδίας)"

Mar 2, 2015, 5:08:30 PM
to gem5-g...@googlegroups.com
Thanks Konstantinos,

    I have been using those flags (and removing them to check). It seems we have the same system configuration, but... I do not think it is the benchmark
files or the make configuration (I have reverted to the originals except for libcuda, which I should check too). Thanks for the offer.

Regards,

Stamatis


Joel Hestness

Mar 2, 2015, 5:20:06 PM
to gem5-gpu developers
Hi Stamatis,
  Given all the trouble you've encountered trying to fix the bug, it seems like a reasonable approach might be to side-step it. Have you tried my alternative suggestion, and if so, did this also fail?

  The root problem is that both versions of ceil are globally defined for your benchmark. Another option to fix this is by modifying the benchmark to include the cmath/math.h library inside of a namespace. Then, where ceil is called by CPU code, add the namespace ahead of the call. For example:
...
// At head of file, include math:
namespace cmath {
    #include <cmath>
}
...
cmath::ceil(<params>)
...

  If this still fails, I would strongly suspect that there is something wrong with your libc installation rather than something wrong with the linker.

"Stamatis Kavvadias (Σταμάτης Καββαδίας)"

Mar 2, 2015, 5:54:51 PM
to gem5-g...@googlegroups.com
Hi Joel,

    It was one of the first things I tried. You must have lost it in the details of the previous e-mails. I thought this would be the most reliable approach, but it did not work. I tried it as you wrote it, using cmath instead of math.h, but I got an error (":: must come after class or namespace", or something like that). Then I saw the std::ceil syntax somewhere and used that instead. This compiled fine, and I was ready to go on with my life (:-) ) but... nothing changed. Same behavior as with math.h.

    Now, I have also reverted to the original libcuda code/Makefile and my small program still produces the error with -static.

Anyway, thank you all for the hints and the willingness to help. I'll have to go back to the other benchmarks for now and replace bfs. I will also be running experiments
on another machine (CentOS), so I will give bfs another try at that point.

Regards,

Stamatis


731533730

Oct 15, 2015, 12:17:25 PM
to gem5-gpu Developers List

Hello Stamatis Kavvadias,
I have run into a problem: when gem5-gpu runs the bfs benchmark, reading the roughly 64 MB graph1MW_6.txt input takes a very, very long time. Is this expected?
If it is not, how should I fix it?
(When gem5-gpu runs with the small graph4096.txt, the read is quick and the result is printed.)
 

This is the command:
suixiaojin@suixiaojin:~/gem5$ build/X86_VI_hammer_GPU/gem5.opt ../gem5-gpu/configs/se_fusion.py -c ../benchmarks/rodinia/bfs/gem5_fusion_bfs -o "/home/suixiaojin/benchmarks/data/bfs/graph1MW_6.txt"


Thank you,
Xiaojin Sui

On Tuesday, March 3, 2015 at 6:54:51 AM UTC+8, Stamatis Kavvadias wrote:

Joel Hestness

Oct 16, 2015, 10:06:13 AM
to 731533730, gem5-gpu Developers List
Hi Xiaojin,
  Yes, different graph sizes are going to take different amounts of simulation time to read. In most cases, the simulation time for the system reading a file is roughly linear in the file size. If you know how quickly you can simulate the file read for graph4096.txt, you should be able to estimate the simulation time required to read the larger file based on their sizes.
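  As a back-of-the-envelope sketch of that estimate (the numbers below are made-up placeholders, not measurements from this thread):

```shell
# If reading a 256 KiB graph took 120 s of simulation, linear scaling in
# file size predicts a 64 MiB graph takes 256x as long.
t_small=120                  # seconds for the small input (placeholder)
small=$((256 * 1024))        # bytes of the small input (placeholder)
large=$((64 * 1024 * 1024))  # bytes of the large input
awk -v t="$t_small" -v s="$small" -v l="$large" \
    'BEGIN { printf "estimated: %.0f s\n", t * l / s }'
```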

  Joel

