Regarding benchmark simulation error

269 views
Skip to first unread message

Abdullah Owahid

unread,
Jan 26, 2017, 11:45:28 PM1/26/17
to riscv-boom
Hi,
I have compiled a simple hello program in riscv64-unknown-elf-gcc and ran it on ISA simulator using spike pk hello.elf and it is all good. Later I ran it on Verilator simulator by replacing spike with ./emulator-rocketchip-BOOMConfig +verbose pk ../owahid_tests/hello.elf and with debug ./emulator-rocketchip-BOOMConfig-debug as well- both worked fine.

Later I downloaded MiBench benchmark programs and for basicmath_small program I generated the elf file. The ISA Spike simulator also works fine. But when I ran
./emulator-rocketchip-BOOMConfig +verbose pk basicmath_small.elf
It is giving me following error after running for very long time

   1437622; O3PipeView:retire:          2295392000:store: 0
   1437619; O3PipeView:complete:          2295392000
   1437643; O3PipeView:complete:          2295392000
Assertion failed: [ROB] writeback occurred to an invalid ROB entry.
    at rob.scala:553 assert (!(io.wb_resps(i).valid && MatchBank(GetBankIdx(rob_idx)) &&
[2295392] %Error: rocketchip.BOOMConfig.v:671096: Assertion failed in TOP.TestHarness.ExampleTop_1.coreplex.BOOMTile_1.core.rob
%Error: /home/owahid/boom-rocket/rocket-chip/emulator/generated-src/rocketchip.BOOMConfig.v:671096: Verilog $stop
Aborting...

To check if everything is ok, I re-ran Verilator and Spike for hello program and it just did fine. I re-checked spike pk basicmath_small.elf and it is working fine too. This benchmark performs lots of integer operations. I remember once I got "pipeline hung" for this same benchmark (not quite sure but I think it is) which is a different error. Either way I did not see ...completed after that many cycle...so far.

Any help would be appreciated.

Thanks,
owahid

Christopher Celio

unread,
Jan 26, 2017, 11:54:31 PM1/26/17
to riscv-boom
If you are seeing an assertion failure, then you are hitting a bug in the processor.

Are you using the latest commit on the boom branch of rocket-chip? 

Are you using the default parameters to boom? 

What are your rocket-chip and boom commit hashes? 

How many cycles does your program run before crashing?

You can enable COMMIT_LOG in consts.scala to have boom spit out a list of committed instructions and see where it crashes out. The easiest way to debug this is to build a debug version with VCD dumping and the commit log (which you can match up against spike's commit log). The assert message tells you what signals went bad, and using the VCD waveform you can walk back from the failure and see what happened.


-Chris

Abdullah Owahid

unread,
Jan 28, 2017, 4:41:16 PM1/28/17
to riscv-boom
Hi Chris,
Here are the details..
Are you using the latest commit on the boom branch of rocket-chip? Yes, I even had a clean install on on rocket chip, boom and riscv toolchain just yesterday.

Are you using the default parameters to boom? Yes

What are your rocket-chip and boom commit hashes?
owahid@owahid-Satellite-E45W-C:~/boom-rocket/rocket-chip$ git log
Rocket chip:
commit 2fcf18cb0b8de39389a9598cfcda4280eff30a3f
Author: Christopher Celio <ce...@eecs.berkeley.edu>
Date:   Sat Nov 19 20:31:05 2016 -0800

    bump boom; now with 100% more working TAGE.

Boom:
owahid@owahid-Satellite-E45W-C:~/boom-rocket/rocket-chip/boom$ git log
commit 84ecff7734364858ecfcce5b8589276710a4c335
Author: Christopher Celio <ce...@eecs.berkeley.edu>
Date:   Sat Nov 19 15:24:14 2016 -0800

    [tage] bugfix counter update.
   
       - We were not properly casting between Bits and Vec (missing toBools).
       - Counters were being updated with garbage as a result.

How many cycles does your program run before crashing? I believe it is somewhere after 2297189000 cycles.
 1477466; O3PipeView:complete:          2297189000

Assertion failed: [ROB] writeback occurred to an invalid ROB entry.
    at rob.scala:553 assert (!(io.wb_resps(i).valid && MatchBank(GetBankIdx(rob_idx)) &&
[2297189] %Error: rocketchip.BOOMConfig.v:671096: Assertion failed in TOP.TestHarness.ExampleTop_1.coreplex.BOOMTile_1.core.rob
%Error: /home/owahid/boom-rocket/rocket-chip/emulator/generated-src-debug/rocketchip.BOOMConfig.v:671096: Verilog $stop
Aborting...


You can enable COMMIT_LOG in consts.scala to have boom spit out a list of committed instructions and see where it crashes out. The easiest way to debug this is to build a debug version with VCD dumping and the commit log (which you can match up against spike's commit log). The assert message tells you what signals went bad, and using the VCD waveform you can walk back from the failure and see what happened.

I enabled commit log in boom/src/main/scala/consts.scala
val COMMIT_LOG_PRINTF   = true
After that I ran ./build.sh in riscv-tools which passed.
But if I try to do make run CONFIG=BOOMConfig in emulator directory I am having following error which is very similar to Olga's error. Do I need to change other parameters at const.scala as well?
[info] Running rocketchip.Generator /home/owahid/boom-rocket/rocket-chip/emulator/generated-src rocketchip TestHarness boom BOOMConfig
[info] [0.003] Elaborating design...
[error] (run-main-0) java.lang.reflect.InvocationTargetException
java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

Thanks,
owahid



Christopher Celio

unread,
Jan 28, 2017, 7:14:22 PM1/28/17
to riscv-boom
How many cycles does your program run before crashing? I believe it is somewhere after 2297189000 cycles.
 1477466; O3PipeView:complete:          2297189000

O3PipeView is a bit weird, as the gem5 tool has a weird sense of timing, so there's a 1000x multiplier when O3PipeView is enabled (https://github.com/ucb-bar/riscv-boom/blob/master/src/main/scala/consts.scala#L27).

So 2,297,189 cycles is long, but thankfully not painfully long! Is it possible for you to send me your binary? If so, I can take a look at it and see if I can replicate your error.
 
Assertion failed: [ROB] writeback occurred to an invalid ROB entry.
    at rob.scala:553 assert (!(io.wb_resps(i).valid && MatchBank(GetBankIdx(rob_idx)) &&
[2297189] %Error: rocketchip.BOOMConfig.v:671096: Assertion failed in TOP.TestHarness.ExampleTop_1.coreplex.BOOMTile_1.core.rob
%Error: /home/owahid/boom-rocket/rocket-chip/emulator/generated-src-debug/rocketchip.BOOMConfig.v:671096: Verilog $stop
Aborting...

That error means that some instruction is trying to un-busy an ROB entry, but there's no valid instruction in that entry! You can try and comment out the assertion, and see if your program works. It's possible that this assertion is being overly strict (say, it's a pipeline flush scenario and so the pipeline is being redirected anyways).


Do I need to change other parameters at const.scala as well?

:( you shouldn't get an error turning on the commit log. Maybe turn off PipeView when turning on the commit log?

Christopher Celio

unread,
Jan 28, 2017, 7:21:26 PM1/28/17
to riscv-boom
Ah, you didn't show me the full error message when you turned on commit-printf = true.

If you look further, it tells you the problem:

boom.WithAllBooms$$anonfun$$lessinit$greater$1$$anonfun$apply$5$$anonfun$apply$6.apply(boom_configs.scala:46)
Caused by: java.lang.IllegalArgumentException: requirement failed
at scala.Predef$.require(Predef.scala:207)
at boom.constants.BOOMDebugConstants$class.$init$(consts.scala:34)


You can just comment out that line of code. But it will make pipeview-helper.py angry if you try to actually use PipeView from the *.out file with the intermixed commit log messages.

-Chris 

Abdullah Owahid

unread,
Jan 30, 2017, 2:12:56 AM1/30/17
to riscv-boom
Hi Chris,
Thank you for your reply.
Yes, you are right, at consts.scala O3PIPEVIEW_PRINTF cannot be true at the same time when I turn on COMMIT_LOG_PRINTF. So I turned of O3PIPEVIEw_PRINTF and make CONFIG=BOOMConfig and make debug was ok then.
Comparing spike -l pk basicmath_small.elf with ./emulator-rocketchip-BOOMConfig +verbose pk ~/Desktop/owahid_tests/basicmath_small.elf shows the following

ISA passes that point 0x00000000000107ac
core   0: 0x0000000000010798 (0xf7843707) fld     fa4, -136(s0)
core   0: 0x000000000001079c (0xf8843787) fld     fa5, -120(s0)
core   0: 0x00000000000107a0 (0x1af777d3) fdiv.d  fa5, fa4, fa5
core   0: 0x00000000000107a4 (0xfcf43027) fsd     fa5, -64(s0)
core   0: 0x00000000000107a8 (0xf7043707) fld     fa4, -144(s0)
core   0: 0x00000000000107ac (0xf8843787) fld     fa5, -120(s0)
core   0: 0x00000000000107b0 (0x1af777d3) fdiv.d  fa5, fa4, fa5
core   0: 0x00000000000107b4 (0xfaf43c27) fsd     fa5, -72(s0)
core   0: 0x00000000000107b8 (0xfc843707) fld     fa4, -56(s0)



Verilator
0 0x0000000000010798 (0xf7843707) f14 0x4040000000000000
0 0x000000000001079c (0xf8843787) f15 0x3ff0000000000000
0 0x00000000000107a0 (0x1af777d3) f15 0x4040000000000000
0 0x00000000000107a4 (0xfcf43027)
0 0x00000000000107a8 (0xf7043707) f14 0xc03e000000000000
0 0x00000000000107ac (0xf8843787) f15 0x3ff0000000000000

Assertion failed: [ROB] writeback occurred to an invalid ROB entry.
    at rob.scala:553 assert (!(io.wb_resps(i).valid && MatchBank(GetBankIdx(rob_idx)) &&
[2267615] %Error: rocketchip.BOOMConfig.v:662496: Assertion failed in TOP.TestHarness.ExampleTop_1.coreplex.BOOMTile_1.core.rob
%Error: /home/owahid/boom-rocket/rocket-chip/emulator/generated-src/rocketchip.BOOMConfig.v:662496: Verilog $stop

Also as you said; now if I turn off the assertion at rob.scala then Verilator seems to be passing as well and it matches with SPIKE simulation.
0 0x0000000000010798 (0xf7843707) f14 0x4040000000000000
0 0x000000000001079c (0xf8843787) f15 0x3ff0000000000000
0 0x00000000000107a0 (0x1af777d3) f15 0x4040000000000000
0 0x00000000000107a4 (0xfcf43027)
0 0x00000000000107a8 (0xf7043707) f14 0xc03e000000000000
0 0x00000000000107ac (0xf8843787) f15 0x3ff0000000000000
0 0x00000000000107b0 (0x1af777d3) f15 0xc03e000000000000
0 0x00000000000107b4 (0xfaf43c27)
0 0x00000000000107b8 (0xfc843707) f14 0xc025000000000000
0 0x00000000000107bc (0xfc843787) f15 0xc025000000000000
0 0x00000000000107c0 (0x12f77753) f14 0x405b900000000000
0 0x00000000000107c4 (0xfc043687) f13 0x4040000000000000

If you want to replicate the error (in default boom settings) you might look into the following link, -> source code -> auto*tar.gz -> basicmath -> basicmath_small
http://vhosts.eecs.umich.edu/mibench// that provides free MiBench benchmarks.
According to their paper, basicmath_small performs 65 million instruction and 60% of them are integer. Just FYI; I could not open the dumped commit log output file in Vi editor as it is 9.8G and it takes forever to open. I used meld to open (and differentiate where ISA/Verilator mismatches) it and searched for that sequence in the partial file, if I scroll it seems to have no end for a 9.8G txt file.
Thanks,
owahid

Abdullah Owahid

unread,
Feb 21, 2017, 12:59:17 PM2/21/17
to riscv-boom
Hi Chris,
Is there any way to turn on cycle information in spike? Or how many total cycle was used in spike simulation?
spike -l pk shows commit log, not cycle information that I am lookin for.

Also if I turn on commit log in verilator then cycle information seems to go away (but o3 pipeview has the cycle information). Can I have commit log and cycle information both at the same time in verilator?

Thanks,
owahid

Christopher Celio

unread,
Feb 21, 2017, 2:17:02 PM2/21/17
to riscv-boom
I could've sworn it was "spike -s", but I may be making that up and perhaps spike never had that ability to count cycles? It would be fairly trivial to add it to your own copy of spike, or to line-count the commit log ("wc -l *.out"). They might even accept a patch if you do this. 

However, if you are executing programs using the proxy-kernel, it will count the number of instructions (and cycles) and report it for you:

spike pk -s hello.riscv

When I run programs in Linux, I wrote my own user-program to count cycles for me (start it on boot up, kill it before shutdown).
 

Also if I turn on commit log in verilator then cycle information seems to go away (but o3 pipeview has the cycle information). Can I have commit log and cycle information both at the same time in verilator?

What "cycle information" specifically are you looking at? If I understand your question correctly, the "instret=..." stuff is being printed not by the test harness, but by BOOM itself executing printf code. I believe the information is still there, but it gets lost in the output file because it's being printed one character at a time. 

The real flaw here, I believe, is that Chisel's printfs can only ever print to stdout, which mixes it with the other output that's going on.  Now, you can try and play some tricks here with redirecting pipes, or going in and modifying either Chisel or the riscv-tests/benchmarks to output to stderr so you can differentiate the two. I myself will sometimes add special characters like "&&&" in front of some of my Chisel printfs so I can post-process my output files, but that only works for sorting Chisel printfs from each other, whereas Chisel printfs and printfs from the emulation itself will intermingle and you can't sort those apart.


-Chris

Abdullah Owahid

unread,
Feb 22, 2017, 1:52:24 AM2/22/17
to riscv-boom
Hi Chris,
Thank you for your help. As you said, spike pk -s ~/Desktop/owahid_tests/hello.elf worked and it prints the following
Hello world!
1850 ticks
58251 cycles
58250 instructions
1.00 CPI


More interesting is that If I do
./emulator-rocketchip-BOOMConfig +verbose pk -s ~/Desktop/owahid_tests/hello.elf 1>file1 2>file2
Then file 2 contains
......................
.....................
   1300658; store-comp:           1915115000
Completed after 1915123 cycles (this is the same as without -s option in verilator)

Whereas file 1 contains (result generated due to -s option)
Hello world!
6663 ticks
666253 cycles
370412 instructions
1.70 CPI

So the question is why does cycle count varies with/without -s option. Does -s option count cycles for program only whereas without -s option is for total program that includes booting?

Also -s option in spike pk is varying with -s option in verilator, any suggestion?

I am not sure if wc -l of output file in commit log is good idea, as i do not see any relation here with information printed with/without -s option.

How can I understand when boot is complete or shutdown started so I can try to get cycle count?

Thanks,
owahid

Christopher Celio

unread,
Feb 22, 2017, 2:59:30 AM2/22/17
to riscv-boom
"pk -s" is telling the proxy-kernel, which is a RISC-V program, to count the delta in cycle and instret counters. From inside the program. Naturally, that will count less time than the total simulation time, which includes loading the pk binary into the boom memory space, etc. Ideally, "pk -s" only counts the workload that is being executed on top of the pk. 

Also, "spike" is simply counting 1 instret takes 1 cycle, which will dramatically differ from BOOM, where instructions take multiple cycles to execute.



-Chris

Abdullah Owahid

unread,
Feb 22, 2017, 2:43:36 PM2/22/17
to riscv-boom
H Chris,
Thank you very much for your reply. One more thing I would like to ask, which pattern do I need look for (in verilator commit log) to see work load is now loaded or exiting after it is done.
Thanks,
owahid

Abdullah Owahid

unread,
Feb 22, 2017, 2:51:59 PM2/22/17
to riscv-boom
As well as in spike -l commit log?
Thanks,
owahid

Christopher Celio

unread,
Feb 22, 2017, 7:36:46 PM2/22/17
to riscv-boom
There are a couple of signs -- the processor's privilege mode, the address of the instructions, or even looking for a particular instruction (like rdcycle) which won't show up outside of your workload.

Abdullah Owahid

unread,
Mar 13, 2017, 2:57:07 PM3/13/17
to riscv-boom
Hi Chris,
Counting lines from some specific traps in commit log, I was able to get close enough CPI in Spike simulation that matches ./emu*  pk -s *.elf. Thanks for the hint.
Going back to the original problem, I was able to run basicmath.elf progrm after commenting two assertions. In case you want to replicate the error, commented assertions are given below. Just on the footnote, basicmath takes 3-4 days to run in ./emulator* and basicmath.elf is attached.

Assertion failed: [fdiv] Buffered output being overwritten by another output from the fdiv/fsqrt unit.
    at fdiv.scala:186 assert (!(r_out_val && (divsqrt.io.outValid_div || divsqrt.io.outValid_sqrt)),
[30135773] %Error: rocketchip.BOOMConfig.v:92055: Assertion failed in TOP.TestHarness.ExampleTop_1.coreplex.BOOMTile_1.core.ALUExeUnit_1_1.fdivsqrt
%Error: /home/owahid/boom-rocket2/rocket-chip/emulator/generated-src/rocketchip.BOOMConfig.v:92055: Verilog $stop
Aborting...



Assertion failed: [ROB] writeback occurred to an invalid ROB entry.
    at rob.scala:553 assert (!(io.wb_resps(i).valid && MatchBank(GetBankIdx(rob_idx)) &&
[2297189] %Error: rocketchip.BOOMConfig.v:671096: Assertion failed in TOP.TestHarness.ExampleTop_1.coreplex.BOOMTile_1.core.rob
%Error: /home/owahid/boom-rocket/rocket-chip/emulator/generated-src-debug/rocketchip.BOOMConfig.v:671096: Verilog $stop
Aborting...


Thanks,
owahid
basicmath_small.elf
Reply all
Reply to author
Forward
Message has been deleted
0 new messages