Cannot Boot Up in gem5


Xiyue Xiang

Apr 17, 2015, 8:53:31 PM
to topaz-...@googlegroups.com, Pisacha Srinuan, Su Wei
Hi,

We've already followed the instructions to patch TOPAZ into gem5, and we can successfully run ruby_mem_test.py. However, in full-system simulation we get the following error very early in the boot process.

....
warn: Prefetch instructions in Alpha do not do anything
panic: Possible Deadlock detected. Aborting!
version: 0 request.paddr: 0x[0x42bc0, line 0x42bc0] m_readRequestTable: 1 current time: 19500001500 issue_time: 19051037000 difference: 448964500
 @ tick 19500001500
[wakeup:build/ALPHA_token_topaz/mem/ruby/system/Sequencer.cc, line 102]
Memory Usage: 876924 KBytes
Program aborted at tick 19500001500
Aborted

My command line is "./build/ALPHA_token_topaz/gem5.opt ./configs/example/fs.py --ruby --num-cpus=16 --num-dirs=16 --topology=Mesh --mesh-rows=4 --num-l2caches=16 --topaz-network=M44-CT-UC --topaz-flit-size=16 --topaz-adaptive-interface-threshold=0 --topaz-init-file=./TPZSimul.ini".

I saw a similar question in this group, where the poster said that disabling the panic worked for him. However, it doesn't work for us.

Please let us know what could be wrong in this regard.

Thanks.


Valentín Puente

Apr 19, 2015, 2:59:30 PM4/19/15
to topaz-...@googlegroups.com
That's a bit odd. If you can run the tester, FS shouldn't be a problem. Do you have the TOPAZ output after the deadlock? If there is a problem in the network, some packets might be stuck there.

--
Valentin
--
You received this message because you are subscribed to the Google Groups "topaz-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to topaz-discus...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Xiyue Xiang

Apr 19, 2015, 3:05:32 PM4/19/15
to topaz-...@googlegroups.com
Hi, Valentin,

Thanks for the quick response.

I've now pulled a clean copy, and I can't even start booting. I didn't make any modifications; I just ran "scons -j16 build/ALPHA_token_topaz/gem5.debug". A segfault is raised right after booting starts.

I tried different protocols; none of them can boot the system.

Valentín Puente

Apr 19, 2015, 3:11:48 PM4/19/15
to topaz-...@googlegroups.com
Does the segfault happen in TOPAZ or gem5 code? Try the previous changeset for gem5...
Perhaps the last merge wasn't fully correct. I only checked the memory tester (we use another tree for gem5).
--
Valentin

Xiyue Xiang

Apr 19, 2015, 3:23:24 PM4/19/15
to topaz-...@googlegroups.com
The segfault happens in gem5 code. I pulled the tree with "hg clone https://code.google.com/p/tpzsimul.gem5/ gem5".

I am trying an older version of gem5. BTW, I can run the memory tester too, just not full-system mode.

I will try TOPAZ with other gem5 changesets and let you know.

Thanks a lot.

Xiyue Xiang

Apr 20, 2015, 6:47:27 PM4/20/15
to topaz-...@googlegroups.com
Hi, Valentin,

I've tried at least 10 versions of gem5 with different combinations of Ruby protocols. None of them boots successfully, and some don't even compile.

Could you please send me a copy of gem5 (with TOPAZ built in) that you have successfully run in FS mode?

Thanks.

Xiyue Xiang

Valentin Puente

Apr 21, 2015, 1:57:35 AM4/21/15
to topaz-...@googlegroups.com

Did you use Mercurial commands to do the rollback? Which changeset gives you compilation errors?

Valentin Puente

Apr 21, 2015, 3:33:45 AM4/21/15
to topaz-...@googlegroups.com
... I mean, use "hg update 10374" to go to the previous changeset.
--
vpuente

Xiyue Xiang

Apr 21, 2015, 9:37:52 AM4/21/15
to topaz-...@googlegroups.com
Hi, Valentin,

I tried changeset 10374, keeping all settings at their defaults, with both ALPHA_token_topaz and ALPHA_directory_topaz. Neither works. I've attached the output and the fs.py (from changeset 10374) for your reference. Thanks.


changeSet10374.txt
fs.py

Valentin Puente

Apr 21, 2015, 9:42:49 AM4/21/15
to topaz-...@googlegroups.com

That's a segmentation fault. Use the debugger to see where it happens.

Xiyue Xiang

Apr 21, 2015, 9:53:54 AM4/21/15
to topaz-...@googlegroups.com
Sorry, I forgot to attach the output from the debugger.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/localhome/xxx1698/TestBuild/gem5/src/python/m5/main.py", line 360, in main
    filecode = compile(filedata, filename, 'exec')
TypeError: compile() expected string without null bytes

Xiyue Xiang

Apr 21, 2015, 11:07:46 AM4/21/15
to topaz-...@googlegroups.com
Hi, Valentin,

I've looked up the error. The compile() function raises this error if the source contains null bytes.

I am not sure where to start on this issue.
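The traceback shows gem5's main.py reading a script file and passing its contents to Python's compile(), which (in the Python 2 that gem5 used at the time) raises exactly this TypeError when the text contains NUL bytes. One way to check which input is affected is to scan the file for NUL bytes before handing it to gem5. The helper below is a hypothetical sketch (null_byte_offsets and find_null_bytes are illustrative names, not part of gem5). A hit suggests the file is corrupted or is not a Python script at all, for example a binary passed where the config script was expected; that is a guess, not a confirmed diagnosis for this thread.

```python
import sys
from pathlib import Path


def null_byte_offsets(data, limit=10):
    """Return up to `limit` offsets of NUL (0x00) bytes in `data`."""
    offsets = []
    start = 0
    while len(offsets) < limit:
        idx = data.find(b"\x00", start)
        if idx == -1:
            break
        offsets.append(idx)
        start = idx + 1
    return offsets


def find_null_bytes(path, limit=10):
    """Read a file as raw bytes and report where NUL bytes occur."""
    return null_byte_offsets(Path(path).read_bytes(), limit)


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_nul.py configs/example/fs.py
    for name in sys.argv[1:]:
        hits = find_null_bytes(name)
        print(f"{name}: " + ("no NUL bytes" if not hits else f"NUL bytes at {hits}"))
```

Newer Python 3 releases reject NUL bytes in compile() too, but with a ValueError or SyntaxError rather than the TypeError shown above.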



Valentin Puente

Apr 21, 2015, 3:14:19 PM4/21/15
to topaz-...@googlegroups.com

This seems to be outside the TOPAZ interface. Use gdb to get a backtrace of the problem (and make sure the build directory has no mixed compilations from different changesets).

Xiyue Xiang

Apr 23, 2015, 10:47:05 AM4/23/15
to topaz-...@googlegroups.com
I've tried different changesets with different protocols. I can now successfully boot changeset 73025fb3b272 (pulled from the TOPAZ repository) with ALPHA_token_topaz, which uses MOESI_CMP_token. I will try other protocols shortly and post the outcomes here.

Please note that if gem5 aborts with a panic message like the following, you can double the deadlock threshold in Sequencer.py. The problem should go away afterward.

"panic: Possible Deadlock detected. Aborting!"
" version: 0 request.paddr: 0x[0x50280, line 0x50280] m_writeRequestTable: 1 current time: 500000000 issue_time: 168749500 difference: 331250500 @ cycle 26675744"
"[wakeup:build/ALPHA_directory_topaz/mem/ruby/system/Sequencer.cc, line 122]"

In short, users who run into this issue should try changeset 73025fb3b272 and double deadlock_threshold in Sequencer.py. Good luck.
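The panic presumably fires when a request has been outstanding longer than the Sequencer's deadlock threshold, which is why raising the threshold makes it go away. The sketch below is a simplified model of such a check, not gem5's Sequencer code. The class and function names and the 250,000,000-tick threshold are assumptions chosen for illustration; the tick values come from the two panic messages in this thread.

```python
from dataclasses import dataclass


@dataclass
class OutstandingRequest:
    paddr: int
    issue_time: int  # tick at which the request was issued


def find_stalled(requests, current_time, threshold):
    """Return the requests whose age exceeds `threshold` ticks.

    Simplified model of a periodic deadlock check; the real Sequencer
    works in cycles and keeps its own request tables.
    """
    return [r for r in requests if current_time - r.issue_time > threshold]


if __name__ == "__main__":
    # Values taken from the two panic messages quoted in this thread.
    samples = [
        (OutstandingRequest(0x42BC0, 19_051_037_000), 19_500_001_500),
        (OutstandingRequest(0x50280, 168_749_500), 500_000_000),
    ]
    threshold = 250_000_000  # hypothetical threshold, in ticks
    for req, now in samples:
        for t in (threshold, 2 * threshold):
            stalled = bool(find_stalled([req], now, t))
            print(f"0x{req.paddr:x} age={now - req.issue_time} threshold={t} panic={stalled}")
```

Doubling the threshold only helps when requests are slow but still making progress (for example under heavy network contention); if a request is genuinely stuck, a larger threshold merely postpones the panic.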

Xiyue Xiang

Apr 23, 2015, 4:46:24 PM4/23/15
to topaz-...@googlegroups.com
Hi, Valentin,

Now, I am trying to boot up BLESS in FS mode. I encounter the following error:
gem5.debug: build/ALPHA_token_topaz/mem/ruby/system/PersistentTable.cc:97: void PersistentTable::persistentRequestUnlock(const Address&, MachineID): Assertion `m_map.count(address)' failed.

Do you have any idea what causes this problem? What is the function of this "persistent table"? Is it related to some license issue? Thanks.

Xiyue Xiang

Pablo Abad

Apr 24, 2015, 2:40:53 AM4/24/15
to topaz-...@googlegroups.com
If I am not mistaken, the token protocol requires some communications (persistent request activations/deactivations) to be delivered in point-to-point order. That message seems to indicate that a deactivation arrived at its destination out of order.
The BLESS router does not guarantee this order, because it misroutes messages. You can use an additional physical network for ordered messages, or switch to a conventional router (CT-DOR).

Regards.
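The ordering problem can be illustrated with a toy model loosely patterned on the failing assertion: a table that records active persistent requests per address and refuses to deactivate an address it has no entry for. This is a hypothetical sketch (PersistentTableModel and deliver are illustrative names, not gem5 classes). It only shows why a network that lets a deactivation overtake its activation trips a check like m_map.count(address).

```python
class PersistentTableModel:
    """Toy model: tracks which nodes hold an active persistent request per address."""

    def __init__(self):
        self.active = {}  # address -> set of requesting node ids

    def lock(self, address, node):
        self.active.setdefault(address, set()).add(node)

    def unlock(self, address, node):
        # Same idea as the failing assertion m_map.count(address):
        # a deactivation is only legal for an address that has an entry.
        if address not in self.active:
            raise AssertionError(f"unlock for 0x{address:x} with no active entry")
        self.active[address].discard(node)
        if not self.active[address]:
            del self.active[address]


def deliver(messages):
    """Apply (kind, address, node) messages in arrival order."""
    table = PersistentTableModel()
    for kind, address, node in messages:
        if kind == "activate":
            table.lock(address, node)
        elif kind == "deactivate":
            table.unlock(address, node)
        else:
            raise ValueError(f"unknown message kind {kind!r}")
    return table


if __name__ == "__main__":
    in_order = [("activate", 0x50280, 3), ("deactivate", 0x50280, 3)]
    print(deliver(in_order).active)  # ordered delivery leaves the table empty
    try:
        deliver(list(reversed(in_order)))  # deactivation overtakes activation
    except AssertionError as err:
        print("out-of-order delivery:", err)
```

Ordered delivery is what keeps the activation ahead of its deactivation; with a deflection-style router such as BLESS nothing enforces that, hence the suggestion above to use a separate ordered network or a deterministic router.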

Valentin Puente

Apr 24, 2015, 10:35:37 AM4/24/15
to topaz-...@googlegroups.com
Token probably isn't the right place to start. It has a lot of subtle implications on the network side. Ordered delivery is one; another is the previous problem you had. Token enters a persistent-request storm when network contention is high. That is not seen with contention-less networks (such as the original one). Performance will be low in some workloads unless you carefully fine-tune persistent requests (which usually is not possible for all usage scenarios).

Bottom line... use another protocol :-)
--
vpuente