panic: Possible Deadlock detected. Aborting!

Liang Wang

Aug 26, 2014, 4:20:27 AM
to topaz-...@googlegroups.com
Hi, everyone,

I'm trying to use your TOPAZ simulator for research, but I have a problem when running some benchmarks in full-system mode with the gem5 simulator.
This is my command:

    ./build/ALPHA_token_topaz/gem5.opt  ./configs/example/ruby_fs.py --topology=Mesh --mesh-rows=2 --num-cpus=4 --num-dirs=4 --num-l2caches=4 --topaz-network=M44-CT-MC --topaz-init-file="./TPZSimul.ini" --script=./run/blackscholes_4c_simtest.rcS


But the simulation aborts because of a possible deadlock detected in the Ruby system. This is the error output:


   **** REAL SIMULATION ****
info: Entering event queue @ 0.  Starting simulation...
info: Launching CPU 1 @ 749805500
info: Launching CPU 2 @ 760943000
info: Launching CPU 3 @ 772080500
warn: Prefetch instructions in Alpha do not do anything
3072443500: system.terminal: attach terminal 0
warn: Prefetch instructions in Alpha do not do anything
panic: Possible Deadlock detected. Aborting!
version: 3 request.paddr: 0x[0xd15b80, line 0xd15b80] m_writeRequestTable: 1 current time: 103376282000 issue_time: 102882408500 difference: 493873500
 @ cycle 103376282000
[wakeup:build/ALPHA_token_topaz/mem/ruby/system/Sequencer.cc, line 122]
Memory Usage: 987292 KBytes
Program aborted at cycle 103376282000


I wonder if you have encountered similar problems. Do you have any ideas about how to resolve this problem?
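
As a quick sanity check on the panic line above, the reported `difference` is simply `current time` minus `issue_time`. Here is a minimal Python sketch that pulls the fields out of the line and redoes the subtraction. The helper name `parse_deadlock_line` is made up for illustration, and the units are whatever the log uses.

```python
import re

# The panic detail line copied verbatim from the log above.
PANIC_LINE = (
    "version: 3 request.paddr: 0x[0xd15b80, line 0xd15b80] "
    "m_writeRequestTable: 1 current time: 103376282000 "
    "issue_time: 102882408500 difference: 493873500"
)


def parse_deadlock_line(line):
    """Extract the numeric fields from a 'Possible Deadlock' detail line.

    Hypothetical helper, not part of gem5 or TOPAZ.
    """
    fields = {}
    for key in ("version", "current time", "issue_time", "difference"):
        match = re.search(re.escape(key) + r":\s*(\d+)", line)
        if match is None:
            raise ValueError("field not found: " + key)
        fields[key] = int(match.group(1))
    return fields


info = parse_deadlock_line(PANIC_LINE)
waited = info["current time"] - info["issue_time"]
print("sequencer version:", info["version"])
print("request outstanding for:", waited)
print("matches reported difference:", waited == info["difference"])
```

The request had been outstanding for 493873500 time units when the Sequencer gave up waiting, which is the figure to compare against the deadlock detection threshold.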

Valentin Puente

Aug 26, 2014, 6:02:24 AM
to topaz-...@googlegroups.com
You are using a 16-node network (4x4) with a 4-core system. Although this might still work (with 12 idle routers), it is not correct. In any case, because the deadlock happens after a lot of cycles, it looks more related to the Ruby coherence protocol. Which protocol are you using?

(BTW, your simulation will take ages. I suggest you take a checkpoint at the beginning of the ROI using atomic mode.)
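
The router count mentioned in this reply is easy to check before launching a long run. Below is a minimal sketch, assuming at most one core attached per router and that the network name encodes a rows x cols mesh (4x4 for the network named in the command, as stated in this reply). The function `idle_routers` is a hypothetical helper, not part of gem5 or TOPAZ.

```python
def idle_routers(rows, cols, endpoints):
    """Return how many routers in a rows x cols mesh have no endpoint attached.

    Assumes at most one endpoint (core) per router.
    """
    routers = rows * cols
    if endpoints > routers:
        raise ValueError("more endpoints than routers")
    return routers - endpoints


print(idle_routers(4, 4, 4))  # 4x4 network with 4 cores
print(idle_routers(2, 2, 4))  # 2x2 network with 4 cores
```

With 4 cores, a 4x4 network leaves 12 routers idle, while a 2x2 network leaves none.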



Liang Wang

Aug 26, 2014, 7:50:40 AM
to topaz-...@googlegroups.com
Thanks so much for your suggestion.
Actually, I had changed M44-CT-MC to a 2x2 NoC, but I forgot to mention that.
I'm using the MOESI_CMP_token protocol.

Valentin Puente

Aug 28, 2014, 3:43:32 AM
to topaz-...@googlegroups.com
Hmm... that is quite hard to answer. That protocol is quite reliable. Is the Ruby tester running OK? Do you see similar problems with another protocol, such as hammer?

Liang Wang

Aug 28, 2014, 10:38:56 AM
to topaz-...@googlegroups.com
Thank you. I have solved this problem by commenting out the "panic("Possible Deadlock detected. Aborting! ...." call in Sequencer.cc, and it works!
I have another question about the topology. If I want to define a new topology, do I need to both create a new Python script in ./configs/topologies and change the .sgm files in TOPAZ?

Valentin Puente

Aug 28, 2014, 11:15:56 AM
to topaz-...@googlegroups.com
Good!

You have to play with the SGML (network.sgml) to use another network topology. You can directly use one of the supported ones, e.g. mesh, torus, midimew, ... (be aware that it might be necessary to choose the right router). If the topology isn't supported, you need to add a new class to TOPAZ.

srinivas a.v.

Apr 21, 2016, 1:03:40 PM
to topaz-discuss
Hi Liang:

Could you please tell me how you were able to fix the deadlock problem? I have the same issue.

Thanks
Srinivas

Valentin Puente

Apr 21, 2016, 1:21:36 PM
to topaz-discuss
Is the Ruby random tester working?

Perhaps the "delay" for deadlock detection in Ruby is too short. If you are using a token protocol (it is quite demanding, especially if it reaches the "persistent request storm" [1]), this might be the case: raise the deadlock detection threshold. It should fix it.
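
To illustrate why raising the threshold can help, here is a simplified Python model of a watchdog check of this kind. It is a sketch, not gem5's code: `possible_deadlock` and all the numbers are hypothetical. In gem5 itself the threshold is typically exposed as the `deadlock_threshold` parameter of the Ruby sequencer, but verify that against `Sequencer.py` in your own tree.

```python
def possible_deadlock(current_time, issue_time, threshold):
    """Simplified watchdog model: flag a request that has been
    outstanding for at least `threshold` time units."""
    return (current_time - issue_time) >= threshold


# Hypothetical numbers: a slow but still progressing request that has
# been outstanding for 600000 time units.
issue_time = 0
current_time = 600_000

for threshold in (500_000, 5_000_000):
    flagged = possible_deadlock(current_time, issue_time, threshold)
    print(threshold, "->", "panic" if flagged else "keep waiting")
```

With the smaller threshold the slow request is reported as a possible deadlock; with the larger one the simulation keeps waiting. A higher threshold only helps when the request eventually completes. A genuine deadlock would still never finish, and commenting out the panic hides that case as well.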


ali...@live.com

Jun 28, 2017, 8:36:41 AM
to topaz-discuss
Hi
I ran into the same problem you mentioned. How did you solve it? Did you only comment out the "panic" statement?