Systolic Array with SMAUG


Sean Carroll

Jan 12, 2021, 4:05:15 PM
to gem5-Aladdin users
Hi, 

I'm trying to run a DL workload (imagenet-resnet in the experiments/models/ dir) through SMAUG, and I want to simulate using the gem5 implementation of the systolic array rather than the SMV accelerator. Is there documentation or a tutorial somewhere on how to enable this? And if this isn't the right forum, please let me know what is. I didn't find a SMAUG mailing list, so I went to the next closest thing. 

Thanks,
Sean Carroll

yaoyu...@gmail.com

Jan 12, 2021, 4:30:46 PM
to Sean Carroll, gem5-Aladdin users

Hi Sean,

Try the --use-systolic-array command-line flag to enable it. It will use the backend's systolic array whenever possible.

We don't currently have a mailing list for SMAUG; you can either ask questions here or file issues on the SMAUG repo.

 

Thanks,

Yuan


Sean Carroll

Jan 13, 2021, 2:32:09 PM
to gem5-Aladdin users
Hi Yuan, 

Thanks for the tip and the quick response. 

I added that commandline flag to the generation of the trace file

build/bin/smaug-instrumented resnet_smv_topo.pbtxt resnet_smv_params.pb --sample-level very_high --use-systolic-array

and then I also added it to the gem5 simulation command

/workspace/gem5-aladdin/build/X86/gem5.opt \
  --debug-flags=Aladdin,HybridDatapath \
  --outdir=outputs \
  /workspace/gem5-aladdin/configs/aladdin/aladdin_se.py \
  --num-cpus=1 \
  --mem-size=4GB \
  --mem-type=LPDDR4_3200_2x16  \
  --cpu-clock=2.5GHz \
  --cpu-type=DerivO3CPU \
  --ruby \
  --access-backing-store \
  --l2_size=2097152 \
  --l2_assoc=16 \
  --cacheline_size=32 \
  --accel_cfg_file=gem5.cfg \
  --fast-forward=10000000000 \
  -c /workspace/smaug/build/bin/smaug \
  -o "resnet_smv_topo.pbtxt resnet_smv_params.pb --gem5 --use-systolic-array --debug-level=0"

and I added the text from the systolic_array.cfg found in gem5-aladdin/src/systolic_array/test/systolic_array.cfg to the gem5.cfg file I was using (which was from the minerva example listed in the SMAUG README). 

This seemed to enable the gem5 implementation of the systolic array, and I believe the simulation was indeed invoking that accelerator when possible, but it died about 3 hours into the sim after an assertion in the systolic array fired. Last bit of output:

info: Received mapping for array host_weights at vaddr 11123180 of length 32768.
gem5.opt: build/X86/systolic_array/commit.cpp:74: virtual void systolic::Commit::evaluate(): Assertion `!outputBuffer[index].isWindowEnd() && "A new output pixel finished while the previous one from the " "same PE has not been written back."' failed.
Program aborted at tick 3632178694000
--- BEGIN LIBC BACKTRACE ---
/workspace/gem5-aladdin/build/X86/gem5.opt(_Z15print_backtracev+0x2c)[0x564d0696cc4c]
/workspace/gem5-aladdin/build/X86/gem5.opt(_Z12abortHandleri+0x4a)[0x564d0697eeba]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f4730617890]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xc7)[0x7f472e881e97]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x141)[0x7f472e883801]
/lib/x86_64-linux-gnu/libc.so.6(+0x3039a)[0x7f472e87339a]
/lib/x86_64-linux-gnu/libc.so.6(+0x30412)[0x7f472e873412]
/workspace/gem5-aladdin/build/X86/gem5.opt(_ZN8systolic6Commit8evaluateEv+0x1303)[0x564d06c46263]
/workspace/gem5-aladdin/build/X86/gem5.opt(_ZN8systolic8Dataflow8evaluateEv+0xad)[0x564d06c3aecd]
.....

Are there more configuration steps necessary to enable the gem5 implementation of the systolic array? Does anything in the python file building the resnet50 graph need to be modified? I'm trying to recreate and characterize the setups discussed in the SMAUG paper for the imagenet-resnet implementation in smaug/experiments/models. 

Thanks, 
Sean

yaoyu...@gmail.com

Jan 13, 2021, 11:00:11 PM
to Sean Carroll, gem5-Aladdin users

Hi Sean,

Thanks for reporting this. The command lines for generating the trace and running the gem5 simulation look right to me. That systolic_array.cfg should also be correct.

I tried to reproduce this using the same trace and config, and for me it failed with a segfault during stage 3 of ResNet50. So something must be wrong in the systolic array. Root-causing this is a little challenging because it takes a couple of hours to crash, but I'll let you know if I find the bug.

Sean Carroll

Jan 14, 2021, 3:42:07 PM
to gem5-Aladdin users
Of course. If there is anything I can do to help, let me know. 

Thanks, 
Sean 

yaoyu...@gmail.com

Jan 14, 2021, 9:13:47 PM
to Sean Carroll, gem5-Aladdin users

I can now reproduce the same error and have made some progress. In the systolic array, the commit unit collects finished data from the PE array, buffers it until it fills a memory request, and then sends a write request to the output scratchpad. The assertion failed because, for some reason, when the commit unit received new data from the PE array, the previous data in the buffer had not yet been written to the scratchpad. This looks like a memory/bus bandwidth issue, but I'm not sure yet. I'll dig further to root-cause it.
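
To make the hazard behind that assertion concrete, here is a toy Python model. It is only a sketch under stated assumptions, not the gem5-aladdin code: the class ToyCommitUnit, the function run, and the writes_per_cycle parameter are all hypothetical, and the model keeps one pending-output slot per PE instead of packing data into memory-request-sized writes.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCommitUnit:
    """Toy model (hypothetical): one pending-output slot per PE."""
    num_pes: int
    writes_per_cycle: int  # assumed scratchpad write bandwidth per cycle
    pending: dict = field(default_factory=dict)   # PE index -> finished value
    scratchpad: list = field(default_factory=list)

    def receive(self, pe: int, value: float) -> None:
        # Same spirit as the failed assertion: a PE must not finish a new
        # output while its previous one is still waiting for writeback.
        if pe in self.pending:
            raise RuntimeError(
                f"PE {pe} finished a new output before the previous one "
                "was written back")
        self.pending[pe] = value

    def writeback(self) -> None:
        # Drain at most writes_per_cycle buffered outputs, oldest first.
        for pe in list(self.pending)[: self.writes_per_cycle]:
            self.scratchpad.append((pe, self.pending.pop(pe)))


def run(num_pes: int, writes_per_cycle: int, cycles: int) -> bool:
    """Return True if every cycle completes, False if the hazard occurs."""
    unit = ToyCommitUnit(num_pes, writes_per_cycle)
    try:
        for cycle in range(cycles):
            for pe in range(num_pes):
                unit.receive(pe, float(cycle))
            unit.writeback()
    except RuntimeError:
        return False
    return True
```

With four PEs producing one output per cycle, run(4, 4, 10) completes, while run(4, 2, 10) trips the check on the second cycle because two outputs are still buffered. The model only illustrates what the assertion guards against; it says nothing about why the real simulation reached that state.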

 

Some suggestions that would help debug:

  • Using TimingSimpleCPU reduces the simulation time considerably (and still reproduces the issue).
  • Turning on gem5 debug flags such as SystolicToplevel (and SystolicCommit in this case) will give a detailed trace of the systolic array.
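
As a rough sketch of how those two suggestions change the simulation command posted earlier in this thread, the Python helper below rebuilds the same argument list with only --cpu-type and --debug-flags altered. The helper name build_gem5_debug_command and its default paths are illustrative assumptions taken from the earlier command; every other option is copied unchanged and has not been re-validated with the new CPU type.

```python
import shlex


def build_gem5_debug_command(gem5_root="/workspace/gem5-aladdin",
                             smaug_bin="/workspace/smaug/build/bin/smaug"):
    """Return the gem5 argument list with the two suggested changes applied."""
    # Original flags plus the systolic array flags named in the suggestion.
    debug_flags = ["Aladdin", "HybridDatapath",
                   "SystolicToplevel", "SystolicCommit"]
    workload_args = ("resnet_smv_topo.pbtxt resnet_smv_params.pb "
                     "--gem5 --use-systolic-array --debug-level=0")
    return [
        f"{gem5_root}/build/X86/gem5.opt",
        "--debug-flags=" + ",".join(debug_flags),
        "--outdir=outputs",
        f"{gem5_root}/configs/aladdin/aladdin_se.py",
        "--num-cpus=1",
        "--mem-size=4GB",
        "--mem-type=LPDDR4_3200_2x16",
        "--cpu-clock=2.5GHz",
        "--cpu-type=TimingSimpleCPU",  # was DerivO3CPU
        "--ruby",
        "--access-backing-store",
        "--l2_size=2097152",
        "--l2_assoc=16",
        "--cacheline_size=32",
        "--accel_cfg_file=gem5.cfg",
        "--fast-forward=10000000000",
        "-c", smaug_bin,
        "-o", workload_args,
    ]


if __name__ == "__main__":
    # Print a shell-quoted command that can be pasted into a terminal.
    print(shlex.join(build_gem5_debug_command()))
```

Expect the debug trace to be large for a multi-hour run, so redirecting the output to a file is probably worthwhile.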

Sean Carroll

Jan 22, 2021, 3:45:29 PM
to gem5-Aladdin users
Hi Yuan, 

I changed the cpu and added those debug flags. Were you able to make any progress on this issue?
Do you need any outputs from my most recent run with the TimingSimpleCPU and the gem5 flags enabled?


Thanks,
Sean

Sam Xi

Jan 22, 2021, 4:08:39 PM
to Sean Carroll, gem5-Aladdin users
Hi Sean,

Yuan should have fixed the issue (https://github.com/harvard-acc/gem5-aladdin/pull/37). I'm a bit behind on the review, but you can patch it in for now. It'll get merged shortly.
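
One way to patch an unmerged pull request into a local checkout is to fetch GitHub's pull/<number>/head ref into a local branch and merge it. The Python sketch below assumes the checkout has a remote named origin that points at the harvard-acc/gem5-aladdin repository on GitHub; the function names and the pr-<number> branch name are hypothetical.

```python
import subprocess


def pr_fetch_commands(pr_number, remote="origin"):
    """Build git commands that fetch a GitHub pull request and merge it."""
    branch = f"pr-{pr_number}"  # hypothetical local branch name
    return [
        # GitHub exposes each pull request head as refs/pull/<N>/head.
        ["git", "fetch", remote, f"pull/{pr_number}/head:{branch}"],
        ["git", "merge", "--no-edit", branch],
    ]


def apply_pr(repo_dir, pr_number, remote="origin"):
    """Run the commands inside repo_dir, stopping at the first failure."""
    for cmd in pr_fetch_commands(pr_number, remote):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

For example, apply_pr("/workspace/gem5-aladdin", 37) would fetch and merge the pull request linked above, after which gem5 needs to be rebuilt.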

Sam Xi
Google Inc., Software Engineer
http://www.samxi.org



yaoyu...@gmail.com

Jan 22, 2021, 4:35:17 PM
to Sean Carroll, gem5-Aladdin users, Sam Xi

Hi Sean,

 

Sorry for the late response. The issue was caused by a bug in the TensorIndexIterator class, which is used to index data in a tensor. Please pull the latest changes from the master branch, which should include the fix. Let us know if you run into any more issues.
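
For readers unfamiliar with this kind of helper, here is a minimal Python sketch of what a tensor index iterator generally does: it walks a multi-dimensional index in row-major order and maps it to a linear offset. This is an illustration only. ToyTensorIndexIterator and linear_offsets are hypothetical names, the code is not the actual TensorIndexIterator, and it does not reproduce the bug, whose details are not described here.

```python
from math import prod


class ToyTensorIndexIterator:
    """Hypothetical row-major index iterator over a dense tensor."""

    def __init__(self, dims):
        self.dims = tuple(dims)
        self.state = [0] * len(self.dims)
        # A tensor with any zero-sized dimension has nothing to visit.
        self.at_end = prod(self.dims) == 0

    def linear_index(self):
        # Fold the per-axis indices into one row-major offset.
        offset = 0
        for idx, dim in zip(self.state, self.dims):
            offset = offset * dim + idx
        return offset

    def advance(self):
        # Increment the innermost axis and carry into outer axes.
        for axis in reversed(range(len(self.dims))):
            self.state[axis] += 1
            if self.state[axis] < self.dims[axis]:
                return
            self.state[axis] = 0
        self.at_end = True


def linear_offsets(dims):
    """Collect the linear offsets visited for a tensor of shape dims."""
    it = ToyTensorIndexIterator(dims)
    offsets = []
    while not it.at_end:
        offsets.append(it.linear_index())
        it.advance()
    return offsets
```

A mistake in the carry logic of such an iterator yields wrong offsets only for certain shapes, which is one plausible reason a bug of this kind can stay hidden until late in a long run.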

 

Thanks,

Yuan
