That is not true for CUDA (with multi-threaded decoders). It is very easy to make it work with MKL or CBLAS: I have been running 30 decoders on a fairly complex nnet3 model, each one under real time, for days.
But on GPU (K20, Titan, etc.) it crashes easily, within seconds. Sometimes it is a segmentation fault; sometimes it is some other crash. All of them are related to CUDA functions, as in the log below. The CUDA driver functions are very unreliable for multi-threaded decoding. I am still trying to fix it...
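For context on the log further down: an error such as "Attempt to free CUDA memory pointer that was not allocated" is the kind of symptom one would expect if several decoder threads touch a caching allocator's bookkeeping without synchronization. That is only an assumption, not something the log proves. A minimal sketch of the usual remedy, guarding the cache with a mutex, is below. `LockedCachingAllocator` and `RunStressTest` are hypothetical names, this is not Kaldi's `CuMemoryAllocator`, and `std::malloc`/`std::free` stand in for `cudaMalloc`/`cudaFree` so the example runs without a GPU.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical toy allocator (NOT Kaldi's CuMemoryAllocator).
// std::malloc/std::free stand in for cudaMalloc/cudaFree.
class LockedCachingAllocator {
 public:
  ~LockedCachingAllocator() {
    // Release cached blocks and any blocks the caller never returned.
    for (auto &kv : cache_)
      for (void *p : kv.second) std::free(p);
    for (auto &kv : in_use_) std::free(kv.first);
  }

  // Returns a block of exactly `bytes` bytes, reusing a cached one if possible.
  void *Malloc(std::size_t bytes) {
    std::lock_guard<std::mutex> lock(mutex_);  // one thread at a time
    void *p = nullptr;
    auto it = cache_.find(bytes);
    if (it != cache_.end() && !it->second.empty()) {
      p = it->second.back();
      it->second.pop_back();
    } else {
      p = std::malloc(bytes);
    }
    if (p != nullptr) in_use_[p] = bytes;
    return p;
  }

  // Returns the block to the cache. Returns false if `p` was not handed out
  // by this allocator (or was already freed), instead of corrupting state.
  bool Free(void *p) {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = in_use_.find(p);
    if (it == in_use_.end()) return false;
    cache_[it->second].push_back(p);
    in_use_.erase(it);
    return true;
  }

 private:
  std::mutex mutex_;                                 // guards both maps
  std::map<std::size_t, std::vector<void *>> cache_; // size -> free blocks
  std::map<void *, std::size_t> in_use_;             // pointer -> size
};

// Hammers one shared allocator from several threads and returns the number
// of failed Malloc/Free calls.
int RunStressTest(int num_threads, int iters) {
  LockedCachingAllocator alloc;
  std::atomic<int> failures(0);
  std::vector<std::thread> threads;
  for (int t = 0; t < num_threads; ++t) {
    threads.emplace_back([&alloc, &failures, iters, t]() {
      for (int i = 0; i < iters; ++i) {
        void *p = alloc.Malloc(64 * (1 + (i + t) % 4));
        if (p == nullptr || !alloc.Free(p)) ++failures;
      }
    });
  }
  for (auto &th : threads) th.join();
  return failures.load();
}
```

A single lock is the simplest design and serializes every allocation. Whether that is fast enough for real-time decoding, and whether the actual crash comes from the allocator rather than from the CUDA context or stream handling, would have to be checked separately.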
LOG ([5.2.142~6-90600]:kaldi::CuDevice::IsComputeExclusive():cu-device.cc:264) CUDA setup operating under Compute Exclusive Process Mode.
LOG ([5.2.142~6-90600]:kaldi::CuDevice::SelectGpuIdMan():cu-device.cc:491) The active GPU is [0]: Tesla K20c free:4647M, used:78M, total:4726M, free/total:0.98335 version 3.5
LOG ([5.2.142~6-90600]:kaldi::nnet3::Nnet::RemoveSomeNodes():nnet-nnet.cc:926) Removed 2 orphan nodes.
LOG ([5.2.142~6-90600]:kaldi::nnet3::Nnet::RemoveOrphanComponents():nnet-nnet.cc:849) Removing 2 orphan components.
LOG ([5.2.142~6-90600]:kaldi::nnet3::ModelCollapser::Collapse():nnet-utils.cc:798) Added 1 components, removed 2
LOG ([5.2.142~6-90600]:kaldi::nnet3::CompileLooped():nnet-compile-looped.cc:337) Spent 2.86904 seconds in looped compilation.
ERROR ([5.2.142~6-90600]:kaldi::CuMemoryAllocator::Free():cu-allocator.cc:283) Attempt to free CUDA memory pointer that was not allocated: 0000004306AA0000
ASSERTION_FAILED ([5.2.142~6-90600]:kaldi::CuMemoryAllocator::MruCache::Lookup():cu-allocator.cc:317) : '!q.empty()'
WARNING ([5.2.142~6-90600]:kaldi::nnet3::NnetComputer::ExecuteCommand():nnet-compute.cc:341) Printing some background info since error was detected
LOG ([5.2.142~6-90600]:kaldi::nnet3::NnetComputer::ExecuteCommand():nnet-compute.cc:342) matrix m1(79, 29), m2(77, 29), m3(77, 29), m4(75, 928), m5(73, 464), m 6(71, 464), m7(69, 464), m8(63, 464), m9(63, 512), m10(21, 1536), m11(21, 512), m12(1, 640), m13(1, 2560), m14(1, 1024), m15(1, 256), m16(1, 640), m17(1, 640), m18(1, 2560), m19(1, 1024), m20(1, 256), m21(1, 640), m22(1, 640), m23(1, 2560), m24(1, 1024), m25(1, 256), m26(1, 640), m27(1, 640), m28(1, 2560), m29(1, 102 4), m30(1, 256), m31(1, 640), m32(1, 640), m33(1, 2560), m34(1, 1024), m35(1, 256), m36(1, 640), m37(1, 640), m38(1, 2560), m39(1, 1024), m40(1, 256), m41(1, 6 40), m42(1, 640), m43(1, 2560), m44(1, 1024), m45(1, 256), m46(1, 640), m47(1, 640), m48(1, 2560), m49(1, 1024), m50(1, 256), m51(1, 640), m52(1, 640), m53(1, 2560), m54(1, 1024), m55(1, 256), m56(1, 640), m57(1, 640), m58(1, 2560), m59(1, 1024), m60(1, 256), m61(1, 640), m62(1, 640), m63(1, 2560), m64(1, 1024), m65( 1, 256), m66(1, 640), m67(1, 640), m68(1, 2560), m69(1, 1024), m70(1, 256), m71(1, 640), m72(1, 640), m73(1, 2560), m74(1, 1024), m75(1, 256), m76(1, 640), m77 (1, 640), m78(1, 2560), m79(1, 1024), m80(1, 256), m81(1, 640), m82(1, 640), m83(1, 2560), m84(1, 1024), m85(1, 256), m86(1, 640), m87(1, 640), m88(1, 2560), m 89(1, 1024), m90(1, 256), m91(1, 640), m92(1, 640), m93(1, 2560), m94(1, 1024), m95(1, 256), m96(1, 640), m97(1, 640), m98(1, 2560), m99(1, 1024), m100(1, 256) , m101(1, 640), m102(1, 640), m103(1, 2560), m104(1, 1024), m105(1, 256), m106(1, 640), m107(1, 640), m108(1, 2560), m109(1, 1024), m110(1, 256), m111(1, 640), m112(1, 640), m113(1, 2560), m114(1, 1024), m115(1, 256), m116(19, 768), m117(19, 512), m118(17, 1536), m119(17, 512), m120(1, 640), m121(1, 2560), m122(1, 10 24), m123(1, 256), m124(1, 640), m125(1, 640), m126(1, 2560), m127(1, 1024), m128(1, 256), m129(1, 640), m130(1, 640), m131(1, 2560), m132(1, 1024), m133(1, 25 6), m134(1, 640), m135(1, 640), m136(1, 2560), m137(1, 1024), m138(1, 256), 
m139(1, 640), m140(1, 640), m141(1, 2560), m142(1, 1024), m143(1, 256), m144(1, 640 ), m145(1, 640), m146(1, 2560), m147(1, 1024), m148(1, 256), m149(1, 640), m150(1, 640), m151(1, 2560), m152(1, 1024), m153(1, 256), m154(1, 640), m155(1, 640) , m156(1, 2560), m157(1, 1024), m158(1, 256), m159(1, 640), m160(1, 640), m161(1, 2560), m162(1, 1024), m163(1, 256), m164(1, 640), m165(1, 640), m166(1, 2560) , m167(1, 1024), m168(1, 256), m169(1, 640), m170(1, 640), m171(1, 2560), m172(1, 1024), m173(1, 256), m174(1, 640), m175(1, 640), m176(1, 2560), m177(1, 1024) , m178(1, 256), m179(1, 640), m180(1, 640), m181(1, 2560), m182(1, 1024), m183(1, 256), m184(1, 640), m185(1, 640), m186(1, 2560), m187(1, 1024), m188(1, 256), m189(1, 640), m190(1, 640), m191(1, 2560), m192(1, 1024), m193(1, 256), m194(1, 640), m195(1, 640), m196(1, 2560), m197(1, 1024), m198(1, 256), m199(1, 640), m200(1, 640), m201(1, 2560), m202(1, 1024), m203(1, 256), m204(15, 768), m205(15, 512), m206(13, 1536), m207(13, 512), m208(1, 640), m209(1, 2560), m210(1, 102 4), m211(1, 256), m212(1, 640), m213(1, 640), m214(1, 2560), m215(1, 1024), m216(1, 256), m217(1, 640), m218(1, 640), m219(1, 2560), m220(1, 1024), m221(1, 256 ), m222(1, 640), m223(1, 640), m224(1, 2560), m225(1, 1024), m226(1, 256), m227(1, 640), m228(1, 640), m229(1, 2560), m230(1, 1024), m231(1, 256), m232(1, 640) , m233(1, 640), m234(1, 2560), m235(1, 1024), m236(1, 256), m237(1, 640), m238(1, 640), m239(1, 2560), m240(1, 1024), m241(1, 256), m242(1, 640), m243(1, 640), m244(1, 2560), m245(1, 1024), m246(1, 256), m247(1, 640), m248(1, 640), m249(1, 2560), m250(1, 1024), m251(1, 256), m252(1, 640), m253(1, 640), m254(1, 2560), m255(1, 1024), m256(1, 256), m257(1, 640), m258(1, 640), m259(1, 2560), m260(1, 1024), m261(1, 256), m262(1, 640), m263(1, 640), m264(1, 2560), m265(1, 1024), m266(1, 256), m267(1, 640), m268(1, 640), m269(1, 2560), m270(1, 1024), m271(1, 256), m272(13, 256), m273(13, 3917), m274(39, 29), m275(39, 29), m276(39, 29), 
m277(41, 29), m278(39, 928), m279(41, 928), m280(39, 464), m281(41, 464), m282(39, 464), m283(41, 464), m284(39, 464), m285(45, 464), m286(39, 464), m287(39, 512), m288(13, 1536), m289(13, 512), m290(1, 640), m291(1, 640), m292(1, 2560), m293(1, 1024), m294(1, 256), m295(1, 640), m296(1, 640), m297(1, 2560), m298(1, 1024), m299(1, 256), m300(1, 640), m301(1, 640), m302(1, 2560), m303(1, 1024), m304(1, 256), m305(1, 640), m306(1, 640), m307(1, 2560), m308(1, 1024), m309(1, 256), m310(1, 640), m311(1, 640), m312(1, 2560), m313(1, 1024), m314(1, 256), m315(1, 640), m316(1, 640), m317(1, 2560), m318(1, 1024), m319(1, 256), m320(1, 640), m321(1, 640), m322(1, 2560), m323(1, 1024), m324(1, 256), m325(1, 640), m326(1, 640), m327(1, 2560), m328(1, 1024), m329(1, 256), m330(1, 640), m331(1, 6 40), m332(1, 2560), m333(1, 1024), m334(1, 256), m335(1, 640), m336(1, 640), m337(1, 2560), m338(1, 1024), m339(1, 256), m340(1, 640), m341(1, 640), m342(1, 25 60), m343(1, 1024), m344(1, 256), m345(1, 640), m346(1, 640), m347(1, 2560), m348(1, 1024), m349(1, 256), m350(1, 640), m351(1, 640), m352(1, 2560), m353(1, 10 24), m354(1, 256), m355(13, 768), m356(13, 1536), m357(13, 512), m358(1, 640), m359(1, 640), m360(1, 2560), m361(1, 1024), m362(1, 256), m363(1, 640), m364(1, 640), m365(1, 2560), m366(1, 1024), m367(1, 256), m368(1, 640), m369(1, 640), m370(1, 2560), m371(1, 1024), m372(1, 256), m373(1, 640), m374(1, 640), m375(1, 2 560), m376(1, 1024), m377(1, 256), m378(1, 640), m379(1, 640), m380(1, 2560), m381(1, 1024), m382(1, 256), m383(1, 640), m384(1, 640), m385(1, 2560), m386(1, 1 024), m387(1, 256), m388(1, 640), m389(1, 640), m390(1, 2560), m391(1, 1024), m392(1, 256), m393(1, 640), m394(1, 640), m395(1, 2560), m396(1, 1024), m397(1, 2 56), m398(1, 640), m399(1, 640), m400(1, 2560), m401(1, 1024), m402(1, 256), m403(1, 640), m404(1, 640), m405(1, 2560), m406(1, 1024), m407(1, 256), m408(1, 64 0), m409(1, 640), m410(1, 2560), m411(1, 1024), m412(1, 256), m413(1, 640), m414(1, 
640), m415(1, 2560), m416(1, 1024), m417(1, 256), m418(1, 640), m419(1, 640 ), m420(1, 2560), m421(1, 1024), m422(1, 256), m423(13, 768), m424(13, 1536), m425(13, 512), m426(1, 640), m427(1, 640), m428(1, 2560), m429(1, 1024), m430(1, 256), m431(1, 640), m432(1, 640), m433(1, 2560), m434(1, 1024), m435(1, 256), m436(1, 640), m437(1, 640), m438(1, 2560), m439(1, 1024), m440(1, 256), m441(1, 6 40), m442(1, 640), m443(1, 2560), m444(1, 1024), m445(1, 256), m446(1, 640), m447(1, 640), m448(1, 2560), m449(1, 1024), m450(1, 256), m451(1, 640), m452(1, 64 0), m453(1, 2560), m454(1, 1024), m455(1, 256), m456(1, 640), m457(1, 640), m458(1, 2560), m459(1, 1024), m460(1, 256), m461(1, 640), m462(1, 640), m463(1, 256 0), m464(1, 1024), m465(1, 256), m466(1, 640), m467(1, 640), m468(1, 2560), m469(1, 1024), m470(1, 256), m471(1, 640), m472(1, 640), m473(1, 2560), m474(1, 102 4), m475(1, 256), m476(1, 640), m477(1, 640), m478(1, 2560), m479(1, 1024), m480(1, 256), m481(1, 640), m482(1, 640), m483(1, 2560), m484(1, 1024), m485(1, 256 ), m486(1, 640), m487(1, 640), m488(1, 2560), m489(1, 1024), m490(1, 256), m491(13, 256), m492(13, 3917), m493(39, 29), m494(39, 29), m495(39, 29), m496(41, 29 ), m497(39, 928), m498(41, 928), m499(39, 464), m500(41, 464), m501(39, 464), m502(41, 464), m503(39, 464), m504(45, 464), m505(39, 464), m506(39, 512), m507(1 3, 1536), m508(13, 512), m509(1, 640), m510(1, 640), m511(1, 2560), m512(1, 1024), m513(1, 256), m514(1, 640), m515(1, 640), m516(1, 2560), m517(1, 1024), m518 (1, 256), m519(1, 640), m520(1, 640), m521(1, 2560), m522(1, 1024), m523(1, 256), m524(1, 640), m525(1, 640), m526(1, 2560), m527(1, 1024), m528(1, 256), m529( 1, 640), m530(1, 640), m531(1, 2560), m532(1, 1024), m533(1, 256), m534(1, 640), m535(1, 640), m536(1, 2560), m537(1, 1024), m538(1, 256), m539(1, 640), m540(1 , 640), m541(1, 2560), m542(1, 1024), m543(1, 256), m544(1, 640), m545(1, 640), m546(1, 2560), m547(1, 1024), m548(1, 256), m549(1, 640), m550(1, 640), m551(1, 
2560), m552(1, 1024), m553(1, 256), m554(1, 640), m555(1, 640), m556(1, 2560), m557(1, 1024), m558(1, 256), m559(1, 640), m560(1, 640), m561(1, 2560), m562(1, 1024), m563(1, 256), m564(1, 640), m565(1, 640), m566(1, 2560), m567(1, 1024), m568(1, 256), m569(1, 640), m570(1, 640), m571(1, 2560), m572(1, 1024), m573(1, 256), m574(13, 768), m575(13, 1536), m576(13, 512), m577(1, 640), m578(1, 640), m579(1, 2560), m580(1, 1024), m581(1, 256), m582(1, 640), m583(1, 640), m584(1 , 2560), m585(1, 1024), m586(1, 256), m587(1, 640), m588(1, 640), m589(1, 2560), m590(1, 1024), m591(1, 256), m592(1, 640), m593(1, 640), m594(1, 2560), m595(1 , 1024), m596(1, 256), m597(1, 640), m598(1, 640), m599(1, 2560), m600(1, 1024), m601(1, 256), m602(1, 640), m603(1, 640), m604(1, 2560), m605(1, 1024), m606(1 , 256), m607(1, 640), m608(1, 640), m609(1, 2560), m610(1, 1024), m611(1, 256), m612(1, 640), m613(1, 640), m614(1, 2560), m615(1, 1024), m616(1, 256), m617(1, 640), m618(1, 640), m619(1, 2560), m620(1, 1024), m621(1, 256), m622(1, 640), m623(1, 640), m624(1, 2560), m625(1, 1024), m626(1, 256), m627(1, 640), m628(1, 640), m629(1, 2560), m630(1, 1024), m631(1, 256), m632(1, 640), m633(1, 640), m634(1, 2560), m635(1, 1024), m636(1, 256), m637(1, 640), m638(1, 640), m639(1, 2 560), m640(1, 1024), m641(1, 256), m642(13, 768), m643(13, 1536), m644(13, 512), m645(1, 640), m646(1, 640), m647(1, 2560), m648(1, 1024), m649(1, 256), m650(1 , 640), m651(1, 640), m652(1, 2560), m653(1, 1024), m654(1, 256), m655(1, 640), m656(1, 640), m657(1, 2560), m658(1, 1024), m659(1, 256), m660(1, 640), m661(1, 640), m662(1, 2560), m663(1, 1024), m664(1, 256), m665(1, 640), m666(1, 640), m667(1, 2560), m668(1, 1024), m669(1, 256), m670(1, 640), m671(1, 640), m672(1, 2560), m673(1, 1024), m674(1, 256), m675(1, 640), m676(1, 640), m677(1, 2560), m678(1, 1024), m679(1, 256), m680(1, 640), m681(1, 640), m682(1, 2560), m683(1, 1024), m684(1, 256), m685(1, 640), m686(1, 640), m687(1, 2560), m688(1, 1024), m689(1, 
256), m690(1, 640), m691(1, 640), m692(1, 2560), m693(1, 1024), m694(1, 256), m695(1, 640), m696(1, 640), m697(1, 2560), m698(1, 1024), m699(1, 256), m700(1, 640), m701(1, 640), m702(1, 2560), m703(1, 1024), m704(1, 256), m705(1, 6 40), m706(1, 640), m707(1, 2560), m708(1, 1024), m709(1, 256), m710(13, 256), m711(13, 3917), m712(39, 29)
# The following show how matrices correspond to network-nodes and
# cindex-ids. Format is: matrix = <node-id>.[value|deriv][ <list-of-cindex-ids> ]
# where a cindex-id is written as (n,t[,x]) but ranges of t values are compressed
# so we write (n, tfirst:tlast).
m1 == value: input[(0,-15:63)]
m2 == value: lda_input[(0,-15:61)]
m3 == value: lda[(0,-15:61)]
m4 == value: cnn1.conv[(0,-14:60)]
m5 == value: cnn2.conv[(0,-13:59)]
m6 == value: cnn3.conv[(0,-12:58)]