Hi, everyone!
I've run into a problem that doesn't make sense to me.
I tried to implement model compression using Caffe. After pruning, the total number of parameters is drastically reduced. I didn't change the structure of the model (e.g. kernel size, pooling size), and I used the same test examples and test batch size. However, the result surprised me: the model with fewer parameters (after pruning) runs slower when I test it on a CPU node. When I tried again on a GPU node, the result looked normal: the model with fewer parameters runs faster.
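For reference, here is a minimal NumPy sketch of the kind of pruning I applied (NumPy stands in for the actual Caffe blobs; the layer size and 90% pruning ratio are just illustrative). The weights are zeroed in place by magnitude, so the arrays keep their original dense shape:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fully connected layer: 1024 inputs -> 1024 outputs.
W = rng.standard_normal((1024, 1024)).astype(np.float32)
x = rng.standard_normal(1024).astype(np.float32)

# Magnitude pruning: zero the 90% smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.9)
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0).astype(np.float32)

# The parameter count (nonzeros) drops by ~90%...
print(np.count_nonzero(W), np.count_nonzero(W_pruned))

# ...but the array shape is unchanged, so a dense forward pass
# still performs the same 1024*1024 multiply-adds as before.
y = W_pruned @ x
```

The pruned weights are mostly zeros, but they are still stored and multiplied as a dense matrix.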
Can someone explain this? I would appreciate it.
Thanks,