Hi,
Recently I have been reading some of the gemmlowp source code, and I have a question:

Please see the picture above. Does the inner loop pack the same RHS block multiple times while the outer loop packs different rows of the LHS?
If so, why not do some pre-work that copies the whole RHS matrix into one large buffer, so that we could then iterate over the RHS contiguously without worrying about the cache?
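
To make the question concrete, here is a minimal sketch of the two loop orders I am comparing. It is not gemmlowp's actual code: the function names, the `PackCounts` struct, and the block-size parameters `l2_rows` / `l2_cols` are hypothetical, and the "packing" is only counted, not performed.

```cpp
#include <cstddef>

// Hypothetical bookkeeping type: how many pack operations each strategy performs.
struct PackCounts {
  std::size_t lhs_packs;
  std::size_t rhs_packs;
};

// Strategy A: RHS is packed inside the LHS loop, so every RHS column block
// is packed again for each LHS row block.
PackCounts CountPacksNested(int rows, int cols, int l2_rows, int l2_cols) {
  PackCounts counts{0, 0};
  for (int r = 0; r < rows; r += l2_rows) {
    ++counts.lhs_packs;  // stand-in for packing LHS rows [r, r + l2_rows)
    for (int c = 0; c < cols; c += l2_cols) {
      ++counts.rhs_packs;  // stand-in for packing RHS cols [c, c + l2_cols)
      // the compute kernel on the two packed blocks would run here
    }
  }
  return counts;
}

// Strategy B: every RHS column block is packed once up front into a buffer
// large enough for the whole packed RHS, then reused for each LHS row block.
PackCounts CountPacksPrepacked(int rows, int cols, int l2_rows, int l2_cols) {
  PackCounts counts{0, 0};
  for (int c = 0; c < cols; c += l2_cols) {
    ++counts.rhs_packs;  // pack once and keep the result
  }
  for (int r = 0; r < rows; r += l2_rows) {
    ++counts.lhs_packs;
    for (int c = 0; c < cols; c += l2_cols) {
      // the compute kernel would reuse the stored packed RHS block here
    }
  }
  return counts;
}
```

For example, with `rows = 256`, `cols = 192`, and 64x64 blocks, strategy A counts 12 RHS packs and strategy B counts 3, at the price of a buffer that holds the entire packed RHS instead of a single block.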
By the way, I wonder whether there are any new ideas for optimizing gemmlowp these days?
-----
Thanks.