Hello developers,
Thank you all for your continued support of the software. ^_^.
The performance advantage of OpenBLAS is obvious, even compared with gemmlowp, which handles 8-bit matrix multiplication. I now need to do fixed-point matrix multiplication, but I am not sure how much work it would take to support it efficiently in OpenBLAS.
Would I need to change the whole architecture, or only replace the floating-point instructions in the kernels with the corresponding fixed-point ones?
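To make the question concrete, here is a minimal reference sketch of the kind of fixed-point multiplication I mean. It is only an illustration under assumptions: I assume a Q15 format (int16_t values scaled by 2^15) and row-major storage, and the names `gemm_q15_ref` and `saturate_q15` are hypothetical, not part of the OpenBLAS API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an OpenBLAS function):
 * clamp a wide value into the int16_t range used by Q15. */
static int16_t saturate_q15(int64_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical reference kernel (not an OpenBLAS function).
 * Computes C = A * B for row-major Q15 matrices:
 * A is m x k, B is k x n, C is m x n. */
void gemm_q15_ref(size_t m, size_t n, size_t k,
                  const int16_t *a, const int16_t *b, int16_t *c)
{
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < n; ++j) {
            /* A 64-bit accumulator keeps the sum of Q30 products
             * from overflowing for any realistic k. */
            int64_t acc = 0;
            for (size_t p = 0; p < k; ++p)
                acc += (int32_t)a[i * k + p] * (int32_t)b[p * n + j];

            /* Add half an output step for rounding, then rescale from
             * Q30 to Q15. Assumes an arithmetic right shift for negative
             * values, which mainstream compilers provide. */
            acc = (acc + (1 << 14)) >> 15;
            c[i * n + j] = saturate_q15(acc);
        }
    }
}
```

The widening multiply, the rounding shift, and the saturation are the steps that have no counterpart in the floating-point kernels, which is why I am unsure whether swapping instructions alone would be enough.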
Thanks.