Dear Jerry,
I have implemented similar operations C = A*D*B, C = D*A*B, etc. (with D a diagonal matrix) in TBLIS (https://github.com/devinamatthews/tblis). We're currently working on moving some of that infrastructure into BLIS so that we can easily implement the same types of operations.
(The technical answer: the way to implement this is to fold the scaling by the weights x into the "packing kernel", the operation that copies blocks of A into a special optimized format. Doing the FLOPs for the weighting there is essentially free, but it requires modifying the packing kernel, which is not currently easy to do.)
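To make the idea concrete, here is a minimal sketch in plain C of packing with the diagonal scaling folded in, for C += A*D*B with column-major storage. It is not the BLIS or TBLIS packing kernel: the names MR, pack_a_scaled, and gemm_diag_packed are hypothetical, the panel height of 4 is an arbitrary assumption, and the inner loops stand in for a real optimized micro-kernel.

```c
#include <assert.h>
#include <stddef.h>

#define MR 4 /* hypothetical micro-panel height; real kernels choose this per architecture */

/* Copy rows [i0, i0+MR) of column-major A (m x k, leading dimension lda)
   into a contiguous MR x k micro-panel, multiplying column p by d[p] during
   the copy. Rows past the end of A are zero-padded. */
void pack_a_scaled(size_t m, size_t k, const double *a, size_t lda,
                   const double *d, size_t i0, double *panel)
{
    assert(a && d && panel && i0 < m && lda >= m);
    for (size_t p = 0; p < k; ++p) {
        for (size_t i = 0; i < MR; ++i) {
            size_t row = i0 + i;
            panel[p * MR + i] = (row < m) ? a[row + p * lda] * d[p] : 0.0;
        }
    }
}

/* C (m x n) += A * diag(d) * B, all column-major. The scaling happens only
   inside the packing step; the multiply loops see an ordinary packed panel.
   panel is caller-provided workspace of at least MR * k doubles. */
void gemm_diag_packed(size_t m, size_t n, size_t k,
                      const double *a, size_t lda, const double *d,
                      const double *b, size_t ldb,
                      double *c, size_t ldc, double *panel)
{
    for (size_t i0 = 0; i0 < m; i0 += MR) {
        pack_a_scaled(m, k, a, lda, d, i0, panel);
        for (size_t j = 0; j < n; ++j) {
            for (size_t p = 0; p < k; ++p) {
                double bpj = b[p + j * ldb];
                for (size_t i = 0; i < MR && i0 + i < m; ++i)
                    c[(i0 + i) + j * ldc] += panel[p * MR + i] * bpj;
            }
        }
    }
}
```

Each element of A is multiplied by its weight once, while it is already being copied, so the weighting adds no extra pass over memory and the multiply loops need no changes.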
Thanks,
Devin Matthews
From: "blis-...@googlegroups.com" <blis-...@googlegroups.com> on behalf of "Field G. Van Zee" <fi...@cs.utexas.edu>
Date: Thursday, August 12, 2021 at 1:49 PM
To: "blis-...@googlegroups.com" <blis-...@googlegroups.com>
Subject: Re: [blis-devel] Implementing a fast matrix “sandwich” product
To unsubscribe from this group and stop receiving emails from it, send an email to blis-devel+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/blis-devel/71ac5cf1-6158-2bbc-8da0-0b47b77d5dfb%40cs.utexas.edu.
You may want to look at https://github.com/ChenhanYu/hmlp for inspiration.
To view this discussion on the web visit https://groups.google.com/d/msgid/blis-devel/7f77da61-2e24-4b73-8d3d-7bbc09ae382fn%40googlegroups.com.
Hi Jerry,
I'm glad that you've had success with the "sandbox" approach and Field's help. I mentioned that we're working towards an interface that makes it easier to implement extended operations like this using the "standard" BLIS framework (i.e. not a sandbox). I put together a mock-up of what your C += A*D*A^T operation might look like; see the attached source file. This example won't compile and run currently, but it should in the near future. We would love to have your input on the mock-up, especially on any areas that still seem a bit rough or confusing. It's only 165 lines with lots of comments, so hopefully it represents a simpler and more streamlined approach to your and others' problems.
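For checking any optimized version against, a plain reference for C += A*D*A^T can be sketched as below. This is not the attached mock-up and uses no BLIS interface: the function name sandwich_ref is hypothetical, and it assumes column-major storage with D given as a vector d of its k diagonal entries.

```c
#include <assert.h>
#include <stddef.h>

/* Reference C += A * diag(d) * A^T.
   A is m x k (column-major, leading dimension lda), d has k entries,
   C is m x m (column-major, leading dimension ldc).
   Unoptimized on purpose: it is meant as a correctness oracle. */
void sandwich_ref(size_t m, size_t k, const double *a, size_t lda,
                  const double *d, double *c, size_t ldc)
{
    assert(a && d && c && lda >= m && ldc >= m);
    for (size_t j = 0; j < m; ++j) {
        for (size_t p = 0; p < k; ++p) {
            /* weight times element (j, p) of A, i.e. element (p, j) of D*A^T */
            double w = d[p] * a[j + p * lda];
            for (size_t i = 0; i < m; ++i)
                c[i + j * ldc] += a[i + p * lda] * w;
        }
    }
}
```

Because the result is symmetric, an optimized version could compute only one triangle of C; the reference fills the whole matrix so that either triangle can be compared.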
Devin
From: "blis-...@googlegroups.com" <blis-...@googlegroups.com> on behalf of Jerry Mao <jerr...@quantco.com>
Date: Friday, August 27, 2021 at 7:19 AM
To: "blis-...@googlegroups.com" <blis-...@googlegroups.com>
Subject: Re: [blis-devel] Implementing a fast matrix “sandwich” product
Hello,
To view this discussion on the web visit https://groups.google.com/d/msgid/blis-devel/7cd26f85-f959-421e-b5c1-d8f30959b124n%40googlegroups.com.