does anyone know of some mathematics for analysing which would be the best "primitives" for Matrix operations, suited to transposition and inversion, determinant and normalisation?for 3D that generally means just 2x2, 3x3 and 4x4.
i'm looking up how matrix inverses are calculated and, hoo-boy :)normalisation looks to be just "divide by the determinant":so... i am logically deducing that if you wanted something RISC-like, you'd have operations for transpose and determinant? or... can determinant be broken down further into something elegant?whilst 2x2 looks pretty obvious - 0,0 x 1,1 - 1,0 x 0,1 - it looks like it goes recursive from there, with permutations:at that point, honestly, i'm scared/alarmed to even recommend a Matrix Determinant opcode!
likewise, Transpose appears to be a series of MV operations with offsets, which tends to suggest that there may be some vector primitives that would be really good to have, that could be added to this:a quick search "matrix inverse swizzle" shows this:which mentions __mm_shuffle_epl32, _mm_shuffle_ps, _mm_movelp_ps and that leads on a merry search to SSE/AVX/AVX-512:looovely :)at which point i am definitely lost. does anyone have any suggestions?
> the algorithms I've generally used are an unrolled form of gauss jordan elimination or just using the formula from a symbolic math program [1], at which point, like operations can be grouped together.
>
>
> [1]:
> type the following into maxima:
> m:apply(matrix, makelist(makelist(concat(v, i, j), j, 0, 3), i, 0, 3)); grind(invert(m))$
https://brilliant.org/wiki/gaussian-elimination/
That looks very much like the equation substitution system i was taught at A Level. Multiply both sides by blah blah, substitute or subtract and eliminate blah blah.
Neat. Of course it applies to 3x3
>
> I've not generally had matrix inverse on a fast path, instead passing a matrix and it's inverse together if needed and only calculating the inverse when I generate the matrix.
>
>
> For most 3D programs, matrices are used much more than they are generated, so matrix inverse shouldn't generally be in the fast path, if the program is designed well.
The idea from SIGGRAPH2019 by Pixilica, the driving force, is to go into long-tail applications and thus design something that is useful for innovation where mass volume GPUs have suboptimal performance, legacy issues precisely because of their profit driven focus on highest volume makets.
Where one of these algorithms (determinant ON^3) is so long, I would be concerned even despite their potential appeal, without some form of CSR FSM reentrant state, and even then it makes me nervous.
The compromise is to in great detail the best algorithms then target the ISA at parallellisable operations that will reorder the data.
A year ago I did design a REMAP system for SV which allowed up to a 3D permutation of elements.
I am however tempted to suggest just using MV.X regs[rd] = regs[regs[rs1]] as as means to arbitrarily reorder data. Precompute a transposition set of indices and let a MV take care of them.
Likewise for determinant, MV can be used to extract the elements to reorder them into petmuted pairs that would allow them to be multiplied as 2 vectors, followed by the subtractions as a second set.
Likewise for Inverse. However MV.X is pretty expensive.
I wonder there if an answer might be to permit the rs1 vector elements to be typecast to elwidth of 8? More to the point, SV already has that capability... or does it...
No I don't think it does. The assumption is that the element indices are always XLEN wide and it is the data to which elwidth applies in the src.
Have to fix that by adding a fmt field to MV.X let me just update the page
No I don't think it does. The assumption is that the element indices are always XLEN wide and it is the data to which elwidth applies in the src.
Have to fix that by adding a fmt field to MV.X let me just update the page]
> mv.x with 8-bit indexes sounds like a good idea.
Yehyeh. I wonder...
>
> assuming a vector of 4x4 matrixes is stored as 4 separate vectors with subvl=4 in struct-of-array-of-struct form (the form I've been planning on using):
> using standard (4+4) -> 4 swizzle instructions with 2 input vectors with subvl=4 and 1 output vector with subvl, a vectorized matrix transpose operation can be done in 2 steps with 4 instructions per step to give 8 instructions in total:
How about 5?
ldimm x4, 0x0004080c # transposition indices, packed 8bit
{SVP.VL=4} MV.X x8, x4, elwidth=8
{SVP.VL=4} MV.X x9, x4, elwidth=8
{SVP.VL=4} MV.X x10, x4, elwidth=8
{SVP.VL=4} MV.X x11, x4, elwidth=8
You remember the idea we had 8 months ago to make the offsets relative to rd?
How about 2? :)
ldimm x4, 0x0004080c # transposition indices, packed 8bit
{SVP.VL=4,SUBVL=4} MV.X x8, x4, elwidth=8
Would that even work? Hm it would work if there was a special bit or opcode to apply the offsets to SUBVL looping but not VL
That would be pretty awesome.
> | m00 m10 m20 m30 |
> | m01 m11 m21 m31 |
> | m02 m12 m22 m32 |
> | m03 m13 m23 m33 |
>
>
> transpose 4 corner 2x2 matrices
>
>
>
> intermediate:
> | m00 m01 m20 m21 |
> | m10 m11 m30 m31 |
> | m02 m03 m22 m23 |
> | m12 m13 m32 m33 |
How does that work with only SUBVL swizzle in only 4 intructions? The swizzles only apply horizontally, afaik.
L.
On Tue, Sep 17, 2019, 22:24 lkcl <luke.l...@gmail.com> wrote:does anyone know of some mathematics for analysing which would be the best "primitives" for Matrix operations, suited to transposition and inversion, determinant and normalisation?for 3D that generally means just 2x2, 3x3 and 4x4.Generally, for 3D graphics, matrix left/right-multiplication by vector/matrix and transpose is by far more common than all the other matrix operations combined, so having special HW support for inverse, determinant, and normalize (never heard of that applied to matrices before) is probably unnecessary.
i'm looking up how matrix inverses are calculated and, hoo-boy :)
normalisation looks to be just "divide by the determinant":
so... i am logically deducing that if you wanted something RISC-like, you'd have operations for transpose and determinant? or... can determinant be broken down further into something elegant?
whilst 2x2 looks pretty obvious - 0,0 x 1,1 - 1,0 x 0,1 - it looks like it goes recursive from there, with permutations:at that point, honestly, i'm scared/alarmed to even recommend a Matrix Determinant opcode!yeah, determinant is an O(n^3) operation (though there may be something faster for really big n)
--
You received this message because you are subscribed to the Google Groups "RISC-V ISA Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to isa-dev+u...@groups.riscv.org.
To view this discussion on the web visit https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/9b038581-f4b0-499b-b948-abe6c77b49aa%40groups.riscv.org.
normalisation looks to be just "divide by the determinant":NOOOH. She divides every column by the length of the column.
|
anyway, here's a web page showing one way it can be done (the swizzles will need to be decoded from the matrices provided, or just look up the sse2 instructions):
NP 0F 62 /r1 PUNPCKLDQ mm, mm/m32 | A | V/V | MMX | Interleave low-order doublewords from mm and mm/m32 into mm. |