[llvm-dev] [RFC] Permit load/store/alloca for struct containing all scalable vectors.


Kai Wang via llvm-dev

Mar 18, 2021, 11:42:17 PM
to LLVM Dev, Craig Topper, Evandro Menezes, Roger Ferrer Ibanez, Alex Bradbury, Sander De Smalen

Hi all,


We have a proposal to support load/store/alloca for structs that contain only scalable vectors. Please review it and share your suggestions.

Thanks a lot.


Introduction

=======

In the RISC-V V extension, the Zvlsseg sub-extension moves multiple contiguous fields in memory to and from consecutively numbered vector registers. In our intrinsic document[1], we define a set of types for these segment load/store intrinsics. Zvlsseg types carry two additional parameters: the number of fields (NF) and LMUL[2]. NF ranges from 2 to 8, and LMUL can be 1/8, 1/4, 1/2, 1, 2, 4, or 8.


We first tried to model Zvlsseg types with primitive builtin types. For example, <vscale x 2 x i32> is the int32 vector type for LMUL = 1, so we used <vscale x 4 x i32> for the int32 Zvlsseg type with NF = 2, LMUL = 1. However, <vscale x 4 x i32> is also the int32 vector type for LMUL = 2. Both are legal types, and the type legalizer has no way to distinguish them.
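The collision can be reproduced with a little arithmetic. The C++ sketch below is only an illustration and is not taken from the patches. `minElements` is a hypothetical helper, and it assumes the known-minimum element count is 64 / SEW * LMUL * NF, which matches the <vscale x 2 x i32> mapping for LMUL = 1, int32 given above. LMUL is restricted to whole numbers for brevity.

```cpp
// Hypothetical helper (not an LLVM API): known-minimum element count of the
// flat scalable vector type for an element width (SEW, in bits), a whole-number
// LMUL and a number of fields NF. Assumes one vscale unit covers 64 bits.
constexpr unsigned minElements(unsigned sewBits, unsigned lmul, unsigned nf = 1) {
  return (64 / sewBits) * lmul * nf;
}

// LMUL = 1, int32 vector type: <vscale x 2 x i32>.
static_assert(minElements(32, 1) == 2, "LMUL = 1, int32 has 2 minimum elements");

// LMUL = 2 plain vector and NF = 2, LMUL = 1 segment type collapse to the
// same flat type, <vscale x 4 x i32>, so the two cannot be told apart.
static_assert(minElements(32, 2) == minElements(32, 1, 2),
              "both map to <vscale x 4 x i32>");
```

Under the same assumption, any pair (LMUL, NF) with the same product yields the same flat type, which is why a flat vector type alone cannot carry NF.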


To address this, our downstream version models Zvlsseg types with struct types. We use {<vscale x 2 x i32>, <vscale x 2 x i32>} for the int32 Zvlsseg type with NF = 2, LMUL = 1. This removes the ambiguity between scalable vector types for the RISC-V V extension. However, modeling Zvlsseg types this way requires load/store/alloca support for scalable structs.


[1]. https://github.com/riscv/rvv-intrinsic-doc/blob/master/intrinsic_funcs/03_vector_load_store_segment_instructions_zvlsseg.md

[2]. The vector length multiplier, LMUL, when greater than 1, represents the default number of vector registers that are combined to form a vector register group.


Implementation

=======

The current StructLayout implementation uses uint64_t to represent the size of a struct and the offsets of its members. We use TypeSize for the struct size and StackOffset for the member offsets instead, so that StructLayout records correct information when the struct contains scalable elements. However, TypeSize is a one-dimensional polynomial type: it holds either a fixed size or a multiple of vscale, not a sum of the two. To minimize the impact on the current implementation while still meeting our requirements, we only permit load/store/alloca for structs whose members are all scalable types or all fixed-length types. That is, the TypeSize is either a scalable size or a fixed size.
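To make the restriction concrete, here is a minimal C++ model of such a layout computation. It is a sketch under stated assumptions, not the code in the patches: `Size`, `Layout` and `computeLayout` are hypothetical stand-ins rather than LLVM's TypeSize and StructLayout, one size type is used for both the struct size and the member offsets, and alignment padding is ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Simplified stand-in for a TypeSize-like value: the real size in bytes is
// minBytes * vscale when scalable is true, and exactly minBytes otherwise.
struct Size {
  std::uint64_t minBytes;
  bool scalable;
};

// Simplified stand-in for a struct layout: total size plus one offset per member.
struct Layout {
  Size size;
  std::vector<Size> offsets;
};

// Lays members out back to back. Returns std::nullopt for a struct that mixes
// scalable and fixed-size members, because a single (minBytes, scalable) pair
// cannot describe the resulting offsets. Alignment padding is ignored.
std::optional<Layout> computeLayout(const std::vector<Size> &members) {
  Layout layout{{0, !members.empty() && members.front().scalable}, {}};
  for (const Size &m : members) {
    if (m.scalable != layout.size.scalable)
      return std::nullopt;                 // mixed struct: not representable
    layout.offsets.push_back(layout.size); // offset = bytes laid out so far
    layout.size.minBytes += m.minBytes;
  }
  assert(layout.offsets.size() == members.size());
  return layout;
}
```

In this model, {<vscale x 2 x i32>, <vscale x 2 x i32>} corresponds to `computeLayout({{8, true}, {8, true}})`: the second member sits at offset 8 * vscale bytes and the struct occupies 16 * vscale bytes. A struct such as `{{8, true}, {8, false}}` is rejected, since its second offset would still be a multiple of vscale while its total size would need both a vscale term and a fixed term.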


Impact on other passes

=======

I have reviewed all uses of StructLayout. A large share of them relate to ConstantStruct, and there should be no use case for a scalable ConstantStruct. Another large share relate to GetElementPtrInst. We only need load/store/alloca to meet our requirements, so we prefer not to support getelementptr for scalable structs. We could add an assertion in the constructor of GetElementPtrInst to reject structs containing scalable vectors. With that restriction, changing the internal representation of StructLayout is a manageable amount of work.


How to avoid using getelementptr for scalable struct

=======

We could avoid getelementptr by building the value with insertvalue/extractvalue and then loading or storing the whole structure. For example, instead of


%0 = getelementptr %struct.type, %struct.type* %val, i32 0, i32 0

store <vscale x 2 x i32> %v.coerce0, <vscale x 2 x i32>* %0

%1 = getelementptr %struct.type, %struct.type* %val, i32 0, i32 1

store <vscale x 2 x i32> %v.coerce1, <vscale x 2 x i32>* %1


We could use


%0 = insertvalue %struct.type undef, <vscale x 2 x i32> %v.coerce0, 0

%1 = insertvalue %struct.type %0, <vscale x 2 x i32> %v.coerce1, 1

store %struct.type %1, %struct.type* %val


to avoid using getelementptr for scalable struct.


How to deal with multiple returns with scalable vectors and fixed length objects?

=======

D94142 permits structs that mix scalable vectors and fixed-length objects as the multiple return values of intrinsic calls, but prohibits load/store/alloca for them. This proposal still prohibits load/store/alloca for such structs. How, then, do we handle a return value that is a struct with both scalable vectors and fixed-length objects?


We extract the scalable vectors into a struct containing only scalable vectors, and extract the scalar values as needed.


For example,


%struct.type = type { <vscale x 2 x i32>, <vscale x 2 x i32> }


%3 = call { <vscale x 2 x i32>, <vscale x 2 x i32>, i64 } @llvm.riscv.test(i32* %0)

%4 = extractvalue { <vscale x 2 x i32>, <vscale x 2 x i32>, i64 } %3, 0

%5 = insertvalue %struct.type undef, <vscale x 2 x i32> %4, 0

%6 = extractvalue { <vscale x 2 x i32>, <vscale x 2 x i32>, i64 } %3, 1

%7 = insertvalue %struct.type %5, <vscale x 2 x i32> %6, 1

%8 = extractvalue { <vscale x 2 x i32>, <vscale x 2 x i32>, i64 } %3, 2

store i64 %8, i64* %1, align 8

ret %struct.type %7


Related patches

=======

[NFC][IR] Replace isa<ScalableVectorType> with a predicator function.

https://reviews.llvm.org/D98161

[PoC][IR] Permit load/store/alloca for struct with the same scalable vectors.

https://reviews.llvm.org/D98169


- Kai

Kai Wang via llvm-dev

Mar 30, 2021, 10:18:28 AM
to LLVM Dev, Craig Topper, Evandro Menezes, Roger Ferrer Ibanez, Alex Bradbury, Sander De Smalen
Hi all,

Here is an update on the RFC. We now use TypeSize for MemberOffsets. Since we only permit struct members that are all scalable types or all scalar types, TypeSize is sufficient for this purpose.

I uploaded some patches related to the RFC.

The use cases for the scalable struct types in RISC-V segment load/store builtins:

To use RecordType to model segment load/store types in Clang:

We do not know the exact size of scalable struct. Do not use memcpy for struct copy in Clang:

Do not use getelementptr for scalable struct types in Clang:

Please help us to review these patches. Thanks a lot.

- Kai