[llvm-dev] array fill idioms

Bagel via llvm-dev

unread,

Nov 10, 2016, 4:25:50 PM11/10/16

to llvm...@lists.llvm.org

I am asking for some collective wisdom/guidance.

What sort of IR construct should one use to implement filling each
element in an array (or vector) with the same value? In C++, this
might arise in "std:fill" or "std:fill_n", when the element values in the
vector are identical.
In the D language, one can fill an array or a slice of an array
by an assignment, e.g.
"A[2..10] = 42;"

1. What I would prefer is an explicit intrinsic, call it "llvm.fill.*" that
would work similar to the "llvm.memset.*" intrinsic. The memset intrinsic
only works with byte arrays, but provides wonderful optimizations in the
various code generators. Hopefully, these similar optimizations would be
implemented for "llvm.fill.*".

2. Given that I probably won't get my wish, I note that some front-ends use
vector assignment:
store <8 x i16> <i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16 42, i16
42>, <8 x i16>* %14, align 2
Does this work well for architectures without SIMD?
What chunk size should be used for the vector, and is that architecture
dependent?

3. If vectors are not used, but rather an explicit loop of stores,
element-by-element, will this be recognized as an idiom for
architecture-dependent optimizations?

Thanks in advance.
_______________________________________________
LLVM Developers mailing list
llvm...@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-dev

Mehdi Amini via llvm-dev

unread,

Nov 10, 2016, 4:30:29 PM11/10/16

to Bagel, llvm...@lists.llvm.org

Hi,

An alternative is to perform what is done for the equivalent C construct:

void foo() {
char bar[20] = “hello”;
}

->

@foo.bar = private unnamed_addr constant [20 x i8] c"hello\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", align 16
define void @foo() #0 {
%1 = alloca [20 x i8], align 16
%2 = bitcast [20 x i8]* %1 to i8*
call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* getelementptr inbounds ([20 x i8], [20 x i8]* @foo.bar, i32 0, i32 0), i64 20, i32 16, i1 false)
ret void
}

—
Mehdi

Mehdi Amini via llvm-dev

unread,

Nov 10, 2016, 4:53:53 PM11/10/16

to Mehdi Amini, llvm...@lists.llvm.org, Bagel

> On Nov 10, 2016, at 1:30 PM, Mehdi Amini via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Hi,
>
> An alternative is to perform what is done for the equivalent C construct:
>
> void foo() {
> char bar[20] = “hello”;
> }
>
> ->
>
> @foo.bar = private unnamed_addr constant [20 x i8] c"hello\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", align 16
> define void @foo() #0 {
> %1 = alloca [20 x i8], align 16
> %2 = bitcast [20 x i8]* %1 to i8*
> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* getelementptr inbounds ([20 x i8], [20 x i8]* @foo.bar, i32 0, i32 0), i64 20, i32 16, i1 false)
> ret void
> }
>
>

Obviously that isn’t great for patterns, and LLVM can recognize pattern that can fit memset_pattern on targets that supports it, which looks like closer to what you’d like I think.

Bagel via llvm-dev

unread,

Nov 10, 2016, 5:02:21 PM11/10/16

to Mehdi Amini, llvm...@lists.llvm.org

Yes, I know this works peachy keen for char arrays. I'm looking at (which is
hard to express in C) something like

void foo () {
int bar[20] = { 42, 42, ..., 42 };
}

I don't want to do a memcopy of the 20 element constant array, and memset
doesn't work here. I want an intrinsic that copys the scalar int constant 42
to each element of the int array.

bagel

On 11/10/2016 03:30 PM, Mehdi Amini wrote:
> Hi,
>
> An alternative is to perform what is done for the equivalent C construct:
>
> void foo() {
> char bar[20] = “hello”;
> }
>
> ->
>
> @foo.bar = private unnamed_addr constant [20 x i8] c"hello\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00", align 16
> define void @foo() #0 {
> %1 = alloca [20 x i8], align 16
> %2 = bitcast [20 x i8]* %1 to i8*
> call void @llvm.memcpy.p0i8.p0i8.i64(i8* %2, i8* getelementptr inbounds ([20 x i8], [20 x i8]* @foo.bar, i32 0, i32 0), i64 20, i32 16, i1 false)
> ret void
> }
>
>
> —
> Mehdi

_______________________________________________

Ryan Taylor via llvm-dev

unread,

Nov 10, 2016, 5:07:53 PM11/10/16

to Bagel, llvm-dev

Like a list initializer, but will do partials/slices?

Are you just looking to create an intrinsic that will generate a jump to a lib routine?

Robinson, Paul via llvm-dev

unread,

Nov 10, 2016, 5:15:43 PM11/10/16

to Ryan Taylor, Bagel, llvm...@lists.llvm.org

Back in the day, we called this a BLT (block transfer, pronouced 'blit') for the PDP-10 instruction of that name.

You can do this by assigning the first value, then do an overlapping memcpy to fill in the rest.

There's probably something clever you can do with vector instructions too, in many cases.

--paulr

Bagel via llvm-dev

unread,

Nov 10, 2016, 5:17:08 PM11/10/16

to Ryan Taylor, llvm-dev

On 11/10/2016 04:07 PM, Ryan Taylor wrote:
> Like a list initializer, but will do partials/slices?
>

More than just an initializer, and yes it must do partials/slices. More like
std:fill in C++;

> Are you just looking to create an intrinsic that will generate a jump to a lib
> routine?
>

If all else fails, then call a library routine. But if the number of elements
is "small" and suitably aligned, them perhaps several elements can be filled by
a simple load a constant and store.

> On Thu, Nov 10, 2016 at 5:02 PM, Bagel via llvm-dev <llvm...@lists.llvm.org
> <mailto:llvm...@lists.llvm.org>> wrote:
>
> Yes, I know this works peachy keen for char arrays. I'm looking at (which is
> hard to express in C) something like
>
> void foo () {
> int bar[20] = { 42, 42, ..., 42 };
> }
>
> I don't want to do a memcopy of the 20 element constant array, and memset
> doesn't work here. I want an intrinsic that copys the scalar int constant 42
> to each element of the int array.
>
> bagel

_______________________________________________

Stephen Canon via llvm-dev

unread,

Nov 10, 2016, 5:23:44 PM11/10/16

to Robinson, Paul, llvm...@lists.llvm.org, Bagel

On Nov 10, 2016, at 5:15 PM, Robinson, Paul via llvm-dev <llvm...@lists.llvm.org> wrote:

Back in the day, we called this a BLT (block transfer, pronouced 'blit') for the PDP-10 instruction of that name.

On x86, there is REP MOVS.

You can do this by assigning the first value, then do an overlapping memcpy to fill in the rest.

This is undefined behavior, and really doesn’t work with most modern memcpy( ) implementations (the most common thing these days is to simply implement memmove( ) semantics for memcpy, which will very much not do what you want).

– Steve

Robinson, Paul via llvm-dev

unread,

Nov 10, 2016, 6:02:45 PM11/10/16

to sca...@apple.com, llvm...@lists.llvm.org, Bagel

Right, not memcpy/memmove, sorry I should be more careful about that stuff.

REP MOVS looks like the right thing though. Does LLVM know how to produce that?

Thanks,

--paulr

From: sca...@apple.com [mailto:sca...@apple.com]
Sent: Thursday, November 10, 2016 2:24 PM
To: Robinson, Paul
Cc: Ryan Taylor; Bagel; llvm...@lists.llvm.org
Subject: Re: [llvm-dev] array fill idioms

On Nov 10, 2016, at 5:15 PM, Robinson, Paul via llvm-dev <llvm...@lists.llvm.org> wrote:

Bagel via llvm-dev

unread,

Nov 11, 2016, 1:56:47 PM11/11/16

to Robinson, Paul, llvm...@lists.llvm.org

While I am old enough to have programmed on the PDP-10, I didn't remember BLT.

However none of this answers my original questions. Something that produces
REP MOVS might be a step in the right direction. But what of other architectures?

bagel

On 11/10/2016 05:02 PM, Robinson, Paul wrote:
> Right, not memcpy/memmove, sorry I should be more careful about that stuff.
>
> REP MOVS looks like the right thing though. Does LLVM know how to produce that?
>
> Thanks,
>
> --paulr
>
>

_______________________________________________

Philip Reames via llvm-dev

unread,

Nov 25, 2016, 6:47:05 PM11/25/16

to Bagel, llvm...@lists.llvm.org

Take a look an memset (byte patterns), and memset_patternX (multi byte
patterns, only currently supported for selective targets). In general,
support for fill idioms is something we could stand to improve and it's
something I or someone on my team is likely to be working on within the
next year.

Today, the naive store loop is probably your best choice to have emitted
by the frontend. This loop will be nicely vectorized by the loop
vectorizer, specialized if the loop length is known to be small, and
otherwise decently handled. The only serious problem with this
implementation strategy is that you end up with many copies of the fill
loop scattered throughout your code (code bloat). (I'm assuming this
gets aggressively inlined. If it doesn't, well, then there are bigger
problems.)

Moving forward, any further support we added would definitely handle
pattern matching the naive loop constructs. Given that, it's also
reasonably future proof as well.

Philip

Bagel via llvm-dev

unread,

Nov 28, 2016, 12:38:17 PM11/28/16

to Philip Reames, llvm...@lists.llvm.org

Philip, thank you for your comments.

I already use memset for byte patterns, but was unaware of memset_patternX; I
will look into it. Based on your observations, I guess I will go ahead with
the naive store loop approach for now and hope for "llvm.fill.*" in the future.

Thanks, bagel.

On 11/25/2016 05:46 PM, Philip Reames wrote:
> Take a look an memset (byte patterns), and memset_patternX (multi byte
> patterns, only currently supported for selective targets). In general, support
> for fill idioms is something we could stand to improve and it's something I or
> someone on my team is likely to be working on within the next year.
>
> Today, the naive store loop is probably your best choice to have emitted by the
> frontend. This loop will be nicely vectorized by the loop vectorizer,
> specialized if the loop length is known to be small, and otherwise decently
> handled. The only serious problem with this implementation strategy is that
> you end up with many copies of the fill loop scattered throughout your code
> (code bloat). (I'm assuming this gets aggressively inlined. If it doesn't,
> well, then there are bigger problems.)
>
> Moving forward, any further support we added would definitely handle pattern
> matching the naive loop constructs. Given that, it's also reasonably future
> proof as well.
>
> Philip

_______________________________________________

Reply all

Reply to author

Forward