Re: [LLVMdev] [cfe-dev] SPIR provisional specification is now available in the Khronos website

Ouriel, Boaz

Sep 11, 2012, 7:00:59 PM

to James Molloy, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

Hi James,

Some additional comments regarding some of your questions:
Q: Is SPIR meant to be storage-only, or to allow optimizations to be done?

I agree with Micah that optimizing a SPIR module might make it less portable.
However, SPIR doesn't prohibit optimizations. It is up to the OpenCL optimizer to decide when to "materialize" SPIR to a device-specific LLVM module or even convert it to another IR.
It would be useful if we could identify areas in the specification that might break this assumption and discuss the observed limitations.
I think SPIR's offering would be stronger if optimizations could be performed safely.
So the answer to your question is: no, it is not just a storage-only format.

Q: So what's the advantage of adding a semantic-less calling convention over metadata?

Metadata could be used as well; in fact, this is the approach clang uses today.
1. However, since this is not a storage-only format, we felt that it would be useful to differentiate the calling conventions from the existing ones in LLVM.
2. Also, since SPIR kernels & functions are device- and OS-agnostic, no existing calling convention is really suitable for SPIR. Hence, we chose to introduce new ones.
3. Another, smaller reason in favor of a new calling convention vs. metadata is that metadata can't be attached to functions in LLVM. This makes the metadata arrangement a bit more complex and less trivial for OpenCL optimizers to access.

Does this make sense? Do you see an issue with adding the suggested calling conventions?

Thanks,
Boaz

-----Original Message-----
From: Villmow, Micah [mailto:Micah....@amd.com]
Sent: Wednesday, September 12, 2012 00:03
To: James Molloy
Cc: James Molloy; Ouriel, Boaz; cfe...@cs.uiuc.edu; llv...@cs.uiuc.edu
Subject: RE: [cfe-dev] [LLVMdev] SPIR provisional specification is now available in the Khronos website

> -----Original Message-----
> From: mankey...@gmail.com [mailto:mankey...@gmail.com] On Behalf
> Of James Molloy
> Sent: Tuesday, September 11, 2012 1:45 PM
> To: Villmow, Micah
> Cc: James Molloy; Ouriel, Boaz; cfe...@cs.uiuc.edu; llv...@cs.uiuc.edu
> Subject: Re: [cfe-dev] [LLVMdev] SPIR provisional specification is now
> available in the Khronos website
>
> Hi Micah,
>
> >> (a) You mention special calling conventions and adding them to LLVM.
> >> What are their semantics? And what is their purpose?
> > [Villmow, Micah] One purpose is to differentiate between kernel and
> > device functions.
> > Another is to differentiate between the standard calling conventions
> > that have device specific assumptions built into them.
>
> Do you have an example of such a device-specific assumption? Why
> wouldn't the default (no explicit) calling convention do for a storage-
> only format like SPIR?

[Villmow, Micah] It is my understanding, and correct me if I'm wrong, that clang
generates code based on the calling convention and the device. We don't want
any assumptions made about the device calling convention until SPIR loading time, and
the cleanest way to do that is to introduce our own. Also, the default calling convention in LLVM is C, which
does not have the same semantics as ours (for example, varargs is illegal except for printf, and kernels and functions are different).
>
> When it does come to codegen (for a CPU target), an LLVM backend would
> be forced to change the calling convention back to something standard
> anyway. So what's the advantage of adding a semanticless calling
> convention over metadata?
[Villmow, Micah] I wouldn't say it is semantic-less, just that its semantics are different from those of the calling conventions that LLVM currently supports.
>
> >> (b) Why disallow type conversion for vector types? (ss. 3.3)
> > [Villmow, Micah] Type conversions in OpenCL between vector types are
> > done via built-in functions and not via implicit conversions, so there
> > is no OpenCL code that can generate these conversions directly (OpenCL
> > spec 6.2.1). In order to be portable, library functions cannot be
> > lowered to their IR equivalent until after the device is known.
>
> Is SPIR meant to be storage-only, or to allow optimisations to be done
> on it (valid SPIR -> opt -> valid SPIR)?
[Villmow, Micah] While you can optimize SPIR, you run the risk of reducing portability by optimizing in a non-portable manner. The SPIR spec does not specify how the SPIR is generated or what is done to the binary format before generation, only what is and isn't valid. It is quite possible to generate valid SPIR that is non-portable, but in that case, there is no reason to use SPIR.
>
> If you allow scalar type conversions, and you allow vectors, it follows
> that an optimiser may turn scalar type conversions into vector type
> conversions. Why explicitly disallow this even though there is no
> corollary directly from CL-C source code?
[Villmow, Micah] SPIR in its current form is limited to OpenCL C (being an OpenCL extension). So things that are disallowed in OpenCL C are disallowed in SPIR at this time.
>
> Cheers,
>
> James
>
> On 11 September 2012 16:54, Villmow, Micah <Micah....@amd.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: llvmdev...@cs.uiuc.edu
> >> [mailto:llvmdev...@cs.uiuc.edu]
> >> On Behalf Of James Molloy
> >> Sent: Tuesday, September 11, 2012 8:49 AM
> >> To: Ouriel, Boaz
> >> Cc: cfe...@cs.uiuc.edu; llv...@cs.uiuc.edu
> >> Subject: Re: [LLVMdev] SPIR provisional specification is now
> >> available in the Khronos website
> >>
> >> Hi Boaz,
> >>
> >> I have a couple of specific questions:
> >>
> >> (a) You mention special calling conventions and adding them to LLVM.
> >> What are their semantics? And what is their purpose?
> > [Villmow, Micah] One purpose is to differentiate between kernel and
> > device functions.
> > Another is to differentiate between the standard calling conventions
> > that have device specific assumptions built into them.
> >>
> >> (b) Why disallow type conversion for vector types? (ss. 3.3)
> > [Villmow, Micah] Type conversions in OpenCL between vector types are
> > done via built-in functions and not via implicit conversions, so there
> > is no OpenCL code that can generate these conversions directly (OpenCL
> > spec 6.2.1). In order to be portable, library functions cannot be
> > lowered to their IR equivalent until after the device is known.
> >>
> >> Cheers,
> >>
> >> James
> >>
> >> On Tue, 2012-09-11 at 12:56 +0100, Ouriel, Boaz wrote:
> >> > Hi All,
> >> >
> >> > In continuation of the previous SPIR introduction email, here is a link to the specification:
> >> > http://www.khronos.org/registry/cl/specs/spir_spec-1.0-provisional.pdf
> >> >
> >> > The first topic we would like to discuss is "SPIR portability".
> >> > I will soon send an additional mail to help lead the discussion on this topic.
> >> >
> >> > Thanks and happy reading,
> >> > Boaz
> >> >
> >> > -----Original Message-----
> >> > From: llvmdev...@cs.uiuc.edu
> >> > [mailto:llvmdev...@cs.uiuc.edu]
> >> > On Behalf Of Ouriel, Boaz
> >> > Sent: Thursday, September 06, 2012 22:06
> >> > To: cfe...@cs.uiuc.edu; llv...@cs.uiuc.edu
> >> > Subject: [LLVMdev] "SPIR" - A Standard Portable IR for OpenCL
> >> > Kernel Language
> >> >
> >> > Greetings All,
> >> > I am sending this mail on behalf of the OpenCL Khronos members.
> >> >
> >> > **** Introduction ****
> >> > Khronos has recently ratified a new provisional specification called SPIR.
> >> > This specification standardizes an intermediate representation for the OpenCL kernel language.
> >> > It is based on the LLVM infrastructure, which is why I am sending this mail to the LLVM mailing list.
> >> > Khronos members would like to initiate a review of the specification with the LLVM community.
> >> >
> >> > **** What is SPIR? ****
> >> > The SPIR specification standardizes an intermediate representation for OpenCL programs, which a hypothetical frontend can target to generate binaries that can be consumed and executed by OpenCL drivers supporting SPIR.
> >> > The SPIR specification, however, does not standardize the design and implementation of such a frontend.
> >> >
> >> > **** SPIR and LLVM ****
> >> > Khronos members chose SPIR to be layered on top of LLVM.
> >> > Why? Portability is a key goal of SPIR, and LLVM has proven to be highly portable, given its many backends.
> >> > Defining a robust IR for OpenCL from scratch is difficult and requires skills which are not the core competency of the OpenCL Khronos members.
> >> > In addition, once the IR is defined, implementing the necessary SW stack around it is a huge investment. LLVM thus provides a time-to-market advantage for SPIR.
> >> > Today, many OpenCL vendors base their technology on LLVM.
> >> > This makes LLVM IR the de facto OpenCL IR and the immediate candidate to be considered by the Khronos members.
> >> > An analysis showed that LLVM IR has its limitations but in general provides a very good solution for SPIR.
> >> >
> >> > **** Minimal Changes to LLVM ****
> >> > When defining SPIR, Khronos set a goal to keep the changes in LLVM minimal.
> >> > Most of the changes made during prototyping were in the frontends that the different OpenCL Khronos members used.
> >> > The only changes SPIR requires in LLVM are a new target for SPIR, a new calling convention for regular OpenCL functions, and another one for OpenCL kernels.
> >> > The LLVM IR language definition remains unmodified.
> >> >
> >> > **** Why is SPIR important for OpenCL? ****
> >> > SPIR offers binary portability between OpenCL implementations, and a stable target for 3rd party compilers without having to go through OpenCL "C".
> >> >
> >> > Binary compatibility simplifies the support burden for developers delivering applications that use OpenCL.
> >> > The same application can be delivered in fully binary form and work across existing and future OpenCL implementations supporting SPIR.
> >> > This helps the entire OpenCL ecosystem.
> >> >
> >> > Generally speaking, OpenCL is a JIT environment and as such deserves and requires an intermediate representation, as other major JIT environments already have.
> >> >
> >> > Also, some developers using OpenCL have requested portability at the binary level. Today OpenCL offers portability only at the source level, with OpenCL "C".
> >> > They are concerned with protecting their IP by meeting "Digital Millennium Copyright Act" requirements.
> >> > Today, those companies are forced to distribute their OpenCL code using device-specific binaries. This leads to many difficulties for SW developers and end users.
> >> > In addition, the binaries are not guaranteed to keep working as new devices and vendors appear in the market.
> >> > This constraint places the OpenCL standard at a disadvantage compared to other standards which already have a portable binary distribution form.
> >> > From discussions with some of the companies which raised the request, LLVM IR meets their requirements.
> >> > SPIR doesn't guarantee any security / obfuscation mechanisms. It just provides a portable IR definition.
> >> >
> >> > Khronos members also believe that SPIR will enable data-parallel domain-specific languages which will generate SPIR directly and execute on top of OpenCL runtimes.
> >> >
> >> > **** SPIR Portability vs. OpenCL "C" ****
> >> > Portability is one of SPIR's goals. However, SPIR does not attempt to solve inherent portability issues which exist in OpenCL "C" or in C99.
> >> > It is clear that OpenCL programs could be written in a way which makes them non-portable and very device-specific.
> >> > Such programs will never be portable. In addition, some corner-case scenarios which have been identified by Khronos have been disallowed in SPIR.
> >> > SPIR does not guarantee performance portability across devices. This is also true for OpenCL "C".
> >> >
> >> > **** Is this the final version of SPIR specification (set in stone?) ****
> >> > The short answer is "NO", it is not final.
> >> >
> >> > Throughout the definition stage of SPIR, Khronos had the goal of reviewing this proposal and collecting feedback on its content with the LLVM community.
> >> > This feedback is not a "nice to have" but rather "a must have".
> >> > So why didn't we define the specification with the community right from the start? The answer has two aspects.
> >> > The first is that Khronos members wanted to do their homework and make sure that the proposal is mature enough to start discussions based on it.
> >> > The due diligence includes full implementation of the specification by a few members within Khronos.
> >> > The second aspect is the legal part, which prevented Khronos from sharing this information publicly until the specification was ratified inside Khronos.
> >> > The current version of the SPIR specification which is shared with the LLVM community is a provisional specification.
> >> > The main goal of this version of the specification is to collect feedback from the LLVM community, apply the changes and shape the specification into its final version.
> >> >
> >> > **** Suggested review process ****
> >> > SPIR introduces an intermediate language for OpenCL and hence is a very large specification with many details and a lot of topics to discuss.
> >> > Khronos will share the specification with the LLVM community as a reference.
> >> > However, Khronos believes that the right approach is to review it in parts: by peeling the different layers and aspects of the specification, layer by layer (the "onion" way), going from top to bottom and topic by topic.
> >> >
> >> > Each such topic would be contained in an email thread on the LLVM mailing list. Since the SPIR specification deals with the "HOW" and not with the "WHY", each topic will be associated with a short document that aims at providing insights into the considerations and goals behind the way it was defined in the SPIR specification.
> >> > Some of the discussions will be accompanied by pieces of code in CLANG or LLVM that demonstrate what has been implemented by Khronos members.
> >> > A successful discussion would result in a decision acceptable to both the LLVM community and Khronos.
> >> > We expect that many discussions will move to LLVM Bugzilla for resolution. This should improve convergence.
> >> >
> >> > We do not want to fork LLVM. We plan to evolve SPIR in response to LLVM community feedback.
> >> > In addition, where applicable, Khronos members would like to upstream the relevant changes to LLVM and not wait for the entire review of the specification to be completed.
> >> > Khronos members do realize that applying changes to the LLVM code will not always be possible, since some discussions depend on other discussions.
> >> >
> >> > Why not review the entire specification as a whole? Doing the review on the entire specification would make the discussions unfocused and difficult to track.
> >> > We expect discussions will be more effective and converge better with a piecemeal approach.
> >> > That being said, we will try to keep the proposal coherent at a high level.
> >> >
> >> > **** clang as a sample OpenCL SPIR generator ****
> >> > Even though SPIR does not standardize the generation process, the Khronos working group would like clang to eventually become the sample OpenCL SPIR generator.
> >> > So why only make it a sample generator? Khronos wanted to permit the different OpenCL vendors to choose their own frontend technology and not require them to use CLANG.
> >> >
> >> > Also, we avoid using clang as a reference generator because any discrepancy between SPIR outputs generated by clang and the SPIR spec will be resolved in favor of the spec.
> >> > That is, implementers of other SPIR generators would not be required to maintain bug compatibility with clang.
> >> >
> >> > **** Suggested Topics to discuss ****
> >> >
> >> > This is the list of suggested topics to discuss:
> >> > 1. SPIR specification introduction and scope (this mail)
> >> > 2. SPIR Portability
> >> >    a. 32 / 64-bit architectures (pointers, size_t, ptrdiff_t, intptr_t, uintptr_t)
> >> >    b. Endianness in OpenCL "C"
> >> > 3. OpenCL built-ins in SPIR
> >> >    a. OpenCL Built-ins & LLVM Built-ins
> >> >    b. Name Mangling
> >> > 4. OpenCL Metadata Arrangement
> >> > 5. OpenCL Specific items
> >> >    a. OpenCL special data types (events, images, samplers) as opaque data types
> >> >    b. Null and zeroinitializer
> >> >    c. Local Memory and alloca's
> >> >    d. Others
> >> >
> >> > **** Where can I find SPIR specification? ****
> >> > Khronos is now working on making the SPIR specification available through the Khronos website.
> >> > Once available, we will send a link to the document in this mailing list.
> >> >
> >> > I am sure this is going to be a lot of fun :)
> >> > Boaz
> >> > -------------------------------------------------------------------
> >> > --
> >> > Intel Israel (74) Limited
> >> >
> >> > This e-mail and any attachments may contain confidential material
> >> > for the sole use of the intended recipient(s). Any review or
> >> > distribution by others is strictly prohibited. If you are not the
> >> > intended recipient, please contact the sender and delete all
> copies.
> >> >
> >> >
> >> > _______________________________________________
> >> > LLVM Developers mailing list
> >> > LLV...@cs.uiuc.edu http://llvm.cs.uiuc.edu
> >> > http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev
> >> >
> >
> >
> >
> > _______________________________________________
> > cfe-dev mailing list
> > cfe...@cs.uiuc.edu
> > http://lists.cs.uiuc.edu/mailman/listinfo/cfe-dev

James Molloy

Sep 12, 2012, 4:07:34 AM

to Ouriel, Boaz, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

Hi Boaz, Micah,

Thanks for the followup.

> I agree with Micah that optimizing a SPIR module might make it less portable.
> However, SPIR doesn't prohibit optimizations. It is up to the OpenCL optimizer to decide when to "materialize" SPIR to a device specific LLVM module or even convert it to another IR.
> It would be useful if we could identify areas in the specification that might break this assumption and discuss the observed limitations.
> I think SPIR's offering would be stronger if optimizations could be performed safely.
> So the answer to your question is: no, it is not just a storage-only format.

My only issue, then, is this: if it is not specifically designed to be a
storage-only format, why restrict the allowable instructions to the
strict set that can be generated by an OpenCL-C frontend? This would
make sense for a storage-only format, but as soon as you let
optimizers loose on it, they will by definition change the IR. The
transformation from scalar extractelement/fptoint/insertelement to a
vector fptoint is one concrete example I noticed when perusing the
spec, where a valid and target-independent optimisation could result
in invalid SPIR. I was wondering what the benefit is of restricting the
allowable instructions so strongly.

> Q: So what's the advantage of adding a semantic-less calling convention over metadata?
>
> Metadata could be used as well, and in fact - this is the current approach used by clang today.
> 1. However, since this is not a storage-only format, we felt that it would be useful to differentiate the calling conventions from the existing ones in LLVM.
> 2. Also, since SPIR kernels & functions are device and OS agnostic no calling convention is really suitable for SPIR. Hence, we chose to introduce new ones.
> 3. Another smaller reason in favor of a new calling convention vs. metadata is the fact that metadata can't be associated to functions in LLVM. This makes the metadata arrangement a bit more complex and less trivial to access by OpenCL optimizers.

My worry here (and please take this with a pinch of salt, because I am
by no means one of the core LLVM developers) is that calling
conventions in the IR are primarily related to code generation. As
SPIR isn't going to be used for codegen (SPIR is passed to some backend /
converter that is responsible for that), why worry about the calling
convention? It is just a wart; it doesn't affect midend phases at
all.

I should note that passing structs (byval or not) in the first
parameter as an sret argument is *independent of the LLVM calling
convention*, and is done at the Clang level. This is part of
LLVM/Clang where responsibility for adhering to an ABI overlaps
between the IR-generator and the IR itself. Adding a new calling
convention in IR and marking functions with it won't stop the IR
generator having to make ABI decisions.

Not only that, but for valid codegen any backend is going to have to
remove those calling convention markers anyway and replace them with
their own, so why have them in the first place?

I've rambled slightly, sorry about that!

Ouriel, Boaz

Sep 12, 2012, 2:54:00 PM

to James Molloy, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

Hi James,

This is very good feedback.

1. Adding the new calling conventions - It seems like the appropriate thing to do vs. metadata. Some OpenCL backends can choose to implement this calling convention and use it during code generation of OpenCL functions/kernels. Can we agree on this item?
2. Restricting the allowable instructions - As Micah mentioned before, the restrictions are there because we are only looking at this for OpenCL at this time. Hence, we currently only map what OpenCL supports. However, I agree that this might make some optimizations incompatible with SPIR. Let me discuss this a bit further in Khronos and come back with additional feedback.

James Molloy

Sep 12, 2012, 3:17:41 PM

to Ouriel, Boaz, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

Hi Boaz, David,

Thanks for taking my responses on board.

> 1. Adding the new calling conventions - It seems like the appropriate thing to do vs. metadata. Some OpenCL backends can choose to implement this calling convention and use it during code generation of OpenCL functions/kernels. Can we agree on this item?

Hmm, this is the one I was most shaky on. I still don't fully
understand what you're using this calling convention for, other than:

1. A marker to distinguish kernels from regular functions, in which
case there are better ways to do that (metadata)
2. A way to remove remnants of the platform or C calling convention.
In this case, I've mentioned that there are things such as
pass-struct-by-pointer that are lowered in the IR-generation stage
that you haven't addressed. Without addressing this (the
IR-generator's part of the ABI compliance contract), I'm not sure how
a new LLVM calling convention helps remove platform dependence.

Although I may have missed a/the point somewhere along the line, I've
been a bit ill in recent days and not fully engaged, brain-wise :)

Cheers,

James

Villmow, Micah

Sep 12, 2012, 3:30:08 PM

to James Molloy, Ouriel, Boaz, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

> -----Original Message-----
> From: mankey...@gmail.com [mailto:mankey...@gmail.com] On Behalf
> Of James Molloy
> Sent: Wednesday, September 12, 2012 12:18 PM
> To: Ouriel, Boaz
> Cc: cfe...@cs.uiuc.edu; llv...@cs.uiuc.edu; Villmow, Micah
> Subject: Re: [cfe-dev] [LLVMdev] SPIR provisional specification is now
> available in the Khronos website
>

> Hi Boaz, David,
>
> Thanks for taking my responses on board.
>
> > 1. Adding the new calling conventions - It seems like the appropriate
> > thing to do vs. metadata. Some OpenCL backends can choose to implement
> > this calling convention and use it during code generation of OpenCL
> > functions/kernels. Can we agree on this item?
>
> Hmm, this is the one I was most shaky on. I still don't fully
> understand what you're using this calling convention for, other than:
>
> 1. A marker to distinguish kernels from regular functions, in which
> case there are better ways to do that (metadata)

[Villmow, Micah] I disagree; the 'kernel' keyword specifies a different calling convention from functions that don't have it, so we need a calling convention that maps to that. The ABI for a kernel is different from the ABI for a non-kernel function. A regular function is also different from the default calling convention because it has restrictions on it (one being that it can only be called from a kernel function or another device function) that the normal calling convention does not have. A way to think about this is that it is more similar to the PTX_[Kernel|Device] calling conventions than to the default, fast or stdcall conventions. I don't think metadata is the right approach here, since we are specifying unique ways in which these functions can be called.

> 2. A way to remove remnants of the platform or C calling convention.
> In this case, I've mentioned that there are things such as
> pass-struct-by-pointer that are lowered in the IR-generation stage
> that you haven't addressed. Without addressing this (the
> IR-generator's part of the ABI compliance contract), I'm not sure how
> a new LLVM calling convention helps remove platform dependence.

[Villmow, Micah] Ok, fair point, this is something that the specification should probably address.

Pekka Jääskeläinen

Sep 13, 2012, 5:19:29 AM

to Villmow, Micah, James Molloy, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

On 09/12/2012 10:30 PM, Villmow, Micah wrote:
>> case there are better ways to do that (metadata)
> [Villmow, Micah] I disagree, the 'kernel' keyword specifies a different
> calling convention from functions that don't have it. So we need a calling
> convention that maps to that. The ABI for a kernel is different than the
> ABI for a non-kernel function. The regular function itself also is
> different than the default calling convention because it has restrictions
> on it(one being it can only be called from a kernel function or another
> device function) that the normal calling convention does not. A way to
> think about this is that this is more similar to the PTX_[Kernel|Device]
> calling conventions than the default, fast or stdcall conventions. I don't
> think metadata is the right approach here since we are specifying unique
> ways for how these functions can be called.

For what it's worth, this issue manifests itself in an unsolved pocl
bug: https://bugs.launchpad.net/pocl/+bug/987905

It would be simpler to write a portable implementation for calling the
kernel from the host if we could assume that the kernel calling convention
mapped each OpenCL setArg argument to a single kernel function argument (and
preferably kept all argument data in memory). For non-kernel functions it
should not matter, and the convention could be target-specific.

--
Pekka

Pekka Jääskeläinen

Sep 24, 2012, 7:41:18 AM

to Villmow, Micah, James Molloy, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

Hi all,

Another OpenCL C implementation issue I'm currently fighting with is how
best to implement automatic __local variables. It seems SPIR enforces
the current Clang implementation of them, which converts the automatic
locals to C function-static variables (thus, in practice, global variables).

Clearly, this is not a thread-safe "final implementation"; it works as-is
only when multiple work groups of the same kernel are not executed in
parallel. Therefore, some other compiler pass is assumed to convert those
function-static variables (module global variables) to some other storage
where the local buffers are allocated per work-group thread.

The pocl implementation is what was suggested some time ago on this list:
the locals are converted to local arguments of the kernel function, which
are then passed per-thread buffers when the work group is executed. Thus,
pocl needs to convert the references to these dummy globals into local
buffer pointers appended to the end of the kernel function argument list.

The problem with using "semantically inadequate" function-static
variables for the local buffers is that LLVM/Clang treats them as buffers
with a constant base, which they eventually won't be in a parallel WG
implementation. This triggers an issue I'm currently working on in
pocl: https://bugs.launchpad.net/pocl/+bug/1032203 because Clang generates
constant GEPs for the local buffer accesses (even though in a parallel
thread-safe implementation the local variables cannot be stored at
constant locations).

So I wonder whether this piece of the SPIR spec might cause other similar
problems (LLVM optimizing incorrectly due to the slightly wrong semantics)
in the future and should be improved. The minimal fix would be
to add some kind of attribute to the function-static global that prevents
Clang/LLVM from treating the address as constant and applying optimizations
that rely on that. Semantically, the local buffer is actually a thread-local
variable. Are thread locals somehow supported in LLVM IR?

James Molloy

Sep 24, 2012, 9:41:57 AM

to Pekka Jääskeläinen, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

Hi,

I don't fully understand your problem description.

> ...is caused by LLVM/Clang thinking
> they are buffers with a constant base which they eventually won't be in
> a parallel WG implementation. This triggers an issue I'm currently working on
> pocl: https://bugs.launchpad.net/pocl/+bug/1032203 because Clang generates
> constant GEPs for the local buffer accesses (even though in a parallel
> thread-safe implementation the local variables cannot be stored to
> constant locations).

Surely if you're passing in pointers to the kernel function that differ depending on workgroup, then a GEP from those pointers of a constant amount is perfectly safe. Why would a constant GEP from a per-workgroup base be a problem?

I'm sure there's something I've misunderstood about your solution...

Cheers,

James

James Molloy

Sep 24, 2012, 10:00:21 AM

to Pekka Jääskeläinen, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

Hi,

Sorry, with a prod from Silviu (cc'd) I now understand: I was interpreting your use of "constant GEP" as "GEP with a constant operand" as opposed to "ConstantGEP node", which of course can only take a Constant* operand, not a Value* operand.

I now fully see the problem and realise that my solution is also prone to that problem :)

Cheers,

James

Pekka Jääskeläinen

Sep 24, 2012, 10:08:22 AM

to James Molloy, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

Well,

To be honest I'm not very comfortable with the whole constant GEP
idea. It's a new thing to me and I do not fully understand its
point in LLVM IR, so I probably wasn't very clear ;)

Anyway, I brought it up as an example of what can happen
if one (mis)uses C function-static variable semantics for something that
really is a thread-local variable (in the usual thread-parallel implementations).

For the record, I just worked around it in pocl by borrowing the
BreakConstantGEPs code from SAFECode. But for the SPIR spec, IMHO, this should
be reconsidered.
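The idea behind that kind of workaround, as I understand it, is to turn each constant expression operand into an equivalent Instruction placed before its user; afterwards the global is used only by Instructions and can be replaced by a run-time value. The sketch below is a hypothetical toy-IR model of that idea in Python, not SAFECode's actual code; it generalizes to every ConstantExpr kind, duplicates the expression per use, and does not handle PHI nodes (which would need the new instruction in the incoming block):

```python
from dataclasses import dataclass, field


# Toy IR (hypothetical stand-ins, not the LLVM C++ classes).
@dataclass(eq=False)
class Value:
    name: str


@dataclass(eq=False)
class Constant(Value):
    pass


@dataclass(eq=False)
class GlobalVariable(Constant):
    pass


@dataclass(eq=False)
class ConstantExpr(Constant):
    opcode: str = ""
    operands: list = field(default_factory=list)


@dataclass(eq=False)
class Instruction(Value):
    opcode: str = ""
    operands: list = field(default_factory=list)


def break_constant_exprs(block):
    """Rewrite every ConstantExpr operand as an Instruction inserted before
    its user, so the globals it references end up used only by Instructions."""
    out = []
    counter = 0

    def materialize(v):
        nonlocal counter
        if not isinstance(v, ConstantExpr):
            return v
        ops = [materialize(op) for op in v.operands]  # nested exprs first
        counter += 1
        inst = Instruction(f"{v.name}.inst{counter}", v.opcode, ops)
        out.append(inst)
        return inst

    for inst in block:
        inst.operands = [materialize(op) for op in inst.operands]
        out.append(inst)
    block[:] = out
    return block


local_buf = GlobalVariable("local_buf")
gep = ConstantExpr("gep", "getelementptr", [local_buf, Constant("0"), Constant("4")])
block = [Instruction("x", "load", [gep])]
break_constant_exprs(block)
print([(i.name, i.opcode) for i in block])
# [('gep.inst1', 'getelementptr'), ('x', 'load')]
```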

James Molloy

Sep 24, 2012, 11:04:08 AM

to Pekka Jääskeläinen, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

> For the record, I just worked around it in pocl by borrowing the
> BreakConstantGEPs code from SAFECode. But for the SPIR spec, IMHO, this should
> be reconsidered.

Yes, I agree.

James Molloy

Sep 26, 2012, 4:06:03 AM

to Pekka Jääskeläinen, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

Micah, Boaz,

Do you guys have any ideas about how to fix this issue?

Cheers,

James

Villmow, Micah

Sep 26, 2012, 1:21:46 PM

to James Molloy, Pekka Jääskeläinen, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

It is my view that this is an implementation detail and not an issue with the SPIR spec. As SPIR is just a representation of a program in a portable manner, it is up to the consumer of SPIR to correctly set up the kernels based on the device's calling convention/ABI when the SPIR binary is loaded for that specific device.

From: mankey...@gmail.com [mailto:mankey...@gmail.com] On Behalf Of James Molloy
Sent: Wednesday, September 26, 2012 1:06 AM
To: Pekka Jääskeläinen
Cc: Villmow, Micah; Ouriel, Boaz; cfe...@cs.uiuc.edu; llv...@cs.uiuc.edu; pocl-...@lists.sourceforge.net
Subject: Re: [cfe-dev] [LLVMdev] SPIR provisional specification is now available in the Khronos website

Pekka Jääskeläinen

Sep 27, 2012, 3:41:29 AM

to Villmow, Micah, James Molloy, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

On 09/26/2012 08:21 PM, Villmow, Micah wrote:
> It is my view that this is an implementation detail and not an issue
> with the SPIR spec. As SPIR is just a representation of a program in a
> portable manner, it is up to the consumer of SPIR to correctly set up
> the kernels based on the device's calling convention/ABI when the SPIR
> binary is loaded for that specific device.

The question was not about implementing the automatic locals (which is
a device-specific detail, as you correctly state), but about SPIR mandating an
LLVM IR representation for the automatic locals that can lead to illegal
optimizations, because global variables have inadequate semantics for this use.

If SPIR mandates this type of bitcode for the automatic locals, then when
such optimizations do happen (they might be beneficial in
general, so they cannot simply be disabled because of the SPIR flaw), the
implementers have to work around them with kludges to implement the real
automatic local semantics. What's worse, at some point there might be
an optimization that is not easily worked around, which would make this part
of the SPIR spec look bad.

Of course, it's possible that the ConstantGEP case is the only problem we will
ever get from this issue, but I wouldn't rely on that in an IR standard
specification if it can be avoided.

BR,

James Molloy

Sep 27, 2012, 9:42:26 AM

to Pekka Jääskeläinen, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu

Indeed; I agree it is not an implementation detail, as valid SPIR could potentially be almost untranslatable into valid code for a target without dedicated workgroup-local memory.

Micah: the problem can be distilled down to this: __local variables in SPIR are represented as Constants (GlobalVariable : public Constant), but on a device with no workgroup-local memory they are not in fact constant.

So, as the specification stands, it is valid SPIR to manipulate __local variables as Constants in a way that is extremely difficult to undo. That is, in order to transform SPIR into code that can run on a CPU, the GlobalVariable (which is a subclass of Constant) must be replaced with a dynamically calculated Value (which is not a subclass of Constant).

The GlobalVariable can be used in ConstantExprs (of which there are many valid kinds), and converting ConstantExprs to their Instruction counterparts is very difficult in the general case.
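A toy Python model of that class-hierarchy constraint, using simplified stand-ins for LLVM's Value/Constant/GlobalVariable/ConstantExpr (not the real C++ API): because a ConstantExpr may only have Constant operands, the global cannot simply be swapped for a run-time value; each ConstantExpr that uses it has to become an Instruction first.

```python
# Toy model (hypothetical): simplified stand-ins for LLVM's class hierarchy.
class Value:
    def __init__(self, name):
        self.name = name


class Constant(Value):
    pass


class GlobalVariable(Constant):  # as in LLVM, a global *is* a Constant
    pass


class Argument(Value):  # a run-time value, e.g. a per-work-group base pointer
    pass


class ConstantExpr(Constant):
    def __init__(self, name, opcode, operands):
        # Mirrors the LLVM rule that ConstantExpr operands are Constants.
        if not all(isinstance(op, Constant) for op in operands):
            raise TypeError(f"{opcode}: ConstantExpr operands must be Constants")
        super().__init__(name)
        self.opcode = opcode
        self.operands = list(operands)


def rebuild(expr, old, new):
    """Try to rebuild `expr` with `old` replaced by `new` (one level only)."""
    ops = [new if op is old else op for op in expr.operands]
    return ConstantExpr(expr.name, expr.opcode, ops)


local_buf = GlobalVariable("local_buf")
as_int = ConstantExpr("as_int", "ptrtoint", [local_buf])

# Swapping in another Constant is fine...
print(rebuild(as_int, local_buf, GlobalVariable("other")).operands[0].name)  # other

# ...but swapping in the run-time per-WG base is not.
try:
    rebuild(as_int, local_buf, Argument("wg_local_base"))
except TypeError as err:
    print(err)  # ptrtoint: ConstantExpr operands must be Constants
```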

Cheers,

James

Carlos Sánchez de La Lama

Sep 28, 2012, 3:34:10 AM

to James Molloy, llv...@cs.uiuc.edu, cfe...@cs.uiuc.edu, pocl-...@lists.sourceforge.net

Hi guys,

> So it is valid SPIR, as the specification stands, to manipulate __local
> variables as Constants in a way that is extremely difficult to undo. That
> is, in order to transform SPIR to code that can run on a CPU, the
> GlobalVariable (which is a subclass of Constant) must be replaced with a
> dynamically calculated Value (which is not a subclass of constant).

What about translating automatic locals to function-scope pointers?
This would make the handling of automatic locals and local pointer
arguments similar, which is desirable as they are just two ways to
describe the same thing (I understand automatic locals as just a
simpler way to use local buffers than local arguments).

In fact, pocl converts automatic locals to implicit "extra" kernel
arguments and manages both cases the same way.
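A minimal sketch of that kind of transformation, assuming a toy kernel representation (the `Kernel` type, `locals_to_args`, `launch_args` and the bump allocator are all hypothetical, not pocl's API): each automatic local becomes an implicit trailing __local pointer parameter, and the launcher allocates a fresh buffer for it per work-group, just as it would for an explicit local argument.

```python
from dataclasses import dataclass, field


@dataclass
class Kernel:
    name: str
    params: list  # (name, address_space) pairs
    automatic_locals: dict = field(default_factory=dict)  # name -> size in bytes


def locals_to_args(kernel):
    """Hypothetical compiler pass: turn each automatic __local variable into an
    implicit trailing __local pointer parameter. Returns the new kernel and
    the sizes the runtime must allocate for the implicit parameters."""
    extra = [(name, "local") for name in kernel.automatic_locals]
    return Kernel(kernel.name, kernel.params + extra), dict(kernel.automatic_locals)


def launch_args(user_args, implicit_sizes, allocate):
    """Hypothetical runtime side: one fresh local buffer per implicit
    parameter, allocated per work-group like an explicit local argument."""
    return list(user_args) + [allocate(size) for size in implicit_sizes.values()]


def make_bump_allocator(start):
    """Trivial allocator standing in for the device's local memory."""
    next_addr = start

    def alloc(size):
        nonlocal next_addr
        addr, next_addr = next_addr, next_addr + size
        return addr

    return alloc


k = Kernel("reduce", [("in", "global"), ("out", "global")], {"scratch": 256})
k2, sizes = locals_to_args(k)
print(k2.params)  # [('in', 'global'), ('out', 'global'), ('scratch', 'local')]

alloc = make_bump_allocator(0x8000)
print(launch_args(["in_ptr", "out_ptr"], sizes, alloc))  # ['in_ptr', 'out_ptr', 32768]
print(launch_args(["in_ptr", "out_ptr"], sizes, alloc))  # ['in_ptr', 'out_ptr', 33024]
```

The point of the design is that the runtime then sees only one mechanism (local pointer arguments), at the cost of the extra runtime-side bookkeeping Micah mentions below.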

Carlos

Villmow, Micah

Sep 28, 2012, 11:48:39 AM

to Carlos Sánchez de La Lama, James Molloy, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu, pocl-...@lists.sourceforge.net

Carlos,
AMD's OpenCL implementation (both CPU and GPU) has worked for years with the way SPIR represents locals. If there are problems with the representation, then it is an implementation issue. One of the issues with using extra kernel arguments is that it requires extra validation and complexity at the runtime level that is not needed if it is handled internally by the compiler. That being said, both ways of doing it are equally valid, but the choice of which way to do it is an implementation decision. I don't think it would be that difficult to lower global variables to function arguments given the SPIR representation.

Micah

> -----Original Message-----
> From: Carlos Sánchez de La Lama [mailto:csanc...@gmail.com]
> Sent: Friday, September 28, 2012 12:34 AM
> To: James Molloy
> Cc: Pekka Jääskeläinen; Ouriel, Boaz; pocl-...@lists.sourceforge.net;
> Villmow, Micah; cfe...@cs.uiuc.edu; llv...@cs.uiuc.edu
> Subject: Re: [pocl-devel] [cfe-dev] [LLVMdev] SPIR provisional
> specification is now available in the Khronos website
>

James Molloy

Sep 28, 2012, 12:45:56 PM

to Villmow, Micah, llv...@cs.uiuc.edu, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu

Micah,

You're saying it works for you, but Clang doesn't currently generate anywhere near the range of horrible constantexpr constructs it is possible to create. You can "get by" at the moment by handling just ConstantGEPs, because of the way Clang works.

But SPIR isn't restricted to Clang, and the problem is that it is *possible* (although not probable, or nice, but that is irrelevant for corner cases) to produce valid SPIR that is *very* difficult to get into a shape you can generate code for on CPUs.

Even the SAFECode snippet that Pekka mentioned doesn't handle the case of ConstantShuffleVectors, for example.

You can easily simplify this problem with a restriction in SPIR: disallow ConstantExpr casts - no ptrtoint constant expression. Because GlobalVariables have pointer type, if you disallow converting their type to non-pointer type in a constantexpr, the number of constantexpr subclasses you have to deal with is drastically reduced (to essentially just BitCast and GEP).

That would be a simple, reasonable restriction that would stop potentially malicious, horrible test cases from forcing every CPU SPIR client to write upwards of a hundred lines of conversion code.
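A minimal sketch of how a consumer or validator could check that restriction, assuming constant expressions are modeled as nested tuples (the representation and `check` are hypothetical). Only bitcast and GEP may be applied, directly or transitively, to a __local global:

```python
# Hypothetical representation: a constant expression is a global name
# (str, e.g. "@lbuf"), a literal (int), or a tuple (opcode, *operands).
ALLOWED_ON_LOCALS = {"bitcast", "getelementptr"}


def check(expr, local_globals):
    """Return (uses_local, violations): whether `expr` depends on a __local
    global, and which opcodes other than bitcast/GEP are applied to one."""
    if isinstance(expr, str):
        return expr in local_globals, []
    if not isinstance(expr, tuple):
        return False, []  # integer literal
    opcode, *operands = expr
    uses_local, violations = False, []
    for op in operands:
        op_uses_local, op_violations = check(op, local_globals)
        uses_local = uses_local or op_uses_local
        violations += op_violations
    if uses_local and opcode not in ALLOWED_ON_LOCALS:
        violations.append(opcode)
    return uses_local, violations


local_names = {"@lbuf"}
print(check(("getelementptr", ("bitcast", "@lbuf"), 0, 4), local_names))  # (True, [])
print(check(("add", ("ptrtoint", "@lbuf"), 8), local_names))  # (True, ['ptrtoint', 'add'])
print(check(("ptrtoint", "@some_global"), local_names))       # (False, [])
```

Since integer opcodes such as add do not take pointer operands in LLVM IR, rejecting ptrtoint is what keeps this kind of address arithmetic from being formed on a __local global in the first place.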

Cheers,

James

Villmow, Micah

Sep 28, 2012, 12:56:40 PM

to James Molloy, llv...@cs.uiuc.edu, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu

James,

Thanks for the suggestion for how to make this case easier to handle. I'll bring this up to the entire working group in our next meeting.

Micah

Pekka Jääskeläinen

Sep 28, 2012, 4:16:50 PM

to James Molloy, llv...@cs.uiuc.edu, pocl-...@lists.sourceforge.net, cfe...@cs.uiuc.edu

On 09/28/2012 07:45 PM, James Molloy wrote:
> That would be a simple, reasonable restriction that would stop potentially
> maliciously horrible test cases causing all CPU SPIR clients to write upwards of
> a hundred lines of conversion code.

Are you proposing to disallow the use of an IR instruction type to *possibly*
avoid problems from the (slight) misuse of another LLVM IR construct?

IMHO there should be a more robust solution that fixes the misused construct
instead of just trying to "put out the fires it creates". E.g. some sort of
"thread local"-type of qualifier for the global which disallows certain
optimizations. A new linkage type perhaps? Someone more familiar with the
LLVM IR than me might have a better idea.

I understand that adding the automatic local as a kernel argument (in the specs)
is too intrusive now given the existing implementations that do it
otherwise, especially for those for which the semantics "happens" to be
correct as is. That is, the currently popular GPUs with separate
local/scratchpad memories.

Have a nice weekend,
--
--Pekka

Villmow, Micah

Sep 28, 2012, 5:00:17 PM

to Pekka Jääskeläinen, James Molloy, cfe...@cs.uiuc.edu, llv...@cs.uiuc.edu, pocl-...@lists.sourceforge.net

What would be ideal is for the alloca instruction to be able to allocate memory in different address spaces instead of only in private.

Micah

> -----Original Message-----
> From: Pekka Jääskeläinen [mailto:pekka.jaa...@tut.fi]
> Sent: Friday, September 28, 2012 1:17 PM
> To: James Molloy
> Cc: Villmow, Micah; Carlos Sánchez de La Lama; Ouriel, Boaz; pocl-
> de...@lists.sourceforge.net; cfe...@cs.uiuc.edu; llv...@cs.uiuc.edu
> Subject: Re: [pocl-devel] [cfe-dev] [LLVMdev] SPIR provisional
> specification is now available in the Khronos website
>

Owen Anderson

Sep 28, 2012, 10:16:53 PM

to James Molloy, cfe...@cs.uiuc.edu, pocl-...@lists.sourceforge.net, llv...@cs.uiuc.edu

On Sep 28, 2012, at 9:45 AM, James Molloy <ja...@jamesmolloy.co.uk> wrote:

> You can easily simplify this problem with a restriction in SPIR: disallow ConstantExpr casts - no ptrtoint constant expression. Because GlobalVariables have pointer type, if you disallow converting their type to non-pointer type in a constantexpr, the number of constantexpr subclasses you have to deal with is drastically reduced (to essentially just BitCast and GEP).

Wouldn't an easier solution just be not to represent them as constants in the first place? For instance, you could have a built-in function to get the address of local N, where N is taken as a parameter. You can call the builtins at the beginning of the kernel, and then proceed to use them as you wish without having to worry about reversing a constant folding later. Plus, if a given vendor's backend wants the address to get constant folded, it's easy to do replaceAllUsesWith of the call with a global, and run an appropriate constant folding pass.
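A toy sketch of the builtin approach, assuming a hypothetical builtin name `get_local_addr` and a dict-based IR (neither is part of SPIR or LLVM): the kernel obtains each local's address through a call, and each consumer lowers the calls in its own way with a replaceAllUsesWith-style rewrite.

```python
# Hypothetical builtin name; the proposal does not fix one.
BUILTIN = "get_local_addr"


def lower_local_addr_calls(body, replacement_for):
    """replaceAllUsesWith-style lowering: drop each `call @get_local_addr(N)`
    and substitute replacement_for(N) wherever its result is used."""
    mapping, out = {}, []
    for inst in body:
        if inst["op"] == "call" and inst.get("callee") == BUILTIN:
            (n,) = inst["args"]
            mapping[inst["dst"]] = replacement_for(n)
            continue
        out.append({**inst, "args": [mapping.get(a, a) for a in inst["args"]]})
    return out


# Toy IR: each instruction is a dict; operands are SSA names or literals.
kernel = [
    {"dst": "%p0", "op": "call", "callee": BUILTIN, "args": [0]},
    {"dst": "%q", "op": "getelementptr", "args": ["%p0", 4]},
    {"dst": "%x", "op": "load", "args": ["%q"]},
]

# CPU-style consumer: the local's address becomes an extra kernel argument.
cpu = lower_local_addr_calls(kernel, lambda n: f"%local_arg{n}")
# GPU-style consumer: the local becomes a global in the local address space;
# a later constant-folding pass could then fold the GEP.
gpu = lower_local_addr_calls(kernel, lambda n: f"@local_buf{n}")

print(cpu[0])  # {'dst': '%q', 'op': 'getelementptr', 'args': ['%local_arg0', 4]}
print(gpu[0]["args"])  # ['@local_buf0', 4]
```

Because nothing is folded until a consumer chooses to fold it, there is no constant folding to reverse later.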

--Owen

Pekka Jääskeläinen

Sep 29, 2012, 6:44:41 AM

to Villmow, Micah, llv...@cs.uiuc.edu, pocl-...@lists.sourceforge.net, James Molloy, cfe...@cs.uiuc.edu

On 09/29/2012 12:00 AM, Villmow, Micah wrote:
> What would be ideal is to have the alloca instruction be able to allocate
> memory in different address spaces instead of only being in private.

This sounds like the most sensible and "correct" proposal so far.

If one wants to reuse the local space allocation across multiple
WGs executing in the same thread, one would just need to find all the
allocas to the local AS and convert them, e.g., to function arguments that
are allocated by the WG launcher only once. And when creating a "multi-WI WG
function", ensure the local allocas are shared across the WIs.

It sounds like a working and robust approach to me, although, of course, it
requires an LLVM IR update for the alloca instruction. I have no idea how
drastic an update it would be.

James Molloy

Sep 29, 2012, 8:30:19 AM

to Owen Anderson, cfe...@cs.uiuc.edu, pocl-...@lists.sourceforge.net, llv...@cs.uiuc.edu

Yes, it would.

But I was concerned Micah was just going to write it off as an implementation detail, so I felt that I should offer a "less correct but less work" option for him to consider.

Cheers,

James

Villmow, Micah

Oct 1, 2012, 12:36:12 PM

to James Molloy, Owen Anderson, cfe...@cs.uiuc.edu, pocl-...@lists.sourceforge.net, llv...@cs.uiuc.edu

Maybe it would be easier to provide a bitcode example of this problem.

After thinking about this more, I’m not sure if this is applicable to SPIR itself. For you to have a constant GEP expression, you have to know the pointer size in order to correctly generate the expression. Since the pointer size itself is not known, I don’t yet see how you can generate a constant expression that is valid SPIR.

Micah

James Molloy

Oct 1, 2012, 1:18:43 PM

to Villmow, Micah, cfe...@cs.uiuc.edu, pocl-...@lists.sourceforge.net, llv...@cs.uiuc.edu

> After thinking about this more, I’m not sure if this is applicable to SPIR itself. For you to have a constant GEP expression, you have to know the pointer size in order to correctly generate the expression. Since the pointer size itself is not known, I don’t yet see how you can generate a constant expression that is valid SPIR.

GEP is defined independently of pointer size.
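To illustrate, a small Python sketch assuming a simplified type system with natural alignment (hypothetical helpers, not LLVM's DataLayout API): the GEP itself is just a type plus indices, and its byte offset is only computed once a pointer size is chosen. The offset changes with the pointer size only when the indexed type contains pointers:

```python
# Hypothetical simplified type system: "i8" | "i32" | "i64" | "ptr" |
# ("array", n, elem) | ("struct", [field types]), with natural alignment.


def align_up(x, a):
    return (x + a - 1) // a * a


def layout(ty, ptr_size):
    """Return (size, align) in bytes for a given target pointer size."""
    if isinstance(ty, str):
        size = {"i8": 1, "i32": 4, "i64": 8, "ptr": ptr_size}[ty]
        return size, size
    if ty[0] == "array":
        size, align = layout(ty[2], ptr_size)
        return ty[1] * size, align
    offset, max_align = 0, 1
    for f in ty[1]:  # struct: pad each field to its alignment
        size, align = layout(f, ptr_size)
        offset = align_up(offset, align) + size
        max_align = max(max_align, align)
    return align_up(offset, max_align), max_align


def field_offset(fields, idx, ptr_size):
    """Byte offset of field `idx` within a struct with the given fields."""
    start = 0
    for f in fields[:idx]:
        size, align = layout(f, ptr_size)
        start = align_up(start, align) + size
    return align_up(start, layout(fields[idx], ptr_size)[1])


def gep_offset(ty, indices, ptr_size):
    """Byte offset computed by `getelementptr ty, ptr %base, indices...`."""
    offset = indices[0] * layout(ty, ptr_size)[0]  # step over whole objects
    for idx in indices[1:]:
        if ty[0] == "array":
            ty = ty[2]
            offset += idx * layout(ty, ptr_size)[0]
        else:  # struct
            offset += field_offset(ty[1], idx, ptr_size)
            ty = ty[1][idx]
    return offset


node = ("struct", ["i32", "ptr", "i32"])
arr = ("array", 16, "i32")
print(gep_offset(node, [1, 2], 4), gep_offset(node, [1, 2], 8))  # 20 40
print(gep_offset(arr, [0, 5], 4), gep_offset(arr, [0, 5], 8))    # 20 20
```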
