processor specific optimizations in .NET Framework

Joachim Kaufmann

Oct 11, 2003, 8:27:07 PM
Hi all,

Is it possible to do processor-specific optimizations when you're writing
code for the .NET Framework? Or do you have to rely entirely on the CLR?

Joachim


Jon Skeet [C# MVP]

Oct 12, 2003, 3:05:58 AM
Joachim Kaufmann <joachim...@hotmail.com> wrote:
> Is it possible to do processor-specific optimizations when you're writing
> code for the .NET Framework? Or do you have to rely entirely on the CLR?

You have to rely on the JIT doing the optimisation for you.

--
Jon Skeet - <sk...@pobox.com>
http://www.pobox.com/~skeet
If replying to the group, please do not mail me too

Joachim Kaufmann

Oct 12, 2003, 1:37:33 PM
Does that mean when new instructions like SSE2, the upcoming SSE3 or AMD64
aren't supported by the .NET Framework you can't use them in your software?

Many games or apps like 3D Max are optimized for specific architectures like
Pentium 4. Such optimizations can yield large performance gains.

Will this be completely impossible in the future? Because in my
understanding Microsoft wants to have all 3rd party apps managed. Do you
expect that in 3 or 4 years most of the new games will be written in managed
code?

Joachim

"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.19f30dace...@msnews.microsoft.com...

Jon Skeet [C# MVP]

Oct 12, 2003, 1:50:19 PM
Joachim Kaufmann <joachim...@hotmail.com> wrote:
> Does that mean when new instructions like SSE2, the upcoming SSE3 or AMD64
> aren't supported by the .NET Framework you can't use them in your software?

Yup - but look at it this way: it means that when new instructions are
introduced and then *are* supported by the .NET framework, you get
the performance benefit absolutely free, without any work at all,
not even recompiling.



> Many games or apps like 3D Max are optimized for specific architectures like
> Pentium 4. Such optimizations can yield large performance gains.

Yes - but those games won't take advantage of SSE3 or AMD64 either,
will they? By the time applications using those optimisations are
widely available, MS are likely to have an update to the framework to
support them as well, I expect. Furthermore, because the JIT can
compile the code appropriately for the particular processor, apps don't
need to be optimised for a specific architecture at the possible
expense of *other* architectures.

> Will this be completely impossible in the future?

Obviously I can't say for sure, but I'd expect so.

> Because in my
> understanding Microsoft wants to have all 3rd party apps managed. Do you
> expect that in 3 or 4 years most of the new games will be written in managed
> code?

I suspect games will take longer, if many are ever written in a managed
way. Things like garbage collection are a pain for games - but I'm sure
there will be quite a few, with managed DirectX and the like.

Niall

Oct 12, 2003, 8:06:34 PM
I'm a bit sceptical about the JIT's ability to get good use out of things
such as SSE2. I know that currently it only uses it for a couple of things,
but I presume in the future, MS will put more effort into using such
instructions in more cases. However, a lot of the time, using SIMD
instructions requires you to rework your algorithms to line up similar
operations. I'll be curious to see how well the JIT can rearrange your code
to do this, because it's often a nontrivial rework. Relying on the JIT to do
the work means you may end up needing to rework your own code in such a way
as to prod the JIT into realising there is an opportunity for these
instructions.
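
A minimal sketch of the kind of rework meant here, in C++ and assuming a simple "squared length of many points" loop. The type and function names (PointAoS, PointsSoA, length_sq_aos, length_sq_soa) are made up for illustration. Both functions compute the same result; the second lays the data out so that similar operations line up and a compiler (or a hand-written SIMD loop) can process several points per instruction.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structures: each point's x, y, z sit next to each other, so the
// x values of neighbouring points are not contiguous in memory.
struct PointAoS { float x, y, z; };

std::vector<float> length_sq_aos(const std::vector<PointAoS>& pts) {
    std::vector<float> out(pts.size());
    for (std::size_t i = 0; i < pts.size(); ++i)
        out[i] = pts[i].x * pts[i].x + pts[i].y * pts[i].y + pts[i].z * pts[i].z;
    return out;
}

// Structure-of-arrays: all x values are contiguous, likewise y and z, so
// several points can be fed to one SIMD multiply/add without shuffling.
struct PointsSoA { std::vector<float> x, y, z; };

std::vector<float> length_sq_soa(const PointsSoA& pts) {
    const std::size_t n = pts.x.size();
    assert(pts.y.size() == n && pts.z.size() == n);  // the three arrays must match
    std::vector<float> out(n);
    for (std::size_t i = 0; i < n; ++i)
        out[i] = pts.x[i] * pts.x[i] + pts.y[i] * pts.y[i] + pts.z[i] * pts.z[i];
    return out;
}
```

The arithmetic is identical in both versions; only the data layout changed, and that change is the part a JIT is unlikely to make on your behalf.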

Niall

"Jon Skeet [C# MVP]" <sk...@pobox.com> wrote in message
news:MPG.19f3a4b22...@msnews.microsoft.com...

Daniel O'Connell

Oct 12, 2003, 8:21:04 PM

"Niall" <as...@me.com> wrote in message
news:OpwJP3Rk...@TK2MSFTNGP12.phx.gbl...

> I'm a bit sceptical about the JIT's ability to get good use out of things
> such as SSE2. I know that currently it only uses it for a couple of things,
> but I presume in the future, MS will put more effort into using such
> instructions in more cases. However, a lot of the time, using SIMD
> instructions requires you to rework your algorithms to line up similar
> operations. I'll be curious to see how well the JIT can rearrange your code
> to do this, because it's often a nontrivial rework. Relying on the JIT to do
> the work means you may end up needing to rework your own code in such a way
> as to prod the JIT into realising there is an opportunity for these
> instructions.

That brings up portability vs performance. If your code is important enough,
performance-wise, to merit writing code for every available instruction
set, then you probably shouldn't be writing it in a high-level language. In
that case using native code via MC++ would probably be a better idea,
assuming native inlining is possible (I'm not sure on that point).
That is not to say, however, that language/runtime features that help you
write code that will JIT into a better instruction sequence wouldn't be
of use.

Ori Gershony [MSFT]

Oct 13, 2003, 8:03:13 PM

To put this in perspective: in the case of native code when you want to use
processor-specific optimizations, you generally need to code them in
assembly. This is still possible with the .Net Framework (using pinvoke),
but at the cost of portability. However, the .Net Framework also has the
ability to generate processor-specific code for your app, depending on the
type of processor on the machine. This is of course not possible with
native code, where code generation takes place before you ship.

Some processor-specific optimizations that are already used in RTM and
Everett include using cmov for conditional execution and fcomip to speed up
floating point comparisons. Future releases will add to this list.

Both the .Net Framework and native code benefit from libraries that have
been updated to take advantage of processor-specific optimizations.

-- Ori.



--

This posting is provided "AS IS" with no warranties, and confers no rights.
Use of included script samples are subject to the terms specified at
http://www.microsoft.com/info/cpyright.htm

Note: For the benefit of the community-at-large, all responses to this
message are best directed to the newsgroup/thread from which they
originated.

Niall

Oct 13, 2003, 9:26:48 PM
An exception to this is the SSE & MMX primitives library that you can use
from C++, which seemed quite handy, though I never got around to putting
them to decent use. I wasn't intending to say that the Framework won't have
better use of processor specific instructions, as I have heard that it will
a few times. I just meant that it still seems to be in its infancy, and
doesn't seem to be used in highly computationally intensive areas such as
arithmetic calculations as yet. Certainly it would be a bonus for your same
framework code to suddenly be running on SSE / 3DNow etc one day, though
this goes back to my original query about whether the JIT will be
intelligent enough to draw together disparate operations for streaming where
possible.

Niall

"Ori Gershony [MSFT]" <Ori_Ge...@online.microsoft.com> wrote in message
news:qHICUaek...@cpmsftngxa06.phx.gbl...

Ori Gershony [MSFT]

Oct 14, 2003, 2:19:37 PM
That's a good point. I don't think that adding MMX & SSE intrinsics to IL
is the right answer, though, because that would go against the goal of
making it portable. Instead you should write your inner loops with C++
intrinsics or x86 assembly, and pinvoke to them.
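
A minimal sketch of such an inner loop, assuming a C++ compiler that ships the standard SSE intrinsics header <xmmintrin.h>. The function name add_floats and the USE_SSE macro are made up for illustration. The extern "C" linkage keeps the symbol name unmangled so that a managed caller could bind to it with a DllImport declaration; on Windows the function would also have to be exported from the DLL, which is left out here.

```cpp
#include <cassert>
#include <cstddef>

// Use the SSE path only when the compiler targets a CPU that has it;
// otherwise the plain loop below does all the work.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define USE_SSE 1
#else
#define USE_SSE 0
#endif

// Adds a[i] + b[i] into out[i] for i in [0, n).
extern "C" void add_floats(const float* a, const float* b, float* out, std::size_t n) {
    assert(a != nullptr && b != nullptr && out != nullptr);
    std::size_t i = 0;
#if USE_SSE
    // Four floats per iteration; unaligned loads and stores, so the caller
    // does not have to guarantee 16-byte alignment.
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    // Scalar tail (and the whole loop when SSE is not available).
    for (; i < n; ++i) out[i] = a[i] + b[i];
}
```

Passing whole arrays per call matters: the cost of crossing the managed/native boundary is paid once per call, so the native routine should do a meaningful amount of work each time.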

In future releases the JIT will use some SSE2 instructions to speed up
certain operations (float to int, initializing memory, etc.). Writing a
full vectorizing compiler, though, is a much more difficult problem. There
is research in this area, but I don't know of anything planned in the near
term.

Thanks!

-- Ori.


Joachim Kaufmann

Oct 14, 2003, 3:13:53 PM
Great that somebody from Microsoft posts here in this newsgroup! Do you expect
games coming to the .NET Framework in the near future? Because managed code
is one of the big topics for Longhorn. Thank you.

Joachim

"Ori Gershony [MSFT]" <Ori_Ge...@online.microsoft.com> wrote in message
news:Dyrp99nk...@cpmsftngxa06.phx.gbl...

Ori Gershony [MSFT]

Oct 14, 2003, 9:42:07 PM

People are looking at writing managed games, but there are various issues
with performance and GC that you need to be aware of. For more information
you can start at the site for managed DirectX:
http://www.gotdotnet.com/team/directx/ (this site has various interesting
links that you could follow).

David Notario also has an informative summary of these issues in his blog:
http://www.xplsv.com/blogs/devdiary/2003_07_01_archive.html#105865812751780370

David also reads this newsgroup and is a great resource for game-related
questions.

Thanks!

-- Ori.


Michael Giagnocavo [MVP]

Oct 15, 2003, 12:34:59 PM
Maybe some kind of attributes or "hints" could be added to code to indicate
to the JIT compiler that if it's running on a certain platform, it should
try to use certain platform-specific instructions. I don't see anything
wrong with letting people write non-portable code if they so desire. For
instance, suppose there were an Intel.Jit (or Jit.Intel, but perhaps outside
of System) namespace with classes for processor-specific things that the
JIT compiler would need to be aware of. Coders would just make sure they are
running on the right processor before taking that codepath. Then it could
all be written in managed code.
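
The "check the processor, then pick a codepath" pattern can be sketched in native C++ today. This is only an illustration of the dispatch idea, not of any .NET API: cpu_has_sse2, sum_portable, sum_sse2_path and select_sum are hypothetical names, the feature test relies on the GCC/Clang builtin __builtin_cpu_supports (other compilers fall back to "not supported"), and the SSE2 path is a placeholder that simply forwards to the portable loop.

```cpp
#include <cassert>
#include <cstddef>

// True when the running CPU reports SSE2. Uses a GCC/Clang builtin on x86;
// on any other compiler or architecture this sketch just answers "no".
static bool cpu_has_sse2() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;
#endif
}

using SumFn = float (*)(const float*, std::size_t);

// Portable implementation that runs on every processor.
static float sum_portable(const float* v, std::size_t n) {
    assert(v != nullptr || n == 0);
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) total += v[i];
    return total;
}

// Placeholder for a processor-specific version; a real one would use SSE2
// intrinsics here. It forwards to the portable loop to keep the sketch short.
static float sum_sse2_path(const float* v, std::size_t n) {
    return sum_portable(v, n);
}

// Choose the codepath once, based on what the current CPU supports.
SumFn select_sum() {
    return cpu_has_sse2() ? sum_sse2_path : sum_portable;
}
```

A caller would do something like `SumFn sum = select_sum();` once at startup and use `sum(data, count)` from then on, so the feature check stays out of the hot loop.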

-mike
MVP

"Ori Gershony [MSFT]" <Ori_Ge...@online.microsoft.com> wrote in message

news:Dyrp99nk...@cpmsftngxa06.phx.gbl...

David Notario

Oct 16, 2003, 1:19:59 AM
We already ship some of these SSE2 optimizations in VS.NET 2003 (notably
double/float to int), which is a typical performance problem if

As Ori says, your best bet today is PInvoking to your critical assembly
code. Or better yet, use MC++ and make that part of the code native (easier
than C# PInvoke). You won't be 100% portable, but you should be close.

--
David Notario
Software Design Engineer - CLR JIT Compiler

http://xplsv.com/blogs/devdiary/


"Ori Gershony [MSFT]" <Ori_Ge...@online.microsoft.com> wrote in message

news:Dyrp99nk...@cpmsftngxa06.phx.gbl...

David Notario

Oct 16, 2003, 1:23:55 AM
You are correct, it's a difficult problem. It's even more difficult if you
have much more limited time/space constraints to do the JITting. Let's say
we would love to have the perfect JIT compiler, but our resources are also
limited and currently we're dedicating them to other things we think will
benefit customers more.

--
David Notario
Software Design Engineer - CLR JIT Compiler
http://xplsv.com/blogs/devdiary/

"Niall" <as...@me.com> wrote in message
news:OpwJP3Rk...@TK2MSFTNGP12.phx.gbl...

Niall

Oct 16, 2003, 3:06:08 AM
Yes, I imagine the time constraint on the JIT must cut short a fair few
things that could be done to improve code performance. Is there any way to
ask the JIT to be more aggressive, at the cost of a longer JIT time, if there
is some area of the code that is high use? Does NGen do more aggressive
optimisation?

Niall

"David Notario" <dnot...@online.microsoft.com> wrote in message
news:uFRAyW6k...@TK2MSFTNGP11.phx.gbl...

David Notario [MSFT]

Oct 16, 2003, 11:57:25 AM
No, currently there isn't a way of doing that. NGen at the moment just uses
the normal JIT.

--
David Notario
Software Design Engineer, CLR JIT Compiler
http://devdiary.xplsv.com

"Niall" <as...@me.com> wrote in message

news:OUB$wP7kDH...@TK2MSFTNGP09.phx.gbl...

Martin Maat [EBL]

Feb 5, 2004, 12:35:43 PM
I just found this group and caught up on a couple of interesting threads. I
have something to add here that seems insufficiently addressed.

"Michael Giagnocavo [MVP]" <mggU...@Atrevido.net> wrote in message
news:%23fIRdwz...@tk2msftngp13.phx.gbl...

> Maybe some kind of attributes or "hints" could be added to code to indicate
> to the JIT compiler that if it's running on a certain platform, it should
> try to use certain platform-specific instructions. I don't see anything
> wrong with letting people write non-portable code if they so desire.

This leans towards the initial question and I think it is wrong, both as an
ambition and as a way to be effective. In order to have any JIT compiler
use a fancy feature you must have an entity in your high level code (one or
more C# classes in our case) that maps fairly well to the low level hardware
you want to be utilized. For instance, in C# you write

a = b + c

and you know it will map to some sort of ADD operation available in any CPU.
This is obvious. But if you have a processor that is capable of transforming
a 32 x 32 matrix of 64 bit floating point values in one cycle, you cannot
expect any JIT to recognize your nested set of C# loops that performs this
operation and to map that to filling these registers and pulling the handle.
Your algorithm will be followed and eventually the primitive arithmetic
operations of the processor will be used, regardless of the nifty feature.

If you want the matrix transformation hardware to be used in the managed
world you would need an API that takes matrices as input parameters. IL
would have to be matrix-aware too. Only then it would be possible for the
JIT to use the hardware functionality if it were available on the platform.
It would then be up to the JIT of the platform that does not have hardware
support for this feature to output conventional native code that does the
job (slower but still).
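
To make the idea concrete, here is a rough C++ sketch of an API that takes matrices as parameters, so the implementation behind it is free to use hardware support where it exists and plain arithmetic where it does not. Everything in it is an assumption for illustration: the Vec4 and Mat4 types, the transform and transform_all functions, the MAT_USE_SSE macro, and the choice of 4 x 4 single-precision matrices with SSE standing in for the "matrix hardware". A JIT would make this choice at run time; the sketch makes it at compile time to stay short.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define MAT_USE_SSE 1
#else
#define MAT_USE_SSE 0
#endif

struct Vec4 { float v[4]; };
// Column-major storage: c[j] is column j of the matrix.
struct Mat4 { Vec4 c[4]; };

// Computes m * x. The caller states *what* it wants (a matrix transform);
// *how* it is computed is hidden behind the function.
Vec4 transform(const Mat4& m, const Vec4& x) {
    Vec4 r;
#if MAT_USE_SSE
    // Hardware path: result = sum over j of column j scaled by x[j].
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(m.c[0].v), _mm_set1_ps(x.v[0]));
    for (int j = 1; j < 4; ++j)
        acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(m.c[j].v), _mm_set1_ps(x.v[j])));
    _mm_storeu_ps(r.v, acc);
#else
    // Conventional fallback: same result, one scalar operation at a time.
    for (int i = 0; i < 4; ++i) {
        r.v[i] = 0.0f;
        for (int j = 0; j < 4; ++j) r.v[i] += m.c[j].v[i] * x.v[j];
    }
#endif
    return r;
}

// Batch form, so one call covers many vectors.
void transform_all(const Mat4& m, const Vec4* in, Vec4* out, std::size_t n) {
    assert(in != nullptr && out != nullptr);
    for (std::size_t k = 0; k < n; ++k) out[k] = transform(m, in[k]);
}
```

The calling code never mentions SSE; it only says "transform these vectors by this matrix", which is the level at which the hardware feature can actually be recognized.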


Having a hardware specific namespace in the .NET library is both undesirable
and ineffective. As long as IL is unaware on a feature level, there will be
no way you will get the super-duper hardware to be utilized. Not by playing
fair anyhow, which is doing things the managed way all the way. I cannot
imagine "Managed DirectX" for instance "playing fair" in this respect, I
suspect this to be only half-managed. Sure it does garbage collection but
communication with video hardware will be direct rather than managed.

As processors get more powerful features, IL will see extensions to support
this hardware in a generic way allowing the different JIT compilers to map
to their specific platforms. This is the way we want it, we don't need
Intel, AMD or nVIDIA namespaces.

Martin.

