
performance fluctuations due to x86 Decoded Stream Buffer


Bruce Hoult

Nov 22, 2016, 7:56:27 AM
Here's something I haven't seen talked about here before, but it may be affecting many of our attempts at code tuning.

http://schd.ws/hosted_files/llvmdevelopersmeetingbay2016/d9/LLVM-Conference-US-2016-Code-Alignment.pdf

Short version:

Since Sandy Bridge, decoded uops have been stored in a new cache, so that loops are not limited by the instruction decode rate.

The cache has:

- 32 sets, of
- 8 ways/set, of
- 6 uops/way

* all uops in a way come from instructions in the same 32-byte aligned window
* max of 3 ways (18 uops) per 32B window; more will cause thrashing
* two or more conditional branches with the same target in a single instruction window may cause poor branch prediction. To avoid this, the last byte of each such branch instruction should fall in a different 16-byte aligned block
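To make the numbers above concrete, here is a rough Python sketch of the arithmetic. It is only a toy model under stated assumptions, not a description of how the hardware or the LLVM pass really works: the function names are made up for illustration, an instruction is attributed to the 32B window holding its first byte, and uops are assumed to pack perfectly into 6-uop ways.

```python
from collections import defaultdict
from math import ceil

# Figures taken from the summary above.
SETS, WAYS_PER_SET, UOPS_PER_WAY = 32, 8, 6
WINDOW_BYTES = 32
MAX_WAYS_PER_WINDOW = 3
BRANCH_BLOCK_BYTES = 16


def dsb_capacity_uops():
    """Total uops the cache can hold: sets * ways * uops per way."""
    return SETS * WAYS_PER_SET * UOPS_PER_WAY


def ways_per_window(instructions):
    """instructions: iterable of (start_address, uop_count).

    Assumption: an instruction belongs to the 32B-aligned window that
    contains its first byte, and uops pack perfectly into ways.
    Returns {window_base_address: ways_needed}.
    """
    uops = defaultdict(int)
    for addr, count in instructions:
        uops[addr // WINDOW_BYTES * WINDOW_BYTES] += count
    return {base: ceil(n / UOPS_PER_WAY) for base, n in uops.items()}


def overfull_windows(instructions):
    """Window base addresses needing more than 3 ways (more than 18 uops)."""
    ways = ways_per_window(instructions)
    return sorted(b for b, w in ways.items() if w > MAX_WAYS_PER_WINDOW)


def branches_sharing_block(branches):
    """branches: iterable of (last_byte_address, target_address).

    Returns the targets for which two or more branches end in the
    same 16-byte aligned block.
    """
    seen, clashes = set(), set()
    for last_byte, target in branches:
        key = (last_byte // BRANCH_BLOCK_BYTES, target)
        if key in seen:
            clashes.add(target)
        seen.add(key)
    return sorted(clashes)
```

A tool along these lines would feed in the addresses and uop counts from a disassembly and pad with NOPs wherever overfull_windows() or branches_sharing_block() reports something.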

Omer Paparo Bivas of Intel Israel is currently making a pass for llvm which looks for problems caused by this and inserts NOPs to prevent them.

See http://lists.llvm.org/pipermail/llvm-dev/2016-November/107223.html
and following conversation.

Noob

Nov 22, 2016, 8:20:57 AM
Thanks for the interesting links.

I have mixed feelings about LLVM. It's good to see livelier R&D in the
open source compiler design field. Competition gave GCC the swift kick
in the butt it needed.

But what if LLVM grabs most of the mind-share, and GCC starts to flounder?
Then we're stuck with a permissive-licensed compiler, and it's the
Unix wars all over again.

Regards.

Rick C. Hodgin

Nov 22, 2016, 8:33:24 AM
I am working on a replacement compiler platform for GCC and LLVM. Would
anybody like to come on board and help out?

It's called RDC for Rapid Development Compiler. It's designed to be
a framework for creating compilers within. My first compiler will be
a hybrid C/C++ language I call CAlive (injects new life in the C
language, and removes so much bulk of C++, plus new features).

CAlive
https://groups.google.com/forum/#!forum/caliveprogramminglanguage

I plan to have the RDC framework completed by the end of next year.
It would go faster and be more robust if others would help me.

-----
I am creating this offering as a God-fearing alternative to non-God-
fearing companies and entities which are creating products for their
own purposes. In this effort I'm pursuing, it is an offering given
back to the Lord for Him having first given us our skills:

http://www.libsf.org/indexmain.html

And yes, I am serious. :-) It's only a recognition of the fundamental
reality that we are all created beings made explicitly for a purpose.
I acknowledge this, recognize what it means, and then move it to the
forefront of my life so I can give back to Him a portion of that which
He first gave me. It is me acknowledging Him in my life, relying upon
Him to provide for my needs while giving me also these opportunities
to serve and honor Him in this way.

I would encourage each of you to do the same. You can learn about
this philosophy in this video beginning around 30 minutes in:

A video I made about Visual FreePro in 2012 describing the
goals of that project, which have since expanded into the full
software and hardware stack you've seen me post about here:

http://www.visual-freepro.org/videos/2012_12_08__01_vvmmc__and_vfrps_relationship_to_christianity.ogv

If you are unable to view the video, use VLC (http://www.videolan.org).

Best regards,
Rick C. Hodgin

Bruce Hoult

Nov 22, 2016, 8:40:58 AM
gcc doesn't seem in any imminent danger.

gcc still often generates slightly faster code on even the most popular targets, such as x86 and ARM. It's usually only single-digit percent, but it's there.

gcc also officially supports, it looks like, about 47 targets, while LLVM only supports about 15. I can't see anyone ever going and creating LLVM support for PDP11, VAX, or 68000 (still officially supported by gcc), let alone things such as z80 or 6809 which have existing but not officially supported ports.

Personally I prefer permissive licensing. It is precisely LLVM's permissive licensing -- and structure as a dynamically loadable library -- which has propelled it to a huge number of under-the-radar uses in everything from GPU drivers to virtual machines for Java or C# or JavaScript. There are probably people working on half a dozen projects using LLVM within 50 meters of my desk. I'm working on an LLVM-based driver for a new GPU architecture myself.

Bruce Hoult

Nov 22, 2016, 8:43:26 AM
Not in a million years.

Rick C. Hodgin

Nov 22, 2016, 8:48:36 AM
On Tuesday, November 22, 2016 at 8:40:58 AM UTC-5, Bruce Hoult wrote:
> gcc doesn't seem in any imminent danger.
>
> gcc still often generates slightly faster code on even the most popular
> targets, such as x86 and ARM. It's usually only single-digit percent,
> but it's there.

I was very impressed with how far Microsoft's compiler development has
improved since 2010.

On this benchmark:

http://pastebin.com/VKu0RCiM

The time improved from an optimized build in Visual Studio 2010 of
2.343 seconds on my Intel 2.90 GHz Core i7 laptop (i7-3520M) down
to 1.441 seconds. Even the code used in my mandel_custom() block
was improved from 2.066 seconds down to 1.767.

I'd be curious to see how much more Terje could optimize my custom
FPU code for that mandel_custom() function (using the FPU). I've
always been impressed at the optimizations he finds. A very wide-
in-machine-thinking mind.

See here for the list of results in VS 2010, 2012, 2015.

https://groups.google.com/d/msg/comp.lang.c/QnIdGWh_J6Q/RVfU-IzMBgAJ

Rick C. Hodgin

Nov 22, 2016, 8:50:51 AM
On Tuesday, November 22, 2016 at 8:48:36 AM UTC-5, Rick C. Hodgin wrote:
> On Tuesday, November 22, 2016 at 8:40:58 AM UTC-5, Bruce Hoult wrote:
> > gcc doesn't seem in any imminent danger.
> >
> > gcc still often generates slightly faster code on even the most popular
> > targets, such as x86 and ARM. It's usually only single-digit percent,
> > but it's there.
>
> I was very impressed with how far Microsoft's compiler development has
> improved since 2010.
>
> On this benchmark:
>
> http://pastebin.com/VKu0RCiM
>
> The time improved from an optimized build in Visual Studio 2010 of
> 2.343 seconds on my Intel 2.90 GHz Core i7 laptop (i7-3520M) down
> to 1.441 seconds. Even the code used in my mandel_custom() block
> was improved from 2.066 seconds down to 1.767.
>
> I'd be curious to see how much more Terje could optimize my custom
> FPU code for that mandel_custom() function (using the FPU). I've
> always been impressed at the optimizations he finds. A very wide-
> in-machine-thinking mind.

I have considered one optimization, and that's to load the value
two first and leave it on the stack, thereby saving a load and
a store on each iteration.

Mike Stump

Nov 24, 2016, 1:00:02 PM
In article <o11gmj$epe$1...@dont-email.me>, Noob <ro...@127.0.0.1> wrote:
>But what if LLVM grabs most of the mind-share, and GCC starts to flounder?
>Then we're stuck with a permissive-licensed compiler, and it's the
>Unix wars all over again.

The name of the game is compete or die. Very similar to the
hardware game of the same name. Don't worry about the wars, they will
sort themselves out.