We recently upgraded a number of applications from LLVM 3.5.2 (old JIT)
to LLVM 3.7.1 (MCJIT).
We made the minimum changes needed for the switch (no changes to the IR
generated or the IR optimizations applied).
The resulting code passes all tests (8000+).
However, the runtime performance dropped significantly: 30% to 40% for
all applications.
The applications I am talking about optimize airline rosters and
pairings. LLVM is used for compiling high-level business rules into
efficient machine code.
A typical optimization run takes 6 to 8 hours, so a 30% to 40% reduction
in speed has a real impact (it means we can't upgrade from 3.5.2).
We have triple-checked and reviewed the changes we made from the old JIT
to MCJIT. We also tried different ways of optimizing the IR.
However, all results indicate that the performance drop happens in the
(black box) IR-to-machine-code stage.
So my question is: is this runtime performance reduction known/expected
for MCJIT vs. the old JIT? Or might we be doing something wrong?
If you need more information in order to understand the issue, please
let us know and we will provide more details.
Thanks
Morten
And maybe the register allocator? Are you using the greedy one or the linear one? Are there any other MI-level optimizations running?
-Hal
--
Hal Finkel
Assistant Computational Scientist
Leadership Computing Facility
Argonne National Laboratory
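For anyone wanting to experiment with Hal's question: below is a minimal
sketch of forcing a particular register allocator from an embedding
application by handing LLVM its own -regalloc option ("myapp" and the
function name are placeholders; this sets process-wide state, so it is a
debugging aid rather than a supported API). The greedy allocator is the
default at -O1 and above, the fast allocator at -O0.

#include "llvm/Support/CommandLine.h"

void ForceGreedyRegAlloc() {
  // Parse once at startup, before creating any TargetMachine; the
  // -regalloc flag is registered by the codegen libraries.
  const char *Args[] = {"myapp", "-regalloc=greedy"};
  llvm::cl::ParseCommandLineOptions(2, Args);
}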
It seems quite likely to help. Please do.
-Hal
We are using the default register allocator. I assume the greedy one is
the default?
As for other target machine optimizations:
I have tried:
llvm::TargetMachine* tm = ...;
tm->setOptLevel(llvm::CodeGenOpt::Aggressive);
And it doesn't make much of a difference.
And also:
tm->setFastISel(true);
(see previous email).
Is there anything else I can try?
Cheers,
Rafael
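In case it helps to compare setups: here is a minimal sketch, using the
LLVM 3.7-era EngineBuilder API (BuildEngine and Err are placeholder
names), of building an MCJIT engine tuned for code quality rather than
compile speed.

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/TargetSelect.h"
#include <memory>

llvm::ExecutionEngine *BuildEngine(std::unique_ptr<llvm::Module> M) {
  llvm::InitializeNativeTarget();
  llvm::InitializeNativeTargetAsmPrinter();

  std::string Err;
  llvm::ExecutionEngine *EE =
      llvm::EngineBuilder(std::move(M))
          .setErrorStr(&Err)
          .setEngineKind(llvm::EngineKind::JIT)
          .setOptLevel(llvm::CodeGenOpt::Aggressive)
          .create();
  // Note: FastISel trades code quality for compile speed, so for
  // long-running JITed code you normally want it left off (the
  // default at higher opt levels), i.e. not setFastISel(true).
  return EE; // nullptr on failure; Err then holds the reason
}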
From your previous e-mail, it seems like this is a case of too little optimization, not too much, right?
Are you creating a TargetTransformInfo object for your target?
CodeGenPasses->add(
    createTargetTransformInfoWrapperPass(TM->getTargetIRAnalysis()));
I assume you're dominated by integer computation, not floating-point, is that correct?
-Hal
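For readers following along, here is a sketch of the surrounding setup
Hal's two-line snippet assumes (legacy pass manager, LLVM 3.7-era API;
TM and AddTTIPass are placeholder names). Without this analysis
registered, IR passes that query TTI fall back to conservative,
target-independent answers.

#include "llvm/Analysis/TargetTransformInfo.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/Target/TargetMachine.h"

void AddTTIPass(llvm::TargetMachine *TM, llvm::legacy::PassManager &PM) {
  // Registers target-specific cost/analysis info with the IR pipeline
  // (vectorizers and other cost-driven passes consult it).
  PM.add(llvm::createTargetTransformInfoWrapperPass(
      TM->getTargetIRAnalysis()));
}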
Not easily (llc).
Is there a way to make MCJIT not use the large code model when JIT'ing?
Cheers
Morten
I think Davide started adding support for the small code model.
Cheers,
Rafael
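Assuming the EngineBuilder route, requesting the small code model would
look roughly like the sketch below (BuildSmallCMEngine is a placeholder
name). Per Rafael's note, MCJIT's small-code-model support was still
being added at the time, so the request may not be honored on every
target.

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/IR/Module.h"
#include <memory>

llvm::ExecutionEngine *BuildSmallCMEngine(std::unique_ptr<llvm::Module> M) {
  // Request the small code model; whether MCJIT honors it depends on
  // target support, which was still in progress at the time.
  return llvm::EngineBuilder(std::move(M))
      .setCodeModel(llvm::CodeModel::Small)
      .create();
}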
We have basically two use cases: (1) offline queued batch processing on a computation farm, in which a 50% hit in compilation time (seconds to minutes of CPU time) is not a big deal compared to the many hours of time for a full render (and where even a SLIGHT improvement in the runtime of the resulting JITed code makes up for it); but also (2) interactive use in front of a human, where the JIT time is experienced as waiting around for something to happen (mostly at the beginning of the run, when they are antsy to see the first results show up on screen), and having that suddenly get 50% slower is a really big deal.
This is quite different from something like clang, where longer compilation time may annoy developers (or not, they like their coffee breaks) but would never be noticed by end users. Our users wait for the JIT every time they use the software.
I can see that the MCJIT takes much longer than the old JIT, but I'm afraid I never profiled it or investigated specifically why this is the case. For unrelated reasons, my users have largely been unable to switch their toolchains to C++11 up until now, so they were also stuck on LLVM 3.4 and thus the need to figure out what was up with MCJIT was not a high priority for me. But now that the switch to C++11 is afoot this year, unlocking more recent LLVM releases for me, MCJIT is on my radar again, so it's the perfect time for this topic to get revived.
-- lg
> On Feb 4, 2016, at 11:37 PM, Keno Fischer via llvm-dev <llvm...@lists.llvm.org> wrote:
>
> Actually, reading over all of this again, I realize I may have made the
> wrong statement. The runtime regressions we see in julia are actually
> regressions in how long LLVM itself takes to do the compilation (but since
> it happens at run time in the JIT case, I think of it as a regression in
> our running time). We have only noticed occasional regressions in the
> performance of the generated code (which we are in the process of fixing).
> Which kind of regression are you talking about, time taken by LLVM or time
> taken by the LLVM-generated code?
>
--
Larry Gritz
l...@larrygritz.com
With LLVM 3.7, we have noticed that the MemCpy pass will attempt to copy LLVM structs using moves that are as large as possible. For example, a struct of 3 floats is copied using a 64-bit and a 32-bit move. It is therefore important that such a struct be aligned on an 8-byte boundary, not just 4 bytes! Otherwise, one runs the risk of triggering store-forwarding-failure pipeline stalls (which we encountered, badly, with one of our internal performance benchmarks). It is likewise important that the SROA pass correctly eliminates the loads/stores to the alloca'd memory regions. An illustration follows below Benoit's signature.
Benoit Belley
Sr Principal Developer
M&E-Product Development Group
MAIN +1 514 393 1616
DIRECT +1 438 448 6304
FAX +1 514 393 0110
Autodesk, Inc.
10 Duke Street
Montreal, Quebec, Canada H3C 2L7
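To illustrate Benoit's alignment point with the C++ API, here is a
minimal sketch (LLVM 3.7-era IRBuilder; Vec3 and AllocVec3 are made-up
names) that gives a three-float struct 8-byte alignment at its alloca,
so that a 64-bit + 32-bit copy pair stays naturally aligned.

#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"

llvm::AllocaInst *AllocVec3(llvm::IRBuilder<> &B, llvm::LLVMContext &Ctx) {
  llvm::Type *F = llvm::Type::getFloatTy(Ctx);
  llvm::StructType *Vec3 = llvm::StructType::create({F, F, F}, "Vec3");
  llvm::AllocaInst *A = B.CreateAlloca(Vec3, nullptr, "v");
  // 12-byte struct: request 8-byte alignment instead of float's ABI
  // alignment (4), avoiding a straddling 64-bit store.
  A->setAlignment(8); // 3.7-era API; newer LLVM takes llvm::Align(8)
  return A;
}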
Did we change what you had to do to set the CPU at some point, or am I
misremembering that? If the old JIT automatically went for
-march=native but the new one doesn't that could explain the slowdown.
Tim.
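If Tim's recollection is right, the corresponding fix on the MCJIT side
is to pass the host CPU explicitly when building the engine. A minimal
sketch (BuildHostTunedEngine is a placeholder name):

#include "llvm/ExecutionEngine/ExecutionEngine.h"
#include "llvm/ExecutionEngine/MCJIT.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/Host.h"
#include <memory>

llvm::ExecutionEngine *BuildHostTunedEngine(std::unique_ptr<llvm::Module> M) {
  // Without an explicit MCPU, MCJIT generates code for a generic CPU
  // rather than the host, losing newer ISA extensions that the old JIT
  // may have been using.
  return llvm::EngineBuilder(std::move(M))
      .setMCPU(llvm::sys::getHostCPUName())
      .create();
}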