the purpose of these patches is to reduce the kernel's .text size, in particular if CONFIG_CC_OPTIMIZE_FOR_SIZE is specified. The effect of the patches on x86 is:
i.e. a 5.3% .text reduction (!) with a larger .config, and a 1.2% .text reduction with a smaller .config.
i've also done test-builds with CC_OPTIMIZE_FOR_SIZE disabled:
   text    data     bss     dec     hex filename
4080998  870384  387260 5338642  517612 vmlinux-speed-orig
4084421  872024  387260 5343705  5189d9 vmlinux-speed-inline
4010957  834048  387748 5232753  4fd871 vmlinux-speed-inline+units
so the more flexible inlining did not result in many changes [which is good, we want gcc to inline those in the optimized-for-speed case], but unit-at-a-time optimization resulted in smaller code - very likely meaning speed advantages as well.
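As a rough cross-check of the numbers above, the relative .text changes can be recomputed from the raw byte counts. This is only a sketch: `percent_change` and `print_speed_text_deltas` are helper names made up for illustration, and the byte counts are copied from the size output above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: relative change from `before` to `after`, in percent.
 * A negative result means the section got smaller. */
static double percent_change(double before, double after)
{
    assert(before != 0.0);
    return (after - before) / before * 100.0;
}

/* Print the .text deltas of the two speed-optimized builds relative to the
 * unpatched one, using the byte counts from the table above. */
static void print_speed_text_deltas(void)
{
    const double orig = 4080998.0;       /* vmlinux-speed-orig */
    const double inl = 4084421.0;        /* vmlinux-speed-inline */
    const double inl_units = 4010957.0;  /* vmlinux-speed-inline+units */

    printf("inline only:    %+.2f%%\n", percent_change(orig, inl));
    printf("inline + units: %+.2f%%\n", percent_change(orig, inl_units));
}
```

Calling `print_speed_text_deltas()` gives roughly +0.08% for the inlining patch alone and roughly -1.72% for inlining plus unit-at-a-time, which matches the reading that inlining alone changes little in the optimized-for-speed case.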
unit-at-a-time still increases the kernel stack footprint somewhat (by about 5% in the CC_OPTIMIZE_FOR_SIZE case), but not by the insane degree gcc3 used to, which prompted the original -fno-unit-at-a-time addition.
so i think the combination of the two patches is a win both for small and for large systems. In fact the 5.3% .text reduction for embedded kernels is very significant.
the patches are against -git, and were test-built and test-booted on x86, using gcc 4.0.2.
Why do you mix the two up? I'd assume they are independent, and if they aren't, please explain why?
The forced inlining is not just a good idea. Several versions of gcc would NOT COMPILE the kernel without it. The fact that it works with your configurations and your particular compiler version has absolutely ZERO relevance.
Gcc has had horrible mistakes in inlining functions. Inlining too much, and quite often, not inlining things that absolutely _have_ to be inlined. Trivial things that inline to an instruction or two, but that look complicated because they have a big switch-statement that just happens to be known at compile-time.
And not inlining them not only results in horribly bad code (dynamic tests for something that should be static), but also results in link errors when cases that should be statically unreachable suddenly become reachable after all.
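The pattern described above can be sketched roughly as follows. This is an illustration, not code from the kernel: `store_sized`, `store_val` and `bad_store_size` are hypothetical names, and `__attribute__((always_inline))` is a GCC/Clang extension. In the kernel idiom the "bad size" function is declared but never defined, so a call that survives optimization shows up as a link error; here it is defined (an assumption of this sketch) so that the example also links with optimization disabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Counts calls that reach the "impossible" default case. Defined here only so
 * that the sketch links in every build mode (see the note above). */
static int bad_size_calls;
static void bad_store_size(void) { bad_size_calls++; }

/* Forced inlining makes `size` a constant at every call site, so the switch
 * collapses to a single store and the default case disappears. */
static inline __attribute__((always_inline))
void store_sized(void *dst, uint64_t val, size_t size)
{
    switch (size) {
    case 1: *(uint8_t *)dst = (uint8_t)val; break;
    case 2: *(uint16_t *)dst = (uint16_t)val; break;
    case 4: *(uint32_t *)dst = (uint32_t)val; break;
    case 8: *(uint64_t *)dst = val; break;
    default: bad_store_size(); break;  /* statically unreachable once inlined */
    }
}

/* sizeof() turns the size argument into a compile-time constant. */
#define store_val(ptr, val) store_sized((ptr), (uint64_t)(val), sizeof(*(ptr)))

/* Small usage example for the macro above. */
static void store_demo(void)
{
    uint16_t h = 0;
    uint32_t w = 0;

    store_val(&h, 0x1234);
    store_val(&w, 0xdeadbeefu);
    assert(h == 0x1234 && w == 0xdeadbeefu);
    assert(bad_size_calls == 0);
}
```

If `store_sized()` were emitted out of line instead, the switch would become a runtime test on `size` and the call in the default case would remain in the object file, which is the failure mode described above.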
So the fact that your gcc-4.x version happens to get things right for your case in no way means that you can do this in general.
Also, the inlining patch apparently makes code larger in some cases, so it's not even an unconditional win.
What's the effect of _just_ the "unit-at-a-time" thing which we can (and you did) much more easily make gcc-version-dependent?
> The forced inlining is not just a good idea. Several versions of gcc would NOT COMPILE the kernel without it.
yup that's why the patch only does it for gcc4, in which the inlining heuristics finally got rewritten to something that seems to resemble sanity...
> Also, the inlining patch apparently makes code larger in some cases, so it's not even an unconditional win.
... as long as you give the inlining algorithm enough information. -fno-unit-at-a-time prevents gcc from having the information, and the decisions it makes are then less optimal...
(unit-at-a-time allows gcc to look at the entire .c file, eg things like number of callers etc etc, disabling that tells gcc to do the .c file as single pass top-to-bottom only)
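A small sketch of the kind of file where this matters. The function names are made up for illustration, and whether a particular gcc version actually inlines the helper is not guaranteed; the point is only what information is available to the compiler at the call site.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration only: the body appears further down in the file. */
static unsigned int checksum_step(unsigned int acc, unsigned char byte);

/* The only caller. A strictly top-to-bottom compiler has not seen the body
 * of checksum_step() at this point, so it cannot inline it here. */
unsigned int checksum(const unsigned char *buf, size_t len)
{
    unsigned int acc = 0;

    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        acc = checksum_step(acc, buf[i]);
    return acc;
}

/* With the whole file in view, the compiler can tell that this static
 * function has exactly one caller and may fold it into checksum(). */
static unsigned int checksum_step(unsigned int acc, unsigned char byte)
{
    return (acc * 31u + byte) & 0xffffu;
}
```

The result is the same either way; only the generated code differs, which is why the effect shows up in size numbers rather than in behaviour.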
> yup that's why the patch only does it for gcc4, in which the inlining heuristics finally got rewritten to something that seems to resemble sanity...
Is that actually true of all gcc4 versions? I seem to remember gcc-4.0 being a real stinker.
> > Also, the inlining patch apparently makes code larger in some cases, so it's not even an unconditional win.
> ... as long as you give the inlining algorithm enough information. -fno-unit-at-a-time prevents gcc from having the information, and the decisions it makes are then less optimal...
> (unit-at-a-time allows gcc to look at the entire .c file, eg things like number of callers etc etc, disabling that tells gcc to do the .c file as single pass top-to-bottom only)
I'd still prefer to see numbers with -funit-at-a-time only. I think it's an independent knob, and I'd be much less worried about that, because we do know that unit-at-a-time has been enabled on x86-64 for a long time ("forever"). So that's less of a change, I feel.
On Wed, 2005-12-28 at 13:02 -0800, Linus Torvalds wrote:
> On Wed, 28 Dec 2005, Arjan van de Ven wrote:
> > yup that's why the patch only does it for gcc4, in which the inlining heuristics finally got rewritten to something that seems to resemble sanity...
> Is that actually true of all gcc4 versions? I seem to remember gcc-4.0 being a real stinker.
it is... if you disable unit-at-a-time for sure. But I'm not entirely sure when this got in, if it was 4.0 or 4.1
> > (unit-at-a-time allows gcc to look at the entire .c file, eg things like number of callers etc etc, disabling that tells gcc to do the .c file as single pass top-to-bottom only)
> I'd still prefer to see numbers with -funit-at-a-time only. I think it's an independent knob, and I'd be much less worried about that, because we do know that unit-at-a-time has been enabled on x86-64 for a long time ("forever"). So that's less of a change, I feel.
the only effect I expect is more inlining actually, since we on the one hand tie gcc's hands via the forced inline, and on the other hand now give it more room to inline more. But yeah it's worth looking at for sure, even if it is only to see that it's getting bigger ;)
> > yup that's why the patch only does it for gcc4, in which the inlining heuristics finally got rewritten to something that seems to resemble sanity...
> Is that actually true of all gcc4 versions? I seem to remember gcc-4.0 being a real stinker.
all my tests were with gcc 4.0.2.
> > > Also, the inlining patch apparently makes code larger in some cases, so it's not even an unconditional win.
> > ... as long as you give the inlining algorithm enough information. -fno-unit-at-a-time prevents gcc from having the information, and the decisions it makes are then less optimal...
> > (unit-at-a-time allows gcc to look at the entire .c file, eg things like number of callers etc etc, disabling that tells gcc to do the .c file as single pass top-to-bottom only)
> I'd still prefer to see numbers with -funit-at-a-time only. I think it's an independent knob, and I'd be much less worried about that, because we do know that unit-at-a-time has been enabled on x86-64 for a long time ("forever"). So that's less of a change, I feel.
the two patches are completely independent, and the only reason i did them together was because i was looking at .text size in general and these were the two things that made a difference. Also, the inlining was a loss in one of the .config's, unless combined with the wider-scope unit-at-a-time optimization.
(there's a third thing that i was also playing with, -ffunction-sections and -fdata-sections, but those don't seem to be reliable on the binutils side yet.)
here are the isolated unit-at-a-time numbers as well:
so both inlining and unit-at-a-time are a win independently [although inlining alone does bloat .data], but applied together they bring an additional 1.6% of .text savings. All builds done with:
gcc version 4.0.2 20051109 (Red Hat 4.0.2-6)
how about giving the inlining stuff some more exposure in -mm (if it's fine with Andrew), to check for any regressions? I'd suggest the same for the unit-at-a-time thing too, in any case.
> (there's a third thing that i was also playing with, -ffunction-sections and -fdata-sections, but those don't seem to be reliable on the binutils side yet.)
> here are the isolated unit-at-a-time numbers as well:
> so both inlining and unit-at-a-time are a win independently [although inlining alone does bloat .data], but applied together they bring an additional 1.6% of .text savings. All builds done with:
> gcc version 4.0.2 20051109 (Red Hat 4.0.2-6)
> how about giving the inlining stuff some more exposure in -mm (if it's fine with Andrew), to check for any regressions? I'd suggest the same for the unit-at-a-time thing too, in any case.
another thing: i wanted to decrease the size of -Os (CONFIG_CC_OPTIMIZE_FOR_SIZE) kernels, which e.g. Fedora uses too (to keep the icache footprint down).
I think gcc should arguably not be forced to inline things when doing -Os, and it's also expected to mess up much less than when optimizing for speed. So maybe forced inlining should be dependent on !CONFIG_CC_OPTIMIZE_FOR_SIZE?
I.e. like the patch below?
Ingo
-----------------> Subject: allow gcc4 to control inlining
allow gcc4 compilers to decide what to inline and what not - instead of the kernel forcing gcc to inline all the time.
Signed-off-by: Ingo Molnar <mi...@elte.hu>
Signed-off-by: Arjan van de Ven <ar...@infradead.org>
----
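The diff itself is not reproduced here, and the following is not taken from it. It is only a hedged sketch of the idea discussed above: make forced inlining conditional on a size-optimization config symbol. `demo_inline`, `DEMO_FORCED_INLINE` and `clamp_to_byte` are hypothetical names, the config symbol is modelled as a plain preprocessor define, and `__attribute__((always_inline))` is a GCC/Clang extension.

```c
#include <assert.h>

/* Hypothetical stand-in for the Kconfig symbol; define it to model an -Os build. */
/* #define CONFIG_CC_OPTIMIZE_FOR_SIZE 1 */

#ifdef CONFIG_CC_OPTIMIZE_FOR_SIZE
/* Size-optimized build: "inline" stays a hint the compiler may ignore. */
# define demo_inline inline
# define DEMO_FORCED_INLINE 0
#else
/* Otherwise: keep forcing the compiler to inline. */
# define demo_inline inline __attribute__((always_inline))
# define DEMO_FORCED_INLINE 1
#endif

/* Any function marked with the macro picks up whichever policy is active. */
static demo_inline int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    return v > 255 ? 255 : v;
}

/* Small usage example: behaviour is identical under both policies. */
static void clamp_demo(void)
{
    assert(clamp_to_byte(-5) == 0);
    assert(clamp_to_byte(42) == 42);
    assert(clamp_to_byte(300) == 255);
}
```

The sketch uses a separate macro name so that it compiles as an ordinary user-space file; it says nothing about how the actual patch spells the change.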
Ingo Molnar <mi...@elte.hu> writes:
>> gcc version 4.0.2 20051109 (Red Hat 4.0.2-6)
> another thing: i wanted to decrease the size of -Os (CONFIG_CC_OPTIMIZE_FOR_SIZE) kernels, which e.g. Fedora uses too (to keep the icache footprint down).
Remember the above gcc miscompiles the x86-32 kernel with -Os:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173764
> how about giving the inlining stuff some more exposure in -mm (if it's fine with Andrew), to check for any regressions? I'd suggest the same for the unit-at-a-time thing too, in any case.
I am willing to try the patches on both ia32 and ppc (which is what I have at hand). I'm using Debian testing, but I can perhaps give GCC 4.1 a shot (if I can get my hands on such a patched tree soon enough).
I am interested in anything that could bring me memory reduction. Actually, I am even considering using the -tiny patches here on my father's computer---an old Pentium MMX 200MHz with 64MB of RAM.
Also, the PowerMac 9500 that I have here was inherited from my uncle, and it has a slow SCSI disk (only 2MB/s transfer rate) and 192MB of RAM. Anything that keeps it from hitting swap is a plus, as you can imagine.
> I think gcc should arguably not be forced to inline things when doing -Os, and it's also expected to mess up much less than when optimizing for speed. So maybe forced inlining should be dependent on !CONFIG_CC_OPTIMIZE_FOR_SIZE?
When it comes to inlining I just don't trust gcc as far as I can spit it. We're putting the kernel at the mercy of future random brainfarts and bugs from the gcc guys. It would be better and safer IMO to continue to force `inline' to have strict and sane semantics, and to simply be vigilant about our use of it.
IOW: I'd prefer that we be the ones who specify which functions are going to be inlined and which ones are not.
If no-forced-inlining makes the kernel smaller then we probably have (yet more) incorrect inlining. We should hunt those down and fix them. We did quite a lot of this in 2.5.x/2.6.early. Didn't someone have a script which would identify which functions are a candidate for uninlining?
> the purpose of these patches is to reduce the kernel's .text size, in particular if CONFIG_CC_OPTIMIZE_FOR_SIZE is specified. The effect of the patches on x86 is:
>    text    data     bss     dec     hex filename
> 3286166  869852  387260 4543278  45532e vmlinux-orig
> 3194123  955168  387260 4536551  4538e7 vmlinux-inline
>...
The most interesting question is: Which object files do these savings come from?
We have two cases in the kernel:
- header files where forced inlining is required
- C files where forced inlining is nearly always wrong
The classic example is a function that was marked "inline" when it was tiny and had one caller, but is now huge and has many callers.
An interesting number would be the space saving after doing some kind of s/inline//g in all .c files.
> unit-at-a-time still increases the kernel stack footprint somewhat (by about 5% in the CC_OPTIMIZE_FOR_SIZE case), but not by the insane degree gcc3 used to, which prompted the original -fno-unit-at-a-time addition.
>...
Please hold off this patch.
I do already plan to look at this after the smoke has cleared after the 4k stacks issue. I want to avoid two different knobs both with negative effects on stack usage (currently CONFIG_4KSTACKS=y, and after your patch gcc >= 4.0) giving a low testing coverage of the worst cases.
> Ingo
cu Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out of the darkness. There had been need of rain for many days. "Only a promise," Lao Er said. Pearl S. Buck - Dragon Seed
> > I think gcc should arguably not be forced to inline things when doing -Os, and it's also expected to mess up much less than when optimizing for speed. So maybe forced inlining should be dependent on !CONFIG_CC_OPTIMIZE_FOR_SIZE?
> When it comes to inlining I just don't trust gcc as far as I can spit it. We're putting the kernel at the mercy of future random brainfarts and bugs from the gcc guys. It would be better and safer IMO to continue to force `inline' to have strict and sane semantics, and to simply be vigilant about our use of it.
i think there's quite an attitude here - we are at the mercy of "gcc brainfarts" anyway, and users are at the mercy of "kernel brainfarts" just as much. Should users disable swapping and trash-talk it just because the Linux kernel used to have a poor VM? (And the gcc folks are certainly listening - it's not like they were unwilling to fix stuff, they simply had their own decade-old technological legacies that made certain seemingly random problems much harder to attack. E.g. -Os has recently been improved quite significantly in to-be-gcc-4.2.)
at least let us allow gcc to do it in the CONFIG_CC_OPTIMIZE_FOR_SIZE case, -Os means "optimize for space" - no ifs and whens, it's a _very_ clear and definite goal. I don't think there's much space for gcc to mess up there, it's a mostly binary decision: either the inlining of a particular function saves space, or not.
in the other case, when optimizing for speed, the decisions are a lot less clear, and gcc has arguably a lot more leeway to mess up.
also, there's a fundamental conflict of 'speed vs. performance' here, for a certain boundary region. For the extremes, very small and very large functions, the decision is clear, but if e.g. a CPU has tons of cache, it might prefer more aggressive inlining than if it doesn't. So it's not like we can do it in a fully static manner.
> If no-forced-inlining makes the kernel smaller then we probably have (yet more) incorrect inlining. We should hunt those down and fix them. We did quite a lot of this in 2.5.x/2.6.early. Didn't someone have a script which would identify which functions are a candidate for uninlining?
this is going to be a never-ending battle, and it's not about peanuts either: we are talking about 5% of .text space here, on a .config that carries most of the important subsystems and drivers. Do we really want to take on this battle and fight it for 30,000+ kernel functions - when gcc today can arguably do a _better_ job than what we attempted to do manually for years? We went to great trouble going to BK just to make development easier - shouldn't we let a fully open-source tool like gcc make our lives easier and not worry about details like that? Whether to inline or not _is_ mostly thoughtless work with almost zero intellect in it. I'd rather trust gcc to do it than some script doing the same much worse.
> > another thing: i wanted to decrease the size of -Os (CONFIG_CC_OPTIMIZE_FOR_SIZE) kernels, which e.g. Fedora uses too (to keep the icache footprint down).
> Remember the above gcc miscompiles the x86-32 kernel with -Os:
i'm not sure what the point is. There was no sudden rush of -Os related bugs when Fedora switched to it for the kernel, and the 35% code-size savings were certainly worth it in terms of icache footprint. Yes, -Os is a major change for how the compiler works, and the kernel is a major piece of software.
> IOW: I'd prefer that we be the ones who specify which functions are going to be inlined and which ones are not.
a bold statement... especially since the "and which ones are not" isn't currently there, we still leave gcc a lot of freedom there ... but only in one direction.
> > unit-at-a-time still increases the kernel stack footprint somewhat (by about 5% in the CC_OPTIMIZE_FOR_SIZE case), but not by the insane degree gcc3 used to, which prompted the original -fno-unit-at-a-time addition.
> >...
> Please hold off this patch.
> I do already plan to look at this after the smoke has cleared after the 4k stacks issue. I want to avoid two different knobs both with negative effects on stack usage (currently CONFIG_4KSTACKS=y, and after your patch gcc >= 4.0) giving a low testing coverage of the worst cases.
this is obviously not 2.6.15 stuff, so we've got enough time to see the effects. [ And what does "I do plan to look at this" mean? When precisely, and can i thus go to other topics without the issue being dropped on the floor indefinitely? ]
also note that the inlining patch actually _reduces_ average stack footprint by ~3-4%:
                                  orig   +inlining
# of functions above 256 bytes:    683         660
total stackspace, bytes:        148492      142884
it is the unit-at-a-time patch that increases stack footprint (by about 7-8%, which together with the inlining patch gives a net ~5%).
On Thu, Dec 29, 2005 at 08:41:07AM +0100, Ingo Molnar wrote:
> * Krzysztof Halasa <k...@pm.waw.pl> wrote:
> > Ingo Molnar <mi...@elte.hu> writes:
> > >> gcc version 4.0.2 20051109 (Red Hat 4.0.2-6)
> > > another thing: i wanted to decrease the size of -Os (CONFIG_CC_OPTIMIZE_FOR_SIZE) kernels, which e.g. Fedora uses too (to keep the icache footprint down).
> > Remember the above gcc miscompiles the x86-32 kernel with -Os:
> > https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=173764
> i'm not sure what the point is. There was no sudden rush of -Os related bugs when Fedora switched to it for the kernel, and the 35% code-size savings were certainly worth it in terms of icache footprint. Yes, -Os is a major change for how the compiler works, and the kernel is a major piece of software.
> > > unit-at-a-time still increases the kernel stack footprint somewhat (by about 5% in the CC_OPTIMIZE_FOR_SIZE case), but not by the insane degree gcc3 used to, which prompted the original -fno-unit-at-a-time addition.
> > >...
> > Please hold off this patch.
> > I do already plan to look at this after the smoke has cleared after the 4k stacks issue. I want to avoid two different knobs both with negative effects on stack usage (currently CONFIG_4KSTACKS=y, and after your patch gcc >= 4.0) giving a low testing coverage of the worst cases.
> this is obviously not 2.6.15 stuff, so we've got enough time to see the effects. [ And what does "I do plan to look at this" mean? When precisely, and can i thus go to other topics without the issue being dropped on the floor indefinitely? ]
It won't be dropped on the floor indefinitely.
"I do plan to look at this" means that I'd currently estimate this being 2.6.19 stuff.
Yes that's one year from now, but we need it properly analyzed and tested before getting it into Linus' tree, and I do really want it untangled from and therefore after 4k stacks.
> also note that the inlining patch actually _reduces_ average stack footprint by ~3-4%:
>                                   orig   +inlining
> # of functions above 256 bytes:    683         660
> total stackspace, bytes:        148492      142884
> it is the unit-at-a-time patch that increases stack footprint (by about 7-8%, which together with the inlining patch gives a net ~5%).
The problem with the stack is that average stack usage is relatively uninteresting - what matters is the worst case stack usage. And I'd expect the stack footprint improvements you see with less inlining to be in different places than the deteriorations with unit-at-a-time.
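To make the distinction concrete, here is a toy model in which the total of all frame sizes goes down while the deepest call chain gets worse. Everything in it is made up for illustration (the struct, the function names, and all the numbers); it is not derived from any kernel measurement.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4

struct func {
    int frame_bytes;            /* stack used by this function's own frame */
    int ncallees;               /* number of valid entries in callees[] */
    int callees[MAX_CALLEES];   /* indices into the same table */
};

/* Sum of all frame sizes: the kind of total an average footprint is based on. */
static int total_frames(const struct func *tab, size_t n)
{
    int sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += tab[i].frame_bytes;
    return sum;
}

/* Deepest stack reachable from tab[idx]; assumes an acyclic call graph. */
static int worst_case_stack(const struct func *tab, int idx)
{
    int deepest = 0;

    assert(tab[idx].ncallees <= MAX_CALLEES);
    for (int i = 0; i < tab[idx].ncallees; i++) {
        int d = worst_case_stack(tab, tab[idx].callees[i]);
        if (d > deepest)
            deepest = d;
    }
    return tab[idx].frame_bytes + deepest;
}

/* Made-up call graph in two builds: f0 calls f1 and f2, f1 calls f3. */
static const struct func build_a[4] = {
    { 100, 2, { 1, 2 } }, { 200, 1, { 3 } }, { 300, 0, { 0 } }, { 400, 0, { 0 } },
};
static const struct func build_b[4] = {
    { 100, 2, { 1, 2 } }, { 260, 1, { 3 } }, { 100, 0, { 0 } }, { 440, 0, { 0 } },
};
```

In this toy example the total drops from 1000 to 900 bytes between the two builds, yet the deepest chain (f0, f1, f3) grows from 700 to 800 bytes, because the savings and the growth sit in different functions.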
> Ingo
cu Adrian
> another thing: i wanted to decrease the size of -Os (CONFIG_CC_OPTIMIZE_FOR_SIZE) kernels, which e.g. Fedora uses too (to keep the icache footprint down).
> I think gcc should arguably not be forced to inline things when doing -Os, and it's also expected to mess up much less than when optimizing for speed. So maybe forced inlining should be dependent on !CONFIG_CC_OPTIMIZE_FOR_SIZE?
I don't care too much whether we put always_inline or inline on the functions we _really_ want to inline. But all others shouldn't have any inline marker. So instead of changing the pretty useful redefinitions we have to keep the code a little more readable, what about getting rid of all the stupid inlines we have all over the place? I think many things we have as static inline in headers now should move to proper out-of-line functions. This is more work, but also more useful than just flipping a bit.
> I don't care too much whether we put always_inline or inline on the functions we _really_ want to inline. But all others shouldn't have any inline marker. So instead of changing the pretty useful redefinitions we have to keep the code a little more readable, what about getting rid of all the stupid inlines we have all over the place?
just in drivers/ there are well over 6400 of those. Changing most of those is going to be a huge effort. The reality is, most driver writers (in fact kernel code writers) tend to overestimate the gain of inline in THEIR code, and to underestimate the cumulative cost of it. Despite what akpm says, I think gcc can make a better judgement than most of these authors (probably including me :). We can remove 6400 now, but a year from now, another 1000 have been added back again I bet.
You describe a nice utopia where only the most essential functions are inlined.. but so far that hasn't worked out all that well ;) Turning "inline" back into the hint to the compiler that the C language makes it is maybe a cop-out, but it's a sustainable approach at least.
> I think many things we have static inline in headers now should move to proper out of line functions.
I suspect the biggest gains aren't the ones in the headers; those tend to be quite small and often mostly optimize away due to constant arguments (there may be a few exceptions of course), and also have been attacked by various people in the 2.5/2.6 series before. It's the local functions that got too many "inline" hints.
Ingo Molnar <mi...@elte.hu> wrote:
> * Andrew Morton <a...@osdl.org> wrote:
> > Ingo Molnar <mi...@elte.hu> wrote:
> > > I think gcc should arguably not be forced to inline things when doing -Os, and it's also expected to mess up much less than when optimizing for speed. So maybe forced inlining should be dependent on !CONFIG_CC_OPTIMIZE_FOR_SIZE?
> > When it comes to inlining I just don't trust gcc as far as I can spit it. We're putting the kernel at the mercy of future random brainfarts and bugs from the gcc guys. It would be better and safer IMO to continue to force `inline' to have strict and sane semantics, and to simply be vigilant about our use of it.
> i think there's quite an attitude here - we are at the mercy of "gcc brainfarts" anyway, and users are at the mercy of "kernel brainfarts" just as much. Should users disable swapping and trash-talk it just because the Linux kernel used to have a poor VM? (And the gcc folks are certainly listening - it's not like they were unwilling to fix stuff, they simply had their own decade-old technological legacies that made certain seemingly random problems much harder to attack. E.g. -Os has recently been improved quite significantly in to-be-gcc-4.2.)
Also, we do trust gcc not to screw up on lots of other stuff. I.e., we trust it to use registers wisely (register anyone?), to set up sane counting loops and related array handling (no one is using pointers to traverse arrays "for speed" anymore), and to select the best code sequence for the machine at hand in lots of cases, ... And not only for the kernel, for the whole userspace too!
Sure, this is a large change, and it might be warranted to place it under CONFIG_NEW_COMPILER_OPTIONS (Marked experimental, high explosive, etc if it makes you too uneasy). -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513
> > IOW: I'd prefer that we be the ones who specify which functions are going to be inlined and which ones are not.
> a bold statement... especially since the "and which ones are not" isn't currently there, we still leave gcc a lot of freedom there ... but only in one direction.
Besides, this is currently an everywhere or nowhere switch. gcc (in principle at least) could decide which calls to inline and for which ones it isn't worth it. Just like the (also long to die) "register" keyword.
On Thu, Dec 29, 2005 at 03:54:09PM +0100, Arjan van de Ven wrote:
> > I don't care too much whether we put always_inline or inline on the functions we _really_ want to inline. But all others shouldn't have any inline marker. So instead of changing the pretty useful redefinitions we have to keep the code a little more readable, what about getting rid of all the stupid inlines we have all over the place?
> just in drivers/ there are well over 6400 of those. Changing most of those is going to be a huge effort. The reality is, most driver writers (in fact kernel code writers) tend to overestimate the gain of inline in THEIR code, and to underestimate the cumulative cost of it. Despite what akpm says, I think gcc can make a better judgement than most of these authors (probably including me :). We can remove 6400 now, but a year from now, another 1000 have been added back again I bet.
Are we that bad reviewing code?
An "inline" in a .c file is simply nearly always wrong in the kernel, and unless the author has a good justification for it it should be removed.
> You describe a nice utopia where only the most essential functions are inlined.. but so far that hasn't worked out all that well ;) Turning "inline" back into the hint to the compiler that the C language makes it is maybe a cop-out, but it's a sustainable approach at least.
>...
But shouldn't nowadays gcc be able to know best even without an "inline" hint?
cu Adrian
> > You describe a nice utopia where only the most essential functions are inlined.. but so far that hasn't worked out all that well ;) Turning "inline" back into the hint to the compiler that the C language makes it is maybe a cop-out, but it's a sustainable approach at least.
> >...
> But shouldn't nowadays gcc be able to know best even without an "inline" hint?
it will, the inline hint only affects the thresholds so it's not entirely without effects, but I can imagine that there are cases that truly are performance critical and can be optimized out and where you do want to help gcc a bit (say a one line wrapper around readl or writel). Otoh I suspect that modern gcc will be more than smart enough and inline one liners anyway (if they're static of course).
> > > I think gcc should arguably not be forced to inline things when doing -Os, and it's also expected to mess up much less than when optimizing for speed. So maybe forced inlining should be dependent on !CONFIG_CC_OPTIMIZE_FOR_SIZE?
> > When it comes to inlining I just don't trust gcc as far as I can spit it. We're putting the kernel at the mercy of future random brainfarts and bugs from the gcc guys. It would be better and safer IMO to continue to force `inline' to have strict and sane semantics, and to simply be vigilant about our use of it.
> i think there's quite an attitude here - we are at the mercy of "gcc brainfarts" anyway, and users are at the mercy of "kernel brainfarts" just as much. Should users disable swapping and trash-talk it just because the Linux kernel used to have a poor VM? (And the gcc folks are certainly listening - it's not like they were unwilling to fix stuff, they simply had their own decade-old technological legacies that made certain seemingly random problems much harder to attack. E.g. -Os has recently been improved quite significantly in to-be-gcc-4.2.)
>...
> also, there's a fundamental conflict of 'speed vs. performance' here, for a certain boundary region. For the extremes, very small and very large functions, the decision is clear, but if e.g. a CPU has tons of cache, it might prefer more aggressive inlining than if it doesn't. So it's not like we can do it in a fully static manner.
>...
I'd formulate it the other way round from Andrew:
We should force gcc to inline code where we do know best ("static inline"s in header files) and leave the decision to gcc in the cases where gcc should know best controlled by some high-level knobs like -Os/-O2.
gcc simply needs to be forced to inline in some cases in which we really need inlining, but in all other cases gcc knows best and we can trust gcc to make the right decision.
> Ingo
cu Adrian
On Thu, Dec 29, 2005 at 04:35:29PM +0100, Adrian Bunk wrote:
> > You describe a nice utopia where only the most essential functions are inlined.. but so far that hasn't worked out all that well ;) Turning "inline" back into the hint to the compiler that the C language makes it is maybe a cop-out, but it's a sustainable approach at least.
> >...
> But shouldn't nowadays gcc be able to know best even without an "inline" hint?
Only for static functions (and in -funit-at-a-time mode). Anything else would require full IMA over the whole kernel and we aren't there yet. So inline hints are useful. But most of the inline keywords in the kernel really should be that, hints, because e.g. while it can be beneficial to inline something on one arch, it may not be beneficial on another arch, depending on cache sizes, number of general registers available to the compiler, register pressure, speed of the call/ret pair, calling convention and many other factors.