Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[patch 00/17] Text Edit Lock and Immediate Values for 2.6.25-rc8-mm1

2 views
Skip to first unread message

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:22:50 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell
Hi Andrew,

Here is the complete patchset required to get basic Immediate Values support
ported to 2.6.25-rc8-mm1. It provides the "simple", non nmi-safe version of
immediate values, which means that immediate values should not be used in code
paths reachable by NMI or MCE handlers. This version also uses
stop_machine_run() for immediate values updates, which is a very heavy lock. In
order to make incremental, easy to review, changes, I think we could start by
this "simple" version as a first step before we switch to the nmi-safe version
later.

It applies at the end of your series files in the following order :

# The following patches are required for any kind of immediate values
# implementation
kprobes-use-mutex-for-insn-pages.patch
kprobes-dont-use-kprobes-mutex-in-arch-code.patch
kprobes-declare-kprobes-mutex-static.patch

x86-enhance-debug-rodata-support-alternatives.patch
fix-text-poke-for-vmalloced-pages.patch
x86-enhance-debug-rodata-support-for-hotplug-and-kprobes.patch

text-edit-lock-architecture-independent-code.patch
text-edit-lock-kprobes-architecture-independent-support.patch

# The following patches provide non nmi-safe immediate values
add-all-cpus-option-to-stop-machine-run.patch
immediate-values-architecture-independent-code.patch
implement-immediate-update-via-stop-machine-run.patch
immediate-values-kconfig-menu-in-embedded.patch
immediate-values-x86-optimization.patch
add-text-poke-and-sync-core-to-powerpc.patch
immediate-values-powerpc-optimization.patch
immediate-values-documentation.patch

#Those are the immediate values users
scheduler-profiling-use-immediate-values.patch

Thanks,

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:23:04 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Mathieu Desnoyers, Ananth N Mavinakayanahalli, anil.s.kes...@intel.com, da...@davemloft.net
kprobes-dont-use-kprobes-mutex-in-arch-code.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:23:31 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Mathieu Desnoyers, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
immediate-values-kconfig-menu-in-embedded.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:23:43 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Mathieu Desnoyers, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
scheduler-profiling-use-immediate-values.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:24:00 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Mathieu Desnoyers, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
immediate-values-documentation.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:24:23 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Mathieu Desnoyers, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
immediate-values-architecture-independent-code.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:24:39 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Mathieu Desnoyers, Christoph Hellwig, Paul Mackerras, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
add-text-poke-and-sync-core-to-powerpc.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:24:53 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Mathieu Desnoyers, Christoph Hellwig, Paul Mackerras, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
immediate-values-powerpc-optimization.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:25:14 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Mathieu Desnoyers, page...@freemail.hu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jeremy Fitzhardinge
x86-enhance-debug-rodata-support-alternatives.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:26:08 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Mathieu Desnoyers, page...@freemail.hu, Thomas Gleixner, Ingo Molnar, H. Peter Anvin, Jeremy Fitzhardinge
x86-enhance-debug-rodata-support-for-hotplug-and-kprobes.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:26:36 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Jason Baron, Mathieu Desnoyers, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
add-all-cpus-option-to-stop-machine-run.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:26:47 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Jason Baron, Mathieu Desnoyers, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
implement-immediate-update-via-stop-machine-run.patch

Mathieu Desnoyers

unread,
Apr 9, 2008, 12:27:10 PM4/9/08
to ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Mathieu Desnoyers, Andi Kleen, H. Peter Anvin, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
immediate-values-x86-optimization.patch

Alexey Dobriyan

unread,
Apr 9, 2008, 2:15:01 PM4/9/08
to Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Jason Baron, Adrian Bunk, Christoph Hellwig, ak...@osdl.org
Please, stop Cc'ing me on this.

I continue to think that all this so-called "immediate" infrastructure
is an absolute overkill and declared D-cache line savings will _never_
matter and will never be measured on real life workloads. Right now you
claim 1 (one) cacheline saving in schedule() which can be trivially
moved under CONFIG_PROFILING umbrella. On the other hand is more ugly
assembler code. Sorry, people add infra with much better good/bad ratios.

Also, my gut feeling of a guy who is also on receiveing end of bugreports
at SWsoft that line with .text games was crossed by SMP alternatives and
fooprobes. The rest won't matter because programs like firefox will fuck
up L1, L2 caches in one blow just by showing animation.

We don't need bugs when immediate update screwed up fighting for
cacheline.

And bugs when screwup happened out of the window of "Code:" printout, so
you won't have a chance to compare relevant vmlinux .text and printout.

And bugs when CPU executed bullshit and silently rebooted.

And so on.

Alexey, more and more liking OpenBSD
where such games won't even
hit mailing lists

H. Peter Anvin

unread,
Apr 9, 2008, 2:16:40 PM4/9/08
to Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
Mathieu Desnoyers wrote:
> Ok, so the most flexible solution that I see, that should fit for both
> x86 and x86_64 would be :
> 1 byte : "=q" : "a", "b", "c", or "d" register for the i386. For
> x86-64 it is equivalent to "r" class (for 8-bit
> instructions that do not use upper halves).
> 2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is in a
> general register.

Any reason to keep carrying this completely misleading comment chunk still?

-hpa

Andi Kleen

unread,
Apr 9, 2008, 2:20:37 PM4/9/08
to Alexey Dobriyan, Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Jason Baron, Adrian Bunk, Christoph Hellwig, ak...@osdl.org
On Wed, Apr 09, 2008 at 10:10:10PM +0400, Alexey Dobriyan wrote:
> I continue to think that all this so-called "immediate" infrastructure
> is an absolute overkill and declared D-cache line savings will _never_
> matter

You are quite wrong on that.

> and will never be measured on real life workloads. Right now you
> claim 1 (one) cacheline saving in schedule() which can be trivially

That is just an example. Once the infrastructure is in a lot more
flags would move it into it. I think eventually most sysctls
should be immediate values for once.

> Also, my gut feeling of a guy who is also on receiveing end of bugreports
> at SWsoft that line with .text games was crossed by SMP alternatives and

You already lost -- Linux regularly rewrites itself. Ok not quite yet
but self modifying code is already wide spread and happens commonly
(e.g. with alternatives and some other cases)

>
> And bugs when CPU executed bullshit and silently rebooted.

So far nobody has seen that and the probably of it actually happening
is rather remote too.


> And so on.
>
> Alexey, more and more liking OpenBSD
> where such games won't even
> hit mailing lists

Maybe that is why Linux scales to large systems and OpenBSD
doesn't ...

-Andi

Mathieu Desnoyers

unread,
Apr 9, 2008, 2:55:10 PM4/9/08
to Alexey Dobriyan, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Jason Baron, Adrian Bunk, Christoph Hellwig, ak...@osdl.org
* Alexey Dobriyan (adob...@gmail.com) wrote:
> Please, stop Cc'ing me on this.
>

Fixed.

> I continue to think that all this so-called "immediate" infrastructure
> is an absolute overkill and declared D-cache line savings will _never_
> matter and will never be measured on real life workloads. Right now you
> claim 1 (one) cacheline saving in schedule() which can be trivially
> moved under CONFIG_PROFILING umbrella. On the other hand is more ugly
> assembler code. Sorry, people add infra with much better good/bad ratios.
>
> Also, my gut feeling of a guy who is also on receiveing end of bugreports
> at SWsoft that line with .text games was crossed by SMP alternatives and
> fooprobes. The rest won't matter because programs like firefox will fuck
> up L1, L2 caches in one blow just by showing animation.
>

I did not know the only thing we should ever care about as kernel
developers was Firefox ? ;)

I find it quite amusing that at one end of the spectrum we have Ingo
fearing about a few bytes added to the scheduler object even when they
are never loaded in cache, and that at the complete other extreme we
find people arguing that cache lines used by the kernel does not matter
because this and this userspace app is-oh-so-bloated.

Am I the only one interested in finally getting a tracer infrastructure
in the Linux kernel ? (yes, this is the original objective behind
immediate values by the way)

Mathieu

> We don't need bugs when immediate update screwed up fighting for
> cacheline.
>
> And bugs when screwup happened out of the window of "Code:" printout, so
> you won't have a chance to compare relevant vmlinux .text and printout.
>
> And bugs when CPU executed bullshit and silently rebooted.
>
> And so on.
>
> Alexey, more and more liking OpenBSD
> where such games won't even
> hit mailing lists
>

--

Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

Mathieu Desnoyers

unread,
Apr 9, 2008, 3:08:31 PM4/9/08
to H. Peter Anvin, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
* H. Peter Anvin (h...@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> Ok, so the most flexible solution that I see, that should fit for both
>> x86 and x86_64 would be :
>> 1 byte : "=q" : "a", "b", "c", or "d" register for the i386. For
>> x86-64 it is equivalent to "r" class (for 8-bit
>> instructions that do not use upper halves).
>> 2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is
>> in a
>> general register.
>
> Any reason to keep carrying this completely misleading comment chunk still?
>
> -hpa

This comment explains why I use the =q constraint for the 1 bytes
immediate value. It makes sure we use an instruction with 1-byte opcode,
without REX.R prefix, on x86_64.

That's required for the NMI-safe version of the immediate values, which
uses a breakpoint, but not for this version based on stop_machine_run().
However, to minimize the amount of changes between the two versions, I
left the =q constraint, which is more restrictive. Is it worth it to use
=r instead ? It will typically let the compiler use a wider range of
registers on x86_64.

Thanks,

Mathieu


--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

Masami Hiramatsu

unread,
Apr 9, 2008, 4:10:58 PM4/9/08
to Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Ananth N Mavinakayanahalli, anil.s.kes...@intel.com, da...@davemloft.net
Mathieu Desnoyers wrote:
> Remove the kprobes mutex from kprobes.h, since it does not belong there. Also
> remove all use of this mutex in the architecture specific code, replacing it by
> a proper mutex lock/unlock in the architecture agnostic code.
>
> Changelog :
> - remove unnecessary kprobe_mutex around arch_remove_kprobe()
>
> Signed-off-by: Mathieu Desnoyers <mathieu....@polymtl.ca>
> Acked-by: Ananth N Mavinakayanahalli <ana...@in.ibm.com>
> CC: anil.s.kes...@intel.com
> CC: da...@davemloft.net

Acked-by: Masami Hiramatsu <mhir...@redhat.com>

Thanks,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhir...@redhat.com

Mathieu Desnoyers

unread,
Apr 9, 2008, 4:21:27 PM4/9/08
to H. Peter Anvin, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
* H. Peter Anvin (h...@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> Ok, so the most flexible solution that I see, that should fit for both
>> x86 and x86_64 would be :
>> 1 byte : "=q" : "a", "b", "c", or "d" register for the i386. For
>> x86-64 it is equivalent to "r" class (for 8-bit
>> instructions that do not use upper halves).
>> 2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is
>> in a
>> general register.
>
> Any reason to keep carrying this completely misleading comment chunk still?
>
> -hpa

Hrm, since even the nmi-safe version supports REX-prefixed instructions,
there is no need for an =q constraint on single-byte immediate values
anymore. (thanks to your "discard" section used in the nmi-safe version)

Here is the updated patch for the "[patch 13/17] Immediate Values - x86
Optimization". Thanks!

Mathieu

Immediate Values - x86 Optimization

x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.

Changelog:
- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
non atomic writes to a code region only touched by us (nobody can execute it
since we are protected by the imv_mutex).
- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.
- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
immediate structure.
- Vastly simplified, using a busy looping IPI with interrupts disabled.
Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.

Signed-off-by: Mathieu Desnoyers <mathieu....@polymtl.ca>
CC: Andi Kleen <a...@muc.de>
CC: "H. Peter Anvin" <h...@zytor.com>
CC: Chuck Ebbert <ceb...@redhat.com>
CC: Christoph Hellwig <h...@infradead.org>
CC: Jeremy Fitzhardinge <jer...@goop.org>
CC: Thomas Gleixner <tg...@linutronix.de>
CC: Ingo Molnar <mi...@redhat.com>
CC: Rusty Russell <ru...@rustcorp.com.au>
CC: Adrian Bunk <bu...@stusta.de>
CC: ak...@osdl.org
---
arch/x86/Kconfig | 1
include/asm-x86/immediate.h | 77 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 78 insertions(+)

Index: linux-2.6-lttng/include/asm-x86/immediate.h
===================================================================
--- /dev/null 1970-01-01 00:00:00.000000000 +0000
+++ linux-2.6-lttng/include/asm-x86/immediate.h 2008-04-09 15:02:34.000000000 -0400
@@ -0,0 +1,67 @@
+#ifndef _ASM_X86_IMMEDIATE_H
+#define _ASM_X86_IMMEDIATE_H
+
+/*
+ * Immediate values. x86 architecture optimizations.
+ *
+ * (C) Copyright 2006 Mathieu Desnoyers <mathieu....@polymtl.ca>
+ *
+ * This file is released under the GPLv2.
+ * See the file COPYING for more details.
+ */
+
+#include <asm/asm.h>
+
+/**
+ * imv_read - read immediate variable
+ * @name: immediate value name
+ *
+ * Reads the value of @name.
+ * Optimized version of the immediate.
+ * Do not use in __init and __exit functions. Use _imv_read() instead.
+ * If size is bigger than the architecture long size, fall back on a memory
+ * read.
+ *
+ * Make sure to populate the initial static 64 bits opcode with a value
+ * what will generate an instruction with 8 bytes immediate value (not the REX.W
+ * prefixed one that loads a sign extended 32 bits immediate value in a r64
+ * register).
+ */
+#define imv_read(name) \
+ ({ \
+ __typeof__(name##__imv) value; \
+ BUILD_BUG_ON(sizeof(value) > 8); \
+ switch (sizeof(value)) { \
+ case 1: \
+ case 2: \
+ case 4: \
+ asm(".section __imv,\"a\",@progbits\n\t" \
+ _ASM_PTR "%c1, (3f)-%c2\n\t" \
+ ".byte %c2\n\t" \
+ ".previous\n\t" \
+ "mov $0,%0\n\t" \
+ "3:\n\t" \
+ : "=r" (value) \
+ : "i" (&name##__imv), \
+ "i" (sizeof(value))); \
+ break; \
+ case 8: \
+ if (sizeof(long) < 8) { \
+ value = name##__imv; \
+ break; \
+ } \
+ asm(".section __imv,\"a\",@progbits\n\t" \
+ _ASM_PTR "%c1, (3f)-%c2\n\t" \
+ ".byte %c2\n\t" \
+ ".previous\n\t" \
+ "mov $0xFEFEFEFE01010101,%0\n\t" \
+ "3:\n\t" \
+ : "=r" (value) \
+ : "i" (&name##__imv), \
+ "i" (sizeof(value))); \
+ break; \
+ }; \
+ value; \
+ })
+
+#endif /* _ASM_X86_IMMEDIATE_H */
Index: linux-2.6-lttng/arch/x86/Kconfig
===================================================================
--- linux-2.6-lttng.orig/arch/x86/Kconfig 2008-04-09 11:04:58.000000000 -0400
+++ linux-2.6-lttng/arch/x86/Kconfig 2008-04-09 15:00:01.000000000 -0400
@@ -23,6 +23,7 @@ config X86
select HAVE_KPROBES
select HAVE_KRETPROBES
select HAVE_KVM if ((X86_32 && !X86_VOYAGER && !X86_VISWS && !X86_NUMAQ) || X86_64)
+ select HAVE_IMMEDIATE


config GENERIC_LOCKBREAK


--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

H. Peter Anvin

unread,
Apr 9, 2008, 6:48:32 PM4/9/08
to Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
Mathieu Desnoyers wrote:
> * H. Peter Anvin (h...@zytor.com) wrote:
>> Mathieu Desnoyers wrote:
>>> Ok, so the most flexible solution that I see, that should fit for both
>>> x86 and x86_64 would be :
>>> 1 byte : "=q" : "a", "b", "c", or "d" register for the i386. For
>>> x86-64 it is equivalent to "r" class (for 8-bit
>>> instructions that do not use upper halves).
>>> 2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is
>>> in a
>>> general register.
>> Any reason to keep carrying this completely misleading comment chunk still?
>>
>> -hpa
>
> This comment explains why I use the =q constraint for the 1 bytes
> immediate value. It makes sure we use an instruction with 1-byte opcode,
> without REX.R prefix, on x86_64.

No, it doesn't. That would be "=Q".

-hpa

H. Peter Anvin

unread,
Apr 9, 2008, 6:48:46 PM4/9/08
to Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
Mathieu Desnoyers wrote:
>
> Hrm, since even the nmi-safe version supports REX-prefixed instructions,
> there is no need for an =q constraint on single-byte immediate values
> anymore. (thanks to your "discard" section used in the nmi-safe version)
>

WRONG!

Using =r for single-byte values is incorrect for 32-bit code -- that
would permit %spl, %bpl, %sil, %dil which are illegal in 32-bit mode.
That is not the incorrect bit, it's the description that is confused.

-hpa

Mathieu Desnoyers

unread,
Apr 9, 2008, 7:21:03 PM4/9/08
to H. Peter Anvin, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
* H. Peter Anvin (h...@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> Hrm, since even the nmi-safe version supports REX-prefixed instructions,
>> there is no need for an =q constraint on single-byte immediate values
>> anymore. (thanks to your "discard" section used in the nmi-safe version)
>
> WRONG!
>
> Using =r for single-byte values is incorrect for 32-bit code -- that would
> permit %spl, %bpl, %sil, %dil which are illegal in 32-bit mode. That is not
> the incorrect bit, it's the description that is confused.
>
> -hpa

Ah, right. I remembered there was something that made us use =q, but
could not remember what. I'll describe it correctly.

Therefore, this updated patch is bogus. The original one was ok, given
that we change the header to :


Immediate Values - x86 Optimization

x86 optimization of the immediate values which uses a movl with code patching
to set/unset the value used to populate the register used as variable source.

Note : a movb needs to get its value froma =q constraint.

Quoting "H. Peter Anvin" <h...@zytor.com>

Using =r for single-byte values is incorrect for 32-bit code -- that would
permit %spl, %bpl, %sil, %dil which are illegal in 32-bit mode.

Changelog:


- Use text_poke_early with cr0 WP save/restore to patch the bypass. We are doing
non atomic writes to a code region only touched by us (nobody can execute it
since we are protected by the imv_mutex).

- Put imv_set and _imv_set in the architecture independent header.


- Use $0 instead of %2 with (0) operand.
- Add x86_64 support, ready for i386+x86_64 -> x86 merge.
- Use asm-x86/asm.h.
- Bugfix : 8 bytes 64 bits immediate value was declared as "4 bytes" in the
immediate structure.

- Change the immediate.c update code to support variable length opcodes.


- Vastly simplified, using a busy looping IPI with interrupts disabled.
Does not protect against NMI nor MCE.
- Pack the __imv section. Use smallest types required for size (char).
- Use imv_* instead of immediate_*.


Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

Mathieu Desnoyers

unread,
Apr 9, 2008, 8:42:35 PM4/9/08
to H. Peter Anvin, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
* H. Peter Anvin (h...@zytor.com) wrote:
> Mathieu Desnoyers wrote:
>> * H. Peter Anvin (h...@zytor.com) wrote:
>>> Mathieu Desnoyers wrote:
>>>> Ok, so the most flexible solution that I see, that should fit for both
>>>> x86 and x86_64 would be :
>>>> 1 byte : "=q" : "a", "b", "c", or "d" register for the i386. For
>>>> x86-64 it is equivalent to "r" class (for 8-bit
>>>> instructions that do not use upper halves).
>>>> 2, 4, 8 bytes : "=r" : A register operand is allowed provided that it is
>>>> in a
>>>> general register.
>>> Any reason to keep carrying this completely misleading comment chunk
>>> still?
>>>
>>> -hpa
>> This comment explains why I use the =q constraint for the 1 bytes
>> immediate value. It makes sure we use an instruction with 1-byte opcode,
>> without REX.R prefix, on x86_64.
>
> No, it doesn't. That would be "=Q".
>
> -hpa

Ok. Sorry, it's been a few months since we looked at this. So the =q
opcode lets the compiler choose instructions with or without REX prefix.
We can allow this because

- We don't need the opcode length in the stop_machine_run() version
- we support variable length opcode in the nmi-safe version

Am I remembering correctly now ?

Mathieu

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

H. Peter Anvin

unread,
Apr 9, 2008, 9:01:32 PM4/9/08
to Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Andi Kleen, Chuck Ebbert, Christoph Hellwig, Jeremy Fitzhardinge, Thomas Gleixner, Ingo Molnar, Adrian Bunk, Alexey Dobriyan, ak...@osdl.org
Mathieu Desnoyers wrote:
>
> Ok. Sorry, it's been a few months since we looked at this. So the =q
> opcode lets the compiler choose instructions with or without REX prefix.
> We can allow this because
>
> - We don't need the opcode length in the stop_machine_run() version
> - we support variable length opcode in the nmi-safe version
>
> Am I remembering correctly now ?
>

Yes. Both =r and =q allow REX opcodes.

-hpa

Rusty Russell

unread,
Apr 9, 2008, 11:25:12 PM4/9/08
to Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Adrian Bunk, Christoph Hellwig, ak...@osdl.org
On Thursday 10 April 2008 01:08:41 Mathieu Desnoyers wrote:
> +config IMMEDIATE
> + default y if !DISABLE_IMMEDIATE

Wouldn't it be simlpler to roll DISABLE_IMMEDIATE into this?

ie.
default y
depends on HAVE_IMMEDIATE
bool "Immediate value optimization" if EMBEDDED
help
Immediate values are used as read-mostly variables that are rarely
updated. They use code patching to modify the values inscribed in the
instruction stream. It provides a way to save precious cache lines
that would otherwise have to be used by these variables.

It consumes slightly more memory and requires to modify the instruction
stream each time a variable is updated. Should really be disabled for
embedded systems with read-only text.

Cheers,
Rusty.

Rusty Russell

unread,
Apr 9, 2008, 11:34:27 PM4/9/08
to Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
On Thursday 10 April 2008 01:08:45 Mathieu Desnoyers wrote:
> If you have to read the immediate values from a function declared as
> __init or __exit, you should explicitly use _imv_read(), which will fall
> back on a global variable read. Failing to do so will leave a reference to
> the __init section after it is freed (it would generate a modpost
> warning).

That's a real usability wart. Couldn't we skip these in the patching loop if
required and revert so noone can make this mistake?

Thanks,

Rusty Russell

unread,
Apr 9, 2008, 11:35:20 PM4/9/08
to Andi Kleen, Alexey Dobriyan, Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Jason Baron, Adrian Bunk, Christoph Hellwig, ak...@osdl.org
On Thursday 10 April 2008 04:24:54 Andi Kleen wrote:
> On Wed, Apr 09, 2008 at 10:10:10PM +0400, Alexey Dobriyan wrote:
> > I continue to think that all this so-called "immediate" infrastructure
> > is an absolute overkill and declared D-cache line savings will _never_
> > matter
>
> You are quite wrong on that.

Well, Alexey did make me read through the patches reasonably thoroughly.

I think it's interesting, and to be honest the current simplified approach
just isn't that scary. There are some minor warts, but it's generally sound.

Cheers,
Rusty.

KOSAKI Motohiro

unread,
Apr 10, 2008, 12:24:26 AM4/10/08
to Mathieu Desnoyers, kosaki....@jp.fujitsu.com, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell
Hi Mathieu

> Here is the complete patchset required to get basic Immediate Values support
> ported to 2.6.25-rc8-mm1. It provides the "simple", non nmi-safe version of
> immediate values, which means that immediate values should not be used in code
> paths reachable by NMI or MCE handlers. This version also uses
> stop_machine_run() for immediate values updates, which is a very heavy lock. In
> order to make incremental, easy to review, changes, I think we could start by
> this "simple" version as a first step before we switch to the nmi-safe version
> later.

I think this patch is marker performance improvement infrastructure, right?
if so, I will help debug and testing.

at least, I like this patch series :)

KOSAKI Motohiro

unread,
Apr 10, 2008, 12:28:31 AM4/10/08
to Andi Kleen, kosaki....@jp.fujitsu.com, Alexey Dobriyan, Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Rusty Russell, Jason Baron, Adrian Bunk, Christoph Hellwig, ak...@osdl.org
> > and will never be measured on real life workloads. Right now you
> > claim 1 (one) cacheline saving in schedule() which can be trivially
>
> That is just an example. Once the infrastructure is in a lot more
> flags would move it into it. I think eventually most sysctls
> should be immediate values for once.

It seems very good viewpoint.
if sysctl value become immediate, I think some fastpath increase performance.

Takashi Nishiie

unread,
Apr 10, 2008, 3:31:58 AM4/10/08
to linux-...@vger.kernel.org
Hi Mathieu

> Here is the complete patchset required to get basic Immediate Values
> support ported to 2.6.25-rc8-mm1. It provides the "simple", non
> nmi-safe version of immediate values, which means that immediate
> values should not be used in code paths reachable by NMI or MCE
> handlers. This version also uses
> stop_machine_run() for immediate values updates, which is a very heavy
> lock. In order to make incremental, easy to review, changes, I think
> we could start by this "simple" version as a first step before we
> switch to the nmi-safe version later.

I am interested in the performance improvement of "marker."
It is important in arguing that it is "simple."


I will help debug and testing.

Thanks

KOSAKI Motohiro

unread,
Apr 10, 2008, 4:06:28 AM4/10/08
to Mathieu Desnoyers, kosaki....@jp.fujitsu.com, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Jason Baron, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
Hi

> -Updating immediate values, cannot rely on smp_call_function() b/c synchronizing
> cpus using IPIs leads to deadlocks. Process A held a read lock on
> tasklist_lock, then process B called apply_imv_update(). Process A received the
> IPI and begins executing ipi_busy_loop(). Then process C takes a write lock
> irq on the task list lock, before receiving the IPI. Thus, process A holds up
> process C, and C can't get an IPI b/c interrupts are disabled. Solve this
> problem by using a new 'ALL_CPUS' parameter to stop_machine_run(). Which
> runs a function on all cpus after they are busy looping and have disabled
> irqs. Since this is done in a new process context, we don't have to worry
> about interrupted spin_locks. Also, less lines of code. Has survived 24 hours+
> of testing...

it seems this patch is must, Why do you separate patch [10/17] and [11/17]?
this patch remove almost portion of [10/17].
IMHO these patch merge into 1 patch is better.


> +static int stop_machine_imv_update(void *imv_ptr)
> +{
> + struct __imv *imv = imv_ptr;
> +
> + if (!wrote_text) {

it seems racy.
Why don't need test_and_set?

I think your stop_machine_run(ALL_CPUS) call fn concurrency...


> + text_poke((void *)imv->imv, (void *)imv->var, imv->size);
> + wrote_text = 1;
> + smp_wmb(); /* make sure other cpus see that this has run */
> + } else
> + sync_core();
> +
> + flush_icache_range(imv->imv, imv->imv + imv->size);
> +
> + return 0;
> +}
> +

Mathieu Desnoyers

unread,
Apr 10, 2008, 3:37:45 PM4/10/08
to Rusty Russell, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Adrian Bunk, Christoph Hellwig, ak...@osdl.org
* Rusty Russell (ru...@rustcorp.com.au) wrote:
> On Thursday 10 April 2008 01:08:41 Mathieu Desnoyers wrote:
> > +config IMMEDIATE
> > + default y if !DISABLE_IMMEDIATE
>
> Wouldn't it be simlpler to roll DISABLE_IMMEDIATE into this?
>
> ie.
> default y
> depends on HAVE_IMMEDIATE
> bool "Immediate value optimization" if EMBEDDED
> help
> Immediate values are used as read-mostly variables that are rarely
> updated. They use code patching to modify the values inscribed in the
> instruction stream. It provides a way to save precious cache lines
> that would otherwise have to be used by these variables.
>
> It consumes slightly more memory and requires to modify the instruction
> stream each time a variable is updated. Should really be disabled for
> embedded systems with read-only text.
>

Sure, thanks for the tip. Here is the updated version.


Immediate Values - Kconfig menu in EMBEDDED

Immediate values provide a way to use dynamic code patching to update variables
sitting within the instruction stream. It saves caches lines normally used by
static read mostly variables. Enable it by default, but let users disable it
through the EMBEDDED menu with the "Disable immediate values" submenu entry.

Note: Since I think that I really should let embedded systems developers using
RO memory the option to disable the immediate values, I choose to leave this
menu option there, in the EMBEDDED menu. Also, the "CONFIG_IMMEDIATE" makes
sense because we want to compile out all the immediate code when we decide not
to use optimized immediate values at all (it removes otherwise unused code).

Changelog:
- Change ARCH_SUPPORTS_IMMEDIATE for HAS_IMMEDIATE
- Turn DISABLE_IMMEDIATE into positive logic

Signed-off-by: Mathieu Desnoyers <mathieu....@polymtl.ca>


CC: Rusty Russell <ru...@rustcorp.com.au>
CC: Adrian Bunk <bu...@stusta.de>

CC: Andi Kleen <an...@firstfloor.org>
CC: Christoph Hellwig <h...@infradead.org>
CC: mi...@elte.hu
CC: ak...@osdl.org
---
init/Kconfig | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)

Index: linux-2.6-lttng/init/Kconfig
===================================================================
--- linux-2.6-lttng.orig/init/Kconfig 2008-04-10 15:22:37.000000000 -0400
+++ linux-2.6-lttng/init/Kconfig 2008-04-10 15:29:09.000000000 -0400
@@ -758,6 +758,24 @@ config PROC_PAGE_MONITOR
/proc/kpagecount, and /proc/kpageflags. Disabling these
interfaces will reduce the size of the kernel by approximately 4kb.

+config HAVE_IMMEDIATE
+ def_bool n
+


+config IMMEDIATE
+ default y

+ depends on HAVE_IMMEDIATE
+ bool "Immediate value optimization" if EMBEDDED
+ help
+ Immediate values are used as read-mostly variables that are rarely
+ updated. They use code patching to modify the values inscribed in the
+ instruction stream. It provides a way to save precious cache lines
+ that would otherwise have to be used by these variables. They can be
+ disabled through the EMBEDDED menu.
+
+ It consumes slightly more memory and requires to modify the
+ instruction stream each time a variable is updated. Should really be
+ disabled for embedded systems with read-only text.
+
endmenu # General setup

config SLABINFO

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

Mathieu Desnoyers

unread,
Apr 10, 2008, 4:02:20 PM4/10/08
to KOSAKI Motohiro, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Jason Baron, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
* KOSAKI Motohiro (kosaki....@jp.fujitsu.com) wrote:
> Hi
>
> > -Updating immediate values, cannot rely on smp_call_function() b/c synchronizing
> > cpus using IPIs leads to deadlocks. Process A held a read lock on
> > tasklist_lock, then process B called apply_imv_update(). Process A received the
> > IPI and begins executing ipi_busy_loop(). Then process C takes a write lock
> > irq on the task list lock, before receiving the IPI. Thus, process A holds up
> > process C, and C can't get an IPI b/c interrupts are disabled. Solve this
> > problem by using a new 'ALL_CPUS' parameter to stop_machine_run(). Which
> > runs a function on all cpus after they are busy looping and have disabled
> > irqs. Since this is done in a new process context, we don't have to worry
> > about interrupted spin_locks. Also, less lines of code. Has survived 24 hours+
> > of testing...
>
> it seems this patch is must, Why do you separate patch [10/17] and [11/17]?
> this patch remove almost portion of [10/17].
> IMHO these patch merge into 1 patch is better.
>

Hi Kosaki,

You are right, I will merge them and resend them in the following post.

>
> > +static int stop_machine_imv_update(void *imv_ptr)
> > +{
> > + struct __imv *imv = imv_ptr;
> > +
> > + if (!wrote_text) {
>
> it seems racy.
> Why don't need test_and_set?
>
> I think your stop_machine_run(ALL_CPUS) call fn concurrency...
>

The answer to this mistery is in include/linux/stop_machine.h modified
by add-all-cpus-option-to-stop-machine-run.patch :


/**
* stop_machine_run: freeze the machine on all CPUs and run this function
* @fn: the function to run
* @data: the data ptr for the @fn()
- * @cpu: the cpu to run @fn() on (or any, if @cpu == NR_CPUS.
+ * @cpu: if @cpu == n, run @fn() on cpu n
+ * if @cpu == NR_CPUS, run @fn() on any cpu
+ * if @cpu == ALL_CPUS, run @fn() first on the calling cpu, and then
+ * concurrently on all the other cpus
*
* Description: This causes a thread to be scheduled on every other cpu,
* each of which disables interrupts, and finally interrupts are disabled

Therefore, the first execution of the function is done before all other
executions.

Thanks,

Mathieu

>
> > + text_poke((void *)imv->imv, (void *)imv->var, imv->size);
> > + wrote_text = 1;
> > + smp_wmb(); /* make sure other cpus see that this has run */
> > + } else
> > + sync_core();
> > +
> > + flush_icache_range(imv->imv, imv->imv + imv->size);
> > +
> > + return 0;
> > +}
> > +
>
>

--

Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

Mathieu Desnoyers

unread,
Apr 10, 2008, 9:17:09 PM4/10/08
to Rusty Russell, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
* Rusty Russell (ru...@rustcorp.com.au) wrote:
> On Thursday 10 April 2008 01:08:45 Mathieu Desnoyers wrote:
> > If you have to read the immediate values from a function declared as
> > __init or __exit, you should explicitly use _imv_read(), which will fall
> > back on a global variable read. Failing to do so will leave a reference to
> > the __init section after it is freed (it would generate a modpost
> > warning).
>
> That's a real usability wart. Couldn't we skip these in the patching loop if
> required and revert so noone can make this mistake?
>

Yeah, I know :(

Well, only if we can find a way to detect the macro is put within a init
or exit section. Is there some assembly trickery that would permit us to
do that ?

Otherwise, given the memory freed from the init section could be reused
later by the kernel, I don't see how we can detect the pointer leads to
a freed init section and, say, a module.

Mathieu

> Thanks,
> Rusty.

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

KOSAKI Motohiro

unread,
Apr 11, 2008, 12:50:47 AM4/11/08
to Mathieu Desnoyers, KOSAKI Motohiro, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Rusty Russell, Jason Baron, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org, linuxde...@ml.css.fujitsu.com
Hi Mathieu,

>> it seems this patch is must, Why do you separate patch [10/17] and [11/17]?
>> this patch remove almost portion of [10/17].
>> IMHO these patch merge into 1 patch is better.
>>
>>
>
> Hi Kosaki,
>
> You are right, I will merge them and resend them in the following post.
>
Thanks

>>> +static int stop_machine_imv_update(void *imv_ptr)
>>> +{
>>> + struct __imv *imv = imv_ptr;
>>> +
>>> + if (!wrote_text) {
>>>
>> it seems racy.
>> Why don't need test_and_set?
>>
>> I think your stop_machine_run(ALL_CPUS) call fn concurrency...
>>
> The answer to this mistery is in include/linux/stop_machine.h modified
> by add-all-cpus-option-to-stop-machine-run.patch :
>
>
> /**
> * stop_machine_run: freeze the machine on all CPUs and run this function
> * @fn: the function to run
> * @data: the data ptr for the @fn()
> - * @cpu: the cpu to run @fn() on (or any, if @cpu == NR_CPUS.
> + * @cpu: if @cpu == n, run @fn() on cpu n
> + * if @cpu == NR_CPUS, run @fn() on any cpu
> + * if @cpu == ALL_CPUS, run @fn() first on the calling cpu, and then
> + * concurrently on all the other cpus
> *
> * Description: This causes a thread to be scheduled on every other cpu,
> * each of which disables interrupts, and finally interrupts are disabled
>
> Therefore, the first execution of the function is done before all other
> executions.
>
>

Ah, OK. I understand.
Thanks.

Mathieu Desnoyers

unread,
Apr 11, 2008, 9:50:02 AM4/11/08
to Rusty Russell, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
Supports placing immediate values in init code

We need to put the immediate values in RW data section so we can edit them
before init section unload.

This code puts NULL pointers in lieu of original pointer referencing init code
before the init sections are freed, both in the core kernel and in modules.

(experimental patch submitted for comments and testing)

Signed-off-by: Mathieu Desnoyers <mathieu....@polymtl.ca>
CC: Rusty Russell <ru...@rustcorp.com.au>

CC: "Frank Ch. Eigler" <fc...@redhat.com>
---
Documentation/immediate.txt | 5 -----
include/asm-generic/vmlinux.lds.h | 8 ++++----
include/asm-powerpc/immediate.h | 4 ++--
include/asm-x86/immediate.h | 6 +++---
include/linux/immediate.h | 7 ++++++-
include/linux/module.h | 2 +-
init/main.c | 1 +
kernel/immediate.c | 32 ++++++++++++++++++++++++++++++--
kernel/module.c | 2 ++
9 files changed, 49 insertions(+), 18 deletions(-)

Index: linux-2.6-lttng/kernel/immediate.c
===================================================================
--- linux-2.6-lttng.orig/kernel/immediate.c 2008-04-11 09:35:52.000000000 -0400
+++ linux-2.6-lttng/kernel/immediate.c 2008-04-11 09:41:33.000000000 -0400
@@ -22,6 +22,7 @@
#include <linux/cpu.h>
#include <linux/stop_machine.h>

+#include <asm/sections.h>
#include <asm/cacheflush.h>

/*
@@ -30,8 +31,8 @@
static int imv_early_boot_complete;
static int wrote_text;

-extern const struct __imv __start___imv[];
-extern const struct __imv __stop___imv[];
+extern struct __imv __start___imv[];
+extern struct __imv __stop___imv[];

static int stop_machine_imv_update(void *imv_ptr)
{
@@ -118,6 +119,8 @@ void imv_update_range(const struct __imv
int ret;
for (iter = begin; iter < end; iter++) {
mutex_lock(&imv_mutex);
+ if (!iter->imv) /* Skip removed __init immediate values */
+ goto skip;
ret = apply_imv_update(iter);
if (imv_early_boot_complete && ret)
printk(KERN_WARNING
@@ -126,6 +129,7 @@ void imv_update_range(const struct __imv
"instruction at %p, size %hu\n",
(void *)iter->imv,
(void *)iter->var, iter->size);
+skip:
mutex_unlock(&imv_mutex);
}
}
@@ -143,6 +147,30 @@ void core_imv_update(void)
}
EXPORT_SYMBOL_GPL(core_imv_update);

+/**
+ * imv_unref_init
+ *
+ * Deactivate any immediate value reference pointing into init code before
+ * it vanishes.
+ */
+void imv_unref_init(struct __imv *begin, struct __imv *end, void *init,
+ unsigned long init_size)
+{
+ struct __imv *iter;
+
+ for (iter = begin; iter < end; iter++)
+ if (iter->imv >= (unsigned long)init
+ && iter->imv < (unsigned long)init + init_size)
+ iter->imv = 0UL;
+}
+
+void imv_unref_core_init(void)
+{
+ imv_unref_init(__start___imv, __stop___imv,
+ __init_begin,
+ (unsigned long)__init_end - (unsigned long)__init_begin);
+}
+
void __init imv_init_complete(void)
{
imv_early_boot_complete = 1;
Index: linux-2.6-lttng/kernel/module.c
===================================================================
--- linux-2.6-lttng.orig/kernel/module.c 2008-04-11 09:35:54.000000000 -0400
+++ linux-2.6-lttng/kernel/module.c 2008-04-11 09:36:58.000000000 -0400
@@ -2208,6 +2208,8 @@ sys_init_module(void __user *umod,
/* Drop initial reference. */
module_put(mod);
unwind_remove_table(mod->unwind_info, 1);
+ imv_unref_init(mod->immediate, mod->immediate + mod->num_immediate,
+ mod->module_init, mod->init_size);
module_free(mod, mod->module_init);
mod->module_init = NULL;
mod->init_size = 0;
Index: linux-2.6-lttng/include/linux/module.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/module.h 2008-04-11 09:35:43.000000000 -0400
+++ linux-2.6-lttng/include/linux/module.h 2008-04-11 09:36:58.000000000 -0400
@@ -357,7 +357,7 @@ struct module
keeping pointers to this stuff */
char *args;
#ifdef CONFIG_IMMEDIATE
- const struct __imv *immediate;
+ struct __imv *immediate;
unsigned int num_immediate;
#endif
#ifdef CONFIG_MARKERS
Index: linux-2.6-lttng/include/asm-generic/vmlinux.lds.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-generic/vmlinux.lds.h 2008-04-11 09:35:37.000000000 -0400
+++ linux-2.6-lttng/include/asm-generic/vmlinux.lds.h 2008-04-11 09:36:58.000000000 -0400
@@ -52,7 +52,10 @@
. = ALIGN(8); \
VMLINUX_SYMBOL(__start___markers) = .; \
*(__markers) \
- VMLINUX_SYMBOL(__stop___markers) = .;
+ VMLINUX_SYMBOL(__stop___markers) = .; \
+ VMLINUX_SYMBOL(__start___imv) = .; \
+ *(__imv) /* Immediate values: pointers */ \
+ VMLINUX_SYMBOL(__stop___imv) = .;

#define RO_DATA(align) \
. = ALIGN((align)); \
@@ -61,9 +64,6 @@
*(.rodata) *(.rodata.*) \
*(__vermagic) /* Kernel version magic */ \
*(__markers_strings) /* Markers: strings */ \
- VMLINUX_SYMBOL(__start___imv) = .; \
- *(__imv) /* Immediate values: pointers */ \
- VMLINUX_SYMBOL(__stop___imv) = .; \
} \
\
.rodata1 : AT(ADDR(.rodata1) - LOAD_OFFSET) { \
Index: linux-2.6-lttng/include/linux/immediate.h
===================================================================
--- linux-2.6-lttng.orig/include/linux/immediate.h 2008-04-11 09:35:52.000000000 -0400
+++ linux-2.6-lttng/include/linux/immediate.h 2008-04-11 09:36:58.000000000 -0400
@@ -46,6 +46,9 @@ struct __imv {
extern void core_imv_update(void);
extern void imv_update_range(const struct __imv *begin,
const struct __imv *end);
+extern void imv_unref_core_init(void);
+extern void imv_unref_init(struct __imv *begin, struct __imv *end, void *init,
+ unsigned long init_size);

#else

@@ -73,7 +76,9 @@ extern void imv_update_range(const struc

static inline void core_imv_update(void) { }
static inline void module_imv_update(void) { }
-
+static inline void imv_unref_core_init(void) { }
+static inline void imv_unref_init(struct __imv *begin, struct __imv *end,
+ void *init, unsigned long init_size) { }
#endif

#define DECLARE_IMV(type, name) extern __typeof__(type) name##__imv
Index: linux-2.6-lttng/init/main.c
===================================================================
--- linux-2.6-lttng.orig/init/main.c 2008-04-11 09:35:37.000000000 -0400
+++ linux-2.6-lttng/init/main.c 2008-04-11 09:36:58.000000000 -0400
@@ -776,6 +776,7 @@ static void run_init_process(char *init_
*/
static int noinline init_post(void)
{
+ imv_unref_core_init();
free_initmem();
unlock_kernel();
mark_rodata_ro();
Index: linux-2.6-lttng/include/asm-x86/immediate.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-x86/immediate.h 2008-04-11 09:35:52.000000000 -0400
+++ linux-2.6-lttng/include/asm-x86/immediate.h 2008-04-11 09:38:23.000000000 -0400
@@ -33,7 +33,7 @@
BUILD_BUG_ON(sizeof(value) > 8); \
switch (sizeof(value)) { \
case 1: \
- asm(".section __imv,\"a\",@progbits\n\t" \
+ asm(".section __imv,\"aw\",@progbits\n\t" \


_ASM_PTR "%c1, (3f)-%c2\n\t" \

".byte %c2\n\t" \
".previous\n\t" \
@@ -45,7 +45,7 @@
break; \
case 2: \
case 4: \
- asm(".section __imv,\"a\",@progbits\n\t" \
+ asm(".section __imv,\"aw\",@progbits\n\t" \


_ASM_PTR "%c1, (3f)-%c2\n\t" \

".byte %c2\n\t" \
".previous\n\t" \
@@ -60,7 +60,7 @@
value = name##__imv; \
break; \
} \
- asm(".section __imv,\"a\",@progbits\n\t" \
+ asm(".section __imv,\"aw\",@progbits\n\t" \


_ASM_PTR "%c1, (3f)-%c2\n\t" \

".byte %c2\n\t" \
".previous\n\t" \
Index: linux-2.6-lttng/include/asm-powerpc/immediate.h
===================================================================
--- linux-2.6-lttng.orig/include/asm-powerpc/immediate.h 2008-04-11 09:35:52.000000000 -0400
+++ linux-2.6-lttng/include/asm-powerpc/immediate.h 2008-04-11 09:36:58.000000000 -0400
@@ -26,7 +26,7 @@
BUILD_BUG_ON(sizeof(value) > 8); \
switch (sizeof(value)) { \
case 1: \
- asm(".section __imv,\"a\",@progbits\n\t" \
+ asm(".section __imv,\"aw\",@progbits\n\t" \
PPC_LONG "%c1, ((1f)-1)\n\t" \
".byte 1\n\t" \
".previous\n\t" \
@@ -36,7 +36,7 @@
: "i" (&name##__imv)); \
break; \
case 2: \
- asm(".section __imv,\"a\",@progbits\n\t" \
+ asm(".section __imv,\"aw\",@progbits\n\t" \
PPC_LONG "%c1, ((1f)-2)\n\t" \
".byte 2\n\t" \
".previous\n\t" \
Index: linux-2.6-lttng/Documentation/immediate.txt
===================================================================
--- linux-2.6-lttng.orig/Documentation/immediate.txt 2008-04-11 09:43:03.000000000 -0400
+++ linux-2.6-lttng/Documentation/immediate.txt 2008-04-11 09:43:19.000000000 -0400
@@ -42,11 +42,6 @@ The immediate mechanism supports inserti
immediate. Immediate values can be put in inline functions, inlined static
functions, and unrolled loops.

-If you have to read the immediate values from a function declared as __init or
-__exit, you should explicitly use _imv_read(), which will fall back on a
-global variable read. Failing to do so will leave a reference to the __init
-section after it is freed (it would generate a modpost warning).
-
You can choose to set an initial static value to the immediate by using, for
instance:



--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

Rusty Russell

unread,
Apr 11, 2008, 10:59:28 AM4/11/08
to Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Adrian Bunk, Christoph Hellwig, ak...@osdl.org
On Friday 11 April 2008 05:32:26 Mathieu Desnoyers wrote:
> + It consumes slightly more memory and requires to modify the
> + instruction stream each time a variable is updated.

Sorry, I missed this grammar bug last time. Also, probable don't want to
scare someone that we're changing instructions on every var change :)

It consumes slightly more memory and modifies the instruction
stream each time any specially-marked variable is updated.

Thanks,
Rusty.

Rusty Russell

unread,
Apr 11, 2008, 11:08:46 AM4/11/08
to Mathieu Desnoyers, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
On Friday 11 April 2008 11:16:47 Mathieu Desnoyers wrote:
> * Rusty Russell (ru...@rustcorp.com.au) wrote:
> > On Thursday 10 April 2008 01:08:45 Mathieu Desnoyers wrote:
> > > If you have to read the immediate values from a function declared as
> > > __init or __exit, you should explicitly use _imv_read(), which will
> > > fall back on a global variable read. Failing to do so will leave a
> > > reference to the __init section after it is freed (it would generate a
> > > modpost warning).
> >
> > That's a real usability wart. Couldn't we skip these in the patching
> > loop if required and revert so noone can make this mistake?
>
> Yeah, I know :(
>
> Well, only if we can find a way to detect the macro is put within a init
> or exit section. Is there some assembly trickery that would permit us to
> do that ?
>
> Otherwise, given the memory freed from the init section could be reused
> later by the kernel, I don't see how we can detect the pointer leads to
> a freed init section and, say, a module.

In theory although not in practice, since everyone vmallocs modules. Let's
not rely on that tho.

How about we sweep the immediate table on init discard and remove/mark all the
init and exit references?

Cheers,
Rusty.

Mathieu Desnoyers

unread,
Apr 14, 2008, 7:52:29 PM4/14/08
to Rusty Russell, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Adrian Bunk, Christoph Hellwig, ak...@osdl.org
* Rusty Russell (ru...@rustcorp.com.au) wrote:
> On Friday 11 April 2008 05:32:26 Mathieu Desnoyers wrote:
> > + It consumes slightly more memory and requires to modify the
> > + instruction stream each time a variable is updated.
>
> Sorry, I missed this grammar bug last time. Also, probable don't want to
> scare someone that we're changing instructions on every var change :)
>
> It consumes slightly more memory and modifies the instruction
> stream each time any specially-marked variable is updated.
>

Will update, thanks!

> Thanks,
> Rusty.

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

Mathieu Desnoyers

unread,
Apr 14, 2008, 8:12:43 PM4/14/08
to Rusty Russell, ak...@linux-foundation.org, Ingo Molnar, linux-...@vger.kernel.org, Andi Kleen, Adrian Bunk, Alexey Dobriyan, Christoph Hellwig, ak...@osdl.org
* Rusty Russell (ru...@rustcorp.com.au) wrote:
> On Friday 11 April 2008 11:16:47 Mathieu Desnoyers wrote:
> > * Rusty Russell (ru...@rustcorp.com.au) wrote:
> > > On Thursday 10 April 2008 01:08:45 Mathieu Desnoyers wrote:
> > > > If you have to read the immediate values from a function declared as
> > > > __init or __exit, you should explicitly use _imv_read(), which will
> > > > fall back on a global variable read. Failing to do so will leave a
> > > > reference to the __init section after it is freed (it would generate a
> > > > modpost warning).
> > >
> > > That's a real usability wart. Couldn't we skip these in the patching
> > > loop if required and revert so noone can make this mistake?
> >
> > Yeah, I know :(
> >
> > Well, only if we can find a way to detect the macro is put within a init
> > or exit section. Is there some assembly trickery that would permit us to
> > do that ?
> >
> > Otherwise, given the memory freed from the init section could be reused
> > later by the kernel, I don't see how we can detect the pointer leads to
> > a freed init section and, say, a module.
>
> In theory although not in practice, since everyone vmallocs modules. Let's
> not rely on that tho.
>
> How about we sweep the immediate table on init discard and remove/mark all the
> init and exit references?
>
> Cheers,
> Rusty.

I already posted a patch which nullifies the immediate values pointing
to init code after the init phase of the core kernel and the init phase
of modules, just before the init section is freed.

For the exit section, I could add some code which nullifies the
immediate values pointing to exit section for !CONFIG_MODULE_UNLOAD.
However, I would need to get the equivalent of init and init_size for
exit too.

I wonder what would happen if someone declares a __exit function in a
builtin object, with an immediate value in it ? Is there a possibility
that it leaves a reference to code not even linked in ?

Mathieu


--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68

0 new messages