arch/x86/kernel/cpu/common.c
----------------------------
unsigned long kernel_eflags;
void __cpuinit cpu_init(void)
{
...
raw_local_save_flags(kernel_eflags);
}
arch/x86/kernel/entry_64.S
--------------------------
ENTRY(ret_from_fork)
DEFAULT_FRAME
LOCK ; btr $TIF_FORK,TI_flags(%r8)
pushq_cfi kernel_eflags(%rip)
popfq_cfi # reset kernel eflags
Every call to cpu_init() rewrites the global kernel_eflags,
and as far as I can tell every task switch uses this variable
to clear the carry bit of the flags register.
Shouldn't every CPU have its own copy of kernel_eflags, just
to be consistent in style? Or would that be a waste of space,
and is the single global a deliberate optimization?
Cyrill
I noticed this a while ago and couldn't figure out why it was done this
way. On 32-bit the initial EFLAGS is simply hardcoded and that seems to
make sense to me since it's not clear that the specific value of EFLAGS
at the point it is saved in cpu_init() has any particular meaning.
I posted http://lkml.org/lkml/2010/2/9/155 to switch to the hardcoded
version for 64 bit too. Is it worth my rebasing and reposting that?
(http://lkml.org/lkml/2010/2/9/152 and http://lkml.org/lkml/2010/2/9/154
were follow-on cleanups).
Ian.
--
Ian Campbell
I think so.
Cyrill
Original i386 changeset
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=47a5c6fa0e204a2b63309c648bb2fde36836c826
Original x86_64 changeset
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=658fdbef66e5e9be79b457edc2cbbb3add840aa9
The only comment in the latter indicates that it is following the
former, but not why it differs in this way.
This change makes 64 bit use the same mechanism to set up the initial
EFLAGS on fork. Note that 64 bit resets EFLAGS before calling
schedule_tail() as opposed to 32 bit which calls schedule_tail()
first. Therefore the correct value for EFLAGS has opposite IF
bit. This will be fixed in a subsequent patch.
Signed-off-by: Ian Campbell <ian.ca...@citrix.com>
Cc: x...@kernel.org
---
arch/x86/kernel/cpu/common.c | 4 ----
arch/x86/kernel/entry_64.S | 2 +-
2 files changed, 1 insertions(+), 5 deletions(-)
diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 22a073d..178c3b4 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1077,8 +1077,6 @@ void syscall_init(void)
X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|X86_EFLAGS_IOPL);
}
-unsigned long kernel_eflags;
-
/*
* Copies of the original ist values from the tss are only accessed during
* debugging, no special alignment required.
@@ -1231,8 +1229,6 @@ void __cpuinit cpu_init(void)
fpu_init();
xsave_init();
- raw_local_save_flags(kernel_eflags);
-
if (is_uv_system())
uv_cpu_init();
}
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e13329d..b50d142 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -396,7 +396,7 @@ ENTRY(ret_from_fork)
LOCK ; btr $TIF_FORK,TI_flags(%r8)
- pushq_cfi kernel_eflags(%rip)
+ pushq_cfi $0x0002
popfq_cfi # reset kernel eflags
call schedule_tail # rdi: 'prev' task parameter
--
1.7.2.5
Signed-off-by: Ian Campbell <ian.ca...@citrix.com>
Cc: x...@kernel.org
---
arch/x86/kernel/entry_64.S | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b50d142..9c28bd5 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -394,13 +394,14 @@ END(save_paranoid)
ENTRY(ret_from_fork)
DEFAULT_FRAME
- LOCK ; btr $TIF_FORK,TI_flags(%r8)
- pushq_cfi $0x0002
- popfq_cfi # reset kernel eflags
+ LOCK ; btr $TIF_FORK,TI_flags(%r8)
call schedule_tail # rdi: 'prev' task parameter
+ pushq_cfi $0x0202
+ popfq_cfi # reset kernel eflags
+
GET_THREAD_INFO(%rcx)
RESTORE_REST
--
1.7.2.5
diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 5c1a9197..f032530 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -303,7 +303,7 @@ ENTRY(ret_from_fork)
call schedule_tail
GET_THREAD_INFO(%ebp)
popl_cfi %eax
- pushl_cfi $0x0202 # Reset kernel eflags
+ pushl_cfi $(X86_EFLAGS_IF|0x2) # Reset kernel eflags
popfl_cfi
jmp syscall_exit
CFI_ENDPROC
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 9c28bd5..4c76abf 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -399,7 +399,7 @@ ENTRY(ret_from_fork)
call schedule_tail # rdi: 'prev' task parameter
- pushq_cfi $0x0202
+ pushq_cfi $(X86_EFLAGS_IF|0x2)
popfq_cfi # reset kernel eflags
GET_THREAD_INFO(%rcx)
--
1.7.2.5
The whole series looks good to me, thanks Ian! I hope I haven't
missed anything, so let's wait for more feedback ;)
Cyrill
Not specific to this particular case, but in general: a shared variable
that is used often but rarely written to will automatically replicate
itself in the caches of multiple processors. This is the purpose of the
read_mostly segment (writes are permitted but expected to be rare),
which exists to make sure that a frequently written variable doesn't
randomly end up in the cache line next to a read-mostly variable.
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
yeah, thanks peter!
Cyrill
FWIW the variable in this particular case isn't actually marked as
__read_mostly...
Ian.
--
Ian Campbell
Thanks. Actually I missed the extern declaration in asm/processor.h so
this updated patch should be used instead:
8<-----------------------------------------------
From a450bf9e1c6e3f436c652b3e4d63b50c1f9848d4 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.ca...@citrix.com>
Date: Wed, 6 Jul 2011 10:24:46 +0100
Subject: [PATCH] x86: drop unnecessary kernel_eflags variable from 64 bit.
For no reason that I can determine 64 bit x86 saves the current eflags
in cpu_init purely for use in ret_from_fork. The equivalent 32 bit
code simply hard codes 0x0202 as the new EFLAGS which seems safer than
relying on a potentially arbitrary EFLAGS saved during cpu_init.
Original i386 changeset
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=47a5c6fa0e204a2b63309c648bb2fde36836c826
Original x86_64 changeset
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=658fdbef66e5e9be79b457edc2cbbb3add840aa9
The only comment in the latter indicates that it is following the
former, but not why it differs in this way.
This change makes 64 bit use the same mechanism to set up the initial
EFLAGS on fork. Note that 64 bit resets EFLAGS before calling
schedule_tail() as opposed to 32 bit which calls schedule_tail()
first. Therefore the correct value for EFLAGS has opposite IF
bit. This will be fixed in a subsequent patch.
Signed-off-by: Ian Campbell <ian.ca...@citrix.com>
Cc: x...@kernel.org
---
arch/x86/include/asm/processor.h | 1 -
arch/x86/kernel/cpu/common.c | 4 ----
arch/x86/kernel/entry_64.S | 2 +-
3 files changed, 1 insertions(+), 6 deletions(-)
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2193715..698af88 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -398,7 +398,6 @@ DECLARE_INIT_PER_CPU(irq_stack_union);
DECLARE_PER_CPU(char *, irq_stack_ptr);
DECLARE_PER_CPU(unsigned int, irq_count);
-extern unsigned long kernel_eflags;
extern asmlinkage void ignore_sysret(void);
#else /* X86_64 */
#ifdef CONFIG_CC_STACKPROTECTOR
--
Well, let's ask Linus and Andi?
Reviewed-by: Pekka Enberg <pen...@kernel.org>
Great, thanks Ian. Have you had a chance to run this series? ;)
Cyrill
Yeah I booted it before I posted it. I figured that the initscripts etc
would be a sufficient test of fork() and that if it were wrong it would
fall over pretty quickly and obviously.
Ian.
OK, thanks!
Cyrill
The original idea was that if someone extends EFLAGS with new bits,
the BIOS could set them. Strictly speaking, the Intel manual
specifies that ("reserved bits -- always set to values previously read").
But that never really worked out, and I don't think anyone
would dare to extend it now. So hardcoding is fine, I guess.
-Andi
We already clear any undefined flags almost as soon as we enter the
kernel... and quite rightly so, because there are BIOSes which leave
random crap in the flags.
It is very different from what an *application* should do in this case,
however.
-hpa