[Q x86-64] on kernel_eflags

Cyrill Gorcunov

Jul 5, 2011, 6:50:02 AM

While looking into the ret_from_fork code I started wondering
about the global kernel_eflags variable here.

arch/x86/kernel/cpu/common.c
----------------------------
unsigned long kernel_eflags;

void __cpuinit cpu_init(void)
{
...
raw_local_save_flags(kernel_eflags);
}

arch/x86/kernel/entry_64.S
--------------------------
ENTRY(ret_from_fork)
DEFAULT_FRAME
LOCK ; btr $TIF_FORK,TI_flags(%r8)
pushq_cfi kernel_eflags(%rip)
popfq_cfi # reset kernel eflags

Every call to cpu_init() rewrites the global kernel_eflags,
and every pass through ret_from_fork uses this variable for
the sake of clearing the carry bit of the flags register, as
far as I can tell.

Shouldn't every CPU have its own copy of kernel_eflags, just
to be consistent in style? Or would that waste space, and is
the shared variable a deliberate optimization?

Cyrill
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Ian Campbell

Jul 5, 2011, 7:10:02 AM

On Tue, 2011-07-05 at 14:47 +0400, Cyrill Gorcunov wrote:
> While were looking into ret_from_fork code I somehow wondered
> about the global kernel_eflags variable here.
>
> arch/x86/cpu/common.c
> ---------------------
> unsigned long kernel_eflags;
>
> void __cpuinit cpu_init(void)
> {
> ...
> raw_local_save_flags(kernel_eflags);
> }
>
> arch/x86/kernel/entry_64.S
> --------------------------
> ENTRY(ret_from_fork)
> DEFAULT_FRAME
> LOCK ; btr $TIF_FORK,TI_flags(%r8)
> pushq_cfi kernel_eflags(%rip)
> popfq_cfi # reset kernel eflags
>
> Every call to cpu_init renew global kernel_eflags
> and every task switching does use this variable in a
> sake of cleaning carry bit of flags register as far as
> I can tell.
>
> Should not every cpu has own copy of kernel_eflags? Just
> to be consistent in style? Or this would be space waisting
> and an optimization is done here?

I noticed this a while ago and couldn't figure out why it was done this
way. On 32-bit the initial EFLAGS is simply hardcoded and that seems to
make sense to me since it's not clear that the specific value of EFLAGS
at the point it is saved in cpu_init() has any particular meaning.

I posted http://lkml.org/lkml/2010/2/9/155 to switch to the hardcoded
version for 64 bit too. Is it worth my rebasing and reposting that?

(http://lkml.org/lkml/2010/2/9/152 and http://lkml.org/lkml/2010/2/9/154
were follow-on cleanups).

Ian.

--
Ian Campbell
Current Noise: Kylesa - Drop Out

"... all the modern inconveniences ..."
-- Mark Twain

Cyrill Gorcunov

Jul 5, 2011, 7:20:02 AM

On Tue, Jul 05, 2011 at 12:00:56PM +0100, Ian Campbell wrote:
...

>
> I posted http://lkml.org/lkml/2010/2/9/155 to switch to the hardcoded
> version for 64 bit too. Is it worth my rebasing and reposting that?

I think so.

Cyrill

Ian Campbell

Jul 5, 2011, 9:10:02 AM

For no reason that I can determine 64 bit x86 saves the current eflags
in cpu_init purely for use in ret_from_fork. The equivalent 32 bit
code simply hard codes 0x0202 as the new EFLAGS which seems safer than
relying on a potentially arbitrary EFLAGS saved during cpu_init.

Original i386 changeset
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=47a5c6fa0e204a2b63309c648bb2fde36836c826
Original x86_64 changeset
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=658fdbef66e5e9be79b457edc2cbbb3add840aa9

The only comment in the latter indicates that it is following the
former, but not why it differs in this way.

This change makes 64 bit use the same mechanism to set up the initial
EFLAGS on fork. Note that 64 bit resets EFLAGS before calling
schedule_tail() as opposed to 32 bit which calls schedule_tail()
first. Therefore the correct value for EFLAGS has the opposite IF
bit. This will be fixed in a subsequent patch.

Signed-off-by: Ian Campbell <ian.ca...@citrix.com>
Cc: x...@kernel.org
---
arch/x86/kernel/cpu/common.c | 4 ----
arch/x86/kernel/entry_64.S | 2 +-
2 files changed, 1 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 22a073d..178c3b4 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -1077,8 +1077,6 @@ void syscall_init(void)
X86_EFLAGS_TF|X86_EFLAGS_DF|X86_EFLAGS_IF|X86_EFLAGS_IOPL);
}

-unsigned long kernel_eflags;
-
/*
* Copies of the original ist values from the tss are only accessed during
* debugging, no special alignment required.
@@ -1231,8 +1229,6 @@ void __cpuinit cpu_init(void)
fpu_init();
xsave_init();

- raw_local_save_flags(kernel_eflags);
-
if (is_uv_system())
uv_cpu_init();
}
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index e13329d..b50d142 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -396,7 +396,7 @@ ENTRY(ret_from_fork)

LOCK ; btr $TIF_FORK,TI_flags(%r8)

- pushq_cfi kernel_eflags(%rip)
+ pushq_cfi $0x0002


popfq_cfi # reset kernel eflags

call schedule_tail # rdi: 'prev' task parameter
--
1.7.2.5

Ian Campbell

Jul 5, 2011, 9:10:02 AM

The 64 bit version resets EFLAGS before calling schedule_tail() and
therefore leaves EFLAGS.IF clear. 32 bit resets EFLAGS after calling
schedule_tail() and therefore leaves EFLAGS.IF set. I don't think
there is any practical difference between the two approaches since
interrupts are actually reenabled within schedule_tail
(schedule_tail->finish_task_switch->finish_lock_switch->raw_spin_unlock_irq->...->local_irq_enable)
so arbitrarily pick the 32 bit version and make 64 bit look like that.

Signed-off-by: Ian Campbell <ian.ca...@citrix.com>
Cc: x...@kernel.org
---

arch/x86/kernel/entry_64.S | 7 ++++---
1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index b50d142..9c28bd5 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -394,13 +394,14 @@ END(save_paranoid)
ENTRY(ret_from_fork)
DEFAULT_FRAME

- LOCK ; btr $TIF_FORK,TI_flags(%r8)

- pushq_cfi $0x0002
- popfq_cfi # reset kernel eflags
+ LOCK ; btr $TIF_FORK,TI_flags(%r8)



call schedule_tail # rdi: 'prev' task parameter

+ pushq_cfi $0x0202
+ popfq_cfi # reset kernel eflags
+
GET_THREAD_INFO(%rcx)

RESTORE_REST
--
1.7.2.5

Ian Campbell

Jul 5, 2011, 9:20:02 AM

Signed-off-by: Ian Campbell <ian.ca...@citrix.com>
Cc: x...@kernel.org
---
arch/x86/kernel/entry_32.S | 2 +-
arch/x86/kernel/entry_64.S | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/entry_32.S b/arch/x86/kernel/entry_32.S
index 5c1a9197..f032530 100644
--- a/arch/x86/kernel/entry_32.S
+++ b/arch/x86/kernel/entry_32.S
@@ -303,7 +303,7 @@ ENTRY(ret_from_fork)
call schedule_tail
GET_THREAD_INFO(%ebp)
popl_cfi %eax
- pushl_cfi $0x0202 # Reset kernel eflags
+ pushl_cfi $(X86_EFLAGS_IF|0x2) # Reset kernel eflags
popfl_cfi
jmp syscall_exit
CFI_ENDPROC
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index 9c28bd5..4c76abf 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -399,7 +399,7 @@ ENTRY(ret_from_fork)



call schedule_tail # rdi: 'prev' task parameter

- pushq_cfi $0x0202
+ pushq_cfi $(X86_EFLAGS_IF|0x2)


popfq_cfi # reset kernel eflags

GET_THREAD_INFO(%rcx)
--
1.7.2.5

Cyrill Gorcunov

Jul 5, 2011, 9:30:02 AM

On Tue, Jul 05, 2011 at 02:00:12PM +0100, Ian Campbell wrote:
> For no reason that I can determine 64 bit x86 saves the current eflags
> in cpu_init purely for use in ret_from_fork. The equivalent 32 bit
> code simply hard codes 0x0202 as the new EFLAGS which seems safer than
> relying on a potentially arbitrary EFLAGS saved during cpu_init.
>
> Original i386 changeset
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=47a5c6fa0e204a2b63309c648bb2fde36836c826
> Original x86_64 changset
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=658fdbef66e5e9be79b457edc2cbbb3add840aa9
>
> The only comment in the later indicates that it is following the
> former, but not why it differs in this way.
>
> This change makes 64 bit use the same mechanism to setup the initial
> EFLAGS on fork. Note that 64 bit resets EFLAGS before calling
> schedule_tail() as opposed to 32 bit which calls schedule_tail()
> first. Therefore the correct value for EFLAGS has opposite IF
> bit. This will be fixed in a subsequent patch.
>
> Signed-off-by: Ian Campbell <ian.ca...@citrix.com>
> Cc: x...@kernel.org

The whole series looks good to me, thanks Ian! I hope I haven't
missed anything, so let's wait for more feedback ;)

Cyrill

H. Peter Anvin

Jul 5, 2011, 1:50:03 PM

On 07/05/2011 03:47 AM, Cyrill Gorcunov wrote:
>
> Should not every cpu has own copy of kernel_eflags? Just
> to be consistent in style? Or this would be space waisting
> and an optimization is done here?
>

Not specific to this particular case, but in general: a shared variable
that is used often but rarely written to will automatically replicate
itself in the caches of multiple processors. This is the purpose of the
read_mostly segment (writes are permitted but expected to be rare),
which exists to make sure that a frequently written variable doesn't
randomly end up in the cache line next to a read-mostly variable.

-hpa

--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.

Cyrill Gorcunov

Jul 5, 2011, 2:00:03 PM

On Tue, Jul 05, 2011 at 10:47:10AM -0700, H. Peter Anvin wrote:
> On 07/05/2011 03:47 AM, Cyrill Gorcunov wrote:
> >
> > Should not every cpu has own copy of kernel_eflags? Just
> > to be consistent in style? Or this would be space waisting
> > and an optimization is done here?
> >
>
> Not specific to this particular case, but in general: a shared variable
> that used often but rarely written to will automatically replicate
> itself in the caches of multiple processors. This is the purpose of the
> read_mostly segment (writes are permitted but expected to be rare),
> which exists to make sure that a frequently written variable doesn't
> randomly end up in the cache line next to a read-mostly variable.
>
> -hpa
>

Yeah, thanks Peter!

Cyrill

Ian Campbell

Jul 6, 2011, 5:30:01 AM

On Tue, 2011-07-05 at 10:47 -0700, H. Peter Anvin wrote:
> On 07/05/2011 03:47 AM, Cyrill Gorcunov wrote:
> >
> > Should not every cpu has own copy of kernel_eflags? Just
> > to be consistent in style? Or this would be space waisting
> > and an optimization is done here?
> >
>
> Not specific to this particular case, but in general: a shared variable
> that used often but rarely written to will automatically replicate
> itself in the caches of multiple processors. This is the purpose of the
> read_mostly segment (writes are permitted but expected to be rare),
> which exists to make sure that a frequently written variable doesn't
> randomly end up in the cache line next to a read-mostly variable.

FWIW the variable in this particular case isn't actually marked as
__read_mostly...

Ian.
--
Ian Campbell

* Simunye is on a oc3->oc12
simmy: bite me. :)
daemon: okay :)

Ian Campbell

Jul 6, 2011, 5:30:01 AM

On Tue, 2011-07-05 at 17:28 +0400, Cyrill Gorcunov wrote:
> On Tue, Jul 05, 2011 at 02:00:12PM +0100, Ian Campbell wrote:
> > For no reason that I can determine 64 bit x86 saves the current eflags
> > in cpu_init purely for use in ret_from_fork. The equivalent 32 bit
> > code simply hard codes 0x0202 as the new EFLAGS which seems safer than
> > relying on a potentially arbitrary EFLAGS saved during cpu_init.
> >
> > Original i386 changeset
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=47a5c6fa0e204a2b63309c648bb2fde36836c826
> > Original x86_64 changset
> > http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=658fdbef66e5e9be79b457edc2cbbb3add840aa9
> >
> > The only comment in the later indicates that it is following the
> > former, but not why it differs in this way.
> >
> > This change makes 64 bit use the same mechanism to setup the initial
> > EFLAGS on fork. Note that 64 bit resets EFLAGS before calling
> > schedule_tail() as opposed to 32 bit which calls schedule_tail()
> > first. Therefore the correct value for EFLAGS has opposite IF
> > bit. This will be fixed in a subsequent patch.
> >
> > Signed-off-by: Ian Campbell <ian.ca...@citrix.com>
> > Cc: x...@kernel.org
>
> The whole series looks good to me, thanks Ian! I hope I don't
> miss anything, so lets wait for more feedback ;)

Thanks. Actually I missed the extern declaration in asm/processor.h so
this updated patch should be used instead:

8<-----------------------------------------------

From a450bf9e1c6e3f436c652b3e4d63b50c1f9848d4 Mon Sep 17 00:00:00 2001
From: Ian Campbell <ian.ca...@citrix.com>
Date: Wed, 6 Jul 2011 10:24:46 +0100
Subject: [PATCH] x86: drop unnecessary kernel_eflags variable from 64 bit.

For no reason that I can determine 64 bit x86 saves the current eflags
in cpu_init purely for use in ret_from_fork. The equivalent 32 bit
code simply hard codes 0x0202 as the new EFLAGS which seems safer than
relying on a potentially arbitrary EFLAGS saved during cpu_init.

The only comment in the latter indicates that it is following the
former, but not why it differs in this way.

This change makes 64 bit use the same mechanism to set up the initial
EFLAGS on fork. Note that 64 bit resets EFLAGS before calling
schedule_tail() as opposed to 32 bit which calls schedule_tail()
first. Therefore the correct value for EFLAGS has the opposite IF
bit. This will be fixed in a subsequent patch.

Signed-off-by: Ian Campbell <ian.ca...@citrix.com>
Cc: x...@kernel.org

---
arch/x86/include/asm/processor.h | 1 -
arch/x86/kernel/cpu/common.c | 4 ----
arch/x86/kernel/entry_64.S | 2 +-
3 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 2193715..698af88 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -398,7 +398,6 @@ DECLARE_INIT_PER_CPU(irq_stack_union);

DECLARE_PER_CPU(char *, irq_stack_ptr);
DECLARE_PER_CPU(unsigned int, irq_count);
-extern unsigned long kernel_eflags;
extern asmlinkage void ignore_sysret(void);
#else /* X86_64 */
#ifdef CONFIG_CC_STACKPROTECTOR

--

Pekka Enberg

Jul 6, 2011, 5:40:02 AM

Reviewed-by: Pekka Enberg <pen...@kernel.org>

Pekka Enberg

Jul 6, 2011, 5:50:01 AM

On Tue, Jul 5, 2011 at 4:00 PM, Ian Campbell <ian.ca...@citrix.com> wrote:
> For no reason that I can determine 64 bit x86 saves the current eflags
> in cpu_init purely for use in ret_from_fork. The equivalent 32 bit
> code simply hard codes 0x0202 as the new EFLAGS which seems safer than
> relying on a potentially arbitrary EFLAGS saved during cpu_init.
>
> Original i386 changeset
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=47a5c6fa0e204a2b63309c648bb2fde36836c826
> Original x86_64 changset
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=658fdbef66e5e9be79b457edc2cbbb3add840aa9
>
> The only comment in the later indicates that it is following the
> former, but not why it differs in this way.

Well, let's ask Linus and Andi?

Pekka Enberg

Jul 6, 2011, 5:50:02 AM

On Tue, Jul 5, 2011 at 4:00 PM, Ian Campbell <ian.ca...@citrix.com> wrote:
> The 64 bit version resets EFLAGS before calling schedule_tail() and
> therefore leaves EFLAGS.IF clear. 32 bit resets EFLAGS after calling
> schedule_tail() and therefore leaves EFLAGS.IF set. I don't think
> there is any practical difference between the two approaches since
> interrupts are actually reenabled within schedule_tail
> (schedule_tail->finish_task_switch->finish_lock_switch->raw_spin_unlock_irq->...->local_irq_enable)
> so arbitrarily pick the 32 bit version and make 64 bit look like that.
>
> Signed-off-by: Ian Campbell <ian.ca...@citrix.com>
> Cc: x...@kernel.org

Reviewed-by: Pekka Enberg <pen...@kernel.org>

Cyrill Gorcunov

Jul 6, 2011, 5:50:02 AM

On Wed, Jul 06, 2011 at 10:25:29AM +0100, Ian Campbell wrote:
...
> >
> > The whole series looks good to me, thanks Ian! I hope I don't
> > miss anything, so lets wait for more feedback ;)
>
> Thanks. Actually I missed the extern declaration in asm/processor.h so
> this updated patch should be used instead:
>

Great, thanks Ian. Have you had a chance to run this series? ;)

Cyrill

Ian Campbell

Jul 6, 2011, 6:00:02 AM

On Wed, 2011-07-06 at 10:46 +0100, Cyrill Gorcunov wrote:
> On Wed, Jul 06, 2011 at 10:25:29AM +0100, Ian Campbell wrote:
> ...
> > >
> > > The whole series looks good to me, thanks Ian! I hope I don't
> > > miss anything, so lets wait for more feedback ;)
> >
> > Thanks. Actually I missed the extern declaration in asm/processor.h so
> > this updated patch should be used instead:
> >
>
> Great, thanks Ian. Have you a chance to run this series? ;)

Yeah I booted it before I posted it. I figured that the initscripts etc
would be a sufficient test of fork() and that if it were wrong it would
fall over pretty quickly and obviously.

Ian.

Cyrill Gorcunov

Jul 6, 2011, 6:20:01 AM

On Wed, Jul 06, 2011 at 10:57:05AM +0100, Ian Campbell wrote:
> On Wed, 2011-07-06 at 10:46 +0100, Cyrill Gorcunov wrote:
> > On Wed, Jul 06, 2011 at 10:25:29AM +0100, Ian Campbell wrote:
> > ...
> > > >
> > > > The whole series looks good to me, thanks Ian! I hope I don't
> > > > miss anything, so lets wait for more feedback ;)
> > >
> > > Thanks. Actually I missed the extern declaration in asm/processor.h so
> > > this updated patch should be used instead:
> > >
> >
> > Great, thanks Ian. Have you a chance to run this series? ;)
>
> Yeah I booted it before I posted it. I figured that the initscripts etc
> would be a sufficient test of fork() and that if it were wrong it would
> fall over pretty quickly and obviously.
>
> Ian.
>

OK, thanks!

Cyrill

Andi Kleen

Jul 6, 2011, 7:20:02 PM

> Well lets ask Linus and Andi?

The original idea was that if someone extends EFLAGS with new bits
the BIOS could set them. Strictly speaking, the Intel manual
specifies that ("reserved bits -- always set to values previously read").

But that never really worked out, and I don't think anyone
would dare to extend it now. So hardcoding is fine I guess.

-Andi

H. Peter Anvin

Jul 6, 2011, 8:10:01 PM

On 07/06/2011 04:17 PM, Andi Kleen wrote:
>> Well lets ask Linus and Andi?
>
> The original idea was that if someone extends EFLAGS with new bits
> the BIOS could set them. Strictly said the Intel Manual
> specifies that ("reserved bits -- always set to values previously read")
>
> But that never really worked out and I don't think anyone
> would dare extending it now. So hardcoding is fine I guess.
>

We already clear any undefined flags almost as soon as we enter the
kernel... and quite rightly so, because there are BIOSes which leave
random crap in the flags.

It is very different from what an *application* should do in this case,
however.

-hpa
