
[PATCH] x86/copy_user.S: Remove zero byte check before copy user buffer.


Fenghua Yu

Nov 16, 2013, 3:40:02 PM
From: Fenghua Yu <fengh...@intel.com>

The rep movsb instruction handles a zero-byte copy correctly. As pointed out by
Linus, there is no need to check for a zero size in the kernel. Removing this
redundant check saves a few cycles in the copy user functions.
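
Purely as an illustration (a user-space sketch, not part of the patch): with a
zero count, rep movsb copies nothing at all, so the explicit check buys nothing.

#include <assert.h>
#include <string.h>

int main(void)
{
	char src[8] = "source!";
	char dst[8] = "dest...";
	char *d = dst;
	const char *s = src;
	size_t n = 0;				/* zero-byte copy */

	/* rep movsb repeats until RCX reaches zero; with RCX == 0 it does
	 * nothing at all, so no "is the count zero?" branch is needed first. */
	asm volatile("rep movsb"
		     : "+D"(d), "+S"(s), "+c"(n)
		     :
		     : "memory");

	assert(memcmp(dst, "dest...", 8) == 0);	/* destination is untouched */
	return 0;
}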

Signed-off-by: Fenghua Yu <fengh...@intel.com>
---
arch/x86/lib/copy_user_64.S | 8 ++------
1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index a30ca15..ffe4eb9 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -236,8 +236,6 @@ ENDPROC(copy_user_generic_unrolled)
 ENTRY(copy_user_generic_string)
 	CFI_STARTPROC
 	ASM_STAC
-	andl %edx,%edx
-	jz 4f
 	cmpl $8,%edx
 	jb 2f		/* less than 8 bytes, go to byte copy loop */
 	ALIGN_DESTINATION
@@ -249,7 +247,7 @@ ENTRY(copy_user_generic_string)
 2:	movl %edx,%ecx
 3:	rep
 	movsb
-4:	xorl %eax,%eax
+	xorl %eax,%eax
 	ASM_CLAC
 	ret
 
@@ -279,12 +277,10 @@ ENDPROC(copy_user_generic_string)
 ENTRY(copy_user_enhanced_fast_string)
 	CFI_STARTPROC
 	ASM_STAC
-	andl %edx,%edx
-	jz 2f
 	movl %edx,%ecx
 1:	rep
 	movsb
-2:	xorl %eax,%eax
+	xorl %eax,%eax
 	ASM_CLAC
 	ret

--
1.8.0.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

tip-bot for Fenghua Yu

Nov 17, 2013, 1:30:01 AM
Commit-ID: f4cb1cc18f364d761d5614eb6293cccc6647f259
Gitweb: http://git.kernel.org/tip/f4cb1cc18f364d761d5614eb6293cccc6647f259
Author: Fenghua Yu <fengh...@intel.com>
AuthorDate: Sat, 16 Nov 2013 12:37:01 -0800
Committer: H. Peter Anvin <h...@linux.intel.com>
CommitDate: Sat, 16 Nov 2013 18:00:58 -0800

x86-64, copy_user: Remove zero byte check before copy user buffer.

The rep movsb instruction handles a zero-byte copy correctly. As pointed out by
Linus, there is no need to check for a zero size in the kernel. Removing this
redundant check saves a few cycles in the copy user functions.

Reported-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Fenghua Yu <fengh...@intel.com>
Link: http://lkml.kernel.org/r/1384634221-6006-1-git...@intel.com
Signed-off-by: H. Peter Anvin <h...@linux.intel.com>

H. Peter Anvin

Nov 17, 2013, 2:00:01 AM
On 11/16/2013 10:44 PM, Linus Torvalds wrote:
> So this doesn't do the 32-bit truncation in the error path of the generic
> string copy. Oversight?
>
> Linus

Indeed... although in the kernel it seems to be taken as an invariant
that copy lengths over 4G are simply prohibited. There are places all
over the kernel that will fail in a massive way if we ever end up
with a copy over 4G in size.

As such, I would argue the code with the patch is actually no more
broken than with the truncation in place; if anything it is *more*
correct than the unmodified one, since for a (very small) subset of >=4G
copies it will actually do the right thing, albeit slowly.
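
For a concrete picture of what such a truncation does, a toy user-space
example (the oversized count is made up):

#include <stdio.h>

int main(void)
{
	unsigned long long count = (5ULL << 30) + 12;	/* a (bogus) 5 GiB + 12 byte copy */
	unsigned int low32 = (unsigned int)count;	/* what a movl/leal-style truncation keeps */

	printf("full count:      %llu bytes\n", count);
	printf("truncated count: %u bytes\n", low32);	/* 1 GiB + 12 bytes */
	return 0;
}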

The truncations do make me twitch a little inside, I have to admit.

-hpa

H. Peter Anvin

Nov 18, 2013, 11:40:02 PM
On 11/16/2013 10:44 PM, Linus Torvalds wrote:
> So this doesn't do the 32-bit truncation in the error path of the generic
> string copy. Oversight?
>
> Linus

Hi Linus,

Do you have a preference:

1. Considering the 32-bit truncation incidental (take it or leave it);
2. Require the 32-bit truncation, or
3. Get rid of it completely?

-hpa

Linus Torvalds

Nov 19, 2013, 2:40:02 PM
On Mon, Nov 18, 2013 at 8:37 PM, H. Peter Anvin <h...@zytor.com> wrote:
>
> Do you have a preference:
>
> 1. Considering the 32-bit truncation incidental (take it or leave it);
> 2. Require the 32-bit truncation, or
> 3. Get rid of it completely?

I don't have a huge preference, but I hate the current situation (with
Fenghua's patch) where it's not consistent. One path uses just 32 bits
of the count (thanks to the "mov %edx,%ecx") while another path uses
64 bits.

One or the other, but not a mixture of both.

And only tangentially related to this: I do think that we could be
stricter about the count. Make it oops if the high bits are set,
rather than overwrite a lot of memory. So I would not be averse to
limiting the count to 31 bits (or even less) explicitly, and thus
making the whole 32-vs-64 bit issue moot.
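
An explicit limit of that kind might look roughly like the following
user-space sketch (hypothetical, not kernel code; the helper name is made up):

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: refuse any count that does not fit in 31 bits, so
 * the 32-vs-64 bit question never reaches the copy routines at all. */
static inline uint32_t checked_copy_len(uint64_t len)
{
	if (len > 0x7fffffffULL)
		abort();		/* stand-in for an oops/BUG in the kernel */
	return (uint32_t)len;
}

int main(void)
{
	return checked_copy_len(4096) == 4096 ? 0 : 1;
}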

Linus

H. Peter Anvin

Nov 20, 2013, 2:20:01 PM
On 11/19/2013 11:38 AM, Linus Torvalds wrote:
> On Mon, Nov 18, 2013 at 8:37 PM, H. Peter Anvin <h...@zytor.com> wrote:
>>
>> Do you have a preference:
>>
>> 1. Considering the 32-bit truncation incidental (take it or leave it);
>> 2. Require the 32-bit truncation, or
>> 3. Get rid of it completely?
>
> I don't have a huge preference, but I hate the current situation (with
> Fenghua's patch) where it's not consistent. One path uses just 32 bits
> of the count (thanks to the "mov %edx,%ecx") while another path uses
> 64 bits.
>
> One or the other, but not a mixture of both.
>
> And only tangentially related to this: I do think that we could be
> stricter about the count. Make it oops if the high bits are set,
> rather than overwrite a lot of memory. So I would not be averse to
> limiting the count to 31 bits (or even less) explicitly, and thus
> making the whole 32-vs-64 bit issue moot.
>

I guess the question is whether we want to spend the extra cycles on a
test and branch. For now I suggest that we just put back the truncation
in the form of a movl instruction.

-hpa

H. Peter Anvin

Nov 20, 2013, 2:30:02 PM
On 11/16/2013 10:44 PM, Linus Torvalds wrote:
> So this doesn't do the 32-bit truncation in the error path of the
> generic string copy. Oversight?
>
> Linus

I looked at the code again, and it turns out to be a false alarm.

We *do* do 32-bit truncation in every path, still:

> ENTRY(copy_user_generic_string)
> 	CFI_STARTPROC
> 	ASM_STAC
> 	cmpl $8,%edx
> 	jb 2f		/* less than 8 bytes, go to byte copy loop */

-> If we jump here, we will truncate at 2:

> 	ALIGN_DESTINATION

-> If we don't take the jb 2f, we end up here:

> 	movl %edx,%ecx

32-bit truncation here...

> 	shrl $3,%ecx
> 	andl $7,%edx
> 1:	rep
> 	movsq
> 2:	movl %edx,%ecx

32-bit truncation here...

> 3:	rep
> 	movsb
> 	xorl %eax,%eax
> 	ASM_CLAC
> 	ret
>
> 	.section .fixup,"ax"
> 11:	lea (%rdx,%rcx,8),%rcx
> 12:	movl %ecx,%edx		/* ecx is zerorest also */

-> Even if %rdx+%rcx*8 > 2^32 we end up truncating at 12: -- not that it
matters, since both arguments are prototyped as "unsigned" and therefore
the C compiler is supposed to guarantee the upper 32 bits are ignored.
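
The prototype point is plain C behaviour; a small stand-in example (the
function below is hypothetical, not the real copy_user_handle_tail):

#include <stdio.h>

/* Stand-in with the same "unsigned" length parameter: the C conversion
 * rules discard the upper 32 bits of whatever the caller passes. */
static unsigned int tail_len(unsigned int len)
{
	return len;
}

int main(void)
{
	unsigned long long big = 0x100000008ULL;	/* just over 2^32 */

	printf("%u\n", tail_len(big));			/* prints 8: upper 32 bits ignored */
	return 0;
}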

So I think Fenghua's patch is fine as-is.

-hpa

Linus Torvalds

Nov 20, 2013, 3:20:02 PM
On Wed, Nov 20, 2013 at 11:28 AM, H. Peter Anvin <h...@linux.intel.com> wrote:
>>
>> .section .fixup,"ax"
>> 11: lea (%rdx,%rcx,8),%rcx
>> 12: movl %ecx,%edx /* ecx is zerorest also */
>
> -> Even if %rdx+%rcx*8 > 2^32 we end up truncating at 12: -- not that it
> matters, since both arguments are prototyped as "unsigned" and therefore
> the C compiler is supposed to guarantee the upper 32 bits are ignored.

Ahh. That was the one I thought was broken, but yes, while the upper
bits of %rcx are calculated and not zeroed, they end up not actually
getting used. So yeah, I'll believe it's correct.

Linus

H. Peter Anvin

Nov 20, 2013, 3:40:02 PM
On 11/20/2013 12:13 PM, Linus Torvalds wrote:
> On Wed, Nov 20, 2013 at 11:28 AM, H. Peter Anvin <h...@linux.intel.com> wrote:
>>>
>>> .section .fixup,"ax"
>>> 11: lea (%rdx,%rcx,8),%rcx
>>> 12: movl %ecx,%edx /* ecx is zerorest also */
>>
>> -> Even if %rdx+%rcx*8 > 2^32 we end up truncating at 12: -- not that it
>> matters, since both arguments are prototyped as "unsigned" and therefore
>> the C compiler is supposed to guarantee the upper 32 bits are ignored.
>
> Ahh. That was the one I thought was broken, but yes, while the upper
> bits of %rcx are calculated and not zeroed, they end up not actually
> getting used. So yeah, I'll believe it's correct.
>

That being said, "lea (%rdx,%rcx,8),%ecx" (leal, as opposed to leaq) is
a perfectly legitimate instruction and actually one byte shorter. The
big question is whether some broken version of gas will choke on it.
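
A quick user-space check of that form, using GCC/Clang-style inline asm on
x86-64 (illustrative only):

#include <stdio.h>

int main(void)
{
	unsigned long long base = 0x100000000ULL;	/* bit 32 set */
	unsigned long long index = 2;
	unsigned long long out;

	/* leal computes base + index*8 with a 32-bit result; writing the
	 * 32-bit register also zeroes the upper half of the destination,
	 * so no separate truncation step is needed. */
	asm("leal (%1,%2,8), %k0"
	    : "=r"(out)
	    : "r"(base), "r"(index));

	printf("%#llx\n", out);		/* 0x10: only the low 32 bits survive */
	return 0;
}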

-hpa

tip-bot for H. Peter Anvin

Nov 20, 2013, 4:00:02 PM
Commit-ID: 372474e12a858807f03f73cb30e830a76fd1ae07
Gitweb: http://git.kernel.org/tip/372474e12a858807f03f73cb30e830a76fd1ae07
Author: H. Peter Anvin <h...@linux.intel.com>
AuthorDate: Wed, 20 Nov 2013 12:50:51 -0800
Committer: H. Peter Anvin <h...@linux.intel.com>
CommitDate: Wed, 20 Nov 2013 12:50:51 -0800

x86-64, copy_user: Use leal to produce 32-bit results

When we are using lea to produce a 32-bit result, we can use the leal
form, rather than using leaq and worrying about truncation elsewhere.

Cc: Fenghua Yu <fengh...@intel.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
---
arch/x86/lib/copy_user_64.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index ffe4eb9..f00a87c 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -186,7 +186,7 @@ ENTRY(copy_user_generic_unrolled)
 30:	shll $6,%ecx
 	addl %ecx,%edx
 	jmp 60f
-40:	lea (%rdx,%rcx,8),%rdx
+40:	lea (%rdx,%rcx,8),%edx
 	jmp 60f
 50:	movl %ecx,%edx
 60:	jmp copy_user_handle_tail /* ecx is zerorest also */
@@ -252,7 +252,7 @@ ENTRY(copy_user_generic_string)
 	ret
 
 	.section .fixup,"ax"
-11:	lea (%rdx,%rcx,8),%rcx
+11:	lea (%rdx,%rcx,8),%ecx
 12:	movl %ecx,%edx		/* ecx is zerorest also */
 	jmp copy_user_handle_tail
 	.previous

Linus Torvalds

Nov 20, 2013, 4:50:02 PM
On Wed, Nov 20, 2013 at 12:36 PM, H. Peter Anvin <h...@zytor.com> wrote:
>
> That being said, "lea (%rdx,%rcx,8),%ecx" (leal, as opposed to leaq) is
> a perfectly legitimate instruction and actually one byte shorter. The
> big question is whether some broken version of gas will choke on it.

At least gcc-4.8.2 generates that instruction (try multiplying an
integer value by 9), so I guess gas will be happy. Except gcc uses
"leal", so to be safe..

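For instance, something along these lines (the exact output depends on
compiler version and flags):

/* With gcc -O2 on x86-64, the multiply by 9 below is typically emitted as
 *
 *	leal	(%rdi,%rdi,8), %eax
 *
 * i.e. the leal form with 64-bit registers inside the address expression. */
int times_nine(int x)
{
	return x * 9;
}
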
Linus

tip-bot for H. Peter Anvin

Nov 20, 2013, 5:10:02 PM
Commit-ID: 661c80192d21269c7fc566f1d547510b0c867677
Gitweb: http://git.kernel.org/tip/661c80192d21269c7fc566f1d547510b0c867677
Author: H. Peter Anvin <h...@linux.intel.com>
AuthorDate: Wed, 20 Nov 2013 12:50:51 -0800
Committer: H. Peter Anvin <h...@linux.intel.com>
CommitDate: Wed, 20 Nov 2013 13:57:07 -0800

x86-64, copy_user: Use leal to produce 32-bit results

When we are using lea to produce a 32-bit result, we can use the leal
form, rather than using leaq and worrying about truncation elsewhere.

Make the leal explicit, both to be more obvious and because it is what
gcc generates, making it less likely to trigger obscure gas bugs.

Cc: Fenghua Yu <fengh...@intel.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Link: http://lkml.kernel.org/r/1384634221-6006-1-git...@intel.com
Signed-off-by: H. Peter Anvin <h...@linux.intel.com>
---
arch/x86/lib/copy_user_64.S | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/copy_user_64.S b/arch/x86/lib/copy_user_64.S
index ffe4eb9..dee945d 100644
--- a/arch/x86/lib/copy_user_64.S
+++ b/arch/x86/lib/copy_user_64.S
@@ -186,7 +186,7 @@ ENTRY(copy_user_generic_unrolled)
 30:	shll $6,%ecx
 	addl %ecx,%edx
 	jmp 60f
-40:	lea (%rdx,%rcx,8),%rdx
+40:	leal (%rdx,%rcx,8),%edx
 	jmp 60f
 50:	movl %ecx,%edx
 60:	jmp copy_user_handle_tail /* ecx is zerorest also */
@@ -252,7 +252,7 @@ ENTRY(copy_user_generic_string)
 	ret
 
 	.section .fixup,"ax"
-11:	lea (%rdx,%rcx,8),%rcx
+11:	leal (%rdx,%rcx,8),%ecx
 12:	movl %ecx,%edx		/* ecx is zerorest also */
 	jmp copy_user_handle_tail
 	.previous