[ 087/184] xfrm_user: return error pointer instead of NULL

Willy Tarreau

Jun 4, 2013, 6:50:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

commit 864745d291b5ba80ea0bd0edcbe67273de368836 upstream.

When dump_one_state() returns an error, e.g. because of a too small
buffer to dump the whole xfrm state, xfrm_state_netlink() returns NULL
instead of an error pointer. But its callers expect an error pointer
and therefore continue to operate on a NULL skbuff.

This could lead to a privilege escalation (execution of user code in
kernel context) if the attacker has CAP_NET_ADMIN and is able to map
address 0.
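
To illustrate the convention at stake, here is a minimal userspace sketch.
The ERR_PTR()/IS_ERR()/PTR_ERR() helpers below are simplified stand-ins for
the kernel ones, and struct buf, build_buggy(), build_fixed() and
caller_status() are hypothetical names used only for this illustration. The
error value -EMSGSIZE is likewise just an assumed example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	assert(error < 0 && error >= -MAX_ERRNO);
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical buffer type standing in for struct sk_buff. */
struct buf {
	int len;
};

/* Buggy variant: reports a failure as NULL. */
static struct buf *build_buggy(struct buf *b, int err)
{
	return err ? NULL : b;
}

/* Fixed variant: reports a failure as an error pointer. */
static struct buf *build_fixed(struct buf *b, int err)
{
	return err ? ERR_PTR(err) : b;
}

/*
 * A caller that only checks IS_ERR(). It returns the negative errno it
 * detected, or 0 when it would go on to dereference the pointer.
 */
static int caller_status(const struct buf *p)
{
	if (IS_ERR(p))
		return (int)PTR_ERR(p);
	return 0;
}
```

With the buggy variant, caller_status() returns 0 for a failed build, i.e.
the caller would carry on with a NULL pointer. With the fixed variant it sees
the error code and can bail out.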

Signed-off-by: Mathias Krause <min...@googlemail.com>
Acked-by: Steffen Klassert <steffen....@secunet.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---
net/xfrm/xfrm_user.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index a8d83c4..dff20ac 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -647,6 +647,7 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff *in_skb,
{
struct xfrm_dump_info info;
struct sk_buff *skb;
+ int err;

skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_ATOMIC);
if (!skb)
@@ -657,9 +658,10 @@ static struct sk_buff *xfrm_state_netlink(struct sk_buff *in_skb,
info.nlmsg_seq = seq;
info.nlmsg_flags = 0;

- if (dump_one_state(x, 0, &info)) {
+ err = dump_one_state(x, 0, &info);
+ if (err) {
kfree_skb(skb);
- return NULL;
+ return ERR_PTR(err);
}

return skb;
--
1.7.12.2.21.g234cd45.dirty


Willy Tarreau

Jun 4, 2013, 6:50:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <Trond.M...@netapp.com>

On Tue, 2010-12-14 at 16:58 +0800, Mi Jinlong wrote:
> Hi,
>
> When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
> at NFS client's __rpc_create_common function.
>
> The panic place is:
> rpc_mkpipe
> __rpc_lookup_create() <=== find pipefile *idmap*
> __rpc_mkpipe() <=== pipefile is *idmap*
> __rpc_create_common()
> ****** BUG_ON(!d_unhashed(dentry)); ****** *panic*
>
> It means that the dentry's d_flags have be set DCACHE_UNHASHED,
> but it should not be set here.
>
> Is someone known this bug? or give me some idea?
>
> A reproduce program is append, but it can't reproduce the bug every time.
> the export is: "/nfsroot *(rw,no_root_squash,fsid=0,insecure)"
>
> And the panic message is append.
>
> ============================================================================
> #!/bin/sh
>
> LOOPTOTAL=768
> LOOPCOUNT=0
> ret=0
>
> while [ $LOOPCOUNT -ne $LOOPTOTAL ]
> do
> ((LOOPCOUNT += 1))
> service nfs restart
> /usr/sbin/rpc.idmapd
> mount -t nfs4 127.0.0.1:/ /mnt|| return 1;
> ls -l /var/lib/nfs/rpc_pipefs/nfs/*/
> umount /mnt
> echo $LOOPCOUNT
> done
>
> ===============================================================================
> Code: af 60 01 00 00 89 fa 89 f0 e8 64 cf 89 f0 e8 5c 7c 64 cf 31 c0 8b 5c 24 10 8b
> 74 24 14 8b 7c 24 18 8b 6c 24 1c 83 c4 20 c3 <0f> 0b eb fc 8b 46 28 c7 44 24 08 20
> de ee f0 c7 44 24 04 56 ea
> EIP:[<f0ee92ea>] __rpc_create_common+0x8a/0xc0 [sunrpc] SS:ESP 0068:eccb5d28
> ---[ end trace 8f5606cd08928ed2]---
> Kernel panic - not syncing: Fatal exception
> Pid:7131, comm: mount.nfs4 Tainted: G D -------------------2.6.32 #1
> Call Trace:
> [<c080ad18>] ? panic+0x42/0xed
> [<c080e42c>] ? oops_end+0xbc/0xd0
> [<c040b090>] ? do_invalid_op+0x0/0x90
> [<c040b10f>] ? do_invalid_op+0x7f/0x90
> [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
> [<f0edc433>] ? rpc_free_task+0x33/0x70[sunrpc]
> [<f0ed6508>] ? rpc_call_sync+0x48/0x60[sunrpc]
> [<f0ed656e>] ? rpc_ping+0x4e/0x60[sunrpc]
> [<f0ed6eaf>] ? rpc_create+0x38f/0x4f0[sunrpc]
> [<c080d80b>] ? error_code+0x73/0x78
> [<f0ee92ea>] ? __rpc_create_common+0x8a/0xc0[sunrpc]
> [<c0532bda>] ? d_lookup+0x2a/0x40
> [<f0ee94b1>] ? rpc_mkpipe+0x111/0x1b0[sunrpc]
> [<f10a59f4>] ? nfs_create_rpc_client+0xb4/0xf0[nfs]
> [<f10d6c6d>] ? nfs_fscache_get_client_cookie+0x1d/0x50[nfs]
> [<f10d3fcb>] ? nfs_idmap_new+0x7b/0x140[nfs]
> [<c05e76aa>] ? strlcpy+0x3a/0x60
> [<f10a60ca>] ? nfs4_set_client+0xea/0x2b0[nfs]
> [<f10a6d0c>] ? nfs4_create_server+0xac/0x1b0[nfs]
> [<c04f1400>] ? krealloc+0x40/0x50
> [<f10b0e8b>] ? nfs4_remote_get_sb+0x6b/0x250[nfs]
> [<c04f14ec>] ? kstrdup+0x3c/0x60
> [<c0520739>] ? vfs_kern_mount+0x69/0x170
> [<f10b1a3c>] ? nfs_do_root_mount+0x6c/0xa0[nfs]
> [<f10b1b47>] ? nfs4_try_mount+0x37/0xa0[nfs]
> [<f10afe6d>] ? nfs4_validate_text_mount_data+0x7d/0xf0[nfs]
> [<f10b1c42>] ? nfs4_get_sb+0x92/0x2f0
> [<c0520739>] ? vfs_kern_mount+0x69/0x170
> [<c05366d2>] ? get_fs_type+0x32/0xb0
> [<c052089f>] ? do_kern_mount+0x3f/0xe0
> [<c053954f>] ? do_mount+0x2ef/0x740
> [<c0537740>] ? copy_mount_options+0xb0/0x120
> [<c0539a0e>] ? sys_mount+0x6e/0xa0

Hi,

Does the following patch fix the problem?

Cheers
Trond

--------------------------
SUNRPC: Fix a BUG in __rpc_create_common

From: Trond Myklebust <Trond.M...@netapp.com>

Mi Jinlong reports:

When testing NFSv4 at RHEL6 with kernel 2.6.32, I got a kernel panic
at NFS client's __rpc_create_common function.

The panic place is:
rpc_mkpipe
__rpc_lookup_create() <=== find pipefile *idmap*
__rpc_mkpipe() <=== pipefile is *idmap*
__rpc_create_common()
****** BUG_ON(!d_unhashed(dentry)); ****** *panic*

The test is wrong: we can find ourselves with a hashed negative dentry here
if the idmapper tried to look up the file before we got round to creating
it.

Just replace the BUG_ON() with a d_drop(dentry).

[2.6.32 background info from Jonathan below]
> Hi Willy et al,
>
> Please consider
>
> beb0f0a9fba1 kernel panic when mount NFSv4, 2010-12-20
>
> for application to kernel.org's 2.6.32.y and 2.6.34.y trees. The
> patch was applied upstream during the 2.6.38 merge window, so newer
> kernels don't need it.
>
> (Context: <http://bugs.debian.org/695872>.) Tom Downes (cc-ed)
> experienced the bug on a Debian kernel close to 2.6.32.58 and
> confirmed that the patch doesn't seem to hurt.
>
> The patch is part of Fedora 13's 2.6.34-based and Fedora 14's
> 2.6.35-based kernels[1]. It was also included in the RHEL kernel at
> some point between 2.6.32-71.29.1.el6 and 2.6.32-131.0.15.el6[2].
>
> Thoughts of all kinds welcome, as always.
>
> Regards,
> Jonathan
>
> [1] https://bugzilla.redhat.com/673207
> [2] https://oss.oracle.com/git/?p=redpatch.git;a=commit;h=8028cccdc4b1

Reported-by: Mi Jinlong <miji...@cn.fujitsu.com>
Signed-off-by: Trond Myklebust <Trond.M...@netapp.com>
(cherry picked from commit beb0f0a9fba1fa98b378329a9a5b0a73f25097ae)
Cc: Jonathan Nieder <jrni...@gmail.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/sunrpc/rpc_pipe.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/rpc_pipe.c b/net/sunrpc/rpc_pipe.c
index ea1e6de..43aa601 100644
--- a/net/sunrpc/rpc_pipe.c
+++ b/net/sunrpc/rpc_pipe.c
@@ -459,7 +459,7 @@ static int __rpc_create_common(struct inode *dir, struct dentry *dentry,
{
struct inode *inode;

- BUG_ON(!d_unhashed(dentry));
+ d_drop(dentry);
inode = rpc_get_inode(dir->i_sb, mode);
if (!inode)
goto out_err;

Willy Tarreau

Jun 4, 2013, 6:50:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

unmount

From: Eric Sandeen <san...@redhat.com>

commit bc178622d40d87e75abc131007342429c9b03351 upstream.

Doing this would reliably fail with -EBUSY for me:

# mount /dev/sdb2 /mnt/scratch; umount /mnt/scratch; mkfs.btrfs -f /dev/sdb2
...
unable to open /dev/sdb2: Device or resource busy

because mkfs.btrfs tries to open the device O_EXCL, and somebody still has it.

Using systemtap to track bdev gets & puts shows a kworker thread doing a
blkdev put after mkfs attempts a get; this is left over from the unmount
path:

btrfs_close_devices
__btrfs_close_devices
call_rcu(&device->rcu, free_device);
free_device
INIT_WORK(&device->rcu_work, __free_device);
schedule_work(&device->rcu_work);

so unmount might complete before __free_device fires & does its blkdev_put.

Adding an rcu_barrier() to btrfs_close_devices() causes unmount to wait
until all blkdev_put()s are done, and the device is truly free once
unmount completes.
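
The ordering problem can be modelled in userspace with a small,
single-threaded sketch. Everything here is hypothetical (struct
deferred_queue, defer(), drain_deferred(), close_device(),
open_exclusive()); note in particular that drain_deferred() runs the queued
callbacks itself, whereas the real rcu_barrier() waits for them to complete.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_PENDING 8

typedef void (*deferred_fn)(void *);

/* Hypothetical queue of callbacks that will run "later". */
struct deferred_queue {
	deferred_fn fn[MAX_PENDING];
	void *arg[MAX_PENDING];
	size_t count;
};

/* Hypothetical device with a count of current holders. */
struct device {
	int open_count;
};

/* Deferred callback: drops one reference (stand-in for blkdev_put()). */
static void put_device(void *arg)
{
	((struct device *)arg)->open_count--;
}

/* Loose analogue of call_rcu(): remember work to be done later. */
static void defer(struct deferred_queue *q, deferred_fn fn, void *arg)
{
	assert(q->count < MAX_PENDING);
	q->fn[q->count] = fn;
	q->arg[q->count] = arg;
	q->count++;
}

/* Loose analogue of rcu_barrier(): nothing is pending once this returns. */
static void drain_deferred(struct deferred_queue *q)
{
	for (size_t i = 0; i < q->count; i++)
		q->fn[i](q->arg[i]);
	q->count = 0;
}

/* Close path: the put is deferred; optionally wait for it. */
static void close_device(struct deferred_queue *q, struct device *dev,
			 int wait)
{
	defer(q, put_device, dev);
	if (wait)
		drain_deferred(q);
}

/* Exclusive open fails while anybody still holds the device. */
static int open_exclusive(struct device *dev)
{
	if (dev->open_count > 0)
		return -EBUSY;
	dev->open_count++;
	return 0;
}
```

Calling close_device() with wait == 0 and then open_exclusive() yields
-EBUSY, which is the mkfs.btrfs failure described above. With wait != 0 the
open succeeds.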

Signed-off-by: Eric Sandeen <san...@redhat.com>
Signed-off-by: Josef Bacik <jba...@fusionio.com>
Signed-off-by: Chris Mason <chris...@fusionio.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/btrfs/volumes.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 5d56a8d..6190a10 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -557,6 +557,12 @@ int btrfs_close_devices(struct btrfs_fs_devices *fs_devices)
__btrfs_close_devices(fs_devices);
free_fs_devices(fs_devices);
}
+ /*
+ * Wait for rcu kworkers under __btrfs_close_devices
+ * to finish all blkdev_puts so device is really
+ * free when umount is done.
+ */
+ rcu_barrier();
return ret;

Willy Tarreau

Jun 4, 2013, 6:50:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

files

From: Dmitry Monakhov <dmon...@openvz.org>

commit f066055a3449f0e5b0ae4f3ceab4445bead47638 upstream.

Proper block swap for inodes with full journaling enabled is
a truly non-obvious task. To be on the safe side, let's
explicitly disable it for now.

Signed-off-by: Dmitry Monakhov <dmon...@openvz.org>
Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/move_extent.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index fe81390..da25617 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1208,7 +1208,12 @@ ext4_move_extents(struct file *o_filp, struct file *d_filp,
orig_inode->i_ino, donor_inode->i_ino);
return -EINVAL;
}
-
+ /* TODO: This is non obvious task to swap blocks for inodes with full
+ jornaling enabled */
+ if (ext4_should_journal_data(orig_inode) ||
+ ext4_should_journal_data(donor_inode)) {
+ return -EINVAL;
+ }
/* Protect orig and donor inodes against a truncate */
ret1 = mext_inode_double_lock(orig_inode, donor_inode);
if (ret1 < 0)

Willy Tarreau

Jun 4, 2013, 6:50:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

make_indexed_dir() fails

From: Allison Henderson <ache...@linux.vnet.ibm.com>

Fix for a null pointer bug found while running punch hole tests

Signed-off-by: Allison Henderson <ache...@us.ibm.com>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

(cherry picked from commit 6976a6f2acde2b0443cd64f1d08af90630e4ce81)

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/namei.c | 6 ++++--

1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index afe3148..902f69b 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1457,6 +1457,10 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
frame->at = entries;
frame->bh = bh;
bh = bh2;
+
+ ext4_handle_dirty_metadata(handle, dir, frame->bh);
+ ext4_handle_dirty_metadata(handle, dir, bh);
+
de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
if (!de) {
/*
@@ -1465,8 +1469,6 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
* with corrupted filesystem.
*/
ext4_mark_inode_dirty(handle, dir);
- ext4_handle_dirty_metadata(handle, dir, frame->bh);
- ext4_handle_dirty_metadata(handle, dir, bh);
dx_release(frames);
return retval;

Willy Tarreau

Jun 4, 2013, 6:50:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

OSXSAVE bit set (CVE-2012-4461)

From: Petr Matousek <pmat...@redhat.com>

commit 6d1068b3a98519247d8ba4ec85cd40ac136dbdf9 upstream.

On hosts without XSAVE support, an unprivileged local user can trigger
an oops similar to the one below by setting the X86_CR4_OSXSAVE bit in
the guest cr4 register using the KVM_SET_SREGS ioctl and later issuing
the KVM_RUN ioctl.

invalid opcode: 0000 [#2] SMP
Modules linked in: tun ip6table_filter ip6_tables ebtable_nat ebtables
...
Pid: 24935, comm: zoog_kvm_monito Tainted: G D 3.2.0-3-686-pae
EIP: 0060:[<f8b9550c>] EFLAGS: 00210246 CPU: 0
EIP is at kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm]
EAX: 00000001 EBX: 000f387e ECX: 00000000 EDX: 00000000
ESI: 00000000 EDI: 00000000 EBP: ef5a0060 ESP: d7c63e70
DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Process zoog_kvm_monito (pid: 24935, ti=d7c62000 task=ed84a0c0
task.ti=d7c62000)
Stack:
00000001 f70a1200 f8b940a9 ef5a0060 00000000 00200202 f8769009 00000000
ef5a0060 000f387e eda5c020 8722f9c8 00015bae 00000000 ed84a0c0 ed84a0c0
c12bf02d 0000ae80 ef7f8740 fffffffb f359b740 ef5a0060 f8b85dc1 0000ae80
Call Trace:
[<f8b940a9>] ? kvm_arch_vcpu_ioctl_set_sregs+0x2fe/0x308 [kvm]
...
[<c12bfb44>] ? syscall_call+0x7/0xb
Code: 89 e8 e8 14 ee ff ff ba 00 00 04 00 89 e8 e8 98 48 ff ff 85 c0 74
1e 83 7d 48 00 75 18 8b 85 08 07 00 00 31 c9 8b 95 0c 07 00 00 <0f> 01
d1 c7 45 48 01 00 00 00 c7 45 1c 01 00 00 00 0f ae f0 89
EIP: [<f8b9550c>] kvm_arch_vcpu_ioctl_run+0x92a/0xd13 [kvm] SS:ESP
0068:d7c63e70

QEMU first retrieves the supported features via KVM_GET_SUPPORTED_CPUID
and then sets them later. So guest's X86_FEATURE_XSAVE should be masked
out on hosts without X86_FEATURE_XSAVE, making kvm_set_cr4 with
X86_CR4_OSXSAVE fail. Userspaces that allow specifying guest cpuid with
X86_FEATURE_XSAVE even on hosts that do not support it, might be
susceptible to this attack from inside the guest as well.

Allow setting X86_CR4_OSXSAVE bit only if host has XSAVE support.
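
As a rough userspace sketch of that rule: the helper name
check_cr4_osxsave() and its host_has_xsave parameter are made up for this
illustration and are not the kernel's code. Only the bit position of
CR4.OSXSAVE (bit 18) is taken from the architecture.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* CR4.OSXSAVE is bit 18 of the CR4 control register. */
#define X86_CR4_OSXSAVE (UINT64_C(1) << 18)

/*
 * Hypothetical validation helper: reject a requested guest CR4 value that
 * sets OSXSAVE when the host cannot back it with XSAVE support.
 */
static int check_cr4_osxsave(uint64_t cr4, bool host_has_xsave)
{
	if ((cr4 & X86_CR4_OSXSAVE) && !host_has_xsave)
		return -EINVAL;
	return 0;
}
```

A tree with no XSAVE support at all corresponds to always passing
host_has_xsave == false, so any request with the bit set is refused.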

Signed-off-by: Petr Matousek <pmat...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtos...@redhat.com>
[bwh: Backported to 2.6.32: XSAVE is not supported at all, so always
deny setting OSXSAVE]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/kvm/x86.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 79905f2..ec9728f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4719,6 +4719,9 @@ int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
int pending_vec, max_bits;
struct descriptor_table dt;

+ if (sregs->cr4 & X86_CR4_OSXSAVE)
+ return -EINVAL;
+
vcpu_load(vcpu);

dt.limit = sregs->idt.limit;

Willy Tarreau

Jun 4, 2013, 6:50:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

iucv_sock_recvmsg()

From: Mathias Krause <min...@googlemail.com>

[ Upstream commit a5598bd9c087dc0efc250a5221e5d0e6f584ee88 ]

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about iucv_sock_recvmsg() not filling the msg_name in case it was set.
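
The mechanism can be modelled in userspace as follows. This is only a sketch
under stated assumptions: struct fake_msghdr, recv_buggy(), recv_fixed() and
do_recv() are hypothetical, and the 0xAA fill stands in for stale stack
contents so that the example has defined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ADDR_STORAGE_SIZE 128

/* Hypothetical, cut-down message header. */
struct fake_msghdr {
	void *msg_name;
	int msg_namelen;
};

/* Handler that neither fills msg_name nor touches msg_namelen. */
static void recv_buggy(struct fake_msghdr *msg)
{
	(void)msg;
}

/* Handler that declares "no address returned". */
static void recv_fixed(struct fake_msghdr *msg)
{
	msg->msg_namelen = 0;
}

/*
 * Models the generic socket layer: it points msg_name at a local buffer,
 * presets msg_namelen to the buffer size, calls the protocol handler and
 * then copies msg_namelen bytes out to the user. Returns the number of
 * bytes copied.
 */
static size_t do_recv(void (*handler)(struct fake_msghdr *),
		      unsigned char *user_addr)
{
	unsigned char storage[ADDR_STORAGE_SIZE];
	struct fake_msghdr msg;

	memset(storage, 0xAA, sizeof(storage));	/* "stale stack" marker */
	msg.msg_name = storage;
	msg.msg_namelen = (int)sizeof(storage);

	handler(&msg);

	assert((size_t)msg.msg_namelen <= sizeof(storage));
	memcpy(user_addr, storage, (size_t)msg.msg_namelen);
	return (size_t)msg.msg_namelen;
}
```

With recv_buggy() all 128 marker bytes reach the user buffer. With
recv_fixed() nothing is copied.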

Signed-off-by: Mathias Krause <min...@googlemail.com>
Cc: Ursula Braun <ursula...@de.ibm.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/iucv/af_iucv.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/net/iucv/af_iucv.c b/net/iucv/af_iucv.c
index bada1b9..f605b23 100644
--- a/net/iucv/af_iucv.c
+++ b/net/iucv/af_iucv.c
@@ -1160,6 +1160,8 @@ static int iucv_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
struct sk_buff *skb, *rskb, *cskb;
int err = 0;

+ msg->msg_namelen = 0;
+
if ((sk->sk_state == IUCV_DISCONN || sk->sk_state == IUCV_SEVERED) &&
skb_queue_empty(&iucv->backlog_skb_q) &&
skb_queue_empty(&sk->sk_receive_queue) &&

Willy Tarreau

Jun 4, 2013, 6:50:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <jho...@gmail.com>

commit 618aa1068df29c37a58045fe940f9106664153fd upstream.

Remove bogus disconnect test introduced by 95bef012e ("USB: more serial
drivers writing after disconnect") which prevented queued data from
being freed on disconnect.

The possible IO it was supposed to prevent is long gone.

Signed-off-by: Johan Hovold <jho...@gmail.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/serial/garmin_gps.c | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)

diff --git a/drivers/usb/serial/garmin_gps.c b/drivers/usb/serial/garmin_gps.c
index 867d97b..7c3ac7b 100644
--- a/drivers/usb/serial/garmin_gps.c
+++ b/drivers/usb/serial/garmin_gps.c
@@ -974,10 +974,7 @@ static void garmin_close(struct usb_serial_port *port)
if (!serial)
return;

- mutex_lock(&port->serial->disc_mutex);
-
- if (!port->serial->disconnected)
- garmin_clear(garmin_data_p);
+ garmin_clear(garmin_data_p);

/* shutdown our urbs */
usb_kill_urb(port->read_urb);
@@ -986,8 +983,6 @@ static void garmin_close(struct usb_serial_port *port)
/* keep reset state so we know that we must start a new session */
if (garmin_data_p->state != STATE_RESET)
garmin_data_p->state = STATE_DISCONNECTED;
-
- mutex_unlock(&port->serial->disc_mutex);

Willy Tarreau

Jun 4, 2013, 6:50:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

when copy from user space fails

From: Tommi Rantala <tt.ra...@gmail.com>

[ Upstream commit be364c8c0f17a3dd42707b5a090b318028538eb9 ]

Trinity (the syscall fuzzer) discovered a memory leak in SCTP,
reproducible e.g. with the sendto() syscall by passing invalid
user space pointer in the second argument:

	#include <string.h>
	#include <arpa/inet.h>
	#include <sys/socket.h>

	int main(void)
	{
		int fd;
		struct sockaddr_in sa;

		fd = socket(AF_INET, SOCK_STREAM, 132 /*IPPROTO_SCTP*/);
		if (fd < 0)
			return 1;

		memset(&sa, 0, sizeof(sa));
		sa.sin_family = AF_INET;
		sa.sin_addr.s_addr = inet_addr("127.0.0.1");
		sa.sin_port = htons(11111);

		sendto(fd, NULL, 1, 0, (struct sockaddr *)&sa, sizeof(sa));

		return 0;
	}

As far as I can tell, the leak has been around since ~2003.
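
The shape of the leak, and of the extra cleanup label that fixes it, can be
sketched in userspace. All names below (struct chunk, struct msg,
live_chunks, build_msg() and its copy_ok/fixed parameters) are hypothetical
and exist only for this illustration; copy_ok[i] stands in for whether
copying user data into chunk i succeeds.

```c
#include <stdlib.h>

struct chunk {
	struct chunk *next;
};

struct msg {
	struct chunk *chunks;	/* chunks already linked into the message */
};

static int live_chunks;	/* number of chunks currently allocated */

static struct chunk *chunk_alloc(void)
{
	struct chunk *c = calloc(1, sizeof(*c));

	if (c)
		live_chunks++;
	return c;
}

static void chunk_free(struct chunk *c)
{
	free(c);
	live_chunks--;
}

/*
 * Builds a message of n chunks. With fixed == 0 a failed copy jumps
 * straight to errout, which only frees chunks that are already on the
 * list, so the chunk being filled is leaked. With fixed != 0 the failed
 * copy goes through errout_chunk_free first.
 */
static int build_msg(struct msg *m, const int *copy_ok, int n, int fixed)
{
	struct chunk *c;
	int i;

	m->chunks = NULL;
	for (i = 0; i < n; i++) {
		c = chunk_alloc();
		if (!c)
			goto errout;
		if (!copy_ok[i]) {
			if (fixed)
				goto errout_chunk_free;
			goto errout;	/* bug: c is not on the list yet */
		}
		c->next = m->chunks;
		m->chunks = c;
	}
	return 0;

errout_chunk_free:
	chunk_free(c);
errout:
	while (m->chunks) {
		c = m->chunks;
		m->chunks = c->next;
		chunk_free(c);
	}
	return -1;
}
```

Running build_msg() with a failing copy on the last chunk leaves
live_chunks at 1 in the unfixed mode and at 0 in the fixed mode.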

Signed-off-by: Tommi Rantala <tt.ra...@gmail.com>
Acked-by: Vlad Yasevich <vyas...@gmail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/sctp/chunk.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/sctp/chunk.c b/net/sctp/chunk.c
index acf7c4d..b29621d 100644
--- a/net/sctp/chunk.c
+++ b/net/sctp/chunk.c
@@ -272,7 +272,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
goto errout;
err = sctp_user_addto_chunk(chunk, offset, len, msgh->msg_iov);
if (err < 0)
- goto errout;
+ goto errout_chunk_free;

offset += len;

@@ -308,7 +308,7 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,
__skb_pull(chunk->skb, (__u8 *)chunk->chunk_hdr
- (__u8 *)chunk->skb->data);
if (err < 0)
- goto errout;
+ goto errout_chunk_free;

sctp_datamsg_assign(msg, chunk);
list_add_tail(&chunk->frag_list, &msg->chunks);
@@ -316,6 +316,9 @@ struct sctp_datamsg *sctp_datamsg_from_user(struct sctp_association *asoc,

return msg;

+errout_chunk_free:
+ sctp_chunk_free(chunk);
+
errout:
list_for_each_safe(pos, temp, &msg->chunks) {
list_del_init(pos);

Willy Tarreau

Jun 4, 2013, 6:50:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

tpacket_destruct_skb

From: "danbo...@iogearbox.net" <danbo...@iogearbox.net>

[ Upstream commit 7f5c3e3a80e6654cf48dfba7cf94f88c6b505467 ]

Here's a quote of the comment about the BUG macro from asm-generic/bug.h:

Don't use BUG() or BUG_ON() unless there's really no way out; one
example might be detecting data structure corruption in the middle
of an operation that can't be backed out of. If the (sub)system
can somehow continue operating, perhaps with reduced functionality,
it's probably not BUG-worthy.

If you're tempted to BUG(), think again: is completely giving up
really the *only* solution? There are usually better options, where
users don't need to reboot ASAP and can mostly shut down cleanly.

In our case, the status flag of a ring buffer slot is managed from both sides,
kernel space and user space. This means that even when the kernel side works
as expected, user space can still change this flag in the window between
send(2) being triggered (when the flag is set to TP_STATUS_SENDING) and the
given skb being destructed some time later. That will then hit the BUG macro.
As David suggested, the best solution is to simply remove this statement,
since it cannot be used for kernel-side internal consistency checks. I've
tested it and the system still behaves /stable/ in this case, so in accordance
with the above comment, we should rather remove it.

Signed-off-by: Daniel Borkmann <daniel....@tik.ee.ethz.ch>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/packet/af_packet.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 35cfa79..728c080 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -828,7 +828,6 @@ static void tpacket_destruct_skb(struct sk_buff *skb)

if (likely(po->tx_ring.pg_vec)) {
ph = skb_shinfo(skb)->destructor_arg;
- BUG_ON(__packet_get_status(po, ph) != TP_STATUS_SENDING);
BUG_ON(atomic_read(&po->tx_ring.pending) == 0);
atomic_dec(&po->tx_ring.pending);
__packet_set_status(po, ph, TP_STATUS_AVAILABLE);

Willy Tarreau

Jun 4, 2013, 6:50:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jesper Dangaard Brouer <bro...@redhat.com>

Clean up the IPv6 MTU checking in the IPVS xmit code by using
a common helper function, __mtu_check_toobig_v6().

The MTU check for tunnel mode can also use this helper, as
ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is equal to
skb->len, and the 'mtu' variable has been adjusted before
calling the helper.

Note that this also fixes a bug, as the MTU check in ip_vs_dr_xmit_v6()
was missing a check for skb_is_gso().

This bug caused issues, e.g., for KVM IPVS setups, where different
segmentation offloading techniques are used between guests
via the virtio driver. This resulted in very bad performance,
because the ICMPv6 "too big" messages did not affect the sender.
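
To make the difference concrete, here is a small userspace sketch comparing
a check without the GSO exemption to one with it. struct pkt,
old_dr_check() and mtu_check_toobig() are hypothetical stand-ins, not the
kernel structures or functions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet descriptor: total length and a GSO marker. */
struct pkt {
	uint32_t len;
	bool gso;
};

/* Check without the GSO exemption: every oversized packet is "too big". */
static bool old_dr_check(const struct pkt *p, uint32_t mtu)
{
	return p->len > mtu;
}

/*
 * Check with the exemption: a GSO packet will be segmented later, so its
 * aggregate length is not compared against the MTU.
 */
static bool mtu_check_toobig(const struct pkt *p, uint32_t mtu)
{
	return p->len > mtu && !p->gso;
}
```

For a 64000-byte GSO packet and an MTU of 1500, the first check reports
"too big" while the second lets the packet through. For non-GSO packets both
agree.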

Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
Signed-off-by: Patrick McHardy <ka...@trash.net>
Signed-off-by: Pablo Neira Ayuso <pa...@netfilter.org>
(cherry picked from commit 590e3f79a21edd2e9857ac3ced25ba6b2a491ef8)

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/netfilter/ipvs/ip_vs_xmit.c | 20 +++++++++++++++-----
1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index dd7da3c..5be9140 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -64,6 +64,15 @@ __ip_vs_dst_check(struct ip_vs_dest *dest, u32 rtos, u32 cookie)
return dst;
}

+static inline bool
+__mtu_check_toobig_v6(const struct sk_buff *skb, u32 mtu)
+{
+ if (skb->len > mtu && !skb_is_gso(skb)) {
+ return true; /* Packet size violate MTU size */
+ }
+ return false;
+}
+
static struct rtable *
__ip_vs_get_out_rt(struct ip_vs_conn *cp, u32 rtos)
{
@@ -310,7 +319,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if (skb->len > mtu && !skb_is_gso(skb)) {
+ if (__mtu_check_toobig_v6(skb, mtu)) {
dst_release(&rt->u.dst);
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -453,7 +462,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if (skb->len > mtu && !skb_is_gso(skb)) {
+ if (__mtu_check_toobig_v6(skb, mtu)) {
dst_release(&rt->u.dst);
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
IP_VS_DBG_RL_PKT(0, pp, skb, 0,
@@ -672,7 +681,8 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
if (skb_dst(skb))
skb_dst(skb)->ops->update_pmtu(skb_dst(skb), mtu);

- if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) && !skb_is_gso(skb)) {
+ /* MTU checking: Notice that 'mtu' have been adjusted before hand */
+ if (__mtu_check_toobig_v6(skb, mtu)) {
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
dst_release(&rt->u.dst);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -814,7 +824,7 @@ ip_vs_dr_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if (skb->len > mtu) {
+ if (__mtu_check_toobig_v6(skb, mtu)) {
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
dst_release(&rt->u.dst);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -964,7 +974,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if (skb->len > mtu && !skb_is_gso(skb)) {
+ if (__mtu_check_toobig_v6(skb, mtu)) {
dst_release(&rt->u.dst);
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);

Willy Tarreau

Jun 4, 2013, 6:50:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Matthew Garrett <m...@redhat.com>

commit e955a1cd086de4d165ae0f4c7be7289d84b63bdc upstream.

My test platform (Intel DX79SI) boots reliably under BIOS, but frequently
crashes when booting via UEFI. I finally tracked this down to the xhci
handoff code. It seems that reads from the device occasionally just return
0xff, resulting in xhci_find_next_cap_offset generating a value that's
larger than the resource region. We then oops when attempting to read the
value. Sanity checking that value lets us avoid the crash.
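
A userspace sketch of such a bounds-checked walk is shown below. The
register layout is a simplification assumed for this example (low byte is
the capability ID, next byte is the distance to the next capability in
32-bit words), and read32(), put32() and find_cap() are hypothetical
helpers operating on a plain byte array instead of MMIO.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value from the fake register region. */
static uint32_t read32(const uint8_t *base, size_t off)
{
	uint32_t v;

	memcpy(&v, base + off, sizeof(v));
	return v;
}

/* Write a 32-bit value into the fake register region. */
static void put32(uint8_t *base, size_t off, uint32_t v)
{
	memcpy(base + off, &v, sizeof(v));
}

/*
 * Walk the capability list starting at 'first'. Returns the offset of the
 * capability with the wanted ID, 0 if the list ends without a match, or -1
 * if an offset points outside the 'len'-byte region (garbage data).
 */
static long find_cap(const uint8_t *base, size_t len, size_t first,
		     uint8_t wanted)
{
	size_t off = first;

	while (off) {
		uint32_t val, next;

		if (off + sizeof(val) > len)
			return -1;	/* would read past the region */

		val = read32(base, off);
		if ((val & 0xff) == wanted)
			return (long)off;

		next = (val >> 8) & 0xff;
		if (!next)
			return 0;	/* end of the list */
		off += (size_t)next << 2;
	}
	return 0;
}
```

If every read returns 0xff bytes, the "next" field is 0xff as well, the
computed offset lands far beyond a small region, and find_cap() returns -1
instead of reading out of bounds.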

I've no idea what's causing the underlying problem, and xhci still doesn't
actually *work* even with this, but the machine at least boots which will
probably make further debugging easier.

This should be backported to kernels as old as 2.6.31, that contain the
commit 66d4eadd8d067269ea8fead1a50fe87c2979a80d "USB: xhci: BIOS handoff
and HW initialization."

Signed-off-by: Matthew Garrett <m...@redhat.com>
Signed-off-by: Sarah Sharp <sarah....@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/host/pci-quirks.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/host/pci-quirks.c b/drivers/usb/host/pci-quirks.c
index 981b604..01e7fae 100644
--- a/drivers/usb/host/pci-quirks.c
+++ b/drivers/usb/host/pci-quirks.c
@@ -418,12 +418,12 @@ static void __devinit quirk_usb_handoff_xhci(struct pci_dev *pdev)
void __iomem *op_reg_base;
u32 val;
int timeout;
+ int len = pci_resource_len(pdev, 0);

if (!mmio_resource_enabled(pdev, 0))
return;

- base = ioremap_nocache(pci_resource_start(pdev, 0),
- pci_resource_len(pdev, 0));
+ base = ioremap_nocache(pci_resource_start(pdev, 0), len);
if (base == NULL)
return;

@@ -433,9 +433,17 @@ static void __devinit quirk_usb_handoff_xhci(struct pci_dev *pdev)
*/
ext_cap_offset = xhci_find_next_cap_offset(base, XHCI_HCC_PARAMS_OFFSET);
do {
+ if ((ext_cap_offset + sizeof(val)) > len) {
+ /* We're reading garbage from the controller */
+ dev_warn(&pdev->dev,
+ "xHCI controller failing to respond");
+ return;
+ }
+
if (!ext_cap_offset)
/* We've reached the end of the extended capabilities */
goto hc_init;
+
val = readl(base + ext_cap_offset);
if (XHCI_EXT_CAPS_ID(val) == XHCI_EXT_CAPS_LEGACY)
break;

Willy Tarreau

Jun 4, 2013, 6:50:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

__reserve_region_with_split()

From: T Makphaibulchoke <tm...@hp.com>

commit 4965f5667f36a95b41cda6638875bc992bd7d18b upstream.

Using a recursive call to add a non-conflicting region in
__reserve_region_with_split() could result in a stack overflow if the
recursive calls nest too deeply. Convert the recursive calls to an
iterative loop to avoid the problem.
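
The iterative scheme can be sketched in userspace as follows. This is a
simplified model, not the kernel code: struct range, first_conflict() and
reserve_with_split() are hypothetical, the busy list is assumed to be sorted
by start address and non-overlapping, and reserved pieces are written to an
output array rather than inserted into a resource tree.

```c
#include <assert.h>
#include <stddef.h>

/* Inclusive address range. */
struct range {
	unsigned long start, end;
};

/* Returns the lowest busy range overlapping r, or NULL if there is none. */
static const struct range *first_conflict(const struct range *busy,
					  size_t nbusy,
					  const struct range *r)
{
	for (size_t i = 0; i < nbusy; i++)
		if (busy[i].start <= r->end && busy[i].end >= r->start)
			return &busy[i];
	return NULL;
}

/*
 * Reserve every part of [start, end] that does not overlap a busy range.
 * Instead of recursing on the piece to the right of a conflict, that piece
 * is remembered in 'next' and picked up once the left piece is done.
 * Returns the number of pieces written to out (at most max_out).
 */
static size_t reserve_with_split(const struct range *busy, size_t nbusy,
				 unsigned long start, unsigned long end,
				 struct range *out, size_t max_out)
{
	struct range res = { start, end };
	struct range next = { 0, 0 };
	int have_next = 0;
	size_t n = 0;

	assert(start <= end);

	while (n < max_out) {
		const struct range *c = first_conflict(busy, nbusy, &res);

		if (!c) {
			out[n++] = res;		/* piece is free, take it */
			if (!have_next)
				break;
			res = next;
			have_next = 0;
			continue;
		}

		/* conflict covers the whole remaining piece */
		if (c->start <= res.start && c->end >= res.end)
			break;

		if (c->start > res.start) {
			unsigned long old_end = res.end;

			res.end = c->start - 1;	/* keep the left part */
			if (c->end < old_end) {	/* remember the right part */
				next.start = c->end + 1;
				next.end = old_end;
				have_next = 1;
			}
		} else {
			res.start = c->end + 1;	/* skip past the conflict */
		}
	}
	return n;
}
```

For busy ranges [10,19] and [30,39], reserving [0,49] yields the three
pieces [0,9], [20,29] and [40,49] with constant stack usage, however many
conflicts there are.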

Tested on a machine containing 135 regions. The kernel no longer panicked
with stack overflow.

Also tested with code arbitrarily adding regions with no conflict,
embedding two consecutive conflicts and embedding two non-consecutive
conflicts.

Signed-off-by: T Makphaibulchoke <tm...@hp.com>
Reviewed-by: Ram Pai <linu...@us.ibm.com>
Cc: Paul Gortmaker <paul.go...@gmail.com>
Cc: Wei Yang <wei...@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Cc: Jiri Slaby <jsl...@suse.cz>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/resource.c | 50 ++++++++++++++++++++++++++++++++++++++------------
1 file changed, 38 insertions(+), 12 deletions(-)

diff --git a/kernel/resource.c b/kernel/resource.c
index fb11a58..207915a 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -533,6 +533,7 @@ static void __init __reserve_region_with_split(struct resource *root,
struct resource *parent = root;
struct resource *conflict;
struct resource *res = kzalloc(sizeof(*res), GFP_ATOMIC);
+ struct resource *next_res = NULL;

if (!res)
return;
@@ -542,21 +543,46 @@ static void __init __reserve_region_with_split(struct resource *root,
res->end = end;
res->flags = IORESOURCE_BUSY;

- conflict = __request_resource(parent, res);
- if (!conflict)
- return;
+ while (1) {

- /* failed, split and try again */
- kfree(res);
+ conflict = __request_resource(parent, res);
+ if (!conflict) {
+ if (!next_res)
+ break;
+ res = next_res;
+ next_res = NULL;
+ continue;
+ }

- /* conflict covered whole area */
- if (conflict->start <= start && conflict->end >= end)
- return;
+ /* conflict covered whole area */
+ if (conflict->start <= res->start &&
+ conflict->end >= res->end) {
+ kfree(res);
+ WARN_ON(next_res);
+ break;
+ }
+
+ /* failed, split and try again */
+ if (conflict->start > res->start) {
+ end = res->end;
+ res->end = conflict->start - 1;
+ if (conflict->end < end) {
+ next_res = kzalloc(sizeof(*next_res),
+ GFP_ATOMIC);
+ if (!next_res) {
+ kfree(res);
+ break;
+ }
+ next_res->name = name;
+ next_res->start = conflict->end + 1;
+ next_res->end = end;
+ next_res->flags = IORESOURCE_BUSY;
+ }
+ } else {
+ res->start = conflict->end + 1;
+ }
+ }

- if (conflict->start > start)
- __reserve_region_with_split(root, start, conflict->start-1, name);
- if (conflict->end < end)
- __reserve_region_with_split(root, conflict->end+1, end, name);
}

void __init reserve_region_with_split(struct resource *root,

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <ja...@suse.cz>

commit 25389bb207987b5774182f763b9fb65ff08761c8 upstream.

Commit 09e05d48 introduced a wait for transaction commit into
journal_unmap_buffer() in the case we are truncating a buffer undergoing commit
in the page straddling i_size on a filesystem with blocksize < pagesize. Sadly
we forgot to drop the buffer lock before waiting for transaction commit, and
thus a deadlock is possible when kjournald wants to lock the buffer.

Fix the problem by dropping the buffer lock before waiting for transaction
commit. Since we are still holding page lock (and that is OK), buffer cannot
disappear under us.

Signed-off-by: Jan Kara <ja...@suse.cz>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/jbd/transaction.c | 2 ++

1 file changed, 2 insertions(+)

diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 4eff79c..1352e60 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1956,7 +1956,9 @@ retry:
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
spin_unlock(&journal->j_state_lock);
+ unlock_buffer(bh);
log_wait_commit(journal, tid);
+ lock_buffer(bh);
goto retry;
}
/*

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <ol...@redhat.com>

ptrace: ptrace_resume() shouldn't wake up !TASK_TRACED thread

CVE-2013-0871

BugLink: http://bugs.launchpad.net/bugs/1129192

It is not clear why ptrace_resume() does wake_up_process(). Unless the
caller is PTRACE_KILL the tracee should be TASK_TRACED so we can use
wake_up_state(__TASK_TRACED). If sys_ptrace() races with SIGKILL we do
not need the extra and potentially spurious wakeup.

If the caller is PTRACE_KILL, wake_up_process() is even more wrong.
The tracee can sleep in any state in any place, and if we have buggy
code which doesn't handle a spurious wakeup correctly PTRACE_KILL can
be used to exploit it. For example:

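	#include <assert.h>
	#include <stdio.h>
	#include <unistd.h>
	#include <sys/ptrace.h>
	#include <sys/wait.h>

	int main(void)
	{
		int child, status;

		child = fork();
		if (!child) {
			int ret;

			/* become a tracee, then sleep until a signal arrives */
			assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);

			ret = pause();
			printf("pause: %d %m\n", ret);

			return 0x23;
		}

		/* give the child time to block in pause() */
		sleep(1);
		assert(ptrace(PTRACE_KILL, child, 0,0) == 0);

		assert(child == wait(&status));
		printf("wait: %x\n", status);

		return 0;
	}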

This prints "pause: -1 Unknown error 514": -ERESTARTNOHAND leaks to
userland. In this case sys_pause() is buggy as well and should be
fixed.

I do not know what the original rationale behind PTRACE_KILL was.
The man page is simply wrong and afaics it was always wrong. Imho
it should be deprecated, or maybe it should do send_sig(SIGKILL)
as Denys suggests, but in any case I do not think that the current
behaviour was intentional.

Note: there is another problem, ptrace_resume() changes ->exit_code
and this can race with SIGKILL too. Eventually we should change ptrace
to not use ->exit_code.

Signed-off-by: Oleg Nesterov <ol...@redhat.com>
(cherry picked from commit 0666fb51b1483f27506e212cc7f7b2645b5c7acc)

Signed-off-by: Luis Henriques <luis.he...@canonical.com>
Acked-by: Colin King <colin...@canonical.com>
Signed-off-by: Tim Gardner <tim.g...@canonical.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/ptrace.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/ptrace.c b/kernel/ptrace.c
index 05625f6..d8184b5 100644
--- a/kernel/ptrace.c
+++ b/kernel/ptrace.c
@@ -506,7 +506,7 @@ static int ptrace_resume(struct task_struct *child, long request, long data)
}

child->exit_code = data;
- wake_up_process(child);
+ wake_up_state(child, __TASK_TRACED);

return 0;

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

list

From: Ian Abbott <abb...@mev.co.uk>

commit c8cad4c89ee3b15935c532210ae6ebb5c0a2734d upstream.

When `do_cmd_ioctl()` allocates memory for the kernel copy of a channel
list, it frees any previously allocated channel list in
`async->cmd.chanlist` and replaces it with the new one. However, if the
device is ever removed (or "detached") the cleanup code in
`cleanup_device()` in "drivers.c" does not free this memory so it is
lost.

A sensible place to free the kernel copy of the channel list is in
`do_become_nonbusy()` as at that point the comedi asynchronous command
associated with the channel list is no longer valid. Free the channel
list in `do_become_nonbusy()` instead of `do_cmd_ioctl()` and clear the
pointer to prevent it being freed more than once.
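The free-and-clear idiom described above can be sketched in user-space C.
This is only an illustration, not the comedi code: struct demo_cmd,
struct demo_async and both helper functions are hypothetical names, and
free() stands in for kfree().

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the comedi structures; only chanlist matters. */
struct demo_cmd {
	unsigned int *chanlist;
	unsigned int chanlist_len;
};

struct demo_async {
	struct demo_cmd cmd;
};

/*
 * Install a freshly allocated channel list. As in the patched
 * do_cmd_ioctl(), nothing is freed here: the sketch assumes the
 * subdevice went through demo_become_nonbusy() before a new command.
 * Returns 0 on success, -1 if the allocation fails.
 */
static int demo_set_chanlist(struct demo_async *async, unsigned int n)
{
	unsigned int *list = calloc(n, sizeof(*list));

	if (!list)
		return -1;
	async->cmd.chanlist = list;
	async->cmd.chanlist_len = n;
	return 0;
}

/* Release the list and clear the pointer so a second call is harmless. */
static void demo_become_nonbusy(struct demo_async *async)
{
	free(async->cmd.chanlist);	/* free(NULL) is a no-op */
	async->cmd.chanlist = NULL;
	async->cmd.chanlist_len = 0;
}
```

Because the pointer is cleared right after the free, calling
demo_become_nonbusy() twice in a row does not free the list twice.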

Note that `cleanup_device()` could be called at an inappropriate time
while the comedi device is open, but that's a separate bug not related
to this patch.

Signed-off-by: Ian Abbott <abb...@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/staging/comedi/comedi_fops.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/comedi_fops.c b/drivers/staging/comedi/comedi_fops.c
index 908f25a..b83c76f 100644
--- a/drivers/staging/comedi/comedi_fops.c
+++ b/drivers/staging/comedi/comedi_fops.c
@@ -1035,7 +1035,6 @@ static int do_cmd_ioctl(struct comedi_device *dev, void *arg, void *file)
goto cleanup;
}

- kfree(async->cmd.chanlist);
async->cmd = user_cmd;
async->cmd.data = NULL;
/* load channel/gain list */
@@ -1759,6 +1758,8 @@ void do_become_nonbusy(struct comedi_device *dev, struct comedi_subdevice *s)
if (async) {
comedi_reset_async_buf(async);
async->inttrig = NULL;
+ kfree(async->cmd.chanlist);
+ async->cmd.chanlist = NULL;
} else {
printk(KERN_ERR
"BUG: (?) do_become_nonbusy called with async=0\n");

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <ol...@redhat.com>

commit 3e63a93b987685f02421e18b2aa452d20553a88b upstream

No functional changes. Move the call_usermodehelper code from
__request_module() into the new simple helper, call_modprobe().

Signed-off-by: Oleg Nesterov <ol...@redhat.com>
Cc: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Tejun Heo <t...@kernel.org>
Cc: David Rientjes <rien...@google.com>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/kmod.c | 21 ++++++++++++++-------
1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/kernel/kmod.c b/kernel/kmod.c
index f12d883..1088a8f 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -53,6 +53,18 @@ static DECLARE_RWSEM(umhelper_sem);
*/
char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";

+static int call_modprobe(char *module_name, int wait)
+{
+ static char *envp[] = { "HOME=/",
+ "TERM=linux",
+ "PATH=/sbin:/usr/sbin:/bin:/usr/bin",
+ NULL };
+
+ char *argv[] = { modprobe_path, "-q", "--", module_name, NULL };
+
+ return call_usermodehelper(modprobe_path, argv, envp, wait);
+}
+
/**
* __request_module - try to load a kernel module
* @wait: wait (or not) for the operation to complete
@@ -74,11 +86,6 @@ int __request_module(bool wait, const char *fmt, ...)
char module_name[MODULE_NAME_LEN];
unsigned int max_modprobes;
int ret;
- char *argv[] = { modprobe_path, "-q", "--", module_name, NULL };
- static char *envp[] = { "HOME=/",
- "TERM=linux",
- "PATH=/sbin:/usr/sbin:/bin:/usr/bin",
- NULL };
static atomic_t kmod_concurrent = ATOMIC_INIT(0);
#define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
static int kmod_loop_msg;
@@ -121,8 +128,8 @@ int __request_module(bool wait, const char *fmt, ...)

trace_module_request(module_name, wait, _RET_IP_);

- ret = call_usermodehelper(modprobe_path, argv, envp,
- wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
+ ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
+
atomic_dec(&kmod_concurrent);
return ret;

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

looking at IO depths"

From: Jens Axboe <jens....@oracle.com>

This reverts commit fb1e75389bd06fd5987e9cda1b4e0305c782f854.

"Benjamin S." <sbe...@gmx.de> reports that the patch in question
causes a big drop in sequential throughput for him, dropping from
200MB/sec down to only 70MB/sec.

This needs to be investigated more fully; for now let's just revert the
offending commit.

Conflicts:

include/linux/blkdev.h

Signed-off-by: Jens Axboe <jens....@oracle.com>
(cherry picked from commit 79da0644a8e0838522828f106e4049639eea6baf)
Cc: Thomas Bork <t...@eisfair.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

block/blk-core.c | 11 ++---------
include/linux/blkdev.h | 4 +---
2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index cffd737..00ac586 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1149,7 +1149,7 @@ void init_request_from_bio(struct request *req, struct bio *bio)
*/
static inline bool queue_should_plug(struct request_queue *q)
{
- return !(blk_queue_nonrot(q) && blk_queue_queuing(q));
+ return !(blk_queue_nonrot(q) && blk_queue_tagged(q));
}

static int __make_request(struct request_queue *q, struct bio *bio)
@@ -1861,15 +1861,8 @@ void blk_dequeue_request(struct request *rq)
* and to it is freed is accounted as io that is in progress at
* the driver side.
*/
- if (blk_account_rq(rq)) {
+ if (blk_account_rq(rq))
q->in_flight[rq_is_sync(rq)]++;
- /*
- * Mark this device as supporting hardware queuing, if
- * we have more IOs in flight than 4.
- */
- if (!blk_queue_queuing(q) && queue_in_flight(q) > 4)
- set_bit(QUEUE_FLAG_CQ, &q->queue_flags);
- }
}

/**
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5eb6cb0..ec9c10b 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -456,8 +456,7 @@ struct request_queue
#define QUEUE_FLAG_NONROT 14 /* non-rotational device (SSD) */
#define QUEUE_FLAG_VIRT QUEUE_FLAG_NONROT /* paravirt device */
#define QUEUE_FLAG_IO_STAT 15 /* do IO stats */
-#define QUEUE_FLAG_CQ 16 /* hardware does queuing */
-#define QUEUE_FLAG_DISCARD 17 /* supports DISCARD */
+#define QUEUE_FLAG_DISCARD 16 /* supports DISCARD */

#define QUEUE_FLAG_DEFAULT ((1 << QUEUE_FLAG_IO_STAT) | \
(1 << QUEUE_FLAG_STACKABLE) | \
@@ -580,7 +579,6 @@ enum {

#define blk_queue_plugged(q) test_bit(QUEUE_FLAG_PLUGGED, &(q)->queue_flags)
#define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags)
-#define blk_queue_queuing(q) test_bit(QUEUE_FLAG_CQ, &(q)->queue_flags)
#define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags)
#define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags)
#define blk_queue_nonrot(q) test_bit(QUEUE_FLAG_NONROT, &(q)->queue_flags)

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

[ Upstream commit 4a184233f21645cf0b719366210ed445d1024d72 ]

The code in rose_recvmsg() does not initialize all of the members of
struct sockaddr_rose/full_sockaddr_rose when filling the sockaddr info.
Nor does it initialize the padding bytes of the structure inserted by
the compiler for alignment. This will lead to leaking uninitialized
kernel stack bytes in net/socket.c.

Fix the issue by initializing the memory used for sockaddr info with
memset(0).
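Why the memset matters can be shown with a small user-space sketch. The
struct below is a made-up stand-in, not sockaddr_rose, and the helper
names are hypothetical. The buffer is first poisoned with 0xAA to play
the role of stale stack contents; after clearing it and then assigning
only some members, no poisoned byte should remain on common compilers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical address record; padding may follow 'family' and 'call'. */
struct demo_addr {
	unsigned char family;
	unsigned int  addr;
	char          call[7];
};

/* Fill 'out' without leaving any byte of the object unwritten. */
static void demo_fill_addr(struct demo_addr *out, unsigned char family,
			   unsigned int addr)
{
	memset(out, 0, sizeof(*out));	/* clears members and padding alike */
	out->family = family;
	out->addr = addr;
	/* 'call' is deliberately not assigned: it stays all-zero */
}

/* Count bytes of the object that still hold the poison value 0xAA. */
static size_t demo_count_poison(const struct demo_addr *a)
{
	const unsigned char *p = (const unsigned char *)a;
	size_t i, n = 0;

	for (i = 0; i < sizeof(*a); i++)
		if (p[i] == 0xAA)
			n++;
	return n;
}
```

Without the memset() call, the padding bytes and the unassigned 'call'
member would still carry the poison, which is the user-space analogue
of the leak described above.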

Signed-off-by: Mathias Krause <min...@googlemail.com>
Cc: Ralf Baechle <ra...@linux-mips.org>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/rose/af_rose.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 523efbb..2984999 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -1275,6 +1275,7 @@ static int rose_recvmsg(struct kiocb *iocb, struct socket *sock,
skb_copy_datagram_iovec(skb, 0, msg->msg_iov, copied);

if (srose != NULL) {
+ memset(srose, 0, msg->msg_namelen);
srose->srose_family = AF_ROSE;
srose->srose_addr = rose->dest_addr;
srose->srose_call = rose->dest_call;

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>

[ Upstream commit ae62ca7b03217be5e74759dc6d7698c95df498b3 ]

commit 35f9c09fe9c72e (tcp: tcp_sendpages() should call tcp_push() once)
added an internal flag : MSG_SENDPAGE_NOTLAST meant to be set on all
frags but the last one for a splice() call.

The condition used to set the flag in pipe_to_sendpage() relied on
splice() user passing the exact number of bytes present in the pipe,
or a smaller one.

But some programs pass an arbitrarily high value, and the test fails.

The effect of this bug is a lack of tcp_push() at the end of a
splice(pipe -> socket) call, and possibly very slow or erratic TCP
sessions.

We should test both sd->total_len and the fact that another fragment
is in the pipe (pipe->nrbufs > 1).
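The corrected condition can be written as a small pure function to make
the cases easier to see. This is a sketch only: the DEMO_ flag values
are placeholders rather than the kernel's constants, and
demo_sendpage_flags() is a hypothetical helper, not a kernel function.

```c
#include <stddef.h>

/* Illustrative flag values only; the real constants live in kernel headers. */
#define DEMO_MSG_MORE			0x1
#define DEMO_MSG_SENDPAGE_NOTLAST	0x2

/*
 * len       - bytes in the buffer being sent now
 * total_len - bytes the caller asked splice() to move
 * nrbufs    - buffers currently queued in the pipe
 */
static int demo_sendpage_flags(size_t len, size_t total_len,
			       unsigned int nrbufs, int splice_f_more)
{
	int more = splice_f_more ? DEMO_MSG_MORE : 0;

	/* "not last" only if more was requested AND more is actually queued */
	if (len < total_len && nrbufs > 1)
		more |= DEMO_MSG_SENDPAGE_NOTLAST;
	return more;
}
```

With an oversized request and a single buffer left in the pipe, for
example demo_sendpage_flags(4096, (size_t)1 << 30, 1, 0), the result
carries no "not last" flag, so the final push is not suppressed.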

Many thanks to Willy for providing very clear bug report, bisection
and test programs.

Reported-by: Willy Tarreau <w...@1wt.eu>
Bisected-by: Willy Tarreau <w...@1wt.eu>
Tested-by: Willy Tarreau <w...@1wt.eu>
Signed-off-by: Eric Dumazet <edum...@google.com>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/splice.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/fs/splice.c b/fs/splice.c
index f5d5a2b..cdad986 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -639,8 +639,10 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe,
ret = buf->ops->confirm(pipe, buf);
if (!ret) {
more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
- if (sd->len < sd->total_len)
+
+ if (sd->len < sd->total_len && pipe->nrbufs > 1)
more |= MSG_SENDPAGE_NOTLAST;
+
if (file->f_op && file->f_op->sendpage)
ret = file->f_op->sendpage(file, buf->page, buf->offset,
sd->len, &pos, more);

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

of extent format file

From: Lukas Czerner <lcze...@redhat.com>

commit f17722f917b2f21497deb6edc62fb1683daa08e6 upstream

Kazuya Mio reported that he was able to hit BUG_ON(next == lblock)
in ext4_ext_put_gap_in_cache() while creating a sparse file in extent
format and filling the tail of the file up to its end. We will hit the BUG_ON
when we write the last block (2^32-1) into the sparse file.

The root cause of the problem lies in the fact that we specifically set
s_maxbytes so that the block at s_maxbytes fits into the on-disk extent
format, which is 32 bits long. However, we are not storing the start and end
block numbers, but rather the start block number and the length in blocks. It
means that in order to cover the extent from 0 to EXT_MAX_BLOCK we need
EXT_MAX_BLOCK+1 to fit into len (because we are counting block 0 as well),
and it does not.

The only way to fix it without changing the meaning of the struct
ext4_extent members is, as Kazuya Mio suggested, to lower s_maxbytes
by one fs block so we can cover the whole extent we can get by the
on-disk extent format.
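The arithmetic behind this can be checked with a short user-space
sketch. The helper names are hypothetical, and the sketch assumes, as
described above, that the length has to fit into a 32-bit quantity.

```c
#include <stdint.h>

/* Blocks needed to cover logical blocks first..last inclusive (64-bit math). */
static uint64_t demo_blocks_covered(uint32_t first, uint32_t last)
{
	return (uint64_t)last - first + 1;
}

/* Does a block count fit into a 32-bit length? */
static int demo_fits_u32(uint64_t nblocks)
{
	return nblocks <= UINT32_MAX;
}

/*
 * Extent-imposed size limit in bytes, following the arithmetic of the
 * patched ext4_max_size(): (2^32 - 1) blocks of 2^blkbits bytes each.
 */
static uint64_t demo_extent_max_bytes(unsigned int blkbits)
{
	return ((UINT64_C(1) << 32) - 1) << blkbits;
}
```

Covering blocks 0 through 0xffffffff takes 2^32 blocks, which does not
fit into 32 bits. Stopping one block earlier gives a length of 2^32 - 1,
which does, and that is the limit the lowered s_maxbytes enforces.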

Also, in many places EXT_MAX_BLOCK is used as a length instead of the maximum
logical block number as the name suggests; it is all a bit messy. So
this commit renames it to EXT_MAX_BLOCKS and changes its usage in some
places to actually be the maximum number of blocks in the extent.

The bug which this commit fixes can be reproduced as follows:

dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-2))
sync
dd if=/dev/zero of=/mnt/mp1/file bs=<blocksize> count=1 seek=$((2**32-1))

Reported-by: Kazuya Mio <k-...@sx.jp.nec.com>
Signed-off-by: Lukas Czerner <lcze...@redhat.com>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

[dannf: Applied the backport from RHEL6 to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/ext4_extents.h | 7 +++++--
fs/ext4/extents.c | 39 +++++++++++++++++++--------------------
fs/ext4/move_extent.c | 10 +++++-----
fs/ext4/super.c | 15 ++++++++++++---
4 files changed, 41 insertions(+), 30 deletions(-)

diff --git a/fs/ext4/ext4_extents.h b/fs/ext4/ext4_extents.h
index bdb6ce7..24fa647 100644
--- a/fs/ext4/ext4_extents.h
+++ b/fs/ext4/ext4_extents.h
@@ -137,8 +137,11 @@ typedef int (*ext_prepare_callback)(struct inode *, struct ext4_ext_path *,
#define EXT_BREAK 1
#define EXT_REPEAT 2

-/* Maximum logical block in a file; ext4_extent's ee_block is __le32 */
-#define EXT_MAX_BLOCK 0xffffffff
+/*
+ * Maximum number of logical blocks in a file; ext4_extent's ee_block is
+ * __le32.
+ */
+#define EXT_MAX_BLOCKS 0xffffffff

/*
* EXT_INIT_MAX_LEN is the maximum number of blocks we can have in an
diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index b4402c8..f4b471d 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -1331,7 +1331,7 @@ got_index:

/*
* ext4_ext_next_allocated_block:
- * returns allocated block in subsequent extent or EXT_MAX_BLOCK.
+ * returns allocated block in subsequent extent or EXT_MAX_BLOCKS.
* NOTE: it considers block number from index entry as
* allocated block. Thus, index entries have to be consistent
* with leaves.
@@ -1345,7 +1345,7 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
depth = path->p_depth;

if (depth == 0 && path->p_ext == NULL)
- return EXT_MAX_BLOCK;
+ return EXT_MAX_BLOCKS;

while (depth >= 0) {
if (depth == path->p_depth) {
@@ -1362,12 +1362,12 @@ ext4_ext_next_allocated_block(struct ext4_ext_path *path)
depth--;
}

- return EXT_MAX_BLOCK;
+ return EXT_MAX_BLOCKS;
}

/*
* ext4_ext_next_leaf_block:
- * returns first allocated block from next leaf or EXT_MAX_BLOCK
+ * returns first allocated block from next leaf or EXT_MAX_BLOCKS
*/
static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
struct ext4_ext_path *path)
@@ -1379,7 +1379,7 @@ static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,

/* zero-tree has no leaf blocks at all */
if (depth == 0)
- return EXT_MAX_BLOCK;
+ return EXT_MAX_BLOCKS;

/* go to index block */
depth--;
@@ -1392,7 +1392,7 @@ static ext4_lblk_t ext4_ext_next_leaf_block(struct inode *inode,
depth--;
}

- return EXT_MAX_BLOCK;
+ return EXT_MAX_BLOCKS;
}

/*
@@ -1572,13 +1572,13 @@ unsigned int ext4_ext_check_overlap(struct inode *inode,
*/
if (b2 < b1) {
b2 = ext4_ext_next_allocated_block(path);
- if (b2 == EXT_MAX_BLOCK)
+ if (b2 == EXT_MAX_BLOCKS)
goto out;
}

/* check for wrap through zero on extent logical start block*/
if (b1 + len1 < b1) {
- len1 = EXT_MAX_BLOCK - b1;
+ len1 = EXT_MAX_BLOCKS - b1;
newext->ee_len = cpu_to_le16(len1);
ret = 1;
}
@@ -1654,7 +1654,7 @@ repeat:
fex = EXT_LAST_EXTENT(eh);
next = ext4_ext_next_leaf_block(inode, path);
if (le32_to_cpu(newext->ee_block) > le32_to_cpu(fex->ee_block)
- && next != EXT_MAX_BLOCK) {
+ && next != EXT_MAX_BLOCKS) {
ext_debug("next leaf block - %d\n", next);
BUG_ON(npath != NULL);
npath = ext4_ext_find_extent(inode, next, NULL);
@@ -1772,7 +1772,7 @@ int ext4_ext_walk_space(struct inode *inode, ext4_lblk_t block,
BUG_ON(func == NULL);
BUG_ON(inode == NULL);

- while (block < last && block != EXT_MAX_BLOCK) {
+ while (block < last && block != EXT_MAX_BLOCKS) {
num = last - block;
/* find extent for this block */
down_read(&EXT4_I(inode)->i_data_sem);
@@ -1900,7 +1900,7 @@ ext4_ext_put_gap_in_cache(struct inode *inode, struct ext4_ext_path *path,
if (ex == NULL) {
/* there is no extent yet, so gap is [0;-] */
lblock = 0;
- len = EXT_MAX_BLOCK;
+ len = EXT_MAX_BLOCKS;
ext_debug("cache gap(whole file):");
} else if (block < le32_to_cpu(ex->ee_block)) {
lblock = block;
@@ -2145,8 +2145,8 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
path[depth].p_ext = ex;

a = ex_ee_block > start ? ex_ee_block : start;
- b = ex_ee_block + ex_ee_len - 1 < EXT_MAX_BLOCK ?
- ex_ee_block + ex_ee_len - 1 : EXT_MAX_BLOCK;
+ b = ex_ee_block + ex_ee_len - 1 < EXT_MAX_BLOCKS ?
+ ex_ee_block + ex_ee_len - 1 : EXT_MAX_BLOCKS;

ext_debug(" border %u:%u\n", a, b);

@@ -3783,15 +3783,14 @@ static int ext4_ext_fiemap_cb(struct inode *inode, struct ext4_ext_path *path,
flags |= FIEMAP_EXTENT_UNWRITTEN;

/*
- * If this extent reaches EXT_MAX_BLOCK, it must be last.
+ * If this extent reaches EXT_MAX_BLOCKS, it must be last.
*
- * Or if ext4_ext_next_allocated_block is EXT_MAX_BLOCK,
+ * Or if ext4_ext_next_allocated_block is EXT_MAX_BLOCKS,
* this also indicates no more allocated blocks.
*
- * XXX this might miss a single-block extent at EXT_MAX_BLOCK
*/
- if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCK ||
- newex->ec_block + newex->ec_len - 1 == EXT_MAX_BLOCK) {
+ if (ext4_ext_next_allocated_block(path) == EXT_MAX_BLOCKS ||
+ newex->ec_block + newex->ec_len == EXT_MAX_BLOCKS) {
loff_t size = i_size_read(inode);
loff_t bs = EXT4_BLOCK_SIZE(inode->i_sb);

@@ -3871,8 +3870,8 @@ int ext4_fiemap(struct inode *inode, struct fiemap_extent_info *fieinfo,

start_blk = start >> inode->i_sb->s_blocksize_bits;
last_blk = (start + len - 1) >> inode->i_sb->s_blocksize_bits;
- if (last_blk >= EXT_MAX_BLOCK)
- last_blk = EXT_MAX_BLOCK-1;
+ if (last_blk >= EXT_MAX_BLOCKS)
+ last_blk = EXT_MAX_BLOCKS-1;
len_blks = ((ext4_lblk_t) last_blk) - start_blk + 1;

/*
diff --git a/fs/ext4/move_extent.c b/fs/ext4/move_extent.c
index a73ed78..fe81390 100644
--- a/fs/ext4/move_extent.c
+++ b/fs/ext4/move_extent.c
@@ -1001,12 +1001,12 @@ mext_check_arguments(struct inode *orig_inode,
return -EINVAL;
}

- if ((orig_start > EXT_MAX_BLOCK) ||
- (donor_start > EXT_MAX_BLOCK) ||
- (*len > EXT_MAX_BLOCK) ||
- (orig_start + *len > EXT_MAX_BLOCK)) {
+ if ((orig_start >= EXT_MAX_BLOCKS) ||
+ (donor_start >= EXT_MAX_BLOCKS) ||
+ (*len > EXT_MAX_BLOCKS) ||
+ (orig_start + *len >= EXT_MAX_BLOCKS)) {
ext4_debug("ext4 move extent: Can't handle over [%u] blocks "
- "[ino:orig %lu, donor %lu]\n", EXT_MAX_BLOCK,
+ "[ino:orig %lu, donor %lu]\n", EXT_MAX_BLOCKS,

orig_inode->i_ino, donor_inode->i_ino);
return -EINVAL;
}

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index f1e7077..3ce77c5 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1975,6 +1975,12 @@ static void ext4_orphan_cleanup(struct super_block *sb,
* in the vfs. ext4 inode has 48 bits of i_block in fsblock units,
* so that won't be a limiting factor.
*
+ * However there is other limiting factor. We do store extents in the form
+ * of starting block and length, hence the resulting length of the extent
+ * covering maximum file size must fit into on-disk format containers as
+ * well. Given that length is always by 1 unit bigger than max unit (because
+ * we count 0 as well) we have to lower the s_maxbytes by one fs block.
+ *
* Note, this does *not* consider any metadata overhead for vfs i_blocks.
*/
static loff_t ext4_max_size(int blkbits, int has_huge_files)
@@ -1996,10 +2002,13 @@ static loff_t ext4_max_size(int blkbits, int has_huge_files)
upper_limit <<= blkbits;
}

- /* 32-bit extent-start container, ee_block */
- res = 1LL << 32;
+ /*
+ * 32-bit extent-start container, ee_block. We lower the maxbytes
+ * by one fs block, so ee_len can cover the extent of maximum file
+ * size
+ */
+ res = (1LL << 32) - 1;
res <<= blkbits;
- res -= 1;

/* Sanity check against vm- & vfs- imposed limits */
if (res > upper_limit)

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Oliver Neukum <one...@suse.de>

commit c0f5ecee4e741667b2493c742b60b6218d40b3aa upstream.

The buffer for responses must not overflow.
If it would, set a flag, drop the data and return
an error after user space has read all remaining data.
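The scheme described above (flag the overflow, drop the data, report the
error once the buffered data has been drained) can be sketched in
user-space C. This is not the driver code: struct demo_rxbuf, DEMO_CAP
and both functions are hypothetical, and locking is left out entirely.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_CAP 8	/* stands in for wMaxCommand */

struct demo_rxbuf {
	unsigned char data[DEMO_CAP];
	size_t length;		/* bytes currently buffered */
	int overflow;		/* set once a response had to be dropped */
};

/* Producer side: append a response, or set the flag and drop it. */
static void demo_rx(struct demo_rxbuf *b, const void *src, size_t n)
{
	if (n + b->length > DEMO_CAP) {
		b->overflow = 1;		/* the buffer would overflow */
	} else if (!b->overflow) {
		memcpy(b->data + b->length, src, n);
		b->length += n;
	}
	/* while in overflow, later responses are dropped as well */
}

/* Consumer side: hand out buffered data first, then report the overflow once. */
static long demo_read(struct demo_rxbuf *b, void *dst, size_t n)
{
	if (b->length == 0) {
		if (b->overflow) {
			b->overflow = 0;
			return -ENOBUFS;
		}
		return 0;
	}
	if (n > b->length)
		n = b->length;
	memcpy(dst, b->data, n);
	memmove(b->data, b->data + n, b->length - n);
	b->length -= n;
	return (long)n;
}
```

The reader still receives everything that was stored before the
overflow, then sees -ENOBUFS exactly once, after which the buffer
behaves normally again.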

Signed-off-by: Oliver Neukum <oli...@neukum.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
[bwh: Backported to 2.6.32: adjust context]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/class/cdc-wdm.c | 23 ++++++++++++++++++++---
1 file changed, 20 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index 37f2899..01ae519 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -52,6 +52,7 @@ MODULE_DEVICE_TABLE (usb, wdm_ids);
#define WDM_READ 4
#define WDM_INT_STALL 5
#define WDM_POLL_RUNNING 6
+#define WDM_OVERFLOW 10

#define WDM_MAX 16
@@ -115,6 +116,7 @@ static void wdm_in_callback(struct urb *urb)
{
struct wdm_device *desc = urb->context;
int status = urb->status;
+ int length = urb->actual_length;

spin_lock(&desc->iuspin);

@@ -144,9 +146,17 @@ static void wdm_in_callback(struct urb *urb)
}

desc->rerr = status;
- desc->reslength = urb->actual_length;
- memmove(desc->ubuf + desc->length, desc->inbuf, desc->reslength);
- desc->length += desc->reslength;
+ if (length + desc->length > desc->wMaxCommand) {
+ /* The buffer would overflow */
+ set_bit(WDM_OVERFLOW, &desc->flags);
+ } else {
+ /* we may already be in overflow */
+ if (!test_bit(WDM_OVERFLOW, &desc->flags)) {
+ memmove(desc->ubuf + desc->length, desc->inbuf, length);
+ desc->length += length;
+ desc->reslength = length;
+ }
+ }
wake_up(&desc->wait);

set_bit(WDM_READ, &desc->flags);
@@ -398,6 +408,11 @@ retry:
rv = -ENODEV;
goto err;
}
+ if (test_bit(WDM_OVERFLOW, &desc->flags)) {
+ clear_bit(WDM_OVERFLOW, &desc->flags);
+ rv = -ENOBUFS;
+ goto err;
+ }
i++;
if (file->f_flags & O_NONBLOCK) {
if (!test_bit(WDM_READ, &desc->flags)) {
@@ -440,6 +455,7 @@ retry:
spin_unlock_irq(&desc->iuspin);
goto retry;
}
+
if (!desc->reslength) { /* zero length read */
dev_dbg(&desc->intf->dev, "%s: zero length - clearing WDM_READ\n", __func__);
clear_bit(WDM_READ, &desc->flags);
@@ -844,6 +860,7 @@ static int wdm_post_reset(struct usb_interface *intf)
struct wdm_device *desc = usb_get_intfdata(intf);
int rv;

+ clear_bit(WDM_OVERFLOW, &desc->flags);
rv = recover_from_urb_loss(desc);
mutex_unlock(&desc->plock);
return 0;

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

getsockopt(IP_VS_SO_GET_TIMEOUT)

From: Mathias Krause <min...@googlemail.com>

commit 2d8a041b7bfe1097af21441cb77d6af95f4f4680 upstream.

If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and therefore leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.

Signed-off-by: Mathias Krause <min...@googlemail.com>
Cc: Wensong Zhang <wen...@linux-vs.org>
Cc: Simon Horman <ho...@verge.net.au>
Cc: Julian Anastasov <j...@ssi.bg>

Signed-off-by: David S. Miller <da...@davemloft.net>

[bwh: Backported to 2.6.32: adjust context]
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/netfilter/ipvs/ip_vs_ctl.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/net/netfilter/ipvs/ip_vs_ctl.c b/net/netfilter/ipvs/ip_vs_ctl.c
index 02b2610..9bcd972 100644
--- a/net/netfilter/ipvs/ip_vs_ctl.c
+++ b/net/netfilter/ipvs/ip_vs_ctl.c
@@ -2455,6 +2455,7 @@ do_ip_vs_get_ctl(struct sock *sk, int cmd, void __user *user, int *len)
{
struct ip_vs_timeout_user t;

+ memset(&t, 0, sizeof(t));
__ip_vs_get_timeouts(&t);
if (copy_to_user(user, &t, sizeof(t)) != 0)
ret = -EFAULT;

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

irda_recvmsg_dgram()

From: Mathias Krause <min...@googlemail.com>

The current code does not fill the msg_name member in case it is set.
It also does not set the msg_namelen member to 0 and therefore makes
net/socket.c leak the local, uninitialized sockaddr_storage variable
to userland -- 128 bytes of kernel stack memory.

Fix that by simply setting msg_namelen to 0 as obviously nobody cared
about irda_recvmsg_dgram() not filling the msg_name in case it was
set.

Cc: Samuel Ortiz <sam...@sortiz.org>
Signed-off-by: Mathias Krause <min...@googlemail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

[dannf: adjusted to apply to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/irda/af_irda.c | 2 ++

1 file changed, 2 insertions(+)

diff --git a/net/irda/af_irda.c b/net/irda/af_irda.c
index 476b24e..bfb325d 100644
--- a/net/irda/af_irda.c
+++ b/net/irda/af_irda.c
@@ -1338,6 +1338,8 @@ static int irda_recvmsg_dgram(struct kiocb *iocb, struct socket *sock,
if ((err = sock_error(sk)) < 0)
return err;

+ msg->msg_namelen = 0;
+

skb = skb_recv_datagram(sk, flags & ~MSG_DONTWAIT,
flags & MSG_DONTWAIT, &err);
if (!skb)

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

snd_ac97_cvol_new()

From: Takashi Iwai <ti...@suse.de>

commit 733a48e5ae5bf28b046fad984d458c747cbb8c21 upstream.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44721

Signed-off-by: Takashi Iwai <ti...@suse.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

sound/pci/ac97/ac97_codec.c | 2 ++

1 file changed, 2 insertions(+)

diff --git a/sound/pci/ac97/ac97_codec.c b/sound/pci/ac97/ac97_codec.c
index 78288db..5f295f7 100644
--- a/sound/pci/ac97/ac97_codec.c
+++ b/sound/pci/ac97/ac97_codec.c
@@ -1252,6 +1252,8 @@ static int snd_ac97_cvol_new(struct snd_card *card, char *name, int reg, unsigne
tmp.index = ac97->num;
kctl = snd_ctl_new1(&tmp, ac97);
}
+ if (!kctl)
+ return -ENOMEM;
if (reg >= AC97_PHONE && reg <= AC97_PCM)
set_tlv_db_scale(kctl, db_scale_5bit_12db_max);
else

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <kees...@chromium.org>

commit b66c5984017533316fd1951770302649baf1aa33 upstream

If a series of scripts are executed, each triggering module loading via
unprintable bytes in the script header, kernel stack contents can leak
into the command line.

Normally execution of binfmt_script and binfmt_misc happens recursively.
However, when modules are enabled, and unprintable bytes exist in the
bprm->buf, execution will restart after attempting to load matching
binfmt modules. Unfortunately, the logic in binfmt_script and
binfmt_misc does not expect to get restarted. They leave bprm->interp
pointing to their local stack. This means on restart bprm->interp is
left pointing into unused stack memory which can then be copied into the
userspace argv areas.

After additional study, it seems that both recursion and restart remain
the desirable way to handle exec with scripts, misc, and modules. As
such, we need to protect the changes to interp.

This changes the logic to require allocation for any changes to the
bprm->interp. To avoid adding a new kmalloc to every exec, the default
value is left as-is. Only when passing through binfmt_script or
binfmt_misc does an allocation take place.
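The ownership rule described above can be sketched in user-space C.
struct demo_bprm and both helpers are hypothetical stand-ins, with
malloc()/free() in place of the kernel allocators. Unlike the patch,
which frees the old string before duplicating the new one, this sketch
copies first so that the old value survives an allocation failure; that
is a choice made for the illustration, not what the patch does.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, cut-down stand-in for struct linux_binprm. */
struct demo_bprm {
	const char *filename;	/* owned by the caller */
	char *interp;		/* equals filename until a handler changes it */
};

/* Point interp at a private heap copy instead of the caller's buffer. */
static int demo_change_interp(struct demo_bprm *bprm, const char *interp)
{
	size_t n = strlen(interp) + 1;
	char *copy = malloc(n);

	if (!copy)
		return -ENOMEM;
	memcpy(copy, interp, n);

	/* only a previously allocated copy may be freed, never filename */
	if (bprm->interp != bprm->filename)
		free(bprm->interp);
	bprm->interp = copy;
	return 0;
}

/* Counterpart of the free_bprm() hunk: free interp only if it was changed. */
static void demo_free_interp(struct demo_bprm *bprm)
{
	if (bprm->interp != bprm->filename)
		free(bprm->interp);
	bprm->interp = NULL;
}
```

Since interp now refers to its own allocation, it stays valid after the
handler's stack buffer goes out of scope, and the default case (interp
equal to filename) costs no allocation at all.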

For a proof of concept, see DoTest.sh from:

http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/

Signed-off-by: Kees Cook <kees...@chromium.org>
Cc: halfdog <m...@halfdog.net>
Cc: P J P <ppa...@redhat.com>
Cc: Alexander Viro <vi...@zeniv.linux.org.uk>
Cc: <sta...@vger.kernel.org>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/binfmt_misc.c | 5 ++++-
fs/binfmt_script.c | 4 +++-
fs/exec.c | 15 +++++++++++++++
include/linux/binfmts.h | 1 +
4 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index 42b60b0..fb93997 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -176,7 +176,10 @@ static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs)
goto _error;
bprm->argc ++;

- bprm->interp = iname; /* for binfmt_script */
+ /* Update interp in case binfmt_script needs it. */
+ retval = bprm_change_interp(iname, bprm);
+ if (retval < 0)
+ goto _error;

interp_file = open_exec (iname);
retval = PTR_ERR (interp_file);
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 0834350..356568c 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -82,7 +82,9 @@ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
retval = copy_strings_kernel(1, &i_name, bprm);
if (retval) return retval;
bprm->argc++;
- bprm->interp = interp;
+ retval = bprm_change_interp(interp, bprm);
+ if (retval < 0)
+ return retval;

/*
* OK, now restart the process with the interpreter's dentry.
diff --git a/fs/exec.c b/fs/exec.c
index 86fafc6..f9f1b11 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1108,9 +1108,24 @@ void free_bprm(struct linux_binprm *bprm)
mutex_unlock(&current->cred_guard_mutex);
abort_creds(bprm->cred);
}
+ /* If a binfmt changed the interp, free it. */
+ if (bprm->interp != bprm->filename)
+ kfree(bprm->interp);
kfree(bprm);
}

+int bprm_change_interp(char *interp, struct linux_binprm *bprm)
+{
+ /* If a binfmt changed the interp, free it first. */
+ if (bprm->interp != bprm->filename)
+ kfree(bprm->interp);
+ bprm->interp = kstrdup(interp, GFP_KERNEL);
+ if (!bprm->interp)
+ return -ENOMEM;
+ return 0;
+}
+EXPORT_SYMBOL(bprm_change_interp);
+
/*
* install the new credentials for this executable
*/
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index a3d802e..d06c3a4 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -122,6 +122,7 @@ extern int setup_arg_pages(struct linux_binprm * bprm,
unsigned long stack_top,
int executable_stack);
extern int bprm_mm_init(struct linux_binprm *bprm);
+extern int bprm_change_interp(char *interp, struct linux_binprm *bprm);
extern int copy_strings_kernel(int argc,char ** argv,struct linux_binprm *bprm);
extern int prepare_bprm_creds(struct linux_binprm *bprm);
extern void install_exec_creds(struct linux_binprm *bprm);

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

instead of kfree

From: Daniel Borkmann <dbor...@redhat.com>

[ Upstream commit 6ba542a291a5e558603ac51cda9bded347ce7627 ]

In sctp_setsockopt_auth_key, we create a temporary copy of the user
passed shared auth key for the endpoint or association and after
internal setup, we free it right away. Since it's sensitive data, we
should zero out the key before returning the memory back to the
allocator. Thus, use kzfree instead of kfree, just as we do in
sctp_auth_key_put().
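
The wipe-then-free idea can be sketched in userspace C. The helpers below
are hypothetical (demo_secure_zero() and demo_zfree() are not kernel APIs),
and the caller passes the buffer size explicitly, which is an assumption of
this sketch because plain malloc() does not expose it portably.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Overwrite len bytes through a volatile pointer so the compiler does not
 * drop the stores as dead writes just before the memory is released.
 */
static void demo_secure_zero(void *p, size_t len)
{
    volatile unsigned char *vp = p;

    while (len--)
        *vp++ = 0;
}

/* Hypothetical analogue of kzfree(): wipe the buffer, then free it. */
static void demo_zfree(void *p, size_t len)
{
    if (!p)
        return;
    demo_secure_zero(p, len);
    free(p);
}
```

A temporary key copy would then be released with demo_zfree(key, key_len)
instead of free(key), so the key bytes do not linger in freed memory.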

Signed-off-by: Daniel Borkmann <dbor...@redhat.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/sctp/socket.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 1f9843e..26ffae2 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -3271,7 +3271,7 @@ static int sctp_setsockopt_auth_key(struct sock *sk,

ret = sctp_auth_set_key(sctp_sk(sk)->ep, asoc, authkey);
out:
- kfree(authkey);
+ kzfree(authkey);
return ret;

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

mode

From: Anatol Pomozov <anatol....@gmail.com>

Instead of checking whether the handle is valid, we check whether the
journal is enabled. This avoids taking the s_orphan_lock mutex in all cases
when there is no journal in use, including the error paths where
ext4_orphan_del() is called with a handle set to NULL.

Signed-off-by: Anatol Pomozov <anatol....@gmail.com>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/namei.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 828c9c9..230bef5 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -2001,7 +2001,7 @@ int ext4_orphan_add(handle_t *handle, struct inode *inode)
struct ext4_iloc iloc;
int err = 0, rc;

- if (!ext4_handle_valid(handle))
+ if (!EXT4_SB(sb)->s_journal)
return 0;

mutex_lock(&EXT4_SB(sb)->s_orphan_lock);
@@ -2082,8 +2082,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
struct ext4_iloc iloc;
int err = 0;

- /* ext4_handle_valid() assumes a valid handle_t pointer */
- if (handle && !ext4_handle_valid(handle))
+ if (!EXT4_SB(inode->i_sb)->s_journal)
return 0;

mutex_lock(&EXT4_SB(inode->i_sb)->s_orphan_lock);
@@ -2102,7 +2101,7 @@ int ext4_orphan_del(handle_t *handle, struct inode *inode)
* transaction handle with which to update the orphan list on
* disk, but we still need to remove the inode from the linked
* list in memory. */
- if (sbi->s_journal && !handle)
+ if (!handle)
goto out;

err = ext4_reserve_inode_write(handle, inode, &iloc);

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

not EXPERT

From: Romain Francoise <rom...@orebokech.com>

Before v2.6.38 CONFIG_EXPERT was known as CONFIG_EMBEDDED but the
Kconfig entry was not changed to match when upstream commit
628c6246d47b85f5357298601df2444d7f4dd3fd ("x86, random: Architectural
inlines to get random integers with RDRAND") was backported.

Signed-off-by: Romain Francoise <rom...@orebokech.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/Kconfig | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index aa889d6..ee0168d 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1430,7 +1430,7 @@ config ARCH_USES_PG_UNCACHED

config ARCH_RANDOM
def_bool y
- prompt "x86 architectural random number generator" if EXPERT
+ prompt "x86 architectural random number generator" if EMBEDDED
---help---
Enable the x86 architectural RDRAND instruction
(Intel Bull Mountain technology) to generate random numbers.

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

[ Upstream commit 1f86840f897717f86d523a13e99a447e6a5d2fa5 ]

The memory used for the template copy is a local stack variable. As
struct xfrm_user_tmpl contains multiple holes added by the compiler for
alignment, not initializing the memory will lead to leaking stack bytes
to userland. Add an explicit memset(0) to avoid the info leak.
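
A small userspace sketch of the same pattern follows. The struct and
function names (demo_tmpl, demo_fill) are hypothetical and unrelated to the
real xfrm structures; the layout is only chosen so that common ABIs insert
padding between and after the members.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record; on common ABIs there is padding after tag and mode. */
struct demo_tmpl {
    unsigned char tag;
    unsigned int  id;
    unsigned char mode;
};

/*
 * Prepare *out for copying to an untrusted consumer: clear every byte
 * first, padding included, then set the individual fields.
 */
static void demo_fill(struct demo_tmpl *out, unsigned int id)
{
    memset(out, 0, sizeof(*out));
    out->tag = 1;
    out->id = id;
    out->mode = 2;
}
```

Without the memset, the bytes between the members would keep whatever was
previously in that memory, and copying sizeof(struct demo_tmpl) bytes out
would expose them.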

Initial version of the patch by Brad Spengler.

Signed-off-by: Mathias Krause <min...@googlemail.com>
Cc: Brad Spengler <spe...@grsecurity.net>
Acked-by: Steffen Klassert <steffen....@secunet.com>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/xfrm/xfrm_user.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 3de81fe..a8d83c4 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1178,6 +1178,7 @@ static int copy_to_user_tmpl(struct xfrm_policy *xp, struct sk_buff *skb)
struct xfrm_user_tmpl *up = &vec[i];
struct xfrm_tmpl *kp = &xp->xfrm_vec[i];

+ memset(up, 0, sizeof(*up));
memcpy(&up->id, &kp->id, sizeof(up->id));
up->family = kp->encap_family;
memcpy(&up->saddr, &kp->saddr, sizeof(up->saddr));

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

readers

From: Steven Rostedt <sros...@redhat.com>

commit 9366c1ba13fbc41bdb57702e75ca4382f209c82f upstream.

The function rb_check_pages() was added to make sure the ring buffer's
pages were sane. This check is done when the ring buffer size is modified
as well as when the iterator is released (closing the "trace" file),
as that was considered a non fast path and a good place to do a sanity
check.

The problem is that the check does not have any locks around it.
If one process were to read the trace file, and another were to read
the raw binary file, the check could happen while the reader is reading
the file.

The issue is that the check needs to clear the HEAD page before doing
the full check, and it restores the page afterward. But readers require
the HEAD page to exist before they can read the buffer; otherwise the
reader gives a nasty warning and disables the buffer.

Adding the reader lock around the check keeps the race from
happening.

Signed-off-by: Steven Rostedt <ros...@goodmis.org>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/trace/ring_buffer.c | 2 ++

1 file changed, 2 insertions(+)

diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
index e749a05..6024960 100644
--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -2876,6 +2876,8 @@ rb_get_reader_page(struct ring_buffer_per_cpu *cpu_buffer)
* Splice the empty reader page into the list around the head.
*/
reader = rb_set_head_page(cpu_buffer);
+ if (!reader)
+ goto out;
cpu_buffer->reader_page->list.next = reader->list.next;
cpu_buffer->reader_page->list.prev = reader->list.prev;

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

error path

From: Eugene Shatokhin <eugene.s...@rosalab.ru>

commit 24ec19b0ae83a385ad9c55520716da671274b96c upstream.

In ext4_xattr_set_acl(), if ext4_journal_start() returns an error,
posix_acl_release() will not be called for 'acl' which may result in a
memory leak.

This patch fixes that.
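
The shape of the fix, a single exit label that always releases the object,
can be sketched in userspace C. All names below (demo_acl_alloc(),
demo_journal_start(), demo_set_acl() and so on) are hypothetical stand-ins,
and the counter exists only so the example can show that nothing leaks.

```c
#include <errno.h>
#include <stdlib.h>

static int demo_live_objects; /* allocations not yet released */

static void *demo_acl_alloc(void)
{
    void *p = malloc(16);

    if (p)
        demo_live_objects++;
    return p;
}

static void demo_acl_release(void *acl)
{
    if (acl) {
        free(acl);
        demo_live_objects--;
    }
}

/* Hypothetical stand-in for starting a journal transaction. */
static int demo_journal_start(int fail)
{
    return fail ? -EIO : 0;
}

static int demo_set_acl(int journal_fails)
{
    void *acl = demo_acl_alloc();
    int error;

    if (!acl)
        return -ENOMEM;

    error = demo_journal_start(journal_fails);
    if (error)
        goto release_and_out; /* an early return here would leak acl */

    /* ... apply the ACL under the transaction ... */

release_and_out:
    demo_acl_release(acl);
    return error;
}
```

Both the success path and the failure path pass through release_and_out,
so demo_live_objects returns to zero in either case.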

Reviewed-by: Lukas Czerner <lcze...@redhat.com>
Signed-off-by: Eugene Shatokhin <eugene.s...@rosalab.ru>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/acl.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/acl.c b/fs/ext4/acl.c
index 0df88b2..d29a06b 100644
--- a/fs/ext4/acl.c
+++ b/fs/ext4/acl.c
@@ -454,8 +454,10 @@ ext4_xattr_set_acl(struct inode *inode, int type, const void *value,

retry:
handle = ext4_journal_start(inode, EXT4_DATA_TRANS_BLOCKS(inode->i_sb));
- if (IS_ERR(handle))
- return PTR_ERR(handle);
+ if (IS_ERR(handle)) {
+ error = PTR_ERR(handle);
+ goto release_and_out;
+ }
error = ext4_set_acl(handle, inode, type, acl);
ext4_journal_stop(handle);
if (error == -ENOSPC && ext4_should_retry_alloc(inode->i_sb, &retries))

Willy Tarreau

Jun 4, 2013, 6:50:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.d...@gmail.com>

commit f6d8bd051c391c1c0458a30b2a7abcd939329259 upstream.

We lack proper synchronization to manipulate inet->opt ip_options.

The problem is that ip_make_skb() calls ip_setup_cork(), and
ip_setup_cork() possibly makes a copy of ipc->opt (struct ip_options)
without any protection against another thread manipulating inet->opt.

Another thread can change inet->opt pointer and free old one under us.

Use RCU to protect inet->opt (changed to inet->inet_opt).

Instead of handling atomic refcounts, just copy ip_options when
necessary, to avoid cache line dirtying.

We can't insert an rcu_head in struct ip_options since it's included in
skb->cb[], so this patch is large because I had to introduce a new
ip_options_rcu structure.
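
The wrapper idea can be sketched in single-threaded userspace C. Everything
here is hypothetical: demo_head stands in for struct rcu_head,
demo_call_deferred() stands in for call_rcu() but runs the callback
immediately instead of waiting for readers, and a plain pointer store stands
in for rcu_assign_pointer(). It only illustrates how the callback recovers
the enclosing object from the embedded head.

```c
#include <stddef.h>
#include <stdlib.h>

/* Recover a pointer to the enclosing struct from a pointer to a member. */
#define demo_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Stand-in for struct rcu_head. */
struct demo_head {
    void (*func)(struct demo_head *);
};

/* The payload that other code embeds by value, so it cannot grow a head. */
struct demo_options {
    unsigned char optlen;
    unsigned char data[40];
};

/* Wrapper that carries the head next to the payload. */
struct demo_options_rcu {
    struct demo_head head;
    struct demo_options opt;
};

static int demo_freed; /* number of wrappers released so far */

static void demo_opt_free(struct demo_head *head)
{
    free(demo_container_of(head, struct demo_options_rcu, head));
    demo_freed++;
}

/*
 * Stand-in for call_rcu(). A real implementation defers the callback until
 * all pre-existing readers have finished; this sketch has no concurrent
 * readers, so it invokes the callback right away.
 */
static void demo_call_deferred(struct demo_head *head,
                               void (*func)(struct demo_head *))
{
    head->func = func;
    head->func(head);
}

/* Publish a new options block and retire the old one. */
static void demo_replace(struct demo_options_rcu **slot,
                         struct demo_options_rcu *new_opt)
{
    struct demo_options_rcu *old = *slot;

    *slot = new_opt;
    if (old)
        demo_call_deferred(&old->head, demo_opt_free);
}
```

Readers in this model would access the payload as slot->opt, which mirrors
the inet_opt->opt.field accesses that appear throughout the diff below.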

Signed-off-by: Eric Dumazet <eric.d...@gmail.com>
Cc: Herbert Xu <her...@gondor.apana.org.au>

Signed-off-by: David S. Miller <da...@davemloft.net>

[dannf/bwh: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/net/inet_sock.h | 14 +++--
include/net/ip.h | 11 ++--
net/dccp/ipv4.c | 15 +++---
net/dccp/ipv6.c | 2 +-
net/ipv4/af_inet.c | 16 ++++--
net/ipv4/cipso_ipv4.c | 113 ++++++++++++++++++++++------------------
net/ipv4/icmp.c | 23 ++++----
net/ipv4/inet_connection_sock.c | 8 +--
net/ipv4/ip_options.c | 38 +++++++-------
net/ipv4/ip_output.c | 50 +++++++++---------
net/ipv4/ip_sockglue.c | 33 ++++++++----
net/ipv4/raw.c | 19 +++++--
net/ipv4/syncookies.c | 4 +-
net/ipv4/tcp_ipv4.c | 33 +++++++-----
net/ipv4/udp.c | 21 ++++++--
net/ipv6/tcp_ipv6.c | 2 +-
16 files changed, 235 insertions(+), 167 deletions(-)

diff --git a/include/net/inet_sock.h b/include/net/inet_sock.h
index 47004f3..cf65e77 100644
--- a/include/net/inet_sock.h
+++ b/include/net/inet_sock.h
@@ -56,7 +56,15 @@ struct ip_options {
unsigned char __data[0];
};

-#define optlength(opt) (sizeof(struct ip_options) + opt->optlen)
+struct ip_options_rcu {
+ struct rcu_head rcu;
+ struct ip_options opt;
+};
+
+struct ip_options_data {
+ struct ip_options_rcu opt;
+ char data[40];
+};

struct inet_request_sock {
struct request_sock req;
@@ -77,7 +85,7 @@ struct inet_request_sock {
acked : 1,
no_srccheck: 1;
kmemcheck_bitfield_end(flags);
- struct ip_options *opt;
+ struct ip_options_rcu *opt;
};

static inline struct inet_request_sock *inet_rsk(const struct request_sock *sk)
@@ -122,7 +130,7 @@ struct inet_sock {
__be32 saddr;
__s16 uc_ttl;
__u16 cmsg_flags;
- struct ip_options *opt;
+ struct ip_options_rcu *inet_opt;
__be16 sport;
__u16 id;
__u8 tos;
diff --git a/include/net/ip.h b/include/net/ip.h
index 69db943..a7d4675 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -54,7 +54,7 @@ struct ipcm_cookie
{
__be32 addr;
int oif;
- struct ip_options *opt;
+ struct ip_options_rcu *opt;
union skb_shared_tx shtx;
};

@@ -92,7 +92,7 @@ extern int igmp_mc_proc_init(void);

extern int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
__be32 saddr, __be32 daddr,
- struct ip_options *opt);
+ struct ip_options_rcu *opt);
extern int ip_rcv(struct sk_buff *skb, struct net_device *dev,
struct packet_type *pt, struct net_device *orig_dev);
extern int ip_local_deliver(struct sk_buff *skb);
@@ -362,14 +362,15 @@ extern int ip_forward(struct sk_buff *skb);
* Functions provided by ip_options.c
*/

-extern void ip_options_build(struct sk_buff *skb, struct ip_options *opt, __be32 daddr, struct rtable *rt, int is_frag);
+extern void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
+ __be32 daddr, struct rtable *rt, int is_frag);
extern int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb);
extern void ip_options_fragment(struct sk_buff *skb);
extern int ip_options_compile(struct net *net,
struct ip_options *opt, struct sk_buff *skb);
-extern int ip_options_get(struct net *net, struct ip_options **optp,
+extern int ip_options_get(struct net *net, struct ip_options_rcu **optp,
unsigned char *data, int optlen);
-extern int ip_options_get_from_user(struct net *net, struct ip_options **optp,
+extern int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp,
unsigned char __user *data, int optlen);
extern void ip_options_undo(struct ip_options * opt);
extern void ip_forward_options(struct sk_buff *skb);
diff --git a/net/dccp/ipv4.c b/net/dccp/ipv4.c
index d14c0a3..cef3656 100644
--- a/net/dccp/ipv4.c
+++ b/net/dccp/ipv4.c
@@ -47,6 +47,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
__be32 daddr, nexthop;
int tmp;
int err;
+ struct ip_options_rcu *inet_opt;

dp->dccps_role = DCCP_ROLE_CLIENT;

@@ -57,10 +58,12 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
return -EAFNOSUPPORT;

nexthop = daddr = usin->sin_addr.s_addr;
- if (inet->opt != NULL && inet->opt->srr) {
+
+ inet_opt = inet->inet_opt;
+ if (inet_opt != NULL && inet_opt->opt.srr) {
if (daddr == 0)
return -EINVAL;
- nexthop = inet->opt->faddr;
+ nexthop = inet_opt->opt.faddr;
}

tmp = ip_route_connect(&rt, nexthop, inet->saddr,
@@ -75,7 +78,7 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
return -ENETUNREACH;
}

- if (inet->opt == NULL || !inet->opt->srr)
+ if (inet_opt == NULL || !inet_opt->opt.srr)
daddr = rt->rt_dst;

if (inet->saddr == 0)
@@ -86,8 +89,8 @@ int dccp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
inet->daddr = daddr;

inet_csk(sk)->icsk_ext_hdr_len = 0;
- if (inet->opt != NULL)
- inet_csk(sk)->icsk_ext_hdr_len = inet->opt->optlen;
+ if (inet_opt)
+ inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
/*
* Socket identity is still unknown (sport may be zero).
* However we set state to DCCP_REQUESTING and not releasing socket
@@ -397,7 +400,7 @@ struct sock *dccp_v4_request_recv_sock(struct sock *sk, struct sk_buff *skb,
newinet->daddr = ireq->rmt_addr;
newinet->rcv_saddr = ireq->loc_addr;
newinet->saddr = ireq->loc_addr;
- newinet->opt = ireq->opt;
+ newinet->inet_opt = ireq->opt;
ireq->opt = NULL;
newinet->mc_index = inet_iif(skb);
newinet->mc_ttl = ip_hdr(skb)->ttl;
diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 9ed1962..2f11de7 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -600,7 +600,7 @@ static struct sock *dccp_v6_request_recv_sock(struct sock *sk,

First: no IPv4 options.
*/
- newinet->opt = NULL;
+ newinet->inet_opt = NULL;

/* Clone RX bits */
newnp->rxopt.all = np->rxopt.all;
diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c
index a289878..d1992a4 100644
--- a/net/ipv4/af_inet.c
+++ b/net/ipv4/af_inet.c
@@ -152,7 +152,7 @@ void inet_sock_destruct(struct sock *sk)
WARN_ON(sk->sk_wmem_queued);
WARN_ON(sk->sk_forward_alloc);

- kfree(inet->opt);
+ kfree(inet->inet_opt);
dst_release(sk->sk_dst_cache);
sk_refcnt_debug_dec(sk);
}
@@ -1065,9 +1065,11 @@ static int inet_sk_reselect_saddr(struct sock *sk)
__be32 old_saddr = inet->saddr;
__be32 new_saddr;
__be32 daddr = inet->daddr;
+ struct ip_options_rcu *inet_opt;

- if (inet->opt && inet->opt->srr)
- daddr = inet->opt->faddr;
+ inet_opt = inet->inet_opt;
+ if (inet_opt && inet_opt->opt.srr)
+ daddr = inet_opt->opt.faddr;

/* Query new route. */
err = ip_route_connect(&rt, daddr, 0,
@@ -1109,6 +1111,7 @@ int inet_sk_rebuild_header(struct sock *sk)
struct inet_sock *inet = inet_sk(sk);
struct rtable *rt = (struct rtable *)__sk_dst_check(sk, 0);
__be32 daddr;
+ struct ip_options_rcu *inet_opt;
int err;

/* Route is OK, nothing to do. */
@@ -1116,9 +1119,12 @@ int inet_sk_rebuild_header(struct sock *sk)
return 0;

/* Reroute. */
+ rcu_read_lock();
+ inet_opt = rcu_dereference(inet->inet_opt);
daddr = inet->daddr;
- if (inet->opt && inet->opt->srr)
- daddr = inet->opt->faddr;
+ if (inet_opt && inet_opt->opt.srr)
+ daddr = inet_opt->opt.faddr;
+ rcu_read_unlock();
{
struct flowi fl = {
.oif = sk->sk_bound_dev_if,
diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 10f8f8d..b6d06d6 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1860,6 +1860,11 @@ static int cipso_v4_genopt(unsigned char *buf, u32 buf_len,
return CIPSO_V4_HDR_LEN + ret_val;
}

+static void opt_kfree_rcu(struct rcu_head *head)
+{
+ kfree(container_of(head, struct ip_options_rcu, rcu));
+}
+
/**
* cipso_v4_sock_setattr - Add a CIPSO option to a socket
* @sk: the socket
@@ -1882,7 +1887,7 @@ int cipso_v4_sock_setattr(struct sock *sk,
unsigned char *buf = NULL;
u32 buf_len;
u32 opt_len;
- struct ip_options *opt = NULL;
+ struct ip_options_rcu *old, *opt = NULL;
struct inet_sock *sk_inet;
struct inet_connection_sock *sk_conn;

@@ -1918,22 +1923,25 @@ int cipso_v4_sock_setattr(struct sock *sk,
ret_val = -ENOMEM;
goto socket_setattr_failure;
}
- memcpy(opt->__data, buf, buf_len);
- opt->optlen = opt_len;
- opt->cipso = sizeof(struct iphdr);
+ memcpy(opt->opt.__data, buf, buf_len);
+ opt->opt.optlen = opt_len;
+ opt->opt.cipso = sizeof(struct iphdr);
kfree(buf);
buf = NULL;

sk_inet = inet_sk(sk);
+
+ old = sk_inet->inet_opt;
if (sk_inet->is_icsk) {
sk_conn = inet_csk(sk);
- if (sk_inet->opt)
- sk_conn->icsk_ext_hdr_len -= sk_inet->opt->optlen;
- sk_conn->icsk_ext_hdr_len += opt->optlen;
+ if (old)
+ sk_conn->icsk_ext_hdr_len -= old->opt.optlen;
+ sk_conn->icsk_ext_hdr_len += opt->opt.optlen;
sk_conn->icsk_sync_mss(sk, sk_conn->icsk_pmtu_cookie);
}
- opt = xchg(&sk_inet->opt, opt);
- kfree(opt);
+ rcu_assign_pointer(sk_inet->inet_opt, opt);
+ if (old)
+ call_rcu(&old->rcu, opt_kfree_rcu);

return 0;

@@ -1963,7 +1971,7 @@ int cipso_v4_req_setattr(struct request_sock *req,
unsigned char *buf = NULL;
u32 buf_len;
u32 opt_len;
- struct ip_options *opt = NULL;
+ struct ip_options_rcu *opt = NULL;
struct inet_request_sock *req_inet;

/* We allocate the maximum CIPSO option size here so we are probably
@@ -1991,15 +1999,16 @@ int cipso_v4_req_setattr(struct request_sock *req,
ret_val = -ENOMEM;
goto req_setattr_failure;
}
- memcpy(opt->__data, buf, buf_len);
- opt->optlen = opt_len;
- opt->cipso = sizeof(struct iphdr);
+ memcpy(opt->opt.__data, buf, buf_len);
+ opt->opt.optlen = opt_len;
+ opt->opt.cipso = sizeof(struct iphdr);
kfree(buf);
buf = NULL;

req_inet = inet_rsk(req);
opt = xchg(&req_inet->opt, opt);
- kfree(opt);
+ if (opt)
+ call_rcu(&opt->rcu, opt_kfree_rcu);

return 0;

@@ -2019,34 +2028,34 @@ req_setattr_failure:
* values on failure.
*
*/
-int cipso_v4_delopt(struct ip_options **opt_ptr)
+int cipso_v4_delopt(struct ip_options_rcu **opt_ptr)
{
int hdr_delta = 0;
- struct ip_options *opt = *opt_ptr;
+ struct ip_options_rcu *opt = *opt_ptr;

- if (opt->srr || opt->rr || opt->ts || opt->router_alert) {
+ if (opt->opt.srr || opt->opt.rr || opt->opt.ts || opt->opt.router_alert) {
u8 cipso_len;
u8 cipso_off;
unsigned char *cipso_ptr;
int iter;
int optlen_new;

- cipso_off = opt->cipso - sizeof(struct iphdr);
- cipso_ptr = &opt->__data[cipso_off];
+ cipso_off = opt->opt.cipso - sizeof(struct iphdr);
+ cipso_ptr = &opt->opt.__data[cipso_off];
cipso_len = cipso_ptr[1];

- if (opt->srr > opt->cipso)
- opt->srr -= cipso_len;
- if (opt->rr > opt->cipso)
- opt->rr -= cipso_len;
- if (opt->ts > opt->cipso)
- opt->ts -= cipso_len;
- if (opt->router_alert > opt->cipso)
- opt->router_alert -= cipso_len;
- opt->cipso = 0;
+ if (opt->opt.srr > opt->opt.cipso)
+ opt->opt.srr -= cipso_len;
+ if (opt->opt.rr > opt->opt.cipso)
+ opt->opt.rr -= cipso_len;
+ if (opt->opt.ts > opt->opt.cipso)
+ opt->opt.ts -= cipso_len;
+ if (opt->opt.router_alert > opt->opt.cipso)
+ opt->opt.router_alert -= cipso_len;
+ opt->opt.cipso = 0;

memmove(cipso_ptr, cipso_ptr + cipso_len,
- opt->optlen - cipso_off - cipso_len);
+ opt->opt.optlen - cipso_off - cipso_len);

/* determining the new total option length is tricky because of
* the padding necessary, the only thing i can think to do at
@@ -2055,21 +2064,21 @@ int cipso_v4_delopt(struct ip_options **opt_ptr)
* from there we can determine the new total option length */
iter = 0;
optlen_new = 0;
- while (iter < opt->optlen)
- if (opt->__data[iter] != IPOPT_NOP) {
- iter += opt->__data[iter + 1];
+ while (iter < opt->opt.optlen)
+ if (opt->opt.__data[iter] != IPOPT_NOP) {
+ iter += opt->opt.__data[iter + 1];
optlen_new = iter;
} else
iter++;
- hdr_delta = opt->optlen;
- opt->optlen = (optlen_new + 3) & ~3;
- hdr_delta -= opt->optlen;
+ hdr_delta = opt->opt.optlen;
+ opt->opt.optlen = (optlen_new + 3) & ~3;
+ hdr_delta -= opt->opt.optlen;
} else {
/* only the cipso option was present on the socket so we can
* remove the entire option struct */
*opt_ptr = NULL;
- hdr_delta = opt->optlen;
- kfree(opt);
+ hdr_delta = opt->opt.optlen;
+ call_rcu(&opt->rcu, opt_kfree_rcu);
}

return hdr_delta;
@@ -2086,15 +2095,15 @@ int cipso_v4_delopt(struct ip_options **opt_ptr)
void cipso_v4_sock_delattr(struct sock *sk)
{
int hdr_delta;
- struct ip_options *opt;
+ struct ip_options_rcu *opt;
struct inet_sock *sk_inet;

sk_inet = inet_sk(sk);
- opt = sk_inet->opt;
- if (opt == NULL || opt->cipso == 0)
+ opt = sk_inet->inet_opt;
+ if (opt == NULL || opt->opt.cipso == 0)
return;

- hdr_delta = cipso_v4_delopt(&sk_inet->opt);
+ hdr_delta = cipso_v4_delopt(&sk_inet->inet_opt);
if (sk_inet->is_icsk && hdr_delta > 0) {
struct inet_connection_sock *sk_conn = inet_csk(sk);
sk_conn->icsk_ext_hdr_len -= hdr_delta;
@@ -2112,12 +2121,12 @@ void cipso_v4_sock_delattr(struct sock *sk)
*/
void cipso_v4_req_delattr(struct request_sock *req)
{
- struct ip_options *opt;
+ struct ip_options_rcu *opt;
struct inet_request_sock *req_inet;

req_inet = inet_rsk(req);
opt = req_inet->opt;
- if (opt == NULL || opt->cipso == 0)
+ if (opt == NULL || opt->opt.cipso == 0)
return;

cipso_v4_delopt(&req_inet->opt);
@@ -2187,14 +2196,18 @@ getattr_return:
*/
int cipso_v4_sock_getattr(struct sock *sk, struct netlbl_lsm_secattr *secattr)
{
- struct ip_options *opt;
+ struct ip_options_rcu *opt;
+ int res = -ENOMSG;

- opt = inet_sk(sk)->opt;
- if (opt == NULL || opt->cipso == 0)
- return -ENOMSG;
-
- return cipso_v4_getattr(opt->__data + opt->cipso - sizeof(struct iphdr),
- secattr);
+ rcu_read_lock();
+ opt = rcu_dereference(inet_sk(sk)->inet_opt);
+ if (opt && opt->opt.cipso)
+ res = cipso_v4_getattr(opt->opt.__data +
+ opt->opt.cipso -
+ sizeof(struct iphdr),
+ secattr);
+ rcu_read_unlock();
+ return res;
}

/**
diff --git a/net/ipv4/icmp.c b/net/ipv4/icmp.c
index 5bc13fe..859d781 100644
--- a/net/ipv4/icmp.c
+++ b/net/ipv4/icmp.c
@@ -107,8 +107,7 @@ struct icmp_bxm {
__be32 times[3];
} data;
int head_len;
- struct ip_options replyopts;
- unsigned char optbuf[40];
+ struct ip_options_data replyopts;
};

/* An array of errno for error messages from dest unreach. */
@@ -362,7 +361,7 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
struct inet_sock *inet;
__be32 daddr;

- if (ip_options_echo(&icmp_param->replyopts, skb))
+ if (ip_options_echo(&icmp_param->replyopts.opt.opt, skb))
return;

sk = icmp_xmit_lock(net);
@@ -376,10 +375,10 @@ static void icmp_reply(struct icmp_bxm *icmp_param, struct sk_buff *skb)
daddr = ipc.addr = rt->rt_src;
ipc.opt = NULL;
ipc.shtx.flags = 0;
- if (icmp_param->replyopts.optlen) {
- ipc.opt = &icmp_param->replyopts;
- if (ipc.opt->srr)
- daddr = icmp_param->replyopts.faddr;
+ if (icmp_param->replyopts.opt.opt.optlen) {
+ ipc.opt = &icmp_param->replyopts.opt;
+ if (ipc.opt->opt.srr)
+ daddr = icmp_param->replyopts.opt.opt.faddr;
}
{
struct flowi fl = { .nl_u = { .ip4_u =
@@ -516,7 +515,7 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
IPTOS_PREC_INTERNETCONTROL) :
iph->tos;

- if (ip_options_echo(&icmp_param.replyopts, skb_in))
+ if (ip_options_echo(&icmp_param.replyopts.opt.opt, skb_in))
goto out_unlock;

@@ -532,15 +531,15 @@ void icmp_send(struct sk_buff *skb_in, int type, int code, __be32 info)
icmp_param.offset = skb_network_offset(skb_in);
inet_sk(sk)->tos = tos;
ipc.addr = iph->saddr;
- ipc.opt = &icmp_param.replyopts;
+ ipc.opt = &icmp_param.replyopts.opt;
ipc.shtx.flags = 0;

{
struct flowi fl = {
.nl_u = {
.ip4_u = {
- .daddr = icmp_param.replyopts.srr ?
- icmp_param.replyopts.faddr :
+ .daddr = icmp_param.replyopts.opt.opt.srr ?
+ icmp_param.replyopts.opt.opt.faddr :
iph->saddr,
.saddr = saddr,
.tos = RT_TOS(tos)
@@ -629,7 +628,7 @@ route_done:
room = dst_mtu(&rt->u.dst);
if (room > 576)
room = 576;
- room -= sizeof(struct iphdr) + icmp_param.replyopts.optlen;
+ room -= sizeof(struct iphdr) + icmp_param.replyopts.opt.opt.optlen;
room -= sizeof(struct icmphdr);

icmp_param.data_len = skb_in->len - icmp_param.offset;
diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 537731b..a3bf986 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -356,11 +356,11 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
{
struct rtable *rt;
const struct inet_request_sock *ireq = inet_rsk(req);
- struct ip_options *opt = inet_rsk(req)->opt;
+ struct ip_options_rcu *opt = inet_rsk(req)->opt;
struct flowi fl = { .oif = sk->sk_bound_dev_if,
.nl_u = { .ip4_u =
- { .daddr = ((opt && opt->srr) ?
- opt->faddr :
+ { .daddr = ((opt && opt->opt.srr) ?
+ opt->opt.faddr :
ireq->rmt_addr),
.saddr = ireq->loc_addr,
.tos = RT_CONN_FLAGS(sk) } },
@@ -374,7 +374,7 @@ struct dst_entry *inet_csk_route_req(struct sock *sk,
security_req_classify_flow(req, &fl);
if (ip_route_output_flow(net, &rt, &fl, sk, 0))
goto no_route;
- if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
+ if (opt && opt->opt.is_strictroute && rt->rt_dst != rt->rt_gateway)
goto route_err;
return &rt->u.dst;

diff --git a/net/ipv4/ip_options.c b/net/ipv4/ip_options.c
index 94bf105..8a95972 100644
--- a/net/ipv4/ip_options.c
+++ b/net/ipv4/ip_options.c
@@ -35,7 +35,7 @@
* saddr is address of outgoing interface.
*/

-void ip_options_build(struct sk_buff * skb, struct ip_options * opt,
+void ip_options_build(struct sk_buff *skb, struct ip_options *opt,
__be32 daddr, struct rtable *rt, int is_frag)
{
unsigned char *iph = skb_network_header(skb);
@@ -82,9 +82,9 @@ void ip_options_build(struct sk_buff * skb, struct ip_options * opt,
* NOTE: dopt cannot point to skb.
*/

-int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
+int ip_options_echo(struct ip_options *dopt, struct sk_buff *skb)
{
- struct ip_options *sopt;
+ const struct ip_options *sopt;
unsigned char *sptr, *dptr;
int soffset, doffset;
int optlen;
@@ -94,10 +94,8 @@ int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)

sopt = &(IPCB(skb)->opt);

- if (sopt->optlen == 0) {
- dopt->optlen = 0;
+ if (sopt->optlen == 0)
return 0;
- }

sptr = skb_network_header(skb);
dptr = dopt->__data;
@@ -156,7 +154,7 @@ int ip_options_echo(struct ip_options * dopt, struct sk_buff * skb)
dopt->optlen += optlen;
}
if (sopt->srr) {
- unsigned char * start = sptr+sopt->srr;
+ unsigned char *start = sptr+sopt->srr;
__be32 faddr;

optlen = start[1];
@@ -499,19 +497,19 @@ void ip_options_undo(struct ip_options * opt)
}
}

-static struct ip_options *ip_options_get_alloc(const int optlen)
+static struct ip_options_rcu *ip_options_get_alloc(const int optlen)
{
- return kzalloc(sizeof(struct ip_options) + ((optlen + 3) & ~3),
+ return kzalloc(sizeof(struct ip_options_rcu) + ((optlen + 3) & ~3),
GFP_KERNEL);
}

-static int ip_options_get_finish(struct net *net, struct ip_options **optp,
- struct ip_options *opt, int optlen)
+static int ip_options_get_finish(struct net *net, struct ip_options_rcu **optp,
+ struct ip_options_rcu *opt, int optlen)
{
while (optlen & 3)
- opt->__data[optlen++] = IPOPT_END;
- opt->optlen = optlen;
- if (optlen && ip_options_compile(net, opt, NULL)) {
+ opt->opt.__data[optlen++] = IPOPT_END;
+ opt->opt.optlen = optlen;
+ if (optlen && ip_options_compile(net, &opt->opt, NULL)) {
kfree(opt);
return -EINVAL;
}
@@ -520,29 +518,29 @@ static int ip_options_get_finish(struct net *net, struct ip_options **optp,
return 0;
}

-int ip_options_get_from_user(struct net *net, struct ip_options **optp,
+int ip_options_get_from_user(struct net *net, struct ip_options_rcu **optp,
unsigned char __user *data, int optlen)
{
- struct ip_options *opt = ip_options_get_alloc(optlen);
+ struct ip_options_rcu *opt = ip_options_get_alloc(optlen);

if (!opt)
return -ENOMEM;
- if (optlen && copy_from_user(opt->__data, data, optlen)) {
+ if (optlen && copy_from_user(opt->opt.__data, data, optlen)) {
kfree(opt);
return -EFAULT;
}
return ip_options_get_finish(net, optp, opt, optlen);
}

-int ip_options_get(struct net *net, struct ip_options **optp,
+int ip_options_get(struct net *net, struct ip_options_rcu **optp,
unsigned char *data, int optlen)
{
- struct ip_options *opt = ip_options_get_alloc(optlen);
+ struct ip_options_rcu *opt = ip_options_get_alloc(optlen);

if (!opt)
return -ENOMEM;
if (optlen)
- memcpy(opt->__data, data, optlen);
+ memcpy(opt->opt.__data, data, optlen);
return ip_options_get_finish(net, optp, opt, optlen);
}

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 44b7910..7dde039 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -137,14 +137,14 @@ static inline int ip_select_ttl(struct inet_sock *inet, struct dst_entry *dst)
*
*/
int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
- __be32 saddr, __be32 daddr, struct ip_options *opt)
+ __be32 saddr, __be32 daddr, struct ip_options_rcu *opt)
{
struct inet_sock *inet = inet_sk(sk);
struct rtable *rt = skb_rtable(skb);
struct iphdr *iph;

/* Build the IP header. */
- skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
+ skb_push(skb, sizeof(struct iphdr) + (opt ? opt->opt.optlen : 0));
skb_reset_network_header(skb);
iph = ip_hdr(skb);
iph->version = 4;
@@ -160,9 +160,9 @@ int ip_build_and_send_pkt(struct sk_buff *skb, struct sock *sk,
iph->protocol = sk->sk_protocol;
ip_select_ident(iph, &rt->u.dst, sk);

- if (opt && opt->optlen) {
- iph->ihl += opt->optlen>>2;
- ip_options_build(skb, opt, daddr, rt, 0);
+ if (opt && opt->opt.optlen) {
+ iph->ihl += opt->opt.optlen>>2;
+ ip_options_build(skb, &opt->opt, daddr, rt, 0);
}

skb->priority = sk->sk_priority;
@@ -312,9 +312,10 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
{
struct sock *sk = skb->sk;
struct inet_sock *inet = inet_sk(sk);
- struct ip_options *opt = inet->opt;
+ struct ip_options_rcu *inet_opt = NULL;
struct rtable *rt;
struct iphdr *iph;
+ int res;

/* Skip all of this if the packet is already routed,
* f.e. by something like SCTP.
@@ -325,13 +326,15 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)

/* Make sure we can route this packet. */
rt = (struct rtable *)__sk_dst_check(sk, 0);
+ rcu_read_lock();
+ inet_opt = rcu_dereference(inet->inet_opt);
if (rt == NULL) {
__be32 daddr;

/* Use correct destination address if we have options. */
daddr = inet->daddr;
- if(opt && opt->srr)
- daddr = opt->faddr;
+ if (inet_opt && inet_opt->opt.srr)
+ daddr = inet_opt->opt.faddr;

{
struct flowi fl = { .oif = sk->sk_bound_dev_if,
@@ -359,11 +362,11 @@ int ip_queue_xmit(struct sk_buff *skb, int ipfragok)
skb_dst_set(skb, dst_clone(&rt->u.dst));

packet_routed:
- if (opt && opt->is_strictroute && rt->rt_dst != rt->rt_gateway)
+ if (inet_opt && inet_opt->opt.is_strictroute && rt->rt_dst != rt->rt_gateway)
goto no_route;

/* OK, we know where to send it, allocate and build IP header. */
- skb_push(skb, sizeof(struct iphdr) + (opt ? opt->optlen : 0));
+ skb_push(skb, sizeof(struct iphdr) + (inet_opt ? inet_opt->opt.optlen : 0));
skb_reset_network_header(skb);
iph = ip_hdr(skb);
*((__be16 *)iph) = htons((4 << 12) | (5 << 8) | (inet->tos & 0xff));
@@ -377,9 +380,9 @@ packet_routed:
iph->daddr = rt->rt_dst;
/* Transport layer set skb->h.foo itself. */

- if (opt && opt->optlen) {
- iph->ihl += opt->optlen >> 2;
- ip_options_build(skb, opt, inet->daddr, rt, 0);
+ if (inet_opt && inet_opt->opt.optlen) {
+ iph->ihl += inet_opt->opt.optlen >> 2;
+ ip_options_build(skb, &inet_opt->opt, inet->daddr, rt, 0);
}

ip_select_ident_more(iph, &rt->u.dst, sk,
@@ -387,10 +390,12 @@ packet_routed:

skb->priority = sk->sk_priority;
skb->mark = sk->sk_mark;
-
- return ip_local_out(skb);
+ res = ip_local_out(skb);
+ rcu_read_unlock();
+ return res;

no_route:
+ rcu_read_unlock();
IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
kfree_skb(skb);
return -EHOSTUNREACH;
@@ -809,7 +814,7 @@ int ip_append_data(struct sock *sk,
/*
* setup for corking.
*/
- opt = ipc->opt;
+ opt = ipc->opt ? &ipc->opt->opt : NULL;
if (opt) {
if (inet->cork.opt == NULL) {
inet->cork.opt = kmalloc(sizeof(struct ip_options) + 40, sk->sk_allocation);
@@ -1367,26 +1372,23 @@ void ip_send_reply(struct sock *sk, struct sk_buff *skb, struct ip_reply_arg *ar
unsigned int len)
{
struct inet_sock *inet = inet_sk(sk);
- struct {
- struct ip_options opt;
- char data[40];
- } replyopts;
+ struct ip_options_data replyopts;
struct ipcm_cookie ipc;
__be32 daddr;
struct rtable *rt = skb_rtable(skb);

- if (ip_options_echo(&replyopts.opt, skb))
+ if (ip_options_echo(&replyopts.opt.opt, skb))
return;

daddr = ipc.addr = rt->rt_src;
ipc.opt = NULL;
ipc.shtx.flags = 0;

- if (replyopts.opt.optlen) {
+ if (replyopts.opt.opt.optlen) {
ipc.opt = &replyopts.opt;

- if (ipc.opt->srr)
- daddr = replyopts.opt.faddr;
+ if (replyopts.opt.opt.srr)
+ daddr = replyopts.opt.opt.faddr;
}

{
diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index 184a7ad..099e6c3 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -434,6 +434,11 @@ out:
}

+static void opt_kfree_rcu(struct rcu_head *head)
+{
+ kfree(container_of(head, struct ip_options_rcu, rcu));
+}
+
/*
* Socket option code for IP. This is the end of the line after any
* TCP,UDP etc options on an IP socket.
@@ -479,13 +484,15 @@ static int do_ip_setsockopt(struct sock *sk, int level,
switch (optname) {
case IP_OPTIONS:
{
- struct ip_options *opt = NULL;
+ struct ip_options_rcu *old, *opt = NULL;
+
if (optlen > 40 || optlen < 0)
goto e_inval;
err = ip_options_get_from_user(sock_net(sk), &opt,
optval, optlen);
if (err)
break;
+ old = inet->inet_opt;
if (inet->is_icsk) {
struct inet_connection_sock *icsk = inet_csk(sk);
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
@@ -494,17 +501,18 @@ static int do_ip_setsockopt(struct sock *sk, int level,
(TCPF_LISTEN | TCPF_CLOSE)) &&
inet->daddr != LOOPBACK4_IPV6)) {
#endif
- if (inet->opt)
- icsk->icsk_ext_hdr_len -= inet->opt->optlen;
+ if (old)
+ icsk->icsk_ext_hdr_len -= old->opt.optlen;
if (opt)
- icsk->icsk_ext_hdr_len += opt->optlen;
+ icsk->icsk_ext_hdr_len += opt->opt.optlen;
icsk->icsk_sync_mss(sk, icsk->icsk_pmtu_cookie);
#if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)
}
#endif
}
- opt = xchg(&inet->opt, opt);
- kfree(opt);
+ rcu_assign_pointer(inet->inet_opt, opt);
+ if (old)
+ call_rcu(&old->rcu, opt_kfree_rcu);
break;
}
case IP_PKTINFO:
@@ -1032,12 +1040,15 @@ static int do_ip_getsockopt(struct sock *sk, int level, int optname,
case IP_OPTIONS:
{
unsigned char optbuf[sizeof(struct ip_options)+40];
- struct ip_options * opt = (struct ip_options *)optbuf;
+ struct ip_options *opt = (struct ip_options *)optbuf;
+ struct ip_options_rcu *inet_opt;
+
+ inet_opt = inet->inet_opt;
opt->optlen = 0;
- if (inet->opt)
- memcpy(optbuf, inet->opt,
- sizeof(struct ip_options)+
- inet->opt->optlen);
+ if (inet_opt)
+ memcpy(optbuf, &inet_opt->opt,
+ sizeof(struct ip_options) +
+ inet_opt->opt.optlen);
release_sock(sk);

if (opt->optlen == 0)
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index ab996f9..07ab583 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -459,6 +459,7 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
__be32 saddr;
u8 tos;
int err;
+ struct ip_options_data opt_copy;

err = -EMSGSIZE;
if (len > 0xFFFF)
@@ -519,8 +520,18 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
saddr = ipc.addr;
ipc.addr = daddr;

- if (!ipc.opt)
- ipc.opt = inet->opt;
+ if (!ipc.opt) {
+ struct ip_options_rcu *inet_opt;
+
+ rcu_read_lock();
+ inet_opt = rcu_dereference(inet->inet_opt);
+ if (inet_opt) {
+ memcpy(&opt_copy, inet_opt,
+ sizeof(*inet_opt) + inet_opt->opt.optlen);
+ ipc.opt = &opt_copy.opt;
+ }
+ rcu_read_unlock();
+ }

if (ipc.opt) {
err = -EINVAL;
@@ -529,10 +540,10 @@ static int raw_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
*/
if (inet->hdrincl)
goto done;
- if (ipc.opt->srr) {
+ if (ipc.opt->opt.srr) {
if (!daddr)
goto done;
- daddr = ipc.opt->faddr;
+ daddr = ipc.opt->opt.faddr;
}
}
tos = RT_CONN_FLAGS(sk);
diff --git a/net/ipv4/syncookies.c b/net/ipv4/syncookies.c
index a6e0e07..0a94b64 100644
--- a/net/ipv4/syncookies.c
+++ b/net/ipv4/syncookies.c
@@ -309,10 +309,10 @@ struct sock *cookie_v4_check(struct sock *sk, struct sk_buff *skb,
* the ACK carries the same options again (see RFC1122 4.2.3.8)
*/
if (opt && opt->optlen) {
- int opt_size = sizeof(struct ip_options) + opt->optlen;
+ int opt_size = sizeof(struct ip_options_rcu) + opt->optlen;

ireq->opt = kmalloc(opt_size, GFP_ATOMIC);
- if (ireq->opt != NULL && ip_options_echo(ireq->opt, skb)) {
+ if (ireq->opt != NULL && ip_options_echo(&ireq->opt->opt, skb)) {
kfree(ireq->opt);
ireq->opt = NULL;
}
diff --git a/net/ipv4/tcp_ipv4.c b/net/ipv4/tcp_ipv4.c
index 6a4e832..d746d3b3 100644
--- a/net/ipv4/tcp_ipv4.c
+++ b/net/ipv4/tcp_ipv4.c
@@ -152,6 +152,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
__be32 daddr, nexthop;
int tmp;
int err;
+ struct ip_options_rcu *inet_opt;

if (addr_len < sizeof(struct sockaddr_in))
return -EINVAL;
@@ -160,10 +161,11 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
return -EAFNOSUPPORT;

nexthop = daddr = usin->sin_addr.s_addr;
- if (inet->opt && inet->opt->srr) {
+ inet_opt = inet->inet_opt;
+ if (inet_opt && inet_opt->opt.srr) {
if (!daddr)
return -EINVAL;
- nexthop = inet->opt->faddr;
+ nexthop = inet_opt->opt.faddr;
}

tmp = ip_route_connect(&rt, nexthop, inet->saddr,
@@ -181,7 +183,7 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
return -ENETUNREACH;
}

- if (!inet->opt || !inet->opt->srr)
+ if (!inet_opt || !inet_opt->opt.srr)
daddr = rt->rt_dst;

if (!inet->saddr)
@@ -215,8 +217,8 @@ int tcp_v4_connect(struct sock *sk, struct sockaddr *uaddr, int addr_len)
inet->daddr = daddr;

inet_csk(sk)->icsk_ext_hdr_len = 0;
- if (inet->opt)
- inet_csk(sk)->icsk_ext_hdr_len = inet->opt->optlen;
+ if (inet_opt)
+ inet_csk(sk)->icsk_ext_hdr_len = inet_opt->opt.optlen;

tp->rx_opt.mss_clamp = 536;

@@ -802,17 +804,18 @@ static void syn_flood_warning(struct sk_buff *skb)
/*
* Save and compile IPv4 options into the request_sock if needed.
*/
-static struct ip_options *tcp_v4_save_options(struct sock *sk,
- struct sk_buff *skb)
+static struct ip_options_rcu *tcp_v4_save_options(struct sock *sk,
+ struct sk_buff *skb)
{
- struct ip_options *opt = &(IPCB(skb)->opt);
- struct ip_options *dopt = NULL;
+ const struct ip_options *opt = &(IPCB(skb)->opt);
+ struct ip_options_rcu *dopt = NULL;

if (opt && opt->optlen) {
- int opt_size = optlength(opt);
+ int opt_size = sizeof(*dopt) + opt->optlen;
+
dopt = kmalloc(opt_size, GFP_ATOMIC);
if (dopt) {
- if (ip_options_echo(dopt, skb)) {
+ if (ip_options_echo(&dopt->opt, skb)) {
kfree(dopt);
dopt = NULL;
}
@@ -1362,6 +1365,7 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
#ifdef CONFIG_TCP_MD5SIG
struct tcp_md5sig_key *key;
#endif
+ struct ip_options_rcu *inet_opt;

if (sk_acceptq_is_full(sk))
goto exit_overflow;
@@ -1382,13 +1386,14 @@ struct sock *tcp_v4_syn_recv_sock(struct sock *sk, struct sk_buff *skb,
newinet->daddr = ireq->rmt_addr;
newinet->rcv_saddr = ireq->loc_addr;
newinet->saddr = ireq->loc_addr;
- newinet->opt = ireq->opt;
+ inet_opt = ireq->opt;
+ rcu_assign_pointer(newinet->inet_opt, inet_opt);
ireq->opt = NULL;
newinet->mc_index = inet_iif(skb);
newinet->mc_ttl = ip_hdr(skb)->ttl;
inet_csk(newsk)->icsk_ext_hdr_len = 0;
- if (newinet->opt)
- inet_csk(newsk)->icsk_ext_hdr_len = newinet->opt->optlen;
+ if (inet_opt)
+ inet_csk(newsk)->icsk_ext_hdr_len = inet_opt->opt.optlen;
newinet->id = newtp->write_seq ^ jiffies;

tcp_mtup_init(newsk);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 8e28770..af559e0 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -592,6 +592,7 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
int err, is_udplite = IS_UDPLITE(sk);
int corkreq = up->corkflag || msg->msg_flags&MSG_MORE;
int (*getfrag)(void *, char *, int, int, int, struct sk_buff *);
+ struct ip_options_data opt_copy;

if (len > 0xFFFF)
return -EMSGSIZE;
@@ -663,22 +664,32 @@ int udp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
free = 1;
connected = 0;
}
- if (!ipc.opt)
- ipc.opt = inet->opt;
+ if (!ipc.opt) {
+ struct ip_options_rcu *inet_opt;
+
+ rcu_read_lock();
+ inet_opt = rcu_dereference(inet->inet_opt);
+ if (inet_opt) {
+ memcpy(&opt_copy, inet_opt,
+ sizeof(*inet_opt) + inet_opt->opt.optlen);
+ ipc.opt = &opt_copy.opt;
+ }
+ rcu_read_unlock();
+ }

saddr = ipc.addr;
ipc.addr = faddr = daddr;

- if (ipc.opt && ipc.opt->srr) {
+ if (ipc.opt && ipc.opt->opt.srr) {
if (!daddr)
return -EINVAL;
- faddr = ipc.opt->faddr;
+ faddr = ipc.opt->opt.faddr;
connected = 0;
}
tos = RT_TOS(inet->tos);
if (sock_flag(sk, SOCK_LOCALROUTE) ||
(msg->msg_flags & MSG_DONTROUTE) ||
- (ipc.opt && ipc.opt->is_strictroute)) {
+ (ipc.opt && ipc.opt->opt.is_strictroute)) {
tos |= RTO_ONLINK;
connected = 0;
}
diff --git a/net/ipv6/tcp_ipv6.c b/net/ipv6/tcp_ipv6.c
index faae6df..1b25191 100644
--- a/net/ipv6/tcp_ipv6.c
+++ b/net/ipv6/tcp_ipv6.c
@@ -1391,7 +1391,7 @@ static struct sock * tcp_v6_syn_recv_sock(struct sock *sk, struct sk_buff *skb,

First: no IPv4 options.
*/
- newinet->opt = NULL;
+ newinet->inet_opt = NULL;
newnp->ipv6_fl_list = NULL;

/* Clone RX bits */

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

commit 43da5f2e0d0c69ded3d51907d9552310a6b545e8 upstream.

The implementation of dev_ifconf() for the compat ioctl interface uses
an intermediate ifc structure allocated in userland for the duration of
the syscall. However, it fails to initialize the padding bytes inserted
for alignment and therefore leaks four bytes of kernel stack. Add an
explicit memset(0) before filling the structure to avoid the info leak.
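
For illustration only, here is a minimal userspace sketch of this class of leak and its fix. `struct demo_ifconf`, `fill_demo()` and `demo_padding()` are hypothetical names, not the kernel's `struct ifconf` or the compat ioctl code, and the padding comment assumes a typical LP64 ABI:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a structure with alignment padding: on a
 * typical LP64 ABI the compiler inserts 4 padding bytes after `len`. */
struct demo_ifconf {
	int   len;
	void *buf;
};

/* Zero the whole object first, then fill the members, so the padding
 * bytes do not carry stale stack contents when the struct is copied out. */
void fill_demo(struct demo_ifconf *out, int len, void *buf)
{
	memset(out, 0, sizeof(*out));
	out->len = len;
	out->buf = buf;
}

/* Number of padding bytes between `len` and `buf` on the current ABI. */
size_t demo_padding(void)
{
	return offsetof(struct demo_ifconf, buf) - sizeof(int);
}
```

Without the `memset()`, assigning the two members one by one would leave the padding bytes holding whatever was previously on the stack.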

Signed-off-by: Mathias Krause <min...@googlemail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

[bwh: Backported to 2.6.32: adjust filename, context]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/compat_ioctl.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
index 0dd21a4..98d3c58 100644
--- a/fs/compat_ioctl.c
+++ b/fs/compat_ioctl.c
@@ -352,6 +352,7 @@ static int dev_ifconf(unsigned int fd, unsigned int cmd, unsigned long arg)
if (copy_from_user(&ifc32, compat_ptr(arg), sizeof(struct ifconf32)))
return -EFAULT;

+ memset(&ifc, 0, sizeof(ifc));
if (ifc32.ifcbuf == 0) {
ifc32.ifc_len = 0;
ifc.ifc_len = 0;

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

and write()

From: Ian Abbott <abb...@mev.co.uk>

commit cc400e185c07c15a42d2635995f422de5b94b696 upstream.

Some low-level comedi drivers (incorrectly) point `dev->read_subdev` or
`dev->write_subdev` to a subdevice that does not support asynchronous
commands. Comedi's poll(), read() and write() file operation handlers
assume these subdevices do support asynchronous commands. In
particular, they assume `s->async` is valid (where `s` points to the
read or write subdevice), which it won't be if it has been set
incorrectly. This can lead to a NULL pointer dereference.

Check `s->async` is non-NULL in `comedi_poll()`, `comedi_read()` and
`comedi_write()` to avoid the bug.
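
The shape of the check can be sketched in plain C as follows. This is a simplified illustration, not the comedi code: `struct demo_subdev`, `struct demo_async` and `demo_read_pending()` are hypothetical names.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical, heavily reduced versions of the comedi structures. */
struct demo_async  { int pending; };
struct demo_subdev { struct demo_async *async; };

/* Return the number of pending samples, or -EIO when the subdevice is
 * missing or does not support asynchronous commands (async == NULL). */
int demo_read_pending(const struct demo_subdev *s)
{
	if (s == NULL || s->async == NULL)
		return -EIO;
	return s->async->pending;
}
```

Checking both pointers in one condition keeps the error path identical for "no subdevice" and "subdevice without async support", which is what the patch does in `comedi_read()` and `comedi_write()`.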

Signed-off-by: Ian Abbott <abb...@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/staging/comedi/comedi_fops.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/staging/comedi/comedi_fops.c b/drivers/staging/comedi/comedi_fops.c
index 193b836..90810e8 100644
--- a/drivers/staging/comedi/comedi_fops.c
+++ b/drivers/staging/comedi/comedi_fops.c
@@ -1498,7 +1498,7 @@ static unsigned int comedi_poll(struct file *file, poll_table * wait)

mask = 0;
read_subdev = comedi_get_read_subdevice(dev_file_info);
- if (read_subdev) {
+ if (read_subdev && read_subdev->async) {
poll_wait(file, &read_subdev->async->wait_head, wait);
if (!read_subdev->busy
|| comedi_buf_read_n_available(read_subdev->async) > 0
@@ -1508,7 +1508,7 @@ static unsigned int comedi_poll(struct file *file, poll_table * wait)
}
}
write_subdev = comedi_get_write_subdevice(dev_file_info);
- if (write_subdev) {
+ if (write_subdev && write_subdev->async) {
poll_wait(file, &write_subdev->async->wait_head, wait);
comedi_buf_write_alloc(write_subdev->async,
write_subdev->async->prealloc_bufsz);
@@ -1550,7 +1550,7 @@ static ssize_t comedi_write(struct file *file, const char *buf, size_t nbytes,
}

s = comedi_get_write_subdevice(dev_file_info);
- if (s == NULL) {
+ if (s == NULL || s->async == NULL) {
retval = -EIO;
goto done;
}
@@ -1658,7 +1658,7 @@ static ssize_t comedi_read(struct file *file, char *buf, size_t nbytes,
}

s = comedi_get_read_subdevice(dev_file_info);
- if (s == NULL) {
+ if (s == NULL || s->async == NULL) {
retval = -EIO;
goto done;

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

data exposure

From: Jamie Iles <jamie...@oracle.com>

CVE-2012-4508 kernel: ext4: AIO vs fallocate stale data exposure
[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/extents.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
index f4b471d..3f022ea 100644
--- a/fs/ext4/extents.c
+++ b/fs/ext4/extents.c
@@ -62,6 +62,7 @@ ext4_fsblk_t ext_pblock(struct ext4_extent *ex)
* idx_pblock:
* combine low and high parts of a leaf physical block number into ext4_fsblk_t
*/
+#define EXT4_EXT_DATA_VALID 0x8 /* extent contains valid data */
ext4_fsblk_t idx_pblock(struct ext4_extent_idx *ix)
{
ext4_fsblk_t block;
@@ -2933,6 +2934,30 @@ static int ext4_split_unwritten_extents(handle_t *handle,
ext4_ext_mark_uninitialized(ex3);
err = ext4_ext_insert_extent(handle, inode, path, ex3, flags);
if (err == -ENOSPC && may_zeroout) {
+ /*
+ * This is different from the upstream, because we
+ * need only a flag to say that the extent contains
+ * the actual data.
+ *
+ * If the extent contains valid data, which can only
+ * happen if AIO races with fallocate, then we got
+ * here from ext4_convert_unwritten_extents_dio().
+ * So we have to be careful not to zeroout valid data
+ * in the extent.
+ *
+ * To avoid it, we only zeroout the ex3 and extend the
+ * extent which is going to become initialized to cover
+ * ex3 as well. and continue as we would if only
+ * split in two was required.
+ */
+ if (flags & EXT4_EXT_DATA_VALID) {
+ err = ext4_ext_zeroout(inode, ex3);
+ if (err)
+ goto fix_extent_len;
+ max_blocks = allocated;
+ ex2->ee_len = cpu_to_le16(max_blocks);
+ goto skip;
+ }
err = ext4_ext_zeroout(inode, &orig_ex);
if (err)
goto fix_extent_len;
@@ -2978,6 +3003,7 @@ static int ext4_split_unwritten_extents(handle_t *handle,

allocated = max_blocks;
}
+skip:
/*
* If there was a change of depth as part of the
* insertion of ex3 above, we need to update the length
@@ -3030,11 +3056,16 @@ fix_extent_len:
ext4_ext_dirty(handle, inode, path + depth);
return err;
}
+
static int ext4_convert_unwritten_extents_dio(handle_t *handle,
struct inode *inode,
+ ext4_lblk_t iblock,
+ unsigned int max_blocks,
struct ext4_ext_path *path)
{
struct ext4_extent *ex;
+ ext4_lblk_t ee_block;
+ unsigned int ee_len;
struct ext4_extent_header *eh;
int depth;
int err = 0;
@@ -3043,6 +3074,30 @@ static int ext4_convert_unwritten_extents_dio(handle_t *handle,
depth = ext_depth(inode);
eh = path[depth].p_hdr;
ex = path[depth].p_ext;
+ ee_block = le32_to_cpu(ex->ee_block);
+ ee_len = ext4_ext_get_actual_len(ex);
+
+ ext_debug("ext4_convert_unwritten_extents_endio: inode %lu, logical"
+ "block %llu, max_blocks %u\n", inode->i_ino,
+ (unsigned long long)ee_block, ee_len);
+
+ /* If extent is larger than requested then split is required */
+
+ if (ee_block != iblock || ee_len > max_blocks) {
+ err = ext4_split_unwritten_extents(handle, inode, path,
+ iblock, max_blocks,
+ EXT4_EXT_DATA_VALID);
+ if (err < 0)
+ goto out;
+ ext4_ext_drop_refs(path);
+ path = ext4_ext_find_extent(inode, iblock, path);
+ if (IS_ERR(path)) {
+ err = PTR_ERR(path);
+ goto out;
+ }
+ depth = ext_depth(inode);
+ ex = path[depth].p_ext;
+ }

err = ext4_ext_get_access(handle, inode, path + depth);
if (err)
@@ -3129,7 +3184,8 @@ ext4_ext_handle_uninitialized_extents(handle_t *handle, struct inode *inode,
/* async DIO end_io complete, convert the filled extent to written */
if (flags == EXT4_GET_BLOCKS_DIO_CONVERT_EXT) {
ret = ext4_convert_unwritten_extents_dio(handle, inode,
- path);
+ iblock, max_blocks,
+ path);
if (ret >= 0)
ext4_update_inode_fsync_trans(handle, inode, 1);
goto out2;
@@ -3498,6 +3554,12 @@ void ext4_ext_truncate(struct inode *inode)
int err = 0;

/*
+ * finish any pending end_io work so we won't run the risk of
+ * converting any truncated blocks to initialized later
+ */
+ flush_aio_dio_completed_IO(inode);
+
+ /*
* probably first extent we're gonna free will be last in block
*/
err = ext4_writepage_trans_blocks(inode);
@@ -3630,6 +3692,9 @@ long ext4_fallocate(struct inode *inode, int mode, loff_t offset, loff_t len)
mutex_unlock(&inode->i_mutex);
return ret;
}
+
+ /* Prevent race condition between unwritten */
+ flush_aio_dio_completed_IO(inode);
retry:
while (ret >= 0 && ret < max_blocks) {
block = block + ret;

Willy Tarreau

Jun 4, 2013, 6:50:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Patrick McHardy <ka...@trash.net>

commit bea1e22df494a729978e7f2c54f7bda328f74bc3 upstream.

Fix a crash in ipoib_mcast_join_task(). (with help from Or Gerlitz)

Commit c8c2afe360b7 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().

In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries. This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.

Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called. To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().

Signed-off-by: Patrick McHardy <ka...@trash.net>
Signed-off-by: Roland Dreier <rol...@purestorage.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
drivers/infiniband/ulp/ipoib/ipoib_multicast.c | 19 ++++++++++---------
2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index b4b2257..f6a23ec 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -157,7 +157,7 @@ static int ipoib_stop(struct net_device *dev)

netif_stop_queue(dev);

- ipoib_ib_dev_down(dev, 0);
+ ipoib_ib_dev_down(dev, 1);
ipoib_ib_dev_stop(dev, 0);

if (!test_bit(IPOIB_FLAG_SUBINTERFACE, &priv->flags)) {
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
index 8763c1e..bd656a7 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_multicast.c
@@ -188,7 +188,9 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,

mcast->mcmember = *mcmember;

- /* Set the cached Q_Key before we attach if it's the broadcast group */
+ /* Set the multicast MTU and cached Q_Key before we attach if it's
+ * the broadcast group.
+ */
if (!memcmp(mcast->mcmember.mgid.raw, priv->dev->broadcast + 4,
sizeof (union ib_gid))) {
spin_lock_irq(&priv->lock);
@@ -196,10 +198,17 @@ static int ipoib_mcast_join_finish(struct ipoib_mcast *mcast,
spin_unlock_irq(&priv->lock);
return -EAGAIN;
}
+ priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
priv->qkey = be32_to_cpu(priv->broadcast->mcmember.qkey);
spin_unlock_irq(&priv->lock);
priv->tx_wr.wr.ud.remote_qkey = priv->qkey;
set_qkey = 1;
+
+ if (!ipoib_cm_admin_enabled(dev)) {
+ rtnl_lock();
+ dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
+ rtnl_unlock();
+ }
}

if (!test_bit(IPOIB_MCAST_FLAG_SENDONLY, &mcast->flags)) {
@@ -588,14 +597,6 @@ void ipoib_mcast_join_task(struct work_struct *work)
return;
}

- priv->mcast_mtu = IPOIB_UD_MTU(ib_mtu_enum_to_int(priv->broadcast->mcmember.mtu));
-
- if (!ipoib_cm_admin_enabled(dev)) {
- rtnl_lock();
- dev_set_mtu(dev, min(priv->mcast_mtu, priv->admin_mtu));
- rtnl_unlock();
- }
-
ipoib_dbg_mcast(priv, "successfully joined all multicast groups\n");

clear_bit(IPOIB_MCAST_RUN, &priv->flags);

Willy Tarreau

Jun 4, 2013, 6:50:05 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <kees...@chromium.org>

commit d740269867021faf4ce38a449353d2b986c34a67 upstream.

To avoid an explosion of request_module calls on a chain of abusive
scripts, fail maximum recursion with -ELOOP instead of -ENOEXEC. As soon
as maximum recursion depth is hit, the error will fail all the way back
up the chain, aborting immediately.

This also has the side-effect of stopping the user's shell from attempting
to reexecute the top-level file as a shell script. As seen in the
dash source:

	if (cmd != path_bshell && errno == ENOEXEC) {
		*argv-- = cmd;
		*argv = cmd = path_bshell;
		goto repeat;
	}

The above logic was designed for running scripts automatically that lacked
the "#!" header, not to re-try failed recursion. On a legitimate -ENOEXEC,
things continue to behave as the shell expects.

Additionally, when tracking recursion, the binfmt handlers should not be
involved. The recursion being tracked is the depth of calls through
search_binary_handler(), so that function should be exclusively responsible
for tracking the depth.
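
As a rough userspace sketch of that ownership rule (not the kernel code): `struct demo_bprm`, `demo_load_script()` and `demo_search_handler()` are hypothetical names, and `DEMO_MAX_DEPTH` simply mirrors the `depth > 5` test in the patch.

```c
#include <errno.h>

#define DEMO_MAX_DEPTH 5	/* mirrors the `depth > 5` check in the patch */

struct demo_bprm {
	int recursion_depth;	/* written only by demo_search_handler() */
	int rewrites_left;	/* interpreter hops this "file" still needs */
};

int demo_search_handler(struct demo_bprm *bprm);

/* Hypothetical format handler: it never touches recursion_depth. */
static int demo_load_script(struct demo_bprm *bprm)
{
	if (bprm->rewrites_left == 0)
		return 0;			/* reached a real binary */
	bprm->rewrites_left--;
	return demo_search_handler(bprm);	/* dispatch on the interpreter */
}

int demo_search_handler(struct demo_bprm *bprm)
{
	int depth = bprm->recursion_depth;
	int retval;

	/* Fail hard so the error propagates back up the whole chain. */
	if (depth > DEMO_MAX_DEPTH)
		return -ELOOP;

	bprm->recursion_depth = depth + 1;
	retval = demo_load_script(bprm);
	bprm->recursion_depth = depth;		/* restore for the caller */
	return retval;
}
```

Because only the dispatcher increments and restores the counter, a handler cannot forget to account for a level, and `-ELOOP` (unlike `-ENOEXEC`) is not treated by shells as "retry this as a script".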

Signed-off-by: Kees Cook <kees...@chromium.org>
Cc: halfdog <m...@halfdog.net>
Cc: P J P <ppa...@redhat.com>
Cc: Alexander Viro <vi...@zeniv.linux.org.uk>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/binfmt_em86.c | 1 -
fs/binfmt_misc.c | 6 ------
fs/binfmt_script.c | 4 +---
fs/exec.c | 10 +++++-----
include/linux/binfmts.h | 2 --
5 files changed, 6 insertions(+), 17 deletions(-)

diff --git a/fs/binfmt_em86.c b/fs/binfmt_em86.c
index 32fb00b..416dcae 100644
--- a/fs/binfmt_em86.c
+++ b/fs/binfmt_em86.c
@@ -43,7 +43,6 @@ static int load_em86(struct linux_binprm *bprm,struct pt_regs *regs)
return -ENOEXEC;
}

- bprm->recursion_depth++; /* Well, the bang-shell is implicit... */
allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index fb93997..258c5ca 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -116,10 +116,6 @@ static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs)
if (!enabled)
goto _ret;

- retval = -ENOEXEC;
- if (bprm->recursion_depth > BINPRM_MAX_RECURSION)
- goto _ret;
-
/* to keep locking time low, we copy the interpreter string */
read_lock(&entries_lock);
fmt = check_file(bprm);
@@ -200,8 +196,6 @@ static int load_misc_binary(struct linux_binprm *bprm, struct pt_regs *regs)
if (retval < 0)
goto _error;

- bprm->recursion_depth++;
-
retval = search_binary_handler (bprm, regs);
if (retval < 0)
goto _error;
diff --git a/fs/binfmt_script.c b/fs/binfmt_script.c
index 356568c..4fe6b8a 100644
--- a/fs/binfmt_script.c
+++ b/fs/binfmt_script.c
@@ -22,15 +22,13 @@ static int load_script(struct linux_binprm *bprm,struct pt_regs *regs)
char interp[BINPRM_BUF_SIZE];
int retval;

- if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!') ||
- (bprm->recursion_depth > BINPRM_MAX_RECURSION))
+ if ((bprm->buf[0] != '#') || (bprm->buf[1] != '!'))
return -ENOEXEC;
/*
* This section does the #! interpretation.
* Sorta complicated, but hopefully it will work. -TYT
*/

- bprm->recursion_depth++;
allow_write_access(bprm->file);
fput(bprm->file);
bprm->file = NULL;
diff --git a/fs/exec.c b/fs/exec.c
index f9f1b11..feb2435 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -1285,6 +1285,10 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
int try,retval;
struct linux_binfmt *fmt;

+ /* This allows 4 levels of binfmt rewrites before failing hard. */
+ if (depth > 5)
+ return -ELOOP;
+
retval = security_bprm_check(bprm);
if (retval)
return retval;
@@ -1306,12 +1310,8 @@ int search_binary_handler(struct linux_binprm *bprm,struct pt_regs *regs)
if (!try_module_get(fmt->module))
continue;
read_unlock(&binfmt_lock);
+ bprm->recursion_depth = depth + 1;
retval = fn(bprm, regs);
- /*
- * Restore the depth counter to its starting value
- * in this call, so we don't have to rely on every
- * load_binary function to restore it on return.
- */
bprm->recursion_depth = depth;
if (retval >= 0) {
if (depth == 0)
diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h
index d06c3a4..9ffffec 100644
--- a/include/linux/binfmts.h
+++ b/include/linux/binfmts.h
@@ -71,8 +71,6 @@ extern struct page *get_arg_page(struct linux_binprm *bprm, unsigned long pos,
#define BINPRM_FLAGS_EXECFD_BIT 1
#define BINPRM_FLAGS_EXECFD (1 << BINPRM_FLAGS_EXECFD_BIT)

-#define BINPRM_MAX_RECURSION 4
-
/*
* This structure defines the functions that are used to load the binary formats that
* linux accepts.

Willy Tarreau

Jun 4, 2013, 6:50:05 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

returning from kswapd()

From: Takamori Yamaguchi <takamori....@jp.sony.com>

commit b0a8cc58e6b9aaae3045752059e5e6260c0b94bc upstream.

In kswapd(), set current->reclaim_state to NULL before returning, as
current->reclaim_state holds a reference to a variable on kswapd()'s stack.

In rare cases, while returning from kswapd() during memory offlining,
__free_slab() and freepages() can access the dangling pointer of
current->reclaim_state.
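
The pattern can be sketched outside the kernel as follows. `struct demo_task`, `struct demo_reclaim_state` and `demo_worker()` are hypothetical names standing in for `current`, `struct reclaim_state` and `kswapd()`:

```c
#include <stddef.h>

struct demo_reclaim_state { unsigned long reclaimed; };

/* Hypothetical long-lived task object that outlives the worker's frame. */
struct demo_task { struct demo_reclaim_state *reclaim_state; };

int demo_worker(struct demo_task *task)
{
	struct demo_reclaim_state state = { 0 };

	task->reclaim_state = &state;	/* points into this stack frame */
	state.reclaimed += 1;		/* ... the actual work ... */

	/* Drop the reference before the frame goes away, so nothing that
	 * runs later can follow a dangling pointer. */
	task->reclaim_state = NULL;
	return 0;
}
```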

Signed-off-by: Takamori Yamaguchi <takamori....@jp.sony.com>
Signed-off-by: Aaditya Kumar <aadity...@ap.sony.com>
Acked-by: David Rientjes <rien...@google.com>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

mm/vmscan.c | 2 ++

1 file changed, 2 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4649929..738db2b 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2241,6 +2241,8 @@ static int kswapd(void *p)
balance_pgdat(pgdat, order);
}
}
+
+ current->reclaim_state = NULL;
return 0;

Willy Tarreau

Jun 4, 2013, 6:50:05 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: "J. Bruce Fields" <bfi...@redhat.com>

commit d5f50b0c290431c65377c4afa1c764e2c3fe5305 upstream.

If the argument and reply together exceed the maximum payload size, then
a reply with a read-like operation can overflow the rq_pages array.
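
A simplified sketch of the "check before consuming a page" idea, assuming a NULL-terminated page array. `demo_reserve_pages()` and `DEMO_PAGE_SIZE` are hypothetical and do not reflect the real `svc_rqst` layout:

```c
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096

/* `pages` is a NULL-terminated array of response pages and `*used` is the
 * index of the next free one. Returns how many of the requested `len`
 * bytes could be covered without walking past the last page. */
size_t demo_reserve_pages(void *const pages[], size_t *used, size_t len)
{
	size_t covered = 0;

	while (covered < len) {
		size_t chunk = len - covered;

		if (pages[*used] == NULL)	/* ran out of pages */
			break;
		if (chunk > DEMO_PAGE_SIZE)
			chunk = DEMO_PAGE_SIZE;
		(*used)++;			/* consume only after the check */
		covered += chunk;
	}
	return covered;
}
```

The key point, as in the patch, is that the index is advanced only after the slot has been confirmed to hold a page, and the caller sees a shortened length instead of an out-of-bounds access.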

Signed-off-by: J. Bruce Fields <bfi...@redhat.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/nfsd/nfs4xdr.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4xdr.c b/fs/nfsd/nfs4xdr.c
index 6d27757..ab87b05 100644
--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -2610,11 +2610,16 @@ nfsd4_encode_read(struct nfsd4_compoundres *resp, __be32 nfserr,
len = maxcount;
v = 0;
while (len > 0) {
- pn = resp->rqstp->rq_resused++;
+ pn = resp->rqstp->rq_resused;
+ if (!resp->rqstp->rq_respages[pn]) { /* ran out of pages */
+ maxcount -= len;
+ break;
+ }
resp->rqstp->rq_vec[v].iov_base =
page_address(resp->rqstp->rq_respages[pn]);
resp->rqstp->rq_vec[v].iov_len =
len < PAGE_SIZE ? len : PAGE_SIZE;
+ resp->rqstp->rq_resused++;
v++;
len -= PAGE_SIZE;
}
@@ -2662,6 +2667,8 @@ nfsd4_encode_readlink(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd
return nfserr;
if (resp->xbuf->page_len)
return nfserr_resource;
+ if (!resp->rqstp->rq_respages[resp->rqstp->rq_resused])
+ return nfserr_resource;

page = page_address(resp->rqstp->rq_respages[resp->rqstp->rq_resused++]);

@@ -2711,6 +2718,8 @@ nfsd4_encode_readdir(struct nfsd4_compoundres *resp, __be32 nfserr, struct nfsd4
return nfserr;
if (resp->xbuf->page_len)
return nfserr_resource;
+ if (!resp->rqstp->rq_respages[resp->rqstp->rq_resused])
+ return nfserr_resource;

RESERVE_SPACE(8); /* verifier */
savep = p;
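
The page-exhaustion check added to nfsd4_encode_read() above can be sketched in userspace as follows. This is only a simplified model, not the nfsd code: map_read_pages(), PAGE_SZ and the NULL-terminated page array are assumptions made for illustration. It shows the two points of the fix: test the slot before using it, and advance the "used" counter only after the test passes.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096  /* assumed page size for this sketch */

/*
 * Hypothetical stand-in for the reply page array: a NULL slot means no
 * more pages are available. Returns how many bytes of `maxcount` could
 * actually be mapped onto pages.
 */
static size_t map_read_pages(void *const *pages, size_t *used, size_t maxcount)
{
    long len = (long)maxcount;

    assert(pages != NULL && used != NULL);
    while (len > 0) {
        if (pages[*used] == NULL) {   /* ran out of pages */
            maxcount -= (size_t)len;  /* shrink the read to what fits */
            break;
        }
        (*used)++;                    /* consume the slot only after the check */
        len -= PAGE_SZ;
    }
    return maxcount;
}
```

With two pages available and a request for three pages' worth of data, the sketch maps two pages and reports the shortened count instead of walking past the end of the array.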

Willy Tarreau

Jun 4, 2013, 6:50:05 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Ian Abbott <abb...@mev.co.uk>

commit e1878957b4676a17cf398f7f5723b365e9a2ca48 upstream.

Correct a direct dereference of I/O memory to use an appropriate I/O
memory access function. Note that the pointer being dereferenced is not
currently tagged with `__iomem` but I plan to correct that for 3.7.

Signed-off-by: Ian Abbott <abb...@mev.co.uk>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/staging/comedi/drivers/jr3_pci.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/jr3_pci.c b/drivers/staging/comedi/drivers/jr3_pci.c
index 1d6385a..ae6f40c 100644
--- a/drivers/staging/comedi/drivers/jr3_pci.c
+++ b/drivers/staging/comedi/drivers/jr3_pci.c
@@ -917,7 +917,7 @@ static int jr3_pci_attach(struct comedi_device *dev,
}

/* Reset DSP card */
- devpriv->iobase->channel[0].reset = 0;
+ writel(0, &devpriv->iobase->channel[0].reset);

result = comedi_load_firmware(dev, "jr3pci.idm", jr3_download_firmware);
printk("Firmare load %d\n", result);

Willy Tarreau

Jun 4, 2013, 6:50:05 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

hfsplus_file_truncate()

From: Vyacheslav Dubeyko <sl...@dubeyko.com>

commit 12f267a20aecf8b84a2a9069b9011f1661c779b4 upstream.

Change a u32 to loff_t in hfsplus_file_truncate().

Signed-off-by: Vyacheslav Dubeyko <sl...@dubeyko.com>
Cc: Christoph Hellwig <h...@infradead.org>
Cc: Al Viro <vi...@zeniv.linux.org.uk>
Cc: Hin-Tak Leung <ht...@users.sourceforge.net>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/hfsplus/extents.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/hfsplus/extents.c b/fs/hfsplus/extents.c
index 0022eec..b3d234e 100644
--- a/fs/hfsplus/extents.c
+++ b/fs/hfsplus/extents.c
@@ -447,7 +447,7 @@ void hfsplus_file_truncate(struct inode *inode)
struct address_space *mapping = inode->i_mapping;
struct page *page;
void *fsdata;
- u32 size = inode->i_size;
+ loff_t size = inode->i_size;
int res;

res = pagecache_write_begin(NULL, mapping, size, 0,
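
Why the type change above matters can be shown with a small userspace sketch. The helper names are hypothetical; int64_t stands in for loff_t and uint32_t for u32. Storing a 64-bit file size in a 32-bit variable silently keeps only the low 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* What the old declaration effectively did: keep only the low 32 bits. */
static uint32_t size_as_u32(int64_t i_size)
{
    assert(i_size >= 0);
    return (uint32_t)i_size;
}

/* What the patched declaration does: keep the full 64-bit offset. */
static int64_t size_as_loff(int64_t i_size)
{
    assert(i_size >= 0);
    return i_size;
}
```

For a 5 GiB file, the 32-bit variant yields 1 GiB, so any offset derived from it would point into the wrong part of the file.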

Willy Tarreau

Jun 4, 2013, 6:50:05 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

mp_register_ioapic()

From: Suresh Siddha <suresh....@intel.com>

Lin Bao reported that one of the HP platforms failed to boot
2.6.32 kernel, when the BIOS enabled interrupt-remapping and
x2apic before handing over the control to the Linux kernel.

During boot, the Linux kernel masks all the interrupt sources
(8259, IO-APIC RTE's), sets up the interrupt-remapping hardware
with the OS-controlled table, and unmasks the 8259 interrupts
but not the IO-APIC RTE's (as the newly set up interrupt-remapping
table and the IO-APIC RTE's are not yet programmed by the kernel).

Shortly after this, IO-APIC RTE's and the interrupt-remapping table
entries are programmed based on the ACPI tables etc. So the
expectation is that any interrupt during this window will be dropped
and not see the intermediate configuration.

In the reported problematic case, the BIOS has configured the IO-APIC
in virtual wire-B mode. In the window between the kernel setting up
the new interrupt-remapping table and the IO-APIC RTE's being properly
configured, an interrupt gets routed by the IO-APIC RTE (set up
by the virtual wire-B configuration) and sees the empty
interrupt-remapping table entry, resulting in a VT-d fault that causes
the platform to generate an NMI. The OS then panics on this unexpected NMI.

This problem doesn't happen with more recent kernels, and a closer
look at the 2.6.32 kernel shows that the code which masks
the IO-APIC RTE's is not working as expected because nr_ioapic_registers
for each IO-APIC is not yet initialized at this point. In later
kernels nr_ioapic_registers is initialized much earlier and
everything works as expected.

For 2.6.[32..34] kernels, fix this issue by initializing
nr_ioapic_registers early in mp_register_ioapic()

[ Relevant upstream commit info:
commit 7716a5c4ff5f1f3dc5e9edcab125cbf7fceef0af
Author: Eric W. Biederman <ebie...@xmission.com>
Date: Tue Mar 30 01:07:12 2010 -0700

x86, ioapic: Move nr_ioapic_registers calculation to mp_register_ioapic.

As the upstream commit depends on quite a few prior commits
and some followup fixes in the mainline, we just picked
the smallest relevant hunk for fixing the issue at hand.
Problematic platform uses ACPI for IO-APIC, VT-d enumeration etc
and this hunk only touches the ACPI based platforms.

nr_ioapic_registers initialization in enable_IO_APIC() is still
retained, so that other configurations like legacy MPS table based
enumeration etc. work with no change.
]

Reported-and-tested-by: Zhang, Lin-Bao <linbao...@hp.com>
Signed-off-by: Suresh Siddha <suresh....@intel.com>
Cc: sta...@vger.kernel.org
Reviewed-by: Jonathan Nieder <jrni...@gmail.com>
Acked-by: "Eric W. Biederman" <ebie...@xmission.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/kernel/apic/io_apic.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/apic/io_apic.c b/arch/x86/kernel/apic/io_apic.c
index 8928d97..d256bc3 100644
--- a/arch/x86/kernel/apic/io_apic.c
+++ b/arch/x86/kernel/apic/io_apic.c
@@ -4262,6 +4262,7 @@ static int bad_ioapic(unsigned long address)
void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
{
int idx = 0;
+ int entries;

if (bad_ioapic(address))
return;
@@ -4280,10 +4281,14 @@ void __init mp_register_ioapic(int id, u32 address, u32 gsi_base)
* Build basic GSI lookup table to facilitate gsi->io_apic lookups
* and to prevent reprogramming of IOAPIC pins (PCI GSIs).
*/
+ entries = io_apic_get_redir_entries(idx);
mp_gsi_routing[idx].gsi_base = gsi_base;
- mp_gsi_routing[idx].gsi_end = gsi_base +
- io_apic_get_redir_entries(idx);
+ mp_gsi_routing[idx].gsi_end = gsi_base + entries;

+ /*
+ * The number of IO-APIC IRQ registers (== #pins):
+ */
+ nr_ioapic_registers[idx] = entries + 1;
printk(KERN_INFO "IOAPIC[%d]: apic_id %d, version %d, address 0x%x, "
"GSI %d-%d\n", idx, mp_ioapics[idx].apicid,
mp_ioapics[idx].apicver, mp_ioapics[idx].apicaddr,

Willy Tarreau

Jun 4, 2013, 6:50:05 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.d...@gmail.com>

[ Backport of upstream commit 87c48fa3b4630905f98268dde838ee43626a060c ]

Fernando Gont reported that the current IPv6 fragment identification
generation was not secure because it used a very predictable system-wide
generator, allowing various attacks.

IPv4 uses inetpeer cache to address this problem and to get good
performance. We'll use this mechanism when IPv6 inetpeer is stable
enough in linux-3.1

For the time being, we use jhash on destination address to provide less
predictable identifications. Also remove a spinlock and use cmpxchg() to
get better SMP performance.

Reported-by: Fernando Gont <fern...@gont.com.ar>
Signed-off-by: Eric Dumazet <eric.d...@gmail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>
[bwh: Backport further to 2.6.32]
Signed-off-by: Ben Hutchings <b...@decadent.org.uk>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/net/ipv6.h | 12 +-----------
include/net/transp_v6.h | 2 ++
net/ipv6/af_inet6.c | 2 ++
net/ipv6/ip6_output.c | 40 +++++++++++++++++++++++++++++++++++-----
net/ipv6/udp.c | 2 +-
5 files changed, 41 insertions(+), 17 deletions(-)

diff --git a/include/net/ipv6.h b/include/net/ipv6.h
index 639bbf0..52d86da 100644
--- a/include/net/ipv6.h
+++ b/include/net/ipv6.h
@@ -449,17 +449,7 @@ static inline int ipv6_addr_diff(const struct in6_addr *a1, const struct in6_add
return __ipv6_addr_diff(a1, a2, sizeof(struct in6_addr));
}

-static __inline__ void ipv6_select_ident(struct frag_hdr *fhdr)
-{
- static u32 ipv6_fragmentation_id = 1;
- static DEFINE_SPINLOCK(ip6_id_lock);
-
- spin_lock_bh(&ip6_id_lock);
- fhdr->identification = htonl(ipv6_fragmentation_id);
- if (++ipv6_fragmentation_id == 0)
- ipv6_fragmentation_id = 1;
- spin_unlock_bh(&ip6_id_lock);
-}
+extern void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt);

/*
* Prototypes exported by ipv6
diff --git a/include/net/transp_v6.h b/include/net/transp_v6.h
index d65381c..8beefe1 100644
--- a/include/net/transp_v6.h
+++ b/include/net/transp_v6.h
@@ -16,6 +16,8 @@ extern struct proto tcpv6_prot;

struct flowi;

+extern void initialize_hashidentrnd(void);
+
/* extention headers */
extern int ipv6_exthdrs_init(void);
extern void ipv6_exthdrs_exit(void);
diff --git a/net/ipv6/af_inet6.c b/net/ipv6/af_inet6.c
index e127a32..835590d 100644
--- a/net/ipv6/af_inet6.c
+++ b/net/ipv6/af_inet6.c
@@ -1073,6 +1073,8 @@ static int __init inet6_init(void)
goto out;
}

+ initialize_hashidentrnd();
+
err = proto_register(&tcpv6_prot, 1);
if (err)
goto out;
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 9ad5792..6ba0fe2 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -604,6 +604,35 @@ int ip6_find_1stfragopt(struct sk_buff *skb, u8 **nexthdr)
return offset;
}

+static u32 hashidentrnd __read_mostly;
+#define FID_HASH_SZ 16
+static u32 ipv6_fragmentation_id[FID_HASH_SZ];
+
+void __init initialize_hashidentrnd(void)
+{
+ get_random_bytes(&hashidentrnd, sizeof(hashidentrnd));
+}
+
+static u32 __ipv6_select_ident(const struct in6_addr *addr)
+{
+ u32 newid, oldid, hash = jhash2((u32 *)addr, 4, hashidentrnd);
+ u32 *pid = &ipv6_fragmentation_id[hash % FID_HASH_SZ];
+
+ do {
+ oldid = *pid;
+ newid = oldid + 1;
+ if (!(hash + newid))
+ newid++;
+ } while (cmpxchg(pid, oldid, newid) != oldid);
+
+ return hash + newid;
+}
+
+void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt)
+{
+ fhdr->identification = htonl(__ipv6_select_ident(&rt->rt6i_dst.addr));
+}
+
static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
{
struct sk_buff *frag;
@@ -689,7 +718,7 @@ static int ip6_fragment(struct sk_buff *skb, int (*output)(struct sk_buff *))
skb_reset_network_header(skb);
memcpy(skb_network_header(skb), tmp_hdr, hlen);

- ipv6_select_ident(fh);
+ ipv6_select_ident(fh, rt);
fh->nexthdr = nexthdr;
fh->reserved = 0;
fh->frag_off = htons(IP6_MF);
@@ -835,7 +864,7 @@ slow_path:
fh->nexthdr = nexthdr;
fh->reserved = 0;
if (!frag_id) {
- ipv6_select_ident(fh);
+ ipv6_select_ident(fh, rt);
frag_id = fh->identification;
} else
fh->identification = frag_id;
@@ -1039,7 +1068,8 @@ static inline int ip6_ufo_append_data(struct sock *sk,
int getfrag(void *from, char *to, int offset, int len,
int odd, struct sk_buff *skb),
void *from, int length, int hh_len, int fragheaderlen,
- int transhdrlen, int mtu,unsigned int flags)
+ int transhdrlen, int mtu,unsigned int flags,
+ struct rt6_info *rt)

{
struct sk_buff *skb;
@@ -1084,7 +1114,7 @@ static inline int ip6_ufo_append_data(struct sock *sk,
skb_shinfo(skb)->gso_size = (mtu - fragheaderlen -
sizeof(struct frag_hdr)) & ~7;
skb_shinfo(skb)->gso_type = SKB_GSO_UDP;
- ipv6_select_ident(&fhdr);
+ ipv6_select_ident(&fhdr, rt);
skb_shinfo(skb)->ip6_frag_id = fhdr.identification;
__skb_queue_tail(&sk->sk_write_queue, skb);

@@ -1233,7 +1263,7 @@ int ip6_append_data(struct sock *sk, int getfrag(void *from, char *to,

err = ip6_ufo_append_data(sk, getfrag, from, length, hh_len,
fragheaderlen, transhdrlen, mtu,
- flags);
+ flags, rt);
if (err)
goto error;
return 0;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 9cc6289..d8c0374 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1162,7 +1162,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb, int features)
fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
fptr->nexthdr = nexthdr;
fptr->reserved = 0;
- ipv6_select_ident(fptr);
+ ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));

/* Fragment the skb. ipv6 header and the remaining fields of the
* fragment header are updated in ipv6_gso_segment()
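
The lock-free identifier selection introduced in ip6_output.c above can be approximated in portable C11. This is a hedged sketch, not the kernel code: toy_hash() is a made-up mixing function standing in for jhash2(), the seed is a fixed constant where the patch uses get_random_bytes(), and C11 atomic_compare_exchange_weak() stands in for cmpxchg().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define FID_HASH_SZ 16

static uint32_t hash_seed = 0x9e3779b9u;       /* assumption: fixed seed for the sketch */
static _Atomic uint32_t frag_id[FID_HASH_SZ];  /* one counter per hash bucket */

/* Toy mixing function, NOT jhash2(); it only spreads addresses over buckets. */
static uint32_t toy_hash(const uint32_t addr[4], uint32_t seed)
{
    uint32_t h = seed;

    for (int i = 0; i < 4; i++) {
        h ^= addr[i];
        h *= 0x01000193u;
    }
    return h;
}

static uint32_t select_ident(const uint32_t addr[4])
{
    uint32_t hash, oldid, newid;
    _Atomic uint32_t *pid;

    assert(addr != NULL);
    hash = toy_hash(addr, hash_seed);
    pid = &frag_id[hash % FID_HASH_SZ];

    oldid = atomic_load(pid);
    do {
        newid = oldid + 1;
        if (hash + newid == 0)  /* never hand out identification 0 */
            newid++;
        /* on failure, oldid is reloaded with the current value and we retry */
    } while (!atomic_compare_exchange_weak(pid, &oldid, newid));

    return hash + newid;
}
```

Because the returned value is the per-bucket counter offset by a seeded hash of the destination, an observer talking to one destination learns little about identifiers sent to another, and no spinlock is needed on the fast path.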

Willy Tarreau

Jun 4, 2013, 7:00:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Benjamin LaHaise <bc...@kvack.org>

commit d11a4dc18bf41719c9f0d7ed494d295dd2973b92
Author: Timo Teräs <timo....@iki.fi>
Date: Thu Mar 18 23:20:20 2010 +0000

ipv4: check rt_genid in dst_check

Xfrm_dst keeps a reference to ipv4 rtable entries on each
cached bundle. The only way to renew xfrm_dst when the underlying
route has changed, is to implement dst_check for this. This is
what ipv6 side does too.

The problems started after 87c1e12b5eeb7b30b4b41291bef8e0b41fc3dde9
("ipsec: Fix bogus bundle flowi") which fixed a bug causing xfrm_dst
to not get reused. Until then, all lookups always generated a new
xfrm_dst with a new route reference, and path mtu worked. But after the
fix, the old routes started to get reused even after they were expired,
causing pmtu to break (well, it would occasionally work if the rtable
gc had run recently and marked the route obsolete, causing dst_check to
get called).

Signed-off-by: Timo Teras <timo....@iki.fi>
Acked-by: Herbert Xu <her...@gondor.apana.org.au>

Signed-off-by: David S. Miller <da...@davemloft.net>

This commit is based on the above, with the addition of verifying blackhole
routes in the same manner.

Signed-off-by: Benjamin LaHaise <bc...@kvack.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv4/route.c | 17 ++++++++++++-----
1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 58f141b..f16d19b 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1412,7 +1412,7 @@ void ip_rt_redirect(__be32 old_gw, __be32 daddr, __be32 new_gw,
dev_hold(rt->u.dst.dev);
if (rt->idev)
in_dev_hold(rt->idev);
- rt->u.dst.obsolete = 0;
+ rt->u.dst.obsolete = -1;
rt->u.dst.lastuse = jiffies;
rt->u.dst.path = &rt->u.dst;
rt->u.dst.neighbour = NULL;
@@ -1477,7 +1477,7 @@ static struct dst_entry *ipv4_negative_advice(struct dst_entry *dst)
struct dst_entry *ret = dst;

if (rt) {
- if (dst->obsolete) {
+ if (dst->obsolete > 0) {
ip_rt_put(rt);
ret = NULL;
} else if ((rt->rt_flags & RTCF_REDIRECTED) ||
@@ -1700,7 +1700,9 @@ static void ip_rt_update_pmtu(struct dst_entry *dst, u32 mtu)

static struct dst_entry *ipv4_dst_check(struct dst_entry *dst, u32 cookie)
{
- return NULL;
+ if (rt_is_expired((struct rtable *)dst))
+ return NULL;
+ return dst;
}

static void ipv4_dst_destroy(struct dst_entry *dst)
@@ -1862,7 +1864,8 @@ static int ip_route_input_mc(struct sk_buff *skb, __be32 daddr, __be32 saddr,
if (!rth)
goto e_nobufs;

- rth->u.dst.output= ip_rt_bug;
+ rth->u.dst.output = ip_rt_bug;
+ rth->u.dst.obsolete = -1;

atomic_set(&rth->u.dst.__refcnt, 1);
rth->u.dst.flags= DST_HOST;
@@ -2023,6 +2026,7 @@ static int __mkroute_input(struct sk_buff *skb,
rth->fl.oif = 0;
rth->rt_spec_dst= spec_dst;

+ rth->u.dst.obsolete = -1;
rth->u.dst.input = ip_forward;
rth->u.dst.output = ip_output;
rth->rt_genid = rt_genid(dev_net(rth->u.dst.dev));
@@ -2187,6 +2191,7 @@ local_input:
goto e_nobufs;

rth->u.dst.output= ip_rt_bug;
+ rth->u.dst.obsolete = -1;
rth->rt_genid = rt_genid(net);

atomic_set(&rth->u.dst.__refcnt, 1);
@@ -2411,7 +2416,8 @@ static int __mkroute_output(struct rtable **result,
rth->rt_gateway = fl->fl4_dst;
rth->rt_spec_dst= fl->fl4_src;

- rth->u.dst.output=ip_output;
+ rth->u.dst.output = ip_output;
+ rth->u.dst.obsolete = -1;
rth->rt_genid = rt_genid(dev_net(dev_out));

RT_CACHE_STAT_INC(out_slow_tot);
@@ -2741,6 +2747,7 @@ static int ipv4_dst_blackhole(struct net *net, struct rtable **rp, struct flowi
if (rt) {
struct dst_entry *new = &rt->u.dst;

+ new->obsolete = -1;
atomic_set(&new->__refcnt, 1);
new->__use = 1;
new->input = dst_discard;
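
The validity check this patch gives ipv4_dst_check() amounts to comparing a generation number stored in the cached entry against a global one. A minimal userspace sketch follows; struct sketch_route, current_genid and route_check() are hypothetical names, and the comment on `obsolete` reflects my understanding that a nonzero value is what makes users of the entry call the check hook at all.

```c
#include <assert.h>
#include <stddef.h>

static int current_genid = 1;  /* hypothetical global generation counter */

struct sketch_route {
    int genid;     /* generation in which the entry was created */
    int obsolete;  /* -1: ask users to revalidate via the check hook */
};

static int route_is_expired(const struct sketch_route *rt)
{
    return rt->genid != current_genid;
}

/* Returns the route if it is still valid, or NULL to force a new lookup. */
static struct sketch_route *route_check(struct sketch_route *rt)
{
    assert(rt != NULL);
    if (route_is_expired(rt))
        return NULL;
    return rt;
}
```

Bumping the global counter invalidates every cached entry at once, without walking the cache; each holder finds out the next time it revalidates.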

Willy Tarreau

Jun 4, 2013, 7:00:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Lachlan McIlroy <lmci...@redhat.com>

commit e6155736ad76b2070652745f9e54cdea3f0d8567 upstream.

In the case where we are allocating for a non-extent file,
we must limit the groups we allocate from to those below
2^32 blocks, and ext4_mb_regular_allocator() attempts to
do this initially by putting a cap on ngroups for the
subsequent search loop.

However, the initial target group comes in from the
allocation context (ac), and it may already be beyond
the artificially limited ngroups. In this case,
the limit

if (group == ngroups)
group = 0;

at the top of the loop is never true, and the loop will
run away.

Catch this case inside the loop and reset the search to
start at group 0.

[san...@redhat.com: add commit msg & comments]

Signed-off-by: Lachlan McIlroy <lmci...@redhat.com>
Signed-off-by: Eric Sandeen <san...@redhat.com>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/mballoc.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
index c7e8bdb..cecf2a5 100644
--- a/fs/ext4/mballoc.c
+++ b/fs/ext4/mballoc.c
@@ -2070,7 +2070,11 @@ repeat:
group = ac->ac_g_ex.fe_group;

for (i = 0; i < ngroups; group++, i++) {
- if (group == ngroups)
+ /*
+ * Artificially restricted ngroups for non-extent
+ * files makes group > ngroups possible on first loop.
+ */
+ if (group >= ngroups)
group = 0;

/* This now checks without needing the buddy page */
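
The effect of changing `==` to `>=` in the loop above can be modelled with a small function that records the largest group index visited. This is a simplified model under stated assumptions (max_group_visited() and its `strict` flag are hypothetical), not the allocator itself; in this model the `==` variant scans groups at or beyond the cap rather than spinning forever.

```c
#include <assert.h>

/*
 * Walk `ngroups` groups starting at `start`, wrapping to 0.
 * Returns the largest group index visited.
 * `strict` selects the patched test (>=) over the original one (==).
 */
static unsigned max_group_visited(unsigned start, unsigned ngroups, int strict)
{
    unsigned group = start, max = 0;

    assert(ngroups > 0);
    for (unsigned i = 0; i < ngroups; group++, i++) {
        if (strict ? group >= ngroups : group == ngroups)
            group = 0;
        if (group > max)
            max = group;
    }
    return max;
}
```

Starting at group 10 with the cap at 4, the original test never fires and the walk visits groups 10 through 13; the patched test wraps immediately and stays within 0 through 3.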

Willy Tarreau

Jun 4, 2013, 7:00:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

install_user_keyrings()

From: David Howells <dhow...@redhat.com>

commit 0da9dfdd2cd9889201bc6f6f43580c99165cd087 upstream.

This fixes CVE-2013-1792.

There is a race in install_user_keyrings() that can cause a NULL pointer
dereference when called concurrently for the same user if the uid and
uid-session keyrings are not yet created. It might be possible for an
unprivileged user to trigger this by calling keyctl() from userspace in
parallel immediately after logging in.

Assume that we have two threads both executing lookup_user_key(), both
looking for KEY_SPEC_USER_SESSION_KEYRING.

	THREAD A				THREAD B
	===============================		===============================
						==>call install_user_keyrings();
	if (!cred->user->session_keyring)
	==>call install_user_keyrings()
						...
						user->uid_keyring = uid_keyring;
	if (user->uid_keyring)
		return 0;
	<==
	key = cred->user->session_keyring [== NULL]
						user->session_keyring = session_keyring;
	atomic_inc(&key->usage); [oops]

At the point thread A dereferences cred->user->session_keyring, thread B
hasn't updated user->session_keyring yet, but thread A assumes it is
populated because install_user_keyrings() returned ok.

The race window is really small but can be exploited if, for example,
thread B is interrupted or preempted after initializing uid_keyring, but
before setting session_keyring.

This couldn't be reproduced on a stock kernel. However, after placing a
systemtap probe on 'user->session_keyring = session_keyring;' that
introduced some delay, the kernel could be crashed reliably.

Fix this by checking both pointers before deciding whether to return.
Alternatively, the test could be done away with entirely as it is checked
inside the mutex - but since the mutex is global, that may not be the best
way.

Signed-off-by: David Howells <dhow...@redhat.com>
Reported-by: Mateusz Guzik <mgu...@redhat.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: James Morris <james.l...@oracle.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

security/keys/process_keys.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/keys/process_keys.c b/security/keys/process_keys.c
index 931cfda..75fb18c 100644
--- a/security/keys/process_keys.c
+++ b/security/keys/process_keys.c
@@ -56,7 +56,7 @@ int install_user_keyrings(void)

kenter("%p{%u}", user, user->uid);

- if (user->uid_keyring) {
+ if (user->uid_keyring && user->session_keyring) {
kleave(" = 0 [exist]");
return 0;
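
The one-line fix above changes the "already installed" fast-path test. A minimal sketch of the two predicates, using a hypothetical struct sketch_user with just the two pointers involved, shows the intermediate state that the old test mistook for "done":

```c
#include <assert.h>
#include <stddef.h>

struct sketch_user {
    void *uid_keyring;
    void *session_keyring;
};

/* Old fast-path test: looks at one pointer only. */
static int keyrings_installed_old(const struct sketch_user *u)
{
    assert(u != NULL);
    return u->uid_keyring != NULL;
}

/* Patched fast-path test: both keyrings must already be published. */
static int keyrings_installed_new(const struct sketch_user *u)
{
    assert(u != NULL);
    return u->uid_keyring != NULL && u->session_keyring != NULL;
}
```

When only uid_keyring has been set, the old predicate reports success while session_keyring is still NULL, which is the window thread A falls into in the diagram above. The sketch does not model the mutex that serializes the slow path.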

Willy Tarreau

Jun 4, 2013, 7:00:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

changes

From: Jan Kara <ja...@suse.cz>

commit b71fc079b5d8f42b2a52743c8d2f1d35d655b1c5 upstream.

Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.

Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.

Reported-by: Kristian Nielsen <knie...@knielsen-hq.org>
Signed-off-by: Jan Kara <ja...@suse.cz>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/inode.c | 8 ++++++--
1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index efe6363..babf448 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5121,6 +5121,7 @@ static int ext4_do_update_inode(handle_t *handle,
struct ext4_inode_info *ei = EXT4_I(inode);
struct buffer_head *bh = iloc->bh;
int err = 0, rc, block;
+ int need_datasync = 0;

/* For fields not not tracking in the in-memory inode,
* initialise them to zero for new inodes. */
@@ -5169,7 +5170,10 @@ static int ext4_do_update_inode(handle_t *handle,
raw_inode->i_file_acl_high =
cpu_to_le16(ei->i_file_acl >> 32);
raw_inode->i_file_acl_lo = cpu_to_le32(ei->i_file_acl);
- ext4_isize_set(raw_inode, ei->i_disksize);
+ if (ei->i_disksize != ext4_isize(raw_inode)) {
+ ext4_isize_set(raw_inode, ei->i_disksize);
+ need_datasync = 1;
+ }
if (ei->i_disksize > 0x7fffffffULL) {
struct super_block *sb = inode->i_sb;
if (!EXT4_HAS_RO_COMPAT_FEATURE(sb,
@@ -5222,7 +5226,7 @@ static int ext4_do_update_inode(handle_t *handle,
err = rc;
ext4_clear_inode_state(inode, EXT4_STATE_NEW);

- ext4_update_inode_fsync_trans(handle, inode, 0);
+ ext4_update_inode_fsync_trans(handle, inode, need_datasync);
out_brelse:
brelse(bh);
ext4_std_error(inode->i_sb, err);
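
The change-detection step added to ext4_do_update_inode() above boils down to "write the size only if it differs, and remember that it did". A hedged userspace sketch, with a hypothetical struct sketch_raw_inode in place of the on-disk inode:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_raw_inode {
    uint64_t size;  /* stands in for the on-disk i_size fields */
};

/*
 * Copy the in-memory size to the on-disk copy.
 * Returns 1 if the size changed (a later fdatasync must commit), else 0.
 */
static int update_raw_size(struct sketch_raw_inode *raw, uint64_t disksize)
{
    int need_datasync = 0;

    assert(raw != NULL);
    if (disksize != raw->size) {
        raw->size = disksize;
        need_datasync = 1;
    }
    return need_datasync;
}
```

In the patch, the returned flag is what gets passed on to ext4_update_inode_fsync_trans() instead of the constant 0.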

Willy Tarreau

Jun 4, 2013, 7:00:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

kfree

From: Daniel Borkmann <dbor...@redhat.com>

[ Upstream commit 586c31f3bf04c290dc0a0de7fc91d20aa9a5ee53 ]

For sensitive data like keying material, it is common practice to zero
out keys before returning the memory back to the allocator. Thus, use
kzfree instead of kfree.

Signed-off-by: Daniel Borkmann <dbor...@redhat.com>
Acked-by: Neil Horman <nho...@tuxdriver.com>
Acked-by: Vlad Yasevich <vyas...@gmail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/sctp/auth.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sctp/auth.c b/net/sctp/auth.c
index 914c419..7363b9f 100644
--- a/net/sctp/auth.c
+++ b/net/sctp/auth.c
@@ -70,7 +70,7 @@ void sctp_auth_key_put(struct sctp_auth_bytes *key)
return;

if (atomic_dec_and_test(&key->refcnt)) {
- kfree(key);
+ kzfree(key);
SCTP_DBG_OBJCNT_DEC(keys);
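
The zero-before-free practice described above has a straightforward userspace analogue. This is a sketch under assumptions: wipe() and zfree() are hypothetical helpers, and unlike kzfree(), which can ask the allocator for the object size, the caller here must pass the length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Overwrite through a volatile pointer so the stores are not optimized away. */
static void wipe(void *buf, size_t len)
{
    volatile unsigned char *p = buf;

    assert(buf != NULL || len == 0);
    while (len--)
        *p++ = 0;
}

/* Hypothetical userspace analogue of kzfree(): zero the memory, then free it. */
static void zfree(void *buf, size_t len)
{
    if (buf == NULL)
        return;
    wipe(buf, len);
    free(buf);
}
```

The volatile access is the important design choice: a plain memset() immediately before free() is a store the compiler is allowed to drop.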

Willy Tarreau

Jun 4, 2013, 7:00:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Ian Abbott <abb...@mev.co.uk>

commit b655c2c4782ed3e2e71d2608154e295a3e860311 upstream.

`s626_enc_insn_config()` is incorrectly dereferencing `insn->data` which
is a pointer to user memory. It should be dereferencing the separate
`data` parameter that points to a copy of the data in kernel memory.

Signed-off-by: Ian Abbott <abb...@mev.co.uk>
Reviewed-by: H Hartley Sweeten <hswe...@visionengravers.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/staging/comedi/drivers/s626.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/drivers/s626.c b/drivers/staging/comedi/drivers/s626.c
index 80d2787..7a7c29f 100644
--- a/drivers/staging/comedi/drivers/s626.c
+++ b/drivers/staging/comedi/drivers/s626.c
@@ -2330,7 +2330,7 @@ static int s626_enc_insn_config(struct comedi_device *dev,
/* (data==NULL) ? (Preloadvalue=0) : (Preloadvalue=data[0]); */

k->SetMode(dev, k, Setup, TRUE);
- Preload(dev, k, *(insn->data));
+ Preload(dev, k, data[0]);
k->PulseIndex(dev, k);
SetLatchSource(dev, k, valueSrclatch);
k->SetEnable(dev, k, (uint16_t) (enab != 0));

Willy Tarreau

Jun 4, 2013, 7:00:01 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

instead of strcat()

From: Geert Uytterhoeven <ge...@linux-m68k.org>

commit 66081a72517a131430dcf986775f3268aafcb546 upstream.

The warning check for duplicate sysfs entries can cause a buffer overflow
when printing the warning, as strcat() doesn't check buffer sizes.
Use strlcat() instead.

Since strlcat() doesn't return a pointer to the passed buffer, unlike
strcat(), I had to convert the nested concatenation in sysfs_add_one() to
an admittedly more obscure comma operator construct, to avoid emitting code
for the concatenation if CONFIG_BUG is disabled.

Signed-off-by: Geert Uytterhoeven <ge...@linux-m68k.org>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/sysfs/dir.c | 16 ++++++++--------
1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/fs/sysfs/dir.c b/fs/sysfs/dir.c
index e020183..5e7279a 100644
--- a/fs/sysfs/dir.c
+++ b/fs/sysfs/dir.c
@@ -440,20 +440,18 @@ int __sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
/**
* sysfs_pathname - return full path to sysfs dirent
* @sd: sysfs_dirent whose path we want
- * @path: caller allocated buffer
+ * @path: caller allocated buffer of size PATH_MAX
*
* Gives the name "/" to the sysfs_root entry; any path returned
* is relative to wherever sysfs is mounted.
- *
- * XXX: does no error checking on @path size
*/
static char *sysfs_pathname(struct sysfs_dirent *sd, char *path)
{
if (sd->s_parent) {
sysfs_pathname(sd->s_parent, path);
- strcat(path, "/");
+ strlcat(path, "/", PATH_MAX);
}
- strcat(path, sd->s_name);
+ strlcat(path, sd->s_name, PATH_MAX);
return path;
}

@@ -486,9 +484,11 @@ int sysfs_add_one(struct sysfs_addrm_cxt *acxt, struct sysfs_dirent *sd)
char *path = kzalloc(PATH_MAX, GFP_KERNEL);
WARN(1, KERN_WARNING
"sysfs: cannot create duplicate filename '%s'\n",
- (path == NULL) ? sd->s_name :
- strcat(strcat(sysfs_pathname(acxt->parent_sd, path), "/"),
- sd->s_name));
+ (path == NULL) ? sd->s_name
+ : (sysfs_pathname(acxt->parent_sd, path),
+ strlcat(path, "/", PATH_MAX),
+ strlcat(path, sd->s_name, PATH_MAX),
+ path));
kfree(path);
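
The difference between strcat() and strlcat() that this patch relies on can be illustrated with a small bounded concatenation helper. bounded_cat() is a hypothetical function written for this sketch, intended to follow the strlcat() contract: never write more than `size` bytes, always NUL-terminate when there is room, and return the length the full string would have had.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper following the strlcat() contract. */
static size_t bounded_cat(char *dst, const char *src, size_t size)
{
    size_t dlen = 0, slen, room, n;

    assert(dst != NULL && src != NULL);
    while (dlen < size && dst[dlen] != '\0')  /* length of dst, capped at size */
        dlen++;
    slen = strlen(src);
    if (dlen == size)                         /* no terminator found: no room at all */
        return size + slen;

    room = size - dlen - 1;                   /* bytes left, keeping one for NUL */
    n = slen < room ? slen : room;
    memcpy(dst + dlen, src, n);
    dst[dlen + n] = '\0';
    return dlen + slen;                       /* >= size means truncation happened */
}
```

Unlike strcat(), the return value is a length rather than the destination pointer, which is why the nested strcat(strcat(...)) expression in sysfs_add_one() had to be rewritten as a comma expression ending in `path`.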

Willy Tarreau

Jun 4, 2013, 7:00:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Alan Stern <st...@rowland.harvard.edu>

commit 0720a06a7518c9d0c0125bd5d1f3b6264c55c3dd upstream.

The utf8s_to_utf16s conversion routine needs to be improved. Unlike
its utf16s_to_utf8s sibling, it doesn't accept arguments specifying
the maximum length of the output buffer or the endianness of its
16-bit output.

This patch (as1501) adds the two missing arguments, and adjusts the
only two places in the kernel where the function is called. A
follow-on patch will add a third caller that does utilize the new
capabilities.

The two conversion routines are still annoyingly inconsistent in the
way they handle invalid byte combinations. But that's a subject for a
different patch.

Signed-off-by: Alan Stern <st...@rowland.harvard.edu>
CC: Clemens Ladisch <cle...@ladisch.de>
Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>
[bwh: Backported to 2.6.32: drop Hyper-V change]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/fat/namei_vfat.c | 3 ++-
fs/nls/nls_base.c | 43 +++++++++++++++++++++++++++++++++----------
include/linux/nls.h | 5 +++--
3 files changed, 38 insertions(+), 13 deletions(-)

diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 67b3df1..4251f35 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -499,7 +499,8 @@ xlate_to_uni(const unsigned char *name, int len, unsigned char *outname,
int charlen;

if (utf8) {
- *outlen = utf8s_to_utf16s(name, len, (wchar_t *)outname);
+ *outlen = utf8s_to_utf16s(name, len, UTF16_HOST_ENDIAN,
+ (wchar_t *) outname, FAT_LFN_LEN + 2);
if (*outlen < 0)
return *outlen;
else if (*outlen > FAT_LFN_LEN)
diff --git a/fs/nls/nls_base.c b/fs/nls/nls_base.c
index 44a88a9..0eb059e 100644
--- a/fs/nls/nls_base.c
+++ b/fs/nls/nls_base.c
@@ -114,34 +114,57 @@ int utf32_to_utf8(unicode_t u, u8 *s, int maxlen)
}
EXPORT_SYMBOL(utf32_to_utf8);

-int utf8s_to_utf16s(const u8 *s, int len, wchar_t *pwcs)
+static inline void put_utf16(wchar_t *s, unsigned c, enum utf16_endian endian)
+{
+ switch (endian) {
+ default:
+ *s = (wchar_t) c;
+ break;
+ case UTF16_LITTLE_ENDIAN:
+ *s = __cpu_to_le16(c);
+ break;
+ case UTF16_BIG_ENDIAN:
+ *s = __cpu_to_be16(c);
+ break;
+ }
+}
+
+int utf8s_to_utf16s(const u8 *s, int len, enum utf16_endian endian,
+ wchar_t *pwcs, int maxlen)
{
u16 *op;
int size;
unicode_t u;

op = pwcs;
- while (*s && len > 0) {
+ while (len > 0 && maxlen > 0 && *s) {
if (*s & 0x80) {
size = utf8_to_utf32(s, len, &u);
if (size < 0)
return -EINVAL;
+ s += size;
+ len -= size;

if (u >= PLANE_SIZE) {
+ if (maxlen < 2)
+ break;
u -= PLANE_SIZE;
- *op++ = (wchar_t) (SURROGATE_PAIR |
- ((u >> 10) & SURROGATE_BITS));
- *op++ = (wchar_t) (SURROGATE_PAIR |
+ put_utf16(op++, SURROGATE_PAIR |
+ ((u >> 10) & SURROGATE_BITS),
+ endian);
+ put_utf16(op++, SURROGATE_PAIR |
SURROGATE_LOW |
- (u & SURROGATE_BITS));
+ (u & SURROGATE_BITS),
+ endian);
+ maxlen -= 2;
} else {
- *op++ = (wchar_t) u;
+ put_utf16(op++, u, endian);
+ maxlen--;
}
- s += size;
- len -= size;
} else {
- *op++ = *s++;
+ put_utf16(op++, *s++, endian);
len--;
+ maxlen--;
}
}
return op - pwcs;
diff --git a/include/linux/nls.h b/include/linux/nls.h
index d47beef..5dc635f 100644
--- a/include/linux/nls.h
+++ b/include/linux/nls.h
@@ -43,7 +43,7 @@ enum utf16_endian {
UTF16_BIG_ENDIAN
};

-/* nls.c */
+/* nls_base.c */
extern int register_nls(struct nls_table *);
extern int unregister_nls(struct nls_table *);
extern struct nls_table *load_nls(char *);
@@ -52,7 +52,8 @@ extern struct nls_table *load_nls_default(void);

extern int utf8_to_utf32(const u8 *s, int len, unicode_t *pu);
extern int utf32_to_utf8(unicode_t u, u8 *s, int maxlen);
-extern int utf8s_to_utf16s(const u8 *s, int len, wchar_t *pwcs);
+extern int utf8s_to_utf16s(const u8 *s, int len,
+ enum utf16_endian endian, wchar_t *pwcs, int maxlen);
extern int utf16s_to_utf8s(const wchar_t *pwcs, int len,
enum utf16_endian endian, u8 *s, int maxlen);
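
The bounded surrogate-pair step inside the new utf8s_to_utf16s() can be sketched on its own. put_code_point() is a hypothetical helper for illustration, output is in host byte order only, and the constant values are given as I recall them from nls_base.c, so treat them as assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PLANE_SIZE      0x00010000
#define SURROGATE_PAIR  0x0000d800
#define SURROGATE_LOW   0x00000400
#define SURROGATE_BITS  0x000003ff

/*
 * Encode one code point into at most `maxlen` UTF-16 units (host order).
 * Returns the number of units written, or 0 if it does not fit.
 */
static int put_code_point(uint32_t u, uint16_t *out, int maxlen)
{
    assert(out != NULL);
    if (u >= PLANE_SIZE) {
        if (maxlen < 2)  /* a surrogate pair needs two units */
            return 0;
        u -= PLANE_SIZE;
        out[0] = (uint16_t)(SURROGATE_PAIR | ((u >> 10) & SURROGATE_BITS));
        out[1] = (uint16_t)(SURROGATE_PAIR | SURROGATE_LOW | (u & SURROGATE_BITS));
        return 2;
    }
    if (maxlen < 1)
        return 0;
    out[0] = (uint16_t)u;
    return 1;
}
```

Checking the remaining space before writing either half is the point of the new `maxlen` argument: a pair is written completely or not at all.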

Willy Tarreau

Jun 4, 2013, 7:00:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Wong <normal...@yhbt.net>

commit 128dd1759d96ad36c379240f8b9463e8acfd37a1 upstream.

EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
ensure events are not missed. Since the modifications to the interest
mask are not protected by the same lock as ep_poll_callback, we need to
ensure the change is visible to other CPUs calling ep_poll_callback.

We also need to ensure f_op->poll() has an up-to-date view of past
events which occurred before we modified the interest mask. So this
barrier also pairs with the barrier in wq_has_sleeper().

This should guarantee either ep_poll_callback or f_op->poll() (or both)
will notice the readiness of a recently-ready/modified item.
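
For readers who want to see the pairing in isolation, here is a minimal
userspace sketch using C11 atomics. It is an analogy only, not the kernel
code: struct demo_item, demo_modify() and demo_callback() are hypothetical
names, and atomic_thread_fence(memory_order_seq_cst) merely stands in for
smp_mb(). Each side writes its own variable, issues a full fence, then reads
the other side's variable, so at least one of the two should observe the
other's write.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for an epoll item; not the kernel's struct epitem. */
struct demo_item {
	atomic_uint interest;	/* events the waiter cares about */
	atomic_uint ready;	/* events the file has signalled */
};

/* Modifier side: publish the new mask, full fence, then check readiness. */
unsigned int demo_modify(struct demo_item *it, unsigned int new_mask)
{
	assert(it != NULL);
	atomic_store_explicit(&it->interest, new_mask, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* role of smp_mb() */
	return atomic_load_explicit(&it->ready, memory_order_relaxed) & new_mask;
}

/* Callback side: record the event, full fence, then read the mask. */
unsigned int demo_callback(struct demo_item *it, unsigned int events)
{
	assert(it != NULL);
	atomic_fetch_or_explicit(&it->ready, events, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&it->interest, memory_order_relaxed) & events;
}
```

Without the fence on the modifier side, the store to the mask could still be
sitting in a store buffer while the readiness check runs, which is the window
the patch is meant to close.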

This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
http://thread.gmane.org/gmane.linux.kernel/1408782/

Signed-off-by: Eric Wong <normal...@yhbt.net>
Cc: Hans Verkuil <hans.v...@cisco.com>
Cc: Jiri Olsa <jo...@redhat.com>
Cc: Jonathan Corbet <cor...@lwn.net>
Cc: Al Viro <vi...@zeniv.linux.org.uk>
Cc: Davide Libenzi <dav...@xmailserver.org>
Cc: Hans de Goede <hdeg...@redhat.com>
Cc: Mauro Carvalho Chehab <mch...@infradead.org>
Cc: David Miller <da...@davemloft.net>
Cc: Eric Dumazet <eric.d...@gmail.com>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Andreas Voellmy <andreas...@yale.edu>
Tested-by: "Junchang(Jason) Wang" <juncha...@yale.edu>
Cc: net...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
[bwh: Backported to 3.2: adjust context]
Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/eventpoll.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index ff57421..83fbd64 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -1183,10 +1183,30 @@ static int ep_modify(struct eventpoll *ep, struct epitem *epi, struct epoll_even
* otherwise we might miss an event that happens between the
* f_op->poll() call and the new event set registering.
*/
- epi->event.events = event->events;
+ epi->event.events = event->events; /* need barrier below */
epi->event.data = event->data; /* protected by mtx */

/*
+ * The following barrier has two effects:
+ *
+ * 1) Flush epi changes above to other CPUs. This ensures
+ * we do not miss events from ep_poll_callback if an
+ * event occurs immediately after we call f_op->poll().
+ * We need this because we did not take ep->lock while
+ * changing epi above (but ep_poll_callback does take
+ * ep->lock).
+ *
+ * 2) We also need to ensure we do not miss _past_ events
+ * when calling f_op->poll(). This barrier also
+ * pairs with the barrier in wq_has_sleeper (see
+ * comments for wq_has_sleeper).
+ *
+ * This barrier will now guarantee ep_poll_callback or f_op->poll
+ * (or both) will notice the readiness of an item.
+ */
+ smp_mb();
+
+ /*
* Get current event bits. We can safely use the file* here because
* its usage count has been increased by the caller of this function.
*/

Willy Tarreau

unread,

Jun 4, 2013, 7:00:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

commit 792039c73cf176c8e39a6e8beef2c94ff46522ed upstream.

The L2CAP code fails to initialize the l2_bdaddr_type member of struct
sockaddr_l2 and the padding byte added for alignment. It therefore leaks
two bytes of kernel stack via the getsockname() syscall. Add an explicit
memset(0) before filling the structure to avoid the info leak.
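
The pattern can be sketched in plain userspace C. Everything here is
illustrative: struct demo_addr is a hypothetical layout, not the real
struct sockaddr_l2, and demo_getname() is a made-up name. The point is only
that zeroing the whole object first covers any member the filler forgets.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout, loosely modelled on a socket address. */
struct demo_addr {
	unsigned short family;
	unsigned short psm;
	unsigned char  bdaddr[6];
	unsigned short cid;
	unsigned char  addr_type;	/* the member the filler never sets */
};

/* Fill the caller-visible structure, zeroing it first. */
void demo_getname(struct demo_addr *la)
{
	static const unsigned char peer[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };

	assert(la != NULL);
	memset(la, 0, sizeof(*la));	/* covers unset members and padding */
	la->family = 31;		/* arbitrary demo value */
	la->psm = 1;
	memcpy(la->bdaddr, peer, sizeof(la->bdaddr));
	la->cid = 64;
	/* addr_type is deliberately left alone, as in the unpatched path */
}
```

If the memset() is removed, addr_type keeps whatever the caller's buffer
held before the call, which is the kind of stale data the patch avoids
copying to userspace.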

Signed-off-by: Mathias Krause <min...@googlemail.com>

Cc: Marcel Holtmann <mar...@holtmann.org>
Cc: Gustavo Padovan <gus...@padovan.org>
Cc: Johan Hedberg <johan....@gmail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

[bwh: Backported to 2.6.32: adjust filename]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/bluetooth/l2cap.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/net/bluetooth/l2cap.c b/net/bluetooth/l2cap.c
index 71120ee..1c20bd9 100644
--- a/net/bluetooth/l2cap.c
+++ b/net/bluetooth/l2cap.c
@@ -1184,6 +1184,7 @@ static int l2cap_sock_getname(struct socket *sock, struct sockaddr *addr, int *l

BT_DBG("sock %p, sk %p", sock, sk);

+ memset(la, 0, sizeof(struct sockaddr_l2));
addr->sa_family = AF_BLUETOOTH;
*len = sizeof(struct sockaddr_l2);

Willy Tarreau

Jun 4, 2013, 7:00:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>

[ Upstream commit f4541d60a449afd40448b06496dcd510f505928e ]

A long standing problem with TSO is the fact that tcp_tso_should_defer()
rearms the deferred timer, while it should not.

The current code leads to the following bad bursty behavior:

20:11:24.484333 IP A > B: . 297161:316921(19760) ack 1 win 119
20:11:24.484337 IP B > A: . ack 263721 win 1117
20:11:24.485086 IP B > A: . ack 265241 win 1117
20:11:24.485925 IP B > A: . ack 266761 win 1117
20:11:24.486759 IP B > A: . ack 268281 win 1117
20:11:24.487594 IP B > A: . ack 269801 win 1117
20:11:24.488430 IP B > A: . ack 271321 win 1117
20:11:24.489267 IP B > A: . ack 272841 win 1117
20:11:24.490104 IP B > A: . ack 274361 win 1117
20:11:24.490939 IP B > A: . ack 275881 win 1117
20:11:24.491775 IP B > A: . ack 277401 win 1117
20:11:24.491784 IP A > B: . 316921:332881(15960) ack 1 win 119
20:11:24.492620 IP B > A: . ack 278921 win 1117
20:11:24.493448 IP B > A: . ack 280441 win 1117
20:11:24.494286 IP B > A: . ack 281961 win 1117
20:11:24.495122 IP B > A: . ack 283481 win 1117
20:11:24.495958 IP B > A: . ack 285001 win 1117
20:11:24.496791 IP B > A: . ack 286521 win 1117
20:11:24.497628 IP B > A: . ack 288041 win 1117
20:11:24.498459 IP B > A: . ack 289561 win 1117
20:11:24.499296 IP B > A: . ack 291081 win 1117
20:11:24.500133 IP B > A: . ack 292601 win 1117
20:11:24.500970 IP B > A: . ack 294121 win 1117
20:11:24.501388 IP B > A: . ack 295641 win 1117
20:11:24.501398 IP A > B: . 332881:351881(19000) ack 1 win 119

While the expected behavior is more like :

20:19:49.259620 IP A > B: . 197601:202161(4560) ack 1 win 119
20:19:49.260446 IP B > A: . ack 154281 win 1212
20:19:49.261282 IP B > A: . ack 155801 win 1212
20:19:49.262125 IP B > A: . ack 157321 win 1212
20:19:49.262136 IP A > B: . 202161:206721(4560) ack 1 win 119
20:19:49.262958 IP B > A: . ack 158841 win 1212
20:19:49.263795 IP B > A: . ack 160361 win 1212
20:19:49.264628 IP B > A: . ack 161881 win 1212
20:19:49.264637 IP A > B: . 206721:211281(4560) ack 1 win 119
20:19:49.265465 IP B > A: . ack 163401 win 1212
20:19:49.265886 IP B > A: . ack 164921 win 1212
20:19:49.266722 IP B > A: . ack 166441 win 1212
20:19:49.266732 IP A > B: . 211281:215841(4560) ack 1 win 119
20:19:49.267559 IP B > A: . ack 167961 win 1212
20:19:49.268394 IP B > A: . ack 169481 win 1212
20:19:49.269232 IP B > A: . ack 171001 win 1212
20:19:49.269241 IP A > B: . 215841:221161(5320) ack 1 win 119
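
The rearming problem can be reduced to a few lines. This is a sketch under
assumptions: struct demo_tp and the helper names are hypothetical, and the
jiffies value is passed in as a plain argument so the behavior is
deterministic. Only the "arm once" logic mirrors the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the one field of struct tcp_sock used here. */
struct demo_tp {
	unsigned long tso_deferred;	/* 0 = not armed, else 1 | (time << 1) */
};

/* Decide to defer; only arm the timestamp if it is not already armed. */
int demo_should_defer(struct demo_tp *tp, unsigned long now)
{
	assert(tp != NULL);
	if (!tp->tso_deferred)
		tp->tso_deferred = 1 | (now << 1);
	return 1;
}

/* Recover the time at which deferral started. */
unsigned long demo_deferred_since(const struct demo_tp *tp)
{
	return tp->tso_deferred >> 1;
}
```

Calling demo_should_defer() at time 100 and again at time 105 leaves the
start of the deferral at 100. With an unconditional assignment it would
move to 105 on every call, so the deferral would never appear to expire
while ACKs keep arriving.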

Signed-off-by: Eric Dumazet <edum...@google.com>
Cc: Yuchung Cheng <ych...@google.com>
Cc: Van Jacobson <va...@google.com>
Cc: Neal Cardwell <ncar...@google.com>
Cc: Nandita Dukkipati <nand...@google.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv4/tcp_output.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index af83bdf..38a23e4 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1391,8 +1391,11 @@ static int tcp_tso_should_defer(struct sock *sk, struct sk_buff *skb)
goto send_now;
}

- /* Ok, it looks like it is advisable to defer. */
- tp->tso_deferred = 1 | (jiffies << 1);
+ /* Ok, it looks like it is advisable to defer.
+ * Do not rearm the timer if already set to not break TCP ACK clocking.
+ */
+ if (!tp->tso_deferred)
+ tp->tso_deferred = 1 | (jiffies << 1);

return 1;

Willy Tarreau

Jun 4, 2013, 7:00:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>

In various network workloads, __do_softirq() latencies can be up
to 20 ms if HZ=1000, and 200 ms if HZ=100.

This is because we iterate 10 times in the softirq dispatcher,
and some actions can consume a lot of cycles.

This patch changes the condition for falling back to ksoftirqd to:

- A time limit of 2 ms.
- need_resched() being set on the current task.

When either of these conditions is met, we wake up ksoftirqd for further
softirq processing if we still have pending softirqs.

Using need_resched() as the only condition can trigger RCU stalls,
as we can keep BH disabled for too long.
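
The restart policy can be modelled outside the kernel as follows. This is a
simplified sketch, not __do_softirq(): demo_drain(), its tick-based clock
and the resched flag are all assumptions made for illustration. Each round
costs a fixed number of ticks, and the loop stops restarting once the
budget is used up or a reschedule is requested.

```c
#include <assert.h>
#include <stddef.h>

/* Wrap-safe "a is before b" comparison, in the spirit of time_before(). */
static int demo_time_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/*
 * Process pending rounds until none are left, the time budget is used up,
 * or a reschedule is requested. Returns the number of rounds left over
 * for the fallback worker (ksoftirqd in the real code).
 */
unsigned int demo_drain(unsigned int pending, unsigned long cost,
			unsigned long budget, const int *resched_flag)
{
	unsigned long now = 0;
	unsigned long end = now + budget;

	assert(resched_flag != NULL);
	while (pending) {
		now += cost;		/* one round of handlers runs */
		pending--;
		if (!pending)
			break;
		if (!(demo_time_before(now, end) && !*resched_flag))
			break;		/* hand the rest to the worker */
	}
	return pending;
}
```

With 10 pending rounds costing 1 tick each and a budget of 2 ticks, two
rounds run inline and 8 are left for the worker. Unlike a fixed restart
count, the amount of inline work shrinks automatically when individual
rounds are expensive.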

I ran several benchmarks and got no significant difference in
throughput, but a very significant reduction of latencies (one order
of magnitude) :

In the following benchmark, 200 antagonist "netperf -t TCP_RR" instances are
started in the background, using all available CPUs.

Then we start one "netperf -t TCP_RR", bound to the CPU handling the NIC
IRQ (hard+soft).

Before patch :

RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=550110.424
MIN_LATENCY=146858
MAX_LATENCY=997109
P50_LATENCY=305000
P90_LATENCY=550000
P99_LATENCY=710000
MEAN_LATENCY=376989.12
STDDEV_LATENCY=184046.92

After patch :

RT_LATENCY,MIN_LATENCY,MAX_LATENCY,P50_LATENCY,P90_LATENCY,P99_LATENCY,MEAN_LATENCY,STDDEV_LATENCY
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET
to 7.7.7.84 () port 0 AF_INET : first burst 0 : cpu bind
RT_LATENCY=40545.492
MIN_LATENCY=9834
MAX_LATENCY=78366
P50_LATENCY=33583
P90_LATENCY=59000
P99_LATENCY=69000
MEAN_LATENCY=38364.67
STDDEV_LATENCY=12865.26

Signed-off-by: Eric Dumazet <edum...@google.com>
Cc: David Miller <da...@davemloft.net>
Cc: Tom Herbert <ther...@google.com>
Cc: Ben Hutchings <bhutc...@solarflare.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

(cherry picked from commit c10d73671ad30f54692f7f69f0e09e75d3a8926a)

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/softirq.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/kernel/softirq.c b/kernel/softirq.c
index 04a0252..d75c136 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -194,21 +194,21 @@ void local_bh_enable_ip(unsigned long ip)
EXPORT_SYMBOL(local_bh_enable_ip);

/*
- * We restart softirq processing MAX_SOFTIRQ_RESTART times,
- * and we fall back to softirqd after that.
+ * We restart softirq processing for at most 2 ms,
+ * and if need_resched() is not set.
*
- * This number has been established via experimentation.
+ * These limits have been established via experimentation.
* The two things to balance is latency against fairness -
* we want to handle softirqs as soon as possible, but they
* should not be able to lock up the box.
*/
-#define MAX_SOFTIRQ_RESTART 10
+#define MAX_SOFTIRQ_TIME msecs_to_jiffies(2)

asmlinkage void __do_softirq(void)
{
struct softirq_action *h;
__u32 pending;
- int max_restart = MAX_SOFTIRQ_RESTART;
+ unsigned long end = jiffies + MAX_SOFTIRQ_TIME;
int cpu;

pending = local_softirq_pending();
@@ -253,11 +253,12 @@ restart:
local_irq_disable();

pending = local_softirq_pending();
- if (pending && --max_restart)
- goto restart;
+ if (pending) {
+ if (time_before(jiffies, end) && !need_resched())
+ goto restart;

- if (pending)
wakeup_softirqd();
+ }

lockdep_softirq_exit();

Willy Tarreau

Jun 4, 2013, 7:00:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

invalid

From: Jozsef Kadlecsik <kad...@blackhole.kfki.hu>

commit 07153c6ec074257ade76a461429b567cff2b3a1e upstream.

It was reported that the Linux kernel sometimes logs:

klogd: [2629147.402413] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 447!
klogd: [1072212.887368] kernel BUG at net / netfilter /
nf_conntrack_proto_tcp.c: 392

ipv4_get_l4proto() in nf_conntrack_l3proto_ipv4.c and tcp_error() in
nf_conntrack_proto_tcp.c should catch malformed packets, so the errors
at the indicated lines - TCP options parsing - should not happen.
However, tcp_error() relies on the "dataoff" offset to the TCP header,
calculated by ipv4_get_l4proto(). But ipv4_get_l4proto() does not check
bogus ihl values in IPv4 packets, which then can slip through tcp_error()
and get caught at the TCP options parsing routines.

The patch fixes ipv4_get_l4proto() by invalidating packets with bogus
ihl value.

The patch closes netfilter bugzilla id 771.
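
The check amounts to refusing any packet whose claimed header length points
past the end of the data. A minimal userspace sketch, assuming a raw byte
buffer instead of an skb (demo_l4_offset() and its return convention are
hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Compute the offset of the layer 4 header from the IPv4 ihl field.
 * Returns 0 and stores the offset on success, -1 if the header length
 * claimed by the packet does not fit inside the buffer.
 */
int demo_l4_offset(const unsigned char *pkt, size_t len, size_t nhoff,
		   size_t *dataoff)
{
	size_t off;

	assert(pkt != NULL && dataoff != NULL);
	if (nhoff >= len)
		return -1;
	/* ihl is the low nibble of the first byte, in 32-bit words */
	off = nhoff + ((size_t)(pkt[nhoff] & 0x0f) << 2);
	if (off > len)
		return -1;	/* bogus ihl: treat the packet as invalid */
	*dataoff = off;
	return 0;
}
```

For a 20-byte buffer, a first byte of 0x45 (ihl = 5) yields offset 20,
while 0x4f (ihl = 15, so 60 bytes) is rejected before any later parser can
read beyond the buffer.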

Signed-off-by: Jozsef Kadlecsik <kad...@blackhole.kfki.hu>
Signed-off-by: Pablo Neira Ayuso <pa...@netfilter.org>
Acked-by: David Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c | 8 ++++++++
1 file changed, 8 insertions(+)

diff --git a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
index 1032a15..c6437d5 100644
--- a/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
+++ b/net/ipv4/netfilter/nf_conntrack_l3proto_ipv4.c
@@ -83,6 +83,14 @@ static int ipv4_get_l4proto(const struct sk_buff *skb, unsigned int nhoff,
*dataoff = nhoff + (iph->ihl << 2);
*protonum = iph->protocol;

+ /* Check bogus IP headers */
+ if (*dataoff > skb->len) {
+ pr_debug("nf_conntrack_ipv4: bogus IPv4 packet: "
+ "nhoff %u, ihl %u, skblen %u\n",
+ nhoff, iph->ihl << 2, skb->len);
+ return -NF_ACCEPT;
+ }
+
return NF_ACCEPT;

Willy Tarreau

Jun 4, 2013, 7:00:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Stefan Hasko <hasko...@gmail.com>

[ Upstream commit d2fe85da52e89b8012ffad010ef352a964725d5f ]

Fixed integer overflow in function htb_dequeue
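
As an illustration of this class of bug, the sketch below multiplies in a
narrow type versus a 64-bit type. It is not the kernel expression: the
function names and the tick rate used in the example are assumptions, and
unsigned 32-bit arithmetic is used so that the wraparound is well defined
in C.

```c
#include <assert.h>
#include <stdint.h>

/* Multiplication carried out in 32 bits: the product wraps around. */
uint32_t demo_horizon_narrow(uint32_t ticks_per_sec)
{
	assert(ticks_per_sec != 0);
	return (uint32_t)(5u * ticks_per_sec);
}

/* A 64-bit constant forces the multiplication to be done in 64 bits. */
uint64_t demo_horizon_wide(uint32_t ticks_per_sec)
{
	assert(ticks_per_sec != 0);
	return 5ULL * ticks_per_sec;
}
```

With a hypothetical rate of 1000000000 ticks per second, the narrow version
yields 705032704 instead of 5000000000. Widening one operand, as the LLU
suffix does in the patch, is enough to make the whole product 64-bit.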

Signed-off-by: Stefan Hasko <hasko...@gmail.com>
Acked-by: Eric Dumazet <edum...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/sched/sch_htb.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sched/sch_htb.c b/net/sched/sch_htb.c
index 85acab9..2f074d6 100644
--- a/net/sched/sch_htb.c
+++ b/net/sched/sch_htb.c
@@ -865,7 +865,7 @@ static struct sk_buff *htb_dequeue(struct Qdisc *sch)
q->now = psched_get_time();
start_at = jiffies;

- next_event = q->now + 5 * PSCHED_TICKS_PER_SEC;
+ next_event = q->now + 5LLU * PSCHED_TICKS_PER_SEC;

for (level = 0; level < TC_HTB_MAXDEPTH; level++) {
/* common case optimization - skip event handler quickly */

Willy Tarreau

Jun 4, 2013, 7:00:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Wolfgang Frisch <wf...@roembden.net>

commit 1ee0a224bc9aad1de496c795f96bc6ba2c394811 upstream

The tty is NULL when the port is hanging up.
chase_port() needs to check for this.

This patch is intended for stable series.
The behavior was observed and tested in Linux 3.2 and 3.7.1.

Johan Hovold submitted a more elaborate patch for the mainline kernel.

[ 56.277883] usb 1-1: edge_bulk_in_callback - nonzero read bulk status received: -84
[ 56.278811] usb 1-1: USB disconnect, device number 3
[ 56.278856] usb 1-1: edge_bulk_in_callback - stopping read!
[ 56.279562] BUG: unable to handle kernel NULL pointer dereference at 00000000000001c8
[ 56.280536] IP: [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[ 56.281212] PGD 1dc1b067 PUD 1e0f7067 PMD 0
[ 56.282085] Oops: 0002 [#1] SMP
[ 56.282744] Modules linked in:
[ 56.283512] CPU 1
[ 56.283512] Pid: 25, comm: khubd Not tainted 3.7.1 #1 innotek GmbH VirtualBox/VirtualBox
[ 56.283512] RIP: 0010:[<ffffffff8144e62a>] [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[ 56.283512] RSP: 0018:ffff88001fa99ab0 EFLAGS: 00010046
[ 56.283512] RAX: 0000000000000046 RBX: 00000000000001c8 RCX: 0000000000640064
[ 56.283512] RDX: 0000000000010000 RSI: ffff88001fa99b20 RDI: 00000000000001c8
[ 56.283512] RBP: ffff88001fa99b20 R08: 0000000000000000 R09: 0000000000000000
[ 56.283512] R10: 0000000000000000 R11: ffffffff812fcb4c R12: ffff88001ddf53c0
[ 56.283512] R13: 0000000000000000 R14: 00000000000001c8 R15: ffff88001e19b9f4
[ 56.283512] FS: 0000000000000000(0000) GS:ffff88001fd00000(0000) knlGS:0000000000000000
[ 56.283512] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 56.283512] CR2: 00000000000001c8 CR3: 000000001dc51000 CR4: 00000000000006e0
[ 56.283512] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 56.283512] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 56.283512] Process khubd (pid: 25, threadinfo ffff88001fa98000, task ffff88001fa94f80)
[ 56.283512] Stack:
[ 56.283512] 0000000000000046 00000000000001c8 ffffffff810578ec ffffffff812fcb4c
[ 56.283512] ffff88001e19b980 0000000000002710 ffffffff812ffe81 0000000000000001
[ 56.283512] ffff88001fa94f80 0000000000000202 ffffffff00000001 0000000000000296
[ 56.283512] Call Trace:
[ 56.283512] [<ffffffff810578ec>] ? add_wait_queue+0x12/0x3c
[ 56.283512] [<ffffffff812fcb4c>] ? usb_serial_port_work+0x28/0x28
[ 56.283512] [<ffffffff812ffe81>] ? chase_port+0x84/0x2d6
[ 56.283512] [<ffffffff81063f27>] ? try_to_wake_up+0x199/0x199
[ 56.283512] [<ffffffff81263a5c>] ? tty_ldisc_hangup+0x222/0x298
[ 56.283512] [<ffffffff81300171>] ? edge_close+0x64/0x129
[ 56.283512] [<ffffffff810612f7>] ? __wake_up+0x35/0x46
[ 56.283512] [<ffffffff8106135b>] ? should_resched+0x5/0x23
[ 56.283512] [<ffffffff81264916>] ? tty_port_shutdown+0x39/0x44
[ 56.283512] [<ffffffff812fcb4c>] ? usb_serial_port_work+0x28/0x28
[ 56.283512] [<ffffffff8125d38c>] ? __tty_hangup+0x307/0x351
[ 56.283512] [<ffffffff812e6ddc>] ? usb_hcd_flush_endpoint+0xde/0xed
[ 56.283512] [<ffffffff8144e625>] ? _raw_spin_lock_irqsave+0x14/0x35
[ 56.283512] [<ffffffff812fd361>] ? usb_serial_disconnect+0x57/0xc2
[ 56.283512] [<ffffffff812ea99b>] ? usb_unbind_interface+0x5c/0x131
[ 56.283512] [<ffffffff8128d738>] ? __device_release_driver+0x7f/0xd5
[ 56.283512] [<ffffffff8128d9cd>] ? device_release_driver+0x1a/0x25
[ 56.283512] [<ffffffff8128d393>] ? bus_remove_device+0xd2/0xe7
[ 56.283512] [<ffffffff8128b7a3>] ? device_del+0x119/0x167
[ 56.283512] [<ffffffff812e8d9d>] ? usb_disable_device+0x6a/0x180
[ 56.283512] [<ffffffff812e2ae0>] ? usb_disconnect+0x81/0xe6
[ 56.283512] [<ffffffff812e4435>] ? hub_thread+0x577/0xe82
[ 56.283512] [<ffffffff8144daa7>] ? __schedule+0x490/0x4be
[ 56.283512] [<ffffffff8105798f>] ? abort_exclusive_wait+0x79/0x79
[ 56.283512] [<ffffffff812e3ebe>] ? usb_remote_wakeup+0x2f/0x2f
[ 56.283512] [<ffffffff812e3ebe>] ? usb_remote_wakeup+0x2f/0x2f
[ 56.283512] [<ffffffff810570b4>] ? kthread+0x81/0x89
[ 56.283512] [<ffffffff81057033>] ? __kthread_parkme+0x5c/0x5c
[ 56.283512] [<ffffffff8145387c>] ? ret_from_fork+0x7c/0xb0
[ 56.283512] [<ffffffff81057033>] ? __kthread_parkme+0x5c/0x5c
[ 56.283512] Code: 8b 7c 24 08 e8 17 0b c3 ff 48 8b 04 24 48 83 c4 10 c3 53 48 89 fb 41 50 e8 e0 0a c3 ff 48 89 04 24 e8 e7 0a c3 ff ba 00 00 01 00
<f0> 0f c1 13 48 8b 04 24 89 d1 c1 ea 10 66 39 d1 74 07 f3 90 66
[ 56.283512] RIP [<ffffffff8144e62a>] _raw_spin_lock_irqsave+0x19/0x35
[ 56.283512] RSP <ffff88001fa99ab0>
[ 56.283512] CR2: 00000000000001c8
[ 56.283512] ---[ end trace 49714df27e1679ce ]---

Signed-off-by: Wolfgang Frisch <wf...@roembden.net>
Cc: Johan Hovold <jho...@gmail.com>
Cc: stable <sta...@vger.kernel.org>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/serial/io_ti.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/usb/serial/io_ti.c b/drivers/usb/serial/io_ti.c
index 14d51e6..cf515f0 100644
--- a/drivers/usb/serial/io_ti.c
+++ b/drivers/usb/serial/io_ti.c
@@ -574,6 +574,9 @@ static void chase_port(struct edgeport_port *port, unsigned long timeout,
wait_queue_t wait;
unsigned long flags;

+ if (!tty)
+ return;
+
if (!timeout)
timeout = (HZ * EDGE_CLOSING_WAIT)/100;

Willy Tarreau

Jun 4, 2013, 7:00:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

[ Upstream commit 7b789836f434c87168eab067cfbed1ec4783dffd ]

The memory reserved to dump the xfrm policy includes multiple padding
bytes added by the compiler for alignment (padding bytes in struct
xfrm_selector and struct xfrm_userpolicy_info). Add an explicit
memset(0) before filling the buffer to avoid the heap info leak.
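
To see where such padding comes from, consider this small userspace sketch.
struct demo_sel is a hypothetical layout, not struct xfrm_selector, and the
exact amount of padding depends on the ABI. The helper compares the size of
the structure with the sum of its members, and the copy routine zeroes the
destination before filling it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout with mixed member sizes; the ABI decides padding. */
struct demo_sel {
	uint8_t  proto;
	uint32_t ifindex;
	uint16_t family;
};

/* Bytes in the object that belong to no member. */
size_t demo_padding_bytes(void)
{
	return sizeof(struct demo_sel)
		- (sizeof(uint8_t) + sizeof(uint32_t) + sizeof(uint16_t));
}

/* Zero the destination first so padding never carries old heap data. */
void demo_copy_out(struct demo_sel *p, uint8_t proto, uint32_t ifindex,
		   uint16_t family)
{
	assert(p != NULL);
	memset(p, 0, sizeof(*p));
	p->proto = proto;
	p->ifindex = ifindex;
	p->family = family;
}
```

Member assignments and per-member memcpy() calls never touch those extra
bytes, which is why a single memset() of the whole object is the usual
remedy.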

Signed-off-by: Mathias Krause <min...@googlemail.com>
Acked-by: Steffen Klassert <steffen....@secunet.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/xfrm/xfrm_user.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index 4823a15..3de81fe 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1076,6 +1076,7 @@ static void copy_from_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy

static void copy_to_user_policy(struct xfrm_policy *xp, struct xfrm_userpolicy_info *p, int dir)
{
+ memset(p, 0, sizeof(*p));
memcpy(&p->sel, &xp->selector, sizeof(p->sel));
memcpy(&p->lft, &xp->lft, sizeof(p->lft));
memcpy(&p->curlft, &xp->curlft, sizeof(p->curlft));

Willy Tarreau

Jun 4, 2013, 7:00:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Kevin Dankwardt <k...@kcomputing.com>

commit eeb5b4ae81f4a750355fa0c15f4fea22fdf83be1 upstream.

I found that the length of a file name when created cannot exceed 255
characters, yet, pathconf(), via statfs(), returns the maximum as 260.

Signed-off-by: Kevin Dankwardt <k...@kcomputing.com>
Signed-off-by: OGAWA Hirofumi <hiro...@mail.parknet.co.jp>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/fat/inode.c | 2 +-
fs/fat/namei_vfat.c | 6 +++---
include/linux/msdos_fs.h | 3 ++-
3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/fs/fat/inode.c b/fs/fat/inode.c
index 76b7961..c187e92 100644
--- a/fs/fat/inode.c
+++ b/fs/fat/inode.c
@@ -558,7 +558,7 @@ static int fat_statfs(struct dentry *dentry, struct kstatfs *buf)
buf->f_bavail = sbi->free_clusters;
buf->f_fsid.val[0] = (u32)id;
buf->f_fsid.val[1] = (u32)(id >> 32);
- buf->f_namelen = sbi->options.isvfat ? 260 : 12;
+ buf->f_namelen = sbi->options.isvfat ? FAT_LFN_LEN : 12;

return 0;
}
diff --git a/fs/fat/namei_vfat.c b/fs/fat/namei_vfat.c
index 72646e2..67b3df1 100644
--- a/fs/fat/namei_vfat.c
+++ b/fs/fat/namei_vfat.c
@@ -502,14 +502,14 @@ xlate_to_uni(const unsigned char *name, int len, unsigned char *outname,

*outlen = utf8s_to_utf16s(name, len, (wchar_t *)outname);

if (*outlen < 0)
return *outlen;

- else if (*outlen > 255)
+ else if (*outlen > FAT_LFN_LEN)
return -ENAMETOOLONG;

op = &outname[*outlen * sizeof(wchar_t)];
} else {
if (nls) {
for (i = 0, ip = name, op = outname, *outlen = 0;
- i < len && *outlen <= 255;
+ i < len && *outlen <= FAT_LFN_LEN;
*outlen += 1)
{
if (escape && (*ip == ':')) {
@@ -549,7 +549,7 @@ xlate_to_uni(const unsigned char *name, int len, unsigned char *outname,
return -ENAMETOOLONG;
} else {
for (i = 0, ip = name, op = outname, *outlen = 0;
- i < len && *outlen <= 255;
+ i < len && *outlen <= FAT_LFN_LEN;
i++, *outlen += 1)
{
*op++ = *ip++;
diff --git a/include/linux/msdos_fs.h b/include/linux/msdos_fs.h
index ce38f1c..34066e6 100644
--- a/include/linux/msdos_fs.h
+++ b/include/linux/msdos_fs.h
@@ -15,6 +15,7 @@
#define MSDOS_DPB_BITS 4 /* log2(MSDOS_DPB) */
#define MSDOS_DPS (SECTOR_SIZE / sizeof(struct msdos_dir_entry))
#define MSDOS_DPS_BITS 4 /* log2(MSDOS_DPS) */
+#define MSDOS_LONGNAME 256 /* maximum name length */
#define CF_LE_W(v) le16_to_cpu(v)
#define CF_LE_L(v) le32_to_cpu(v)
#define CT_LE_W(v) cpu_to_le16(v)
@@ -47,8 +48,8 @@
#define DELETED_FLAG 0xe5 /* marks file as deleted when in name[0] */
#define IS_FREE(n) (!*(n) || *(n) == DELETED_FLAG)

+#define FAT_LFN_LEN 255 /* maximum long name length */
#define MSDOS_NAME 11 /* maximum name length */
-#define MSDOS_LONGNAME 256 /* maximum name length */
#define MSDOS_SLOTS 21 /* max # of slots for short and long names */
#define MSDOS_DOT ". " /* ".", padded to MSDOS_NAME chars */
#define MSDOS_DOTDOT ".. " /* "..", padded to MSDOS_NAME chars */

Willy Tarreau

Jun 4, 2013, 7:00:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.ca...@oracle.com>

[Not needed in 3.8 or newer as this driver is removed there. - gregkh]

We get this from user space and nothing has been done to ensure that
these strings are NUL terminated.
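
For illustration, here is a userspace sketch of a bounded copy from a
fixed-size field that may lack a terminator. It does not use strlcpy(),
which is not available in every userspace C library and which still
measures its source with strlen(). Instead it uses memchr() and memcpy();
demo_copy_field() is a hypothetical helper, not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy at most dstsz - 1 bytes from a fixed-size source field that may
 * not be NUL terminated, and always terminate the destination.
 * Returns the number of bytes copied, not counting the terminator.
 */
size_t demo_copy_field(char *dst, size_t dstsz, const char *src, size_t srcsz)
{
	const char *end;
	size_t n;

	assert(dst != NULL && src != NULL && dstsz > 0);
	end = memchr(src, '\0', srcsz);		/* never reads past the field */
	n = end ? (size_t)(end - src) : srcsz;
	if (n >= dstsz)
		n = dstsz - 1;			/* truncate instead of overflowing */
	memcpy(dst, src, n);
	dst[n] = '\0';
	return n;
}
```

Bounding both the read and the write means an unterminated 4-byte field
copied into a 3-byte buffer ends up as a 2-character string rather than
running off either array.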

Reported-by: Chen Gang <gang...@asianux.com>
Signed-off-by: Dan Carpenter <dan.ca...@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/telephony/ixj.c | 24 ++++++++++++------------
1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/drivers/telephony/ixj.c b/drivers/telephony/ixj.c
index 40de151..56eb6cc 100644
--- a/drivers/telephony/ixj.c
+++ b/drivers/telephony/ixj.c
@@ -3190,12 +3190,12 @@ static void ixj_write_cid(IXJ *j)

ixj_fsk_alloc(j);

- strcpy(sdmf1, j->cid_send.month);
- strcat(sdmf1, j->cid_send.day);
- strcat(sdmf1, j->cid_send.hour);
- strcat(sdmf1, j->cid_send.min);
- strcpy(sdmf2, j->cid_send.number);
- strcpy(sdmf3, j->cid_send.name);
+ strlcpy(sdmf1, j->cid_send.month, sizeof(sdmf1));
+ strlcat(sdmf1, j->cid_send.day, sizeof(sdmf1));
+ strlcat(sdmf1, j->cid_send.hour, sizeof(sdmf1));
+ strlcat(sdmf1, j->cid_send.min, sizeof(sdmf1));
+ strlcpy(sdmf2, j->cid_send.number, sizeof(sdmf2));
+ strlcpy(sdmf3, j->cid_send.name, sizeof(sdmf3));

len1 = strlen(sdmf1);
len2 = strlen(sdmf2);
@@ -3340,12 +3340,12 @@ static void ixj_write_cidcw(IXJ *j)
ixj_pre_cid(j);
}
j->flags.cidcw_ack = 0;
- strcpy(sdmf1, j->cid_send.month);
- strcat(sdmf1, j->cid_send.day);
- strcat(sdmf1, j->cid_send.hour);
- strcat(sdmf1, j->cid_send.min);
- strcpy(sdmf2, j->cid_send.number);
- strcpy(sdmf3, j->cid_send.name);
+ strlcpy(sdmf1, j->cid_send.month, sizeof(sdmf1));
+ strlcat(sdmf1, j->cid_send.day, sizeof(sdmf1));
+ strlcat(sdmf1, j->cid_send.hour, sizeof(sdmf1));
+ strlcat(sdmf1, j->cid_send.min, sizeof(sdmf1));
+ strlcpy(sdmf2, j->cid_send.number, sizeof(sdmf2));
+ strlcpy(sdmf3, j->cid_send.name, sizeof(sdmf3));

len1 = strlen(sdmf1);
len2 = strlen(sdmf2);

Willy Tarreau

Jun 4, 2013, 7:00:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

commit 29cd8ae0e1a39e239a3a7b67da1986add1199fc0 upstream.

The dcb netlink interface leaks stack memory in various places:
* perm_addr[] buffer is only filled at max with 12 of the 32 bytes but
copied completely,
* no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand,
so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes
for ieee_pfc structs, etc.,
* the same is true for CEE -- no in-kernel driver fills the whole
struct,

Prevent all of the above stack info leaks by properly initializing the
buffers/structures involved.
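
The perm_addr[] case can be reproduced in miniature. In this sketch the
buffer size, the 6-byte address and both function names are assumptions for
illustration. A "driver" callback fills only part of a stack buffer, and
the whole buffer is then copied out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_ADDR_BUF 32

/* Hypothetical driver callback: writes only the first 6 bytes. */
static void demo_driver_getpermhwaddr(unsigned char *buf)
{
	static const unsigned char mac[6] = { 0x02, 0x00, 0x00, 0xaa, 0xbb, 0xcc };

	memcpy(buf, mac, sizeof(mac));
}

/* Build the reply payload; the whole buffer is copied to the caller. */
size_t demo_build_reply(unsigned char *out)
{
	unsigned char perm_addr[DEMO_ADDR_BUF];

	assert(out != NULL);
	memset(perm_addr, 0, sizeof(perm_addr));	/* the fix */
	demo_driver_getpermhwaddr(perm_addr);
	memcpy(out, perm_addr, sizeof(perm_addr));
	return sizeof(perm_addr);
}
```

Without the memset(), bytes 6 to 31 of the reply would be whatever happened
to be on the stack at that moment.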

Signed-off-by: Mathias Krause <min...@googlemail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

[bwh: Backported to 2.6.32: no support for IEEE or CEE commands, so only
deal with perm_addr]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/dcb/dcbnl.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/net/dcb/dcbnl.c b/net/dcb/dcbnl.c
index ac1205d..813fe4b 100644
--- a/net/dcb/dcbnl.c
+++ b/net/dcb/dcbnl.c
@@ -307,6 +307,7 @@ static int dcbnl_getperm_hwaddr(struct net_device *netdev, struct nlattr **tb,
dcb->dcb_family = AF_UNSPEC;
dcb->cmd = DCB_CMD_GPERM_HWADDR;

+ memset(perm_addr, 0, sizeof(perm_addr));
netdev->dcbnl_ops->getpermhwaddr(netdev, perm_addr);

ret = nla_put(dcbnl_skb, DCB_ATTR_PERM_HWADDR, sizeof(perm_addr),

Willy Tarreau

Jun 4, 2013, 7:00:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Weiping Pan <wp...@redhat.com>

commit 06b6a1cf6e776426766298d055bb3991957d90a7 upstream

Jay Fenlason (fenl...@redhat.com) found a bug:
recvfrom() on an RDS socket can return the contents of random kernel
memory to userspace if it is called with an address length larger than
sizeof(struct sockaddr_in).
rds_recvmsg() also fails to set the addr_len parameter properly before
returning, but that's just a bug.
There are also a number of cases where recvfrom() can return an entirely bogus
address. Anything in rds_recvmsg() that returns a non-negative value but does
not go through the "sin = (struct sockaddr_in *)msg->msg_name;" code path
at the end of the while(1) loop will return up to 128 bytes of kernel memory
to userspace.

I wrote two test programs to reproduce this bug. In rds_server, fromAddr
is overwritten and the adjacent sock_fd is corrupted.
Yes, it is the programmer's fault to set msg_namelen incorrectly, but it is
better to make the kernel copy the real length of the address to user space in
such a case.
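
The shape of the fix can be sketched without any socket code. All names
here (struct demo_msg, struct demo_addr4, demo_recv()) are hypothetical
stand-ins. The receive routine resets the name length on entry and only
sets it to the real address size on the path that actually writes an
address.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for struct msghdr and struct sockaddr_in. */
struct demo_msg {
	void   *name;
	size_t  namelen;
};

struct demo_addr4 {
	unsigned short family;
	unsigned short port;
	unsigned int   addr;
	unsigned char  zero[8];
};

/* Returns 1 if a packet was delivered, 0 otherwise. */
int demo_recv(struct demo_msg *msg, int have_packet)
{
	assert(msg != NULL);
	msg->namelen = 0;		/* nothing valid to copy back yet */
	if (!have_packet)
		return 0;
	if (msg->name) {
		struct demo_addr4 *sin = msg->name;

		memset(sin, 0, sizeof(*sin));
		sin->family = 2;	/* arbitrary demo values */
		sin->port = 4000;
		sin->addr = 0x7f000001;
		msg->namelen = sizeof(*sin);	/* the real address length */
	}
	return 1;
}
```

Whatever length the caller passed in, the value seen on return is either 0
or the size of the address that was really written, so the layer copying
the name back to userspace has no reason to copy more than that.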

How to run the test programs:
I tested them on a 32-bit x86 system running 3.5.0-rc7.

1. Compile:
gcc -o rds_client rds_client.c
gcc -o rds_server rds_server.c

2. Run ./rds_server on one console.

3. Run ./rds_client on another console.

4. You will see something like:
server is waiting to receive data...
old socket fd=3
server received data from client:data from client
msg.msg_namelen=32
new socket fd=-1067277685
sendmsg()
: Bad file descriptor

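/***************** rds_client.c ********************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <arpa/inet.h>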

int main(void)
{
int sock_fd;
struct sockaddr_in serverAddr;
struct sockaddr_in toAddr;
char recvBuffer[128] = "data from client";
struct msghdr msg;
struct iovec iov;

sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
if (sock_fd < 0) {
perror("create socket error\n");
exit(1);
}

memset(&serverAddr, 0, sizeof(serverAddr));
serverAddr.sin_family = AF_INET;
serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
serverAddr.sin_port = htons(4001);

if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
perror("bind() error\n");
close(sock_fd);
exit(1);
}

memset(&toAddr, 0, sizeof(toAddr));
toAddr.sin_family = AF_INET;
toAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
toAddr.sin_port = htons(4000);
msg.msg_name = &toAddr;
msg.msg_namelen = sizeof(toAddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = strlen(recvBuffer) + 1;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;

if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendto() error\n");
close(sock_fd);
exit(1);
}

printf("client send data:%s\n", recvBuffer);

memset(recvBuffer, '\0', 128);

msg.msg_name = &toAddr;
msg.msg_namelen = sizeof(toAddr);
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}

printf("receive data from server:%s\n", recvBuffer);

close(sock_fd);

return 0;
}

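/***************** rds_server.c ********************/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <netinet/in.h>
#include <arpa/inet.h>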

int main(void)
{
struct sockaddr_in fromAddr;
int sock_fd;
struct sockaddr_in serverAddr;
unsigned int addrLen;
char recvBuffer[128];
struct msghdr msg;
struct iovec iov;

sock_fd = socket(AF_RDS, SOCK_SEQPACKET, 0);
if(sock_fd < 0) {
perror("create socket error\n");
exit(0);
}

memset(&serverAddr, 0, sizeof(serverAddr));
serverAddr.sin_family = AF_INET;
serverAddr.sin_addr.s_addr = inet_addr("127.0.0.1");
serverAddr.sin_port = htons(4000);
if (bind(sock_fd, (struct sockaddr*)&serverAddr, sizeof(serverAddr)) < 0) {
perror("bind error\n");
close(sock_fd);
exit(1);
}

printf("server is waiting to receive data...\n");
msg.msg_name = &fromAddr;

/*
* I add 16 to sizeof(fromAddr), ie 32,
* and pay attention to the definition of fromAddr,
* recvmsg() will overwrite sock_fd,
* since kernel will copy 32 bytes to userspace.
*
* If you just use sizeof(fromAddr), it works fine.
* */
msg.msg_namelen = sizeof(fromAddr) + 16;
/* msg.msg_namelen = sizeof(fromAddr); */
msg.msg_iov = &iov;
msg.msg_iovlen = 1;
msg.msg_iov->iov_base = recvBuffer;
msg.msg_iov->iov_len = 128;
msg.msg_control = 0;
msg.msg_controllen = 0;
msg.msg_flags = 0;

while (1) {
printf("old socket fd=%d\n", sock_fd);
if (recvmsg(sock_fd, &msg, 0) == -1) {
perror("recvmsg() error\n");
close(sock_fd);
exit(1);
}
printf("server received data from client:%s\n", recvBuffer);
printf("msg.msg_namelen=%d\n", msg.msg_namelen);
printf("new socket fd=%d\n", sock_fd);
strcat(recvBuffer, "--data from server");
if (sendmsg(sock_fd, &msg, 0) == -1) {
perror("sendmsg()\n");
close(sock_fd);
exit(1);
}
}

close(sock_fd);
return 0;
}

Signed-off-by: Weiping Pan <wp...@redhat.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

[dannf: Adjusted to apply to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/rds/recv.c | 3 +++

1 file changed, 3 insertions(+)

diff --git a/net/rds/recv.c b/net/rds/recv.c
index 6a2654a..c45a881c 100644
--- a/net/rds/recv.c
+++ b/net/rds/recv.c
@@ -410,6 +410,8 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,

rdsdebug("size %zu flags 0x%x timeo %ld\n", size, msg_flags, timeo);

+ msg->msg_namelen = 0;
+
if (msg_flags & MSG_OOB)
goto out;

@@ -486,6 +488,7 @@ int rds_recvmsg(struct kiocb *iocb, struct socket *sock, struct msghdr *msg,
sin->sin_port = inc->i_hdr.h_sport;
sin->sin_addr.s_addr = inc->i_saddr;
memset(sin->sin_zero, 0, sizeof(sin->sin_zero));
+ msg->msg_namelen = sizeof(*sin);
}
break;

Willy Tarreau

Jun 4, 2013, 7:00:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Cong Wang <xiyou.w...@gmail.com>

[ Upstream commit c9be4a5c49cf51cc70a993f004c5bb30067a65ce ]

A regression is introduced by the following commit:

commit 4d52cfbef6266092d535237ba5a4b981458ab171
Author: Eric Dumazet <eric.d...@gmail.com>
Date: Tue Jun 2 00:42:16 2009 -0700

net: ipv4/ip_sockglue.c cleanups

Pure cleanups

but it is not a pure cleanup...

- if (val != -1 && (val < 1 || val>255))
+ if (val != -1 && (val < 0 || val > 255))

Since there is no reason provided to allow ttl=0, change it back.

Reported-by: nitin padalia <padali...@gmail.com>
Cc: nitin padalia <padali...@gmail.com>
Cc: Eric Dumazet <eric.d...@gmail.com>
Cc: David S. Miller <da...@davemloft.net>
Signed-off-by: Cong Wang <xiyou.w...@gmail.com>
Acked-by: Eric Dumazet <edum...@google.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv4/ip_sockglue.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c
index e982b5c..184a7ad 100644
--- a/net/ipv4/ip_sockglue.c
+++ b/net/ipv4/ip_sockglue.c
@@ -563,7 +563,7 @@ static int do_ip_setsockopt(struct sock *sk, int level,
case IP_TTL:
if (optlen < 1)
goto e_inval;
- if (val != -1 && (val < 0 || val > 255))
+ if (val != -1 && (val < 1 || val > 255))
goto e_inval;
inet->uc_ttl = val;
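
The range check restored by the hunk above can be modeled in user space. This is a minimal sketch for illustration only: `ip_ttl_valid()` and `ip_ttl_demo()` are hypothetical names, not kernel functions. The condition is the logical negation of the `goto e_inval` test, with -1 meaning "use the default TTL".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper mirroring the IP_TTL check in do_ip_setsockopt():
 * -1 selects the default TTL, otherwise only 1..255 is accepted. */
bool ip_ttl_valid(int val)
{
	return val == -1 || (val >= 1 && val <= 255);
}

/* Usage: ttl=0 is rejected again, as it was before the cleanup commit. */
void ip_ttl_demo(void)
{
	assert(ip_ttl_valid(-1));
	assert(!ip_ttl_valid(0));
	assert(ip_ttl_valid(255));
}
```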

Willy Tarreau

Jun 4, 2013, 7:00:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

[ Upstream commit 3592aaeb80290bda0f2cf0b5456c97bfc638b192 ]

The LLC code wrongly returns 0, i.e. "success", when the socket is
zapped. Together with the uninitialized uaddrlen pointer argument from
sys_getsockname this leads to an arbitrary memory leak of up to 128
bytes of kernel stack via the getsockname() syscall.

Return an error instead when the socket is zapped to prevent the info
leak. Also remove the unnecessary memset(0). We don't directly write to
the memory pointed to by uaddr but memcpy() a local structure at the end
of the function that is properly initialized.

Signed-off-by: Mathias Krause <min...@googlemail.com>
Cc: Arnaldo Carvalho de Melo <ac...@ghostprotocols.net>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/llc/af_llc.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 2da8d14..606b6ad 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -912,14 +912,13 @@ static int llc_ui_getname(struct socket *sock, struct sockaddr *uaddr,
struct sockaddr_llc sllc;
struct sock *sk = sock->sk;
struct llc_sock *llc = llc_sk(sk);
- int rc = 0;
+ int rc = -EBADF;

memset(&sllc, 0, sizeof(sllc));
lock_sock(sk);
if (sock_flag(sk, SOCK_ZAPPED))
goto out;
*uaddrlen = sizeof(sllc);
- memset(uaddr, 0, *uaddrlen);
if (peer) {
rc = -ENOTCONN;
if (sk->sk_state != TCP_ESTABLISHED)
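
The "default to failure" pattern used by the hunk above can be sketched in user space as follows. This is a simplified model under stated assumptions: `struct llc_sock_model`, `getname_model()` and the length value 16 are placeholders invented for illustration, and locking is omitted. The point is only that an early exit no longer reports success while leaving the caller's length untouched.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for the socket state the real code inspects. */
struct llc_sock_model {
	int zapped;
	int established;
};

int getname_model(const struct llc_sock_model *sk, int peer, int *addrlen)
{
	int rc = -EBADF;	/* default is failure, as in the patched code */

	if (sk->zapped)
		goto out;	/* *addrlen stays untouched, but rc now says so */
	*addrlen = 16;		/* placeholder for sizeof(struct sockaddr_llc) */
	if (peer && !sk->established) {
		rc = -ENOTCONN;
		goto out;
	}
	rc = 0;
out:
	return rc;
}
```

A caller that checks the return value before using `*addrlen` then never copies an uninitialized length.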

Willy Tarreau

Jun 4, 2013, 7:00:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Bernd Schubert <bernd.s...@itwm.fraunhofer.de>

commit 6a08f447facb4f9e29fcc30fb68060bb5a0d21c2 upstream.

ext4_special_inode_operations already has its own #ifdef
CONFIG_EXT4_FS_XATTR to mask the xattr methods, and ext4_iget always
sets inode->i_op to it unconditionally, so the extra #ifdef removed
here is inconsistent.

Signed-off-by: Bernd Schubert <bernd.s...@itwm.fraunhofer.de>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/namei.c | 2 --
1 file changed, 2 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 902f69b..828c9c9 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1828,9 +1828,7 @@ retry:
err = PTR_ERR(inode);
if (!IS_ERR(inode)) {
init_special_inode(inode, inode->i_mode, rdev);
-#ifdef CONFIG_EXT4_FS_XATTR
inode->i_op = &ext4_special_inode_operations;
-#endif
err = ext4_add_nondir(handle, dentry, inode);
}
ext4_journal_stop(handle);

Willy Tarreau

Jun 4, 2013, 7:00:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

commit fe685aabf7c8c9f138e5ea900954d295bf229175 upstream.

For type 1 the parent_offset member in struct isofs_fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <min...@googlemail.com>
Signed-off-by: Jan Kara <ja...@suse.cz>
Cc: Ben Hutchings <b...@decadent.org.uk>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/isofs/export.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/fs/isofs/export.c b/fs/isofs/export.c
index e81a305..caec670 100644
--- a/fs/isofs/export.c
+++ b/fs/isofs/export.c
@@ -131,6 +131,7 @@ isofs_export_encode_fh(struct dentry *dentry,
len = 3;
fh32[0] = ei->i_iget5_block;
fh16[2] = (__u16)ei->i_iget5_offset; /* fh16 [sic] */
+ fh16[3] = 0; /* avoid leaking uninitialized data */
fh32[2] = inode->i_generation;
if (connectable && !S_ISDIR(inode->i_mode)) {
struct inode *parent;
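
To see why the extra assignment matters, the file handle can be modeled as three 32-bit words that are also addressed as 16-bit halves. The sketch below is an assumption-laden illustration: `union fh_model`, `encode_fh_model()` and the sample values are hypothetical, and a union stands in for the pointer cast the kernel uses. Only one half of the middle word carries the offset, so the other half must be cleared explicitly.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Three 32-bit words, also addressable as six 16-bit halves. */
union fh_model {
	uint32_t fh32[3];
	uint16_t fh16[6];
};

void encode_fh_model(union fh_model *fh, uint32_t block,
		     uint16_t offset, uint32_t generation)
{
	fh->fh32[0] = block;
	fh->fh16[2] = offset;
	fh->fh16[3] = 0;	/* the added line: no stale bytes in this slot */
	fh->fh32[2] = generation;
}

void encode_fh_demo(void)
{
	union fh_model fh;

	memset(&fh, 0xAA, sizeof(fh));	/* stand-in for stale data */
	encode_fh_model(&fh, 1234, 56, 7);
	assert(fh.fh16[3] == 0);
}
```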

Willy Tarreau

Jun 4, 2013, 7:00:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

commit 0143fc5e9f6f5aad4764801015bc8d4b4a278200 upstream.

For type 0x51 the udf.parent_partref member in struct fid gets copied
uninitialized to userland. Fix this by initializing it to 0.

Signed-off-by: Mathias Krause <min...@googlemail.com>
Signed-off-by: Jan Kara <ja...@suse.cz>
Cc: Ben Hutchings <b...@decadent.org.uk>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/udf/namei.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/fs/udf/namei.c b/fs/udf/namei.c
index 21dad8c..b754151 100644
--- a/fs/udf/namei.c
+++ b/fs/udf/namei.c
@@ -1331,6 +1331,7 @@ static int udf_encode_fh(struct dentry *de, __u32 *fh, int *lenp,
*lenp = 3;
fid->udf.block = location.logicalBlockNum;
fid->udf.partref = location.partitionReferenceNum;
+ fid->udf.parent_partref = 0;
fid->udf.generation = inode->i_generation;

if (connectable && !S_ISDIR(inode->i_mode)) {

Willy Tarreau

Jun 4, 2013, 7:00:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jesper Dangaard Brouer <bro...@redhat.com>

commit 8f363b77ee4fbf7c3bbcf5ec2c5ca482d396d664 upstream

Reading TCP stats when using TCP Illinois congestion control algorithm
can cause a divide by zero kernel oops.

The division by zero occurs in tcp_illinois_info() at:
do_div(t, ca->cnt_rtt);
where ca->cnt_rtt can become zero (when rtt_reset is called)

Steps to Reproduce:
1. Register tcp_illinois:
# sysctl -w net.ipv4.tcp_congestion_control=illinois
2. Monitor internal TCP information via command "ss -i"
# watch -d ss -i
3. Establish new TCP conn to machine

Either it fails at the initial conn, or else it needs to wait
for a loss or a reset.

This is only related to reading stats. The function avg_delay() also
performs the same divide, but is guarded with a (ca->cnt_rtt > 0) at its
calling point in update_params(). Thus, simply fix tcp_illinois_info().

Function tcp_illinois_info() / get_info() is called without
socket lock. Thus, eliminate any race condition on ca->cnt_rtt
by using a local stack variable. Simply reuse info.tcpv_rttcnt,
as it's already set to ca->cnt_rtt.
Function avg_delay() is not affected by this race condition, as
it's called with the socket lock.

Cc: Petr Matousek <pmat...@redhat.com>
Signed-off-by: Jesper Dangaard Brouer <bro...@redhat.com>
Acked-by: Eric Dumazet <edum...@google.com>
Acked-by: Stephen Hemminger <shemm...@vyatta.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv4/tcp_illinois.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/net/ipv4/tcp_illinois.c b/net/ipv4/tcp_illinois.c
index 1eba160..c35d91f 100644
--- a/net/ipv4/tcp_illinois.c
+++ b/net/ipv4/tcp_illinois.c
@@ -313,11 +313,13 @@ static void tcp_illinois_info(struct sock *sk, u32 ext,
.tcpv_rttcnt = ca->cnt_rtt,
.tcpv_minrtt = ca->base_rtt,
};
- u64 t = ca->sum_rtt;

- do_div(t, ca->cnt_rtt);
- info.tcpv_rtt = t;
+ if (info.tcpv_rttcnt > 0) {
+ u64 t = ca->sum_rtt;

+ do_div(t, info.tcpv_rttcnt);
+ info.tcpv_rtt = t;
+ }
nla_put(skb, INET_DIAG_VEGASINFO, sizeof(info), &info);
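
The "snapshot, then guard" idea in the hunk above can be sketched outside the kernel. This is a simplified model, not the kernel code: `struct illinois_model` and `avg_rtt_snapshot()` are hypothetical names, and a plain C read is used where real concurrent code would need a stronger guarantee that the value is read only once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical subset of the congestion-control state. */
struct illinois_model {
	uint64_t sum_rtt;
	uint32_t cnt_rtt;
};

/* Returns the average RTT, or 0 when no samples have been collected. */
uint32_t avg_rtt_snapshot(const struct illinois_model *ca)
{
	uint32_t cnt = ca->cnt_rtt;	/* test and divide by the same copy */

	if (cnt == 0)
		return 0;
	return (uint32_t)(ca->sum_rtt / cnt);
}

void avg_rtt_demo(void)
{
	struct illinois_model ca = { 300, 3 };

	assert(avg_rtt_snapshot(&ca) == 100);
	ca.cnt_rtt = 0;			/* as after a reset */
	assert(avg_rtt_snapshot(&ca) == 0);
}
```

Testing and dividing by the same local copy is what removes the window in which the counter could drop to zero between the check and the division.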

Willy Tarreau

Jun 4, 2013, 7:00:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

journal_unmap_buffer

From: Jan Kara <ja...@suse.cz>

Delay discarding buffers in journal_unmap_buffer until
we know that the "add to orphan" operation has definitely been
committed. Otherwise the log space of the committing transaction
may be freed and reused before the truncate gets committed, and
updates may be lost if a crash happens.

This patch is a backport of JBD2 fix by dingdinghua <dingd...@nrchpc.ac.cn>.

Signed-off-by: Jan Kara <ja...@suse.cz>
(cherry picked from commit 86963918965eb8fe0c8ae009e7c1b4c630f533d5)

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/jbd/commit.c | 10 +++++-----
fs/jbd/transaction.c | 43 +++++++++++++++++++++++++++++++------------
2 files changed, 36 insertions(+), 17 deletions(-)

diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index 17d29a8..2a5cdd0 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -867,12 +867,12 @@ restart_loop:
/* A buffer which has been freed while still being
* journaled by a previous transaction may end up still
* being dirty here, but we want to avoid writing back
- * that buffer in the future now that the last use has
- * been committed. That's not only a performance gain,
- * it also stops aliasing problems if the buffer is left
- * behind for writeback and gets reallocated for another
+ * that buffer in the future after the "add to orphan"
+ * operation been committed, That's not only a performance
+ * gain, it also stops aliasing problems if the buffer is
+ * left behind for writeback and gets reallocated for another
* use in a different page. */
- if (buffer_freed(bh)) {
+ if (buffer_freed(bh) && !jh->b_next_transaction) {
clear_buffer_freed(bh);
clear_buffer_jbddirty(bh);
}
diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c
index 006f9ad..99e9fea 100644
--- a/fs/jbd/transaction.c
+++ b/fs/jbd/transaction.c
@@ -1864,6 +1864,21 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
if (!jh)
goto zap_buffer_no_jh;

+ /*
+ * We cannot remove the buffer from checkpoint lists until the
+ * transaction adding inode to orphan list (let's call it T)
+ * is committed. Otherwise if the transaction changing the
+ * buffer would be cleaned from the journal before T is
+ * committed, a crash will cause that the correct contents of
+ * the buffer will be lost. On the other hand we have to
+ * clear the buffer dirty bit at latest at the moment when the
+ * transaction marking the buffer as freed in the filesystem
+ * structures is committed because from that moment on the
+ * buffer can be reallocated and used by a different page.
+ * Since the block hasn't been freed yet but the inode has
+ * already been added to orphan list, it is safe for us to add
+ * the buffer to BJ_Forget list of the newest transaction.
+ */
transaction = jh->b_transaction;
if (transaction == NULL) {
/* First case: not on any transaction. If it
@@ -1929,16 +1944,15 @@ static int journal_unmap_buffer(journal_t *journal, struct buffer_head *bh)
goto zap_buffer;
}
/*
- * If it is committing, we simply cannot touch it. We
- * can remove it's next_transaction pointer from the
- * running transaction if that is set, but nothing
- * else. */
+ * The buffer is committing, we simply cannot touch
+ * it. So we just set j_next_transaction to the
+ * running transaction (if there is one) and mark
+ * buffer as freed so that commit code knows it should
+ * clear dirty bits when it is done with the buffer.
+ */
set_buffer_freed(bh);
- if (jh->b_next_transaction) {
- J_ASSERT(jh->b_next_transaction ==
- journal->j_running_transaction);
- jh->b_next_transaction = NULL;
- }
+ if (journal->j_running_transaction && buffer_jbddirty(bh))
+ jh->b_next_transaction = journal->j_running_transaction;
journal_put_journal_head(jh);
spin_unlock(&journal->j_list_lock);
jbd_unlock_bh_state(bh);
@@ -2120,7 +2134,7 @@ void journal_file_buffer(struct journal_head *jh,
*/
void __journal_refile_buffer(struct journal_head *jh)
{
- int was_dirty;
+ int was_dirty, jlist;
struct buffer_head *bh = jh2bh(jh);

J_ASSERT_JH(jh, jbd_is_locked_bh_state(bh));
@@ -2142,8 +2156,13 @@ void __journal_refile_buffer(struct journal_head *jh)
__journal_temp_unlink_buffer(jh);
jh->b_transaction = jh->b_next_transaction;
jh->b_next_transaction = NULL;
- __journal_file_buffer(jh, jh->b_transaction,
- jh->b_modified ? BJ_Metadata : BJ_Reserved);
+ if (buffer_freed(bh))
+ jlist = BJ_Forget;
+ else if (jh->b_modified)
+ jlist = BJ_Metadata;
+ else
+ jlist = BJ_Reserved;
+ __journal_file_buffer(jh, jh->b_transaction, jlist);
J_ASSERT_JH(jh, jh->b_transaction->t_state == T_RUNNING);

if (was_dirty)
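
The list selection added to __journal_refile_buffer() in the last hunk reduces to a three-way choice. A minimal sketch, with a hypothetical enum and function name (`enum jlist_model`, `refile_list_model()`) standing in for the kernel's BJ_* constants and buffer flags:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for BJ_Reserved, BJ_Metadata and BJ_Forget. */
enum jlist_model { LIST_RESERVED, LIST_METADATA, LIST_FORGET };

/* A freed buffer goes to the forget list; otherwise the choice depends
 * on whether the buffer was modified, as before the patch. */
enum jlist_model refile_list_model(bool freed, bool modified)
{
	if (freed)
		return LIST_FORGET;
	if (modified)
		return LIST_METADATA;
	return LIST_RESERVED;
}

void refile_list_demo(void)
{
	/* "freed" wins even when the buffer was also modified */
	assert(refile_list_model(true, true) == LIST_FORGET);
	assert(refile_list_model(false, true) == LIST_METADATA);
}
```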

Willy Tarreau

Jun 4, 2013, 7:00:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Simon Horman <ho...@verge.net.au>

Attempt at allowing LVS to transmit skbs of greater than MTU length that
have been aggregated by GRO and can thus be deaggregated by GSO.

Cc: Julian Anastasov <j...@ssi.bg>
Cc: Herbert Xu <her...@gondor.apana.org.au>
Signed-off-by: Simon Horman <ho...@verge.net.au>
(cherry picked from commit 8f1b03a4c18e8f3f0801447b62330faa8ed3bb37)

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/netfilter/ipvs/ip_vs_xmit.c | 21 +++++++++++----------
1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/net/netfilter/ipvs/ip_vs_xmit.c b/net/netfilter/ipvs/ip_vs_xmit.c
index 30b3189..dd7da3c 100644
--- a/net/netfilter/ipvs/ip_vs_xmit.c
+++ b/net/netfilter/ipvs/ip_vs_xmit.c
@@ -245,7 +245,8 @@ ip_vs_bypass_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF))) {
+ if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF)) &&
+ !skb_is_gso(skb)) {
ip_rt_put(rt);
icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -309,7 +310,7 @@ ip_vs_bypass_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if (skb->len > mtu) {
+ if (skb->len > mtu && !skb_is_gso(skb)) {
dst_release(&rt->u.dst);
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -376,7 +377,7 @@ ip_vs_nat_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF))) {
+ if ((skb->len > mtu) && (iph->frag_off & htons(IP_DF)) && !skb_is_gso(skb)) {
ip_rt_put(rt);
icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
IP_VS_DBG_RL_PKT(0, pp, skb, 0, "ip_vs_nat_xmit(): frag needed for");
@@ -452,7 +453,7 @@ ip_vs_nat_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if (skb->len > mtu) {
+ if (skb->len > mtu && !skb_is_gso(skb)) {
dst_release(&rt->u.dst);
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
IP_VS_DBG_RL_PKT(0, pp, skb, 0,
@@ -561,8 +562,8 @@ ip_vs_tunnel_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,

df |= (old_iph->frag_off & htons(IP_DF));

- if ((old_iph->frag_off & htons(IP_DF))
- && mtu < ntohs(old_iph->tot_len)) {
+ if ((old_iph->frag_off & htons(IP_DF) &&
+ mtu < ntohs(old_iph->tot_len) && !skb_is_gso(skb))) {
icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
ip_rt_put(rt);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -671,7 +672,7 @@ ip_vs_tunnel_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,
if (skb_dst(skb))
skb_dst(skb)->ops->update_pmtu(skb_dst(skb), mtu);

- if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr)) {
+ if (mtu < ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) && !skb_is_gso(skb)) {
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
dst_release(&rt->u.dst);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -760,7 +761,7 @@ ip_vs_dr_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if ((iph->frag_off & htons(IP_DF)) && skb->len > mtu) {
+ if ((iph->frag_off & htons(IP_DF)) && skb->len > mtu && !skb_is_gso(skb)) {
icmp_send(skb, ICMP_DEST_UNREACH,ICMP_FRAG_NEEDED, htonl(mtu));
ip_rt_put(rt);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -888,7 +889,7 @@ ip_vs_icmp_xmit(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if ((skb->len > mtu) && (ip_hdr(skb)->frag_off & htons(IP_DF))) {
+ if ((skb->len > mtu) && (ip_hdr(skb)->frag_off & htons(IP_DF)) && !skb_is_gso(skb)) {
ip_rt_put(rt);
icmp_send(skb, ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED, htonl(mtu));
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
@@ -963,7 +964,7 @@ ip_vs_icmp_xmit_v6(struct sk_buff *skb, struct ip_vs_conn *cp,

/* MTU checking */
mtu = dst_mtu(&rt->u.dst);
- if (skb->len > mtu) {
+ if (skb->len > mtu && !skb_is_gso(skb)) {
dst_release(&rt->u.dst);
icmpv6_send(skb, ICMPV6_PKT_TOOBIG, 0, mtu, skb->dev);
IP_VS_DBG_RL("%s(): frag needed\n", __func__);
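
Every IPv4 hunk above applies the same predicate: report "fragmentation needed" only when the packet exceeds the MTU, DF is set, and the skb is not a GSO aggregate that will be split later. A minimal sketch of that condition, where `frag_needed_v4_model()` and its parameters are hypothetical names for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the IPv4 MTU check after the patch. */
bool frag_needed_v4_model(unsigned int len, unsigned int mtu,
			  bool df_set, bool is_gso)
{
	return len > mtu && df_set && !is_gso;
}

void frag_needed_demo(void)
{
	/* oversized GRO-aggregated packet: let GSO split it, no ICMP */
	assert(!frag_needed_v4_model(3000, 1500, true, true));
	/* oversized ordinary packet with DF: ICMP is still sent */
	assert(frag_needed_v4_model(3000, 1500, true, false));
}
```

The IPv6 hunks follow the same shape without the DF term.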

Willy Tarreau

Jun 4, 2013, 7:00:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

available

From: Larry Finger <Larry....@lwfinger.net>

commit 2d838bb608e2d1f6cb4280e76748cb812dc822e7 upstream.

When b43legacy is loaded without the firmware being available, a following
unload generates a kernel NULL pointer dereference BUG as follows:

[ 214.330789] BUG: unable to handle kernel NULL pointer dereference at 0000004c
[ 214.330997] IP: [<c104c395>] drain_workqueue+0x15/0x170
[ 214.331179] *pde = 00000000
[ 214.331311] Oops: 0000 [#1] SMP
[ 214.331471] Modules linked in: b43legacy(-) ssb pcmcia mac80211 cfg80211 af_packet mperf arc4 ppdev sr_mod cdrom sg shpchp yenta_socket pcmcia_rsrc pci_hotplug pcmcia_core battery parport_pc parport floppy container ac button edd autofs4 ohci_hcd ehci_hcd usbcore usb_common thermal processor scsi_dh_rdac scsi_dh_hp_sw scsi_dh_emc scsi_dh_alua scsi_dh fan thermal_sys hwmon ata_generic pata_ali libata [last unloaded: cfg80211]
[ 214.333421] Pid: 3639, comm: modprobe Not tainted 3.6.0-rc6-wl+ #163 Source Technology VIC 9921/ALI Based Notebook
[ 214.333580] EIP: 0060:[<c104c395>] EFLAGS: 00010246 CPU: 0
[ 214.333687] EIP is at drain_workqueue+0x15/0x170
[ 214.333788] EAX: c162ac40 EBX: cdfb8360 ECX: 0000002a EDX: 00002a2a
[ 214.333890] ESI: 00000000 EDI: 00000000 EBP: cd767e7c ESP: cd767e5c
[ 214.333957] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
[ 214.333957] CR0: 8005003b CR2: 0000004c CR3: 0c96a000 CR4: 00000090
[ 214.333957] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 214.333957] DR6: ffff0ff0 DR7: 00000400
[ 214.333957] Process modprobe (pid: 3639, ti=cd766000 task=cf802e90 task.ti=cd766000)
[ 214.333957] Stack:
[ 214.333957] 00000292 cd767e74 c12c5e09 00000296 00000296 cdfb8360 cdfb9220 00000000
[ 214.333957] cd767e90 c104c4fd cdfb8360 cdfb9220 cd682800 cd767ea4 d0c10184 cd682800
[ 214.333957] cd767ea4 cba31064 cd767eb8 d0867908 cba31064 d087e09c cd96f034 cd767ec4
[ 214.333957] Call Trace:
[ 214.333957] [<c12c5e09>] ? skb_dequeue+0x49/0x60
[ 214.333957] [<c104c4fd>] destroy_workqueue+0xd/0x150
[ 214.333957] [<d0c10184>] ieee80211_unregister_hw+0xc4/0x100 [mac80211]
[ 214.333957] [<d0867908>] b43legacy_remove+0x78/0x80 [b43legacy]
[ 214.333957] [<d083654d>] ssb_device_remove+0x1d/0x30 [ssb]
[ 214.333957] [<c126f15a>] __device_release_driver+0x5a/0xb0
[ 214.333957] [<c126fb07>] driver_detach+0x87/0x90
[ 214.333957] [<c126ef4c>] bus_remove_driver+0x6c/0xe0
[ 214.333957] [<c1270120>] driver_unregister+0x40/0x70
[ 214.333957] [<d083686b>] ssb_driver_unregister+0xb/0x10 [ssb]
[ 214.333957] [<d087c488>] b43legacy_exit+0xd/0xf [b43legacy]
[ 214.333957] [<c1089dde>] sys_delete_module+0x14e/0x2b0
[ 214.333957] [<c110a4a7>] ? vfs_write+0xf7/0x150
[ 214.333957] [<c1240050>] ? tty_write_lock+0x50/0x50
[ 214.333957] [<c110a6f8>] ? sys_write+0x38/0x70
[ 214.333957] [<c1397c55>] syscall_call+0x7/0xb
[ 214.333957] Code: bc 27 00 00 00 00 a1 74 61 56 c1 55 89 e5 e8 a3 fc ff ff 5d c3 90 55 89 e5 57 56 89 c6 53 b8 40 ac 62 c1 83 ec 14 e8 bb b7 34 00 <8b> 46 4c 8d 50 01 85 c0 89 56 4c 75 03 83 0e 40 80 05 40 ac 62
[ 214.333957] EIP: [<c104c395>] drain_workqueue+0x15/0x170 SS:ESP 0068:cd767e5c
[ 214.333957] CR2: 000000000000004c
[ 214.341110] ---[ end trace c7e90ec026d875a6 ]---

The problem is fixed by making certain that the ucode pointer is not NULL
before deregistering the driver in mac80211.

Signed-off-by: Larry Finger <Larry....@lwfinger.net>
Signed-off-by: John W. Linville <linv...@tuxdriver.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/net/wireless/b43legacy/main.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/drivers/net/wireless/b43legacy/main.c b/drivers/net/wireless/b43legacy/main.c
index c3968fad..fc0fc85 100644
--- a/drivers/net/wireless/b43legacy/main.c
+++ b/drivers/net/wireless/b43legacy/main.c
@@ -3870,6 +3870,8 @@ static void b43legacy_remove(struct ssb_device *dev)
cancel_work_sync(&wldev->restart_work);

B43legacy_WARN_ON(!wl);
+ if (!wldev->fw.ucode)
+ return; /* NULL if fw never loaded */
if (wl->current_dev == wldev)
ieee80211_unregister_hw(wl->hw);

Willy Tarreau

Jun 4, 2013, 7:00:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Lennart Sorensen <lsor...@csclub.uwaterloo.ca>

commit f7bc5051667b74c3861f79eed98c60d5c3b883f7 upstream.

I found a memory leak in sierra_release() (well, sierra_probe() I guess)
that loses 8 bytes each time the driver releases a device.

Signed-off-by: Len Sorensen <lsor...@csclub.uwaterloo.ca>
Acked-by: Johan Hovold <jho...@gmail.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/serial/sierra.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/sierra.c b/drivers/usb/serial/sierra.c
index 1b5c9f8..0cbf847 100644
--- a/drivers/usb/serial/sierra.c
+++ b/drivers/usb/serial/sierra.c
@@ -925,6 +925,7 @@ static void sierra_release(struct usb_serial *serial)
continue;
kfree(portdata);
}
+ kfree(serial->private);
}

#ifdef CONFIG_PM

Willy Tarreau

Jun 4, 2013, 7:00:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

[ Upstream commit 9344a972961d1a6d2c04d9008b13617bcb6ec2ef ]

The RFCOMM code fails to initialize the trailing padding byte of struct
sockaddr_rc added for alignment. It therefore leaks one byte of kernel
stack via the getsockname() syscall. Add an explicit memset(0) before
filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <min...@googlemail.com>

Cc: Marcel Holtmann <mar...@holtmann.org>
Cc: Gustavo Padovan <gus...@padovan.org>
Cc: Johan Hedberg <johan....@gmail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/bluetooth/rfcomm/sock.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/net/bluetooth/rfcomm/sock.c b/net/bluetooth/rfcomm/sock.c
index 1ae3f80..c47b7c4 100644
--- a/net/bluetooth/rfcomm/sock.c
+++ b/net/bluetooth/rfcomm/sock.c
@@ -543,6 +543,7 @@ static int rfcomm_sock_getname(struct socket *sock, struct sockaddr *addr, int *

BT_DBG("sock %p, sk %p", sock, sk);

+ memset(sa, 0, sizeof(*sa));
sa->rc_family = AF_BLUETOOTH;
sa->rc_channel = rfcomm_pi(sk)->channel;
if (peer)
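
The padding issue can be illustrated with a user-space model. The sketch below rests on assumptions: `struct rc_addr_model` is a hypothetical structure with an odd payload size, not the real struct sockaddr_rc, and the trailing padding byte is what common ABIs add rather than something the C standard guarantees. The memset() clears the whole object, padding included, before the members are filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical address layout: 9 bytes of members, typically padded to 10. */
struct rc_addr_model {
	uint16_t family;
	uint8_t  bdaddr[6];
	uint8_t  channel;
};

/* Lets the demo inspect every byte of the object, padding included. */
union rc_addr_bytes {
	struct rc_addr_model sa;
	unsigned char raw[sizeof(struct rc_addr_model)];
};

void fill_rc_addr(struct rc_addr_model *sa, uint8_t channel)
{
	memset(sa, 0, sizeof(*sa));	/* clears padding as well as members */
	sa->family = 31;		/* AF_BLUETOOTH */
	sa->channel = channel;
}

void rc_addr_demo(void)
{
	union rc_addr_bytes u;
	size_t i;

	memset(u.raw, 0xAA, sizeof(u.raw));	/* stand-in for stale stack bytes */
	fill_rc_addr(&u.sa, 3);
	for (i = offsetof(struct rc_addr_model, channel) + 1; i < sizeof(u.raw); i++)
		assert(u.raw[i] == 0);
}
```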

Willy Tarreau

Jun 4, 2013, 7:00:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

llc_ui_recvmsg()

From: Mathias Krause <min...@googlemail.com>

[ Upstream commit c77a4b9cffb6215a15196ec499490d116dfad181 ]

For stream sockets the code fails to update the msg_namelen member
to 0 and therefore makes net/socket.c leak the local, uninitialized
sockaddr_storage variable to userland -- 128 bytes of kernel stack
memory. The msg_namelen update is also missing for datagram sockets
in case the socket is shutting down during receive.

Fix both issues by setting msg_namelen to 0 early. It will be
updated later if we're going to fill the msg_name member.

Signed-off-by: Mathias Krause <min...@googlemail.com>

Cc: Arnaldo Carvalho de Melo <ac...@ghostprotocols.net>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/llc/af_llc.c | 2 ++

1 file changed, 2 insertions(+)

diff --git a/net/llc/af_llc.c b/net/llc/af_llc.c
index 606b6ad..8a814a5 100644
--- a/net/llc/af_llc.c
+++ b/net/llc/af_llc.c
@@ -674,6 +674,8 @@ static int llc_ui_recvmsg(struct kiocb *iocb, struct socket *sock,
int target; /* Read at least this many bytes */
long timeo;

+ msg->msg_namelen = 0;
+

lock_sock(sk);
copied = -ENOTCONN;
if (unlikely(sk->sk_type == SOCK_STREAM && sk->sk_state == TCP_LISTEN))

Willy Tarreau

Jun 4, 2013, 7:00:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

bt_sock_recvmsg()

From: Mathias Krause <min...@googlemail.com>

In case the socket is already shutting down, bt_sock_recvmsg() returns
0 without updating msg_namelen, leading to net/socket.c leaking the
local, uninitialized sockaddr_storage variable to userland -- 128 bytes
of kernel stack memory.

Fix this by moving the msg_namelen assignment in front of the shutdown
test.

Cc: Marcel Holtmann <mar...@holtmann.org>
Cc: Gustavo Padovan <gus...@padovan.org>
Cc: Johan Hedberg <johan....@gmail.com>

Signed-off-by: Mathias Krause <min...@googlemail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

[dannf: adjusted to apply to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/bluetooth/af_bluetooth.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/bluetooth/af_bluetooth.c b/net/bluetooth/af_bluetooth.c
index 8cfb5a8..d7239dd 100644
--- a/net/bluetooth/af_bluetooth.c
+++ b/net/bluetooth/af_bluetooth.c
@@ -240,14 +240,14 @@ int bt_sock_recvmsg(struct kiocb *iocb, struct socket *sock,
if (flags & (MSG_OOB))
return -EOPNOTSUPP;

+ msg->msg_namelen = 0;
+

if (!(skb = skb_recv_datagram(sk, flags, noblock, &err))) {
if (sk->sk_shutdown & RCV_SHUTDOWN)
return 0;
return err;
}

- msg->msg_namelen = 0;
-
copied = skb->len;
if (len < copied) {
msg->msg_flags |= MSG_TRUNC;
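
The effect of moving the assignment can be modeled in a few lines. This is a sketch under stated assumptions: `struct msg_model` and `recv_model()` are hypothetical, and the caller is assumed to copy `namelen` bytes of address back to userspace whenever the handler does not fail. Setting the length before the first early return means the "shutdown, no data" path no longer hands back a stale length.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical subset of the message header seen by a recvmsg handler. */
struct msg_model {
	size_t namelen;
};

/* Returns 0 on shutdown without data, a negative errno when nothing is
 * queued, or 1 as a stand-in for "bytes received". */
int recv_model(struct msg_model *msg, int have_data, int shutdown)
{
	msg->namelen = 0;	/* set before any early return */

	if (!have_data) {
		if (shutdown)
			return 0;
		return -EAGAIN;
	}
	return 1;
}

void recv_model_demo(void)
{
	struct msg_model msg = { 128 };	/* size the caller initially offers */

	assert(recv_model(&msg, 0, 1) == 0);
	assert(msg.namelen == 0);	/* nothing for the caller to copy out */
}
```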

Willy Tarreau

Jun 4, 2013, 7:00:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Greg Thelen <gth...@google.com>

commit 5f00110f7273f9ff04ac69a5f85bb535a4fd0987 upstream.

The tmpfs remount logic preserves filesystem mempolicy if the mpol=M
option is not specified in the remount request. A new policy can be
specified if mpol=M is given.

Before this patch remounting an mpol bound tmpfs without specifying
mpol= mount option in the remount request would set the filesystem's
mempolicy object to a freed mempolicy object.

To reproduce the problem boot a DEBUG_PAGEALLOC kernel and run:
# mkdir /tmp/x

# mount -t tmpfs -o size=100M,mpol=interleave nodev /tmp/x

# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=102400k,mpol=interleave:0-3 0 0

# mount -o remount,size=200M nodev /tmp/x

# grep /tmp/x /proc/mounts
nodev /tmp/x tmpfs rw,relatime,size=204800k,mpol=??? 0 0
# note ? garbage in mpol=... output above

# dd if=/dev/zero of=/tmp/x/f count=1
# panic here

Panic:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Oops: 0010 [#1] SMP DEBUG_PAGEALLOC
Call Trace:
mpol_shared_policy_init+0xa5/0x160
shmem_get_inode+0x209/0x270
shmem_mknod+0x3e/0xf0
shmem_create+0x18/0x20
vfs_create+0xb5/0x130
do_last+0x9a1/0xea0
path_openat+0xb3/0x4d0
do_filp_open+0x42/0xa0
do_sys_open+0xfe/0x1e0
compat_sys_open+0x1b/0x20
cstar_dispatch+0x7/0x1f

Non-debug kernels will not crash immediately because referencing the
dangling mpol will not cause a fault. Instead the filesystem will
reference a freed mempolicy object, which will cause unpredictable
behavior.

The problem boils down to a dropped mpol reference below if
shmem_parse_options() does not allocate a new mpol:

config = *sbinfo
shmem_parse_options(data, &config, true)
mpol_put(sbinfo->mpol)
sbinfo->mpol = config.mpol /* BUG: saves unreferenced mpol */

This patch avoids the crash by not releasing the mempolicy if
shmem_parse_options() doesn't create a new mpol.

How far back does this issue go? I see it in both 2.6.36 and 3.3. I did
not look back further.

Signed-off-by: Greg Thelen <gth...@google.com>
Acked-by: Hugh Dickins <hu...@google.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

mm/shmem.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 3e0005b..e6a0c72 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2242,6 +2242,7 @@ static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
unsigned long inodes;
int error = -EINVAL;

+ config.mpol = NULL;
if (shmem_parse_options(data, &config, true))
return error;

@@ -2269,8 +2270,13 @@ static int shmem_remount_fs(struct super_block *sb, int *flags, char *data)
sbinfo->max_inodes = config.max_inodes;
sbinfo->free_inodes = config.max_inodes - inodes;

- mpol_put(sbinfo->mpol);
- sbinfo->mpol = config.mpol; /* transfers initial ref */
+ /*
+ * Preserve previous mempolicy unless mpol remount option was specified.
+ */
+ if (config.mpol) {
+ mpol_put(sbinfo->mpol);
+ sbinfo->mpol = config.mpol; /* transfers initial ref */
+ }
out:
spin_unlock(&sbinfo->stat_lock);
return error;

Willy Tarreau

Jun 4, 2013, 7:00:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

commit e862f1a9b7df4e8196ebec45ac62295138aa3fc2 upstream.

The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.
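
The padding problem can be illustrated outside the kernel. This is a sketch
under the assumption of a toy layout (struct toy_pvc and fill_pvc() are
hypothetical names, not the real uapi definitions): a 2-byte field followed
by an int-aligned member typically leaves two padding bytes that only a
whole-object memset clears.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy layout modelled on the description above; not the real uapi struct. */
struct toy_pvc {
    unsigned short family;      /* 2 bytes, typically followed by padding */
    struct {
        short itf;
        short vpi;
        int vci;                /* int alignment forces the padding */
    } addr;
};

/* Zero the whole object first so padding bytes carry no stale data. */
static void fill_pvc(struct toy_pvc *pvc, short itf, short vpi, int vci)
{
    assert(pvc != NULL);
    memset(pvc, 0, sizeof(*pvc));
    pvc->family = 8;            /* arbitrary placeholder family value */
    pvc->addr.itf = itf;
    pvc->addr.vpi = vpi;
    pvc->addr.vci = vci;
}
```

Without the memset, the bytes between family and addr would keep whatever was
on the stack and would be copied out along with the rest of the structure.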

Signed-off-by: Mathias Krause <min...@googlemail.com>
Signed-off-by: David S. Miller <da...@davemloft.net>

[bwh: Backported to 2.6.32: adjust context, indentation]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/atm/common.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/net/atm/common.c b/net/atm/common.c
index 6c82d72..65737b8 100644
--- a/net/atm/common.c
+++ b/net/atm/common.c
@@ -751,6 +751,7 @@ int vcc_getsockopt(struct socket *sock, int level, int optname,
if (!vcc->dev ||
!test_bit(ATM_VF_ADDR,&vcc->flags))
return -ENOTCONN;
+ memset(&pvc, 0, sizeof(pvc));
pvc.sap_family = AF_ATMPVC;
pvc.sap_addr.itf = vcc->dev->number;
pvc.sap_addr.vpi = vcc->vpi;

Willy Tarreau

Jun 4, 2013, 7:00:05 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jay Estabrook <jay.es...@gmail.com>

commit aa8b4be3ac049c8b1df2a87e4d1d902ccfc1f7a9 upstream.

Fixes a NULL pointer dereference at boot on UP1500.

Reviewed-and-Tested-by: Matt Turner <matt...@gmail.com>
Signed-off-by: Jay Estabrook <jay.es...@gmail.com>
Signed-off-by: Matt Turner <matt...@gmail.com>
Signed-off-by: Michael Cree <mc...@orcon.net.nz>

Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/alpha/kernel/sys_nautilus.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/arch/alpha/kernel/sys_nautilus.c b/arch/alpha/kernel/sys_nautilus.c
index 99c0f46..dc616b3 100644
--- a/arch/alpha/kernel/sys_nautilus.c
+++ b/arch/alpha/kernel/sys_nautilus.c
@@ -189,6 +189,10 @@ nautilus_machine_check(unsigned long vector, unsigned long la_ptr)
extern void free_reserved_mem(void *, void *);
extern void pcibios_claim_one_bus(struct pci_bus *);

+static struct resource irongate_io = {
+ .name = "Irongate PCI IO",
+ .flags = IORESOURCE_IO,
+};
static struct resource irongate_mem = {
.name = "Irongate PCI MEM",
.flags = IORESOURCE_MEM,
@@ -210,6 +214,7 @@ nautilus_init_pci(void)

irongate = pci_get_bus_and_slot(0, 0);
bus->self = irongate;
+ bus->resource[0] = &irongate_io;
bus->resource[1] = &irongate_mem;

pci_bus_size_bridges(bus);

Willy Tarreau

Jun 4, 2013, 7:00:05 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Paul Moore <pmo...@redhat.com>

[ Upstream commit ded34e0fe8fe8c2d595bfa30626654e4b87621e0 ]

As reported by Jan, and others over the past few years, there is a
race condition caused by unix_release setting the sock->sk pointer
to NULL before properly marking the socket as dead/orphaned. This
can cause a problem with the LSM hook security_unix_may_send() if
there is another socket attempting to write to this partially
released socket in between when sock->sk is set to NULL and it is
marked as dead/orphaned. This patch fixes this by only setting
sock->sk to NULL after the socket has been marked as dead; I also
take the opportunity to make unix_release_sock() a void function
as it only ever returned 0/success.

Dave, I think this one should go on the -stable pile.

Special thanks to Jan for coming up with a reproducer for this
problem.

Reported-by: Jan Stancek <jan.s...@gmail.com>
Signed-off-by: Paul Moore <pmo...@redhat.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/unix/af_unix.c | 7 +++----
1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index db8d51a..d146b76 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -370,7 +370,7 @@ static void unix_sock_destructor(struct sock *sk)
#endif
}

-static int unix_release_sock(struct sock *sk, int embrion)
+static void unix_release_sock(struct sock *sk, int embrion)
{
struct unix_sock *u = unix_sk(sk);
struct dentry *dentry;
@@ -445,8 +445,6 @@ static int unix_release_sock(struct sock *sk, int embrion)

if (unix_tot_inflight)
unix_gc(); /* Garbage collect fds */
-
- return 0;
}

static int unix_listen(struct socket *sock, int backlog)
@@ -660,9 +658,10 @@ static int unix_release(struct socket *sock)
if (!sk)
return 0;

+ unix_release_sock(sk, 0);
sock->sk = NULL;

- return unix_release_sock(sk, 0);
+ return 0;
}

static int unix_autobind(struct socket *sock)

Willy Tarreau

Jun 4, 2013, 7:00:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Wu Fengguang <fenggu...@intel.com>

[ Upstream commit 77f00f6324cb97cf1df6f9c4aaeea6ada23abdb2 ]

Fix a buffer overflow bug by removing the revision and printk.

[ 22.016214] isdnloop-ISDN-driver Rev 1.11.6.7
[ 22.097508] isdnloop: (loop0) virtual card added
[ 22.174400] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: ffffffff83244972
[ 22.174400]
[ 22.436157] Pid: 1, comm: swapper Not tainted 3.5.0-bisect-00018-gfa8bbb1-dirty #129
[ 22.624071] Call Trace:
[ 22.720558] [<ffffffff832448c3>] ? CallcNew+0x56/0x56
[ 22.815248] [<ffffffff8222b623>] panic+0x110/0x329
[ 22.914330] [<ffffffff83244972>] ? isdnloop_init+0xaf/0xb1
[ 23.014800] [<ffffffff832448c3>] ? CallcNew+0x56/0x56
[ 23.090763] [<ffffffff8108e24b>] __stack_chk_fail+0x2b/0x30
[ 23.185748] [<ffffffff83244972>] isdnloop_init+0xaf/0xb1
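
The overflow follows from simple arithmetic on the revision string shown in
the diff below. This is a small sketch (rev_copy_size() is a hypothetical
helper, not driver code) that computes how many bytes the removed strcpy()
would have written into the 10-byte rev[] buffer.

```c
#include <assert.h>
#include <string.h>

/*
 * Bytes strcpy() would write when copying everything after the first ':'
 * in an RCS-style revision string, including the terminating NUL.
 * Returns 0 when the string has no ':'.
 */
static size_t rev_copy_size(const char *revision)
{
    const char *p;

    assert(revision != NULL);
    p = strchr(revision, ':');
    return p ? strlen(p + 1) + 1 : 0;
}
```

For "$Revision: 1.11.6.7 $" the tail after the colon is 11 characters, so the
copy needs 12 bytes and runs past the end of char rev[10].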

Signed-off-by: Fengguang Wu <fenggu...@intel.com>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/isdn/isdnloop/isdnloop.c | 12 ------------
1 file changed, 12 deletions(-)

diff --git a/drivers/isdn/isdnloop/isdnloop.c b/drivers/isdn/isdnloop/isdnloop.c
index a335c85..22446f7 100644
--- a/drivers/isdn/isdnloop/isdnloop.c
+++ b/drivers/isdn/isdnloop/isdnloop.c
@@ -15,7 +15,6 @@
#include <linux/sched.h>
#include "isdnloop.h"

-static char *revision = "$Revision: 1.11.6.7 $";
static char *isdnloop_id = "loop0";

MODULE_DESCRIPTION("ISDN4Linux: Pseudo Driver that simulates an ISDN card");
@@ -1493,17 +1492,6 @@ isdnloop_addcard(char *id1)
static int __init
isdnloop_init(void)
{
- char *p;
- char rev[10];
-
- if ((p = strchr(revision, ':'))) {
- strcpy(rev, p + 1);
- p = strchr(rev, '$');
- *p = 0;
- } else
- strcpy(rev, " ??? ");
- printk(KERN_NOTICE "isdnloop-ISDN-driver Rev%s\n", rev);
-
if (isdnloop_id)
return (isdnloop_addcard(isdnloop_id));

Willy Tarreau

Jun 4, 2013, 7:10:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

dev_set_alias()

From: Alexey Khoroshilov <khoro...@ispras.ru>

[ Upstream commit 7364e445f62825758fa61195d237a5b8ecdd06ec ]

Do not leak memory by updating pointer with potentially NULL realloc return value.

Found by Linux Driver Verification project (linuxtesting.org).
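
The same pattern applies to realloc() in user space. This is a minimal sketch,
assuming a hypothetical set_alias() helper: the result is kept in a temporary
so the old buffer is not lost when the allocation fails.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Replace *slot with a copy of src (len bytes plus NUL).
 * On allocation failure *slot still points at the old buffer,
 * so the caller can keep using it or free it. Returns 0 or -1.
 */
static int set_alias(char **slot, const char *src, size_t len)
{
    char *tmp;

    assert(slot != NULL && src != NULL);
    tmp = realloc(*slot, len + 1);
    if (!tmp)
        return -1;          /* old pointer left intact, no leak */
    memcpy(tmp, src, len);
    tmp[len] = '\0';
    *slot = tmp;
    return 0;
}
```

Assigning the realloc() result straight to *slot would overwrite the only
pointer to the old buffer with NULL on failure, which is the leak fixed here.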

Signed-off-by: Alexey Khoroshilov <khoro...@ispras.ru>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/core/dev.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 46e2a29..f4a6e14 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -967,6 +967,8 @@ rollback:
*/
int dev_set_alias(struct net_device *dev, const char *alias, size_t len)
{
+ char *new_ifalias;
+
ASSERT_RTNL();

if (len >= IFALIASZ)
@@ -980,9 +982,10 @@ int dev_set_alias(struct net_device *dev, const char *alias, size_t len)
return 0;
}

- dev->ifalias = krealloc(dev->ifalias, len + 1, GFP_KERNEL);
- if (!dev->ifalias)
+ new_ifalias = krealloc(dev->ifalias, len + 1, GFP_KERNEL);
+ if (!new_ifalias)
return -ENOMEM;
+ dev->ifalias = new_ifalias;

strlcpy(dev->ifalias, alias, len+1);
return len;

Willy Tarreau

Jun 4, 2013, 7:10:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

hidp_setup_hid()

From: Anderson Lizardo <anderson...@openbossa.org>

The length parameter should be sizeof(req->name) - 1 because there is no
guarantee that the string provided by userspace will contain the trailing
'\0'.

This can be easily reproduced by manually setting req->name to 128 non-zero
bytes prior to ioctl(HIDPCONNADD) and checking the device name set up in the
input subsystem:

$ cat /sys/devices/pnp0/00\:04/tty/ttyS0/hci0/hci0\:1/input8/name
AAAAAA[...]AAAAAAAAf0:af:f0:af:f0:af

("f0:af:f0:af:f0:af" is the device bluetooth address, taken from "phys"
field in struct hid_device due to overflow.)
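
The truncation rule can be checked in isolation. This is a sketch with a
hypothetical copy_name() helper; it zeroes the destination explicitly, whereas
the patch below appears to rely on the destination already being zeroed (an
assumption on my part, not something stated in this mail).

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 128   /* size quoted in the report above */

/*
 * Copy a possibly unterminated NAME_LEN-byte source into a NAME_LEN-byte
 * destination, always leaving a trailing NUL in the last byte.
 */
static void copy_name(char *dst, const char *src)
{
    assert(dst != NULL && src != NULL);
    memset(dst, 0, NAME_LEN);
    strncpy(dst, src, NAME_LEN - 1);   /* never touches dst[NAME_LEN - 1] */
}
```

With a source of 128 'A' bytes, the destination ends up holding 127
characters followed by a NUL, so a reader can no longer run into the
neighbouring field.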

Cc: sta...@vger.kernel.org
Signed-off-by: Anderson Lizardo <anderson...@openbossa.org>
Acked-by: Marcel Holtmann <mar...@holtmann.org>
Signed-off-by: Gustavo Padovan <gustavo...@collabora.co.uk>

[backported to 2.6.32 jmm]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/bluetooth/hidp/core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bluetooth/hidp/core.c b/net/bluetooth/hidp/core.c
index 49d8495..0c2c59d 100644
--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -778,7 +778,7 @@ static int hidp_setup_hid(struct hidp_session *session,
hid->version = req->version;
hid->country = req->country;

- strncpy(hid->name, req->name, 128);
+ strncpy(hid->name, req->name, sizeof(req->name) - 1);
strncpy(hid->phys, batostr(&src), 64);
strncpy(hid->uniq, batostr(&dst), 64);

Willy Tarreau

Jun 4, 2013, 7:10:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

INSN_INTTRIG

From: Ian Abbott <abb...@mev.co.uk>

commit 5d06e3df280bd230e2eadc16372e62818c63e894 upstream.

`parse_insn()` is dereferencing the user-space pointer `insn->data`
directly when handling the `INSN_INTTRIG` comedi instruction. It
shouldn't be using `insn->data` at all; it should be using the separate
`data` pointer passed to the function. Fix it.

Signed-off-by: Ian Abbott <abb...@mev.co.uk>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/staging/comedi/comedi_fops.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/comedi/comedi_fops.c b/drivers/staging/comedi/comedi_fops.c
index b83c76f..193b836 100644
--- a/drivers/staging/comedi/comedi_fops.c
+++ b/drivers/staging/comedi/comedi_fops.c
@@ -809,7 +809,7 @@ static int parse_insn(struct comedi_device *dev, struct comedi_insn *insn,
ret = -EAGAIN;
break;
}
- ret = s->async->inttrig(dev, s, insn->data[0]);
+ ret = s->async->inttrig(dev, s, data[0]);
if (ret >= 0)
ret = 1;
break;

Willy Tarreau

Jun 4, 2013, 7:10:02 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.d...@gmail.com>

[ This combines upstream commit
2f53384424251c06038ae612e56231b96ab610ee and the follow-on bug fix
commit 35f9c09fe9c72eb8ca2b8e89a593e1c151f28fc2 ]

vmsplice()/splice(pipe, socket) call do_tcp_sendpages() one page at a
time, adding at most 4096 bytes to an skb. (assuming PAGE_SIZE=4096)

The call to tcp_push() at the end of do_tcp_sendpages() forces an
immediate xmit when the pipe is not already filled, and tso_fragment()
tries to split these skbs into MSS multiples.

4096 bytes are usually split into an skb with 2 MSS and a remaining
sub-MSS skb (assuming MTU=1500).

This makes slow start suboptimal because many small frames are sent to
the qdisc/driver layers instead of big ones (constrained by cwnd and
packets in flight, of course).

In fact, applications using sendmsg() (adding an additional memory copy)
instead of vmsplice()/splice()/sendfile() are a bit faster because of
this anomaly, especially if serving small files in environments with
large initial [c]wnd.

Call tcp_push() only if MSG_MORE is not set in the flags parameter.

This bit is automatically provided by splice() internals for every page
except the last one, or on all pages if the user specified the
SPLICE_F_MORE splice() flag.

In some workloads, this can reduce the number of logical packets sent by
an order of magnitude, making zero-copy TCP actually faster than
one-copy :)
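
The flag plumbing can be sketched as two small functions. Note that the
combined diff below keys the push on MSG_SENDPAGE_NOTLAST rather than on
MSG_MORE itself. The names sendpage_flags(), should_push() and the SK_ and
SPLICE_FLAG_ constants are local stand-ins chosen to avoid clashing with
system headers; the two message flag values are copied from the diff and the
splice flag value is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Values copied from the patch; SPLICE_FLAG_MORE is a local stand-in. */
#define SK_MSG_MORE              0x8000
#define SK_MSG_SENDPAGE_NOTLAST  0x20000
#define SPLICE_FLAG_MORE         0x04

/* Flags passed down for one page of len bytes out of total_len remaining. */
static int sendpage_flags(unsigned int splice_flags, size_t len,
                          size_t total_len)
{
    int more = (splice_flags & SPLICE_FLAG_MORE) ? SK_MSG_MORE : 0;

    assert(len <= total_len);
    if (len < total_len)
        more |= SK_MSG_SENDPAGE_NOTLAST;   /* further pages will follow */
    return more;
}

/* The push at the end of the send path is skipped for non-final pages. */
static int should_push(size_t copied, int flags)
{
    return copied && !(flags & SK_MSG_SENDPAGE_NOTLAST);
}
```

Keeping the two bits separate lets the TCP side distinguish "the caller asked
for MSG_MORE" from "splice still has pages queued for this call".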

Reported-by: Tom Herbert <ther...@google.com>
Cc: Nandita Dukkipati <nand...@google.com>
Cc: Neal Cardwell <ncar...@google.com>
Cc: Tom Herbert <ther...@google.com>
Cc: Yuchung Cheng <ych...@google.com>
Cc: H.K. Jerry Chu <hk...@google.com>
Cc: Maciej Żenczykowski <ma...@google.com>
Cc: Mahesh Bandewar <mah...@google.com>
Cc: Ilpo Järvinen <ilpo.j...@helsinki.fi>
Signed-off-by: Eric Dumazet <eric.d...@gmail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/splice.c | 5 ++++-
include/linux/socket.h | 2 +-
net/ipv4/tcp.c | 2 +-
net/socket.c | 6 +++---
4 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/splice.c b/fs/splice.c
index bb92b7c5..f5d5a2b 100644
--- a/fs/splice.c
+++ b/fs/splice.c
@@ -30,6 +30,7 @@
#include <linux/syscalls.h>
#include <linux/uio.h>
#include <linux/security.h>
+#include <linux/socket.h>

/*
* Attempt to steal a page from a pipe buffer. This should perhaps go into
@@ -637,7 +638,9 @@ static int pipe_to_sendpage(struct pipe_inode_info *pipe,

ret = buf->ops->confirm(pipe, buf);
if (!ret) {
- more = (sd->flags & SPLICE_F_MORE) || sd->len < sd->total_len;
+ more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
+ if (sd->len < sd->total_len)
+ more |= MSG_SENDPAGE_NOTLAST;
if (file->f_op && file->f_op->sendpage)
ret = file->f_op->sendpage(file, buf->page, buf->offset,
sd->len, &pos, more);
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 3273a0c..3124c51 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -246,7 +246,7 @@ struct ucred {
#define MSG_ERRQUEUE 0x2000 /* Fetch message from error queue */
#define MSG_NOSIGNAL 0x4000 /* Do not generate SIGPIPE */
#define MSG_MORE 0x8000 /* Sender will send more */
-
+#define MSG_SENDPAGE_NOTLAST 0x20000 /* sendpage() internal : not the last page */
#define MSG_EOF MSG_FIN

#define MSG_CMSG_CLOEXEC 0x40000000 /* Set close_on_exit for file
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index b9644d8..6232462 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -847,7 +847,7 @@ wait_for_memory:
}

out:
- if (copied)
+ if (copied && !(flags & MSG_SENDPAGE_NOTLAST))
tcp_push(sk, flags, mss_now, tp->nonagle);
return copied;

diff --git a/net/socket.c b/net/socket.c
index d449812..bf9fc68 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -732,9 +732,9 @@ static ssize_t sock_sendpage(struct file *file, struct page *page,

sock = file->private_data;

- flags = !(file->f_flags & O_NONBLOCK) ? 0 : MSG_DONTWAIT;
- if (more)
- flags |= MSG_MORE;
+ flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
+ /* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */
+ flags |= more;

return kernel_sendpage(sock, page, offset, size, flags);

Willy Tarreau

Jun 4, 2013, 7:10:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>

[ Upstream commit 3e10986d1d698140747fcfc2761ec9cb64c1d582 ]

It's possible to use RAW sockets to trigger a crash in
tcp_set_keepalive() / sk_reset_timer().

The fix is to make sure the socket is a SOCK_STREAM one.
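
The guard amounts to a two-part predicate. This is a sketch with a
hypothetical wants_tcp_keepalive() helper, using the standard socket
constants from user-space headers.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/*
 * True only for sockets where TCP keepalive timers are meaningful:
 * the protocol must be TCP and the socket type must be SOCK_STREAM.
 */
static int wants_tcp_keepalive(int type, int protocol)
{
    assert(type >= 0 && protocol >= 0);   /* reject obviously invalid input */
    return protocol == IPPROTO_TCP && type == SOCK_STREAM;
}
```

A SOCK_RAW socket opened with IPPROTO_TCP passes the protocol test alone,
which is the case the extra type check rules out.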

Reported-by: Dave Jones <da...@redhat.com>
Signed-off-by: Eric Dumazet <edum...@google.com>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/core/sock.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 4538a34..eafa660 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -562,7 +562,8 @@ set_rcvbuf:

case SO_KEEPALIVE:
#ifdef CONFIG_INET
- if (sk->sk_protocol == IPPROTO_TCP)
+ if (sk->sk_protocol == IPPROTO_TCP &&
+ sk->sk_type == SOCK_STREAM)
tcp_set_keepalive(sk, valbool);
#endif
sock_valbool_flag(sk, SOCK_KEEPOPEN, valbool);

Willy Tarreau

Jun 4, 2013, 7:10:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

[ Upstream commit ef3313e84acbf349caecae942ab3ab731471f1a1 ]

When msg_namelen is non-zero the sockaddr info gets filled out, as
requested, but the code fails to initialize the padding bytes of struct
sockaddr_ax25 inserted by the compiler for alignment. Additionally the
msg_namelen value is updated to sizeof(struct full_sockaddr_ax25) but is
not always filled up to this size.

Both issues lead to the fact that the code will leak uninitialized
kernel stack bytes in net/socket.c.

Fix both issues by initializing the memory with memset(0).

Signed-off-by: Mathias Krause <min...@googlemail.com>
Cc: Ralf Baechle <ra...@linux-mips.org>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ax25/af_ax25.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/net/ax25/af_ax25.c b/net/ax25/af_ax25.c
index 1e9f3e42..8613bd1 100644
--- a/net/ax25/af_ax25.c
+++ b/net/ax25/af_ax25.c
@@ -1654,6 +1654,7 @@ static int ax25_recvmsg(struct kiocb *iocb, struct socket *sock,
ax25_address src;
const unsigned char *mac = skb_mac_header(skb);

+ memset(sax, 0, sizeof(struct full_sockaddr_ax25));
ax25_addr_parse(mac + 1, skb->data - mac - 1, &src, NULL,
&digi, NULL, NULL);
sax->sax25_family = AF_AX25;

Willy Tarreau

Jun 4, 2013, 7:10:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <jho...@gmail.com>

commit 3eb55cc4ed88eee3b5230f66abcdbd2a91639eda upstream.

The driver set the usb-serial port pointers to NULL on errors in attach,
effectively preventing usb-serial core from decrementing the port ref
counters and releasing the port devices and associated data.

Signed-off-by: Johan Hovold <jho...@gmail.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/serial/mos7840.c | 1 -
1 file changed, 1 deletion(-)

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index 9c338ca..c802c77 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -2569,7 +2569,6 @@ error:
kfree(mos7840_port->ctrl_buf);
usb_free_urb(mos7840_port->control_urb);
kfree(mos7840_port);
- serial->port[i] = NULL;
}
return status;

Willy Tarreau

Jun 4, 2013, 7:10:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Nicolas Dichtel <nicolas...@6wind.com>

commit 70789d7052239992824628db8133de08dc78e593 upstream

RFC5722 prohibits reassembling fragments when some data overlaps.
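
The overlap test itself is small. This is a sketch assuming a toy struct frag
with half-open byte ranges and a hypothetical frag_overlaps() helper; it
mirrors the two comparisons in the diff below but is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open byte range [offset, end) of one fragment in the datagram. */
struct frag {
    int offset;
    int end;
};

/*
 * prev/next are the neighbours in offset order (either may be NULL).
 * Returns true when the new fragment overlaps one of them, in which
 * case the whole reassembly queue should be dropped.
 */
static bool frag_overlaps(const struct frag *prev, const struct frag *next,
                          const struct frag *cur)
{
    assert(cur != NULL && cur->offset < cur->end);
    if (prev && prev->end > cur->offset)
        return true;
    if (next && next->offset < cur->end)
        return true;
    return false;
}
```

Exactly adjacent fragments, such as [0, 8) followed by [8, 16), do not count
as overlapping; any shared byte does.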

Bug spotted by Zhang Zuotao <zuotao...@6wind.com>.

Signed-off-by: Nicolas Dichtel <nicolas...@6wind.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv6/reassembly.c | 74 +++++++++++----------------------------------------
1 file changed, 15 insertions(+), 59 deletions(-)

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index 4d18699..105de22 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -148,16 +148,6 @@ int ip6_frag_match(struct inet_frag_queue *q, void *a)
}
EXPORT_SYMBOL(ip6_frag_match);

-/* Memory Tracking Functions. */
-static inline void frag_kfree_skb(struct netns_frags *nf,
- struct sk_buff *skb, int *work)
-{
- if (work)
- *work -= skb->truesize;
- atomic_sub(skb->truesize, &nf->mem);
- kfree_skb(skb);
-}
-
void ip6_frag_init(struct inet_frag_queue *q, void *a)
{
struct frag_queue *fq = container_of(q, struct frag_queue, q);
@@ -348,58 +338,22 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
prev = next;
}

- /* We found where to put this one. Check for overlap with
- * preceding fragment, and, if needed, align things so that
- * any overlaps are eliminated.
+ /* RFC5722, Section 4:
+ * When reassembling an IPv6 datagram, if
+ * one or more its constituent fragments is determined to be an
+ * overlapping fragment, the entire datagram (and any constituent
+ * fragments, including those not yet received) MUST be silently
+ * discarded.
*/
- if (prev) {
- int i = (FRAG6_CB(prev)->offset + prev->len) - offset;

- if (i > 0) {
- offset += i;
- if (end <= offset)
- goto err;
- if (!pskb_pull(skb, i))
- goto err;
- if (skb->ip_summed != CHECKSUM_UNNECESSARY)
- skb->ip_summed = CHECKSUM_NONE;
- }
- }
+ /* Check for overlap with preceding fragment. */
+ if (prev &&
+ (FRAG6_CB(prev)->offset + prev->len) - offset > 0)
+ goto discard_fq;

- /* Look for overlap with succeeding segments.
- * If we can merge fragments, do it.
- */
- while (next && FRAG6_CB(next)->offset < end) {
- int i = end - FRAG6_CB(next)->offset; /* overlap is 'i' bytes */
-
- if (i < next->len) {
- /* Eat head of the next overlapped fragment
- * and leave the loop. The next ones cannot overlap.
- */
- if (!pskb_pull(next, i))
- goto err;
- FRAG6_CB(next)->offset += i; /* next fragment */
- fq->q.meat -= i;
- if (next->ip_summed != CHECKSUM_UNNECESSARY)
- next->ip_summed = CHECKSUM_NONE;
- break;
- } else {
- struct sk_buff *free_it = next;
-
- /* Old fragment is completely overridden with
- * new one drop it.
- */
- next = next->next;
-
- if (prev)
- prev->next = next;
- else
- fq->q.fragments = next;
-
- fq->q.meat -= free_it->len;
- frag_kfree_skb(fq->q.net, free_it, NULL);
- }
- }
+ /* Look for overlap with succeeding segment. */
+ if (next && FRAG6_CB(next)->offset < end)
+ goto discard_fq;

FRAG6_CB(skb)->offset = offset;

@@ -436,6 +390,8 @@ static int ip6_frag_queue(struct frag_queue *fq, struct sk_buff *skb,
write_unlock(&ip6_frags.lock);
return -1;

+discard_fq:
+ fq_kill(fq);
err:
IP6_INC_STATS(net, ip6_dst_idev(skb_dst(skb)),
IPSTATS_MIB_REASMFAILS);

Willy Tarreau

Jun 4, 2013, 7:10:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <jho...@gmail.com>

commit 65a4cdbb170e4ec1a7fa0e94936d47e24a17b0e8 upstream.

Make sure control urb is freed at release.

Signed-off-by: Johan Hovold <jho...@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/serial/mos7840.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index 61829b8..9c338ca 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -2636,6 +2636,7 @@ static void mos7840_release(struct usb_serial *serial)
mos7840_port = mos7840_get_port_private(serial->port[i]);
dbg("mos7840_port %d = %p", i, mos7840_port);
if (mos7840_port) {
+ usb_free_urb(mos7840_port->control_urb);
kfree(mos7840_port->ctrl_buf);
kfree(mos7840_port->dr);
kfree(mos7840_port);

Willy Tarreau

Jun 4, 2013, 7:10:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

commit c25463722509fef0ed630b271576a8c9a70236f3 upstream.

When dump_one_policy() returns an error, e.g. because of a too small
buffer to dump the whole xfrm policy, xfrm_policy_netlink() returns
NULL instead of an error pointer. But its caller expects an error
pointer and therefore continues to operate on a NULL skbuff.
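
The error-pointer convention can be imitated in user space. This is a sketch
only: err_ptr(), ptr_err(), is_err() and build_msg() are hypothetical
lower-case imitations of the kernel helpers, written so the example compiles
outside the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095   /* same bound the kernel uses */

/* User-space imitations of the kernel's error-pointer helpers. */
static void *err_ptr(long err)     { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static int is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical builder: returns a buffer, or an encoded errno on failure. */
static void *build_msg(int dump_err)
{
    void *buf;

    assert(dump_err <= 0 && dump_err >= -MAX_ERRNO);
    buf = malloc(64);
    if (!buf)
        return err_ptr(-ENOMEM);
    if (dump_err) {
        free(buf);
        return err_ptr(dump_err);   /* never NULL on failure */
    }
    return buf;
}
```

A caller that only checks is_err() treats NULL as a valid pointer, which is
why returning NULL on failure led to the dereference described above.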

Signed-off-by: Mathias Krause <min...@googlemail.com>
Acked-by: Steffen Klassert <steffen....@secunet.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/xfrm/xfrm_user.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index dff20ac..06f42f6 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -1306,6 +1306,7 @@ static struct sk_buff *xfrm_policy_netlink(struct sk_buff *in_skb,
{
struct xfrm_dump_info info;
struct sk_buff *skb;
+ int err;

skb = nlmsg_new(NLMSG_DEFAULT_SIZE, GFP_KERNEL);
if (!skb)
@@ -1316,9 +1317,10 @@ static struct sk_buff *xfrm_policy_netlink(struct sk_buff *in_skb,
info.nlmsg_seq = seq;
info.nlmsg_flags = 0;

- if (dump_one_policy(xp, dir, 0, &info) < 0) {
+ err = dump_one_policy(xp, dir, 0, &info);
+ if (err) {
kfree_skb(skb);
- return NULL;
+ return ERR_PTR(err);
}

return skb;

Willy Tarreau

Jun 4, 2013, 7:10:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

Bug 38862)

From: Lennart Sorensen <lsor...@csclub.uwaterloo.ca>

[ Upstream commit 93a3aa25933461d76141179fc94aa32d5f9d954a ]

The D-Link DGE-530T rev C1 is a re-badged Realtek 8169 named DLG10028C,
unlike the previous revisions which were skge based. It is probably
the same as the discontinued DGE-528T (0x4300) other than the PCI ID.

The PCI ID is 0x1186:0x4302.

Adding it to r8169.c, where 0x1186:0x4300 is already listed, lets the card
be detected and work.

This fixes https://bugzilla.kernel.org/show_bug.cgi?id=38862

Signed-off-by: Len Sorensen <lsor...@csclub.uwaterloo.ca>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>
(cherry picked from commit 7106159f8bd33bd5e5b0ea2c87e499117fc22c69)
Cc: Thomas Bork <t...@eisfair.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/net/r8169.c | 1 +

1 file changed, 1 insertion(+)

diff --git a/drivers/net/r8169.c b/drivers/net/r8169.c
index b22623d..2d89062 100644
--- a/drivers/net/r8169.c
+++ b/drivers/net/r8169.c
@@ -176,6 +176,7 @@ static struct pci_device_id rtl8169_pci_tbl[] = {
{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8168), 0, 0, RTL_CFG_1 },
{ PCI_DEVICE(PCI_VENDOR_ID_REALTEK, 0x8169), 0, 0, RTL_CFG_0 },
{ PCI_DEVICE(PCI_VENDOR_ID_DLINK, 0x4300), 0, 0, RTL_CFG_0 },
+ { PCI_DEVICE(PCI_VENDOR_ID_DLINK, 0x4302), 0, 0, RTL_CFG_0 },
{ PCI_DEVICE(PCI_VENDOR_ID_AT, 0xc107), 0, 0, RTL_CFG_0 },
{ PCI_DEVICE(0x16ec, 0x0116), 0, 0, RTL_CFG_0 },
{ PCI_VENDOR_ID_LINKSYS, 0x1032,

Willy Tarreau

Jun 4, 2013, 7:10:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

small block size

From: Jan Kara <ja...@suse.cz>

commit 89b1f39eb4189de745fae554b0d614d87c8d5c63 upstream.

For large UDF filesystems with 512-byte blocks the number of necessary
bitmap blocks is larger than 2^16, so s_nr_groups in udf_bitmap overflows
(the number will overflow for filesystems larger than 128 GB with
512-byte blocks). That results in ENOSPC errors even though the filesystem
has plenty of free space.

Fix the problem by changing the type of s_nr_groups to 'int'. That is enough
even for filesystems with 2^32 blocks (the UDF maximum) and a 512-byte
block size.
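
The numbers above can be verified with a few lines of arithmetic. This is a
sketch with a hypothetical bitmap_groups() helper that counts one bit per
block and ignores the small on-disk bitmap header, so it approximates rather
than reproduces the kernel's computation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Bitmap blocks needed to track nr_blocks filesystem blocks, one bit per
 * block. The on-disk bitmap header is ignored here for simplicity.
 */
static uint64_t bitmap_groups(uint64_t nr_blocks, uint32_t block_size)
{
    uint64_t bits_per_block = (uint64_t)block_size * 8;

    assert(block_size > 0);
    return (nr_blocks + bits_per_block - 1) / bits_per_block;
}
```

At 128 GiB with 512-byte blocks there are 2^28 blocks and 4096 bits per
bitmap block, giving 2^16 groups, one more than a 16-bit field can hold. At
the 2^32-block maximum the count is 2^20, which fits comfortably in an int.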

Reported-and-tested-by: v10l...@myway.de
Signed-off-by: Jan Kara <ja...@suse.cz>
Cc: Jim Trigg <jtr...@spamcop.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/udf/udf_sb.h | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/udf/udf_sb.h b/fs/udf/udf_sb.h
index d113b72..efa82c9 100644
--- a/fs/udf/udf_sb.h
+++ b/fs/udf/udf_sb.h
@@ -78,7 +78,7 @@ struct udf_virtual_data {
struct udf_bitmap {
__u32 s_extLength;
__u32 s_extPosition;
- __u16 s_nr_groups;
+ int s_nr_groups;
struct buffer_head **s_block_bitmap;
};

Willy Tarreau

Jun 4, 2013, 7:10:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: James Bottomley <James.B...@HansenPartnership.com>

USB surprise removal of sr is triggering an oops in
scsi_dispatch_command(). What seems to be happening is that USB is
hanging on to a queue reference until the last close of the upper
device, so the crash is caused by surprise remove of a mounted CD
followed by attempted unmount.

The problem is that USB doesn't issue its final commands as part of
the SCSI teardown path, but on last close when the block queue is long
gone. The long term fix is probably to make sr do the teardown in the
same way as sd (so remove all the lower bits on ejection, but keep the
upper disk alive until last close of user space). However, the
current oops can be simply fixed by not allowing any commands to be
sent to a dead queue.
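
The guard can be modelled with toy types. This is a sketch, assuming
hypothetical struct toy_queue, struct toy_request and submit(); it only
mirrors the shape of the check added in the diff below (fail the request with
-ENXIO and run its completion callback instead of queueing it).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_request;
typedef void (*end_io_fn)(struct toy_request *rq, int error);

struct toy_queue {
    bool dead;          /* set once the device has gone away */
    int queued;         /* number of requests accepted */
};

struct toy_request {
    int errors;
    int completions;
    end_io_fn end_io;
};

/* Example completion callback: just counts how often it ran. */
static void count_completion(struct toy_request *rq, int error)
{
    (void)error;
    rq->completions++;
}

/* Refuse to queue on a dead queue; complete the request with an error. */
static bool submit(struct toy_queue *q, struct toy_request *rq)
{
    assert(q != NULL && rq != NULL);
    if (q->dead) {
        rq->errors = -ENXIO;
        if (rq->end_io)
            rq->end_io(rq, rq->errors);
        return false;
    }
    q->queued++;
    return true;
}
```

Completing the request on the error path matters because a waiter blocked on
the callback would otherwise never be woken.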

Cc: sta...@kernel.org
Signed-off-by: James Bottomley <JBott...@Parallels.com>
(cherry picked from commit bfe159a51203c15d23cb3158fffdc25ec4b4dda1)
Cc: Thomas Bork <t...@eisfair.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

block/blk-core.c | 3 +++
block/blk-exec.c | 7 +++++++
drivers/scsi/scsi_lib.c | 2 ++
3 files changed, 12 insertions(+)

diff --git a/block/blk-core.c b/block/blk-core.c
index 00ac586..4058f46 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -865,6 +865,9 @@ struct request *blk_get_request(struct request_queue *q, int rw, gfp_t gfp_mask)
{
struct request *rq;

+ if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
+ return NULL;
+
BUG_ON(rw != READ && rw != WRITE);

spin_lock_irq(q->queue_lock);
diff --git a/block/blk-exec.c b/block/blk-exec.c
index 49557e9..85bd7b4 100644
--- a/block/blk-exec.c
+++ b/block/blk-exec.c
@@ -50,6 +50,13 @@ void blk_execute_rq_nowait(struct request_queue *q, struct gendisk *bd_disk,
{
int where = at_head ? ELEVATOR_INSERT_FRONT : ELEVATOR_INSERT_BACK;

+ if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags))) {
+ rq->errors = -ENXIO;
+ if (rq->end_io)
+ rq->end_io(rq, rq->errors);
+ return;
+ }
+
rq->rq_disk = bd_disk;
rq->end_io = done;
WARN_ON(irqs_disabled());
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e28f9b0..933f1c5 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -215,6 +215,8 @@ int scsi_execute(struct scsi_device *sdev, const unsigned char *cmd,
int ret = DRIVER_ERROR << 24;

req = blk_get_request(sdev->request_queue, write, __GFP_WAIT);
+ if (!req)
+ return ret;

if (bufflen && blk_rq_map_kern(sdev->request_queue, req,
buffer, bufflen, __GFP_WAIT))

Willy Tarreau

Jun 4, 2013, 7:10:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

space

From: Peter Korsgaard <jac...@sunsite.dk>

Commit 9e4f5e29 ("FC Pass Thru support") exported a number of header files
in include/scsi to user space, but didn't change the uX types to the
userspace-compatible __uX types. Without that you'll get compile errors
when including them, e.g.:

include/scsi/scsi.h:145: error: expected specifier-qualifier-list before `u8'

Signed-off-by: Peter Korsgaard <jac...@sunsite.dk>
Cc: Boaz Harrosh <bhar...@panasas.com>
Cc: James Smart <james...@emulex.com>
Cc: James Bottomley <James.B...@HansenPartnership.com>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

(cherry picked from commit 083c8c1e60e5c27a277e87dbeb6b89b47937559f)

Cc: Thomas Bork <t...@eisfair.net>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/scsi/scsi.h | 8 ++++----
include/scsi/scsi_netlink.h | 4 ++--
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/include/scsi/scsi.h b/include/scsi/scsi.h
index 34c46ab..b3cffec 100644
--- a/include/scsi/scsi.h
+++ b/include/scsi/scsi.h
@@ -145,10 +145,10 @@ struct scsi_cmnd;

/* defined in T10 SCSI Primary Commands-2 (SPC2) */
struct scsi_varlen_cdb_hdr {
- u8 opcode; /* opcode always == VARIABLE_LENGTH_CMD */
- u8 control;
- u8 misc[5];
- u8 additional_cdb_length; /* total cdb length - 8 */
+ __u8 opcode; /* opcode always == VARIABLE_LENGTH_CMD */
+ __u8 control;
+ __u8 misc[5];
+ __u8 additional_cdb_length; /* total cdb length - 8 */
__be16 service_action;
/* service specific data follows */
};
diff --git a/include/scsi/scsi_netlink.h b/include/scsi/scsi_netlink.h
index 536752c..58ce8fe 100644
--- a/include/scsi/scsi_netlink.h
+++ b/include/scsi/scsi_netlink.h
@@ -105,8 +105,8 @@ struct scsi_nl_host_vendor_msg {
* PCI : ID data is the 16 bit PCI Registered Vendor ID
*/
#define SCSI_NL_VID_TYPE_SHIFT 56
-#define SCSI_NL_VID_TYPE_MASK ((u64)0xFF << SCSI_NL_VID_TYPE_SHIFT)
-#define SCSI_NL_VID_TYPE_PCI ((u64)0x01 << SCSI_NL_VID_TYPE_SHIFT)
+#define SCSI_NL_VID_TYPE_MASK ((__u64)0xFF << SCSI_NL_VID_TYPE_SHIFT)
+#define SCSI_NL_VID_TYPE_PCI ((__u64)0x01 << SCSI_NL_VID_TYPE_SHIFT)
#define SCSI_NL_VID_ID_MASK (~ SCSI_NL_VID_TYPE_MASK)
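
As a rough illustration of why the __uX spellings matter, the sketch below is a plain userspace translation unit that relies only on <linux/types.h>. The demo_* names are hypothetical copies made for this example, not the real exported headers. The kernel-internal u8 and u64 typedefs are not visible to userspace, so the same declarations written with them would fail to compile here.

```c
#include <assert.h>
#include <linux/types.h>   /* provides __u8, __u64, __be16 to userspace */

/* Hypothetical copy of the header layout, using userspace-safe types. */
struct demo_varlen_cdb_hdr {
    __u8 opcode;
    __u8 control;
    __u8 misc[5];
    __u8 additional_cdb_length;   /* total cdb length - 8 */
    __be16 service_action;
};

#define DEMO_VID_TYPE_SHIFT 56
#define DEMO_VID_TYPE_MASK  ((__u64)0xFF << DEMO_VID_TYPE_SHIFT)
#define DEMO_VID_ID_MASK    (~DEMO_VID_TYPE_MASK)

/* Extract the 8-bit type field from the top byte of a vendor id. */
unsigned int demo_vid_type(__u64 vendor_id)
{
    return (unsigned int)((vendor_id & DEMO_VID_TYPE_MASK)
                          >> DEMO_VID_TYPE_SHIFT);
}

/* Extract the remaining 56 bits of id data. */
__u64 demo_vid_id(__u64 vendor_id)
{
    return vendor_id & DEMO_VID_ID_MASK;
}
```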

Willy Tarreau

Jun 4, 2013, 7:10:03 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <ty...@mit.edu>

commit 721e3eba21e43532e438652dd8f1fcdfce3187e7 upstream.

Commit c278531d39 added a warning when ext4_flush_unwritten_io() is
called without i_mutex being taken. It had previously not been taken
during orphan cleanup since races weren't possible at that point in
the mount process, but as a result of c278531d39, we will now see
a kernel WARN_ON in this case. Take the i_mutex in
ext4_orphan_cleanup() to suppress this warning.

Reported-by: Alexander Beregalov <a.ber...@gmail.com>
Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
Reviewed-by: Zheng Liu <wenqi...@taobao.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/super.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/fs/ext4/super.c b/fs/ext4/super.c
index 3ce77c5..108515f 100644
--- a/fs/ext4/super.c
+++ b/fs/ext4/super.c
@@ -1937,7 +1937,9 @@ static void ext4_orphan_cleanup(struct super_block *sb,
__func__, inode->i_ino, inode->i_size);
jbd_debug(2, "truncating inode %lu to %lld bytes\n",
inode->i_ino, inode->i_size);
+ mutex_lock(&inode->i_mutex);
ext4_truncate(inode);
+ mutex_unlock(&inode->i_mutex);
nr_truncates++;
} else {
ext4_msg(sb, KERN_DEBUG,

Willy Tarreau

Jun 4, 2013, 7:10:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mathias Krause <min...@googlemail.com>

[ Upstream commit f778a636713a435d3a922c60b1622a91136560c1 ]

The memory reserved to dump the xfrm state includes the padding bytes of
struct xfrm_usersa_info added by the compiler for alignment (7 for
amd64, 3 for i386). Add an explicit memset(0) before filling the buffer
to avoid the info leak.

Signed-off-by: Mathias Krause <min...@googlemail.com>
Acked-by: Steffen Klassert <steffen....@secunet.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/xfrm/xfrm_user.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/net/xfrm/xfrm_user.c b/net/xfrm/xfrm_user.c
index b95a2d6..4823a15 100644
--- a/net/xfrm/xfrm_user.c
+++ b/net/xfrm/xfrm_user.c
@@ -506,6 +506,7 @@ out:

static void copy_to_user_state(struct xfrm_state *x, struct xfrm_usersa_info *p)
{
+ memset(p, 0, sizeof(*p));
memcpy(&p->id, &x->id, sizeof(p->id));
memcpy(&p->sel, &x->sel, sizeof(p->sel));
memcpy(&p->lft, &x->lft, sizeof(p->lft));
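
The padding leak is easy to reproduce in miniature. The sketch below uses a hypothetical two-field struct (demo_info), not the real struct xfrm_usersa_info, and assumes the usual compiler behaviour of leaving padding bytes untouched when a member is assigned. It zeroes the whole record before filling it, then inspects the gap between the two fields byte by byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical record: most ABIs insert padding after 'family' so that
 * 'bytes' is naturally aligned (7 bytes on amd64, 3 on i386).
 */
struct demo_info {
    uint8_t  family;
    uint64_t bytes;
};

/* Fill the record, clearing padding first so no stale memory leaks out. */
void fill_demo_info(struct demo_info *p, uint8_t family, uint64_t bytes)
{
    assert(p != NULL);
    memset(p, 0, sizeof(*p));
    p->family = family;
    p->bytes = bytes;
}

/* Count non-zero bytes in the gap between 'family' and 'bytes'. */
size_t count_dirty_padding(const struct demo_info *p)
{
    const unsigned char *raw = (const unsigned char *)p;
    size_t start = offsetof(struct demo_info, family) + sizeof(p->family);
    size_t end = offsetof(struct demo_info, bytes);
    size_t dirty = 0;

    for (size_t i = start; i < end; i++)
        if (raw[i] != 0)
            dirty++;
    return dirty;
}
```

Without the memset(), whatever happened to be in the buffer beforehand would sit in the gap and be copied out along with the real fields.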

Willy Tarreau

Jun 4, 2013, 7:10:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

Linux iSCSI offload

From: Eddie Wai <eddi...@broadcom.com>

commit d6532207116307eb7ecbfa7b9e02c53230096a50 upstream.

This patch fixes the following kernel panic, caused by uninitialized fields
in the chip initialization for the 1G bnx2 iSCSI offload.

One of the bits in the chip initialization is used by the latest
firmware to control overflow packets. When this control bit is enabled
erroneously, it ultimately results in a bad packet placement, which
causes the bnx2 driver to dereference a NULL pointer in the placement handler.

This can happen under certain stress I/O conditions during Linux
iSCSI offload operation.

This change only affects Broadcom's 5709 chipset.

Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
[<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5
Pid: 0, comm: swapper Tainted: G ---- 2.6.18-333.el5debug #2
RIP: 0010:[<ffffffff881f0e7d>] [<ffffffff881f0e7d>] :bnx2:bnx2_poll_work+0xd0d/0x13c5
RSP: 0018:ffff8101b575bd50 EFLAGS: 00010216
RAX: 0000000000000005 RBX: ffff81007c5fb180 RCX: 0000000000000000
RDX: 0000000000000ffc RSI: 00000000817e8000 RDI: 0000000000000220
RBP: ffff81015bbd7ec0 R08: ffff8100817e9000 R09: 0000000000000000
R10: ffff81007c5fb180 R11: 00000000000000c8 R12: 000000007a25a010
R13: 0000000000000000 R14: 0000000000000005 R15: ffff810159f80558
FS: 0000000000000000(0000) GS:ffff8101afebc240(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000008 CR3: 0000000000201000 CR4: 00000000000006a0
Process swapper (pid: 0, threadinfo ffff8101b5754000, task ffff8101afebd820)
Stack: 000000000000000b ffff810159f80000 0000000000000040 ffff810159f80520
ffff810159f80500 00cf00cf8008e84b ffffc200100939e0 ffff810009035b20
0000502900000000 000000be00000001 ffff8100817e7810 00d08101b575bea8
Call Trace:
<IRQ> [<ffffffff8008e0d0>] show_schedstat+0x1c2/0x25b
[<ffffffff881f1886>] :bnx2:bnx2_poll+0xf6/0x231
[<ffffffff8000c9b9>] net_rx_action+0xac/0x1b1
[<ffffffff800125a0>] __do_softirq+0x89/0x133
[<ffffffff8005e30c>] call_softirq+0x1c/0x28
[<ffffffff8006d5de>] do_softirq+0x2c/0x7d
[<ffffffff8006d46e>] do_IRQ+0xee/0xf7
[<ffffffff8005d625>] ret_from_intr+0x0/0xa
<EOI> [<ffffffff801a5780>] acpi_processor_idle_simple+0x1c5/0x341
[<ffffffff801a573d>] acpi_processor_idle_simple+0x182/0x341
[<ffffffff801a55bb>] acpi_processor_idle_simple+0x0/0x341
[<ffffffff80049560>] cpu_idle+0x95/0xb8
[<ffffffff80078b1c>] start_secondary+0x479/0x488

Signed-off-by: Eddie Wai <eddi...@broadcom.com>
Reviewed-by: Mike Christie <mich...@cs.wisc.edu>
Signed-off-by: James Bottomley <JBott...@Parallels.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/scsi/bnx2i/bnx2i_hwi.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/scsi/bnx2i/bnx2i_hwi.c b/drivers/scsi/bnx2i/bnx2i_hwi.c
index 5c8d763..1ab55d6 100644
--- a/drivers/scsi/bnx2i/bnx2i_hwi.c
+++ b/drivers/scsi/bnx2i/bnx2i_hwi.c
@@ -1156,6 +1156,9 @@ int bnx2i_send_fw_iscsi_init_msg(struct bnx2i_hba *hba)
int rc = 0;
u64 mask64;

+ memset(&iscsi_init, 0x00, sizeof(struct iscsi_kwqe_init1));
+ memset(&iscsi_init2, 0x00, sizeof(struct iscsi_kwqe_init2));
+
bnx2i_adjust_qp_size(hba);

iscsi_init.flags =

Willy Tarreau

Jun 4, 2013, 7:10:04 PM

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

unlinks

From: Alan Stern <st...@rowland.harvard.edu>

commit 004c19682884d4f40000ce1ded53f4a1d0b18206 upstream

This patch (as1477) fixes a problem affecting a few types of EHCI
controller. Contrary to what one might expect, these controllers
automatically stop their internal frame counter when no ports are
enabled. Since ehci-hcd currently relies on the frame counter for
determining when it should unlink QHs from the async schedule, those
controllers run into trouble: The frame counter stops and the QHs
never get unlinked.

Some systems have also experienced other problems traced back to
commit b963801164618e25fbdc0cd452ce49c3628b46c8 (USB: ehci-hcd unlink
speedups), which made the original switch from using the system clock
to using the frame counter. It never became clear what the reason was
for these problems, but evidently it is related to use of the frame
counter.

To fix all these problems, this patch more or less reverts that commit
and goes back to using the system clock. But this can't be done
cleanly because other changes have since been made to the scan_async()
subroutine. One of these changes involved the tricky logic that tries
to avoid rescanning QHs that have already been seen when the scanning
loop is restarted, which happens whenever an URB is given back.
Switching back to clock-based unlinks would make this logic even more
complicated.
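
To make the clock-based deadline concrete, here is a small userspace sketch of the arithmetic. The demo_* helpers are hypothetical. They mirror the DIV_ROUND_UP(HZ, 200) + 1 delay and the wraparound-safe comparison that time_after_eq() performs on jiffies, and they leave out the !ehci->reclaim condition from the real code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Jiffies to wait before unlinking an idle QH for a given tick rate:
 * hz/200 rounded up (about 5 ms), plus one extra jiffy.
 */
unsigned long demo_shrink_jiffies(unsigned long hz)
{
    assert(hz > 0);
    return (hz + 200 - 1) / 200 + 1;   /* DIV_ROUND_UP(hz, 200) + 1 */
}

/*
 * True if time 'a' is at or after time 'b'. The signed difference keeps
 * the result correct when the counter wraps (relies on the usual
 * two's-complement conversion, as the kernel does).
 */
bool demo_time_after_eq(unsigned long a, unsigned long b)
{
    return (long)(a - b) >= 0;
}

/* An idle QH may be unlinked once its deadline has passed, or at once
 * if the controller is stopped. */
bool demo_may_unlink(unsigned long now, unsigned long unlink_time,
                     bool stopped)
{
    return stopped || demo_time_after_eq(now, unlink_time);
}
```

Because the deadline comes from the system clock rather than the controller's frame index, it keeps advancing even on hardware that halts its frame counter when no ports are enabled.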

Therefore the new code doesn't rescan the entire async list whenever a
giveback occurs. Instead it rescans only the current QH and continues
on from there. This requires the use of a separate pointer to keep
track of the next QH to scan, since the current QH may be unlinked
while the scanning is in progress. That new pointer must be global,
so that it can be adjusted forward whenever the _next_ QH gets
unlinked. (uhci-hcd uses this same trick.)
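
The next-pointer trick can be sketched on an ordinary singly linked list. Everything below (demo_node, demo_list, demo_unlink, demo_scan, demo_unlink_successor_of_first) is a hypothetical model written for illustration, not the ehci-hcd code. The scan cursor lives in the shared list context so that the unlink routine can move it forward when the node it points at is removed.

```c
#include <assert.h>
#include <stddef.h>

struct demo_node {
    int id;
    struct demo_node *next;
};

struct demo_list {
    struct demo_node head;        /* dummy head, in the role of ehci->async */
    struct demo_node *scan_next;  /* next node the scan will visit */
};

typedef void (*demo_visit_fn)(struct demo_list *l, struct demo_node *n);

/* Unlink n; if the scan was about to visit n, advance the cursor past it. */
void demo_unlink(struct demo_list *l, struct demo_node *n)
{
    struct demo_node *prev;

    assert(l != NULL && n != NULL);
    prev = &l->head;
    while (prev->next && prev->next != n)
        prev = prev->next;
    if (prev->next != n)
        return;                   /* not on the list */

    prev->next = n->next;
    if (l->scan_next == n)
        l->scan_next = n->next;
}

/* Visit every node once, tolerating unlinks of upcoming nodes.
 * Returns the number of nodes visited. */
int demo_scan(struct demo_list *l, demo_visit_fn visit)
{
    int visited = 0;

    assert(l != NULL && visit != NULL);
    l->scan_next = l->head.next;
    while (l->scan_next) {
        struct demo_node *n = l->scan_next;

        l->scan_next = n->next;   /* remember the successor first */
        visit(l, n);              /* may unlink other nodes */
        visited++;
    }
    return visited;
}

/* Sample callback: while visiting node 1, unlink its successor. */
void demo_unlink_successor_of_first(struct demo_list *l, struct demo_node *n)
{
    if (n->id == 1 && n->next)
        demo_unlink(l, n->next);
}
```

With a three-node list 1 -> 2 -> 3 and the sample callback, the scan visits nodes 1 and 3 only: node 2 is unlinked while node 1 is being processed, and the cursor is moved on to node 3 instead of following a stale pointer.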

Simplification of the scanning loop removes a level of indentation,
which accounts for the size of the patch. The amount of code changed
is relatively small, and it isn't exactly a reversion of the
b963801164 commit.

This fixes Bugzilla #32432.

Signed-off-by: Alan Stern <st...@rowland.harvard.edu>
CC: <sta...@kernel.org>
Tested-by: Matej Kenda <mate...@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>
Signed-off-by: Thomas Bork <t...@eisfair.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/host/ehci-hcd.c | 8 ++---
drivers/usb/host/ehci-q.c | 82 ++++++++++++++++++++++-----------------------
drivers/usb/host/ehci.h | 3 +-
3 files changed, 45 insertions(+), 48 deletions(-)

diff --git a/drivers/usb/host/ehci-hcd.c b/drivers/usb/host/ehci-hcd.c
index 7b2e99c..8d17f780 100644
--- a/drivers/usb/host/ehci-hcd.c
+++ b/drivers/usb/host/ehci-hcd.c
@@ -84,7 +84,8 @@ static const char hcd_name [] = "ehci_hcd";
#define EHCI_IAA_MSECS 10 /* arbitrary */
#define EHCI_IO_JIFFIES (HZ/10) /* io watchdog > irq_thresh */
#define EHCI_ASYNC_JIFFIES (HZ/20) /* async idle timeout */
-#define EHCI_SHRINK_FRAMES 5 /* async qh unlink delay */
+#define EHCI_SHRINK_JIFFIES (DIV_ROUND_UP(HZ, 200) + 1)
+ /* 200-ms async qh unlink delay */

/* Initial IRQ latency: faster than hw default */
static int log2_irq_thresh = 0; // 0 to 6
@@ -139,10 +140,7 @@ timer_action(struct ehci_hcd *ehci, enum ehci_timer_action action)
break;
/* case TIMER_ASYNC_SHRINK: */
default:
- /* add a jiffie since we synch against the
- * 8 KHz uframe counter.
- */
- t = DIV_ROUND_UP(EHCI_SHRINK_FRAMES * HZ, 1000) + 1;
+ t = EHCI_SHRINK_JIFFIES;
break;
}
mod_timer(&ehci->watchdog, t + jiffies);
diff --git a/drivers/usb/host/ehci-q.c b/drivers/usb/host/ehci-q.c
index 0ee5b4b..3b8fa18 100644
--- a/drivers/usb/host/ehci-q.c
+++ b/drivers/usb/host/ehci-q.c
@@ -1204,6 +1204,8 @@ static void start_unlink_async (struct ehci_hcd *ehci, struct ehci_qh *qh)

prev->hw->hw_next = qh->hw->hw_next;
prev->qh_next = qh->qh_next;
+ if (ehci->qh_scan_next == qh)
+ ehci->qh_scan_next = qh->qh_next.qh;
wmb ();

/* If the controller isn't running, we don't have to wait for it */
@@ -1229,53 +1231,49 @@ static void scan_async (struct ehci_hcd *ehci)
struct ehci_qh *qh;
enum ehci_timer_action action = TIMER_IO_WATCHDOG;

- ehci->stamp = ehci_readl(ehci, &ehci->regs->frame_index);
timer_action_done (ehci, TIMER_ASYNC_SHRINK);
-rescan:
stopped = !HC_IS_RUNNING(ehci_to_hcd(ehci)->state);
- qh = ehci->async->qh_next.qh;
- if (likely (qh != NULL)) {
- do {
- /* clean any finished work for this qh */
- if (!list_empty(&qh->qtd_list) && (stopped ||
- qh->stamp != ehci->stamp)) {
- int temp;
-
- /* unlinks could happen here; completion
- * reporting drops the lock. rescan using
- * the latest schedule, but don't rescan
- * qhs we already finished (no looping)
- * unless the controller is stopped.
- */
- qh = qh_get (qh);
- qh->stamp = ehci->stamp;
- temp = qh_completions (ehci, qh);
- if (qh->needs_rescan)
- unlink_async(ehci, qh);
- qh_put (qh);
- if (temp != 0) {
- goto rescan;
- }
- }

- /* unlink idle entries, reducing DMA usage as well
- * as HCD schedule-scanning costs. delay for any qh
- * we just scanned, there's a not-unusual case that it
- * doesn't stay idle for long.
- * (plus, avoids some kind of re-activation race.)
+ ehci->qh_scan_next = ehci->async->qh_next.qh;
+ while (ehci->qh_scan_next) {
+ qh = ehci->qh_scan_next;
+ ehci->qh_scan_next = qh->qh_next.qh;
+ rescan:
+ /* clean any finished work for this qh */
+ if (!list_empty(&qh->qtd_list)) {
+ int temp;
+
+ /*
+ * Unlinks could happen here; completion reporting
+ * drops the lock. That's why ehci->qh_scan_next
+ * always holds the next qh to scan; if the next qh
+ * gets unlinked then ehci->qh_scan_next is adjusted
+ * in start_unlink_async().
*/
- if (list_empty(&qh->qtd_list)
- && qh->qh_state == QH_STATE_LINKED) {
- if (!ehci->reclaim && (stopped ||
- ((ehci->stamp - qh->stamp) & 0x1fff)
- >= EHCI_SHRINK_FRAMES * 8))
- start_unlink_async(ehci, qh);
- else
- action = TIMER_ASYNC_SHRINK;
- }
+ qh = qh_get(qh);
+ temp = qh_completions(ehci, qh);
+ if (qh->needs_rescan)
+ unlink_async(ehci, qh);
+ qh->unlink_time = jiffies + EHCI_SHRINK_JIFFIES;
+ qh_put(qh);
+ if (temp != 0)
+ goto rescan;
+ }

- qh = qh->qh_next.qh;
- } while (qh);
+ /* unlink idle entries, reducing DMA usage as well
+ * as HCD schedule-scanning costs. delay for any qh
+ * we just scanned, there's a not-unusual case that it
+ * doesn't stay idle for long.
+ * (plus, avoids some kind of re-activation race.)
+ */
+ if (list_empty(&qh->qtd_list)
+ && qh->qh_state == QH_STATE_LINKED) {
+ if (!ehci->reclaim && (stopped ||
+ time_after_eq(jiffies, qh->unlink_time)))
+ start_unlink_async(ehci, qh);
+ else
+ action = TIMER_ASYNC_SHRINK;
+ }
}
if (action == TIMER_ASYNC_SHRINK)
timer_action (ehci, TIMER_ASYNC_SHRINK);
diff --git a/drivers/usb/host/ehci.h b/drivers/usb/host/ehci.h
index 5b3ca74..b2b3416 100644
--- a/drivers/usb/host/ehci.h
+++ b/drivers/usb/host/ehci.h
@@ -74,6 +74,7 @@ struct ehci_hcd { /* one per controller */
/* async schedule support */
struct ehci_qh *async;
struct ehci_qh *reclaim;
+ struct ehci_qh *qh_scan_next;
unsigned scanning : 1;

/* periodic schedule support */
@@ -116,7 +117,6 @@ struct ehci_hcd { /* one per controller */
struct timer_list iaa_watchdog;
struct timer_list watchdog;
unsigned long actions;
- unsigned stamp;
unsigned random_frame;
unsigned long next_statechange;
ktime_t last_periodic_enable;
@@ -335,6 +335,7 @@ struct ehci_qh {
struct ehci_qh *reclaim; /* next to reclaim */

struct ehci_hcd *ehci;
+ unsigned long unlink_time;

/*
* Do NOT use atomic operations for QH refcounting. On some CPUs