[PATCH 3.10 288/319] ratelimit: fix bug in time interval by resetting right begin time

Feb 5, 2017, 2:30:09 PM2/5/17

to

From: Jeff Mahoney <je...@suse.com>

commit 0a11b9aae49adf1f952427ef1a1d9e793dd6ffb6 upstream.

new_insert_key only makes any sense when it's associated with a
new_insert_ptr, which is initialized to NULL and changed to a
buffer_head when we also initialize new_insert_key. We can key off of
that to avoid the uninitialized warning.

Link: http://lkml.kernel.org/r/5eca5ffb-2155-8df2...@suse.com
Signed-off-by: Jeff Mahoney <je...@suse.com>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Jan Kara <ja...@suse.cz>
Cc: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/reiserfs/ibalance.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/reiserfs/ibalance.c b/fs/reiserfs/ibalance.c
index e1978fd..58cce0c 100644
--- a/fs/reiserfs/ibalance.c
+++ b/fs/reiserfs/ibalance.c
@@ -1082,8 +1082,9 @@ int balance_internal(struct tree_balance *tb, /* tree_balance structure
insert_ptr);
}

- memcpy(new_insert_key_addr, &new_insert_key, KEY_SIZE);
insert_ptr[0] = new_insert_ptr;
+ if (new_insert_ptr)
+ memcpy(new_insert_key_addr, &new_insert_key, KEY_SIZE);

return order;
}
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:09 PM2/5/17

to

From: Eric Dumazet <edum...@google.com>

commit 346da62cc186c4b4b1ac59f87f4482b47a047388 upstream.

Andrey reported following warning while fuzzing with syzkaller

WARNING: CPU: 1 PID: 21072 at net/dccp/proto.c:83 dccp_set_state+0x229/0x290
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 21072 Comm: syz-executor Not tainted 4.9.0-rc1+ #293
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
ffff88003d4c7738 ffffffff81b474f4 0000000000000003 dffffc0000000000
ffffffff844f8b00 ffff88003d4c7804 ffff88003d4c7800 ffffffff8140c06a
0000000041b58ab3 ffffffff8479ab7d ffffffff8140beae ffffffff8140cd00
Call Trace:
[< inline >] __dump_stack lib/dump_stack.c:15
[<ffffffff81b474f4>] dump_stack+0xb3/0x10f lib/dump_stack.c:51
[<ffffffff8140c06a>] panic+0x1bc/0x39d kernel/panic.c:179
[<ffffffff8111125c>] __warn+0x1cc/0x1f0 kernel/panic.c:542
[<ffffffff8111144c>] warn_slowpath_null+0x2c/0x40 kernel/panic.c:585
[<ffffffff8389e5d9>] dccp_set_state+0x229/0x290 net/dccp/proto.c:83
[<ffffffff838a0aa2>] dccp_close+0x612/0xc10 net/dccp/proto.c:1016
[<ffffffff8316bf1f>] inet_release+0xef/0x1c0 net/ipv4/af_inet.c:415
[<ffffffff82b6e89e>] sock_release+0x8e/0x1d0 net/socket.c:570
[<ffffffff82b6e9f6>] sock_close+0x16/0x20 net/socket.c:1017
[<ffffffff815256ad>] __fput+0x29d/0x720 fs/file_table.c:208
[<ffffffff81525bb5>] ____fput+0x15/0x20 fs/file_table.c:244
[<ffffffff811727d8>] task_work_run+0xf8/0x170 kernel/task_work.c:116
[< inline >] exit_task_work include/linux/task_work.h:21
[<ffffffff8111bc53>] do_exit+0x883/0x2ac0 kernel/exit.c:828
[<ffffffff811221fe>] do_group_exit+0x10e/0x340 kernel/exit.c:931
[<ffffffff81143c94>] get_signal+0x634/0x15a0 kernel/signal.c:2307
[<ffffffff81054aad>] do_signal+0x8d/0x1a30 arch/x86/kernel/signal.c:807
[<ffffffff81003a05>] exit_to_usermode_loop+0xe5/0x130
arch/x86/entry/common.c:156
[< inline >] prepare_exit_to_usermode arch/x86/entry/common.c:190
[<ffffffff81006298>] syscall_return_slowpath+0x1a8/0x1e0
arch/x86/entry/common.c:259
[<ffffffff83fc1a62>] entry_SYSCALL_64_fastpath+0xc0/0xc2
Dumping ftrace buffer:
(ftrace buffer empty)
Kernel Offset: disabled

Fix this the same way we did for TCP in commit 565b7b2d2e63
("tcp: do not send reset to already closed sockets")

Signed-off-by: Eric Dumazet <edum...@google.com>
Reported-by: Andrey Konovalov <andre...@google.com>
Tested-by: Andrey Konovalov <andre...@google.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/dccp/proto.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/net/dccp/proto.c b/net/dccp/proto.c
index 6c7c78b8..cb55fb9 100644
--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -1012,6 +1012,10 @@ void dccp_close(struct sock *sk, long timeout)
__kfree_skb(skb);
}

+ /* If socket has been already reset kill it. */
+ if (sk->sk_state == DCCP_CLOSED)
+ goto adjudge_to_death;
+
if (data_was_unread) {
/* Unread data was tossed, send an appropriate Reset Code */
DCCP_WARN("ABORT with %u bytes unread\n", data_was_unread);
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:09 PM2/5/17

to

From: Dmitry Torokhov <dmitry....@gmail.com>

commit b27c0d0c3bf3073e8ae19875eb1d3755c5e8c072 upstream.

"calibrate" attribute does not provide "show" methods and thus we should
not mark it as readable.

Signed-off-by: Dmitry Torokhov <dmitry....@gmail.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/input/touchscreen/ili210x.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/input/touchscreen/ili210x.c b/drivers/input/touchscreen/ili210x.c
index 1418bdd..ceaa790 100644
--- a/drivers/input/touchscreen/ili210x.c
+++ b/drivers/input/touchscreen/ili210x.c
@@ -169,7 +169,7 @@ static ssize_t ili210x_calibrate(struct device *dev,

return count;
}
-static DEVICE_ATTR(calibrate, 0644, NULL, ili210x_calibrate);
+static DEVICE_ATTR(calibrate, S_IWUSR, NULL, ili210x_calibrate);

static struct attribute *ili210x_attributes[] = {
&dev_attr_calibrate.attr,
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:09 PM2/5/17

to

From: Felipe Balbi <felipe...@linux.intel.com>

commit c7de573471832dff7d31f0c13b0f143d6f017799 upstream.

When using SG lists, we would end up setting
request->actual to:

num_mapped_sgs * (request->length - count)

Let's fix that up by incrementing request->actual
only once.

Reported-by: Brian E Rogers <brian.e...@intel.com>
Signed-off-by: Felipe Balbi <felipe...@linux.intel.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/dwc3/gadget.c | 19 +++++++++++--------
1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 6e70c88..0dfee61 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1802,14 +1802,6 @@ static int __dwc3_cleanup_done_trbs(struct dwc3 *dwc, struct dwc3_ep *dep,
s_pkt = 1;
}

- /*
- * We assume here we will always receive the entire data block
- * which we should receive. Meaning, if we program RX to
- * receive 4K but we receive only 2K, we assume that's all we
- * should receive and we simply bounce the request back to the
- * gadget driver for further processing.
- */
- req->request.actual += req->request.length - count;
if (s_pkt)
return 1;
if ((event->status & DEPEVT_STATUS_LST) &&
@@ -1829,6 +1821,7 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep,
struct dwc3_trb *trb;
unsigned int slot;
unsigned int i;
+ int count = 0;
int ret;

do {
@@ -1845,6 +1838,8 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep,
slot++;
slot %= DWC3_TRB_NUM;
trb = &dep->trb_pool[slot];
+ count += trb->size & DWC3_TRB_SIZE_MASK;
+

ret = __dwc3_cleanup_done_trbs(dwc, dep, req, trb,
event, status);
@@ -1852,6 +1847,14 @@ static int dwc3_cleanup_done_reqs(struct dwc3 *dwc, struct dwc3_ep *dep,
break;
}while (++i < req->request.num_mapped_sgs);

+ /*
+ * We assume here we will always receive the entire data block
+ * which we should receive. Meaning, if we program RX to
+ * receive 4K but we receive only 2K, we assume that's all we
+ * should receive and we simply bounce the request back to the
+ * gadget driver for further processing.
+ */
+ req->request.actual += req->request.length - count;
dwc3_gadget_giveback(dep, req, status);

if (ret)
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:09 PM2/5/17

to

From: Steffen Maier <ma...@linux.vnet.ibm.com>

commit 771bf03537ddfa4a4dde62ef9dfbc82e4f77ab20 upstream.

With commit 2c55b750a884b86dea8b4cc5f15e1484cc47a25c
("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
we lost the N_Port-ID where an ELS response comes from.
With commit 7c7dc196814b9e1d5cc254dc579a5fa78ae524f7
("[SCSI] zfcp: Simplify handling of ct and els requests")
we lost the N_Port-ID where a CT response comes from.
It's especially useful if the request SAN trace record
with D_ID was already lost due to trace buffer wrap.

GS uses an open WKA port handle and ELS just a D_ID, and
only for ELS we could get D_ID from QTCB bottom via zfcp_fsf_req.
To cover both cases, add a new field to zfcp_fsf_ct_els
and fill it in on request to use in SAN response trace.
Strictly speaking the D_ID on SAN response is the FC frame's S_ID.
We don't need a field for the other end which is always us.

Signed-off-by: Steffen Maier <ma...@linux.vnet.ibm.com>
Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.")
Fixes: 7c7dc196814b ("[SCSI] zfcp: Simplify handling of ct and els requests")
Reviewed-by: Benjamin Block <bbl...@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <ha...@suse.com>
Signed-off-by: Martin K. Petersen <martin....@oracle.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/s390/scsi/zfcp_dbf.c | 2 +-
drivers/s390/scsi/zfcp_fsf.c | 2 ++
drivers/s390/scsi/zfcp_fsf.h | 4 +++-
3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/s390/scsi/zfcp_dbf.c b/drivers/s390/scsi/zfcp_dbf.c
index 1abe57d..a940602 100644
--- a/drivers/s390/scsi/zfcp_dbf.c
+++ b/drivers/s390/scsi/zfcp_dbf.c
@@ -400,7 +400,7 @@ void zfcp_dbf_san_res(char *tag, struct zfcp_fsf_req *fsf)

length = (u16)(ct_els->resp->length + FC_CT_HDR_LEN);
zfcp_dbf_san(tag, dbf, sg_virt(ct_els->resp), ZFCP_DBF_SAN_RES, length,
- fsf->req_id, 0);
+ fsf->req_id, ct_els->d_id);
}

/**
diff --git a/drivers/s390/scsi/zfcp_fsf.c b/drivers/s390/scsi/zfcp_fsf.c
index 8898139..f246097 100644
--- a/drivers/s390/scsi/zfcp_fsf.c
+++ b/drivers/s390/scsi/zfcp_fsf.c
@@ -1085,6 +1085,7 @@ int zfcp_fsf_send_ct(struct zfcp_fc_wka_port *wka_port,

req->handler = zfcp_fsf_send_ct_handler;
req->qtcb->header.port_handle = wka_port->handle;
+ ct->d_id = wka_port->d_id;
req->data = ct;

zfcp_dbf_san_req("fssct_1", req, wka_port->d_id);
@@ -1188,6 +1189,7 @@ int zfcp_fsf_send_els(struct zfcp_adapter *adapter, u32 d_id,

hton24(req->qtcb->bottom.support.d_id, d_id);
req->handler = zfcp_fsf_send_els_handler;
+ els->d_id = d_id;
req->data = els;

zfcp_dbf_san_req("fssels1", req, d_id);
diff --git a/drivers/s390/scsi/zfcp_fsf.h b/drivers/s390/scsi/zfcp_fsf.h
index 5e795b8..8cad41f 100644
--- a/drivers/s390/scsi/zfcp_fsf.h
+++ b/drivers/s390/scsi/zfcp_fsf.h
@@ -3,7 +3,7 @@
*
* Interface to the FSF support functions.
*
- * Copyright IBM Corp. 2002, 2010
+ * Copyright IBM Corp. 2002, 2015
*/

#ifndef FSF_H
@@ -462,6 +462,7 @@ struct zfcp_blk_drv_data {
* @handler_data: data passed to handler function
* @port: Optional pointer to port for zfcp internal ELS (only test link ADISC)
* @status: used to pass error status to calling function
+ * @d_id: Destination ID of either open WKA port for CT or of D_ID for ELS
*/
struct zfcp_fsf_ct_els {
struct scatterlist *req;
@@ -470,6 +471,7 @@ struct zfcp_fsf_ct_els {
void *handler_data;
struct zfcp_port *port;
int status;
+ u32 d_id;
};

#endif /* FSF_H */
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:09 PM2/5/17

to

From: Vegard Nossum <vegard...@oracle.com>

commit 11749e086b2766cccf6217a527ef5c5604ba069c upstream.

I got this with syzkaller:

==================================================================
BUG: KASAN: null-ptr-deref on address 0000000000000020
Read of size 32 by task syz-executor/22519
CPU: 1 PID: 22519 Comm: syz-executor Not tainted 4.8.0-rc2+ #169
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2
014
0000000000000001 ffff880111a17a00 ffffffff81f9f141 ffff880111a17a90
ffff880111a17c50 ffff880114584a58 ffff880114584a10 ffff880111a17a80
ffffffff8161fe3f ffff880100000000 ffff880118d74a48 ffff880118d74a68
Call Trace:
[<ffffffff81f9f141>] dump_stack+0x83/0xb2
[<ffffffff8161fe3f>] kasan_report_error+0x41f/0x4c0
[<ffffffff8161ff74>] kasan_report+0x34/0x40
[<ffffffff82c84b54>] ? snd_timer_user_read+0x554/0x790
[<ffffffff8161e79e>] check_memory_region+0x13e/0x1a0
[<ffffffff8161e9c1>] kasan_check_read+0x11/0x20
[<ffffffff82c84b54>] snd_timer_user_read+0x554/0x790
[<ffffffff82c84600>] ? snd_timer_user_info_compat.isra.5+0x2b0/0x2b0
[<ffffffff817d0831>] ? proc_fault_inject_write+0x1c1/0x250
[<ffffffff817d0670>] ? next_tgid+0x2a0/0x2a0
[<ffffffff8127c278>] ? do_group_exit+0x108/0x330
[<ffffffff8174653a>] ? fsnotify+0x72a/0xca0
[<ffffffff81674dfe>] __vfs_read+0x10e/0x550
[<ffffffff82c84600>] ? snd_timer_user_info_compat.isra.5+0x2b0/0x2b0
[<ffffffff81674cf0>] ? do_sendfile+0xc50/0xc50
[<ffffffff81745e10>] ? __fsnotify_update_child_dentry_flags+0x60/0x60
[<ffffffff8143fec6>] ? kcov_ioctl+0x56/0x190
[<ffffffff81e5ada2>] ? common_file_perm+0x2e2/0x380
[<ffffffff81746b0e>] ? __fsnotify_parent+0x5e/0x2b0
[<ffffffff81d93536>] ? security_file_permission+0x86/0x1e0
[<ffffffff816728f5>] ? rw_verify_area+0xe5/0x2b0
[<ffffffff81675355>] vfs_read+0x115/0x330
[<ffffffff81676371>] SyS_read+0xd1/0x1a0
[<ffffffff816762a0>] ? vfs_write+0x4b0/0x4b0
[<ffffffff82001c2c>] ? __this_cpu_preempt_check+0x1c/0x20
[<ffffffff8150455a>] ? __context_tracking_exit.part.4+0x3a/0x1e0
[<ffffffff816762a0>] ? vfs_write+0x4b0/0x4b0
[<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
[<ffffffff810052fc>] ? syscall_return_slowpath+0x16c/0x1d0
[<ffffffff83c3276a>] entry_SYSCALL64_slow_path+0x25/0x25
==================================================================

There are a couple of problems that I can see:

- ioctl(SNDRV_TIMER_IOCTL_SELECT), which potentially sets
tu->queue/tu->tqueue to NULL on memory allocation failure, so read()
would get a NULL pointer dereference like the above splat

- the same ioctl() can free tu->queue/to->tqueue which means read()
could potentially see (and dereference) the freed pointer

We can fix both by taking the ioctl_lock mutex when dereferencing
->queue/->tqueue, since that's always held over all the ioctl() code.

Just looking at the code I find it likely that there are more problems
here such as tu->qhead pointing outside the buffer if the size is
changed concurrently using SNDRV_TIMER_IOCTL_PARAMS.

[js] unlock in fail paths

Signed-off-by: Vegard Nossum <vegard...@oracle.com>
Signed-off-by: Takashi Iwai <ti...@suse.de>
Signed-off-by: Jiri Slaby <jsl...@suse.cz>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

sound/core/timer.c | 4 ++++

1 file changed, 4 insertions(+)

diff --git a/sound/core/timer.c b/sound/core/timer.c
index 3476895..f5ddc9b 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -1922,19 +1922,23 @@ static ssize_t snd_timer_user_read(struct file *file, char __user *buffer,
if (err < 0)
goto _error;

+ mutex_lock(&tu->ioctl_lock);
if (tu->tread) {
if (copy_to_user(buffer, &tu->tqueue[tu->qhead++],
sizeof(struct snd_timer_tread))) {
+ mutex_unlock(&tu->ioctl_lock);
err = -EFAULT;
goto _error;
}
} else {
if (copy_to_user(buffer, &tu->queue[tu->qhead++],
sizeof(struct snd_timer_read))) {
+ mutex_unlock(&tu->ioctl_lock);
err = -EFAULT;
goto _error;
}
}
+ mutex_unlock(&tu->ioctl_lock);

tu->qhead %= tu->queue_size;

--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:09 PM2/5/17

to

From: Vladimir Zapolskiy <vladimir_...@mentor.com>

commit 147b36d5b70c083cc76770c47d60b347e8eaf231 upstream.

Race condition between registering an I2C device driver and
deregistering an I2C adapter device which is assumed to manage that
I2C device may lead to a NULL pointer dereference due to the
uninitialized list head of driver clients.

The root cause of the issue is that the I2C bus may know about the
registered device driver and thus it is matched by bus_for_each_drv(),
but the list of clients is not initialized and commonly it is NULL,
because I2C device drivers define struct i2c_driver as static and
clients field is expected to be initialized by I2C core:

i2c_register_driver() i2c_del_adapter()
driver_register() ...
bus_add_driver() ...
... bus_for_each_drv(..., __process_removed_adapter)
... i2c_do_del_adapter()
... list_for_each_entry_safe(..., &driver->clients, ...)
INIT_LIST_HEAD(&driver->clients);

To solve the problem it is sufficient to do clients list head
initialization before calling driver_register().

The problem was found while using an I2C device driver with a sluggish
registration routine on a bus provided by a physically detachable I2C
master controller, but practically the oops may be reproduced under
the race between arbitraty I2C device driver registration and managing
I2C bus device removal e.g. by unbinding the latter over sysfs:

% echo 21a4000.i2c > /sys/bus/platform/drivers/imx-i2c/unbind
Unable to handle kernel NULL pointer dereference at virtual address 00000000
Internal error: Oops: 17 [#1] SMP ARM
CPU: 2 PID: 533 Comm: sh Not tainted 4.9.0-rc3+ #61
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
task: e5ada400 task.stack: e4936000
PC is at i2c_do_del_adapter+0x20/0xcc
LR is at __process_removed_adapter+0x14/0x1c
Flags: NzCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
Control: 10c5387d Table: 35bd004a DAC: 00000051
Process sh (pid: 533, stack limit = 0xe4936210)
Stack: (0xe4937d28 to 0xe4938000)
Backtrace:
[<c0667be0>] (i2c_do_del_adapter) from [<c0667cc0>] (__process_removed_adapter+0x14/0x1c)
[<c0667cac>] (__process_removed_adapter) from [<c0516998>] (bus_for_each_drv+0x6c/0xa0)
[<c051692c>] (bus_for_each_drv) from [<c06685ec>] (i2c_del_adapter+0xbc/0x284)
[<c0668530>] (i2c_del_adapter) from [<bf0110ec>] (i2c_imx_remove+0x44/0x164 [i2c_imx])
[<bf0110a8>] (i2c_imx_remove [i2c_imx]) from [<c051a838>] (platform_drv_remove+0x2c/0x44)
[<c051a80c>] (platform_drv_remove) from [<c05183d8>] (__device_release_driver+0x90/0x12c)
[<c0518348>] (__device_release_driver) from [<c051849c>] (device_release_driver+0x28/0x34)
[<c0518474>] (device_release_driver) from [<c0517150>] (unbind_store+0x80/0x104)
[<c05170d0>] (unbind_store) from [<c0516520>] (drv_attr_store+0x28/0x34)
[<c05164f8>] (drv_attr_store) from [<c0298acc>] (sysfs_kf_write+0x50/0x54)
[<c0298a7c>] (sysfs_kf_write) from [<c029801c>] (kernfs_fop_write+0x100/0x214)
[<c0297f1c>] (kernfs_fop_write) from [<c0220130>] (__vfs_write+0x34/0x120)
[<c02200fc>] (__vfs_write) from [<c0221088>] (vfs_write+0xa8/0x170)
[<c0220fe0>] (vfs_write) from [<c0221e74>] (SyS_write+0x4c/0xa8)
[<c0221e28>] (SyS_write) from [<c0108a20>] (ret_fast_syscall+0x0/0x1c)

Signed-off-by: Vladimir Zapolskiy <vladimir_...@mentor.com>
Signed-off-by: Wolfram Sang <w...@the-dreams.de>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/i2c/i2c-core.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/i2c/i2c-core.c b/drivers/i2c/i2c-core.c
index 9d539cb..c0e4143 100644
--- a/drivers/i2c/i2c-core.c
+++ b/drivers/i2c/i2c-core.c
@@ -1323,6 +1323,7 @@ int i2c_register_driver(struct module *owner, struct i2c_driver *driver)
/* add the driver to the list of i2c drivers in the driver core */
driver->driver.owner = owner;
driver->driver.bus = &i2c_bus_type;
+ INIT_LIST_HEAD(&driver->clients);

/* When registration returns, the driver core
* will have called probe() for all matching-but-unbound devices.
@@ -1341,7 +1342,6 @@ int i2c_register_driver(struct module *owner, struct i2c_driver *driver)

pr_debug("i2c-core: driver [%s] registered\n", driver->driver.name);

- INIT_LIST_HEAD(&driver->clients);
/* Walk the adapters that are already present */
i2c_for_each_dev(driver, __process_new_driver);

--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:09 PM2/5/17

to

From: Alex Vesker <va...@mellanox.com>

commit 344bacca8cd811809fc33a249f2738ab757d327f upstream.

This fix solves a race between light flush and on the fly joins.
Light flush doesn't set the device to down and unset IPOIB_OPER_UP
flag, this means that if while flushing we have a MC join in progress
and the QP was attached to BC MGID we can have a mismatches when
re-attaching a QP to the BC MGID.

The light flush would set the broadcast group to NULL causing an on
the fly join to rejoin and reattach to the BC MCG as well as adding
the BC MGID to the multicast list. The flush process would later on
remove the BC MGID and detach it from the QP. On the next flush
the BC MGID is present in the multicast list but not found when trying
to detach it because of the previous double attach and single detach.

[18332.714265] ------------[ cut here ]------------
[18332.717775] WARNING: CPU: 6 PID: 3767 at drivers/infiniband/core/verbs.c:280 ib_dealloc_pd+0xff/0x120 [ib_core]
...
[18332.775198] Hardware name: Red Hat KVM, BIOS Bochs 01/01/2011
[18332.779411] 0000000000000000 ffff8800b50dfbb0 ffffffff813fed47 0000000000000000
[18332.784960] 0000000000000000 ffff8800b50dfbf0 ffffffff8109add1 0000011832f58300
[18332.790547] ffff880226a596c0 ffff880032482000 ffff880032482830 ffff880226a59280
[18332.796199] Call Trace:
[18332.798015] [<ffffffff813fed47>] dump_stack+0x63/0x8c
[18332.801831] [<ffffffff8109add1>] __warn+0xd1/0xf0
[18332.805403] [<ffffffff8109aebd>] warn_slowpath_null+0x1d/0x20
[18332.809706] [<ffffffffa025d90f>] ib_dealloc_pd+0xff/0x120 [ib_core]
[18332.814384] [<ffffffffa04f3d7c>] ipoib_transport_dev_cleanup+0xfc/0x1d0 [ib_ipoib]
[18332.820031] [<ffffffffa04ed648>] ipoib_ib_dev_cleanup+0x98/0x110 [ib_ipoib]
[18332.825220] [<ffffffffa04e62c8>] ipoib_dev_cleanup+0x2d8/0x550 [ib_ipoib]
[18332.830290] [<ffffffffa04e656f>] ipoib_uninit+0x2f/0x40 [ib_ipoib]
[18332.834911] [<ffffffff81772a8a>] rollback_registered_many+0x1aa/0x2c0
[18332.839741] [<ffffffff81772bd1>] rollback_registered+0x31/0x40
[18332.844091] [<ffffffff81773b18>] unregister_netdevice_queue+0x48/0x80
[18332.848880] [<ffffffffa04f489b>] ipoib_vlan_delete+0x1fb/0x290 [ib_ipoib]
[18332.853848] [<ffffffffa04df1cd>] delete_child+0x7d/0xf0 [ib_ipoib]
[18332.858474] [<ffffffff81520c08>] dev_attr_store+0x18/0x30
[18332.862510] [<ffffffff8127fe4a>] sysfs_kf_write+0x3a/0x50
[18332.866349] [<ffffffff8127f4e0>] kernfs_fop_write+0x120/0x170
[18332.870471] [<ffffffff81207198>] __vfs_write+0x28/0xe0
[18332.874152] [<ffffffff810e09bf>] ? percpu_down_read+0x1f/0x50
[18332.878274] [<ffffffff81208062>] vfs_write+0xa2/0x1a0
[18332.881896] [<ffffffff812093a6>] SyS_write+0x46/0xa0
[18332.885632] [<ffffffff810039b7>] do_syscall_64+0x57/0xb0
[18332.889709] [<ffffffff81883321>] entry_SYSCALL64_slow_path+0x25/0x25
[18332.894727] ---[ end trace 09ebbe31f831ef17 ]---

Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events")
Signed-off-by: Alex Vesker <va...@mellanox.com>
Signed-off-by: Leon Romanovsky <le...@kernel.org>
Signed-off-by: Doug Ledford <dled...@redhat.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/infiniband/ulp/ipoib/ipoib_ib.c | 9 +++++++++
1 file changed, 9 insertions(+)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
index 2cfa76f..39168d3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
@@ -979,8 +979,17 @@ static void __ipoib_ib_dev_flush(struct ipoib_dev_priv *priv,
}

if (level == IPOIB_FLUSH_LIGHT) {
+ int oper_up;
ipoib_mark_paths_invalid(dev);
+ /* Set IPoIB operation as down to prevent races between:
+ * the flush flow which leaves MCG and on the fly joins
+ * which can happen during that time. mcast restart task
+ * should deal with join requests we missed.
+ */
+ oper_up = test_and_clear_bit(IPOIB_FLAG_OPER_UP, &priv->flags);
ipoib_mcast_dev_flush(dev);
+ if (oper_up)
+ set_bit(IPOIB_FLAG_OPER_UP, &priv->flags);
}

if (level >= IPOIB_FLUSH_NORMAL)
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:10 PM2/5/17

to

From: Dan Carpenter <dan.ca...@oracle.com>

commit f4693b08cc901912a87369c46537b94ed4084ea0 upstream.

We can't assign -EINVAL to a u16.

Fixes: 3948f0e0c999 ('usb: add Freescale QE/CPM USB peripheral controller driver')
Acked-by: Peter Chen <peter...@nxp.com>
Signed-off-by: Dan Carpenter <dan.ca...@oracle.com>
Signed-off-by: Felipe Balbi <felipe...@linux.intel.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/gadget/fsl_qe_udc.c | 7 ++-----
1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/usb/gadget/fsl_qe_udc.c b/drivers/usb/gadget/fsl_qe_udc.c
index 9a7ee33..9fd2330 100644
--- a/drivers/usb/gadget/fsl_qe_udc.c
+++ b/drivers/usb/gadget/fsl_qe_udc.c
@@ -1881,11 +1881,8 @@ static int qe_get_frame(struct usb_gadget *gadget)

tmp = in_be16(&udc->usb_param->frame_n);
if (tmp & 0x8000)
- tmp = tmp & 0x07ff;
- else
- tmp = -EINVAL;
-
- return (int)tmp;
+ return tmp & 0x07ff;
+ return -EINVAL;
}

static int fsl_qe_start(struct usb_gadget *gadget,
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:10 PM2/5/17

to

From: Peter Zijlstra <pet...@infradead.org>

commit 47933ad41a86a4a9b50bed7c9b9bd2ba242aac63 upstream

A number of situations currently require the heavyweight smp_mb(),
even though there is no need to order prior stores against later
loads. Many architectures have much cheaper ways to handle these
situations, but the Linux kernel currently has no portable way
to make use of them.

This commit therefore supplies smp_load_acquire() and
smp_store_release() to remedy this situation. The new
smp_load_acquire() primitive orders the specified load against
any subsequent reads or writes, while the new smp_store_release()
primitive orders the specifed store against any prior reads or
writes. These primitives allow array-based circular FIFOs to be
implemented without an smp_mb(), and also allow a theoretical
hole in rcu_assign_pointer() to be closed at no additional
expense on most architectures.

In addition, the RCU experience transitioning from explicit
smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
and rcu_assign_pointer(), respectively resulted in substantial
improvements in readability. It therefore seems likely that
replacing other explicit barriers with smp_load_acquire() and
smp_store_release() will provide similar benefits. It appears
that roughly half of the explicit barriers in core kernel code
might be so replaced.

[Changelog by PaulMck]

Reviewed-by: "Paul E. McKenney" <pau...@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <pet...@infradead.org>
Acked-by: Will Deacon <will....@arm.com>
Cc: Benjamin Herrenschmidt <be...@kernel.crashing.org>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Cc: Mathieu Desnoyers <mathieu....@polymtl.ca>
Cc: Michael Ellerman <mic...@ellerman.id.au>
Cc: Michael Neuling <mi...@neuling.org>
Cc: Russell King <li...@arm.linux.org.uk>
Cc: Geert Uytterhoeven <ge...@linux-m68k.org>
Cc: Heiko Carstens <heiko.c...@de.ibm.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Martin Schwidefsky <schwi...@de.ibm.com>
Cc: Victor Kaplansky <VIC...@il.ibm.com>
Cc: Tony Luck <tony...@intel.com>
Cc: Oleg Nesterov <ol...@redhat.com>
Link: http://lkml.kernel.org/r/201312131506...@infradead.org
Signed-off-by: Ingo Molnar <mi...@kernel.org>
[wt: only backported to support next patch]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/arm/include/asm/barrier.h | 15 +++++++++++
arch/arm64/include/asm/barrier.h | 50 +++++++++++++++++++++++++++++++++++++
arch/ia64/include/asm/barrier.h | 23 +++++++++++++++++
arch/metag/include/asm/barrier.h | 15 +++++++++++
arch/mips/include/asm/barrier.h | 15 +++++++++++
arch/powerpc/include/asm/barrier.h | 21 +++++++++++++++-
arch/s390/include/asm/barrier.h | 15 +++++++++++
arch/sparc/include/asm/barrier_64.h | 15 +++++++++++
arch/x86/include/asm/barrier.h | 43 ++++++++++++++++++++++++++++++-
include/asm-generic/barrier.h | 15 +++++++++++
include/linux/compiler.h | 9 +++++++
11 files changed, 234 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/barrier.h b/arch/arm/include/asm/barrier.h
index 8dcd9c7..b00ef07 100644
--- a/arch/arm/include/asm/barrier.h
+++ b/arch/arm/include/asm/barrier.h
@@ -59,6 +59,21 @@
#define smp_wmb() dmb()
#endif

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
#define read_barrier_depends() do { } while(0)
#define smp_read_barrier_depends() do { } while(0)

diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h
index d4a6333..78e20ba 100644
--- a/arch/arm64/include/asm/barrier.h
+++ b/arch/arm64/include/asm/barrier.h
@@ -35,10 +35,60 @@
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
#else
+
#define smp_mb() asm volatile("dmb ish" : : : "memory")
#define smp_rmb() asm volatile("dmb ishld" : : : "memory")
#define smp_wmb() asm volatile("dmb ishst" : : : "memory")
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ switch (sizeof(*p)) { \
+ case 4: \
+ asm volatile ("stlr %w1, %0" \
+ : "=Q" (*p) : "r" (v) : "memory"); \
+ break; \
+ case 8: \
+ asm volatile ("stlr %1, %0" \
+ : "=Q" (*p) : "r" (v) : "memory"); \
+ break; \
+ } \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1; \
+ compiletime_assert_atomic_type(*p); \
+ switch (sizeof(*p)) { \
+ case 4: \
+ asm volatile ("ldar %w0, %1" \
+ : "=r" (___p1) : "Q" (*p) : "memory"); \
+ break; \
+ case 8: \
+ asm volatile ("ldar %0, %1" \
+ : "=r" (___p1) : "Q" (*p) : "memory"); \
+ break; \
+ } \
+ ___p1; \
+})
+
#endif

#define read_barrier_depends() do { } while(0)
diff --git a/arch/ia64/include/asm/barrier.h b/arch/ia64/include/asm/barrier.h
index 60576e0..d0a69aa 100644
--- a/arch/ia64/include/asm/barrier.h
+++ b/arch/ia64/include/asm/barrier.h
@@ -45,14 +45,37 @@
# define smp_rmb() rmb()
# define smp_wmb() wmb()
# define smp_read_barrier_depends() read_barrier_depends()
+
#else
+
# define smp_mb() barrier()
# define smp_rmb() barrier()
# define smp_wmb() barrier()
# define smp_read_barrier_depends() do { } while(0)
+
#endif

/*
+ * IA64 GCC turns volatile stores into st.rel and volatile loads into ld.acq no
+ * need for asm trickery!
+ */
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ___p1; \
+})
+
+/*
* XXX check on this ---I suspect what Linus really wants here is
* acquire vs release semantics but we can't discuss this stuff with
* Linus just yet. Grrr...
diff --git a/arch/metag/include/asm/barrier.h b/arch/metag/include/asm/barrier.h
index e355a4c..2d6f0de 100644
--- a/arch/metag/include/asm/barrier.h
+++ b/arch/metag/include/asm/barrier.h
@@ -85,4 +85,19 @@ static inline void fence(void)
#define smp_read_barrier_depends() do { } while (0)
#define set_mb(var, value) do { var = value; smp_mb(); } while (0)

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
#endif /* _ASM_METAG_BARRIER_H */
diff --git a/arch/mips/include/asm/barrier.h b/arch/mips/include/asm/barrier.h
index 314ab55..52c5b61 100644
--- a/arch/mips/include/asm/barrier.h
+++ b/arch/mips/include/asm/barrier.h
@@ -180,4 +180,19 @@
#define nudge_writes() mb()
#endif

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
#endif /* __ASM_BARRIER_H */
diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h
index ae78225..f89da80 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -45,11 +45,15 @@
# define SMPWMB eieio
#endif

+#define __lwsync() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+
#define smp_mb() mb()
-#define smp_rmb() __asm__ __volatile__ (stringify_in_c(LWSYNC) : : :"memory")
+#define smp_rmb() __lwsync()
#define smp_wmb() __asm__ __volatile__ (stringify_in_c(SMPWMB) : : :"memory")
#define smp_read_barrier_depends() read_barrier_depends()
#else
+#define __lwsync() barrier()
+
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
@@ -65,4 +69,19 @@
#define data_barrier(x) \
asm volatile("twi 0,%0,0; isync" : : "r" (x) : "memory");

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ __lwsync(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ __lwsync(); \
+ ___p1; \
+})
+
#endif /* _ASM_POWERPC_BARRIER_H */
diff --git a/arch/s390/include/asm/barrier.h b/arch/s390/include/asm/barrier.h
index 16760ee..578680f 100644
--- a/arch/s390/include/asm/barrier.h
+++ b/arch/s390/include/asm/barrier.h
@@ -32,4 +32,19 @@

#define set_mb(var, value) do { var = value; mb(); } while (0)

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ___p1; \
+})
+
#endif /* __ASM_BARRIER_H */
diff --git a/arch/sparc/include/asm/barrier_64.h b/arch/sparc/include/asm/barrier_64.h
index 95d4598..b5aad96 100644
--- a/arch/sparc/include/asm/barrier_64.h
+++ b/arch/sparc/include/asm/barrier_64.h
@@ -53,4 +53,19 @@ do { __asm__ __volatile__("ba,pt %%xcc, 1f\n\t" \

#define smp_read_barrier_depends() do { } while(0)

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ___p1; \
+})
+
#endif /* !(__SPARC64_BARRIER_H) */
diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index c6cd358..04a4890 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -92,12 +92,53 @@
#endif
#define smp_read_barrier_depends() read_barrier_depends()
#define set_mb(var, value) do { (void)xchg(&var, value); } while (0)
-#else
+#else /* !SMP */
#define smp_mb() barrier()
#define smp_rmb() barrier()
#define smp_wmb() barrier()
#define smp_read_barrier_depends() do { } while (0)
#define set_mb(var, value) do { var = value; barrier(); } while (0)
+#endif /* SMP */
+
+#if defined(CONFIG_X86_OOSTORE) || defined(CONFIG_X86_PPRO_FENCE)
+
+/*
+ * For either of these options x86 doesn't have a strong TSO memory
+ * model and we should fall back to full barriers.
+ */
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
+#else /* regular x86 TSO memory ordering */
+
+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ barrier(); \
+ ___p1; \
+})
+
#endif

/*
diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h
index 639d7a4..01613b3 100644
--- a/include/asm-generic/barrier.h
+++ b/include/asm-generic/barrier.h
@@ -46,5 +46,20 @@
#define read_barrier_depends() do {} while (0)
#define smp_read_barrier_depends() do {} while (0)

+#define smp_store_release(p, v) \
+do { \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ACCESS_ONCE(*p) = (v); \
+} while (0)
+
+#define smp_load_acquire(p) \
+({ \
+ typeof(*p) ___p1 = ACCESS_ONCE(*p); \
+ compiletime_assert_atomic_type(*p); \
+ smp_mb(); \
+ ___p1; \
+})
+
#endif /* !__ASSEMBLY__ */
#endif /* __ASM_GENERIC_BARRIER_H */
diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index a2ce6f8..6977192 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -302,6 +302,11 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
# define __same_type(a, b) __builtin_types_compatible_p(typeof(a), typeof(b))
#endif

+/* Is this type a native word size -- useful for atomic operations */
+#ifndef __native_word
+# define __native_word(t) (sizeof(t) == sizeof(int) || sizeof(t) == sizeof(long))
+#endif
+
/* Compile time object size, -1 for unknown */
#ifndef __compiletime_object_size
# define __compiletime_object_size(obj) -1
@@ -341,6 +346,10 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
#define compiletime_assert(condition, msg) \
_compiletime_assert(condition, msg, __compiletime_assert_, __LINE__)

+#define compiletime_assert_atomic_type(t) \
+ compiletime_assert(__native_word(t), \
+ "Need native word sized stores/loads for atomicity.")
+
/*
* Prevent the compiler from merging or refetching accesses. The compiler
* is also forbidden from reordering successive instances of ACCESS_ONCE(),
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:10 PM2/5/17

to

From: Alexander Usyskin <alexande...@intel.com>

commit 582ab27a063a506ccb55fc48afcc325342a2deba upstream.

NFC version reply size checked against only header size, not against
full message size. That may lead potentially to uninitialized memory access
in version data.

That leads to warnings when version data is accessed:
drivers/misc/mei/bus-fixup.c: warning: '*((void *)&ver+11)' may be used uninitialized in this function [-Wuninitialized]: => 212:2

Reported in
Build regressions/improvements in v4.9-rc3
https://lkml.org/lkml/2016/10/30/57

[js] the check is in 3.12 only once

Fixes: 59fcd7c63abf (mei: nfc: Initial nfc implementation)
Signed-off-by: Alexander Usyskin <alexande...@intel.com>
Signed-off-by: Tomas Winkler <tomas....@intel.com>
Signed-off-by: Jiri Slaby <jsl...@suse.cz>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/misc/mei/nfc.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/misc/mei/nfc.c b/drivers/misc/mei/nfc.c
index 4b7ea3f..1f8f856 100644
--- a/drivers/misc/mei/nfc.c
+++ b/drivers/misc/mei/nfc.c
@@ -292,7 +292,7 @@ static int mei_nfc_if_version(struct mei_nfc_dev *ndev)
return -ENOMEM;

bytes_recv = __mei_cl_recv(cl, (u8 *)reply, if_version_length);
- if (bytes_recv < 0 || bytes_recv < sizeof(struct mei_nfc_reply)) {
+ if (bytes_recv < if_version_length) {
dev_err(&dev->pdev->dev, "Could not read IF version\n");
ret = -EIO;
goto err;
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:10 PM2/5/17

to

From: Al Viro <vi...@ZenIV.linux.org.uk>

commit e23d4159b109167126e5bcd7f3775c95de7fee47 upstream.

Switching iov_iter fault-in to multipages variants has exposed an old
bug in underlying fault_in_multipages_...(); they break if the range
passed to them wraps around. Normally access_ok() done by callers will
prevent such (and it's a guaranteed EFAULT - ERR_PTR() values fall into
such a range and they should not point to any valid objects).

However, on architectures where userland and kernel live in different
MMU contexts (e.g. s390) access_ok() is a no-op and on those a range
with a wraparound can reach fault_in_multipages_...().

Since any wraparound means EFAULT there, the fix is trivial - turn
those

while (uaddr <= end)
...
into

if (unlikely(uaddr > end))
return -EFAULT;
do
...
while (uaddr <= end);

Reported-by: Jan Stancek <jsta...@redhat.com>
Tested-by: Jan Stancek <jsta...@redhat.com>
Signed-off-by: Al Viro <vi...@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/pagemap.h | 38 +++++++++++++++++++-------------------
1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index e3dea75..9497527 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -482,56 +482,56 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size)
*/
static inline int fault_in_multipages_writeable(char __user *uaddr, int size)
{
- int ret = 0;
char __user *end = uaddr + size - 1;

if (unlikely(size == 0))
- return ret;
+ return 0;

+ if (unlikely(uaddr > end))
+ return -EFAULT;
/*
* Writing zeroes into userspace here is OK, because we know that if
* the zero gets there, we'll be overwriting it.
*/
- while (uaddr <= end) {
- ret = __put_user(0, uaddr);
- if (ret != 0)
- return ret;
+ do {
+ if (unlikely(__put_user(0, uaddr) != 0))
+ return -EFAULT;
uaddr += PAGE_SIZE;
- }
+ } while (uaddr <= end);

/* Check whether the range spilled into the next page. */
if (((unsigned long)uaddr & PAGE_MASK) ==
((unsigned long)end & PAGE_MASK))
- ret = __put_user(0, end);
+ return __put_user(0, end);

- return ret;
+ return 0;
}

static inline int fault_in_multipages_readable(const char __user *uaddr,
int size)
{
volatile char c;
- int ret = 0;
const char __user *end = uaddr + size - 1;

if (unlikely(size == 0))
- return ret;
+ return 0;

- while (uaddr <= end) {
- ret = __get_user(c, uaddr);
- if (ret != 0)
- return ret;
+ if (unlikely(uaddr > end))
+ return -EFAULT;
+
+ do {
+ if (unlikely(__get_user(c, uaddr) != 0))
+ return -EFAULT;
uaddr += PAGE_SIZE;
- }
+ } while (uaddr <= end);

/* Check whether the range spilled into the next page. */
if (((unsigned long)uaddr & PAGE_MASK) ==
((unsigned long)end & PAGE_MASK)) {
- ret = __get_user(c, end);
- (void)c;
+ return __get_user(c, end);
}

- return ret;
+ return 0;
}

int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:10 PM2/5/17

to

From: Long Li <lon...@microsoft.com>

commit 407a3aee6ee2d2cb46d9ba3fc380bc29f35d020c upstream.

The host keeps sending heartbeat packets independent of the
guest responding to them. Even though we respond to the heartbeat messages at
interrupt level, we can have situations where there maybe multiple heartbeat
messages pending that have not been responded to. For instance this occurs when the
VM is paused and the host continues to send the heartbeat messages.
Address this issue by draining and responding to all
the heartbeat messages that maybe pending.

Signed-off-by: Long Li <lon...@microsoft.com>
Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/hv/hv_util.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/hv_util.c b/drivers/hv/hv_util.c
index 64c778f..5f69c83 100644
--- a/drivers/hv/hv_util.c
+++ b/drivers/hv/hv_util.c
@@ -244,10 +244,14 @@ static void heartbeat_onchannelcallback(void *context)
struct heartbeat_msg_data *heartbeat_msg;
u8 *hbeat_txf_buf = util_heartbeat.recv_buffer;

- vmbus_recvpacket(channel, hbeat_txf_buf,
- PAGE_SIZE, &recvlen, &requestid);
+ while (1) {
+
+ vmbus_recvpacket(channel, hbeat_txf_buf,
+ PAGE_SIZE, &recvlen, &requestid);
+
+ if (!recvlen)
+ break;

- if (recvlen > 0) {
icmsghdrp = (struct icmsg_hdr *)&hbeat_txf_buf[
sizeof(struct vmbuspipe_hdr)];

--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:10 PM2/5/17

to

From: Erez Shitrit <ere...@mellanox.com>

commit 68c6bcdd8bd00394c234b915ab9b97c74104130c upstream.

The function send_leave sets the member: group->query_id
(group->query_id = ret) after calling the sa_query, but leave_handler
can be executed before the setting and it might delete the group object,
and will get a memory corruption.

Additionally, this patch gets rid of group->query_id variable which is
not used.

Fixes: faec2f7b96b5 ('IB/sa: Track multicast join/leave requests')
Signed-off-by: Erez Shitrit <ere...@mellanox.com>

Signed-off-by: Leon Romanovsky <le...@kernel.org>
Signed-off-by: Doug Ledford <dled...@redhat.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/infiniband/core/multicast.c | 13 ++-----------
1 file changed, 2 insertions(+), 11 deletions(-)

diff --git a/drivers/infiniband/core/multicast.c b/drivers/infiniband/core/multicast.c
index d2360a8..180d7f4 100644
--- a/drivers/infiniband/core/multicast.c
+++ b/drivers/infiniband/core/multicast.c
@@ -106,7 +106,6 @@ struct mcast_group {
atomic_t refcount;
enum mcast_group_state state;
struct ib_sa_query *query;
- int query_id;
u16 pkey_index;
u8 leave_state;
int retries;
@@ -339,11 +338,7 @@ static int send_join(struct mcast_group *group, struct mcast_member *member)
member->multicast.comp_mask,
3000, GFP_KERNEL, join_handler, group,
&group->query);
- if (ret >= 0) {
- group->query_id = ret;
- ret = 0;
- }
- return ret;
+ return (ret > 0) ? 0 : ret;
}

static int send_leave(struct mcast_group *group, u8 leave_state)
@@ -363,11 +358,7 @@ static int send_leave(struct mcast_group *group, u8 leave_state)
IB_SA_MCMEMBER_REC_JOIN_STATE,
3000, GFP_KERNEL, leave_handler,
group, &group->query);
- if (ret >= 0) {
- group->query_id = ret;
- ret = 0;
- }
- return ret;
+ return (ret > 0) ? 0 : ret;
}

static void join_group(struct mcast_group *group, struct mcast_member *member,
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:30:10 PM2/5/17

to

From: Alexey Khoroshilov <khoro...@ispras.ru>

commit 3b7c7e52efda0d4640060de747768360ba70a7c0 upstream.

There is an allocation with GFP_KERNEL flag in mos7840_write(),
while it may be called from interrupt context.

Follow-up for commit 191252837626 ("USB: kobil_sct: fix non-atomic
allocation in write path")

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoro...@ispras.ru>
Signed-off-by: Johan Hovold <jo...@kernel.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/serial/mos7840.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/serial/mos7840.c b/drivers/usb/serial/mos7840.c
index d060130..7df7df6 100644
--- a/drivers/usb/serial/mos7840.c
+++ b/drivers/usb/serial/mos7840.c
@@ -1438,8 +1438,8 @@ static int mos7840_write(struct tty_struct *tty, struct usb_serial_port *port,
}

if (urb->transfer_buffer == NULL) {
- urb->transfer_buffer =
- kmalloc(URB_TRANSFER_BUFFER_SIZE, GFP_KERNEL);
+ urb->transfer_buffer = kmalloc(URB_TRANSFER_BUFFER_SIZE,
+ GFP_ATOMIC);

if (urb->transfer_buffer == NULL) {
dev_err_console(port, "%s no more kernel memory...\n",
--
2.8.0.rc2.1.gbe9624a

diff --git a/arch/x86/include/asm/uaccess.h b/arch/x86/include/asm/uaccess.h
index 5ee2687..995c49a 100644
--- a/arch/x86/include/asm/uaccess.h
+++ b/arch/x86/include/asm/uaccess.h
@@ -381,7 +381,7 @@ do { \
asm volatile("1: mov"itype" %1,%"rtype"0\n" \
"2:\n" \
_ASM_EXTABLE_EX(1b, 2b) \
- : ltype(x) : "m" (__m(addr)))
+ : ltype(x) : "m" (__m(addr)), "0" (0))

#define __put_user_nocheck(x, ptr, size) \
({ \
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:35:34 PM2/5/17

to

From: Petr Vandrovec <pe...@vandrovec.name>

commit 2ce9d2272b98743b911196c49e7af5841381c206 upstream.

Some code (all error handling) submits CDBs that are allocated
on the stack. This breaks with CB/CBI code that tries to create
URB directly from SCSI command buffer - which happens to be in
vmalloced memory with vmalloced kernel stacks.

Let's make copy of the command in usb_stor_CB_transport.

Signed-off-by: Petr Vandrovec <pe...@vandrovec.name>
Acked-by: Alan Stern <st...@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/storage/transport.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/storage/transport.c b/drivers/usb/storage/transport.c
index b1d815e..8988b26 100644
--- a/drivers/usb/storage/transport.c
+++ b/drivers/usb/storage/transport.c
@@ -919,10 +919,15 @@ int usb_stor_CB_transport(struct scsi_cmnd *srb, struct us_data *us)

/* COMMAND STAGE */
/* let's send the command via the control pipe */
+ /*
+ * Command is sometime (f.e. after scsi_eh_prep_cmnd) on the stack.
+ * Stack may be vmallocated. So no DMA for us. Make a copy.
+ */
+ memcpy(us->iobuf, srb->cmnd, srb->cmd_len);
result = usb_stor_ctrl_transfer(us, us->send_ctrl_pipe,
US_CBI_ADSC,
USB_TYPE_CLASS | USB_RECIP_INTERFACE, 0,
- us->ifnum, srb->cmnd, srb->cmd_len);
+ us->ifnum, us->iobuf, srb->cmd_len);

/* check the return code for the command */
usb_stor_dbg(us, "Call to usb_stor_ctrl_transfer() returned %d\n",
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:36:00 PM2/5/17

diff --git a/drivers/block/drbd/drbd_main.c b/drivers/block/drbd/drbd_main.c
index a5dca6a..776fc08 100644
--- a/drivers/block/drbd/drbd_main.c
+++ b/drivers/block/drbd/drbd_main.c
@@ -1771,7 +1771,7 @@ int drbd_send(struct drbd_tconn *tconn, struct socket *sock,
* do we need to block DRBD_SIG if sock == &meta.socket ??
* otherwise wake_asender() might interrupt some send_*Ack !
*/
- rv = kernel_sendmsg(sock, &msg, &iov, 1, size);
+ rv = kernel_sendmsg(sock, &msg, &iov, 1, iov.iov_len);
if (rv == -EAGAIN) {
if (we_should_drop_the_connection(tconn, sock))
break;
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

to

From: Kinglong Mee <kingl...@gmail.com>

commit 3f42d2c428c724212c5f4249daea97e254eb0546 upstream.

Connection from alloc_conn must be freed through free_conn,
otherwise, the reference of svc_xprt will never be put.

Signed-off-by: Kinglong Mee <kingl...@gmail.com>
Signed-off-by: J. Bruce Fields <bfi...@redhat.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/nfsd/nfs4state.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 4a58afa..b0878e1 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -2193,7 +2193,8 @@ out:
if (!list_empty(&clp->cl_revoked))
seq->status_flags |= SEQ4_STATUS_RECALLABLE_STATE_REVOKED;
out_no_session:
- kfree(conn);
+ if (conn)
+ free_conn(conn);
spin_unlock(&nn->client_lock);
return status;
out_put_session:
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

to

From: Stephen Suryaputra Lin <stephen.sur...@gmail.com>

commit 969447f226b451c453ddc83cac6144eaeac6f2e3 upstream.

In v2.6, ip_rt_redirect() calls arp_bind_neighbour() which returns 0
and then the state of the neigh for the new_gw is checked. If the state
isn't valid then the redirected route is deleted. This behavior is
maintained up to v3.5.7 by check_peer_redirect() because rt->rt_gateway
is assigned to peer->redirect_learned.a4 before calling
ipv4_neigh_lookup().

After commit 5943634fc559 ("ipv4: Maintain redirect and PMTU info in
struct rtable again."), ipv4_neigh_lookup() is performed without the
rt_gateway assigned to the new_gw. In the case when rt_gateway (old_gw)
isn't zero, the function uses it as the key. The neigh is most likely
valid since the old_gw is the one that sends the ICMP redirect message.
Then the new_gw is assigned to fib_nh_exception. The problem is: the
new_gw ARP may never gets resolved and the traffic is blackholed.

So, use the new_gw for neigh lookup.

Changes from v1:
- use __ipv4_neigh_lookup instead (per Eric Dumazet).

Fixes: 5943634fc559 ("ipv4: Maintain redirect and PMTU info in struct rtable again.")
Signed-off-by: Stephen Suryaputra Lin <ssu...@ieee.org>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv4/route.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index cbad9b8..e59d633 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -713,7 +713,9 @@ static void __ip_do_redirect(struct rtable *rt, struct sk_buff *skb, struct flow
goto reject_redirect;
}

- n = ipv4_neigh_lookup(&rt->dst, NULL, &new_gw);
+ n = __ipv4_neigh_lookup(rt->dst.dev, new_gw);
+ if (!n)
+ n = neigh_create(&arp_tbl, &new_gw, rt->dst.dev);
if (!IS_ERR(n)) {
if (!(n->nud_state & NUD_VALID)) {
neigh_event_send(n, NULL);
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

to

From: Marcelo Ricardo Leitner <marcelo...@gmail.com>

commit bf911e985d6bbaa328c20c3e05f4eb03de11fdd6 upstream.

Andrey Konovalov reported that KASAN detected that SCTP was using a slab
beyond the boundaries. It was caused because when handling out of the
blue packets in function sctp_sf_ootb() it was checking the chunk len
only after already processing the first chunk, validating only for the
2nd and subsequent ones.

The fix is to just move the check upwards so it's also validated for the
1st chunk.

Reported-by: Andrey Konovalov <andre...@google.com>
Tested-by: Andrey Konovalov <andre...@google.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo...@gmail.com>
Reviewed-by: Xin Long <lucie...@gmail.com>
Acked-by: Neil Horman <nho...@tuxdriver.com>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/sctp/sm_statefuns.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index d9cbecb..df938b2 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -3428,6 +3428,12 @@ sctp_disposition_t sctp_sf_ootb(struct net *net,
return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
commands);

+ /* Report violation if chunk len overflows */
+ ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length));
+ if (ch_end > skb_tail_pointer(skb))
+ return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
+ commands);
+
/* Now that we know we at least have a chunk header,
* do things that are type appropriate.
*/
@@ -3459,12 +3465,6 @@ sctp_disposition_t sctp_sf_ootb(struct net *net,
}
}

- /* Report violation if chunk len overflows */
- ch_end = ((__u8 *)ch) + WORD_ROUND(ntohs(ch->length));
- if (ch_end > skb_tail_pointer(skb))
- return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
- commands);
-
ch = (sctp_chunkhdr_t *) ch_end;
} while (ch_end < skb_tail_pointer(skb));

--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

to

From: Mike Snitzer <sni...@redhat.com>

commit 299f6230bc6d0ccd5f95bb0fb865d80a9c7d5ccc upstream.

v4.8-rc3 commit 99f3c90d0d ("dm flakey: error READ bios during the
down_interval") overlooked the 'drop_writes' feature, which is meant to
allow reads to be issued rather than errored, during the down_interval.

Fixes: 99f3c90d0d ("dm flakey: error READ bios during the down_interval")
Reported-by: Qu Wenruo <quwe...@cn.fujitsu.com>
Signed-off-by: Mike Snitzer <sni...@redhat.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/md/dm-flakey.c | 27 ++++++++++++++++-----------
1 file changed, 16 insertions(+), 11 deletions(-)

diff --git a/drivers/md/dm-flakey.c b/drivers/md/dm-flakey.c
index a9a47cd..ace01a3 100644
--- a/drivers/md/dm-flakey.c
+++ b/drivers/md/dm-flakey.c
@@ -286,15 +286,13 @@ static int flakey_map(struct dm_target *ti, struct bio *bio)
pb->bio_submitted = true;

/*
- * Map reads as normal only if corrupt_bio_byte set.
+ * Error reads if neither corrupt_bio_byte or drop_writes are set.
+ * Otherwise, flakey_end_io() will decide if the reads should be modified.
*/
if (bio_data_dir(bio) == READ) {
- /* If flags were specified, only corrupt those that match. */
- if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
- all_corrupt_bio_flags_match(bio, fc))
- goto map_bio;
- else
+ if (!fc->corrupt_bio_byte && !test_bit(DROP_WRITES, &fc->flags))
return -EIO;
+ goto map_bio;
}

/*
@@ -331,14 +329,21 @@ static int flakey_end_io(struct dm_target *ti, struct bio *bio, int error)
struct flakey_c *fc = ti->private;
struct per_bio_data *pb = dm_per_bio_data(bio, sizeof(struct per_bio_data));

- /*
- * Corrupt successful READs while in down state.
- */
if (!error && pb->bio_submitted && (bio_data_dir(bio) == READ)) {
- if (fc->corrupt_bio_byte)
+ if (fc->corrupt_bio_byte && (fc->corrupt_bio_rw == READ) &&
+ all_corrupt_bio_flags_match(bio, fc)) {
+ /*
+ * Corrupt successful matching READs while in down state.
+ */
corrupt_bio_data(bio, fc);
- else
+
+ } else if (!test_bit(DROP_WRITES, &fc->flags)) {
+ /*
+ * Error read during the down_interval if drop_writes
+ * wasn't configured.
+ */
return -EIO;
+ }
}

return error;
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

to

From: Eli Cooper <elic...@gmx.com>

commit 23f4ffedb7d751c7e298732ba91ca75d224bc1a6 upstream.

skb->cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)->frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.

This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)->flags, otherwise it needs to be done earlier.

Cc: sta...@vger.kernel.org
Signed-off-by: Eli Cooper <elic...@gmx.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/net/ip6_tunnel.h | 1 +

1 file changed, 1 insertion(+)

diff --git a/include/net/ip6_tunnel.h b/include/net/ip6_tunnel.h
index 4da5de1..b140c60 100644
--- a/include/net/ip6_tunnel.h
+++ b/include/net/ip6_tunnel.h
@@ -75,6 +75,7 @@ static inline void ip6tunnel_xmit(struct sk_buff *skb, struct net_device *dev)
int pkt_len, err;

nf_reset(skb);
+ memset(skb->cb, 0, sizeof(struct inet6_skb_parm));
pkt_len = skb->len;
err = ip6_local_out(skb);

--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

to

From: Arend Van Spriel <arend.v...@broadcom.com>

commit ded89912156b1a47d940a0c954c43afbabd0c42c upstream.

User-space can choose to omit NL80211_ATTR_SSID and only provide raw
IE TLV data. When doing so it can provide SSID IE with length exceeding
the allowed size. The driver further processes this IE copying it
into a local variable without checking the length. Hence stack can be
corrupted and used as exploit.

Reported-by: Daxing Guo <freen...@gmail.com>
Reviewed-by: Hante Meuleman <hante.m...@broadcom.com>
Reviewed-by: Pieter-Paul Giesberts <pieter-pau...@broadcom.com>
Reviewed-by: Franky Lin <frank...@broadcom.com>
Signed-off-by: Arend van Spriel <arend.v...@broadcom.com>
Signed-off-by: Kalle Valo <kv...@codeaurora.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c b/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c
index 301e572..2c52430 100644
--- a/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c
+++ b/drivers/net/wireless/brcm80211/brcmfmac/wl_cfg80211.c
@@ -3726,7 +3726,7 @@ brcmf_cfg80211_start_ap(struct wiphy *wiphy, struct net_device *ndev,
(u8 *)&settings->beacon.head[ie_offset],
settings->beacon.head_len - ie_offset,
WLAN_EID_SSID);
- if (!ssid_ie)
+ if (!ssid_ie || ssid_ie->len > IEEE80211_MAX_SSID_LEN)
return -EINVAL;

memcpy(ssid_le.SSID, ssid_ie->data, ssid_ie->len);
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

to

From: Christian Borntraeger <bornt...@de.ibm.com>

commit 230fa253df6352af12ad0a16128760b5cb3f92df upstream.

ACCESS_ONCE does not work reliably on non-scalar types. For
example gcc 4.6 and 4.7 might remove the volatile tag for such
accesses during the SRA (scalar replacement of aggregates) step
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=58145)

Let's provide READ_ONCE/ASSIGN_ONCE that will do all accesses via
scalar types as suggested by Linus Torvalds. Accesses larger than
the machines word size cannot be guaranteed to be atomic. These
macros will use memcpy and emit a build warning.

Signed-off-by: Christian Borntraeger <bornt...@de.ibm.com>
[wt: backported only for next patch]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/compiler.h | 74 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 74 insertions(+)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index 6977192..9df1978 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -179,6 +179,80 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);
# define __UNIQUE_ID(prefix) __PASTE(__PASTE(__UNIQUE_ID_, prefix), __LINE__)
#endif

+#include <uapi/linux/types.h>
+
+static __always_inline void data_access_exceeds_word_size(void)
+#ifdef __compiletime_warning
+__compiletime_warning("data access exceeds word size and won't be atomic")
+#endif
+;
+
+static __always_inline void data_access_exceeds_word_size(void)
+{
+}
+
+static __always_inline void __read_once_size(volatile void *p, void *res, int size)
+{
+ switch (size) {
+ case 1: *(__u8 *)res = *(volatile __u8 *)p; break;
+ case 2: *(__u16 *)res = *(volatile __u16 *)p; break;
+ case 4: *(__u32 *)res = *(volatile __u32 *)p; break;
+#ifdef CONFIG_64BIT
+ case 8: *(__u64 *)res = *(volatile __u64 *)p; break;
+#endif
+ default:
+ barrier();
+ __builtin_memcpy((void *)res, (const void *)p, size);
+ data_access_exceeds_word_size();
+ barrier();
+ }
+}
+
+static __always_inline void __assign_once_size(volatile void *p, void *res, int size)
+{
+ switch (size) {
+ case 1: *(volatile __u8 *)p = *(__u8 *)res; break;
+ case 2: *(volatile __u16 *)p = *(__u16 *)res; break;
+ case 4: *(volatile __u32 *)p = *(__u32 *)res; break;
+#ifdef CONFIG_64BIT
+ case 8: *(volatile __u64 *)p = *(__u64 *)res; break;
+#endif
+ default:
+ barrier();
+ __builtin_memcpy((void *)p, (const void *)res, size);
+ data_access_exceeds_word_size();
+ barrier();
+ }
+}
+
+/*
+ * Prevent the compiler from merging or refetching reads or writes. The
+ * compiler is also forbidden from reordering successive instances of
+ * READ_ONCE, ASSIGN_ONCE and ACCESS_ONCE (see below), but only when the
+ * compiler is aware of some particular ordering. One way to make the
+ * compiler aware of ordering is to put the two invocations of READ_ONCE,
+ * ASSIGN_ONCE or ACCESS_ONCE() in different C statements.
+ *
+ * In contrast to ACCESS_ONCE these two macros will also work on aggregate
+ * data types like structs or unions. If the size of the accessed data
+ * type exceeds the word size of the machine (e.g., 32 bits or 64 bits)
+ * READ_ONCE() and ASSIGN_ONCE() will fall back to memcpy and print a
+ * compile-time warning.
+ *
+ * Their two major use cases are: (1) Mediating communication between
+ * process-level code and irq/NMI handlers, all running on the same CPU,
+ * and (2) Ensuring that the compiler does not fold, spindle, or otherwise
+ * mutilate accesses that either do not require ordering or that interact
+ * with an explicit memory barrier or atomic instruction that provides the
+ * required ordering.
+ */
+
+#define READ_ONCE(x) \
+ ({ typeof(x) __val; __read_once_size(&x, &__val, sizeof(__val)); __val; })
+
+#define ASSIGN_ONCE(val, x) \
+ ({ typeof(x) __val; __val = val; __assign_once_size(&x, &__val, sizeof(__val)); __val; })
+
#endif /* __KERNEL__ */

#endif /* __ASSEMBLY__ */
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

to

From: Eric Dumazet <edum...@google.com>

commit 20c64d5cd5a2bdcdc8982a06cb05e5e1bd851a3d upstream.

A malicious TCP receiver, sending SACK, can force the sender to split
skbs in write queue and increase its memory usage.

Then, when socket is closed and its write queue purged, we might
overflow sk_forward_alloc (It becomes negative)

sk_mem_reclaim() does nothing in this case, and more than 2GB
are leaked from TCP perspective (tcp_memory_allocated is not changed)

Then warnings trigger from inet_sock_destruct() and
sk_stream_kill_queues() seeing a not zero sk_forward_alloc

All TCP stack can be stuck because TCP is under memory pressure.

A simple fix is to preemptively reclaim from sk_mem_uncharge().

This makes sure a socket wont have more than 2 MB forward allocated,
after burst and idle period.

Signed-off-by: Eric Dumazet <edum...@google.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/net/sock.h | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/include/net/sock.h b/include/net/sock.h
index 6d2fbac..a46dd30 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -1422,6 +1422,16 @@ static inline void sk_mem_uncharge(struct sock *sk, int size)
if (!sk_has_account(sk))
return;
sk->sk_forward_alloc += size;
+
+ /* Avoid a possible overflow.
+ * TCP send queues can make this happen, if sk_mem_reclaim()
+ * is not called and more than 2 GBytes are released at once.
+ *
+ * If we reach 2 MBytes, reclaim 1 MBytes right now, there is
+ * no need to hold that much forward allocation anyway.
+ */
+ if (unlikely(sk->sk_forward_alloc >= 1 << 21))
+ __sk_mem_reclaim(sk, 1 << 20);
}

static inline void sk_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

to

From: Felipe Balbi <felipe...@linux.intel.com>

commit fd9afd3cbe404998d732be6cc798f749597c5114 upstream.

According to Dave Miller "the networking stack has a
hard requirement that all SKBs which are transmitted
must have their completion signalled in a fininte
amount of time. This is because, until the SKB is
freed by the driver, it holds onto socket,
netfilter, and other subsystem resources."

In summary, this means that using TX IRQ throttling
for the networking gadgets is, at least, complex and
we should avoid it for the time being.

Reported-by: Ville Syrjälä <ville....@linux.intel.com>
Tested-by: Ville Syrjälä <ville....@linux.intel.com>
Suggested-by: David Miller <da...@davemloft.net>
Signed-off-by: Felipe Balbi <felipe...@linux.intel.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/gadget/u_ether.c | 8 --------
1 file changed, 8 deletions(-)

diff --git a/drivers/usb/gadget/u_ether.c b/drivers/usb/gadget/u_ether.c
index aad066d..ef5c623 100644
--- a/drivers/usb/gadget/u_ether.c
+++ b/drivers/usb/gadget/u_ether.c
@@ -584,14 +584,6 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb,

req->length = length;

- /* throttle high/super speed IRQ rate back slightly */
- if (gadget_is_dualspeed(dev->gadget))
- req->no_interrupt = (((dev->gadget->speed == USB_SPEED_HIGH ||
- dev->gadget->speed == USB_SPEED_SUPER)) &&
- !list_empty(&dev->tx_reqs))
- ? ((atomic_read(&dev->tx_qlen) % qmult) != 0)
- : 0;
-
retval = usb_ep_queue(in, req, GFP_ATOMIC);
switch (retval) {
default:
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

to

From: Oliver Hartkopp <sock...@hartkopp.net>

commit deb507f91f1adbf64317ad24ac46c56eeccfb754 upstream.

Andrey Konovalov reported an issue with proc_register in bcm.c.
As suggested by Cong Wang this patch adds a lock_sock() protection and
a check for unsuccessful proc_create_data() in bcm_connect().

Reference: http://marc.info/?l=linux-netdev&m=147732648731237

Reported-by: Andrey Konovalov <andre...@google.com>
Suggested-by: Cong Wang <xiyou.w...@gmail.com>
Signed-off-by: Oliver Hartkopp <sock...@hartkopp.net>
Acked-by: Cong Wang <xiyou.w...@gmail.com>
Tested-by: Andrey Konovalov <andre...@google.com>
Signed-off-by: Marc Kleine-Budde <m...@pengutronix.de>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/can/bcm.c | 32 +++++++++++++++++++++++---------
1 file changed, 23 insertions(+), 9 deletions(-)

diff --git a/net/can/bcm.c b/net/can/bcm.c
index 35cf02d..dd0781c 100644
--- a/net/can/bcm.c
+++ b/net/can/bcm.c
@@ -1500,24 +1500,31 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len,
struct sockaddr_can *addr = (struct sockaddr_can *)uaddr;
struct sock *sk = sock->sk;
struct bcm_sock *bo = bcm_sk(sk);
+ int ret = 0;

if (len < sizeof(*addr))
return -EINVAL;

- if (bo->bound)
- return -EISCONN;
+ lock_sock(sk);
+
+ if (bo->bound) {
+ ret = -EISCONN;
+ goto fail;
+ }

/* bind a device to this socket */
if (addr->can_ifindex) {
struct net_device *dev;

dev = dev_get_by_index(&init_net, addr->can_ifindex);
- if (!dev)
- return -ENODEV;
-
+ if (!dev) {
+ ret = -ENODEV;
+ goto fail;
+ }
if (dev->type != ARPHRD_CAN) {
dev_put(dev);
- return -ENODEV;
+ ret = -ENODEV;
+ goto fail;
}

bo->ifindex = dev->ifindex;
@@ -1528,17 +1535,24 @@ static int bcm_connect(struct socket *sock, struct sockaddr *uaddr, int len,
bo->ifindex = 0;
}

- bo->bound = 1;
-
if (proc_dir) {
/* unique socket address as filename */
sprintf(bo->procname, "%lu", sock_i_ino(sk));
bo->bcm_proc_read = proc_create_data(bo->procname, 0644,
proc_dir,
&bcm_proc_fops, sk);
+ if (!bo->bcm_proc_read) {
+ ret = -ENOMEM;
+ goto fail;
+ }
}

- return 0;
+ bo->bound = 1;
+
+fail:
+ release_sock(sk);
+
+ return ret;
}

static int bcm_recvmsg(struct kiocb *iocb, struct socket *sock,
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:07 PM2/5/17

1 file changed, 1 insertion(+), 1 deletion(-)

unread,

Feb 5, 2017, 2:40:08 PM2/5/17

to

From: Scot Doyle <lkm...@scotdoyle.com>

commit 009e39ae44f4191188aeb6dfbf661b771dbbe515 upstream.

When resizing a vt its selection may exceed the new size, resulting in
an invalid memory access [1]. Clear the selection before resizing.

[1] http://lkml.kernel.org/r/CACT4Y+acDTwy4umEvf5ROBGiRJNrxHN4Cn5szCXE5Jw-d1B=X...@mail.gmail.com

Reported-and-tested-by: Dmitry Vyukov <dvy...@google.com>
Signed-off-by: Scot Doyle <lkm...@scotdoyle.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/tty/vt/vt.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/tty/vt/vt.c b/drivers/tty/vt/vt.c
index 5deddca..010ec70 100644
--- a/drivers/tty/vt/vt.c
+++ b/drivers/tty/vt/vt.c
@@ -869,6 +869,9 @@ static int vc_do_resize(struct tty_struct *tty, struct vc_data *vc,
if (!newscreen)
return -ENOMEM;

+ if (vc == sel_cons)
+ clear_selection();
+
old_rows = vc->vc_rows;
old_row_size = vc->vc_size_row;

--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:08 PM2/5/17

to

From: Punit Agrawal <punit....@arm.com>

commit 806487a8fc8f385af75ed261e9ab658fc845e633 upstream.

Although ghes_proc() tests for errors while reading the error status,
it always return success (0). Fix this by propagating the return
value.

Fixes: d334a49113a4a33 (ACPI, APEI, Generic Hardware Error Source memory error support)
Signed-of-by: Punit Agrawal <punit.agrawa.@arm.com>
Tested-by: Tyler Baicar <tba...@codeaurora.org>
Reviewed-by: Borislav Petkov <b...@suse.de>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j...@intel.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/acpi/apei/ghes.c | 2 +-

1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index fcd7d91..070b843 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -647,7 +647,7 @@ static int ghes_proc(struct ghes *ghes)
ghes_do_proc(ghes, ghes->estatus);
out:
ghes_clear_estatus(ghes);
- return 0;
+ return rc;
}

static void ghes_add_timer(struct ghes *ghes)
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:08 PM2/5/17

to

From: Daeho Jeong <daeho...@samsung.com>

commit 93e3b4e6631d2a74a8cf7429138096862ff9f452 upstream.

Now, ext4_do_update_inode() clears high 16-bit fields of uid/gid
of deleted and evicted inode to fix up interoperability with old
kernels. However, it checks only i_dtime of an inode to determine
whether the inode was deleted and evicted, and this is very risky,
because i_dtime can be used for the pointer maintaining orphan inode
list, too. We need to further check whether the i_dtime is being
used for the orphan inode list even if the i_dtime is not NULL.

We found that high 16-bit fields of uid/gid of inode are unintentionally
and permanently cleared when the inode truncation is just triggered,
but not finished, and the inode metadata, whose high uid/gid bits are
cleared, is written on disk, and the sudden power-off follows that
in order.

Signed-off-by: Daeho Jeong <daeho...@samsung.com>
Signed-off-by: Hobin Woo <hobi...@samsung.com>

Signed-off-by: Theodore Ts'o <ty...@mit.edu>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext4/inode.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 046e0e1..a187055 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -4480,14 +4480,14 @@ static int ext4_do_update_inode(handle_t *handle,
* Fix up interoperability with old kernels. Otherwise, old inodes get
* re-used with the upper 16 bits of the uid/gid intact
*/
- if (!ei->i_dtime) {
+ if (ei->i_dtime && list_empty(&ei->i_orphan)) {
+ raw_inode->i_uid_high = 0;
+ raw_inode->i_gid_high = 0;
+ } else {
raw_inode->i_uid_high =
cpu_to_le16(high_16_bits(i_uid));
raw_inode->i_gid_high =
cpu_to_le16(high_16_bits(i_gid));
- } else {
- raw_inode->i_uid_high = 0;
- raw_inode->i_gid_high = 0;
}
} else {
raw_inode->i_uid_low = cpu_to_le16(fs_high2lowuid(i_uid));
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:08 PM2/5/17

to

From: Stefan Richter <ste...@s5r6.in-berlin.de>

commit e9300a4b7bbae83af1f7703938c94cf6dc6d308f upstream.

RFC 2734 defines the datagram_size field in fragment encapsulation
headers thus:

datagram_size: The encoded size of the entire IP datagram. The
value of datagram_size [...] SHALL be one less than the value of
Total Length in the datagram's IP header (see STD 5, RFC 791).

Accordingly, the eth1394 driver of Linux 2.6.36 and older set and got
this field with a -/+1 offset:

ether1394_tx() /* transmit */
ether1394_encapsulate_prep()
hdr->ff.dg_size = dg_size - 1;

ether1394_data_handler() /* receive */
if (hdr->common.lf == ETH1394_HDR_LF_FF)
dg_size = hdr->ff.dg_size + 1;
else
dg_size = hdr->sf.dg_size + 1;

Likewise, I observe OS X 10.4 and Windows XP Pro SP3 to transmit 1500
byte sized datagrams in fragments with datagram_size=1499 if link
fragmentation is required.

Only firewire-net sets and gets datagram_size without this offset. The
result is lacking interoperability of firewire-net with OS X, Windows
XP, and presumably Linux' eth1394. (I did not test with the latter.)
For example, FTP data transfers to a Linux firewire-net box with max_rec
smaller than the 1500 bytes MTU
- from OS X fail entirely,
- from Win XP start out with a bunch of fragmented datagrams which
time out, then continue with unfragmented datagrams because Win XP
temporarily reduces the MTU to 576 bytes.

So let's fix firewire-net's datagram_size accessors.

Note that firewire-net thereby loses interoperability with unpatched
firewire-net, but only if link fragmentation is employed. (This happens
with large broadcast datagrams, and with large datagrams on several
FireWire CardBus cards with smaller max_rec than equivalent PCI cards,
and it can be worked around by setting a small enough MTU.)

Signed-off-by: Stefan Richter <ste...@s5r6.in-berlin.de>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/firewire/net.c | 8 ++++----

1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/firewire/net.c b/drivers/firewire/net.c
index 70a716e..1321319 100644
--- a/drivers/firewire/net.c
+++ b/drivers/firewire/net.c
@@ -73,13 +73,13 @@ struct rfc2734_header {

#define fwnet_get_hdr_lf(h) (((h)->w0 & 0xc0000000) >> 30)
#define fwnet_get_hdr_ether_type(h) (((h)->w0 & 0x0000ffff))
-#define fwnet_get_hdr_dg_size(h) (((h)->w0 & 0x0fff0000) >> 16)
+#define fwnet_get_hdr_dg_size(h) ((((h)->w0 & 0x0fff0000) >> 16) + 1)
#define fwnet_get_hdr_fg_off(h) (((h)->w0 & 0x00000fff))
#define fwnet_get_hdr_dgl(h) (((h)->w1 & 0xffff0000) >> 16)

-#define fwnet_set_hdr_lf(lf) ((lf) << 30)
+#define fwnet_set_hdr_lf(lf) ((lf) << 30)
#define fwnet_set_hdr_ether_type(et) (et)
-#define fwnet_set_hdr_dg_size(dgs) ((dgs) << 16)
+#define fwnet_set_hdr_dg_size(dgs) (((dgs) - 1) << 16)
#define fwnet_set_hdr_fg_off(fgo) (fgo)

#define fwnet_set_hdr_dgl(dgl) ((dgl) << 16)
@@ -635,7 +635,7 @@ static int fwnet_incoming_packet(struct fwnet_device *dev, __be32 *buf, int len,
fg_off = fwnet_get_hdr_fg_off(&hdr);
}
datagram_label = fwnet_get_hdr_dgl(&hdr);
- dg_size = fwnet_get_hdr_dg_size(&hdr); /* ??? + 1 */
+ dg_size = fwnet_get_hdr_dg_size(&hdr);

if (fg_off + len > dg_size)
return 0;
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:08 PM2/5/17

to

From: Erez Shitrit <ere...@mellanox.com>

commit 546481c2816ea3c061ee9d5658eb48070f69212e upstream.

When a new CM connection is being requested, ipoib driver copies data
from the path pointer in the CM/tx object, the path object might be
invalid at the point and memory corruption will happened later when now
the CM driver will try using that data.

The next scenario demonstrates it:
neigh_add_path --> ipoib_cm_create_tx -->
queue_work (pointer to path is in the cm/tx struct)
#while the work is still in the queue,
#the port goes down and causes the ipoib_flush_paths:
ipoib_flush_paths --> path_free --> kfree(path)
#at this point the work scheduled starts.
ipoib_cm_tx_start --> copy from the (invalid)path pointer:
(memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);)
-> memory corruption.

To fix that the driver now starts the CM/tx connection only if that
specific path exists in the general paths database.
This check is protected with the relevant locks, and uses the gid from
the neigh member in the CM/tx object which is valid according to the ref
count that was taken by the CM/tx.

Fixes: 839fcaba35 ('IPoIB: Connected mode experimental support')
Signed-off-by: Erez Shitrit <ere...@mellanox.com>

Signed-off-by: Leon Romanovsky <le...@kernel.org>
Signed-off-by: Doug Ledford <dled...@redhat.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/infiniband/ulp/ipoib/ipoib.h | 1 +
drivers/infiniband/ulp/ipoib/ipoib_cm.c | 16 ++++++++++++++++
drivers/infiniband/ulp/ipoib/ipoib_main.c | 2 +-
3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h b/drivers/infiniband/ulp/ipoib/ipoib.h
index eb71aaa..fb9a7b3 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib.h
+++ b/drivers/infiniband/ulp/ipoib/ipoib.h
@@ -460,6 +460,7 @@ void ipoib_send(struct net_device *dev, struct sk_buff *skb,
struct ipoib_ah *address, u32 qpn);
void ipoib_reap_ah(struct work_struct *work);

+struct ipoib_path *__path_find(struct net_device *dev, void *gid);
void ipoib_mark_paths_invalid(struct net_device *dev);
void ipoib_flush_paths(struct net_device *dev);
struct ipoib_dev_priv *ipoib_intf_alloc(const char *format);
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
index 3eceb61..aa9ad2d 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
@@ -1290,6 +1290,8 @@ void ipoib_cm_destroy_tx(struct ipoib_cm_tx *tx)
}
}

+#define QPN_AND_OPTIONS_OFFSET 4
+
static void ipoib_cm_tx_start(struct work_struct *work)
{
struct ipoib_dev_priv *priv = container_of(work, struct ipoib_dev_priv,
@@ -1298,6 +1300,7 @@ static void ipoib_cm_tx_start(struct work_struct *work)
struct ipoib_neigh *neigh;
struct ipoib_cm_tx *p;
unsigned long flags;
+ struct ipoib_path *path;
int ret;

struct ib_sa_path_rec pathrec;
@@ -1310,7 +1313,19 @@ static void ipoib_cm_tx_start(struct work_struct *work)
p = list_entry(priv->cm.start_list.next, typeof(*p), list);
list_del_init(&p->list);
neigh = p->neigh;
+
qpn = IPOIB_QPN(neigh->daddr);
+ /*
+ * As long as the search is with these 2 locks,
+ * path existence indicates its validity.
+ */
+ path = __path_find(dev, neigh->daddr + QPN_AND_OPTIONS_OFFSET);
+ if (!path) {
+ pr_info("%s ignore not valid path %pI6\n",
+ __func__,
+ neigh->daddr + QPN_AND_OPTIONS_OFFSET);
+ goto free_neigh;
+ }
memcpy(&pathrec, &p->path->pathrec, sizeof pathrec);

spin_unlock_irqrestore(&priv->lock, flags);
@@ -1322,6 +1337,7 @@ static void ipoib_cm_tx_start(struct work_struct *work)
spin_lock_irqsave(&priv->lock, flags);

if (ret) {
+free_neigh:
neigh = p->neigh;
if (neigh) {
neigh->cm = NULL;
diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c b/drivers/infiniband/ulp/ipoib/ipoib_main.c
index a481094..375f9ed 100644
--- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
+++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
@@ -251,7 +251,7 @@ int ipoib_set_mode(struct net_device *dev, const char *buf)
return -EINVAL;
}

-static struct ipoib_path *__path_find(struct net_device *dev, void *gid)
+struct ipoib_path *__path_find(struct net_device *dev, void *gid)
{
struct ipoib_dev_priv *priv = netdev_priv(dev);
struct rb_node *n = priv->path_tree.rb_node;
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:08 PM2/5/17

to

From: Myron Stowe <myron...@redhat.com>

commit 06cf35f903aa6da0cc8d9f81e9bcd1f7e1b534bb upstream.

Some AMD CS553x devices have read-only BARs because of a firmware or
hardware defect. There's a workaround in quirk_cs5536_vsa(), but it no
longer works after 36e8164882ca ("PCI: Restore detection of read-only
BARs"). Prior to 36e8164882ca, we filled in res->start; afterwards we
leave it zeroed out. The quirk only updated the size, so the driver tried
to use a region starting at zero, which didn't work.

Expand quirk_cs5536_vsa() to read the base addresses from the BARs and
hard-code the sizes.

On Nix's system BAR 2's read-only value is 0x6200. Prior to 36e8164882ca,
we interpret that as a 512-byte BAR based on the lowest-order bit set. Per
datasheet sec 5.6.1, that BAR (MFGPT) requires only 64 bytes; use that to
avoid clearing any address bits if a platform uses only 64-byte alignment.

[js] pcibios_bus_to_resource takes pdev, not bus in 3.12

[bhelgaas: changelog, reduce BAR 2 size to 64]
Fixes: 36e8164882ca ("PCI: Restore detection of read-only BARs")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=85991#c4
Link: http://support.amd.com/TechDocs/31506_cs5535_databook.pdf
Link: http://support.amd.com/TechDocs/33238G_cs5536_db.pdf
Reported-and-tested-by: Nix <n...@esperi.org.uk>
Signed-off-by: Myron Stowe <myron...@redhat.com>
Signed-off-by: Bjorn Helgaas <bhel...@google.com>
Signed-off-by: Jiri Slaby <jsl...@suse.cz>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/pci/quirks.c | 41 +++++++++++++++++++++++++++++++++++++----
1 file changed, 37 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index a663715..b6625e5 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -339,19 +339,52 @@ static void quirk_s3_64M(struct pci_dev *dev)
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S3, PCI_DEVICE_ID_S3_868, quirk_s3_64M);
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_S3, PCI_DEVICE_ID_S3_968, quirk_s3_64M);

+static void quirk_io(struct pci_dev *dev, int pos, unsigned size,
+ const char *name)
+{
+ u32 region;
+ struct pci_bus_region bus_region;
+ struct resource *res = dev->resource + pos;
+
+ pci_read_config_dword(dev, PCI_BASE_ADDRESS_0 + (pos << 2), &region);
+
+ if (!region)
+ return;
+
+ res->name = pci_name(dev);
+ res->flags = region & ~PCI_BASE_ADDRESS_IO_MASK;
+ res->flags |=
+ (IORESOURCE_IO | IORESOURCE_PCI_FIXED | IORESOURCE_SIZEALIGN);
+ region &= ~(size - 1);
+
+ /* Convert from PCI bus to resource space */
+ bus_region.start = region;
+ bus_region.end = region + size - 1;
+ pcibios_bus_to_resource(dev, res, &bus_region);
+
+ dev_info(&dev->dev, FW_BUG "%s quirk: reg 0x%x: %pR\n",
+ name, PCI_BASE_ADDRESS_0 + (pos << 2), res);
+}
+
/*
* Some CS5536 BIOSes (for example, the Soekris NET5501 board w/ comBIOS
* ver. 1.33 20070103) don't set the correct ISA PCI region header info.
* BAR0 should be 8 bytes; instead, it may be set to something like 8k
* (which conflicts w/ BAR1's memory range).
+ *
+ * CS553x's ISA PCI BARs may also be read-only (ref:
+ * https://bugzilla.kernel.org/show_bug.cgi?id=85991 - Comment #4 forward).
*/
static void quirk_cs5536_vsa(struct pci_dev *dev)
{
+ static char *name = "CS5536 ISA bridge";
+
if (pci_resource_len(dev, 0) != 8) {
- struct resource *res = &dev->resource[0];
- res->end = res->start + 8 - 1;
- dev_info(&dev->dev, "CS5536 ISA bridge bug detected "
- "(incorrect header); workaround applied.\n");
+ quirk_io(dev, 0, 8, name); /* SMB */
+ quirk_io(dev, 1, 256, name); /* GPIO */
+ quirk_io(dev, 2, 64, name); /* MFGPT */
+ dev_info(&dev->dev, "%s bug detected (incorrect header); workaround applied\n",
+ name);
}
}
DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CS5536_ISA, quirk_cs5536_vsa);
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:09 PM2/5/17

to

From: Al Viro <vi...@zeniv.linux.org.uk>

commit 7798bf2140ebcc36eafec6a4194fffd8d585d471 upstream.

On faulting sigreturn we do get SIGSEGV, all right, but anything
we'd put into pt_regs could end up in the coredump. And since
__copy_from_user() never zeroed on arc, we'd better bugger off
on its failure without copying random uninitialized bits of
kernel stack into pt_regs...

Signed-off-by: Al Viro <vi...@zeniv.linux.org.uk>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/arc/kernel/signal.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/arch/arc/kernel/signal.c b/arch/arc/kernel/signal.c
index 6763654..0823087 100644
--- a/arch/arc/kernel/signal.c
+++ b/arch/arc/kernel/signal.c
@@ -80,13 +80,14 @@ static int restore_usr_regs(struct pt_regs *regs, struct rt_sigframe __user *sf)
int err;

err = __copy_from_user(&set, &sf->uc.uc_sigmask, sizeof(set));
- if (!err)
- set_current_blocked(&set);
-
- err |= __copy_from_user(regs, &(sf->uc.uc_mcontext.regs),
+ err |= __copy_from_user(regs, &(sf->uc.uc_mcontext.regs.scratch),
sizeof(sf->uc.uc_mcontext.regs.scratch));
+ if (err)
+ return err;

- return err;
+ set_current_blocked(&set);
+
+ return 0;
}

static inline int is_do_ss_needed(unsigned int magic)
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:09 PM2/5/17

to

From: Peter Zijlstra <pet...@infradead.org>

commit 7bd3e239d6c6d1cad276e8f130b386df4234dcd7 upstream.

The fact that volatile allows for atomic load/stores is a special case
not a requirement for {READ,WRITE}_ONCE(). Their primary purpose is to
force the compiler to emit load/stores _once_.

Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
Acked-by: Christian Borntraeger <bornt...@de.ibm.com>
Acked-by: Will Deacon <will....@arm.com>
Acked-by: Linus Torvalds <torv...@linux-foundation.org>
Cc: Davidlohr Bueso <da...@stgolabs.net>
Cc: H. Peter Anvin <h...@zytor.com>
Cc: Paul McKenney <pau...@linux.vnet.ibm.com>
Cc: Stephen Rothwell <s...@canb.auug.org.au>
Cc: Thomas Gleixner <tg...@linutronix.de>
Signed-off-by: Ingo Molnar <mi...@kernel.org>

[wt: backported only for next patch]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/compiler.h | 16 ----------------
1 file changed, 16 deletions(-)

diff --git a/include/linux/compiler.h b/include/linux/compiler.h
index c6d06e9..53563d8 100644
--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -181,29 +181,16 @@ void ftrace_likely_update(struct ftrace_branch_data *f, int val, int expect);

#include <uapi/linux/types.h>

-static __always_inline void data_access_exceeds_word_size(void)
-#ifdef __compiletime_warning
-__compiletime_warning("data access exceeds word size and won't be atomic")
-#endif
-;
-
-static __always_inline void data_access_exceeds_word_size(void)
-{
-}
-
static __always_inline void __read_once_size(const volatile void *p, void *res, int size)
{
switch (size) {

case 1: *(__u8 *)res = *(volatile __u8 *)p; break;

case 2: *(__u16 *)res = *(volatile __u16 *)p; break;

case 4: *(__u32 *)res = *(volatile __u32 *)p; break;

-#ifdef CONFIG_64BIT

case 8: *(__u64 *)res = *(volatile __u64 *)p; break;

-#endif
default:
barrier();

__builtin_memcpy((void *)res, (const void *)p, size);

- data_access_exceeds_word_size();
barrier();
}
}
@@ -214,13 +201,10 @@ static __always_inline void __write_once_size(volatile void *p, void *res, int s

case 1: *(volatile __u8 *)p = *(__u8 *)res; break;

case 2: *(volatile __u16 *)p = *(__u16 *)res; break;

case 4: *(volatile __u32 *)p = *(__u32 *)res; break;

-#ifdef CONFIG_64BIT

case 8: *(volatile __u64 *)p = *(__u64 *)res; break;

-#endif
default:
barrier();

__builtin_memcpy((void *)p, (const void *)res, size);

- data_access_exceeds_word_size();
barrier();
}
}
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:09 PM2/5/17

to

From: Sara Sharon <sara....@intel.com>

commit d5d0689aefc59c6a5352ca25d7e6d47d03f543ce upstream.

This fixes a pretty ancient bug that hasn't manifested itself
until now.
The scratchbuf for command queue is allocated only for 32 slots
but is accessed with the queue write pointer - which can be
up to 256.
Since the scratch buf size was 16 and there are up to 256 TFDs
we never passed a page boundary when accessing the scratch buffer,
but when attempting to increase the size of the scratch buffer a
panic was quick to follow when trying to access the address resulted
in a page boundary.

Signed-off-by: Sara Sharon <sara....@intel.com>
Fixes: 38c0f334b359 ("iwlwifi: use coherent DMA memory for command header")
Signed-off-by: Luca Coelho <luciano...@intel.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/net/wireless/iwlwifi/pcie/tx.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/net/wireless/iwlwifi/pcie/tx.c b/drivers/net/wireless/iwlwifi/pcie/tx.c
index f05962c..2e3a0d7 100644
--- a/drivers/net/wireless/iwlwifi/pcie/tx.c
+++ b/drivers/net/wireless/iwlwifi/pcie/tx.c
@@ -1311,9 +1311,9 @@ static int iwl_pcie_enqueue_hcmd(struct iwl_trans *trans,

/* start the TFD with the scratchbuf */
scratch_size = min_t(int, copy_size, IWL_HCMD_SCRATCHBUF_SIZE);
- memcpy(&txq->scratchbufs[q->write_ptr], &out_cmd->hdr, scratch_size);
+ memcpy(&txq->scratchbufs[idx], &out_cmd->hdr, scratch_size);
iwl_pcie_txq_build_tfd(trans, txq,
- iwl_pcie_get_scratchbuf_dma(txq, q->write_ptr),
+ iwl_pcie_get_scratchbuf_dma(txq, idx),
scratch_size, 1);

/* map first command fragment, if any remains */
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:09 PM2/5/17

to

From: "Darrick J. Wong" <darric...@oracle.com>

commit 58d789678546d46d7bbd809dd7dab417c0f23655 upstream.

The function xfs_calc_dquots_per_chunk takes a parameter in units
of basic blocks. The kernel seems to get the units wrong, but
userspace got 'fixed' by commenting out the unnecessary conversion.
Fix both.

Signed-off-by: Darrick J. Wong <darric...@oracle.com>
Reviewed-by: Eric Sandeen <san...@redhat.com>
Signed-off-by: Dave Chinner <da...@fromorbit.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/xfs/xfs_dquot.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/xfs/xfs_dquot.c b/fs/xfs/xfs_dquot.c
index bac3e16..e59f309 100644
--- a/fs/xfs/xfs_dquot.c
+++ b/fs/xfs/xfs_dquot.c
@@ -309,8 +309,7 @@ xfs_dquot_buf_verify_crc(
if (mp->m_quotainfo)
ndquots = mp->m_quotainfo->qi_dqperchunk;
else
- ndquots = xfs_qm_calc_dquots_per_chunk(mp,
- XFS_BB_TO_FSB(mp, bp->b_length));
+ ndquots = xfs_qm_calc_dquots_per_chunk(mp, bp->b_length);

for (i = 0; i < ndquots; i++, d++) {
if (!xfs_verify_cksum((char *)d, sizeof(struct xfs_dqblk),
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:40:10 PM2/5/17

to

From: Joe Perches <j...@perches.com>

commit 7f032d6ef6154868a2a5d5f6b2c3f8587292196c upstream.

The seq_printf return value, because it's frequently misused,
will eventually be converted to void.

See: commit 1f33c41c03da ("seq_file: Rename seq_overflow() to
seq_has_overflowed() and make public")

Signed-off-by: Joe Perches <j...@perches.com>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

ipc/msg.c | 34 ++++++++++++++++++----------------
ipc/sem.c | 26 ++++++++++++++------------
ipc/shm.c | 42 ++++++++++++++++++++++--------------------
ipc/util.c | 6 ++++--
4 files changed, 58 insertions(+), 50 deletions(-)

diff --git a/ipc/msg.c b/ipc/msg.c
index 32aaaab..9ce27d8 100644
--- a/ipc/msg.c
+++ b/ipc/msg.c
@@ -1046,21 +1046,23 @@ static int sysvipc_msg_proc_show(struct seq_file *s, void *it)
struct user_namespace *user_ns = seq_user_ns(s);
struct msg_queue *msq = it;

- return seq_printf(s,
- "%10d %10d %4o %10lu %10lu %5u %5u %5u %5u %5u %5u %10lu %10lu %10lu\n",
- msq->q_perm.key,
- msq->q_perm.id,
- msq->q_perm.mode,
- msq->q_cbytes,
- msq->q_qnum,
- msq->q_lspid,
- msq->q_lrpid,
- from_kuid_munged(user_ns, msq->q_perm.uid),
- from_kgid_munged(user_ns, msq->q_perm.gid),
- from_kuid_munged(user_ns, msq->q_perm.cuid),
- from_kgid_munged(user_ns, msq->q_perm.cgid),
- msq->q_stime,
- msq->q_rtime,
- msq->q_ctime);
+ seq_printf(s,
+ "%10d %10d %4o %10lu %10lu %5u %5u %5u %5u %5u %5u %10lu %10lu %10lu\n",
+ msq->q_perm.key,
+ msq->q_perm.id,
+ msq->q_perm.mode,
+ msq->q_cbytes,
+ msq->q_qnum,
+ msq->q_lspid,
+ msq->q_lrpid,
+ from_kuid_munged(user_ns, msq->q_perm.uid),
+ from_kgid_munged(user_ns, msq->q_perm.gid),
+ from_kuid_munged(user_ns, msq->q_perm.cuid),
+ from_kgid_munged(user_ns, msq->q_perm.cgid),
+ msq->q_stime,
+ msq->q_rtime,
+ msq->q_ctime);
+
+ return 0;
}
#endif
diff --git a/ipc/sem.c b/ipc/sem.c
index 47a1519..57242be 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -2172,17 +2172,19 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)

sem_otime = get_semotime(sma);

- return seq_printf(s,
- "%10d %10d %4o %10u %5u %5u %5u %5u %10lu %10lu\n",
- sma->sem_perm.key,
- sma->sem_perm.id,
- sma->sem_perm.mode,
- sma->sem_nsems,
- from_kuid_munged(user_ns, sma->sem_perm.uid),
- from_kgid_munged(user_ns, sma->sem_perm.gid),
- from_kuid_munged(user_ns, sma->sem_perm.cuid),
- from_kgid_munged(user_ns, sma->sem_perm.cgid),
- sem_otime,
- sma->sem_ctime);
+ seq_printf(s,
+ "%10d %10d %4o %10u %5u %5u %5u %5u %10lu %10lu\n",
+ sma->sem_perm.key,
+ sma->sem_perm.id,
+ sma->sem_perm.mode,
+ sma->sem_nsems,
+ from_kuid_munged(user_ns, sma->sem_perm.uid),
+ from_kgid_munged(user_ns, sma->sem_perm.gid),
+ from_kuid_munged(user_ns, sma->sem_perm.cuid),
+ from_kgid_munged(user_ns, sma->sem_perm.cgid),
+ sem_otime,
+ sma->sem_ctime);
+
+ return 0;
}
#endif
diff --git a/ipc/shm.c b/ipc/shm.c
index 08b14f6..1f141c9 100644
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1331,25 +1331,27 @@ static int sysvipc_shm_proc_show(struct seq_file *s, void *it)
#define SIZE_SPEC "%21lu"
#endif

- return seq_printf(s,
- "%10d %10d %4o " SIZE_SPEC " %5u %5u "
- "%5lu %5u %5u %5u %5u %10lu %10lu %10lu "
- SIZE_SPEC " " SIZE_SPEC "\n",
- shp->shm_perm.key,
- shp->shm_perm.id,
- shp->shm_perm.mode,
- shp->shm_segsz,
- shp->shm_cprid,
- shp->shm_lprid,
- shp->shm_nattch,
- from_kuid_munged(user_ns, shp->shm_perm.uid),
- from_kgid_munged(user_ns, shp->shm_perm.gid),
- from_kuid_munged(user_ns, shp->shm_perm.cuid),
- from_kgid_munged(user_ns, shp->shm_perm.cgid),
- shp->shm_atim,
- shp->shm_dtim,
- shp->shm_ctim,
- rss * PAGE_SIZE,
- swp * PAGE_SIZE);
+ seq_printf(s,
+ "%10d %10d %4o " SIZE_SPEC " %5u %5u "
+ "%5lu %5u %5u %5u %5u %10lu %10lu %10lu "
+ SIZE_SPEC " " SIZE_SPEC "\n",
+ shp->shm_perm.key,
+ shp->shm_perm.id,
+ shp->shm_perm.mode,
+ shp->shm_segsz,
+ shp->shm_cprid,
+ shp->shm_lprid,
+ shp->shm_nattch,
+ from_kuid_munged(user_ns, shp->shm_perm.uid),
+ from_kgid_munged(user_ns, shp->shm_perm.gid),
+ from_kuid_munged(user_ns, shp->shm_perm.cuid),
+ from_kgid_munged(user_ns, shp->shm_perm.cgid),
+ shp->shm_atim,
+ shp->shm_dtim,
+ shp->shm_ctim,
+ rss * PAGE_SIZE,
+ swp * PAGE_SIZE);
+
+ return 0;
}
#endif
diff --git a/ipc/util.c b/ipc/util.c
index 7353425..cc10689 100644
--- a/ipc/util.c
+++ b/ipc/util.c
@@ -904,8 +904,10 @@ static int sysvipc_proc_show(struct seq_file *s, void *it)
struct ipc_proc_iter *iter = s->private;
struct ipc_proc_iface *iface = iter->iface;

- if (it == SEQ_START_TOKEN)
- return seq_puts(s, iface->header);
+ if (it == SEQ_START_TOKEN) {
+ seq_puts(s, iface->header);
+ return 0;
+ }

return iface->show(s, it);
}
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:05 PM2/5/17

to

From: Markus Elfring <elf...@users.sourceforge.net>

commit 5f0163a5ee9cc7c59751768bdfd94a73186debba upstream.

The put_device() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elf...@users.sourceforge.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
[wt: backported only to ease next patch as suggested by Jiri]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/base/core.c | 3 +--

1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 2a19097..d92ecf3 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1138,8 +1138,7 @@ done:
kobject_del(&dev->kobj);
Error:
cleanup_device_parent(dev);
- if (parent)
- put_device(parent);
+ put_device(parent);
name_error:
kfree(dev->p);
dev->p = NULL;
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:05 PM2/5/17

to

From: Eric Dumazet <edum...@google.com>

commit 1aa9d1a0e7eefcc61696e147d123453fc0016005 upstream.

dccp_v6_err() does not use pskb_may_pull() and might access garbage.

We only need 4 bytes at the beginning of the DCCP header, like TCP,
so the 8 bytes pulled in icmpv6_notify() are more than enough.

Signed-off-by: Eric Dumazet <edum...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/dccp/ipv6.c | 15 ++++++++-------
1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/net/dccp/ipv6.c b/net/dccp/ipv6.c
index 6cf9f77..fb72107 100644
--- a/net/dccp/ipv6.c
+++ b/net/dccp/ipv6.c
@@ -83,7 +83,7 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
u8 type, u8 code, int offset, __be32 info)
{
const struct ipv6hdr *hdr = (const struct ipv6hdr *)skb->data;
- const struct dccp_hdr *dh = (struct dccp_hdr *)(skb->data + offset);
+ const struct dccp_hdr *dh;
struct dccp_sock *dp;
struct ipv6_pinfo *np;
struct sock *sk;
@@ -91,12 +91,13 @@ static void dccp_v6_err(struct sk_buff *skb, struct inet6_skb_parm *opt,
__u64 seq;
struct net *net = dev_net(skb->dev);

- if (skb->len < offset + sizeof(*dh) ||
- skb->len < offset + __dccp_basic_hdr_len(dh)) {
- ICMP6_INC_STATS_BH(net, __in6_dev_get(skb->dev),
- ICMP6_MIB_INERRORS);
- return;
- }
+ /* Only need dccph_dport & dccph_sport which are the first
+ * 4 bytes in dccp header.
+ * Our caller (icmpv6_notify()) already pulled 8 bytes for us.
+ */
+ BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_sport) > 8);
+ BUILD_BUG_ON(offsetofend(struct dccp_hdr, dccph_dport) > 8);
+ dh = (struct dccp_hdr *)(skb->data + offset);

sk = inet6_lookup(net, &dccp_hashinfo,
&hdr->daddr, dh->dccph_dport,
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:05 PM2/5/17

to

From: Bart Van Assche <bart.va...@sandisk.com>

commit 51093254bf879bc9ce96590400a87897c7498463 upstream.

Let the target core check task existence instead of the SRP target
driver. Additionally, let the target core check the validity of the
task management request instead of the ib_srpt driver.

This patch fixes the following kernel crash:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000001
IP: [<ffffffffa0565f37>] srpt_handle_new_iu+0x6d7/0x790 [ib_srpt]
Oops: 0002 [#1] SMP
Call Trace:
[<ffffffffa05660ce>] srpt_process_completion+0xde/0x570 [ib_srpt]
[<ffffffffa056669f>] srpt_compl_thread+0x13f/0x160 [ib_srpt]
[<ffffffff8109726f>] kthread+0xcf/0xe0
[<ffffffff81613cfc>] ret_from_fork+0x7c/0xb0

Signed-off-by: Bart Van Assche <bart.va...@sandisk.com>
Fixes: 3e4f574857ee ("ib_srpt: Convert TMR path to target_submit_tmr")
Tested-by: Alex Estrin <alex....@intel.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Cc: Nicholas Bellinger <n...@linux-iscsi.org>
Cc: Sagi Grimberg <sa...@mellanox.com>
Cc: stable <sta...@vger.kernel.org>
Signed-off-by: Doug Ledford <dled...@redhat.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/infiniband/ulp/srpt/ib_srpt.c | 59 +----------------------------------
1 file changed, 1 insertion(+), 58 deletions(-)

diff --git a/drivers/infiniband/ulp/srpt/ib_srpt.c b/drivers/infiniband/ulp/srpt/ib_srpt.c
index fcf9f87..1e79114 100644
--- a/drivers/infiniband/ulp/srpt/ib_srpt.c
+++ b/drivers/infiniband/ulp/srpt/ib_srpt.c
@@ -1754,47 +1754,6 @@ send_sense:
return -1;
}

-/**
- * srpt_rx_mgmt_fn_tag() - Process a task management function by tag.
- * @ch: RDMA channel of the task management request.
- * @fn: Task management function to perform.
- * @req_tag: Tag of the SRP task management request.
- * @mgmt_ioctx: I/O context of the task management request.
- *
- * Returns zero if the target core will process the task management
- * request asynchronously.
- *
- * Note: It is assumed that the initiator serializes tag-based task management
- * requests.
- */
-static int srpt_rx_mgmt_fn_tag(struct srpt_send_ioctx *ioctx, u64 tag)
-{
- struct srpt_device *sdev;
- struct srpt_rdma_ch *ch;
- struct srpt_send_ioctx *target;
- int ret, i;
-
- ret = -EINVAL;
- ch = ioctx->ch;
- BUG_ON(!ch);
- BUG_ON(!ch->sport);
- sdev = ch->sport->sdev;
- BUG_ON(!sdev);
- spin_lock_irq(&sdev->spinlock);
- for (i = 0; i < ch->rq_size; ++i) {
- target = ch->ioctx_ring[i];
- if (target->cmd.se_lun == ioctx->cmd.se_lun &&
- target->tag == tag &&
- srpt_get_cmd_state(target) != SRPT_STATE_DONE) {
- ret = 0;
- /* now let the target core abort &target->cmd; */
- break;
- }
- }
- spin_unlock_irq(&sdev->spinlock);
- return ret;
-}
-
static int srp_tmr_to_tcm(int fn)
{
switch (fn) {
@@ -1829,7 +1788,6 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch,
struct se_cmd *cmd;
struct se_session *sess = ch->sess;
uint64_t unpacked_lun;
- uint32_t tag = 0;
int tcm_tmr;
int rc;

@@ -1845,25 +1803,10 @@ static void srpt_handle_tsk_mgmt(struct srpt_rdma_ch *ch,
srpt_set_cmd_state(send_ioctx, SRPT_STATE_MGMT);
send_ioctx->tag = srp_tsk->tag;
tcm_tmr = srp_tmr_to_tcm(srp_tsk->tsk_mgmt_func);
- if (tcm_tmr < 0) {
- send_ioctx->cmd.se_tmr_req->response =
- TMR_TASK_MGMT_FUNCTION_NOT_SUPPORTED;
- goto fail;
- }
unpacked_lun = srpt_unpack_lun((uint8_t *)&srp_tsk->lun,
sizeof(srp_tsk->lun));
-
- if (srp_tsk->tsk_mgmt_func == SRP_TSK_ABORT_TASK) {
- rc = srpt_rx_mgmt_fn_tag(send_ioctx, srp_tsk->task_tag);
- if (rc < 0) {
- send_ioctx->cmd.se_tmr_req->response =
- TMR_TASK_DOES_NOT_EXIST;
- goto fail;
- }
- tag = srp_tsk->task_tag;
- }
rc = target_submit_tmr(&send_ioctx->cmd, sess, NULL, unpacked_lun,
- srp_tsk, tcm_tmr, GFP_KERNEL, tag,
+ srp_tsk, tcm_tmr, GFP_KERNEL, srp_tsk->task_tag,
TARGET_SCF_ACK_KREF);
if (rc != 0) {
send_ioctx->cmd.se_tmr_req->response = TMR_FUNCTION_REJECTED;
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:05 PM2/5/17

to

From: Bart Van Assche <bart.va...@sandisk.com>

commit 3b785fbcf81c3533772c52b717f77293099498d3 upstream.

This avoids that new requests are queued while __dm_destroy() is in
progress.

[js] use md->queue instead of non-present helper

Signed-off-by: Bart Van Assche <bart.va...@sandisk.com>

Signed-off-by: Mike Snitzer <sni...@redhat.com>
Signed-off-by: Jiri Slaby <jsl...@suse.cz>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/md/dm.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/drivers/md/dm.c b/drivers/md/dm.c
index f69fed8..a77ef6c 100644
--- a/drivers/md/dm.c
+++ b/drivers/md/dm.c
@@ -2323,6 +2323,7 @@ EXPORT_SYMBOL_GPL(dm_device_name);

static void __dm_destroy(struct mapped_device *md, bool wait)
{
+ struct request_queue *q = md->queue;
struct dm_table *map;

might_sleep();
@@ -2333,6 +2334,10 @@ static void __dm_destroy(struct mapped_device *md, bool wait)
set_bit(DMF_FREEING, &md->flags);
spin_unlock(&_minor_lock);

+ spin_lock_irq(q->queue_lock);
+ queue_flag_set(QUEUE_FLAG_DYING, q);
+ spin_unlock_irq(q->queue_lock);
+
/*
* Take suspend_lock so that presuspend and postsuspend methods
* do not race with internal suspend.
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:05 PM2/5/17

to

From: Vegard Nossum <vegard...@oracle.com>

commit 6b760bb2c63a9e322c0e4a0b5daf335ad93d5a33 upstream.

I got this:

divide error: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 1327 Comm: a.out Not tainted 4.8.0-rc2+ #189
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org 04/01/2014
task: ffff8801120a9580 task.stack: ffff8801120b0000
RIP: 0010:[<ffffffff82c8bd9a>] [<ffffffff82c8bd9a>] snd_hrtimer_callback+0x1da/0x3f0
RSP: 0018:ffff88011aa87da8 EFLAGS: 00010006
RAX: 0000000000004f76 RBX: ffff880112655e88 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff880112655ea0 RDI: 0000000000000001
RBP: ffff88011aa87e00 R08: ffff88013fff905c R09: ffff88013fff9048
R10: ffff88013fff9050 R11: 00000001050a7b8c R12: ffff880114778a00
R13: ffff880114778ab4 R14: ffff880114778b30 R15: 0000000000000000
FS: 00007f071647c700(0000) GS:ffff88011aa80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000603001 CR3: 0000000112021000 CR4: 00000000000006e0
Stack:
0000000000000000 ffff880114778ab8 ffff880112655ea0 0000000000004f76
ffff880112655ec8 ffff880112655e80 ffff880112655e88 ffff88011aa98fc0
00000000b97ccf2b dffffc0000000000 ffff88011aa98fc0 ffff88011aa87ef0
Call Trace:
<IRQ>
[<ffffffff813abce7>] __hrtimer_run_queues+0x347/0xa00
[<ffffffff82c8bbc0>] ? snd_hrtimer_close+0x130/0x130
[<ffffffff813ab9a0>] ? retrigger_next_event+0x1b0/0x1b0
[<ffffffff813ae1a6>] ? hrtimer_interrupt+0x136/0x4b0
[<ffffffff813ae220>] hrtimer_interrupt+0x1b0/0x4b0
[<ffffffff8120f91e>] local_apic_timer_interrupt+0x6e/0xf0
[<ffffffff81227ad3>] ? kvm_guest_apic_eoi_write+0x13/0xc0
[<ffffffff83c35086>] smp_apic_timer_interrupt+0x76/0xa0
[<ffffffff83c3416c>] apic_timer_interrupt+0x8c/0xa0
<EOI>
[<ffffffff83c3239c>] ? _raw_spin_unlock_irqrestore+0x2c/0x60
[<ffffffff82c8185d>] snd_timer_start1+0xdd/0x670
[<ffffffff82c87015>] snd_timer_continue+0x45/0x80
[<ffffffff82c88100>] snd_timer_user_ioctl+0x1030/0x2830
[<ffffffff8159f3a0>] ? __follow_pte.isra.49+0x430/0x430
[<ffffffff82c870d0>] ? snd_timer_pause+0x80/0x80
[<ffffffff815a26fa>] ? do_wp_page+0x3aa/0x1c90
[<ffffffff815aa4f8>] ? handle_mm_fault+0xbc8/0x27f0
[<ffffffff815a9930>] ? __pmd_alloc+0x370/0x370
[<ffffffff82c870d0>] ? snd_timer_pause+0x80/0x80
[<ffffffff816b0733>] do_vfs_ioctl+0x193/0x1050
[<ffffffff816b05a0>] ? ioctl_preallocate+0x200/0x200
[<ffffffff81002f2f>] ? syscall_trace_enter+0x3cf/0xdb0
[<ffffffff815045ba>] ? __context_tracking_exit.part.4+0x9a/0x1e0
[<ffffffff81002b60>] ? exit_to_usermode_loop+0x190/0x190
[<ffffffff82001a97>] ? check_preemption_disabled+0x37/0x1e0
[<ffffffff81d93889>] ? security_file_ioctl+0x89/0xb0
[<ffffffff816b167f>] SyS_ioctl+0x8f/0xc0
[<ffffffff816b15f0>] ? do_vfs_ioctl+0x1050/0x1050
[<ffffffff81005524>] do_syscall_64+0x1c4/0x4e0
[<ffffffff83c32b2a>] entry_SYSCALL64_slow_path+0x25/0x25
Code: e8 fc 42 7b fe 8b 0d 06 8a 50 03 49 0f af cf 48 85 c9 0f 88 7c 01 00 00 48 89 4d a8 e8 e0 42 7b fe 48 8b 45 c0 48 8b 4d a8 48 99 <48> f7 f9 49 01 c7 e8 cb 42 7b fe 48 8b 55 d0 48 b8 00 00 00 00
RIP [<ffffffff82c8bd9a>] snd_hrtimer_callback+0x1da/0x3f0
RSP <ffff88011aa87da8>
---[ end trace 6aa380f756a21074 ]---

The problem happens when you call ioctl(SNDRV_TIMER_IOCTL_CONTINUE) on a
completely new/unused timer -- it will have ->sticks == 0, which causes a
divide by 0 in snd_hrtimer_callback().

Signed-off-by: Vegard Nossum <vegard...@oracle.com>
Signed-off-by: Takashi Iwai <ti...@suse.de>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

sound/core/timer.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/sound/core/timer.c b/sound/core/timer.c
index f5ddc9b..f297eac 100644
--- a/sound/core/timer.c
+++ b/sound/core/timer.c
@@ -817,6 +817,7 @@ int snd_timer_new(struct snd_card *card, char *id, struct snd_timer_id *tid,
timer->tmr_subdevice = tid->subdevice;
if (id)
strlcpy(timer->id, id, sizeof(timer->id));
+ timer->sticks = 1;
INIT_LIST_HEAD(&timer->device_list);
INIT_LIST_HEAD(&timer->open_list_head);
INIT_LIST_HEAD(&timer->active_list_head);
--
2.8.0.rc2.1.gbe9624a

unread,

Feb 5, 2017, 2:50:06 PM2/5/17

to

From: Sergei Miroshnichenko <serg...@emcraft.com>

commit 9abefcb1aaa58b9d5aa40a8bb12c87d02415e4c8 upstream.

A timer was used to restart after the bus-off state, leading to a
relatively large can_restart() executed in an interrupt context,
which in turn sets up pinctrl. When this happens during system boot,
there is a high probability of grabbing the pinctrl_list_mutex,
which is locked already by the probe() of other device, making the
kernel suspect a deadlock condition [1].

To resolve this issue, the restart_timer is replaced by a delayed
work.

[1] https://github.com/victronenergy/venus/issues/24

Signed-off-by: Sergei Miroshnichenko <serg...@emcraft.com>
Signed-off-by: Marc Kleine-Budde <m...@pengutronix.de>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/net/can/dev.c | 27 +++++++++++++++++----------
include/linux/can/dev.h | 3 ++-
2 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/net/can/dev.c b/drivers/net/can/dev.c
index 464e5f6..284d751 100644
--- a/drivers/net/can/dev.c
+++ b/drivers/net/can/dev.c
@@ -22,6 +22,7 @@
#include <linux/slab.h>
#include <linux/netdevice.h>
#include <linux/if_arp.h>
+#include <linux/workqueue.h>
#include <linux/can.h>
#include <linux/can/dev.h>
#include <linux/can/skb.h>
@@ -394,9 +395,8 @@ EXPORT_SYMBOL_GPL(can_free_echo_skb);
/*
* CAN device restart for bus-off recovery
*/
-static void can_restart(unsigned long data)
+static void can_restart(struct net_device *dev)
{
- struct net_device *dev = (struct net_device *)data;
struct can_priv *priv = netdev_priv(dev);
struct net_device_stats *stats = &dev->stats;
struct sk_buff *skb;
@@ -436,6 +436,14 @@ restart:
netdev_err(dev, "Error %d during restart", err);
}

+static void can_restart_work(struct work_struct *work)
+{
+ struct delayed_work *dwork = to_delayed_work(work);
+ struct can_priv *priv = container_of(dwork, struct can_priv, restart_work);
+
+ can_restart(priv->dev);
+}
+
int can_restart_now(struct net_device *dev)
{
struct can_priv *priv = netdev_priv(dev);
@@ -449,8 +457,8 @@ int can_restart_now(struct net_device *dev)
if (priv->state != CAN_STATE_BUS_OFF)
return -EBUSY;

- /* Runs as soon as possible in the timer context */
- mod_timer(&priv->restart_timer, jiffies);
+ cancel_delayed_work_sync(&priv->restart_work);
+ can_restart(dev);

return 0;
}
@@ -472,8 +480,8 @@ void can_bus_off(struct net_device *dev)
priv->can_stats.bus_off++;

if (priv->restart_ms)
- mod_timer(&priv->restart_timer,
- jiffies + (priv->restart_ms * HZ) / 1000);
+ schedule_delayed_work(&priv->restart_work,
+ msecs_to_jiffies(priv->restart_ms));
}
EXPORT_SYMBOL_GPL(can_bus_off);

@@ -556,6 +564,7 @@ struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max)
return NULL;

priv = netdev_priv(dev);
+ priv->dev = dev;

if (echo_skb_max) {
priv->echo_skb_max = echo_skb_max;
@@ -565,7 +574,7 @@ struct net_device *alloc_candev(int sizeof_priv, unsigned int echo_skb_max)

priv->state = CAN_STATE_STOPPED;

- init_timer(&priv->restart_timer);
+ INIT_DELAYED_WORK(&priv->restart_work, can_restart_work);

return dev;
}
@@ -599,8 +608,6 @@ int open_candev(struct net_device *dev)
if (!netif_carrier_ok(dev))
netif_carrier_on(dev);

- setup_timer(&priv->restart_timer, can_restart, (unsigned long)dev);
-
return 0;
}
EXPORT_SYMBOL_GPL(open_candev);
@@ -615,7 +622,7 @@ void close_candev(struct net_device *dev)
{
struct can_priv *priv = netdev_priv(dev);

- del_timer_sync(&priv->restart_timer);
+ cancel_delayed_work_sync(&priv->restart_work);
can_flush_echo_skb(dev);
}
EXPORT_SYMBOL_GPL(close_candev);
diff --git a/include/linux/can/dev.h b/include/linux/can/dev.h
index fb0ab65..fb9fbe2 100644
--- a/include/linux/can/dev.h
+++ b/include/linux/can/dev.h
@@ -31,6 +31,7 @@ enum can_mode {
* CAN common private data
*/
struct can_priv {
+ struct net_device *dev;
struct can_device_stats can_stats;

struct can_bittiming bittiming;
@@ -42,7 +43,7 @@ struct can_priv {
u32 ctrlmode_supported;

int restart_ms;
- struct timer_list restart_timer;
+ struct delayed_work restart_work;

int (*do_set_bittiming)(struct net_device *dev);
int (*do_set_mode)(struct net_device *dev, enum can_mode mode);
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:06 PM2/5/17

to

From: Karl Beldan <kbe...@baylibre.com>

commit f6d7c1b5598b6407c3f1da795dd54acf99c1990c upstream.

This fixes subpage writes when using 4-bit HW ECC.

There has been numerous reports about ECC errors with devices using this
driver for a while. Also the 4-bit ECC has been reported as broken with
subpages in [1] and with 16 bits NANDs in the driver and in mach* board
files both in mainline and in the vendor BSPs.

What I saw with 4-bit ECC on a 16bits NAND (on an LCDK) which got me to
try reinitializing the ECC engine:
- R/W on whole pages properly generates/checks RS code
- try writing the 1st subpage only of a blank page, the subpage is well
written and the RS code properly generated, re-reading the same page
the HW detects some ECC error, reading the same page again no ECC
error is detected

Note that the ECC engine is already reinitialized in the 1-bit case.

Tested on my LCDK with UBI+UBIFS using subpages.
This could potentially get rid of the issue workarounded in [1].

[1] 28c015a9daab ("mtd: davinci-nand: disable subpage write for keystone-nand")

Fixes: 6a4123e581b3 ("mtd: nand: davinci_nand, 4-bit ECC for smallpage")
Signed-off-by: Karl Beldan <kbe...@baylibre.com>
Acked-by: Boris Brezillon <boris.b...@free-electrons.com>
Signed-off-by: Brian Norris <computer...@gmail.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/mtd/nand/davinci_nand.c | 3 +++

1 file changed, 3 insertions(+)

diff --git a/drivers/mtd/nand/davinci_nand.c b/drivers/mtd/nand/davinci_nand.c
index c3e15a5..e4f16cf 100644
--- a/drivers/mtd/nand/davinci_nand.c
+++ b/drivers/mtd/nand/davinci_nand.c
@@ -241,6 +241,9 @@ static void nand_davinci_hwctl_4bit(struct mtd_info *mtd, int mode)
unsigned long flags;
u32 val;

+ /* Reset ECC hardware */
+ davinci_nand_readl(info, NAND_4BIT_ECC1_OFFSET);
+
spin_lock_irqsave(&davinci_nand_lock, flags);

/* Start 4-bit ECC calculation for read/write */
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:06 PM2/5/17

to

From: Dan Carpenter <dan.ca...@oracle.com>

commit 2d6a4d64812bb12dda53704943b61a7496d02098 upstream.

The curly braces are missing here so we print stuff unintentionally.

Fixes: 9da4714a2d44 ('slub: slabinfo update for cmpxchg handling')
Link: http://lkml.kernel.org/r/20160715211243.GE19522@mwanda
Signed-off-by: Dan Carpenter <dan.ca...@oracle.com>
Acked-by: Christoph Lameter <c...@linux.com>
Cc: Sergey Senozhatsky <sergey.se...@gmail.com>
Cc: Colin Ian King <colin...@canonical.com>
Cc: Laura Abbott <lab...@fedoraproject.org>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

tools/vm/slabinfo.c | 3 ++-

1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/vm/slabinfo.c b/tools/vm/slabinfo.c
index 808d5a9..bcc6125 100644
--- a/tools/vm/slabinfo.c
+++ b/tools/vm/slabinfo.c
@@ -493,10 +493,11 @@ static void slab_stats(struct slabinfo *s)
s->alloc_node_mismatch, (s->alloc_node_mismatch * 100) / total);
}

- if (s->cmpxchg_double_fail || s->cmpxchg_double_cpu_fail)
+ if (s->cmpxchg_double_fail || s->cmpxchg_double_cpu_fail) {
printf("\nCmpxchg_double Looping\n------------------------\n");
printf("Locked Cmpxchg Double redos %lu\nUnlocked Cmpxchg Double redos %lu\n",
s->cmpxchg_double_fail, s->cmpxchg_double_cpu_fail);
+ }
}

static void report(struct slabinfo *s)
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:06 PM2/5/17

to

From: Felipe Balbi <felipe...@linux.intel.com>

commit 6c83f77278f17a7679001027e9231291c20f0d8a upstream.

If we don't guarantee that we will always get an
interrupt at least when we're queueing our very last
request, we could fall into situation where we queue
every request with 'no_interrupt' set. This will
cause the link to get stuck.

The behavior above has been triggered with g_ether
and dwc3.

Reported-by: Ville Syrjälä <ville....@linux.intel.com>
Signed-off-by: Felipe Balbi <felipe...@linux.intel.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/gadget/u_ether.c | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/gadget/u_ether.c b/drivers/usb/gadget/u_ether.c
index 4b76124..aad066d 100644
--- a/drivers/usb/gadget/u_ether.c
+++ b/drivers/usb/gadget/u_ether.c
@@ -586,8 +586,9 @@ static netdev_tx_t eth_start_xmit(struct sk_buff *skb,

/* throttle high/super speed IRQ rate back slightly */
if (gadget_is_dualspeed(dev->gadget))
- req->no_interrupt = (dev->gadget->speed == USB_SPEED_HIGH ||
- dev->gadget->speed == USB_SPEED_SUPER)
+ req->no_interrupt = (((dev->gadget->speed == USB_SPEED_HIGH ||
+ dev->gadget->speed == USB_SPEED_SUPER)) &&
+ !list_empty(&dev->tx_reqs))
? ((atomic_read(&dev->tx_qlen) % qmult) != 0)
: 0;

--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:06 PM2/5/17

to

From: Joe Perches <j...@perches.com>

commit 8c7fbe5795a016259445a61e072eb0118aaf6a61 upstream.

Commit 3876488444e7 ("include/stddef.h: Move offsetofend() from vfio.h
to a generic kernel header") added offsetofend outside the normal
include #ifndef/#endif guard. Move it inside.

Miscellanea:

o remove unnecessary blank line
o standardize offsetof macros whitespace style

Signed-off-by: Joe Perches <j...@perches.com>
Cc: Denys Vlasenko <dvla...@redhat.com>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

[wt: backported only for ipv6 out-of-bounds fix]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/stddef.h | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/stddef.h b/include/linux/stddef.h
index 076af43..9c61c7c 100644
--- a/include/linux/stddef.h
+++ b/include/linux/stddef.h
@@ -3,7 +3,6 @@

#include <uapi/linux/stddef.h>

-
#undef NULL
#define NULL ((void *)0)

@@ -14,10 +13,9 @@ enum {

#undef offsetof
#ifdef __compiler_offsetof
-#define offsetof(TYPE,MEMBER) __compiler_offsetof(TYPE,MEMBER)
+#define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
#else
-#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
-#endif
+#define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
#endif

/**
@@ -28,3 +26,5 @@ enum {
*/
#define offsetofend(TYPE, MEMBER) \
(offsetof(TYPE, MEMBER) + sizeof(((TYPE *)0)->MEMBER))
+
+#endif
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:06 PM2/5/17

to

From: Ching Huang <chin...@areca.com.tw>

commit 2bf7dc8443e113844d078fd6541b7f4aa544f92f upstream.

The arcmsr driver failed to pass SYNCHRONIZE CACHE to controller
firmware. Depending on how drive caches are handled internally by
controller firmware this could potentially lead to data integrity
problems.

Ensure that cache flushes are passed to the controller.

[mkp: applied by hand and removed unused vars]

Signed-off-by: Ching Huang <chin...@areca.com.tw>
Reported-by: Tomas Henzl <the...@redhat.com>

Signed-off-by: Martin K. Petersen <martin....@oracle.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/scsi/arcmsr/arcmsr_hba.c | 9 ---------
1 file changed, 9 deletions(-)

diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
index 66dda86..8d9477c 100644
--- a/drivers/scsi/arcmsr/arcmsr_hba.c
+++ b/drivers/scsi/arcmsr/arcmsr_hba.c
@@ -2069,18 +2069,9 @@ static int arcmsr_queue_command_lck(struct scsi_cmnd *cmd,
struct AdapterControlBlock *acb = (struct AdapterControlBlock *) host->hostdata;
struct CommandControlBlock *ccb;
int target = cmd->device->id;
- int lun = cmd->device->lun;
- uint8_t scsicmd = cmd->cmnd[0];
cmd->scsi_done = done;
cmd->host_scribble = NULL;
cmd->result = 0;
- if ((scsicmd == SYNCHRONIZE_CACHE) ||(scsicmd == SEND_DIAGNOSTIC)){
- if(acb->devstate[target][lun] == ARECA_RAID_GONE) {
- cmd->result = (DID_NO_CONNECT << 16);
- }
- cmd->scsi_done(cmd);
- return 0;
- }
if (target == 16) {
/* virtual device for iop message transfer */
arcmsr_handle_virtual_command(acb, cmd);
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:07 PM2/5/17

to

From: Andrew Bresticker <abre...@chromium.org>

commit d771fdf94180de2bd811ac90cba75f0f346abf8d upstream.

The ramoops buffer may be mapped as either I/O memory or uncached
memory. On ARM64, this results in a device-type (strongly-ordered)
mapping. Since unnaligned accesses to device-type memory will
generate an alignment fault (regardless of whether or not strict
alignment checking is enabled), it is not safe to use memcpy().
memcpy_fromio() is guaranteed to only use aligned accesses, so use
that instead.

Signed-off-by: Andrew Bresticker <abre...@chromium.org>
Signed-off-by: Enric Balletbo Serra <enric.b...@collabora.com>
Reviewed-by: Puneet Kumar <punee...@chromium.org>
Signed-off-by: Kees Cook <kees...@chromium.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/pstore/ram_core.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/pstore/ram_core.c b/fs/pstore/ram_core.c
index eb42483..7df456d 100644
--- a/fs/pstore/ram_core.c
+++ b/fs/pstore/ram_core.c
@@ -286,8 +286,8 @@ void persistent_ram_save_old(struct persistent_ram_zone *prz)
}

prz->old_log_size = size;
- memcpy(prz->old_log, &buffer->data[start], size - start);
- memcpy(prz->old_log + size - start, &buffer->data[0], start);
+ memcpy_fromio(prz->old_log, &buffer->data[start], size - start);
+ memcpy_fromio(prz->old_log + size - start, &buffer->data[0], start);
}

int notrace persistent_ram_write(struct persistent_ram_zone *prz,
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:07 PM2/5/17

to

From: Kashyap Desai <kashya...@broadcom.com>

commit 1e793f6fc0db920400574211c48f9157a37e3945 upstream.

Commit 02b01e010afe ("megaraid_sas: return sync cache call with
success") modified the driver to successfully complete SYNCHRONIZE_CACHE
commands without passing them to the controller. Disk drive caches are
only explicitly managed by controller firmware when operating in RAID
mode. So this commit effectively disabled writeback cache flushing for
any drives used in JBOD mode, leading to data integrity failures.

[mkp: clarified patch description]

Fixes: 02b01e010afeeb49328d35650d70721d2ca3fd59
Signed-off-by: Kashyap Desai <kashya...@broadcom.com>
Signed-off-by: Sumit Saxena <sumit....@broadcom.com>
Reviewed-by: Tomas Henzl <the...@redhat.com>
Reviewed-by: Hannes Reinecke <ha...@suse.com>
Reviewed-by: Ewan D. Milne <emi...@redhat.com>

Signed-off-by: Martin K. Petersen <martin....@oracle.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/scsi/megaraid/megaraid_sas_base.c | 13 +++++--------
1 file changed, 5 insertions(+), 8 deletions(-)

diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 6ced6a3..0626a16 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -1487,16 +1487,13 @@ megasas_queue_command_lck(struct scsi_cmnd *scmd, void (*done) (struct scsi_cmnd
goto out_done;
}

- switch (scmd->cmnd[0]) {
- case SYNCHRONIZE_CACHE:
- /*
- * FW takes care of flush cache on its own
- * No need to send it down
- */
+ /*
+ * FW takes care of flush cache on its own for Virtual Disk.
+ * No need to send it down for VD. For JBOD send SYNCHRONIZE_CACHE to FW.
+ */
+ if ((scmd->cmnd[0] == SYNCHRONIZE_CACHE) && MEGASAS_IS_LOGICAL(scmd)) {
scmd->result = DID_OK << 16;
goto out_done;
- default:
- break;
}

if (instance->instancet->build_and_issue_cmd(instance, scmd)) {
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:07 PM2/5/17

to

From: Wei Yongjun <weiyo...@huawei.com>

commit 751eb6b6042a596b0080967c1a529a9fe98dac1d upstream.

In general, when DAD detected IPv6 duplicate address, ifp->state
will be set to INET6_IFADDR_STATE_ERRDAD and DAD is stopped by a
delayed work, the call tree should be like this:

ndisc_recv_ns
-> addrconf_dad_failure <- missing ifp put
-> addrconf_mod_dad_work
-> schedule addrconf_dad_work()
-> addrconf_dad_stop() <- missing ifp hold before call it

addrconf_dad_failure() called with ifp refcont holding but not put.
addrconf_dad_work() call addrconf_dad_stop() without extra holding
refcount. This will not cause any issue normally.

But the race between addrconf_dad_failure() and addrconf_dad_work()
may cause ifp refcount leak and netdevice can not be unregister,
dmesg show the following messages:

IPv6: eth0: IPv6 duplicate address fe80::XX:XXXX:XXXX:XX detected!
...
unregister_netdevice: waiting for eth0 to become free. Usage count = 1

Cc: sta...@vger.kernel.org
Fixes: c15b1ccadb32 ("ipv6: move DAD and addrconf_verify processing
to workqueue")
Signed-off-by: Wei Yongjun <weiyo...@huawei.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Cc: <sta...@vger.kernel.org>
Signed-off-by: Mike Manning <mman...@brocade.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv6/addrconf.c | 2 ++

1 file changed, 2 insertions(+)

diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
index 98dd353..0f18d858 100644
--- a/net/ipv6/addrconf.c
+++ b/net/ipv6/addrconf.c
@@ -1694,6 +1694,7 @@ void addrconf_dad_failure(struct inet6_ifaddr *ifp)
spin_unlock_bh(&ifp->state_lock);

addrconf_mod_dad_work(ifp, 0);
+ in6_ifa_put(ifp);
}

/* Join to solicited addr multicast group. */
@@ -3337,6 +3338,7 @@ static void addrconf_dad_work(struct work_struct *w)
addrconf_dad_begin(ifp);
goto out;
} else if (action == DAD_ABORT) {
+ in6_ifa_hold(ifp);
addrconf_dad_stop(ifp, 1);
goto out;
}
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:07 PM2/5/17

to

From: Dave Chinner <dchi...@redhat.com>

commit f3d7ebdeb2c297bd26272384e955033493ca291c upstream.

From inspection, the superblock sb_inprogress check is done in the
verifier and triggered only for the primary superblock via a
"bp->b_bn == XFS_SB_DADDR" check.

Unfortunately, the primary superblock is an uncached buffer, and
hence it is configured by xfs_buf_read_uncached() with:

bp->b_bn = XFS_BUF_DADDR_NULL; /* always null for uncached buffers */

And so this check never triggers. Fix it.

Signed-off-by: Dave Chinner <dchi...@redhat.com>
Reviewed-by: Brian Foster <bfo...@redhat.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: Dave Chinner <da...@fromorbit.com>
[wt: s/xfs_sb.c/xfs_mount.c in 3.10]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/xfs/xfs_mount.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index e8e310c..363c4cc 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -689,7 +689,8 @@ xfs_sb_verify(
* Only check the in progress field for the primary superblock as
* mkfs.xfs doesn't clear it from secondary superblocks.
*/
- return xfs_mount_validate_sb(mp, &sb, bp->b_bn == XFS_SB_DADDR,
+ return xfs_mount_validate_sb(mp, &sb,
+ bp->b_maps[0].bm_bn == XFS_SB_DADDR,
check_version);
}

--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:07 PM2/5/17

to

From: Manfred Spraul <man...@colorfullife.com>

commit 5864a2fd3088db73d47942370d0f7210a807b9bc upstream.

Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
- a non-simple operations is ongoing (sma->sem_perm.lock held)
- a complex operation is sleeping (sma->complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions. See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)

The alternative is to revert the patch that introduced the race.

The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().

Background:
Here is the race of the current implementation:

Thread A: (simple op)
- does the first "sma->complex_count == 0" test

Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
find Thread A, because Thread A does not own sem->lock yet.
- the thread does the operation, increases complex_count,
drops sem_lock, sleeps

Thread A:
- spin_lock(&sem->lock), spin_is_locked(sma->sem_perm.lock)
- sleeps before the complex_count test

Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count

Thread A:
- does the complex_count test

Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.

[js] use set_mb instead of smp_store_mb

Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Link: http://lkml.kernel.org/r/1469123695-5661-1-gi...@colorfullife.com
Reported-by: <fel...@informatik.uni-bremen.de>

Cc: "H. Peter Anvin" <h...@zytor.com>

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Davidlohr Bueso <da...@stgolabs.net>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Ingo Molnar <mi...@elte.hu>
Cc: <1vi...@web.de>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Jiri Slaby <jsl...@suse.cz>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/sem.h | 1 +
ipc/sem.c | 129 ++++++++++++++++++++++++++++++----------------------
2 files changed, 76 insertions(+), 54 deletions(-)

diff --git a/include/linux/sem.h b/include/linux/sem.h
index 976ce3a..d0efd6e 100644
--- a/include/linux/sem.h
+++ b/include/linux/sem.h
@@ -21,6 +21,7 @@ struct sem_array {
struct list_head list_id; /* undo requests on this array */
int sem_nsems; /* no. of semaphores in array */
int complex_count; /* pending complex operations */
+ bool complex_mode; /* no parallel simple ops */
};

#ifdef CONFIG_SYSVIPC
diff --git a/ipc/sem.c b/ipc/sem.c
index 57242be..94c6ec5 100644
--- a/ipc/sem.c
+++ b/ipc/sem.c
@@ -155,14 +155,21 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it);

/*
* Locking:
+ * a) global sem_lock() for read/write
* sem_undo.id_next,
* sem_array.complex_count,
- * sem_array.pending{_alter,_cont},
- * sem_array.sem_undo: global sem_lock() for read/write
- * sem_undo.proc_next: only "current" is allowed to read/write that field.
+ * sem_array.complex_mode
+ * sem_array.pending{_alter,_const},
+ * sem_array.sem_undo
*
+ * b) global or semaphore sem_lock() for read/write:
* sem_array.sem_base[i].pending_{const,alter}:
- * global or semaphore sem_lock() for read/write
+ * sem_array.complex_mode (for read)
+ *
+ * c) special:
+ * sem_undo_list.list_proc:
+ * * undo_list->lock for write
+ * * rcu for read
*/

#define sc_semmsl sem_ctls[0]
@@ -263,24 +270,25 @@ static void sem_rcu_free(struct rcu_head *head)
#define ipc_smp_acquire__after_spin_is_unlocked() smp_rmb()

/*
- * Wait until all currently ongoing simple ops have completed.
+ * Enter the mode suitable for non-simple operations:
* Caller must own sem_perm.lock.
- * New simple ops cannot start, because simple ops first check
- * that sem_perm.lock is free.
- * that a) sem_perm.lock is free and b) complex_count is 0.
*/
-static void sem_wait_array(struct sem_array *sma)
+static void complexmode_enter(struct sem_array *sma)
{
int i;
struct sem *sem;

- if (sma->complex_count) {
- /* The thread that increased sma->complex_count waited on
- * all sem->lock locks. Thus we don't need to wait again.
- */
+ if (sma->complex_mode) {
+ /* We are already in complex_mode. Nothing to do */
return;
}

+ /* We need a full barrier after seting complex_mode:
+ * The write to complex_mode must be visible
+ * before we read the first sem->lock spinlock state.
+ */
+ set_mb(sma->complex_mode, true);
+
for (i = 0; i < sma->sem_nsems; i++) {
sem = sma->sem_base + i;
spin_unlock_wait(&sem->lock);
@@ -289,6 +297,28 @@ static void sem_wait_array(struct sem_array *sma)
}

/*
+ * Try to leave the mode that disallows simple operations:
+ * Caller must own sem_perm.lock.
+ */
+static void complexmode_tryleave(struct sem_array *sma)
+{
+ if (sma->complex_count) {
+ /* Complex ops are sleeping.
+ * We must stay in complex mode
+ */
+ return;
+ }
+ /*
+ * Immediately after setting complex_mode to false,
+ * a simple op can start. Thus: all memory writes
+ * performed by the current operation must be visible
+ * before we set complex_mode to false.
+ */
+ smp_store_release(&sma->complex_mode, false);
+}
+
+#define SEM_GLOBAL_LOCK (-1)
+/*
* If the request contains only one semaphore operation, and there are
* no complex transactions pending, lock only the semaphore involved.
* Otherwise, lock the entire semaphore array, since we either have
@@ -304,55 +334,42 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Complex operation - acquire a full lock */
ipc_lock_object(&sma->sem_perm);

- /* And wait until all simple ops that are processed
- * right now have dropped their locks.
- */
- sem_wait_array(sma);
- return -1;
+ /* Prevent parallel simple ops */
+ complexmode_enter(sma);
+ return SEM_GLOBAL_LOCK;
}

/*
* Only one semaphore affected - try to optimize locking.
- * The rules are:
- * - optimized locking is possible if no complex operation
- * is either enqueued or processed right now.
- * - The test for enqueued complex ops is simple:
- * sma->complex_count != 0
- * - Testing for complex ops that are processed right now is
- * a bit more difficult. Complex ops acquire the full lock
- * and first wait that the running simple ops have completed.
- * (see above)
- * Thus: If we own a simple lock and the global lock is free
- * and complex_count is now 0, then it will stay 0 and
- * thus just locking sem->lock is sufficient.
+ * Optimized locking is possible if no complex operation
+ * is either enqueued or processed right now.
+ *
+ * Both facts are tracked by complex_mode.
*/
sem = sma->sem_base + sops->sem_num;

- if (sma->complex_count == 0) {
+ /*
+ * Initial check for complex_mode. Just an optimization,
+ * no locking, no memory barrier.
+ */
+ if (!sma->complex_mode) {
/*
* It appears that no complex operation is around.
* Acquire the per-semaphore lock.
*/
spin_lock(&sem->lock);

- /* Then check that the global lock is free */
- if (!spin_is_locked(&sma->sem_perm.lock)) {
- /*
- * We need a memory barrier with acquire semantics,
- * otherwise we can race with another thread that does:
- * complex_count++;
- * spin_unlock(sem_perm.lock);
- */
- ipc_smp_acquire__after_spin_is_unlocked();
+ /*
+ * See 51d7d5205d33
+ * ("powerpc: Add smp_mb() to arch_spin_is_locked()"):
+ * A full barrier is required: the write of sem->lock
+ * must be visible before the read is executed
+ */
+ smp_mb();

- /* Now repeat the test of complex_count:
- * It can't change anymore until we drop sem->lock.
- * Thus: if is now 0, then it will stay 0.
- */
- if (sma->complex_count == 0) {
- /* fast path successful! */
- return sops->sem_num;
- }
+ if (!smp_load_acquire(&sma->complex_mode)) {
+ /* fast path successful! */
+ return sops->sem_num;
}
spin_unlock(&sem->lock);
}
@@ -372,15 +389,16 @@ static inline int sem_lock(struct sem_array *sma, struct sembuf *sops,
/* Not a false alarm, thus complete the sequence for a
* full lock.
*/
- sem_wait_array(sma);
- return -1;
+ complexmode_enter(sma);
+ return SEM_GLOBAL_LOCK;
}
}

static inline void sem_unlock(struct sem_array *sma, int locknum)
{
- if (locknum == -1) {
+ if (locknum == SEM_GLOBAL_LOCK) {
unmerge_queues(sma);
+ complexmode_tryleave(sma);
ipc_unlock_object(&sma->sem_perm);
} else {
struct sem *sem = sma->sem_base + locknum;
@@ -540,6 +558,7 @@ static int newary(struct ipc_namespace *ns, struct ipc_params *params)
}

sma->complex_count = 0;
+ sma->complex_mode = true; /* dropped by sem_unlock below */
INIT_LIST_HEAD(&sma->pending_alter);
INIT_LIST_HEAD(&sma->pending_const);
INIT_LIST_HEAD(&sma->list_id);
@@ -2165,10 +2184,10 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
/*
* The proc interface isn't aware of sem_lock(), it calls
* ipc_lock_object() directly (in sysvipc_find_ipc).
- * In order to stay compatible with sem_lock(), we must wait until
- * all simple semop() calls have left their critical regions.
+ * In order to stay compatible with sem_lock(), we must
+ * enter / leave complex_mode.
*/
- sem_wait_array(sma);
+ complexmode_enter(sma);

sem_otime = get_semotime(sma);

@@ -2185,6 +2204,8 @@ static int sysvipc_sem_proc_show(struct seq_file *s, void *it)
sem_otime,
sma->sem_ctime);

+ complexmode_tryleave(sma);
+
return 0;
}
#endif
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:08 PM2/5/17

to

From: Dan Carpenter <dan.ca...@oracle.com>

commit 9a6dc644512fd083400a96ac4a035ac154fe6b8d upstream.

set_bit() and clear_bit() take the bit number so this code is really
doing "1 << (1 << irq)" which is a double shift bug. It's done
consistently so it won't cause a problem unless "irq" is more than 4.

Fixes: 70c6cce04066 ('mfd: Support 88pm80x in 80x driver')
Signed-off-by: Dan Carpenter <dan.ca...@oracle.com>
Signed-off-by: Lee Jones <lee....@linaro.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/mfd/88pm80x.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/mfd/88pm80x.h b/include/linux/mfd/88pm80x.h
index e94537b..b55c95d 100644
--- a/include/linux/mfd/88pm80x.h
+++ b/include/linux/mfd/88pm80x.h
@@ -345,7 +345,7 @@ static inline int pm80x_dev_suspend(struct device *dev)
int irq = platform_get_irq(pdev, 0);

if (device_may_wakeup(dev))
- set_bit((1 << irq), &chip->wu_flag);
+ set_bit(irq, &chip->wu_flag);

return 0;
}
@@ -357,7 +357,7 @@ static inline int pm80x_dev_resume(struct device *dev)
int irq = platform_get_irq(pdev, 0);

if (device_may_wakeup(dev))
- clear_bit((1 << irq), &chip->wu_flag);
+ clear_bit(irq, &chip->wu_flag);

return 0;
}
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,

Feb 5, 2017, 2:50:08 PM2/5/17

to

From: Andrey Ryabinin <arya...@virtuozzo.com>

commit 70d78fe7c8b640b5acfad56ad341985b3810998a upstream.

It could be not possible to freeze coredumping task when it waits for
'core_state->startup' completion, because threads are frozen in
get_signal() before they got a chance to complete 'core_state->startup'.

Inability to freeze a task during suspend will cause suspend to fail.
Also CRIU uses cgroup freezer during dump operation. So with an
unfreezable task the CRIU dump will fail because it waits for a
transition from 'FREEZING' to 'FROZEN' state which will never happen.

Use freezer_do_not_count() to tell freezer to ignore coredumping task
while it waits for core_state->startup completion.

Link: http://lkml.kernel.org/r/1475225434-3753-1-git...@virtuozzo.com
Signed-off-by: Andrey Ryabinin <arya...@virtuozzo.com>
Acked-by: Pavel Machek <pa...@ucw.cz>
Acked-by: Oleg Nesterov <ol...@redhat.com>
Cc: Alexander Viro <vi...@zeniv.linux.org.uk>
Cc: Tejun Heo <t...@kernel.org>
Cc: "Rafael J. Wysocki" <r...@rjwysocki.net>
Cc: Michal Hocko <mho...@kernel.org>

Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/coredump.c | 3 +++

1 file changed, 3 insertions(+)

diff --git a/fs/coredump.c b/fs/coredump.c
index 4f03b2b..a94f94d 100644
--- a/fs/coredump.c
+++ b/fs/coredump.c
@@ -1,6 +1,7 @@
#include <linux/slab.h>
#include <linux/file.h>
#include <linux/fdtable.h>
+#include <linux/freezer.h>
#include <linux/mm.h>
#include <linux/stat.h>
#include <linux/fcntl.h>
@@ -375,7 +376,9 @@ static int coredump_wait(int exit_code, struct core_state *core_state)
if (core_waiters > 0) {
struct core_thread *ptr;

+ freezer_do_not_count();
wait_for_completion(&core_state->startup);
+ freezer_count();
/*
* Wait for all the threads to become inactive, so that
* all the thread context (extended register state, like
--
2.8.0.rc2.1.gbe9624a

Willy Tarreau

unread,