
[PATCH 4.12 18/99] tipc: fix use-after-free

Greg Kroah-Hartman

Aug 28, 2017, 4:10:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>


[ Upstream commit 5bfd37b4de5c98e86b12bd13be5aa46c7484a125 ]

syzkaller reported a use-after-free in tipc [1]

When the msg->rep skb is freed, set the pointer to NULL,
so that the caller does not free it again.

[1]

==================================================================
BUG: KASAN: use-after-free in skb_push+0xd4/0xe0 net/core/skbuff.c:1466
Read of size 8 at addr ffff8801c6e71e90 by task syz-executor5/4115

CPU: 1 PID: 4115 Comm: syz-executor5 Not tainted 4.13.0-rc4+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x24e/0x340 mm/kasan/report.c:409
__asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:430
skb_push+0xd4/0xe0 net/core/skbuff.c:1466
tipc_nl_compat_recv+0x833/0x18f0 net/tipc/netlink_compat.c:1209
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x4512e9
RSP: 002b:00007f3bc8184c08 EFLAGS: 00000216 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000718000 RCX: 00000000004512e9
RDX: 0000000000000020 RSI: 0000000020fdb000 RDI: 0000000000000006
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000216 R12: 00000000004b5e76
R13: 00007f3bc8184b48 R14: 00000000004b5e86 R15: 0000000000000000

Allocated by task 4115:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:489
kmem_cache_alloc_node+0x13d/0x750 mm/slab.c:3651
__alloc_skb+0xf1/0x740 net/core/skbuff.c:219
alloc_skb include/linux/skbuff.h:903 [inline]
tipc_tlv_alloc+0x26/0xb0 net/tipc/netlink_compat.c:148
tipc_nl_compat_dumpit+0xf2/0x3c0 net/tipc/netlink_compat.c:248
tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 4115:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kmem_cache_free+0x77/0x280 mm/slab.c:3763
kfree_skbmem+0x1a1/0x1d0 net/core/skbuff.c:622
__kfree_skb net/core/skbuff.c:682 [inline]
kfree_skb+0x165/0x4c0 net/core/skbuff.c:699
tipc_nl_compat_dumpit+0x36a/0x3c0 net/tipc/netlink_compat.c:260
tipc_nl_compat_handle net/tipc/netlink_compat.c:1130 [inline]
tipc_nl_compat_recv+0x756/0x18f0 net/tipc/netlink_compat.c:1199
genl_family_rcv_msg+0x7b7/0xfb0 net/netlink/genetlink.c:598
genl_rcv_msg+0xb2/0x140 net/netlink/genetlink.c:623
netlink_rcv_skb+0x216/0x440 net/netlink/af_netlink.c:2397
genl_rcv+0x28/0x40 net/netlink/genetlink.c:634
netlink_unicast_kernel net/netlink/af_netlink.c:1265 [inline]
netlink_unicast+0x4e8/0x6f0 net/netlink/af_netlink.c:1291
netlink_sendmsg+0xa4a/0xe60 net/netlink/af_netlink.c:1854
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg+0xca/0x110 net/socket.c:643
sock_write_iter+0x31a/0x5d0 net/socket.c:898
call_write_iter include/linux/fs.h:1743 [inline]
new_sync_write fs/read_write.c:457 [inline]
__vfs_write+0x684/0x970 fs/read_write.c:470
vfs_write+0x189/0x510 fs/read_write.c:518
SYSC_write fs/read_write.c:565 [inline]
SyS_write+0xef/0x220 fs/read_write.c:557
entry_SYSCALL_64_fastpath+0x1f/0xbe

The buggy address belongs to the object at ffff8801c6e71dc0
which belongs to the cache skbuff_head_cache of size 224
The buggy address is located 208 bytes inside of
224-byte region [ffff8801c6e71dc0, ffff8801c6e71ea0)
The buggy address belongs to the page:
page:ffffea00071b9c40 count:1 mapcount:0 mapping:ffff8801c6e71000 index:0x0
flags: 0x200000000000100(slab)
raw: 0200000000000100 ffff8801c6e71000 0000000000000000 000000010000000c
raw: ffffea0007224a20 ffff8801d98caf48 ffff8801d9e79040 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
 ffff8801c6e71d80: fc fc fc fc fc fc fc fc fb fb fb fb fb fb fb fb
 ffff8801c6e71e00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8801c6e71e80: fb fb fb fb fc fc fc fc fc fc fc fc fc fc fc fc
                         ^
 ffff8801c6e71f00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 ffff8801c6e71f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Signed-off-by: Eric Dumazet <edum...@google.com>
Reported-by: Dmitry Vyukov <dvy...@google.com>
Cc: Jon Maloy <jon....@ericsson.com>
Cc: Ying Xue <ying...@windriver.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
net/tipc/netlink_compat.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

--- a/net/tipc/netlink_compat.c
+++ b/net/tipc/netlink_compat.c
@@ -258,13 +258,15 @@ static int tipc_nl_compat_dumpit(struct
arg = nlmsg_new(0, GFP_KERNEL);
if (!arg) {
kfree_skb(msg->rep);
+ msg->rep = NULL;
return -ENOMEM;
}

err = __tipc_nl_compat_dumpit(cmd, msg, arg);
- if (err)
+ if (err) {
kfree_skb(msg->rep);
-
+ msg->rep = NULL;
+ }
kfree_skb(arg);

return err;

Greg Kroah-Hartman

Aug 28, 2017, 4:10:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Vineet Gupta <vgu...@synopsys.com>

commit b5ddb6d54729d814356937572d6c9b599f10c29f upstream.

The PAE40 configuration in hardware extends some of the address registers
for TLB/cache ops to 2 words.

So far the kernel was NOT setting the higher word if the feature was not
enabled in software, which is wrong. Those need to be set to 0 in such a case.

Normally this would be done in the cache flush / tlb ops. However, since
these registers only exist conditionally, this would have to be
conditional on a flag being set at boot, which is expensive/ugly,
especially for the more common case where PAE exists but is not in use.
Optimize that by zeroing them once at boot; nobody will write to
them afterwards.

Signed-off-by: Vineet Gupta <vgu...@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/arc/include/asm/mmu.h | 2 ++
arch/arc/mm/cache.c | 34 ++++++++++++++++++++++++++++------
arch/arc/mm/tlb.c | 12 +++++++++++-
3 files changed, 41 insertions(+), 7 deletions(-)

--- a/arch/arc/include/asm/mmu.h
+++ b/arch/arc/include/asm/mmu.h
@@ -94,6 +94,8 @@ static inline int is_pae40_enabled(void)
return IS_ENABLED(CONFIG_ARC_HAS_PAE40);
}

+extern int pae40_exist_but_not_enab(void);
+
#endif /* !__ASSEMBLY__ */

#endif
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -1123,6 +1123,13 @@ noinline void __init arc_ioc_setup(void)
__dc_enable();
}

+/*
+ * Cache related boot time checks/setups only needed on master CPU:
+ * - Geometry checks (kernel build and hardware agree: e.g. L1_CACHE_BYTES)
+ * Assume SMP only, so all cores will have same cache config. A check on
+ * one core suffices for all
+ * - IOC setup / dma callbacks only need to be done once
+ */
void __init arc_cache_init_master(void)
{
unsigned int __maybe_unused cpu = smp_processor_id();
@@ -1202,12 +1209,27 @@ void __ref arc_cache_init(void)

printk(arc_cache_mumbojumbo(0, str, sizeof(str)));

- /*
- * Only master CPU needs to execute rest of function:
- * - Assume SMP so all cores will have same cache config so
- * any geomtry checks will be same for all
- * - IOC setup / dma callbacks only need to be setup once
- */
if (!cpu)
arc_cache_init_master();
+
+ /*
+ * In PAE regime, TLB and cache maintenance ops take wider addresses
+ * And even if PAE is not enabled in kernel, the upper 32-bits still need
+ * to be zeroed to keep the ops sane.
+ * As an optimization for more common !PAE enabled case, zero them out
+ * once at init, rather than checking/setting to 0 for every runtime op
+ */
+ if (is_isa_arcv2() && pae40_exist_but_not_enab()) {
+
+ if (IS_ENABLED(CONFIG_ARC_HAS_ICACHE))
+ write_aux_reg(ARC_REG_IC_PTAG_HI, 0);
+
+ if (IS_ENABLED(CONFIG_ARC_HAS_DCACHE))
+ write_aux_reg(ARC_REG_DC_PTAG_HI, 0);
+
+ if (l2_line_sz) {
+ write_aux_reg(ARC_REG_SLC_RGN_END1, 0);
+ write_aux_reg(ARC_REG_SLC_RGN_START1, 0);
+ }
+ }
}
--- a/arch/arc/mm/tlb.c
+++ b/arch/arc/mm/tlb.c
@@ -104,6 +104,8 @@
/* A copy of the ASID from the PID reg is kept in asid_cache */
DEFINE_PER_CPU(unsigned int, asid_cache) = MM_CTXT_FIRST_CYCLE;

+static int __read_mostly pae_exists;
+
/*
* Utility Routine to erase a J-TLB entry
* Caller needs to setup Index Reg (manually or via getIndex)
@@ -784,7 +786,7 @@ void read_decode_mmu_bcr(void)
mmu->u_dtlb = mmu4->u_dtlb * 4;
mmu->u_itlb = mmu4->u_itlb * 4;
mmu->sasid = mmu4->sasid;
- mmu->pae = mmu4->pae;
+ pae_exists = mmu->pae = mmu4->pae;
}
}

@@ -809,6 +811,11 @@ char *arc_mmu_mumbojumbo(int cpu_id, cha
return buf;
}

+int pae40_exist_but_not_enab(void)
+{
+ return pae_exists && !is_pae40_enabled();
+}
+
void arc_mmu_init(void)
{
char str[256];
@@ -859,6 +866,9 @@ void arc_mmu_init(void)
/* swapper_pg_dir is the pgd for the kernel, used by vmalloc */
write_aux_reg(ARC_REG_SCRATCH_DATA0, swapper_pg_dir);
#endif
+
+ if (pae40_exist_but_not_enab())
+ write_aux_reg(ARC_REG_TLBPD1HI, 0);
}

/*

Greg Kroah-Hartman

Aug 28, 2017, 4:10:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dragos Bogdan <dragos...@analog.com>

commit fdd0d32eb95f135041236a6885d9006315aa9a1d upstream.

According to the datasheet, the range of the acceleration is [-10 g, + 10 g],
so the scale factor should be 10 instead of 5.

Signed-off-by: Dragos Bogdan <dragos...@analog.com>
Acked-by: Lars-Peter Clausen <la...@metafoo.de>
Signed-off-by: Jonathan Cameron <Jonathan...@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/iio/imu/adis16480.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/iio/imu/adis16480.c
+++ b/drivers/iio/imu/adis16480.c
@@ -696,7 +696,7 @@ static const struct adis16480_chip_info
.gyro_max_val = IIO_RAD_TO_DEGREE(22500),
.gyro_max_scale = 450,
.accel_max_val = IIO_M_S_2_TO_G(12500),
- .accel_max_scale = 5,
+ .accel_max_scale = 10,
},
[ADIS16485] = {
.channels = adis16485_channels,

Greg Kroah-Hartman

Aug 28, 2017, 4:10:08 AM
This is the start of the stable review cycle for the 4.12.10 release.
There are 99 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Wed Aug 30 08:04:17 UTC 2017.
Anything received after that time might be too late.

The whole patch series can be found in one patch at:
kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.12.10-rc1.gz
or in the git tree and branch at:
git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.12.y
and the diffstat can be found below.

thanks,

greg k-h

-------------
Pseudo-Shortlog of commits:

Greg Kroah-Hartman <gre...@linuxfoundation.org>
Linux 4.12.10-rc1

Benjamin Herrenschmidt <be...@kernel.crashing.org>
powerpc/mm: Ensure cpumask update is ordered

Lv Zheng <lv.z...@intel.com>
ACPI: EC: Fix regression related to wrong ECDT initialization order

Hanjun Guo <hanju...@linaro.org>
ACPI: APD: Fix HID for Hisilicon Hip07/08

Dave Jiang <dave....@intel.com>
ntb: transport shouldn't disable link due to bogus values in SPADs

Logan Gunthorpe <log...@deltatee.com>
ntb: ntb_test: ensure the link is up before trying to configure the mws

Linus Torvalds <torv...@linux-foundation.org>
Clarify (and fix) MAX_LFS_FILESIZE macros

Joerg Roedel <jro...@suse.de>
iommu: Fix wrong freeing of iommu_device->dev

Charles Milette <charles...@gmail.com>
staging: rtl8188eu: add RNX-N150NUB support

Lorenzo Bianconi <lorenzo.b...@gmail.com>
iio: magnetometer: st_magn: remove ihl property for LSM303AGR

Lorenzo Bianconi <lorenzo.b...@gmail.com>
iio: magnetometer: st_magn: fix status register address for LSM303AGR

Srinivas Pandruvada <srinivas....@linux.intel.com>
iio: hid-sensor-trigger: Fix the race with user space powering up sensors

Dragos Bogdan <dragos...@analog.com>
iio: imu: adis16480: Fix acceleration scale factor for adis16480

Martijn Coenen <ma...@android.com>
ANDROID: binder: fix proc->tsk check.

Riley Andrews <rian...@google.com>
binder: Use wake up hint for synchronous transactions.

Todd Kjos <tk...@android.com>
binder: use group leader instead of open thread

Todd Kjos <tk...@android.com>
Revert "android: binder: Sanity check at binder ioctl"

Jeffy Chen <jeffy...@rock-chips.com>
Bluetooth: bnep: fix possible might sleep error in bnep_session

Jeffy Chen <jeffy...@rock-chips.com>
Bluetooth: cmtp: fix possible might sleep error in cmtp_session

Jeffy Chen <jeffy...@rock-chips.com>
Bluetooth: hidp: fix possible might sleep error in hidp_session_thread

Mateusz Jurczyk <mjur...@google.com>
netfilter: nfnetlink: Improve input length sanitization in nfnetlink_rcv

Florian Westphal <f...@strlen.de>
netfilter: nat: fix src map lookup

Florian Westphal <f...@strlen.de>
netfilter: expect: fix crash when putting uninited expectation

Vadim Lomovtsev <vlom...@redhat.com>
net: sunrpc: svcsock: fix NULL-pointer exception

Eric Biggers <ebig...@google.com>
x86/mm: Fix use-after-free of ldt_struct

Nicholas Piggin <npi...@gmail.com>
timers: Fix excessive granularity of new timers after a nohz idle

Mark Rutland <mark.r...@arm.com>
perf/core: Fix group {cpu,task} validation

Steven Rostedt (VMware) <ros...@goodmis.org>
ftrace: Check for null ret_stack on profile function graph entry function

Christoph Hellwig <h...@lst.de>
virtio_pci: fix cpu affinity support

Steven Rostedt (VMware) <ros...@goodmis.org>
ring-buffer: Have ring_buffer_alloc_read_page() return error on offline CPU

Chuck Lever <chuck...@oracle.com>
nfsd: Limit end of page list when decoding NFSv4 WRITE

Ronnie Sahlberg <lsah...@redhat.com>
cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup()

Sachin Prabhu <spr...@redhat.com>
cifs: Fix df output for users with quota limits

Nicholas Piggin <npi...@gmail.com>
kbuild: linker script do not match C names unless LD_DEAD_CODE_DATA_ELIMINATION is configured

Bharat Potnuri <bha...@chelsio.com>
RDMA/uverbs: Initialize cq_context appropriately

Steven Rostedt (VMware) <ros...@goodmis.org>
tracing: Fix freeing of filter in create_filter() when set_str is false

Chunyu Hu <ch...@redhat.com>
tracing: Fix kmemleak in tracing_map_array_free()

Dan Carpenter <dan.ca...@oracle.com>
tracing: Missing error code in tracer_alloc_buffers()

Steven Rostedt (VMware) <ros...@goodmis.org>
tracing: Call clear_boot_tracer() at lateinit_sync

Sakari Ailus <sakari...@linux.intel.com>
ACPI: device property: Fix node lookup in acpi_graph_get_child_prop_value()

Alex Deucher <alexd...@gmail.com>
Revert "drm/amdgpu: fix vblank_time when displays are off"

fred gao <fred...@intel.com>
drm/i915/gvt: Fix the kernel null pointer error

Jani Nikula <jani....@intel.com>
drm/i915/vbt: ignore extraneous child devices for a port

Maarten Lankhorst <maarten....@linux.intel.com>
drm/atomic: If the atomic check fails, return its value first

Maarten Lankhorst <maarten....@linux.intel.com>
drm/atomic: Handle -EDEADLK with out-fences correctly

Jonathan Liu <net...@gmail.com>
drm/sun4i: Implement drm_driver lastclose to restore fbdev console

Chris Wilson <ch...@chris-wilson.co.uk>
drm: Release driver tracking before making the object available again

Nikhil Mahale <nma...@nvidia.com>
drm: Fix framebuffer leak

Dave Martin <Dave....@arm.com>
arm64: fpsimd: Prevent registers leaking across exec

Pavel Tatashin <pasha.t...@oracle.com>
mm/memblock.c: reversed logic in memblock_discard()

Eric Biggers <ebig...@google.com>
fork: fix incorrect fput of ->exe_file causing use-after-free

Eric Biggers <ebig...@google.com>
mm/madvise.c: fix freeing of locked page with MADV_FREE

Ulf Hansson <ulf.h...@linaro.org>
i2c: designware: Fix system suspend

Ross Zwisler <ross.z...@linux.intel.com>
dax: fix deadlock due to misaligned PMD faults

Kirill A. Shutemov <kirill....@linux.intel.com>
mm, shmem: fix handling /sys/kernel/mm/transparent_hugepage/shmem_enabled

Chen Yu <yu.c...@intel.com>
PM/hibernate: touch NMI watchdog when creating snapshot

Vineet Gupta <vgu...@synopsys.com>
ARCv2: PAE40: set MSB even if !CONFIG_ARC_HAS_PAE40 but PAE exists in SoC

Alexey Brodkin <Alexey....@synopsys.com>
ARCv2: PAE40: Explicitly set MSB counterpart of SLC region ops addresses

Alexey Brodkin <abro...@synopsys.com>
ARCv2: SLC: Make sure busy bit is set properly for region ops

Takashi Sakamoto <o-ta...@sakamocchi.jp>
ALSA: firewire-motu: destroy stream data surely at failure of card initialization

Takashi Sakamoto <o-ta...@sakamocchi.jp>
ALSA: firewire: fix NULL pointer dereference when releasing uninitialized data of iso-resource

Takashi Iwai <ti...@suse.de>
ALSA: hda - Add stereo mic quirk for Lenovo G50-70 (17aa:3978)

Takashi Iwai <ti...@suse.de>
ALSA: core: Fix unexpected error at replacing user TLV

Joakim Tjernlund <joakim.t...@infinera.com>
ALSA: usb-audio: Add delay quirk for H650e/Jabra 550a USB headsets

Paolo Bonzini <pbon...@redhat.com>
KVM: x86: block guest protection keys unless the host has them enabled

Paolo Bonzini <pbon...@redhat.com>
KVM, pkeys: do not use PKRU value in vcpu->arch.guest_fpu.state

Paolo Bonzini <pbon...@redhat.com>
KVM: x86: simplify handling of PKRU

Heiko Carstens <heiko.c...@de.ibm.com>
KVM: s390: sthyi: fix specification exception detection

Heiko Carstens <heiko.c...@de.ibm.com>
KVM: s390: sthyi: fix sthyi inline assembly

Masaki Ota <masak...@jp.alps.com>
Input: ALPS - fix two-finger scroll breakage in right side on ALPS touchpad

KT Liao <kt....@emc.com.tw>
Input: elan_i2c - add ELAN0602 ACPI ID to support Lenovo Yoga310

Aaron Ma <aaro...@canonical.com>
Input: trackpoint - add new trackpoint firmware ID

Edward Cree <ec...@solarflare.com>
bpf/verifier: fix min/max handling in BPF_SUB

Daniel Borkmann <dan...@iogearbox.net>
bpf: fix mixed signed/unsigned derived min/max value bounds

John Fastabend <john.fa...@gmail.com>
bpf, verifier: add additional patterns to evaluate_reg_imm_alu

Konstantin Khlebnikov <khleb...@yandex-team.ru>
net_sched: fix order of queue length updates in qdisc_replace()

Xin Long <lucie...@gmail.com>
net: sched: fix NULL pointer dereference when action calls some targets

Colin Ian King <colin...@canonical.com>
irda: do not leak initialized list.dev to userspace

Huy Nguyen <hu...@mellanox.com>
net/mlx4_core: Enable 4K UAR if SRIOV module parameter is not enabled

Neal Cardwell <ncar...@google.com>
tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP

Wei Wang <wei...@google.com>
ipv6: repair fib6 tree in failure case

Wei Wang <wei...@google.com>
ipv6: reset fn->rr_ptr when replacing route

Eric Dumazet <edum...@google.com>
tipc: fix use-after-free

Alexander Potapenko <gli...@google.com>
sctp: fully initialize the IPv6 address in sctp_v6_to_addr()

Eric Dumazet <edum...@google.com>
tun: handle register_netdevice() failures properly

Colin Ian King <colin...@canonical.com>
nfp: fix infinite loop on umapping cleanup

Eric Dumazet <edum...@google.com>
ipv4: better IP_MAX_MTU enforcement

Eric Dumazet <edum...@google.com>
ptr_ring: use kmalloc_array()

Liping Zhang <zlpn...@gmail.com>
openvswitch: fix skb_panic due to the incorrect actions attrlen

David Ahern <dsa...@gmail.com>
net: igmp: Use ingress interface rather than vrf device

Daniel Borkmann <dan...@iogearbox.net>
bpf: fix bpf_trace_printk on 32 bit archs

Konstantin Khlebnikov <khleb...@yandex-team.ru>
net_sched: remove warning from qdisc_hash_add

Konstantin Khlebnikov <khleb...@yandex-team.ru>
net_sched/sfq: update hierarchical backlog when drop packet

Eric Dumazet <edum...@google.com>
ipv4: fix NULL dereference in free_fib_info_rcu()

Eric Dumazet <edum...@google.com>
dccp: defer ccid_hc_tx_delete() at dismantle time

Eric Dumazet <edum...@google.com>
dccp: purge write queue in dccp_destroy_sock()

Eric Dumazet <edum...@google.com>
af_key: do not use GFP_KERNEL in atomic contexts

Andreas Born <futur...@googlemail.com>
bonding: ratelimit failed speed/duplex update warning

Andreas Born <futur...@googlemail.com>
bonding: require speed/duplex only for 802.3ad, alb and tlb

Tushar Dave <tushar...@oracle.com>
sparc64: remove unnecessary log message


-------------

Diffstat:

Makefile | 4 +-
arch/arc/include/asm/cache.h | 2 +
arch/arc/include/asm/mmu.h | 2 +
arch/arc/mm/cache.c | 50 +++++-
arch/arc/mm/tlb.c | 12 +-
arch/arm64/kernel/fpsimd.c | 2 +
arch/powerpc/include/asm/mmu_context.h | 20 ++-
arch/powerpc/include/asm/pgtable-be-types.h | 1 +
arch/powerpc/include/asm/pgtable-types.h | 1 +
arch/s390/kvm/sthyi.c | 7 +-
arch/sparc/kernel/pci_sun4v.c | 2 -
arch/x86/include/asm/fpu/internal.h | 6 +-
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/include/asm/mmu_context.h | 4 +-
arch/x86/kvm/cpuid.c | 2 +-
arch/x86/kvm/kvm_cache_regs.h | 5 -
arch/x86/kvm/mmu.h | 2 +-
arch/x86/kvm/svm.c | 7 -
arch/x86/kvm/vmx.c | 25 +--
arch/x86/kvm/x86.c | 17 +-
drivers/acpi/acpi_apd.c | 4 +-
drivers/acpi/ec.c | 17 +-
drivers/acpi/internal.h | 1 -
drivers/acpi/property.c | 2 +-
drivers/acpi/scan.c | 1 -
drivers/android/binder.c | 19 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c | 2 -
drivers/gpu/drm/drm_atomic.c | 11 +-
drivers/gpu/drm/drm_gem.c | 6 +-
drivers/gpu/drm/drm_plane.c | 1 +
drivers/gpu/drm/i915/gvt/cmd_parser.c | 2 +-
drivers/gpu/drm/i915/intel_bios.c | 15 +-
drivers/gpu/drm/sun4i/sun4i_drv.c | 8 +
drivers/i2c/busses/i2c-designware-platdrv.c | 14 +-
.../iio/common/hid-sensors/hid-sensor-trigger.c | 8 +-
drivers/iio/imu/adis16480.c | 2 +-
drivers/iio/magnetometer/st_magn_core.c | 4 +-
drivers/infiniband/core/uverbs_cmd.c | 2 +-
drivers/input/mouse/alps.c | 41 +++--
drivers/input/mouse/alps.h | 8 +
drivers/input/mouse/elan_i2c_core.c | 1 +
drivers/input/mouse/trackpoint.c | 3 +-
drivers/input/mouse/trackpoint.h | 3 +-
drivers/iommu/amd_iommu_types.h | 4 +-
drivers/iommu/intel-iommu.c | 4 +-
drivers/iommu/iommu-sysfs.c | 32 ++--
drivers/net/bonding/bond_main.c | 13 +-
drivers/net/ethernet/mellanox/mlx4/main.c | 4 +-
.../net/ethernet/netronome/nfp/nfp_net_common.c | 3 +-
drivers/net/tun.c | 3 +
drivers/ntb/ntb_transport.c | 4 +-
drivers/staging/rtl8188eu/os_dep/usb_intf.c | 1 +
drivers/virtio/virtio_pci_common.c | 10 +-
fs/cifs/dir.c | 18 +-
fs/cifs/smb2pdu.c | 4 +-
fs/dax.c | 10 ++
fs/nfsd/nfs4xdr.c | 6 +-
include/asm-generic/vmlinux.lds.h | 38 ++--
include/linux/bpf_verifier.h | 1 +
include/linux/fs.h | 4 +-
include/linux/iommu.h | 12 +-
include/linux/ptr_ring.h | 9 +-
include/linux/skb_array.h | 3 +-
include/net/bonding.h | 5 +
include/net/ip.h | 4 +-
include/net/sch_generic.h | 5 +-
kernel/bpf/verifier.c | 191 ++++++++++++++++++---
kernel/events/core.c | 39 ++---
kernel/fork.c | 1 +
kernel/time/timer.c | 50 +++++-
kernel/trace/bpf_trace.c | 34 +++-
kernel/trace/ftrace.c | 4 +
kernel/trace/ring_buffer.c | 14 +-
kernel/trace/ring_buffer_benchmark.c | 2 +-
kernel/trace/trace.c | 19 +-
kernel/trace/trace_events_filter.c | 4 +
kernel/trace/tracing_map.c | 11 +-
mm/madvise.c | 2 +-
mm/memblock.c | 2 +-
mm/page_alloc.c | 20 ++-
mm/shmem.c | 4 +-
net/bluetooth/bnep/core.c | 11 +-
net/bluetooth/cmtp/core.c | 17 +-
net/bluetooth/hidp/core.c | 33 ++--
net/dccp/proto.c | 19 +-
net/ipv4/fib_semantics.c | 12 +-
net/ipv4/igmp.c | 10 +-
net/ipv4/route.c | 2 +-
net/ipv4/tcp_input.c | 3 +-
net/ipv6/ip6_fib.c | 26 +--
net/irda/af_irda.c | 2 +-
net/key/af_key.c | 48 +++---
net/netfilter/nf_conntrack_expect.c | 2 +-
net/netfilter/nf_nat_core.c | 17 +-
net/netfilter/nfnetlink.c | 6 +-
net/openvswitch/actions.c | 1 +
net/openvswitch/datapath.c | 7 +-
net/openvswitch/datapath.h | 2 +
net/sched/act_ipt.c | 2 +
net/sched/sch_api.c | 3 -
net/sched/sch_sfq.c | 5 +-
net/sctp/ipv6.c | 2 +
net/sunrpc/svcsock.c | 22 ++-
net/tipc/netlink_compat.c | 6 +-
sound/core/control.c | 2 +-
sound/firewire/iso-resources.c | 7 +-
sound/firewire/motu/motu.c | 1 +
sound/pci/hda/patch_conexant.c | 1 +
sound/usb/quirks.c | 9 +-
tools/testing/selftests/ntb/ntb_test.sh | 4 +
110 files changed, 864 insertions(+), 359 deletions(-)

Greg Kroah-Hartman

Aug 28, 2017, 4:10:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <ebig...@google.com>

commit 2b7e8665b4ff51c034c55df3cff76518d1a9ee3a upstream.

Commit 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for
write killable") made it possible to kill a forking task while it is
waiting to acquire its ->mmap_sem for write, in dup_mmap().

However, it was overlooked that this introduced a new error path before
a reference is taken on the mm_struct's ->exe_file. Since the
->exe_file of the new mm_struct was already set to the old ->exe_file by
the memcpy() in dup_mm(), it was possible for the mmput() in the error
path of dup_mm() to drop a reference to ->exe_file which was never
taken.

This caused the struct file to later be freed prematurely.

Fix it by updating mm_init() to NULL out the ->exe_file, in the same
place it clears other things like the list of mmaps.

This bug was found by syzkaller. It can be reproduced using the
following C program:

#define _GNU_SOURCE
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static void *mmap_thread(void *_arg)
{
for (;;) {
mmap(NULL, 0x1000000, PROT_READ,
MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
}
}

static void *fork_thread(void *_arg)
{
usleep(rand() % 10000);
fork();
}

int main(void)
{
fork();
fork();
fork();
for (;;) {
if (fork() == 0) {
pthread_t t;

pthread_create(&t, NULL, mmap_thread, NULL);
pthread_create(&t, NULL, fork_thread, NULL);
usleep(rand() % 10000);
syscall(__NR_exit_group, 0);
}
wait(NULL);
}
}

No special kernel config options are needed. It usually causes a NULL
pointer dereference in __remove_shared_vm_struct() during exit, or in
dup_mmap() (which is usually inlined into copy_process()) during fork.
Both are due to a vm_area_struct's ->vm_file being used after it's
already been freed.

Google Bug Id: 64772007

Link: http://lkml.kernel.org/r/20170823211408.3...@gmail.com
Fixes: 7c051267931a ("mm, fork: make dup_mmap wait for mmap_sem for write killable")
Signed-off-by: Eric Biggers <ebig...@google.com>
Tested-by: Mark Rutland <mark.r...@arm.com>
Acked-by: Michal Hocko <mho...@suse.com>
Cc: Dmitry Vyukov <dvy...@google.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Konstantin Khlebnikov <koc...@gmail.com>
Cc: Oleg Nesterov <ol...@redhat.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Vlastimil Babka <vba...@suse.cz>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
kernel/fork.c | 1 +
1 file changed, 1 insertion(+)

--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -802,6 +802,7 @@ static struct mm_struct *mm_init(struct
mm_init_cpumask(mm);
mm_init_aio(mm);
mm_init_owner(mm, p);
+ RCU_INIT_POINTER(mm->exe_file, NULL);
mmu_notifier_mm_init(mm);
clear_tlb_flush_pending(mm);
#if defined(CONFIG_TRANSPARENT_HUGEPAGE) && !USE_SPLIT_PMD_PTLOCKS

Greg Kroah-Hartman

Aug 28, 2017, 4:10:09 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Masaki Ota <masak...@jp.alps.com>

commit 4a646580f793d19717f7e034c8d473b509c27d49 upstream.

Fix the issue that two-finger scroll does not work correctly with the
V8 protocol. The cause is that the V8 protocol X-coordinate decode is
wrong on SS4 PLUS devices. Add an SS4 PLUS X decode definition.

More notes:
the problem manifested itself with commit e7348396c6d5 ("Input: ALPS
- fix V8+ protocol handling (73 03 28)"), where a fix for the V8+
protocol was applied. Although the culprit must have been present
beforehand, two-finger scroll happened to work even with the
wrongly reported values for some reason. It was broken by the commit
above just because that commit changed the x_max value, which made
libinput interpret the MT events correctly. Since the X coordinate is
falsely reported as doubled, events on the right half go outside the
boundary and are no longer handled. This resulted in a broken
two-finger scroll.

One-finger events are decoded differently and did not suffer from
this problem. The problem was only about MT events. --tiwai

Fixes: e7348396c6d5 ("Input: ALPS - fix V8+ protocol handling (73 03 28)")
Signed-off-by: Masaki Ota <masak...@jp.alps.com>
Tested-by: Takashi Iwai <ti...@suse.de>
Tested-by: Paul Donohue <linux-...@PaulSD.com>
Signed-off-by: Takashi Iwai <ti...@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry....@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/input/mouse/alps.c | 41 +++++++++++++++++++++++++++++++----------
drivers/input/mouse/alps.h | 8 ++++++++
2 files changed, 39 insertions(+), 10 deletions(-)

--- a/drivers/input/mouse/alps.c
+++ b/drivers/input/mouse/alps.c
@@ -1215,14 +1215,24 @@ static int alps_decode_ss4_v2(struct alp

case SS4_PACKET_ID_TWO:
if (priv->flags & ALPS_BUTTONPAD) {
- f->mt[0].x = SS4_BTL_MF_X_V2(p, 0);
+ if (IS_SS4PLUS_DEV(priv->dev_id)) {
+ f->mt[0].x = SS4_PLUS_BTL_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_PLUS_BTL_MF_X_V2(p, 1);
+ } else {
+ f->mt[0].x = SS4_BTL_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_BTL_MF_X_V2(p, 1);
+ }
f->mt[0].y = SS4_BTL_MF_Y_V2(p, 0);
- f->mt[1].x = SS4_BTL_MF_X_V2(p, 1);
f->mt[1].y = SS4_BTL_MF_Y_V2(p, 1);
} else {
- f->mt[0].x = SS4_STD_MF_X_V2(p, 0);
+ if (IS_SS4PLUS_DEV(priv->dev_id)) {
+ f->mt[0].x = SS4_PLUS_STD_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_PLUS_STD_MF_X_V2(p, 1);
+ } else {
+ f->mt[0].x = SS4_STD_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_STD_MF_X_V2(p, 1);
+ }
f->mt[0].y = SS4_STD_MF_Y_V2(p, 0);
- f->mt[1].x = SS4_STD_MF_X_V2(p, 1);
f->mt[1].y = SS4_STD_MF_Y_V2(p, 1);
}
f->pressure = SS4_MF_Z_V2(p, 0) ? 0x30 : 0;
@@ -1239,16 +1249,27 @@ static int alps_decode_ss4_v2(struct alp

case SS4_PACKET_ID_MULTI:
if (priv->flags & ALPS_BUTTONPAD) {
- f->mt[2].x = SS4_BTL_MF_X_V2(p, 0);
+ if (IS_SS4PLUS_DEV(priv->dev_id)) {
+ f->mt[0].x = SS4_PLUS_BTL_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_PLUS_BTL_MF_X_V2(p, 1);
+ } else {
+ f->mt[2].x = SS4_BTL_MF_X_V2(p, 0);
+ f->mt[3].x = SS4_BTL_MF_X_V2(p, 1);
+ }
+
f->mt[2].y = SS4_BTL_MF_Y_V2(p, 0);
- f->mt[3].x = SS4_BTL_MF_X_V2(p, 1);
f->mt[3].y = SS4_BTL_MF_Y_V2(p, 1);
no_data_x = SS4_MFPACKET_NO_AX_BL;
no_data_y = SS4_MFPACKET_NO_AY_BL;
} else {
- f->mt[2].x = SS4_STD_MF_X_V2(p, 0);
+ if (IS_SS4PLUS_DEV(priv->dev_id)) {
+ f->mt[0].x = SS4_PLUS_STD_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_PLUS_STD_MF_X_V2(p, 1);
+ } else {
+ f->mt[0].x = SS4_STD_MF_X_V2(p, 0);
+ f->mt[1].x = SS4_STD_MF_X_V2(p, 1);
+ }
f->mt[2].y = SS4_STD_MF_Y_V2(p, 0);
- f->mt[3].x = SS4_STD_MF_X_V2(p, 1);
f->mt[3].y = SS4_STD_MF_Y_V2(p, 1);
no_data_x = SS4_MFPACKET_NO_AX;
no_data_y = SS4_MFPACKET_NO_AY;
@@ -2541,8 +2562,8 @@ static int alps_set_defaults_ss4_v2(stru

memset(otp, 0, sizeof(otp));

- if (alps_get_otp_values_ss4_v2(psmouse, 0, &otp[0][0]) ||
- alps_get_otp_values_ss4_v2(psmouse, 1, &otp[1][0]))
+ if (alps_get_otp_values_ss4_v2(psmouse, 1, &otp[1][0]) ||
+ alps_get_otp_values_ss4_v2(psmouse, 0, &otp[0][0]))
return -1;

alps_update_device_area_ss4_v2(otp, priv);
--- a/drivers/input/mouse/alps.h
+++ b/drivers/input/mouse/alps.h
@@ -100,6 +100,10 @@ enum SS4_PACKET_ID {
((_b[1 + _i * 3] << 5) & 0x1F00) \
)

+#define SS4_PLUS_STD_MF_X_V2(_b, _i) (((_b[0 + (_i) * 3] << 4) & 0x0070) | \
+ ((_b[1 + (_i) * 3] << 4) & 0x0F80) \
+ )
+
#define SS4_STD_MF_Y_V2(_b, _i) (((_b[1 + (_i) * 3] << 3) & 0x0010) | \
((_b[2 + (_i) * 3] << 5) & 0x01E0) | \
((_b[2 + (_i) * 3] << 4) & 0x0E00) \
@@ -109,6 +113,10 @@ enum SS4_PACKET_ID {
((_b[0 + (_i) * 3] >> 3) & 0x0010) \
)

+#define SS4_PLUS_BTL_MF_X_V2(_b, _i) (SS4_PLUS_STD_MF_X_V2(_b, _i) | \
+ ((_b[0 + (_i) * 3] >> 4) & 0x0008) \
+ )
+
#define SS4_BTL_MF_Y_V2(_b, _i) (SS4_STD_MF_Y_V2(_b, _i) | \
((_b[0 + (_i) * 3] >> 3) & 0x0008) \
)

Greg Kroah-Hartman

Aug 28, 2017, 4:10:09 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexander Potapenko <gli...@google.com>


[ Upstream commit 15339e441ec46fbc3bf3486bb1ae4845b0f1bb8d ]

KMSAN reported use of uninitialized sctp_addr->v4.sin_addr.s_addr and
sctp_addr->v6.sin6_scope_id in sctp_v6_cmp_addr() (see below).
Make sure all fields of an IPv6 address are initialized, which
guarantees that the IPv4 fields are also initialized.
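
A userspace sketch of the same idea, using struct sockaddr_in6 from <netinet/in.h> in place of the kernel's union sctp_addr. The function names are hypothetical; only the list of fields being written follows the patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Userspace analogue of sctp_v6_to_addr() after the patch: every field
 * that a later comparison may read is written explicitly.
 */
void fill_v6_addr(struct sockaddr_in6 *addr, const struct in6_addr *saddr,
		  in_port_t port)
{
	addr->sin6_family = AF_INET6;
	addr->sin6_port = port;
	addr->sin6_flowinfo = 0;
	addr->sin6_addr = *saddr;
	addr->sin6_scope_id = 0;
}

/* Hypothetical comparison that reads the address and the scope id. */
int v6_addr_equal(const struct sockaddr_in6 *a, const struct sockaddr_in6 *b)
{
	return memcmp(&a->sin6_addr, &b->sin6_addr, sizeof(a->sin6_addr)) == 0 &&
	       a->sin6_scope_id == b->sin6_scope_id;
}

/*
 * Fill two structures that start out with different garbage and check
 * that they still compare equal. Returns 1 on success.
 */
int fill_is_deterministic(void)
{
	struct sockaddr_in6 a, b;
	struct in6_addr loopback = IN6ADDR_LOOPBACK_INIT;

	memset(&a, 0xAA, sizeof(a));	/* stand-in for stale stack contents */
	memset(&b, 0x55, sizeof(b));
	fill_v6_addr(&a, &loopback, htons(5000));
	fill_v6_addr(&b, &loopback, htons(5000));
	return v6_addr_equal(&a, &b);
}
```

Without the two zeroing stores, the scope id would keep whatever the memset() left behind and the two addresses would compare as different.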

==================================================================
BUG: KMSAN: use of uninitialized memory in sctp_v6_cmp_addr+0x8d4/0x9f0
net/sctp/ipv6.c:517
CPU: 2 PID: 31056 Comm: syz-executor1 Not tainted 4.11.0-rc5+ #2944
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs
01/01/2011
Call Trace:
dump_stack+0x172/0x1c0 lib/dump_stack.c:42
is_logbuf_locked mm/kmsan/kmsan.c:59 [inline]
kmsan_report+0x12a/0x180 mm/kmsan/kmsan.c:938
native_save_fl arch/x86/include/asm/irqflags.h:18 [inline]
arch_local_save_flags arch/x86/include/asm/irqflags.h:72 [inline]
arch_local_irq_save arch/x86/include/asm/irqflags.h:113 [inline]
__msan_warning_32+0x61/0xb0 mm/kmsan/kmsan_instr.c:467
sctp_v6_cmp_addr+0x8d4/0x9f0 net/sctp/ipv6.c:517
sctp_v6_get_dst+0x8c7/0x1630 net/sctp/ipv6.c:290
sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
sctp_assoc_add_peer+0x66d/0x16f0 net/sctp/associola.c:651
sctp_sendmsg+0x35a5/0x4f90 net/sctp/socket.c:1871
inet_sendmsg+0x498/0x670 net/ipv4/af_inet.c:762
sock_sendmsg_nosec net/socket.c:633 [inline]
sock_sendmsg net/socket.c:643 [inline]
SYSC_sendto+0x608/0x710 net/socket.c:1696
SyS_sendto+0x8a/0xb0 net/socket.c:1664
entry_SYSCALL_64_fastpath+0x13/0x94
RIP: 0033:0x44b479
RSP: 002b:00007f6213f21c08 EFLAGS: 00000286 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 0000000020000000 RCX: 000000000044b479
RDX: 0000000000000041 RSI: 0000000020edd000 RDI: 0000000000000006
RBP: 00000000007080a8 R08: 0000000020b85fe4 R09: 000000000000001c
R10: 0000000000040005 R11: 0000000000000286 R12: 00000000ffffffff
R13: 0000000000003760 R14: 00000000006e5820 R15: 0000000000ff8000
origin description: ----dst_saddr@sctp_v6_get_dst
local variable created at:
sk_fullsock include/net/sock.h:2321 [inline]
inet6_sk include/linux/ipv6.h:309 [inline]
sctp_v6_get_dst+0x91/0x1630 net/sctp/ipv6.c:241
sctp_transport_route+0x101/0x570 net/sctp/transport.c:292
==================================================================

Signed-off-by: Alexander Potapenko <gli...@google.com>
Reviewed-by: Xin Long <lucie...@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo...@gmail.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
net/sctp/ipv6.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/sctp/ipv6.c
+++ b/net/sctp/ipv6.c
@@ -510,7 +510,9 @@ static void sctp_v6_to_addr(union sctp_a
{
addr->sa.sa_family = AF_INET6;
addr->v6.sin6_port = port;
+ addr->v6.sin6_flowinfo = 0;
addr->v6.sin6_addr = *saddr;
+ addr->v6.sin6_scope_id = 0;
}

/* Compare addresses exactly.

Greg Kroah-Hartman

Aug 28, 2017, 4:10:17 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Lorenzo Bianconi <lorenzo.b...@gmail.com>

commit 541ee9b24fca587f510fe1bc58508d5cf40707af upstream.

Fixes: 97865fe41322 (iio: st_sensors: verify interrupt event to status)
Signed-off-by: Lorenzo Bianconi <lorenzo....@st.com>
Reviewed-by: Linus Walleij <linus....@linaro.org>
Signed-off-by: Jonathan Cameron <Jonathan...@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/iio/magnetometer/st_magn_core.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/iio/magnetometer/st_magn_core.c
+++ b/drivers/iio/magnetometer/st_magn_core.c
@@ -358,7 +358,7 @@ static const struct st_sensor_settings s
.mask_int1 = 0x01,
.addr_ihl = 0x63,
.mask_ihl = 0x04,
- .addr_stat_drdy = ST_SENSORS_DEFAULT_STAT_ADDR,
+ .addr_stat_drdy = 0x67,
},
.multi_read_bit = false,
.bootime = 2,

Greg Kroah-Hartman

Aug 28, 2017, 4:10:17 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Daniel Borkmann <dan...@iogearbox.net>


[ Upstream commit 88a5c690b66110ad255380d8f629c629cf6ca559 ]

James reported that on MIPS32 bpf_trace_printk() is currently
broken while MIPS64 works fine:

bpf_trace_printk() uses conditional operators to attempt to
pass different types to __trace_printk() depending on the
format operators. This doesn't work as intended on 32-bit
architectures where u32 and long are passed differently to
u64, since the result of C conditional operators follows the
"usual arithmetic conversions" rules, such that the values
passed to __trace_printk() will always be u64 [causing issues
later in the va_list handling for vscnprintf()].
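
The effect of the conversion rules can be checked in plain C. The sketch below uses hypothetical helper names and takes the size of an expression shaped like the one the old code passed to __trace_printk(); sizeof does not evaluate its operand, so only the resulting type matters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Size of the value produced by a nested conditional with u64, long
 * and u32 arms. All arms are converted to one common type before the
 * result is used, whatever 'mod' is at run time.
 */
size_t conditional_result_size(int mod, uint64_t arg)
{
	return sizeof(mod == 2 ? arg : mod == 1 ? (long)arg : (uint32_t)arg);
}

/* Size of a plain u32 argument, for comparison. */
size_t u32_size(void)
{
	return sizeof(uint32_t);
}
```

The conditional always has a 64-bit type, even when the u32 arm is selected, so a 32-bit callee reading a 32-bit variadic argument picks up the wrong bytes.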

For example the samples/bpf/tracex5 test printed lines like
below on MIPS32, where the fd and buf have come from the u64
fd argument, and the size from the buf argument:

[...] 1180.941542: 0x00000001: write(fd=1, buf= (null), size=6258688)

Instead of this:

[...] 1625.616026: 0x00000001: write(fd=1, buf=009e4000, size=512)

One way to get it working is to expand the argument types
into 8 different combinations for 32 bit and 64 bit kernels.
James also tested the fix on MIPS32 and MIPS64 and confirmed
that it resolves the issue.
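
The principle behind the expansion, one call per type so that no conditional expression merges the types, can be sketched in userspace for a single argument. This is an illustration with snprintf() and hypothetical function names, not the kernel macros; the meaning of mod (0 = int-sized, 1 = long, 2 = long long) is inferred from the casts in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one value whose width is only known at run time. Each branch
 * makes its own call, so the variadic argument really has the type the
 * format string expects.
 */
int format_one(char *out, size_t len, const char *fmt, int mod, uint64_t arg)
{
	if (mod == 2)
		return snprintf(out, len, fmt, (unsigned long long)arg);
	if (mod == 1)
		return snprintf(out, len, fmt, (unsigned long)arg);
	return snprintf(out, len, fmt, (unsigned int)arg);
}

/* Helper for checking results: 1 if formatting gives the expected text. */
int formats_as(const char *fmt, int mod, uint64_t arg, const char *expected)
{
	char buf[64];

	format_one(buf, sizeof(buf), fmt, mod, arg);
	return strcmp(buf, expected) == 0;
}
```

With three arguments and two distinct widths per argument on a given architecture, the same approach yields the 8 call sites mentioned above.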

Fixes: 9c959c863f82 ("tracing: Allow BPF programs to call bpf_trace_printk()")
Reported-by: James Hogan <james...@imgtec.com>
Tested-by: James Hogan <james...@imgtec.com>
Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
kernel/trace/bpf_trace.c | 34 ++++++++++++++++++++++++++++++----
1 file changed, 30 insertions(+), 4 deletions(-)

--- a/kernel/trace/bpf_trace.c
+++ b/kernel/trace/bpf_trace.c
@@ -203,10 +203,36 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt
fmt_cnt++;
}

- return __trace_printk(1/* fake ip will not be printed */, fmt,
- mod[0] == 2 ? arg1 : mod[0] == 1 ? (long) arg1 : (u32) arg1,
- mod[1] == 2 ? arg2 : mod[1] == 1 ? (long) arg2 : (u32) arg2,
- mod[2] == 2 ? arg3 : mod[2] == 1 ? (long) arg3 : (u32) arg3);
+/* Horrid workaround for getting va_list handling working with different
+ * argument type combinations generically for 32 and 64 bit archs.
+ */
+#define __BPF_TP_EMIT() __BPF_ARG3_TP()
+#define __BPF_TP(...) \
+ __trace_printk(1 /* Fake ip will not be printed. */, \
+ fmt, ##__VA_ARGS__)
+
+#define __BPF_ARG1_TP(...) \
+ ((mod[0] == 2 || (mod[0] == 1 && __BITS_PER_LONG == 64)) \
+ ? __BPF_TP(arg1, ##__VA_ARGS__) \
+ : ((mod[0] == 1 || (mod[0] == 0 && __BITS_PER_LONG == 32)) \
+ ? __BPF_TP((long)arg1, ##__VA_ARGS__) \
+ : __BPF_TP((u32)arg1, ##__VA_ARGS__)))
+
+#define __BPF_ARG2_TP(...) \
+ ((mod[1] == 2 || (mod[1] == 1 && __BITS_PER_LONG == 64)) \
+ ? __BPF_ARG1_TP(arg2, ##__VA_ARGS__) \
+ : ((mod[1] == 1 || (mod[1] == 0 && __BITS_PER_LONG == 32)) \
+ ? __BPF_ARG1_TP((long)arg2, ##__VA_ARGS__) \
+ : __BPF_ARG1_TP((u32)arg2, ##__VA_ARGS__)))
+
+#define __BPF_ARG3_TP(...) \
+ ((mod[2] == 2 || (mod[2] == 1 && __BITS_PER_LONG == 64)) \
+ ? __BPF_ARG2_TP(arg3, ##__VA_ARGS__) \
+ : ((mod[2] == 1 || (mod[2] == 0 && __BITS_PER_LONG == 32)) \
+ ? __BPF_ARG2_TP((long)arg3, ##__VA_ARGS__) \
+ : __BPF_ARG2_TP((u32)arg3, ##__VA_ARGS__)))
+
+ return __BPF_TP_EMIT();
}

static const struct bpf_func_proto bpf_trace_printk_proto = {

Greg Kroah-Hartman

Aug 28, 2017, 4:10:18 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>


[ Upstream commit 81fbfe8adaf38d4f5a98c19bebfd41c5d6acaee8 ]

As found by syzkaller, malicious users can set whatever tx_queue_len
on a tun device and eventually crash the kernel.

Let's remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
ring buffer is not fast anyway.
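
The diff below switches the allocations to kcalloc() and kmalloc_array(), which reject element counts whose total byte size would overflow. A userspace sketch of that check follows; the function names are hypothetical, and calloc() already performs the same test internally, so the explicit comparison is only there to show it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate a zeroed array of n pointers, refusing requests whose byte
 * size would not fit in size_t. Userspace stand-in for the
 * kcalloc(size, sizeof(void *), gfp) call in the patch.
 */
void **alloc_ptr_queue(size_t n)
{
	if (n > SIZE_MAX / sizeof(void *))
		return NULL;	/* n * sizeof(void *) would wrap around */
	return calloc(n, sizeof(void *));
}

/* Returns 1 if a huge request is rejected and a small one succeeds. */
int queue_alloc_is_checked(void)
{
	void **big = alloc_ptr_queue(SIZE_MAX / 2);
	void **small = alloc_ptr_queue(16);
	int ok = (big == NULL) && (small != NULL) && (small[0] == NULL);

	free(small);
	free(big);
	return ok;
}
```

An unchecked n * sizeof(void *) would wrap to a small number for such a count, and the ring would then be indexed far beyond the memory actually allocated.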

Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for pointers")
Signed-off-by: Eric Dumazet <edum...@google.com>
Reported-by: Dmitry Vyukov <dvy...@google.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Jason Wang <jaso...@redhat.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
include/linux/ptr_ring.h | 9 +++++----
include/linux/skb_array.h | 3 ++-
2 files changed, 7 insertions(+), 5 deletions(-)

--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -371,9 +371,9 @@ static inline void *ptr_ring_consume_bh(
__PTR_RING_PEEK_CALL_v; \
})

-static inline void **__ptr_ring_init_queue_alloc(int size, gfp_t gfp)
+static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
{
- return kzalloc(ALIGN(size * sizeof(void *), SMP_CACHE_BYTES), gfp);
+ return kcalloc(size, sizeof(void *), gfp);
}

static inline void __ptr_ring_set_size(struct ptr_ring *r, int size)
@@ -462,7 +462,8 @@ static inline int ptr_ring_resize(struct
* In particular if you consume ring in interrupt or BH context, you must
* disable interrupts/BH when doing so.
*/
-static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
+static inline int ptr_ring_resize_multiple(struct ptr_ring **rings,
+ unsigned int nrings,
int size,
gfp_t gfp, void (*destroy)(void *))
{
@@ -470,7 +471,7 @@ static inline int ptr_ring_resize_multip
void ***queues;
int i;

- queues = kmalloc(nrings * sizeof *queues, gfp);
+ queues = kmalloc_array(nrings, sizeof(*queues), gfp);
if (!queues)
goto noqueues;

--- a/include/linux/skb_array.h
+++ b/include/linux/skb_array.h
@@ -162,7 +162,8 @@ static inline int skb_array_resize(struc
}

static inline int skb_array_resize_multiple(struct skb_array **rings,
- int nrings, int size, gfp_t gfp)
+ int nrings, unsigned int size,
+ gfp_t gfp)
{
BUILD_BUG_ON(offsetof(struct skb_array, ring));
return ptr_ring_resize_multiple((struct ptr_ring **)rings,

Greg Kroah-Hartman

Aug 28, 2017, 4:10:23 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Charles Milette <charles...@gmail.com>

commit f299aec6ebd747298e35934cff7709c6b119ca52 upstream.

Add support for USB Device Rosewill RNX-N150NUB.
VendorID: 0x0bda, ProductID: 0xffef

Signed-off-by: Charles Milette <charles...@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/staging/rtl8188eu/os_dep/usb_intf.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/staging/rtl8188eu/os_dep/usb_intf.c
+++ b/drivers/staging/rtl8188eu/os_dep/usb_intf.c
@@ -45,6 +45,7 @@ static struct usb_device_id rtw_usb_id_t
{USB_DEVICE(0x2001, 0x3311)}, /* DLink GO-USB-N150 REV B1 */
{USB_DEVICE(0x2357, 0x010c)}, /* TP-Link TL-WN722N v2 */
{USB_DEVICE(0x0df6, 0x0076)}, /* Sitecom N150 v2 */
+ {USB_DEVICE(USB_VENDER_ID_REALTEK, 0xffef)}, /* Rosewill RNX-N150NUB */
{} /* Terminating entry */
};

Greg Kroah-Hartman

Aug 28, 2017, 4:10:23 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Sakamoto <o-ta...@sakamocchi.jp>

commit 0c264af7be2013266c5b4c644f3f366399ee490a upstream.

When 'fw_iso_resources_free()' is called on uninitialized data, the function
dereferences a NULL pointer through the 'unit' member. This occurs when
audio and music units on the IEEE 1394 bus are unplugged after card
registration has failed.

This commit fixes the bug. The bug has existed since kernel v4.5.
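
The shape of the fix, a teardown function that returns early when the object was never initialized, can be sketched in userspace as follows. The structures and names are hypothetical miniatures, not the firewire API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature of the structures involved. */
struct unit {
	int card_id;
};

struct iso_resources {
	struct unit *unit;	/* NULL until the init function has run */
	int allocated;
	int free_calls;		/* counts how often real teardown ran */
};

/*
 * Teardown that tolerates an uninitialized object, mirroring the early
 * return the patch adds before the first use of r->unit.
 */
void iso_resources_free(struct iso_resources *r)
{
	int card_id;

	/* Not initialized: nothing to release. */
	if (r->unit == NULL)
		return;
	card_id = r->unit->card_id;
	(void)card_id;		/* the real code needs the card to release resources */

	if (r->allocated) {
		r->allocated = 0;
		r->free_calls++;
	}
}

/* Returns 1 if freeing a zeroed object does nothing and does not crash. */
int free_on_uninitialized_is_noop(void)
{
	struct iso_resources r = { 0 };

	iso_resources_free(&r);
	return r.free_calls == 0 && r.allocated == 0;
}

/* Returns 1 if freeing an initialized, allocated object releases it once. */
int free_on_initialized_releases(void)
{
	struct unit u = { 7 };
	struct iso_resources r = { &u, 1, 0 };

	iso_resources_free(&r);
	return r.free_calls == 1 && r.allocated == 0;
}
```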

Fixes: 324540c4e05c ('ALSA: fireface: postpone sound card registration') at v4.12
Fixes: 8865a31e0fd8 ('ALSA: firewire-motu: postpone sound card registration') at v4.12
Fixes: b610386c8afb ('ALSA: firewire-tascam: deleyed registration of sound card') at v4.7
Fixes: 86c8dd7f4da3 ('ALSA: firewire-digi00x: delayed registration of sound card') at v4.7
Fixes: 6c29230e2a5f ('ALSA: oxfw: delayed registration of sound card') at v4.7
Fixes: 7d3c1d5901aa ('ALSA: fireworks: delayed registration of sound card') at v4.7
Fixes: 04a2c73c97eb ('ALSA: bebob: delayed registration of sound card') at v4.7
Fixes: b59fb1900b4f ('ALSA: dice: postpone card registration') at v4.5
Signed-off-by: Takashi Sakamoto <o-ta...@sakamocchi.jp>
Signed-off-by: Takashi Iwai <ti...@suse.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
sound/firewire/iso-resources.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

--- a/sound/firewire/iso-resources.c
+++ b/sound/firewire/iso-resources.c
@@ -210,9 +210,14 @@ EXPORT_SYMBOL(fw_iso_resources_update);
*/
void fw_iso_resources_free(struct fw_iso_resources *r)
{
- struct fw_card *card = fw_parent_device(r->unit)->card;
+ struct fw_card *card;
int bandwidth, channel;

+ /* Not initialized. */
+ if (r->unit == NULL)
+ return;
+ card = fw_parent_device(r->unit)->card;
+
mutex_lock(&r->mutex);

if (r->allocated) {

Greg Kroah-Hartman

Aug 28, 2017, 4:20:08 AM
4.9-stable review patch. If anyone has any objections, please let me know.

------------------
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---

Greg Kroah-Hartman

Aug 28, 2017, 4:20:09 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joerg Roedel <jro...@suse.de>

commit 2926a2aa5c14fb2add75e6584845b1c03022235f upstream.

The struct iommu_device has a 'struct device' embedded into
it, not as a pointer, but the whole struct. In the
conversion of the iommu drivers to use struct iommu_device
it was forgotten that the release function for that struct
device simply calls kfree() on the pointer.

This frees memory that was never allocated and causes memory
corruption.

To fix this issue, use a pointer to struct device instead of
embedding the whole struct. This needs some updates in the
iommu sysfs code as well as the Intel VT-d and AMD IOMMU
driver.
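
A userspace sketch of the resulting layout: the device is allocated on its own, carries a back-pointer to its iommu_device, and the driver structure is recovered in two steps. struct device is reduced to a single field here and all function names are hypothetical; free() stands in for the kfree() done by the release function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature of struct device: only the drvdata slot. */
struct device {
	void *driver_data;
};

struct iommu_device {
	struct device *dev;	/* pointer, no longer embedded */
};

struct my_iommu {		/* stand-in for amd_iommu or intel_iommu */
	int id;
	struct iommu_device iommu;
};

/* Allocate the device separately so that freeing it is always valid. */
int iommu_dev_add(struct iommu_device *iommu)
{
	iommu->dev = calloc(1, sizeof(*iommu->dev));
	if (!iommu->dev)
		return -1;
	iommu->dev->driver_data = iommu;	/* like dev_set_drvdata() */
	return 0;
}

void iommu_dev_remove(struct iommu_device *iommu)
{
	free(iommu->dev);
	iommu->dev = NULL;
}

/* Two-step lookup: device -> iommu_device -> enclosing driver structure. */
struct my_iommu *dev_to_my_iommu(struct device *dev)
{
	struct iommu_device *iommu = dev->driver_data;

	return container_of(iommu, struct my_iommu, iommu);
}

/* Returns 1 if the lookup finds the owner and removal clears the pointer. */
int roundtrip_works(void)
{
	struct my_iommu drv = { .id = 42 };
	int ok;

	if (iommu_dev_add(&drv.iommu))
		return 0;
	ok = dev_to_my_iommu(drv.iommu.dev) == &drv &&
	     dev_to_my_iommu(drv.iommu.dev)->id == 42;
	iommu_dev_remove(&drv.iommu);
	return ok && drv.iommu.dev == NULL;
}
```

With the embedded layout, the pointer handed to the release function pointed into the middle of the driver structure, which is not something an allocator can free.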

Reported-by: Sebastian Ott <seb...@linux.vnet.ibm.com>
Fixes: 39ab9555c241 ('iommu: Add sysfs bindings for struct iommu_device')
Signed-off-by: Joerg Roedel <jro...@suse.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/iommu/amd_iommu_types.h | 4 +++-
drivers/iommu/intel-iommu.c | 4 +++-
drivers/iommu/iommu-sysfs.c | 34 +++++++++++++++++++++-------------
include/linux/iommu.h | 12 +++++++++++-
4 files changed, 38 insertions(+), 16 deletions(-)

--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -571,7 +571,9 @@ struct amd_iommu {

static inline struct amd_iommu *dev_to_amd_iommu(struct device *dev)
{
- return container_of(dev, struct amd_iommu, iommu.dev);
+ struct iommu_device *iommu = dev_to_iommu_device(dev);
+
+ return container_of(iommu, struct amd_iommu, iommu);
}

#define ACPIHID_UID_LEN 256
--- a/drivers/iommu/intel-iommu.c
+++ b/drivers/iommu/intel-iommu.c
@@ -4749,7 +4749,9 @@ static void intel_disable_iommus(void)

static inline struct intel_iommu *dev_to_intel_iommu(struct device *dev)
{
- return container_of(dev, struct intel_iommu, iommu.dev);
+ struct iommu_device *iommu_dev = dev_to_iommu_device(dev);
+
+ return container_of(iommu_dev, struct intel_iommu, iommu);
}

static ssize_t intel_iommu_show_version(struct device *dev,
--- a/drivers/iommu/iommu-sysfs.c
+++ b/drivers/iommu/iommu-sysfs.c
@@ -62,32 +62,40 @@ int iommu_device_sysfs_add(struct iommu_
va_list vargs;
int ret;

- device_initialize(&iommu->dev);
-
- iommu->dev.class = &iommu_class;
- iommu->dev.parent = parent;
- iommu->dev.groups = groups;
+ iommu->dev = kzalloc(sizeof(*iommu->dev), GFP_KERNEL);
+ if (!iommu->dev)
+ return -ENOMEM;
+
+ device_initialize(iommu->dev);
+
+ iommu->dev->class = &iommu_class;
+ iommu->dev->parent = parent;
+ iommu->dev->groups = groups;

va_start(vargs, fmt);
- ret = kobject_set_name_vargs(&iommu->dev.kobj, fmt, vargs);
+ ret = kobject_set_name_vargs(&iommu->dev->kobj, fmt, vargs);
va_end(vargs);
if (ret)
goto error;

- ret = device_add(&iommu->dev);
+ ret = device_add(iommu->dev);
if (ret)
goto error;

+ dev_set_drvdata(iommu->dev, iommu);
+
return 0;

error:
- put_device(&iommu->dev);
+ put_device(iommu->dev);
return ret;
}

void iommu_device_sysfs_remove(struct iommu_device *iommu)
{
- device_unregister(&iommu->dev);
+ dev_set_drvdata(iommu->dev, NULL);
+ device_unregister(iommu->dev);
+ iommu->dev = NULL;
}
/*
* IOMMU drivers can indicate a device is managed by a given IOMMU using
@@ -102,14 +110,14 @@ int iommu_device_link(struct iommu_devic
if (!iommu || IS_ERR(iommu))
return -ENODEV;

- ret = sysfs_add_link_to_group(&iommu->dev.kobj, "devices",
+ ret = sysfs_add_link_to_group(&iommu->dev->kobj, "devices",
&link->kobj, dev_name(link));
if (ret)
return ret;

- ret = sysfs_create_link_nowarn(&link->kobj, &iommu->dev.kobj, "iommu");
+ ret = sysfs_create_link_nowarn(&link->kobj, &iommu->dev->kobj, "iommu");
if (ret)
- sysfs_remove_link_from_group(&iommu->dev.kobj, "devices",
+ sysfs_remove_link_from_group(&iommu->dev->kobj, "devices",
dev_name(link));

return ret;
@@ -121,5 +129,5 @@ void iommu_device_unlink(struct iommu_de
return;

sysfs_remove_link(&link->kobj, "iommu");
- sysfs_remove_link_from_group(&iommu->dev.kobj, "devices", dev_name(link));
+ sysfs_remove_link_from_group(&iommu->dev->kobj, "devices", dev_name(link));
}
--- a/include/linux/iommu.h
+++ b/include/linux/iommu.h
@@ -240,7 +240,7 @@ struct iommu_device {
struct list_head list;
const struct iommu_ops *ops;
struct fwnode_handle *fwnode;
- struct device dev;
+ struct device *dev;
};

int iommu_device_register(struct iommu_device *iommu);
@@ -265,6 +265,11 @@ static inline void iommu_device_set_fwno
iommu->fwnode = fwnode;
}

+static inline struct iommu_device *dev_to_iommu_device(struct device *dev)
+{
+ return (struct iommu_device *)dev_get_drvdata(dev);
+}
+
#define IOMMU_GROUP_NOTIFY_ADD_DEVICE 1 /* Device added */
#define IOMMU_GROUP_NOTIFY_DEL_DEVICE 2 /* Pre Device removed */
#define IOMMU_GROUP_NOTIFY_BIND_DRIVER 3 /* Pre Driver bind */
@@ -589,6 +594,11 @@ static inline void iommu_device_set_fwno
{
}

+static inline struct iommu_device *dev_to_iommu_device(struct device *dev)
+{
+ return NULL;
+}
+
static inline void iommu_device_unregister(struct iommu_device *iommu)
{
}

Greg Kroah-Hartman

Aug 28, 2017, 4:20:11 AM
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>

Signed-off-by: Eric Dumazet <edum...@google.com>
Reported-by: Dmitry Vyukov <dvy...@google.com>
Cc: Jon Maloy <jon....@ericsson.com>
Cc: Ying Xue <ying...@windriver.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---

Greg Kroah-Hartman

Aug 28, 2017, 4:20:11 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Benjamin Herrenschmidt <be...@kernel.crashing.org>

commit 1a92a80ad386a1a6e3b36d576d52a1a456394b70 upstream.

There is no guarantee that the various isync's involved with
the context switch will order the update of the CPU mask with
the first TLB entry for the new context being loaded by the HW.

Be safe here and add a memory barrier to order any subsequent
load/store which may bring entries into the TLB.

The corresponding barrier on the other side already exists as
pte updates use pte_xchg() which uses __cmpxchg_u64 which has
a sync after the atomic operation.
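
The pairing described here is the classic store-then-load pattern on two variables. A C11 sketch with hypothetical names follows, where atomic_thread_fence(memory_order_seq_cst) plays the role of smp_mb() on one side and of the sync after the cmpxchg on the other. It models the ordering only, not the powerpc MMU.

```c
#include <assert.h>
#include <stdatomic.h>

atomic_uint cpu_mask;	/* stand-in for mm_cpumask(next) */
atomic_uint pte_word;	/* stand-in for a page table entry */

/*
 * Context-switch side: publish this CPU in the mask, then issue a full
 * fence before any load that could pull translations in.
 */
unsigned int switch_in(unsigned int cpu)
{
	atomic_fetch_or_explicit(&cpu_mask, 1u << cpu, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&pte_word, memory_order_relaxed);
}

/*
 * PTE-update side: store the new entry, fence, then read the mask to
 * decide which CPUs may hold a stale translation.
 */
unsigned int update_pte(unsigned int new_pte)
{
	atomic_exchange_explicit(&pte_word, new_pte, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	return atomic_load_explicit(&cpu_mask, memory_order_relaxed);
}
```

With a full fence on both sides, two threads running these functions concurrently cannot both miss the other's store: either the switching CPU sees the new PTE, or the updater sees the CPU in the mask.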

Cc: sta...@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <be...@kernel.crashing.org>
Reviewed-by: Nicholas Piggin <npi...@gmail.com>
[mpe: Add comments in the code]
[mpe: Backport to 4.12, minor context change]
Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/powerpc/include/asm/mmu_context.h | 20 +++++++++++++++++++-
arch/powerpc/include/asm/pgtable-be-types.h | 1 +
arch/powerpc/include/asm/pgtable-types.h | 1 +
3 files changed, 21 insertions(+), 1 deletion(-)

--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -80,9 +80,27 @@ static inline void switch_mm_irqs_off(st
struct task_struct *tsk)
{
/* Mark this context has been used on the new CPU */
- if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next)))
+ if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));

+ /*
+ * This full barrier orders the store to the cpumask above vs
+ * a subsequent operation which allows this CPU to begin loading
+ * translations for next.
+ *
+ * When using the radix MMU that operation is the load of the
+ * MMU context id, which is then moved to SPRN_PID.
+ *
+ * For the hash MMU it is either the first load from slb_cache
+ * in switch_slb(), and/or the store of paca->mm_ctx_id in
+ * copy_mm_to_paca().
+ *
+ * On the read side the barrier is in pte_xchg(), which orders
+ * the store to the PTE vs the load of mm_cpumask.
+ */
+ smp_mb();
+ }
+
/* 32-bit keeps track of the current PGDIR in the thread struct */
#ifdef CONFIG_PPC32
tsk->thread.pgdir = next->pgd;
--- a/arch/powerpc/include/asm/pgtable-be-types.h
+++ b/arch/powerpc/include/asm/pgtable-be-types.h
@@ -87,6 +87,7 @@ static inline bool pte_xchg(pte_t *ptep,
unsigned long *p = (unsigned long *)ptep;
__be64 prev;

+ /* See comment in switch_mm_irqs_off() */
prev = (__force __be64)__cmpxchg_u64(p, (__force unsigned long)pte_raw(old),
(__force unsigned long)pte_raw(new));

--- a/arch/powerpc/include/asm/pgtable-types.h
+++ b/arch/powerpc/include/asm/pgtable-types.h
@@ -62,6 +62,7 @@ static inline bool pte_xchg(pte_t *ptep,
{
unsigned long *p = (unsigned long *)ptep;

+ /* See comment in switch_mm_irqs_off() */
return pte_val(old) == __cmpxchg_u64(p, pte_val(old), pte_val(new));
}
#endif

Greg Kroah-Hartman

Aug 28, 2017, 4:20:15 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Westphal <f...@strlen.de>

commit 97772bcd56efa21d9d8976db6f205574ea602f51 upstream.

When doing initial conversion to rhashtable I replaced the bucket
walk with a single rhashtable_lookup_fast().

When moving to rhlist I failed to properly walk the list of identical
tuples, but that is what is needed for this to work correctly.
The table contains the original tuples, so the reply tuples are all
distinct.

We currently decide whether the mapping is in range based only on the
first entry, but if it is not, we need to try the reply tuple of the
next entry until we either find an in-range mapping or have checked
all the entries.

This bug makes nat core attempt collision resolution while it might be
able to use the mapping as-is.
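
The difference between testing only the first match and walking all matches can be sketched with a plain linked list. The types and names below are hypothetical, with a port number standing in for the inverted reply tuple and a port range standing in for the NAT range.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry: one candidate source port per node. */
struct mapping {
	unsigned int port;
	struct mapping *next;
};

struct port_range {
	unsigned int min, max;
};

int in_port_range(unsigned int port, const struct port_range *range)
{
	return port >= range->min && port <= range->max;
}

/*
 * Walk every entry that matched the lookup key and stop at the first
 * one that fits the range. Returns 1 and stores the port on success,
 * 0 if no entry fits.
 */
int find_in_range(const struct mapping *head, const struct port_range *range,
		  unsigned int *result)
{
	const struct mapping *m;

	for (m = head; m != NULL; m = m->next) {
		if (in_port_range(m->port, range)) {
			*result = m->port;
			return 1;
		}
	}
	return 0;
}

/* First entry is out of range, second one fits. Returns the port found, or 0. */
unsigned int demo_lookup(void)
{
	struct mapping second = { 2048, NULL };
	struct mapping first = { 80, &second };
	struct port_range range = { 1024, 4096 };
	unsigned int port = 0;

	return find_in_range(&first, &range, &port) ? port : 0;
}
```

A lookup that inspected only the first entry would report no usable mapping here, which corresponds to the unnecessary collision resolution described above.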

Fixes: 870190a9ec90 ("netfilter: nat: convert nat bysrc hash to rhashtable")
Reported-by: Jaco Kroon <ja...@uls.co.za>
Tested-by: Jaco Kroon <ja...@uls.co.za>
Signed-off-by: Florian Westphal <f...@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pa...@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
net/netfilter/nf_nat_core.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -222,20 +222,21 @@ find_appropriate_src(struct net *net,
.tuple = tuple,
.zone = zone
};
- struct rhlist_head *hl;
+ struct rhlist_head *hl, *h;

hl = rhltable_lookup(&nf_nat_bysource_table, &key,
nf_nat_bysource_params);
- if (!hl)
- return 0;

- ct = container_of(hl, typeof(*ct), nat_bysource);
+ rhl_for_each_entry_rcu(ct, h, hl, nat_bysource) {
+ nf_ct_invert_tuplepr(result,
+ &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
+ result->dst = tuple->dst;

- nf_ct_invert_tuplepr(result,
- &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
- result->dst = tuple->dst;
+ if (in_range(l3proto, l4proto, result, range))
+ return 1;
+ }

- return in_range(l3proto, l4proto, result, range);
+ return 0;
}

/* For [FUTURE] fragmentation handling, we want the least-used

Greg Kroah-Hartman

Aug 28, 2017, 4:20:38 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Nicholas Piggin <npi...@gmail.com>

commit 2fe59f507a65dbd734b990a11ebc7488f6f87a24 upstream.

When a timer base is idle, it is forwarded when a new timer is added
to ensure that granularity does not become excessive. When not idle,
the timer tick is expected to increment the base.

However there are several problems:

- If an existing timer is modified, the base is forwarded only after
the index is calculated.

- The base is not forwarded by add_timer_on.

- There is a window after a timer is restarted from a nohz idle, after
it is marked not-idle and before the timer tick on this CPU, where a
timer may be added but the ancient base does not get forwarded.

These result in excessive granularity (a 1 jiffy timeout can blow out
to 100s of jiffies), which causes the RCU lockup detector to trigger,
among other things.

Fix this by keeping track of whether the timer base has been idle
since it was last run or forwarded, and if so then forward it before
adding a new timer.
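
A simplified userspace model of that logic, following the forward_timer_base() hunk below. The structure is a hypothetical miniature, and the model always jumps the clock to the current jiffies value, whereas the kernel function, as far as I recall, also considers the next expiry when choosing the new clock value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical miniature of struct timer_base. */
struct base_model {
	unsigned long clk;
	bool is_idle;
	bool must_forward_clk;
};

/* Simplified forward_timer_base(): forward only while the flag is armed. */
void forward_base(struct base_model *base, unsigned long jnow)
{
	if (!base->must_forward_clk)
		return;

	/* Stay armed only while the base is still idle. */
	base->must_forward_clk = base->is_idle;
	if ((long)(jnow - base->clk) < 2)
		return;

	base->clk = jnow;
}

/*
 * The base went idle at clk 100, was marked not-idle again, and a timer
 * is added at jiffy 600 before the tick has run. Returns the new clk.
 */
unsigned long clk_after_wakeup_add(void)
{
	struct base_model base = { .clk = 100, .is_idle = false,
				   .must_forward_clk = true };

	forward_base(&base, 600);
	return base.clk;
}

/* A base that has not been idle is left for the tick to advance. */
unsigned long clk_when_not_armed(void)
{
	struct base_model base = { .clk = 100, .is_idle = false,
				   .must_forward_clk = false };

	forward_base(&base, 600);
	return base.clk;
}
```

The first case is the window listed above: the old check on is_idle alone would have left the clock at 100, and the new timer would have been queued relative to that stale value.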

There is still a case where mod_timer optimises the case of a pending
timer mod with the same expiry time, where the timer can see excessive
granularity relative to the new, shorter interval. A comment is added,
but it's not changed because it is an important fastpath for
networking.

This has been tested and found to fix the RCU softlockup messages.

Testing was also done with tracing to measure requested versus
achieved wakeup latencies for all non-deferrable timers in an idle
system (with no lockup watchdogs running). Wakeup latency relative to
absolute latency is calculated (note this suffers from round-up skew
at low absolute times) and analysed:

            max     avg     std
upstream  506.0    1.20    4.68
patched     2.0    1.08    0.15

The bug was noticed due to the lockup detector Kconfig changes
dropping it out of people's .configs and resulting in larger base
clk skew. When the lockup detectors are enabled, no CPU can go idle for
longer than 4 seconds, which limits the granularity errors.
Sub-optimal timer behaviour is observable on a smaller scale in that
case:

            max     avg     std
upstream    9.0    1.05    0.19
patched     2.0    1.04    0.11

Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Signed-off-by: Nicholas Piggin <npi...@gmail.com>
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Tested-by: Jonathan Cameron <Jonathan...@huawei.com>
Tested-by: David Miller <da...@davemloft.net>
Cc: dzi...@redhat.com
Cc: s...@canb.auug.org.au
Cc: m...@ellerman.id.au
Cc: Stephen Boyd <sb...@codeaurora.org>
Cc: linu...@huawei.com
Cc: abdh...@linux.vnet.ibm.com
Cc: John Stultz <john....@linaro.org>
Cc: ak...@linux-foundation.org
Cc: pau...@linux.vnet.ibm.com
Cc: torv...@linux-foundation.org
Link: http://lkml.kernel.org/r/20170822084348....@gmail.com
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
kernel/time/timer.c | 50 +++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 41 insertions(+), 9 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -203,6 +203,7 @@ struct timer_base {
bool migration_enabled;
bool nohz_active;
bool is_idle;
+ bool must_forward_clk;
DECLARE_BITMAP(pending_map, WHEEL_SIZE);
struct hlist_head vectors[WHEEL_SIZE];
} ____cacheline_aligned;
@@ -856,13 +857,19 @@ get_target_base(struct timer_base *base,

static inline void forward_timer_base(struct timer_base *base)
{
- unsigned long jnow = READ_ONCE(jiffies);
+ unsigned long jnow;

/*
- * We only forward the base when it's idle and we have a delta between
- * base clock and jiffies.
+ * We only forward the base when we are idle or have just come out of
+ * idle (must_forward_clk logic), and have a delta between base clock
+ * and jiffies. In the common case, run_timers will take care of it.
*/
- if (!base->is_idle || (long) (jnow - base->clk) < 2)
+ if (likely(!base->must_forward_clk))
+ return;
+
+ jnow = READ_ONCE(jiffies);
+ base->must_forward_clk = base->is_idle;
+ if ((long)(jnow - base->clk) < 2)
return;

/*
@@ -938,6 +945,11 @@ __mod_timer(struct timer_list *timer, un
* same array bucket then just return:
*/
if (timer_pending(timer)) {
+ /*
+ * The downside of this optimization is that it can result in
+ * larger granularity than you would get from adding a new
+ * timer with this expiry.
+ */
if (timer->expires == expires)
return 1;

@@ -948,6 +960,7 @@ __mod_timer(struct timer_list *timer, un
* dequeue/enqueue dance.
*/
base = lock_timer_base(timer, &flags);
+ forward_timer_base(base);

clk = base->clk;
idx = calc_wheel_index(expires, clk);
@@ -964,6 +977,7 @@ __mod_timer(struct timer_list *timer, un
}
} else {
base = lock_timer_base(timer, &flags);
+ forward_timer_base(base);
}

ret = detach_if_pending(timer, base, false);
@@ -991,12 +1005,10 @@ __mod_timer(struct timer_list *timer, un
spin_lock(&base->lock);
WRITE_ONCE(timer->flags,
(timer->flags & ~TIMER_BASEMASK) | base->cpu);
+ forward_timer_base(base);
}
}

- /* Try to forward a stale timer base clock */
- forward_timer_base(base);
-
timer->expires = expires;
/*
* If 'idx' was calculated above and the base time did not advance
@@ -1112,6 +1124,7 @@ void add_timer_on(struct timer_list *tim
WRITE_ONCE(timer->flags,
(timer->flags & ~TIMER_BASEMASK) | cpu);
}
+ forward_timer_base(base);

debug_activate(timer, timer->expires);
internal_add_timer(base, timer);
@@ -1497,10 +1510,16 @@ u64 get_next_timer_interrupt(unsigned lo
if (!is_max_delta)
expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
/*
- * If we expect to sleep more than a tick, mark the base idle:
+ * If we expect to sleep more than a tick, mark the base idle.
+ * Also the tick is stopped so any added timer must forward
+ * the base clk itself to keep granularity small. This idle
+ * logic is only maintained for the BASE_STD base, deferrable
+ * timers may still see large granularity skew (by design).
*/
- if ((expires - basem) > TICK_NSEC)
+ if ((expires - basem) > TICK_NSEC) {
+ base->must_forward_clk = true;
base->is_idle = true;
+ }
}
spin_unlock(&base->lock);

@@ -1611,6 +1630,19 @@ static __latent_entropy void run_timer_s
{
struct timer_base *base = this_cpu_ptr(&timer_bases[BASE_STD]);

+ /*
+ * must_forward_clk must be cleared before running timers so that any
+ * timer functions that call mod_timer will not try to forward the
+ * base. idle trcking / clock forwarding logic is only used with
+ * BASE_STD timers.
+ *
+ * The deferrable base does not do idle tracking at all, so we do
+ * not forward it. This can result in very large variations in
+ * granularity for deferrable timers, but they can be deferred for
+ * long periods due to idle.
+ */
+ base->must_forward_clk = false;
+
__run_timers(base);
if (IS_ENABLED(CONFIG_NO_HZ_COMMON) && base->nohz_active)
__run_timers(this_cpu_ptr(&timer_bases[BASE_DEF]));

Greg Kroah-Hartman

Aug 28, 2017, 4:20:38 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Vadim Lomovtsev <vlom...@redhat.com>

commit eebe53e87f97975ee58a21693e44797608bf679c upstream.

While running nfs/connectathon tests, a kernel NULL pointer exception
was observed due to races in svcsock.c.

The race appears when the kernel accepts a connection via kernel_accept
(which creates a new socket) and starts queuing ingress packets
to the new socket. This happens in ksoftirq context, which can run
concurrently on a different core while the new socket setup is not done yet.

The fix is to re-order socket user data init sequence and add
write/read barrier calls to be sure that we got proper values
for callback pointers before actually calling them.
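
The ordering described above can be sketched in userspace C. This is only
an analogy under stated assumptions: all demo_* names are hypothetical, and
C11 release/acquire ordering stands in for the kernel's wmb()/rmb() pair.
It shows the same idea: fill in the saved callbacks first, publish the
pointer last, and order the reader's loads after it sees the pointer.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct svc_sock: only one saved callback. */
struct demo_svsk {
    void (*odata)(int *counter);
};

/* Hypothetical stand-in for inet->sk_user_data. */
static _Atomic(struct demo_svsk *) demo_user_data;

/* Plays the role of the socket's original data_ready callback. */
static void demo_original_data_ready(int *counter)
{
    (*counter)++;
}

/* Setup side: initialize every field first, publish the pointer last. */
static void demo_setup(struct demo_svsk *svsk)
{
    svsk->odata = demo_original_data_ready;
    /* Release ordering plays the role of wmb() before the pointer store. */
    atomic_store_explicit(&demo_user_data, svsk, memory_order_release);
}

/* Callback side: returns 1 if the saved callback ran, 0 if nothing is published. */
static int demo_data_ready(int *counter)
{
    /* Acquire ordering plays the role of rmb() before the fields are used. */
    struct demo_svsk *svsk =
        atomic_load_explicit(&demo_user_data, memory_order_acquire);

    if (!svsk)
        return 0;
    svsk->odata(counter);
    return 1;
}
```

With this ordering a reader that observes a non-NULL pointer also observes
the initialized callback, which is the property the reordering in
svc_setup_socket() is after.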

Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.

Crash log:
---<-snip->---
[ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 6708.647093] pgd = ffff0000094e0000
[ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
[ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
[ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
[ 6708.736446] ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
[ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G W OE 4.11.0-4.el7.aarch64 #1
[ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
[ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
[ 6708.773822] PC is at 0x0
[ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
[ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
[ 6708.789248] sp : ffff810ffbad3900
[ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
[ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
[ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
[ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
[ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
[ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
[ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
[ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
[ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
[ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
[ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
[ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
[ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
[ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
[ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
[ 6708.872084]
[ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
[ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
[ 6708.886075] Call trace:
[ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
[ 6708.894942] 3700: ffff810012412400 0001000000000000
[ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
[ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
[ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
[ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
[ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
[ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
[ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
[ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
[ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
[ 6708.973117] [< (null)>] (null)
[ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
[ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
[ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
[ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
[ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
[ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
[ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
[ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
[ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
[ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
[ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
[ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
[ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
[ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
[ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
[ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
[ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
[ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
[ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
[ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
[ 6709.092929] fde0: 0000810ff2ec0000 ffff000008c10000
[ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
[ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
[ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
[ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
[ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
[ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
[ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
[ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
[ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
[ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
[ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
[ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
[ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
[ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
[ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
[ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
[ 6709.207967] Code: bad PC value
[ 6709.211061] SMP: stopping secondary CPUs
[ 6709.218830] Starting crashdump kernel...
[ 6709.222749] Bye!
---<-snip>---

Signed-off-by: Vadim Lomovtsev <vlom...@redhat.com>
Reviewed-by: Jeff Layton <jla...@redhat.com>
Signed-off-by: J. Bruce Fields <bfi...@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
net/sunrpc/svcsock.c | 22 ++++++++++++++++++++--
1 file changed, 20 insertions(+), 2 deletions(-)

--- a/net/sunrpc/svcsock.c
+++ b/net/sunrpc/svcsock.c
@@ -421,6 +421,9 @@ static void svc_data_ready(struct sock *
dprintk("svc: socket %p(inet %p), busy=%d\n",
svsk, sk,
test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+
+ /* Refer to svc_setup_socket() for details. */
+ rmb();
svsk->sk_odata(sk);
if (!test_and_set_bit(XPT_DATA, &svsk->sk_xprt.xpt_flags))
svc_xprt_enqueue(&svsk->sk_xprt);
@@ -437,6 +440,9 @@ static void svc_write_space(struct sock
if (svsk) {
dprintk("svc: socket %p(inet %p), write_space busy=%d\n",
svsk, sk, test_bit(XPT_BUSY, &svsk->sk_xprt.xpt_flags));
+
+ /* Refer to svc_setup_socket() for details. */
+ rmb();
svsk->sk_owspace(sk);
svc_xprt_enqueue(&svsk->sk_xprt);
}
@@ -760,8 +766,12 @@ static void svc_tcp_listen_data_ready(st
dprintk("svc: socket %p TCP (listen) state change %d\n",
sk, sk->sk_state);

- if (svsk)
+ if (svsk) {
+ /* Refer to svc_setup_socket() for details. */
+ rmb();
svsk->sk_odata(sk);
+ }
+
/*
* This callback may called twice when a new connection
* is established as a child socket inherits everything
@@ -794,6 +804,8 @@ static void svc_tcp_state_change(struct
if (!svsk)
printk("svc: socket %p: no user data\n", sk);
else {
+ /* Refer to svc_setup_socket() for details. */
+ rmb();
svsk->sk_ostate(sk);
if (sk->sk_state != TCP_ESTABLISHED) {
set_bit(XPT_CLOSE, &svsk->sk_xprt.xpt_flags);
@@ -1381,12 +1393,18 @@ static struct svc_sock *svc_setup_socket
return ERR_PTR(err);
}

- inet->sk_user_data = svsk;
svsk->sk_sock = sock;
svsk->sk_sk = inet;
svsk->sk_ostate = inet->sk_state_change;
svsk->sk_odata = inet->sk_data_ready;
svsk->sk_owspace = inet->sk_write_space;
+ /*
+ * This barrier is necessary in order to prevent race condition
+ * with svc_data_ready(), svc_listen_data_ready() and others
+ * when calling callbacks above.
+ */
+ wmb();
+ inet->sk_user_data = svsk;

/* Initialize the socket */
if (sock->type == SOCK_DGRAM)

Greg Kroah-Hartman

Aug 28, 2017, 4:21:36 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeffy Chen <jeffy...@rock-chips.com>

commit 5da8e47d849d3d37b14129f038782a095b9ad049 upstream.

It looks like hidp_session_thread has the same pattern as the issue
reported in old rfcomm:

    while (1) {
        set_current_state(TASK_INTERRUPTIBLE);
        if (condition)
            break;
        // may call might_sleep here
        schedule();
    }
    __set_current_state(TASK_RUNNING);

Which was fixed in:
dfb2fae Bluetooth: Fix nested sleeps

So let's fix it in the same way, and also follow the suggestion of:
https://lwn.net/Articles/628628/

Signed-off-by: Jeffy Chen <jeffy...@rock-chips.com>
Tested-by: AL Yu-Chen Cho <ac...@suse.com>
Tested-by: Rohit Vaswani <rvas...@nvidia.com>
Signed-off-by: Marcel Holtmann <mar...@holtmann.org>
Cc: Jiri Slaby <jsl...@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
net/bluetooth/hidp/core.c | 33 ++++++++++++++++++++++-----------
1 file changed, 22 insertions(+), 11 deletions(-)

--- a/net/bluetooth/hidp/core.c
+++ b/net/bluetooth/hidp/core.c
@@ -36,6 +36,7 @@
#define VERSION "1.2"

static DECLARE_RWSEM(hidp_session_sem);
+static DECLARE_WAIT_QUEUE_HEAD(hidp_session_wq);
static LIST_HEAD(hidp_session_list);

static unsigned char hidp_keycode[256] = {
@@ -1068,12 +1069,12 @@ static int hidp_session_start_sync(struc
* Wake up session thread and notify it to stop. This is asynchronous and
* returns immediately. Call this whenever a runtime error occurs and you want
* the session to stop.
- * Note: wake_up_process() performs any necessary memory-barriers for us.
+ * Note: wake_up_interruptible() performs any necessary memory-barriers for us.
*/
static void hidp_session_terminate(struct hidp_session *session)
{
atomic_inc(&session->terminate);
- wake_up_process(session->task);
+ wake_up_interruptible(&hidp_session_wq);
}

/*
@@ -1180,7 +1181,9 @@ static void hidp_session_run(struct hidp
struct sock *ctrl_sk = session->ctrl_sock->sk;
struct sock *intr_sk = session->intr_sock->sk;
struct sk_buff *skb;
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);

+ add_wait_queue(&hidp_session_wq, &wait);
for (;;) {
/*
* This thread can be woken up two ways:
@@ -1188,12 +1191,10 @@ static void hidp_session_run(struct hidp
* session->terminate flag and wakes this thread up.
* - Via modifying the socket state of ctrl/intr_sock. This
* thread is woken up by ->sk_state_changed().
- *
- * Note: set_current_state() performs any necessary
- * memory-barriers for us.
*/
- set_current_state(TASK_INTERRUPTIBLE);

+ /* Ensure session->terminate is updated */
+ smp_mb__before_atomic();
if (atomic_read(&session->terminate))
break;

@@ -1227,11 +1228,22 @@ static void hidp_session_run(struct hidp
hidp_process_transmit(session, &session->ctrl_transmit,
session->ctrl_sock);

- schedule();
+ wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
+ remove_wait_queue(&hidp_session_wq, &wait);

atomic_inc(&session->terminate);
- set_current_state(TASK_RUNNING);
+
+ /* Ensure session->terminate is updated */
+ smp_mb__after_atomic();
+}
+
+static int hidp_session_wake_function(wait_queue_t *wait,
+ unsigned int mode,
+ int sync, void *key)
+{
+ wake_up_interruptible(&hidp_session_wq);
+ return false;
}

/*
@@ -1244,7 +1256,8 @@ static void hidp_session_run(struct hidp
static int hidp_session_thread(void *arg)
{
struct hidp_session *session = arg;
- wait_queue_t ctrl_wait, intr_wait;
+ DEFINE_WAIT_FUNC(ctrl_wait, hidp_session_wake_function);
+ DEFINE_WAIT_FUNC(intr_wait, hidp_session_wake_function);

BT_DBG("session %p", session);

@@ -1254,8 +1267,6 @@ static int hidp_session_thread(void *arg
set_user_nice(current, -15);
hidp_set_timer(session);

- init_waitqueue_entry(&ctrl_wait, current);
- init_waitqueue_entry(&intr_wait, current);
add_wait_queue(sk_sleep(session->ctrl_sock->sk), &ctrl_wait);
add_wait_queue(sk_sleep(session->intr_sock->sk), &intr_wait);
/* This memory barrier is paired with wq_has_sleeper(). See

Greg Kroah-Hartman

Aug 28, 2017, 4:25:29 AM
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Westphal <f...@strlen.de>

commit 97772bcd56efa21d9d8976db6f205574ea602f51 upstream.

When doing initial conversion to rhashtable I replaced the bucket
walk with a single rhashtable_lookup_fast().

When moving to rhlist I failed to properly walk the list of identical
tuples, but that is what is needed for this to work correctly.
The table contains the original tuples, so the reply tuples are all
distinct.

We currently decide whether the mapping is in range based only on the
first entry, but if it is not, we need to try the reply tuple of the
next entry until we either find an in-range mapping or have checked
all the entries.

This bug makes nat core attempt collision resolution while it might be
able to use the mapping as-is.
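
The walk described above can be sketched as follows. This is a simplified
userspace illustration, not the nf_nat code: the demo_* names are
hypothetical, a single port number stands in for a full reply tuple, and a
plain linked list stands in for the rhlist of identical tuples.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified mapping: one port instead of a full tuple. */
struct demo_entry {
    unsigned int port;
    const struct demo_entry *next; /* next entry with an identical original tuple */
};

static bool demo_in_range(unsigned int port, unsigned int min, unsigned int max)
{
    return port >= min && port <= max;
}

/* Return the first in-range entry of the list, or NULL if none qualifies. */
static const struct demo_entry *
demo_find_in_range(const struct demo_entry *head,
                   unsigned int min, unsigned int max)
{
    const struct demo_entry *e;

    /* Do not stop at the first entry: keep walking until one fits. */
    for (e = head; e; e = e->next) {
        if (demo_in_range(e->port, min, max))
            return e;
    }
    return NULL;
}
```

Only when the walk returns NULL does the caller need to fall back to
resolving a new mapping.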

Fixes: 870190a9ec90 ("netfilter: nat: convert nat bysrc hash to rhashtable")
Reported-by: Jaco Kroon <ja...@uls.co.za>
Tested-by: Jaco Kroon <ja...@uls.co.za>
Signed-off-by: Florian Westphal <f...@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pa...@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
net/netfilter/nf_nat_core.c | 17 +++++++++--------
1 file changed, 9 insertions(+), 8 deletions(-)

--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -225,20 +225,21 @@ find_appropriate_src(struct net *net,

Greg Kroah-Hartman

Aug 28, 2017, 4:25:30 AM
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Benjamin Herrenschmidt <be...@kernel.crashing.org>

commit 1a92a80ad386a1a6e3b36d576d52a1a456394b70 upstream.

There is no guarantee that the various isync's involved with
the context switch will order the update of the CPU mask with
the first TLB entry for the new context being loaded by the HW.

Be safe here and add a memory barrier to order any subsequent
load/store which may bring entries into the TLB.

The corresponding barrier on the other side already exists as
pte updates use pte_xchg() which uses __cmpxchg_u64 which has
a sync after the atomic operation.

Cc: sta...@vger.kernel.org
Signed-off-by: Benjamin Herrenschmidt <be...@kernel.crashing.org>
Reviewed-by: Nicholas Piggin <npi...@gmail.com>
[mpe: Add comments in the code]
[mpe: Backport to 4.12, minor context change]
Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/powerpc/include/asm/mmu_context.h | 20 +++++++++++++++++++-
arch/powerpc/include/asm/pgtable-be-types.h | 1 +
arch/powerpc/include/asm/pgtable-types.h | 1 +
3 files changed, 21 insertions(+), 1 deletion(-)

--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -75,9 +75,27 @@ static inline void switch_mm_irqs_off(st
struct task_struct *tsk)
{
/* Mark this context has been used on the new CPU */
- if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next)))
+ if (!cpumask_test_cpu(smp_processor_id(), mm_cpumask(next))) {
cpumask_set_cpu(smp_processor_id(), mm_cpumask(next));

+ /*

Greg Kroah-Hartman

Aug 28, 2017, 4:26:01 AM
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Charles Milette <charles...@gmail.com>

commit f299aec6ebd747298e35934cff7709c6b119ca52 upstream.

Add support for USB Device Rosewill RNX-N150NUB.
VendorID: 0x0bda, ProductID: 0xffef

Signed-off-by: Charles Milette <charles...@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---

Greg Kroah-Hartman

Aug 28, 2017, 4:26:28 AM
4.4-stable review patch. If anyone has any objections, please let me know.

------------------
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---

Greg Kroah-Hartman

Aug 28, 2017, 4:50:07 AM
4.9-stable review patch. If anyone has any objections, please let me know.

------------------
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
kernel/time/timer.c | 50 +++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 41 insertions(+), 9 deletions(-)

--- a/kernel/time/timer.c
+++ b/kernel/time/timer.c
@@ -201,6 +201,7 @@ struct timer_base {
bool migration_enabled;
bool nohz_active;
bool is_idle;
+ bool must_forward_clk;
DECLARE_BITMAP(pending_map, WHEEL_SIZE);
struct hlist_head vectors[WHEEL_SIZE];
} ____cacheline_aligned;
@@ -891,13 +892,19 @@ get_target_base(struct timer_base *base,

static inline void forward_timer_base(struct timer_base *base)
{
- unsigned long jnow = READ_ONCE(jiffies);
+ unsigned long jnow;

/*
- * We only forward the base when it's idle and we have a delta between
- * base clock and jiffies.
+ * We only forward the base when we are idle or have just come out of
+ * idle (must_forward_clk logic), and have a delta between base clock
+ * and jiffies. In the common case, run_timers will take care of it.
*/
- if (!base->is_idle || (long) (jnow - base->clk) < 2)
+ if (likely(!base->must_forward_clk))
+ return;
+
+ jnow = READ_ONCE(jiffies);
+ base->must_forward_clk = base->is_idle;
+ if ((long)(jnow - base->clk) < 2)
return;

/*
@@ -973,6 +980,11 @@ __mod_timer(struct timer_list *timer, un
* same array bucket then just return:
*/
if (timer_pending(timer)) {
+ /*
+ * The downside of this optimization is that it can result in
+ * larger granularity than you would get from adding a new
+ * timer with this expiry.
+ */
if (timer->expires == expires)
return 1;

@@ -983,6 +995,7 @@ __mod_timer(struct timer_list *timer, un
* dequeue/enqueue dance.
*/
base = lock_timer_base(timer, &flags);
+ forward_timer_base(base);

clk = base->clk;
idx = calc_wheel_index(expires, clk);
@@ -999,6 +1012,7 @@ __mod_timer(struct timer_list *timer, un
}
} else {
base = lock_timer_base(timer, &flags);
+ forward_timer_base(base);
}

timer_stats_timer_set_start_info(timer);
@@ -1028,12 +1042,10 @@ __mod_timer(struct timer_list *timer, un
spin_lock(&base->lock);
WRITE_ONCE(timer->flags,
(timer->flags & ~TIMER_BASEMASK) | base->cpu);
+ forward_timer_base(base);
}
}

- /* Try to forward a stale timer base clock */
- forward_timer_base(base);
-
timer->expires = expires;
/*
* If 'idx' was calculated above and the base time did not advance
@@ -1150,6 +1162,7 @@ void add_timer_on(struct timer_list *tim
WRITE_ONCE(timer->flags,
(timer->flags & ~TIMER_BASEMASK) | cpu);
}
+ forward_timer_base(base);

debug_activate(timer, timer->expires);
internal_add_timer(base, timer);
@@ -1538,10 +1551,16 @@ u64 get_next_timer_interrupt(unsigned lo
if (!is_max_delta)
expires = basem + (u64)(nextevt - basej) * TICK_NSEC;
/*
- * If we expect to sleep more than a tick, mark the base idle:
+ * If we expect to sleep more than a tick, mark the base idle.
+ * Also the tick is stopped so any added timer must forward
+ * the base clk itself to keep granularity small. This idle
+ * logic is only maintained for the BASE_STD base, deferrable
+ * timers may still see large granularity skew (by design).
*/
- if ((expires - basem) > TICK_NSEC)
+ if ((expires - basem) > TICK_NSEC) {
+ base->must_forward_clk = true;
base->is_idle = true;
+ }
}
spin_unlock(&base->lock);

@@ -1651,6 +1670,19 @@ static __latent_entropy void run_timer_s

Greg Kroah-Hartman

Aug 28, 2017, 4:50:16 AM
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeffy Chen <jeffy...@rock-chips.com>

commit 5da8e47d849d3d37b14129f038782a095b9ad049 upstream.

It looks like hidp_session_thread has the same pattern as the issue
reported in old rfcomm:

    while (1) {
        set_current_state(TASK_INTERRUPTIBLE);
        if (condition)
            break;
        // may call might_sleep here
        schedule();
    }
    __set_current_state(TASK_RUNNING);

Which was fixed in:
dfb2fae Bluetooth: Fix nested sleeps

So let's fix it in the same way, and also follow the suggestion of:
https://lwn.net/Articles/628628/

Signed-off-by: Jeffy Chen <jeffy...@rock-chips.com>
Tested-by: AL Yu-Chen Cho <ac...@suse.com>
Tested-by: Rohit Vaswani <rvas...@nvidia.com>
Signed-off-by: Marcel Holtmann <mar...@holtmann.org>
Cc: Jiri Slaby <jsl...@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---

Greg Kroah-Hartman

Aug 28, 2017, 4:50:33 AM
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>


[ Upstream commit 81fbfe8adaf38d4f5a98c19bebfd41c5d6acaee8 ]

As found by syzkaller, malicious users can set whatever tx_queue_len
on a tun device and eventually crash the kernel.

Let's remove the ALIGN(XXX, SMP_CACHE_BYTES) thing since a small
ring buffer is not fast anyway.
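
As a rough userspace illustration of why a calloc-style allocator is the
safer choice for a user-controlled element count: kcalloc(), which the
patch switches to, checks the count-times-size multiplication for overflow
instead of letting it wrap. The helper below is a hypothetical sketch
(demo_queue_alloc is not a kernel function) that makes the same check
explicit on top of calloc().

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: allocate 'count' zeroed pointer slots.
 * Returns NULL if the byte size would overflow or the allocation fails.
 */
static void **demo_queue_alloc(size_t count)
{
    /* Reject counts whose total byte size would wrap around SIZE_MAX. */
    if (count > SIZE_MAX / sizeof(void *))
        return NULL;

    /* calloc() zeroes the slots, like kzalloc()/kcalloc() do in the kernel. */
    return calloc(count, sizeof(void *));
}
```

A caller passing an absurdly large count gets NULL back rather than a
buffer that is smaller than the ring believes it to be.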

Fixes: 2e0ab8ca83c1 ("ptr_ring: array based FIFO for pointers")
Signed-off-by: Eric Dumazet <edum...@google.com>
Reported-by: Dmitry Vyukov <dvy...@google.com>
Cc: Michael S. Tsirkin <m...@redhat.com>
Cc: Jason Wang <jaso...@redhat.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
include/linux/ptr_ring.h | 9 +++++----
include/linux/skb_array.h | 3 ++-
2 files changed, 7 insertions(+), 5 deletions(-)

--- a/include/linux/ptr_ring.h
+++ b/include/linux/ptr_ring.h
@@ -340,9 +340,9 @@ static inline void *ptr_ring_consume_bh(
__PTR_RING_PEEK_CALL_v; \
})

-static inline void **__ptr_ring_init_queue_alloc(int size, gfp_t gfp)
+static inline void **__ptr_ring_init_queue_alloc(unsigned int size, gfp_t gfp)
{
- return kzalloc(ALIGN(size * sizeof(void *), SMP_CACHE_BYTES), gfp);
+ return kcalloc(size, sizeof(void *), gfp);
}

static inline int ptr_ring_init(struct ptr_ring *r, int size, gfp_t gfp)
@@ -417,7 +417,8 @@ static inline int ptr_ring_resize(struct
* In particular if you consume ring in interrupt or BH context, you must
* disable interrupts/BH when doing so.
*/
-static inline int ptr_ring_resize_multiple(struct ptr_ring **rings, int nrings,
+static inline int ptr_ring_resize_multiple(struct ptr_ring **rings,
+ unsigned int nrings,
int size,
gfp_t gfp, void (*destroy)(void *))
{
@@ -425,7 +426,7 @@ static inline int ptr_ring_resize_multip

Greg Kroah-Hartman

Aug 28, 2017, 5:00:07 AM
4.9-stable review patch. If anyone has any objections, please let me know.

------------------

From: Masaki Ota <masak...@jp.alps.com>

commit 4a646580f793d19717f7e034c8d473b509c27d49 upstream.

Fixed the issue that two finger scroll does not work correctly
on V8 protocol. The cause is that V8 protocol X-coordinate decode
is wrong at SS4 PLUS device. I added SS4 PLUS X decode definition.

More notes:
the problem was exposed by the commit e7348396c6d5 ("Input: ALPS
- fix V8+ protocol handling (73 03 28)"), where a fix for the V8+
protocol was applied. Although the culprit must have been present
beforehand, the two-finger scroll happened to work even with the
wrongly reported values for some reason. It got broken by the commit
above just because it changed the x_max value, and this made libinput
correctly figure out the MT events. Since the X coord is reported as
falsely doubled, the events on the right-half side go outside the
boundary, thus they are no longer handled. This resulted in a broken
two-finger scroll.

One finger event is decoded differently, and it didn't suffer from
this problem. The problem was only about MT events. --tiwai
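
The "falsely doubled" X coordinate can be illustrated with a small
userspace sketch. The PLUS decode below mirrors the SS4_PLUS_STD_MF_X_V2
macro added by this patch. The non-PLUS decode is an assumption: only its
second term is visible in the diff context, and the first mask (0x00E0) is
filled in from memory of alps.h, so it should be checked against the tree.
The demo_* names are hypothetical.

```c
/*
 * Decode assumed for non-PLUS SS4 devices (shift by 5).
 * The 0x00E0 mask is an assumption, not taken from the diff above.
 */
#define DEMO_STD_MF_X(_b, _i)      ((((_b)[0 + (_i) * 3] << 5) & 0x00E0) | \
                                    (((_b)[1 + (_i) * 3] << 5) & 0x1F00))

/* Decode added by the patch for SS4 PLUS devices (shift by 4). */
#define DEMO_PLUS_STD_MF_X(_b, _i) ((((_b)[0 + (_i) * 3] << 4) & 0x0070) | \
                                    (((_b)[1 + (_i) * 3] << 4) & 0x0F80))

/* Decode the X coordinate of one finger from a multi-finger packet. */
static unsigned int demo_decode_x(const unsigned char *pkt, int finger,
                                  int is_plus)
{
    return is_plus ? (unsigned int)DEMO_PLUS_STD_MF_X(pkt, finger)
                   : (unsigned int)DEMO_STD_MF_X(pkt, finger);
}
```

Under that assumption, the same packet bytes decode to exactly twice the
value with the non-PLUS macro, which matches the description of events on
the right half falling outside the boundary.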

Fixes: e7348396c6d5 ("Input: ALPS - fix V8+ protocol handling (73 03 28)")
Signed-off-by: Masaki Ota <masak...@jp.alps.com>
Tested-by: Takashi Iwai <ti...@suse.de>
Tested-by: Paul Donohue <linux-...@PaulSD.com>
Signed-off-by: Takashi Iwai <ti...@suse.de>
Signed-off-by: Dmitry Torokhov <dmitry....@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/input/mouse/alps.c | 41 +++++++++++++++++++++++++++++++----------
drivers/input/mouse/alps.h | 8 ++++++++
2 files changed, 39 insertions(+), 10 deletions(-)

--- a/drivers/input/mouse/alps.c
+++ b/drivers/input/mouse/alps.c
@@ -1212,14 +1212,24 @@ static int alps_decode_ss4_v2(struct alp
@@ -1236,16 +1246,27 @@ static int alps_decode_ss4_v2(struct alp
@@ -2535,8 +2556,8 @@ static int alps_set_defaults_ss4_v2(stru

memset(otp, 0, sizeof(otp));

- if (alps_get_otp_values_ss4_v2(psmouse, 0, &otp[0][0]) ||
- alps_get_otp_values_ss4_v2(psmouse, 1, &otp[1][0]))
+ if (alps_get_otp_values_ss4_v2(psmouse, 1, &otp[1][0]) ||
+ alps_get_otp_values_ss4_v2(psmouse, 0, &otp[0][0]))
return -1;

alps_update_device_area_ss4_v2(otp, priv);
--- a/drivers/input/mouse/alps.h
+++ b/drivers/input/mouse/alps.h
@@ -91,6 +91,10 @@ enum SS4_PACKET_ID {
((_b[1 + _i * 3] << 5) & 0x1F00) \
)

+#define SS4_PLUS_STD_MF_X_V2(_b, _i) (((_b[0 + (_i) * 3] << 4) & 0x0070) | \
+ ((_b[1 + (_i) * 3] << 4) & 0x0F80) \
+ )
+
#define SS4_STD_MF_Y_V2(_b, _i) (((_b[1 + (_i) * 3] << 3) & 0x0010) | \
((_b[2 + (_i) * 3] << 5) & 0x01E0) | \
((_b[2 + (_i) * 3] << 4) & 0x0E00) \
@@ -100,6 +104,10 @@ enum SS4_PACKET_ID {

Greg Kroah-Hartman

Aug 28, 2017, 5:10:07 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <ros...@goodmis.org>

commit a8f0f9e49956a74718874b800251455680085600 upstream.

There's a small race between function graph shutting down and the calling
of the registered function graph entry callback. The callback must not
reference the task's ret_stack without first checking that it is not NULL.
Note, when a ret_stack is allocated for a task, it stays allocated until
the task exits. The problem here is that function_graph is shut down, and
a new task was created, which doesn't have its ret_stack allocated. But
since some of the functions are still being traced, the callbacks can
still be called.

The normal function_graph code handles this, but starting with commit
8861dd303c ("ftrace: Access ret_stack->subtime only in the function
profiler") the profiler code references the ret_stack on function entry, but
doesn't check if it is NULL first.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=196611

Fixes: 8861dd303c ("ftrace: Access ret_stack->subtime only in the function profiler")
Reported-by: lily...@gmail.com
Signed-off-by: Steven Rostedt (VMware) <ros...@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
kernel/trace/ftrace.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -878,6 +878,10 @@ static int profile_graph_entry(struct ft

function_profile_call(trace->func, 0, NULL, NULL);

+ /* If function graph is shutting down, ret_stack can be NULL */
+ if (!current->ret_stack)
+ return 0;
+
if (index >= 0 && index < FTRACE_RETFUNC_DEPTH)
current->ret_stack[index].subtime = 0;

Greg Kroah-Hartman

Aug 28, 2017, 5:10:07 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Logan Gunthorpe <log...@deltatee.com>

commit 0eb46345364d7318b11068c46e8a68d5dc10f65e upstream.

After the link tests, there is a race on one side of the test for
the link coming up. It's possible, in some cases, for the test script
to write to the 'peer_trans' files before the link has come up.

To fix this, we simply use the link event file to ensure both sides
see the link as up before continuing.

Signed-off-by: Logan Gunthorpe <log...@deltatee.com>
Acked-by: Allen Hubbe <Allen...@dell.com>
Signed-off-by: Jon Mason <jdm...@kudzu.us>
Fixes: a9c59ef77458 ("ntb_test: Add a selftest script for the NTB subsystem")
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
tools/testing/selftests/ntb/ntb_test.sh | 4 ++++
1 file changed, 4 insertions(+)

--- a/tools/testing/selftests/ntb/ntb_test.sh
+++ b/tools/testing/selftests/ntb/ntb_test.sh
@@ -326,6 +326,10 @@ function ntb_tool_tests()
link_test $LOCAL_TOOL $REMOTE_TOOL
link_test $REMOTE_TOOL $LOCAL_TOOL

+ #Ensure the link is up on both sides before continuing
+ write_file Y $LOCAL_TOOL/link_event
+ write_file Y $REMOTE_TOOL/link_event
+
for PEER_TRANS in $(ls $LOCAL_TOOL/peer_trans*); do
PT=$(basename $PEER_TRANS)
write_file $MW_SIZE $LOCAL_TOOL/$PT

Greg Kroah-Hartman

Aug 28, 2017, 5:10:07 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Srinivas Pandruvada <srinivas....@linux.intel.com>

commit f1664eaacec31035450132c46ed2915fd2b2049a upstream.

It has been reported for a while that with iio-sensor-proxy service the
rotation only works after one suspend/resume cycle. This required a wait
in the systemd unit file to avoid the race. I found a Yoga 900 where I could
reproduce this.

The problem scenario is:
- During sensor driver init, enable run time PM and also set an
auto-suspend for 3 seconds.
This results in one runtime resume. There is a check to avoid
a powerup in this sequence, but rpm is active
- User space iio-sensor-proxy tries to power up the sensor. Since rpm is
active it will simply return. But sensors were not actually
powered up in the prior sequence, so the sensors will not work
- After 3 seconds the auto suspend kicks in

If we add a wait in systemd service file to fire iio-sensor-proxy after
3 seconds, then everything will work as the runtime resume will
actually powerup the sensor as this is a user request.

To avoid this:
- Remove the check to match user requested state. This will cause a
brief powerup, but if the iio-sensor-proxy starts immediately it will
still work as the sensors are ON.
- Also move the autosuspend setup to the place where the user requests
turning off the sensors, such as after the user finishes a raw read or
disables the buffer

Signed-off-by: Srinivas Pandruvada <srinivas....@linux.intel.com>
Tested-by: Bastien Nocera <had...@hadess.net>
Signed-off-by: Jonathan Cameron <Jonathan...@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/iio/common/hid-sensors/hid-sensor-trigger.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/iio/common/hid-sensors/hid-sensor-trigger.c
+++ b/drivers/iio/common/hid-sensors/hid-sensor-trigger.c
@@ -36,8 +36,6 @@ static int _hid_sensor_power_state(struc
s32 poll_value = 0;

if (state) {
- if (!atomic_read(&st->user_requested_state))
- return 0;
if (sensor_hub_device_open(st->hsdev))
return -EIO;

@@ -86,6 +84,9 @@ static int _hid_sensor_power_state(struc
&report_val);
}

+ pr_debug("HID_SENSOR %s set power_state %d report_state %d\n",
+ st->pdev->name, state_val, report_val);
+
sensor_hub_get_feature(st->hsdev, st->power_state.report_id,
st->power_state.index,
sizeof(state_val), &state_val);
@@ -107,6 +108,7 @@ int hid_sensor_power_state(struct hid_se
ret = pm_runtime_get_sync(&st->pdev->dev);
else {
pm_runtime_mark_last_busy(&st->pdev->dev);
+ pm_runtime_use_autosuspend(&st->pdev->dev);
ret = pm_runtime_put_autosuspend(&st->pdev->dev);
}
if (ret < 0) {
@@ -205,8 +207,6 @@ int hid_sensor_setup_trigger(struct iio_
/* Default to 3 seconds, but can be changed from sysfs */
pm_runtime_set_autosuspend_delay(&attrb->pdev->dev,
3000);
- pm_runtime_use_autosuspend(&attrb->pdev->dev);
-
return ret;
error_unreg_trigger:
iio_trigger_unregister(trig);

Greg Kroah-Hartman

Aug 28, 2017, 5:10:07 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.ca...@oracle.com>

commit 147d88e0b5eb90191bc5c12ca0a3c410b75a13d2 upstream.

If ring_buffer_alloc() or one of the next couple function calls fail
then we should return -ENOMEM but the current code returns success.

Link: http://lkml.kernel.org/r/20170801110201.ajdkct7vwzixahvx@mwanda

Cc: Sebastian Andrzej Siewior <big...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Fixes: b32614c03413 ('tracing/rb: Convert to hotplug state machine')
Signed-off-by: Dan Carpenter <dan.ca...@oracle.com>
Signed-off-by: Steven Rostedt (VMware) <ros...@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
kernel/trace/trace.c | 1 +
1 file changed, 1 insertion(+)

--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -8110,6 +8110,7 @@ __init static int tracer_alloc_buffers(v
if (ret < 0)
goto out_free_cpumask;
/* Used for event triggers */
+ ret = -ENOMEM;
temp_buffer = ring_buffer_alloc(PAGE_SIZE, RB_FL_OVERWRITE);
if (!temp_buffer)
goto out_rm_hp_state;
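
The one-line fix above follows a common pattern: preset the error code before a step that can fail, so the shared goto exit path returns it. A minimal userspace sketch, assuming a hypothetical init_buffers() whose allocation can be forced to fail:

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical illustration, not the kernel function. fail_alloc forces
 * the allocation to fail so the error path can be exercised.
 */
int init_buffers(int fail_alloc, void **out)
{
	int ret;
	void *buf;

	ret = 0; /* an earlier step succeeded and left ret at 0 */

	/* Without this assignment the failure path below would return 0. */
	ret = -ENOMEM;
	buf = fail_alloc ? NULL : malloc(64);
	if (!buf)
		goto out;

	*out = buf;
	ret = 0;
out:
	return ret;
}
```

If the ret = -ENOMEM line is dropped, the caller sees success even though nothing was allocated, which is the bug the commit message describes.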

Greg Kroah-Hartman

Aug 28, 2017, 5:10:07 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Lorenzo Bianconi <lorenzo.b...@gmail.com>

commit 8b35a5f87a73842601cd376e0f5b9b25831390f4 upstream.

Remove IRQ active low support for LSM303AGR since the sensor does not
support that capability for data-ready line

Fixes: a9fd053b56c6 (iio: st_sensors: support active-low interrupts)
Signed-off-by: Lorenzo Bianconi <lorenzo....@st.com>
Reviewed-by: Linus Walleij <linus....@linaro.org>
Signed-off-by: Jonathan Cameron <Jonathan...@huawei.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/iio/magnetometer/st_magn_core.c | 2 --
1 file changed, 2 deletions(-)

--- a/drivers/iio/magnetometer/st_magn_core.c
+++ b/drivers/iio/magnetometer/st_magn_core.c
@@ -356,8 +356,6 @@ static const struct st_sensor_settings s
.drdy_irq = {
.addr = 0x62,
.mask_int1 = 0x01,
- .addr_ihl = 0x63,
- .mask_ihl = 0x04,

Greg Kroah-Hartman

Aug 28, 2017, 5:10:07 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mateusz Jurczyk <mjur...@google.com>

commit f55ce7b024090a51382ccab2730b96e2f7b4e9cf upstream.

Verify that the length of the socket buffer is sufficient to cover the
nlmsghdr structure before accessing the nlh->nlmsg_len field for further
input sanitization. If the client only supplies 1-3 bytes of data in
sk_buff, then nlh->nlmsg_len remains partially uninitialized and
contains leftover memory from the corresponding kernel allocation.
Operating on such data may result in indeterminate evaluation of the
nlmsg_len < NLMSG_HDRLEN expression.

The bug was discovered by a runtime instrumentation designed to detect
use of uninitialized memory in the kernel. The patch prevents this and
other similar tools (e.g. KMSAN) from flagging this behavior in the future.

Signed-off-by: Mateusz Jurczyk <mjur...@google.com>
Signed-off-by: Pablo Neira Ayuso <pa...@netfilter.org>
Cc: Florian Westphal <f...@strlen.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
net/netfilter/nfnetlink.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)

--- a/net/netfilter/nfnetlink.c
+++ b/net/netfilter/nfnetlink.c
@@ -463,8 +463,7 @@ static void nfnetlink_rcv_skb_batch(stru
if (msglen > skb->len)
msglen = skb->len;

- if (nlh->nlmsg_len < NLMSG_HDRLEN ||
- skb->len < NLMSG_HDRLEN + sizeof(struct nfgenmsg))
+ if (skb->len < NLMSG_HDRLEN + sizeof(struct nfgenmsg))
return;

err = nla_parse(cda, NFNL_BATCH_MAX, attr, attrlen, nfnl_batch_policy,
@@ -491,7 +490,8 @@ static void nfnetlink_rcv(struct sk_buff
{
struct nlmsghdr *nlh = nlmsg_hdr(skb);

- if (nlh->nlmsg_len < NLMSG_HDRLEN ||
+ if (skb->len < NLMSG_HDRLEN ||
+ nlh->nlmsg_len < NLMSG_HDRLEN ||
skb->len < nlh->nlmsg_len)
return;
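
The ordering of the checks in nfnetlink_rcv() can be illustrated outside the kernel. The sketch below is an assumption-laden stand-in: struct msg_hdr, HDRLEN and msg_ok() are hypothetical names that mirror the shape of a netlink header, not the real definitions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical header with the same shape as a netlink message header. */
struct msg_hdr {
	uint32_t len;
	uint16_t type;
	uint16_t flags;
	uint32_t seq;
	uint32_t pid;
};

#define HDRLEN sizeof(struct msg_hdr)

/* Returns 1 if the buffer holds a complete, self-consistent message. */
int msg_ok(const unsigned char *buf, size_t buflen)
{
	struct msg_hdr hdr;

	/* First make sure the whole header was supplied at all. */
	if (buflen < HDRLEN)
		return 0;

	/* Only now is it safe to read the length field. */
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.len < HDRLEN)
		return 0;

	/* The claimed length must not exceed what was received. */
	if (buflen < hdr.len)
		return 0;

	return 1;
}
```

With only 1-3 bytes supplied, the first test rejects the buffer before the length field is ever looked at.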

Greg Kroah-Hartman

Aug 28, 2017, 5:10:07 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Jiang <dave....@intel.com>

commit f3fd2afed8eee91620d05b69ab94c14793c849d7 upstream.

It seems that under certain scenarios the SPAD can have bogus values caused
by an agent (i.e. BIOS or other software) that is not the kernel driver, and
that causes memory window setup failure. This should not cause the link to
be disabled because if we do that, the driver will never recover again. We
have verified in testing that this issue happens and prevents proper link
recovery.

Signed-off-by: Dave Jiang <dave....@intel.com>
Acked-by: Allen Hubbe <Allen...@dell.com>
Signed-off-by: Jon Mason <jdm...@kudzu.us>
Fixes: 84f766855f61 ("ntb: stop link work when we do not have memory")
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/ntb/ntb_transport.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

--- a/drivers/ntb/ntb_transport.c
+++ b/drivers/ntb/ntb_transport.c
@@ -920,10 +920,8 @@ out1:
ntb_free_mw(nt, i);

/* if there's an actual failure, we should just bail */
- if (rc < 0) {
- ntb_link_disable(ndev);
+ if (rc < 0)
return;
- }

out:
if (ntb_link_is_up(ndev, NULL, NULL) == 1)

Greg Kroah-Hartman

Aug 28, 2017, 5:10:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torv...@linux-foundation.org>

commit 0cc3b0ec23ce4c69e1e890ed2b8d2fa932b14aad upstream.

We have a MAX_LFS_FILESIZE macro that is meant to be filled in by
filesystems (and other IO targets) that know they are 64-bit clean and
don't have any 32-bit limits in their IO path.

It turns out that our 32-bit value for that limit was bogus. On 32-bit,
the VM layer is limited by the page cache to only 32-bit index values,
but our logic for that was confusing and actually wrong. We used to
define that value to

(((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)

which is actually odd in several ways: it limits the index to 31 bits,
and then it limits files so that they can't have data in that last byte
of a page that has the highest 31-bit index (ie page index 0x7fffffff).

Neither of those limitations makes sense. The index is actually the full
32 bit unsigned value, and we can use that whole full page. So the
maximum size of the file would logically be "PAGE_SIZE << BITS_PER_LONG".

However, we do want to avoid the maximum index, because we have code
that iterates over the page indexes, and we don't want that code to
overflow. So the maximum size of a file on a 32-bit host should
actually be one page less than the full 32-bit index.

So the actual limit is ULONG_MAX << PAGE_SHIFT. That means that we will
not actually be using the page of that last index (ULONG_MAX), but we
can grow a file up to that limit.

The wrong value of MAX_LFS_FILESIZE actually caused problems for Doug
Nazar, who was still using a 32-bit host, but with a 9.7TB 2 x RAID5
volume. It turns out that our old MAX_LFS_FILESIZE was 8TiB (well, one
byte less), but the actual true VM limit is one page less than 16TiB.

This was invisible until commit c2a9737f45e2 ("vfs,mm: fix a dead loop
in truncate_inode_pages_range()"), which started applying that
MAX_LFS_FILESIZE limit to block devices too.

NOTE! On 64-bit, the page index isn't a limiter at all, and the limit is
actually just the offset type itself (loff_t), which is signed. But for
clarity, on 64-bit, just use the maximum signed value, and don't make
people have to count the number of 'f' characters in the hex constant.

So just use LLONG_MAX for the 64-bit case. That was what the value had
been before too, just written out as a hex constant.

Fixes: c2a9737f45e2 ("vfs,mm: fix a dead loop in truncate_inode_pages_range()")
Reported-and-tested-by: Doug Nazar <naz...@nazar.ca>
Cc: Andreas Dilger <adi...@dilger.ca>
Cc: Mark Fasheh <mfa...@versity.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Dave Kleikamp <sha...@kernel.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
include/linux/fs.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -891,9 +891,9 @@ static inline struct file *get_file(stru
/* Page cache limit. The filesystems should put that into their s_maxbytes
limits, otherwise bad things can happen in VM. */
#if BITS_PER_LONG==32
-#define MAX_LFS_FILESIZE (((loff_t)PAGE_SIZE << (BITS_PER_LONG-1))-1)
+#define MAX_LFS_FILESIZE ((loff_t)ULONG_MAX << PAGE_SHIFT)
#elif BITS_PER_LONG==64
-#define MAX_LFS_FILESIZE ((loff_t)0x7fffffffffffffffLL)
+#define MAX_LFS_FILESIZE ((loff_t)LLONG_MAX)
#endif

#define FL_POSIX 1
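
The 8TiB and 16TiB figures in the changelog can be checked with a few lines of C. The sketch assumes a 32-bit host with 4KiB pages; the _32 suffixed macros and both function names are made up for the illustration and use fixed 64-bit arithmetic so the result does not depend on the build machine.

```c
#include <stdint.h>

/* Assumed 32-bit configuration with 4KiB pages (hypothetical names). */
#define PAGE_SHIFT_32    12
#define PAGE_SIZE_32     (1ULL << PAGE_SHIFT_32)
#define BITS_PER_LONG_32 32
#define ULONG_MAX_32     0xffffffffULL

/* Old definition: ((loff_t)PAGE_SIZE << (BITS_PER_LONG-1)) - 1 */
int64_t old_limit(void)
{
	return (int64_t)((PAGE_SIZE_32 << (BITS_PER_LONG_32 - 1)) - 1);
}

/* New definition: (loff_t)ULONG_MAX << PAGE_SHIFT */
int64_t new_limit(void)
{
	return (int64_t)(ULONG_MAX_32 << PAGE_SHIFT_32);
}
```

Under these assumptions old_limit() is 2^43 - 1 (one byte less than 8TiB) and new_limit() is 2^44 - 4096 (one page less than 16TiB), matching the numbers quoted above.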

Greg Kroah-Hartman

Aug 28, 2017, 5:10:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeffy Chen <jeffy...@rock-chips.com>

commit 25717382c1dd0ddced2059053e3ca5088665f7a5 upstream.

It looks like bnep_session has the same pattern as the issue reported in
old rfcomm:

while (1) {
set_current_state(TASK_INTERRUPTIBLE);
if (condition)
break;
// may call might_sleep here
schedule();
}
__set_current_state(TASK_RUNNING);

Which was fixed at:
dfb2fae Bluetooth: Fix nested sleeps

So let's fix it the same way, and also follow the suggestion of:
https://lwn.net/Articles/628628/

Signed-off-by: Jeffy Chen <jeffy...@rock-chips.com>
Reviewed-by: Brian Norris <brian...@chromium.org>
Reviewed-by: AL Yu-Chen Cho <ac...@suse.com>
Signed-off-by: Marcel Holtmann <mar...@holtmann.org>
Cc: Jiri Slaby <jsl...@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
net/bluetooth/bnep/core.c | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)

--- a/net/bluetooth/bnep/core.c
+++ b/net/bluetooth/bnep/core.c
@@ -484,16 +484,16 @@ static int bnep_session(void *arg)
struct net_device *dev = s->dev;
struct sock *sk = s->sock->sk;
struct sk_buff *skb;
- wait_queue_t wait;
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);

BT_DBG("");

set_user_nice(current, -15);

- init_waitqueue_entry(&wait, current);
add_wait_queue(sk_sleep(sk), &wait);
while (1) {
- set_current_state(TASK_INTERRUPTIBLE);
+ /* Ensure session->terminate is updated */
+ smp_mb__before_atomic();

if (atomic_read(&s->terminate))
break;
@@ -515,9 +515,8 @@ static int bnep_session(void *arg)
break;
netif_wake_queue(dev);

- schedule();
+ wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
- __set_current_state(TASK_RUNNING);
remove_wait_queue(sk_sleep(sk), &wait);

/* Cleanup session */
@@ -666,7 +665,7 @@ int bnep_del_connection(struct bnep_conn
s = __bnep_get_session(req->dst);
if (s) {
atomic_inc(&s->terminate);
- wake_up_process(s->task);
+ wake_up_interruptible(sk_sleep(s->sock->sk));
} else
err = -ENOENT;

Greg Kroah-Hartman

Aug 28, 2017, 5:10:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ronnie Sahlberg <lsah...@redhat.com>

commit d3edede29f74d335f81d95a4588f5f136a9f7dcf upstream.

Add checking for the path component length and verify it is <= the maximum
that the server advertises via FileFsAttributeInformation.

With this patch cifs.ko will now return ENAMETOOLONG instead of ENOENT
when users try to access an overlong path.

To test this, try to cd into a (non-existing) directory on a CIFS share
that has a too long name:
cd /mnt/aaaaaaaaaaaaaaa...

and it now should show a good error message from the shell:
bash: cd: /mnt/aaaaaaaaaaaaaaaa...aaaaaa: File name too long

rh bz 1153996

Signed-off-by: Ronnie Sahlberg <lsah...@redhat.com>
Signed-off-by: Steve French <smfr...@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
fs/cifs/dir.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)

--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -194,15 +194,20 @@ cifs_bp_rename_retry:
}

/*
+ * Don't allow path components longer than the server max.
* Don't allow the separator character in a path component.
* The VFS will not allow "/", but "\" is allowed by posix.
*/
static int
-check_name(struct dentry *direntry)
+check_name(struct dentry *direntry, struct cifs_tcon *tcon)
{
struct cifs_sb_info *cifs_sb = CIFS_SB(direntry->d_sb);
int i;

+ if (unlikely(direntry->d_name.len >
+ tcon->fsAttrInfo.MaxPathNameComponentLength))
+ return -ENAMETOOLONG;
+
if (!(cifs_sb->mnt_cifs_flags & CIFS_MOUNT_POSIX_PATHS)) {
for (i = 0; i < direntry->d_name.len; i++) {
if (direntry->d_name.name[i] == '\\') {
@@ -500,10 +505,6 @@ cifs_atomic_open(struct inode *inode, st
return finish_no_open(file, res);
}

- rc = check_name(direntry);
- if (rc)
- return rc;
-
xid = get_xid();

cifs_dbg(FYI, "parent inode = 0x%p name is: %pd and dentry = 0x%p\n",
@@ -516,6 +517,11 @@ cifs_atomic_open(struct inode *inode, st
}

tcon = tlink_tcon(tlink);
+
+ rc = check_name(direntry, tcon);
+ if (rc)
+ goto out_free_xid;
+
server = tcon->ses->server;

if (server->ops->new_lease_key)
@@ -776,7 +782,7 @@ cifs_lookup(struct inode *parent_dir_ino
}
pTcon = tlink_tcon(tlink);

- rc = check_name(direntry);
+ rc = check_name(direntry, pTcon);
if (rc)
goto lookup_out;
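
A rough userspace sketch of the check order in the patched check_name() follows. check_component(), its parameters, and the -EINVAL return for a backslash are assumptions made for the illustration; the hunk above does not show what the kernel returns in that branch.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for check_name(): server_max plays the role of
 * MaxPathNameComponentLength, posix_paths the role of the mount flag.
 */
int check_component(const char *name, size_t server_max, int posix_paths)
{
	size_t len = strlen(name);
	size_t i;

	/* Length check first, so an overlong name reports ENAMETOOLONG. */
	if (len > server_max)
		return -ENAMETOOLONG;

	/* Without POSIX paths the separator must not appear in a component. */
	if (!posix_paths) {
		for (i = 0; i < len; i++) {
			if (name[i] == '\\')
				return -EINVAL; /* assumed error code */
		}
	}

	return 0;
}
```

This is why the call had to move after the tcon lookup in cifs_atomic_open(): the limit comes from the server, so it is not known before the tree connection is.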

Greg Kroah-Hartman

Aug 28, 2017, 5:10:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Jeffy Chen <jeffy...@rock-chips.com>

commit f06d977309d09253c744e54e75c5295ecc52b7b4 upstream.

It looks like cmtp_session has the same pattern as the issue reported in
old rfcomm:

while (1) {
set_current_state(TASK_INTERRUPTIBLE);
if (condition)
break;
// may call might_sleep here
schedule();
}
__set_current_state(TASK_RUNNING);

Which was fixed at:
dfb2fae Bluetooth: Fix nested sleeps

So let's fix it the same way, and also follow the suggestion of:
https://lwn.net/Articles/628628/

Signed-off-by: Jeffy Chen <jeffy...@rock-chips.com>
Reviewed-by: Brian Norris <brian...@chromium.org>
Reviewed-by: AL Yu-Chen Cho <ac...@suse.com>
Signed-off-by: Marcel Holtmann <mar...@holtmann.org>
Cc: Jiri Slaby <jsl...@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
net/bluetooth/cmtp/core.c | 17 ++++++++++-------
1 file changed, 10 insertions(+), 7 deletions(-)

--- a/net/bluetooth/cmtp/core.c
+++ b/net/bluetooth/cmtp/core.c
@@ -280,16 +280,16 @@ static int cmtp_session(void *arg)
struct cmtp_session *session = arg;
struct sock *sk = session->sock->sk;
struct sk_buff *skb;
- wait_queue_t wait;
+ DEFINE_WAIT_FUNC(wait, woken_wake_function);

BT_DBG("session %p", session);

set_user_nice(current, -15);

- init_waitqueue_entry(&wait, current);
add_wait_queue(sk_sleep(sk), &wait);
while (1) {
- set_current_state(TASK_INTERRUPTIBLE);
+ /* Ensure session->terminate is updated */
+ smp_mb__before_atomic();

if (atomic_read(&session->terminate))
break;
@@ -306,9 +306,8 @@ static int cmtp_session(void *arg)

cmtp_process_transmit(session);

- schedule();
+ wait_woken(&wait, TASK_INTERRUPTIBLE, MAX_SCHEDULE_TIMEOUT);
}
- __set_current_state(TASK_RUNNING);
remove_wait_queue(sk_sleep(sk), &wait);

down_write(&cmtp_session_sem);
@@ -393,7 +392,7 @@ int cmtp_add_connection(struct cmtp_conn
err = cmtp_attach_device(session);
if (err < 0) {
atomic_inc(&session->terminate);
- wake_up_process(session->task);
+ wake_up_interruptible(sk_sleep(session->sock->sk));
up_write(&cmtp_session_sem);
return err;
}
@@ -431,7 +430,11 @@ int cmtp_del_connection(struct cmtp_conn

/* Stop session thread */
atomic_inc(&session->terminate);
- wake_up_process(session->task);
+
+ /* Ensure session->terminate is updated */
+ smp_mb__after_atomic();
+
+ wake_up_interruptible(sk_sleep(session->sock->sk));
} else
err = -ENOENT;

Greg Kroah-Hartman

Aug 28, 2017, 5:10:09 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Todd Kjos <tk...@android.com>

commit c4ea41ba195d01c9af66fb28711a16cc97caa9c5 upstream.

The binder allocator assumes that the thread that
called binder_open will never die for the lifetime of
that proc. That thread is normally the group_leader,
however it may not be. Use the group_leader instead
of current.

Signed-off-by: Todd Kjos <tk...@google.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/android/binder.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3460,8 +3460,8 @@ static int binder_open(struct inode *nod
proc = kzalloc(sizeof(*proc), GFP_KERNEL);
if (proc == NULL)
return -ENOMEM;
- get_task_struct(current);
- proc->tsk = current;
+ get_task_struct(current->group_leader);
+ proc->tsk = current->group_leader;
INIT_LIST_HEAD(&proc->todo);
init_waitqueue_head(&proc->wait);
proc->default_priority = task_nice(current);

Greg Kroah-Hartman

Aug 28, 2017, 5:10:09 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Martijn Coenen <ma...@android.com>

commit b2a6d1b999a4c13e5997bb864694e77172d45250 upstream.

Commit c4ea41ba195d ("binder: use group leader instead of open thread")
was incomplete and didn't update a check in binder_mmap(), causing all
mmap() calls into the binder driver to fail.

Signed-off-by: Martijn Coenen <ma...@android.com>
Tested-by: John Stultz <john....@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/android/binder.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/android/binder.c
+++ b/drivers/android/binder.c
@@ -3362,7 +3362,7 @@ static int binder_mmap(struct file *filp
const char *failure_string;
struct binder_buffer *buffer;

- if (proc->tsk != current)
+ if (proc->tsk != current->group_leader)
return -EINVAL;

if ((vma->vm_end - vma->vm_start) > SZ_4M)

Greg Kroah-Hartman

Aug 28, 2017, 5:10:09 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Hanjun Guo <hanju...@linaro.org>

commit f7f3dd5b4cbb138ed4559b0d096bab76a8f476de upstream.

The ACPI HIDs for Hisilicon Hip07/08 should be HISI02A1/2,
not HISI0A21/2. HISI02A1/2 was tested ok, but was changed
by a stupid typo when upstreaming the patches (by me).
Correct them to the right IDs (matching the IDs in
drivers/i2c/busses/i2c-designware-platdrv.c).

Fixes: 6e14cf361a0c (ACPI / APD: Add clock frequency for Hisilicon Hip07/08 I2C controller)
Reported-by: Tao Tian <tian...@huawei.com>
Signed-off-by: Hanjun Guo <hanju...@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j...@intel.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/acpi/acpi_apd.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/drivers/acpi/acpi_apd.c
+++ b/drivers/acpi/acpi_apd.c
@@ -180,8 +180,8 @@ static const struct acpi_device_id acpi_
{ "APMC0D0F", APD_ADDR(xgene_i2c_desc) },
{ "BRCM900D", APD_ADDR(vulcan_spi_desc) },
{ "CAV900D", APD_ADDR(vulcan_spi_desc) },
- { "HISI0A21", APD_ADDR(hip07_i2c_desc) },
- { "HISI0A22", APD_ADDR(hip08_i2c_desc) },
+ { "HISI02A1", APD_ADDR(hip07_i2c_desc) },
+ { "HISI02A2", APD_ADDR(hip08_i2c_desc) },
#endif
{ }
};

Greg Kroah-Hartman

Aug 28, 2017, 5:10:09 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <ros...@goodmis.org>

commit a7e52ad7ed82e21273eccff93d1477a7b313aabb upstream.

Chunyu Hu reported:
"per_cpu trace directories and files are created for all possible cpus,
but only the cpus which have ever been on-lined have their own per cpu
ring buffer (allocated by cpuhp threads). While trace_buffers_open, the
open handler for trace file 'trace_pipe_raw' is always trying to access
field of ring_buffer_per_cpu, and would panic with the NULL pointer.

Align the behavior of trace_pipe_raw with trace_pipe, which returns -ENODEV
when opening it if that cpu does not have a trace ring buffer.

Reproduce:
cat /sys/kernel/debug/tracing/per_cpu/cpu31/trace_pipe_raw
(cpu31 is never on-lined, this is a 16 cores x86_64 box)

Tested with:
1) boot with maxcpus=14, read trace_pipe_raw of cpu15.
Got -ENODEV.
2) online cpu15, read trace_pipe_raw of cpu15.
Get the raw trace data.

Call trace:
[ 5760.950995] RIP: 0010:ring_buffer_alloc_read_page+0x32/0xe0
[ 5760.961678] tracing_buffers_read+0x1f6/0x230
[ 5760.962695] __vfs_read+0x37/0x160
[ 5760.963498] ? __vfs_read+0x5/0x160
[ 5760.964339] ? security_file_permission+0x9d/0xc0
[ 5760.965451] ? __vfs_read+0x5/0x160
[ 5760.966280] vfs_read+0x8c/0x130
[ 5760.967070] SyS_read+0x55/0xc0
[ 5760.967779] do_syscall_64+0x67/0x150
[ 5760.968687] entry_SYSCALL64_slow_path+0x25/0x25"

This was introduced by the addition of the feature to reuse reader pages
instead of re-allocating them. The problem is that the allocation of a
reader page (which is per cpu) does not check if the cpu is online and set
up for the ring buffer.

Link: http://lkml.kernel.org/r/1500880866-1177-1-g...@redhat.com

Fixes: 73a757e63114 ("ring-buffer: Return reader page back into existing ring buffer")
Reported-by: Chunyu Hu <ch...@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <ros...@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
kernel/trace/ring_buffer.c | 14 +++++++++-----
kernel/trace/ring_buffer_benchmark.c | 2 +-
kernel/trace/trace.c | 16 +++++++++++-----
3 files changed, 21 insertions(+), 11 deletions(-)

--- a/kernel/trace/ring_buffer.c
+++ b/kernel/trace/ring_buffer.c
@@ -4386,15 +4386,19 @@ EXPORT_SYMBOL_GPL(ring_buffer_swap_cpu);
* the page that was allocated, with the read page of the buffer.
*
* Returns:
- * The page allocated, or NULL on error.
+ * The page allocated, or ERR_PTR
*/
void *ring_buffer_alloc_read_page(struct ring_buffer *buffer, int cpu)
{
- struct ring_buffer_per_cpu *cpu_buffer = buffer->buffers[cpu];
+ struct ring_buffer_per_cpu *cpu_buffer;
struct buffer_data_page *bpage = NULL;
unsigned long flags;
struct page *page;

+ if (!cpumask_test_cpu(cpu, buffer->cpumask))
+ return ERR_PTR(-ENODEV);
+
+ cpu_buffer = buffer->buffers[cpu];
local_irq_save(flags);
arch_spin_lock(&cpu_buffer->lock);

@@ -4412,7 +4416,7 @@ void *ring_buffer_alloc_read_page(struct
page = alloc_pages_node(cpu_to_node(cpu),
GFP_KERNEL | __GFP_NORETRY, 0);
if (!page)
- return NULL;
+ return ERR_PTR(-ENOMEM);

bpage = page_address(page);

@@ -4467,8 +4471,8 @@ EXPORT_SYMBOL_GPL(ring_buffer_free_read_
*
* for example:
* rpage = ring_buffer_alloc_read_page(buffer, cpu);
- * if (!rpage)
- * return error;
+ * if (IS_ERR(rpage))
+ * return PTR_ERR(rpage);
* ret = ring_buffer_read_page(buffer, &rpage, len, cpu, 0);
* if (ret >= 0)
* process_page(rpage, ret);
--- a/kernel/trace/ring_buffer_benchmark.c
+++ b/kernel/trace/ring_buffer_benchmark.c
@@ -113,7 +113,7 @@ static enum event_status read_page(int c
int i;

bpage = ring_buffer_alloc_read_page(buffer, cpu);
- if (!bpage)
+ if (IS_ERR(bpage))
return EVENT_DROPPED;

ret = ring_buffer_read_page(buffer, &bpage, PAGE_SIZE, cpu, 1);
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -6403,7 +6403,7 @@ tracing_buffers_read(struct file *filp,
{
struct ftrace_buffer_info *info = filp->private_data;
struct trace_iterator *iter = &info->iter;
- ssize_t ret;
+ ssize_t ret = 0;
ssize_t size;

if (!count)
@@ -6417,10 +6417,15 @@ tracing_buffers_read(struct file *filp,
if (!info->spare) {
info->spare = ring_buffer_alloc_read_page(iter->trace_buffer->buffer,
iter->cpu_file);
- info->spare_cpu = iter->cpu_file;
+ if (IS_ERR(info->spare)) {
+ ret = PTR_ERR(info->spare);
+ info->spare = NULL;
+ } else {
+ info->spare_cpu = iter->cpu_file;
+ }
}
if (!info->spare)
- return -ENOMEM;
+ return ret;

/* Do we have previous read data to read? */
if (info->read < PAGE_SIZE)
@@ -6595,8 +6600,9 @@ tracing_buffers_splice_read(struct file
ref->ref = 1;
ref->buffer = iter->trace_buffer->buffer;
ref->page = ring_buffer_alloc_read_page(ref->buffer, iter->cpu_file);
- if (!ref->page) {
- ret = -ENOMEM;
+ if (IS_ERR(ref->page)) {
+ ret = PTR_ERR(ref->page);
+ ref->page = NULL;
kfree(ref);
break;
}
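
The switch from a NULL return to ERR_PTR lets the caller tell "no such cpu" apart from "out of memory". The kernel helpers are not available in userspace, so the sketch below re-creates the idea with hypothetical lower-case helpers (err_ptr, ptr_err, is_err) and a made-up alloc_read_page(); it assumes, as the kernel does, that the top 4095 addresses are never valid pointers.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ERRNO 4095

/* Userspace re-creations of the ERR_PTR idea (hypothetical names). */
static inline void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static inline int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* online_mask has bit N set when cpu N has a buffer (illustrative only). */
void *alloc_read_page(int cpu, uint64_t online_mask)
{
	void *page;

	if (cpu < 0 || cpu >= 64 || !(online_mask & ((uint64_t)1 << cpu)))
		return err_ptr(-ENODEV);

	page = malloc(4096);
	if (!page)
		return err_ptr(-ENOMEM);

	return page;
}
```

A caller then tests is_err(page) and propagates ptr_err(page) rather than assuming every failure is -ENOMEM, which is what the trace.c hunks above do with IS_ERR() and PTR_ERR().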

Greg Kroah-Hartman

Aug 28, 2017, 5:10:10 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Florian Westphal <f...@strlen.de>

commit 36ac344e16e04e3e55e8fed7446095a6458c64e6 upstream.

We crash in __nf_ct_expect_check: it calls nf_ct_remove_expect on the
uninitialised expectation instead of the existing one, so del_timer chokes
on a random memory address.

Fixes: ec0e3f01114ad32711243 ("netfilter: nf_ct_expect: Add nf_ct_remove_expect()")
Reported-by: Sergey Kvachonok <rave...@gmail.com>
Tested-by: Sergey Kvachonok <rave...@gmail.com>
Cc: Gao Feng <fg...@ikuai8.com>
Signed-off-by: Florian Westphal <f...@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pa...@netfilter.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
net/netfilter/nf_conntrack_expect.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/net/netfilter/nf_conntrack_expect.c
+++ b/net/netfilter/nf_conntrack_expect.c
@@ -422,7 +422,7 @@ static inline int __nf_ct_expect_check(s
h = nf_ct_expect_dst_hash(net, &expect->tuple);
hlist_for_each_entry_safe(i, next, &nf_ct_expect_hash[h], hnode) {
if (expect_matches(i, expect)) {
- if (nf_ct_remove_expect(expect))
+ if (nf_ct_remove_expect(i))
break;
} else if (expect_clash(i, expect)) {
ret = -EBUSY;
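
The bug class here is removing the incoming object instead of the list entry it matched. A small sketch with a hypothetical singly linked list (struct expectation and remove_matching() are illustrative names, not the conntrack structures):

```c
#include <stddef.h>

struct expectation {
	int key;
	struct expectation *next;
};

/*
 * Unlinks the first existing entry whose key matches the incoming one.
 * Returns 1 if an entry was removed, 0 otherwise. The incoming object is
 * only compared against; it was never linked, so it must not be unlinked.
 */
int remove_matching(struct expectation **head,
		    const struct expectation *incoming)
{
	struct expectation **pp = head;

	while (*pp) {
		struct expectation *i = *pp;

		if (i->key == incoming->key) {
			*pp = i->next; /* operate on i, the existing entry */
			i->next = NULL;
			return 1;
		}
		pp = &i->next;
	}

	return 0;
}
```

Passing the incoming object to the removal step, as the pre-patch code did with nf_ct_remove_expect(expect), touches state that was never set up.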

Greg Kroah-Hartman

Aug 28, 2017, 5:10:10 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Lv Zheng <lv.z...@intel.com>

commit 98529b9272e06a7767034fb8a32e43cdecda240a upstream.

Commit 2a5708409e4e (ACPI / EC: Fix a gap that ECDT EC cannot handle
EC events) introduced acpi_ec_ecdt_start(), but that function is
invoked before acpi_ec_query_init(), which is too early. This causes
the kernel to crash if an EC event occurs after boot, when ec_query_wq
is not valid:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000102
...
Workqueue: events acpi_ec_event_handler
task: ffff9f539790dac0 task.stack: ffffb437c0e10000
RIP: 0010:__queue_work+0x32/0x430

Normally, the DSDT EC should always be valid, so acpi_ec_ecdt_start()
is actually a no-op in the majority of cases. However, commit
c712bb58d827 (ACPI / EC: Add support to skip boot stage DSDT probe)
caused the probing of the DSDT EC as the "boot EC" to be skipped when
the ECDT EC is valid and uncovered the bug.

Fix this issue by invoking acpi_ec_ecdt_start() after acpi_ec_query_init()
in acpi_ec_init().

Link: https://jira01.devtools.intel.com/browse/LCK-4348
Fixes: 2a5708409e4e (ACPI / EC: Fix a gap that ECDT EC cannot handle EC events)
Fixes: c712bb58d827 (ACPI / EC: Add support to skip boot stage DSDT probe)
Reported-by: Wang Wendy <wendy...@intel.com>
Tested-by: Feng Chenzhou <chenzho...@intel.com>
Signed-off-by: Lv Zheng <lv.z...@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j...@intel.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/acpi/ec.c | 17 +++++++----------
drivers/acpi/internal.h | 1 -
drivers/acpi/scan.c | 1 -
3 files changed, 7 insertions(+), 12 deletions(-)

--- a/drivers/acpi/ec.c
+++ b/drivers/acpi/ec.c
@@ -1703,7 +1703,7 @@ error:
* functioning ECDT EC first in order to handle the events.
* https://bugzilla.kernel.org/show_bug.cgi?id=115021
*/
-int __init acpi_ec_ecdt_start(void)
+static int __init acpi_ec_ecdt_start(void)
{
acpi_handle handle;

@@ -1906,20 +1906,17 @@ static inline void acpi_ec_query_exit(vo
int __init acpi_ec_init(void)
{
int result;
+ int ecdt_fail, dsdt_fail;

/* register workqueue for _Qxx evaluations */
result = acpi_ec_query_init();
if (result)
- goto err_exit;
- /* Now register the driver for the EC */
- result = acpi_bus_register_driver(&acpi_ec_driver);
- if (result)
- goto err_exit;
+ return result;

-err_exit:
- if (result)
- acpi_ec_query_exit();
- return result;
+ /* Drivers must be started after acpi_ec_query_init() */
+ ecdt_fail = acpi_ec_ecdt_start();
+ dsdt_fail = acpi_bus_register_driver(&acpi_ec_driver);
+ return ecdt_fail && dsdt_fail ? -ENODEV : 0;
}

/* EC driver currently not unloadable */
--- a/drivers/acpi/internal.h
+++ b/drivers/acpi/internal.h
@@ -185,7 +185,6 @@ typedef int (*acpi_ec_query_func) (void
int acpi_ec_init(void);
int acpi_ec_ecdt_probe(void);
int acpi_ec_dsdt_probe(void);
-int acpi_ec_ecdt_start(void);
void acpi_ec_block_transactions(void);
void acpi_ec_unblock_transactions(void);
int acpi_ec_add_query_handler(struct acpi_ec *ec, u8 query_bit,
--- a/drivers/acpi/scan.c
+++ b/drivers/acpi/scan.c
@@ -2085,7 +2085,6 @@ int __init acpi_scan_init(void)

acpi_gpe_apply_masked_gpes();
acpi_update_all_gpes();
- acpi_ec_ecdt_start();

acpi_scan_initialized = true;

Greg Kroah-Hartman

Aug 28, 2017, 5:10:11 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chuck Lever <chuck...@oracle.com>

commit fc788f64f1f3eb31e87d4f53bcf1ab76590d5838 upstream.

When processing an NFSv4 WRITE operation, argp->end should never
point past the end of the data in the final page of the page list.
Otherwise, nfsd4_decode_compound can walk into uninitialized memory.

More critically, nfsd4_decode_write is failing to increment argp->pagelen
when it increments argp->pagelist. This can cause later xdr decoders
to assume more data is available than really is, which can cause server
crashes on malformed requests.
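
A user-space sketch of the bookkeeping may help here. It assumes a 4096-byte page and uses hypothetical names (struct model_args, model_next_page); QUADLEN mirrors the kernel's XDR_QUADLEN(), which rounds a byte count up to 32-bit words. The point it illustrates is that the cursor, the end pointer and the remaining length are updated together in one helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u
#define QUADLEN(l) (((l) + 3) >> 2)	/* bytes rounded up to 32-bit words */

struct model_args {
	uint32_t *p;		/* decode cursor */
	uint32_t *end;		/* one past the last decodable word */
	uint32_t **pagelist;	/* remaining pages */
	size_t pagelen;		/* bytes remaining in the page list */
};

/* Advance to the next page, keeping p, end and pagelen consistent. */
void model_next_page(struct model_args *a)
{
	assert(a && a->pagelist);
	a->p = a->pagelist[0];
	a->pagelist++;
	if (a->pagelen < MODEL_PAGE_SIZE) {
		a->end = a->p + QUADLEN(a->pagelen);
		a->pagelen = 0;
	} else {
		a->end = a->p + (MODEL_PAGE_SIZE >> 2);
		a->pagelen -= MODEL_PAGE_SIZE;
	}
}
```

With one full page plus six trailing bytes, the first call leaves pagelen at 6 and the second call limits the window to two words, so a later decoder cannot assume a full page of data is present.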

Signed-off-by: Chuck Lever <chuck...@oracle.com>
Signed-off-by: J. Bruce Fields <bfi...@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
fs/nfsd/nfs4xdr.c | 6 ++----
1 file changed, 2 insertions(+), 4 deletions(-)

--- a/fs/nfsd/nfs4xdr.c
+++ b/fs/nfsd/nfs4xdr.c
@@ -144,7 +144,7 @@ static void next_decode_page(struct nfsd
argp->p = page_address(argp->pagelist[0]);
argp->pagelist++;
if (argp->pagelen < PAGE_SIZE) {
- argp->end = argp->p + (argp->pagelen>>2);
+ argp->end = argp->p + XDR_QUADLEN(argp->pagelen);
argp->pagelen = 0;
} else {
argp->end = argp->p + (PAGE_SIZE>>2);
@@ -1279,9 +1279,7 @@ nfsd4_decode_write(struct nfsd4_compound
argp->pagelen -= pages * PAGE_SIZE;
len -= pages * PAGE_SIZE;

- argp->p = (__be32 *)page_address(argp->pagelist[0]);
- argp->pagelist++;
- argp->end = argp->p + XDR_QUADLEN(PAGE_SIZE);
+ next_decode_page(argp);
}
argp->p += XDR_QUADLEN(len);

Greg Kroah-Hartman

Aug 28, 2017, 5:10:11 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Joakim Tjernlund <joakim.t...@infinera.com>

commit 07b3b5e9ed807a0d2077319b8e43a42e941db818 upstream.

These headsets report a lot of: cannot set freq 44100 to ep 0x81
and need a small delay between sample rate settings, just like
Zoom R16/24. Add both headsets to the Zoom R16/24 quirk for
a 1 ms delay between control msgs.
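
The same match can be written as a table lookup. This is an illustrative user-space sketch, not what the patch does: needs_short_delay() and the table are hypothetical, and USB_ID() is redefined locally to pack vendor and product IDs into one 32-bit value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-in: vendor ID in the high 16 bits, product ID in the low 16. */
#define USB_ID(vendor, product) (((uint32_t)(vendor) << 16) | (product))

/* Devices that need the 1 ms delay, with the IDs taken from the patch. */
static const uint32_t short_delay_ids[] = {
	USB_ID(0x1686, 0x00dd),	/* Zoom R16/24 */
	USB_ID(0x046d, 0x0a46),	/* Logitech H650e */
	USB_ID(0x0b0e, 0x0349),	/* Jabra 550a */
};

bool needs_short_delay(uint32_t usb_id)
{
	size_t i;

	for (i = 0; i < sizeof(short_delay_ids) / sizeof(short_delay_ids[0]); i++)
		if (short_delay_ids[i] == usb_id)
			return true;
	return false;
}

void usb_quirk_demo(void)
{
	assert(needs_short_delay(USB_ID(0x0b0e, 0x0349)));
	assert(!needs_short_delay(USB_ID(0x0b0e, 0x0001)));
}
```

The patch itself keeps the open-coded comparison chain, which is reasonable for three entries; a table only starts to pay off if more devices turn out to need the quirk.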

Signed-off-by: Joakim Tjernlund <joakim.t...@infinera.com>
Signed-off-by: Takashi Iwai <ti...@suse.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
sound/usb/quirks.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)

--- a/sound/usb/quirks.c
+++ b/sound/usb/quirks.c
@@ -1309,10 +1309,13 @@ void snd_usb_ctl_msg_quirk(struct usb_de
&& (requesttype & USB_TYPE_MASK) == USB_TYPE_CLASS)
mdelay(20);

- /* Zoom R16/24 needs a tiny delay here, otherwise requests like
- * get/set frequency return as failed despite actually succeeding.
+ /* Zoom R16/24, Logitech H650e, Jabra 550a needs a tiny delay here,
+ * otherwise requests like get/set frequency return as failed despite
+ * actually succeeding.
*/
- if (chip->usb_id == USB_ID(0x1686, 0x00dd) &&
+ if ((chip->usb_id == USB_ID(0x1686, 0x00dd) ||
+ chip->usb_id == USB_ID(0x046d, 0x0a46) ||
+ chip->usb_id == USB_ID(0x0b0e, 0x0349)) &&
(requesttype & USB_TYPE_MASK) == USB_TYPE_CLASS)
mdelay(1);
}

Greg Kroah-Hartman

Aug 28, 2017, 5:10:16 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <ebig...@google.com>

commit ccd5b3235180eef3cfec337df1c8554ab151b5cc upstream.

The following commit:

39a0526fb3f7 ("x86/mm: Factor out LDT init from context init")

renamed init_new_context() to init_new_context_ldt() and added a new
init_new_context() which calls init_new_context_ldt(). However, the
error code of init_new_context_ldt() was ignored. Consequently, if a
memory allocation in alloc_ldt_struct() failed during a fork(), the
->context.ldt of the new task remained the same as that of the old task
(due to the memcpy() in dup_mm()). ldt_struct's are not intended to be
shared, so a use-after-free occurred after one task exited.

Fix the bug by making init_new_context() pass through the error code of
init_new_context_ldt().
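
The difference between the two versions can be shown with a small user-space model. All names below are hypothetical; model_init_ldt() stands in for init_new_context_ldt() and simply fails on request.

```c
#include <assert.h>
#include <errno.h>

/* Stand-in for init_new_context_ldt(): fails when the allocation fails. */
static int model_init_ldt(int alloc_fails)
{
	return alloc_fails ? -ENOMEM : 0;
}

/* Before the fix: the error is dropped and the caller sees success. */
int model_init_context_buggy(int alloc_fails)
{
	(void)model_init_ldt(alloc_fails);
	return 0;
}

/* After the fix: the error is passed through to the caller. */
int model_init_context_fixed(int alloc_fails)
{
	return model_init_ldt(alloc_fails);
}

void model_init_context_demo(void)
{
	assert(model_init_context_buggy(1) == 0);	/* failure hidden */
	assert(model_init_context_fixed(1) == -ENOMEM);	/* failure reported */
}
```

In the buggy variant the fork path has no way to know that the new context still carries the parent's LDT pointer, which is the situation the commit message describes.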

This bug was found by syzkaller, which encountered the following splat:

BUG: KASAN: use-after-free in free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
Read of size 4 at addr ffff88006d2cb7c8 by task kworker/u9:0/3710

CPU: 1 PID: 3710 Comm: kworker/u9:0 Not tainted 4.13.0-rc4-next-20170811 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x73/0x250 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x24e/0x340 mm/kasan/report.c:409
__asan_report_load4_noabort+0x14/0x20 mm/kasan/report.c:429
free_ldt_struct.part.2+0x10a/0x150 arch/x86/kernel/ldt.c:116
free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
__mmdrop+0xe9/0x530 kernel/fork.c:889
mmdrop include/linux/sched/mm.h:42 [inline]
exec_mmap fs/exec.c:1061 [inline]
flush_old_exec+0x173c/0x1ff0 fs/exec.c:1291
load_elf_binary+0x81f/0x4ba0 fs/binfmt_elf.c:855
search_binary_handler+0x142/0x6b0 fs/exec.c:1652
exec_binprm fs/exec.c:1694 [inline]
do_execveat_common.isra.33+0x1746/0x22e0 fs/exec.c:1816
do_execve+0x31/0x40 fs/exec.c:1860
call_usermodehelper_exec_async+0x457/0x8f0 kernel/umh.c:100
ret_from_fork+0x2a/0x40 arch/x86/entry/entry_64.S:431

Allocated by task 3700:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:551
kmem_cache_alloc_trace+0x136/0x750 mm/slab.c:3627
kmalloc include/linux/slab.h:493 [inline]
alloc_ldt_struct+0x52/0x140 arch/x86/kernel/ldt.c:67
write_ldt+0x7b7/0xab0 arch/x86/kernel/ldt.c:277
sys_modify_ldt+0x1ef/0x240 arch/x86/kernel/ldt.c:307
entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 3700:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x71/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kfree+0xca/0x250 mm/slab.c:3820
free_ldt_struct.part.2+0xdd/0x150 arch/x86/kernel/ldt.c:121
free_ldt_struct arch/x86/kernel/ldt.c:173 [inline]
destroy_context_ldt+0x60/0x80 arch/x86/kernel/ldt.c:171
destroy_context arch/x86/include/asm/mmu_context.h:157 [inline]
__mmdrop+0xe9/0x530 kernel/fork.c:889
mmdrop include/linux/sched/mm.h:42 [inline]
__mmput kernel/fork.c:916 [inline]
mmput+0x541/0x6e0 kernel/fork.c:927
copy_process.part.36+0x22e1/0x4af0 kernel/fork.c:1931
copy_process kernel/fork.c:1546 [inline]
_do_fork+0x1ef/0xfb0 kernel/fork.c:2025
SYSC_clone kernel/fork.c:2135 [inline]
SyS_clone+0x37/0x50 kernel/fork.c:2129
do_syscall_64+0x26c/0x8c0 arch/x86/entry/common.c:287
return_from_SYSCALL_64+0x0/0x7a

Here is a C reproducer:

#include <asm/ldt.h>
#include <pthread.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/wait.h>
#include <unistd.h>

static void *fork_thread(void *_arg)
{
	fork();
	return NULL;
}

int main(void)
{
	struct user_desc desc = { .entry_number = 8191 };

	syscall(__NR_modify_ldt, 1, &desc, sizeof(desc));

	for (;;) {
		if (fork() == 0) {
			pthread_t t;

			srand(getpid());
			pthread_create(&t, NULL, fork_thread, NULL);
			usleep(rand() % 10000);
			syscall(__NR_exit_group, 0);
		}
		wait(NULL);
	}
}

Note: the reproducer takes advantage of the fact that alloc_ldt_struct()
may use vmalloc() to allocate a large ->entries array, and after
commit:

5d17a73a2ebe ("vmalloc: back off when the current task is killed")

it is possible for userspace to fail a task's vmalloc() by
sending a fatal signal, e.g. via exit_group(). It would be more
difficult to reproduce this bug on kernels without that commit.

This bug only affected kernels with CONFIG_MODIFY_LDT_SYSCALL=y.

Signed-off-by: Eric Biggers <ebig...@google.com>
Acked-by: Dave Hansen <dave....@linux.intel.com>
Cc: Andrew Morton <ak...@linux-foundation.org>
Cc: Andy Lutomirski <lu...@amacapital.net>
Cc: Borislav Petkov <b...@alien8.de>
Cc: Brian Gerst <brg...@gmail.com>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Denys Vlasenko <dvla...@redhat.com>
Cc: Dmitry Vyukov <dvy...@google.com>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Michal Hocko <mho...@suse.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Rik van Riel <ri...@redhat.com>
Cc: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: linu...@kvack.org
Fixes: 39a0526fb3f7 ("x86/mm: Factor out LDT init from context init")
Link: http://lkml.kernel.org/r/20170824175029.7...@gmail.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/x86/include/asm/mmu_context.h | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -116,9 +116,7 @@ static inline int init_new_context(struc
mm->context.execute_only_pkey = -1;
}
#endif
- init_new_context_ldt(tsk, mm);
-
- return 0;
+ return init_new_context_ldt(tsk, mm);
}
static inline void destroy_context(struct mm_struct *mm)
{

Greg Kroah-Hartman

Aug 28, 2017, 5:10:16 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Christoph Hellwig <h...@lst.de>

commit ba74b6f7fcc07355d087af6939712eed4a454821 upstream.

Commit 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for
virtqueues"") removed the adjustment of the pre_vectors for the virtio
MSI-X vector allocation which was added in commit fb5e31d9 ("virtio:
allow drivers to request IRQ affinity when creating VQs"). This will
lead to an incorrect assignment of MSI-X vectors, and potential
deadlocks when offlining cpus.

Signed-off-by: Christoph Hellwig <h...@lst.de>
Fixes: 0b0f9dc5 ("Revert "virtio_pci: use shared interrupts for virtqueues")
Reported-by: YASUAKI ISHIMATSU <yasu.i...@gmail.com>
Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/virtio/virtio_pci_common.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)

--- a/drivers/virtio/virtio_pci_common.c
+++ b/drivers/virtio/virtio_pci_common.c
@@ -107,6 +107,7 @@ static int vp_request_msix_vectors(struc
{
struct virtio_pci_device *vp_dev = to_vp_device(vdev);
const char *name = dev_name(&vp_dev->vdev.dev);
+ unsigned flags = PCI_IRQ_MSIX;
unsigned i, v;
int err = -ENOMEM;

@@ -126,10 +127,13 @@ static int vp_request_msix_vectors(struc
GFP_KERNEL))
goto error;

+ if (desc) {
+ flags |= PCI_IRQ_AFFINITY;
+ desc->pre_vectors++; /* virtio config vector */
+ }
+
err = pci_alloc_irq_vectors_affinity(vp_dev->pci_dev, nvectors,
- nvectors, PCI_IRQ_MSIX |
- (desc ? PCI_IRQ_AFFINITY : 0),
- desc);
+ nvectors, flags, desc);
if (err < 0)
goto error;
vp_dev->msix_enabled = 1;

Greg Kroah-Hartman

Aug 28, 2017, 5:10:16 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <ti...@suse.de>

commit bbba6f9d3da357bbabc6fda81e99ff5584500e76 upstream.

Lenovo G50-70 (17aa:3978) with a Conexant codec chip requires the
same workaround for the inverted stereo dmic as other Lenovo
models.

Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1020657
Signed-off-by: Takashi Iwai <ti...@suse.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
sound/pci/hda/patch_conexant.c | 1 +
1 file changed, 1 insertion(+)

--- a/sound/pci/hda/patch_conexant.c
+++ b/sound/pci/hda/patch_conexant.c
@@ -947,6 +947,7 @@ static const struct snd_pci_quirk cxt506
SND_PCI_QUIRK(0x17aa, 0x390b, "Lenovo G50-80", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK(0x17aa, 0x3975, "Lenovo U300s", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK(0x17aa, 0x3977, "Lenovo IdeaPad U310", CXT_FIXUP_STEREO_DMIC),
+ SND_PCI_QUIRK(0x17aa, 0x3978, "Lenovo G50-70", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK(0x17aa, 0x397b, "Lenovo S205", CXT_FIXUP_STEREO_DMIC),
SND_PCI_QUIRK_VENDOR(0x17aa, "Thinkpad", CXT_FIXUP_THINKPAD_ACPI),
SND_PCI_QUIRK(0x1c06, 0x2011, "Lemote A1004", CXT_PINCFG_LEMOTE_A1004),

Greg Kroah-Hartman

Aug 28, 2017, 5:10:16 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Mark Rutland <mark.r...@arm.com>

commit 64aee2a965cf2954a038b5522f11d2cd2f0f8f3e upstream.

Regardless of which events form a group, it does not make sense for the
events to target different tasks and/or CPUs, as this leaves the group
inconsistent and impossible to schedule. The core perf code assumes that
these are consistent across (successfully initialised) groups.

Core perf code only verifies this when moving SW events into a HW
context. Thus, we can violate this requirement for pure SW groups and
pure HW groups, unless the relevant PMU driver happens to perform this
verification itself. These mismatched groups subsequently wreak havoc
elsewhere.

For example, we handle watchpoints as SW events, and reserve watchpoint
HW on a per-CPU basis at pmu::event_init() time to ensure that any event
that is initialised is guaranteed to have a slot at pmu::add() time.
However, the core code only checks the group leader's cpu filter (via
event_filter_match()), and can thus install follower events onto CPUs
violating their (mismatched) CPU filters, potentially installing them
into a CPU without sufficient reserved slots.

This can be triggered with the below test case, resulting in warnings
from arch backends.

#define _GNU_SOURCE
#include <linux/hw_breakpoint.h>
#include <linux/perf_event.h>
#include <sched.h>
#include <stdio.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
			   int group_fd, unsigned long flags)
{
	return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

char watched_char;

struct perf_event_attr wp_attr = {
	.type = PERF_TYPE_BREAKPOINT,
	.bp_type = HW_BREAKPOINT_RW,
	.bp_addr = (unsigned long)&watched_char,
	.bp_len = 1,
	.size = sizeof(wp_attr),
};

int main(int argc, char *argv[])
{
	int leader, ret;
	cpu_set_t cpus;

	/*
	 * Force use of CPU0 to ensure our CPU0-bound events get scheduled.
	 */
	CPU_ZERO(&cpus);
	CPU_SET(0, &cpus);
	ret = sched_setaffinity(0, sizeof(cpus), &cpus);
	if (ret) {
		printf("Unable to set cpu affinity\n");
		return 1;
	}

	/* open leader event, bound to this task, CPU0 only */
	leader = perf_event_open(&wp_attr, 0, 0, -1, 0);
	if (leader < 0) {
		printf("Couldn't open leader: %d\n", leader);
		return 1;
	}

	/*
	 * Open a follower event that is bound to the same task, but a
	 * different CPU. This means that the group should never be possible to
	 * schedule.
	 */
	ret = perf_event_open(&wp_attr, 0, 1, leader, 0);
	if (ret < 0) {
		printf("Couldn't open mismatched follower: %d\n", ret);
		return 1;
	} else {
		printf("Opened leader/follower with mismatched CPUs\n");
	}

	/*
	 * Open as many independent events as we can, all bound to the same
	 * task, CPU0 only.
	 */
	do {
		ret = perf_event_open(&wp_attr, 0, 0, -1, 0);
	} while (ret >= 0);

	/*
	 * Force enable/disable all events to trigger the erroneous
	 * installation of the follower event.
	 */
	printf("Opened all events. Toggling..\n");
	for (;;) {
		prctl(PR_TASK_PERF_EVENTS_DISABLE, 0, 0, 0, 0);
		prctl(PR_TASK_PERF_EVENTS_ENABLE, 0, 0, 0, 0);
	}

	return 0;
}

Fix this by validating this requirement regardless of whether we're
moving events.
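
The change in the checks can be sketched in user space. The structs and function names below are hypothetical simplifications: a context is reduced to the task it belongs to, and an event to its CPU filter and context. The "old" function models only the path taken when no group move is needed.

```c
#include <assert.h>
#include <stddef.h>

struct model_ctx { int task; };			/* hypothetical: context owner */
struct model_event { int cpu; struct model_ctx *ctx; };

/* Old behaviour without a group move: only the context is compared. */
int model_attach_allowed_old(const struct model_event *leader,
			     const struct model_ctx *ctx)
{
	return leader->ctx == ctx;
}

/* New behaviour: CPU and task are always compared, context unless moving. */
int model_attach_allowed_new(const struct model_event *leader,
			     const struct model_event *event,
			     const struct model_ctx *ctx, int move_group)
{
	assert(leader && event && ctx);
	if (leader->cpu != event->cpu)
		return 0;
	if (leader->ctx->task != ctx->task)
		return 0;
	if (!move_group && leader->ctx != ctx)
		return 0;
	return 1;
}

void model_attach_demo(void)
{
	struct model_ctx task_ctx = { .task = 42 };
	struct model_event leader = { .cpu = 0, .ctx = &task_ctx };
	struct model_event follower = { .cpu = 1, .ctx = NULL };

	/* Same task context, different CPU: accepted before, rejected now. */
	assert(model_attach_allowed_old(&leader, &task_ctx) == 1);
	assert(model_attach_allowed_new(&leader, &follower, &task_ctx, 0) == 0);
}
```

This mirrors the test case above: both events are bound to the same task, so they share a context, and only the explicit CPU comparison catches the mismatch.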

Signed-off-by: Mark Rutland <mark.r...@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
Cc: Alexander Shishkin <alexander...@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <ac...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Zhou Chengming <zhouche...@huawei.com>
Link: http://lkml.kernel.org/r/1498142498-15758-1-git-...@arm.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
kernel/events/core.c | 39 +++++++++++++++++++--------------------
1 file changed, 19 insertions(+), 20 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9996,28 +9996,27 @@ SYSCALL_DEFINE5(perf_event_open,
goto err_context;

/*
- * Do not allow to attach to a group in a different
- * task or CPU context:
+ * Make sure we're both events for the same CPU;
+ * grouping events for different CPUs is broken; since
+ * you can never concurrently schedule them anyhow.
*/
- if (move_group) {
- /*
- * Make sure we're both on the same task, or both
- * per-cpu events.
- */
- if (group_leader->ctx->task != ctx->task)
- goto err_context;
+ if (group_leader->cpu != event->cpu)
+ goto err_context;
+
+ /*
+ * Make sure we're both on the same task, or both
+ * per-CPU events.
+ */
+ if (group_leader->ctx->task != ctx->task)
+ goto err_context;

- /*
- * Make sure we're both events for the same CPU;
- * grouping events for different CPUs is broken; since
- * you can never concurrently schedule them anyhow.
- */
- if (group_leader->cpu != event->cpu)
- goto err_context;
- } else {
- if (group_leader->ctx != ctx)
- goto err_context;
- }
+ /*
+ * Do not allow to attach to a group in a different task
+ * or CPU context. If we're moving SW events, we'll fix
+ * this up later, so allow that.
+ */
+ if (!move_group && group_leader->ctx != ctx)
+ goto err_context;

/*
* Only a group leader can be exclusive or pinned

Greg Kroah-Hartman

Aug 28, 2017, 5:10:16 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Paolo Bonzini <pbon...@redhat.com>

commit 38cfd5e3df9c4f88e76b547eee2087ee5c042ae2 upstream.

The host pkru is restored right after vcpu exit (commit 1be0e61), so
KVM_GET_XSAVE will return the host PKRU value instead. Fix this by
using the guest PKRU explicitly in fill_xsave and load_xsave. This
part is based on a patch by Junkang Fu.

The host PKRU data may also not match the value in vcpu->arch.guest_fpu.state,
because it could have been changed by userspace since the last time
it was saved, so skip loading it in kvm_load_guest_fpu.
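
The mask argument added by this patch can be illustrated with plain bit arithmetic. The sketch below assumes PKRU is state component 9 in the XSAVE feature bitmap; MODEL_XFEATURE_MASK_PKRU and mask_restores() are hypothetical names used only for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PKRU is bit 9 of the XSAVE feature mask. */
#define MODEL_XFEATURE_MASK_PKRU (1ULL << 9)
#define MODEL_XFEATURE_MASK_FP   (1ULL << 0)

/* Would a restore with this mask load the given state component? */
int mask_restores(uint64_t mask, uint64_t feature)
{
	return (mask & feature) != 0;
}

void pkru_mask_demo(void)
{
	uint64_t all = (uint64_t)-1;			/* generic callers pass -1 */
	uint64_t no_pkru = ~MODEL_XFEATURE_MASK_PKRU;	/* what KVM now passes */

	assert(mask_restores(all, MODEL_XFEATURE_MASK_PKRU));
	assert(!mask_restores(no_pkru, MODEL_XFEATURE_MASK_PKRU));
	assert(mask_restores(no_pkru, MODEL_XFEATURE_MASK_FP));
}
```

Passing every bit except PKRU leaves the other components untouched by the change, so only the PKRU handling moves to the explicit vcpu->arch.pkru field.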

Reported-by: Junkang Fu <junka...@alibaba-inc.com>
Cc: Yang Zhang <zy10...@alibaba-inc.com>
Fixes: 1be0e61c1f255faaeab04a390e00c8b9b9042870
Signed-off-by: Paolo Bonzini <pbon...@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/x86/include/asm/fpu/internal.h | 6 +++---
arch/x86/kvm/x86.c | 17 ++++++++++++++---
2 files changed, 17 insertions(+), 6 deletions(-)

--- a/arch/x86/include/asm/fpu/internal.h
+++ b/arch/x86/include/asm/fpu/internal.h
@@ -450,10 +450,10 @@ static inline int copy_fpregs_to_fpstate
return 0;
}

-static inline void __copy_kernel_to_fpregs(union fpregs_state *fpstate)
+static inline void __copy_kernel_to_fpregs(union fpregs_state *fpstate, u64 mask)
{
if (use_xsave()) {
- copy_kernel_to_xregs(&fpstate->xsave, -1);
+ copy_kernel_to_xregs(&fpstate->xsave, mask);
} else {
if (use_fxsr())
copy_kernel_to_fxregs(&fpstate->fxsave);
@@ -477,7 +477,7 @@ static inline void copy_kernel_to_fpregs
: : [addr] "m" (fpstate));
}

- __copy_kernel_to_fpregs(fpstate);
+ __copy_kernel_to_fpregs(fpstate, -1);
}

extern int copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size);
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3236,7 +3236,12 @@ static void fill_xsave(u8 *dest, struct
u32 size, offset, ecx, edx;
cpuid_count(XSTATE_CPUID, index,
&size, &offset, &ecx, &edx);
- memcpy(dest + offset, src, size);
+ if (feature == XFEATURE_MASK_PKRU)
+ memcpy(dest + offset, &vcpu->arch.pkru,
+ sizeof(vcpu->arch.pkru));
+ else
+ memcpy(dest + offset, src, size);
+
}

valid -= feature;
@@ -3274,7 +3279,11 @@ static void load_xsave(struct kvm_vcpu *
u32 size, offset, ecx, edx;
cpuid_count(XSTATE_CPUID, index,
&size, &offset, &ecx, &edx);
- memcpy(dest, src + offset, size);
+ if (feature == XFEATURE_MASK_PKRU)
+ memcpy(&vcpu->arch.pkru, src + offset,
+ sizeof(vcpu->arch.pkru));
+ else
+ memcpy(dest, src + offset, size);
}

valid -= feature;
@@ -7616,7 +7625,9 @@ void kvm_load_guest_fpu(struct kvm_vcpu
*/
vcpu->guest_fpu_loaded = 1;
__kernel_fpu_begin();
- __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state);
+ /* PKRU is separately restored in kvm_x86_ops->run. */
+ __copy_kernel_to_fpregs(&vcpu->arch.guest_fpu.state,
+ ~XFEATURE_MASK_PKRU);
trace_kvm_fpu(1);
}

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Maarten Lankhorst <maarten....@linux.intel.com>

commit a0ffc51e20e90e0c1c2491de2b4b03f48b6caaba upstream.

The last part of drm_atomic_check_only is testing whether we need to
fail with -EINVAL when modeset is not allowed, but forgets to return
the value when atomic_check() fails first.

This results in -EDEADLK being replaced by -EINVAL, and the sanity
check in drm_modeset_drop_locks kicks in:

[ 308.531734] ------------[ cut here ]------------
[ 308.531791] WARNING: CPU: 0 PID: 1886 at drivers/gpu/drm/drm_modeset_lock.c:217 drm_modeset_drop_locks+0x33/0xc0 [drm]
[ 308.531828] Modules linked in:
[ 308.532050] CPU: 0 PID: 1886 Comm: kms_atomic Tainted: G U W 4.13.0-rc5-patser+ #5225
[ 308.532082] Hardware name: NUC5i7RYB, BIOS RYBDWi35.86A.0246.2015.0309.1355 03/09/2015
[ 308.532124] task: ffff8800cd9dae00 task.stack: ffff8800ca3b8000
[ 308.532168] RIP: 0010:drm_modeset_drop_locks+0x33/0xc0 [drm]
[ 308.532189] RSP: 0018:ffff8800ca3bf980 EFLAGS: 00010282
[ 308.532211] RAX: dffffc0000000000 RBX: ffff8800ca3bfaf8 RCX: 0000000013a171e6
[ 308.532235] RDX: 1ffff10019477f69 RSI: ffffffffa8ba4fa0 RDI: ffff8800ca3bfb48
[ 308.532258] RBP: ffff8800ca3bf998 R08: 0000000000000000 R09: 0000000000000003
[ 308.532281] R10: 0000000079dbe066 R11: 00000000f760b34b R12: 0000000000000001
[ 308.532304] R13: dffffc0000000000 R14: 00000000ffffffea R15: ffff880096889680
[ 308.532328] FS: 00007ff00959cec0(0000) GS:ffff8800d4e00000(0000) knlGS:0000000000000000
[ 308.532359] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 308.532380] CR2: 0000000000000008 CR3: 00000000ca2e3000 CR4: 00000000003406f0
[ 308.532402] Call Trace:
[ 308.532440] drm_mode_atomic_ioctl+0x19fa/0x1c00 [drm]
[ 308.532488] ? drm_atomic_set_property+0x1220/0x1220 [drm]
[ 308.532565] ? avc_has_extended_perms+0xc39/0xff0
[ 308.532593] ? lock_downgrade+0x610/0x610
[ 308.532640] ? drm_atomic_set_property+0x1220/0x1220 [drm]
[ 308.532680] drm_ioctl_kernel+0x154/0x1a0 [drm]
[ 308.532755] drm_ioctl+0x624/0x8f0 [drm]
[ 308.532858] ? drm_atomic_set_property+0x1220/0x1220 [drm]
[ 308.532976] ? drm_getunique+0x210/0x210 [drm]
[ 308.533061] do_vfs_ioctl+0xd92/0xe40
[ 308.533121] ? ioctl_preallocate+0x1b0/0x1b0
[ 308.533160] ? selinux_capable+0x20/0x20
[ 308.533191] ? do_fcntl+0x1b1/0xbf0
[ 308.533219] ? kasan_slab_free+0xa2/0xb0
[ 308.533249] ? f_getown+0x4b/0xa0
[ 308.533278] ? putname+0xcf/0xe0
[ 308.533309] ? security_file_ioctl+0x57/0x90
[ 308.533342] SyS_ioctl+0x4e/0x80
[ 308.533374] entry_SYSCALL_64_fastpath+0x18/0xad
[ 308.533405] RIP: 0033:0x7ff00779e4d7
[ 308.533431] RSP: 002b:00007fff66a043d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 308.533481] RAX: ffffffffffffffda RBX: 000000e7c7ca5910 RCX: 00007ff00779e4d7
[ 308.533560] RDX: 00007fff66a04430 RSI: 00000000c03864bc RDI: 0000000000000003
[ 308.533608] RBP: 00007ff007a5fb00 R08: 000000e7c7ca4620 R09: 000000e7c7ca5e60
[ 308.533647] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000070
[ 308.533685] R13: 0000000000000000 R14: 0000000000000000 R15: 000000e7c7ca5930
[ 308.533770] Code: ff df 55 48 89 e5 41 55 41 54 53 48 89 fb 48 83 c7
50 48 89 fa 48 c1 ea 03 80 3c 02 00 74 05 e8 94 d4 16 e7 48 83 7b 50 00
74 02 <0f> ff 4c 8d 6b 58 48 b8 00 00 00 00 00 fc ff df 4c 89 ea 48 c1
[ 308.534086] ---[ end trace 77f11e53b1df44ad ]---

Solve this by adding the missing return.

This is also a bugfix because we could end up rejecting updates with
-EINVAL because of an early -EDEADLK, while if atomic_check ran to
completion it might have downgraded the modeset to a fastset.
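
The error precedence can be modelled in user space as follows. The function names and parameters are hypothetical: driver_ret stands for the result of the driver's atomic_check() callback, and needs_modeset for whether any CRTC in the state requires a modeset.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Before the fix: a later -EINVAL overwrites an earlier driver error. */
int model_check_only_buggy(int driver_ret, bool allow_modeset,
			   bool needs_modeset)
{
	int ret = driver_ret;

	if (!allow_modeset && needs_modeset)
		return -EINVAL;
	return ret;
}

/* After the fix: a driver error is returned before the modeset test. */
int model_check_only_fixed(int driver_ret, bool allow_modeset,
			   bool needs_modeset)
{
	if (driver_ret)
		return driver_ret;
	if (!allow_modeset && needs_modeset)
		return -EINVAL;
	return 0;
}

void model_check_only_demo(void)
{
	assert(model_check_only_buggy(-EDEADLK, false, true) == -EINVAL);
	assert(model_check_only_fixed(-EDEADLK, false, true) == -EDEADLK);
}
```

Returning -EDEADLK unchanged matters because the caller uses that value to back off and retry the locking sequence, whereas -EINVAL is reported to user space as a rejected update.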

Signed-off-by: Maarten Lankhorst <maarten....@linux.intel.com>
Testcase: kms_atomic
Link: https://patchwork.freedesktop.org/patch/msgid/20170815095706.23624...@linux.intel.com
Fixes: d34f20d6e2f2 ("drm: Atomic modeset ioctl")
Reviewed-by: Daniel Vetter <daniel...@ffwll.ch>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/gpu/drm/drm_atomic.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/drivers/gpu/drm/drm_atomic.c
+++ b/drivers/gpu/drm/drm_atomic.c
@@ -1581,6 +1581,9 @@ int drm_atomic_check_only(struct drm_ato
if (config->funcs->atomic_check)
ret = config->funcs->atomic_check(state->dev, state);

+ if (ret)
+ return ret;
+
if (!state->allow_modeset) {
for_each_new_crtc_in_state(state, crtc, crtc_state, i) {
if (drm_atomic_crtc_needs_modeset(crtc_state)) {
@@ -1591,7 +1594,7 @@ int drm_atomic_check_only(struct drm_ato
}
}

- return ret;
+ return 0;
}
EXPORT_SYMBOL(drm_atomic_check_only);

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <khleb...@yandex-team.ru>


[ Upstream commit c90e95147c27b1780e76c6e8fea1b5c78d7d387f ]

It was added in commit e57a784d8cae ("pkt_sched: set root qdisc
before change() in attach_default_qdiscs()") to hide duplicates
from "tc qdisc show" for inactive devices.

After 59cc1f61f ("net: sched: convert qdisc linked list to hashtable")
it triggered when classful qdisc is added to inactive device because
default qdiscs are added before switching root qdisc.

Anyway after commit ea3274695353 ("net: sched: avoid duplicates in
qdisc dump") duplicates are filtered right in dumper.

Signed-off-by: Konstantin Khlebnikov <khleb...@yandex-team.ru>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
net/sched/sch_api.c | 3 ---
1 file changed, 3 deletions(-)

--- a/net/sched/sch_api.c
+++ b/net/sched/sch_api.c
@@ -286,9 +286,6 @@ static struct Qdisc *qdisc_match_from_ro
void qdisc_hash_add(struct Qdisc *q, bool invisible)
{
if ((q->parent != TC_H_ROOT) && !(q->flags & TCQ_F_INGRESS)) {
- struct Qdisc *root = qdisc_dev(q)->qdisc;
-
- WARN_ON_ONCE(root == &noop_qdisc);
ASSERT_RTNL();
hash_add_rcu(qdisc_dev(q)->qdisc_hash, &q->hash, q->handle);
if (invisible)

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chen Yu <yu.c...@intel.com>

commit 556b969a1cfe2686aae149137fa1dfcac0eefe54 upstream.

Counting the pages for the hibernation snapshot can take a significant
amount of time, especially on systems with large memory. Since the
counting is performed with IRQs disabled, this can lead to an NMI
lockup. The following warning was seen on a system with 1.5TB of DRAM:

Freezing user space processes ... (elapsed 0.002 seconds) done.
OOM killer disabled.
PM: Preallocating image memory...
NMI watchdog: Watchdog detected hard LOCKUP on cpu 27
CPU: 27 PID: 3128 Comm: systemd-sleep Not tainted 4.13.0-0.rc2.git0.1.fc27.x86_64 #1
task: ffff9f01971ac000 task.stack: ffffb1a3f325c000
RIP: 0010:memory_bm_find_bit+0xf4/0x100
Call Trace:
swsusp_set_page_free+0x2b/0x30
mark_free_pages+0x147/0x1c0
count_data_pages+0x41/0xa0
hibernate_preallocate_memory+0x80/0x450
hibernation_snapshot+0x58/0x410
hibernate+0x17c/0x310
state_store+0xdf/0xf0
kobj_attr_store+0xf/0x20
sysfs_kf_write+0x37/0x40
kernfs_fop_write+0x11c/0x1a0
__vfs_write+0x37/0x170
vfs_write+0xb1/0x1a0
SyS_write+0x55/0xc0
entry_SYSCALL_64_fastpath+0x1a/0xa5
...
done (allocated 6590003 pages)
PM: Allocated 26360012 kbytes in 19.89 seconds (1325.28 MB/s)

It took nearly 20 seconds (on a 2.10GHz CPU), so the NMI lockup was
triggered. In case the timeout of the NMI watch dog has been set to 1
second, a safe interval should be 6590003/20 = 320k pages in theory.
However there might also be some platforms running at a lower frequency,
so feed the watchdog every 100k pages.
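
The countdown pattern used in the final version of the patch (with the 128k interval mentioned below) can be sketched in user space. model_touch_watchdog() and model_walk_pages() are hypothetical; the former only counts how often the real touch_nmi_watchdog() would have been called.

```c
#include <assert.h>

#define WD_PAGE_COUNT (128 * 1024)	/* interval used by the patch */

static unsigned long model_touches;

/* Stand-in for touch_nmi_watchdog(): just counts invocations. */
static void model_touch_watchdog(void)
{
	model_touches++;
}

/* Walk npages pages, touching the watchdog every WD_PAGE_COUNT pages. */
unsigned long model_walk_pages(unsigned long npages)
{
	unsigned long page_count = WD_PAGE_COUNT;
	unsigned long pfn;

	model_touches = 0;
	for (pfn = 0; pfn < npages; pfn++) {
		/* Countdown avoids a modulus on every iteration. */
		if (!--page_count) {
			model_touch_watchdog();
			page_count = WD_PAGE_COUNT;
		}
	}
	return model_touches;
}

void model_walk_demo(void)
{
	assert(model_walk_pages(WD_PAGE_COUNT - 1) == 0);
	assert(model_walk_pages(WD_PAGE_COUNT) == 1);
}
```

For the 6590003 pages from the log above, this model touches the watchdog 50 times, i.e. roughly every 0.4 seconds over the 19.89 seconds reported, assuming an even rate.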

[yu.c...@intel.com: simplification]
Link: http://lkml.kernel.org/r/1503460079-29721-1-gi...@intel.com
[yu.c...@intel.com: use interval of 128k instead of 100k to avoid modulus]
Link: http://lkml.kernel.org/r/1503328098-5120-1-git...@intel.com
Signed-off-by: Chen Yu <yu.c...@intel.com>
Reported-by: Jan Filipcewicz <jan.fil...@intel.com>
Suggested-by: Michal Hocko <mho...@suse.com>
Reviewed-by: Michal Hocko <mho...@suse.com>
Acked-by: Rafael J. Wysocki <rafael.j...@intel.com>
Cc: Mel Gorman <mgo...@techsingularity.net>
Cc: Vlastimil Babka <vba...@suse.cz>
Cc: Len Brown <le...@kernel.org>
Cc: Dan Williams <dan.j.w...@intel.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
mm/page_alloc.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -66,6 +66,7 @@
#include <linux/kthread.h>
#include <linux/memcontrol.h>
#include <linux/ftrace.h>
+#include <linux/nmi.h>

#include <asm/sections.h>
#include <asm/tlbflush.h>
@@ -2495,9 +2496,14 @@ void drain_all_pages(struct zone *zone)

#ifdef CONFIG_HIBERNATION

+/*
+ * Touch the watchdog for every WD_PAGE_COUNT pages.
+ */
+#define WD_PAGE_COUNT (128*1024)
+
void mark_free_pages(struct zone *zone)
{
- unsigned long pfn, max_zone_pfn;
+ unsigned long pfn, max_zone_pfn, page_count = WD_PAGE_COUNT;
unsigned long flags;
unsigned int order, t;
struct page *page;
@@ -2512,6 +2518,11 @@ void mark_free_pages(struct zone *zone)
if (pfn_valid(pfn)) {
page = pfn_to_page(pfn);

+ if (!--page_count) {
+ touch_nmi_watchdog();
+ page_count = WD_PAGE_COUNT;
+ }
+
if (page_zone(page) != zone)
continue;

@@ -2525,8 +2536,13 @@ void mark_free_pages(struct zone *zone)
unsigned long i;

pfn = page_to_pfn(page);
- for (i = 0; i < (1UL << order); i++)
+ for (i = 0; i < (1UL << order); i++) {
+ if (!--page_count) {
+ touch_nmi_watchdog();
+ page_count = WD_PAGE_COUNT;
+ }
swsusp_set_page_free(pfn_to_page(pfn + i));
+ }
}
}
spin_unlock_irqrestore(&zone->lock, flags);

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dave Martin <Dave....@arm.com>

commit 096622104e14d8a1db4860bd557717067a0515d2 upstream.

There are some tricky dependencies between the different stages of
flushing the FPSIMD register state during exec, and these can race
with context switch in ways that can cause the old task's regs to
leak across. In particular, a context switch during the memset() can
cause some of the task's old FPSIMD registers to reappear.

Disabling preemption for this small window would be no big deal for
performance: preemption is already disabled for similar scenarios
like updating the FPSIMD registers in sigreturn.

So, instead of rearranging things in ways that might swap existing
subtle bugs for new ones, this patch just disables preemption
around the FPSIMD state flushing so that races of this type can't
occur here. This brings fpsimd_flush_thread() into line with other
code paths.

Fixes: 674c242c9323 ("arm64: flush FP/SIMD state correctly after execve()")
Reviewed-by: Ard Biesheuvel <ard.bie...@linaro.org>
Signed-off-by: Dave Martin <Dave....@arm.com>
Signed-off-by: Will Deacon <will....@arm.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/arm64/kernel/fpsimd.c | 2 ++
1 file changed, 2 insertions(+)

--- a/arch/arm64/kernel/fpsimd.c
+++ b/arch/arm64/kernel/fpsimd.c
@@ -161,9 +161,11 @@ void fpsimd_flush_thread(void)
{
if (!system_supports_fpsimd())
return;
+ preempt_disable();
memset(&current->thread.fpsimd_state, 0, sizeof(struct fpsimd_state));
fpsimd_flush_task_state(current);
set_thread_flag(TIF_FOREIGN_FPSTATE);
+ preempt_enable();
}

/*

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Bharat Potnuri <bha...@chelsio.com>

commit 65159c051c45f269cf40a14f9404248f2d524920 upstream.

Initializing cq_context with ev_queue in create_cq() leads to a NULL pointer
dereference in ib_uverbs_comp_handler() if the application does not use a
completion channel. This patch fixes the cq_context initialization.

Fixes: 1e7710f3f65 ("IB/core: Change completion channel to use the reworked")
Signed-off-by: Potnuri Bharat Teja <bha...@chelsio.com>
Reviewed-by: Matan Barak <mat...@mellanox.com>
Signed-off-by: Doug Ledford <dled...@redhat.com>
(cherry picked from commit 699a2d5b1b880b4e4e1c7d55fa25659322cf5b51)
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/infiniband/core/uverbs_cmd.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/infiniband/core/uverbs_cmd.c
+++ b/drivers/infiniband/core/uverbs_cmd.c
@@ -1015,7 +1015,7 @@ static struct ib_ucq_object *create_cq(s
cq->uobject = &obj->uobject;
cq->comp_handler = ib_uverbs_comp_handler;
cq->event_handler = ib_uverbs_cq_event_handler;
- cq->cq_context = &ev_file->ev_queue;
+ cq->cq_context = ev_file ? &ev_file->ev_queue : NULL;
atomic_set(&cq->usecnt, 0);

obj->uobject.object = cq;
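
A minimal user-space sketch of the guard added above, using simplified stand-in structures. struct event_file, struct cq_like, pick_cq_context() and comp_handler() are hypothetical names and not the uverbs types; the NULL test in the handler is an assumption about how a handler would cope with a missing completion channel.

```c
#include <stddef.h>

struct event_queue {
	int pending;
};

struct event_file {
	int fd;
	struct event_queue ev_queue;
};

struct cq_like {
	void *cq_context;
};

/* Mirror of the fixed assignment: no channel means a NULL context. */
void pick_cq_context(struct cq_like *cq, struct event_file *ev_file)
{
	cq->cq_context = ev_file ? &ev_file->ev_queue : NULL;
}

/* Hypothetical handler: returns 1 if an event was queued, 0 if there is no queue. */
int comp_handler(struct cq_like *cq)
{
	struct event_queue *q = cq->cq_context;

	if (!q)
		return 0;
	q->pending++;
	return 1;
}
```

Note that ev_queue is not the first member of struct event_file here, so taking its address through a NULL ev_file would not produce NULL, and a plain NULL test in the handler would not catch it. That is the situation the ternary avoids.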

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Sakari Ailus <sakari...@linux.intel.com>

commit b5212f57da145e53df790a7e211d94daac768bf8 upstream.

acpi_graph_get_child_prop_value() is intended to find a child node with a
certain property value pair. The check

if (!fwnode_property_read_u32(fwnode, prop_name, &nr))
continue;

is faulty: fwnode_property_read_u32() returns zero on success, not on
failure, leading to comparing values only if the searched property was not
found.

Moreover, the check is made against the parent device node instead of
the child one as it should be.

Fixes: 79389a83bc38 (ACPI / property: Add support for remote endpoints)
Reported-by: Hyungwoo Yang <hyungw...@intel.com>
Signed-off-by: Sakari Ailus <sakari...@linux.intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j...@intel.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/acpi/property.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/drivers/acpi/property.c
+++ b/drivers/acpi/property.c
@@ -1046,7 +1046,7 @@ static struct fwnode_handle *acpi_graph_
fwnode_for_each_child_node(fwnode, child) {
u32 nr;

- if (!fwnode_property_read_u32(fwnode, prop_name, &nr))
+ if (fwnode_property_read_u32(child, prop_name, &nr))
continue;

if (val == nr)
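
The two mistakes described in the changelog (inverted return-value test, wrong node) are easy to see in a small stand-alone model. This is a sketch under stated assumptions: struct node, node_read_u32() and find_child() are hypothetical, and the only thing borrowed from the real API is the convention that the reader returns zero on success.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy node carrying at most one u32 property. */
struct node {
	const char *prop_name;
	uint32_t prop_value;
};

/* Returns 0 on success and a negative value if the property is absent. */
int node_read_u32(const struct node *n, const char *name, uint32_t *out)
{
	if (!n->prop_name || strcmp(n->prop_name, name) != 0)
		return -1;
	*out = n->prop_value;
	return 0;
}

/* Find the first child whose property "name" equals val, or NULL. */
const struct node *find_child(const struct node *children, size_t n_children,
			      const char *name, uint32_t val)
{
	size_t i;

	for (i = 0; i < n_children; i++) {
		uint32_t nr;

		/* Non-zero means the read failed: skip this child. */
		if (node_read_u32(&children[i], name, &nr))
			continue;

		if (val == nr)
			return &children[i];
	}
	return NULL;
}
```

The read is done on the child being iterated, and nr is only compared after a successful read, so it is never used uninitialized.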

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Sakamoto <o-ta...@sakamocchi.jp>

commit dbd7396b4f24e0c3284fcc05f5def24f52c09884 upstream.

When sound card registration fails after stream data has been initialized,
this module leaves the allocated stream data in place. This commit fixes the
bug.

Fixes: 9b2bb4f2f4a2 ('ALSA: firewire-motu: add stream management functionality')
Signed-off-by: Takashi Sakamoto <o-ta...@sakamocchi.jp>
Signed-off-by: Takashi Iwai <ti...@suse.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
sound/firewire/motu/motu.c | 1 +
1 file changed, 1 insertion(+)

--- a/sound/firewire/motu/motu.c
+++ b/sound/firewire/motu/motu.c
@@ -128,6 +128,7 @@ static void do_registration(struct work_
return;
error:
snd_motu_transaction_unregister(motu);
+ snd_motu_stream_destroy_duplex(motu);
snd_card_free(motu->card);
dev_info(&motu->unit->device,
"Sound card registration failed: %d\n", err);

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexey Brodkin <abro...@synopsys.com>

commit b37174d95b0251611a80ef60abf03752e9d66d67 upstream.

c70c473396cb "ARCv2: SLC: Make sure busy bit is set properly on SLC flushing"
fixes the problem for entire-SLC operations, where it was initially
caught. But given the nature of the issue, it is perfectly possible for the
busy bit to be read incorrectly even when a region operation was started.

So extend the initial fix to region operations as well.

Signed-off-by: Alexey Brodkin <abro...@synopsys.com>
Signed-off-by: Vineet Gupta <vgu...@synopsys.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/arc/mm/cache.c | 3 +++
1 file changed, 3 insertions(+)

--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -697,6 +697,9 @@ noinline void slc_op(phys_addr_t paddr,
write_aux_reg(ARC_REG_SLC_RGN_END, (paddr + sz + l2_line_sz - 1));
write_aux_reg(ARC_REG_SLC_RGN_START, paddr);

+ /* Make sure "busy" bit reports correct stataus, see STAR 9001165532 */
+ read_aux_reg(ARC_REG_SLC_CTRL);
+
while (read_aux_reg(ARC_REG_SLC_CTRL) & SLC_CTRL_BUSY);

spin_unlock_irqrestore(&lock, flags);

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Konstantin Khlebnikov <khleb...@yandex-team.ru>


[ Upstream commit 325d5dc3f7e7c2840b65e4a2988c082c2c0025c5 ]

When sfq_enqueue() drops the head packet or a packet from another queue, it
has to update the backlog at upper qdiscs too.

Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Signed-off-by: Konstantin Khlebnikov <khleb...@yandex-team.ru>
Acked-by: Eric Dumazet <edum...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
net/sched/sch_sfq.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -435,6 +435,7 @@ congestion_drop:
qdisc_drop(head, sch, to_free);

slot_queue_add(slot, skb);
+ qdisc_tree_reduce_backlog(sch, 0, delta);
return NET_XMIT_CN;
}

@@ -466,8 +467,10 @@ enqueue:
/* Return Congestion Notification only if we dropped a packet
* from this flow.
*/
- if (qlen != slot->qlen)
+ if (qlen != slot->qlen) {
+ qdisc_tree_reduce_backlog(sch, 0, dropped - qdisc_pkt_len(skb));
return NET_XMIT_CN;
+ }

/* As we dropped a packet, better let upper stack know this */
qdisc_tree_reduce_backlog(sch, 1, dropped);

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ulf Hansson <ulf.h...@linaro.org>

commit a23318feeff662c8d25d21623daebdd2e55ec221 upstream.

The commit 8503ff166504 ("i2c: designware: Avoid unnecessary resuming
during system suspend") may suggest to the PM core to try out the so-called
direct_complete path for system sleep. In this path, the PM core
treats a runtime suspended device as if it's already in a proper low power
state for system sleep, which makes it skip calling the system sleep
callbacks for the device, except for the ->prepare() and the ->complete()
callbacks.

However, the PM core may unset the direct_complete flag for a parent
device if one of its child devices has already been system suspended. In this
scenario, the PM core invokes the system sleep callbacks, no matter if the
device is runtime suspended or not.

Particularly in cases of an existing i2c slave device, the above path is
triggered, which breaks the assumption that the i2c device is always
runtime resumed whenever the dw_i2c_plat_suspend() is being called.

More precisely, dw_i2c_plat_suspend() calls clk_core_disable() and
clk_core_unprepare() for an already disabled/unprepared clock, leading to
a splat in the log about unbalanced clock calls and breaking
system sleep.

To still allow the direct_complete path in cases when it's possible, but
also to keep the fix simple, let's runtime resume the i2c device in the
->suspend() callback, before continuing to put the device into low power
state.

Note, in cases when the i2c device is attached to the ACPI PM domain, this
problem doesn't occur, because ACPI's ->suspend() callback, assigned to
acpi_subsys_suspend(), already calls pm_runtime_resume() for the device.

It should also be noted that this change does not fix commit 8503ff166504
("i2c: designware: Avoid unnecessary resuming during system suspend"),
because for the non-ACPI case, system sleep support was already broken
prior to that point.

Signed-off-by: Ulf Hansson <ulf.h...@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j...@intel.com>
Tested-by: John Stultz <john....@linaro.org>
Tested-by: Jarkko Nikula <jarkko...@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko...@linux.intel.com>
Reviewed-by: Mika Westerberg <mika.we...@linux.intel.com>
Signed-off-by: Wolfram Sang <w...@the-dreams.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/i2c/busses/i2c-designware-platdrv.c | 14 ++++++++++++--
1 file changed, 12 insertions(+), 2 deletions(-)

--- a/drivers/i2c/busses/i2c-designware-platdrv.c
+++ b/drivers/i2c/busses/i2c-designware-platdrv.c
@@ -392,7 +392,7 @@ static void dw_i2c_plat_complete(struct
#endif

#ifdef CONFIG_PM
-static int dw_i2c_plat_suspend(struct device *dev)
+static int dw_i2c_plat_runtime_suspend(struct device *dev)
{
struct platform_device *pdev = to_platform_device(dev);
struct dw_i2c_dev *i_dev = platform_get_drvdata(pdev);
@@ -414,11 +414,21 @@ static int dw_i2c_plat_resume(struct dev
return 0;
}

+#ifdef CONFIG_PM_SLEEP
+static int dw_i2c_plat_suspend(struct device *dev)
+{
+ pm_runtime_resume(dev);
+ return dw_i2c_plat_runtime_suspend(dev);
+}
+#endif
+
static const struct dev_pm_ops dw_i2c_dev_pm_ops = {
.prepare = dw_i2c_plat_prepare,
.complete = dw_i2c_plat_complete,
SET_SYSTEM_SLEEP_PM_OPS(dw_i2c_plat_suspend, dw_i2c_plat_resume)
- SET_RUNTIME_PM_OPS(dw_i2c_plat_suspend, dw_i2c_plat_resume, NULL)
+ SET_RUNTIME_PM_OPS(dw_i2c_plat_runtime_suspend,
+ dw_i2c_plat_resume,
+ NULL)
};

#define DW_I2C_DEV_PMOPS (&dw_i2c_dev_pm_ops)

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Chunyu Hu <ch...@redhat.com>

commit 475bb3c69ab05df2a6ecef6acc2393703d134180 upstream.

kmemleak reported the below leak when I was clearing the hist
trigger. With this patch, the kmemleak report is gone.

unreferenced object 0xffff94322b63d760 (size 32):
comm "bash", pid 1522, jiffies 4403687962 (age 2442.311s)
hex dump (first 32 bytes):
00 01 00 00 04 00 00 00 08 00 00 00 ff 00 00 00 ................
10 00 00 00 00 00 00 00 80 a8 7a f2 31 94 ff ff ..........z.1...
backtrace:
[<ffffffff9e96c27a>] kmemleak_alloc+0x4a/0xa0
[<ffffffff9e424cba>] kmem_cache_alloc_trace+0xca/0x1d0
[<ffffffff9e377736>] tracing_map_array_alloc+0x26/0x140
[<ffffffff9e261be0>] kretprobe_trampoline+0x0/0x50
[<ffffffff9e38b935>] create_hist_data+0x535/0x750
[<ffffffff9e38bd47>] event_hist_trigger_func+0x1f7/0x420
[<ffffffff9e38893d>] event_trigger_write+0xfd/0x1a0
[<ffffffff9e44dfc7>] __vfs_write+0x37/0x170
[<ffffffff9e44f552>] vfs_write+0xb2/0x1b0
[<ffffffff9e450b85>] SyS_write+0x55/0xc0
[<ffffffff9e203857>] do_syscall_64+0x67/0x150
[<ffffffff9e977ce7>] return_from_SYSCALL_64+0x0/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff
unreferenced object 0xffff9431f27aa880 (size 128):
comm "bash", pid 1522, jiffies 4403687962 (age 2442.311s)
hex dump (first 32 bytes):
00 00 8c 2a 32 94 ff ff 00 f0 8b 2a 32 94 ff ff ...*2......*2...
00 e0 8b 2a 32 94 ff ff 00 d0 8b 2a 32 94 ff ff ...*2......*2...
backtrace:
[<ffffffff9e96c27a>] kmemleak_alloc+0x4a/0xa0
[<ffffffff9e425348>] __kmalloc+0xe8/0x220
[<ffffffff9e3777c1>] tracing_map_array_alloc+0xb1/0x140
[<ffffffff9e261be0>] kretprobe_trampoline+0x0/0x50
[<ffffffff9e38b935>] create_hist_data+0x535/0x750
[<ffffffff9e38bd47>] event_hist_trigger_func+0x1f7/0x420
[<ffffffff9e38893d>] event_trigger_write+0xfd/0x1a0
[<ffffffff9e44dfc7>] __vfs_write+0x37/0x170
[<ffffffff9e44f552>] vfs_write+0xb2/0x1b0
[<ffffffff9e450b85>] SyS_write+0x55/0xc0
[<ffffffff9e203857>] do_syscall_64+0x67/0x150
[<ffffffff9e977ce7>] return_from_SYSCALL_64+0x0/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff

Link: http://lkml.kernel.org/r/1502705898-27571-1-...@redhat.com

Fixes: 08d43a5fa063 ("tracing: Add lock-free tracing_map")
Signed-off-by: Chunyu Hu <ch...@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <ros...@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
kernel/trace/tracing_map.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

--- a/kernel/trace/tracing_map.c
+++ b/kernel/trace/tracing_map.c
@@ -221,16 +221,19 @@ void tracing_map_array_free(struct traci
if (!a)
return;

- if (!a->pages) {
- kfree(a);
- return;
- }
+ if (!a->pages)
+ goto free;

for (i = 0; i < a->n_pages; i++) {
if (!a->pages[i])
break;
free_page((unsigned long)a->pages[i]);
}
+
+ kfree(a->pages);
+
+ free:
+ kfree(a);
}

struct tracing_map_array *tracing_map_array_alloc(unsigned int n_elts,
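
The shape of the fixed free routine can be exercised outside the kernel. The following is a sketch, not the tracing_map code: struct map_array, map_array_alloc(), map_array_free() and the counted_alloc()/counted_free() wrappers are hypothetical, and calloc()/free() stand in for the page and slab allocators. The counter makes a leak of the pointer array visible.

```c
#include <stdlib.h>

struct map_array {
	unsigned int n_pages;
	void **pages;
};

static int live_allocs;	/* number of allocations not yet freed */

static void *counted_alloc(size_t size)
{
	void *p = calloc(1, size);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

int live_allocation_count(void)
{
	return live_allocs;
}

/* Same structure as the patched function: pages, then the array, then the struct. */
void map_array_free(struct map_array *a)
{
	unsigned int i;

	if (!a)
		return;

	if (!a->pages)
		goto free_array;

	for (i = 0; i < a->n_pages; i++) {
		if (!a->pages[i])
			break;
		counted_free(a->pages[i]);
	}

	counted_free(a->pages);	/* the array of pointers itself */

free_array:
	counted_free(a);	/* the containing structure */
}

struct map_array *map_array_alloc(unsigned int n_pages)
{
	struct map_array *a = counted_alloc(sizeof(*a));
	unsigned int i;

	if (!a)
		return NULL;
	a->n_pages = n_pages;
	a->pages = counted_alloc(n_pages * sizeof(*a->pages));
	if (!a->pages)
		goto fail;
	for (i = 0; i < n_pages; i++) {
		a->pages[i] = counted_alloc(64);
		if (!a->pages[i])
			goto fail;
	}
	return a;
fail:
	map_array_free(a);
	return NULL;
}
```

Dropping either of the last two counted_free() calls leaves live_allocation_count() above zero after a free, which corresponds to the two unreferenced objects in the kmemleak report.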

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>


[ Upstream commit 187e5b3ac84d3421d2de3aca949b2791fbcad554 ]

If fi->fib_metrics could not be allocated in fib_create_info()
we attempt to dereference a NULL pointer in free_fib_info_rcu() :

m = fi->fib_metrics;
if (m != &dst_default_metrics && atomic_dec_and_test(&m->refcnt))
kfree(m);

Before my recent patch, we used to call kfree(NULL) and nothing wrong
happened.

Instead of using RCU to defer freeing while we are under memory stress,
it seems better to take immediate action.

This was reported by syzkaller team.

Fixes: 3fb07daff8e9 ("ipv4: add reference counting to metrics")
Signed-off-by: Eric Dumazet <edum...@google.com>
Reported-by: Dmitry Vyukov <dvy...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
net/ipv4/fib_semantics.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)

--- a/net/ipv4/fib_semantics.c
+++ b/net/ipv4/fib_semantics.c
@@ -1033,15 +1033,17 @@ struct fib_info *fib_create_info(struct
fi = kzalloc(sizeof(*fi)+nhs*sizeof(struct fib_nh), GFP_KERNEL);
if (!fi)
goto failure;
- fib_info_cnt++;
if (cfg->fc_mx) {
fi->fib_metrics = kzalloc(sizeof(*fi->fib_metrics), GFP_KERNEL);
- if (!fi->fib_metrics)
- goto failure;
+ if (unlikely(!fi->fib_metrics)) {
+ kfree(fi);
+ return ERR_PTR(err);
+ }
atomic_set(&fi->fib_metrics->refcnt, 1);
- } else
+ } else {
fi->fib_metrics = (struct dst_metrics *)&dst_default_metrics;
-
+ }
+ fib_info_cnt++;
fi->fib_net = net;
fi->fib_protocol = cfg->fc_protocol;
fi->fib_scope = cfg->fc_scope;

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alex Deucher <alexd...@gmail.com>

This reverts commit 2dc1889ebf8501b0edf125e89a30e1cf3744a2a7.

Fixes a suspend and resume regression.

bug: https://bugzilla.kernel.org/show_bug.cgi?id=196615
Signed-off-by: Alex Deucher <alexande...@amd.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c | 2 --
1 file changed, 2 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cgs.c
@@ -839,8 +839,6 @@ static int amdgpu_cgs_get_active_display

mode_info = info->mode_info;
if (mode_info) {
- /* if the displays are off, vblank time is max */
- mode_info->vblank_time_us = 0xffffffff;
/* always set the reference clock */
mode_info->ref_clock = adev->clock.spll.reference_freq;
}

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Heiko Carstens <heiko.c...@de.ibm.com>

commit 857b8de96795646c5891cf44ae6fb19b9ff74bf9 upstream.

sthyi should only generate a specification exception if the function
code is zero and the response buffer is not on a 4k boundary.

The current code would also test for unknown function codes if the
response buffer, that is currently only defined for function code 0,
is not on a 4k boundary and incorrectly inject a specification
exception instead of returning with condition code 3 and return code 4
(unsupported function code).

Fix this by moving the boundary check.

Fixes: 95ca2cb57985 ("KVM: s390: Add sthyi emulation")
Reviewed-by: Janosch Frank <fra...@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.c...@de.ibm.com>
Reviewed-by: David Hildenbrand <da...@redhat.com>
Reviewed-by: Cornelia Huck <coh...@redhat.com>
Signed-off-by: Christian Borntraeger <bornt...@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/s390/kvm/sthyi.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

--- a/arch/s390/kvm/sthyi.c
+++ b/arch/s390/kvm/sthyi.c
@@ -425,7 +425,7 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
VCPU_EVENT(vcpu, 3, "STHYI: fc: %llu addr: 0x%016llx", code, addr);
trace_kvm_s390_handle_sthyi(vcpu, code, addr);

- if (reg1 == reg2 || reg1 & 1 || reg2 & 1 || addr & ~PAGE_MASK)
+ if (reg1 == reg2 || reg1 & 1 || reg2 & 1)
return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);

if (code & 0xffff) {
@@ -433,6 +433,9 @@ int handle_sthyi(struct kvm_vcpu *vcpu)
goto out;
}

+ if (addr & ~PAGE_MASK)
+ return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
+
/*
* If the page has not yet been faulted in, we want to do that
* now and not after all the expensive calculations.
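
The effect of moving the boundary check is easiest to see as a pure function of the inputs. A sketch under stated assumptions: sthyi_check() and the enum are hypothetical, a 4k page size is hard-coded, and only the order of the three tests is taken from the diff above.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096ULL	/* 4k response buffer alignment */

enum sthyi_outcome {
	STHYI_PROCEED,			/* continue with function code 0 */
	STHYI_SPECIFICATION_EXCEPTION,
	STHYI_UNSUPPORTED_FC		/* condition code 3, return code 4 */
};

enum sthyi_outcome sthyi_check(int reg1, int reg2, uint64_t code, uint64_t addr)
{
	/* Register constraints apply to every function code. */
	if (reg1 == reg2 || (reg1 & 1) || (reg2 & 1))
		return STHYI_SPECIFICATION_EXCEPTION;

	/* Unknown function codes are rejected before the buffer is looked at. */
	if (code & 0xffff)
		return STHYI_UNSUPPORTED_FC;

	/* Only function code 0 defines a response buffer, so check alignment last. */
	if (addr & (SKETCH_PAGE_SIZE - 1))
		return STHYI_SPECIFICATION_EXCEPTION;

	return STHYI_PROCEED;
}
```

With the old ordering, an unknown function code combined with a misaligned address would have produced a specification exception; with the new ordering it yields the unsupported-function-code result.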

Greg Kroah-Hartman

Aug 28, 2017, 5:20:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andreas Born <futur...@googlemail.com>


[ Upstream commit 11e9d7829dd08dbafb24517fe922f11c3a8a9dc2 ]

bond_miimon_commit() handles the UP transition for each slave of a bond
in the case of MII. It is triggered 10 times per second for the default
MII Polling interval of 100ms. For device drivers that do not implement
__ethtool_get_link_ksettings() the call to bond_update_speed_duplex()
fails persistently while the MII status could remain UP. That is, in
this and other cases where the speed/duplex update keeps failing over a
longer period of time while the MII state is UP, a warning is printed
every MII polling interval.

To address these excessive warnings net_ratelimit() should be used.
Printing a warning once would not be sufficient since the call to
bond_update_speed_duplex() could recover to succeed and fail again
later. In that case there would be no new indication what went wrong.

Fixes: b5bf0f5b16b9c (bonding: correctly update link status during mii-commit phase)
Signed-off-by: Andreas Born <futur...@googlemail.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
drivers/net/bonding/bond_main.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -2141,9 +2141,10 @@ static void bond_miimon_commit(struct bo
if (bond_update_speed_duplex(slave) &&
bond_needs_speed_duplex(bond)) {
slave->link = BOND_LINK_DOWN;
- netdev_warn(bond->dev,
- "failed to get link speed/duplex for %s\n",
- slave->dev->name);
+ if (net_ratelimit())
+ netdev_warn(bond->dev,
+ "failed to get link speed/duplex for %s\n",
+ slave->dev->name);
continue;
}
bond_set_slave_link_state(slave, BOND_LINK_UP,
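
To illustrate why rate limiting fits better than printing once, here is a simplified window/burst limiter. It is not the implementation behind net_ratelimit(); struct ratelimit, ratelimit_allow(), count_allowed() and the millisecond clock passed in by the caller are all assumptions made for this sketch.

```c
struct ratelimit {
	long interval;		/* window length, in caller-defined time units */
	int burst;		/* messages allowed per window */
	long window_start;	/* start of the current window */
	int printed;		/* messages already allowed in this window */
};

/* Return 1 if a message may be printed at time "now", 0 if it is suppressed. */
int ratelimit_allow(struct ratelimit *rl, long now)
{
	if (now - rl->window_start >= rl->interval) {
		/* A new window begins: the budget is refilled. */
		rl->window_start = now;
		rl->printed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return 1;
	}
	return 0;
}

/* Simulate n polling rounds, "step" apart, that all want to warn. */
int count_allowed(struct ratelimit *rl, long start, long step, int n)
{
	int i, allowed = 0;

	for (i = 0; i < n; i++)
		allowed += ratelimit_allow(rl, start + i * step);
	return allowed;
}
```

A persistent failure polled every 100 time units only gets "burst" warnings per window, yet a failure that recovers and comes back in a later window is reported again, which a print-once approach would miss.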

Greg Kroah-Hartman

Aug 28, 2017, 5:20:09 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Biggers <ebig...@google.com>

commit 263630e8d176d87308481ebdcd78ef9426739c6b upstream.

If madvise(..., MADV_FREE) split a transparent hugepage, it called
put_page() before unlock_page().

This was wrong because put_page() can free the page, e.g. if a
concurrent madvise(..., MADV_DONTNEED) has removed it from the memory
mapping. put_page() then rightfully complained about freeing a locked
page.

Fix this by moving the unlock_page() before put_page().

This bug was found by syzkaller, which encountered the following splat:

BUG: Bad page state in process syzkaller412798 pfn:1bd800
page:ffffea0006f60000 count:0 mapcount:0 mapping: (null) index:0x20a00
flags: 0x200000000040019(locked|uptodate|dirty|swapbacked)
raw: 0200000000040019 0000000000000000 0000000000020a00 00000000ffffffff
raw: ffffea0006f60020 ffffea0006f60020 0000000000000000 0000000000000000
page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
bad because of flags: 0x1(locked)
Modules linked in:
CPU: 1 PID: 3037 Comm: syzkaller412798 Not tainted 4.13.0-rc5+ #35
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
bad_page+0x230/0x2b0 mm/page_alloc.c:565
free_pages_check_bad+0x1f0/0x2e0 mm/page_alloc.c:943
free_pages_check mm/page_alloc.c:952 [inline]
free_pages_prepare mm/page_alloc.c:1043 [inline]
free_pcp_prepare mm/page_alloc.c:1068 [inline]
free_hot_cold_page+0x8cf/0x12b0 mm/page_alloc.c:2584
__put_single_page mm/swap.c:79 [inline]
__put_page+0xfb/0x160 mm/swap.c:113
put_page include/linux/mm.h:814 [inline]
madvise_free_pte_range+0x137a/0x1ec0 mm/madvise.c:371
walk_pmd_range mm/pagewalk.c:50 [inline]
walk_pud_range mm/pagewalk.c:108 [inline]
walk_p4d_range mm/pagewalk.c:134 [inline]
walk_pgd_range mm/pagewalk.c:160 [inline]
__walk_page_range+0xc3a/0x1450 mm/pagewalk.c:249
walk_page_range+0x200/0x470 mm/pagewalk.c:326
madvise_free_page_range.isra.9+0x17d/0x230 mm/madvise.c:444
madvise_free_single_vma+0x353/0x580 mm/madvise.c:471
madvise_dontneed_free mm/madvise.c:555 [inline]
madvise_vma mm/madvise.c:664 [inline]
SYSC_madvise mm/madvise.c:832 [inline]
SyS_madvise+0x7d3/0x13c0 mm/madvise.c:760
entry_SYSCALL_64_fastpath+0x1f/0xbe

Here is a C reproducer:

#define _GNU_SOURCE
#include <pthread.h>
#include <sys/mman.h>
#include <unistd.h>

#define MADV_FREE 8
#define PAGE_SIZE 4096

static void *mapping;
static const size_t mapping_size = 0x1000000;

static void *madvise_thrproc(void *arg)
{
madvise(mapping, mapping_size, (long)arg);
}

int main(void)
{
pthread_t t[2];

for (;;) {
mapping = mmap(NULL, mapping_size, PROT_WRITE,
MAP_POPULATE|MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);

munmap(mapping + mapping_size / 2, PAGE_SIZE);

pthread_create(&t[0], 0, madvise_thrproc, (void*)MADV_DONTNEED);
pthread_create(&t[1], 0, madvise_thrproc, (void*)MADV_FREE);
pthread_join(t[0], NULL);
pthread_join(t[1], NULL);
munmap(mapping, mapping_size);
}
}

Note: to see the splat, CONFIG_TRANSPARENT_HUGEPAGE=y and
CONFIG_DEBUG_VM=y are needed.

Google Bug Id: 64696096

Link: http://lkml.kernel.org/r/20170823205235.1...@gmail.com
Fixes: 854e9ed09ded ("mm: support madvise(MADV_FREE)")
Signed-off-by: Eric Biggers <ebig...@google.com>
Acked-by: David Rientjes <rien...@google.com>
Acked-by: Minchan Kim <min...@kernel.org>
Acked-by: Michal Hocko <mho...@suse.com>
Cc: Dmitry Vyukov <dvy...@google.com>
Cc: Hugh Dickins <hu...@google.com>
Cc: Andrea Arcangeli <aarc...@redhat.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
mm/madvise.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -368,8 +368,8 @@ static int madvise_free_pte_range(pmd_t
pte_offset_map_lock(mm, pmd, addr, &ptl);
goto out;
}
- put_page(page);
unlock_page(page);
+ put_page(page);
pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
pte--;
addr -= PAGE_SIZE;
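
The ordering problem can be modelled with a reference count and a lock flag. This is a toy model, assuming hypothetical names throughout (struct page_like, put_obj(), unlock_obj(), release_buggy(), release_fixed()); the object is never really freed, the model only records whether the final reference was dropped while the lock flag was still set.

```c
struct page_like {
	int refcount;
	int locked;
};

static int freed_while_locked;

/* Drop one reference; note if the last reference goes away on a locked object. */
static void put_obj(struct page_like *p)
{
	if (--p->refcount == 0 && p->locked)
		freed_while_locked = 1;
}

static void unlock_obj(struct page_like *p)
{
	p->locked = 0;
}

/* Old order: put first, then unlock. Returns 1 if a locked object was "freed". */
int release_buggy(struct page_like *p)
{
	freed_while_locked = 0;
	put_obj(p);
	unlock_obj(p);
	return freed_while_locked;
}

/* New order: unlock first, then put. */
int release_fixed(struct page_like *p)
{
	freed_while_locked = 0;
	unlock_obj(p);
	put_obj(p);
	return freed_while_locked;
}
```

The buggy order only misbehaves when the caller holds the last reference, which matches the changelog: a concurrent MADV_DONTNEED has to remove the other reference first, so the problem shows up as a race.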

Greg Kroah-Hartman

Aug 28, 2017, 5:20:11 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Kirill A. Shutemov <kirill....@linux.intel.com>

commit 435c0b87d661da83771c30ed775f7c37eed193fb upstream.

/sys/kernel/mm/transparent_hugepage/shmem_enabled controls whether we
allocate huge pages when allocating pages for the private in-kernel shmem
mount.

Unfortunately, as Dan noticed, I've screwed it up and the only way to
make the kernel allocate huge pages for the mount is to use "force" there.
All other values will be effectively ignored.

Link: http://lkml.kernel.org/r/20170822144254.6643...@linux.intel.com
Fixes: 5a6e75f8110c ("shmem: prepare huge= mount option and sysfs knob")
Signed-off-by: Kirill A. Shutemov <kirill....@linux.intel.com>
Reported-by: Dan Carpenter <dan.ca...@oracle.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
mm/shmem.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -3964,7 +3964,7 @@ int __init shmem_init(void)
}

#ifdef CONFIG_TRANSPARENT_HUGE_PAGECACHE
- if (has_transparent_hugepage() && shmem_huge < SHMEM_HUGE_DENY)
+ if (has_transparent_hugepage() && shmem_huge > SHMEM_HUGE_DENY)
SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
else
shmem_huge = 0; /* just in case it was patched */
@@ -4025,7 +4025,7 @@ static ssize_t shmem_enabled_store(struc
return -EINVAL;

shmem_huge = huge;
- if (shmem_huge < SHMEM_HUGE_DENY)
+ if (shmem_huge > SHMEM_HUGE_DENY)
SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
return count;
}
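
The inverted comparison is easy to check by enumerating the possible settings. In the sketch below the numeric values of the SHMEM_HUGE_* constants are written from memory of mm/shmem.c and should be treated as an assumption; old_check() and new_check() are hypothetical helpers that mirror the two predicates from the diff.

```c
/* Assumed values; verify against mm/shmem.c before relying on them. */
#define SHMEM_HUGE_NEVER	0
#define SHMEM_HUGE_ALWAYS	1
#define SHMEM_HUGE_WITHIN_SIZE	2
#define SHMEM_HUGE_ADVISE	3
#define SHMEM_HUGE_DENY		(-1)
#define SHMEM_HUGE_FORCE	(-2)

/* Predicate before the patch: is the setting propagated to the mount? */
int old_check(int shmem_huge)
{
	return shmem_huge < SHMEM_HUGE_DENY;
}

/* Predicate after the patch. */
int new_check(int shmem_huge)
{
	return shmem_huge > SHMEM_HUGE_DENY;
}
```

Under these assumed values the old predicate is true only for the "force" setting, which agrees with the changelog, while the new one is true for the four non-negative settings and false for "deny" and "force".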

Greg Kroah-Hartman

Aug 28, 2017, 5:20:11 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: KT Liao <kt....@emc.com.tw>

commit 1d2226e45040ed4aee95b633cbd64702bf7fc2a1 upstream.

Add ELAN0602 to the list of known ACPI IDs to enable support for ELAN
touchpads found in Lenovo Yoga310.

Signed-off-by: KT Liao <kt....@emc.com.tw>
Signed-off-by: Dmitry Torokhov <dmitry....@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
drivers/input/mouse/elan_i2c_core.c | 1 +
1 file changed, 1 insertion(+)

--- a/drivers/input/mouse/elan_i2c_core.c
+++ b/drivers/input/mouse/elan_i2c_core.c
@@ -1223,6 +1223,7 @@ static const struct acpi_device_id elan_
{ "ELAN0000", 0 },
{ "ELAN0100", 0 },
{ "ELAN0600", 0 },
+ { "ELAN0602", 0 },
{ "ELAN0605", 0 },
{ "ELAN0608", 0 },
{ "ELAN0605", 0 },

Greg Kroah-Hartman

Aug 28, 2017, 5:20:15 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>


[ Upstream commit 7749d4ff88d31b0be17c8683143135adaaadc6a7 ]

syzkaller reported that DCCP could have a non-empty
write queue at dismantle time.

WARNING: CPU: 1 PID: 2953 at net/core/stream.c:199 sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
Kernel panic - not syncing: panic_on_warn set ...

CPU: 1 PID: 2953 Comm: syz-executor0 Not tainted 4.13.0-rc4+ #2
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
panic+0x1e4/0x417 kernel/panic.c:180
__warn+0x1c4/0x1d9 kernel/panic.c:541
report_bug+0x211/0x2d0 lib/bug.c:183
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:846
RIP: 0010:sk_stream_kill_queues+0x3ce/0x520 net/core/stream.c:199
RSP: 0018:ffff8801d182f108 EFLAGS: 00010297
RAX: ffff8801d1144140 RBX: ffff8801d13cb280 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff85137b00 RDI: ffff8801d13cb280
RBP: ffff8801d182f148 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff8801d13cb4d0
R13: ffff8801d13cb3b8 R14: ffff8801d13cb300 R15: ffff8801d13cb3b8
inet_csk_destroy_sock+0x175/0x3f0 net/ipv4/inet_connection_sock.c:835
dccp_close+0x84d/0xc10 net/dccp/proto.c:1067
inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
sock_release+0x8d/0x1e0 net/socket.c:597
sock_close+0x16/0x20 net/socket.c:1126
__fput+0x327/0x7e0 fs/file_table.c:210
____fput+0x15/0x20 fs/file_table.c:246
task_work_run+0x18a/0x260 kernel/task_work.c:116
exit_task_work include/linux/task_work.h:21 [inline]
do_exit+0xa32/0x1b10 kernel/exit.c:865
do_group_exit+0x149/0x400 kernel/exit.c:969
get_signal+0x7e8/0x17e0 kernel/signal.c:2330
do_signal+0x94/0x1ee0 arch/x86/kernel/signal.c:808
exit_to_usermode_loop+0x21c/0x2d0 arch/x86/entry/common.c:157
prepare_exit_to_usermode arch/x86/entry/common.c:194 [inline]
syscall_return_slowpath+0x3a7/0x450 arch/x86/entry/common.c:263

Signed-off-by: Eric Dumazet <edum...@google.com>
Reported-by: Dmitry Vyukov <dvy...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
net/dccp/proto.c | 5 +----
1 file changed, 1 insertion(+), 4 deletions(-)

--- a/net/dccp/proto.c
+++ b/net/dccp/proto.c
@@ -201,10 +201,7 @@ void dccp_destroy_sock(struct sock *sk)
{
struct dccp_sock *dp = dccp_sk(sk);

- /*
- * DCCP doesn't use sk_write_queue, just sk_send_head
- * for retransmissions
- */
+ __skb_queue_purge(&sk->sk_write_queue);
if (sk->sk_send_head != NULL) {
kfree_skb(sk->sk_send_head);
sk->sk_send_head = NULL;

Greg Kroah-Hartman

Aug 28, 2017, 5:20:15 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Ross Zwisler <ross.z...@linux.intel.com>

commit fffa281b48a91ad6dac1a18c5907ece58fa3879b upstream.

In DAX there are two separate places where the 2MiB range of a PMD is
defined.

The first is in the page tables, where a PMD mapping inserted for a
given address spans from (vmf->address & PMD_MASK) to ((vmf->address &
PMD_MASK) + PMD_SIZE - 1). That is, from the 2MiB boundary below the
address to the 2MiB boundary above the address.

So, for example, a fault at address 3MiB (0x30 0000) falls within the
PMD that ranges from 2MiB (0x20 0000) to 4MiB (0x40 0000).

The second PMD range is in the mapping->page_tree, where a given file
offset is covered by a radix tree entry that spans from one 2MiB aligned
file offset to another 2MiB aligned file offset.

So, for example, the file offset for 3MiB (pgoff 768) falls within the
PMD range for the order 9 radix tree entry that ranges from 2MiB (pgoff
512) to 4MiB (pgoff 1024).

This system works so long as the addresses and file offsets for a given
mapping both have the same offsets relative to the start of each PMD.

Consider the case where the starting address for a given file isn't 2MiB
aligned - say our faulting address is 3 MiB (0x30 0000), but that
corresponds to the beginning of our file (pgoff 0). Now all the PMDs in
the mapping are misaligned so that the 2MiB range defined in the page
tables never matches up with the 2MiB range defined in the radix tree.

The current code notices this case for DAX faults to storage with the
following test in dax_pmd_insert_mapping():

if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR)
goto unlock_fallback;

This test makes sure that the pfn we get from the driver is 2MiB
aligned, and relies on the assumption that the 2MiB alignment of the pfn
we get back from the driver matches the 2MiB alignment of the faulting
address.

However, faults to holes were not checked and we could hit the problem
described above.

This was reported in response to the NVML nvml/src/test/pmempool_sync
TEST5:

$ cd nvml/src/test/pmempool_sync
$ make TEST5

You can grab NVML here:

https://github.com/pmem/nvml/

The dmesg warning you see when you hit this error is:

WARNING: CPU: 13 PID: 2900 at fs/dax.c:641 dax_insert_mapping_entry+0x2df/0x310

Where we notice in dax_insert_mapping_entry() that the radix tree entry
we are about to replace doesn't match the locked entry that we had
previously inserted into the tree. This happens because the initial
insertion was done in grab_mapping_entry() using a pgoff calculated from
the faulting address (vmf->address), and the replacement in
dax_pmd_load_hole() => dax_insert_mapping_entry() is done using
vmf->pgoff.

In our failure case those two page offsets (one calculated from
vmf->address, one using vmf->pgoff) point to different order 9 radix
tree entries.

This failure case can result in a deadlock because the radix tree unlock
also happens on the pgoff calculated from vmf->address. This means that
the locked radix tree entry that we swapped in to the tree in
dax_insert_mapping_entry() using vmf->pgoff is never unlocked, so all
future faults to that 2MiB range will block forever.

Fix this by validating that the faulting address's PMD offset matches
the PMD offset from the start of the file. This check is done at the
very beginning of the fault and covers faults that would have mapped to
storage as well as faults to holes. I left the COLOUR check in
dax_pmd_insert_mapping() in place in case we ever hit the insanity
condition where the alignment of the pfn we get from the driver doesn't
match the alignment of the userspace address.
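
As an illustration only, the colour comparison can be tried in userspace with the numbers from the examples above. This sketch assumes 4KiB pages and 2MiB PMDs; pmd_colour_matches() is a made-up helper name, not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed geometry: 4KiB pages and 2MiB PMDs. */
#define PAGE_SHIFT 12
#define PMD_SHIFT 21
/* Pages per PMD minus one: 511, i.e. the low 9 bits of a page offset. */
#define PG_PMD_COLOUR ((UINT64_C(1) << (PMD_SHIFT - PAGE_SHIFT)) - 1)

/* Hypothetical helper: true when the faulting address and the file offset
 * sit at the same position inside their respective 2MiB ranges. */
static bool pmd_colour_matches(uint64_t address, uint64_t pgoff)
{
    assert(PG_PMD_COLOUR == 511);
    return (pgoff & PG_PMD_COLOUR) ==
           ((address >> PAGE_SHIFT) & PG_PMD_COLOUR);
}
```

With address 3MiB and pgoff 768 both colours are 256, so a PMD can be used. With address 3MiB and pgoff 0 the colours are 256 and 0, which is the misaligned case that now falls back to PTEs.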

Link: http://lkml.kernel.org/r/20170822222436.18...@linux.intel.com
Signed-off-by: Ross Zwisler <ross.z...@linux.intel.com>
Reported-by: "Slusarz, Marcin" <marcin....@intel.com>
Reviewed-by: Jan Kara <ja...@suse.cz>
Cc: Alexander Viro <vi...@zeniv.linux.org.uk>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Dan Williams <dan.j.w...@intel.com>
Cc: Dave Chinner <da...@fromorbit.com>
Cc: Matthew Wilcox <mawi...@microsoft.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
fs/dax.c | 10 ++++++++++
1 file changed, 10 insertions(+)

--- a/fs/dax.c
+++ b/fs/dax.c
@@ -1380,6 +1380,16 @@ static int dax_iomap_pmd_fault(struct vm

trace_dax_pmd_fault(inode, vmf, max_pgoff, 0);

+ /*
+ * Make sure that the faulting address's PMD offset (color) matches
+ * the PMD offset from the start of the file. This is necessary so
+ * that a PMD range in the page table overlaps exactly with a PMD
+ * range in the radix tree.
+ */
+ if ((vmf->pgoff & PG_PMD_COLOUR) !=
+ ((vmf->address >> PAGE_SHIFT) & PG_PMD_COLOUR))
+ goto fallback;
+
/* Fall back to PTEs if we're going to COW */
if (write && !(vma->vm_flags & VM_SHARED))
goto fallback;

Greg Kroah-Hartman

Aug 28, 2017, 5:20:15 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Alexey Brodkin <Alexey....@synopsys.com>

commit 7d79cee2c6540ea64dd917a14e2fd63d4ac3d3c0 upstream.

It is necessary to explicitly set both SLC_AUX_RGN_START1 and SLC_AUX_RGN_END1,
which hold the MSB bits of the physical address of the region start and end
respectively; otherwise the SLC region operation is executed in an
unpredictable manner.

Without this patch, SLC flushes on HSDK (IOC disabled) were taking
seconds.
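
A rough userspace sketch of how the region bounds split into the four register values. The struct and slc_region_split() are invented for illustration; the real code writes aux registers and only touches the START1/END1 pair when PAE40 is enabled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the four SLC region registers. */
struct slc_region_regs {
    uint32_t end1;   /* upper bits of region end */
    uint32_t end;    /* lower 32 bits of region end */
    uint32_t start1; /* upper bits of region start */
    uint32_t start;  /* lower 32 bits of region start */
};

/* Compute the values that would be written for a region operation.
 * END cannot equal START, hence the (l2_line_sz - 1) added to the size. */
static struct slc_region_regs slc_region_split(uint64_t paddr, uint64_t sz,
                                               uint32_t l2_line_sz)
{
    struct slc_region_regs r;
    uint64_t end;

    assert(l2_line_sz != 0);
    end = paddr + sz + l2_line_sz - 1;

    r.end1 = (uint32_t)(end >> 32);
    r.end = (uint32_t)(end & 0xFFFFFFFFu);
    r.start1 = (uint32_t)(paddr >> 32);
    r.start = (uint32_t)(paddr & 0xFFFFFFFFu);
    return r;
}
```

For a region that crosses a 4GiB boundary the upper words of start and end differ, so a stale value in either upper register describes a very different region from the one intended.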

Reported-by: Vladimir Kondratiev <vladimir....@intel.com>
Signed-off-by: Alexey Brodkin <abro...@synopsys.com>
Signed-off-by: Vineet Gupta <vgu...@synopsys.com>
[vgupta: PAR40 regs only written if PAE40 exist]
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/arc/include/asm/cache.h | 2 ++
arch/arc/mm/cache.c | 13 +++++++++++--
2 files changed, 13 insertions(+), 2 deletions(-)

--- a/arch/arc/include/asm/cache.h
+++ b/arch/arc/include/asm/cache.h
@@ -96,7 +96,9 @@ extern unsigned long perip_base, perip_e
#define ARC_REG_SLC_FLUSH 0x904
#define ARC_REG_SLC_INVALIDATE 0x905
#define ARC_REG_SLC_RGN_START 0x914
+#define ARC_REG_SLC_RGN_START1 0x915
#define ARC_REG_SLC_RGN_END 0x916
+#define ARC_REG_SLC_RGN_END1 0x917

/* Bit val in SLC_CONTROL */
#define SLC_CTRL_DIS 0x001
--- a/arch/arc/mm/cache.c
+++ b/arch/arc/mm/cache.c
@@ -665,6 +665,7 @@ noinline void slc_op(phys_addr_t paddr,
static DEFINE_SPINLOCK(lock);
unsigned long flags;
unsigned int ctrl;
+ phys_addr_t end;

spin_lock_irqsave(&lock, flags);

@@ -694,8 +695,16 @@ noinline void slc_op(phys_addr_t paddr,
* END needs to be setup before START (latter triggers the operation)
* END can't be same as START, so add (l2_line_sz - 1) to sz
*/
- write_aux_reg(ARC_REG_SLC_RGN_END, (paddr + sz + l2_line_sz - 1));
- write_aux_reg(ARC_REG_SLC_RGN_START, paddr);
+ end = paddr + sz + l2_line_sz - 1;
+ if (is_pae40_enabled())
+ write_aux_reg(ARC_REG_SLC_RGN_END1, upper_32_bits(end));
+
+ write_aux_reg(ARC_REG_SLC_RGN_END, lower_32_bits(end));
+
+ if (is_pae40_enabled())
+ write_aux_reg(ARC_REG_SLC_RGN_START1, upper_32_bits(paddr));
+
+ write_aux_reg(ARC_REG_SLC_RGN_START, lower_32_bits(paddr));

/* Make sure "busy" bit reports correct stataus, see STAR 9001165532 */
read_aux_reg(ARC_REG_SLC_CTRL);

Greg Kroah-Hartman

Aug 28, 2017, 5:20:15 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Steven Rostedt (VMware) <ros...@goodmis.org>

commit 8b0db1a5bdfcee0dbfa89607672598ae203c9045 upstream.

Performing the following task with kmemleak enabled:

# cd /sys/kernel/tracing/events/irq/irq_handler_entry/
# echo 'enable_event:kmem:kmalloc:3 if irq >' > trigger
# echo 'enable_event:kmem:kmalloc:3 if irq > 31' > trigger
# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800b9290308 (size 32):
comm "bash", pid 1114, jiffies 4294848451 (age 141.139s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81cef5aa>] kmemleak_alloc+0x4a/0xa0
[<ffffffff81357938>] kmem_cache_alloc_trace+0x158/0x290
[<ffffffff81261c09>] create_filter_start.constprop.28+0x99/0x940
[<ffffffff812639c9>] create_filter+0xa9/0x160
[<ffffffff81263bdc>] create_event_filter+0xc/0x10
[<ffffffff812655e5>] set_trigger_filter+0xe5/0x210
[<ffffffff812660c4>] event_enable_trigger_func+0x324/0x490
[<ffffffff812652e2>] event_trigger_write+0x1a2/0x260
[<ffffffff8138cf87>] __vfs_write+0xd7/0x380
[<ffffffff8138f421>] vfs_write+0x101/0x260
[<ffffffff8139187b>] SyS_write+0xab/0x130
[<ffffffff81cfd501>] entry_SYSCALL_64_fastpath+0x1f/0xbe
[<ffffffffffffffff>] 0xffffffffffffffff

The function create_filter() is passed a 'filterp' pointer that gets
allocated, and if "set_str" is true, it is up to the caller to free it, even
on error. The problem is that the pointer is not freed by create_filter()
when set_str is false. This is a bug, and it is not up to the caller to free
the filter on error if it doesn't care about the string.
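
The ownership rule can be sketched in plain C. Everything here is a simplified stand-in: parse_sketch() is a hypothetical parser that rejects an expression ending in an operator, and the struct holds nothing but an error string.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

struct filter_sketch {
    const char *err; /* set only when the caller asked for an error string */
};

/* Hypothetical parser: an empty or operator-terminated expression fails. */
static int parse_sketch(const char *expr)
{
    size_t n = strlen(expr);
    return (n == 0 || expr[n - 1] == '>') ? -1 : 0;
}

static int create_filter_sketch(const char *expr, bool set_str,
                                struct filter_sketch **filterp)
{
    struct filter_sketch *filter;
    int err;

    assert(expr != NULL && filterp != NULL);
    filter = calloc(1, sizeof(*filter));
    if (!filter) {
        *filterp = NULL;
        return -1;
    }

    err = parse_sketch(expr);
    if (err && set_str)
        filter->err = "parse error"; /* caller keeps the filter to read this */
    if (err && !set_str) {
        free(filter);                /* nobody will read it, so free it here */
        filter = NULL;
    }

    *filterp = filter;
    return err;
}
```

A caller passing set_str == false gets NULL back on error and has nothing to free, which matches what the trigger code path above appears to expect.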

Link: http://lkml.kernel.org/r/1502705898-27571-2-...@redhat.com

Fixes: 38b78eb85 ("tracing: Factorize filter creation")
Reported-by: Chunyu Hu <ch...@redhat.com>
Tested-by: Chunyu Hu <ch...@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <ros...@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
kernel/trace/trace_events_filter.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/kernel/trace/trace_events_filter.c
+++ b/kernel/trace/trace_events_filter.c
@@ -1959,6 +1959,10 @@ static int create_filter(struct trace_ev
if (err && set_str)
append_filter_err(ps, filter);
}
+ if (err && !set_str) {
+ free_event_filter(filter);
+ filter = NULL;
+ }
create_filter_finish(ps);

*filterp = filter;

Greg Kroah-Hartman

Aug 28, 2017, 5:20:15 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Heiko Carstens <heiko.c...@de.ibm.com>

commit 4a4eefcd0e49f9f339933324c1bde431186a0a7d upstream.

The sthyi inline assembly misses register r3 within the clobber
list. The sthyi instruction will always write a return code to
register "R2+1", which in this case would be r3. Due to that we may
have register corruption and see host crashes or data corruption
depending on how gcc decided to allocate and use registers during
compile time.

Fixes: 95ca2cb57985 ("KVM: s390: Add sthyi emulation")
Reviewed-by: Janosch Frank <fra...@linux.vnet.ibm.com>
Signed-off-by: Heiko Carstens <heiko.c...@de.ibm.com>
Reviewed-by: David Hildenbrand <da...@redhat.com>
Reviewed-by: Cornelia Huck <coh...@redhat.com>
Signed-off-by: Christian Borntraeger <bornt...@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

---
arch/s390/kvm/sthyi.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/s390/kvm/sthyi.c
+++ b/arch/s390/kvm/sthyi.c
@@ -394,7 +394,7 @@ static int sthyi(u64 vaddr)
"srl %[cc],28\n"
: [cc] "=d" (cc)
: [code] "d" (code), [addr] "a" (addr)
- : "memory", "cc");
+ : "3", "memory", "cc");
return cc;
}

Greg Kroah-Hartman

Aug 28, 2017, 5:30:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: John Fastabend <john.fa...@gmail.com>


[ Upstream commit 43188702b3d98d2792969a3377a30957f05695e6 ]

Currently the verifier does not track imm across alu operations when
the source register is of unknown type. This adds additional pattern
matching to catch this and track imm. We've seen LLVM generating this
pattern while working on cilium.
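
Going by the comments in the patch, imm on an unknown-value register appears to count the upper bits known to be zero, while the destination holds a constant. Under that reading the three tracked cases reduce to the arithmetic below; the helper names are invented for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the highest set bit; v must be non-zero. */
static int ilog2_u64(uint64_t v)
{
    int log = -1;

    assert(v != 0);
    while (v) {
        v >>= 1;
        log++;
    }
    return log;
}

static int min_int(int a, int b) { return a < b ? a : b; }
static int max_int(int a, int b) { return a > b ? a : b; }

/* src_zero_bits: known-zero upper bits of the unknown source register.
 * dst_const: the constant currently held in the destination register.
 * Each helper returns the known-zero upper bits of the result. */
static int zero_bits_after_add(int src_zero_bits, uint64_t dst_const)
{
    /* A carry can make at most one more bit non-zero. */
    return min_int(src_zero_bits, 63 - ilog2_u64(dst_const)) - 1;
}

static int zero_bits_after_and(int src_zero_bits, uint64_t dst_const)
{
    /* AND keeps a bit zero if it is zero in either operand. */
    return max_int(src_zero_bits, 63 - ilog2_u64(dst_const));
}

static int zero_bits_after_or(int src_zero_bits, uint64_t dst_const)
{
    /* OR keeps a bit zero only if it is zero in both operands. */
    return min_int(src_zero_bits, 63 - ilog2_u64(dst_const));
}
```

The first case reproduces the example from the patch comment: 0xffff (48 zero bits) plus a value with 63 zero bits yields 47.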

Signed-off-by: John Fastabend <john.fa...@gmail.com>
Acked-by: Daniel Borkmann <dan...@iogearbox.net>
Acked-by: Alexei Starovoitov <a...@kernel.org>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
kernel/bpf/verifier.c | 62 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 62 insertions(+)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1650,6 +1650,65 @@ static int evaluate_reg_alu(struct bpf_v
return 0;
}

+static int evaluate_reg_imm_alu_unknown(struct bpf_verifier_env *env,
+ struct bpf_insn *insn)
+{
+ struct bpf_reg_state *regs = env->cur_state.regs;
+ struct bpf_reg_state *dst_reg = &regs[insn->dst_reg];
+ struct bpf_reg_state *src_reg = &regs[insn->src_reg];
+ u8 opcode = BPF_OP(insn->code);
+ s64 imm_log2 = __ilog2_u64((long long)dst_reg->imm);
+
+ /* BPF_X code with src_reg->type UNKNOWN_VALUE here. */
+ if (src_reg->imm > 0 && dst_reg->imm) {
+ switch (opcode) {
+ case BPF_ADD:
+ /* dreg += sreg
+ * where both have zero upper bits. Adding them
+ * can only result making one more bit non-zero
+ * in the larger value.
+ * Ex. 0xffff (imm=48) + 1 (imm=63) = 0x10000 (imm=47)
+ * 0xffff (imm=48) + 0xffff = 0x1fffe (imm=47)
+ */
+ dst_reg->imm = min(src_reg->imm, 63 - imm_log2);
+ dst_reg->imm--;
+ break;
+ case BPF_AND:
+ /* dreg &= sreg
+ * AND can not extend zero bits only shrink
+ * Ex. 0x00..00ffffff
+ * & 0x0f..ffffffff
+ * ----------------
+ * 0x00..00ffffff
+ */
+ dst_reg->imm = max(src_reg->imm, 63 - imm_log2);
+ break;
+ case BPF_OR:
+ /* dreg |= sreg
+ * OR can only extend zero bits
+ * Ex. 0x00..00ffffff
+ * | 0x0f..ffffffff
+ * ----------------
+ * 0x0f..00ffffff
+ */
+ dst_reg->imm = min(src_reg->imm, 63 - imm_log2);
+ break;
+ case BPF_SUB:
+ case BPF_MUL:
+ case BPF_RSH:
+ case BPF_LSH:
+ /* These may be flushed out later */
+ default:
+ mark_reg_unknown_value(regs, insn->dst_reg);
+ }
+ } else {
+ mark_reg_unknown_value(regs, insn->dst_reg);
+ }
+
+ dst_reg->type = UNKNOWN_VALUE;
+ return 0;
+}
+
static int evaluate_reg_imm_alu(struct bpf_verifier_env *env,
struct bpf_insn *insn)
{
@@ -1659,6 +1718,9 @@ static int evaluate_reg_imm_alu(struct b
u8 opcode = BPF_OP(insn->code);
u64 dst_imm = dst_reg->imm;

+ if (BPF_SRC(insn->code) == BPF_X && src_reg->type == UNKNOWN_VALUE)
+ return evaluate_reg_imm_alu_unknown(env, insn);
+
/* dst_reg->type == CONST_IMM here. Simulate execution of insns
* containing ALU ops. Don't care about overflow or negative
* values, just add/sub/... them; registers are in u64.

Greg Kroah-Hartman

Aug 28, 2017, 5:30:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Xin Long <lucie...@gmail.com>


[ Upstream commit 4f8a881acc9d1adaf1e552349a0b1df28933a04c ]

Some targets' checkentry hooks dereference par.entryinfo to inspect the
entry. But when a sched action calls xt_check_target, par.entryinfo is
NULL, which causes a kernel panic with those targets.

It can be reproduced with:
# tc qd add dev eth1 ingress handle ffff:
# tc filter add dev eth1 parent ffff: u32 match u32 0 0 action xt \
-j ECN --ecn-tcp-remove

It can also crash the kernel when using target CLUSTERIP or TPROXY.

For now there is no proper value for par.entryinfo in ipt_init_target,
but it must not be NULL. This patch avoids all these panics by pointing
it at an ipt_entry object with all members = 0.

Note that this issue has been there since the very beginning.
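
A reduced sketch of the pattern, with hypothetical stand-ins for ipt_entry and xt_tgchk_param. The checkentry hook here is invented; it only mimics a target that reads a protocol field from entryinfo.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-ins for the real structures. */
struct entry_sketch { uint8_t proto; };
struct tgchk_param_sketch { const void *entryinfo; };

#define PROTO_TCP_SKETCH 6

/* A checkentry hook that, like some real targets, reads entryinfo. */
static int checkentry_sketch(const struct tgchk_param_sketch *par)
{
    const struct entry_sketch *e = par->entryinfo;

    /* With entryinfo == NULL this dereference would crash. */
    return e->proto == PROTO_TCP_SKETCH ? 0 : -EINVAL;
}

static int init_target_sketch(void)
{
    struct entry_sketch e = {0};         /* all members zero */
    struct tgchk_param_sketch par = {0};

    par.entryinfo = &e;                  /* never NULL after the fix */
    assert(par.entryinfo != NULL);
    return checkentry_sketch(&par);
}
```

With a zeroed entry the hook sees proto 0 and rejects the configuration with an error instead of dereferencing NULL.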

Signed-off-by: Xin Long <lucie...@gmail.com>
Acked-by: Pablo Neira Ayuso <pa...@netfilter.org>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
net/sched/act_ipt.c | 2 ++
1 file changed, 2 insertions(+)

--- a/net/sched/act_ipt.c
+++ b/net/sched/act_ipt.c
@@ -41,6 +41,7 @@ static int ipt_init_target(struct net *n
{
struct xt_tgchk_param par;
struct xt_target *target;
+ struct ipt_entry e = {};
int ret = 0;

target = xt_request_find_target(AF_INET, t->u.user.name,
@@ -52,6 +53,7 @@ static int ipt_init_target(struct net *n
memset(&par, 0, sizeof(par));
par.net = net;
par.table = table;
+ par.entryinfo = &e;
par.target = target;
par.targinfo = t->data;
par.hook_mask = hook;

Greg Kroah-Hartman

Aug 28, 2017, 5:30:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Wei Wang <wei...@google.com>


[ Upstream commit 383143f31d7d3525a1dbff733d52fff917f82f15 ]

syzkaller reported the following use-after-free issue in rt6_select():
BUG: KASAN: use-after-free in rt6_select net/ipv6/route.c:755 [inline] at addr ffff8800bc6994e8
BUG: KASAN: use-after-free in ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084 at addr ffff8800bc6994e8
Read of size 4 by task syz-executor1/439628
CPU: 0 PID: 439628 Comm: syz-executor1 Not tainted 4.3.5+ #8
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
0000000000000000 ffff88018fe435b0 ffffffff81ca384d ffff8801d3588c00
ffff8800bc699380 ffff8800bc699500 dffffc0000000000 ffff8801d40a47c0
ffff88018fe435d8 ffffffff81735751 ffff88018fe43660 ffff8800bc699380
Call Trace:
[<ffffffff81ca384d>] __dump_stack lib/dump_stack.c:15 [inline]
[<ffffffff81ca384d>] dump_stack+0xc1/0x124 lib/dump_stack.c:51
sctp: [Deprecated]: syz-executor0 (pid 439615) Use of struct sctp_assoc_value in delayed_ack socket option.
Use struct sctp_sack_info instead
[<ffffffff81735751>] kasan_object_err+0x21/0x70 mm/kasan/report.c:158
[<ffffffff817359c4>] print_address_description mm/kasan/report.c:196 [inline]
[<ffffffff817359c4>] kasan_report_error+0x1b4/0x4a0 mm/kasan/report.c:285
[<ffffffff81735d93>] kasan_report mm/kasan/report.c:305 [inline]
[<ffffffff81735d93>] __asan_report_load4_noabort+0x43/0x50 mm/kasan/report.c:325
[<ffffffff82a28e39>] rt6_select net/ipv6/route.c:755 [inline]
[<ffffffff82a28e39>] ip6_pol_route.isra.46+0x1429/0x1470 net/ipv6/route.c:1084
[<ffffffff82a28fb1>] ip6_pol_route_output+0x81/0xb0 net/ipv6/route.c:1203
[<ffffffff82ab0a50>] fib6_rule_action+0x1f0/0x680 net/ipv6/fib6_rules.c:95
[<ffffffff8265cbb6>] fib_rules_lookup+0x2a6/0x7a0 net/core/fib_rules.c:223
[<ffffffff82ab1430>] fib6_rule_lookup+0xd0/0x250 net/ipv6/fib6_rules.c:41
[<ffffffff82a22006>] ip6_route_output+0x1d6/0x2c0 net/ipv6/route.c:1224
[<ffffffff829e83d2>] ip6_dst_lookup_tail+0x4d2/0x890 net/ipv6/ip6_output.c:943
[<ffffffff829e889a>] ip6_dst_lookup_flow+0x9a/0x250 net/ipv6/ip6_output.c:1079
[<ffffffff82a9f7d8>] ip6_datagram_dst_update+0x538/0xd40 net/ipv6/datagram.c:91
[<ffffffff82aa0978>] __ip6_datagram_connect net/ipv6/datagram.c:251 [inline]
[<ffffffff82aa0978>] ip6_datagram_connect+0x518/0xe50 net/ipv6/datagram.c:272
[<ffffffff82aa1313>] ip6_datagram_connect_v6_only+0x63/0x90 net/ipv6/datagram.c:284
[<ffffffff8292f790>] inet_dgram_connect+0x170/0x1f0 net/ipv4/af_inet.c:564
[<ffffffff82565547>] SYSC_connect+0x1a7/0x2f0 net/socket.c:1582
[<ffffffff8256a649>] SyS_connect+0x29/0x30 net/socket.c:1563
[<ffffffff82c72032>] entry_SYSCALL_64_fastpath+0x12/0x17
Object at ffff8800bc699380, in cache ip6_dst_cache size: 384

The root cause of it is that in fib6_add_rt2node(), when it replaces an
existing route with the new one, it does not update fn->rr_ptr.
This commit resets fn->rr_ptr to NULL when it points to a route which is
replaced in fib6_add_rt2node().
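
The same idea in a self-contained form: a node caches a pointer into its route list, and the cache has to be dropped before the route it points at is released. The types and replace_head_sketch() are illustrative only and far simpler than the fib6 structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct route_sketch {
    int id;
    struct route_sketch *next;
};

struct node_sketch {
    struct route_sketch *leaf;   /* list of routes */
    struct route_sketch *rr_ptr; /* cached round-robin position */
};

/* Replace the head route of the node with new_rt and free the old one. */
static void replace_head_sketch(struct node_sketch *fn,
                                struct route_sketch *new_rt)
{
    struct route_sketch *old = fn->leaf;

    assert(old != NULL && new_rt != NULL);
    new_rt->next = old->next;
    fn->leaf = new_rt;

    if (fn->rr_ptr == old)
        fn->rr_ptr = NULL;       /* drop the cache before freeing */
    free(old);
}
```

Without the rr_ptr reset, a later lookup that starts from the cached pointer would read freed memory, which is the pattern KASAN flagged above.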

Fixes: 27596472473a ("ipv6: fix ECMP route replacement")
Signed-off-by: Wei Wang <wei...@google.com>
Acked-by: Eric Dumazet <edum...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
net/ipv6/ip6_fib.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/net/ipv6/ip6_fib.c
+++ b/net/ipv6/ip6_fib.c
@@ -912,6 +912,8 @@ add:
}
nsiblings = iter->rt6i_nsiblings;
fib6_purge_rt(iter, fn, info->nl_net);
+ if (fn->rr_ptr == iter)
+ fn->rr_ptr = NULL;
rt6_release(iter);

if (nsiblings) {
@@ -924,6 +926,8 @@ add:
if (rt6_qualify_for_ecmp(iter)) {
*ins = iter->dst.rt6_next;
fib6_purge_rt(iter, fn, info->nl_net);
+ if (fn->rr_ptr == iter)
+ fn->rr_ptr = NULL;
rt6_release(iter);
nsiblings--;
} else {

Greg Kroah-Hartman

Aug 28, 2017, 5:30:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <ncar...@google.com>


[ Upstream commit cdbeb633ca71a02b7b63bfeb94994bf4e1a0b894 ]

In some situations tcp_send_loss_probe() can realize that it's unable
to send a loss probe (TLP), and falls back to calling tcp_rearm_rto()
to schedule an RTO timer. In such cases, sometimes tcp_rearm_rto()
realizes that the RTO was eligible to fire immediately or at some
point in the past (delta_us <= 0). Previously in such cases
tcp_rearm_rto() was scheduling such "overdue" RTOs to happen at now +
icsk_rto, which caused needless delays of hundreds of milliseconds
(and non-linear behavior that made reproducible testing
difficult). This commit changes the logic to schedule "overdue" RTOs
ASAP, rather than at now + icsk_rto.
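
The behavioural difference fits in two small functions. This is a userspace sketch with made-up names; delta is the signed time remaining until the RTO should fire, in the same units as icsk_rto.

```c
#include <assert.h>
#include <stdint.h>

/* Before the patch: a non-positive delta left rto at the full icsk_rto. */
static uint32_t rearm_rto_old(int32_t delta, uint32_t icsk_rto)
{
    uint32_t rto = icsk_rto;

    if (delta > 0)
        rto = (uint32_t)delta;
    return rto;
}

/* After the patch: an overdue RTO is scheduled for the earliest tick. */
static uint32_t rearm_rto_new(int32_t delta, uint32_t icsk_rto)
{
    (void)icsk_rto; /* no longer used once delta has been computed */
    return (uint32_t)(delta > 1 ? delta : 1);
}
```

For an RTO that is 5 units overdue with icsk_rto = 200, the old logic waits another 200 units while the new logic waits 1.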

Fixes: 6ba8a3b19e76 ("tcp: Tail loss probe (TLP)")
Suggested-by: Yuchung Cheng <ych...@google.com>
Signed-off-by: Neal Cardwell <ncar...@google.com>
Signed-off-by: Yuchung Cheng <ych...@google.com>
Signed-off-by: Eric Dumazet <edum...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
net/ipv4/tcp_input.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -3007,8 +3007,7 @@ void tcp_rearm_rto(struct sock *sk)
/* delta may not be positive if the socket is locked
* when the retrans timer fires and is rescheduled.
*/
- if (delta > 0)
- rto = delta;
+ rto = max(delta, 1);
}
inet_csk_reset_xmit_timer(sk, ICSK_TIME_RETRANS, rto,
TCP_RTO_MAX);

Greg Kroah-Hartman

Aug 28, 2017, 5:30:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>


[ Upstream commit ff244c6b29b176f3f448bc75e55df297225e1b3a ]

syzkaller reported a double free [1], caused by the fact
that tun driver was not updated properly when priv_destructor
was added.

When/if register_netdevice() fails, priv_destructor() must have been
called already.
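
A toy model of the error path, counting how often the security blob would be freed. All names are hypothetical, and the `fixed` flag only exists to compare the two label layouts side by side.

```c
#include <assert.h>
#include <stdbool.h>

struct dev_sketch { int security_frees; };

/* Stand-in for the driver's priv_destructor. */
static void destructor_sketch(struct dev_sketch *d)
{
    d->security_frees++;
}

/* On failure the core has already run the destructor. */
static int register_sketch(struct dev_sketch *d, bool fail)
{
    if (fail) {
        destructor_sketch(d);
        return -1;
    }
    return 0;
}

static int set_iff_sketch(struct dev_sketch *d, bool fail_register, bool fixed)
{
    int err = register_sketch(d, fail_register);

    if (err < 0)
        goto err_detach;
    return 0;

err_detach:
    /* Queues would be detached here. */
    if (fixed)
        goto err_free_dev;   /* destructor already ran, skip the frees */
    d->security_frees++;     /* pre-fix fall-through: second free */
err_free_dev:
    return err;
}
```

With a failing registration the unfixed path frees twice and the fixed path frees once.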

[1]
BUG: KASAN: double-free or invalid-free in selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023

CPU: 0 PID: 2919 Comm: syzkaller227220 Not tainted 4.13.0-rc4+ #23
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x194/0x257 lib/dump_stack.c:52
print_address_description+0x7f/0x260 mm/kasan/report.c:252
kasan_report_double_free+0x55/0x80 mm/kasan/report.c:333
kasan_slab_free+0xa0/0xc0 mm/kasan/kasan.c:514
__cache_free mm/slab.c:3503 [inline]
kfree+0xd3/0x260 mm/slab.c:3820
selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
security_tun_dev_free_security+0x48/0x80 security/security.c:1512
tun_set_iff drivers/net/tun.c:1884 [inline]
__tun_chr_ioctl+0x2ce6/0x3d50 drivers/net/tun.c:2064
tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
entry_SYSCALL_64_fastpath+0x1f/0xbe
RIP: 0033:0x443ff9
RSP: 002b:00007ffc34271f68 EFLAGS: 00000217 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 00000000004002e0 RCX: 0000000000443ff9
RDX: 0000000020533000 RSI: 00000000400454ca RDI: 0000000000000003
RBP: 0000000000000086 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000217 R12: 0000000000401ce0
R13: 0000000000401d70 R14: 0000000000000000 R15: 0000000000000000

Allocated by task 2919:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_kmalloc+0xaa/0xd0 mm/kasan/kasan.c:551
kmem_cache_alloc_trace+0x101/0x6f0 mm/slab.c:3627
kmalloc include/linux/slab.h:493 [inline]
kzalloc include/linux/slab.h:666 [inline]
selinux_tun_dev_alloc_security+0x49/0x170 security/selinux/hooks.c:5012
security_tun_dev_alloc_security+0x6d/0xa0 security/security.c:1506
tun_set_iff drivers/net/tun.c:1839 [inline]
__tun_chr_ioctl+0x1730/0x3d50 drivers/net/tun.c:2064
tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
entry_SYSCALL_64_fastpath+0x1f/0xbe

Freed by task 2919:
save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:59
save_stack+0x43/0xd0 mm/kasan/kasan.c:447
set_track mm/kasan/kasan.c:459 [inline]
kasan_slab_free+0x6e/0xc0 mm/kasan/kasan.c:524
__cache_free mm/slab.c:3503 [inline]
kfree+0xd3/0x260 mm/slab.c:3820
selinux_tun_dev_free_security+0x15/0x20 security/selinux/hooks.c:5023
security_tun_dev_free_security+0x48/0x80 security/security.c:1512
tun_free_netdev+0x13b/0x1b0 drivers/net/tun.c:1563
register_netdevice+0x8d0/0xee0 net/core/dev.c:7605
tun_set_iff drivers/net/tun.c:1859 [inline]
__tun_chr_ioctl+0x1caf/0x3d50 drivers/net/tun.c:2064
tun_chr_ioctl+0x2a/0x40 drivers/net/tun.c:2309
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
entry_SYSCALL_64_fastpath+0x1f/0xbe

The buggy address belongs to the object at ffff8801d2843b40
which belongs to the cache kmalloc-32 of size 32
The buggy address is located 0 bytes inside of
32-byte region [ffff8801d2843b40, ffff8801d2843b60)
The buggy address belongs to the page:
page:ffffea000660cea8 count:1 mapcount:0 mapping:ffff8801d2843000 index:0xffff8801d2843fc1
flags: 0x200000000000100(slab)
raw: 0200000000000100 ffff8801d2843000 ffff8801d2843fc1 000000010000003f
raw: ffffea0006626a40 ffffea00066141a0 ffff8801dbc00100
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff8801d2843a00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
ffff8801d2843a80: 00 00 00 fc fc fc fc fc fb fb fb fb fc fc fc fc
>ffff8801d2843b00: 00 00 00 00 fc fc fc fc fb fb fb fb fc fc fc fc
^
ffff8801d2843b80: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc
ffff8801d2843c00: fb fb fb fb fc fc fc fc fb fb fb fb fc fc fc fc

==================================================================

Fixes: cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.")
Signed-off-by: Eric Dumazet <edum...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
drivers/net/tun.c | 3 +++
1 file changed, 3 insertions(+)

--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1876,6 +1876,9 @@ static int tun_set_iff(struct net *net,

err_detach:
tun_detach_all(dev);
+ /* register_netdevice() already called tun_free_netdev() */
+ goto err_free_dev;
+
err_free_flow:
tun_flow_uninit(tun);
security_tun_dev_free_security(tun->security);

Greg Kroah-Hartman

Aug 28, 2017, 5:30:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Andreas Born <futur...@googlemail.com>


[ Upstream commit ad729bc9acfb7c47112964b4877ef5404578ed13 ]

The patch c4adfc822bf5 ("bonding: make speed, duplex setting consistent
with link state") puts the link state to down if
bond_update_speed_duplex() cannot retrieve speed and duplex settings.
Presumably the patch was written with 802.3ad mode in mind which relies
on link speed/duplex settings. For other modes like active-backup these
settings are not required. Thus, only for these other modes, this patch
reintroduces support for slaves that do not support reporting speed or
duplex such as wireless devices. This fixes the regression reported in
bug 196547 (https://bugzilla.kernel.org/show_bug.cgi?id=196547).
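
The resulting decision can be sketched as follows. The enums are local stand-ins for the bonding modes and link states, and the sketch assumes the link would otherwise be considered up.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the bonding modes and link states. */
enum bond_mode_sketch {
    MODE_ROUNDROBIN, MODE_ACTIVEBACKUP, MODE_XOR, MODE_BROADCAST,
    MODE_8023AD, MODE_TLB, MODE_ALB
};
enum link_sketch { LINK_UP, LINK_DOWN };

/* 802.3ad and the load-balancing modes (TLB, ALB) rely on speed/duplex. */
static bool needs_speed_duplex(enum bond_mode_sketch mode)
{
    return mode == MODE_8023AD || mode == MODE_TLB || mode == MODE_ALB;
}

/* speed_query_failed mirrors a non-zero return from
 * bond_update_speed_duplex(). */
static enum link_sketch link_after_check(enum bond_mode_sketch mode,
                                         bool speed_query_failed)
{
    if (speed_query_failed && needs_speed_duplex(mode))
        return LINK_DOWN;
    return LINK_UP;
}
```

A wireless slave in active-backup mode that cannot report speed or duplex therefore stays up, while the same failure in 802.3ad mode still marks the link down.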

Fixes: c4adfc822bf5 ("bonding: make speed, duplex setting consistent
with link state")
Signed-off-by: Andreas Born <futur...@googlemail.com>
Acked-by: Mahesh Bandewar <mah...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
drivers/net/bonding/bond_main.c | 6 ++++--
include/net/bonding.h | 5 +++++
2 files changed, 9 insertions(+), 2 deletions(-)

--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -1569,7 +1569,8 @@ int bond_enslave(struct net_device *bond
new_slave->delay = 0;
new_slave->link_failure_count = 0;

- if (bond_update_speed_duplex(new_slave))
+ if (bond_update_speed_duplex(new_slave) &&
+ bond_needs_speed_duplex(bond))
new_slave->link = BOND_LINK_DOWN;

new_slave->last_rx = jiffies -
@@ -2137,7 +2138,8 @@ static void bond_miimon_commit(struct bo
continue;

case BOND_LINK_UP:
- if (bond_update_speed_duplex(slave)) {
+ if (bond_update_speed_duplex(slave) &&
+ bond_needs_speed_duplex(bond)) {
slave->link = BOND_LINK_DOWN;
netdev_warn(bond->dev,
"failed to get link speed/duplex for %s\n",
--- a/include/net/bonding.h
+++ b/include/net/bonding.h
@@ -277,6 +277,11 @@ static inline bool bond_is_lb(const stru
BOND_MODE(bond) == BOND_MODE_ALB;
}

+static inline bool bond_needs_speed_duplex(const struct bonding *bond)
+{
+ return BOND_MODE(bond) == BOND_MODE_8023AD || bond_is_lb(bond);
+}
+
static inline bool bond_is_nondyn_tlb(const struct bonding *bond)
{
return (BOND_MODE(bond) == BOND_MODE_TLB) &&

Greg Kroah-Hartman

Aug 28, 2017, 5:30:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <edum...@google.com>


[ Upstream commit c780a049f9bf442314335372c9abc4548bfe3e44 ]

While working on yet another syzkaller report, I found
that our IP_MAX_MTU enforcements were not properly done.

gcc seems to reload dev->mtu for min(dev->mtu, IP_MAX_MTU), and
final result can be bigger than IP_MAX_MTU :/

This is a problem because device mtu can be changed on other cpus or
threads.

While this patch does not fix the issue I am working on, it is
probably worth addressing it.
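
The pattern is to take one snapshot of the shared value and clamp the snapshot. The sketch below approximates READ_ONCE() with a volatile access; read_once_uint() and dst_mtu_clamped() are illustrative names, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

#define IP_MAX_MTU 0xFFFFU

/* Userspace approximation of READ_ONCE(): force exactly one load. */
static unsigned int read_once_uint(const unsigned int *p)
{
    return *(const volatile unsigned int *)p;
}

/* Clamp a single snapshot of the device MTU. Because the comparison and
 * the return value use the same local copy, a concurrent writer cannot
 * make the result exceed IP_MAX_MTU. */
static unsigned int dst_mtu_clamped(const unsigned int *dev_mtu)
{
    unsigned int mtu;

    assert(dev_mtu != NULL);
    mtu = read_once_uint(dev_mtu);
    return mtu < IP_MAX_MTU ? mtu : IP_MAX_MTU;
}
```

Without the snapshot the compiler is free to load the field once for the comparison and again for the result, and the two loads can disagree if another CPU changes the MTU in between.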

Signed-off-by: Eric Dumazet <edum...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
include/net/ip.h | 4 ++--
net/ipv4/route.c | 2 +-
2 files changed, 3 insertions(+), 3 deletions(-)

--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -352,7 +352,7 @@ static inline unsigned int ip_dst_mtu_ma
!forwarding)
return dst_mtu(dst);

- return min(dst->dev->mtu, IP_MAX_MTU);
+ return min(READ_ONCE(dst->dev->mtu), IP_MAX_MTU);
}

static inline unsigned int ip_skb_dst_mtu(struct sock *sk,
@@ -364,7 +364,7 @@ static inline unsigned int ip_skb_dst_mt
return ip_dst_mtu_maybe_forward(skb_dst(skb), forwarding);
}

- return min(skb_dst(skb)->dev->mtu, IP_MAX_MTU);
+ return min(READ_ONCE(skb_dst(skb)->dev->mtu), IP_MAX_MTU);
}

u32 ip_idents_reserve(u32 hash, int segs);
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -1268,7 +1268,7 @@ static unsigned int ipv4_mtu(const struc
if (mtu)
return mtu;

- mtu = dst->dev->mtu;
+ mtu = READ_ONCE(dst->dev->mtu);

if (unlikely(dst_metric_locked(dst, RTAX_MTU))) {
if (rt->rt_uses_gateway && mtu > 576)

Greg Kroah-Hartman

Aug 28, 2017, 5:30:08 AM
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Liping Zhang <zlpn...@gmail.com>


[ Upstream commit 494bea39f3201776cdfddc232705f54a0bd210c4 ]

For sw_flow_actions, the actions_len only represents the kernel part's
size, and when we dump the actions to the userspace, we will do the
conversions, so its true size may become bigger than the actions_len.

But unfortunately, for OVS_PACKET_ATTR_ACTIONS, we use the actions_len
to alloc the skbuff, so the user_skb's size may become insufficient and
oops will happen like this:
skbuff: skb_over_panic: text:ffffffff8148fabf len:1749 put:157 head:
ffff881300f39000 data:ffff881300f39000 tail:0x6d5 end:0x6c0 dev:<NULL>
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:129!
[...]
Call Trace:
<IRQ>
[<ffffffff8148be82>] skb_put+0x43/0x44
[<ffffffff8148fabf>] skb_zerocopy+0x6c/0x1f4
[<ffffffffa0290d36>] queue_userspace_packet+0x3a3/0x448 [openvswitch]
[<ffffffffa0292023>] ovs_dp_upcall+0x30/0x5c [openvswitch]
[<ffffffffa028d435>] output_userspace+0x132/0x158 [openvswitch]
[<ffffffffa01e6890>] ? ip6_rcv_finish+0x74/0x77 [ipv6]
[<ffffffffa028e277>] do_execute_actions+0xcc1/0xdc8 [openvswitch]
[<ffffffffa028e3f2>] ovs_execute_actions+0x74/0x106 [openvswitch]
[<ffffffffa0292130>] ovs_dp_process_packet+0xe1/0xfd [openvswitch]
[<ffffffffa0292b77>] ? key_extract+0x63c/0x8d5 [openvswitch]
[<ffffffffa029848b>] ovs_vport_receive+0xa1/0xc3 [openvswitch]
[...]

Also we can find that the actions_len is much smaller than the orig_len:
crash> struct sw_flow_actions 0xffff8812f539d000
struct sw_flow_actions {
rcu = {
next = 0xffff8812f5398800,
func = 0xffffe3b00035db32
},
orig_len = 1384,
actions_len = 592,
actions = 0xffff8812f539d01c
}

So as a quick fix, use the orig_len instead of the actions_len to alloc
the user_skb.

Last, this oops happened on our system running a relatively old kernel, but
the same risk still exists on the mainline, since we use the wrong
actions_len from the beginning.
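
To put numbers on the sizing error, this sketch recomputes the room reserved for the actions attribute from the two lengths in the crash dump above. It assumes the usual netlink attribute layout (4-byte header, 4-byte alignment); the _sketch names are local to the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed netlink attribute layout: 4-byte header, 4-byte alignment. */
#define NLA_ALIGNTO_SKETCH 4
#define NLA_HDRLEN_SKETCH 4

/* Header plus payload, rounded up to the attribute alignment. */
static size_t nla_total_size_sketch(size_t payload)
{
    size_t len = NLA_HDRLEN_SKETCH + payload;

    return (len + NLA_ALIGNTO_SKETCH - 1) & ~(size_t)(NLA_ALIGNTO_SKETCH - 1);
}

/* Room reserved for the actions attribute in the upcall message.
 * fixed == false sizes by actions_len, fixed == true by orig_len. */
static size_t actions_attr_room(size_t actions_len, size_t orig_len, bool fixed)
{
    if (!actions_len)
        return 0;
    return nla_total_size_sketch(fixed ? orig_len : actions_len);
}
```

With actions_len = 592 and orig_len = 1384 the old sizing reserves 596 bytes where the converted actions can need up to 1388, which is consistent with the skb_over_panic shown earlier.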

Fixes: ccea74457bbd ("openvswitch: include datapath actions with sampled-packet upcall to userspace")
Cc: Neil McKee <neil....@inmon.com>
Signed-off-by: Liping Zhang <zlpn...@gmail.com>
Acked-by: Pravin B Shelar <psh...@ovn.org>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
net/openvswitch/actions.c | 1 +
net/openvswitch/datapath.c | 7 ++++---
net/openvswitch/datapath.h | 2 ++
3 files changed, 7 insertions(+), 3 deletions(-)

--- a/net/openvswitch/actions.c
+++ b/net/openvswitch/actions.c
@@ -1337,6 +1337,7 @@ int ovs_execute_actions(struct datapath
goto out;
}

+ OVS_CB(skb)->acts_origlen = acts->orig_len;
err = do_execute_actions(dp, skb, key,
acts->actions, acts->actions_len);

--- a/net/openvswitch/datapath.c
+++ b/net/openvswitch/datapath.c
@@ -381,7 +381,7 @@ static int queue_gso_packets(struct data
}

static size_t upcall_msg_size(const struct dp_upcall_info *upcall_info,
- unsigned int hdrlen)
+ unsigned int hdrlen, int actions_attrlen)
{
size_t size = NLMSG_ALIGN(sizeof(struct ovs_header))
+ nla_total_size(hdrlen) /* OVS_PACKET_ATTR_PACKET */
@@ -398,7 +398,7 @@ static size_t upcall_msg_size(const stru

/* OVS_PACKET_ATTR_ACTIONS */
if (upcall_info->actions_len)
- size += nla_total_size(upcall_info->actions_len);
+ size += nla_total_size(actions_attrlen);

/* OVS_PACKET_ATTR_MRU */
if (upcall_info->mru)
@@ -465,7 +465,8 @@ static int queue_userspace_packet(struct
else
hlen = skb->len;

- len = upcall_msg_size(upcall_info, hlen - cutlen);
+ len = upcall_msg_size(upcall_info, hlen - cutlen,
+ OVS_CB(skb)->acts_origlen);
user_skb = genlmsg_new(len, GFP_ATOMIC);
if (!user_skb) {
err = -ENOMEM;
--- a/net/openvswitch/datapath.h
+++ b/net/openvswitch/datapath.h
@@ -98,12 +98,14 @@ struct datapath {
* @input_vport: The original vport packet came in on. This value is cached
* when a packet is received by OVS.
* @mru: The maximum received fragement size; 0 if the packet is not
+ * @acts_origlen: The netlink size of the flow actions applied to this skb.
* @cutlen: The number of bytes from the packet end to be removed.
* fragmented.
*/
struct ovs_skb_cb {
struct vport *input_vport;
u16 mru;
+ u16 acts_origlen;
u32 cutlen;
};
#define OVS_CB(skb) ((struct ovs_skb_cb *)(skb)->cb)

Greg Kroah-Hartman

Aug 28, 2017, 5:30:15 AM8/28/17
4.12-stable review patch. If anyone has any objections, please let me know.

------------------

From: Edward Cree <ec...@solarflare.com>


[ Upstream commit 9305706c2e808ae59f1eb201867f82f1ddf6d7a6 ]

We have to subtract the src max from the dst min, and vice-versa, since
(e.g.) the smallest result comes from the largest subtrahend.
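
The rule can be sketched outside the verifier as plain interval arithmetic. The struct and function names below are hypothetical and are not the kernel's register state; the sketch also ignores overflow and the BPF_REGISTER_MIN_RANGE / BPF_REGISTER_MAX_RANGE sentinels that the real code handles.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical closed interval [min, max]; not the kernel's register state. */
struct sketch_range {
	int64_t min;
	int64_t max;
};

/* Bounds of (dst - src): smallest is dst.min - src.max, largest dst.max - src.min. */
static struct sketch_range sketch_range_sub(struct sketch_range dst,
					    struct sketch_range src)
{
	struct sketch_range res;

	res.min = dst.min - src.max;
	res.max = dst.max - src.min;
	return res;
}

/* The pre-fix pairing (min - min, max - max), kept only for comparison. */
static struct sketch_range sketch_range_sub_wrong(struct sketch_range dst,
						  struct sketch_range src)
{
	struct sketch_range res;

	res.min = dst.min - src.min;
	res.max = dst.max - src.max;
	return res;
}

static void sketch_range_sub_example(void)
{
	struct sketch_range dst = { 10, 20 };
	struct sketch_range src = { 1, 5 };
	struct sketch_range ok = sketch_range_sub(dst, src);
	struct sketch_range bad = sketch_range_sub_wrong(dst, src);

	/* dst = 10, src = 5 gives 5, which the correct range must contain. */
	assert(ok.min <= 10 - 5 && 10 - 5 <= ok.max);

	/* The old pairing yields [9, 15] and misses that reachable value. */
	assert(bad.min > 10 - 5);
}
```

For dst in [10, 20] and src in [1, 5] the correct bounds are [5, 19]; the old pairing would claim [9, 15], which is too narrow and is what makes the bug a safety problem for a bounds checker.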

Fixes: 484611357c19 ("bpf: allow access into map value arrays")
Signed-off-by: Edward Cree <ec...@solarflare.com>
Acked-by: Daniel Borkmann <dan...@iogearbox.net>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
---
kernel/bpf/verifier.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)

--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -1858,10 +1858,12 @@ static void adjust_reg_min_max_vals(stru
* do our normal operations to the register, we need to set the values
* to the min/max since they are undefined.
*/
- if (min_val == BPF_REGISTER_MIN_RANGE)
- dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
- if (max_val == BPF_REGISTER_MAX_RANGE)
- dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+ if (opcode != BPF_SUB) {
+ if (min_val == BPF_REGISTER_MIN_RANGE)
+ dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
+ if (max_val == BPF_REGISTER_MAX_RANGE)
+ dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+ }

switch (opcode) {
case BPF_ADD:
@@ -1872,10 +1874,17 @@ static void adjust_reg_min_max_vals(stru
dst_reg->min_align = min(src_align, dst_align);
break;
case BPF_SUB:
+ /* If one of our values was at the end of our ranges, then the
+ * _opposite_ value in the dst_reg goes to the end of our range.
+ */
+ if (min_val == BPF_REGISTER_MIN_RANGE)
+ dst_reg->max_value = BPF_REGISTER_MAX_RANGE;
+ if (max_val == BPF_REGISTER_MAX_RANGE)
+ dst_reg->min_value = BPF_REGISTER_MIN_RANGE;
if (dst_reg->min_value != BPF_REGISTER_MIN_RANGE)
- dst_reg->min_value -= min_val;
+ dst_reg->min_value -= max_val;
if (dst_reg->max_value != BPF_REGISTER_MAX_RANGE)
- dst_reg->max_value -= max_val;
+ dst_reg->max_value -= min_val;
dst_reg->min_align = min(src_align, dst_align);
break;
case BPF_MUL:

Shuah Khan

Aug 28, 2017, 3:50:09 PM8/28/17
On 08/28/2017 02:03 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.12.10 release.
> There are 99 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Aug 30 08:04:17 UTC 2017.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.12.10-rc1.gz
> or in the git tree and branch at:
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.12.y
> and the diffstat can be found below.
>
> thanks,
>
> greg k-h
>

Compiled and booted on my test system. No dmesg regressions.

thanks,
-- Shuah

Guenter Roeck

Aug 28, 2017, 8:20:08 PM8/28/17
On 08/28/2017 01:03 AM, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.12.10 release.
> There are 99 patches in this series, all will be posted as a response
> to this one. If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed Aug 30 08:04:17 UTC 2017.
> Anything received after that time might be too late.
>


Build results:
total: 145 pass: 145 fail: 0
Qemu test results:
total: 122 pass: 122 fail: 0

Details are available at http://kerneltests.org/builders.

Guenter

Greg Kroah-Hartman

Aug 29, 2017, 1:00:05 AM8/29/17
Thanks for testing all of these and letting me know.

greg k-h

Greg Kroah-Hartman

Aug 29, 2017, 1:00:06 AM8/29/17
Great, thanks for testing all of these and letting me know.

greg k-h