Google Groups no longer supports new Usenet posts or subscriptions. Historical content remains viewable.
Dismiss

[PATCH 3.4 000/177] 3.4.106-rc1 review

61 views
Skip to first unread message

li...@kernel.org

unread,
Jan 27, 2015, 11:10:35 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, li...@roeck-us.net, satoru....@gmail.com, Zefan Li
From: Zefan Li <liz...@huawei.com>

This is the start of the stable review cycle for the 3.4.106 release.
There are 177 patches in this series, all will be posted as a response
to this one. If anyone has any issues with these being applied, please
let me know.

Responses should be made by Fri Jan 30 03:58:34 UTC 2015.
Anything received after that time might be too late.

A combined patch relative to 3.4.105 will be posted as an additional
response to this. A shortlog and diffstat can be found below.

thanks,

Zefan Li

--------------------

Aaro Koskinen (1):
MIPS: oprofile: Fix backtrace on 64-bit kernel

Adel Gadllah (2):
USB: quirks: enable device-qualifier quirk for another Elan
touchscreen
USB: quirks: enable device-qualifier quirk for yet another Elan
touchscreen

Al Viro (1):
fix misuses of f_count() in ppp and netlink

Alan Stern (1):
usb-storage: handle a skipped data phase

Alex Deucher (2):
drm/radeon: remove invalid pci id
drm/radeon: add missing crtc unlock when setting up the MC

Alexey Khoroshilov (2):
dm log userspace: fix memory leak in dm_ulog_tfr_init failure path
can: esd_usb2: fix memory leak on disconnect

Anatol Pomozov (1):
ALSA: pcm: use the same dma mmap codepath both for arm and arm64

Andreas Bomholtz (1):
USB: cp210x: add support for Seluxit USB dongle

Andreas Noever (1):
PCI: pciehp: Prevent NULL dereference during probe

Andy Adamson (1):
NFSv4.1: Fix an NFSv4.1 state renewal regression

Andy Honig (2):
KVM: x86: Prevent host from panicking on shared MSR writes.
KVM: x86: Improve thread safety in pit

Andy Lutomirski (8):
x86,kvm,vmx: Preserve CR4 across VM entry
x86, apic: Handle a bad TSC more gracefully
x86_64, traps: Stop using IST for #SS
x86_64, traps: Fix the espfix64 #DF fixup and rewrite it in C
x86_64, traps: Rework bad_iret
x86/asm/traps: Disable tracing and kprobes in fixup_bad_iret and
sync_regs
x86, kvm: Clear paravirt_enabled on KVM guests for espfix32's benefit
x86/tls: Validate TLS entries to protect espfix

Andy Shevchenko (2):
spi: dw-mid: respect 8 bit mode
spi: dw-mid: terminate ongoing transfers at exit

Anton Blanchard (1):
powerpc: do_notify_resume can be called with bad thread_info flags
argument

Artem Bityutskiy (3):
UBIFS: remove mst_mutex
UBIFS: fix a race condition
UBIFS: fix free log space calculation

Bart Van Assche (1):
srp-target: Retry when QP creation fails with ENOMEM

Ben Hutchings (1):
x86: Reject x32 executables if x32 ABI not supported

Benjamin Coddington (1):
lockd: Try to reconnect if statd has moved

Benjamin Herrenschmidt (1):
of/base: Fix PowerPC address parsing hack

Borislav Petkov (1):
mpc85xx_edac: Make L2 interrupt shared too

Brian Silverman (1):
futex: Fix a race condition between REQUEUE_PI and task death

Bryan O'Donoghue (2):
x86/intel/quark: Switch off CR4.PGE so TLB flush uses CR3 instead
serial: 8250: Add Quark X1000 to 8250_pci.c

Catalin Marinas (1):
futex: Ensure get_futex_key_refs() always implies a barrier

Cesar Eduardo Barros (1):
crypto: more robust crypto_memneq

Champion Chen (1):
Bluetooth: Fix issue with USB suspend in btusb driver

Chao Yu (1):
ecryptfs: avoid to access NULL pointer when write metadata in xattr

Chris Mason (1):
Btrfs: fix kfree on list_head in btrfs_lookup_csums_range error
cleanup

Christian Borntraeger (1):
KVM: s390: unintended fallthrough for external call

Christoph Hellwig (1):
scsi: only re-lock door after EH on devices that were reset

Cong Wang (1):
freezer: Do not freeze tasks killed by OOM killer

Cyril Brulebois (1):
wireless: rt2x00: add new rt2800usb device

Dan Williams (1):
USB: option: add Haier CE81B CDMA modem

Daniel Borkmann (1):
random: add and use memzero_explicit() for clearing data

Daniele Palmas (1):
usb: option: add support for Telit LE910

Darrick J. Wong (1):
ext4: check EA value offset when loading

Dave Hansen (1):
x86: Require exact match for 'noxsave' command line option

David Daney (1):
MIPS: tlbex: Properly fix HUGE TLB Refill exception handler

David Matlack (2):
kvm: x86: fix stale mmio cache bug
kvm: don't take vcpu mutex for obviously invalid vcpu ioctls

David Rientjes (1):
mm, thp: fix collapsing of hugepages on madvise

Dirk Brandewie (1):
cpufreq: expose scaling_cur_freq sysfs file for set_policy() drivers

Dmitry Kasatkin (1):
evm: check xattr value length and type in evm_inode_setxattr()

Dmitry Torokhov (1):
Input: synaptics - gate forcepad support by DMI check

Douglas Lehr (1):
PCI: Increase IBM ipr SAS Crocodile BARs to at least system page size

Eric Sandeen (1):
ext4: fix reservation overflow in ext4_da_write_begin

Eric W. Biederman (1):
mnt: Prevent pivot_root from creating a loop in the mount tree

Fabio Estevam (1):
ASoC: sgtl5000: Fix SMALL_POP bit definition

Felipe Balbi (1):
usb: dwc3: gadget: fix set_halt() bug with pending transfers

Frans Klaver (1):
usb: serial: ftdi_sio: add Awinda Station and Dongle products

Geert Uytterhoeven (1):
m68k: Disable/restore interrupts in hwreg_present()/hwreg_write()

Grant Likely (1):
of: Fix overflow bug in string property parsing functions

Guenter Roeck (1):
Revert "percpu: free percpu allocation info for uniprocessor system"

Hans de Goede (5):
Input: i8042 - add noloop quirk for Asus X750LN
Input: i8042 - quirks for Fujitsu Lifebook A544 and Lifebook AH544
samsung-laptop: Add broken-acpi-video quirk for NC210/NC110
acer-wmi: Add acpi_backlight=video quirk for the Acer KAV80
usb: Do not allow usb_alloc_streams on unconfigured devices

Heinz Mauelshagen (1):
dm raid: ensure superblock's size matches device's logical block size

Herbert Xu (1):
macvtap: Fix csum_start when VLAN tags are present

Huacai Chen (1):
MIPS: tlbex: Fix a missing statement for HUGETLB

Ilya Dryomov (1):
libceph: do not crash on large auth tickets

Imre Deak (2):
PM / Sleep: fix recovery during resuming from hibernation
tty/vt: don't set font mappings on vc not supporting this

J. Bruce Fields (1):
nfsd4: fix crash on unknown operation number

Jack Pham (1):
usb: dwc3: gadget: Properly initialize LINK TRB

James Ralston (1):
ahci: Add Device IDs for Intel Sunrise Point PCH

Jan Kara (11):
ext4: don't check quota format when there are no quota files
vfs: fix data corruption when blocksize < pagesize for mmaped data
ext3: Don't check quota format when there are no quota files
scsi: Fix error handling in SCSI_IOCTL_SEND_COMMAND
lib/bitmap.c: fix undefined shift in __bitmap_shift_{left|right}()
ext4: fix overflow when updating superblock backups after resize
ext4: fix oops when loading block bitmap failed
ext4: bail out from make_indexed_dir() on first error
block: Fix computation of merged request priority
nfs: Fix use of uninitialized variable in nfs_getattr()
mm: Remove false WARN_ON from pagecache_isize_extended()

Joe Savage (1):
USB: serial: cp210x: added Ketra N1 wireless interface support

Joe Thornber (1):
dm bufio: update last_accessed when relinking a buffer

Johan Hedberg (1):
Bluetooth: Fix setting correct security level when initiating SMP

Johan Hovold (5):
USB: kobil_sct: fix non-atomic allocation in write path
USB: opticon: fix non-atomic allocation in write path
USB: cdc-acm: add device id for GW Instek AFG-2225
USB: core: add device-qualifier quirk
USB: cdc-acm: only raise DTR on transitions from B0

Johannes Berg (2):
mac80211: properly flush delayed scan work on interface removal
mac80211: fix use-after-free in defragmentation

Jonathan Cameron (1):
staging:iio:impedance-analyzer:ad5933 unwind use of IIO_CHAN macro.

K. Y. Srinivasan (5):
Drivers: hv: vmbus: Cleanup vmbus_post_msg()
Drivers: hv: vmbus: Cleanup vmbus_teardown_gpadl()
Drivers: hv: vmbus: Cleanup vmbus_establish_gpadl()
Drivers: hv: vmbus: Fix a bug in vmbus_open()
Drivers: hv: vmbus: Cleanup vmbus_close_internal()

Kees Cook (1):
firmware_class: make sure fw requests contain a name

Krzysztof Kozlowski (1):
power: charger-manager: Fix NULL pointer exception with missing
cm-fuel-gauge

Kuninori Morimoto (1):
ASoC: fsi: remove unsupported PAUSE flag

Lars-Peter Clausen (2):
staging:iio:ad5933: Drop "raw" from channel names
staging:iio:ade7758: Fix check if channels are enabled in prenable

Lu Baolu (1):
USB: Add device quirk for ASUS T100 Base Station keyboard

Mathias Krause (1):
posix-timers: Fix stack info leak in timer_create()

Max Filippov (1):
xtensa: re-wire umount syscall to sys_oldumount

Michael S. Tsirkin (2):
virtio_pci: fix virtio spec compliance on restore
kvm: x86: don't kill guest on unknown exit reason

Michal Hocko (1):
OOM, PM: OOM killed task shouldn't escape PM suspend

Mike Snitzer (1):
block: fix alignment_offset math that assumes io_min is a power-of-2

Miklos Szeredi (1):
audit: keep inode pinned

Mikulas Patocka (3):
framebuffer: fix border color
fs: make cont_expand_zero interruptible
dm bufio: change __GFP_IO to __GFP_FS in shrinker callbacks

Nadav Amit (5):
KVM: x86: Check non-canonical addresses upon WRMSR
KVM: x86: Fix wrong masking on relative jump/call
KVM: x86: Emulator fixes for eip canonical checks on near branches
KVM: x86: Handle errors when RIP is set during far jumps
KVM: x86: Fix uninitialized op->type for some immediate values

Nadav Har'El (1):
nEPT: Nested INVEPT

Nathaniel Ting (1):
USB: serial: cp210x: add Silicon Labs 358x VID and PID

Oleg Nesterov (2):
kernel/fork.c:copy_process(): unify
CLONE_THREAD-or-thread_group_leader code
introduce for_each_thread() to replace the buggy while_each_thread()

Oliver Neukum (1):
xhci: no switching back on non-ULT Haswell

Ondrej Zary (1):
libata-sff: Fix controllers with no ctl port

Pali Rohár (2):
Input: alps - ignore potential bare packets when device is out of sync
Input: alps - allow up to 2 invalid packets without resetting device

Paolo Bonzini (1):
KVM: x86: use new CS.RPL as CPL during task switch

Patrick Riphagen (2):
USB: serial: ftdi_sio: Annotate the current Xsens PID assignments
USB: serial: ftdi_sio: Add support for new Xsens devices

Perry Hung (1):
usb: serial: ftdi_sio: add "bricked" FTDI device PID

Peter Hurley (3):
serial: Fix divide-by-zero fault in uart_get_divisor()
tty: Fix high cpu load if tty is unreleaseable
tty: Prevent "read/write wait queue active!" log flooding

Petr Matousek (1):
kvm: vmx: handle invvpid vm exit gracefully

Quentin Casasnovas (1):
kvm: fix excessive pages un-pinning in kvm_iommu_map error path.

Quinn Tran (1):
target: Fix queue full status NULL pointer for
SCF_TRANSPORT_TASK_SENSE

Rabin Vincent (1):
tracing/syscalls: Ignore numbers outside NR_syscalls' range

Ray Jui (1):
spi: pl022: Fix incorrect dma_unmap_sg

Ricardo Ribalda Delgado (1):
PCI: Generate uppercase hex for modalias interface class

Sasha Levin (1):
kernel: add support for gcc 5

Scott Carter (1):
pata_serverworks: disable 64-KB DMA transfers on Broadcom OSB4 IDE
Controller

Sinclair Yeh (1):
drm/vmwgfx: Filter out modes those cannot be supported by the current
VRAM size.

Stanislaw Gruszka (2):
rt2800: correct BBP1_TX_POWER_CTRL mask
rt2x00: do not align payload on modern H/W

Stefan Richter (1):
firewire: cdev: prevent kernel stack leaking into ioctl arguments

Stephen Smalley (1):
selinux: fix inode security list corruption

Takashi Iwai (3):
ALSA: emu10k1: Fix deadlock in synth voice lookup
ALSA: pcm: Zero-clear reserved fields of PCM status ioctl in compat
mode
ALSA: usb-audio: Fix device_del() sysfs warnings at disconnect

Tetsuo Handa (1):
fs: Fix theoretical division by 0 in super_cache_scan().

Theodore Ts'o (2):
ext4: don't orphan or truncate the boot loader inode
ext4: add ext4_iget_normal() which is to be used for dir tree lookups

Thomas Körper (1):
can: dev: avoid calling kfree_skb() from interrupt context

Trond Myklebust (2):
NFSv4: fix open/lock state recovery error handling
NFSv4: Ensure that we remove NFSv4.0 delegations when state has
expired

Vlad Catoi (1):
ALSA: usb-audio: Add support for Steinberg UR22 USB interface

Wang Nan (1):
cgroup/kmemleak: add kmemleak_free() for cgroup deallocations.

Will Deacon (2):
zap_pte_range: update addr when forcing flush after TLB batching
faiure
tracing/syscalls: Fix perf syscall tracing when syscall_nr == -1

Willy Tarreau (3):
Documentation: lzo: document part of the encoding
Revert "lzo: properly check for overruns"
lzo: check for length overrun in variable length encoding.

Xiubo Li (2):
regmap: debugfs: fix possbile NULL pointer dereference
regmap: fix possible ZERO_SIZE_PTR pointer dereferencing error.

Yann Droneaud (1):
fanotify: enable close-on-exec on events' fd when requested in
fanotify_init()

Yijing Wang (1):
sysfs: driver core: Fix glue dir race condition by gdp_mutex

Documentation/lzo.txt | 164 +++++++++++++++
arch/m68k/mm/hwtest.c | 6 +
arch/mips/mm/tlbex.c | 5 +
arch/mips/oprofile/backtrace.c | 2 +-
arch/powerpc/kernel/entry_64.S | 6 +
arch/s390/kvm/interrupt.c | 1 +
arch/x86/include/asm/elf.h | 5 +-
arch/x86/include/asm/kvm_host.h | 17 +-
arch/x86/include/asm/page_32_types.h | 1 -
arch/x86/include/asm/page_64_types.h | 11 +-
arch/x86/include/asm/vmx.h | 2 +
arch/x86/kernel/apic/apic.c | 4 +-
arch/x86/kernel/cpu/common.c | 2 +
arch/x86/kernel/cpu/intel.c | 15 ++
arch/x86/kernel/dumpstack_64.c | 1 -
arch/x86/kernel/entry_64.S | 84 ++------
arch/x86/kernel/kvm.c | 9 +-
arch/x86/kernel/kvmclock.c | 1 -
arch/x86/kernel/tls.c | 23 +++
arch/x86/kernel/traps.c | 69 +++++--
arch/x86/kernel/tsc.c | 5 +-
arch/x86/kvm/emulate.c | 257 +++++++++++++++++-------
arch/x86/kvm/i8254.c | 2 +
arch/x86/kvm/mmu.c | 2 +-
arch/x86/kvm/svm.c | 8 +-
arch/x86/kvm/vmx.c | 51 ++++-
arch/x86/kvm/x86.c | 38 +++-
arch/x86/kvm/x86.h | 20 +-
arch/xtensa/include/asm/unistd.h | 3 +-
block/blk-settings.c | 4 +-
block/scsi_ioctl.c | 3 +-
drivers/ata/ahci.c | 5 +
drivers/ata/libata-sff.c | 20 +-
drivers/ata/pata_serverworks.c | 13 +-
drivers/base/core.c | 4 +-
drivers/base/firmware_class.c | 3 +
drivers/base/regmap/regmap-debugfs.c | 7 +-
drivers/base/regmap/regmap.c | 5 +
drivers/bluetooth/btusb.c | 9 +
drivers/char/random.c | 10 +-
drivers/cpufreq/cpufreq.c | 23 ++-
drivers/edac/mpc85xx_edac.c | 3 +-
drivers/firewire/core-cdev.c | 3 +-
drivers/gpu/drm/radeon/evergreen.c | 1 +
drivers/gpu/drm/vmwgfx/vmwgfx_kms.c | 16 +-
drivers/hv/channel.c | 47 +++--
drivers/hv/connection.c | 17 +-
drivers/infiniband/ulp/srpt/ib_srpt.c | 8 +
drivers/input/mouse/alps.c | 11 +-
drivers/input/mouse/synaptics.c | 22 +-
drivers/input/mouse/synaptics.h | 8 +-
drivers/input/serio/i8042-x86ia64io.h | 22 ++
drivers/md/dm-bufio.c | 11 +-
drivers/md/dm-log-userspace-transfer.c | 2 +-
drivers/md/dm-raid.c | 11 +-
drivers/net/can/dev.c | 2 +-
drivers/net/can/usb/esd_usb2.c | 1 +
drivers/net/macvtap.c | 2 +
drivers/net/ppp/ppp_generic.c | 2 +-
drivers/net/wireless/rt2x00/rt2800.h | 2 +-
drivers/net/wireless/rt2x00/rt2800usb.c | 1 +
drivers/net/wireless/rt2x00/rt2x00queue.c | 50 ++---
drivers/of/address.c | 19 +-
drivers/of/base.c | 88 ++------
drivers/of/selftest.c | 64 +++++-
drivers/pci/hotplug/pciehp_core.c | 7 +
drivers/pci/pci-sysfs.c | 2 +-
drivers/pci/quirks.c | 20 ++
drivers/platform/x86/acer-wmi.c | 11 +
drivers/platform/x86/samsung-laptop.c | 10 +
drivers/power/charger-manager.c | 5 +
drivers/scsi/scsi_error.c | 4 +-
drivers/spi/spi-dw-mid.c | 7 +-
drivers/spi/spi-pl022.c | 2 +-
drivers/staging/iio/impedance-analyzer/ad5933.c | 47 ++++-
drivers/staging/iio/meter/ade7758_ring.c | 2 +-
drivers/target/target_core_transport.c | 3 +-
drivers/tty/serial/8250/8250_pci.c | 21 ++
drivers/tty/serial/serial_core.c | 2 +-
drivers/tty/tty_io.c | 13 +-
drivers/tty/vt/consolemap.c | 4 +
drivers/usb/class/cdc-acm.c | 6 +-
drivers/usb/core/hcd.c | 2 +
drivers/usb/core/hub.c | 9 +-
drivers/usb/core/quirks.c | 10 +
drivers/usb/dwc3/ep0.c | 4 +-
drivers/usb/dwc3/gadget.c | 17 +-
drivers/usb/dwc3/gadget.h | 2 +-
drivers/usb/host/xhci-pci.c | 14 --
drivers/usb/serial/cp210x.c | 3 +
drivers/usb/serial/ftdi_sio.c | 5 +
drivers/usb/serial/ftdi_sio_ids.h | 23 ++-
drivers/usb/serial/kobil_sct.c | 4 +-
drivers/usb/serial/opticon.c | 2 +-
drivers/usb/serial/option.c | 10 +
drivers/usb/storage/transport.c | 26 +++
drivers/video/console/bitblit.c | 3 +-
drivers/video/console/fbcon_ccw.c | 3 +-
drivers/video/console/fbcon_cw.c | 3 +-
drivers/video/console/fbcon_ud.c | 3 +-
drivers/virtio/virtio_pci.c | 33 ++-
fs/btrfs/file-item.c | 2 +-
fs/buffer.c | 8 +
fs/ecryptfs/inode.c | 2 +-
fs/ext3/super.c | 7 -
fs/ext4/ext4.h | 1 +
fs/ext4/ialloc.c | 4 +
fs/ext4/inode.c | 32 ++-
fs/ext4/namei.c | 33 +--
fs/ext4/resize.c | 2 +-
fs/ext4/super.c | 9 +-
fs/ext4/xattr.c | 32 ++-
fs/ioprio.c | 14 +-
fs/lockd/mon.c | 6 +
fs/namespace.c | 3 +
fs/nfs/inode.c | 2 +-
fs/nfs/nfs4proc.c | 26 ++-
fs/nfs/nfs4renewd.c | 12 +-
fs/nfs/nfs4state.c | 16 +-
fs/nfsd/nfs4proc.c | 3 +-
fs/notify/fanotify/fanotify_user.c | 2 +-
fs/super.c | 2 +
fs/ubifs/commit.c | 10 +-
fs/ubifs/log.c | 19 +-
fs/ubifs/master.c | 7 +-
fs/ubifs/super.c | 1 -
fs/ubifs/ubifs.h | 2 -
include/drm/drm_pciids.h | 1 -
include/linux/blkdev.h | 5 +-
include/linux/compiler-gcc.h | 3 +
include/linux/compiler-gcc5.h | 66 ++++++
include/linux/compiler-intel.h | 7 +
include/linux/init_task.h | 2 +
include/linux/khugepaged.h | 17 +-
include/linux/mm.h | 1 +
include/linux/of.h | 84 ++++++--
include/linux/oom.h | 4 +
include/linux/sched.h | 12 ++
include/linux/string.h | 4 +-
include/linux/usb/quirks.h | 6 +
kernel/audit_tree.c | 1 +
kernel/exit.c | 1 +
kernel/fork.c | 22 +-
kernel/freezer.c | 3 +
kernel/futex.c | 24 ++-
kernel/posix-timers.c | 1 +
kernel/power/hibernate.c | 8 +-
kernel/power/process.c | 40 +++-
kernel/trace/trace_syscalls.c | 8 +-
lib/bitmap.c | 8 +-
lib/lzo/lzo1x_decompress_safe.c | 103 +++++-----
lib/string.c | 16 ++
mm/huge_memory.c | 11 +-
mm/memory.c | 4 +-
mm/mmap.c | 8 +-
mm/oom_kill.c | 17 ++
mm/page_alloc.c | 8 +
mm/page_cgroup.c | 1 +
mm/percpu.c | 2 -
mm/truncate.c | 61 +++++-
net/bluetooth/smp.c | 5 +-
net/ceph/crypto.c | 169 ++++++++++++----
net/mac80211/iface.c | 7 +-
net/mac80211/rx.c | 14 +-
security/integrity/evm/evm_main.c | 9 +-
security/selinux/hooks.c | 2 +-
sound/core/pcm_compat.c | 2 +
sound/core/pcm_native.c | 2 +-
sound/pci/emu10k1/emu10k1_callback.c | 6 +-
sound/soc/codecs/sgtl5000.c | 3 +-
sound/soc/codecs/sgtl5000.h | 2 +-
sound/soc/sh/fsi.c | 3 +-
sound/usb/card.c | 9 +-
sound/usb/quirks-table.h | 30 +++
virt/kvm/iommu.c | 8 +-
virt/kvm/kvm_main.c | 4 +
176 files changed, 2061 insertions(+), 705 deletions(-)
create mode 100644 Documentation/lzo.txt
create mode 100644 include/linux/compiler-gcc5.h

--
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

li...@kernel.org

unread,
Jan 27, 2015, 11:13:25 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Christian Borntraeger, Zefan Li
From: Christian Borntraeger <bornt...@de.ibm.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 614a80e474b227cace52fd6e3c790554db8a396e upstream.

In the early days, we had some special handling for the
KVM_EXIT_S390_SIEIC exit, but this was gone in 2009 with commit
d7b0b5eb3000 (KVM: s390: Make psw available on all exits, not
just a subset).

Now this switch statement is just a sanity check for userspace
not messing with the kvm_run structure. Unfortunately, this
allows userspace to trigger a kernel BUG. Let's just remove
this switch statement.

Signed-off-by: Christian Borntraeger <bornt...@de.ibm.com>
Reviewed-by: Cornelia Huck <cornel...@de.ibm.com>
Reviewed-by: David Hildenbrand <da...@linux.vnet.ibm.com>
[lizf: Backported to 3.4:
- adjust context
- no KVM_EXIT_S390_TSCH and KVM_EXIT_DEBUG in 3.4]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/s390/kvm/kvm-s390.c | 11 -----------
1 file changed, 11 deletions(-)

diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c
index 53c973e..0f250d1 100644
--- a/arch/s390/kvm/kvm-s390.c
+++ b/arch/s390/kvm/kvm-s390.c
@@ -566,17 +566,6 @@ rerun_vcpu:

BUG_ON(vcpu->kvm->arch.float_int.local_int[vcpu->vcpu_id] == NULL);

- switch (kvm_run->exit_reason) {
- case KVM_EXIT_S390_SIEIC:
- case KVM_EXIT_UNKNOWN:
- case KVM_EXIT_INTR:
- case KVM_EXIT_S390_RESET:
- case KVM_EXIT_S390_UCONTROL:
- break;
- default:
- BUG();
- }
-
vcpu->arch.sie_block->gpsw.mask = kvm_run->psw_mask;
vcpu->arch.sie_block->gpsw.addr = kvm_run->psw_addr;
if (kvm_run->kvm_dirty_regs & KVM_SYNC_PREFIX) {

li...@kernel.org

unread,
Jan 27, 2015, 11:13:29 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, David Matlack, Xiao Guangrong, Paolo Bonzini, Zefan Li
From: David Matlack <dmat...@google.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 56f17dd3fbc44adcdbc3340fe3988ddb833a47a7 upstream.

The following events can lead to an incorrect KVM_EXIT_MMIO bubbling
up to userspace:

(1) Guest accesses gpa X without a memory slot. The gfn is cached in
struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets
the SPTE write-execute-noread so that future accesses cause
EPT_MISCONFIGs.

(2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION
covering the page just accessed.

(3) Guest attempts to read or write to gpa X again. On Intel, this
generates an EPT_MISCONFIG. The memory slot generation number that
was incremented in (2) would normally take care of this but we fast
path mmio faults through quickly_check_mmio_pf(), which only checks
the per-vcpu mmio cache. Since we hit the cache, KVM passes a
KVM_EXIT_MMIO up to userspace.

This patch fixes the issue by using the memslot generation number
to validate the mmio cache.

Signed-off-by: David Matlack <dmat...@google.com>
[xiaoguangrong: adjust the code to make it simpler for stable-tree fix.]
Signed-off-by: Xiao Guangrong <xiaogu...@linux.vnet.ibm.com>
Reviewed-by: David Matlack <dmat...@google.com>
Reviewed-by: Xiao Guangrong <xiaogu...@linux.vnet.ibm.com>
Tested-by: David Matlack <dmat...@google.com>
Signed-off-by: Paolo Bonzini <pbon...@redhat.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/mmu.c | 2 +-
arch/x86/kvm/x86.h | 20 +++++++++++++++-----
3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 944471f..c29677e 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -453,6 +453,7 @@ struct kvm_vcpu_arch {
u64 mmio_gva;
unsigned access;
gfn_t mmio_gfn;
+ u64 mmio_gen;

struct kvm_pmu pmu;

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index fd6dec6..84f4bca 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2842,7 +2842,7 @@ static void mmu_sync_roots(struct kvm_vcpu *vcpu)
if (!VALID_PAGE(vcpu->arch.mmu.root_hpa))
return;

- vcpu_clear_mmio_info(vcpu, ~0ul);
+ vcpu_clear_mmio_info(vcpu, MMIO_GVA_ANY);
kvm_mmu_audit(vcpu, AUDIT_PRE_SYNC);
if (vcpu->arch.mmu.root_level == PT64_ROOT_LEVEL) {
hpa_t root = vcpu->arch.mmu.root_hpa;
diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
index cb80c29..1ce5611 100644
--- a/arch/x86/kvm/x86.h
+++ b/arch/x86/kvm/x86.h
@@ -78,15 +78,23 @@ static inline void vcpu_cache_mmio_info(struct kvm_vcpu *vcpu,
vcpu->arch.mmio_gva = gva & PAGE_MASK;
vcpu->arch.access = access;
vcpu->arch.mmio_gfn = gfn;
+ vcpu->arch.mmio_gen = kvm_memslots(vcpu->kvm)->generation;
+}
+
+static inline bool vcpu_match_mmio_gen(struct kvm_vcpu *vcpu)
+{
+ return vcpu->arch.mmio_gen == kvm_memslots(vcpu->kvm)->generation;
}

/*
- * Clear the mmio cache info for the given gva,
- * specially, if gva is ~0ul, we clear all mmio cache info.
+ * Clear the mmio cache info for the given gva. If gva is MMIO_GVA_ANY, we
+ * clear all mmio cache info.
*/
+#define MMIO_GVA_ANY (~(gva_t)0)
+
static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)
{
- if (gva != (~0ul) && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
+ if (gva != MMIO_GVA_ANY && vcpu->arch.mmio_gva != (gva & PAGE_MASK))
return;

vcpu->arch.mmio_gva = 0;
@@ -94,7 +102,8 @@ static inline void vcpu_clear_mmio_info(struct kvm_vcpu *vcpu, gva_t gva)

static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)
{
- if (vcpu->arch.mmio_gva && vcpu->arch.mmio_gva == (gva & PAGE_MASK))
+ if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gva &&
+ vcpu->arch.mmio_gva == (gva & PAGE_MASK))
return true;

return false;
@@ -102,7 +111,8 @@ static inline bool vcpu_match_mmio_gva(struct kvm_vcpu *vcpu, unsigned long gva)

static inline bool vcpu_match_mmio_gpa(struct kvm_vcpu *vcpu, gpa_t gpa)
{
- if (vcpu->arch.mmio_gfn && vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
+ if (vcpu_match_mmio_gen(vcpu) && vcpu->arch.mmio_gfn &&
+ vcpu->arch.mmio_gfn == gpa >> PAGE_SHIFT)
return true;

return false;

li...@kernel.org

unread,
Jan 27, 2015, 11:13:37 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Artem Bityutskiy, Zefan Li
From: Artem Bityutskiy <artem.bi...@linux.intel.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 07e19dff63e3d5d6500d831e36554ac9b1b0560e upstream.

The 'mst_mutex' is not needed since because 'ubifs_write_master()' is only
called on the mount path and commit path. The mount path is sequential and
there is no parallelism, and the commit path is also serialized - there is only
one commit going on at a time.

Signed-off-by: Artem Bityutskiy <artem.bi...@linux.intel.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/ubifs/commit.c | 2 --
fs/ubifs/master.c | 7 +++----
fs/ubifs/super.c | 1 -
fs/ubifs/ubifs.h | 2 --
4 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/fs/ubifs/commit.c b/fs/ubifs/commit.c
index fb3b5c8..3920dc1 100644
--- a/fs/ubifs/commit.c
+++ b/fs/ubifs/commit.c
@@ -174,7 +174,6 @@ static int do_commit(struct ubifs_info *c)
if (err)
goto out;

- mutex_lock(&c->mst_mutex);
c->mst_node->cmt_no = cpu_to_le64(c->cmt_no);
c->mst_node->log_lnum = cpu_to_le32(new_ltail_lnum);
c->mst_node->root_lnum = cpu_to_le32(zroot.lnum);
@@ -204,7 +203,6 @@ static int do_commit(struct ubifs_info *c)
else
c->mst_node->flags &= ~cpu_to_le32(UBIFS_MST_NO_ORPHS);
err = ubifs_write_master(c);
- mutex_unlock(&c->mst_mutex);
if (err)
goto out;

diff --git a/fs/ubifs/master.c b/fs/ubifs/master.c
index 278c238..bb9f481 100644
--- a/fs/ubifs/master.c
+++ b/fs/ubifs/master.c
@@ -352,10 +352,9 @@ int ubifs_read_master(struct ubifs_info *c)
* ubifs_write_master - write master node.
* @c: UBIFS file-system description object
*
- * This function writes the master node. The caller has to take the
- * @c->mst_mutex lock before calling this function. Returns zero in case of
- * success and a negative error code in case of failure. The master node is
- * written twice to enable recovery.
+ * This function writes the master node. Returns zero in case of success and a
+ * negative error code in case of failure. The master node is written twice to
+ * enable recovery.
*/
int ubifs_write_master(struct ubifs_info *c)
{
diff --git a/fs/ubifs/super.c b/fs/ubifs/super.c
index d867bd9..129bb48 100644
--- a/fs/ubifs/super.c
+++ b/fs/ubifs/super.c
@@ -1984,7 +1984,6 @@ static struct ubifs_info *alloc_ubifs_info(struct ubi_volume_desc *ubi)
mutex_init(&c->lp_mutex);
mutex_init(&c->tnc_mutex);
mutex_init(&c->log_mutex);
- mutex_init(&c->mst_mutex);
mutex_init(&c->umount_mutex);
mutex_init(&c->bu_mutex);
mutex_init(&c->write_reserve_mutex);
diff --git a/fs/ubifs/ubifs.h b/fs/ubifs/ubifs.h
index 3f96261..cd62067 100644
--- a/fs/ubifs/ubifs.h
+++ b/fs/ubifs/ubifs.h
@@ -1041,7 +1041,6 @@ struct ubifs_debug_info;
*
* @mst_node: master node
* @mst_offs: offset of valid master node
- * @mst_mutex: protects the master node area, @mst_node, and @mst_offs
*
* @max_bu_buf_len: maximum bulk-read buffer length
* @bu_mutex: protects the pre-allocated bulk-read buffer and @c->bu
@@ -1281,7 +1280,6 @@ struct ubifs_info {

struct ubifs_mst_node *mst_node;
int mst_offs;
- struct mutex mst_mutex;

int max_bu_buf_len;
struct mutex bu_mutex;

li...@kernel.org

unread,
Jan 27, 2015, 11:13:45 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Mathias Krause, Daniel Vetter, Duncan Laurie, Jarod Wilson, Rusty Russell, Jani Nikula, Zefan Li
From: Mathias Krause <min...@googlemail.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit bbe1c2740d3a25aa1dbe5d842d2ff09cddcdde0a upstream.

The __init annotations for the DMI callback functions are wrong as this
code can be called even after the module has been initialized, e.g. like
this:

# echo 1 > /sys/bus/pci/devices/0000:00:02.0/remove
# modprobe i915
# echo 1 > /sys/bus/pci/rescan

The first command will remove the PCI device from the kernel's device
list so the second command won't see it right away. But as it registers
a PCI driver it'll see it on the third command. If the system happens to
match one of the DMI table entries we'll try to call a function in long
released memory and generate an Oops, at best.

Fix this by removing the bogus annotation.

Modpost should have caught that one but it ignores section reference
mismatches from the .rodata section. :/

Fixes: 25e341cfc33d ("drm/i915: quirk away broken OpRegion VBT")
Fixes: 8ca4013d702d ("CHROMIUM: i915: Add DMI override to skip CRT...")
Fixes: 425d244c8670 ("drm/i915: ignore LVDS on intel graphics systems...")
Signed-off-by: Mathias Krause <min...@googlemail.com>
Cc: Daniel Vetter <daniel...@ffwll.ch>
Cc: Duncan Laurie <dla...@chromium.org>
Cc: Jarod Wilson <ja...@redhat.com>
Cc: Rusty Russell <ru...@rustcorp.com.au> # Can modpost be fixed?
Signed-off-by: Jani Nikula <jani....@intel.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/gpu/drm/i915/intel_bios.c | 2 +-
drivers/gpu/drm/i915/intel_crt.c | 2 +-
drivers/gpu/drm/i915/intel_lvds.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_bios.c b/drivers/gpu/drm/i915/intel_bios.c
index a2c9e56..d9e359a 100644
--- a/drivers/gpu/drm/i915/intel_bios.c
+++ b/drivers/gpu/drm/i915/intel_bios.c
@@ -651,7 +651,7 @@ init_vbt_defaults(struct drm_i915_private *dev_priv)
DRM_DEBUG_KMS("Set default to SSC at %dMHz\n", dev_priv->lvds_ssc_freq);
}

-static int __init intel_no_opregion_vbt_callback(const struct dmi_system_id *id)
+static int intel_no_opregion_vbt_callback(const struct dmi_system_id *id)
{
DRM_DEBUG_KMS("Falling back to manually reading VBT from "
"VBIOS ROM for %s\n",
diff --git a/drivers/gpu/drm/i915/intel_crt.c b/drivers/gpu/drm/i915/intel_crt.c
index a83f7ac..b4f71c2 100644
--- a/drivers/gpu/drm/i915/intel_crt.c
+++ b/drivers/gpu/drm/i915/intel_crt.c
@@ -564,7 +564,7 @@ static const struct drm_encoder_funcs intel_crt_enc_funcs = {
.destroy = intel_encoder_destroy,
};

-static int __init intel_no_crt_dmi_callback(const struct dmi_system_id *id)
+static int intel_no_crt_dmi_callback(const struct dmi_system_id *id)
{
DRM_DEBUG_KMS("Skipping CRT initialization for %s\n", id->ident);
return 1;
diff --git a/drivers/gpu/drm/i915/intel_lvds.c b/drivers/gpu/drm/i915/intel_lvds.c
index b695ab4..77190cc 100644
--- a/drivers/gpu/drm/i915/intel_lvds.c
+++ b/drivers/gpu/drm/i915/intel_lvds.c
@@ -619,7 +619,7 @@ static const struct drm_encoder_funcs intel_lvds_enc_funcs = {
.destroy = intel_encoder_destroy,
};

-static int __init intel_no_lvds_dmi_callback(const struct dmi_system_id *id)
+static int intel_no_lvds_dmi_callback(const struct dmi_system_id *id)
{
DRM_DEBUG_KMS("Skipping LVDS initialization for %s\n", id->ident);
return 1;

li...@kernel.org

unread,
Jan 27, 2015, 11:13:47 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Al Viro, Zefan Li
From: Al Viro <vi...@zeniv.linux.org.uk>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 88b368f27a094277143d8ecd5a056116f6a41520 upstream.

The check in __propagate_umount() ("has somebody explicitly mounted
something on that slave?") is done *before* taking the already doomed
victims out of the child lists.

Signed-off-by: Al Viro <vi...@zeniv.linux.org.uk>
[lizf: Backported to 3.4:
- adjust context
- s/hlist_for_each_entry/list_for_each_entry/]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/namespace.c | 4 +++-
fs/pnode.c | 4 +++-
2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 4e46539..2985879 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1066,6 +1066,9 @@ void umount_tree(struct mount *mnt, int propagate, struct list_head *kill)
for (p = mnt; p; p = next_mnt(p, mnt))
list_move(&p->mnt_hash, &tmp_list);

+ list_for_each_entry(p, &tmp_list, mnt_hash)
+ list_del_init(&p->mnt_child);
+
if (propagate)
propagate_umount(&tmp_list);

@@ -1076,7 +1079,6 @@ void umount_tree(struct mount *mnt, int propagate, struct list_head *kill)
if (p->mnt_ns)
__mnt_make_shortterm(p);
p->mnt_ns = NULL;
- list_del_init(&p->mnt_child);
if (mnt_has_parent(p)) {
p->mnt_parent->mnt_ghosts++;
dentry_reset_mounted(p->mnt_mountpoint);
diff --git a/fs/pnode.c b/fs/pnode.c
index ab5fa9e..f61dcb4 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -333,8 +333,10 @@ static void __propagate_umount(struct mount *mnt)
* umount the child only if the child has no
* other children
*/
- if (child && list_empty(&child->mnt_mounts))
+ if (child && list_empty(&child->mnt_mounts)) {
+ list_del_init(&child->mnt_child);
list_move_tail(&child->mnt_hash, &mnt->mnt_hash);
+ }

li...@kernel.org

unread,
Jan 27, 2015, 11:13:54 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Hellstrom, Zefan Li
From: Thomas Hellstrom <thell...@vmware.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit f01ea0c3d9db536c64d47922716d8b3b8f21d850 upstream.

The code waiting for fifo idle was incorrect and could possibly spin
forever under certain circumstances.

Signed-off-by: Thomas Hellstrom <thell...@vmware.com>
Reported-by: Mark Sheldon <mark...@vmware.com>
Reviewed-by: Jakob Bornecrantz <ja...@vmware.com>
Reivewed-by: Mark Sheldon <mark...@vmware.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
index a0c2f12..decca82 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c
@@ -163,8 +163,9 @@ void vmw_fifo_release(struct vmw_private *dev_priv, struct vmw_fifo_state *fifo)

mutex_lock(&dev_priv->hw_mutex);

+ vmw_write(dev_priv, SVGA_REG_SYNC, SVGA_SYNC_GENERIC);
while (vmw_read(dev_priv, SVGA_REG_BUSY) != 0)
- vmw_write(dev_priv, SVGA_REG_SYNC, SVGA_SYNC_GENERIC);
+ ;

dev_priv->last_read_seqno = ioread32(fifo_mem + SVGA_FIFO_FENCE);

li...@kernel.org

unread,
Jan 27, 2015, 11:13:58 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Christian Borntraeger, Zefan Li
From: Christian Borntraeger <bornt...@de.ibm.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit f346026e55f1efd3949a67ddd1dcea7c1b9a615e upstream.

We must not fallthrough if the conditions for external call are not met.

Signed-off-by: Christian Borntraeger <bornt...@de.ibm.com>
Reviewed-by: Thomas Huth <th...@linux.vnet.ibm.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/s390/kvm/interrupt.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/s390/kvm/interrupt.c b/arch/s390/kvm/interrupt.c
index 10e13b3..df69bcb 100644
--- a/arch/s390/kvm/interrupt.c
+++ b/arch/s390/kvm/interrupt.c
@@ -43,6 +43,7 @@ static int __interrupt_is_deliverable(struct kvm_vcpu *vcpu,
return 0;
if (vcpu->arch.sie_block->gcr[0] & 0x2000ul)
return 1;
+ return 0;
case KVM_S390_INT_EMERGENCY:
if (psw_extint_disabled(vcpu))
return 0;

li...@kernel.org

unread,
Jan 27, 2015, 11:14:03 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Andreas Noever, Bjorn Helgaas, Zefan Li
From: Andreas Noever <andreas...@gmail.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit bceee4a97eb58bd0e80e39eff11b506ddd9e7ad3 upstream.

pciehp assumes that dev->subordinate, the struct pci_bus for a bridge's
secondary bus, exists. But we do not create that bus if we run out of bus
numbers during enumeration. This leads to a NULL dereference in
init_slot() (and other places).

Change pciehp_probe() to return -ENODEV when no secondary bus is present.

Signed-off-by: Andreas Noever <andreas...@gmail.com>
Signed-off-by: Bjorn Helgaas <bhel...@google.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/pci/hotplug/pciehp_core.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index 9e39df9..75dc402 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -237,6 +237,13 @@ static int pciehp_probe(struct pcie_device *dev)
else if (pciehp_acpi_slot_detection_check(dev->port))
goto err_out_none;

+ if (!dev->port->subordinate) {
+ /* Can happen if we run out of bus numbers during probe */
+ dev_err(&dev->device,
+ "Hotplug bridge without secondary bus, ignoring\n");
+ goto err_out_none;
+ }
+
ctrl = pcie_init(dev);
if (!ctrl) {
dev_err(&dev->device, "Controller initialization failed\n");

li...@kernel.org

unread,
Jan 27, 2015, 11:14:13 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Eliad Peller, Eliad Peller, Johannes Berg, Zefan Li
From: Eliad Peller <el...@wizery.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit a5fe8e7695dc3f547e955ad2b662e3e72969e506 upstream.

alpha2 is defined as 2-chars array, but is used in multiple
places as string (e.g. with nla_put_string calls), which
might leak kernel data.

Solve it by simply adding an extra char for the NULL
terminator, making such operations safe.

Signed-off-by: Eliad Peller <eliadx...@intel.com>
Signed-off-by: Johannes Berg <johann...@intel.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
include/net/regulatory.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/regulatory.h b/include/net/regulatory.h
index a5f7993..8d64abf 100644
--- a/include/net/regulatory.h
+++ b/include/net/regulatory.h
@@ -97,7 +97,7 @@ struct ieee80211_reg_rule {

struct ieee80211_regdomain {
u32 n_reg_rules;
- char alpha2[2];
+ char alpha2[3];
u8 dfs_region;
struct ieee80211_reg_rule reg_rules[];
};

li...@kernel.org

unread,
Jan 27, 2015, 11:14:18 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Tejun Heo, Zefan Li
From: Tejun Heo <t...@kernel.org>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit f0d279654dea22b7a6ad34b9334aee80cda62cde upstream.

When pcpu_alloc_pages() fails midway, pcpu_free_pages() is invoked to
free what has already been allocated. The invocation is across the
whole requested range and pcpu_free_pages() will try to free all
non-NULL pages; unfortunately, this is incorrect as
pcpu_get_pages_and_bitmap(), unlike what its comment suggests, doesn't
clear the pages array and thus the array may have entries from the
previous invocations making the partial failure path free incorrect
pages.

Fix it by open-coding the partial freeing of the already allocated
pages.

Signed-off-by: Tejun Heo <t...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
mm/percpu-vm.c | 21 +++++++++++++++------
1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 405d331..6c055e4 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -108,7 +108,7 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
int page_start, int page_end)
{
const gfp_t gfp = GFP_KERNEL | __GFP_HIGHMEM | __GFP_COLD;
- unsigned int cpu;
+ unsigned int cpu, tcpu;
int i;

for_each_possible_cpu(cpu) {
@@ -116,14 +116,23 @@ static int pcpu_alloc_pages(struct pcpu_chunk *chunk,
struct page **pagep = &pages[pcpu_page_idx(cpu, i)];

*pagep = alloc_pages_node(cpu_to_node(cpu), gfp, 0);
- if (!*pagep) {
- pcpu_free_pages(chunk, pages, populated,
- page_start, page_end);
- return -ENOMEM;
- }
+ if (!*pagep)
+ goto err;
}
}
return 0;
+
+err:
+ while (--i >= page_start)
+ __free_page(pages[pcpu_page_idx(cpu, i)]);
+
+ for_each_possible_cpu(tcpu) {
+ if (tcpu == cpu)
+ break;
+ for (i = page_start; i < page_end; i++)
+ __free_page(pages[pcpu_page_idx(tcpu, i)]);
+ }
+ return -ENOMEM;
}

/**

li...@kernel.org

unread,
Jan 27, 2015, 11:14:33 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Alban Crequy, Tejun Heo, Zefan Li
From: Alban Crequy <alban....@collabora.co.uk>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 71b1fb5c4473a5b1e601d41b109bdfe001ec82e0 upstream.

/proc/<pid>/cgroup contains one cgroup path on each line. If cgroup names are
allowed to contain "\n", applications cannot parse /proc/<pid>/cgroup safely.

Signed-off-by: Alban Crequy <alban....@collabora.co.uk>
Signed-off-by: Tejun Heo <t...@kernel.org>
[lizf: Backported to 3.4:
- adjust context
- s/name/dentry->d_name.name/]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
kernel/cgroup.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 7c8f4f7..c776f89 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3838,6 +3838,11 @@ static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
{
struct cgroup *c_parent = dentry->d_parent->d_fsdata;

+ /* Do not accept '\n' to prevent making /proc/<pid>/cgroup unparsable.
+ */
+ if (strchr(dentry->d_name.name, '\n'))
+ return -EINVAL;
+
/* the vfs holds inode->i_mutex already */
return cgroup_create(c_parent, dentry, mode | S_IFDIR);

li...@kernel.org

unread,
Jan 27, 2015, 11:14:39 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Larry Finger, Nico Baggus, John W. Linville, Zefan Li
From: Larry Finger <Larry....@lwfinger.net>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit c66517165610b911e4c6d268f28d8c640832dbd1 upstream.

The Sitecom WLA-2102 adapter uses this driver.

Reported-by: Nico Baggus <nico-...@noci.xs4all.nl>
Signed-off-by: Larry Finger <Larry....@lwfinger.net>
Cc: Nico Baggus <nico-...@noci.xs4all.nl>
Signed-off-by: John W. Linville <linv...@tuxdriver.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/net/wireless/rtlwifi/rtl8192cu/sw.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
index 52a9c33..2c4cdce 100644
--- a/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
+++ b/drivers/net/wireless/rtlwifi/rtl8192cu/sw.c
@@ -306,6 +306,7 @@ static struct usb_device_id rtl8192c_usb_ids[] = {
{RTL_USB_DEVICE(0x0bda, 0x5088, rtl92cu_hal_cfg)}, /*Thinkware-CC&C*/
{RTL_USB_DEVICE(0x0df6, 0x0052, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
{RTL_USB_DEVICE(0x0df6, 0x005c, rtl92cu_hal_cfg)}, /*Sitecom - Edimax*/
+ {RTL_USB_DEVICE(0x0df6, 0x0070, rtl92cu_hal_cfg)}, /*Sitecom - 150N */
{RTL_USB_DEVICE(0x0df6, 0x0077, rtl92cu_hal_cfg)}, /*Sitecom-WLA2100V2*/
{RTL_USB_DEVICE(0x0eb0, 0x9071, rtl92cu_hal_cfg)}, /*NO Brand - Etop*/
{RTL_USB_DEVICE(0x4856, 0x0091, rtl92cu_hal_cfg)}, /*NetweeN - Feixun*/

li...@kernel.org

unread,
Jan 27, 2015, 11:14:51 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, K. Y. Srinivasan, Greg Kroah-Hartman, Zefan Li
From: "K. Y. Srinivasan" <k...@microsoft.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit fdeebcc62279119dbeafbc1a2e39e773839025fd upstream.

Posting messages to the host can fail because of transient resource
related failures. Correctly deal with these failures and increase the
number of attempts to post the message before giving up.

In this version of the patch, I have normalized the error code to
Linux error code.

Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
Tested-by: Sitsofe Wheeler <sit...@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/hv/connection.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)

diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index 650c9f0..2d52a1b 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -294,10 +294,21 @@ int vmbus_post_msg(void *buffer, size_t buflen)
* insufficient resources. Retry the operation a couple of
* times before giving up.
*/
- while (retries < 3) {
- ret = hv_post_message(conn_id, 1, buffer, buflen);
- if (ret != HV_STATUS_INSUFFICIENT_BUFFERS)
+ while (retries < 10) {
+ ret = hv_post_message(conn_id, 1, buffer, buflen);
+
+ switch (ret) {
+ case HV_STATUS_INSUFFICIENT_BUFFERS:
+ ret = -ENOMEM;
+ case -ENOMEM:
+ break;
+ case HV_STATUS_SUCCESS:
return ret;
+ default:
+ pr_err("hv_post_msg() failed; error code:%d\n", ret);
+ return -EINVAL;
+ }
+
retries++;
msleep(100);

li...@kernel.org

unread,
Jan 27, 2015, 11:14:56 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Johan Hovold, Zefan Li
From: Johan Hovold <jo...@kernel.org>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit ee444609dbae8afee420c3243ce4c5f442efb622 upstream.

Add device id for NOVITUS Bono E thermal printer.

Reported-by: Emanuel Koczwara <poc...@emanuelkoczwara.pl>
Signed-off-by: Johan Hovold <jo...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/serial/ftdi_sio.c | 1 +
drivers/usb/serial/ftdi_sio_ids.h | 6 ++++++
2 files changed, 7 insertions(+)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index 121a052..d91185a 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -750,6 +750,7 @@ static struct usb_device_id id_table_combined [] = {
{ USB_DEVICE(FTDI_VID, FTDI_NDI_AURORA_SCU_PID),
.driver_info = (kernel_ulong_t)&ftdi_NDI_device_quirk },
{ USB_DEVICE(TELLDUS_VID, TELLDUS_TELLSTICK_PID) },
+ { USB_DEVICE(NOVITUS_VID, NOVITUS_BONO_E_PID) },
{ USB_DEVICE(RTSYSTEMS_VID, RTSYSTEMS_USB_S03_PID) },
{ USB_DEVICE(RTSYSTEMS_VID, RTSYSTEMS_USB_59_PID) },
{ USB_DEVICE(RTSYSTEMS_VID, RTSYSTEMS_USB_57A_PID) },
diff --git a/drivers/usb/serial/ftdi_sio_ids.h b/drivers/usb/serial/ftdi_sio_ids.h
index 0eb2e97..a5cd5f7 100644
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -828,6 +828,12 @@
#define TELLDUS_TELLSTICK_PID 0x0C30 /* RF control dongle 433 MHz using FT232RL */

/*
+ * NOVITUS printers
+ */
+#define NOVITUS_VID 0x1a28
+#define NOVITUS_BONO_E_PID 0x6010
+
+/*
* RT Systems programming cables for various ham radios
*/
#define RTSYSTEMS_VID 0x2100 /* Vendor ID */

li...@kernel.org

unread,
Jan 27, 2015, 11:15:01 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Bjørn Mork, Johan Hovold, Zefan Li
From: Bjørn Mork <bj...@mork.no>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 049255f51644c1105775af228396d187402a5934 upstream.

Sierra Wireless Direct IP devices using the 68A3 product ID
can be configured for modes including a CDC ECM class function.
The known example uses interface numbers 12 and 13 for the ECM
control and data interfaces respectively, consistent with CDC
MBIM function interface numbering on other Sierra devices.

It seems cleaner to restrict this driver to the ff/ff/ff
vendor specific interfaces rather than increasing the already
long interface number blacklist. This should be more future
proof if Sierra adds more class functions using interface
numbers not yet in the blacklist.

Signed-off-by: Bjørn Mork <bj...@mork.no>
Signed-off-by: Johan Hovold <jo...@kernel.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/serial/sierra.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/usb/serial/sierra.c b/drivers/usb/serial/sierra.c
index bd79d68..64df565 100644
--- a/drivers/usb/serial/sierra.c
+++ b/drivers/usb/serial/sierra.c
@@ -296,14 +296,16 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE(0x1199, 0x68A2), /* Sierra Wireless MC77xx in QMI mode */
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},
- { USB_DEVICE(0x1199, 0x68A3), /* Sierra Wireless Direct IP modems */
+ /* Sierra Wireless Direct IP modems */
+ { USB_DEVICE_AND_INTERFACE_INFO(0x1199, 0x68A3, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},
/* AT&T Direct IP LTE modems */
{ USB_DEVICE_AND_INTERFACE_INFO(0x0F3D, 0x68AA, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},
- { USB_DEVICE(0x0f3d, 0x68A3), /* Airprime/Sierra Wireless Direct IP modems */
+ /* Airprime/Sierra Wireless Direct IP modems */
+ { USB_DEVICE_AND_INTERFACE_INFO(0x0F3D, 0x68A3, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},

li...@kernel.org

unread,
Jan 27, 2015, 11:15:03 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Bjørn Mork, Johan Hovold, Zefan Li
From: Bjørn Mork <bj...@mork.no>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 5b3da69285c143b7ea76b3b9f73099ff1093ab73 upstream.

This VID:PID is used for some Direct IP devices behaving
identical to the already supported 0F3D:68AA devices.

Reported-by: Lars Melin <lar...@gmail.com>
Signed-off-by: Bjørn Mork <bj...@mork.no>
Signed-off-by: Johan Hovold <jo...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/serial/sierra.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/usb/serial/sierra.c b/drivers/usb/serial/sierra.c
index 64df565..e3ddec0 100644
--- a/drivers/usb/serial/sierra.c
+++ b/drivers/usb/serial/sierra.c
@@ -300,6 +300,9 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE_AND_INTERFACE_INFO(0x1199, 0x68A3, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
},
+ { USB_DEVICE_AND_INTERFACE_INFO(0x1199, 0x68AA, 0xFF, 0xFF, 0xFF),
+ .driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist
+ },
/* AT&T Direct IP LTE modems */
{ USB_DEVICE_AND_INTERFACE_INFO(0x0F3D, 0x68AA, 0xFF, 0xFF, 0xFF),
.driver_info = (kernel_ulong_t)&direct_ip_interface_blacklist

li...@kernel.org

unread,
Jan 27, 2015, 11:15:20 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, K. Y. Srinivasan, Greg Kroah-Hartman, Zefan Li
From: "K. Y. Srinivasan" <k...@microsoft.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 45d727cee9e200f5b351528b9fb063b69cf702c8 upstream.

Fix a bug in vmbus_open() and properly propagate the error. I would
like to thank Dexuan Cui <de...@microsoft.com> for identifying the
issue.

Signed-off-by: K. Y. Srinivasan <k...@microsoft.com>
Tested-by: Sitsofe Wheeler <sit...@yahoo.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/hv/channel.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 1211d38..7bbbb95 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -207,8 +207,10 @@ int vmbus_open(struct vmbus_channel *newchannel, u32 send_ringbuffer_size,
ret = vmbus_post_msg(open_msg,
sizeof(struct vmbus_channel_open_channel));

- if (ret != 0)
+ if (ret != 0) {
+ err = ret;
goto error1;
+ }

t = wait_for_completion_timeout(&open_info->waitevent, 5*HZ);
if (t == 0) {

li...@kernel.org

unread,
Jan 27, 2015, 11:15:28 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Andy Shevchenko, Mark Brown, Zefan Li
From: Andy Shevchenko <andriy.s...@linux.intel.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 8e45ef682cb31fda62ed4eeede5d9745a0a1b1e2 upstream.

Do full clean up at exit, means terminate all ongoing DMA transfers.

Signed-off-by: Andy Shevchenko <andriy.s...@linux.intel.com>
Signed-off-by: Mark Brown <bro...@kernel.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/spi/spi-dw-mid.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/spi/spi-dw-mid.c b/drivers/spi/spi-dw-mid.c
index 58fa14d..efc494a 100644
--- a/drivers/spi/spi-dw-mid.c
+++ b/drivers/spi/spi-dw-mid.c
@@ -89,7 +89,10 @@ err_exit:

static void mid_spi_dma_exit(struct dw_spi *dws)
{
+ dmaengine_terminate_all(dws->txchan);
dma_release_channel(dws->txchan);
+
+ dmaengine_terminate_all(dws->rxchan);
dma_release_channel(dws->rxchan);

li...@kernel.org

unread,
Jan 27, 2015, 11:15:29 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Andy Shevchenko, Mark Brown, Zefan Li
From: Andy Shevchenko <andriy.s...@linux.intel.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit b41583e7299046abdc578c33f25ed83ee95b9b31 upstream.

In case of 8 bit mode and DMA usage we end up with every second byte written as
0. We have to respect bits_per_word settings what this patch actually does.

Signed-off-by: Andy Shevchenko <andriy.s...@linux.intel.com>
Signed-off-by: Mark Brown <bro...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/spi/spi-dw-mid.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-dw-mid.c b/drivers/spi/spi-dw-mid.c
index b9f0192..58fa14d 100644
--- a/drivers/spi/spi-dw-mid.c
+++ b/drivers/spi/spi-dw-mid.c
@@ -136,7 +136,7 @@ static int mid_spi_dma_transfer(struct dw_spi *dws, int cs_change)
txconf.dst_addr = dws->dma_addr;
txconf.dst_maxburst = LNW_DMA_MSIZE_16;
txconf.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
- txconf.dst_addr_width = DMA_SLAVE_BUSWIDTH_2_BYTES;
+ txconf.dst_addr_width = dws->dma_width;
txconf.device_fc = false;

txchan->device->device_control(txchan, DMA_SLAVE_CONFIG,
@@ -159,7 +159,7 @@ static int mid_spi_dma_transfer(struct dw_spi *dws, int cs_change)
rxconf.src_addr = dws->dma_addr;
rxconf.src_maxburst = LNW_DMA_MSIZE_16;
rxconf.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
- rxconf.src_addr_width = DMA_SLAVE_BUSWIDTH_2_BYTES;
+ rxconf.src_addr_width = dws->dma_width;
rxconf.device_fc = false;

rxchan->device->device_control(rxchan, DMA_SLAVE_CONFIG,

li...@kernel.org

unread,
Jan 27, 2015, 11:15:40 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Felipe Balbi, Alexis R. Cortes, Greg Kroah-Hartman, Zefan Li
From: Felipe Balbi <ba...@ti.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 96908589a8b2584b1185f834d365f5cc360e8226 upstream.

Commit 71c731a (usb: host: xhci: Fix Compliance Mode
on SN65LVP3502CP Hardware) implemented a workaround
for a known issue with Texas Instruments' USB 3.0
redriver IC but it left a condition where any xHCI
host would be taken out of reset if port was placed
in compliance mode and there was no device connected
to the port.

That condition would trigger a fake connection to a
non-existent device so that usbcore would trigger a
warm reset of the port, thus taking the link out of
reset.

This has the side-effect of preventing any xHCI host
connected to a Linux machine from starting and running
the USB 3.0 Electrical Compliance Suite because the
port will mysteriously taken out of compliance mode
and, thus, xHCI won't step through the necessary
compliance patterns for link validation.

This patch fixes the issue by just adding a missing
check for XHCI_COMP_MODE_QUIRK inside
xhci_hub_report_usb3_link_state() when PORT_CAS isn't
set.

This patch should be backported to all kernels containing
commit 71c731a.

Fixes: 71c731a (usb: host: xhci: Fix Compliance Mode on SN65LVP3502CP Hardware)
Cc: Alexis R. Cortes <alexis...@ti.com>
Signed-off-by: Felipe Balbi <ba...@ti.com>
Acked-by: Mathias Nyman <mathia...@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
[lizf: Backported to 3.4:
- s/xhci_hub_report_usb3_link_state/xhci_hub_report_link_state/]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/host/xhci-hub.c | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/usb/host/xhci-hub.c b/drivers/usb/host/xhci-hub.c
index a94eabd..56ec28b 100644
--- a/drivers/usb/host/xhci-hub.c
+++ b/drivers/usb/host/xhci-hub.c
@@ -463,7 +463,8 @@ void xhci_test_and_clear_bit(struct xhci_hcd *xhci, __le32 __iomem **port_array,
}

/* Updates Link Status for super Speed port */
-static void xhci_hub_report_link_state(u32 *status, u32 status_reg)
+static void xhci_hub_report_link_state(struct xhci_hcd *xhci,
+ u32 *status, u32 status_reg)
{
u32 pls = status_reg & PORT_PLS_MASK;

@@ -502,7 +503,8 @@ static void xhci_hub_report_link_state(u32 *status, u32 status_reg)
* in which sometimes the port enters compliance mode
* caused by a delay on the host-device negotiation.
*/
- if (pls == USB_SS_PORT_LS_COMP_MOD)
+ if ((xhci->quirks & XHCI_COMP_MODE_QUIRK) &&
+ (pls == USB_SS_PORT_LS_COMP_MOD))
pls |= USB_PORT_STAT_CONNECTION;
}

@@ -680,7 +682,7 @@ int xhci_hub_control(struct usb_hcd *hcd, u16 typeReq, u16 wValue,
}
/* Update Port Link State for super speed ports*/
if (hcd->speed == HCD_USB3) {
- xhci_hub_report_link_state(&status, temp);
+ xhci_hub_report_link_state(xhci, &status, temp);
/*
* Verify if all USB3 Ports Have entered U0 already.
* Delete Compliance Mode Timer if so.

li...@kernel.org

unread,
Jan 27, 2015, 11:15:49 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Bryan O'Donoghue, Borislav Petkov, Ingo Molnar, Zefan Li
From: Bryan O'Donoghue <pure....@nexus-software.ie>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit ee1b5b165c0a2f04d2107e634e51f05d0eb107de upstream.

Quark x1000 advertises PGE via the standard CPUID method
PGE bits exist in Quark X1000's PTEs. In order to flush
an individual PTE it is necessary to reload CR3 irrespective
of the PTE.PGE bit.

See Quark Core_DevMan_001.pdf section 6.4.11

This bug was fixed in Galileo kernels, unfixed vanilla kernels are expected to
crash and burn on this platform.

Signed-off-by: Bryan O'Donoghue <pure....@nexus-software.ie>
Cc: Borislav Petkov <b...@alien8.de>
Link: http://lkml.kernel.org/r/1411514784-14885-1-git...@nexus-software.ie
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/x86/kernel/cpu/intel.c | 15 +++++++++++++++
1 file changed, 15 insertions(+)

diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
index 3e6ff6c..e7a64dd 100644
--- a/arch/x86/kernel/cpu/intel.c
+++ b/arch/x86/kernel/cpu/intel.c
@@ -143,6 +143,21 @@ static void __cpuinit early_init_intel(struct cpuinfo_x86 *c)
setup_clear_cpu_cap(X86_FEATURE_ERMS);
}
}
+
+ /*
+ * Intel Quark Core DevMan_001.pdf section 6.4.11
+ * "The operating system also is required to invalidate (i.e., flush)
+ * the TLB when any changes are made to any of the page table entries.
+ * The operating system must reload CR3 to cause the TLB to be flushed"
+ *
+ * As a result cpu_has_pge() in arch/x86/include/asm/tlbflush.h should
+ * be false so that __flush_tlb_all() causes CR3 insted of CR4.PGE
+ * to be modified
+ */
+ if (c->x86 == 5 && c->x86_model == 9) {
+ pr_info("Disabling PGE capability bit\n");
+ setup_clear_cpu_cap(X86_FEATURE_PGE);
+ }
}

#ifdef CONFIG_X86_32

li...@kernel.org

unread,
Jan 27, 2015, 11:16:03 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Willy Tarreau, Willem Pinckaers, Don A. Bailey, Greg Kroah-Hartman, Zefan Li
From: Willy Tarreau <w...@1wt.eu>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit d98a0526434d27e261f622cf9d2e0028b5ff1a00 upstream.

Add a complete description of the LZO format as processed by the
decompressor. I have not found a public specification of this format
hence this analysis, which will be used to better understand the code.

Cc: Willem Pinckaers <wil...@lekkertech.net>
Cc: "Don A. Bailey" <do...@securitymouse.com>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
Documentation/lzo.txt | 164 ++++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 164 insertions(+)
create mode 100644 Documentation/lzo.txt

diff --git a/Documentation/lzo.txt b/Documentation/lzo.txt
new file mode 100644
index 0000000..ea45dd3
--- /dev/null
+++ b/Documentation/lzo.txt
@@ -0,0 +1,164 @@
+
+LZO stream format as understood by Linux's LZO decompressor
+===========================================================
+
+Introduction
+
+ This is not a specification. No specification seems to be publicly available
+ for the LZO stream format. This document describes what input format the LZO
+ decompressor as implemented in the Linux kernel understands. The file subject
+ of this analysis is lib/lzo/lzo1x_decompress_safe.c. No analysis was made on
+ the compressor nor on any other implementations though it seems likely that
+ the format matches the standard one. The purpose of this document is to
+ better understand what the code does in order to propose more efficient fixes
+ for future bug reports.
+
+Description
+
+ The stream is composed of a series of instructions, operands, and data. The
+ instructions consist in a few bits representing an opcode, and bits forming
+ the operands for the instruction, whose size and position depend on the
+ opcode and on the number of literals copied by previous instruction. The
+ operands are used to indicate :
+
+ - a distance when copying data from the dictionary (past output buffer)
+ - a length (number of bytes to copy from dictionary)
+ - the number of literals to copy, which is retained in variable "state"
+ as a piece of information for next instructions.
+
+ Optionally depending on the opcode and operands, extra data may follow. These
+ extra data can be a complement for the operand (eg: a length or a distance
+ encoded on larger values), or a literal to be copied to the output buffer.
+
+ The first byte of the block follows a different encoding from other bytes, it
+ seems to be optimized for literal use only, since there is no dictionary yet
+ prior to that byte.
+
+ Lengths are always encoded on a variable size starting with a small number
+ of bits in the operand. If the number of bits isn't enough to represent the
+ length, up to 255 may be added in increments by consuming more bytes with a
+ rate of at most 255 per extra byte (thus the compression ratio cannot exceed
+ around 255:1). The variable length encoding using #bits is always the same :
+
+ length = byte & ((1 << #bits) - 1)
+ if (!length) {
+ length = ((1 << #bits) - 1)
+ length += 255*(number of zero bytes)
+ length += first-non-zero-byte
+ }
+ length += constant (generally 2 or 3)
+
+ For references to the dictionary, distances are relative to the output
+ pointer. Distances are encoded using very few bits belonging to certain
+ ranges, resulting in multiple copy instructions using different encodings.
+ Certain encodings involve one extra byte, others involve two extra bytes
+ forming a little-endian 16-bit quantity (marked LE16 below).
+
+ After any instruction except the large literal copy, 0, 1, 2 or 3 literals
+ are copied before starting the next instruction. The number of literals that
+ were copied may change the meaning and behaviour of the next instruction. In
+ practice, only one instruction needs to know whether 0, less than 4, or more
+ literals were copied. This is the information stored in the <state> variable
+ in this implementation. This number of immediate literals to be copied is
+ generally encoded in the last two bits of the instruction but may also be
+ taken from the last two bits of an extra operand (eg: distance).
+
+ End of stream is declared when a block copy of distance 0 is seen. Only one
+ instruction may encode this distance (0001HLLL), it takes one LE16 operand
+ for the distance, thus requiring 3 bytes.
+
+ IMPORTANT NOTE : in the code some length checks are missing because certain
+ instructions are called under the assumption that a certain number of bytes
+ follow because it has already been garanteed before parsing the instructions.
+ They just have to "refill" this credit if they consume extra bytes. This is
+ an implementation design choice independant on the algorithm or encoding.
+
+Byte sequences
+
+ First byte encoding :
+
+ 0..17 : follow regular instruction encoding, see below. It is worth
+ noting that codes 16 and 17 will represent a block copy from
+ the dictionary which is empty, and that they will always be
+ invalid at this place.
+
+ 18..21 : copy 0..3 literals
+ state = (byte - 17) = 0..3 [ copy <state> literals ]
+ skip byte
+
+ 22..255 : copy literal string
+ length = (byte - 17) = 4..238
+ state = 4 [ don't copy extra literals ]
+ skip byte
+
+ Instruction encoding :
+
+ 0 0 0 0 X X X X (0..15)
+ Depends on the number of literals copied by the last instruction.
+ If last instruction did not copy any literal (state == 0), this
+ encoding will be a copy of 4 or more literal, and must be interpreted
+ like this :
+
+ 0 0 0 0 L L L L (0..15) : copy long literal string
+ length = 3 + (L ?: 15 + (zero_bytes * 255) + non_zero_byte)
+ state = 4 (no extra literals are copied)
+
+ If last instruction used to copy between 1 to 3 literals (encoded in
+ the instruction's opcode or distance), the instruction is a copy of a
+ 2-byte block from the dictionary within a 1kB distance. It is worth
+ noting that this instruction provides little savings since it uses 2
+ bytes to encode a copy of 2 other bytes but it encodes the number of
+ following literals for free. It must be interpreted like this :
+
+ 0 0 0 0 D D S S (0..15) : copy 2 bytes from <= 1kB distance
+ length = 2
+ state = S (copy S literals after this block)
+ Always followed by exactly one byte : H H H H H H H H
+ distance = (H << 2) + D + 1
+
+ If last instruction used to copy 4 or more literals (as detected by
+ state == 4), the instruction becomes a copy of a 3-byte block from the
+ dictionary from a 2..3kB distance, and must be interpreted like this :
+
+ 0 0 0 0 D D S S (0..15) : copy 3 bytes from 2..3 kB distance
+ length = 3
+ state = S (copy S literals after this block)
+ Always followed by exactly one byte : H H H H H H H H
+ distance = (H << 2) + D + 2049
+
+ 0 0 0 1 H L L L (16..31)
+ Copy of a block within 16..48kB distance (preferably less than 10B)
+ length = 2 + (L ?: 7 + (zero_bytes * 255) + non_zero_byte)
+ Always followed by exactly one LE16 : D D D D D D D D : D D D D D D S S
+ distance = 16384 + (H << 14) + D
+ state = S (copy S literals after this block)
+ End of stream is reached if distance == 16384
+
+ 0 0 1 L L L L L (32..63)
+ Copy of small block within 16kB distance (preferably less than 34B)
+ length = 2 + (L ?: 31 + (zero_bytes * 255) + non_zero_byte)
+ Always followed by exactly one LE16 : D D D D D D D D : D D D D D D S S
+ distance = D + 1
+ state = S (copy S literals after this block)
+
+ 0 1 L D D D S S (64..127)
+ Copy 3-4 bytes from block within 2kB distance
+ state = S (copy S literals after this block)
+ length = 3 + L
+ Always followed by exactly one byte : H H H H H H H H
+ distance = (H << 3) + D + 1
+
+ 1 L L D D D S S (128..255)
+ Copy 5-8 bytes from block within 2kB distance
+ state = S (copy S literals after this block)
+ length = 5 + L
+ Always followed by exactly one byte : H H H H H H H H
+ distance = (H << 3) + D + 1
+
+Authors
+
+ This document was written by Willy Tarreau <w...@1wt.eu> on 2014/07/19 during an
+ analysis of the decompression code available in Linux 3.16-rc5. The code is
+ tricky, it is possible that this document contains mistakes or that a few
+ corner cases were overlooked. In any case, please report any doubt, fix, or
+ proposed updates to the author(s) so that the document can be updated.

li...@kernel.org

unread,
Jan 27, 2015, 11:16:19 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Willy Tarreau, Don A. Bailey, Greg Kroah-Hartman, Zefan Li
From: Willy Tarreau <w...@1wt.eu>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 72cf90124e87d975d0b2114d930808c58b4c05e4 upstream.

This fix ensures that we never meet an integer overflow while adding
255 while parsing a variable length encoding. It works differently from
commit 206a81c ("lzo: properly check for overruns") because instead of
ensuring that we don't overrun the input, which is tricky to guarantee
due to many assumptions in the code, it simply checks that the cumulated
number of 255 read cannot overflow by bounding this number.

The MAX_255_COUNT is the maximum number of times we can add 255 to a base
count without overflowing an integer. The multiply will overflow when
multiplying 255 by more than MAXINT/255. The sum will overflow earlier
depending on the base count. Since the base count is taken from a u8
and a few bits, it is safe to assume that it will always be lower than
or equal to 2*255, thus we can always prevent any overflow by accepting
two less 255 steps.

This patch also reduces the CPU overhead and actually increases performance
by 1.1% compared to the initial code, while the previous fix costs 3.1%
(measured on x86_64).

The fix needs to be backported to all currently supported stable kernels.

Reported-by: Willem Pinckaers <wil...@lekkertech.net>
Cc: "Don A. Bailey" <do...@securitymouse.com>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
lib/lzo/lzo1x_decompress_safe.c | 43 +++++++++++++++++++++++++++++++++++------
1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/lib/lzo/lzo1x_decompress_safe.c b/lib/lzo/lzo1x_decompress_safe.c
index 569985d..a1c387f 100644
--- a/lib/lzo/lzo1x_decompress_safe.c
+++ b/lib/lzo/lzo1x_decompress_safe.c
@@ -25,6 +25,16 @@
#define NEED_OP(x) if (!HAVE_OP(x)) goto output_overrun
#define TEST_LB(m_pos) if ((m_pos) < out) goto lookbehind_overrun

+/* This MAX_255_COUNT is the maximum number of times we can add 255 to a base
+ * count without overflowing an integer. The multiply will overflow when
+ * multiplying 255 by more than MAXINT/255. The sum will overflow earlier
+ * depending on the base count. Since the base count is taken from a u8
+ * and a few bits, it is safe to assume that it will always be lower than
+ * or equal to 2*255, thus we can always prevent any overflow by accepting
+ * two less 255 steps. See Documentation/lzo.txt for more information.
+ */
+#define MAX_255_COUNT ((((size_t)~0) / 255) - 2)
+
int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
unsigned char *out, size_t *out_len)
{
@@ -55,12 +65,19 @@ int lzo1x_decompress_safe(const unsigned char *in, size_t in_len,
if (t < 16) {
if (likely(state == 0)) {
if (unlikely(t == 0)) {
+ size_t offset;
+ const unsigned char *ip_last = ip;
+
while (unlikely(*ip == 0)) {
- t += 255;
ip++;
NEED_IP(1);
}
- t += 15 + *ip++;
+ offset = ip - ip_last;
+ if (unlikely(offset > MAX_255_COUNT))
+ return LZO_E_ERROR;
+
+ offset = (offset << 8) - offset;
+ t += offset + 15 + *ip++;
}
t += 3;
copy_literal_run:
@@ -116,12 +133,19 @@ copy_literal_run:
} else if (t >= 32) {
t = (t & 31) + (3 - 1);
if (unlikely(t == 2)) {
+ size_t offset;
+ const unsigned char *ip_last = ip;
+
while (unlikely(*ip == 0)) {
- t += 255;
ip++;
NEED_IP(1);
}
- t += 31 + *ip++;
+ offset = ip - ip_last;
+ if (unlikely(offset > MAX_255_COUNT))
+ return LZO_E_ERROR;
+
+ offset = (offset << 8) - offset;
+ t += offset + 31 + *ip++;
NEED_IP(2);
}
m_pos = op - 1;
@@ -134,12 +158,19 @@ copy_literal_run:
m_pos -= (t & 8) << 11;
t = (t & 7) + (3 - 1);
if (unlikely(t == 2)) {
+ size_t offset;
+ const unsigned char *ip_last = ip;
+
while (unlikely(*ip == 0)) {
- t += 255;
ip++;
NEED_IP(1);
}
- t += 7 + *ip++;
+ offset = ip - ip_last;
+ if (unlikely(offset > MAX_255_COUNT))
+ return LZO_E_ERROR;
+
+ offset = (offset << 8) - offset;
+ t += offset + 7 + *ip++;
NEED_IP(2);
}
next = get_unaligned_le16(ip);

li...@kernel.org

unread,
Jan 27, 2015, 11:16:24 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Ilya Dryomov, Zefan Li
From: Ilya Dryomov <ilya.d...@inktank.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit c27a3e4d667fdcad3db7b104f75659478e0c68d8 upstream.

We hard code cephx auth ticket buffer size to 256 bytes. This isn't
enough for any moderate setups and, in case tickets themselves are not
encrypted, leads to buffer overflows (ceph_x_decrypt() errors out, but
ceph_decode_copy() doesn't - it's just a memcpy() wrapper). Since the
buffer is allocated dynamically anyway, allocated it a bit later, at
the point where we know how much is going to be needed.

Fixes: http://tracker.ceph.com/issues/8979

Signed-off-by: Ilya Dryomov <ilya.d...@inktank.com>
Reviewed-by: Sage Weil <sa...@redhat.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
net/ceph/auth_x.c | 64 +++++++++++++++++++++++++------------------------------
1 file changed, 29 insertions(+), 35 deletions(-)

diff --git a/net/ceph/auth_x.c b/net/ceph/auth_x.c
index 0eb146d..de6662b 100644
--- a/net/ceph/auth_x.c
+++ b/net/ceph/auth_x.c
@@ -13,8 +13,6 @@
#include "auth_x.h"
#include "auth_x_protocol.h"

-#define TEMP_TICKET_BUF_LEN 256
-
static void ceph_x_validate_tickets(struct ceph_auth_client *ac, int *pneed);

static int ceph_x_is_authenticated(struct ceph_auth_client *ac)
@@ -64,7 +62,7 @@ static int ceph_x_encrypt(struct ceph_crypto_key *secret,
}

static int ceph_x_decrypt(struct ceph_crypto_key *secret,
- void **p, void *end, void *obuf, size_t olen)
+ void **p, void *end, void **obuf, size_t olen)
{
struct ceph_x_encrypt_header head;
size_t head_len = sizeof(head);
@@ -75,8 +73,14 @@ static int ceph_x_decrypt(struct ceph_crypto_key *secret,
return -EINVAL;

dout("ceph_x_decrypt len %d\n", len);
- ret = ceph_decrypt2(secret, &head, &head_len, obuf, &olen,
- *p, len);
+ if (*obuf == NULL) {
+ *obuf = kmalloc(len, GFP_NOFS);
+ if (!*obuf)
+ return -ENOMEM;
+ olen = len;
+ }
+
+ ret = ceph_decrypt2(secret, &head, &head_len, *obuf, &olen, *p, len);
if (ret)
return ret;
if (head.struct_v != 1 || le64_to_cpu(head.magic) != CEPHX_ENC_MAGIC)
@@ -131,18 +135,19 @@ static void remove_ticket_handler(struct ceph_auth_client *ac,

static int process_one_ticket(struct ceph_auth_client *ac,
struct ceph_crypto_key *secret,
- void **p, void *end,
- void *dbuf, void *ticket_buf)
+ void **p, void *end)
{
struct ceph_x_info *xi = ac->private;
int type;
u8 tkt_struct_v, blob_struct_v;
struct ceph_x_ticket_handler *th;
+ void *dbuf = NULL;
void *dp, *dend;
int dlen;
char is_enc;
struct timespec validity;
struct ceph_crypto_key old_key;
+ void *ticket_buf = NULL;
void *tp, *tpend;
struct ceph_timespec new_validity;
struct ceph_crypto_key new_session_key;
@@ -167,8 +172,7 @@ static int process_one_ticket(struct ceph_auth_client *ac,
}

/* blob for me */
- dlen = ceph_x_decrypt(secret, p, end, dbuf,
- TEMP_TICKET_BUF_LEN);
+ dlen = ceph_x_decrypt(secret, p, end, &dbuf, 0);
if (dlen <= 0) {
ret = dlen;
goto out;
@@ -195,20 +199,25 @@ static int process_one_ticket(struct ceph_auth_client *ac,

/* ticket blob for service */
ceph_decode_8_safe(p, end, is_enc, bad);
- tp = ticket_buf;
if (is_enc) {
/* encrypted */
dout(" encrypted ticket\n");
- dlen = ceph_x_decrypt(&old_key, p, end, ticket_buf,
- TEMP_TICKET_BUF_LEN);
+ dlen = ceph_x_decrypt(&old_key, p, end, &ticket_buf, 0);
if (dlen < 0) {
ret = dlen;
goto out;
}
+ tp = ticket_buf;
dlen = ceph_decode_32(&tp);
} else {
/* unencrypted */
ceph_decode_32_safe(p, end, dlen, bad);
+ ticket_buf = kmalloc(dlen, GFP_NOFS);
+ if (!ticket_buf) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ tp = ticket_buf;
ceph_decode_need(p, end, dlen, bad);
ceph_decode_copy(p, ticket_buf, dlen);
}
@@ -237,6 +246,8 @@ static int process_one_ticket(struct ceph_auth_client *ac,
xi->have_keys |= th->service;

out:
+ kfree(ticket_buf);
+ kfree(dbuf);
return ret;

bad:
@@ -249,21 +260,10 @@ static int ceph_x_proc_ticket_reply(struct ceph_auth_client *ac,
void *buf, void *end)
{
void *p = buf;
- char *dbuf;
- char *ticket_buf;
u8 reply_struct_v;
u32 num;
int ret;

- dbuf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
- if (!dbuf)
- return -ENOMEM;
-
- ret = -ENOMEM;
- ticket_buf = kmalloc(TEMP_TICKET_BUF_LEN, GFP_NOFS);
- if (!ticket_buf)
- goto out_dbuf;
-
ceph_decode_8_safe(&p, end, reply_struct_v, bad);
if (reply_struct_v != 1)
return -EINVAL;
@@ -272,22 +272,15 @@ static int ceph_x_proc_ticket_reply(struct ceph_auth_client *ac,
dout("%d tickets\n", num);

while (num--) {
- ret = process_one_ticket(ac, secret, &p, end,
- dbuf, ticket_buf);
+ ret = process_one_ticket(ac, secret, &p, end);
if (ret)
- goto out;
+ return ret;
}

- ret = 0;
-out:
- kfree(ticket_buf);
-out_dbuf:
- kfree(dbuf);
- return ret;
+ return 0;

bad:
- ret = -EINVAL;
- goto out;
+ return -EINVAL;
}

static int ceph_x_build_authorizer(struct ceph_auth_client *ac,
@@ -603,13 +596,14 @@ static int ceph_x_verify_authorizer_reply(struct ceph_auth_client *ac,
struct ceph_x_ticket_handler *th;
int ret = 0;
struct ceph_x_authorize_reply reply;
+ void *preply = &reply;
void *p = au->reply_buf;
void *end = p + sizeof(au->reply_buf);

th = get_ticket_handler(ac, au->service);
if (IS_ERR(th))
return PTR_ERR(th);
- ret = ceph_x_decrypt(&th->session_key, &p, end, &reply, sizeof(reply));
+ ret = ceph_x_decrypt(&th->session_key, &p, end, &preply, sizeof(reply));
if (ret < 0)
return ret;
if (ret != sizeof(reply))

li...@kernel.org

unread,
Jan 27, 2015, 11:16:30 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, John Sung, Dmitry Torokhov, Zefan Li
From: John Sung <penmoun...@gmail.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit a80d8b02751060a178bb1f7a6b7a93645a7a308b upstream.

When running a 32-bit inputattach utility in a 64-bit system, there will be
error code "inputattach: can't set device type". This is caused by the
serport device driver not supporting compat_ioctl, so that SPIOCSTYPE ioctl
fails.

Signed-off-by: John Sung <penmoun...@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry....@gmail.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/input/serio/serport.c | 45 ++++++++++++++++++++++++++++++++++++-------
1 file changed, 38 insertions(+), 7 deletions(-)

diff --git a/drivers/input/serio/serport.c b/drivers/input/serio/serport.c
index 8755f5f..e4ecf3b 100644
--- a/drivers/input/serio/serport.c
+++ b/drivers/input/serio/serport.c
@@ -21,6 +21,7 @@
#include <linux/init.h>
#include <linux/serio.h>
#include <linux/tty.h>
+#include <linux/compat.h>

MODULE_AUTHOR("Vojtech Pavlik <voj...@ucw.cz>");
MODULE_DESCRIPTION("Input device TTY line discipline");
@@ -196,28 +197,55 @@ static ssize_t serport_ldisc_read(struct tty_struct * tty, struct file * file, u
return 0;
}

+static void serport_set_type(struct tty_struct *tty, unsigned long type)
+{
+ struct serport *serport = tty->disc_data;
+
+ serport->id.proto = type & 0x000000ff;
+ serport->id.id = (type & 0x0000ff00) >> 8;
+ serport->id.extra = (type & 0x00ff0000) >> 16;
+}
+
/*
* serport_ldisc_ioctl() allows to set the port protocol, and device ID
*/

-static int serport_ldisc_ioctl(struct tty_struct * tty, struct file * file, unsigned int cmd, unsigned long arg)
+static int serport_ldisc_ioctl(struct tty_struct *tty, struct file *file,
+ unsigned int cmd, unsigned long arg)
{
- struct serport *serport = (struct serport*) tty->disc_data;
- unsigned long type;
-
if (cmd == SPIOCSTYPE) {
+ unsigned long type;
+
if (get_user(type, (unsigned long __user *) arg))
return -EFAULT;

- serport->id.proto = type & 0x000000ff;
- serport->id.id = (type & 0x0000ff00) >> 8;
- serport->id.extra = (type & 0x00ff0000) >> 16;
+ serport_set_type(tty, type);
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+#ifdef CONFIG_COMPAT
+#define COMPAT_SPIOCSTYPE _IOW('q', 0x01, compat_ulong_t)
+static long serport_ldisc_compat_ioctl(struct tty_struct *tty,
+ struct file *file,
+ unsigned int cmd, unsigned long arg)
+{
+ if (cmd == COMPAT_SPIOCSTYPE) {
+ void __user *uarg = compat_ptr(arg);
+ compat_ulong_t compat_type;
+
+ if (get_user(compat_type, (compat_ulong_t __user *)uarg))
+ return -EFAULT;

+ serport_set_type(tty, compat_type);
return 0;
}

return -EINVAL;
}
+#endif

static void serport_ldisc_write_wakeup(struct tty_struct * tty)
{
@@ -241,6 +269,9 @@ static struct tty_ldisc_ops serport_ldisc = {
.close = serport_ldisc_close,
.read = serport_ldisc_read,
.ioctl = serport_ldisc_ioctl,
+#ifdef CONFIG_COMPAT
+ .compat_ioctl = serport_ldisc_compat_ioctl,
+#endif
.receive_buf = serport_ldisc_receive,
.write_wakeup = serport_ldisc_write_wakeup
};

li...@kernel.org

unread,
Jan 27, 2015, 11:16:39 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Joe Lawrence, Greg Kroah-Hartman, Zefan Li
From: Joe Lawrence <joe.la...@stratus.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit c605f3cdff53a743f6d875b76956b239deca1272 upstream.

During surprise device hotplug removal tests, it was observed that
hub_events may try to call usb_lock_device on a device that has already
been freed. Protect the usb_device by taking out a reference (under the
hub_event_lock) when hub_events pulls it off the list, returning the
reference after hub_events is finished using it.

Signed-off-by: Joe Lawrence <joe.la...@stratus.com>
Suggested-by: David Bulkow <david....@stratus.com> for using kref
Suggested-by: Alan Stern <st...@rowland.harvard.edu> for placement
Acked-by: Alan Stern <st...@rowland.harvard.edu>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/core/hub.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index e0d4d90..62a9e44 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3729,9 +3729,10 @@ static void hub_events(void)

hub = list_entry(tmp, struct usb_hub, event_list);
kref_get(&hub->kref);
+ hdev = hub->hdev;
+ usb_get_dev(hdev);
spin_unlock_irq(&hub_event_lock);

- hdev = hub->hdev;
hub_dev = hub->intfdev;
intf = to_usb_interface(hub_dev);
dev_dbg(hub_dev, "state %d ports %d chg %04x evt %04x\n",
@@ -3946,6 +3947,7 @@ static void hub_events(void)
usb_autopm_put_interface(intf);
loop_disconnected:
usb_unlock_device(hdev);
+ usb_put_dev(hdev);
kref_put(&hub->kref, hub_release);

} /* end while (1) */

li...@kernel.org

unread,
Jan 27, 2015, 11:16:48 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Mark, Greg Kroah-Hartman, Zefan Li
From: Mark <ma...@clara.co.uk>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit c66f1c62e85927357e7b3f4c701614dcb5c498a2 upstream.

The Iomega Jaz USB Adapter is a SCSI-USB converter cable. The hardware
seems to be identical to e.g. the Microtech XpressSCSI, using a Shuttle/
SCM chip set. However its firmware restricts it to only work with Jaz
drives.

On connecting the cable a message like this appears four times in the log:
reset full speed USB device number 4 using uhci_hcd

That's non-fatal but the US_FL_SINGLE_LUN quirk fixes it.

Signed-off-by: Mark Knibbs <ma...@clara.co.uk>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/storage/unusual_devs.h | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/drivers/usb/storage/unusual_devs.h b/drivers/usb/storage/unusual_devs.h
index 1d9fc30..813422e 100644
--- a/drivers/usb/storage/unusual_devs.h
+++ b/drivers/usb/storage/unusual_devs.h
@@ -733,6 +733,12 @@ UNUSUAL_DEV( 0x059b, 0x0001, 0x0100, 0x0100,
USB_SC_DEVICE, USB_PR_DEVICE, NULL,
US_FL_SINGLE_LUN ),

+UNUSUAL_DEV( 0x059b, 0x0040, 0x0100, 0x0100,
+ "Iomega",
+ "Jaz USB Adapter",
+ USB_SC_DEVICE, USB_PR_DEVICE, NULL,
+ US_FL_SINGLE_LUN ),
+
/* Reported by <Hendryk....@gmx.de> */
UNUSUAL_DEV( 0x059f, 0x0643, 0x0000, 0x0000,
"LaCie",

li...@kernel.org

unread,
Jan 27, 2015, 11:16:54 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Thomas Gleixner, Peter Zijlstra, Zefan Li
From: Thomas Gleixner <tg...@linutronix.de>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 13c42c2f43b19aab3195f2d357db00d1e885eaa8 upstream.

futex_wait_requeue_pi() calls futex_wait_setup(). If
futex_wait_setup() succeeds it returns with hb->lock held and
preemption disabled. Now the sanity check after this does:

if (match_futex(&q.key, &key2)) {
ret = -EINVAL;
goto out_put_keys;
}

which releases the keys but does not release hb->lock.

So we happily return to user space with hb->lock held and therefor
preemption disabled.

Unlock hb->lock before taking the exit route.

Reported-by: Dave "Trinity" Jones <da...@redhat.com>
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Reviewed-by: Darren Hart <dvh...@linux.intel.com>
Reviewed-by: Davidlohr Bueso <da...@stgolabs.net>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Link: http://lkml.kernel.org/r/alpine.DEB.2.10.1409112318500.4178@nanos
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
[lizf: Backported to 3.4: queue_unlock() takes two parameters]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
kernel/futex.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/kernel/futex.c b/kernel/futex.c
index 9396b7b..41dfb18 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2460,6 +2460,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, unsigned int flags,
* shared futexes. We need to compare the keys:
*/
if (match_futex(&q.key, &key2)) {
+ queue_unlock(&q, hb);
ret = -EINVAL;
goto out_put_keys;

li...@kernel.org

unread,
Jan 27, 2015, 11:16:58 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Richard Larocque, Thomas Gleixner, Ingo Molnar, Richard Cochran, Prarit Bhargava, Sharvil Nanavati, John Stultz, Zefan Li
From: Richard Larocque <rlar...@google.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit e86fea764991e00a03ff1e56409ec9cacdbda4c9 upstream.

Returns the time remaining for an alarm timer, rather than the time at
which it is scheduled to expire. If the timer has already expired or it
is not currently scheduled, the it_value's members are set to zero.

This new behavior matches that of the other posix-timers and the POSIX
specifications.

This is a change in user-visible behavior, and may break existing
applications. Hopefully, few users rely on the old incorrect behavior.

Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Richard Cochran <richard...@gmail.com>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Sharvil Nanavati <sha...@google.com>
Signed-off-by: Richard Larocque <rlar...@google.com>
[jstultz: minor style tweak]
Signed-off-by: John Stultz <john....@linaro.org>
[lizf: Backported to 3.4:
- add alarm_expires_remaining() introduced by commit 6cffe00f7d4e]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
kernel/time/alarmtimer.c | 24 +++++++++++++++++-------
1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index b704579..c72d3dd 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -232,6 +232,12 @@ static enum hrtimer_restart alarmtimer_fired(struct hrtimer *timer)

}

+ktime_t alarm_expires_remaining(const struct alarm *alarm)
+{
+ struct alarm_base *base = &alarm_bases[alarm->type];
+ return ktime_sub(alarm->node.expires, base->gettime());
+}
+
#ifdef CONFIG_RTC_CLASS
/**
* alarmtimer_suspend - Suspend time callback
@@ -525,18 +531,22 @@ static int alarm_timer_create(struct k_itimer *new_timer)
* @new_timer: k_itimer pointer
* @cur_setting: itimerspec data to fill
*
- * Copies the itimerspec data out from the k_itimer
+ * Copies out the current itimerspec data
*/
static void alarm_timer_get(struct k_itimer *timr,
struct itimerspec *cur_setting)
{
- memset(cur_setting, 0, sizeof(struct itimerspec));
+ ktime_t relative_expiry_time =
+ alarm_expires_remaining(&(timr->it.alarm.alarmtimer));
+
+ if (ktime_to_ns(relative_expiry_time) > 0) {
+ cur_setting->it_value = ktime_to_timespec(relative_expiry_time);
+ } else {
+ cur_setting->it_value.tv_sec = 0;
+ cur_setting->it_value.tv_nsec = 0;
+ }

- cur_setting->it_interval =
- ktime_to_timespec(timr->it.alarm.interval);
- cur_setting->it_value =
- ktime_to_timespec(timr->it.alarm.alarmtimer.node.expires);
- return;
+ cur_setting->it_interval = ktime_to_timespec(timr->it.alarm.interval);
}

/**

li...@kernel.org

unread,
Jan 27, 2015, 11:17:07 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Andy Adamson, Trond Myklebust, Zefan Li
From: Andy Adamson <and...@netapp.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit d1f456b0b9545f1606a54cd17c20775f159bd2ce upstream.

Commit 2f60ea6b8ced ("NFSv4: The NFSv4.0 client must send RENEW calls if it holds a delegation") set the NFS4_RENEW_TIMEOUT flag in nfs4_renew_state, and does
not put an nfs41_proc_async_sequence call, the NFSv4.1 lease renewal heartbeat
call, on the wire to renew the NFSv4.1 state if the flag was not set.

The NFS4_RENEW_TIMEOUT flag is set when "now" is after the last renewal
(cl_last_renewal) plus the lease time divided by 3. This is arbitrary and
sometimes does the following:

In normal operation, the only way a future state renewal call is put on the
wire is via a call to nfs4_schedule_state_renewal, which schedules a
nfs4_renew_state workqueue task. nfs4_renew_state determines if the
NFS4_RENEW_TIMEOUT should be set, and the calls nfs41_proc_async_sequence,
which only gets sent if the NFS4_RENEW_TIMEOUT flag is set.
Then the nfs41_proc_async_sequence rpc_release function schedules
another state remewal via nfs4_schedule_state_renewal.

Without this change we can get into a state where an application stops
accessing the NFSv4.1 share, state renewal calls stop due to the
NFS4_RENEW_TIMEOUT flag _not_ being set. The only way to recover
from this situation is with a clientid re-establishment, once the application
resumes and the server has timed out the lease and so returns
NFS4ERR_BAD_SESSION on the subsequent SEQUENCE operation.

An example application:
open, lock, write a file.

sleep for 6 * lease (could be less)

ulock, close.

In the above example with NFSv4.1 delegations enabled, without this change,
there are no OP_SEQUENCE state renewal calls during the sleep, and the
clientid is recovered due to lease expiration on the close.

This issue does not occur with NFSv4.1 delegations disabled, nor with
NFSv4.0, with or without delegations enabled.

Signed-off-by: Andy Adamson <and...@netapp.com>
Link: http://lkml.kernel.org/r/1411486536-23401-1-g...@netapp.com
Fixes: 2f60ea6b8ced (NFSv4: The NFSv4.0 client must send RENEW calls...)
Signed-off-by: Trond Myklebust <trond.m...@primarydata.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/nfs/nfs4proc.c | 2 +-
fs/nfs/nfs4renewd.c | 12 ++++++++++--
2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 527a4fc..37ab4e1 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -5796,7 +5796,7 @@ static int nfs41_proc_async_sequence(struct nfs_client *clp, struct rpc_cred *cr
int ret = 0;

if ((renew_flags & NFS4_RENEW_TIMEOUT) == 0)
- return 0;
+ return -EAGAIN;
task = _nfs41_proc_sequence(clp, cred, &nfs41_sequence_ops);
if (IS_ERR(task))
ret = PTR_ERR(task);
diff --git a/fs/nfs/nfs4renewd.c b/fs/nfs/nfs4renewd.c
index dc484c0..78071cf9 100644
--- a/fs/nfs/nfs4renewd.c
+++ b/fs/nfs/nfs4renewd.c
@@ -88,10 +88,18 @@ nfs4_renew_state(struct work_struct *work)
}
nfs_expire_all_delegations(clp);
} else {
+ int ret;
+
/* Queue an asynchronous RENEW. */
- ops->sched_state_renewal(clp, cred, renew_flags);
+ ret = ops->sched_state_renewal(clp, cred, renew_flags);
put_rpccred(cred);
- goto out_exp;
+ switch (ret) {
+ default:
+ goto out_exp;
+ case -EAGAIN:
+ case -ENOMEM:
+ break;
+ }
}
} else {
dprintk("%s: failed to call renewd. Reason: lease not expired \n",

li...@kernel.org

unread,
Jan 27, 2015, 11:17:15 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Richard Larocque, Thomas Gleixner, Ingo Molnar, Richard Cochran, Prarit Bhargava, Sharvil Nanavati, John Stultz, Zefan Li
From: Richard Larocque <rlar...@google.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 474e941bed9262f5fa2394f9a4a67e24499e5926 upstream.

Locks the k_itimer's it_lock member when handling the alarm timer's
expiry callback.

The regular posix timers defined in posix-timers.c have this lock held
during timout processing because their callbacks are routed through
posix_timer_fn(). The alarm timers follow a different path, so they
ought to grab the lock somewhere else.

Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Richard Cochran <richard...@gmail.com>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Sharvil Nanavati <sha...@google.com>
Signed-off-by: Richard Larocque <rlar...@google.com>
Signed-off-by: John Stultz <john....@linaro.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
kernel/time/alarmtimer.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/time/alarmtimer.c b/kernel/time/alarmtimer.c
index c870118..3ce9262 100644
--- a/kernel/time/alarmtimer.c
+++ b/kernel/time/alarmtimer.c
@@ -454,8 +454,12 @@ static enum alarmtimer_type clock2alarm(clockid_t clockid)
static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
ktime_t now)
{
+ unsigned long flags;
struct k_itimer *ptr = container_of(alarm, struct k_itimer,
it.alarm.alarmtimer);
+ enum alarmtimer_restart result = ALARMTIMER_NORESTART;
+
+ spin_lock_irqsave(&ptr->it_lock, flags);
if ((ptr->it_sigev_notify & ~SIGEV_THREAD_ID) != SIGEV_NONE) {
if (posix_timer_event(ptr, 0) != 0)
ptr->it_overrun++;
@@ -465,9 +469,11 @@ static enum alarmtimer_restart alarm_handle_timer(struct alarm *alarm,
if (ptr->it.alarm.interval.tv64) {
ptr->it_overrun += alarm_forward(alarm, now,
ptr->it.alarm.interval);
- return ALARMTIMER_RESTART;
+ result = ALARMTIMER_RESTART;
}
- return ALARMTIMER_NORESTART;
+ spin_unlock_irqrestore(&ptr->it_lock, flags);
+
+ return result;
}

/**

li...@kernel.org

unread,
Jan 27, 2015, 11:17:19 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Joe Thornber, Mikulas Patocka, Mike Snitzer, Zefan Li
From: Joe Thornber <e...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit eb76faf53b1ff7a77ce3f78cc98ad392ac70c2a0 upstream.

The 'last_accessed' member of the dm_buffer structure was only set when
the the buffer was created. This led to each buffer being discarded
after dm_bufio_max_age time even if it was used recently. In practice
this resulted in all thinp metadata being evicted soon after being read
-- this is particularly problematic for metadata intensive workloads
like multithreaded small random IO.

'last_accessed' is now updated each time the buffer is moved to the head
of the LRU list, so the buffer is now properly discarded if it was not
used in dm_bufio_max_age time.

Signed-off-by: Joe Thornber <e...@redhat.com>
Signed-off-by: Mikulas Patocka <mpat...@redhat.com>
Signed-off-by: Mike Snitzer <sni...@redhat.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/md/dm-bufio.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 6f99500..5abc8d6 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -467,6 +467,7 @@ static void __relink_lru(struct dm_buffer *b, int dirty)
b->list_mode = dirty;
list_del(&b->lru_list);
list_add(&b->lru_list, &c->lru[dirty]);
+ b->last_accessed = jiffies;
}

/*----------------------------------------------------------------

li...@kernel.org

unread,
Jan 27, 2015, 11:17:24 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Alexey Khoroshilov, Mike Snitzer, Zefan Li
From: Alexey Khoroshilov <khoro...@ispras.ru>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 56ec16cb1e1ce46354de8511eef962a417c32c92 upstream.

If cn_add_callback() fails in dm_ulog_tfr_init(), it does not
deallocate prealloced memory but calls cn_del_callback().

Found by Linux Driver Verification project (linuxtesting.org).

Signed-off-by: Alexey Khoroshilov <khoro...@ispras.ru>
Reviewed-by: Jonathan Brassow <jbra...@redhat.com>
Signed-off-by: Mike Snitzer <sni...@redhat.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/md/dm-log-userspace-transfer.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/md/dm-log-userspace-transfer.c b/drivers/md/dm-log-userspace-transfer.c
index 08d9a20..c69d0b7 100644
--- a/drivers/md/dm-log-userspace-transfer.c
+++ b/drivers/md/dm-log-userspace-transfer.c
@@ -272,7 +272,7 @@ int dm_ulog_tfr_init(void)

r = cn_add_callback(&ulog_cn_id, "dmlogusr", cn_ulog_callback);
if (r) {
- cn_del_callback(&ulog_cn_id);
+ kfree(prealloced_cn_msg);
return r;

li...@kernel.org

unread,
Jan 27, 2015, 11:17:31 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Andrew Hunter, Thomas Gleixner, Ingo Molnar, Paul Turner, Richard Cochran, Prarit Bhargava, John Stultz, Zefan Li
From: Andrew Hunter <a...@google.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit d78c9300c51d6ceed9f6d078d4e9366f259de28c upstream.

timeval_to_jiffies tried to round a timeval up to an integral number
of jiffies, but the logic for doing so was incorrect: intervals
corresponding to exactly N jiffies would become N+1. This manifested
itself particularly repeatedly stopping/starting an itimer:

setitimer(ITIMER_PROF, &val, NULL);
setitimer(ITIMER_PROF, NULL, &val);

would add a full tick to val, _even if it was exactly representable in
terms of jiffies_ (say, the result of a previous rounding.) Doing
this repeatedly would cause unbounded growth in val. So fix the math.

Here's what was wrong with the conversion: we essentially computed
(eliding seconds)

jiffies = usec * (NSEC_PER_USEC/TICK_NSEC)

by using scaling arithmetic, which took the best approximation of
NSEC_PER_USEC/TICK_NSEC with denominator of 2^USEC_JIFFIE_SC =
x/(2^USEC_JIFFIE_SC), and computed:

jiffies = (usec * x) >> USEC_JIFFIE_SC

and rounded this calculation up in the intermediate form (since we
can't necessarily exactly represent TICK_NSEC in usec.) But the
scaling arithmetic is a (very slight) *over*approximation of the true
value; that is, instead of dividing by (1 usec/ 1 jiffie), we
effectively divided by (1 usec/1 jiffie)-epsilon (rounding
down). This would normally be fine, but we want to round timeouts up,
and we did so by adding 2^USEC_JIFFIE_SC - 1 before the shift; this
would be fine if our division was exact, but dividing this by the
slightly smaller factor was equivalent to adding just _over_ 1 to the
final result (instead of just _under_ 1, as desired.)

In particular, with HZ=1000, we consistently computed that 10000 usec
was 11 jiffies; the same was true for any exact multiple of
TICK_NSEC.

We could possibly still round in the intermediate form, adding
something less than 2^USEC_JIFFIE_SC - 1, but easier still is to
convert usec->nsec, round in nanoseconds, and then convert using
time*spec*_to_jiffies. This adds one constant multiplication, and is
not observably slower in microbenchmarks on recent x86 hardware.

Tested: the following program:

int main() {
struct itimerval zero = {{0, 0}, {0, 0}};
/* Initially set to 10 ms. */
struct itimerval initial = zero;
initial.it_interval.tv_usec = 10000;
setitimer(ITIMER_PROF, &initial, NULL);
/* Save and restore several times. */
for (size_t i = 0; i < 10; ++i) {
struct itimerval prev;
setitimer(ITIMER_PROF, &zero, &prev);
/* on old kernels, this goes up by TICK_USEC every iteration */
printf("previous value: %ld %ld %ld %ld\n",
prev.it_interval.tv_sec, prev.it_interval.tv_usec,
prev.it_value.tv_sec, prev.it_value.tv_usec);
setitimer(ITIMER_PROF, &prev, NULL);
}
return 0;
}

Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Ingo Molnar <mi...@redhat.com>
Cc: Paul Turner <p...@google.com>
Cc: Richard Cochran <richard...@gmail.com>
Cc: Prarit Bhargava <pra...@redhat.com>
Reviewed-by: Paul Turner <p...@google.com>
Reported-by: Aaron Jacobs <jac...@google.com>
Signed-off-by: Andrew Hunter <a...@google.com>
[jstultz: Tweaked to apply to 3.17-rc]
Signed-off-by: John Stultz <john....@linaro.org>
[lizf: Backported to 3.4: adjust filename]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
include/linux/jiffies.h | 12 -----------
kernel/time.c | 54 +++++++++++++++++++++++++++----------------------
2 files changed, 30 insertions(+), 36 deletions(-)

diff --git a/include/linux/jiffies.h b/include/linux/jiffies.h
index f5df3dc..f4e8578 100644
--- a/include/linux/jiffies.h
+++ b/include/linux/jiffies.h
@@ -259,23 +259,11 @@ extern unsigned long preset_lpj;
#define SEC_JIFFIE_SC (32 - SHIFT_HZ)
#endif
#define NSEC_JIFFIE_SC (SEC_JIFFIE_SC + 29)
-#define USEC_JIFFIE_SC (SEC_JIFFIE_SC + 19)
#define SEC_CONVERSION ((unsigned long)((((u64)NSEC_PER_SEC << SEC_JIFFIE_SC) +\
TICK_NSEC -1) / (u64)TICK_NSEC))

#define NSEC_CONVERSION ((unsigned long)((((u64)1 << NSEC_JIFFIE_SC) +\
TICK_NSEC -1) / (u64)TICK_NSEC))
-#define USEC_CONVERSION \
- ((unsigned long)((((u64)NSEC_PER_USEC << USEC_JIFFIE_SC) +\
- TICK_NSEC -1) / (u64)TICK_NSEC))
-/*
- * USEC_ROUND is used in the timeval to jiffie conversion. See there
- * for more details. It is the scaled resolution rounding value. Note
- * that it is a 64-bit value. Since, when it is applied, we are already
- * in jiffies (albit scaled), it is nothing but the bits we will shift
- * off.
- */
-#define USEC_ROUND (u64)(((u64)1 << USEC_JIFFIE_SC) - 1)
/*
* The maximum jiffie value is (MAX_INT >> 1). Here we translate that
* into seconds. The 64-bit case will overflow if we are not careful,
diff --git a/kernel/time.c b/kernel/time.c
index ba744cf..a095290 100644
--- a/kernel/time.c
+++ b/kernel/time.c
@@ -487,17 +487,20 @@ EXPORT_SYMBOL(usecs_to_jiffies);
* that a remainder subtract here would not do the right thing as the
* resolution values don't fall on second boundries. I.e. the line:
* nsec -= nsec % TICK_NSEC; is NOT a correct resolution rounding.
+ * Note that due to the small error in the multiplier here, this
+ * rounding is incorrect for sufficiently large values of tv_nsec, but
+ * well formed timespecs should have tv_nsec < NSEC_PER_SEC, so we're
+ * OK.
*
* Rather, we just shift the bits off the right.
*
* The >> (NSEC_JIFFIE_SC - SEC_JIFFIE_SC) converts the scaled nsec
* value to a scaled second value.
*/
-unsigned long
-timespec_to_jiffies(const struct timespec *value)
+static unsigned long
+__timespec_to_jiffies(unsigned long sec, long nsec)
{
- unsigned long sec = value->tv_sec;
- long nsec = value->tv_nsec + TICK_NSEC - 1;
+ nsec = nsec + TICK_NSEC - 1;

if (sec >= MAX_SEC_IN_JIFFIES){
sec = MAX_SEC_IN_JIFFIES;
@@ -508,6 +511,13 @@ timespec_to_jiffies(const struct timespec *value)
(NSEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC;

}
+
+unsigned long
+timespec_to_jiffies(const struct timespec *value)
+{
+ return __timespec_to_jiffies(value->tv_sec, value->tv_nsec);
+}
+
EXPORT_SYMBOL(timespec_to_jiffies);

void
@@ -524,31 +534,27 @@ jiffies_to_timespec(const unsigned long jiffies, struct timespec *value)
}
EXPORT_SYMBOL(jiffies_to_timespec);

-/* Same for "timeval"
+/*
+ * We could use a similar algorithm to timespec_to_jiffies (with a
+ * different multiplier for usec instead of nsec). But this has a
+ * problem with rounding: we can't exactly add TICK_NSEC - 1 to the
+ * usec value, since it's not necessarily integral.
*
- * Well, almost. The problem here is that the real system resolution is
- * in nanoseconds and the value being converted is in micro seconds.
- * Also for some machines (those that use HZ = 1024, in-particular),
- * there is a LARGE error in the tick size in microseconds.
-
- * The solution we use is to do the rounding AFTER we convert the
- * microsecond part. Thus the USEC_ROUND, the bits to be shifted off.
- * Instruction wise, this should cost only an additional add with carry
- * instruction above the way it was done above.
+ * We could instead round in the intermediate scaled representation
+ * (i.e. in units of 1/2^(large scale) jiffies) but that's also
+ * perilous: the scaling introduces a small positive error, which
+ * combined with a division-rounding-upward (i.e. adding 2^(scale) - 1
+ * units to the intermediate before shifting) leads to accidental
+ * overflow and overestimates.
+ *
+ * At the cost of one additional multiplication by a constant, just
+ * use the timespec implementation.
*/
unsigned long
timeval_to_jiffies(const struct timeval *value)
{
- unsigned long sec = value->tv_sec;
- long usec = value->tv_usec;
-
- if (sec >= MAX_SEC_IN_JIFFIES){
- sec = MAX_SEC_IN_JIFFIES;
- usec = 0;
- }
- return (((u64)sec * SEC_CONVERSION) +
- (((u64)usec * USEC_CONVERSION + USEC_ROUND) >>
- (USEC_JIFFIE_SC - SEC_JIFFIE_SC))) >> SEC_JIFFIE_SC;
+ return __timespec_to_jiffies(value->tv_sec,
+ value->tv_usec * NSEC_PER_USEC);
}
EXPORT_SYMBOL(timeval_to_jiffies);

li...@kernel.org

unread,
Jan 27, 2015, 11:17:36 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Chao Yu, Tyler Hicks, Zefan Li
From: Chao Yu <chao...@samsung.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 35425ea2492175fd39f6116481fe98b2b3ddd4ca upstream.

Christopher Head 2014-06-28 05:26:20 UTC described:
"I tried to reproduce this on 3.12.21. Instead, when I do "echo hello > foo"
in an ecryptfs mount with ecryptfs_xattr specified, I get a kernel crash:

BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
PGD d7840067 PUD b2c3c067 PMD 0
Oops: 0002 [#1] SMP
Modules linked in: nvidia(PO)
CPU: 3 PID: 3566 Comm: bash Tainted: P O 3.12.21-gentoo-r1 #2
Hardware name: ASUSTek Computer Inc. G60JX/G60JX, BIOS 206 03/15/2010
task: ffff8801948944c0 ti: ffff8800bad70000 task.ti: ffff8800bad70000
RIP: 0010:[<ffffffff8110eb39>] [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
RSP: 0018:ffff8800bad71c10 EFLAGS: 00010246
RAX: 00000000000181a4 RBX: ffff880198648480 RCX: 0000000000000000
RDX: 0000000000000004 RSI: ffff880172010450 RDI: 0000000000000000
RBP: ffff880198490e40 R08: 0000000000000000 R09: 0000000000000000
R10: ffff880172010450 R11: ffffea0002c51e80 R12: 0000000000002000
R13: 000000000000001a R14: 0000000000000000 R15: ffff880198490e40
FS: 00007ff224caa700(0000) GS:ffff88019fcc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000000bb07f000 CR4: 00000000000007e0
Stack:
ffffffff811826e8 ffff8800a39d8000 0000000000000000 000000000000001a
ffff8800a01d0000 ffff8800a39d8000 ffffffff81185fd5 ffffffff81082c2c
00000001a39d8000 53d0abbc98490e40 0000000000000037 ffff8800a39d8220
Call Trace:
[<ffffffff811826e8>] ? ecryptfs_setxattr+0x40/0x52
[<ffffffff81185fd5>] ? ecryptfs_write_metadata+0x1b3/0x223
[<ffffffff81082c2c>] ? should_resched+0x5/0x23
[<ffffffff8118322b>] ? ecryptfs_initialize_file+0xaf/0xd4
[<ffffffff81183344>] ? ecryptfs_create+0xf4/0x142
[<ffffffff810f8c0d>] ? vfs_create+0x48/0x71
[<ffffffff810f9c86>] ? do_last.isra.68+0x559/0x952
[<ffffffff810f7ce7>] ? link_path_walk+0xbd/0x458
[<ffffffff810fa2a3>] ? path_openat+0x224/0x472
[<ffffffff810fa7bd>] ? do_filp_open+0x2b/0x6f
[<ffffffff81103606>] ? __alloc_fd+0xd6/0xe7
[<ffffffff810ee6ab>] ? do_sys_open+0x65/0xe9
[<ffffffff8157d022>] ? system_call_fastpath+0x16/0x1b
RIP [<ffffffff8110eb39>] fsstack_copy_attr_all+0x2/0x61
RSP <ffff8800bad71c10>
CR2: 0000000000000000
---[ end trace df9dba5f1ddb8565 ]---"

If we create a file when we mount with ecryptfs_xattr_metadata option, we will
encounter a crash in this path:
->ecryptfs_create
->ecryptfs_initialize_file
->ecryptfs_write_metadata
->ecryptfs_write_metadata_to_xattr
->ecryptfs_setxattr
->fsstack_copy_attr_all
It's because our dentry->d_inode used in fsstack_copy_attr_all is NULL, and it
will be initialized when ecryptfs_initialize_file finish.

So we should skip copying attr from lower inode when the value of ->d_inode is
invalid.

Signed-off-by: Chao Yu <chao...@samsung.com>
Signed-off-by: Tyler Hicks <tyh...@canonical.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/ecryptfs/inode.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 11030b2..b5b9b40 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -1093,7 +1093,7 @@ ecryptfs_setxattr(struct dentry *dentry, const char *name, const void *value,
}

rc = vfs_setxattr(lower_dentry, name, value, size, flags);
- if (!rc)
+ if (!rc && dentry->d_inode)
fsstack_copy_attr_all(dentry->d_inode, lower_dentry->d_inode);
out:
return rc;

li...@kernel.org

unread,
Jan 27, 2015, 11:17:42 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Cong Wang, Cong Wang, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo, Linus Torvalds, Ingo Molnar, Zefan Li
From: Cong Wang <cw...@twopensource.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 3577af70a2ce4853d58e57d832e687d739281479 upstream.

We saw a kernel soft lockup in perf_remove_from_context(),
it looks like the `perf` process, when exiting, could not go
out of the retry loop. Meanwhile, the target process was forking
a child. So either the target process should execute the smp
function call to deactive the event (if it was running) or it should
do a context switch which deactives the event.

It seems we optimize out a context switch in perf_event_context_sched_out(),
and what's more important, we still test an obsolete task pointer when
retrying, so no one actually would deactive that event in this situation.
Fix it directly by reloading the task pointer in perf_remove_from_context().

This should cure the above soft lockup.

Signed-off-by: Cong Wang <cw...@twopensource.com>
Signed-off-by: Cong Wang <xiyou.w...@gmail.com>
Signed-off-by: Peter Zijlstra <pet...@infradead.org>
Cc: Paul Mackerras <pau...@samba.org>
Cc: Arnaldo Carvalho de Melo <ac...@kernel.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Link: http://lkml.kernel.org/r/1409696840-843-1-git-se...@gmail.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
kernel/events/core.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 685ce46..c958be1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1702,6 +1702,16 @@ retry:
*/
if (ctx->is_active) {
raw_spin_unlock_irq(&ctx->lock);
+ /*
+ * Reload the task pointer, it might have been changed by
+ * a concurrent perf_event_context_sched_out().
+ */
+ task = ctx->task;
+ /*
+ * Reload the task pointer, it might have been changed by
+ * a concurrent perf_event_context_sched_out().
+ */
+ task = ctx->task;
goto retry;

li...@kernel.org

unread,
Jan 27, 2015, 11:17:46 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Ben Hutchings, Thomas Gleixner, Zefan Li
From: Ben Hutchings <b...@decadent.org.uk>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 0e6d3112a4e95d55cf6dca88f298d5f4b8f29bd1 upstream.

It is currently possible to execve() an x32 executable on an x86_64
kernel that has only ia32 compat enabled. However all its syscalls
will fail, even _exit(). This usually causes it to segfault.

Change the ELF compat architecture check so that x32 executables are
rejected if we don't support the x32 ABI.

Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
Link: http://lkml.kernel.org/r/1410120305....@decadent.org.uk
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/x86/include/asm/elf.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 5939f44..06ec1fe 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -155,8 +155,9 @@ do { \
#define elf_check_arch(x) \
((x)->e_machine == EM_X86_64)

-#define compat_elf_check_arch(x) \
- (elf_check_arch_ia32(x) || (x)->e_machine == EM_X86_64)
+#define compat_elf_check_arch(x) \
+ (elf_check_arch_ia32(x) || \
+ (IS_ENABLED(CONFIG_X86_X32_ABI) && (x)->e_machine == EM_X86_64))

#if __USER32_DS != __USER_DS
# error "The following code assumes __USER32_DS == __USER_DS"

li...@kernel.org

unread,
Jan 27, 2015, 11:17:56 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Mikulas Patocka, Al Viro, Zefan Li
From: Mikulas Patocka <mpat...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit c2ca0fcd202863b14bd041a7fece2e789926c225 upstream.

This patch makes it possible to kill a process looping in
cont_expand_zero. A process may spend a lot of time in this function, so
it is desirable to be able to kill it.

It happened to me that I wanted to copy a piece data from the disk to a
file. By mistake, I used the "seek" parameter to dd instead of "skip". Due
to the "seek" parameter, dd attempted to extend the file and became stuck
doing so - the only possibility was to reset the machine or wait many
hours until the filesystem runs out of space and cont_expand_zero fails.
We need this patch to be able to terminate the process.

Signed-off-by: Mikulas Patocka <mpat...@redhat.com>
Signed-off-by: Al Viro <vi...@zeniv.linux.org.uk>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/buffer.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/fs/buffer.c b/fs/buffer.c
index f235e18..104425b 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -2221,6 +2221,11 @@ static int cont_expand_zero(struct file *file, struct address_space *mapping,
err = 0;

balance_dirty_pages_ratelimited(mapping);
+
+ if (unlikely(fatal_signal_pending(current))) {
+ err = -EINTR;
+ goto out;
+ }
}

/* page covers the boundary, find the boundary offset */

li...@kernel.org

unread,
Jan 27, 2015, 11:18:03 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Al Viro, Zefan Li
From: Al Viro <vi...@zeniv.linux.org.uk>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 24dff96a37a2ca319e75a74d3929b2de22447ca6 upstream.

we used to check for "nobody else could start doing anything with
that opened file" by checking that refcount was 2 or less - one
for descriptor table and one we'd acquired in fget() on the way to
wherever we are. That was race-prone (somebody else might have
had a reference to descriptor table and do fget() just as we'd
been checking) and it had become flat-out incorrect back when
we switched to fget_light() on those codepaths - unlike fget(),
it doesn't grab an extra reference unless the descriptor table
is shared. The same change allowed a race-free check, though -
we are safe exactly when refcount is less than 2.

It was a long time ago; pre-2.6.12 for ioctl() (the codepath leading
to ppp one) and 2.6.17 for sendmsg() (netlink one). OTOH,
netlink hadn't grown that check until 3.9 and ppp used to live
in drivers/net, not drivers/net/ppp until 3.1. The bug existed
well before that, though, and the same fix used to apply in old
location of file.

Signed-off-by: Al Viro <vi...@zeniv.linux.org.uk>
[lizf: Backported to 3.4: drop the change to netlink_mmap_sendmsg()]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/net/ppp/ppp_generic.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ppp/ppp_generic.c b/drivers/net/ppp/ppp_generic.c
index 21d7151..1207bb1 100644
--- a/drivers/net/ppp/ppp_generic.c
+++ b/drivers/net/ppp/ppp_generic.c
@@ -588,7 +588,7 @@ static long ppp_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
if (file == ppp->owner)
ppp_shutdown_interface(ppp);
}
- if (atomic_long_read(&file->f_count) <= 2) {
+ if (atomic_long_read(&file->f_count) < 2) {
ppp_release(NULL, file);
err = 0;
} else

li...@kernel.org

unread,
Jan 27, 2015, 11:18:07 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Mike Snitzer, Jens Axboe, Zefan Li
From: Mike Snitzer <sni...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit b8839b8c55f3fdd60dc36abcda7e0266aff7985c upstream.

The math in both blk_stack_limits() and queue_limit_alignment_offset()
assume that a block device's io_min (aka minimum_io_size) is always a
power-of-2. Fix the math such that it works for non-power-of-2 io_min.

This issue (of alignment_offset != 0) became apparent when testing
dm-thinp with a thinp blocksize that matches a RAID6 stripesize of
1280K. Commit fdfb4c8c1 ("dm thin: set minimum_io_size to pool's data
block size") unlocked the potential for alignment_offset != 0 due to
the dm-thin-pool's io_min possibly being a non-power-of-2.

Signed-off-by: Mike Snitzer <sni...@redhat.com>
Acked-by: Martin K. Petersen <martin....@oracle.com>
Signed-off-by: Jens Axboe <ax...@fb.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
block/blk-settings.c | 4 ++--
include/linux/blkdev.h | 5 ++---
2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/block/blk-settings.c b/block/blk-settings.c
index b74cc58..14f1d30 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -538,7 +538,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
bottom = max(b->physical_block_size, b->io_min) + alignment;

/* Verify that top and bottom intervals line up */
- if (max(top, bottom) & (min(top, bottom) - 1)) {
+ if (max(top, bottom) % min(top, bottom)) {
t->misaligned = 1;
ret = -1;
}
@@ -579,7 +579,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,

/* Find lowest common alignment_offset */
t->alignment_offset = lcm(t->alignment_offset, alignment)
- & (max(t->physical_block_size, t->io_min) - 1);
+ % max(t->physical_block_size, t->io_min);

/* Verify that new alignment_offset is on a logical block boundary */
if (t->alignment_offset & (t->logical_block_size - 1)) {
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 4d4ac24..01b7047 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1069,10 +1069,9 @@ static inline int queue_alignment_offset(struct request_queue *q)
static inline int queue_limit_alignment_offset(struct queue_limits *lim, sector_t sector)
{
unsigned int granularity = max(lim->physical_block_size, lim->io_min);
- unsigned int alignment = (sector << 9) & (granularity - 1);
+ unsigned int alignment = sector_div(sector, granularity >> 9) << 9;

- return (granularity + lim->alignment_offset - alignment)
- & (granularity - 1);
+ return (granularity + lim->alignment_offset - alignment) % granularity;
}

static inline int bdev_alignment_offset(struct block_device *bdev)

li...@kernel.org

unread,
Jan 27, 2015, 11:18:16 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Trond Myklebust, Zefan Li
From: Trond Myklebust <trond.m...@primarydata.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit cd9288ffaea4359d5cfe2b8d264911506aed26a4 upstream.

James Drew reports another bug whereby the NFS client is now sending
an OPEN_DOWNGRADE in a situation where it should really have sent a
CLOSE: the client is opening the file for O_RDWR, but then trying to
do a downgrade to O_RDONLY, which is not allowed by the NFSv4 spec.

Reported-by: James Drews <dr...@engr.wisc.edu>
Link: http://lkml.kernel.org/r/541AD7E5...@engr.wisc.edu
Fixes: aee7af356e15 (NFSv4: Fix problems with close in the presence...)
Signed-off-by: Trond Myklebust <trond.m...@primarydata.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/nfs/nfs4proc.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/fs/nfs/nfs4proc.c b/fs/nfs/nfs4proc.c
index 01afcd5..527a4fc 100644
--- a/fs/nfs/nfs4proc.c
+++ b/fs/nfs/nfs4proc.c
@@ -2063,23 +2063,23 @@ static void nfs4_close_prepare(struct rpc_task *task, void *data)
is_rdwr = test_bit(NFS_O_RDWR_STATE, &state->flags);
is_rdonly = test_bit(NFS_O_RDONLY_STATE, &state->flags);
is_wronly = test_bit(NFS_O_WRONLY_STATE, &state->flags);
- /* Calculate the current open share mode */
- calldata->arg.fmode = 0;
- if (is_rdonly || is_rdwr)
- calldata->arg.fmode |= FMODE_READ;
- if (is_wronly || is_rdwr)
- calldata->arg.fmode |= FMODE_WRITE;
/* Calculate the change in open mode */
+ calldata->arg.fmode = 0;
if (state->n_rdwr == 0) {
- if (state->n_rdonly == 0) {
- call_close |= is_rdonly || is_rdwr;
- calldata->arg.fmode &= ~FMODE_READ;
- }
- if (state->n_wronly == 0) {
- call_close |= is_wronly || is_rdwr;
- calldata->arg.fmode &= ~FMODE_WRITE;
- }
- }
+ if (state->n_rdonly == 0)
+ call_close |= is_rdonly;
+ else if (is_rdonly)
+ calldata->arg.fmode |= FMODE_READ;
+ if (state->n_wronly == 0)
+ call_close |= is_wronly;
+ else if (is_wronly)
+ calldata->arg.fmode |= FMODE_WRITE;
+ } else if (is_rdwr)
+ calldata->arg.fmode |= FMODE_READ|FMODE_WRITE;
+
+ if (calldata->arg.fmode == 0)
+ call_close |= is_rdwr;
+
spin_unlock(&state->owner->so_lock);

if (!call_close) {

li...@kernel.org

unread,
Jan 27, 2015, 11:18:35 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Mike Christie, Christoph Hellwig, James Bottomley, Zefan Li
From: Mike Christie <mich...@cs.wisc.edu>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit db9bfd64b14a3a8f1868d2164518fdeab1b26ad1 upstream.

This patches fixes a potential buffer overrun in __iscsi_conn_send_pdu.
This function is used by iscsi drivers and userspace to send iscsi PDUs/
commands. For login commands, we have a set buffer size. For all other
commands we do not support data buffers.

This was reported by Dan Carpenter here:
http://www.spinics.net/lists/linux-scsi/msg66838.html

Reported-by: Dan Carpenter <dan.ca...@oracle.com>
Signed-off-by: Mike Christie <mich...@cs.wisc.edu>
Reviewed-by: Sagi Grimberg <sa...@mellanox.com>
Signed-off-by: Christoph Hellwig <h...@lst.de>
Signed-off-by: James Bottomley <JBott...@Parallels.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/scsi/libiscsi.c | 10 ++++++++++
1 file changed, 10 insertions(+)

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 82c3fd4..1243d2f 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -718,11 +718,21 @@ __iscsi_conn_send_pdu(struct iscsi_conn *conn, struct iscsi_hdr *hdr,
return NULL;
}

+ if (data_size > ISCSI_DEF_MAX_RECV_SEG_LEN) {
+ iscsi_conn_printk(KERN_ERR, conn, "Invalid buffer len of %u for login task. Max len is %u\n", data_size, ISCSI_DEF_MAX_RECV_SEG_LEN);
+ return NULL;
+ }
+
task = conn->login_task;
} else {
if (session->state != ISCSI_STATE_LOGGED_IN)
return NULL;

+ if (data_size != 0) {
+ iscsi_conn_printk(KERN_ERR, conn, "Can not send data buffer of len %u for op 0x%x\n", data_size, opcode);
+ return NULL;
+ }
+
BUG_ON(conn->c_stage == ISCSI_CONN_INITIAL_STAGE);
BUG_ON(conn->c_stage == ISCSI_CONN_STOPPED);

li...@kernel.org

unread,
Jan 27, 2015, 11:18:48 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Eric W. Biederman, Andy Lutomirski, Zefan Li
From: "Eric W. Biederman" <ebie...@xmission.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 0d0826019e529f21c84687521d03f60cd241ca7d upstream.

Andy Lutomirski recently demonstrated that when chroot is used to set
the root path below the path for the new ``root'' passed to pivot_root
the pivot_root system call succeeds and leaks mounts.

In examining the code I see that starting with a new root that is
below the current root in the mount tree will result in a loop in the
mount tree after the mounts are detached and then reattached to one
another. Resulting in all kinds of ugliness including a leak of that
mounts involved in the leak of the mount loop.

Prevent this problem by ensuring that the new mount is reachable from
the current root of the mount tree.

[Added stable cc. Fixes CVE-2014-7970. --Andy]

Reported-by: Andy Lutomirski <lu...@amacapital.net>
Reviewed-by: Andy Lutomirski <lu...@amacapital.net>
Link: http://lkml.kernel.org/r/87bnpmi...@x220.int.ebiederm.org
Signed-off-by: "Eric W. Biederman" <ebie...@xmission.com>
Signed-off-by: Andy Lutomirski <lu...@amacapital.net>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/namespace.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/fs/namespace.c b/fs/namespace.c
index f0f2e06..f7be8d9 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -2508,6 +2508,9 @@ SYSCALL_DEFINE2(pivot_root, const char __user *, new_root,
/* make sure we can reach put_old from new_root */
if (!is_path_reachable(real_mount(old.mnt), old.dentry, &new))
goto out4;
+ /* make certain new is below the root */
+ if (!is_path_reachable(new_mnt, new.dentry, &root))
+ goto out4;
br_write_lock(vfsmount_lock);
detach_mnt(new_mnt, &parent_path);
detach_mnt(root_mnt, &root_parent);

li...@kernel.org

unread,
Jan 27, 2015, 11:18:50 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, David Jander, Marc Kleine-Budde, Zefan Li
From: David Jander <da...@protonic.nl>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit fc05b884a31dbf259cc73cc856e634ec3acbebb6 upstream.

Apparently mailboxes may contain random data at startup, causing some of them
being prepared for message reception. This causes overruns being missed or even
confusing the IRQ check for trasmitted messages, increasing the transmit
counter instead of the error counter.

This patch initializes all mailboxes after the FIFO as RX_INACTIVE.

Signed-off-by: David Jander <da...@protonic.nl>
Signed-off-by: Marc Kleine-Budde <m...@pengutronix.de>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/net/can/flexcan.c | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 062bbe2..1331558 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -680,6 +680,7 @@ static int flexcan_chip_start(struct net_device *dev)
struct flexcan_regs __iomem *regs = priv->base;
int err;
u32 reg_mcr, reg_ctrl;
+ int i;

/* enable module */
flexcan_chip_enable(priv);
@@ -745,6 +746,12 @@ static int flexcan_chip_start(struct net_device *dev)
netdev_dbg(dev, "%s: writing ctrl=0x%08x", __func__, reg_ctrl);
flexcan_write(reg_ctrl, &regs->ctrl);

+ /* clear and invalidate all mailboxes first */
+ for (i = FLEXCAN_TX_BUF_ID; i < ARRAY_SIZE(regs->cantxfg); i++) {
+ flexcan_write(FLEXCAN_MB_CODE_RX_INACTIVE,
+ &regs->cantxfg[i].can_ctrl);
+ }
+
/* mark TX mailbox as INACTIVE */
flexcan_write(FLEXCAN_MB_CODE_TX_INACTIVE,
&regs->cantxfg[FLEXCAN_TX_BUF_ID].can_ctrl);

li...@kernel.org

unread,
Jan 27, 2015, 11:19:00 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Stephen Smalley, Paul Moore, Zefan Li
From: Stephen Smalley <s...@tycho.nsa.gov>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 923190d32de4428afbea5e5773be86bea60a9925 upstream.

sb_finish_set_opts() can race with inode_free_security()
when initializing inode security structures for inodes
created prior to initial policy load or by the filesystem
during ->mount(). This appears to have always been
a possible race, but commit 3dc91d4 ("SELinux: Fix possible
NULL pointer dereference in selinux_inode_permission()")
made it more evident by immediately reusing the unioned
list/rcu element of the inode security structure for call_rcu()
upon an inode_free_security(). But the underlying issue
was already present before that commit as a possible use-after-free
of isec.

Shivnandan Kumar reported the list corruption and proposed
a patch to split the list and rcu elements out of the union
as separate fields of the inode_security_struct so that setting
the rcu element would not affect the list element. However,
this would merely hide the issue and not truly fix the code.

This patch instead moves up the deletion of the list entry
prior to dropping the sbsec->isec_lock initially. Then,
if the inode is dropped subsequently, there will be no further
references to the isec.

Reported-by: Shivnandan Kumar <shivna...@samsung.com>
Signed-off-by: Stephen Smalley <s...@tycho.nsa.gov>
Signed-off-by: Paul Moore <pmo...@redhat.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
security/selinux/hooks.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c
index 639e5c4..cbae6d3 100644
--- a/security/selinux/hooks.c
+++ b/security/selinux/hooks.c
@@ -436,6 +436,7 @@ next_inode:
list_entry(sbsec->isec_head.next,
struct inode_security_struct, list);
struct inode *inode = isec->inode;
+ list_del_init(&isec->list);
spin_unlock(&sbsec->isec_lock);
inode = igrab(inode);
if (inode) {
@@ -444,7 +445,6 @@ next_inode:
iput(inode);
}
spin_lock(&sbsec->isec_lock);
- list_del_init(&isec->list);
goto next_inode;
}
spin_unlock(&sbsec->isec_lock);

li...@kernel.org

unread,
Jan 27, 2015, 11:19:08 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Catalin Marinas, Darren Hart, Thomas Gleixner, Peter Zijlstra, Ingo Molnar, Paul E. McKenney, Linus Torvalds, Zefan Li
From: Catalin Marinas <catalin...@arm.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 76835b0ebf8a7fe85beb03c75121419a7dec52f0 upstream.

Commit b0c29f79ecea (futexes: Avoid taking the hb->lock if there's
nothing to wake up) changes the futex code to avoid taking a lock when
there are no waiters. This code has been subsequently fixed in commit
11d4616bd07f (futex: revert back to the explicit waiter counting code).
Both the original commit and the fix-up rely on get_futex_key_refs() to
always imply a barrier.

However, for private futexes, none of the cases in the switch statement
of get_futex_key_refs() would be hit and the function completes without
a memory barrier as required before checking the "waiters" in
futex_wake() -> hb_waiters_pending(). The consequence is a race with a
thread waiting on a futex on another CPU, allowing the waker thread to
read "waiters == 0" while the waiter thread to have read "futex_val ==
locked" (in kernel).

Without this fix, the problem (user space deadlocks) can be seen with
Android bionic's mutex implementation on an arm64 multi-cluster system.

Signed-off-by: Catalin Marinas <catalin...@arm.com>
Reported-by: Matteo Franchin <Matteo....@arm.com>
Fixes: b0c29f79ecea (futexes: Avoid taking the hb->lock if there's nothing to wake up)
Acked-by: Davidlohr Bueso <da...@stgolabs.net>
Tested-by: Mike Galbraith <umgwana...@gmail.com>
Cc: Darren Hart <dvh...@linux.intel.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Paul E. McKenney <pau...@linux.vnet.ibm.com>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
kernel/futex.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/kernel/futex.c b/kernel/futex.c
index 41dfb18..fafc54f 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -212,6 +212,8 @@ static void drop_futex_key_refs(union futex_key *key)
case FUT_OFF_MMSHARED:
mmdrop(key->private.mm);
break;
+ default:
+ smp_mb(); /* explicit MB (B) */

li...@kernel.org

unread,
Jan 27, 2015, 11:19:18 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Andy Lutomirski, Petr Matousek, Gleb Natapov, Linus Torvalds, Zefan Li
From: Andy Lutomirski <lu...@amacapital.net>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit d974baa398f34393db76be45f7d4d04fbdbb4a0a upstream.

CR4 isn't constant; at least the TSD and PCE bits can vary.

TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
like it's correct.

This adds a branch and a read from cr4 to each vm entry. Because it is
extremely likely that consecutive entries into the same vcpu will have
the same host cr4 value, this fixes up the vmcs instead of restoring cr4
after the fact. A subsequent patch will add a kernel-wide cr4 shadow,
reducing the overhead in the common case to just two memory reads and a
branch.

Signed-off-by: Andy Lutomirski <lu...@amacapital.net>
Acked-by: Paolo Bonzini <pbon...@redhat.com>
Cc: Petr Matousek <pmat...@redhat.com>
Cc: Gleb Natapov <gl...@kernel.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
[lizf: Backported to 3.4:
- adjust context
- add parameter struct vcpu_vmx *vmx to vmx_set_constant_host_state()]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/x86/kvm/vmx.c | 21 +++++++++++++++++----
1 file changed, 17 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 617b00b..443635d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -388,6 +388,7 @@ struct vcpu_vmx {
u16 fs_sel, gs_sel, ldt_sel;
int gs_ldt_reload_needed;
int fs_reload_needed;
+ unsigned long vmcs_host_cr4; /* May not match real cr4 */
} host_state;
struct {
int vm86_active;
@@ -3622,16 +3623,21 @@ static void vmx_disable_intercept_for_msr(u32 msr, bool longmode_only)
* Note that host-state that does change is set elsewhere. E.g., host-state
* that is set differently for each CPU is set in vmx_vcpu_load(), not here.
*/
-static void vmx_set_constant_host_state(void)
+static void vmx_set_constant_host_state(struct vcpu_vmx *vmx)
{
u32 low32, high32;
unsigned long tmpl;
struct desc_ptr dt;
+ unsigned long cr4;

vmcs_writel(HOST_CR0, read_cr0() | X86_CR0_TS); /* 22.2.3 */
- vmcs_writel(HOST_CR4, read_cr4()); /* 22.2.3, 22.2.5 */
vmcs_writel(HOST_CR3, read_cr3()); /* 22.2.3 FIXME: shadow tables */

+ /* Save the most likely value for this task's CR4 in the VMCS. */
+ cr4 = read_cr4();
+ vmcs_writel(HOST_CR4, cr4); /* 22.2.3, 22.2.5 */
+ vmx->host_state.vmcs_host_cr4 = cr4;
+
vmcs_write16(HOST_CS_SELECTOR, __KERNEL_CS); /* 22.2.4 */
vmcs_write16(HOST_DS_SELECTOR, __KERNEL_DS); /* 22.2.4 */
vmcs_write16(HOST_ES_SELECTOR, __KERNEL_DS); /* 22.2.4 */
@@ -3753,7 +3759,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)

vmcs_write16(HOST_FS_SELECTOR, 0); /* 22.2.4 */
vmcs_write16(HOST_GS_SELECTOR, 0); /* 22.2.4 */
- vmx_set_constant_host_state();
+ vmx_set_constant_host_state(vmx);
#ifdef CONFIG_X86_64
rdmsrl(MSR_FS_BASE, a);
vmcs_writel(HOST_FS_BASE, a); /* 22.2.4 */
@@ -6101,6 +6107,7 @@ static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
{
struct vcpu_vmx *vmx = to_vmx(vcpu);
+ unsigned long cr4;

if (is_guest_mode(vcpu) && !vmx->nested.nested_run_pending) {
struct vmcs12 *vmcs12 = get_vmcs12(vcpu);
@@ -6131,6 +6138,12 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
if (test_bit(VCPU_REGS_RIP, (unsigned long *)&vcpu->arch.regs_dirty))
vmcs_writel(GUEST_RIP, vcpu->arch.regs[VCPU_REGS_RIP]);

+ cr4 = read_cr4();
+ if (unlikely(cr4 != vmx->host_state.vmcs_host_cr4)) {
+ vmcs_writel(HOST_CR4, cr4);
+ vmx->host_state.vmcs_host_cr4 = cr4;
+ }
+
/* When single-stepping over STI and MOV SS, we must clear the
* corresponding interruptibility bits in the guest state. Otherwise
* vmentry fails as it then expects bit 14 (BS) in pending debug
@@ -6589,7 +6602,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
* Other fields are different per CPU, and will be set later when
* vmx_vcpu_load() is called, and when vmx_save_host_state() is called.
*/
- vmx_set_constant_host_state();
+ vmx_set_constant_host_state(vmx);

/*
* HOST_RSP is normally set correctly in vmx_vcpu_run() just before

li...@kernel.org

unread,
Jan 27, 2015, 11:19:29 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Darrick J. Wong, Theodore Ts'o, Zefan Li
From: "Darrick J. Wong" <darric...@oracle.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit a0626e75954078cfacddb00a4545dde821170bc5 upstream.

When loading extended attributes, check each entry's value offset to
make sure it doesn't collide with the entries.

Without this check it is easy to crash the kernel by mounting a
malicious FS containing a file with an EA wherein e_value_offs = 0 and
e_value_size > 0 and then deleting the EA, which corrupts the name
list.

(See the f_ea_value_crash test's FS image in e2fsprogs for an example.)

Signed-off-by: Darrick J. Wong <darric...@oracle.com>
Signed-off-by: Theodore Ts'o <ty...@mit.edu>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/ext4/xattr.c | 32 ++++++++++++++++++++++++--------
1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c
index 5743e9d..96455e6 100644
--- a/fs/ext4/xattr.c
+++ b/fs/ext4/xattr.c
@@ -144,14 +144,28 @@ ext4_listxattr(struct dentry *dentry, char *buffer, size_t size)
}

static int
-ext4_xattr_check_names(struct ext4_xattr_entry *entry, void *end)
+ext4_xattr_check_names(struct ext4_xattr_entry *entry, void *end,
+ void *value_start)
{
- while (!IS_LAST_ENTRY(entry)) {
- struct ext4_xattr_entry *next = EXT4_XATTR_NEXT(entry);
+ struct ext4_xattr_entry *e = entry;
+
+ while (!IS_LAST_ENTRY(e)) {
+ struct ext4_xattr_entry *next = EXT4_XATTR_NEXT(e);
if ((void *)next >= end)
return -EIO;
- entry = next;
+ e = next;
}
+
+ while (!IS_LAST_ENTRY(entry)) {
+ if (entry->e_value_size != 0 &&
+ (value_start + le16_to_cpu(entry->e_value_offs) <
+ (void *)e + sizeof(__u32) ||
+ value_start + le16_to_cpu(entry->e_value_offs) +
+ le32_to_cpu(entry->e_value_size) > end))
+ return -EIO;
+ entry = EXT4_XATTR_NEXT(entry);
+ }
+
return 0;
}

@@ -161,7 +175,8 @@ ext4_xattr_check_block(struct buffer_head *bh)
if (BHDR(bh)->h_magic != cpu_to_le32(EXT4_XATTR_MAGIC) ||
BHDR(bh)->h_blocks != cpu_to_le32(1))
return -EIO;
- return ext4_xattr_check_names(BFIRST(bh), bh->b_data + bh->b_size);
+ return ext4_xattr_check_names(BFIRST(bh), bh->b_data + bh->b_size,
+ bh->b_data);
}

static inline int
@@ -274,7 +289,7 @@ ext4_xattr_ibody_get(struct inode *inode, int name_index, const char *name,
header = IHDR(inode, raw_inode);
entry = IFIRST(header);
end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
- error = ext4_xattr_check_names(entry, end);
+ error = ext4_xattr_check_names(entry, end, entry);
if (error)
goto cleanup;
error = ext4_xattr_find_entry(&entry, name_index, name,
@@ -402,7 +417,7 @@ ext4_xattr_ibody_list(struct dentry *dentry, char *buffer, size_t buffer_size)
raw_inode = ext4_raw_inode(&iloc);
header = IHDR(inode, raw_inode);
end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
- error = ext4_xattr_check_names(IFIRST(header), end);
+ error = ext4_xattr_check_names(IFIRST(header), end, IFIRST(header));
if (error)
goto cleanup;
error = ext4_xattr_list_entries(dentry, IFIRST(header),
@@ -914,7 +929,8 @@ ext4_xattr_ibody_find(struct inode *inode, struct ext4_xattr_info *i,
is->s.here = is->s.first;
is->s.end = (void *)raw_inode + EXT4_SB(inode->i_sb)->s_inode_size;
if (ext4_test_inode_state(inode, EXT4_STATE_XATTR)) {
- error = ext4_xattr_check_names(IFIRST(header), is->s.end);
+ error = ext4_xattr_check_names(IFIRST(header), is->s.end,
+ IFIRST(header));
if (error)
return error;
/* Find the named attribute. */

li...@kernel.org

unread,
Jan 27, 2015, 11:19:36 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Wanpeng Li, David Rientjes, Prarit Bhargava, Steven Rostedt, Peter Zijlstra, Ingo Molnar, Zefan Li
From: Wanpeng Li <wanpe...@linux.intel.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 03bd4e1f7265548832a76e7919a81f3137c44fd1 upstream.

The following bug can be triggered by hot adding and removing a large number of
xen domain0's vcpus repeatedly:

BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 IP: [..] find_busiest_group
PGD 5a9d5067 PUD 13067 PMD 0
Oops: 0000 [#3] SMP
[...]
Call Trace:
load_balance
? _raw_spin_unlock_irqrestore
idle_balance
__schedule
schedule
schedule_timeout
? lock_timer_base
schedule_timeout_uninterruptible
msleep
lock_device_hotplug_sysfs
online_store
dev_attr_store
sysfs_write_file
vfs_write
SyS_write
system_call_fastpath

Last level cache shared mask is built during CPU up and the
build_sched_domain() routine takes advantage of it to setup
the sched domain CPU topology.

However, llc_shared_mask is not released during CPU disable,
which leads to an invalid sched domainCPU topology.

This patch fix it by releasing the llc_shared_mask correctly
during CPU disable.

Yasuaki also reported that this can happen on real hardware:

https://lkml.org/lkml/2014/7/22/1018

His case is here:

==
Here is an example on my system.
My system has 4 sockets and each socket has 15 cores and HT is
enabled. In this case, each core of sockes is numbered as
follows:

| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-44, 90-104
Socket#3 | 45-59, 105-119

Then llc_shared_mask of CPU#30 has 0x3fff80000001fffc0000000.

It means that last level cache of Socket#2 is shared with
CPU#30-44 and 90-104.

When hot-removing socket#2 and #3, each core of sockets is
numbered as follows:

| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89

But llc_shared_mask is not cleared. So llc_shared_mask of CPU#30
remains having 0x3fff80000001fffc0000000.

After that, when hot-adding socket#2 and #3, each core of
sockets is numbered as follows:

| CPU#
Socket#0 | 0-14 , 60-74
Socket#1 | 15-29, 75-89
Socket#2 | 30-59
Socket#3 | 90-119

Then llc_shared_mask of CPU#30 becomes
0x3fff8000fffffffc0000000. It means that last level cache of
Socket#2 is shared with CPU#30-59 and 90-104. So the mask has
the wrong value.

Signed-off-by: Wanpeng Li <wanpe...@linux.intel.com>
Tested-by: Linn Crosetto <li...@hp.com>
Reviewed-by: Borislav Petkov <b...@suse.de>
Reviewed-by: Toshi Kani <toshi...@hp.com>
Reviewed-by: Yasuaki Ishimatsu <isimatu...@jp.fujitsu.com>
Cc: David Rientjes <rien...@google.com>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Steven Rostedt <sros...@redhat.com>
Cc: Peter Zijlstra <pet...@infradead.org>
Link: http://lkml.kernel.org/r/1411547885-48165-1-git...@linux.intel.com
Signed-off-by: Ingo Molnar <mi...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/x86/kernel/smpboot.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index d28c595..c7dbf02 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1248,6 +1248,9 @@ static void remove_siblinginfo(int cpu)

for_each_cpu(sibling, cpu_sibling_mask(cpu))
cpumask_clear_cpu(cpu, cpu_sibling_mask(sibling));
+ for_each_cpu(sibling, cpu_llc_shared_mask(cpu))
+ cpumask_clear_cpu(cpu, cpu_llc_shared_mask(sibling));
+ cpumask_clear(cpu_llc_shared_mask(cpu));
cpumask_clear(cpu_sibling_mask(cpu));
cpumask_clear(cpu_core_mask(cpu));
c->phys_proc_id = 0;

li...@kernel.org

unread,
Jan 27, 2015, 11:19:41 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Jan Kara, Theodore Ts'o, Zefan Li
From: Jan Kara <ja...@suse.cz>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 90a8020278c1598fafd071736a0846b38510309c upstream.

->page_mkwrite() is used by filesystems to allocate blocks under a page
which is becoming writeably mmapped in some process' address space. This
allows a filesystem to return a page fault if there is not enough space
available, user exceeds quota or similar problem happens, rather than
silently discarding data later when writepage is called.

However VFS fails to call ->page_mkwrite() in all the cases where
filesystems need it when blocksize < pagesize. For example when
blocksize = 1024, pagesize = 4096 the following is problematic:
ftruncate(fd, 0);
pwrite(fd, buf, 1024, 0);
map = mmap(NULL, 1024, PROT_WRITE, MAP_SHARED, fd, 0);
map[0] = 'a'; ----> page_mkwrite() for index 0 is called
ftruncate(fd, 10000); /* or even pwrite(fd, buf, 1, 10000) */
mremap(map, 1024, 10000, 0);
map[4095] = 'a'; ----> no page_mkwrite() called

At the moment ->page_mkwrite() is called, filesystem can allocate only
one block for the page because i_size == 1024. Otherwise it would create
blocks beyond i_size which is generally undesirable. But later at
->writepage() time, we also need to store data at offset 4095 but we
don't have block allocated for it.

This patch introduces a helper function filesystems can use to have
->page_mkwrite() called at all the necessary moments.

Signed-off-by: Jan Kara <ja...@suse.cz>
Signed-off-by: Theodore Ts'o <ty...@mit.edu>
[lizf: Backported to 3.4:
- adjust context
- truncate_setsize() already has an oldsize variable]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/buffer.c | 3 +++
include/linux/mm.h | 1 +
mm/truncate.c | 62 +++++++++++++++++++++++++++++++++++++++++++++++++++---
3 files changed, 63 insertions(+), 3 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 104425b..ed2dc70 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1982,6 +1982,7 @@ int generic_write_end(struct file *file, struct address_space *mapping,
struct page *page, void *fsdata)
{
struct inode *inode = mapping->host;
+ loff_t old_size = inode->i_size;
int i_size_changed = 0;

copied = block_write_end(file, mapping, pos, len, copied, page, fsdata);
@@ -2001,6 +2002,8 @@ int generic_write_end(struct file *file, struct address_space *mapping,
unlock_page(page);
page_cache_release(page);

+ if (old_size < pos)
+ pagecache_isize_extended(inode, old_size, pos);
/*
* Don't mark the inode dirty under page lock. First, it unnecessarily
* makes the holding time of page lock longer. Second, it forces lock
diff --git a/include/linux/mm.h b/include/linux/mm.h
index dbca4b2..656b4e9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -953,6 +953,7 @@ static inline void unmap_shared_mapping_range(struct address_space *mapping,

extern void truncate_pagecache(struct inode *inode, loff_t old, loff_t new);
extern void truncate_setsize(struct inode *inode, loff_t newsize);
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to);
extern int vmtruncate(struct inode *inode, loff_t offset);
extern int vmtruncate_range(struct inode *inode, loff_t offset, loff_t end);
void truncate_pagecache_range(struct inode *inode, loff_t offset, loff_t end);
diff --git a/mm/truncate.c b/mm/truncate.c
index f38055c..708a499 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -20,6 +20,7 @@
#include <linux/buffer_head.h> /* grr. try_to_release_page,
do_invalidatepage */
#include <linux/cleancache.h>
+#include <linux/rmap.h>
#include "internal.h"


@@ -571,16 +572,71 @@ EXPORT_SYMBOL(truncate_pagecache);
*/
void truncate_setsize(struct inode *inode, loff_t newsize)
{
- loff_t oldsize;
-
- oldsize = inode->i_size;
+ loff_t oldsize = inode->i_size;
i_size_write(inode, newsize);

+ if (newsize > oldsize)
+ pagecache_isize_extended(inode, oldsize, newsize);
truncate_pagecache(inode, oldsize, newsize);
}
EXPORT_SYMBOL(truncate_setsize);

/**
+ * pagecache_isize_extended - update pagecache after extension of i_size
+ * @inode: inode for which i_size was extended
+ * @from: original inode size
+ * @to: new inode size
+ *
+ * Handle extension of inode size either caused by extending truncate or by
+ * write starting after current i_size. We mark the page straddling current
+ * i_size RO so that page_mkwrite() is called on the nearest write access to
+ * the page. This way filesystem can be sure that page_mkwrite() is called on
+ * the page before user writes to the page via mmap after the i_size has been
+ * changed.
+ *
+ * The function must be called after i_size is updated so that page fault
+ * coming after we unlock the page will already see the new i_size.
+ * The function must be called while we still hold i_mutex - this not only
+ * makes sure i_size is stable but also that userspace cannot observe new
+ * i_size value before we are prepared to store mmap writes at new inode size.
+ */
+void pagecache_isize_extended(struct inode *inode, loff_t from, loff_t to)
+{
+ int bsize = 1 << inode->i_blkbits;
+ loff_t rounded_from;
+ struct page *page;
+ pgoff_t index;
+
+ WARN_ON(!mutex_is_locked(&inode->i_mutex));
+ WARN_ON(to > inode->i_size);
+
+ if (from >= to || bsize == PAGE_CACHE_SIZE)
+ return;
+ /* Page straddling @from will not have any hole block created? */
+ rounded_from = round_up(from, bsize);
+ if (to <= rounded_from || !(rounded_from & (PAGE_CACHE_SIZE - 1)))
+ return;
+
+ index = from >> PAGE_CACHE_SHIFT;
+ page = find_lock_page(inode->i_mapping, index);
+ /* Page not cached? Nothing to do */
+ if (!page)
+ return;
+ /*
+ * See clear_page_dirty_for_io() for details why set_page_dirty()
+ * is needed.
+ */
+ if (page_mkclean(page))
+ set_page_dirty(page);
+ unlock_page(page);
+ page_cache_release(page);
+}
+EXPORT_SYMBOL(pagecache_isize_extended);
+
+/**
+ * truncate_pagecache_range - unmap and remove pagecache that is hole-punched
+ * @inode: inode
+ * @lstart: offset of beginning of hole
* vmtruncate - unmap mappings "freed" by truncate() syscall
* @inode: inode of the file used
* @newsize: file offset to start truncating

li...@kernel.org

unread,
Jan 27, 2015, 11:19:51 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Zefan Li, Peter Zijlstra, Ingo Molnar, Miao Xie, Kees Cook, Tejun Heo
From: Zefan Li <liz...@huawei.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 2ad654bc5e2b211e92f66da1d819e47d79a866f0 upstream.

When we change cpuset.memory_spread_{page,slab}, cpuset will flip
PF_SPREAD_{PAGE,SLAB} bit of tsk->flags for each task in that cpuset.
This should be done using atomic bitops, but currently we don't,
which is broken.

Tetsuo reported a hard-to-reproduce kernel crash on RHEL6, which happened
when one thread tried to clear PF_USED_MATH while at the same time another
thread tried to flip PF_SPREAD_PAGE/PF_SPREAD_SLAB. They both operate on
the same task.

Here's the full report:
https://lkml.org/lkml/2014/9/19/230

To fix this, we make PF_SPREAD_PAGE and PF_SPREAD_SLAB atomic flags.

v4:
- updated mm/slab.c. (Fengguang Wu)
- updated Documentation.

Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Miao Xie <mi...@cn.fujitsu.com>
Cc: Kees Cook <kees...@chromium.org>
Fixes: 950592f7b991 ("cpusets: update tasks' page/slab spread flags in time")
Reported-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Signed-off-by: Zefan Li <liz...@huawei.com>
Signed-off-by: Tejun Heo <t...@kernel.org>
[lizf: Backported to 3.4:
- adjust context
- check current->flags & PF_MEMPOLICY rather than current->mempolicy]
---
Documentation/cgroups/cpusets.txt | 6 +++---
include/linux/cpuset.h | 4 ++--
include/linux/sched.h | 12 ++++++++++--
kernel/cpuset.c | 9 +++++----
mm/slab.c | 4 ++--
5 files changed, 22 insertions(+), 13 deletions(-)

diff --git a/Documentation/cgroups/cpusets.txt b/Documentation/cgroups/cpusets.txt
index cefd3d8..a52a39f 100644
--- a/Documentation/cgroups/cpusets.txt
+++ b/Documentation/cgroups/cpusets.txt
@@ -345,14 +345,14 @@ the named feature on.
The implementation is simple.

Setting the flag 'cpuset.memory_spread_page' turns on a per-process flag
-PF_SPREAD_PAGE for each task that is in that cpuset or subsequently
+PFA_SPREAD_PAGE for each task that is in that cpuset or subsequently
joins that cpuset. The page allocation calls for the page cache
-is modified to perform an inline check for this PF_SPREAD_PAGE task
+is modified to perform an inline check for this PFA_SPREAD_PAGE task
flag, and if set, a call to a new routine cpuset_mem_spread_node()
returns the node to prefer for the allocation.

Similarly, setting 'cpuset.memory_spread_slab' turns on the flag
-PF_SPREAD_SLAB, and appropriately marked slab caches will allocate
+PFA_SPREAD_SLAB, and appropriately marked slab caches will allocate
pages from the node returned by cpuset_mem_spread_node().

The cpuset_mem_spread_node() routine is also simple. It uses the
diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
index 668f66b..bb48be7 100644
--- a/include/linux/cpuset.h
+++ b/include/linux/cpuset.h
@@ -74,12 +74,12 @@ extern int cpuset_slab_spread_node(void);

static inline int cpuset_do_page_mem_spread(void)
{
- return current->flags & PF_SPREAD_PAGE;
+ return task_spread_page(current);
}

static inline int cpuset_do_slab_mem_spread(void)
{
- return current->flags & PF_SPREAD_SLAB;
+ return task_spread_slab(current);
}

extern int current_cpuset_is_being_rebound(void);
diff --git a/include/linux/sched.h b/include/linux/sched.h
index b85b719..56d8233 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1834,8 +1834,6 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* randomize virtual address space */
#define PF_SWAPWRITE 0x00800000 /* Allowed to write to swap */
-#define PF_SPREAD_PAGE 0x01000000 /* Spread page cache over cpuset */
-#define PF_SPREAD_SLAB 0x02000000 /* Spread some slab caches over cpuset */
#define PF_THREAD_BOUND 0x04000000 /* Thread bound to specific cpu */
#define PF_MCE_EARLY 0x08000000 /* Early kill for mce process policy */
#define PF_MEMPOLICY 0x10000000 /* Non-default NUMA mempolicy */
@@ -1868,6 +1866,8 @@ extern void thread_group_times(struct task_struct *p, cputime_t *ut, cputime_t *
#define used_math() tsk_used_math(current)

/* Per-process atomic flags. */
+#define PFA_SPREAD_PAGE 1 /* Spread page cache over cpuset */
+#define PFA_SPREAD_SLAB 2 /* Spread some slab caches over cpuset */

#define TASK_PFA_TEST(name, func) \
static inline bool task_##func(struct task_struct *p) \
@@ -1970,6 +1970,14 @@ static inline int set_cpus_allowed(struct task_struct *p, cpumask_t new_mask)
}
#endif

+TASK_PFA_TEST(SPREAD_PAGE, spread_page)
+TASK_PFA_SET(SPREAD_PAGE, spread_page)
+TASK_PFA_CLEAR(SPREAD_PAGE, spread_page)
+
+TASK_PFA_TEST(SPREAD_SLAB, spread_slab)
+TASK_PFA_SET(SPREAD_SLAB, spread_slab)
+TASK_PFA_CLEAR(SPREAD_SLAB, spread_slab)
+
/*
* Do not use outside of architecture code which knows its limitations.
*
diff --git a/kernel/cpuset.c b/kernel/cpuset.c
index 9cb82b9..7f3bde5 100644
--- a/kernel/cpuset.c
+++ b/kernel/cpuset.c
@@ -326,13 +326,14 @@ static void cpuset_update_task_spread_flag(struct cpuset *cs,
struct task_struct *tsk)
{
if (is_spread_page(cs))
- tsk->flags |= PF_SPREAD_PAGE;
+ task_set_spread_page(tsk);
else
- tsk->flags &= ~PF_SPREAD_PAGE;
+ task_clear_spread_page(tsk);
+
if (is_spread_slab(cs))
- tsk->flags |= PF_SPREAD_SLAB;
+ task_set_spread_slab(tsk);
else
- tsk->flags &= ~PF_SPREAD_SLAB;
+ task_clear_spread_slab(tsk);
}

/*
diff --git a/mm/slab.c b/mm/slab.c
index 3eb1c38..3714dd9 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3321,7 +3321,7 @@ static inline void *____cache_alloc(struct kmem_cache *cachep, gfp_t flags)

#ifdef CONFIG_NUMA
/*
- * Try allocating on another node if PF_SPREAD_SLAB|PF_MEMPOLICY.
+ * Try allocating on another node if PFA_SPREAD_SLAB|PF_MEMPOLICY.
*
* If we are in_interrupt, then process context, including cpusets and
* mempolicy, may not apply and should not be used for allocation policy.
@@ -3562,7 +3562,7 @@ __do_cache_alloc(struct kmem_cache *cache, gfp_t flags)
{
void *objp;

- if (unlikely(current->flags & (PF_SPREAD_SLAB | PF_MEMPOLICY))) {
+ if (unlikely((current->flags & PF_MEMPOLICY) || cpuset_do_slab_mem_spread())) {
objp = alternate_node_alloc(cache, flags);
if (objp)
goto out;

li...@kernel.org

unread,
Jan 27, 2015, 11:19:58 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Eric Sandeen, Theodore Ts'o, Zefan Li
From: Eric Sandeen <san...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 0ff8947fc5f700172b37cbca811a38eb9cb81e08 upstream.

Delalloc write journal reservations only reserve 1 credit,
to update the inode if necessary. However, it may happen
once in a filesystem's lifetime that a file will cross
the 2G threshold, and require the LARGE_FILE feature to
be set in the superblock as well, if it was not set already.

This overruns the transaction reservation, and can be
demonstrated simply on any ext4 filesystem without the LARGE_FILE
feature already set:

dd if=/dev/zero of=testfile bs=1 seek=2147483646 count=1 \
conv=notrunc of=testfile
sync
dd if=/dev/zero of=testfile bs=1 seek=2147483647 count=1 \
conv=notrunc of=testfile

leads to:

EXT4-fs: ext4_do_update_inode:4296: aborting transaction: error 28 in __ext4_handle_dirty_super
EXT4-fs error (device loop0) in ext4_do_update_inode:4301: error 28
EXT4-fs error (device loop0) in ext4_reserve_inode_write:4757: Readonly filesystem
EXT4-fs error (device loop0) in ext4_dirty_inode:4876: error 28
EXT4-fs error (device loop0) in ext4_da_write_end:2685: error 28

Adjust the number of credits based on whether the flag is
already set, and whether the current write may extend past the
LARGE_FILE limit.

Signed-off-by: Eric Sandeen <san...@redhat.com>
Signed-off-by: Theodore Ts'o <ty...@mit.edu>
Reviewed-by: Andreas Dilger <adi...@dilger.ca>
[lizf: Backported to 3.4:
- adjust context
- ext4_journal_start() has no parameter type]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/ext4/inode.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index c3598f1..9e9db42 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2408,6 +2408,20 @@ static int ext4_nonda_switch(struct super_block *sb)
return 0;
}

+/* We always reserve for an inode update; the superblock could be there too */
+static int ext4_da_write_credits(struct inode *inode, loff_t pos, unsigned len)
+{
+ if (likely(EXT4_HAS_RO_COMPAT_FEATURE(inode->i_sb,
+ EXT4_FEATURE_RO_COMPAT_LARGE_FILE)))
+ return 1;
+
+ if (pos + len <= 0x7fffffffULL)
+ return 1;
+
+ /* We might need to update the superblock to set LARGE_FILE */
+ return 2;
+}
+
static int ext4_da_write_begin(struct file *file, struct address_space *mapping,
loff_t pos, unsigned len, unsigned flags,
struct page **pagep, void **fsdata)
@@ -2434,7 +2448,8 @@ retry:
* to journalling the i_disksize update if writes to the end
* of file which has an already mapped buffer.
*/
- handle = ext4_journal_start(inode, 1);
+ handle = ext4_journal_start(inode,
+ ext4_da_write_credits(inode, pos, len));
if (IS_ERR(handle)) {
ret = PTR_ERR(handle);

li...@kernel.org

unread,
Jan 27, 2015, 11:19:59 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Cesar Eduardo Barros, Herbert Xu, Greg Kroah-Hartman, Zefan Li
From: Cesar Eduardo Barros <ces...@cesarb.eti.br>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit fe8c8a126806fea4465c43d62a1f9d273a572bf5 upstream.

[Only use the compiler.h portion of this patch, to get the
OPTIMIZER_HIDE_VAR() macro, which we need for other -stable patches
- gregkh]

Disabling compiler optimizations can be fragile, since a new
optimization could be added to -O0 or -Os that breaks the assumptions
the code is making.

Instead of disabling compiler optimizations, use a dummy inline assembly
(based on RELOC_HIDE) to block the problematic kinds of optimization,
while still allowing other optimizations to be applied to the code.

The dummy inline assembly is added after every OR, and has the
accumulator variable as its input and output. The compiler is forced to
assume that the dummy inline assembly could both depend on the
accumulator variable and change the accumulator variable, so it is
forced to compute the value correctly before the inline assembly, and
cannot assume anything about its value after the inline assembly.

This change should be enough to make crypto_memneq work correctly (with
data-independent timing) even if it is inlined at its call sites. That
can be done later in a followup patch.

Compile-tested on x86_64.

Signed-off-by: Cesar Eduardo Barros <ces...@cesarb.eti.br>
Acked-by: Daniel Borkmann <dbor...@redhat.com>
Signed-off-by: Herbert Xu <her...@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
include/linux/compiler-gcc.h | 3 +++
include/linux/compiler-intel.h | 7 +++++++
2 files changed, 10 insertions(+)

diff --git a/include/linux/compiler-gcc.h b/include/linux/compiler-gcc.h
index 7970e31..ea9cffe 100644
--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -37,6 +37,9 @@
__asm__ ("" : "=r"(__ptr) : "0"(ptr)); \
(typeof(ptr)) (__ptr + (off)); })

+/* Make the optimizer believe the variable can be manipulated arbitrarily. */
+#define OPTIMIZER_HIDE_VAR(var) __asm__ ("" : "=r" (var) : "0" (var))
+
#ifdef __CHECKER__
#define __must_be_array(arr) 0
#else
diff --git a/include/linux/compiler-intel.h b/include/linux/compiler-intel.h
index cba9593..1a97cac 100644
--- a/include/linux/compiler-intel.h
+++ b/include/linux/compiler-intel.h
@@ -15,6 +15,7 @@
*/
#undef barrier
#undef RELOC_HIDE
+#undef OPTIMIZER_HIDE_VAR

#define barrier() __memory_barrier()

@@ -23,6 +24,12 @@
__ptr = (unsigned long) (ptr); \
(typeof(ptr)) (__ptr + (off)); })

+/* This should act as an optimization barrier on var.
+ * Given that this compiler does not have inline assembly, a compiler barrier
+ * is the best we can do.
+ */
+#define OPTIMIZER_HIDE_VAR(var) barrier()
+
/* Intel ECC compiler doesn't support __builtin_types_compatible_p() */
#define __must_be_array(a) 0

li...@kernel.org

unread,
Jan 27, 2015, 11:20:16 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Joseph Qi, Joel Becker, Mark Fasheh, Andrew Morton, Linus Torvalds, Zefan Li
From: Joseph Qi <jose...@huawei.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 5760a97c7143c208fa3a8f8cad0ed7dd672ebd28 upstream.

There is a deadlock case which reported by Guozhonghua:
https://oss.oracle.com/pipermail/ocfs2-devel/2014-September/010079.html

This case is caused by &res->spinlock and &dlm->master_lock
misordering in different threads.

It was introduced by commit 8d400b81cc83 ("ocfs2/dlm: Clean up refmap
helpers"). Since lockres is new, it doesn't not require the
&res->spinlock. So remove it.

Fixes: 8d400b81cc83 ("ocfs2/dlm: Clean up refmap helpers")
Signed-off-by: Joseph Qi <jose...@huawei.com>
Reviewed-by: joyce.xue <xuej...@huawei.com>
Reported-by: Guozhonghua <guozh...@h3c.com>
Cc: Joel Becker <jl...@evilplan.org>
Cc: Mark Fasheh <mfa...@suse.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/ocfs2/dlm/dlmmaster.c | 18 ++++++++++--------
1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/fs/ocfs2/dlm/dlmmaster.c b/fs/ocfs2/dlm/dlmmaster.c
index 005261c..dbc372e 100644
--- a/fs/ocfs2/dlm/dlmmaster.c
+++ b/fs/ocfs2/dlm/dlmmaster.c
@@ -653,12 +653,9 @@ void dlm_lockres_clear_refmap_bit(struct dlm_ctxt *dlm,
clear_bit(bit, res->refmap);
}

-
-void dlm_lockres_grab_inflight_ref(struct dlm_ctxt *dlm,
+static void __dlm_lockres_grab_inflight_ref(struct dlm_ctxt *dlm,
struct dlm_lock_resource *res)
{
- assert_spin_locked(&res->spinlock);
-
res->inflight_locks++;

mlog(0, "%s: res %.*s, inflight++: now %u, %ps()\n", dlm->name,
@@ -666,6 +663,13 @@ void dlm_lockres_grab_inflight_ref(struct dlm_ctxt *dlm,
__builtin_return_address(0));
}

+void dlm_lockres_grab_inflight_ref(struct dlm_ctxt *dlm,
+ struct dlm_lock_resource *res)
+{
+ assert_spin_locked(&res->spinlock);
+ __dlm_lockres_grab_inflight_ref(dlm, res);
+}
+
void dlm_lockres_drop_inflight_ref(struct dlm_ctxt *dlm,
struct dlm_lock_resource *res)
{
@@ -855,10 +859,8 @@ lookup:
/* finally add the lockres to its hash bucket */
__dlm_insert_lockres(dlm, res);

- /* Grab inflight ref to pin the resource */
- spin_lock(&res->spinlock);
- dlm_lockres_grab_inflight_ref(dlm, res);
- spin_unlock(&res->spinlock);
+ /* since this lockres is new it doesn't not require the spinlock */
+ __dlm_lockres_grab_inflight_ref(dlm, res);

/* get an extra ref on the mle in case this is a BLOCK
* if so, the creator of the BLOCK may try to put the last

li...@kernel.org

unread,
Jan 27, 2015, 11:20:17 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Miklos Szeredi, Al Viro, Zefan Li
From: Miklos Szeredi <msze...@suse.cz>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit b928095b0a7cff7fb9fcf4c706348ceb8ab2c295 upstream.

If overwriting an empty directory with rename, then need to drop the extra
nlink.

Test prog:

#include <stdio.h>
#include <fcntl.h>
#include <err.h>
#include <sys/stat.h>

int main(void)
{
const char *test_dir1 = "test-dir1";
const char *test_dir2 = "test-dir2";
int res;
int fd;
struct stat statbuf;

res = mkdir(test_dir1, 0777);
if (res == -1)
err(1, "mkdir(\"%s\")", test_dir1);

res = mkdir(test_dir2, 0777);
if (res == -1)
err(1, "mkdir(\"%s\")", test_dir2);

fd = open(test_dir2, O_RDONLY);
if (fd == -1)
err(1, "open(\"%s\")", test_dir2);

res = rename(test_dir1, test_dir2);
if (res == -1)
err(1, "rename(\"%s\", \"%s\")", test_dir1, test_dir2);

res = fstat(fd, &statbuf);
if (res == -1)
err(1, "fstat(%i)", fd);

if (statbuf.st_nlink != 0) {
fprintf(stderr, "nlink is %lu, should be 0\n", statbuf.st_nlink);
return 1;
}

return 0;
}

Signed-off-by: Miklos Szeredi <msze...@suse.cz>
Signed-off-by: Al Viro <vi...@zeniv.linux.org.uk>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
mm/shmem.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 4bb5a80..0b77082 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1725,8 +1725,10 @@ static int shmem_rename(struct inode *old_dir, struct dentry *old_dentry, struct

if (new_dentry->d_inode) {
(void) shmem_unlink(new_dir, new_dentry);
- if (they_are_dirs)
+ if (they_are_dirs) {
+ drop_nlink(new_dentry->d_inode);
drop_nlink(old_dir);
+ }
} else if (they_are_dirs) {
drop_nlink(old_dir);
inc_nlink(new_dir);

li...@kernel.org

unread,
Jan 27, 2015, 11:20:29 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Robin Murphy, Russell King, Zefan Li
From: Robin Murphy <robin....@arm.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 5ca918e5e3f9df4634077c06585c42bc6a8d699a upstream.

The alignment fixup incorrectly decodes faulting ARM VLDn/VSTn
instructions (where the optional alignment hint is given but incorrect)
as LDR/STR, leading to register corruption. Detect these and correctly
treat them as unhandled, so that userspace gets the fault it expects.

Reported-by: Simon Hosie <simon...@arm.com>
Signed-off-by: Robin Murphy <robin....@arm.com>
Signed-off-by: Russell King <rmk+k...@arm.linux.org.uk>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/arm/mm/alignment.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/arm/mm/alignment.c b/arch/arm/mm/alignment.c
index fc000e3..17f4ea2 100644
--- a/arch/arm/mm/alignment.c
+++ b/arch/arm/mm/alignment.c
@@ -39,6 +39,7 @@
* This code is not portable to processors with late data abort handling.
*/
#define CODING_BITS(i) (i & 0x0e000000)
+#define COND_BITS(i) (i & 0xf0000000)

#define LDST_I_BIT(i) (i & (1 << 26)) /* Immediate constant */
#define LDST_P_BIT(i) (i & (1 << 24)) /* Preindex */
@@ -813,6 +814,8 @@ do_alignment(unsigned long addr, unsigned int fsr, struct pt_regs *regs)
break;

case 0x04000000: /* ldr or str immediate */
+ if (COND_BITS(instr) == 0xf0000000) /* NEON VLDn, VSTn */
+ goto bad;
offset.un = OFFSET_BITS(instr);
handler = do_alignment_ldrstr;
break;

li...@kernel.org

unread,
Jan 27, 2015, 11:20:37 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Oleg Nesterov, Eric W. Biederman, Michal Hocko, Pavel Emelyanov, Sergey Dyasly, Andrew Morton, Linus Torvalds, Zefan Li
From: Oleg Nesterov <ol...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 80628ca06c5d42929de6bc22c0a41589a834d151 upstream.

Cleanup and preparation for the next changes.

Move the "if (clone_flags & CLONE_THREAD)" code down under "if
(likely(p->pid))" and turn it into into the "else" branch. This makes the
process/thread initialization more symmetrical and removes one check.

Signed-off-by: Oleg Nesterov <ol...@redhat.com>
Cc: "Eric W. Biederman" <ebie...@xmission.com>
Cc: Michal Hocko <mho...@suse.cz>
Cc: Pavel Emelyanov <xe...@parallels.com>
Cc: Sergey Dyasly <dse...@gmail.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
kernel/fork.c | 15 +++++++--------
1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/kernel/fork.c b/kernel/fork.c
index 878dcb2..0a189c0 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1412,14 +1412,6 @@ static struct task_struct *copy_process(unsigned long clone_flags,
goto bad_fork_free_pid;
}

- if (clone_flags & CLONE_THREAD) {
- current->signal->nr_threads++;
- atomic_inc(&current->signal->live);
- atomic_inc(&current->signal->sigcnt);
- p->group_leader = current->group_leader;
- list_add_tail_rcu(&p->thread_group, &p->group_leader->thread_group);
- }
-
if (likely(p->pid)) {
ptrace_init_task(p, (clone_flags & CLONE_PTRACE) || trace);

@@ -1434,6 +1426,13 @@ static struct task_struct *copy_process(unsigned long clone_flags,
list_add_tail(&p->sibling, &p->real_parent->children);
list_add_tail_rcu(&p->tasks, &init_task.tasks);
__this_cpu_inc(process_counts);
+ } else {
+ current->signal->nr_threads++;
+ atomic_inc(&current->signal->live);
+ atomic_inc(&current->signal->sigcnt);
+ p->group_leader = current->group_leader;
+ list_add_tail_rcu(&p->thread_group,
+ &p->group_leader->thread_group);
}
attach_pid(p, PIDTYPE_PID, pid);
nr_threads++;

li...@kernel.org

unread,
Jan 27, 2015, 11:20:43 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Oleg Nesterov, Eric W. Biederman, Frederic Weisbecker, Mandeep Singh Baines, Ma, Xindong, Michal Hocko, Tu, Xiaobing, Andrew Morton, Linus Torvalds, Zefan Li
From: Oleg Nesterov <ol...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 0c740d0afc3bff0a097ad03a1c8df92757516f5c upstream.

while_each_thread() and next_thread() should die, almost every lockless
usage is wrong.

1. Unless g == current, the lockless while_each_thread() is not safe.

while_each_thread(g, t) can loop forever if g exits, next_thread()
can't reach the unhashed thread in this case. Note that this can
happen even if g is the group leader, it can exec.

2. Even if while_each_thread() itself was correct, people often use
it wrongly.

It was never safe to just take rcu_read_lock() and loop unless
you verify that pid_alive(g) == T, even the first next_thread()
can point to the already freed/reused memory.

This patch adds signal_struct->thread_head and task->thread_node to
create the normal rcu-safe list with the stable head. The new
for_each_thread(g, t) helper is always safe under rcu_read_lock() as
long as this task_struct can't go away.

Note: of course it is ugly to have both task_struct->thread_node and the
old task_struct->thread_group, we will kill it later, after we change
the users of while_each_thread() to use for_each_thread().

Perhaps we can kill it even before we convert all users, we can
reimplement next_thread(t) using the new thread_head/thread_node. But
we can't do this right now because this will lead to subtle behavioural
changes. For example, do/while_each_thread() always sees at least one
task, while for_each_thread() can do nothing if the whole thread group
has died. Or thread_group_empty(), currently its semantics is not clear
unless thread_group_leader(p) and we need to audit the callers before we
can change it.

So this patch adds the new interface which has to coexist with the old
one for some time, hopefully the next changes will be more or less
straightforward and the old one will go away soon.

Signed-off-by: Oleg Nesterov <ol...@redhat.com>
Reviewed-by: Sergey Dyasly <dse...@gmail.com>
Tested-by: Sergey Dyasly <dse...@gmail.com>
Reviewed-by: Sameer Nanda <sna...@chromium.org>
Acked-by: David Rientjes <rien...@google.com>
Cc: "Eric W. Biederman" <ebie...@xmission.com>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Cc: Mandeep Singh Baines <m...@chromium.org>
Cc: "Ma, Xindong" <xindo...@intel.com>
Cc: Michal Hocko <mho...@suse.cz>
Cc: "Tu, Xiaobing" <xiaob...@intel.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
include/linux/init_task.h | 2 ++
include/linux/sched.h | 12 ++++++++++++
kernel/exit.c | 1 +
kernel/fork.c | 7 +++++++
4 files changed, 22 insertions(+)

diff --git a/include/linux/init_task.h b/include/linux/init_task.h
index e7bafa4..b11e298 100644
--- a/include/linux/init_task.h
+++ b/include/linux/init_task.h
@@ -38,6 +38,7 @@ extern struct fs_struct init_fs;

#define INIT_SIGNALS(sig) { \
.nr_threads = 1, \
+ .thread_head = LIST_HEAD_INIT(init_task.thread_node), \
.wait_chldexit = __WAIT_QUEUE_HEAD_INITIALIZER(sig.wait_chldexit),\
.shared_pending = { \
.list = LIST_HEAD_INIT(sig.shared_pending.list), \
@@ -202,6 +203,7 @@ extern struct task_group root_task_group;
[PIDTYPE_SID] = INIT_PID_LINK(PIDTYPE_SID), \
}, \
.thread_group = LIST_HEAD_INIT(tsk.thread_group), \
+ .thread_node = LIST_HEAD_INIT(init_signals.thread_head), \
INIT_IDS \
INIT_PERF_EVENTS(tsk) \
INIT_TRACE_IRQFLAGS \
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 56d8233..d529bd9 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -534,6 +534,7 @@ struct signal_struct {
atomic_t sigcnt;
atomic_t live;
int nr_threads;
+ struct list_head thread_head;

wait_queue_head_t wait_chldexit; /* for wait4() */

@@ -1394,6 +1395,7 @@ struct task_struct {
/* PID/PID hash table linkage. */
struct pid_link pids[PIDTYPE_MAX];
struct list_head thread_group;
+ struct list_head thread_node;

struct completion *vfork_done; /* for vfork() */
int __user *set_child_tid; /* CLONE_CHILD_SETTID */
@@ -2397,6 +2399,16 @@ extern bool current_is_single_threaded(void);
#define while_each_thread(g, t) \
while ((t = next_thread(t)) != g)

+#define __for_each_thread(signal, t) \
+ list_for_each_entry_rcu(t, &(signal)->thread_head, thread_node)
+
+#define for_each_thread(p, t) \
+ __for_each_thread((p)->signal, t)
+
+/* Careful: this is a double loop, 'break' won't work as expected. */
+#define for_each_process_thread(p, t) \
+ for_each_process(p) for_each_thread(p, t)
+
static inline int get_nr_threads(struct task_struct *tsk)
{
return tsk->signal->nr_threads;
diff --git a/kernel/exit.c b/kernel/exit.c
index 3eb4dcf..38980ea 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -74,6 +74,7 @@ static void __unhash_process(struct task_struct *p, bool group_dead)
__this_cpu_dec(process_counts);
}
list_del_rcu(&p->thread_group);
+ list_del_rcu(&p->thread_node);
}

/*
diff --git a/kernel/fork.c b/kernel/fork.c
index 0a189c0..be2495f 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -1026,6 +1026,11 @@ static int copy_signal(unsigned long clone_flags, struct task_struct *tsk)
sig->nr_threads = 1;
atomic_set(&sig->live, 1);
atomic_set(&sig->sigcnt, 1);
+
+ /* list_add(thread_node, thread_head) without INIT_LIST_HEAD() */
+ sig->thread_head = (struct list_head)LIST_HEAD_INIT(tsk->thread_node);
+ tsk->thread_node = (struct list_head)LIST_HEAD_INIT(sig->thread_head);
+
init_waitqueue_head(&sig->wait_chldexit);
if (clone_flags & CLONE_NEWPID)
sig->flags |= SIGNAL_UNKILLABLE;
@@ -1433,6 +1438,8 @@ static struct task_struct *copy_process(unsigned long clone_flags,
p->group_leader = current->group_leader;
list_add_tail_rcu(&p->thread_group,
&p->group_leader->thread_group);
+ list_add_tail_rcu(&p->thread_node,
+ &p->signal->thread_head);

li...@kernel.org

unread,
Jan 27, 2015, 11:20:51 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Michal Hocko, Rafael J. Wysocki, Zefan Li
From: Michal Hocko <mho...@suse.cz>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 5695be142e203167e3cb515ef86a88424f3524eb upstream.

PM freezer relies on having all tasks frozen by the time devices are
getting frozen so that no task will touch them while they are getting
frozen. But OOM killer is allowed to kill an already frozen task in
order to handle OOM situtation. In order to protect from late wake ups
OOM killer is disabled after all tasks are frozen. This, however, still
keeps a window open when a killed task didn't manage to die by the time
freeze_processes finishes.

Reduce the race window by checking all tasks after OOM killer has been
disabled. This is still not race free completely unfortunately because
oom_killer_disable cannot stop an already ongoing OOM killer so a task
might still wake up from the fridge and get killed without
freeze_processes noticing. Full synchronization of OOM and freezer is,
however, too heavy weight for this highly unlikely case.

Introduce and check oom_kills counter which gets incremented early when
the allocator enters __alloc_pages_may_oom path and only check all the
tasks if the counter changes during the freezing attempt. The counter
is updated so early to reduce the race window since allocator checked
oom_killer_disabled which is set by PM-freezing code. A false positive
will push the PM-freezer into a slow path but that is not a big deal.

Changes since v1
- push the re-check loop out of freeze_processes into
check_frozen_processes and invert the condition to make the code more
readable as per Rafael

Fixes: f660daac474c6f (oom: thaw threads if oom killed thread is frozen before deferring)
Signed-off-by: Michal Hocko <mho...@suse.cz>
Signed-off-by: Rafael J. Wysocki <rafael.j...@intel.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
include/linux/oom.h | 4 ++++
kernel/power/process.c | 40 +++++++++++++++++++++++++++++++++++++++-
mm/oom_kill.c | 17 +++++++++++++++++
mm/page_alloc.c | 8 ++++++++
4 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/include/linux/oom.h b/include/linux/oom.h
index 3d76475..d6ed7b0 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -45,6 +45,10 @@ extern int test_set_oom_score_adj(int new_val);

extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *memcg,
const nodemask_t *nodemask, unsigned long totalpages);
+
+extern int oom_kills_count(void);
+extern void note_oom_kill(void);
+
extern int try_set_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);
extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags);

diff --git a/kernel/power/process.c b/kernel/power/process.c
index f27d0c8..286ff57 100644
--- a/kernel/power/process.c
+++ b/kernel/power/process.c
@@ -114,6 +114,28 @@ static int try_to_freeze_tasks(bool user_only)
return todo ? -EBUSY : 0;
}

+/*
+ * Returns true if all freezable tasks (except for current) are frozen already
+ */
+static bool check_frozen_processes(void)
+{
+ struct task_struct *g, *p;
+ bool ret = true;
+
+ read_lock(&tasklist_lock);
+ for_each_process_thread(g, p) {
+ if (p != current && !freezer_should_skip(p) &&
+ !frozen(p)) {
+ ret = false;
+ goto done;
+ }
+ }
+done:
+ read_unlock(&tasklist_lock);
+
+ return ret;
+}
+
/**
* freeze_processes - Signal user space processes to enter the refrigerator.
*
@@ -122,6 +144,7 @@ static int try_to_freeze_tasks(bool user_only)
int freeze_processes(void)
{
int error;
+ int oom_kills_saved;

error = __usermodehelper_disable(UMH_FREEZING);
if (error)
@@ -132,12 +155,27 @@ int freeze_processes(void)

printk("Freezing user space processes ... ");
pm_freezing = true;
+ oom_kills_saved = oom_kills_count();
error = try_to_freeze_tasks(true);
if (!error) {
- printk("done.");
__usermodehelper_set_disable_depth(UMH_DISABLED);
oom_killer_disable();
+
+ /*
+ * There might have been an OOM kill while we were
+ * freezing tasks and the killed task might be still
+ * on the way out so we have to double check for race.
+ */
+ if (oom_kills_count() != oom_kills_saved &&
+ !check_frozen_processes()) {
+ __usermodehelper_set_disable_depth(UMH_ENABLED);
+ printk("OOM in progress.");
+ error = -EBUSY;
+ goto done;
+ }
+ printk("done.");
}
+done:
printk("\n");
BUG_ON(in_atomic());

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 597ecac..cb1f046 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -435,6 +435,23 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
dump_tasks(memcg, nodemask);
}

+/*
+ * Number of OOM killer invocations (including memcg OOM killer).
+ * Primarily used by PM freezer to check for potential races with
+ * OOM killed frozen task.
+ */
+static atomic_t oom_kills = ATOMIC_INIT(0);
+
+int oom_kills_count(void)
+{
+ return atomic_read(&oom_kills);
+}
+
+void note_oom_kill(void)
+{
+ atomic_inc(&oom_kills);
+}
+
#define K(x) ((x) << (PAGE_SHIFT-10))
static void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
unsigned int points, unsigned long totalpages,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ff0b099..2891a90 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1982,6 +1982,14 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
}

/*
+ * PM-freezer should be notified that there might be an OOM killer on
+ * its way to kill and wake somebody up. This is too early and we might
+ * end up not killing anything but false positives are acceptable.
+ * See freeze_processes.
+ */
+ note_oom_kill();
+
+ /*
* Go through the zonelist yet one more time, keep very high watermark
* here, this is only to catch a parallel oom killing, we must fail if
* we're still under heavy pressure.

li...@kernel.org

unread,
Jan 27, 2015, 11:20:59 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Huacai Chen, Binbin Zhou, John Crispin, Steven J. Hill, linux...@linux-mips.org, Fuxin Zhang, Zhangjin Wu, Ralf Baechle, Zefan Li
From: Huacai Chen <che...@lemote.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 8393c524a25609a30129e4a8975cf3b91f6c16a5 upstream.

In commit 2c8c53e28f1 (MIPS: Optimize TLB handlers for Octeon CPUs)
build_r4000_tlb_refill_handler() is modified. But it doesn't compatible
with the original code in HUGETLB case. Because there is a copy & paste
error and one line of code is missing. It is very easy to produce a bug
with LTP's hugemmap05 test.

Signed-off-by: Huacai Chen <che...@lemote.com>
Signed-off-by: Binbin Zhou <zho...@lemote.com>
Cc: John Crispin <jo...@phrozen.org>
Cc: Steven J. Hill <Steve...@imgtec.com>
Cc: linux...@linux-mips.org
Cc: Fuxin Zhang <zha...@lemote.com>
Cc: Zhangjin Wu <wuzha...@gmail.com>
Patchwork: https://patchwork.linux-mips.org/patch/7496/
Signed-off-by: Ralf Baechle <ra...@linux-mips.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/mips/mm/tlbex.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index 0bc485b..f5abdfa 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -1283,6 +1283,7 @@ static void __cpuinit build_r4000_tlb_refill_handler(void)
}
#ifdef CONFIG_HUGETLB_PAGE
uasm_l_tlb_huge_update(&l, p);
+ UASM_i_LW(&p, K0, 0, K1);
build_huge_update_entries(&p, htlb_info.huge_pte, K1);
build_huge_tlb_write_entry(&p, &l, &r, K0, tlb_random,
htlb_info.restore_scratch);

li...@kernel.org

unread,
Jan 27, 2015, 11:21:01 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, David Daney, Huacai Chen, Fuxin Zhang, Zhangjin Wu, linux...@linux-mips.org, Ralf Baechle, Zefan Li
From: David Daney <david...@cavium.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 9e0f162a36914937a937358fcb45e0609ef2bfc4 upstream.

In commit 8393c524a25609 (MIPS: tlbex: Fix a missing statement for
HUGETLB), the TLB Refill handler was fixed so that non-OCTEON targets
would work properly with huge pages. The change was incorrect in that
it broke the OCTEON case.

The problem is shown here:

xxx0: df7a0000 ld k0,0(k1)
.
.
.
xxxc0: df610000 ld at,0(k1)
xxxc4: 335a0ff0 andi k0,k0,0xff0
xxxc8: e825ffcd bbit1 at,0x5,0x0
xxxcc: 003ad82d daddu k1,at,k0
.
.
.

In the non-octeon case there is a destructive test for the huge PTE
bit, and then at 0, $k0 is reloaded (that is what the 8393c524a25609
patch added).

In the octeon case, we modify k1 in the branch delay slot, but we
never need k0 again, so the new load is not needed, but since k1 is
modified, if we do the load, we load from a garbage location and then
get a nested TLB Refill, which is seen in userspace as either SIGBUS
or SIGSEGV (depending on the garbage).

The real fix is to only do this reloading if it is needed, and never
where it is harmful.

Signed-off-by: David Daney <david...@cavium.com>
Cc: Huacai Chen <che...@lemote.com>
Cc: Fuxin Zhang <zha...@lemote.com>
Cc: Zhangjin Wu <wuzha...@gmail.com>
Cc: linux...@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/8151/
Signed-off-by: Ralf Baechle <ra...@linux-mips.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/mips/mm/tlbex.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/mips/mm/tlbex.c b/arch/mips/mm/tlbex.c
index f5abdfa..6d64efe 100644
--- a/arch/mips/mm/tlbex.c
+++ b/arch/mips/mm/tlbex.c
@@ -1041,6 +1041,7 @@ static void __cpuinit build_update_entries(u32 **p, unsigned int tmp,
struct mips_huge_tlb_info {
int huge_pte;
int restore_scratch;
+ bool need_reload_pte;
};

static struct mips_huge_tlb_info __cpuinit
@@ -1055,6 +1056,7 @@ build_fast_tlb_refill_handler (u32 **p, struct uasm_label **l,

rv.huge_pte = scratch;
rv.restore_scratch = 0;
+ rv.need_reload_pte = false;

if (check_for_high_segbits) {
UASM_i_MFC0(p, tmp, C0_BADVADDR);
@@ -1247,6 +1249,7 @@ static void __cpuinit build_r4000_tlb_refill_handler(void)
} else {
htlb_info.huge_pte = K0;
htlb_info.restore_scratch = 0;
+ htlb_info.need_reload_pte = true;
vmalloc_mode = refill_noscratch;
/*
* create the plain linear handler
@@ -1283,7 +1286,8 @@ static void __cpuinit build_r4000_tlb_refill_handler(void)
}
#ifdef CONFIG_HUGETLB_PAGE
uasm_l_tlb_huge_update(&l, p);
- UASM_i_LW(&p, K0, 0, K1);
+ if (htlb_info.need_reload_pte)
+ UASM_i_LW(&p, htlb_info.huge_pte, 0, K1);

li...@kernel.org

unread,
Jan 27, 2015, 11:21:03 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Johannes Berg, Zefan Li
From: Johannes Berg <johann...@intel.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit bd8c78e78d5011d8111bc2533ee73b13a3bd6c42 upstream.

In testmode and vendor command reply/event SKBs we use the
skb cb data to store nl80211 parameters between allocation
and sending. This causes the code for CONFIG_NETLINK_MMAP
to get confused, because it takes ownership of the skb cb
data when the SKB is handed off to netlink, and it doesn't
explicitly clear it.

Clear the skb cb explicitly when we're done and before it
gets passed to netlink to avoid this issue.

Reported-by: Assaf Azulay <assaf....@intel.com>
Reported-by: David Spinadel <david.s...@intel.com>
Signed-off-by: Johannes Berg <johann...@intel.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
net/wireless/nl80211.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/net/wireless/nl80211.c b/net/wireless/nl80211.c
index add9f94..52646f9 100644
--- a/net/wireless/nl80211.c
+++ b/net/wireless/nl80211.c
@@ -5059,6 +5059,9 @@ int cfg80211_testmode_reply(struct sk_buff *skb)
void *hdr = ((void **)skb->cb)[1];
struct nlattr *data = ((void **)skb->cb)[2];

+ /* clear CB data for netlink core to own from now on */
+ memset(skb->cb, 0, sizeof(skb->cb));
+
if (WARN_ON(!rdev->testmode_info)) {
kfree_skb(skb);
return -EINVAL;
@@ -5085,6 +5088,9 @@ void cfg80211_testmode_event(struct sk_buff *skb, gfp_t gfp)
void *hdr = ((void **)skb->cb)[1];
struct nlattr *data = ((void **)skb->cb)[2];

+ /* clear CB data for netlink core to own from now on */
+ memset(skb->cb, 0, sizeof(skb->cb));
+
nla_nest_end(skb, data);
genlmsg_end(skb, hdr);
genlmsg_multicast_netns(wiphy_net(&rdev->wiphy), skb, 0,

li...@kernel.org

unread,
Jan 27, 2015, 11:21:18 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Theodore Ts'o, Zefan Li
From: Theodore Ts'o <ty...@mit.edu>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 36de928641ee48b2078d3fe9514242aaa2f92013 upstream.

If we run into some kind of error, such as ENOMEM, while calling
ext4_getblk() or ext4_dx_find_entry(), we need to make sure this error
gets propagated up to ext4_find_entry() and then to its callers. This
way, transient errors such as ENOMEM can get propagated to the VFS.
This is important so that the system calls return the appropriate
error, and also so that in the case of ext4_lookup(), we return an
error instead of a NULL inode, since that will result in a negative
dentry cache entry that will stick around long past the OOM condition
which caused a transient ENOMEM error.

Google-Bug-Id: #17142205

Signed-off-by: Theodore Ts'o <ty...@mit.edu>
[lizf: Backported to 3.4:
- adjust context
- s/old.bh/old_bh/g
- s/new.bh/new_bh/g
- drop the changes to ext4_find_delete_entry() and ext4_cross_rename()
- add return value check for one more exr4_find_entry() in ext4_rename()]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/ext4/ext4.h | 2 +-
fs/ext4/namei.c | 31 ++++++++++++++++++++++++++++---
2 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 2922486..521ba9d 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -1674,7 +1674,7 @@ ext4_group_first_block_no(struct super_block *sb, ext4_group_t group_no)
/*
* Special error return code only used by dx_probe() and its callers.
*/
-#define ERR_BAD_DX_DIR -75000
+#define ERR_BAD_DX_DIR (-(MAX_ERRNO - 1))

void ext4_get_group_no_and_offset(struct super_block *sb, ext4_fsblk_t blocknr,
ext4_group_t *blockgrpp, ext4_grpblk_t *offsetp);
diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index 54ad9a5..a2efbda 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -861,7 +861,7 @@ static struct buffer_head * ext4_find_entry (struct inode *dir,
buffer */
int num = 0;
ext4_lblk_t nblocks;
- int i, err;
+ int i, err = 0;
int namelen;

*res_dir = NULL;
@@ -886,7 +886,11 @@ static struct buffer_head * ext4_find_entry (struct inode *dir,
* return. Otherwise, fall back to doing a search the
* old fashioned way.
*/
- if (bh || (err != ERR_BAD_DX_DIR))
+ if (err == -ENOENT)
+ return NULL;
+ if (err && err != ERR_BAD_DX_DIR)
+ return ERR_PTR(err);
+ if (bh)
return bh;
dxtrace(printk(KERN_DEBUG "ext4_find_entry: dx failed, "
"falling back\n"));
@@ -917,6 +921,11 @@ restart:
}
num++;
bh = ext4_getblk(NULL, dir, b++, 0, &err);
+ if (unlikely(err)) {
+ if (ra_max == 0)
+ return ERR_PTR(err);
+ break;
+ }
bh_use[ra_max] = bh;
if (bh)
ll_rw_block(READ | REQ_META | REQ_PRIO,
@@ -1026,6 +1035,8 @@ static struct dentry *ext4_lookup(struct inode *dir, struct dentry *dentry, stru
return ERR_PTR(-ENAMETOOLONG);

bh = ext4_find_entry(dir, &dentry->d_name, &de);
+ if (IS_ERR(bh))
+ return (struct dentry *) bh;
inode = NULL;
if (bh) {
__u32 ino = le32_to_cpu(de->inode);
@@ -1063,6 +1074,8 @@ struct dentry *ext4_get_parent(struct dentry *child)
struct buffer_head *bh;

bh = ext4_find_entry(child->d_inode, &dotdot, &de);
+ if (IS_ERR(bh))
+ return (struct dentry *) bh;
if (!bh)
return ERR_PTR(-ENOENT);
ino = le32_to_cpu(de->inode);
@@ -2137,6 +2150,8 @@ static int ext4_rmdir(struct inode *dir, struct dentry *dentry)

retval = -ENOENT;
bh = ext4_find_entry(dir, &dentry->d_name, &de);
+ if (IS_ERR(bh))
+ return PTR_ERR(bh);
if (!bh)
goto end_rmdir;

@@ -2202,6 +2217,8 @@ static int ext4_unlink(struct inode *dir, struct dentry *dentry)

retval = -ENOENT;
bh = ext4_find_entry(dir, &dentry->d_name, &de);
+ if (IS_ERR(bh))
+ return PTR_ERR(bh);
if (!bh)
goto end_unlink;

@@ -2418,6 +2435,8 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
ext4_handle_sync(handle);

old_bh = ext4_find_entry(old_dir, &old_dentry->d_name, &old_de);
+ if (IS_ERR(old_bh))
+ return PTR_ERR(old_bh);
/*
* Check for inode number is _not_ due to possible IO errors.
* We might rmdir the source, keep it as pwd of some process
@@ -2431,6 +2450,10 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,

new_inode = new_dentry->d_inode;
new_bh = ext4_find_entry(new_dir, &new_dentry->d_name, &new_de);
+ if (IS_ERR(new_bh)) {
+ retval = PTR_ERR(new_bh);
+ goto end_rename;
+ }
if (new_bh) {
if (!new_inode) {
brelse(new_bh);
@@ -2509,7 +2532,9 @@ static int ext4_rename(struct inode *old_dir, struct dentry *old_dentry,
struct ext4_dir_entry_2 *old_de2;

old_bh2 = ext4_find_entry(old_dir, &old_dentry->d_name, &old_de2);
- if (old_bh2) {
+ if (IS_ERR(old_bh2)) {
+ retval = PTR_ERR(old_bh2);
+ } else if (old_bh2) {
retval = ext4_delete_entry(handle, old_dir,
old_de2, old_bh2);
brelse(old_bh2);

li...@kernel.org

unread,
Jan 27, 2015, 11:21:20 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Andy Honig, Paolo Bonzini, Zefan Li
From: Andy Honig <aho...@google.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 8b3c3104c3f4f706e99365c3e0d2aa61b95f969f upstream.

The previous patch blocked invalid writes directly when the MSR
is written. As a precaution, prevent future similar mistakes by
gracefulling handle GPs caused by writes to shared MSRs.

Signed-off-by: Andrew Honig <aho...@google.com>
[Remove parts obsoleted by Nadav's patch. - Paolo]
Signed-off-by: Paolo Bonzini <pbon...@redhat.com>
[lizf: Backported to 3.4:
- adjust context
- s/wrmsrl_safe/checking_wrmsrl/]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/x86/include/asm/kvm_host.h | 2 +-
arch/x86/kvm/vmx.c | 7 +++++--
arch/x86/kvm/x86.c | 11 ++++++++---
3 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c1e0ef0..4f78757 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -954,7 +954,7 @@ int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu);
int kvm_cpu_get_interrupt(struct kvm_vcpu *v);

void kvm_define_shared_msr(unsigned index, u32 msr);
-void kvm_set_shared_msr(unsigned index, u64 val, u64 mask);
+int kvm_set_shared_msr(unsigned index, u64 val, u64 mask);

bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 7806435..338f478 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -2210,12 +2210,15 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, u32 msr_index, u64 data)
break;
msr = find_msr_entry(vmx, msr_index);
if (msr) {
+ u64 old_msr_data = msr->data;
msr->data = data;
if (msr - vmx->guest_msrs < vmx->save_nmsrs) {
preempt_disable();
- kvm_set_shared_msr(msr->index, msr->data,
- msr->mask);
+ ret = kvm_set_shared_msr(msr->index, msr->data,
+ msr->mask);
preempt_enable();
+ if (ret)
+ msr->data = old_msr_data;
}
break;
}
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 51cc661..318a245 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -220,19 +220,24 @@ static void kvm_shared_msr_cpu_online(void)
shared_msr_update(i, shared_msrs_global.msrs[i]);
}

-void kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
+int kvm_set_shared_msr(unsigned slot, u64 value, u64 mask)
{
struct kvm_shared_msrs *smsr = &__get_cpu_var(shared_msrs);
+ int err;

if (((value ^ smsr->values[slot].curr) & mask) == 0)
- return;
+ return 0;
smsr->values[slot].curr = value;
- wrmsrl(shared_msrs_global.msrs[slot], value);
+ err = checking_wrmsrl(shared_msrs_global.msrs[slot], value);
+ if (err)
+ return 1;
+
if (!smsr->registered) {
smsr->urn.on_user_return = kvm_on_user_return;
user_return_notifier_register(&smsr->urn);
smsr->registered = true;
}
+ return 0;
}
EXPORT_SYMBOL_GPL(kvm_set_shared_msr);

li...@kernel.org

unread,
Jan 27, 2015, 11:21:24 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Andy Honig, Paolo Bonzini, Zefan Li
From: Andy Honig <aho...@google.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 2febc839133280d5a5e8e1179c94ea674489dae2 upstream.

There's a race condition in the PIT emulation code in KVM. In
__kvm_migrate_pit_timer the pit_timer object is accessed without
synchronization. If the race condition occurs at the wrong time this
can crash the host kernel.

This fixes CVE-2014-3611.

Signed-off-by: Andrew Honig <aho...@google.com>
Signed-off-by: Paolo Bonzini <pbon...@redhat.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/x86/kvm/i8254.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index d68f99d..db336f9 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -263,8 +263,10 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
return;

timer = &pit->pit_state.pit_timer.timer;
+ mutex_lock(&pit->pit_state.lock);
if (hrtimer_cancel(timer))
hrtimer_start_expires(timer, HRTIMER_MODE_ABS);
+ mutex_unlock(&pit->pit_state.lock);
}

static void destroy_pit_timer(struct kvm_pit *pit)

li...@kernel.org

unread,
Jan 27, 2015, 11:21:30 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Nadav Amit, Paolo Bonzini, Zefan Li
From: Nadav Amit <na...@cs.technion.ac.il>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 05c83ec9b73c8124555b706f6af777b10adf0862 upstream.

Relative jumps and calls do the masking according to the operand size, and not
according to the address size as the KVM emulator does today.

This patch fixes KVM behavior.

Signed-off-by: Nadav Amit <na...@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbon...@redhat.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/x86/kvm/emulate.c | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 8375622..fa23346 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -459,11 +459,6 @@ register_address_increment(struct x86_emulate_ctxt *ctxt, unsigned long *reg, in
*reg = (*reg & ~ad_mask(ctxt)) | ((*reg + inc) & ad_mask(ctxt));
}

-static inline void jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
-{
- register_address_increment(ctxt, &ctxt->_eip, rel);
-}
-
static u32 desc_limit_scaled(struct desc_struct *desc)
{
u32 limit = get_desc_limit(desc);
@@ -537,6 +532,28 @@ static int emulate_nm(struct x86_emulate_ctxt *ctxt)
return emulate_exception(ctxt, NM_VECTOR, 0, false);
}

+static inline void assign_eip_near(struct x86_emulate_ctxt *ctxt, ulong dst)
+{
+ switch (ctxt->op_bytes) {
+ case 2:
+ ctxt->_eip = (u16)dst;
+ break;
+ case 4:
+ ctxt->_eip = (u32)dst;
+ break;
+ case 8:
+ ctxt->_eip = dst;
+ break;
+ default:
+ WARN(1, "unsupported eip assignment size\n");
+ }
+}
+
+static inline void jmp_rel(struct x86_emulate_ctxt *ctxt, int rel)
+{
+ assign_eip_near(ctxt, ctxt->_eip + rel);
+}
+
static u16 get_segment_selector(struct x86_emulate_ctxt *ctxt, unsigned seg)
{
u16 selector;

li...@kernel.org

unread,
Jan 27, 2015, 11:21:38 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Eric W. Biederman, Francis Moreau, Zefan Li
From: "Eric W. Biederman" <ebie...@xmission.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit a6138db815df5ee542d848318e5dae681590fccd upstream.

Kenton Varda <ken...@sandstorm.io> discovered that by remounting a
read-only bind mount read-only in a user namespace the
MNT_LOCK_READONLY bit would be cleared, allowing an unprivileged user
to the remount a read-only mount read-write.

Correct this by replacing the mask of mount flags to preserve
with a mask of mount flags that may be changed, and preserve
all others. This ensures that any future bugs with this mask and
remount will fail in an easy to detect way where new mount flags
simply won't change.

Acked-by: Serge E. Hallyn <serge....@ubuntu.com>
Signed-off-by: "Eric W. Biederman" <ebie...@xmission.com>
Cc: Francis Moreau <franci...@gmail.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/namespace.c | 2 +-
include/linux/mount.h | 4 +++-
2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/namespace.c b/fs/namespace.c
index 2985879..f0f2e06 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1681,7 +1681,7 @@ static int do_remount(struct path *path, int flags, int mnt_flags,
err = do_remount_sb(sb, flags, data, 0);
if (!err) {
br_write_lock(vfsmount_lock);
- mnt_flags |= mnt->mnt.mnt_flags & MNT_PROPAGATION_MASK;
+ mnt_flags |= mnt->mnt.mnt_flags & ~MNT_USER_SETTABLE_MASK;
mnt->mnt.mnt_flags = mnt_flags;
br_write_unlock(vfsmount_lock);
}
diff --git a/include/linux/mount.h b/include/linux/mount.h
index d7029f4..2044aac 100644
--- a/include/linux/mount.h
+++ b/include/linux/mount.h
@@ -42,7 +42,9 @@ struct mnt_namespace;
* flag, consider how it interacts with shared mounts.
*/
#define MNT_SHARED_MASK (MNT_UNBINDABLE)
-#define MNT_PROPAGATION_MASK (MNT_SHARED | MNT_UNBINDABLE)
+#define MNT_USER_SETTABLE_MASK (MNT_NOSUID | MNT_NODEV | MNT_NOEXEC \
+ | MNT_NOATIME | MNT_NODIRATIME | MNT_RELATIME \
+ | MNT_READONLY)


#define MNT_INTERNAL 0x4000

li...@kernel.org

unread,
Jan 27, 2015, 11:21:46 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Mikulas Patocka, Mike Snitzer, Zefan Li
From: Mikulas Patocka <mpat...@redhat.com>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit d49ec52ff6ddcda178fc2476a109cf1bd1fa19ed upstream.

The DM crypt target accesses memory beyond allocated space resulting in
a crash on 32 bit x86 systems.

This bug is very old (it dates back to 2.6.25 commit 3a7f6c990ad04 "dm
crypt: use async crypto"). However, this bug was masked by the fact
that kmalloc rounds the size up to the next power of two. This bug
wasn't exposed until 3.17-rc1 commit 298a9fa08a ("dm crypt: use per-bio
data"). By switching to using per-bio data there was no longer any
padding beyond the end of a dm-crypt allocated memory block.

To minimize allocation overhead dm-crypt puts several structures into one
block allocated with kmalloc. The block holds struct ablkcipher_request,
cipher-specific scratch pad (crypto_ablkcipher_reqsize(any_tfm(cc))),
struct dm_crypt_request and an initialization vector.

The variable dmreq_start is set to offset of struct dm_crypt_request
within this memory block. dm-crypt allocates the block with this size:
cc->dmreq_start + sizeof(struct dm_crypt_request) + cc->iv_size.

When accessing the initialization vector, dm-crypt uses the function
iv_of_dmreq, which performs this calculation: ALIGN((unsigned long)(dmreq
+ 1), crypto_ablkcipher_alignmask(any_tfm(cc)) + 1).

dm-crypt allocated "cc->iv_size" bytes beyond the end of dm_crypt_request
structure. However, when dm-crypt accesses the initialization vector, it
takes a pointer to the end of dm_crypt_request, aligns it, and then uses
it as the initialization vector. If the end of dm_crypt_request is not
aligned on a crypto_ablkcipher_alignmask(any_tfm(cc)) boundary the
alignment causes the initialization vector to point beyond the allocated
space.

Fix this bug by calculating the variable iv_size_padding and adding it
to the allocated size.

Also correct the alignment of dm_crypt_request. struct dm_crypt_request
is specific to dm-crypt (it isn't used by the crypto subsystem at all),
so it is aligned on __alignof__(struct dm_crypt_request).

Also align per_bio_data_size on ARCH_KMALLOC_MINALIGN, so that it is
aligned as if the block was allocated with kmalloc.

Reported-by: Krzysztof Kolasa <kko...@winsoft.pl>
Tested-by: Milan Broz <gmaz...@gmail.com>
Signed-off-by: Mikulas Patocka <mpat...@redhat.com>
Signed-off-by: Mike Snitzer <sni...@redhat.com>
[lizf: Backported by Mikulas]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/md/dm-crypt.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 535c3e2..926989d 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1566,6 +1566,7 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
unsigned int key_size, opt_params;
unsigned long long tmpll;
int ret;
+ size_t iv_size_padding;
struct dm_arg_set as;
const char *opt_string;
char dummy;
@@ -1602,12 +1603,23 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)

cc->dmreq_start = sizeof(struct ablkcipher_request);
cc->dmreq_start += crypto_ablkcipher_reqsize(any_tfm(cc));
- cc->dmreq_start = ALIGN(cc->dmreq_start, crypto_tfm_ctx_alignment());
- cc->dmreq_start += crypto_ablkcipher_alignmask(any_tfm(cc)) &
- ~(crypto_tfm_ctx_alignment() - 1);
+ cc->dmreq_start = ALIGN(cc->dmreq_start, __alignof__(struct dm_crypt_request));
+
+ if (crypto_ablkcipher_alignmask(any_tfm(cc)) < CRYPTO_MINALIGN) {
+ /* Allocate the padding exactly */
+ iv_size_padding = -(cc->dmreq_start + sizeof(struct dm_crypt_request))
+ & crypto_ablkcipher_alignmask(any_tfm(cc));
+ } else {
+ /*
+ * If the cipher requires greater alignment than kmalloc
+ * alignment, we don't know the exact position of the
+ * initialization vector. We must assume worst case.
+ */
+ iv_size_padding = crypto_ablkcipher_alignmask(any_tfm(cc));
+ }

cc->req_pool = mempool_create_kmalloc_pool(MIN_IOS, cc->dmreq_start +
- sizeof(struct dm_crypt_request) + cc->iv_size);
+ sizeof(struct dm_crypt_request) + iv_size_padding + cc->iv_size);
if (!cc->req_pool) {
ti->error = "Cannot allocate crypt request mempool";
goto bad;

li...@kernel.org

unread,
Jan 27, 2015, 11:21:51 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Nadav Amit, Paolo Bonzini, Zefan Li
From: Nadav Amit <na...@cs.technion.ac.il>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit d1442d85cc30ea75f7d399474ca738e0bc96f715 upstream.

Far jmp/call/ret may fault while loading a new RIP. Currently KVM does not
handle this case, and may result in failed vm-entry once the assignment is
done. The tricky part of doing so is that loading the new CS affects the
VMCS/VMCB state, so if we fail during loading the new RIP, we are left in
unconsistent state. Therefore, this patch saves on 64-bit the old CS
descriptor and restores it if loading RIP failed.

This fixes CVE-2014-3647.

Signed-off-by: Nadav Amit <na...@cs.technion.ac.il>
Signed-off-by: Paolo Bonzini <pbon...@redhat.com>
[lizf: Backported to 3.4:
- adjust context
- __load_segment_descriptor() doesn't take in_task_switch parameter]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/x86/kvm/emulate.c | 118 ++++++++++++++++++++++++++++++++++++-------------
1 file changed, 88 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1089e43..e125c3e 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1252,7 +1252,9 @@ static int write_segment_descriptor(struct x86_emulate_ctxt *ctxt,

/* Does not support long mode */
static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
- u16 selector, int seg, u8 cpl)
+ u16 selector, int seg, u8 cpl,
+ bool in_task_switch,
+ struct desc_struct *desc)
{
struct desc_struct seg_desc;
u8 dpl, rpl;
@@ -1362,6 +1364,8 @@ static int __load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
}
load:
ctxt->ops->set_segment(ctxt, selector, &seg_desc, 0, seg);
+ if (desc)
+ *desc = seg_desc;
return X86EMUL_CONTINUE;
exception:
emulate_exception(ctxt, err_vec, err_code, true);
@@ -1372,7 +1376,7 @@ static int load_segment_descriptor(struct x86_emulate_ctxt *ctxt,
u16 selector, int seg)
{
u8 cpl = ctxt->ops->cpl(ctxt);
- return __load_segment_descriptor(ctxt, selector, seg, cpl);
+ return __load_segment_descriptor(ctxt, selector, seg, cpl, false, NULL);
}

static void write_register_operand(struct operand *op)
@@ -1714,17 +1718,31 @@ static int em_iret(struct x86_emulate_ctxt *ctxt)
static int em_jmp_far(struct x86_emulate_ctxt *ctxt)
{
int rc;
- unsigned short sel;
+ unsigned short sel, old_sel;
+ struct desc_struct old_desc, new_desc;
+ const struct x86_emulate_ops *ops = ctxt->ops;
+ u8 cpl = ctxt->ops->cpl(ctxt);
+
+ /* Assignment of RIP may only fail in 64-bit mode */
+ if (ctxt->mode == X86EMUL_MODE_PROT64)
+ ops->get_segment(ctxt, &old_sel, &old_desc, NULL,
+ VCPU_SREG_CS);

memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);

- rc = load_segment_descriptor(ctxt, sel, VCPU_SREG_CS);
+ rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl, false,
+ &new_desc);
if (rc != X86EMUL_CONTINUE)
return rc;

- ctxt->_eip = 0;
- memcpy(&ctxt->_eip, ctxt->src.valptr, ctxt->op_bytes);
- return X86EMUL_CONTINUE;
+ rc = assign_eip_far(ctxt, ctxt->src.val, new_desc.l);
+ if (rc != X86EMUL_CONTINUE) {
+ WARN_ON(!ctxt->mode != X86EMUL_MODE_PROT64);
+ /* assigning eip failed; restore the old cs */
+ ops->set_segment(ctxt, old_sel, &old_desc, 0, VCPU_SREG_CS);
+ return rc;
+ }
+ return rc;
}

static int em_grp2(struct x86_emulate_ctxt *ctxt)
@@ -1871,17 +1889,30 @@ static int em_ret(struct x86_emulate_ctxt *ctxt)
static int em_ret_far(struct x86_emulate_ctxt *ctxt)
{
int rc;
- unsigned long cs;
+ unsigned long eip, cs;
+ u16 old_cs;
+ struct desc_struct old_desc, new_desc;
+ const struct x86_emulate_ops *ops = ctxt->ops;
+
+ if (ctxt->mode == X86EMUL_MODE_PROT64)
+ ops->get_segment(ctxt, &old_cs, &old_desc, NULL,
+ VCPU_SREG_CS);

- rc = emulate_pop(ctxt, &ctxt->_eip, ctxt->op_bytes);
+ rc = emulate_pop(ctxt, &eip, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
- if (ctxt->op_bytes == 4)
- ctxt->_eip = (u32)ctxt->_eip;
rc = emulate_pop(ctxt, &cs, ctxt->op_bytes);
if (rc != X86EMUL_CONTINUE)
return rc;
- rc = load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS);
+ rc = __load_segment_descriptor(ctxt, (u16)cs, VCPU_SREG_CS, 0, false,
+ &new_desc);
+ if (rc != X86EMUL_CONTINUE)
+ return rc;
+ rc = assign_eip_far(ctxt, eip, new_desc.l);
+ if (rc != X86EMUL_CONTINUE) {
+ WARN_ON(!ctxt->mode != X86EMUL_MODE_PROT64);
+ ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
+ }
return rc;
}

@@ -2296,19 +2327,24 @@ static int load_state_from_tss16(struct x86_emulate_ctxt *ctxt,
* Now load segment descriptors. If fault happenes at this stage
* it is handled in a context of new task
*/
- ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ldt, VCPU_SREG_LDTR, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;

@@ -2434,25 +2470,32 @@ static int load_state_from_tss32(struct x86_emulate_ctxt *ctxt,
* Now load segment descriptors. If fault happenes at this stage
* it is handled in a context of new task
*/
- ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ldt_selector, VCPU_SREG_LDTR,
+ cpl, true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->es, VCPU_SREG_ES, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->cs, VCPU_SREG_CS, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ss, VCPU_SREG_SS, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->ds, VCPU_SREG_DS, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->fs, VCPU_SREG_FS, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;
- ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl);
+ ret = __load_segment_descriptor(ctxt, tss->gs, VCPU_SREG_GS, cpl,
+ true, NULL);
if (ret != X86EMUL_CONTINUE)
return ret;

@@ -2689,24 +2732,39 @@ static int em_call_far(struct x86_emulate_ctxt *ctxt)
u16 sel, old_cs;
ulong old_eip;
int rc;
+ struct desc_struct old_desc, new_desc;
+ const struct x86_emulate_ops *ops = ctxt->ops;
+ int cpl = ctxt->ops->cpl(ctxt);

- old_cs = get_segment_selector(ctxt, VCPU_SREG_CS);
old_eip = ctxt->_eip;
+ ops->get_segment(ctxt, &old_cs, &old_desc, NULL, VCPU_SREG_CS);

memcpy(&sel, ctxt->src.valptr + ctxt->op_bytes, 2);
- if (load_segment_descriptor(ctxt, sel, VCPU_SREG_CS))
+ rc = __load_segment_descriptor(ctxt, sel, VCPU_SREG_CS, cpl, false,
+ &new_desc);
+ if (rc != X86EMUL_CONTINUE)
return X86EMUL_CONTINUE;

- ctxt->_eip = 0;
- memcpy(&ctxt->_eip, ctxt->src.valptr, ctxt->op_bytes);
+ rc = assign_eip_far(ctxt, ctxt->src.val, new_desc.l);
+ if (rc != X86EMUL_CONTINUE)
+ goto fail;

ctxt->src.val = old_cs;
rc = em_push(ctxt);
if (rc != X86EMUL_CONTINUE)
- return rc;
+ goto fail;

ctxt->src.val = old_eip;
- return em_push(ctxt);
+ rc = em_push(ctxt);
+ /* If we failed, we tainted the memory, but the very least we should
+ restore cs */
+ if (rc != X86EMUL_CONTINUE)
+ goto fail;
+ return rc;
+fail:
+ ops->set_segment(ctxt, old_cs, &old_desc, 0, VCPU_SREG_CS);
+ return rc;
+
}

static int em_ret_near_imm(struct x86_emulate_ctxt *ctxt)

li...@kernel.org

unread,
Jan 27, 2015, 11:22:06 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Michael S. Tsirkin, Paolo Bonzini, Zefan Li
From: "Michael S. Tsirkin" <m...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 2bc19dc3754fc066c43799659f0d848631c44cfe upstream.

KVM_EXIT_UNKNOWN is a kvm bug, we don't really know whether it was
triggered by a priveledged application. Let's not kill the guest: WARN
and inject #UD instead.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
Signed-off-by: Paolo Bonzini <pbon...@redhat.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/x86/kvm/svm.c | 6 +++---
arch/x86/kvm/vmx.c | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 1cd908c..86c74c0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -3494,9 +3494,9 @@ static int handle_exit(struct kvm_vcpu *vcpu)

if (exit_code >= ARRAY_SIZE(svm_exit_handlers)
|| !svm_exit_handlers[exit_code]) {
- kvm_run->exit_reason = KVM_EXIT_UNKNOWN;
- kvm_run->hw.hardware_exit_reason = exit_code;
- return 0;
+ WARN_ONCE(1, "vmx: unexpected exit reason 0x%x\n", exit_code);
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
}

return svm_exit_handlers[exit_code](svm);
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 43c39db..2eb4e5a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -5936,10 +5936,10 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
&& kvm_vmx_exit_handlers[exit_reason])
return kvm_vmx_exit_handlers[exit_reason](vcpu);
else {
- vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
- vcpu->run->hw.hardware_exit_reason = exit_reason;
+ WARN_ONCE(1, "vmx: unexpected exit reason 0x%x\n", exit_reason);
+ kvm_queue_exception(vcpu, UD_VECTOR);
+ return 1;
}
- return 0;
}

static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)

li...@kernel.org

unread,
Jan 27, 2015, 11:22:11 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Quentin Casasnovas, Vegard Nossum, Jamie Iles, Paolo Bonzini, Zefan Li
From: Quentin Casasnovas <quentin.c...@oracle.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 3d32e4dbe71374a6780eaf51d719d76f9a9bf22f upstream.

The third parameter of kvm_unpin_pages() when called from
kvm_iommu_map_pages() is wrong, it should be the number of pages to un-pin
and not the page size.

This error was facilitated with an inconsistent API: kvm_pin_pages() takes
a size, but kvn_unpin_pages() takes a number of pages, so fix the problem
by matching the two.

This was introduced by commit 350b8bd ("kvm: iommu: fix the third parameter
of kvm_iommu_put_pages (CVE-2014-3601)"), which fixes the lack of
un-pinning for pages intended to be un-pinned (i.e. memory leak) but
unfortunately potentially aggravated the number of pages we un-pin that
should have stayed pinned. As far as I understand though, the same
practical mitigations apply.

This issue was found during review of Red Hat 6.6 patches to prepare
Ksplice rebootless updates.

Thanks to Vegard for his time on a late Friday evening to help me in
understanding this code.

Fixes: 350b8bd ("kvm: iommu: fix the third parameter of... (CVE-2014-3601)")
Signed-off-by: Quentin Casasnovas <quentin.c...@oracle.com>
Signed-off-by: Vegard Nossum <vegard...@oracle.com>
Signed-off-by: Jamie Iles <jamie...@oracle.com>
Reviewed-by: Sasha Levin <sasha...@oracle.com>
Signed-off-by: Paolo Bonzini <pbon...@redhat.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
virt/kvm/iommu.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/virt/kvm/iommu.c b/virt/kvm/iommu.c
index defc9ba..3225903 100644
--- a/virt/kvm/iommu.c
+++ b/virt/kvm/iommu.c
@@ -43,13 +43,13 @@ static void kvm_iommu_put_pages(struct kvm *kvm,
gfn_t base_gfn, unsigned long npages);

static pfn_t kvm_pin_pages(struct kvm *kvm, struct kvm_memory_slot *slot,
- gfn_t gfn, unsigned long size)
+ gfn_t gfn, unsigned long npages)
{
gfn_t end_gfn;
pfn_t pfn;

pfn = gfn_to_pfn_memslot(kvm, slot, gfn);
- end_gfn = gfn + (size >> PAGE_SHIFT);
+ end_gfn = gfn + npages;
gfn += 1;

if (is_error_pfn(pfn))
@@ -117,7 +117,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
* Pin all pages we are about to map in memory. This is
* important because we unmap and unpin in 4kb steps later.
*/
- pfn = kvm_pin_pages(kvm, slot, gfn, page_size);
+ pfn = kvm_pin_pages(kvm, slot, gfn, page_size >> PAGE_SHIFT);
if (is_error_pfn(pfn)) {
gfn += 1;
continue;
@@ -129,7 +129,7 @@ int kvm_iommu_map_pages(struct kvm *kvm, struct kvm_memory_slot *slot)
if (r) {
printk(KERN_ERR "kvm_iommu_map_address:"
"iommu failed to map pfn=%llx\n", pfn);
- kvm_unpin_pages(kvm, pfn, page_size);
+ kvm_unpin_pages(kvm, pfn, page_size >> PAGE_SHIFT);
goto unmap_pages;

li...@kernel.org

unread,
Jan 27, 2015, 11:22:20 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Hannes Frederic Sowa, David S. Miller, Ben Hutchings, Zefan Li
From: Hannes Frederic Sowa <han...@stressinduktion.org>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 916e4cf46d0204806c062c8c6c4d1f633852c5b6 upstream.

Currently we generate a new fragmentation id on UFO segmentation. It
is pretty hairy to identify the correct net namespace and dst there.
Especially tunnels use IFF_XMIT_DST_RELEASE and thus have no skb_dst
available at all.

This causes unreliable or very predictable ipv6 fragmentation id
generation while segmentation.

Luckily we already have pregenerated the ip6_frag_id in
ip6_ufo_append_data and can use it here.

Signed-off-by: Hannes Frederic Sowa <han...@stressinduktion.org>
Signed-off-by: David S. Miller <da...@davemloft.net>
[bwh: Backported to 3.2: adjust filename, indentation]
Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
net/ipv6/udp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 98fd738..ef9052f 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1366,7 +1366,7 @@ static struct sk_buff *udp6_ufo_fragment(struct sk_buff *skb,
fptr = (struct frag_hdr *)(skb_network_header(skb) + unfrag_ip6hlen);
fptr->nexthdr = nexthdr;
fptr->reserved = 0;
- ipv6_select_ident(fptr, (struct rt6_info *)skb_dst(skb));
+ fptr->identification = skb_shinfo(skb)->ip6_frag_id;

/* Fragment the skb. ipv6 header and the remaining fields of the
* fragment header are updated in ipv6_gso_segment()

li...@kernel.org

unread,
Jan 27, 2015, 11:22:27 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Benjamin Poirier, Tom Herbert, David S. Miller, Zefan Li
From: Benjamin Poirier <bpoi...@suse.de>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit cdb3f4a31b64c3a1c6eef40bc01ebc9594c58a8c upstream.

There are many cases where this feature does not improve performance or even
reduces it.

For example, here are the results from tests that I've run using 3.12.6 on one
Intel Xeon W3565 and one i7 920 connected by ixgbe adapters. The results are
from the Xeon, but they're similar on the i7. All numbers report the
mean±stddev over 10 runs of 10s.

1) latency tests similar to what is described in "c6e1a0d net: Allow no-cache
copy from user on transmit"
There is no statistically significant difference between tx-nocache-copy
on/off.
nic irqs spread out (one queue per cpu)

200x netperf -r 1400,1
tx-nocache-copy off
692000±1000 tps
50/90/95/99% latency (us): 275±2/643.8±0.4/799±1/2474.4±0.3
tx-nocache-copy on
693000±1000 tps
50/90/95/99% latency (us): 274±1/644.1±0.7/800±2/2474.5±0.7

200x netperf -r 14000,14000
tx-nocache-copy off
86450±80 tps
50/90/95/99% latency (us): 334.37±0.02/838±1/2100±20/3990±40
tx-nocache-copy on
86110±60 tps
50/90/95/99% latency (us): 334.28±0.01/837±2/2110±20/3990±20

2) single stream throughput tests
tx-nocache-copy leads to higher service demand

throughput cpu0 cpu1 demand
(Gb/s) (Gcycle) (Gcycle) (cycle/B)

nic irqs and netperf on cpu0 (1x netperf -T0,0 -t omni -- -d send)

tx-nocache-copy off 9402±5 9.4±0.2 0.80±0.01
tx-nocache-copy on 9403±3 9.85±0.04 0.838±0.004

nic irqs on cpu0, netperf on cpu1 (1x netperf -T1,1 -t omni -- -d send)

tx-nocache-copy off 9401±5 5.83±0.03 5.0±0.1 0.923±0.007
tx-nocache-copy on 9404±2 5.74±0.03 5.523±0.009 0.958±0.002

As a second example, here are some results from Eric Dumazet with latest
net-next.
tx-nocache-copy also leads to higher service demand

(cpu is Intel(R) Xeon(R) CPU X5660 @ 2.80GHz)

lpq83:~# ./ethtool -K eth0 tx-nocache-copy on
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB

87380 16384 16384 10.00 9407.44 2.50 -1.00 0.522 -1.000

Performance counter stats for './netperf -H lpq84 -c':

4282.648396 task-clock # 0.423 CPUs utilized
9,348 context-switches # 0.002 M/sec
88 CPU-migrations # 0.021 K/sec
355 page-faults # 0.083 K/sec
11,812,797,651 cycles # 2.758 GHz [82.79%]
9,020,522,817 stalled-cycles-frontend # 76.36% frontend cycles idle [82.54%]
4,579,889,681 stalled-cycles-backend # 38.77% backend cycles idle [67.33%]
6,053,172,792 instructions # 0.51 insns per cycle
# 1.49 stalled cycles per insn [83.64%]
597,275,583 branches # 139.464 M/sec [83.70%]
8,960,541 branch-misses # 1.50% of all branches [83.65%]

10.128990264 seconds time elapsed

lpq83:~# ./ethtool -K eth0 tx-nocache-copy off
lpq83:~# perf stat ./netperf -H lpq84 -c
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to lpq84.prod.google.com () port 0 AF_INET
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % U us/KB us/KB

87380 16384 16384 10.00 9412.45 2.15 -1.00 0.449 -1.000

Performance counter stats for './netperf -H lpq84 -c':

2847.375441 task-clock # 0.281 CPUs utilized
11,632 context-switches # 0.004 M/sec
49 CPU-migrations # 0.017 K/sec
354 page-faults # 0.124 K/sec
7,646,889,749 cycles # 2.686 GHz [83.34%]
6,115,050,032 stalled-cycles-frontend # 79.97% frontend cycles idle [83.31%]
1,726,460,071 stalled-cycles-backend # 22.58% backend cycles idle [66.55%]
2,079,702,453 instructions # 0.27 insns per cycle
# 2.94 stalled cycles per insn [83.22%]
363,773,213 branches # 127.757 M/sec [83.29%]
4,242,732 branch-misses # 1.17% of all branches [83.51%]

10.128449949 seconds time elapsed

CC: Tom Herbert <ther...@google.com>
Signed-off-by: Benjamin Poirier <bpoi...@suse.de>
Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
net/core/dev.c | 5 -----
1 file changed, 5 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index b47375d..0770364 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -5546,13 +5546,8 @@ int register_netdevice(struct net_device *dev)
dev->features |= NETIF_F_SOFT_FEATURES;
dev->wanted_features = dev->features & dev->hw_features;

- /* Turn on no cache copy if HW is doing checksum */
if (!(dev->flags & IFF_LOOPBACK)) {
dev->hw_features |= NETIF_F_NOCACHE_COPY;
- if (dev->features & NETIF_F_ALL_CSUM) {
- dev->wanted_features |= NETIF_F_NOCACHE_COPY;
- dev->features |= NETIF_F_NOCACHE_COPY;
- }
}

/* Make NETIF_F_HIGHDMA inheritable to VLAN devices.

li...@kernel.org

unread,
Jan 27, 2015, 11:22:31 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Lars-Peter Clausen, Jonathan Cameron, Zefan Li
From: Lars-Peter Clausen <la...@metafoo.de>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 6822ee34ad57b29a3b44df2c2829910f03c34fa4 upstream.

"raw" is the name of a channel property, but should not be part of the
channel name itself.

Signed-off-by: Lars-Peter Clausen <la...@metafoo.de>
Signed-off-by: Jonathan Cameron <ji...@kernel.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/staging/iio/impedance-analyzer/ad5933.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/iio/impedance-analyzer/ad5933.c b/drivers/staging/iio/impedance-analyzer/ad5933.c
index 06b9fe2..2db80b1 100644
--- a/drivers/staging/iio/impedance-analyzer/ad5933.c
+++ b/drivers/staging/iio/impedance-analyzer/ad5933.c
@@ -124,7 +124,7 @@ static struct iio_chan_spec ad5933_channels[] = {
.type = IIO_VOLTAGE,
.indexed = 1,
.channel = 0,
- .extend_name = "real_raw",
+ .extend_name = "real",
.info_mask = IIO_CHAN_INFO_SCALE_SEPARATE_BIT,
.address = AD5933_REG_REAL_DATA,
.scan_index = 0,
@@ -137,7 +137,7 @@ static struct iio_chan_spec ad5933_channels[] = {
.type = IIO_VOLTAGE,
.indexed = 1,
.channel = 0,
- .extend_name = "imag_raw",
+ .extend_name = "imag",
.info_mask = IIO_CHAN_INFO_SCALE_SEPARATE_BIT,
.address = AD5933_REG_IMAG_DATA,
.scan_index = 1,

li...@kernel.org

unread,
Jan 27, 2015, 11:22:40 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Guillaume Nault, David S. Miller, Zefan Li
From: Guillaume Nault <g.n...@alphalink.fr>

3.4.105-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit eed4d839b0cdf9d84b0a9bc63de90fd5e1e886fb upstream.

Use dst_entry held by sk_dst_get() to retrieve tunnel's PMTU.

The dst_mtu(__sk_dst_get(tunnel->sock)) call was racy. __sk_dst_get()
could return NULL if tunnel->sock->sk_dst_cache was reset just before the
call, thus making dst_mtu() dereference a NULL pointer:

[ 1937.661598] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[ 1937.664005] IP: [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] PGD daf0c067 PUD d9f93067 PMD 0
[ 1937.664005] Oops: 0000 [#1] SMP
[ 1937.664005] Modules linked in: l2tp_ppp l2tp_netlink l2tp_core ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables udp_tunnel pppoe pppox ppp_generic slhc deflate ctr twofish_generic twofish_x86_64_3way xts lrw gf128mul glue_helper twofish_x86_64 twofish_common blowfish_generic blowfish_x86_64 blowfish_common des_generic cbc xcbc rmd160 sha512_generic hmac crypto_null af_key xfrm_algo 8021q garp bridge stp llc tun atmtcp clip atm ext3 mbcache jbd iTCO_wdt coretemp kvm_intel iTCO_vendor_support kvm pcspkr evdev ehci_pci lpc_ich mfd_core i5400_edac edac_core i5k_amb shpchp button processor thermal_sys xfs crc32c_generic libcrc32c dm_mod usbhid sg hid sr_mod sd_mod cdrom crc_t10dif crct10dif_common ata_generic ahci ata_piix tg3 libahci libata uhci_hcd ptp ehci_hcd pps_core usbcore scsi_mod libphy usb_common [last unloaded: l2tp_core]
[ 1937.664005] CPU: 0 PID: 10022 Comm: l2tpstress Tainted: G O 3.17.0-rc1 #1
[ 1937.664005] Hardware name: HP ProLiant DL160 G5, BIOS O12 08/22/2008
[ 1937.664005] task: ffff8800d8fda790 ti: ffff8800c43c4000 task.ti: ffff8800c43c4000
[ 1937.664005] RIP: 0010:[<ffffffffa049db88>] [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] RSP: 0018:ffff8800c43c7de8 EFLAGS: 00010282
[ 1937.664005] RAX: ffff8800da8a7240 RBX: ffff8800d8c64600 RCX: 000001c325a137b5
[ 1937.664005] RDX: 8c6318c6318c6320 RSI: 000000000000010c RDI: 0000000000000000
[ 1937.664005] RBP: ffff8800c43c7ea8 R08: 0000000000000000 R09: 0000000000000000
[ 1937.664005] R10: ffffffffa048e2c0 R11: ffff8800d8c64600 R12: ffff8800ca7a5000
[ 1937.664005] R13: ffff8800c439bf40 R14: 000000000000000c R15: 0000000000000009
[ 1937.664005] FS: 00007fd7f610f700(0000) GS:ffff88011a600000(0000) knlGS:0000000000000000
[ 1937.664005] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 1937.664005] CR2: 0000000000000020 CR3: 00000000d9d75000 CR4: 00000000000027e0
[ 1937.664005] Stack:
[ 1937.664005] ffffffffa049da80 ffff8800d8fda790 000000000000005b ffff880000000009
[ 1937.664005] ffff8800daf3f200 0000000000000003 ffff8800c43c7e48 ffffffff81109b57
[ 1937.664005] ffffffff81109b0e ffffffff8114c566 0000000000000000 0000000000000000
[ 1937.664005] Call Trace:
[ 1937.664005] [<ffffffffa049da80>] ? pppol2tp_connect+0x235/0x41e [l2tp_ppp]
[ 1937.664005] [<ffffffff81109b57>] ? might_fault+0x9e/0xa5
[ 1937.664005] [<ffffffff81109b0e>] ? might_fault+0x55/0xa5
[ 1937.664005] [<ffffffff8114c566>] ? rcu_read_unlock+0x1c/0x26
[ 1937.664005] [<ffffffff81309196>] SYSC_connect+0x87/0xb1
[ 1937.664005] [<ffffffff813e56f7>] ? sysret_check+0x1b/0x56
[ 1937.664005] [<ffffffff8107590d>] ? trace_hardirqs_on_caller+0x145/0x1a1
[ 1937.664005] [<ffffffff81213dee>] ? trace_hardirqs_on_thunk+0x3a/0x3f
[ 1937.664005] [<ffffffff8114c262>] ? spin_lock+0x9/0xb
[ 1937.664005] [<ffffffff813092b4>] SyS_connect+0x9/0xb
[ 1937.664005] [<ffffffff813e56d2>] system_call_fastpath+0x16/0x1b
[ 1937.664005] Code: 10 2a 84 81 e8 65 76 bd e0 65 ff 0c 25 10 bb 00 00 4d 85 ed 74 37 48 8b 85 60 ff ff ff 48 8b 80 88 01 00 00 48 8b b8 10 02 00 00 <48> 8b 47 20 ff 50 20 85 c0 74 0f 83 e8 28 89 83 10 01 00 00 89
[ 1937.664005] RIP [<ffffffffa049db88>] pppol2tp_connect+0x33d/0x41e [l2tp_ppp]
[ 1937.664005] RSP <ffff8800c43c7de8>
[ 1937.664005] CR2: 0000000000000020
[ 1939.559375] ---[ end trace 82d44500f28f8708 ]---

Fixes: f34c4a35d879 ("l2tp: take PMTU from tunnel UDP socket")
Signed-off-by: Guillaume Nault <g.n...@alphalink.fr>
Acked-by: Eric Dumazet <edum...@google.com>
Signed-off-by: David S. Miller <da...@davemloft.net>
Cc: Guillaume Nault <g.n...@alphalink.fr>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
net/l2tp/l2tp_ppp.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/l2tp/l2tp_ppp.c b/net/l2tp/l2tp_ppp.c
index 82ed7df..cf7821b7 100644
--- a/net/l2tp/l2tp_ppp.c
+++ b/net/l2tp/l2tp_ppp.c
@@ -774,7 +774,8 @@ static int pppol2tp_connect(struct socket *sock, struct sockaddr *uservaddr,
/* If PMTU discovery was enabled, use the MTU that was discovered */
dst = sk_dst_get(tunnel->sock);
if (dst != NULL) {
- u32 pmtu = dst_mtu(__sk_dst_get(tunnel->sock));
+ u32 pmtu = dst_mtu(dst);
+
if (pmtu != 0)
session->mtu = session->mru = pmtu -
PPPOL2TP_HEADER_OVERHEAD;

li...@kernel.org

unread,
Jan 27, 2015, 11:22:46 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Nathaniel Ting, Johan Hovold, Zefan Li
From: Nathaniel Ting <nathani...@silabs.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 35cc83eab097e5720a9cc0ec12bdc3a726f58381 upstream.

Enable Silicon Labs Ember VID chips to enumerate with the cp210x usb serial
driver. EM358x devices operating with the Ember Z-Net 5.1.2 stack may now
connect to host PCs over a USB serial link.

Signed-off-by: Nathaniel Ting <nathani...@silabs.com>
Signed-off-by: Johan Hovold <jo...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/serial/cp210x.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/usb/serial/cp210x.c b/drivers/usb/serial/cp210x.c
index 37bca6b..19074db 100644
--- a/drivers/usb/serial/cp210x.c
+++ b/drivers/usb/serial/cp210x.c
@@ -161,6 +161,7 @@ static const struct usb_device_id id_table[] = {
{ USB_DEVICE(0x18EF, 0xE00F) }, /* ELV USB-I2C-Interface */
{ USB_DEVICE(0x1ADB, 0x0001) }, /* Schweitzer Engineering C662 Cable */
{ USB_DEVICE(0x1B1C, 0x1C00) }, /* Corsair USB Dongle */
+ { USB_DEVICE(0x1BA4, 0x0002) }, /* Silicon Labs 358x factory default */
{ USB_DEVICE(0x1BE3, 0x07A6) }, /* WAGO 750-923 USB Service Cable */
{ USB_DEVICE(0x1D6F, 0x0010) }, /* Seluxit ApS RF Dongle */
{ USB_DEVICE(0x1E29, 0x0102) }, /* Festo CPX-USB */

li...@kernel.org

unread,
Jan 27, 2015, 11:22:59 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Frans Klaver, Johan Hovold, Zefan Li
From: Frans Klaver <frans....@xsens.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit edd74ffab1f6909eee400c7de8ce621870aacac9 upstream.

Add new IDs for the Xsens Awinda Station and Awinda Dongle.

While at it, order the definitions by PID and add a logical separation
between devices using Xsens' VID and those using FTDI's VID.

Signed-off-by: Frans Klaver <frans....@xsens.com>
Signed-off-by: Johan Hovold <jo...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/serial/ftdi_sio.c | 2 ++
drivers/usb/serial/ftdi_sio_ids.h | 6 +++++-
2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index c7c980b..7736459 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -685,6 +685,8 @@ static struct usb_device_id id_table_combined [] = {
{ USB_DEVICE(FTDI_VID, XSENS_CONVERTER_5_PID) },
{ USB_DEVICE(FTDI_VID, XSENS_CONVERTER_6_PID) },
{ USB_DEVICE(FTDI_VID, XSENS_CONVERTER_7_PID) },
+ { USB_DEVICE(XSENS_VID, XSENS_AWINDA_DONGLE_PID) },
+ { USB_DEVICE(XSENS_VID, XSENS_AWINDA_STATION_PID) },
{ USB_DEVICE(XSENS_VID, XSENS_CONVERTER_PID) },
{ USB_DEVICE(XSENS_VID, XSENS_MTW_PID) },
{ USB_DEVICE(FTDI_VID, FTDI_OMNI1509) },
diff --git a/drivers/usb/serial/ftdi_sio_ids.h b/drivers/usb/serial/ftdi_sio_ids.h
index 02abda9..bd2a2c9 100644
--- a/drivers/usb/serial/ftdi_sio_ids.h
+++ b/drivers/usb/serial/ftdi_sio_ids.h
@@ -143,8 +143,12 @@
* Xsens Technologies BV products (http://www.xsens.com).
*/
#define XSENS_VID 0x2639
-#define XSENS_CONVERTER_PID 0xD00D /* Xsens USB-serial converter */
+#define XSENS_AWINDA_STATION_PID 0x0101
+#define XSENS_AWINDA_DONGLE_PID 0x0102
#define XSENS_MTW_PID 0x0200 /* Xsens MTw */
+#define XSENS_CONVERTER_PID 0xD00D /* Xsens USB-serial converter */
+
+/* Xsens devices using FTDI VID */
#define XSENS_CONVERTER_0_PID 0xD388 /* Xsens USB converter */
#define XSENS_CONVERTER_1_PID 0xD389 /* Xsens Wireless Receiver */
#define XSENS_CONVERTER_2_PID 0xD38A

li...@kernel.org

unread,
Jan 27, 2015, 11:23:05 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Jan Kara, Jens Axboe, linux...@vger.kernel.org, Jens Axboe, Zefan Li
From: Jan Kara <ja...@suse.cz>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 84ce0f0e94ac97217398b3b69c21c7a62ebeed05 upstream.

When sg_scsi_ioctl() fails to prepare request to submit in
blk_rq_map_kern() we jump to a label where we just end up copying
(luckily zeroed-out) kernel buffer to userspace instead of reporting
error. Fix the problem by jumping to the right label.

CC: Jens Axboe <ax...@kernel.dk>
CC: linux...@vger.kernel.org
Coverity-id: 1226871
Signed-off-by: Jan Kara <ja...@suse.cz>

Fixed up the, now unused, out label.

Signed-off-by: Jens Axboe <ax...@fb.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
block/scsi_ioctl.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/scsi_ioctl.c b/block/scsi_ioctl.c
index 9a87daa..f1c00c9 100644
--- a/block/scsi_ioctl.c
+++ b/block/scsi_ioctl.c
@@ -505,7 +505,7 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,

if (bytes && blk_rq_map_kern(q, rq, buffer, bytes, __GFP_WAIT)) {
err = DRIVER_ERROR << 24;
- goto out;
+ goto error;
}

memset(sense, 0, sizeof(sense));
@@ -515,7 +515,6 @@ int sg_scsi_ioctl(struct request_queue *q, struct gendisk *disk, fmode_t mode,

blk_execute_rq(q, disk, rq, 0);

-out:
err = rq->errors & 0xff; /* only 8 bit SCSI status */
if (err) {
if (rq->sense_len && rq->sense) {

li...@kernel.org

unread,
Jan 27, 2015, 11:23:11 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, J. Bruce Fields, Zefan Li
From: "J. Bruce Fields" <bfi...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 51904b08072a8bf2b9ed74d1bd7a5300a614471d upstream.

Unknown operation numbers are caught in nfsd4_decode_compound() which
sets op->opnum to OP_ILLEGAL and op->status to nfserr_op_illegal. The
error causes the main loop in nfsd4_proc_compound() to skip most
processing. But nfsd4_proc_compound also peeks ahead at the next
operation in one case and doesn't take similar precautions there.

Signed-off-by: J. Bruce Fields <bfi...@redhat.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/nfsd/nfs4proc.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c
index 22beaff..b2ce878 100644
--- a/fs/nfsd/nfs4proc.c
+++ b/fs/nfsd/nfs4proc.c
@@ -1132,7 +1132,8 @@ static bool need_wrongsec_check(struct svc_rqst *rqstp)
*/
if (argp->opcnt == resp->opcnt)
return false;
-
+ if (next->opnum == OP_ILLEGAL)
+ return false;
nextd = OPDESC(next);
/*
* Rest of 2.6.3.1.1: certain operations will return WRONGSEC

li...@kernel.org

unread,
Jan 27, 2015, 11:23:15 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Hans de Goede, Dmitry Torokhov, Zefan Li
From: Hans de Goede <hdeg...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 993b3a3f80a7842a48cd46c2b41e1b3ef6302468 upstream.

These models need i8042.notimeout, otherwise the touchpad will not work.

BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=69731
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1111138
Signed-off-by: Hans de Goede <hdeg...@redhat.com>
Signed-off-by: Dmitry Torokhov <dmitry....@gmail.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/input/serio/i8042-x86ia64io.h | 16 ++++++++++++++++
1 file changed, 16 insertions(+)

diff --git a/drivers/input/serio/i8042-x86ia64io.h b/drivers/input/serio/i8042-x86ia64io.h
index 40ff494..ce715b1 100644
--- a/drivers/input/serio/i8042-x86ia64io.h
+++ b/drivers/input/serio/i8042-x86ia64io.h
@@ -615,6 +615,22 @@ static const struct dmi_system_id __initconst i8042_dmi_notimeout_table[] = {
},
},
{
+ /* Fujitsu A544 laptop */
+ /* https://bugzilla.redhat.com/show_bug.cgi?id=1111138 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK A544"),
+ },
+ },
+ {
+ /* Fujitsu AH544 laptop */
+ /* https://bugzilla.kernel.org/show_bug.cgi?id=69731 */
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "FUJITSU"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "LIFEBOOK AH544"),
+ },
+ },
+ {
/* Fujitsu U574 laptop */
/* https://bugzilla.kernel.org/show_bug.cgi?id=69731 */
.matches = {

li...@kernel.org

unread,
Jan 27, 2015, 11:23:25 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Brian Silverman, austin...@gmail.com, dar...@dvhart.com, pet...@infradead.org, Thomas Gleixner, Zefan Li
From: Brian Silverman <bsilve...@gmail.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 30a6b8031fe14031ab27c1fa3483cb9780e7f63c upstream.

free_pi_state and exit_pi_state_list both clean up futex_pi_state's.
exit_pi_state_list takes the hb lock first, and most callers of
free_pi_state do too. requeue_pi doesn't, which means free_pi_state
can free the pi_state out from under exit_pi_state_list. For example:

task A | task B
exit_pi_state_list |
pi_state = |
curr->pi_state_list->next |
| futex_requeue(requeue_pi=1)
| // pi_state is the same as
| // the one in task A
| free_pi_state(pi_state)
| list_del_init(&pi_state->list)
| kfree(pi_state)
list_del_init(&pi_state->list) |

Move the free_pi_state calls in requeue_pi to before it drops the hb
locks which it's already holding.

[ tglx: Removed a pointless free_pi_state() call and the hb->lock held
debugging. The latter comes via a seperate patch ]

Signed-off-by: Brian Silverman <bsilve...@gmail.com>
Cc: austin...@gmail.com
Cc: dar...@dvhart.com
Cc: pet...@infradead.org
Link: http://lkml.kernel.org/r/1414282837-23092-1-git-...@gmail.com
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
kernel/futex.c | 22 +++++++++++-----------
1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index fafc54f..6b320c2 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -486,8 +486,14 @@ static struct futex_pi_state * alloc_pi_state(void)
return pi_state;
}

+/*
+ * Must be called with the hb lock held.
+ */
static void free_pi_state(struct futex_pi_state *pi_state)
{
+ if (!pi_state)
+ return;
+
if (!atomic_dec_and_test(&pi_state->refcount))
return;

@@ -1401,15 +1407,6 @@ static int futex_requeue(u32 __user *uaddr1, unsigned int flags,
}

retry:
- if (pi_state != NULL) {
- /*
- * We will have to lookup the pi_state again, so free this one
- * to keep the accounting correct.
- */
- free_pi_state(pi_state);
- pi_state = NULL;
- }
-
ret = get_futex_key(uaddr1, flags & FLAGS_SHARED, &key1, VERIFY_READ);
if (unlikely(ret != 0))
goto out;
@@ -1497,6 +1494,8 @@ retry_private:
case 0:
break;
case -EFAULT:
+ free_pi_state(pi_state);
+ pi_state = NULL;
double_unlock_hb(hb1, hb2);
put_futex_key(&key2);
put_futex_key(&key1);
@@ -1506,6 +1505,8 @@ retry_private:
goto out;
case -EAGAIN:
/* The owner was exiting, try again. */
+ free_pi_state(pi_state);
+ pi_state = NULL;
double_unlock_hb(hb1, hb2);
put_futex_key(&key2);
put_futex_key(&key1);
@@ -1582,6 +1583,7 @@ retry_private:
}

out_unlock:
+ free_pi_state(pi_state);
double_unlock_hb(hb1, hb2);

/*
@@ -1598,8 +1600,6 @@ out_put_keys:
out_put_key1:
put_futex_key(&key1);
out:
- if (pi_state != NULL)
- free_pi_state(pi_state);
return ret ? ret : task_count;

li...@kernel.org

unread,
Jan 27, 2015, 11:23:30 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Takashi Iwai, Zefan Li
From: Takashi Iwai <ti...@suse.de>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 317168d0c766defd14b3d0e9c2c4a9a258b803ee upstream.

In compat mode, we copy each field of snd_pcm_status struct but don't
touch the reserved fields, and this leaves uninitialized values
there. Meanwhile the native ioctl does zero-clear the whole
structure, so we should follow the same rule in compat mode, too.

Reported-by: Pierre-Louis Bossart <pierre-lou...@linux.intel.com>
Signed-off-by: Takashi Iwai <ti...@suse.de>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
sound/core/pcm_compat.c | 2 ++
1 file changed, 2 insertions(+)

diff --git a/sound/core/pcm_compat.c b/sound/core/pcm_compat.c
index 91cdf94..4dbb66e 100644
--- a/sound/core/pcm_compat.c
+++ b/sound/core/pcm_compat.c
@@ -204,6 +204,8 @@ static int snd_pcm_status_user_compat(struct snd_pcm_substream *substream,
if (err < 0)
return err;

+ if (clear_user(src, sizeof(*src)))
+ return -EFAULT;
if (put_user(status.state, &src->state) ||
put_user(status.trigger_tstamp.tv_sec, &src->trigger_tstamp.tv_sec) ||
put_user(status.trigger_tstamp.tv_nsec, &src->trigger_tstamp.tv_nsec) ||

li...@kernel.org

unread,
Jan 27, 2015, 11:23:37 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Will Deacon, Linus Torvalds, Zefan Li
From: Will Deacon <will....@arm.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit ce9ec37bddb633404a0c23e1acb181a264e7f7f2 upstream.

When unmapping a range of pages in zap_pte_range, the page being
unmapped is added to an mmu_gather_batch structure for asynchronous
freeing. If we run out of space in the batch structure before the range
has been completely unmapped, then we break out of the loop, force a
TLB flush and free the pages that we have batched so far. If there are
further pages to unmap, then we resume the loop where we left off.

Unfortunately, we forget to update addr when we break out of the loop,
which causes us to truncate the range being invalidated as the end
address is exclusive. When we re-enter the loop at the same address, the
page has already been freed and the pte_present test will fail, meaning
that we do not reconsider the address for invalidation.

This patch fixes the problem by incrementing addr by the PAGE_SIZE
before breaking out of the loop on batch failure.

Signed-off-by: Will Deacon <will....@arm.com>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
mm/memory.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/mm/memory.c b/mm/memory.c
index ffd74f3..c34e60a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1164,8 +1164,10 @@ again:
if (unlikely(page_mapcount(page) < 0))
print_bad_pte(vma, addr, ptent, page);
force_flush = !__tlb_remove_page(tlb, page);
- if (force_flush)
+ if (force_flush) {
+ addr += PAGE_SIZE;
break;
+ }
continue;
}
/*

li...@kernel.org

unread,
Jan 27, 2015, 11:23:46 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, David Rientjes, Kirill A. Shutemov, Andrea Arcangeli, Andrew Morton, Linus Torvalds, Zefan Li
From: David Rientjes <rien...@google.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 6d50e60cd2edb5a57154db5a6f64eef5aa59b751 upstream.

If an anonymous mapping is not allowed to fault thp memory and then
madvise(MADV_HUGEPAGE) is used after fault, khugepaged will never
collapse this memory into thp memory.

This occurs because the madvise(2) handler for thp, hugepage_madvise(),
clears VM_NOHUGEPAGE on the stack and it isn't stored in vma->vm_flags
until the final action of madvise_behavior(). This causes the
khugepaged_enter_vma_merge() to be a no-op in hugepage_madvise() when
the vma had previously had VM_NOHUGEPAGE set.

Fix this by passing the correct vma flags to the khugepaged mm slot
handler. There's no chance khugepaged can run on this vma until after
madvise_behavior() returns since we hold mm->mmap_sem.

It would be possible to clear VM_NOHUGEPAGE directly from vma->vm_flags
in hugepage_advise(), but I didn't want to introduce special case
behavior into madvise_behavior(). I think it's best to just let it
always set vma->vm_flags itself.

Signed-off-by: David Rientjes <rien...@google.com>
Reported-by: Suleiman Souhlal <sule...@google.com>
Cc: "Kirill A. Shutemov" <kirill....@linux.intel.com>
Cc: Andrea Arcangeli <aarc...@redhat.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
include/linux/khugepaged.h | 17 ++++++++++-------
mm/huge_memory.c | 11 ++++++-----
mm/mmap.c | 8 ++++----
3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index 6b394f0..eeb3079 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -6,7 +6,8 @@
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
extern int __khugepaged_enter(struct mm_struct *mm);
extern void __khugepaged_exit(struct mm_struct *mm);
-extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma);
+extern int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+ unsigned long vm_flags);

#define khugepaged_enabled() \
(transparent_hugepage_flags & \
@@ -35,13 +36,13 @@ static inline void khugepaged_exit(struct mm_struct *mm)
__khugepaged_exit(mm);
}

-static inline int khugepaged_enter(struct vm_area_struct *vma)
+static inline int khugepaged_enter(struct vm_area_struct *vma,
+ unsigned long vm_flags)
{
if (!test_bit(MMF_VM_HUGEPAGE, &vma->vm_mm->flags))
if ((khugepaged_always() ||
- (khugepaged_req_madv() &&
- vma->vm_flags & VM_HUGEPAGE)) &&
- !(vma->vm_flags & VM_NOHUGEPAGE))
+ (khugepaged_req_madv() && (vm_flags & VM_HUGEPAGE))) &&
+ !(vm_flags & VM_NOHUGEPAGE))
if (__khugepaged_enter(vma->vm_mm))
return -ENOMEM;
return 0;
@@ -54,11 +55,13 @@ static inline int khugepaged_fork(struct mm_struct *mm, struct mm_struct *oldmm)
static inline void khugepaged_exit(struct mm_struct *mm)
{
}
-static inline int khugepaged_enter(struct vm_area_struct *vma)
+static inline int khugepaged_enter(struct vm_area_struct *vma,
+ unsigned long vm_flags)
{
return 0;
}
-static inline int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
+static inline int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+ unsigned long vm_flags)
{
return 0;
}
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3da5c0b..8978c1b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -711,7 +711,7 @@ int do_huge_pmd_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (haddr >= vma->vm_start && haddr + HPAGE_PMD_SIZE <= vma->vm_end) {
if (unlikely(anon_vma_prepare(vma)))
return VM_FAULT_OOM;
- if (unlikely(khugepaged_enter(vma)))
+ if (unlikely(khugepaged_enter(vma, vma->vm_flags)))
return VM_FAULT_OOM;
page = alloc_hugepage_vma(transparent_hugepage_defrag(vma),
vma, haddr, numa_node_id(), 0);
@@ -1505,7 +1505,7 @@ int hugepage_madvise(struct vm_area_struct *vma,
* register it here without waiting a page fault that
* may not happen any time soon.
*/
- if (unlikely(khugepaged_enter_vma_merge(vma)))
+ if (unlikely(khugepaged_enter_vma_merge(vma, *vm_flags)))
return -ENOMEM;
break;
case MADV_NOHUGEPAGE:
@@ -1637,7 +1637,8 @@ int __khugepaged_enter(struct mm_struct *mm)
return 0;
}

-int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
+int khugepaged_enter_vma_merge(struct vm_area_struct *vma,
+ unsigned long vm_flags)
{
unsigned long hstart, hend;
if (!vma->anon_vma)
@@ -1653,11 +1654,11 @@ int khugepaged_enter_vma_merge(struct vm_area_struct *vma)
* If is_pfn_mapping() is true is_learn_pfn_mapping() must be
* true too, verify it here.
*/
- VM_BUG_ON(is_linear_pfn_mapping(vma) || vma->vm_flags & VM_NO_THP);
+ VM_BUG_ON(is_linear_pfn_mapping(vma) || vm_flags & VM_NO_THP);
hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
hend = vma->vm_end & HPAGE_PMD_MASK;
if (hstart < hend)
- return khugepaged_enter(vma);
+ return khugepaged_enter(vma, vm_flags);
return 0;
}

diff --git a/mm/mmap.c b/mm/mmap.c
index 69367e4..94fdbe8 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -826,7 +826,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
end, prev->vm_pgoff, NULL);
if (err)
return NULL;
- khugepaged_enter_vma_merge(prev);
+ khugepaged_enter_vma_merge(prev, vm_flags);
return prev;
}

@@ -845,7 +845,7 @@ struct vm_area_struct *vma_merge(struct mm_struct *mm,
next->vm_pgoff - pglen, NULL);
if (err)
return NULL;
- khugepaged_enter_vma_merge(area);
+ khugepaged_enter_vma_merge(area, vm_flags);
return area;
}

@@ -1820,7 +1820,7 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
}
}
vma_unlock_anon_vma(vma);
- khugepaged_enter_vma_merge(vma);
+ khugepaged_enter_vma_merge(vma, vma->vm_flags);
return error;
}
#endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */
@@ -1871,7 +1871,7 @@ int expand_downwards(struct vm_area_struct *vma,
}
}
vma_unlock_anon_vma(vma);
- khugepaged_enter_vma_merge(vma);
+ khugepaged_enter_vma_merge(vma, vma->vm_flags);
return error;

li...@kernel.org

unread,
Jan 27, 2015, 11:23:49 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Jan Kara, Theodore Ts'o, Zefan Li
From: Jan Kara <ja...@suse.cz>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 9378c6768e4fca48971e7b6a9075bc006eda981d upstream.

When there are no meta block groups update_backups() will compute the
backup block in 32-bit arithmetics thus possibly overflowing the block
number and corrupting the filesystem. OTOH filesystems without meta
block groups larger than 16 TB should be rare. Fix the problem by doing
the counting in 64-bit arithmetics.

Coverity-id: 741252
Signed-off-by: Jan Kara <ja...@suse.cz>
Signed-off-by: Theodore Ts'o <ty...@mit.edu>
Reviewed-by: Lukas Czerner <lcze...@redhat.com>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/ext4/resize.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/resize.c b/fs/ext4/resize.c
index a43e43c..cfd3211 100644
--- a/fs/ext4/resize.c
+++ b/fs/ext4/resize.c
@@ -991,7 +991,7 @@ static void update_backups(struct super_block *sb,
(err = ext4_journal_restart(handle, EXT4_MAX_TRANS_DATA)))
break;

- bh = sb_getblk(sb, group * bpg + blk_off);
+ bh = sb_getblk(sb, ((ext4_fsblk_t)group) * bpg + blk_off);
if (!bh) {
err = -ENOMEM;
break;

li...@kernel.org

unread,
Jan 27, 2015, 11:23:57 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Cyril Brulebois, John W. Linville, Zefan Li
From: Cyril Brulebois <ki...@debian.org>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 664d6a792785cc677c2091038ce10322c8d04ae1 upstream.

0x1b75 0xa200 AirLive WN-200USB wireless 11b/g/n dongle

References: https://bugs.debian.org/766802
Reported-by: Martin Mokrejs <mmok...@fold.natur.cuni.cz>
Signed-off-by: Cyril Brulebois <ki...@debian.org>
Acked-by: Stanislaw Gruszka <sgru...@redhat.com>
Signed-off-by: John W. Linville <linv...@tuxdriver.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/net/wireless/rt2x00/rt2800usb.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/drivers/net/wireless/rt2x00/rt2800usb.c b/drivers/net/wireless/rt2x00/rt2800usb.c
index 664e93d..49baf0c 100644
--- a/drivers/net/wireless/rt2x00/rt2800usb.c
+++ b/drivers/net/wireless/rt2x00/rt2800usb.c
@@ -1081,6 +1081,7 @@ static struct usb_device_id rt2800usb_device_table[] = {
/* Ovislink */
{ USB_DEVICE(0x1b75, 0x3071) },
{ USB_DEVICE(0x1b75, 0x3072) },
+ { USB_DEVICE(0x1b75, 0xa200) },
/* Para */
{ USB_DEVICE(0x20b8, 0x8888) },
/* Pegatron */

li...@kernel.org

unread,
Jan 27, 2015, 11:24:00 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Will Deacon, Jason Baron, Wade Farnsworth, Frederic Weisbecker, Steven Rostedt, Zefan Li
From: Will Deacon <will....@arm.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 60916a9382e88fbf5e54fd36a3e658efd7ab7bed upstream.

syscall_get_nr can return -1 in the case that the task is not executing
a system call.

This patch fixes perf_syscall_{enter,exit} to check that the syscall
number is valid before using it as an index into a bitmap.

Link: http://lkml.kernel.org/r/1345137254-7377-1-git-...@arm.com

Cc: Jason Baron <jba...@redhat.com>
Cc: Wade Farnsworth <wade_fa...@mentor.com>
Cc: Frederic Weisbecker <fwei...@gmail.com>
Signed-off-by: Will Deacon <will....@arm.com>
Signed-off-by: Steven Rostedt <ros...@goodmis.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
kernel/trace/trace_syscalls.c | 4 ++++
1 file changed, 4 insertions(+)

diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index c9ce09a..aed38fd 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -519,6 +519,8 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
int size;

syscall_nr = syscall_get_nr(current, regs);
+ if (syscall_nr < 0)
+ return;
if (!test_bit(syscall_nr, enabled_perf_enter_syscalls))
return;

@@ -593,6 +595,8 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
int size;

syscall_nr = syscall_get_nr(current, regs);
+ if (syscall_nr < 0)
+ return;
if (!test_bit(syscall_nr, enabled_perf_exit_syscalls))
return;

li...@kernel.org

unread,
Jan 27, 2015, 11:24:11 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Jan Kara, Theodore Ts'o, Zefan Li
From: Jan Kara <ja...@suse.cz>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 6050d47adcadbb53582434d919ed7f038d936712 upstream.

When ext4_handle_dirty_dx_node() or ext4_handle_dirty_dirent_node()
fail, there's really something wrong with the fs and there's no point in
continuing further. Just return error from make_indexed_dir() in that
case. Also initialize frames array so that if we return early due to
error, dx_release() doesn't try to dereference uninitialized memory
(which could happen also due to error in do_split()).

Coverity-id: 741300
Signed-off-by: Jan Kara <ja...@suse.cz>
Signed-off-by: Theodore Ts'o <ty...@mit.edu>
[lizf: Backported to 3.4:
- adjust context
- replace ext4_handle_dirty_{dx,dirent}_node() with
ext4_handle_dirty_metadata()]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
fs/ext4/namei.c | 27 +++++++++++++++++----------
1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
index c6dd936..dc58523 100644
--- a/fs/ext4/namei.c
+++ b/fs/ext4/namei.c
@@ -1421,31 +1421,38 @@ static int make_indexed_dir(handle_t *handle, struct dentry *dentry,
hinfo.hash_version += EXT4_SB(dir->i_sb)->s_hash_unsigned;
hinfo.seed = EXT4_SB(dir->i_sb)->s_hash_seed;
ext4fs_dirhash(name, namelen, &hinfo);
+ memset(frames, 0, sizeof(frames));
frame = frames;
frame->entries = entries;
frame->at = entries;
frame->bh = bh;
bh = bh2;

- ext4_handle_dirty_metadata(handle, dir, frame->bh);
- ext4_handle_dirty_metadata(handle, dir, bh);
+ retval = ext4_handle_dirty_metadata(handle, dir, frame->bh);
+ if (retval)
+ goto out_frames;
+ retval = ext4_handle_dirty_metadata(handle, dir, bh);
+ if (retval)
+ goto out_frames;

de = do_split(handle,dir, &bh, frame, &hinfo, &retval);
if (!de) {
- /*
- * Even if the block split failed, we have to properly write
- * out all the changes we did so far. Otherwise we can end up
- * with corrupted filesystem.
- */
- ext4_mark_inode_dirty(handle, dir);
- dx_release(frames);
- return retval;
+ goto out_frames;
}
dx_release(frames);

retval = add_dirent_to_buf(handle, dentry, inode, de, bh);
brelse(bh);
return retval;
+out_frames:
+ /*
+ * Even if the block split failed, we have to properly write
+ * out all the changes we did so far. Otherwise we can end up
+ * with corrupted filesystem.
+ */
+ ext4_mark_inode_dirty(handle, dir);
+ dx_release(frames);
+ return retval;
}

/*

li...@kernel.org

unread,
Jan 27, 2015, 11:24:17 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Hans de Goede, Darren Hart, Zefan Li
From: Hans de Goede <hdeg...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 183fd8fcd7f8afb7ac5ec68f83194872f9fecc84 upstream.

The acpi-video backlight interface on the Acer KAV80 is broken, and worse
it causes the entire machine to slow down significantly after a suspend/resume.

Blacklist it, and use the acer-wmi backlight interface instead. Note that
the KAV80 is somewhat unique in that it is the only Acer model where we
fall back to acer-wmi after blacklisting, rather then using the native
(e.g. intel) backlight driver. This is done because there is no native
backlight interface on this model.

BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1128309
Signed-off-by: Hans de Goede <hdeg...@redhat.com>
Signed-off-by: Darren Hart <dvh...@linux.intel.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/platform/x86/acer-wmi.c | 11 +++++++++++
1 file changed, 11 insertions(+)

diff --git a/drivers/platform/x86/acer-wmi.c b/drivers/platform/x86/acer-wmi.c
index c1a3fd8..4d04731 100644
--- a/drivers/platform/x86/acer-wmi.c
+++ b/drivers/platform/x86/acer-wmi.c
@@ -523,6 +523,17 @@ static const struct dmi_system_id video_vendor_dmi_table[] = {
DMI_MATCH(DMI_PRODUCT_NAME, "TravelMate 4750"),
},
},
+ {
+ /*
+ * Note no video_set_backlight_video_vendor, we must use the
+ * acer interface, as there is no native backlight interface.
+ */
+ .ident = "Acer KAV80",
+ .matches = {
+ DMI_MATCH(DMI_SYS_VENDOR, "Acer"),
+ DMI_MATCH(DMI_PRODUCT_NAME, "KAV80"),
+ },
+ },
{}
};

li...@kernel.org

unread,
Jan 27, 2015, 11:24:27 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Anton Blanchard, Michael Ellerman, Zefan Li
From: Anton Blanchard <an...@samba.org>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 808be31426af57af22268ef0fcb42617beb3d15b upstream.

Back in 7230c5644188 ("powerpc: Rework lazy-interrupt handling") we
added a call out to restore_interrupts() (written in c) before calling
do_notify_resume:

bl restore_interrupts
addi r3,r1,STACK_FRAME_OVERHEAD
bl do_notify_resume

Unfortunately do_notify_resume takes two arguments, the second one
being the thread_info flags:

void do_notify_resume(struct pt_regs *regs, unsigned long thread_info_flags)

We do populate r4 (the second argument) earlier, but
restore_interrupts() is free to muck it up all it wants. My guess is
the gcc compiler gods shone down on us and its register allocator
never used r4. Sometimes, rarely, luck is on our side.

LLVM on the other hand did trample r4.

Signed-off-by: Anton Blanchard <an...@samba.org>
Signed-off-by: Michael Ellerman <m...@ellerman.id.au>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
arch/powerpc/kernel/entry_64.S | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index e500969..c3fc39e 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -813,7 +813,13 @@ user_work:
b .ret_from_except_lite

1: bl .save_nvgprs
+ /*
+ * Use a non volatile GPR to save and restore our thread_info flags
+ * across the call to restore_interrupts.
+ */
+ mr r30,r4
bl .restore_interrupts
+ mr r4,r30
addi r3,r1,STACK_FRAME_OVERHEAD
bl .do_notify_resume
b .ret_from_except

li...@kernel.org

unread,
Jan 27, 2015, 11:24:32 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Johan Hovold, Zefan Li
From: Johan Hovold <jo...@kernel.org>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit e681286de221af78fc85db9222b6a203148c005a upstream.

Write may be called from interrupt context so make sure to use
GFP_ATOMIC for all allocations in write.

Fixes: 0d930e51cfe6 ("USB: opticon: Add Opticon OPN2001 write support")
Signed-off-by: Johan Hovold <jo...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/serial/opticon.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/serial/opticon.c b/drivers/usb/serial/opticon.c
index 1f85006..58b7cec 100644
--- a/drivers/usb/serial/opticon.c
+++ b/drivers/usb/serial/opticon.c
@@ -293,7 +293,7 @@ static int opticon_write(struct tty_struct *tty, struct usb_serial_port *port,

/* The conncected devices do not have a bulk write endpoint,
* to transmit data to de barcode device the control endpoint is used */
- dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_NOIO);
+ dr = kmalloc(sizeof(struct usb_ctrlrequest), GFP_ATOMIC);
if (!dr) {
dev_err(&port->dev, "out of memory\n");
count = -ENOMEM;

li...@kernel.org

unread,
Jan 27, 2015, 11:24:38 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Johan Hovold, Greg Kroah-Hartman, Zefan Li
From: Johan Hovold <jo...@kernel.org>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 2a159389bf5d962359349a76827b2f683276a1c7 upstream.

Add new quirk for devices that cannot handle requests for the
device_qualifier descriptor.

A USB-2.0 compliant device must respond to requests for the
device_qualifier descriptor (even if it's with a request error), but at
least one device is known to misbehave after such a request.

Suggested-by: Bjørn Mork <bj...@mork.no>
Signed-off-by: Johan Hovold <jo...@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/core/hub.c | 3 +++
include/linux/usb/quirks.h | 3 +++
2 files changed, 6 insertions(+)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 2d5177a..93f2538 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3361,6 +3361,9 @@ check_highspeed (struct usb_hub *hub, struct usb_device *udev, int port1)
struct usb_qualifier_descriptor *qual;
int status;

+ if (udev->quirks & USB_QUIRK_DEVICE_QUALIFIER)
+ return;
+
qual = kmalloc (sizeof *qual, GFP_KERNEL);
if (qual == NULL)
return;
diff --git a/include/linux/usb/quirks.h b/include/linux/usb/quirks.h
index 8eeeb87..d0d2af0 100644
--- a/include/linux/usb/quirks.h
+++ b/include/linux/usb/quirks.h
@@ -33,4 +33,7 @@
/* device generates spurious wakeup, ignore remote wakeup capability */
#define USB_QUIRK_IGNORE_REMOTE_WAKEUP 0x00000200

+/* device can't handle device_qualifier descriptor requests */
+#define USB_QUIRK_DEVICE_QUALIFIER 0x00000100
+
#endif /* __LINUX_USB_QUIRKS_H */

li...@kernel.org

unread,
Jan 27, 2015, 11:24:42 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Adel Gadllah, Greg Kroah-Hartman, Zefan Li
From: Adel Gadllah <adel.g...@gmail.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit d749947561af5996ccc076b2ffcc5f48b1be5d74 upstream.

Yet another device affected by this.

Tested-by: Kevin Fenzi <ke...@scrye.com>
Signed-off-by: Adel Gadllah <adel.g...@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/core/quirks.c | 3 +++
1 file changed, 3 insertions(+)

diff --git a/drivers/usb/core/quirks.c b/drivers/usb/core/quirks.c
index 9bd01502..980a9d8 100644
--- a/drivers/usb/core/quirks.c
+++ b/drivers/usb/core/quirks.c
@@ -91,6 +91,9 @@ static const struct usb_device_id usb_quirk_list[] = {
{ USB_DEVICE(0x04f3, 0x009b), .driver_info =
USB_QUIRK_DEVICE_QUALIFIER },

+ { USB_DEVICE(0x04f3, 0x016f), .driver_info =
+ USB_QUIRK_DEVICE_QUALIFIER },
+
/* Roland SC-8820 */
{ USB_DEVICE(0x0582, 0x0007), .driver_info = USB_QUIRK_RESET_RESUME },

li...@kernel.org

unread,
Jan 27, 2015, 11:24:47 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Oliver Neukum, Greg Kroah-Hartman, Zefan Li
From: Oliver Neukum <one...@suse.de>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit b45abacde3d551c6696c6738bef4a1805d0bf27a upstream.

The switch back is limited to ULT even on HP. The contrary
finding arose by bad luck in BIOS versions for testing.
This fixes spontaneous resume from S3 on some HP laptops.

Signed-off-by: Oliver Neukum <one...@suse.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
[lizf: Backported to 3.4: adjust context]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/usb/host/xhci-pci.c | 14 --------------
1 file changed, 14 deletions(-)

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index 8882d65..c8835d5 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -118,20 +118,6 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
xhci->quirks |= XHCI_SPURIOUS_REBOOT;
xhci->quirks |= XHCI_AVOID_BEI;
}
- if (pdev->vendor == PCI_VENDOR_ID_INTEL &&
- (pdev->device == PCI_DEVICE_ID_INTEL_LYNXPOINT_XHCI ||
- pdev->device == PCI_DEVICE_ID_INTEL_LYNXPOINT_LP_XHCI)) {
- /* Workaround for occasional spurious wakeups from S5 (or
- * any other sleep) on Haswell machines with LPT and LPT-LP
- * with the new Intel BIOS
- */
- /* Limit the quirk to only known vendors, as this triggers
- * yet another BIOS bug on some other machines
- * https://bugzilla.kernel.org/show_bug.cgi?id=66171
- */
- if (pdev->subsystem_vendor == PCI_VENDOR_ID_HP)
- xhci->quirks |= XHCI_SPURIOUS_WAKEUP;
- }
if (pdev->vendor == PCI_VENDOR_ID_ETRON &&
pdev->device == PCI_DEVICE_ID_ASROCK_P67) {
xhci->quirks |= XHCI_RESET_ON_RESUME;

li...@kernel.org

unread,
Jan 27, 2015, 11:24:54 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Grant Likely, Rafael J. Wysocki, Mika Westerberg, Rob Herring, Arnd Bergmann, Darren Hart, Zefan Li
From: Grant Likely <grant....@linaro.org>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit a87fa1d81a9fb5e9adca9820e16008c40ad09f33 upstream.

The string property read helpers will run off the end of the buffer if
it is handed a malformed string property. Rework the parsers to make
sure that doesn't happen. At the same time add new test cases to make
sure the functions behave themselves.

The original implementations of of_property_read_string_index() and
of_property_count_strings() both open-coded the same block of parsing
code, each with it's own subtly different bugs. The fix here merges
functions into a single helper and makes the original functions static
inline wrappers around the helper.

One non-bugfix aspect of this patch is the addition of a new wrapper,
of_property_read_string_array(). The new wrapper is needed by the
device_properties feature that Rafael is working on and planning to
merge for v3.19. The implementation is identical both with and without
the new static inline wrapper, so it just got left in to reduce the
churn on the header file.

Signed-off-by: Grant Likely <grant....@linaro.org>
Cc: Rafael J. Wysocki <rafael.j...@intel.com>
Cc: Mika Westerberg <mika.we...@linux.intel.com>
Cc: Rob Herring <rob...@kernel.org>
Cc: Arnd Bergmann <ar...@arndb.de>
Cc: Darren Hart <darre...@intel.com>
[lizf: Backported to 3.4: Drop selftest hunks that don't apply]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/of/base.c | 88 +++++++++++++--------------------------------------
drivers/of/selftest.c | 64 ++++++++++++++++++++++++++++++++++---
include/linux/of.h | 84 ++++++++++++++++++++++++++++++++++++++++--------
3 files changed, 151 insertions(+), 85 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 1c207f2..d439d06 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -716,52 +716,6 @@ int of_property_read_string(struct device_node *np, const char *propname,
EXPORT_SYMBOL_GPL(of_property_read_string);

/**
- * of_property_read_string_index - Find and read a string from a multiple
- * strings property.
- * @np: device node from which the property value is to be read.
- * @propname: name of the property to be searched.
- * @index: index of the string in the list of strings
- * @out_string: pointer to null terminated return string, modified only if
- * return value is 0.
- *
- * Search for a property in a device tree node and retrieve a null
- * terminated string value (pointer to data, not a copy) in the list of strings
- * contained in that property.
- * Returns 0 on success, -EINVAL if the property does not exist, -ENODATA if
- * property does not have a value, and -EILSEQ if the string is not
- * null-terminated within the length of the property data.
- *
- * The out_string pointer is modified only if a valid string can be decoded.
- */
-int of_property_read_string_index(struct device_node *np, const char *propname,
- int index, const char **output)
-{
- struct property *prop = of_find_property(np, propname, NULL);
- int i = 0;
- size_t l = 0, total = 0;
- const char *p;
-
- if (!prop)
- return -EINVAL;
- if (!prop->value)
- return -ENODATA;
- if (strnlen(prop->value, prop->length) >= prop->length)
- return -EILSEQ;
-
- p = prop->value;
-
- for (i = 0; total < prop->length; total += l, p += l) {
- l = strlen(p) + 1;
- if (i++ == index) {
- *output = p;
- return 0;
- }
- }
- return -ENODATA;
-}
-EXPORT_SYMBOL_GPL(of_property_read_string_index);
-
-/**
* of_property_match_string() - Find string in a list and return index
* @np: pointer to node containing string list property
* @propname: string list property name
@@ -787,7 +741,7 @@ int of_property_match_string(struct device_node *np, const char *propname,
end = p + prop->length;

for (i = 0; p < end; i++, p += l) {
- l = strlen(p) + 1;
+ l = strnlen(p, end - p) + 1;
if (p + l > end)
return -EILSEQ;
pr_debug("comparing %s with %s\n", string, p);
@@ -799,39 +753,41 @@ int of_property_match_string(struct device_node *np, const char *propname,
EXPORT_SYMBOL_GPL(of_property_match_string);

/**
- * of_property_count_strings - Find and return the number of strings from a
- * multiple strings property.
+ * of_property_read_string_util() - Utility helper for parsing string properties
* @np: device node from which the property value is to be read.
* @propname: name of the property to be searched.
+ * @out_strs: output array of string pointers.
+ * @sz: number of array elements to read.
+ * @skip: Number of strings to skip over at beginning of list.
*
- * Search for a property in a device tree node and retrieve the number of null
- * terminated string contain in it. Returns the number of strings on
- * success, -EINVAL if the property does not exist, -ENODATA if property
- * does not have a value, and -EILSEQ if the string is not null-terminated
- * within the length of the property data.
+ * Don't call this function directly. It is a utility helper for the
+ * of_property_read_string*() family of functions.
*/
-int of_property_count_strings(struct device_node *np, const char *propname)
+int of_property_read_string_helper(struct device_node *np, const char *propname,
+ const char **out_strs, size_t sz, int skip)
{
struct property *prop = of_find_property(np, propname, NULL);
- int i = 0;
- size_t l = 0, total = 0;
- const char *p;
+ int l = 0, i = 0;
+ const char *p, *end;

if (!prop)
return -EINVAL;
if (!prop->value)
return -ENODATA;
- if (strnlen(prop->value, prop->length) >= prop->length)
- return -EILSEQ;
-
p = prop->value;
+ end = p + prop->length;

- for (i = 0; total < prop->length; total += l, p += l, i++)
- l = strlen(p) + 1;
-
- return i;
+ for (i = 0; p < end && (!out_strs || i < skip + sz); i++, p += l) {
+ l = strnlen(p, end - p) + 1;
+ if (p + l > end)
+ return -EILSEQ;
+ if (out_strs && i >= skip)
+ *out_strs++ = p;
+ }
+ i -= skip;
+ return i <= 0 ? -ENODATA : i;
}
-EXPORT_SYMBOL_GPL(of_property_count_strings);
+EXPORT_SYMBOL_GPL(of_property_read_string_helper);

/**
* of_parse_phandle - Resolve a phandle property to a device_node pointer
diff --git a/drivers/of/selftest.c b/drivers/of/selftest.c
index f24ffd7..75daeb5 100644
--- a/drivers/of/selftest.c
+++ b/drivers/of/selftest.c
@@ -120,8 +120,9 @@ static void __init of_selftest_parse_phandle_with_args(void)
pr_info("end - %s\n", passed_all ? "PASS" : "FAIL");
}

-static void __init of_selftest_property_match_string(void)
+static void __init of_selftest_property_string(void)
{
+ const char *strings[4];
struct device_node *np;
int rc;

@@ -139,13 +140,66 @@ static void __init of_selftest_property_match_string(void)
rc = of_property_match_string(np, "phandle-list-names", "third");
selftest(rc == 2, "third expected:0 got:%i\n", rc);
rc = of_property_match_string(np, "phandle-list-names", "fourth");
- selftest(rc == -ENODATA, "unmatched string; rc=%i", rc);
+ selftest(rc == -ENODATA, "unmatched string; rc=%i\n", rc);
rc = of_property_match_string(np, "missing-property", "blah");
- selftest(rc == -EINVAL, "missing property; rc=%i", rc);
+ selftest(rc == -EINVAL, "missing property; rc=%i\n", rc);
rc = of_property_match_string(np, "empty-property", "blah");
- selftest(rc == -ENODATA, "empty property; rc=%i", rc);
+ selftest(rc == -ENODATA, "empty property; rc=%i\n", rc);
rc = of_property_match_string(np, "unterminated-string", "blah");
- selftest(rc == -EILSEQ, "unterminated string; rc=%i", rc);
+ selftest(rc == -EILSEQ, "unterminated string; rc=%i\n", rc);
+
+ /* of_property_count_strings() tests */
+ rc = of_property_count_strings(np, "string-property");
+ selftest(rc == 1, "Incorrect string count; rc=%i\n", rc);
+ rc = of_property_count_strings(np, "phandle-list-names");
+ selftest(rc == 3, "Incorrect string count; rc=%i\n", rc);
+ rc = of_property_count_strings(np, "unterminated-string");
+ selftest(rc == -EILSEQ, "unterminated string; rc=%i\n", rc);
+ rc = of_property_count_strings(np, "unterminated-string-list");
+ selftest(rc == -EILSEQ, "unterminated string array; rc=%i\n", rc);
+
+ /* of_property_read_string_index() tests */
+ rc = of_property_read_string_index(np, "string-property", 0, strings);
+ selftest(rc == 0 && !strcmp(strings[0], "foobar"), "of_property_read_string_index() failure; rc=%i\n", rc);
+ strings[0] = NULL;
+ rc = of_property_read_string_index(np, "string-property", 1, strings);
+ selftest(rc == -ENODATA && strings[0] == NULL, "of_property_read_string_index() failure; rc=%i\n", rc);
+ rc = of_property_read_string_index(np, "phandle-list-names", 0, strings);
+ selftest(rc == 0 && !strcmp(strings[0], "first"), "of_property_read_string_index() failure; rc=%i\n", rc);
+ rc = of_property_read_string_index(np, "phandle-list-names", 1, strings);
+ selftest(rc == 0 && !strcmp(strings[0], "second"), "of_property_read_string_index() failure; rc=%i\n", rc);
+ rc = of_property_read_string_index(np, "phandle-list-names", 2, strings);
+ selftest(rc == 0 && !strcmp(strings[0], "third"), "of_property_read_string_index() failure; rc=%i\n", rc);
+ strings[0] = NULL;
+ rc = of_property_read_string_index(np, "phandle-list-names", 3, strings);
+ selftest(rc == -ENODATA && strings[0] == NULL, "of_property_read_string_index() failure; rc=%i\n", rc);
+ strings[0] = NULL;
+ rc = of_property_read_string_index(np, "unterminated-string", 0, strings);
+ selftest(rc == -EILSEQ && strings[0] == NULL, "of_property_read_string_index() failure; rc=%i\n", rc);
+ rc = of_property_read_string_index(np, "unterminated-string-list", 0, strings);
+ selftest(rc == 0 && !strcmp(strings[0], "first"), "of_property_read_string_index() failure; rc=%i\n", rc);
+ strings[0] = NULL;
+ rc = of_property_read_string_index(np, "unterminated-string-list", 2, strings); /* should fail */
+ selftest(rc == -EILSEQ && strings[0] == NULL, "of_property_read_string_index() failure; rc=%i\n", rc);
+ strings[1] = NULL;
+
+ /* of_property_read_string_array() tests */
+ rc = of_property_read_string_array(np, "string-property", strings, 4);
+ selftest(rc == 1, "Incorrect string count; rc=%i\n", rc);
+ rc = of_property_read_string_array(np, "phandle-list-names", strings, 4);
+ selftest(rc == 3, "Incorrect string count; rc=%i\n", rc);
+ rc = of_property_read_string_array(np, "unterminated-string", strings, 4);
+ selftest(rc == -EILSEQ, "unterminated string; rc=%i\n", rc);
+ /* -- An incorrectly formed string should cause a failure */
+ rc = of_property_read_string_array(np, "unterminated-string-list", strings, 4);
+ selftest(rc == -EILSEQ, "unterminated string array; rc=%i\n", rc);
+ /* -- parsing the correctly formed strings should still work: */
+ strings[2] = NULL;
+ rc = of_property_read_string_array(np, "unterminated-string-list", strings, 2);
+ selftest(rc == 2 && strings[2] == NULL, "of_property_read_string_array() failure; rc=%i\n", rc);
+ strings[1] = NULL;
+ rc = of_property_read_string_array(np, "phandle-list-names", strings, 1);
+ selftest(rc == 1 && strings[1] == NULL, "Overwrote end of string array; rc=%i, str='%s'\n", rc, strings[1]);
}

static int __init of_selftest(void)
diff --git a/include/linux/of.h b/include/linux/of.h
index fa7fb1d..ac58796 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -212,14 +212,12 @@ extern int of_property_read_u64(const struct device_node *np,
extern int of_property_read_string(struct device_node *np,
const char *propname,
const char **out_string);
-extern int of_property_read_string_index(struct device_node *np,
- const char *propname,
- int index, const char **output);
extern int of_property_match_string(struct device_node *np,
const char *propname,
const char *string);
-extern int of_property_count_strings(struct device_node *np,
- const char *propname);
+extern int of_property_read_string_helper(struct device_node *np,
+ const char *propname,
+ const char **out_strs, size_t sz, int index);
extern int of_device_is_compatible(const struct device_node *device,
const char *);
extern int of_device_is_available(const struct device_node *device);
@@ -304,15 +302,9 @@ static inline int of_property_read_string(struct device_node *np,
return -ENOSYS;
}

-static inline int of_property_read_string_index(struct device_node *np,
- const char *propname, int index,
- const char **out_string)
-{
- return -ENOSYS;
-}
-
-static inline int of_property_count_strings(struct device_node *np,
- const char *propname)
+static inline int of_property_read_string_helper(struct device_node *np,
+ const char *propname,
+ const char **out_strs, size_t sz, int index)
{
return -ENOSYS;
}
@@ -352,6 +344,70 @@ static inline int of_machine_is_compatible(const char *compat)
#endif /* CONFIG_OF */

/**
+ * of_property_read_string_array() - Read an array of strings from a multiple
+ * strings property.
+ * @np: device node from which the property value is to be read.
+ * @propname: name of the property to be searched.
+ * @out_strs: output array of string pointers.
+ * @sz: number of array elements to read.
+ *
+ * Search for a property in a device tree node and retrieve a list of
+ * terminated string values (pointer to data, not a copy) in that property.
+ *
+ * If @out_strs is NULL, the number of strings in the property is returned.
+ */
+static inline int of_property_read_string_array(struct device_node *np,
+ const char *propname, const char **out_strs,
+ size_t sz)
+{
+ return of_property_read_string_helper(np, propname, out_strs, sz, 0);
+}
+
+/**
+ * of_property_count_strings() - Find and return the number of strings from a
+ * multiple strings property.
+ * @np: device node from which the property value is to be read.
+ * @propname: name of the property to be searched.
+ *
+ * Search for a property in a device tree node and retrieve the number of null
+ * terminated string contain in it. Returns the number of strings on
+ * success, -EINVAL if the property does not exist, -ENODATA if property
+ * does not have a value, and -EILSEQ if the string is not null-terminated
+ * within the length of the property data.
+ */
+static inline int of_property_count_strings(struct device_node *np,
+ const char *propname)
+{
+ return of_property_read_string_helper(np, propname, NULL, 0, 0);
+}
+
+/**
+ * of_property_read_string_index() - Find and read a string from a multiple
+ * strings property.
+ * @np: device node from which the property value is to be read.
+ * @propname: name of the property to be searched.
+ * @index: index of the string in the list of strings
+ * @out_string: pointer to null terminated return string, modified only if
+ * return value is 0.
+ *
+ * Search for a property in a device tree node and retrieve a null
+ * terminated string value (pointer to data, not a copy) in the list of strings
+ * contained in that property.
+ * Returns 0 on success, -EINVAL if the property does not exist, -ENODATA if
+ * property does not have a value, and -EILSEQ if the string is not
+ * null-terminated within the length of the property data.
+ *
+ * The out_string pointer is modified only if a valid string can be decoded.
+ */
+static inline int of_property_read_string_index(struct device_node *np,
+ const char *propname,
+ int index, const char **output)
+{
+ int rc = of_property_read_string_helper(np, propname, output, 1, index);
+ return rc < 0 ? rc : 0;
+}
+
+/**
* of_property_read_bool - Findfrom a property
* @np: device node from which the property value is to be read.
* @propname: name of the property to be searched.

li...@kernel.org

unread,
Jan 27, 2015, 11:25:04 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Lars-Peter Clausen, Jonathan Cameron, Zefan Li
From: Lars-Peter Clausen <la...@metafoo.de>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 79fa64eb2ee8ccb4bcad7f54caa2699730b10b22 upstream.

We should check if a channel is enabled, not if no channels are enabled.

Fixes: 550268ca1111 ("staging:iio: scrap scan_count and ensure all drivers use active_scan_mask")
Signed-off-by: Lars-Peter Clausen <la...@metafoo.de>
Signed-off-by: Jonathan Cameron <ji...@kernel.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/staging/iio/meter/ade7758_ring.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/iio/meter/ade7758_ring.c b/drivers/staging/iio/meter/ade7758_ring.c
index c45b23b..629a6ed 100644
--- a/drivers/staging/iio/meter/ade7758_ring.c
+++ b/drivers/staging/iio/meter/ade7758_ring.c
@@ -96,7 +96,7 @@ static int ade7758_ring_preenable(struct iio_dev *indio_dev)
size_t d_size;
unsigned channel;

- if (!bitmap_empty(indio_dev->active_scan_mask, indio_dev->masklength))
+ if (bitmap_empty(indio_dev->active_scan_mask, indio_dev->masklength))
return -EINVAL;

channel = find_first_bit(indio_dev->active_scan_mask,

li...@kernel.org

unread,
Jan 27, 2015, 11:25:17 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Peter Hurley, Greg Kroah-Hartman, Zefan Li
From: Peter Hurley <pe...@hurleysoftware.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 37b164578826406a173ca7c20d9ba7430134d23e upstream.

Kernel oops can cause the tty to be unreleaseable (for example, if
n_tty_read() crashes while on the read_wait queue). This will cause
tty_release() to endlessly loop without sleeping.

Use a killable sleep timeout which grows by 2n+1 jiffies over the interval
[0, 120 secs.) and then jumps to forever (but still killable).

NB: killable just allows for the task to be rewoken manually, not
to be terminated.

Signed-off-by: Peter Hurley <pe...@hurleysoftware.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/tty/tty_io.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/tty/tty_io.c b/drivers/tty/tty_io.c
index b28d635..2d66beed 100644
--- a/drivers/tty/tty_io.c
+++ b/drivers/tty/tty_io.c
@@ -1633,6 +1633,7 @@ int tty_release(struct inode *inode, struct file *filp)
int devpts;
int idx;
char buf[64];
+ long timeout = 0;

if (tty_paranoia_check(tty, inode, __func__))
return 0;
@@ -1717,7 +1718,11 @@ int tty_release(struct inode *inode, struct file *filp)
__func__, tty_name(tty, buf));
tty_unlock();
mutex_unlock(&tty_mutex);
- schedule();
+ schedule_timeout_killable(timeout);
+ if (timeout < 120 * HZ)
+ timeout = 2 * timeout + 1;
+ else
+ timeout = MAX_SCHEDULE_TIMEOUT;
}

/*

li...@kernel.org

unread,
Jan 27, 2015, 11:25:22 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Mikulas Patocka, Mike Snitzer, Zefan Li
From: Mikulas Patocka <mpat...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 9d28eb12447ee08bb5d1e8bb3195cf20e1ecd1c0 upstream.

The shrinker uses gfp flags to indicate what kind of operation can the
driver wait for. If __GFP_IO flag is present, the driver can wait for
block I/O operations, if __GFP_FS flag is present, the driver can wait on
operations involving the filesystem.

dm-bufio tested for __GFP_IO. However, dm-bufio can run on a loop block
device that makes calls into the filesystem. If __GFP_IO is present and
__GFP_FS isn't, dm-bufio could still block on filesystem operations if it
runs on a loop block device.

The change from __GFP_IO to __GFP_FS supposedly fixes one observed (though
unreproducible) deadlock involving dm-bufio and loop device.

Signed-off-by: Mikulas Patocka <mpat...@redhat.com>
Signed-off-by: Mike Snitzer <sni...@redhat.com>
[lizf: Backported to 3.4:
- drop changes to dm_bufio_shrink_scan() and dm_bufio_shrink_count()
- change __GFP_IO to __GFP_FS in shrink()]
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/md/dm-bufio.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 5abc8d6..535b09c 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -1379,9 +1379,9 @@ static void drop_buffers(struct dm_bufio_client *c)

/*
* Test if the buffer is unused and too old, and commit it.
- * At if noio is set, we must not do any I/O because we hold
- * dm_bufio_clients_lock and we would risk deadlock if the I/O gets rerouted to
- * different bufio client.
+ * And if GFP_NOFS is used, we must not do any I/O because we hold
+ * dm_bufio_clients_lock and we would risk deadlock if the I/O gets
+ * rerouted to different bufio client.
*/
static int __cleanup_old_buffer(struct dm_buffer *b, gfp_t gfp,
unsigned long max_jiffies)
@@ -1389,7 +1389,7 @@ static int __cleanup_old_buffer(struct dm_buffer *b, gfp_t gfp,
if (jiffies - b->last_accessed < max_jiffies)
return 1;

- if (!(gfp & __GFP_IO)) {
+ if (!(gfp & __GFP_FS)) {
if (test_bit(B_READING, &b->state) ||
test_bit(B_WRITING, &b->state) ||
test_bit(B_DIRTY, &b->state))
@@ -1428,7 +1428,7 @@ static int shrink(struct shrinker *shrinker, struct shrink_control *sc)
unsigned long r;
unsigned long nr_to_scan = sc->nr_to_scan;

- if (sc->gfp_mask & __GFP_IO)
+ if (sc->gfp_mask & __GFP_FS)
dm_bufio_lock(c);
else if (!dm_bufio_trylock(c))
return !nr_to_scan ? 0 : -1;

li...@kernel.org

unread,
Jan 27, 2015, 11:25:28 PM1/27/15
to sta...@vger.kernel.org, linux-...@vger.kernel.org, Heinz Mauelshagen, Dan Carpenter, Mike Snitzer, Zefan Li
From: Heinz Mauelshagen <hei...@redhat.com>

3.4.106-rc1 review patch. If anyone has any objections, please let me know.

------------------


commit 40d43c4b4cac4c2647bf07110d7b07d35f399a84 upstream.

The dm-raid superblock (struct dm_raid_superblock) is padded to 512
bytes and that size is being used to read it in from the metadata
device into one preallocated page.

Reading or writing this on a 512-byte sector device works fine but on
a 4096-byte sector device this fails.

Set the dm-raid superblock's size to the logical block size of the
metadata device, because IO at that size is guaranteed too work. Also
add a size check to avoid silent partial metadata loss in case the
superblock should ever grow past the logical block size or PAGE_SIZE.

[includes pointer math fix from Dan Carpenter]
Reported-by: "Liuhua Wang" <lw...@suse.com>
Signed-off-by: Heinz Mauelshagen <hei...@redhat.com>
Signed-off-by: Dan Carpenter <dan.ca...@oracle.com>
Signed-off-by: Mike Snitzer <sni...@redhat.com>
Signed-off-by: Zefan Li <liz...@huawei.com>
---
drivers/md/dm-raid.c | 11 +++++++----
1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/md/dm-raid.c b/drivers/md/dm-raid.c
index ead5ca9..5dea02c 100644
--- a/drivers/md/dm-raid.c
+++ b/drivers/md/dm-raid.c
@@ -592,8 +592,7 @@ struct dm_raid_superblock {
__le32 layout;
__le32 stripe_sectors;

- __u8 pad[452]; /* Round struct to 512 bytes. */
- /* Always set to 0 when writing. */
+ /* Remainder of a logical block is zero-filled when writing (see super_sync()). */
} __packed;

static int read_disk_sb(struct md_rdev *rdev, int size)
@@ -628,7 +627,7 @@ static void super_sync(struct mddev *mddev, struct md_rdev *rdev)
if ((r->raid_disk >= 0) && test_bit(Faulty, &r->flags))
failed_devices |= (1ULL << r->raid_disk);

- memset(sb, 0, sizeof(*sb));
+ memset(sb + 1, 0, rdev->sb_size - sizeof(*sb));

sb->magic = cpu_to_le32(DM_RAID_MAGIC);
sb->features = cpu_to_le32(0); /* No features yet */
@@ -663,7 +662,11 @@ static int super_load(struct md_rdev *rdev, struct md_rdev *refdev)
uint64_t events_sb, events_refsb;

rdev->sb_start = 0;
- rdev->sb_size = sizeof(*sb);
+ rdev->sb_size = bdev_logical_block_size(rdev->meta_bdev);
+ if (rdev->sb_size < sizeof(*sb) || rdev->sb_size > PAGE_SIZE) {
+ DMERR("superblock size of a logical block is no longer valid");
+ return -EINVAL;
+ }

ret = read_disk_sb(rdev, rdev->sb_size);
if (ret)
It is loading more messages.
0 new messages