[ 007/180] futex: Fix uninterruptible loop due to gate

Willy Tarreau

unread,

Oct 1, 2012, 6:59:30 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Hugh Dickins, Linus Torvalds, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Hugh Dickins <hu...@google.com>

commit e6780f7243eddb133cc20ec37fa69317c218b709 upstream.

It was found (by Sasha) that if you use a futex located in the gate
area we get stuck in an uninterruptible infinite loop, much like the
ZERO_PAGE issue.

While looking at this problem, PeterZ realized you'll get into similar
trouble when hitting any install_special_pages() mapping. And are there
still drivers setting up their own special mmaps without page->mapping,
and without special VM or pte flags to make get_user_pages fail?

In most cases, if page->mapping is NULL, we do not need to retry at all:
Linus points out that even /proc/sys/vm/drop_caches poses no problem,
because it ends up using remove_mapping(), which takes care not to
interfere when the page reference count is raised.

But there is still one case which does need a retry: if memory pressure
called shmem_writepage in between get_user_pages_fast dropping page
table lock and our acquiring page lock, then the page gets switched from
filecache to swapcache (and ->mapping set to NULL) whatever the refcount.
Fault it back in to get the page->mapping needed for key->shared.inode.

Reported-by: Sasha Levin <levins...@gmail.com>
Signed-off-by: Hugh Dickins <hu...@google.com>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
[PG: 2.6.34 variable is page, not page_head, since it doesn't have a5b338f2]
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---
kernel/futex.c | 28 ++++++++++++++++++++--------
1 files changed, 20 insertions(+), 8 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index fb98c9f..0b06da1 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -264,17 +264,29 @@ again:

page = compound_head(page);
lock_page(page);
+
+ /*
+ * If page->mapping is NULL, then it cannot be a PageAnon
+ * page; but it might be the ZERO_PAGE or in the gate area or
+ * in a special mapping (all cases which we are happy to fail);
+ * or it may have been a good file page when get_user_pages_fast
+ * found it, but truncated or holepunched or subjected to
+ * invalidate_complete_page2 before we got the page lock (also
+ * cases which we are happy to fail). And we hold a reference,
+ * so refcount care in invalidate_complete_page's remove_mapping
+ * prevents drop_caches from setting mapping to NULL beneath us.
+ *
+ * The case we do have to guard against is when memory pressure made
+ * shmem_writepage move it from filecache to swapcache beneath us:
+ * an unlikely race, but we do need to retry for page->mapping.
+ */
if (!page->mapping) {
+ int shmem_swizzled = PageSwapCache(page);
unlock_page(page);
put_page(page);
- /*
- * ZERO_PAGE pages don't have a mapping. Avoid a busy loop
- * trying to find one. RW mapping would have COW'd (and thus
- * have a mapping) so this page is RO and won't ever change.
- */
- if ((page == ZERO_PAGE(address)))
- return -EFAULT;
- goto again;
+ if (shmem_swizzled)
+ goto again;
+ return -EFAULT;
}

/*
--
1.7.2.1.45.g54fbc

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majo...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Willy Tarreau

unread,

Oct 1, 2012, 6:59:36 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Willy Tarreau, Jiri Slaby, Bjorn Helgaas, Jesse Barnes

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Willy Tarreau <w...@1wt.eu>

Initial stable commit : 2215d91091c465fd58da7814d1c10e09ac2d8307

This patch backported into 2.6.32.55 is enabled when CONFIG_AMD_NB is set,
but this config option does not exist in 2.6.32, it was called CONFIG_K8_NB,
so the fix was never applied. Some other changes were needed to make it work.
first, the correct include file name was asm/k8.h and not asm/amd_nb.h, and
second, amd_get_mmconfig_range() is needed and was merged by previous patch.

Thanks to Jiri Slabi who reported the issue and diagnosed all the dependencies.

Signed-off-by: Willy Tarreau <w...@1wt.eu>
Cc: Jiri Slaby <jsl...@suse.cz>
Cc: Bjorn Helgaas <bhel...@google.com>
Cc: Jesse Barnes <jba...@virtuousgeek.org>
---
arch/x86/pci/amd_bus.c | 1 +
drivers/pnp/quirks.c | 6 +++---
2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index cb34763..aae9931 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -3,6 +3,7 @@
#include <linux/topology.h>
#include <linux/cpu.h>
#include <asm/pci_x86.h>
+#include <asm/k8.h>

#ifdef CONFIG_X86_64
#include <asm/pci-direct.h>
diff --git a/drivers/pnp/quirks.c b/drivers/pnp/quirks.c
index eb39d26..253996c 100644
--- a/drivers/pnp/quirks.c
+++ b/drivers/pnp/quirks.c
@@ -300,9 +300,9 @@ static void quirk_system_pci_resources(struct pnp_dev *dev)
}
}

-#ifdef CONFIG_AMD_NB
+#ifdef CONFIG_K8_NB

-#include <asm/amd_nb.h>
+#include <asm/k8.h>

static void quirk_amd_mmconfig_area(struct pnp_dev *dev)
{
@@ -366,7 +366,7 @@ static struct pnp_fixup pnp_fixups[] = {
/* PnP resources that might overlap PCI BARs */
{"PNP0c01", quirk_system_pci_resources},
{"PNP0c02", quirk_system_pci_resources},
-#ifdef CONFIG_AMD_NB
+#ifdef CONFIG_K8_NB
{"PNP0c01", quirk_amd_mmconfig_area},
#endif
{""}

Willy Tarreau

unread,

Oct 1, 2012, 6:59:37 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Robert Richter, Greg KH, Junxiao Bi, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Junxiao Bi <junxi...@oracle.com>

If one kernel path is using KM_USER0 slot and is interrupted by
the oprofile nmi, then in copy_from_user_nmi(), the KM_USER0 slot
will be overwrite and cleared to zero at last, when the control
return to the original kernel path, it will access an invalid
virtual address and trigger a crash.

Cc: Robert Richter <robert....@amd.com>
Cc: Greg KH <gre...@linuxfoundation.org>
Cc: sta...@vger.kernel.org
Signed-off-by: Junxiao Bi <junxi...@oracle.com>

[WT: According to Junxiao and Robert, this patch is needed for stable kernels
which include a backport of a0e3e70243f5b270bc3eca718f0a9fa5e6b8262e without
3e4d3af501cccdc8a8cca41bdbe57d54ad7e7e73, but there is no exact equivalent in
mainline]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/oprofile/backtrace.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/oprofile/backtrace.c b/arch/x86/oprofile/backtrace.c
index 829edf0..b50a280 100644
--- a/arch/x86/oprofile/backtrace.c
+++ b/arch/x86/oprofile/backtrace.c
@@ -71,9 +71,9 @@ copy_from_user_nmi(void *to, const void __user *from, unsigned long n)
offset = addr & (PAGE_SIZE - 1);
size = min(PAGE_SIZE - offset, n - len);

- map = kmap_atomic(page, KM_USER0);
+ map = kmap_atomic(page, KM_NMI);
memcpy(to, map+offset, size);
- kunmap_atomic(map, KM_USER0);
+ kunmap_atomic(map, KM_NMI);
put_page(page);

len += size;

Willy Tarreau

unread,

Oct 1, 2012, 6:59:45 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Alex Williamson, Marcelo Tosatti, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Alex Williamson <alex.wi...@redhat.com>

commit 423873736b78f549fbfa2f715f2e4de7e6c5e1e9 upstream

This option has no users and it exposes a security hole that we
can allow devices to be assigned without iommu protection. Make
KVM_DEV_ASSIGN_ENABLE_IOMMU a mandatory option.

Signed-off-by: Alex Williamson <alex.wi...@redhat.com>
Signed-off-by: Marcelo Tosatti <mtos...@redhat.com>
[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

virt/kvm/kvm_main.c | 18 +++++++++---------
1 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 4f3434f..77288e2 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -582,6 +582,9 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,
struct kvm_assigned_dev_kernel *match;
struct pci_dev *dev;

+ if (!(assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU))
+ return -EINVAL;
+
down_read(&kvm->slots_lock);
mutex_lock(&kvm->lock);

@@ -635,16 +638,14 @@ static int kvm_vm_ioctl_assign_device(struct kvm *kvm,

list_add(&match->list, &kvm->arch.assigned_dev_head);

- if (assigned_dev->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU) {
- if (!kvm->arch.iommu_domain) {
- r = kvm_iommu_map_guest(kvm);
- if (r)
- goto out_list_del;
- }
- r = kvm_assign_device(kvm, match);
+ if (!kvm->arch.iommu_domain) {
+ r = kvm_iommu_map_guest(kvm);
if (r)
goto out_list_del;
}
+ r = kvm_assign_device(kvm, match);
+ if (r)
+ goto out_list_del;

out:
mutex_unlock(&kvm->lock);
@@ -683,8 +684,7 @@ static int kvm_vm_ioctl_deassign_device(struct kvm *kvm,
goto out;
}

- if (match->flags & KVM_DEV_ASSIGN_ENABLE_IOMMU)
- kvm_deassign_device(kvm, match);
+ kvm_deassign_device(kvm, match);

kvm_free_assigned_device(kvm, match);

Willy Tarreau

unread,

Oct 1, 2012, 6:59:45 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Xiaotian Feng, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Xiaotian Feng <df...@redhat.com>

commit 00bff392c81e4fb1901e5160fdd5afdb2546a6ab upstream.

The current tty_audit_add_data code:

do {
size_t run;

run = N_TTY_BUF_SIZE - buf->valid;
if (run > size)
run = size;
memcpy(buf->data + buf->valid, data, run);
buf->valid += run;
data += run;
size -= run;
if (buf->valid == N_TTY_BUF_SIZE)
tty_audit_buf_push_current(buf);
} while (size != 0);

If the current buffer is full, kernel will then call tty_audit_buf_push_current
to empty the buffer. But if we disabled audit at the same time, tty_audit_buf_push()
returns immediately if audit_enabled is zero. Without emptying the buffer.
With obvious effect on tty_audit_add_data() that ends up spinning in that loop,
copying 0 bytes at each iteration and attempting to push each time without any effect.
Holding the lock all along.

Suggested-by: Alexander Viro <av...@redhat.com>
Signed-off-by: Xiaotian Feng <df...@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gre...@suse.de>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/tty_audit.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/char/tty_audit.c b/drivers/char/tty_audit.c
index ac16fbe..dd4691a 100644
--- a/drivers/char/tty_audit.c
+++ b/drivers/char/tty_audit.c
@@ -94,8 +94,10 @@ static void tty_audit_buf_push(struct task_struct *tsk, uid_t loginuid,
{
if (buf->valid == 0)
return;
- if (audit_enabled == 0)
+ if (audit_enabled == 0) {
+ buf->valid = 0;
return;
+ }
tty_audit_log("tty", tsk, loginuid, sessionid, buf->major, buf->minor,
buf->data, buf->valid);
buf->valid = 0;

Willy Tarreau

unread,

Oct 1, 2012, 6:59:54 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Benjamin Herrenschmidt, Jeremy Kerr, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Benjamin Herrenschmidt <be...@kernel.crashing.org>

commit 78c5c68a4cf4329d17abfa469345ddf323d4fd62 upstream.

The code for "powersurge" SMP would kick in and cause a crash
at boot due to the lack of a NULL test.

Adam Conrad reports that the 3.2 kernel, with CONFIG_SMP=y, will not
boot on an OldWorld G3; we're unconditionally writing to psurge_start,
but this is only set on powersurge machines.

Signed-off-by: Benjamin Herrenschmidt <be...@kernel.crashing.org>
Signed-off-by: Jeremy Kerr <jerem...@canonical.com>
Reported-by: Adam Conrad <adco...@ubuntu.com>
Tested-by: Adam Conrad <adco...@ubuntu.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/powerpc/platforms/powermac/smp.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/smp.c b/arch/powerpc/platforms/powermac/smp.c
index b40c22d..7f66d0c 100644
--- a/arch/powerpc/platforms/powermac/smp.c
+++ b/arch/powerpc/platforms/powermac/smp.c
@@ -402,7 +402,7 @@ static struct irqaction psurge_irqaction = {

static void __init smp_psurge_setup_cpu(int cpu_nr)
{
- if (cpu_nr != 0)
+ if (cpu_nr != 0 || !psurge_start)
return;

/* reset the entry point so if we get another intr we won't

Willy Tarreau

unread,

Oct 1, 2012, 7:00:02 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jonghwan Choi, Serge Hallyn, James Morris, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jonghwan Choi <jhbir...@samsung.com>

commit 51b79bee627d526199b2f6a6bef8ee0c0739b6d1 upstream

Add missing "personality.h"
security/commoncap.c: In function 'cap_bprm_set_creds':
security/commoncap.c:510: error: 'PER_CLEAR_ON_SETID' undeclared (first use in this function)
security/commoncap.c:510: error: (Each undeclared identifier is reported only once
security/commoncap.c:510: error: for each function it appears in.)

Signed-off-by: Jonghwan Choi <jhbir...@samsung.com>
Acked-by: Serge Hallyn <serge....@canonical.com>
Signed-off-by: James Morris <james.l...@oracle.com>
[dannf: adjusted to apply to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

security/commoncap.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/security/commoncap.c b/security/commoncap.c
index 30972d6..ee9d623 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -27,6 +27,7 @@
#include <linux/sched.h>
#include <linux/prctl.h>
#include <linux/securebits.h>
+#include <linux/personality.h>

/*
* If a non-root user executes a setuid-root binary in

Willy Tarreau

unread,

Oct 1, 2012, 7:00:15 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Sasha Levin, Thomas Gleixner, Prarit Bhargava, John Stultz, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: John Stultz <john....@linaro.org>

This is a backport of 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d

This should have been backported when it was commited, but I
mistook the problem as requiring the ntp_lock changes
that landed in 3.4 in order for it to occur.

Unfortunately the same issue can happen (with only one cpu)
as follows:
do_adjtimex()
write_seqlock_irq(&xtime_lock);
process_adjtimex_modes()
process_adj_status()
ntp_start_leap_timer()
hrtimer_start()
hrtimer_reprogram()
tick_program_event()
clockevents_program_event()
ktime_get()
seq = req_seqbegin(xtime_lock); [DEADLOCK]

This deadlock will no always occur, as it requires the
leap_timer to force a hrtimer_reprogram which only happens
if its set and there's no sooner timer to expire.

NOTE: This patch, being faithful to the original commit,
introduces a bug (we don't update wall_to_monotonic),
which will be resovled by backporting a following fix.

Original commit message below:

Since commit 7dffa3c673fbcf835cd7be80bb4aec8ad3f51168 the ntp
subsystem has used an hrtimer for triggering the leapsecond
adjustment. However, this can cause a potential livelock.

Thomas diagnosed this as the following pattern:
CPU 0 CPU 1
do_adjtimex()
spin_lock_irq(&ntp_lock);
process_adjtimex_modes(); timer_interrupt()
process_adj_status(); do_timer()
ntp_start_leap_timer(); write_lock(&xtime_lock);
hrtimer_start(); update_wall_time();
hrtimer_reprogram(); ntp_tick_length()
tick_program_event() spin_lock(&ntp_lock);
clockevents_program_event()
ktime_get()
seq = req_seqbegin(xtime_lock);

This patch tries to avoid the problem by reverting back to not using
an hrtimer to inject leapseconds, and instead we handle the leapsecond
processing in the second_overflow() function.

The downside to this change is that on systems that support highres
timers, the leap second processing will occur on a HZ tick boundary,
(ie: ~1-10ms, depending on HZ) after the leap second instead of
possibly sooner (~34us in my tests w/ x86_64 lapic).

This patch applies on top of tip/timers/core.

CC: Sasha Levin <levins...@gmail.com>
CC: Thomas Gleixner <tg...@linutronix.de>
Reported-by: Sasha Levin <levins...@gmail.com>
Diagnoised-by: Thomas Gleixner <tg...@linutronix.de>
Tested-by: Sasha Levin <levins...@gmail.com>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Linux Kernel <linux-...@vger.kernel.org>
Signed-off-by: John Stultz <john....@linaro.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/timex.h | 2 +-
kernel/time/ntp.c | 122 +++++++++++++++------------------------------
kernel/time/timekeeping.c | 12 +---
3 files changed, 44 insertions(+), 92 deletions(-)

diff --git a/include/linux/timex.h b/include/linux/timex.h
index e6967d1..3b587b4 100644
--- a/include/linux/timex.h
+++ b/include/linux/timex.h
@@ -271,7 +271,7 @@ static inline int ntp_synced(void)
/* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
extern u64 tick_length;

-extern void second_overflow(void);
+extern int second_overflow(unsigned long secs);
extern void update_ntp_one_tick(void);
extern int do_adjtimex(struct timex *);

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 4800f93..dc76c9a 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -28,8 +28,6 @@ unsigned long tick_nsec;
u64 tick_length;
static u64 tick_length_base;

-static struct hrtimer leap_timer;
-
#define MAX_TICKADJ 500LL /* usecs */
#define MAX_TICKADJ_SCALED \
(((MAX_TICKADJ * NSEC_PER_USEC) << NTP_SCALE_SHIFT) / NTP_INTERVAL_FREQ)
@@ -180,60 +178,60 @@ void ntp_clear(void)
}

/*
- * Leap second processing. If in leap-insert state at the end of the
- * day, the system clock is set back one second; if in leap-delete
- * state, the system clock is set ahead one second.
+ * this routine handles the overflow of the microsecond field
+ *
+ * The tricky bits of code to handle the accurate clock support
+ * were provided by Dave Mills (Mi...@UDEL.EDU) of NTP fame.
+ * They were originally developed for SUN and DEC kernels.
+ * All the kudos should go to Dave for this stuff.
+ *
+ * Also handles leap second processing, and returns leap offset
*/
-static enum hrtimer_restart ntp_leap_second(struct hrtimer *timer)
+int second_overflow(unsigned long secs)
{
- enum hrtimer_restart res = HRTIMER_NORESTART;
-
- write_seqlock(&xtime_lock);
+ int leap = 0;
+ s64 delta;

+ /*
+ * Leap second processing. If in leap-insert state at the end of the
+ * day, the system clock is set back one second; if in leap-delete
+ * state, the system clock is set ahead one second.
+ */
switch (time_state) {
case TIME_OK:
+ if (time_status & STA_INS)
+ time_state = TIME_INS;
+ else if (time_status & STA_DEL)
+ time_state = TIME_DEL;
break;
case TIME_INS:
- timekeeping_leap_insert(-1);
- time_state = TIME_OOP;
- printk(KERN_NOTICE
- "Clock: inserting leap second 23:59:60 UTC\n");
- hrtimer_add_expires_ns(&leap_timer, NSEC_PER_SEC);
- res = HRTIMER_RESTART;
+ if (secs % 86400 == 0) {
+ leap = -1;
+ time_state = TIME_OOP;
+ printk(KERN_NOTICE
+ "Clock: inserting leap second 23:59:60 UTC\n");
+ }
break;
case TIME_DEL:
- timekeeping_leap_insert(1);
- time_tai--;
- time_state = TIME_WAIT;
- printk(KERN_NOTICE
- "Clock: deleting leap second 23:59:59 UTC\n");
+ if ((secs + 1) % 86400 == 0) {
+ leap = 1;
+ time_tai--;
+ time_state = TIME_WAIT;
+ printk(KERN_NOTICE
+ "Clock: deleting leap second 23:59:59 UTC\n");
+ }
break;
case TIME_OOP:
time_tai++;
time_state = TIME_WAIT;
- /* fall through */
+ break;
+
case TIME_WAIT:
if (!(time_status & (STA_INS | STA_DEL)))
time_state = TIME_OK;
break;
}

- write_sequnlock(&xtime_lock);
-
- return res;
-}
-
-/*
- * this routine handles the overflow of the microsecond field
- *
- * The tricky bits of code to handle the accurate clock support
- * were provided by Dave Mills (Mi...@UDEL.EDU) of NTP fame.
- * They were originally developed for SUN and DEC kernels.
- * All the kudos should go to Dave for this stuff.
- */
-void second_overflow(void)
-{
- s64 delta;

/* Bump the maxerror field */
time_maxerror += MAXFREQ / NSEC_PER_USEC;
@@ -253,23 +251,25 @@ void second_overflow(void)
tick_length += delta;

if (!time_adjust)
- return;
+ goto out;

if (time_adjust > MAX_TICKADJ) {
time_adjust -= MAX_TICKADJ;
tick_length += MAX_TICKADJ_SCALED;
- return;
+ goto out;
}

if (time_adjust < -MAX_TICKADJ) {
time_adjust += MAX_TICKADJ;
tick_length -= MAX_TICKADJ_SCALED;
- return;
+ goto out;
}

tick_length += (s64)(time_adjust * NSEC_PER_USEC / NTP_INTERVAL_FREQ)
<< NTP_SCALE_SHIFT;
time_adjust = 0;
+out:
+ return leap;
}

#ifdef CONFIG_GENERIC_CMOS_UPDATE
@@ -331,27 +331,6 @@ static void notify_cmos_timer(void)
static inline void notify_cmos_timer(void) { }
#endif

-/*
- * Start the leap seconds timer:
- */
-static inline void ntp_start_leap_timer(struct timespec *ts)
-{
- long now = ts->tv_sec;
-
- if (time_status & STA_INS) {
- time_state = TIME_INS;
- now += 86400 - now % 86400;
- hrtimer_start(&leap_timer, ktime_set(now, 0), HRTIMER_MODE_ABS);
-
- return;
- }
-
- if (time_status & STA_DEL) {
- time_state = TIME_DEL;
- now += 86400 - (now + 1) % 86400;
- hrtimer_start(&leap_timer, ktime_set(now, 0), HRTIMER_MODE_ABS);
- }
-}

/*
* Propagate a new txc->status value into the NTP state:
@@ -374,22 +353,6 @@ static inline void process_adj_status(struct timex *txc, struct timespec *ts)
time_status &= STA_RONLY;
time_status |= txc->status & ~STA_RONLY;

- switch (time_state) {
- case TIME_OK:
- ntp_start_leap_timer(ts);
- break;
- case TIME_INS:
- case TIME_DEL:
- time_state = TIME_OK;
- ntp_start_leap_timer(ts);
- case TIME_WAIT:
- if (!(time_status & (STA_INS | STA_DEL)))
- time_state = TIME_OK;
- break;
- case TIME_OOP:
- hrtimer_restart(&leap_timer);
- break;
- }
}
/*
* Called with the xtime lock held, so we can access and modify
@@ -469,9 +432,6 @@ int do_adjtimex(struct timex *txc)
(txc->tick < 900000/USER_HZ ||
txc->tick > 1100000/USER_HZ))
return -EINVAL;
-
- if (txc->modes & ADJ_STATUS && time_state != TIME_OK)
- hrtimer_cancel(&leap_timer);
}

getnstimeofday(&ts);
@@ -549,6 +509,4 @@ __setup("ntp_tick_adj=", ntp_tick_adj_setup);
void __init ntp_init(void)
{
ntp_clear();
- hrtimer_init(&leap_timer, CLOCK_REALTIME, HRTIMER_MODE_ABS);
- leap_timer.function = ntp_leap_second;
}
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 4a71cff..00e2fae 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -183,14 +183,6 @@ void update_xtime_cache(u64 nsec)
ACCESS_ONCE(xtime_cache) = ts;
}

-/* must hold xtime_lock */
-void timekeeping_leap_insert(int leapsecond)
-{
- xtime.tv_sec += leapsecond;
- wall_to_monotonic.tv_sec -= leapsecond;
- update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
-}
-
#ifdef CONFIG_GENERIC_TIME

/**
@@ -783,9 +775,11 @@ void update_wall_time(void)

timekeeper.xtime_nsec += timekeeper.xtime_interval;
if (timekeeper.xtime_nsec >= nsecps) {
+ int leap;
timekeeper.xtime_nsec -= nsecps;
xtime.tv_sec++;
- second_overflow();
+ leap = second_overflow(xtime.tv_sec);
+ xtime.tv_sec += leap;
}

raw_time.tv_nsec += timekeeper.raw_interval;

Willy Tarreau

unread,

Oct 1, 2012, 7:00:22 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Mel Gorman, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgo...@suse.de>

commit eb3979f64d25120d60b9e761a4c58f70b1a02f86 upstream.

Distribution kernel maintainers routinely backport fixes for users that
were deemed important but not "something critical" as defined by the
rules. To users of these kernels they are very serious and failing to fix
them reduces the value of -stable.

The problem is that the patches fixing these issues are often subtle and
prone to regressions in other ways and need greater care and attention.
To combat this, these "serious" backports should have a higher barrier
to entry.

This patch relaxes the rules to allow a distribution maintainer to merge
to -stable a backported patch or small series that fixes a "serious"
user-visible performance issue. They should include additional information on
the user-visible bug affected and a link to the bugzilla entry if available.
The same rules about the patch being already in mainline still apply.

Signed-off-by: Mel Gorman <mgo...@suse.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

Documentation/stable_kernel_rules.txt | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/Documentation/stable_kernel_rules.txt b/Documentation/stable_kernel_rules.txt
index e6e482f..3c9d7ac 100644
--- a/Documentation/stable_kernel_rules.txt
+++ b/Documentation/stable_kernel_rules.txt
@@ -12,6 +12,12 @@ Rules on what kind of patches are accepted, and which ones are not, into the
marked CONFIG_BROKEN), an oops, a hang, data corruption, a real
security issue, or some "oh, that's not good" issue. In short, something
critical.
+ - Serious issues as reported by a user of a distribution kernel may also
+ be considered if they fix a notable performance or interactivity issue.
+ As these fixes are not as obvious and have a higher risk of a subtle
+ regression they should only be submitted by a distribution kernel
+ maintainer and include an addendum linking to a bugzilla entry if it
+ exists and additional information on the user-visible impact.
- New device IDs and quirks are also accepted.
- No "theoretical race condition" issues, unless an explanation of how the
race can be exploited is also provided.

Willy Tarreau

unread,

Oct 1, 2012, 7:00:34 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <ty...@mit.edu>

commit e6d4947b12e8ad947add1032dd754803c6004824 upstream.

If the CPU supports a hardware random number generator, use it in
xfer_secondary_pool(), where it will significantly improve things and
where we can afford it.

Also, remove the use of the arch-specific rng in
add_timer_randomness(), since the call is significantly slower than
get_cycles(), and we're much better off using it in
xfer_secondary_pool() anyway.

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 25 ++++++++++++++++---------
1 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index cfcb31d..b5b16f1 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -254,6 +254,7 @@
#include <linux/cryptohash.h>
#include <linux/fips.h>
#include <linux/ptrace.h>
+#include <linux/kmemcheck.h>

#ifdef CONFIG_GENERIC_HARDIRQS
# include <linux/irq.h>
@@ -702,11 +703,7 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
goto out;

sample.jiffies = jiffies;
-
- /* Use arch random value, fall back to cycles */
- if (!arch_get_random_int(&sample.cycles))
- sample.cycles = get_cycles();
-
+ sample.cycles = get_cycles();
sample.num = num;
mix_pool_bytes(&input_pool, &sample, sizeof(sample), NULL);

@@ -838,7 +835,11 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
*/
static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
{
- __u32 tmp[OUTPUT_POOL_WORDS];
+ union {
+ __u32 tmp[OUTPUT_POOL_WORDS];
+ long hwrand[4];
+ } u;
+ int i;

if (r->pull && r->entropy_count < nbytes * 8 &&
r->entropy_count < r->poolinfo->POOLBITS) {
@@ -849,17 +850,23 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
/* pull at least as many as BYTES as wakeup BITS */
bytes = max_t(int, bytes, random_read_wakeup_thresh / 8);
/* but never more than the buffer size */
- bytes = min_t(int, bytes, sizeof(tmp));
+ bytes = min_t(int, bytes, sizeof(u.tmp));

DEBUG_ENT("going to reseed %s with %d bits "
"(%d of %d requested)\n",
r->name, bytes * 8, nbytes * 8, r->entropy_count);

- bytes = extract_entropy(r->pull, tmp, bytes,
+ bytes = extract_entropy(r->pull, u.tmp, bytes,
random_read_wakeup_thresh / 8, rsvd);
- mix_pool_bytes(r, tmp, bytes, NULL);
+ mix_pool_bytes(r, u.tmp, bytes, NULL);
credit_entropy_bits(r, bytes*8);
}
+ kmemcheck_mark_initialized(&u.hwrand, sizeof(u.hwrand));
+ for (i = 0; i < 4; i++)
+ if (arch_get_random_long(&u.hwrand[i]))
+ break;
+ if (i)
+ mix_pool_bytes(r, &u.hwrand, sizeof(u.hwrand), 0);
}

/*

Willy Tarreau

unread,

Oct 1, 2012, 7:00:43 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Thomas Gleixner, Peter Zijlstra, Prarit Bhargava, John Stultz, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tg...@linutronix.de>

This is a backport of 196951e91262fccda81147d2bcf7fdab08668b40

We need to update the base offsets from this code and we need to do
that under base->lock. Move the lock held region around the
ktime_get() calls. The ktime_get() calls are going to be replaced with
a function which gets the time and the offsets atomically.

Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Reviewed-by: Ingo Molnar <mi...@kernel.org>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Acked-by: Prarit Bhargava <pra...@redhat.com>
Cc: sta...@vger.kernel.org
Signed-off-by: John Stultz <john...@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-6-gi...@us.ibm.com
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>

Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Linux Kernel <linux-...@vger.kernel.org>

Signed-off-by: John Stultz <john...@us.ibm.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/hrtimer.c | 5 +++--
1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index c4acec7..8ba6d31 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -1263,11 +1263,10 @@ void hrtimer_interrupt(struct clock_event_device *dev)
cpu_base->nr_events++;
dev->next_event.tv64 = KTIME_MAX;

+ spin_lock(&cpu_base->lock);
entry_time = now = ktime_get();
retry:
expires_next.tv64 = KTIME_MAX;
-
- spin_lock(&cpu_base->lock);
/*
* We set expires_next to KTIME_MAX here with cpu_base->lock
* held to prevent that a timer is enqueued in our queue via
@@ -1342,6 +1341,7 @@ retry:
* interrupt routine. We give it 3 attempts to avoid
* overreacting on some spurious event.
*/
+ spin_lock(&cpu_base->lock);
now = ktime_get();
cpu_base->nr_retries++;
if (++retries < 3)
@@ -1354,6 +1354,7 @@ retry:
*/
cpu_base->nr_hangs++;
cpu_base->hang_detected = 1;
+ spin_unlock(&cpu_base->lock);
delta = ktime_sub(now, entry_time);
if (delta.tv64 > cpu_base->max_hang_time.tv64)
cpu_base->max_hang_time = delta;

Willy Tarreau

unread,

Oct 1, 2012, 7:00:54 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Sasha Levin, john...@us.ibm.com, Thomas Gleixner, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Sasha Levin <levins...@gmail.com>

commit a078c6d0e6288fad6d83fb6d5edd91ddb7b6ab33 upstream.

'long secs' is passed as divisor to div_s64, which accepts a 32bit
divisor. On 64bit machines that value is trimmed back from 8 bytes
back to 4, causing a divide by zero when the number is bigger than
(1 << 32) - 1 and all 32 lower bits are 0.

Use div64_long() instead.

Signed-off-by: Sasha Levin <levins...@gmail.com>
Cc: john...@us.ibm.com
Link: http://lkml.kernel.org/r/1331829374-31543-2-git-...@gmail.com
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
[WT: div64_long() does not exist on 2.6.32 and needs a deeper backport than
desired. Instead, address the issue by controlling that the divisor is
correct for use as an s32 divisor]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/time/ntp.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index c1c36a2..26472a7 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -106,7 +106,7 @@ static inline s64 ntp_update_offset_fll(s64 offset64, long secs)
{
time_status &= ~STA_MODE;

- if (secs < MINSEC)
+ if ((s32)secs < MINSEC)
return 0;

if (!(time_status & STA_FLL) && (secs <= MAXSEC))

Willy Tarreau

unread,

Oct 1, 2012, 7:01:01 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Ben Hutchings, David S. Miller, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Ben Hutchings <b...@decadent.org.uk>

commit e0bccd315db0c2f919e7fcf9cb60db21d9986f52 upstream

Define some constant offsets for CALL_REQUEST based on the description
at <http://www.techfest.com/networking/wan/x25plp.htm> and the
definition of ROSE as using 10-digit (5-byte) addresses. Use them
consistently. Validate all implicit and explicit facilities lengths.
Validate the address length byte rather than either trusting or
assuming its value.

Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
Signed-off-by: David S. Miller <da...@davemloft.net>

[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/net/rose.h | 8 ++++-
net/rose/af_rose.c | 8 ++--
net/rose/rose_loopback.c | 13 ++++++-
net/rose/rose_route.c | 20 +++++++----
net/rose/rose_subr.c | 91 +++++++++++++++++++++++++++++-----------------
5 files changed, 93 insertions(+), 47 deletions(-)

diff --git a/include/net/rose.h b/include/net/rose.h
index 5ba9f02..555dd19 100644
--- a/include/net/rose.h
+++ b/include/net/rose.h
@@ -14,6 +14,12 @@

#define ROSE_MIN_LEN 3

+#define ROSE_CALL_REQ_ADDR_LEN_OFF 3
+#define ROSE_CALL_REQ_ADDR_LEN_VAL 0xAA /* each address is 10 digits */
+#define ROSE_CALL_REQ_DEST_ADDR_OFF 4
+#define ROSE_CALL_REQ_SRC_ADDR_OFF 9
+#define ROSE_CALL_REQ_FACILITIES_OFF 14
+
#define ROSE_GFI 0x10
#define ROSE_Q_BIT 0x80
#define ROSE_D_BIT 0x40
@@ -214,7 +220,7 @@ extern void rose_requeue_frames(struct sock *);
extern int rose_validate_nr(struct sock *, unsigned short);
extern void rose_write_internal(struct sock *, int);
extern int rose_decode(struct sk_buff *, int *, int *, int *, int *, int *);
-extern int rose_parse_facilities(unsigned char *, struct rose_facilities_struct *);
+extern int rose_parse_facilities(unsigned char *, unsigned int, struct rose_facilities_struct *);
extern void rose_disconnect(struct sock *, int, int, int);

/* rose_timer.c */
diff --git a/net/rose/af_rose.c b/net/rose/af_rose.c
index 7d188bc..523efbb 100644
--- a/net/rose/af_rose.c
+++ b/net/rose/af_rose.c
@@ -983,7 +983,7 @@ int rose_rx_call_request(struct sk_buff *skb, struct net_device *dev, struct ros
struct sock *make;
struct rose_sock *make_rose;
struct rose_facilities_struct facilities;
- int n, len;
+ int n;

skb->sk = NULL; /* Initially we don't know who it's for */

@@ -992,9 +992,9 @@ int rose_rx_call_request(struct sk_buff *skb, struct net_device *dev, struct ros
*/
memset(&facilities, 0x00, sizeof(struct rose_facilities_struct));

- len = (((skb->data[3] >> 4) & 0x0F) + 1) >> 1;
- len += (((skb->data[3] >> 0) & 0x0F) + 1) >> 1;
- if (!rose_parse_facilities(skb->data + len + 4, &facilities)) {
+ if (!rose_parse_facilities(skb->data + ROSE_CALL_REQ_FACILITIES_OFF,
+ skb->len - ROSE_CALL_REQ_FACILITIES_OFF,
+ &facilities)) {
rose_transmit_clear_request(neigh, lci, ROSE_INVALID_FACILITY, 76);
return 0;
}
diff --git a/net/rose/rose_loopback.c b/net/rose/rose_loopback.c
index 114df6e..37965b8 100644
--- a/net/rose/rose_loopback.c
+++ b/net/rose/rose_loopback.c
@@ -72,9 +72,20 @@ static void rose_loopback_timer(unsigned long param)
unsigned int lci_i, lci_o;

while ((skb = skb_dequeue(&loopback_queue)) != NULL) {
+ if (skb->len < ROSE_MIN_LEN) {
+ kfree_skb(skb);
+ continue;
+ }
lci_i = ((skb->data[0] << 8) & 0xF00) + ((skb->data[1] << 0) & 0x0FF);
frametype = skb->data[2];
- dest = (rose_address *)(skb->data + 4);
+ if (frametype == ROSE_CALL_REQUEST &&
+ (skb->len <= ROSE_CALL_REQ_FACILITIES_OFF ||
+ skb->data[ROSE_CALL_REQ_ADDR_LEN_OFF] !=
+ ROSE_CALL_REQ_ADDR_LEN_VAL)) {
+ kfree_skb(skb);
+ continue;
+ }
+ dest = (rose_address *)(skb->data + ROSE_CALL_REQ_DEST_ADDR_OFF);
lci_o = 0xFFF - lci_i;

skb_reset_transport_header(skb);
diff --git a/net/rose/rose_route.c b/net/rose/rose_route.c
index 08230fa..1646b25 100644
--- a/net/rose/rose_route.c
+++ b/net/rose/rose_route.c
@@ -852,7 +852,7 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25)
unsigned int lci, new_lci;
unsigned char cause, diagnostic;
struct net_device *dev;
- int len, res = 0;
+ int res = 0;
char buf[11];

#if 0
@@ -860,10 +860,17 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25)
return res;
#endif

+ if (skb->len < ROSE_MIN_LEN)
+ return res;
frametype = skb->data[2];
lci = ((skb->data[0] << 8) & 0xF00) + ((skb->data[1] << 0) & 0x0FF);
- src_addr = (rose_address *)(skb->data + 9);
- dest_addr = (rose_address *)(skb->data + 4);
+ if (frametype == ROSE_CALL_REQUEST &&
+ (skb->len <= ROSE_CALL_REQ_FACILITIES_OFF ||
+ skb->data[ROSE_CALL_REQ_ADDR_LEN_OFF] !=
+ ROSE_CALL_REQ_ADDR_LEN_VAL))
+ return res;
+ src_addr = (rose_address *)(skb->data + ROSE_CALL_REQ_SRC_ADDR_OFF);
+ dest_addr = (rose_address *)(skb->data + ROSE_CALL_REQ_DEST_ADDR_OFF);

spin_lock_bh(&rose_neigh_list_lock);
spin_lock_bh(&rose_route_list_lock);
@@ -1001,12 +1008,11 @@ int rose_route_frame(struct sk_buff *skb, ax25_cb *ax25)
goto out;
}

- len = (((skb->data[3] >> 4) & 0x0F) + 1) >> 1;
- len += (((skb->data[3] >> 0) & 0x0F) + 1) >> 1;
-
memset(&facilities, 0x00, sizeof(struct rose_facilities_struct));

- if (!rose_parse_facilities(skb->data + len + 4, &facilities)) {
+ if (!rose_parse_facilities(skb->data + ROSE_CALL_REQ_FACILITIES_OFF,
+ skb->len - ROSE_CALL_REQ_FACILITIES_OFF,
+ &facilities)) {
rose_transmit_clear_request(rose_neigh, lci, ROSE_INVALID_FACILITY, 76);
goto out;
}
diff --git a/net/rose/rose_subr.c b/net/rose/rose_subr.c
index 07bca7d..32e5c9f 100644
--- a/net/rose/rose_subr.c
+++ b/net/rose/rose_subr.c
@@ -141,7 +141,7 @@ void rose_write_internal(struct sock *sk, int frametype)
*dptr++ = ROSE_GFI | lci1;
*dptr++ = lci2;
*dptr++ = frametype;
- *dptr++ = 0xAA;
+ *dptr++ = ROSE_CALL_REQ_ADDR_LEN_VAL;
memcpy(dptr, &rose->dest_addr, ROSE_ADDR_LEN);
dptr += ROSE_ADDR_LEN;
memcpy(dptr, &rose->source_addr, ROSE_ADDR_LEN);
@@ -245,12 +245,16 @@ static int rose_parse_national(unsigned char *p, struct rose_facilities_struct *
do {
switch (*p & 0xC0) {
case 0x00:
+ if (len < 2)
+ return -1;
p += 2;
n += 2;
len -= 2;
break;

case 0x40:
+ if (len < 3)
+ return -1;
if (*p == FAC_NATIONAL_RAND)
facilities->rand = ((p[1] << 8) & 0xFF00) + ((p[2] << 0) & 0x00FF);
p += 3;
@@ -259,32 +263,48 @@ static int rose_parse_national(unsigned char *p, struct rose_facilities_struct *
break;

case 0x80:
+ if (len < 4)
+ return -1;
p += 4;
n += 4;
len -= 4;
break;

case 0xC0:
+ if (len < 2)
+ return -1;
l = p[1];
+ if (len < 2 + l)
+ return -1;
if (*p == FAC_NATIONAL_DEST_DIGI) {
if (!fac_national_digis_received) {
+ if (l < AX25_ADDR_LEN)
+ return -1;
memcpy(&facilities->source_digis[0], p + 2, AX25_ADDR_LEN);
facilities->source_ndigis = 1;
}
}
else if (*p == FAC_NATIONAL_SRC_DIGI) {
if (!fac_national_digis_received) {
+ if (l < AX25_ADDR_LEN)
+ return -1;
memcpy(&facilities->dest_digis[0], p + 2, AX25_ADDR_LEN);
facilities->dest_ndigis = 1;
}
}
else if (*p == FAC_NATIONAL_FAIL_CALL) {
+ if (l < AX25_ADDR_LEN)
+ return -1;
memcpy(&facilities->fail_call, p + 2, AX25_ADDR_LEN);
}
else if (*p == FAC_NATIONAL_FAIL_ADD) {
+ if (l < 1 + ROSE_ADDR_LEN)
+ return -1;
memcpy(&facilities->fail_addr, p + 3, ROSE_ADDR_LEN);
}
else if (*p == FAC_NATIONAL_DIGIS) {
+ if (l % AX25_ADDR_LEN)
+ return -1;
fac_national_digis_received = 1;
facilities->source_ndigis = 0;
facilities->dest_ndigis = 0;
@@ -318,24 +338,32 @@ static int rose_parse_ccitt(unsigned char *p, struct rose_facilities_struct *fac
do {
switch (*p & 0xC0) {
case 0x00:
+ if (len < 2)
+ return -1;
p += 2;
n += 2;
len -= 2;
break;

case 0x40:
+ if (len < 3)
+ return -1;
p += 3;
n += 3;
len -= 3;
break;

case 0x80:
+ if (len < 4)
+ return -1;
p += 4;
n += 4;
len -= 4;
break;

case 0xC0:
+ if (len < 2)
+ return -1;
l = p[1];

/* Prevent overflows*/
@@ -364,49 +392,44 @@ static int rose_parse_ccitt(unsigned char *p, struct rose_facilities_struct *fac
return n;
}

-int rose_parse_facilities(unsigned char *p,
+int rose_parse_facilities(unsigned char *p, unsigned packet_len,
struct rose_facilities_struct *facilities)
{
int facilities_len, len;

facilities_len = *p++;

- if (facilities_len == 0)
+ if (facilities_len == 0 || (unsigned)facilities_len > packet_len)
return 0;

- while (facilities_len > 0) {
- if (*p == 0x00) {
- facilities_len--;
- p++;
-
- switch (*p) {
- case FAC_NATIONAL: /* National */
- len = rose_parse_national(p + 1, facilities, facilities_len - 1);
- if (len < 0)
- return 0;
- facilities_len -= len + 1;
- p += len + 1;
- break;
-
- case FAC_CCITT: /* CCITT */
- len = rose_parse_ccitt(p + 1, facilities, facilities_len - 1);
- if (len < 0)
- return 0;
- facilities_len -= len + 1;
- p += len + 1;
- break;
-
- default:
- printk(KERN_DEBUG "ROSE: rose_parse_facilities - unknown facilities family %02X\n", *p);
- facilities_len--;
- p++;
- break;
- }
- } else
- break; /* Error in facilities format */
+ while (facilities_len >= 3 && *p == 0x00) {
+ facilities_len--;
+ p++;
+
+ switch (*p) {
+ case FAC_NATIONAL: /* National */
+ len = rose_parse_national(p + 1, facilities, facilities_len - 1);
+ break;
+
+ case FAC_CCITT: /* CCITT */
+ len = rose_parse_ccitt(p + 1, facilities, facilities_len - 1);
+ break;
+
+ default:
+ printk(KERN_DEBUG "ROSE: rose_parse_facilities - unknown facilities family %02X\n", *p);
+ len = 1;
+ break;
+ }
+
+ if (len < 0)
+ return 0;
+ if (WARN_ON(len >= facilities_len))
+ return 0;
+ facilities_len -= len + 1;
+ p += len + 1;
}

- return 1;
+ return facilities_len == 0;
}

static int rose_create_facilities(unsigned char *buffer, struct rose_sock *rose)

Willy Tarreau

unread,

Oct 1, 2012, 7:01:03 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Takashi Iwai, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Takashi Iwai <ti...@suse.de>

commit bc733d495267a23ef8660220d696c6e549ce30b3 upstream.

The irq field of struct snd_mpu401 is supposed to be initialized to -1.
Since it's set to zero as of now, a probing error before the irq
installation results in a kernel warning "Trying to free already-free
IRQ 0".

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=44821
Signed-off-by: Takashi Iwai <ti...@suse.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

sound/drivers/mpu401/mpu401_uart.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/sound/drivers/mpu401/mpu401_uart.c b/sound/drivers/mpu401/mpu401_uart.c
index 2af0999..74f5a3d 100644
--- a/sound/drivers/mpu401/mpu401_uart.c
+++ b/sound/drivers/mpu401/mpu401_uart.c
@@ -554,6 +554,7 @@ int snd_mpu401_uart_new(struct snd_card *card, int device,
spin_lock_init(&mpu->output_lock);
spin_lock_init(&mpu->timer_lock);
mpu->hardware = hardware;
+ mpu->irq = -1;
if (! (info_flags & MPU401_INFO_INTEGRATED)) {
int res_size = hardware == MPU401_HW_PC98II ? 4 : 2;
mpu->res = request_region(port, res_size, "MPU401 UART");

Willy Tarreau

unread,

Oct 1, 2012, 7:01:21 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jan Kara, Ben Myers, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <ja...@suse.cz>

commit d97d32edcd732110758799ae60af725e5110b3dc upstream.

When an IO error happens during inode deletion run from
xlog_recover_process_iunlinks() filesystem gets shutdown. Thus any subsequent
attempt to read buffers fails. Code in xlog_recover_process_iunlinks() does not
count with the fact that read of a buffer which was read a while ago can
really fail which results in the oops on
agi = XFS_BUF_TO_AGI(agibp);

Fix the problem by cleaning up the buffer handling in
xlog_recover_process_iunlinks() as suggested by Dave Chinner. We release buffer
lock but keep buffer reference to AG buffer. That is enough for buffer to stay
pinned in memory and we don't have to call xfs_read_agi() all the time.

Signed-off-by: Jan Kara <ja...@suse.cz>
Reviewed-by: Dave Chinner <dchi...@redhat.com>
Signed-off-by: Ben Myers <b...@sgi.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/xfs/xfs_log_recover.c | 33 +++++++++++----------------------
1 files changed, 11 insertions(+), 22 deletions(-)

diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c
index 844a99b..bae2c99 100644
--- a/fs/xfs/xfs_log_recover.c
+++ b/fs/xfs/xfs_log_recover.c
@@ -3298,37 +3298,26 @@ xlog_recover_process_iunlinks(
*/
continue;
}
+ /*
+ * Unlock the buffer so that it can be acquired in the normal
+ * course of the transaction to truncate and free each inode.
+ * Because we are not racing with anyone else here for the AGI
+ * buffer, we don't even need to hold it locked to read the
+ * initial unlinked bucket entries out of the buffer. We keep
+ * buffer reference though, so that it stays pinned in memory
+ * while we need the buffer.
+ */
agi = XFS_BUF_TO_AGI(agibp);
+ xfs_buf_unlock(agibp);

for (bucket = 0; bucket < XFS_AGI_UNLINKED_BUCKETS; bucket++) {
agino = be32_to_cpu(agi->agi_unlinked[bucket]);
while (agino != NULLAGINO) {
- /*
- * Release the agi buffer so that it can
- * be acquired in the normal course of the
- * transaction to truncate and free the inode.
- */
- xfs_buf_relse(agibp);
-
agino = xlog_recover_process_one_iunlink(mp,
agno, agino, bucket);
-
- /*
- * Reacquire the agibuffer and continue around
- * the loop. This should never fail as we know
- * the buffer was good earlier on.
- */
- error = xfs_read_agi(mp, NULL, agno, &agibp);
- ASSERT(error == 0);
- agi = XFS_BUF_TO_AGI(agibp);
}
}
-
- /*
- * Release the buffer for the current agi so we can
- * go on to the next one.
- */
- xfs_buf_relse(agibp);
+ xfs_buf_rele(agibp);
}

mp->m_dmevmask = mp_dmevmask;

Willy Tarreau

unread,

Oct 1, 2012, 7:01:26 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, David S. Miller, Theodore Tso, Herbert Xu, Matt Mackall, Tony Luck, Eric Dumazet, H. Peter Anvin, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torv...@linux-foundation.org>

commit cf833d0b9937874b50ef2867c4e8badfd64948ce upstream.

We still don't use rdrand in /dev/random, which just seems stupid. We
accept the *cycle*counter* as a random input, but we don't accept
rdrand? That's just broken.

Sure, people can do things in user space (write to /dev/random, use
rdrand in addition to /dev/random themselves etc etc), but that
*still* seems to be a particularly stupid reason for saying "we
shouldn't bother to try to do better in /dev/random".

And even if somebody really doesn't trust rdrand as a source of random
bytes, it seems singularly stupid to trust the cycle counter *more*.

So I'd suggest the attached patch. I'm not going to even bother
arguing that we should add more bits to the entropy estimate, because
that's not the point - I don't care if /dev/random fills up slowly or
not, I think it's just stupid to not use the bits we can get from
rdrand and mix them into the strong randomness pool.

Link: http://lkml.kernel.org/r/CA%2B55aFwn59N1=m651QAyTy-1gO1noGb...@mail.gmail.com
Acked-by: "David S. Miller" <da...@davemloft.net>
Acked-by: "Theodore Ts'o" <ty...@mit.edu>
Acked-by: Herbert Xu <her...@gondor.apana.org.au>
Cc: Matt Mackall <m...@selenic.com>
Cc: Tony Luck <tony...@intel.com>
Cc: Eric Dumazet <eric.d...@gmail.com>
Signed-off-by: H. Peter Anvin <h...@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index f2a1651..ac74029 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -624,8 +624,8 @@ static struct timer_rand_state input_timer_state;

static void add_timer_randomness(struct timer_rand_state *state, unsigned num)

{
struct {
- cycles_t cycles;
long jiffies;
+ unsigned cycles;
unsigned num;
} sample;
long delta, delta2, delta3;
@@ -637,7 +637,11 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned num)

goto out;

sample.jiffies = jiffies;

- sample.cycles = get_cycles();
+
+ /* Use arch random value, fall back to cycles */
+ if (!arch_get_random_int(&sample.cycles))
+ sample.cycles = get_cycles();
+
sample.num = num;
mix_pool_bytes(&input_pool, &sample, sizeof(sample));

Willy Tarreau

unread,

Oct 1, 2012, 7:01:33 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, WANG Cong, Alexey Khoroshilov, Miklos Szeredi, Sage Weil, Eugene Teo, Roman Zippel, Al Viro, Christoph Hellwig, Alexey Dobriyan, Dave Anderson, Andrew Morton, Greg Kroah-Hartman, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Greg Kroah-Hartman <gre...@linuxfoundation.org>

commit 6f24f892871acc47b40dd594c63606a17c714f77 upstream

Commit ec81aecb2966 ("hfs: fix a potential buffer overflow") fixed a few
potential buffer overflows in the hfs filesystem. But as Timo Warns
pointed out, these changes also need to be made on the hfsplus
filesystem as well.

Reported-by: Timo Warns <wa...@pre-sense.de>
Acked-by: WANG Cong <amw...@redhat.com>
Cc: Alexey Khoroshilov <khoro...@ispras.ru>
Cc: Miklos Szeredi <msze...@suse.cz>
Cc: Sage Weil <sa...@newdream.net>
Cc: Eugene Teo <et...@redhat.com>
Cc: Roman Zippel <zip...@linux-m68k.org>
Cc: Al Viro <vi...@zeniv.linux.org.uk>
Cc: Christoph Hellwig <h...@lst.de>
Cc: Alexey Dobriyan <adob...@gmail.com>
Cc: Dave Anderson <ande...@redhat.com>
Cc: stable <sta...@vger.kernel.org>
Cc: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/hfsplus/catalog.c | 4 ++++
fs/hfsplus/dir.c | 11 +++++++++++
2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/fs/hfsplus/catalog.c b/fs/hfsplus/catalog.c
index f6874ac..a0786c6 100644
--- a/fs/hfsplus/catalog.c
+++ b/fs/hfsplus/catalog.c
@@ -329,6 +329,10 @@ int hfsplus_rename_cat(u32 cnid,
err = hfs_brec_find(&src_fd);
if (err)
goto out;
+ if (src_fd.entrylength > sizeof(entry) || src_fd.entrylength < 0) {
+ err = -EIO;
+ goto out;
+ }

hfs_bnode_read(src_fd.bnode, &entry, src_fd.entryoffset,
src_fd.entrylength);
diff --git a/fs/hfsplus/dir.c b/fs/hfsplus/dir.c
index 5f40236..f4300ff7 100644
--- a/fs/hfsplus/dir.c
+++ b/fs/hfsplus/dir.c
@@ -138,6 +138,11 @@ static int hfsplus_readdir(struct file *filp, void *dirent, filldir_t filldir)
filp->f_pos++;
/* fall through */
case 1:
+ if (fd.entrylength > sizeof(entry) || fd.entrylength < 0) {
+ err = -EIO;
+ goto out;
+ }
+
hfs_bnode_read(fd.bnode, &entry, fd.entryoffset, fd.entrylength);
if (be16_to_cpu(entry.type) != HFSPLUS_FOLDER_THREAD) {
printk(KERN_ERR "hfs: bad catalog folder thread\n");
@@ -168,6 +173,12 @@ static int hfsplus_readdir(struct file *filp, void *dirent, filldir_t filldir)
err = -EIO;
goto out;
}
+
+ if (fd.entrylength > sizeof(entry) || fd.entrylength < 0) {
+ err = -EIO;
+ goto out;
+ }
+
hfs_bnode_read(fd.bnode, &entry, fd.entryoffset, fd.entrylength);
type = be16_to_cpu(entry.type);
len = HFSPLUS_MAX_STRLEN;

Willy Tarreau

unread,

Oct 1, 2012, 7:01:40 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Richard Cochran, Prarit Bhargava, Thomas Gleixner, John Stultz, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Richard Cochran <richard...@gmail.com>

This is a backport of dd48d708ff3e917f6d6b6c2b696c3f18c019feed

When repeating a UTC time value during a leap second (when the UTC
time should be 23:59:60), the TAI timescale should not stop. The kernel
NTP code increments the TAI offset one second too late. This patch fixes
the issue by incrementing the offset during the leap second itself.

Signed-off-by: Richard Cochran <richard...@gmail.com>

Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Linux Kernel <linux-...@vger.kernel.org>

Signed-off-by: John Stultz <john....@linaro.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/time/ntp.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c

index dc76c9a..c1c36a2 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -208,6 +208,7 @@ int second_overflow(unsigned long secs)

if (secs % 86400 == 0) {

leap = -1;
time_state = TIME_OOP;
+ time_tai++;
printk(KERN_NOTICE

"Clock: inserting leap second 23:59:60 UTC\n");
}

@@ -222,7 +223,6 @@ int second_overflow(unsigned long secs)
}
break;
case TIME_OOP:
- time_tai++;
time_state = TIME_WAIT;
break;

Willy Tarreau

unread,

Oct 1, 2012, 7:01:43 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, J. Bruce Fields, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: J. Bruce Fields <bfi...@fieldses.org>

commit 0ec4f431eb56d633da3a55da67d5c4b88886ccc7 upstream.

The only checks of the long argument passed to fcntl(fd,F_SETLEASE,.)
are done after converting the long to an int. Thus some illegal values
may be let through and cause problems in later code.

[ They actually *don't* cause problems in mainline, as of Dave Jones's
commit 8d657eb3b438 "Remove easily user-triggerable BUG from
generic_setlease", but we should fix this anyway. And this patch will
be necessary to fix real bugs on earlier kernels. ]

Signed-off-by: J. Bruce Fields <bfi...@redhat.com>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/locks.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index a8794f2..fde92d1 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -291,7 +291,7 @@ static int flock_make_lock(struct file *filp, struct file_lock **lock,
return 0;
}

-static int assign_type(struct file_lock *fl, int type)
+static int assign_type(struct file_lock *fl, long type)
{
switch (type) {
case F_RDLCK:
@@ -444,7 +444,7 @@ static const struct lock_manager_operations lease_manager_ops = {
/*
* Initialize a lease, use the default lock manager operations
*/
-static int lease_init(struct file *filp, int type, struct file_lock *fl)
+static int lease_init(struct file *filp, long type, struct file_lock *fl)
{
if (assign_type(fl, type) != 0)
return -EINVAL;
@@ -462,7 +462,7 @@ static int lease_init(struct file *filp, int type, struct file_lock *fl)
}

/* Allocate a file_lock initialised to this type of lease */
-static struct file_lock *lease_alloc(struct file *filp, int type)
+static struct file_lock *lease_alloc(struct file *filp, long type)
{
struct file_lock *fl = locks_alloc_lock();
int error = -ENOMEM;

Willy Tarreau

unread,

Oct 1, 2012, 7:01:53 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Francois Romieu, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Francois Romieu <rom...@fr.zoreil.com>

commit 78f6a6bd89e9a33e4be1bc61e6990a1172aa396e upstream

Signed-off-by: Francois Romieu <rom...@fr.zoreil.com>

[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/net/dl2k.c | 105 ++++++++++++++++++++++++-------------------------
drivers/net/dl2k.h | 110 +---------------------------------------------------
2 files changed, 53 insertions(+), 162 deletions(-)

diff --git a/drivers/net/dl2k.c b/drivers/net/dl2k.c
index 7fa7a90..731ee85 100644
--- a/drivers/net/dl2k.c
+++ b/drivers/net/dl2k.c
@@ -1448,7 +1448,7 @@ mii_wait_link (struct net_device *dev, int wait)

do {
bmsr = mii_read (dev, phy_addr, MII_BMSR);
- if (bmsr & MII_BMSR_LINK_STATUS)
+ if (bmsr & BMSR_LSTATUS)
return 0;
mdelay (1);
} while (--wait > 0);
@@ -1469,60 +1469,60 @@ mii_get_media (struct net_device *dev)

bmsr = mii_read (dev, phy_addr, MII_BMSR);
if (np->an_enable) {
- if (!(bmsr & MII_BMSR_AN_COMPLETE)) {
+ if (!(bmsr & BMSR_ANEGCOMPLETE)) {
/* Auto-Negotiation not completed */
return -1;
}
- negotiate = mii_read (dev, phy_addr, MII_ANAR) &
- mii_read (dev, phy_addr, MII_ANLPAR);
- mscr = mii_read (dev, phy_addr, MII_MSCR);
- mssr = mii_read (dev, phy_addr, MII_MSSR);
- if (mscr & MII_MSCR_1000BT_FD && mssr & MII_MSSR_LP_1000BT_FD) {
+ negotiate = mii_read (dev, phy_addr, MII_ADVERTISE) &
+ mii_read (dev, phy_addr, MII_LPA);
+ mscr = mii_read (dev, phy_addr, MII_CTRL1000);
+ mssr = mii_read (dev, phy_addr, MII_STAT1000);
+ if (mscr & ADVERTISE_1000FULL && mssr & LPA_1000FULL) {
np->speed = 1000;
np->full_duplex = 1;
printk (KERN_INFO "Auto 1000 Mbps, Full duplex\n");
- } else if (mscr & MII_MSCR_1000BT_HD && mssr & MII_MSSR_LP_1000BT_HD) {
+ } else if (mscr & ADVERTISE_1000HALF && mssr & LPA_1000HALF) {
np->speed = 1000;
np->full_duplex = 0;
printk (KERN_INFO "Auto 1000 Mbps, Half duplex\n");
- } else if (negotiate & MII_ANAR_100BX_FD) {
+ } else if (negotiate & ADVERTISE_100FULL) {
np->speed = 100;
np->full_duplex = 1;
printk (KERN_INFO "Auto 100 Mbps, Full duplex\n");
- } else if (negotiate & MII_ANAR_100BX_HD) {
+ } else if (negotiate & ADVERTISE_100HALF) {
np->speed = 100;
np->full_duplex = 0;
printk (KERN_INFO "Auto 100 Mbps, Half duplex\n");
- } else if (negotiate & MII_ANAR_10BT_FD) {
+ } else if (negotiate & ADVERTISE_10FULL) {
np->speed = 10;
np->full_duplex = 1;
printk (KERN_INFO "Auto 10 Mbps, Full duplex\n");
- } else if (negotiate & MII_ANAR_10BT_HD) {
+ } else if (negotiate & ADVERTISE_10HALF) {
np->speed = 10;
np->full_duplex = 0;
printk (KERN_INFO "Auto 10 Mbps, Half duplex\n");
}
- if (negotiate & MII_ANAR_PAUSE) {
+ if (negotiate & ADVERTISE_PAUSE_CAP) {
np->tx_flow &= 1;
np->rx_flow &= 1;
- } else if (negotiate & MII_ANAR_ASYMMETRIC) {
+ } else if (negotiate & ADVERTISE_PAUSE_ASYM) {
np->tx_flow = 0;
np->rx_flow &= 1;
}
/* else tx_flow, rx_flow = user select */
} else {
__u16 bmcr = mii_read (dev, phy_addr, MII_BMCR);
- switch (bmcr & (MII_BMCR_SPEED_100 | MII_BMCR_SPEED_1000)) {
- case MII_BMCR_SPEED_1000:
+ switch (bmcr & (BMCR_SPEED100 | BMCR_SPEED1000)) {
+ case BMCR_SPEED1000:
printk (KERN_INFO "Operating at 1000 Mbps, ");
break;
- case MII_BMCR_SPEED_100:
+ case BMCR_SPEED100:
printk (KERN_INFO "Operating at 100 Mbps, ");
break;
case 0:
printk (KERN_INFO "Operating at 10 Mbps, ");
}
- if (bmcr & MII_BMCR_DUPLEX_MODE) {
+ if (bmcr & BMCR_FULLDPLX) {
printk (KERN_CONT "Full duplex\n");
} else {
printk (KERN_CONT "Half duplex\n");
@@ -1556,24 +1556,22 @@ mii_set_media (struct net_device *dev)
if (np->an_enable) {
/* Advertise capabilities */
bmsr = mii_read (dev, phy_addr, MII_BMSR);
- anar = mii_read (dev, phy_addr, MII_ANAR) &
- ~MII_ANAR_100BX_FD &
- ~MII_ANAR_100BX_HD &
- ~MII_ANAR_100BT4 &
- ~MII_ANAR_10BT_FD &
- ~MII_ANAR_10BT_HD;
- if (bmsr & MII_BMSR_100BX_FD)
- anar |= MII_ANAR_100BX_FD;
- if (bmsr & MII_BMSR_100BX_HD)
- anar |= MII_ANAR_100BX_HD;
- if (bmsr & MII_BMSR_100BT4)
- anar |= MII_ANAR_100BT4;
- if (bmsr & MII_BMSR_10BT_FD)
- anar |= MII_ANAR_10BT_FD;
- if (bmsr & MII_BMSR_10BT_HD)
- anar |= MII_ANAR_10BT_HD;
- anar |= MII_ANAR_PAUSE | MII_ANAR_ASYMMETRIC;
- mii_write (dev, phy_addr, MII_ANAR, anar);
+ anar = mii_read (dev, phy_addr, MII_ADVERTISE) &
+ ~(ADVERTISE_100FULL | ADVERTISE_10FULL |
+ ADVERTISE_100HALF | ADVERTISE_10HALF |
+ ADVERTISE_100BASE4);
+ if (bmsr & BMSR_100FULL)
+ anar |= ADVERTISE_100FULL;
+ if (bmsr & BMSR_100HALF)
+ anar |= ADVERTISE_100HALF;
+ if (bmsr & BMSR_100BASE4)
+ anar |= ADVERTISE_100BASE4;
+ if (bmsr & BMSR_10FULL)
+ anar |= ADVERTISE_10FULL;
+ if (bmsr & BMSR_10HALF)
+ anar |= ADVERTISE_10HALF;
+ anar |= ADVERTISE_PAUSE_CAP | ADVERTISE_PAUSE_ASYM;
+ mii_write (dev, phy_addr, MII_ADVERTISE, anar);

/* Enable Auto crossover */
pscr = mii_read (dev, phy_addr, MII_PHY_SCR);
@@ -1581,8 +1579,8 @@ mii_set_media (struct net_device *dev)
mii_write (dev, phy_addr, MII_PHY_SCR, pscr);

/* Soft reset PHY */
- mii_write (dev, phy_addr, MII_BMCR, MII_BMCR_RESET);
- bmcr = MII_BMCR_AN_ENABLE | MII_BMCR_RESTART_AN | MII_BMCR_RESET;
+ mii_write (dev, phy_addr, MII_BMCR, BMCR_RESET);
+ bmcr = BMCR_ANENABLE | BMCR_ANRESTART | BMCR_RESET;
mii_write (dev, phy_addr, MII_BMCR, bmcr);
mdelay(1);
} else {
@@ -1594,7 +1592,7 @@ mii_set_media (struct net_device *dev)

/* 2) PHY Reset */
bmcr = mii_read (dev, phy_addr, MII_BMCR);
- bmcr |= MII_BMCR_RESET;
+ bmcr |= BMCR_RESET;
mii_write (dev, phy_addr, MII_BMCR, bmcr);

/* 3) Power Down */
@@ -1603,25 +1601,25 @@ mii_set_media (struct net_device *dev)
mdelay (100); /* wait a certain time */

/* 4) Advertise nothing */
- mii_write (dev, phy_addr, MII_ANAR, 0);
+ mii_write (dev, phy_addr, MII_ADVERTISE, 0);

/* 5) Set media and Power Up */
- bmcr = MII_BMCR_POWER_DOWN;
+ bmcr = BMCR_PDOWN;
if (np->speed == 100) {
- bmcr |= MII_BMCR_SPEED_100;
+ bmcr |= BMCR_SPEED100;
printk (KERN_INFO "Manual 100 Mbps, ");
} else if (np->speed == 10) {
printk (KERN_INFO "Manual 10 Mbps, ");
}
if (np->full_duplex) {
- bmcr |= MII_BMCR_DUPLEX_MODE;
+ bmcr |= BMCR_FULLDPLX;
printk (KERN_CONT "Full duplex\n");
} else {
printk (KERN_CONT "Half duplex\n");
}
#if 0
/* Set 1000BaseT Master/Slave setting */
- mscr = mii_read (dev, phy_addr, MII_MSCR);
+ mscr = mii_read (dev, phy_addr, MII_CTRL1000);
mscr |= MII_MSCR_CFG_ENABLE;
mscr &= ~MII_MSCR_CFG_VALUE = 0;
#endif
@@ -1644,7 +1642,7 @@ mii_get_media_pcs (struct net_device *dev)

bmsr = mii_read (dev, phy_addr, PCS_BMSR);
if (np->an_enable) {
- if (!(bmsr & MII_BMSR_AN_COMPLETE)) {
+ if (!(bmsr & BMSR_ANEGCOMPLETE)) {
/* Auto-Negotiation not completed */
return -1;
}
@@ -1669,7 +1667,7 @@ mii_get_media_pcs (struct net_device *dev)
} else {
__u16 bmcr = mii_read (dev, phy_addr, PCS_BMCR);
printk (KERN_INFO "Operating at 1000 Mbps, ");
- if (bmcr & MII_BMCR_DUPLEX_MODE) {
+ if (bmcr & BMCR_FULLDPLX) {
printk (KERN_CONT "Full duplex\n");
} else {
printk (KERN_CONT "Half duplex\n");
@@ -1702,7 +1700,7 @@ mii_set_media_pcs (struct net_device *dev)
if (np->an_enable) {
/* Advertise capabilities */
esr = mii_read (dev, phy_addr, PCS_ESR);
- anar = mii_read (dev, phy_addr, MII_ANAR) &
+ anar = mii_read (dev, phy_addr, MII_ADVERTISE) &
~PCS_ANAR_HALF_DUPLEX &
~PCS_ANAR_FULL_DUPLEX;
if (esr & (MII_ESR_1000BT_HD | MII_ESR_1000BX_HD))
@@ -1710,22 +1708,21 @@ mii_set_media_pcs (struct net_device *dev)
if (esr & (MII_ESR_1000BT_FD | MII_ESR_1000BX_FD))
anar |= PCS_ANAR_FULL_DUPLEX;
anar |= PCS_ANAR_PAUSE | PCS_ANAR_ASYMMETRIC;
- mii_write (dev, phy_addr, MII_ANAR, anar);
+ mii_write (dev, phy_addr, MII_ADVERTISE, anar);

/* Soft reset PHY */
- mii_write (dev, phy_addr, MII_BMCR, MII_BMCR_RESET);
- bmcr = MII_BMCR_AN_ENABLE | MII_BMCR_RESTART_AN |
- MII_BMCR_RESET;
+ mii_write (dev, phy_addr, MII_BMCR, BMCR_RESET);
+ bmcr = BMCR_ANENABLE | BMCR_ANRESTART | BMCR_RESET;
mii_write (dev, phy_addr, MII_BMCR, bmcr);
mdelay(1);
} else {
/* Force speed setting */
/* PHY Reset */
- bmcr = MII_BMCR_RESET;
+ bmcr = BMCR_RESET;
mii_write (dev, phy_addr, MII_BMCR, bmcr);
mdelay(10);
if (np->full_duplex) {
- bmcr = MII_BMCR_DUPLEX_MODE;
+ bmcr = BMCR_FULLDPLX;
printk (KERN_INFO "Manual full duplex\n");
} else {
bmcr = 0;
@@ -1735,7 +1732,7 @@ mii_set_media_pcs (struct net_device *dev)
mdelay(10);

/* Advertise nothing */
- mii_write (dev, phy_addr, MII_ANAR, 0);
+ mii_write (dev, phy_addr, MII_ADVERTISE, 0);
}
return 0;
}
diff --git a/drivers/net/dl2k.h b/drivers/net/dl2k.h
index 266ec87..73e1457 100644
--- a/drivers/net/dl2k.h
+++ b/drivers/net/dl2k.h
@@ -28,6 +28,7 @@
#include <linux/init.h>
#include <linux/crc32.h>
#include <linux/ethtool.h>
+#include <linux/mii.h>
#include <linux/bitops.h>
#include <asm/processor.h> /* Processor type for cache alignment. */
#include <asm/io.h>
@@ -271,20 +272,9 @@ enum RFS_bits {
#define MII_RESET_TIME_OUT 10000
/* MII register */
enum _mii_reg {
- MII_BMCR = 0,
- MII_BMSR = 1,
- MII_PHY_ID1 = 2,
- MII_PHY_ID2 = 3,
- MII_ANAR = 4,
- MII_ANLPAR = 5,
- MII_ANER = 6,
- MII_ANNPT = 7,
- MII_ANLPRNP = 8,
- MII_MSCR = 9,
- MII_MSSR = 10,
- MII_ESR = 15,
MII_PHY_SCR = 16,
};
+
/* PCS register */
enum _pcs_reg {
PCS_BMCR = 0,
@@ -297,102 +287,6 @@ enum _pcs_reg {
PCS_ESR = 15,
};

-/* Basic Mode Control Register */
-enum _mii_bmcr {
- MII_BMCR_RESET = 0x8000,
- MII_BMCR_LOOP_BACK = 0x4000,
- MII_BMCR_SPEED_LSB = 0x2000,
- MII_BMCR_AN_ENABLE = 0x1000,
- MII_BMCR_POWER_DOWN = 0x0800,
- MII_BMCR_ISOLATE = 0x0400,
- MII_BMCR_RESTART_AN = 0x0200,
- MII_BMCR_DUPLEX_MODE = 0x0100,
- MII_BMCR_COL_TEST = 0x0080,
- MII_BMCR_SPEED_MSB = 0x0040,
- MII_BMCR_SPEED_RESERVED = 0x003f,
- MII_BMCR_SPEED_10 = 0,
- MII_BMCR_SPEED_100 = MII_BMCR_SPEED_LSB,
- MII_BMCR_SPEED_1000 = MII_BMCR_SPEED_MSB,
-};
-
-/* Basic Mode Status Register */
-enum _mii_bmsr {
- MII_BMSR_100BT4 = 0x8000,
- MII_BMSR_100BX_FD = 0x4000,
- MII_BMSR_100BX_HD = 0x2000,
- MII_BMSR_10BT_FD = 0x1000,
- MII_BMSR_10BT_HD = 0x0800,
- MII_BMSR_100BT2_FD = 0x0400,
- MII_BMSR_100BT2_HD = 0x0200,
- MII_BMSR_EXT_STATUS = 0x0100,
- MII_BMSR_PREAMBLE_SUPP = 0x0040,
- MII_BMSR_AN_COMPLETE = 0x0020,
- MII_BMSR_REMOTE_FAULT = 0x0010,
- MII_BMSR_AN_ABILITY = 0x0008,
- MII_BMSR_LINK_STATUS = 0x0004,
- MII_BMSR_JABBER_DETECT = 0x0002,
- MII_BMSR_EXT_CAP = 0x0001,
-};
-
-/* ANAR */
-enum _mii_anar {
- MII_ANAR_NEXT_PAGE = 0x8000,
- MII_ANAR_REMOTE_FAULT = 0x4000,
- MII_ANAR_ASYMMETRIC = 0x0800,
- MII_ANAR_PAUSE = 0x0400,
- MII_ANAR_100BT4 = 0x0200,
- MII_ANAR_100BX_FD = 0x0100,
- MII_ANAR_100BX_HD = 0x0080,
- MII_ANAR_10BT_FD = 0x0020,
- MII_ANAR_10BT_HD = 0x0010,
- MII_ANAR_SELECTOR = 0x001f,
- MII_IEEE8023_CSMACD = 0x0001,
-};
-
-/* ANLPAR */
-enum _mii_anlpar {
- MII_ANLPAR_NEXT_PAGE = MII_ANAR_NEXT_PAGE,
- MII_ANLPAR_REMOTE_FAULT = MII_ANAR_REMOTE_FAULT,
- MII_ANLPAR_ASYMMETRIC = MII_ANAR_ASYMMETRIC,
- MII_ANLPAR_PAUSE = MII_ANAR_PAUSE,
- MII_ANLPAR_100BT4 = MII_ANAR_100BT4,
- MII_ANLPAR_100BX_FD = MII_ANAR_100BX_FD,
- MII_ANLPAR_100BX_HD = MII_ANAR_100BX_HD,
- MII_ANLPAR_10BT_FD = MII_ANAR_10BT_FD,
- MII_ANLPAR_10BT_HD = MII_ANAR_10BT_HD,
- MII_ANLPAR_SELECTOR = MII_ANAR_SELECTOR,
-};
-
-/* Auto-Negotiation Expansion Register */
-enum _mii_aner {
- MII_ANER_PAR_DETECT_FAULT = 0x0010,
- MII_ANER_LP_NEXTPAGABLE = 0x0008,
- MII_ANER_NETXTPAGABLE = 0x0004,
- MII_ANER_PAGE_RECEIVED = 0x0002,
- MII_ANER_LP_NEGOTIABLE = 0x0001,
-};
-
-/* MASTER-SLAVE Control Register */
-enum _mii_mscr {
- MII_MSCR_TEST_MODE = 0xe000,
- MII_MSCR_CFG_ENABLE = 0x1000,
- MII_MSCR_CFG_VALUE = 0x0800,
- MII_MSCR_PORT_VALUE = 0x0400,
- MII_MSCR_1000BT_FD = 0x0200,
- MII_MSCR_1000BT_HD = 0X0100,
-};
-
-/* MASTER-SLAVE Status Register */
-enum _mii_mssr {
- MII_MSSR_CFG_FAULT = 0x8000,
- MII_MSSR_CFG_RES = 0x4000,
- MII_MSSR_LOCAL_RCV_STATUS = 0x2000,
- MII_MSSR_REMOTE_RCVR = 0x1000,
- MII_MSSR_LP_1000BT_FD = 0x0800,
- MII_MSSR_LP_1000BT_HD = 0x0400,
- MII_MSSR_IDLE_ERR_COUNT = 0x00ff,
-};
-
/* IEEE Extened Status Register */
enum _mii_esr {
MII_ESR_1000BX_FD = 0x8000,

Willy Tarreau

unread,

Oct 1, 2012, 7:01:59 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Darren Hart, Dave Jones, Dan Carpenter, Thomas Gleixner, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Darren Hart <dvh...@linux.intel.com>

commit b6070a8d9853eda010a549fa9a09eb8d7269b929 upstream.

If fixup_pi_state_owner() faults, pi_mutex may be NULL. Test
for pi_mutex != NULL before testing the owner against current
and possibly unlocking it.

Signed-off-by: Darren Hart <dvh...@linux.intel.com>
Cc: Dave Jones <da...@redhat.com>
Cc: Dan Carpenter <dan.ca...@oracle.com>
Link: http://lkml.kernel.org/r/dc59890338fc413606f04e5c5b13153...@linux.intel.com

Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/futex.c | 2 +-

1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 0b06da1..6f745b6 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2345,7 +2345,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
* fault, unlock the rt_mutex and return the fault to userspace.
*/
if (ret == -EFAULT) {
- if (rt_mutex_owner(pi_mutex) == current)
+ if (pi_mutex && rt_mutex_owner(pi_mutex) == current)
rt_mutex_unlock(pi_mutex);
} else if (ret == -EINTR) {
/*

Willy Tarreau

unread,

Oct 1, 2012, 7:02:01 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jason Baron, Nelson Elhage, Davide Libenzi, Andrew Morton, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jason Baron <jba...@redhat.com>

commit 28d82dc1c4edbc352129f97f4ca22624d1fe61de upstream.

The current epoll code can be tickled to run basically indefinitely in
both loop detection path check (on ep_insert()), and in the wakeup paths.
The programs that tickle this behavior set up deeply linked networks of
epoll file descriptors that cause the epoll algorithms to traverse them
indefinitely. A couple of these sample programs have been previously
posted in this thread: https://lkml.org/lkml/2011/2/25/297.

To fix the loop detection path check algorithms, I simply keep track of
the epoll nodes that have been already visited. Thus, the loop detection
becomes proportional to the number of epoll file descriptor and links.
This dramatically decreases the run-time of the loop check algorithm. In
one diabolical case I tried it reduced the run-time from 15 mintues (all
in kernel time) to .3 seconds.

Fixing the wakeup paths could be done at wakeup time in a similar manner
by keeping track of nodes that have already been visited, but the
complexity is harder, since there can be multiple wakeups on different
cpus...Thus, I've opted to limit the number of possible wakeup paths when
the paths are created.

This is accomplished, by noting that the end file descriptor points that
are found during the loop detection pass (from the newly added link), are
actually the sources for wakeup events. I keep a list of these file
descriptors and limit the number and length of these paths that emanate
from these 'source file descriptors'. In the current implemetation I
allow 1000 paths of length 1, 500 of length 2, 100 of length 3, 50 of
length 4 and 10 of length 5. Note that it is sufficient to check the
'source file descriptors' reachable from the newly added link, since no
other 'source file descriptors' will have newly added links. This allows
us to check only the wakeup paths that may have gotten too long, and not
re-check all possible wakeup paths on the system.

In terms of the path limit selection, I think its first worth noting that
the most common case for epoll, is probably the model where you have 1
epoll file descriptor that is monitoring n number of 'source file
descriptors'. In this case, each 'source file descriptor' has a 1 path of
length 1. Thus, I believe that the limits I'm proposing are quite
reasonable and in fact may be too generous. Thus, I'm hoping that the
proposed limits will not prevent any workloads that currently work to
fail.

In terms of locking, I have extended the use of the 'epmutex' to all
epoll_ctl add and remove operations. Currently its only used in a subset
of the add paths. I need to hold the epmutex, so that we can correctly
traverse a coherent graph, to check the number of paths. I believe that
this additional locking is probably ok, since its in the setup/teardown
paths, and doesn't affect the running paths, but it certainly is going to
add some extra overhead. Also, worth noting is that the epmuex was
recently added to the ep_ctl add operations in the initial path loop
detection code using the argument that it was not on a critical path.

Another thing to note here, is the length of epoll chains that is allowed.
Currently, eventpoll.c defines:

/* Maximum number of nesting allowed inside epoll sets */

This basically means that I am limited to a graph depth of 5 (EP_MAX_NESTS
+ 1). However, this limit is currently only enforced during the loop
check detection code, and only when the epoll file descriptors are added
in a certain order. Thus, this limit is currently easily bypassed. The
newly added check for wakeup paths, stricly limits the wakeup paths to a
length of 5, regardless of the order in which ep's are linked together.
Thus, a side-effect of the new code is a more consistent enforcement of
the graph depth.

Thus far, I've tested this, using the sample programs previously
mentioned, which now either return quickly or return -EINVAL. I've also
testing using the piptest.c epoll tester, which showed no difference in
performance. I've also created a number of different epoll networks and
tested that they behave as expectded.

I believe this solves the original diabolical test cases, while still
preserving the sane epoll nesting.

Signed-off-by: Jason Baron <jba...@redhat.com>
Cc: Nelson Elhage <nel...@ksplice.com>
Cc: Davide Libenzi <dav...@xmailserver.org>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/eventpoll.c | 236 ++++++++++++++++++++++++++++++++++++++++-----
include/linux/eventpoll.h | 1 +
include/linux/fs.h | 1 +
3 files changed, 212 insertions(+), 26 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 42f2c12..8da83d8 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -200,6 +200,12 @@ struct eventpoll {

/* The user that created the eventpoll descriptor */
struct user_struct *user;
+
+ struct file *file;
+
+ /* used to optimize loop detection check */
+ int visited;
+ struct list_head visited_list_link;
};

/* Wait structure used by the poll hooks */
@@ -258,6 +264,15 @@ static struct kmem_cache *epi_cache __read_mostly;
/* Slab cache used to allocate "struct eppoll_entry" */
static struct kmem_cache *pwq_cache __read_mostly;

+/* Visited nodes during ep_loop_check(), so we can unset them when we finish */
+static LIST_HEAD(visited_list);
+
+/*
+ * List of files with newly added links, where we may need to limit the number
+ * of emanating paths. Protected by the epmutex.
+ */
+static LIST_HEAD(tfile_check_list);
+
#ifdef CONFIG_SYSCTL

#include <linux/sysctl.h>
@@ -277,6 +292,12 @@ ctl_table epoll_table[] = {
};
#endif /* CONFIG_SYSCTL */

+static const struct file_operations eventpoll_fops;
+
+static inline int is_file_epoll(struct file *f)
+{
+ return f->f_op == &eventpoll_fops;
+}

/* Setup the structure that is used as key for the RB tree */
static inline void ep_set_ffd(struct epoll_filefd *ffd,
@@ -715,12 +736,6 @@ static const struct file_operations eventpoll_fops = {
.poll = ep_eventpoll_poll
};

-/* Fast test to see if the file is an evenpoll file */
-static inline int is_file_epoll(struct file *f)
-{
- return f->f_op == &eventpoll_fops;
-}
-
/*
* This is called from eventpoll_release() to unlink files from the eventpoll
* interface. We need to have this facility to cleanup correctly files that are
@@ -941,6 +956,99 @@ static void ep_rbtree_insert(struct eventpoll *ep, struct epitem *epi)
rb_insert_color(&epi->rbn, &ep->rbr);
}

+
+
+#define PATH_ARR_SIZE 5
+/*
+ * These are the number paths of length 1 to 5, that we are allowing to emanate
+ * from a single file of interest. For example, we allow 1000 paths of length
+ * 1, to emanate from each file of interest. This essentially represents the
+ * potential wakeup paths, which need to be limited in order to avoid massive
+ * uncontrolled wakeup storms. The common use case should be a single ep which
+ * is connected to n file sources. In this case each file source has 1 path
+ * of length 1. Thus, the numbers below should be more than sufficient. These
+ * path limits are enforced during an EPOLL_CTL_ADD operation, since a modify
+ * and delete can't add additional paths. Protected by the epmutex.
+ */
+static const int path_limits[PATH_ARR_SIZE] = { 1000, 500, 100, 50, 10 };
+static int path_count[PATH_ARR_SIZE];
+
+static int path_count_inc(int nests)
+{
+ if (++path_count[nests] > path_limits[nests])
+ return -1;
+ return 0;
+}
+
+static void path_count_init(void)
+{
+ int i;
+
+ for (i = 0; i < PATH_ARR_SIZE; i++)
+ path_count[i] = 0;
+}
+
+static int reverse_path_check_proc(void *priv, void *cookie, int call_nests)
+{
+ int error = 0;
+ struct file *file = priv;
+ struct file *child_file;
+ struct epitem *epi;
+
+ list_for_each_entry(epi, &file->f_ep_links, fllink) {
+ child_file = epi->ep->file;
+ if (is_file_epoll(child_file)) {
+ if (list_empty(&child_file->f_ep_links)) {
+ if (path_count_inc(call_nests)) {
+ error = -1;
+ break;
+ }
+ } else {
+ error = ep_call_nested(&poll_loop_ncalls,
+ EP_MAX_NESTS,
+ reverse_path_check_proc,
+ child_file, child_file,
+ current);
+ }
+ if (error != 0)
+ break;
+ } else {
+ printk(KERN_ERR "reverse_path_check_proc: "
+ "file is not an ep!\n");
+ }
+ }
+ return error;
+}
+
+/**
+ * reverse_path_check - The tfile_check_list is list of file *, which have
+ * links that are proposed to be newly added. We need to
+ * make sure that those added links don't add too many
+ * paths such that we will spend all our time waking up
+ * eventpoll objects.
+ *
+ * Returns: Returns zero if the proposed links don't create too many paths,
+ * -1 otherwise.
+ */
+static int reverse_path_check(void)
+{
+ int length = 0;
+ int error = 0;
+ struct file *current_file;
+
+ /* let's call this for all tfiles */
+ list_for_each_entry(current_file, &tfile_check_list, f_tfile_llink) {
+ length++;
+ path_count_init();
+ error = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
+ reverse_path_check_proc, current_file,
+ current_file, current);
+ if (error)
+ break;
+ }
+ return error;
+}
+
/*
* Must be called with "mtx" held.
*/
@@ -1001,6 +1109,11 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,
*/
ep_rbtree_insert(ep, epi);

+ /* now check if we've created too many backpaths */
+ error = -EINVAL;
+ if (reverse_path_check())
+ goto error_remove_epi;
+
/* We have to drop the new item inside our item list to keep track of it */
spin_lock_irqsave(&ep->lock, flags);

@@ -1025,6 +1138,14 @@ static int ep_insert(struct eventpoll *ep, struct epoll_event *event,

return 0;

+error_remove_epi:
+ spin_lock(&tfile->f_lock);
+ if (ep_is_linked(&epi->fllink))
+ list_del_init(&epi->fllink);
+ spin_unlock(&tfile->f_lock);
+
+ rb_erase(&epi->rbn, &ep->rbr);
+
error_unregister:
ep_unregister_pollwait(ep, epi);

@@ -1251,18 +1372,36 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
int error = 0;
struct file *file = priv;
struct eventpoll *ep = file->private_data;
+ struct eventpoll *ep_tovisit;
struct rb_node *rbp;
struct epitem *epi;

mutex_lock_nested(&ep->mtx, call_nests + 1);
+ ep->visited = 1;
+ list_add(&ep->visited_list_link, &visited_list);
for (rbp = rb_first(&ep->rbr); rbp; rbp = rb_next(rbp)) {
epi = rb_entry(rbp, struct epitem, rbn);
if (unlikely(is_file_epoll(epi->ffd.file))) {
+ ep_tovisit = epi->ffd.file->private_data;
+ if (ep_tovisit->visited)
+ continue;
error = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
- ep_loop_check_proc, epi->ffd.file,
- epi->ffd.file->private_data, current);
+ ep_loop_check_proc, epi->ffd.file,
+ ep_tovisit, current);
if (error != 0)
break;
+ } else {
+ /*
+ * If we've reached a file that is not associated with
+ * an ep, then we need to check if the newly added
+ * links are going to add too many wakeup paths. We do
+ * this by adding it to the tfile_check_list, if it's
+ * not already there, and calling reverse_path_check()
+ * during ep_insert().
+ */
+ if (list_empty(&epi->ffd.file->f_tfile_llink))
+ list_add(&epi->ffd.file->f_tfile_llink,
+ &tfile_check_list);
}
}
mutex_unlock(&ep->mtx);
@@ -1283,8 +1422,31 @@ static int ep_loop_check_proc(void *priv, void *cookie, int call_nests)
*/
static int ep_loop_check(struct eventpoll *ep, struct file *file)
{
- return ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
+ int ret;
+ struct eventpoll *ep_cur, *ep_next;
+
+ ret = ep_call_nested(&poll_loop_ncalls, EP_MAX_NESTS,
ep_loop_check_proc, file, ep, current);
+ /* clear visited list */
+ list_for_each_entry_safe(ep_cur, ep_next, &visited_list,
+ visited_list_link) {
+ ep_cur->visited = 0;
+ list_del(&ep_cur->visited_list_link);
+ }
+ return ret;
+}
+
+static void clear_tfile_check_list(void)
+{
+ struct file *file;
+
+ /* first clear the tfile_check_list */
+ while (!list_empty(&tfile_check_list)) {
+ file = list_first_entry(&tfile_check_list, struct file,
+ f_tfile_llink);
+ list_del_init(&file->f_tfile_llink);
+ }
+ INIT_LIST_HEAD(&tfile_check_list);
}

/*
@@ -1292,8 +1454,9 @@ static int ep_loop_check(struct eventpoll *ep, struct file *file)
*/
SYSCALL_DEFINE1(epoll_create1, int, flags)
{
- int error;
+ int error, fd;
struct eventpoll *ep = NULL;
+ struct file *file;

/* Check the EPOLL_* constant for consistency. */
BUILD_BUG_ON(EPOLL_CLOEXEC != O_CLOEXEC);
@@ -1310,11 +1473,25 @@ SYSCALL_DEFINE1(epoll_create1, int, flags)
* Creates all the items needed to setup an eventpoll file. That is,
* a file structure and a free file descriptor.
*/
- error = anon_inode_getfd("[eventpoll]", &eventpoll_fops, ep,
- flags & O_CLOEXEC);
- if (error < 0)
- ep_free(ep);
-
+ fd = get_unused_fd_flags(O_RDWR | (flags & O_CLOEXEC));
+ if (fd < 0) {
+ error = fd;
+ goto out_free_ep;
+ }
+ file = anon_inode_getfile("[eventpoll]", &eventpoll_fops, ep,
+ O_RDWR | (flags & O_CLOEXEC));
+ if (IS_ERR(file)) {
+ error = PTR_ERR(file);
+ goto out_free_fd;
+ }
+ fd_install(fd, file);
+ ep->file = file;
+ return fd;
+
+out_free_fd:
+ put_unused_fd(fd);
+out_free_ep:
+ ep_free(ep);
return error;
}

@@ -1380,21 +1557,27 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
/*
* When we insert an epoll file descriptor, inside another epoll file
* descriptor, there is the change of creating closed loops, which are
- * better be handled here, than in more critical paths.
+ * better be handled here, than in more critical paths. While we are
+ * checking for loops we also determine the list of files reachable
+ * and hang them on the tfile_check_list, so we can check that we
+ * haven't created too many possible wakeup paths.
*
- * We hold epmutex across the loop check and the insert in this case, in
- * order to prevent two separate inserts from racing and each doing the
- * insert "at the same time" such that ep_loop_check passes on both
- * before either one does the insert, thereby creating a cycle.
+ * We need to hold the epmutex across both ep_insert and ep_remove
+ * b/c we want to make sure we are looking at a coherent view of
+ * epoll network.
*/
- if (unlikely(is_file_epoll(tfile) && op == EPOLL_CTL_ADD)) {
+ if (op == EPOLL_CTL_ADD || op == EPOLL_CTL_DEL) {
mutex_lock(&epmutex);
did_lock_epmutex = 1;
- error = -ELOOP;
- if (ep_loop_check(ep, tfile) != 0)
- goto error_tgt_fput;
}
-
+ if (op == EPOLL_CTL_ADD) {
+ if (is_file_epoll(tfile)) {
+ error = -ELOOP;
+ if (ep_loop_check(ep, tfile) != 0)
+ goto error_tgt_fput;
+ } else
+ list_add(&tfile->f_tfile_llink, &tfile_check_list);
+ }

mutex_lock_nested(&ep->mtx, 0);

@@ -1413,6 +1596,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
error = ep_insert(ep, &epds, tfile, fd);
} else
error = -EEXIST;
+ clear_tfile_check_list();
break;
case EPOLL_CTL_DEL:
if (epi)
@@ -1431,7 +1615,7 @@ SYSCALL_DEFINE4(epoll_ctl, int, epfd, int, op, int, fd,
mutex_unlock(&ep->mtx);

error_tgt_fput:
- if (unlikely(did_lock_epmutex))
+ if (did_lock_epmutex)
mutex_unlock(&epmutex);

fput(tfile);
diff --git a/include/linux/eventpoll.h b/include/linux/eventpoll.h
index f6856a5..ca399c5 100644
--- a/include/linux/eventpoll.h
+++ b/include/linux/eventpoll.h
@@ -61,6 +61,7 @@ struct file;
static inline void eventpoll_init_file(struct file *file)
{
INIT_LIST_HEAD(&file->f_ep_links);
+ INIT_LIST_HEAD(&file->f_tfile_llink);
}

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1b9a47a..860cb6d 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -941,6 +941,7 @@ struct file {
#ifdef CONFIG_EPOLL
/* Used by fs/eventpoll.c to link all the hooks to this file */
struct list_head f_ep_links;
+ struct list_head f_tfile_llink;
#endif /* #ifdef CONFIG_EPOLL */
struct address_space *f_mapping;
#ifdef CONFIG_DEBUG_WRITECOUNT

Willy Tarreau

unread,

Oct 1, 2012, 7:02:15 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Lin Ming, Paul Moore, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Paul Moore <pmo...@redhat.com>

[ Upstream commit 89d7ae34cdda4195809a5a987f697a517a2a3177 ]

As reported by Alan Cox, and verified by Lin Ming, when a user
attempts to add a CIPSO option to a socket using the CIPSO_V4_TAG_LOCAL
tag the kernel dies a terrible death when it attempts to follow a NULL
pointer (the skb argument to cipso_v4_validate() is NULL when called via
the setsockopt() syscall).

This patch fixes this by first checking to ensure that the skb is
non-NULL before using it to find the incoming network interface. In
the unlikely case where the skb is NULL and the user attempts to add
a CIPSO option with the _TAG_LOCAL tag we return an error as this is
not something we want to allow.

A simple reproducer, kindly supplied by Lin Ming, although you must
have the CIPSO DOI #3 configure on the system first or you will be
caught early in cipso_v4_validate():

#include <sys/types.h>
#include <sys/socket.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <string.h>

struct local_tag {
char type;
char length;
char info[4];
};

struct cipso {
char type;
char length;
char doi[4];
struct local_tag local;
};

int main(int argc, char **argv)
{
int sockfd;
struct cipso cipso = {
.type = IPOPT_CIPSO,
.length = sizeof(struct cipso),
.local = {
.type = 128,
.length = sizeof(struct local_tag),
},
};

memset(cipso.doi, 0, 4);
cipso.doi[3] = 3;

sockfd = socket(AF_INET, SOCK_DGRAM, 0);
#define SOL_IP 0
setsockopt(sockfd, SOL_IP, IP_OPTIONS,
&cipso, sizeof(struct cipso));

return 0;
}

CC: Lin Ming <ml...@ss.pku.edu.cn>
Reported-by: Alan Cox <al...@lxorguk.ukuu.org.uk>
Signed-off-by: Paul Moore <pmo...@redhat.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv4/cipso_ipv4.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
index 039cc1f..10f8f8d 100644
--- a/net/ipv4/cipso_ipv4.c
+++ b/net/ipv4/cipso_ipv4.c
@@ -1726,8 +1726,10 @@ int cipso_v4_validate(const struct sk_buff *skb, unsigned char **option)
case CIPSO_V4_TAG_LOCAL:
/* This is a non-standard tag that we only allow for
* local connections, so if the incoming interface is
- * not the loopback device drop the packet. */
- if (!(skb->dev->flags & IFF_LOOPBACK)) {
+ * not the loopback device drop the packet. Further,
+ * there is no legitimate reason for setting this from
+ * userspace so reject it if skb is NULL. */
+ if (skb == NULL || !(skb->dev->flags & IFF_LOOPBACK)) {
err_offset = opt_iter;
goto validate_return_locked;

Willy Tarreau

unread,

Oct 1, 2012, 7:02:20 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, John Stultz, Peter Zijlstra, Prarit Bhargava, Thomas Gleixner, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: John Stultz <john...@us.ibm.com>

This is a backport of f55a6faa384304c89cfef162768e88374d3312cb

clock_was_set() cannot be called from hard interrupt context because
it calls on_each_cpu().

For fixing the widely reported leap seconds issue it is necessary to
call it from hard interrupt context, i.e. the timer tick code, which
does the timekeeping updates.

Provide a new function which denotes it in the hrtimer cpu base
structure of the cpu on which it is called and raise the hrtimer
softirq. We then execute the clock_was_set() notificiation from
softirq context in run_hrtimer_softirq(). The hrtimer softirq is
rarely used, so polling the flag there is not a performance issue.

[ tglx: Made it depend on CONFIG_HIGH_RES_TIMERS. We really should get
rid of all this ifdeffery ASAP ]

Signed-off-by: John Stultz <john...@us.ibm.com>
Reported-by: Jan Engelhardt <jen...@inai.de>

Reviewed-by: Ingo Molnar <mi...@kernel.org>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Acked-by: Prarit Bhargava <pra...@redhat.com>
Cc: sta...@vger.kernel.org

Link: http://lkml.kernel.org/r/1341960205-56738-2-gi...@us.ibm.com
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>

Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Linux Kernel <linux-...@vger.kernel.org>

Signed-off-by: John Stultz <john...@us.ibm.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/hrtimer.h | 7 +++++++
kernel/hrtimer.c | 20 ++++++++++++++++++++
2 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 040b679..a7f48af 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -159,6 +159,7 @@ struct hrtimer_clock_base {
* and timers
* @clock_base: array of clock bases for this cpu
* @curr_timer: the timer which is executing a callback right now
+ * @clock_was_set: Indicates that clock was set from irq context.
* @expires_next: absolute time of the next event which was scheduled
* via clock_set_next_event()
* @hres_active: State of high resolution mode
@@ -171,6 +172,7 @@ struct hrtimer_clock_base {
struct hrtimer_cpu_base {
spinlock_t lock;
struct hrtimer_clock_base clock_base[HRTIMER_MAX_CLOCK_BASES];
+ unsigned int clock_was_set;
#ifdef CONFIG_HIGH_RES_TIMERS
ktime_t expires_next;
int hres_active;
@@ -280,6 +282,8 @@ extern void hrtimer_peek_ahead_timers(void);
# define MONOTONIC_RES_NSEC HIGH_RES_NSEC
# define KTIME_MONOTONIC_RES KTIME_HIGH_RES

+extern void clock_was_set_delayed(void);
+
#else

# define MONOTONIC_RES_NSEC LOW_RES_NSEC
@@ -308,6 +312,9 @@ static inline int hrtimer_is_hres_active(struct hrtimer *timer)
{
return 0;
}
+
+static inline void clock_was_set_delayed(void) { }
+
#endif

extern ktime_t ktime_get(void);
diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
index a6e9d00..c4acec7 100644
--- a/kernel/hrtimer.c
+++ b/kernel/hrtimer.c
@@ -738,6 +738,19 @@ static int hrtimer_switch_to_hres(void)
return 1;
}

+/*
+ * Called from timekeeping code to reprogramm the hrtimer interrupt
+ * device. If called from the timer interrupt context we defer it to
+ * softirq context.
+ */
+void clock_was_set_delayed(void)
+{
+ struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
+
+ cpu_base->clock_was_set = 1;
+ __raise_softirq_irqoff(HRTIMER_SOFTIRQ);
+}
+
#else

static inline int hrtimer_hres_active(void) { return 0; }
@@ -1393,6 +1406,13 @@ void hrtimer_peek_ahead_timers(void)

static void run_hrtimer_softirq(struct softirq_action *h)
{
+ struct hrtimer_cpu_base *cpu_base = &__get_cpu_var(hrtimer_bases);
+
+ if (cpu_base->clock_was_set) {
+ cpu_base->clock_was_set = 0;
+ clock_was_set();
+ }
+
hrtimer_peek_ahead_timers();

Willy Tarreau

unread,

Oct 1, 2012, 7:02:29 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Tony Luck, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Tony Luck <tony...@intel.com>

commit d114a33387472555188f142ed8e98acdb8181c6d upstream.

Send the entire DMI (SMBIOS) table to the /dev/random driver to
help seed its pools.

Signed-off-by: Tony Luck <tony...@intel.com>

Signed-off-by: Theodore Ts'o <ty...@mit.edu>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/firmware/dmi_scan.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/firmware/dmi_scan.c b/drivers/firmware/dmi_scan.c
index 3a2ccb0..10a4246 100644
--- a/drivers/firmware/dmi_scan.c
+++ b/drivers/firmware/dmi_scan.c
@@ -6,6 +6,7 @@
#include <linux/efi.h>
#include <linux/bootmem.h>
#include <linux/slab.h>
+#include <linux/random.h>
#include <asm/dmi.h>

/*
@@ -111,6 +112,8 @@ static int __init dmi_walk_early(void (*decode)(const struct dmi_header *,

dmi_table(buf, dmi_len, dmi_num, decode, NULL);

+ add_device_randomness(buf, dmi_len);
+
dmi_iounmap(buf, dmi_len);
return 0;

Willy Tarreau

unread,

Oct 1, 2012, 7:02:35 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Oleg Nesterov, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <ol...@redhat.com>

commit d80e731ecab420ddcb79ee9d0ac427acbc187b4b upstream.

This patch is intentionally incomplete to simplify the review.
It ignores ep_unregister_pollwait() which plays with the same wqh.
See the next change.

epoll assumes that the EPOLL_CTL_ADD'ed file controls everything
f_op->poll() needs. In particular it assumes that the wait queue
can't go away until eventpoll_release(). This is not true in case
of signalfd, the task which does EPOLL_CTL_ADD uses its ->sighand
which is not connected to the file.

This patch adds the special event, POLLFREE, currently only for
epoll. It expects that init_poll_funcptr()'ed hook should do the
necessary cleanup. Perhaps it should be defined as EPOLLFREE in
eventpoll.

__cleanup_sighand() is changed to do wake_up_poll(POLLFREE) if
->signalfd_wqh is not empty, we add the new signalfd_cleanup()
helper.

ep_poll_callback(POLLFREE) simply does list_del_init(task_list).
This make this poll entry inconsistent, but we don't care. If you
share epoll fd which contains our sigfd with another process you
should blame yourself. signalfd is "really special". I simply do
not know how we can define the "right" semantics if it used with
epoll.

The main problem is, epoll calls signalfd_poll() once to establish
the connection with the wait queue, after that signalfd_poll(NULL)
returns the different/inconsistent results depending on who does
EPOLL_CTL_MOD/signalfd_read/etc. IOW: apart from sigmask, signalfd
has nothing to do with the file, it works with the current thread.

In short: this patch is the hack which tries to fix the symptoms.
It also assumes that nobody can take tasklist_lock under epoll
locks, this seems to be true.

Note:

- we do not have wake_up_all_poll() but wake_up_poll()
is fine, poll/epoll doesn't use WQ_FLAG_EXCLUSIVE.

- signalfd_cleanup() uses POLLHUP along with POLLFREE,
we need a couple of simple changes in eventpoll.c to
make sure it can't be "lost".

Reported-by: Maxime Bizon <mbi...@freebox.fr>
Signed-off-by: Oleg Nesterov <ol...@redhat.com>

Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/eventpoll.c | 4 ++++
fs/signalfd.c | 11 +++++++++++
include/asm-generic/poll.h | 2 ++
include/linux/signalfd.h | 5 ++++-
kernel/fork.c | 5 ++++-
5 files changed, 25 insertions(+), 2 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index f539204..15a7ef3 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -814,6 +814,10 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
struct epitem *epi = ep_item_from_wait(wait);
struct eventpoll *ep = epi->ep;

+ /* the caller holds eppoll_entry->whead->lock */
+ if ((unsigned long)key & POLLFREE)
+ list_del_init(&wait->task_list);
+
spin_lock_irqsave(&ep->lock, flags);

/*
diff --git a/fs/signalfd.c b/fs/signalfd.c
index d98bea8..6339cb4 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -29,6 +29,17 @@
#include <linux/signalfd.h>
#include <linux/syscalls.h>

+void signalfd_cleanup(struct sighand_struct *sighand)
+{
+ wait_queue_head_t *wqh = &sighand->signalfd_wqh;
+
+ if (likely(!waitqueue_active(wqh)))
+ return;
+
+ /* wait_queue_t->func(POLLFREE) should do remove_wait_queue() */
+ wake_up_poll(wqh, POLLHUP | POLLFREE);
+}
+
struct signalfd_ctx {
sigset_t sigmask;
};
diff --git a/include/asm-generic/poll.h b/include/asm-generic/poll.h
index 44bce83..9ce7f44 100644
--- a/include/asm-generic/poll.h
+++ b/include/asm-generic/poll.h
@@ -28,6 +28,8 @@
#define POLLRDHUP 0x2000
#endif

+#define POLLFREE 0x4000 /* currently only for epoll */
+
struct pollfd {
int fd;
short events;
diff --git a/include/linux/signalfd.h b/include/linux/signalfd.h
index b363b91..ed9b65e 100644
--- a/include/linux/signalfd.h
+++ b/include/linux/signalfd.h
@@ -60,13 +60,16 @@ static inline void signalfd_notify(struct task_struct *tsk, int sig)
wake_up(&tsk->sighand->signalfd_wqh);
}

+extern void signalfd_cleanup(struct sighand_struct *sighand);
+
#else /* CONFIG_SIGNALFD */

static inline void signalfd_notify(struct task_struct *tsk, int sig) { }

+static inline void signalfd_cleanup(struct sighand_struct *sighand) { }
+
#endif /* CONFIG_SIGNALFD */

#endif /* __KERNEL__ */

#endif /* _LINUX_SIGNALFD_H */
-
diff --git a/kernel/fork.c b/kernel/fork.c
index cd075bc..c28f804 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -64,6 +64,7 @@
#include <linux/magic.h>
#include <linux/perf_event.h>
#include <linux/posix-timers.h>
+#include <linux/signalfd.h>

#include <asm/pgtable.h>
#include <asm/pgalloc.h>
@@ -815,8 +816,10 @@ static int copy_sighand(unsigned long clone_flags, struct task_struct *tsk)

void __cleanup_sighand(struct sighand_struct *sighand)
{
- if (atomic_dec_and_test(&sighand->count))
+ if (atomic_dec_and_test(&sighand->count)) {
+ signalfd_cleanup(sighand);
kmem_cache_free(sighand_cachep, sighand);
+ }

Willy Tarreau

unread,

Oct 1, 2012, 7:02:41 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Tony Zelenoff, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Tony Zelenoff <ant...@parallels.com>

[ Upstream commit 03662e41c7cff64a776bfb1b3816de4be43de881 ]

Problem:
There was two separate work_struct structures which share one
handler. Unfortunately getting atl1_adapter structure from
work_struct in case of DMA error was done from incorrect
offset which cause kernel panics.

Solution:
The useless work_struct for DMA error removed and
handler name changed to more generic one.

Signed-off-by: Tony Zelenoff <ant...@parallels.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/net/atlx/atl1.c | 12 +++++-------
drivers/net/atlx/atl1.h | 3 +--
drivers/net/atlx/atlx.c | 2 +-
3 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/net/atlx/atl1.c b/drivers/net/atlx/atl1.c
index 403bfb6..adc862f 100644
--- a/drivers/net/atlx/atl1.c
+++ b/drivers/net/atlx/atl1.c
@@ -2478,7 +2478,7 @@ static irqreturn_t atl1_intr(int irq, void *data)
"pcie phy link down %x\n", status);
if (netif_running(adapter->netdev)) { /* reset MAC */
iowrite32(0, adapter->hw.hw_addr + REG_IMR);
- schedule_work(&adapter->pcie_dma_to_rst_task);
+ schedule_work(&adapter->reset_dev_task);
return IRQ_HANDLED;
}
}
@@ -2490,7 +2490,7 @@ static irqreturn_t atl1_intr(int irq, void *data)
"pcie DMA r/w error (status = 0x%x)\n",
status);
iowrite32(0, adapter->hw.hw_addr + REG_IMR);
- schedule_work(&adapter->pcie_dma_to_rst_task);
+ schedule_work(&adapter->reset_dev_task);
return IRQ_HANDLED;
}

@@ -2635,10 +2635,10 @@ static void atl1_down(struct atl1_adapter *adapter)
atl1_clean_rx_ring(adapter);
}

-static void atl1_tx_timeout_task(struct work_struct *work)
+static void atl1_reset_dev_task(struct work_struct *work)
{
struct atl1_adapter *adapter =
- container_of(work, struct atl1_adapter, tx_timeout_task);
+ container_of(work, struct atl1_adapter, reset_dev_task);
struct net_device *netdev = adapter->netdev;

netif_device_detach(netdev);
@@ -3050,12 +3050,10 @@ static int __devinit atl1_probe(struct pci_dev *pdev,
(unsigned long)adapter);
adapter->phy_timer_pending = false;

- INIT_WORK(&adapter->tx_timeout_task, atl1_tx_timeout_task);
+ INIT_WORK(&adapter->reset_dev_task, atl1_reset_dev_task);

INIT_WORK(&adapter->link_chg_task, atlx_link_chg_task);

- INIT_WORK(&adapter->pcie_dma_to_rst_task, atl1_tx_timeout_task);
-
err = register_netdev(netdev);
if (err)
goto err_common;
diff --git a/drivers/net/atlx/atl1.h b/drivers/net/atlx/atl1.h
index 146372f..0494e514 100644
--- a/drivers/net/atlx/atl1.h
+++ b/drivers/net/atlx/atl1.h
@@ -762,9 +762,8 @@ struct atl1_adapter {
u16 link_speed;
u16 link_duplex;
spinlock_t lock;
- struct work_struct tx_timeout_task;
+ struct work_struct reset_dev_task;
struct work_struct link_chg_task;
- struct work_struct pcie_dma_to_rst_task;

struct timer_list phy_config_timer;
bool phy_timer_pending;
diff --git a/drivers/net/atlx/atlx.c b/drivers/net/atlx/atlx.c
index 3dc0142..ce09b95 100644
--- a/drivers/net/atlx/atlx.c
+++ b/drivers/net/atlx/atlx.c
@@ -189,7 +189,7 @@ static void atlx_tx_timeout(struct net_device *netdev)
{
struct atlx_adapter *adapter = netdev_priv(netdev);
/* Do the reset outside of interrupt context */
- schedule_work(&adapter->tx_timeout_task);
+ schedule_work(&adapter->reset_dev_task);
}

/*

Willy Tarreau

unread,

Oct 1, 2012, 7:02:47 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Davide Ciminaghi, Raffaele Recalcati, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Davide Ciminaghi <cimi...@gnudd.com>

[ Upstream commit 8a9a0ea6032186e3030419262678d652b88bf6a8 ]

At the beginning of ks_rcv(), a for loop retrieves the
header information relevant to all the frames stored
in the mac's internal buffers. The number of pending
frames is stored as an 8 bits field in KS_RXFCTR.
If interrupts are disabled long enough to allow for more than
32 frames to accumulate in the MAC's internal buffers, a buffer
overflow occurs.
This patch fixes the problem by making the
driver's frame_head_info buffer big enough.
Well actually, since the chip appears to have 12K of
internal rx buffers and the shortest ethernet frame should
be 64 bytes long, maybe the limit could be set to
12*1024/64 = 192 frames, but 255 should be safer.

Signed-off-by: Davide Ciminaghi <cimi...@gnudd.com>
Signed-off-by: Raffaele Recalcati <raffaele....@bticino.it>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/net/ks8851_mll.c | 2 +-

1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/ks8851_mll.c b/drivers/net/ks8851_mll.c
index c0ceebc..4e3a69c 100644
--- a/drivers/net/ks8851_mll.c
+++ b/drivers/net/ks8851_mll.c
@@ -35,7 +35,7 @@
#define DRV_NAME "ks8851_mll"

static u8 KS_DEFAULT_MAC_ADDRESS[] = { 0x00, 0x10, 0xA1, 0x86, 0x95, 0x11 };
-#define MAX_RECV_FRAMES 32
+#define MAX_RECV_FRAMES 255
#define MAX_BUF_SIZE 2048
#define TX_BUF_SIZE 2000
#define RX_BUF_SIZE 2000

Willy Tarreau

unread,

Oct 1, 2012, 7:02:52 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Kent Yoder, Herbert Xu, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Kent Yoder <k...@linux.vnet.ibm.com>

commit 25c3d30c918207556ae1d6e663150ebdf902186b upstream.

The current code only increments the upper 64 bits of the SHA-512 byte
counter when the number of bytes hashed happens to hit 2^64 exactly.

This patch increments the upper 64 bits whenever the lower 64 bits
overflows.

Signed-off-by: Kent Yoder <k...@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <her...@gondor.apana.org.au>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

crypto/sha512_generic.c | 2 +-

1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/crypto/sha512_generic.c b/crypto/sha512_generic.c
index 107f6f7..dd30f40 100644
--- a/crypto/sha512_generic.c
+++ b/crypto/sha512_generic.c
@@ -174,7 +174,7 @@ sha512_update(struct shash_desc *desc, const u8 *data, unsigned int len)
index = sctx->count[0] & 0x7f;

/* Update number of bytes */
- if (!(sctx->count[0] += len))
+ if ((sctx->count[0] += len) < len)
sctx->count[1]++;

part_len = 128 - index;

Willy Tarreau

unread,

Oct 1, 2012, 7:02:57 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Johan Hovold, Marcel Holtmann, Johan Hedberg, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Johan Hovold <jho...@gmail.com>

commit 33b69bf80a3704d45341928e4ff68b6ebd470686 upstream.

Do not close protocol driver until device has been unregistered.

This fixes a race between tty_close and hci_dev_open which can result in
a NULL-pointer dereference.

The line discipline closes the protocol driver while we may still have
hci_dev_open sleeping on the req_lock mutex resulting in a NULL-pointer
dereference when lock is acquired and hci_init_req called.

Bug is 100% reproducible using hciattach and a disconnected serial port:

0. # hciattach -n ttyO1 any noflow

1. hci_dev_open called from hci_power_on grabs req lock
2. hci_init_req executes but device fails to initialise (times out
eventually)
3. hci_dev_open is called from hci_sock_ioctl and sleeps on req lock
4. hci_uart_tty_close detaches protocol driver and cancels init req
5. hci_dev_open (1) releases req lock
6. hci_dev_open (3) grabs req lock, calls hci_init_req, which triggers oops
when request is prepared in hci_uart_send_frame

[ 137.201263] Unable to handle kernel NULL pointer dereference at virtual address 00000028
[ 137.209838] pgd = c0004000
[ 137.212677] [00000028] *pgd=00000000
[ 137.216430] Internal error: Oops: 17 [#1]
[ 137.220642] Modules linked in:
[ 137.223846] CPU: 0 Tainted: G W (3.3.0-rc6-dirty #406)
[ 137.230529] PC is at __lock_acquire+0x5c/0x1ab0
[ 137.235290] LR is at lock_acquire+0x9c/0x128
[ 137.239776] pc : [<c0071490>] lr : [<c00733f8>] psr: 20000093
[ 137.239776] sp : cf869dd8 ip : c0529554 fp : c051c730
[ 137.251800] r10: 00000000 r9 : cf8673c0 r8 : 00000080
[ 137.257293] r7 : 00000028 r6 : 00000002 r5 : 00000000 r4 : c053fd70
[ 137.264129] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : 00000001
[ 137.270965] Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel
[ 137.278717] Control: 10c5387d Table: 8f0f4019 DAC: 00000015
[ 137.284729] Process kworker/u:1 (pid: 7, stack limit = 0xcf8682e8)
[ 137.291229] Stack: (0xcf869dd8 to 0xcf86a000)
[ 137.295776] 9dc0: c0529554 00000000
[ 137.304351] 9de0: cf8673c0 cf868000 d03ea1ef cf868000 000001ef 00000470 00000000 00000002
[ 137.312927] 9e00: cf8673c0 00000001 c051c730 c00716ec 0000000c 00000440 c0529554 00000001
[ 137.321533] 9e20: c051c730 cf868000 d03ea1f3 00000000 c053b978 00000000 00000028 cf868000
[ 137.330078] 9e40: 00000000 00000000 00000002 00000000 00000000 c00733f8 00000002 00000080
[ 137.338684] 9e60: 00000000 c02a1d50 00000000 00000001 60000013 c0969a1c 60000093 c053b96c
[ 137.347259] 9e80: 00000002 00000018 20000013 c02a1d50 cf0ac000 00000000 00000002 cf868000
[ 137.355834] 9ea0: 00000089 c0374130 00000002 00000000 c02a1d50 cf0ac000 0000000c cf0fc540
[ 137.364410] 9ec0: 00000018 c02a1d50 cf0fc540 00000000 cf0fc540 c0282238 c028220c cf178d80
[ 137.372985] 9ee0: 127525d8 c02821cc 9a1fa451 c032727c 9a1fa451 127525d8 cf0fc540 cf0ac4ec
[ 137.381561] 9f00: cf0ac000 cf0fc540 cf0ac584 c03285f4 c0328580 cf0ac4ec cf85c740 c05510cc
[ 137.390136] 9f20: ce825400 c004c914 00000002 00000000 c004c884 ce8254f5 cf869f48 00000000
[ 137.398712] 9f40: c0328580 ce825415 c0a7f914 c061af64 00000000 c048cf3c cf8673c0 cf85c740
[ 137.407287] 9f60: c05510cc c051a66c c05510ec c05510c4 cf85c750 cf868000 00000089 c004d6ac
[ 137.415863] 9f80: 00000000 c0073d14 00000001 cf853ed8 cf85c740 c004d558 00000013 00000000
[ 137.424438] 9fa0: 00000000 00000000 00000000 c00516b0 00000000 00000000 cf85c740 00000000
[ 137.433013] 9fc0: 00000001 dead4ead ffffffff ffffffff c0551674 00000000 00000000 c0450aa4
[ 137.441589] 9fe0: cf869fe0 cf869fe0 cf853ed8 c005162c c0013b30 c0013b30 00ffff00 00ffff00
[ 137.450164] [<c0071490>] (__lock_acquire+0x5c/0x1ab0) from [<c00733f8>] (lock_acquire+0x9c/0x128)
[ 137.459503] [<c00733f8>] (lock_acquire+0x9c/0x128) from [<c0374130>] (_raw_spin_lock_irqsave+0x44/0x58)
[ 137.469360] [<c0374130>] (_raw_spin_lock_irqsave+0x44/0x58) from [<c02a1d50>] (skb_queue_tail+0x18/0x48)
[ 137.479339] [<c02a1d50>] (skb_queue_tail+0x18/0x48) from [<c0282238>] (h4_enqueue+0x2c/0x34)
[ 137.488189] [<c0282238>] (h4_enqueue+0x2c/0x34) from [<c02821cc>] (hci_uart_send_frame+0x34/0x68)
[ 137.497497] [<c02821cc>] (hci_uart_send_frame+0x34/0x68) from [<c032727c>] (hci_send_frame+0x50/0x88)
[ 137.507171] [<c032727c>] (hci_send_frame+0x50/0x88) from [<c03285f4>] (hci_cmd_work+0x74/0xd4)
[ 137.516204] [<c03285f4>] (hci_cmd_work+0x74/0xd4) from [<c004c914>] (process_one_work+0x1a0/0x4ec)
[ 137.525604] [<c004c914>] (process_one_work+0x1a0/0x4ec) from [<c004d6ac>] (worker_thread+0x154/0x344)
[ 137.535278] [<c004d6ac>] (worker_thread+0x154/0x344) from [<c00516b0>] (kthread+0x84/0x90)
[ 137.543975] [<c00516b0>] (kthread+0x84/0x90) from [<c0013b30>] (kernel_thread_exit+0x0/0x8)
[ 137.552734] Code: e59f4e5c e5941000 e3510000 0a000031 (e5971000)
[ 137.559234] ---[ end trace 1b75b31a2719ed1e ]---

Signed-off-by: Johan Hovold <jho...@gmail.com>
Acked-by: Marcel Holtmann <mar...@holtmann.org>
Signed-off-by: Johan Hedberg <johan....@intel.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/bluetooth/hci_ldisc.c | 2 +-

1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index e6f67b6..d68e2f5 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -312,11 +312,11 @@ static void hci_uart_tty_close(struct tty_struct *tty)
hci_uart_close(hdev);

if (test_and_clear_bit(HCI_UART_PROTO_SET, &hu->flags)) {
- hu->proto->close(hu);
if (hdev) {
hci_unregister_dev(hdev);
hci_free_dev(hdev);
}
+ hu->proto->close(hu);

Willy Tarreau

unread,

Oct 1, 2012, 7:03:05 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Thomas Gleixner, John Stultz, Peter Zijlstra, Prarit Bhargava, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tg...@linutronix.de>

This is a backport of 5b9fe759a678e05be4937ddf03d50e950207c1c0

We need to update the hrtimer clock offsets from the hrtimer interrupt
context. To avoid conversions from timespec to ktime_t maintain a
ktime_t based representation of those offsets in the timekeeper. This
puts the conversion overhead into the code which updates the
underlying offsets and provides fast accessible values in the hrtimer
interrupt.

Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Signed-off-by: John Stultz <john...@us.ibm.com>

Reviewed-by: Ingo Molnar <mi...@kernel.org>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Acked-by: Prarit Bhargava <pra...@redhat.com>
Cc: sta...@vger.kernel.org

Link: http://lkml.kernel.org/r/1341960205-56738-4-gi...@us.ibm.com

Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Linux Kernel <linux-...@vger.kernel.org>
Signed-off-by: John Stultz <john...@us.ibm.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/time/timekeeping.c | 25 ++++++++++++++++++++++++-
1 files changed, 24 insertions(+), 1 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 1e9808d..c7fbc9f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -161,18 +161,34 @@ struct timespec xtime __attribute__ ((aligned (16)));
struct timespec wall_to_monotonic __attribute__ ((aligned (16)));
static struct timespec total_sleep_time;

+/* Offset clock monotonic -> clock realtime */
+static ktime_t offs_real;
+
+/* Offset clock monotonic -> clock boottime */
+static ktime_t offs_boot;
+
/*
* The raw monotonic time for the CLOCK_MONOTONIC_RAW posix clock.
*/
struct timespec raw_time;

/* must hold write on xtime_lock */
+static void update_rt_offset(void)
+{
+ struct timespec tmp, *wtm = &wall_to_monotonic;
+
+ set_normalized_timespec(&tmp, -wtm->tv_sec, -wtm->tv_nsec);
+ offs_real = timespec_to_ktime(tmp);
+}
+
+/* must hold write on xtime_lock */
static void timekeeping_update(bool clearntp)
{
if (clearntp) {
timekeeper.ntp_error = 0;
ntp_clear();
}
+ update_rt_offset();
update_vsyscall(&xtime, timekeeper.clock, timekeeper.mult);
}

@@ -576,6 +592,7 @@ void __init timekeeping_init(void)
set_normalized_timespec(&wall_to_monotonic,
-boot.tv_sec, -boot.tv_nsec);
update_xtime_cache(0);
+ update_rt_offset();
total_sleep_time.tv_sec = 0;
total_sleep_time.tv_nsec = 0;
write_sequnlock_irqrestore(&xtime_lock, flags);
@@ -584,6 +601,12 @@ void __init timekeeping_init(void)
/* time in seconds when suspend began */
static struct timespec timekeeping_suspend_time;

+static void update_sleep_time(struct timespec t)
+{
+ total_sleep_time = t;
+ offs_boot = timespec_to_ktime(t);
+}
+
/**
* timekeeping_resume - Resumes the generic timekeeping subsystem.
* @dev: unused
@@ -607,7 +630,7 @@ static int timekeeping_resume(struct sys_device *dev)
ts = timespec_sub(ts, timekeeping_suspend_time);
xtime = timespec_add_safe(xtime, ts);
wall_to_monotonic = timespec_sub(wall_to_monotonic, ts);
- total_sleep_time = timespec_add_safe(total_sleep_time, ts);
+ update_sleep_time(timespec_add_safe(total_sleep_time, ts));
}
update_xtime_cache(0);
/* re-base the last cycle value */

Willy Tarreau

unread,

Oct 1, 2012, 7:03:11 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Thomas Gleixner, Peter Zijlstra, Prarit Bhargava, John Stultz, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tg...@linutronix.de>

This is a backport of f6c06abfb3972ad4914cef57d8348fcb2932bc3b

To finally fix the infamous leap second issue and other race windows
caused by functions which change the offsets between the various time
bases (CLOCK_MONOTONIC, CLOCK_REALTIME and CLOCK_BOOTTIME) we need a
function which atomically gets the current monotonic time and updates
the offsets of CLOCK_REALTIME and CLOCK_BOOTTIME with minimalistic
overhead. The previous patch which provides ktime_t offsets allows us
to make this function almost as cheap as ktime_get() which is going to
be replaced in hrtimer_interrupt().

Signed-off-by: Thomas Gleixner <tg...@linutronix.de>

Reviewed-by: Ingo Molnar <mi...@kernel.org>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Acked-by: Prarit Bhargava <pra...@redhat.com>
Cc: sta...@vger.kernel.org

Signed-off-by: John Stultz <john...@us.ibm.com>
Link: http://lkml.kernel.org/r/1341960205-56738-7-gi...@us.ibm.com

Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Linux Kernel <linux-...@vger.kernel.org>
Signed-off-by: John Stultz <john...@us.ibm.com>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/hrtimer.h | 2 +-
kernel/time/timekeeping.c | 32 ++++++++++++++++++++++++++++++++
2 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index a7f48af..b4f0b3f 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -319,7 +319,7 @@ static inline void clock_was_set_delayed(void) { }

extern ktime_t ktime_get(void);
extern ktime_t ktime_get_real(void);
-
+extern ktime_t ktime_get_update_offsets(ktime_t *offs_real);

DECLARE_PER_CPU(struct tick_device, tick_cpu_device);

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index c7fbc9f..6054b94 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -943,3 +943,35 @@ struct timespec get_monotonic_coarse(void)
now.tv_nsec + mono.tv_nsec);
return now;
}
+
+#ifdef CONFIG_HIGH_RES_TIMERS
+/**
+ * ktime_get_update_offsets - hrtimer helper
+ * @real: pointer to storage for monotonic -> realtime offset
+ *
+ * Returns current monotonic time and updates the offsets
+ * Called from hrtimer_interupt() or retrigger_next_event()
+ */
+ktime_t ktime_get_update_offsets(ktime_t *real)
+{
+ ktime_t now;
+ unsigned int seq;
+ u64 secs, nsecs;
+
+ do {
+ seq = read_seqbegin(&xtime_lock);
+
+ secs = xtime.tv_sec;
+ nsecs = xtime.tv_nsec;
+ nsecs += timekeeping_get_ns();
+ /* If arch requires, add in gettimeoffset() */
+ nsecs += arch_gettimeoffset();
+
+ *real = offs_real;
+ } while (read_seqretry(&xtime_lock, seq));
+
+ now = ktime_add_ns(ktime_set(secs, 0), nsecs);
+ now = ktime_sub(now, *real);
+ return now;
+}
+#endif

Willy Tarreau

unread,

Oct 1, 2012, 7:03:19 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Tyler Hicks, Colin Ian King, Stefan Bader, Andy Whitcroft, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Colin Ian King <colin...@canonical.com>

BugLink: http://bugs.launchpad.net/bugs/745836

The ECRYPTFS_NEW_FILE crypt_stat flag is set upon creation of a new
eCryptfs file. When the flag is set, eCryptfs reads directly from the
lower filesystem when bringing a page up to date. This means that no
offset translation (for the eCryptfs file metadata in the lower file)
and no decryption is performed. The flag is cleared just before the
first write is completed (at the beginning of ecryptfs_write_begin()).

It was discovered that if a new file was created and then extended with
truncate, the ECRYPTFS_NEW_FILE flag was not cleared. If pages
corresponding to this file are ever reclaimed, any subsequent reads
would result in userspace seeing eCryptfs file metadata and encrypted
file contents instead of the expected decrypted file contents.

Data corruption is possible if the file is written to before the
eCryptfs directory is unmounted. The data written will be copied into
pages which have been read directly from the lower file rather than
zeroed pages, as would be expected after extending the file with
truncate.

This flag, and the functionality that used it, was removed in upstream
kernels in 2.6.39 with the following commits:

bd4f0fe8bb7c73c738e1e11bc90d6e2cf9c6e20e
fed8859b3ab94274c986cbdf7d27130e0545f02c

Signed-off-by: Tyler Hicks <tyh...@canonical.com>
Signed-off-by: Colin Ian King <colin...@canonical.com>
Acked-by: Stefan Bader <stefan...@canonical.com>
Acked-by: Andy Whitcroft <a...@canonical.com>
Signed-off-by: Andy Whitcroft <a...@canonical.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ecryptfs/inode.c | 3 +++

1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/fs/ecryptfs/inode.c b/fs/ecryptfs/inode.c
index 645da17..3c1dbc0 100644
--- a/fs/ecryptfs/inode.c
+++ b/fs/ecryptfs/inode.c
@@ -777,6 +777,9 @@ static int truncate_upper(struct dentry *dentry, struct iattr *ia,
goto out;
}
crypt_stat = &ecryptfs_inode_to_private(dentry->d_inode)->crypt_stat;
+ if (crypt_stat->flags & ECRYPTFS_NEW_FILE)
+ crypt_stat->flags &= ~(ECRYPTFS_NEW_FILE);
+
/* Set up a fake ecryptfs file, this is used to interface with
* the file in the underlying filesystem so that the
* truncation has an effect there as well. */

Willy Tarreau

unread,

Oct 1, 2012, 7:03:33 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Steffen Rumler, Paul Mackerras, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Steffen Rumler <steffen.r...@nsn.com>

commit 3c75296562f43e6fbc6cddd3de948a7b3e4e9bcf upstream.

This fixes a problem which can causes kernel oopses while loading
a kernel module.

According to the PowerPC EABI specification, GPR r11 is assigned
the dedicated function to point to the previous stack frame.
In the powerpc-specific kernel module loader, do_plt_call()
(in arch/powerpc/kernel/module_32.c), GPR r11 is also used
to generate trampoline code.

This combination crashes the kernel, in the case where the compiler
chooses to use a helper function for saving GPRs on entry, and the
module loader has placed the .init.text section far away from the
text section, meaning that it has to generate a trampoline for
functions in the .init.text section to call the GPR save helper.
Because the trampoline trashes r11, references to the stack frame
using r11 can cause an oops.

The fix just uses GPR r12 instead of GPR r11 for generating the
trampoline code. According to the statements from Freescale, this is
safe from an EABI perspective.

I've tested the fix for kernel 2.6.33 on MPC8541.

Signed-off-by: Steffen Rumler <steffen.r...@nsn.com>
[pau...@samba.org: reworded the description]
Signed-off-by: Paul Mackerras <pau...@samba.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/powerpc/kernel/module_32.c | 11 +++++------
1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index f832773..449a7e0 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -187,8 +187,8 @@ int apply_relocate(Elf32_Shdr *sechdrs,

static inline int entry_matches(struct ppc_plt_entry *entry, Elf32_Addr val)
{
- if (entry->jump[0] == 0x3d600000 + ((val + 0x8000) >> 16)
- && entry->jump[1] == 0x396b0000 + (val & 0xffff))
+ if (entry->jump[0] == 0x3d800000 + ((val + 0x8000) >> 16)
+ && entry->jump[1] == 0x398c0000 + (val & 0xffff))
return 1;
return 0;
}
@@ -215,10 +215,9 @@ static uint32_t do_plt_call(void *location,
entry++;
}

- /* Stolen from Paul Mackerras as well... */
- entry->jump[0] = 0x3d600000+((val+0x8000)>>16); /* lis r11,sym@ha */
- entry->jump[1] = 0x396b0000 + (val&0xffff); /* addi r11,r11,sym@l*/
- entry->jump[2] = 0x7d6903a6; /* mtctr r11 */
+ entry->jump[0] = 0x3d800000+((val+0x8000)>>16); /* lis r12,sym@ha */
+ entry->jump[1] = 0x398c0000 + (val&0xffff); /* addi r12,r12,sym@l*/
+ entry->jump[2] = 0x7d8903a6; /* mtctr r12 */
entry->jump[3] = 0x4e800420; /* bctr */

DEBUGP("Initialized plt for 0x%x at %p\n", val, entry);

Willy Tarreau

unread,

Oct 1, 2012, 7:03:42 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Mel Gorman, James Bottomley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mel Gorman <mgo...@suse.de>

commit bba3d8c3b3c0f2123be5bc687d1cddc13437c923 upstream.

The following build error occured during a parisc build with
swap-over-NFS patches applied.

net/core/sock.c:274:36: error: initializer element is not constant
net/core/sock.c:274:36: error: (near initialization for 'memalloc_socks')
net/core/sock.c:274:36: error: initializer element is not constant

Dave Anglin says:
> Here is the line in sock.i:
>
> struct static_key memalloc_socks = ((struct static_key) { .enabled =
> ((atomic_t) { (0) }) });

The above line contains two compound literals. It also uses a designated
initializer to initialize the field enabled. A compound literal is not a
constant expression.

The location of the above statement isn't fully clear, but if a compound
literal occurs outside the body of a function, the initializer list must
consist of constant expressions.

Reported-by: Fengguang Wu <fenggu...@intel.com>
Signed-off-by: Mel Gorman <mgo...@suse.de>
Signed-off-by: James Bottomley <JBott...@Parallels.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/parisc/include/asm/atomic.h | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/parisc/include/asm/atomic.h b/arch/parisc/include/asm/atomic.h
index 8bc9e96..6ee459d 100644
--- a/arch/parisc/include/asm/atomic.h
+++ b/arch/parisc/include/asm/atomic.h
@@ -248,7 +248,7 @@ static __inline__ int atomic_add_unless(atomic_t *v, int a, int u)

#define atomic_sub_and_test(i,v) (atomic_sub_return((i),(v)) == 0)

-#define ATOMIC_INIT(i) ((atomic_t) { (i) })
+#define ATOMIC_INIT(i) { (i) }

#define smp_mb__before_atomic_dec() smp_mb()
#define smp_mb__after_atomic_dec() smp_mb()
@@ -257,7 +257,7 @@ static __inline__ int atomic_add_unless(atomic_t *v, int a, int u)

#ifdef CONFIG_64BIT

-#define ATOMIC64_INIT(i) ((atomic64_t) { (i) })
+#define ATOMIC64_INIT(i) { (i) }

static __inline__ int
__atomic64_add_return(s64 i, atomic64_t *v)

Willy Tarreau

unread,

Oct 1, 2012, 7:03:48 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Eric Dumazet, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.d...@gmail.com>

[ Upstream commit 03606895cd98c0a628b17324fd7b5ff15db7e3cd ]

Niccolo Belli reported ipsec crashes in case we handle a frame without
mac header (atm in his case)

Before copying mac header, better make sure it is present.

Bugzilla reference: https://bugzilla.kernel.org/show_bug.cgi?id=42809

Reported-by: Niccolò Belli <dark...@linuxsystems.it>
Tested-by: Niccolò Belli <dark...@linuxsystems.it>
Signed-off-by: Eric Dumazet <eric.d...@gmail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/skbuff.h | 10 ++++++++++
net/ipv4/xfrm4_mode_beet.c | 5 +----
net/ipv4/xfrm4_mode_tunnel.c | 6 ++----
net/ipv6/xfrm6_mode_beet.c | 6 +-----
net/ipv6/xfrm6_mode_tunnel.c | 6 ++----
5 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index bcdd660..4e647bb 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1312,6 +1312,16 @@ static inline void skb_set_mac_header(struct sk_buff *skb, const int offset)
}
#endif /* NET_SKBUFF_DATA_USES_OFFSET */

+static inline void skb_mac_header_rebuild(struct sk_buff *skb)
+{
+ if (skb_mac_header_was_set(skb)) {
+ const unsigned char *old_mac = skb_mac_header(skb);
+
+ skb_set_mac_header(skb, -skb->mac_len);
+ memmove(skb_mac_header(skb), old_mac, skb->mac_len);
+ }
+}
+
static inline int skb_transport_offset(const struct sk_buff *skb)
{
return skb_transport_header(skb) - skb->data;
diff --git a/net/ipv4/xfrm4_mode_beet.c b/net/ipv4/xfrm4_mode_beet.c
index 6341818..e3db3f9 100644
--- a/net/ipv4/xfrm4_mode_beet.c
+++ b/net/ipv4/xfrm4_mode_beet.c
@@ -110,10 +110,7 @@ static int xfrm4_beet_input(struct xfrm_state *x, struct sk_buff *skb)

skb_push(skb, sizeof(*iph));
skb_reset_network_header(skb);
-
- memmove(skb->data - skb->mac_len, skb_mac_header(skb),
- skb->mac_len);
- skb_set_mac_header(skb, -skb->mac_len);
+ skb_mac_header_rebuild(skb);

xfrm4_beet_make_header(skb);

diff --git a/net/ipv4/xfrm4_mode_tunnel.c b/net/ipv4/xfrm4_mode_tunnel.c
index 3444f3b..5d1d1fd 100644
--- a/net/ipv4/xfrm4_mode_tunnel.c
+++ b/net/ipv4/xfrm4_mode_tunnel.c
@@ -65,7 +65,6 @@ static int xfrm4_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)

static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
{
- const unsigned char *old_mac;
int err = -EINVAL;

if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPIP)
@@ -83,10 +82,9 @@ static int xfrm4_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
if (!(x->props.flags & XFRM_STATE_NOECN))
ipip_ecn_decapsulate(skb);

- old_mac = skb_mac_header(skb);
- skb_set_mac_header(skb, -skb->mac_len);
- memmove(skb_mac_header(skb), old_mac, skb->mac_len);
skb_reset_network_header(skb);
+ skb_mac_header_rebuild(skb);
+
err = 0;

out:
diff --git a/net/ipv6/xfrm6_mode_beet.c b/net/ipv6/xfrm6_mode_beet.c
index bbd48b1..6cc7a45 100644
--- a/net/ipv6/xfrm6_mode_beet.c
+++ b/net/ipv6/xfrm6_mode_beet.c
@@ -82,7 +82,6 @@ static int xfrm6_beet_output(struct xfrm_state *x, struct sk_buff *skb)
static int xfrm6_beet_input(struct xfrm_state *x, struct sk_buff *skb)
{
struct ipv6hdr *ip6h;
- const unsigned char *old_mac;
int size = sizeof(struct ipv6hdr);
int err;

@@ -92,10 +91,7 @@ static int xfrm6_beet_input(struct xfrm_state *x, struct sk_buff *skb)

__skb_push(skb, size);
skb_reset_network_header(skb);
-
- old_mac = skb_mac_header(skb);
- skb_set_mac_header(skb, -skb->mac_len);
- memmove(skb_mac_header(skb), old_mac, skb->mac_len);
+ skb_mac_header_rebuild(skb);

xfrm6_beet_make_header(skb);

diff --git a/net/ipv6/xfrm6_mode_tunnel.c b/net/ipv6/xfrm6_mode_tunnel.c
index 3927832..672c0da 100644
--- a/net/ipv6/xfrm6_mode_tunnel.c
+++ b/net/ipv6/xfrm6_mode_tunnel.c
@@ -61,7 +61,6 @@ static int xfrm6_mode_tunnel_output(struct xfrm_state *x, struct sk_buff *skb)
static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
{
int err = -EINVAL;
- const unsigned char *old_mac;

if (XFRM_MODE_SKB_CB(skb)->protocol != IPPROTO_IPV6)
goto out;
@@ -78,10 +77,9 @@ static int xfrm6_mode_tunnel_input(struct xfrm_state *x, struct sk_buff *skb)
if (!(x->props.flags & XFRM_STATE_NOECN))
ipip6_ecn_decapsulate(skb);

- old_mac = skb_mac_header(skb);
- skb_set_mac_header(skb, -skb->mac_len);
- memmove(skb_mac_header(skb), old_mac, skb->mac_len);
skb_reset_network_header(skb);
+ skb_mac_header_rebuild(skb);
+
err = 0;

out:

Willy Tarreau

unread,

Oct 1, 2012, 7:03:59 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, sta...@kernel.org, Sebastian Andrzej Siewior, Alan Stern, Oliver Neukum, Ming Lei, Greg Kroah-Hartman, David S. Miller, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: tom.l...@gmail.com <tom.l...@gmail.com>

commit 0956a8c20b23d429e79ff86d4325583fc06f9eb4 upstream.

Commit 4231d47e6fe69f061f96c98c30eaf9fb4c14b96d(net/usbnet: avoid
recursive locking in usbnet_stop()) fixes the recursive locking
problem by releasing the skb queue lock, but it makes usb_unlink_urb
racing with defer_bh, and the URB to being unlinked may be freed before
or during calling usb_unlink_urb, so use-after-free problem may be
triggerd inside usb_unlink_urb.

The patch fixes the use-after-free problem by increasing URB
reference count with skb queue lock held before calling
usb_unlink_urb, so the URB won't be freed until return from
usb_unlink_urb.

Cc: sta...@kernel.org
Cc: Sebastian Andrzej Siewior <big...@linutronix.de>
Cc: Alan Stern <st...@rowland.harvard.edu>
Cc: Oliver Neukum <oli...@neukum.org>
Reported-by: Dave Jones <da...@redhat.com>
Signed-off-by: Ming Lei <tom.l...@gmail.com>
Acked-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---
drivers/net/usb/usbnet.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index da33dce..a92a415 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -584,6 +584,14 @@ static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
entry = (struct skb_data *) skb->cb;
urb = entry->urb;

+ /*
+ * Get reference count of the URB to avoid it to be
+ * freed during usb_unlink_urb, which may trigger
+ * use-after-free problem inside usb_unlink_urb since
+ * usb_unlink_urb is always racing with .complete
+ * handler(include defer_bh).
+ */
+ usb_get_urb(urb);
spin_unlock_irqrestore(&q->lock, flags);
// during some PM-driven resume scenarios,
// these (async) unlinks complete immediately
@@ -592,6 +600,7 @@ static int unlink_urbs (struct usbnet *dev, struct sk_buff_head *q)
devdbg (dev, "unlink urb err, %d", retval);
else
count++;
+ usb_put_urb(urb);
spin_lock_irqsave(&q->lock, flags);
}
spin_unlock_irqrestore (&q->lock, flags);

Willy Tarreau

unread,

Oct 1, 2012, 7:04:00 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, H. Peter Anvin, Matt Mackall, Herbert Xu, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: H. Peter Anvin <h...@zytor.com>

commit 628c6246d47b85f5357298601df2444d7f4dd3fd upstream.

Architectural inlines to get random ints and longs using the RDRAND
instruction.

Intel has introduced a new RDRAND instruction, a Digital Random Number
Generator (DRNG), which is functionally an high bandwidth entropy
source, cryptographic whitener, and integrity monitor all built into
hardware. This enables RDRAND to be used directly, bypassing the
kernel random number pool.

For technical documentation, see:

http://software.intel.com/en-us/articles/download-the-latest-bull-mountain-software-implementation-guide/

In this patch, this is *only* used for the nonblocking random number
pool. RDRAND is a nonblocking source, similar to our /dev/urandom,
and is therefore not a direct replacement for /dev/random. The
architectural hooks presented in the previous patch only feed the
kernel internal users, which only use the nonblocking pool, and so
this is not a problem.

Since this instruction is available in userspace, there is no reason
to have a /dev/hw_rng device driver for the purpose of feeding rngd.
This is especially so since RDRAND is a nonblocking source, and needs
additional whitening and reduction (see the above technical
documentation for details) in order to be of "pure entropy source"
quality.

The CONFIG_EXPERT compile-time option can be used to disable this use
of RDRAND.

Signed-off-by: H. Peter Anvin <h...@linux.intel.com>

Originally-by: Fenghua Yu <fengh...@intel.com>
Cc: Matt Mackall <m...@selenic.com>
Cc: Herbert Xu <her...@gondor.apana.org.au>
Cc: "Theodore Ts'o" <ty...@mit.edu>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/Kconfig | 9 +++++
arch/x86/include/asm/archrandom.h | 73 +++++++++++++++++++++++++++++++++++++
2 files changed, 82 insertions(+), 0 deletions(-)
create mode 100644 arch/x86/include/asm/archrandom.h

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 73ae02a..aa889d6 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1428,6 +1428,15 @@ config ARCH_USES_PG_UNCACHED
def_bool y
depends on X86_PAT

+config ARCH_RANDOM
+ def_bool y
+ prompt "x86 architectural random number generator" if EXPERT
+ ---help---
+ Enable the x86 architectural RDRAND instruction
+ (Intel Bull Mountain technology) to generate random numbers.
+ If supported, this is a high bandwidth, cryptographically
+ secure hardware random number generator.
+
config EFI
bool "EFI runtime service support"
depends on ACPI
diff --git a/arch/x86/include/asm/archrandom.h b/arch/x86/include/asm/archrandom.h
new file mode 100644
index 0000000..b7b5bc0
--- /dev/null
+++ b/arch/x86/include/asm/archrandom.h
@@ -0,0 +1,73 @@
+/*
+ * This file is part of the Linux kernel.
+ *
+ * Copyright (c) 2011, Intel Corporation
+ * Authors: Fenghua Yu <fengh...@intel.com>,
+ * H. Peter Anvin <h...@linux.intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef ASM_X86_ARCHRANDOM_H
+#define ASM_X86_ARCHRANDOM_H
+
+#include <asm/processor.h>
+#include <asm/cpufeature.h>
+#include <asm/alternative.h>
+#include <asm/nops.h>
+
+#define RDRAND_RETRY_LOOPS 10
+
+#define RDRAND_INT ".byte 0x0f,0xc7,0xf0"
+#ifdef CONFIG_X86_64
+# define RDRAND_LONG ".byte 0x48,0x0f,0xc7,0xf0"
+#else
+# define RDRAND_LONG RDRAND_INT
+#endif
+
+#ifdef CONFIG_ARCH_RANDOM
+
+#define GET_RANDOM(name, type, rdrand, nop) \
+static inline int name(type *v) \
+{ \
+ int ok; \
+ alternative_io("movl $0, %0\n\t" \
+ nop, \
+ "\n1: " rdrand "\n\t" \
+ "jc 2f\n\t" \
+ "decl %0\n\t" \
+ "jnz 1b\n\t" \
+ "2:", \
+ X86_FEATURE_RDRAND, \
+ ASM_OUTPUT2("=r" (ok), "=a" (*v)), \
+ "0" (RDRAND_RETRY_LOOPS)); \
+ return ok; \
+}
+
+#ifdef CONFIG_X86_64
+
+GET_RANDOM(arch_get_random_long, unsigned long, RDRAND_LONG, ASM_NOP5);
+GET_RANDOM(arch_get_random_int, unsigned int, RDRAND_INT, ASM_NOP4);
+
+#else
+
+GET_RANDOM(arch_get_random_long, unsigned long, RDRAND_LONG, ASM_NOP3);
+GET_RANDOM(arch_get_random_int, unsigned int, RDRAND_INT, ASM_NOP3);
+
+#endif /* CONFIG_X86_64 */
+
+#endif /* CONFIG_ARCH_RANDOM */
+
+#endif /* ASM_X86_ARCHRANDOM_H */

Willy Tarreau

unread,

Oct 1, 2012, 7:04:05 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Greg Kroah-Hartman <gre...@linuxfoundation.org>

commit 656d2b3964a9d0f9864d472f8dfa2dd7dd42e6c0 upstream.

On some misconfigured ftdi_sio devices, if the manufacturer string is
NULL, the kernel will oops when the device is plugged in. This patch
fixes the problem.

Reported-by: Wojciech M Zabolotny <W.Zab...@elka.pw.edu.pl>
Tested-by: Wojciech M Zabolotny <W.Zab...@elka.pw.edu.pl>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/serial/ftdi_sio.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/serial/ftdi_sio.c b/drivers/usb/serial/ftdi_sio.c
index 0a1ccaa..c374beb 100644
--- a/drivers/usb/serial/ftdi_sio.c
+++ b/drivers/usb/serial/ftdi_sio.c
@@ -1784,7 +1784,8 @@ static int ftdi_8u2232c_probe(struct usb_serial *serial)

dbg("%s", __func__);

- if (strcmp(udev->manufacturer, "CALAO Systems") == 0)
+ if ((udev->manufacturer) &&
+ (strcmp(udev->manufacturer, "CALAO Systems") == 0))
return ftdi_jtag_probe(serial);

return 0;

Willy Tarreau

unread,

Oct 1, 2012, 7:04:30 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Theodore Tso, H. Peter Anvin, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <ty...@mit.edu>

commit 3e88bdff1c65145f7ba297ccec69c774afe4c785 upstream.

If there is an architecture-specific random number generator (such as
RDRAND for Intel architectures), use it to initialize /dev/random's
entropy stores. Even in the worst case, if RDRAND is something like
AES(NSA_KEY, counter++), it won't hurt, and it will definitely help
against any other adversaries.

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

Link: http://lkml.kernel.org/r/1324589281-31931-1-...@mit.edu

Signed-off-by: H. Peter Anvin <h...@linux.intel.com>

Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 6 ++++++
1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index ac74029..d67890f 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -965,6 +965,7 @@ EXPORT_SYMBOL(get_random_bytes);
*/
static void init_std_data(struct entropy_store *r)
{
+ int i;
ktime_t now;
unsigned long flags;

@@ -974,6 +975,11 @@ static void init_std_data(struct entropy_store *r)

now = ktime_get_real();
mix_pool_bytes(r, &now, sizeof(now));
+ for (i = r->poolinfo->poolwords; i; i--) {
+ if (!arch_get_random_long(&flags))
+ break;
+ mix_pool_bytes(r, &flags, sizeof(flags));
+ }
mix_pool_bytes(r, utsname(), sizeof(*(utsname())));

Willy Tarreau

unread,

Oct 1, 2012, 7:04:43 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Ryusuke Konishi, Andrew Morton, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Ryusuke Konishi <konishi...@lab.ntt.co.jp>

commit d7178c79d9b7c5518f9943188091a75fc6ce0675 upstream.

According to the report from Slicky Devil, nilfs caused kernel oops at
nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:

BUG: unable to handle kernel NULL pointer dereference at 00000048
IP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2]
*pde = 00000000
Oops: 0000 [#1] PREEMPT SMP
...
Call Trace:
[<d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2]
[<d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2]
[<c0226636>] mount_fs+0x36/0x180
[<c023d961>] vfs_kern_mount+0x51/0xa0
[<c023ddae>] do_kern_mount+0x3e/0xe0
[<c023f189>] do_mount+0x169/0x700
[<c023fa9b>] sys_mount+0x6b/0xa0
[<c04abd1f>] sysenter_do_call+0x12/0x28
Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43
20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72
48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00
EIP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc
CR2: 0000000000000048

This turned out due to a defect in an error path which runs if the
calculated location of the secondary super block was invalid.

This patch fixes it and eliminates the reported oops.

Reported-by: Slicky Devil <slick...@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi...@lab.ntt.co.jp>
Tested-by: Slicky Devil <slick...@gmail.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>

Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/nilfs2/the_nilfs.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/fs/nilfs2/the_nilfs.c b/fs/nilfs2/the_nilfs.c
index ad391a8..149a8a1 100644
--- a/fs/nilfs2/the_nilfs.c
+++ b/fs/nilfs2/the_nilfs.c
@@ -478,6 +478,7 @@ static int nilfs_load_super_block(struct the_nilfs *nilfs,
brelse(sbh[1]);
sbh[1] = NULL;
sbp[1] = NULL;
+ valid[1] = 0;
swp = 0;
}
if (!valid[swp]) {

Willy Tarreau

unread,

Oct 1, 2012, 7:05:04 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <ja...@suse.cz>

commit adee11b2085bee90bd8f4f52123ffb07882d6256 upstream.

Check provided length of partition table so that (possibly maliciously)
corrupted partition table cannot cause accessing data beyond current buffer.

Signed-off-by: Jan Kara <ja...@suse.cz>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/udf/super.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/fs/udf/super.c b/fs/udf/super.c
index ee6b3af..0388d43 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -1249,6 +1249,7 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
struct genericPartitionMap *gpm;
uint16_t ident;
struct buffer_head *bh;
+ unsigned int table_len;
int ret = 0;

bh = udf_read_tagged(sb, block, block, &ident);
@@ -1257,6 +1258,14 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
BUG_ON(ident != TAG_IDENT_LVD);
lvd = (struct logicalVolDesc *)bh->b_data;

+ table_len = le32_to_cpu(lvd->mapTableLength);
+ if (sizeof(*lvd) + table_len > sb->s_blocksize) {
+ udf_error(sb, __func__, "error loading logical volume descriptor: "
+ "Partition table too long (%u > %lu)\n", table_len,
+ sb->s_blocksize - sizeof(*lvd));
+ goto out_bh;
+ }
+
i = udf_sb_alloc_partition_maps(sb, le32_to_cpu(lvd->numPartitionMaps));
if (i != 0) {
ret = i;
@@ -1264,7 +1273,7 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
}

for (i = 0, offset = 0;
- i < sbi->s_partitions && offset < le32_to_cpu(lvd->mapTableLength);
+ i < sbi->s_partitions && offset < table_len;
i++, offset += gpm->partitionMapLength) {
struct udf_part_map *map = &sbi->s_partmaps[i];
gpm = (struct genericPartitionMap *)

Willy Tarreau

unread,

Oct 1, 2012, 7:05:05 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Alex Elder, Carlos Maiolino, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Carlos Maiolino <cmai...@redhat.com>

commit b52a360b2aa1c59ba9970fb0f52bbb093fcc7a24 upstream

Fixes a possible memory corruption when the link is larger than
MAXPATHLEN and XFS_DEBUG is not enabled. This also remove the
S_ISLNK assert, since the inode mode is checked previously in
xfs_readlink_by_handle() and via VFS.

Updated to address concerns raised by Ben Hutchings about the loose
attention paid to 32- vs 64-bit values, and the lack of handling a
potentially negative pathlen value:
- Changed type of "pathlen" to be xfs_fsize_t, to match that of
ip->i_d.di_size
- Added checking for a negative pathlen to the too-long pathlen
test, and generalized the message that gets reported in that case
to reflect the change
As a result, if a negative pathlen were encountered, this function
would return EFSCORRUPTED (and would fail an assertion for a debug
build)--just as would a too-long pathlen.

Signed-off-by: Alex Elder <ael...@sgi.com>
Signed-off-by: Carlos Maiolino <cmai...@redhat.com>
Reviewed-by: Christoph Hellwig <h...@lst.de>

[dannf: backported to Debian's 2.6.32]

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/xfs/xfs_vnodeops.c | 15 +++++++++++----
1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/fs/xfs/xfs_vnodeops.c b/fs/xfs/xfs_vnodeops.c
index 8f32f50..1638884 100644
--- a/fs/xfs/xfs_vnodeops.c
+++ b/fs/xfs/xfs_vnodeops.c
@@ -554,7 +554,7 @@ xfs_readlink(
char *link)
{
xfs_mount_t *mp = ip->i_mount;
- int pathlen;
+ xfs_fsize_t pathlen;
int error = 0;

xfs_itrace_entry(ip);
@@ -564,13 +564,20 @@ xfs_readlink(

xfs_ilock(ip, XFS_ILOCK_SHARED);

- ASSERT((ip->i_d.di_mode & S_IFMT) == S_IFLNK);
- ASSERT(ip->i_d.di_size <= MAXPATHLEN);
-
pathlen = ip->i_d.di_size;
if (!pathlen)
goto out;

+ if (pathlen < 0 || pathlen > MAXPATHLEN) {
+ xfs_fs_cmn_err(CE_ALERT, mp,
+ "%s: inode (%llu) bad symlink length (%lld)",
+ __func__, (unsigned long long) ip->i_ino,
+ (long long) pathlen);
+ ASSERT(0);
+ return XFS_ERROR(EFSCORRUPTED);
+ }
+
+
if (ip->i_df.if_flags & XFS_IFINLINE) {
memcpy(link, ip->i_df.if_u1.if_data, pathlen);
link[pathlen] = '\0';

Willy Tarreau

unread,

Oct 1, 2012, 7:05:17 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, John Stultz, Peter Zijlstra, Prarit Bhargava, Zhouping Liu, Ingo Molnar, Thomas Gleixner, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: John Stultz <john....@linaro.org>

This is a -stable backport of 4e8b14526ca7fb046a81c94002c1c43b6fdf0e9b

Unexpected behavior could occur if the time is set to a value large
enough to overflow a 64bit ktime_t (which is something larger then the
year 2262).

Also unexpected behavior could occur if large negative offsets are
injected via adjtimex.

So this patch improves the sanity check timekeeping inputs by
improving the timespec_valid() check, and then makes better use of
timespec_valid() to make sure we don't set the time to an invalid
negative value or one that overflows ktime_t.

Note: This does not protect from setting the time close to overflowing
ktime_t and then letting natural accumulation cause the overflow.

Reported-by: CAI Qian <cai...@redhat.com>
Reported-by: Sasha Levin <levins...@gmail.com>
Signed-off-by: John Stultz <john....@linaro.org>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Zhouping Liu <zl...@redhat.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: sta...@vger.kernel.org
Link: http://lkml.kernel.org/r/1344454580-17031-1-git...@linaro.org
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Cc: Linux Kernel <linux-...@vger.kernel.org>
Signed-off-by: John Stultz <john....@linaro.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/ktime.h | 7 -------
include/linux/time.h | 22 ++++++++++++++++++++--
kernel/time/timekeeping.c | 15 ++++++++++++++-
3 files changed, 34 insertions(+), 10 deletions(-)

diff --git a/include/linux/ktime.h b/include/linux/ktime.h
index ce59832..ecdf64e 100644
--- a/include/linux/ktime.h
+++ b/include/linux/ktime.h
@@ -58,13 +58,6 @@ union ktime {

typedef union ktime ktime_t; /* Kill this */

-#define KTIME_MAX ((s64)~((u64)1 << 63))
-#if (BITS_PER_LONG == 64)
-# define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
-#else
-# define KTIME_SEC_MAX LONG_MAX
-#endif
-
/*
* ktime_t definitions when using the 64-bit scalar representation:
*/
diff --git a/include/linux/time.h b/include/linux/time.h
index 6e026e4..146b6f3 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -91,11 +91,29 @@ static inline struct timespec timespec_sub(struct timespec lhs,
return ts_delta;
}

+#define KTIME_MAX ((s64)~((u64)1 << 63))
+#if (BITS_PER_LONG == 64)
+# define KTIME_SEC_MAX (KTIME_MAX / NSEC_PER_SEC)
+#else
+# define KTIME_SEC_MAX LONG_MAX
+#endif
+
/*
* Returns true if the timespec is norm, false if denorm:
*/
-#define timespec_valid(ts) \
- (((ts)->tv_sec >= 0) && (((unsigned long) (ts)->tv_nsec) < NSEC_PER_SEC))
+static inline bool timespec_valid(const struct timespec *ts)
+{
+ /* Dates before 1970 are bogus */
+ if (ts->tv_sec < 0)
+ return false;
+ /* Can't have more nanoseconds then a second */
+ if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)
+ return false;
+ /* Disallow values that could overflow ktime_t */
+ if ((unsigned long long)ts->tv_sec >= KTIME_SEC_MAX)
+ return false;
+ return true;
+}

extern struct timespec xtime;
extern struct timespec wall_to_monotonic;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 3f7e53f..85d51c4 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -354,7 +354,7 @@ int do_settimeofday(struct timespec *tv)
struct timespec ts_delta;
unsigned long flags;

- if ((unsigned long)tv->tv_nsec >= NSEC_PER_SEC)
+ if (!timespec_valid(tv))
return -EINVAL;

write_seqlock_irqsave(&xtime_lock, flags);
@@ -570,7 +570,20 @@ void __init timekeeping_init(void)
struct timespec now, boot;

read_persistent_clock(&now);
+ if (!timespec_valid(&now)) {
+ printk("WARNING: Persistent clock returned invalid value!\n"
+ " Check your CMOS/BIOS settings.\n");
+ now.tv_sec = 0;
+ now.tv_nsec = 0;
+ }
+
read_boot_clock(&boot);
+ if (!timespec_valid(&boot)) {
+ printk("WARNING: Boot clock returned invalid value!\n"
+ " Check your CMOS/BIOS settings.\n");
+ boot.tv_sec = 0;
+ boot.tv_nsec = 0;
+ }

write_seqlock_irqsave(&xtime_lock, flags);

Willy Tarreau

unread,

Oct 1, 2012, 7:05:28 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, H. Peter Anvin, Ingo Molnar, DJ Johnston, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: H. Peter Anvin <h...@linux.intel.com>

commit d2e7c96af1e54b507ae2a6a7dd2baf588417a7e5 upstream.

Mix in any architectural randomness in extract_buf() instead of
xfer_secondary_buf(). This allows us to mix in more architectural
randomness, and it also makes xfer_secondary_buf() faster, moving a
tiny bit of additional CPU overhead to process which is extracting the
randomness.

[ Commit description modified by tytso to remove an extended
advertisement for the RDRAND instruction. ]

Signed-off-by: H. Peter Anvin <h...@linux.intel.com>

Acked-by: Ingo Molnar <mi...@kernel.org>
Cc: DJ Johnston <dj.jo...@intel.com>

Signed-off-by: Theodore Ts'o <ty...@mit.edu>

Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 56 ++++++++++++++++++++++++++++---------------------
1 files changed, 32 insertions(+), 24 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b038751..3ea1ddb 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -274,6 +274,8 @@
#define SEC_XFER_SIZE 512
#define EXTRACT_SIZE 10

+#define LONGS(x) (((x) + sizeof(unsigned long) - 1)/sizeof(unsigned long))
+
/*
* The minimum number of bits of entropy before we wake up a read on
* /dev/random. Should be enough to do a significant reseed.
@@ -835,11 +837,7 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
*/
static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
{
- union {
- __u32 tmp[OUTPUT_POOL_WORDS];
- long hwrand[4];
- } u;
- int i;
+ __u32 tmp[OUTPUT_POOL_WORDS];

if (r->pull && r->entropy_count < nbytes * 8 &&
r->entropy_count < r->poolinfo->POOLBITS) {
@@ -850,23 +848,17 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)
/* pull at least as many as BYTES as wakeup BITS */
bytes = max_t(int, bytes, random_read_wakeup_thresh / 8);
/* but never more than the buffer size */
- bytes = min_t(int, bytes, sizeof(u.tmp));
+ bytes = min_t(int, bytes, sizeof(tmp));

DEBUG_ENT("going to reseed %s with %d bits "
"(%d of %d requested)\n",
r->name, bytes * 8, nbytes * 8, r->entropy_count);

- bytes = extract_entropy(r->pull, u.tmp, bytes,
+ bytes = extract_entropy(r->pull, tmp, bytes,
random_read_wakeup_thresh / 8, rsvd);
- mix_pool_bytes(r, u.tmp, bytes, NULL);
+ mix_pool_bytes(r, tmp, bytes, NULL);
credit_entropy_bits(r, bytes*8);
}
- kmemcheck_mark_initialized(&u.hwrand, sizeof(u.hwrand));
- for (i = 0; i < 4; i++)
- if (arch_get_random_long(&u.hwrand[i]))
- break;
- if (i)
- mix_pool_bytes(r, &u.hwrand, sizeof(u.hwrand), 0);
}

/*
@@ -923,15 +915,19 @@ static size_t account(struct entropy_store *r, size_t nbytes, int min,
static void extract_buf(struct entropy_store *r, __u8 *out)
{
int i;
- __u32 hash[5], workspace[SHA_WORKSPACE_WORDS];
+ union {
+ __u32 w[5];
+ unsigned long l[LONGS(EXTRACT_SIZE)];
+ } hash;
+ __u32 workspace[SHA_WORKSPACE_WORDS];
__u8 extract[64];
unsigned long flags;

/* Generate a hash across the pool, 16 words (512 bits) at a time */
- sha_init(hash);
+ sha_init(hash.w);
spin_lock_irqsave(&r->lock, flags);
for (i = 0; i < r->poolinfo->poolwords; i += 16)
- sha_transform(hash, (__u8 *)(r->pool + i), workspace);
+ sha_transform(hash.w, (__u8 *)(r->pool + i), workspace);

/*
* We mix the hash back into the pool to prevent backtracking
@@ -942,14 +938,14 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
* brute-forcing the feedback as hard as brute-forcing the
* hash.
*/
- __mix_pool_bytes(r, hash, sizeof(hash), extract);
+ __mix_pool_bytes(r, hash.w, sizeof(hash.w), extract);
spin_unlock_irqrestore(&r->lock, flags);

/*
* To avoid duplicates, we atomically extract a portion of the
* pool while mixing, and hash one final time.
*/
- sha_transform(hash, extract, workspace);
+ sha_transform(hash.w, extract, workspace);
memset(extract, 0, sizeof(extract));
memset(workspace, 0, sizeof(workspace));

@@ -958,11 +954,23 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
* pattern, we fold it in half. Thus, we always feed back
* twice as much data as we output.
*/
- hash[0] ^= hash[3];
- hash[1] ^= hash[4];
- hash[2] ^= rol32(hash[2], 16);
- memcpy(out, hash, EXTRACT_SIZE);
- memset(hash, 0, sizeof(hash));
+ hash.w[0] ^= hash.w[3];
+ hash.w[1] ^= hash.w[4];
+ hash.w[2] ^= rol32(hash.w[2], 16);
+
+ /*
+ * If we have a architectural hardware random number
+ * generator, mix that in, too.
+ */
+ for (i = 0; i < LONGS(EXTRACT_SIZE); i++) {
+ unsigned long v;
+ if (!arch_get_random_long(&v))
+ break;
+ hash.l[i] ^= v;
+ }
+
+ memcpy(out, &hash, EXTRACT_SIZE);
+ memset(&hash, 0, sizeof(hash));
}

static ssize_t extract_entropy(struct entropy_store *r, void *buf,

Willy Tarreau

unread,

Oct 1, 2012, 7:05:46 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, John Stultz, Ingo Molnar, Peter Zijlstra, Richard Cochran, Prarit Bhargava, Thomas Gleixner, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: John Stultz <john...@us.ibm.com>

commit 6b1859dba01c7d512b72d77e3fd7da8354235189 upstream.

In commit 6b43ae8a619d17c4935c3320d2ef9e92bdeed05d, I
introduced a bug that kept the STA_INS or STA_DEL bit
from being cleared from time_status via adjtimex()
without forcing STA_PLL first.

Usually once the STA_INS is set, it isn't cleared
until the leap second is applied, so its unlikely this
affected anyone. However during testing I noticed it
took some effort to cancel a leap second once STA_INS
was set.

Signed-off-by: John Stultz <john...@us.ibm.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Richard Cochran <richard...@gmail.com>
Cc: Prarit Bhargava <pra...@redhat.com>
Link: http://lkml.kernel.org/r/1342156917-25092-2-git...@linaro.org
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/time/ntp.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 26472a7..264928c 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -205,7 +205,9 @@ int second_overflow(unsigned long secs)
time_state = TIME_DEL;
break;
case TIME_INS:
- if (secs % 86400 == 0) {
+ if (!(time_status & STA_INS))
+ time_state = TIME_OK;
+ else if (secs % 86400 == 0) {
leap = -1;
time_state = TIME_OOP;
time_tai++;
@@ -214,7 +216,9 @@ int second_overflow(unsigned long secs)
}
break;
case TIME_DEL:
- if ((secs + 1) % 86400 == 0) {
+ if (!(time_status & STA_DEL))
+ time_state = TIME_OK;
+ else if ((secs + 1) % 86400 == 0) {
leap = 1;
time_tai--;
time_state = TIME_WAIT;

Willy Tarreau

unread,

Oct 1, 2012, 7:06:05 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Zhouping Liu, Ingo Molnar, Prarit Bhargava, Thomas Gleixner, John Stultz, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: John Stultz <john....@linaro.org>

This is a -stable backport of cee58483cf56e0ba355fdd97ff5e8925329aa936

Andreas Bombe reported that the added ktime_t overflow checking added to
timespec_valid in commit 4e8b14526ca7 ("time: Improve sanity checking of
timekeeping inputs") was causing problems with X.org because it caused
timeouts larger then KTIME_T to be invalid.

Previously, these large timeouts would be clamped to KTIME_MAX and would
never expire, which is valid.

This patch splits the ktime_t overflow checking into a new
timespec_valid_strict function, and converts the timekeeping codes
internal checking to use this more strict function.

Reported-and-tested-by: Andreas Bombe <a...@debian.org>
Cc: Zhouping Liu <zl...@redhat.com>
Cc: Ingo Molnar <mi...@kernel.org>

Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>

Cc: sta...@vger.kernel.org
Signed-off-by: John Stultz <john....@linaro.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Cc: Linux Kernel <linux-...@vger.kernel.org>
Signed-off-by: John Stultz <john....@linaro.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

include/linux/time.h | 7 +++++++
kernel/time/timekeeping.c | 6 +++---
2 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/time.h b/include/linux/time.h
index 146b6f3..bc93987 100644
--- a/include/linux/time.h
+++ b/include/linux/time.h
@@ -109,6 +109,13 @@ static inline bool timespec_valid(const struct timespec *ts)

/* Can't have more nanoseconds then a second */

if ((unsigned long)ts->tv_nsec >= NSEC_PER_SEC)

return false;
+ return true;
+}

+
+static inline bool timespec_valid_strict(const struct timespec *ts)
+{
+ if (!timespec_valid(ts))
+ return false;

/* Disallow values that could overflow ktime_t */

if ((unsigned long long)ts->tv_sec >= KTIME_SEC_MAX)

return false;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index b451c93..3d35af3 100644

--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -354,7 +354,7 @@ int do_settimeofday(struct timespec *tv)
struct timespec ts_delta;
unsigned long flags;

- if (!timespec_valid(tv))
+ if (!timespec_valid_strict(tv))
return -EINVAL;

write_seqlock_irqsave(&xtime_lock, flags);
@@ -570,7 +570,7 @@ void __init timekeeping_init(void)

struct timespec now, boot;

read_persistent_clock(&now);

- if (!timespec_valid(&now)) {
+ if (!timespec_valid_strict(&now)) {

printk("WARNING: Persistent clock returned invalid value!\n"

" Check your CMOS/BIOS settings.\n");

now.tv_sec = 0;
@@ -578,7 +578,7 @@ void __init timekeeping_init(void)
}

read_boot_clock(&boot);
- if (!timespec_valid(&boot)) {
+ if (!timespec_valid_strict(&boot)) {

printk("WARNING: Boot clock returned invalid value!\n"

" Check your CMOS/BIOS settings.\n");

boot.tv_sec = 0;

Willy Tarreau

unread,

Oct 1, 2012, 7:06:34 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, sta...@kernel.org, Alan Stern, Oliver Neukum, Ming Lei, Greg Kroah-Hartman, David S. Miller, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: tom.l...@gmail.com <tom.l...@gmail.com>

commit 5d5440a835710d09f0ef18da5000541ec98b537a upstream.

URB unlinking is always racing with its completion and tx_complete
may be called before or during running usb_unlink_urb, so tx_complete
must not clear urb->dev since it will be used in unlink path,
otherwise invalid memory accesses or usb device leak may be caused
inside usb_unlink_urb.

Cc: sta...@kernel.org

Cc: Alan Stern <st...@rowland.harvard.edu>
Cc: Oliver Neukum <oli...@neukum.org>

Signed-off-by: Ming Lei <tom.l...@gmail.com>
Acked-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/net/usb/usbnet.c | 1 -
1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/drivers/net/usb/usbnet.c b/drivers/net/usb/usbnet.c
index a92a415..07f69ee 100644
--- a/drivers/net/usb/usbnet.c
+++ b/drivers/net/usb/usbnet.c
@@ -998,7 +998,6 @@ static void tx_complete (struct urb *urb)
}
}

- urb->dev = NULL;
entry->state = tx_done;
defer_bh(dev, skb, &dev->txq);

Willy Tarreau

unread,

Oct 1, 2012, 7:06:38 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, David Miller, Linus Torvalds, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <ty...@mit.edu>

commit 7bf2357524408b97fec58344caf7397f8140c3fd upstream.

Cc: David Miller <da...@davemloft.net>
Cc: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/core/dev.c | 3 +++
net/core/rtnetlink.c | 1 +
2 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 84a0705..46e2a29 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -1133,6 +1133,7 @@ int dev_open(struct net_device *dev)
/*
* ... and announce new interface.
*/
+ add_device_randomness(dev->dev_addr, dev->addr_len);
call_netdevice_notifiers(NETDEV_UP, dev);
}

@@ -4268,6 +4269,7 @@ int dev_set_mac_address(struct net_device *dev, struct sockaddr *sa)
err = ops->ndo_set_mac_address(dev, sa);
if (!err)
call_netdevice_notifiers(NETDEV_CHANGEADDR, dev);
+ add_device_randomness(dev->dev_addr, dev->addr_len);
return err;
}
EXPORT_SYMBOL(dev_set_mac_address);
@@ -4871,6 +4873,7 @@ int register_netdevice(struct net_device *dev)
dev_init_scheduler(dev);
dev_hold(dev);
list_netdevice(dev);
+ add_device_randomness(dev->dev_addr, dev->addr_len);

/* Notify protocols, that a new device appeared. */
ret = call_netdevice_notifiers(NETDEV_REGISTER, dev);
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index d4fd895..9d70042 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -817,6 +817,7 @@ static int do_setlink(struct net_device *dev, struct ifinfomsg *ifm,
goto errout;
send_addr_notify = 1;
modified = 1;
+ add_device_randomness(dev->dev_addr, dev->addr_len);
}

if (tb[IFLA_MTU]) {

Willy Tarreau

unread,

Oct 1, 2012, 7:06:48 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Philipp Hahn, Andrea Arcangeli, Rik van Riel, Andrew Morton, Peter Zijlstra, Linus Torvalds, sta...@kernel.org, Ingo Molnar, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Philipp Hahn <ha...@univention.de>

commit a79e53d85683c6dd9f99c90511028adc2043031f upstream.

On Wednesday 16 February 2011 15:49:47 Andrea Arcangeli wrote:
> Subject: fix pgd_lock deadlock
>
> From: Andrea Arcangeli <aarc...@redhat.com>
>
> It's forbidden to take the page_table_lock with the irq disabled or if
> there's contention the IPIs (for tlb flushes) sent with the page_table_lock
> held will never run leading to a deadlock.
>
> Apparently nobody takes the pgd_lock from irq so the _irqsave can be
> removed.
>
> Signed-off-by: Andrea Arcangeli <aarc...@redhat.com>

This patch (original commit Id for 2.6.38 a79e53d85683c6dd9f99c90511028adc2043031f)
needs to be back-ported to 2.6.32.x as well.
I observed a dead-lock problem when running a PAE enabled Debian 2.6.32.46+
kernel with 6 VCPUs as a KVM on (2.6.32, 3.2, 3.3) kernel, which showed the
following behaviour:

1 VCPU is stuck in
pgd_alloc() =E2=86=92 pgd_prepopulate_pmb() =E2=86=92... =E2=86=92 flush_tlb_others_ipi()
while (!cpumask_empty(to_cpumask(f->flush_cpumask)))
cpu_relax();
(gdb) print f->flush_cpumask
$5 = {1}

while all other VCPUs are stuck in
pgd_alloc() =E2=86=92 spin_lock_irqsave(pgd_lock)

I tracked it down to the commit
2.6.39-rc1: 4981d01eada5354d81c8929d5b2836829ba3df7b
2.6.32.34: ba456fd7ec1bdc31a4ad4a6bd02802dcaa730a33
x86: Flush TLB if PGD entry is changed in i386 PAE mode
which when reverted made the bug disappear.

Comparing 3.2 to 2.6.32.34 showed that the 'pgd-deadlock'-patch went into
2.6.38, that is before the 'PAE correctness'-patch, so the problem was
probably never observed in the main development branch.
But for 2.6.32 the 'pgd-deadlock' patch is still missing, so the 'PAE
corretness'-patch made the problem worse with 2.6.32.

The Patch was also back-ported to the OpenSUSE Kernel
<http://kernel.opensuse.org/cgit/kernel-source/commit/?id=ac27c01aa880c65d17043ab87249c613ac4c3635>,
Since the patch didn't apply cleanly on the current Debian kernel, I had to
backport it for us and Debian. The patch is also available from our (German)
Bugzilla <https://forge.univention.org/bugzilla/show_bug.cgi?id=26661> or
from the Debian BTS at <http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=669335>.

I have no easy test case, but running multiple parallel builds inside the VM
normally triggers the bug within seconds to minutes. With the patch applied
the VM survived a night building packages without any problem.

Signed-off-by: Philipp Hahn <ha...@univention.de>

Sincerely
Philipp
-
Philipp Hahn Open Source Software Engineer ha...@univention.de
Univention GmbH be open. fon: +49 421 22 232- 0
Mary-Somerville-Str.1 D-28359 Bremen fax: +49 421 22 232-99
http://www.univention.de/

It's forbidden to take the page_table_lock with the irq disabled
or if there's contention the IPIs (for tlb flushes) sent with
the page_table_lock held will never run leading to a deadlock.

Nobody takes the pgd_lock from irq context so the _irqsave can be
removed.

Signed-off-by: Andrea Arcangeli <aarc...@redhat.com>
Acked-by: Rik van Riel <ri...@redhat.com>
Tested-by: Konrad Rzeszutek Wilk <konra...@oracle.com>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Cc: Peter Zijlstra <pet...@infradead.org>
Cc: Linus Torvalds <torv...@linux-foundation.org>
Cc: <sta...@kernel.org>
LKML-Reference: <201102162345....@imap1.linux-foundation.org>
Signed-off-by: Ingo Molnar <mi...@elte.hu>
Git-commit: a79e53d85683c6dd9f99c90511028adc2043031f

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/mm/fault.c | 10 ++++------
arch/x86/mm/pageattr.c | 18 ++++++++----------
arch/x86/mm/pgtable.c | 11 ++++-------
arch/x86/xen/mmu.c | 10 ++++------
4 files changed, 20 insertions(+), 29 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 8ac0d76..249ad57 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -223,15 +223,14 @@ void vmalloc_sync_all(void)
address >= TASK_SIZE && address < FIXADDR_TOP;
address += PMD_SIZE) {

- unsigned long flags;
struct page *page;

- spin_lock_irqsave(&pgd_lock, flags);
+ spin_lock(&pgd_lock);
list_for_each_entry(page, &pgd_list, lru) {
if (!vmalloc_sync_one(page_address(page), address))
break;
}
- spin_unlock_irqrestore(&pgd_lock, flags);
+ spin_unlock(&pgd_lock);
}
}

@@ -331,13 +330,12 @@ void vmalloc_sync_all(void)
address += PGDIR_SIZE) {

const pgd_t *pgd_ref = pgd_offset_k(address);
- unsigned long flags;
struct page *page;

if (pgd_none(*pgd_ref))
continue;

- spin_lock_irqsave(&pgd_lock, flags);
+ spin_lock(&pgd_lock);
list_for_each_entry(page, &pgd_list, lru) {
pgd_t *pgd;
pgd = (pgd_t *)page_address(page) + pgd_index(address);
@@ -346,7 +344,7 @@ void vmalloc_sync_all(void)
else
BUG_ON(pgd_page_vaddr(*pgd) != pgd_page_vaddr(*pgd_ref));
}
- spin_unlock_irqrestore(&pgd_lock, flags);
+ spin_unlock(&pgd_lock);
}
}

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index dd38bfb..6d44087 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -56,12 +56,10 @@ static unsigned long direct_pages_count[PG_LEVEL_NUM];

void update_page_count(int level, unsigned long pages)
{
- unsigned long flags;
-
/* Protect against CPA */
- spin_lock_irqsave(&pgd_lock, flags);
+ spin_lock(&pgd_lock);
direct_pages_count[level] += pages;
- spin_unlock_irqrestore(&pgd_lock, flags);
+ spin_unlock(&pgd_lock);
}

static void split_page_count(int level)
@@ -354,7 +352,7 @@ static int
try_preserve_large_page(pte_t *kpte, unsigned long address,
struct cpa_data *cpa)
{
- unsigned long nextpage_addr, numpages, pmask, psize, flags, addr, pfn;
+ unsigned long nextpage_addr, numpages, pmask, psize, addr, pfn;
pte_t new_pte, old_pte, *tmp;
pgprot_t old_prot, new_prot;
int i, do_split = 1;
@@ -363,7 +361,7 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
if (cpa->force_split)
return 1;

- spin_lock_irqsave(&pgd_lock, flags);
+ spin_lock(&pgd_lock);
/*
* Check for races, another CPU might have split this page
* up already:
@@ -458,14 +456,14 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
}

out_unlock:
- spin_unlock_irqrestore(&pgd_lock, flags);
+ spin_unlock(&pgd_lock);

return do_split;
}

static int split_large_page(pte_t *kpte, unsigned long address)
{
- unsigned long flags, pfn, pfninc = 1;
+ unsigned long pfn, pfninc = 1;
unsigned int i, level;
pte_t *pbase, *tmp;
pgprot_t ref_prot;
@@ -479,7 +477,7 @@ static int split_large_page(pte_t *kpte, unsigned long address)
if (!base)
return -ENOMEM;

- spin_lock_irqsave(&pgd_lock, flags);
+ spin_lock(&pgd_lock);
/*
* Check for races, another CPU might have split this page
* up for us already:
@@ -551,7 +549,7 @@ out_unlock:
*/
if (base)
__free_page(base);
- spin_unlock_irqrestore(&pgd_lock, flags);
+ spin_unlock(&pgd_lock);

return 0;
}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index e0e6fad..cb7cfc8 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -110,14 +110,12 @@ static void pgd_ctor(pgd_t *pgd)

static void pgd_dtor(pgd_t *pgd)
{
- unsigned long flags; /* can be called from interrupt context */
-
if (SHARED_KERNEL_PMD)
return;

- spin_lock_irqsave(&pgd_lock, flags);
+ spin_lock(&pgd_lock);
pgd_list_del(pgd);
- spin_unlock_irqrestore(&pgd_lock, flags);
+ spin_unlock(&pgd_lock);
}

/*
@@ -248,7 +246,6 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
{
pgd_t *pgd;
pmd_t *pmds[PREALLOCATED_PMDS];
- unsigned long flags;

pgd = (pgd_t *)__get_free_page(PGALLOC_GFP);

@@ -268,12 +265,12 @@ pgd_t *pgd_alloc(struct mm_struct *mm)
* respect to anything walking the pgd_list, so that they
* never see a partially populated pgd.
*/
- spin_lock_irqsave(&pgd_lock, flags);
+ spin_lock(&pgd_lock);

pgd_ctor(pgd);
pgd_prepopulate_pmd(mm, pgd, pmds);

- spin_unlock_irqrestore(&pgd_lock, flags);
+ spin_unlock(&pgd_lock);

return pgd;

diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c
index 3f90a2c..8f4452c 100644
--- a/arch/x86/xen/mmu.c
+++ b/arch/x86/xen/mmu.c
@@ -987,10 +987,9 @@ static void xen_pgd_pin(struct mm_struct *mm)
*/
void xen_mm_pin_all(void)
{
- unsigned long flags;
struct page *page;

- spin_lock_irqsave(&pgd_lock, flags);
+ spin_lock(&pgd_lock);

list_for_each_entry(page, &pgd_list, lru) {
if (!PagePinned(page)) {
@@ -999,7 +998,7 @@ void xen_mm_pin_all(void)
}
}

- spin_unlock_irqrestore(&pgd_lock, flags);
+ spin_unlock(&pgd_lock);
}

/*
@@ -1100,10 +1099,9 @@ static void xen_pgd_unpin(struct mm_struct *mm)
*/
void xen_mm_unpin_all(void)
{
- unsigned long flags;
struct page *page;

- spin_lock_irqsave(&pgd_lock, flags);
+ spin_lock(&pgd_lock);

list_for_each_entry(page, &pgd_list, lru) {
if (PageSavePinned(page)) {
@@ -1113,7 +1111,7 @@ void xen_mm_unpin_all(void)
}
}

- spin_unlock_irqrestore(&pgd_lock, flags);
+ spin_unlock(&pgd_lock);
}

void xen_activate_mm(struct mm_struct *prev, struct mm_struct *next)

Willy Tarreau

unread,

Oct 1, 2012, 7:07:46 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <ja...@suse.cz>

commit 156bddd8e505b295540f3ca0e27dda68cb0d49aa upstream.

Code tracking when transaction needs to be committed on fdatasync(2) forgets
to handle a situation when only inode's i_size is changed. Thus in such
situations fdatasync(2) doesn't force transaction with new i_size to disk
and that can result in wrong i_size after a crash.

Fix the issue by updating inode's i_datasync_tid whenever its size is
updated.

Reported-by: Kristian Nielsen <knie...@knielsen-hq.org>

Signed-off-by: Jan Kara <ja...@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext3/inode.c | 17 ++++++++++++++---
1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/fs/ext3/inode.c b/fs/ext3/inode.c
index f9d6937..3191a30 100644
--- a/fs/ext3/inode.c
+++ b/fs/ext3/inode.c
@@ -2948,6 +2948,8 @@ static int ext3_do_update_inode(handle_t *handle,
struct ext3_inode_info *ei = EXT3_I(inode);
struct buffer_head *bh = iloc->bh;
int err = 0, rc, block;
+ int need_datasync = 0;
+ __le32 disksize;

again:
/* we can't allow multiple procs in here at once, its a bit racey */
@@ -2985,7 +2987,11 @@ again:
raw_inode->i_gid_high = 0;
}
raw_inode->i_links_count = cpu_to_le16(inode->i_nlink);
- raw_inode->i_size = cpu_to_le32(ei->i_disksize);
+ disksize = cpu_to_le32(ei->i_disksize);
+ if (disksize != raw_inode->i_size) {
+ need_datasync = 1;
+ raw_inode->i_size = disksize;
+ }
raw_inode->i_atime = cpu_to_le32(inode->i_atime.tv_sec);
raw_inode->i_ctime = cpu_to_le32(inode->i_ctime.tv_sec);
raw_inode->i_mtime = cpu_to_le32(inode->i_mtime.tv_sec);
@@ -3001,8 +3007,11 @@ again:
if (!S_ISREG(inode->i_mode)) {
raw_inode->i_dir_acl = cpu_to_le32(ei->i_dir_acl);
} else {
- raw_inode->i_size_high =
- cpu_to_le32(ei->i_disksize >> 32);
+ disksize = cpu_to_le32(ei->i_disksize >> 32);
+ if (disksize != raw_inode->i_size_high) {
+ raw_inode->i_size_high = disksize;
+ need_datasync = 1;
+ }
if (ei->i_disksize > 0x7fffffffULL) {
struct super_block *sb = inode->i_sb;
if (!EXT3_HAS_RO_COMPAT_FEATURE(sb,
@@ -3055,6 +3064,8 @@ again:
ei->i_state &= ~EXT3_STATE_NEW;

atomic_set(&ei->i_sync_tid, handle->h_transaction->t_tid);
+ if (need_datasync)
+ atomic_set(&ei->i_datasync_tid, handle->h_transaction->t_tid);
out_brelse:
brelse (bh);
ext3_std_error(inode->i_sb, err);

Willy Tarreau

unread,

Oct 1, 2012, 7:07:55 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, H. Peter Anvin, Fenghua Yu, Matt Mackall, Herbert Xu, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: H. Peter Anvin <h...@zytor.com>

commit 63d77173266c1791f1553e9e8ccea65dc87c4485 upstream.

Add support for architecture-specific hooks into the kernel-directed
random number generator interfaces. This patchset does not use the
architecture random number generator interfaces for the
userspace-directed interfaces (/dev/random and /dev/urandom), thus
eliminating the need to distinguish between them based on a pool
pointer.

Changes in version 3:
- Moved the hooks from extract_entropy() to get_random_bytes().
- Changes the hooks to inlines.

Signed-off-by: H. Peter Anvin <h...@linux.intel.com>

Cc: Fenghua Yu <fengh...@intel.com>

Cc: Matt Mackall <m...@selenic.com>
Cc: Herbert Xu <her...@gondor.apana.org.au>
Cc: "Theodore Ts'o" <ty...@mit.edu>

[PG: .34 already had "unsigned int ret" in get_random_int, so the
diffstat here is slightly smaller than that of 63d7717. ]
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 23 +++++++++++++++++++++--
include/linux/random.h | 13 +++++++++++++
2 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 0adb8c8..93b3c29 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -932,7 +932,21 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,
*/
void get_random_bytes(void *buf, int nbytes)
{
- extract_entropy(&nonblocking_pool, buf, nbytes, 0, 0);
+ char *p = buf;
+
+ while (nbytes) {
+ unsigned long v;
+ int chunk = min(nbytes, (int)sizeof(unsigned long));
+

+ if (!arch_get_random_long(&v))
+ break;
+

+ memcpy(buf, &v, chunk);
+ p += chunk;
+ nbytes -= chunk;
+ }
+
+ extract_entropy(&nonblocking_pool, p, nbytes, 0, 0);
}
EXPORT_SYMBOL(get_random_bytes);

@@ -1360,9 +1374,14 @@ late_initcall(random_int_secret_init);
DEFINE_PER_CPU(__u32 [MD5_DIGEST_WORDS], get_random_int_hash);
unsigned int get_random_int(void)
{
- __u32 *hash = get_cpu_var(get_random_int_hash);
+ __u32 *hash;
unsigned int ret;

+ if (arch_get_random_int(&ret))
+ return ret;
+
+ hash = get_cpu_var(get_random_int_hash);
+
hash[0] += current->pid + jiffies + get_cycles();
md5_transform(hash, random_int_secret);
ret = hash[0];
diff --git a/include/linux/random.h b/include/linux/random.h
index 2948046..0bf2936 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -63,6 +63,19 @@ unsigned long randomize_range(unsigned long start, unsigned long end, unsigned l
u32 random32(void);
void srandom32(u32 seed);

+#ifdef CONFIG_ARCH_RANDOM
+# include <asm/archrandom.h>
+#else
+static inline int arch_get_random_long(unsigned long *v)
+{
+ return 0;
+}
+static inline int arch_get_random_int(unsigned int *v)
+{
+ return 0;
+}
+#endif
+
#endif /* __KERNEL___ */

#endif /* _LINUX_RANDOM_H */

Willy Tarreau

unread,

Oct 1, 2012, 7:08:05 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Borislav Petkov, Yinghai Lu, sta...@kernel.org, Bjorn Helgaas, Jesse Barnes, Jiri Slaby, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Bjorn Helgaas <bhel...@google.com>

commit 24d25dbfa63c376323096660bfa9ad45a08870ce upstream.

This factors out the AMD native MMCONFIG discovery so we can use it
outside amd_bus.c.

amd_bus.c reads AMD MSRs so it can remove the MMCONFIG area from the
PCI resources. We may also need the MMCONFIG information to work
around BIOS defects in the ACPI MCFG table.

Cc: Borislav Petkov <borisla...@amd.com>
Cc: Yinghai Lu <yin...@kernel.org>
Cc: sta...@kernel.org
Signed-off-by: Bjorn Helgaas <bhel...@google.com>
Signed-off-by: Jesse Barnes <jba...@virtuousgeek.org>
[WT: this patch was initially not planned for 2.6.32 but is required by commit
2215d910 merged into 2.6.32.55 and which relies on amd_get_mmconfig_range() ]
Cc: Jiri Slaby <jsl...@suse.cz>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/include/asm/k8.h | 2 ++
arch/x86/kernel/k8.c | 31 +++++++++++++++++++++++++++++++
arch/x86/pci/amd_bus.c | 42 +++++++++++-------------------------------
3 files changed, 44 insertions(+), 31 deletions(-)

diff --git a/arch/x86/include/asm/k8.h b/arch/x86/include/asm/k8.h
index f0746f4..41845d2 100644
--- a/arch/x86/include/asm/k8.h
+++ b/arch/x86/include/asm/k8.h
@@ -1,11 +1,13 @@
#ifndef _ASM_X86_K8_H
#define _ASM_X86_K8_H

+#include <linux/ioport.h>
#include <linux/pci.h>

extern struct pci_device_id k8_nb_ids[];

extern int early_is_k8_nb(u32 value);
+extern struct resource *amd_get_mmconfig_range(struct resource *res);
extern struct pci_dev **k8_northbridges;
extern int num_k8_northbridges;
extern int cache_k8_northbridges(void);
diff --git a/arch/x86/kernel/k8.c b/arch/x86/kernel/k8.c
index 9b89546..2831a32 100644
--- a/arch/x86/kernel/k8.c
+++ b/arch/x86/kernel/k8.c
@@ -87,6 +87,37 @@ int __init early_is_k8_nb(u32 device)
return 0;
}

+struct resource *amd_get_mmconfig_range(struct resource *res)
+{
+ u32 address;
+ u64 base, msr;
+ unsigned segn_busn_bits;
+
+ if (boot_cpu_data.x86_vendor != X86_VENDOR_AMD)
+ return NULL;
+
+ /* assume all cpus from fam10h have mmconfig */
+ if (boot_cpu_data.x86 < 0x10)
+ return NULL;
+
+ address = MSR_FAM10H_MMIO_CONF_BASE;
+ rdmsrl(address, msr);
+
+ /* mmconfig is not enabled */
+ if (!(msr & FAM10H_MMIO_CONF_ENABLE))
+ return NULL;
+
+ base = msr & (FAM10H_MMIO_CONF_BASE_MASK<<FAM10H_MMIO_CONF_BASE_SHIFT);
+
+ segn_busn_bits = (msr >> FAM10H_MMIO_CONF_BUSRANGE_SHIFT) &
+ FAM10H_MMIO_CONF_BUSRANGE_MASK;
+
+ res->flags = IORESOURCE_MEM;
+ res->start = base;
+ res->end = base + (1ULL<<(segn_busn_bits + 20)) - 1;
+ return res;
+}
+
void k8_flush_garts(void)
{
int flushed, i;
diff --git a/arch/x86/pci/amd_bus.c b/arch/x86/pci/amd_bus.c
index 572ee97..cb34763 100644
--- a/arch/x86/pci/amd_bus.c
+++ b/arch/x86/pci/amd_bus.c
@@ -190,34 +190,6 @@ static struct pci_hostbridge_probe pci_probes[] __initdata = {
{ 0, 0x18, PCI_VENDOR_ID_AMD, 0x1300 },
};

-static u64 __initdata fam10h_mmconf_start;
-static u64 __initdata fam10h_mmconf_end;
-static void __init get_pci_mmcfg_amd_fam10h_range(void)
-{
- u32 address;
- u64 base, msr;
- unsigned segn_busn_bits;
-
- /* assume all cpus from fam10h have mmconf */
- if (boot_cpu_data.x86 < 0x10)
- return;
-
- address = MSR_FAM10H_MMIO_CONF_BASE;
- rdmsrl(address, msr);
-
- /* mmconfig is not enable */
- if (!(msr & FAM10H_MMIO_CONF_ENABLE))
- return;
-
- base = msr & (FAM10H_MMIO_CONF_BASE_MASK<<FAM10H_MMIO_CONF_BASE_SHIFT);
-
- segn_busn_bits = (msr >> FAM10H_MMIO_CONF_BUSRANGE_SHIFT) &
- FAM10H_MMIO_CONF_BUSRANGE_MASK;
-
- fam10h_mmconf_start = base;
- fam10h_mmconf_end = base + (1ULL<<(segn_busn_bits + 20)) - 1;
-}
-
/**
* early_fill_mp_bus_to_node()
* called before pcibios_scan_root and pci_scan_bus
@@ -243,6 +215,9 @@ static int __init early_fill_mp_bus_info(void)
struct res_range range[RANGE_NUM];
u64 val;
u32 address;
+ struct resource fam10h_mmconf_res, *fam10h_mmconf;
+ u64 fam10h_mmconf_start;
+ u64 fam10h_mmconf_end;

if (!early_pci_allowed())
return -1;
@@ -367,11 +342,16 @@ static int __init early_fill_mp_bus_info(void)
update_range(range, 0, end - 1);

/* get mmconfig */
- get_pci_mmcfg_amd_fam10h_range();
+ fam10h_mmconf = amd_get_mmconfig_range(&fam10h_mmconf_res);
/* need to take out mmconf range */
- if (fam10h_mmconf_end) {
- printk(KERN_DEBUG "Fam 10h mmconf [%llx, %llx]\n", fam10h_mmconf_start, fam10h_mmconf_end);
+ if (fam10h_mmconf) {
+ printk(KERN_DEBUG "Fam 10h mmconf %pR\n", fam10h_mmconf);
+ fam10h_mmconf_start = fam10h_mmconf->start;
+ fam10h_mmconf_end = fam10h_mmconf->end;
update_range(range, fam10h_mmconf_start, fam10h_mmconf_end);
+ } else {
+ fam10h_mmconf_start = 0;
+ fam10h_mmconf_end = 0;
}

/* mmio resource */

Willy Tarreau

unread,

Oct 1, 2012, 7:08:28 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, John Stultz, Prarit Bhargava, Ingo Molnar, Thomas Gleixner, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: John Stultz <john....@linaro.org>

This is a -stable backport of bf2ac312195155511a0f79325515cbb61929898a

If update_wall_time() is called and the current offset isn't large
enough to accumulate, avoid re-calling timekeeping_adjust which may
change the clock freq and can cause 1ns inconsistencies with
CLOCK_REALTIME_COARSE/CLOCK_MONOTONIC_COARSE.

Signed-off-by: John Stultz <john....@linaro.org>
Cc: Prarit Bhargava <pra...@redhat.com>

Cc: Ingo Molnar <mi...@kernel.org>
Cc: sta...@vger.kernel.org

Link: http://lkml.kernel.org/r/1345595449-34965-5-git...@linaro.org
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>

Cc: Linux Kernel <linux-...@vger.kernel.org>
Signed-off-by: John Stultz <john....@linaro.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/time/timekeeping.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 85d51c4..b451c93 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -807,6 +807,10 @@ void update_wall_time(void)
#else
offset = timekeeper.cycle_interval;
#endif
+ /* Check if there's really nothing to do */
+ if (offset < timekeeper.cycle_interval)
+ return;
+
timekeeper.xtime_nsec = (s64)xtime.tv_nsec << timekeeper.shift;

/* normally this loop will run just once, however in the

Willy Tarreau

unread,

Oct 1, 2012, 7:08:29 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Eric Sandeen, Theodore Tso, Stefan Bader, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Sandeen <san...@redhat.com>

commit 15291164b22a357cb211b618adfef4fa82fc0de3 upstream.

journal_unmap_buffer()'s zap_buffer: code clears a lot of buffer head
state ala discard_buffer(), but does not touch _Delay or _Unwritten as
discard_buffer() does.

This can be problematic in some areas of the ext4 code which assume
that if they have found a buffer marked unwritten or delay, then it's
a live one. Perhaps those spots should check whether it is mapped
as well, but if jbd2 is going to tear down a buffer, let's really
tear it down completely.

Without this I get some fsx failures on sub-page-block filesystems
up until v3.2, at which point 4e96b2dbbf1d7e81f22047a50f862555a6cb87cb
and 189e868fa8fdca702eb9db9d8afc46b5cb9144c9 make the failures go
away, because buried within that large change is some more flag
clearing. I still think it's worth doing in jbd2, since
->invalidatepage leads here directly, and it's the right place
to clear away these flags.

Signed-off-by: Eric Sandeen <san...@redhat.com>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

Cc: sta...@vger.kernel.org

BugLink: http://bugs.launchpad.net/bugs/929781
CVE-2011-4086

Signed-off-by: Stefan Bader <stefan...@canonical.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/jbd2/transaction.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index a051270..5c156ad 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -1822,6 +1822,8 @@ zap_buffer_unlocked:
clear_buffer_mapped(bh);
clear_buffer_req(bh);
clear_buffer_new(bh);
+ clear_buffer_delay(bh);
+ clear_buffer_unwritten(bh);
bh->b_bdev = NULL;
return may_free;

Willy Tarreau

unread,

Oct 1, 2012, 7:08:59 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <ja...@suse.cz>

commit 9c2fc0de1a6e638fe58c354a463f544f42a90a09 upstream.

When a file is stored in ICB (inode), we overwrite part of the file, and
the page containing file's data is not in page cache, we end up corrupting
file's data by overwriting them with zeros. The problem is we use
simple_write_begin() which simply zeroes parts of the page which are not
written to. The problem has been introduced by be021ee4 (udf: convert to
new aops).

Fix the problem by providing a ->write_begin function which makes the page
properly uptodate.

Reported-by: Ian Abbott <abb...@mev.co.uk>

Signed-off-by: Jan Kara <ja...@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/udf/file.c | 35 +++++++++++++++++++++++++++++------
1 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/fs/udf/file.c b/fs/udf/file.c
index b80cbd7..78bdef3 100644
--- a/fs/udf/file.c
+++ b/fs/udf/file.c
@@ -40,20 +40,24 @@
#include "udf_i.h"
#include "udf_sb.h"

-static int udf_adinicb_readpage(struct file *file, struct page *page)
+static void __udf_adinicb_readpage(struct page *page)
{
struct inode *inode = page->mapping->host;
char *kaddr;
struct udf_inode_info *iinfo = UDF_I(inode);

- BUG_ON(!PageLocked(page));
-
kaddr = kmap(page);
- memset(kaddr, 0, PAGE_CACHE_SIZE);
memcpy(kaddr, iinfo->i_ext.i_data + iinfo->i_lenEAttr, inode->i_size);
+ memset(kaddr + inode->i_size, 0, PAGE_CACHE_SIZE - inode->i_size);
flush_dcache_page(page);
SetPageUptodate(page);
kunmap(page);
+}
+
+static int udf_adinicb_readpage(struct file *file, struct page *page)
+{
+ BUG_ON(!PageLocked(page));
+ __udf_adinicb_readpage(page);
unlock_page(page);

return 0;
@@ -78,6 +82,25 @@ static int udf_adinicb_writepage(struct page *page,
return 0;
}

+static int udf_adinicb_write_begin(struct file *file,
+ struct address_space *mapping, loff_t pos,
+ unsigned len, unsigned flags, struct page **pagep,
+ void **fsdata)
+{
+ struct page *page;
+
+ if (WARN_ON_ONCE(pos >= PAGE_CACHE_SIZE))
+ return -EIO;
+ page = grab_cache_page_write_begin(mapping, 0, flags);
+ if (!page)
+ return -ENOMEM;
+ *pagep = page;
+
+ if (!PageUptodate(page) && len != PAGE_CACHE_SIZE)
+ __udf_adinicb_readpage(page);
+ return 0;
+}
+
static int udf_adinicb_write_end(struct file *file,
struct address_space *mapping,
loff_t pos, unsigned len, unsigned copied,
@@ -100,8 +123,8 @@ const struct address_space_operations udf_adinicb_aops = {
.readpage = udf_adinicb_readpage,
.writepage = udf_adinicb_writepage,
.sync_page = block_sync_page,
- .write_begin = simple_write_begin,
- .write_end = udf_adinicb_write_end,
+ .write_begin = udf_adinicb_write_begin,
+ .write_end = udf_adinicb_write_end,
};

static ssize_t udf_file_aio_write(struct kiocb *iocb, const struct iovec *iov,

Willy Tarreau

unread,

Oct 1, 2012, 7:09:10 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Linux PM list, John Stultz, Ingo Molnar, Peter Zijlstra, Prarit Bhargava, Thomas Gleixner, Linus Torvalds, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tg...@linutronix.de>

This is a backport of 3e997130bd2e8c6f5aaa49d6e3161d4d29b43ab0

The leap second rework unearthed another issue of inconsistent data.

On timekeeping_resume() the timekeeper data is updated, but nothing
calls timekeeping_update(), so now the update code in the timer
interrupt sees stale values.

This has been the case before those changes, but then the timer
interrupt was using stale data as well so this went unnoticed for quite
some time.

Add the missing update call, so all the data is consistent everywhere.

Reported-by: Andreas Schwab <sch...@linux-m68k.org>
Reported-and-tested-by: "Rafael J. Wysocki" <r...@sisk.pl>
Reported-and-tested-by: Martin Steigerwald <Mar...@lichtvoll.de>
Cc: LKML <linux-...@vger.kernel.org>
Cc: Linux PM list <linu...@vger.kernel.org>
Cc: John Stultz <john...@us.ibm.com>
Cc: Ingo Molnar <mi...@kernel.org>
Cc: Peter Zijlstra <a.p.zi...@chello.nl>,
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: sta...@vger.kernel.org
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>
Signed-off-by: John Stultz <john...@us.ibm.com>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Linux Kernel <linux-...@vger.kernel.org>
Signed-off-by: John Stultz <john...@us.ibm.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/time/timekeeping.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6054b94..3f7e53f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -637,6 +637,7 @@ static int timekeeping_resume(struct sys_device *dev)
timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock);
timekeeper.ntp_error = 0;
timekeeping_suspended = 0;
+ timekeeping_update(false);
write_sequnlock_irqrestore(&xtime_lock, flags);

touch_softlockup_watchdog();

Willy Tarreau

unread,

Oct 1, 2012, 7:09:33 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Neil Horman, da...@redhat.com, David S. Miller, Vlad Yasevich, Sridhar Samudrala, linux...@vger.kernel.org, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Neil Horman <nho...@tuxdriver.com>

[ Upstream commit 2eebc1e188e9e45886ee00662519849339884d6d ]

A few days ago Dave Jones reported this oops:

[22766.294255] general protection fault: 0000 [#1] PREEMPT SMP
[22766.295376] CPU 0
[22766.295384] Modules linked in:
[22766.387137] ffffffffa169f292 6b6b6b6b6b6b6b6b ffff880147c03a90
ffff880147c03a74
[22766.387135] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 00000000000
[22766.387136] Process trinity-watchdo (pid: 10896, threadinfo ffff88013e7d2000,
[22766.387137] Stack:
[22766.387140] ffff880147c03a10
[22766.387140] ffffffffa169f2b6
[22766.387140] ffff88013ed95728
[22766.387143] 0000000000000002
[22766.387143] 0000000000000000
[22766.387143] ffff880003fad062
[22766.387144] ffff88013c120000
[22766.387144]
[22766.387145] Call Trace:
[22766.387145] <IRQ>
[22766.387150] [<ffffffffa169f292>] ? __sctp_lookup_association+0x62/0xd0
[sctp]
[22766.387154] [<ffffffffa169f2b6>] __sctp_lookup_association+0x86/0xd0 [sctp]
[22766.387157] [<ffffffffa169f597>] sctp_rcv+0x207/0xbb0 [sctp]
[22766.387161] [<ffffffff810d4da8>] ? trace_hardirqs_off_caller+0x28/0xd0
[22766.387163] [<ffffffff815827e3>] ? nf_hook_slow+0x133/0x210
[22766.387166] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387168] [<ffffffff8159043d>] ip_local_deliver_finish+0x18d/0x4c0
[22766.387169] [<ffffffff815902fc>] ? ip_local_deliver_finish+0x4c/0x4c0
[22766.387171] [<ffffffff81590a07>] ip_local_deliver+0x47/0x80
[22766.387172] [<ffffffff8158fd80>] ip_rcv_finish+0x150/0x680
[22766.387174] [<ffffffff81590c54>] ip_rcv+0x214/0x320
[22766.387176] [<ffffffff81558c07>] __netif_receive_skb+0x7b7/0x910
[22766.387178] [<ffffffff8155856c>] ? __netif_receive_skb+0x11c/0x910
[22766.387180] [<ffffffff810d423e>] ? put_lock_stats.isra.25+0xe/0x40
[22766.387182] [<ffffffff81558f83>] netif_receive_skb+0x23/0x1f0
[22766.387183] [<ffffffff815596a9>] ? dev_gro_receive+0x139/0x440
[22766.387185] [<ffffffff81559280>] napi_skb_finish+0x70/0xa0
[22766.387187] [<ffffffff81559cb5>] napi_gro_receive+0xf5/0x130
[22766.387218] [<ffffffffa01c4679>] e1000_receive_skb+0x59/0x70 [e1000e]
[22766.387242] [<ffffffffa01c5aab>] e1000_clean_rx_irq+0x28b/0x460 [e1000e]
[22766.387266] [<ffffffffa01c9c18>] e1000e_poll+0x78/0x430 [e1000e]
[22766.387268] [<ffffffff81559fea>] net_rx_action+0x1aa/0x3d0
[22766.387270] [<ffffffff810a495f>] ? account_system_vtime+0x10f/0x130
[22766.387273] [<ffffffff810734d0>] __do_softirq+0xe0/0x420
[22766.387275] [<ffffffff8169826c>] call_softirq+0x1c/0x30
[22766.387278] [<ffffffff8101db15>] do_softirq+0xd5/0x110
[22766.387279] [<ffffffff81073bc5>] irq_exit+0xd5/0xe0
[22766.387281] [<ffffffff81698b03>] do_IRQ+0x63/0xd0
[22766.387283] [<ffffffff8168ee2f>] common_interrupt+0x6f/0x6f
[22766.387283] <EOI>
[22766.387284]
[22766.387285] [<ffffffff8168eed9>] ? retint_swapgs+0x13/0x1b
[22766.387285] Code: c0 90 5d c3 66 0f 1f 44 00 00 4c 89 c8 5d c3 0f 1f 00 55 48
89 e5 48 83
ec 20 48 89 5d e8 4c 89 65 f0 4c 89 6d f8 66 66 66 66 90 <0f> b7 87 98 00 00 00
48 89 fb
49 89 f5 66 c1 c0 08 66 39 46 02
[22766.387307]
[22766.387307] RIP
[22766.387311] [<ffffffffa168a2c9>] sctp_assoc_is_match+0x19/0x90 [sctp]
[22766.387311] RSP <ffff880147c039b0>
[22766.387142] ffffffffa16ab120
[22766.599537] ---[ end trace 3f6dae82e37b17f5 ]---
[22766.601221] Kernel panic - not syncing: Fatal exception in interrupt

It appears from his analysis and some staring at the code that this is likely
occuring because an association is getting freed while still on the
sctp_assoc_hashtable. As a result, we get a gpf when traversing the hashtable
while a freed node corrupts part of the list.

Nominally I would think that an mibalanced refcount was responsible for this,
but I can't seem to find any obvious imbalance. What I did note however was
that the two places where we create an association using
sctp_primitive_ASSOCIATE (__sctp_connect and sctp_sendmsg), have failure paths
which free a newly created association after calling sctp_primitive_ASSOCIATE.
sctp_primitive_ASSOCIATE brings us into the sctp_sf_do_prm_asoc path, which
issues a SCTP_CMD_NEW_ASOC side effect, which in turn adds a new association to
the aforementioned hash table. the sctp command interpreter that process side
effects has not way to unwind previously processed commands, so freeing the
association from the __sctp_connect or sctp_sendmsg error path would lead to a
freed association remaining on this hash table.

I've fixed this but modifying sctp_[un]hash_established to use hlist_del_init,
which allows us to proerly use hlist_unhashed to check if the node is on a
hashlist safely during a delete. That in turn alows us to safely call
sctp_unhash_established in the __sctp_connect and sctp_sendmsg error paths
before freeing them, regardles of what the associations state is on the hash
list.

I noted, while I was doing this, that the __sctp_unhash_endpoint was using
hlist_unhsashed in a simmilar fashion, but never nullified any removed nodes
pointers to make that function work properly, so I fixed that up in a simmilar
fashion.

I attempted to test this using a virtual guest running the SCTP_RR test from
netperf in a loop while running the trinity fuzzer, both in a loop. I wasn't
able to recreate the problem prior to this fix, nor was I able to trigger the
failure after (neither of which I suppose is suprising). Given the trace above
however, I think its likely that this is what we hit.

Signed-off-by: Neil Horman <nho...@tuxdriver.com>
Reported-by: da...@redhat.com
CC: da...@redhat.com
CC: "David S. Miller" <da...@davemloft.net>
CC: Vlad Yasevich <vyas...@gmail.com>
CC: Sridhar Samudrala <s...@us.ibm.com>
CC: linux...@vger.kernel.org

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/sctp/input.c | 7 ++-----
net/sctp/socket.c | 12 ++++++++++--
2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/net/sctp/input.c b/net/sctp/input.c
index 254afea..e8e73f1 100644
--- a/net/sctp/input.c
+++ b/net/sctp/input.c
@@ -739,15 +739,12 @@ static void __sctp_unhash_endpoint(struct sctp_endpoint *ep)

epb = &ep->base;

- if (hlist_unhashed(&epb->node))
- return;
-
epb->hashent = sctp_ep_hashfn(epb->bind_addr.port);

head = &sctp_ep_hashtable[epb->hashent];

sctp_write_lock(&head->lock);
- __hlist_del(&epb->node);
+ hlist_del_init(&epb->node);
sctp_write_unlock(&head->lock);
}

@@ -828,7 +825,7 @@ static void __sctp_unhash_established(struct sctp_association *asoc)
head = &sctp_assoc_hashtable[epb->hashent];

sctp_write_lock(&head->lock);
- __hlist_del(&epb->node);
+ hlist_del_init(&epb->node);
sctp_write_unlock(&head->lock);
}

diff --git a/net/sctp/socket.c b/net/sctp/socket.c
index 3a95fcb..1f9843e 100644
--- a/net/sctp/socket.c
+++ b/net/sctp/socket.c
@@ -1142,8 +1142,14 @@ out_free:
SCTP_DEBUG_PRINTK("About to exit __sctp_connect() free asoc: %p"
" kaddrs: %p err: %d\n",
asoc, kaddrs, err);
- if (asoc)
+ if (asoc) {
+ /* sctp_primitive_ASSOCIATE may have added this association
+ * To the hash table, try to unhash it, just in case, its a noop
+ * if it wasn't hashed so we're safe
+ */
+ sctp_unhash_established(asoc);
sctp_association_free(asoc);
+ }
return err;
}

@@ -1851,8 +1857,10 @@ SCTP_STATIC int sctp_sendmsg(struct kiocb *iocb, struct sock *sk,
goto out_unlock;

out_free:
- if (new_asoc)
+ if (new_asoc) {
+ sctp_unhash_established(asoc);
sctp_association_free(asoc);
+ }
out_unlock:
sctp_release_sock(sk);

Willy Tarreau

unread,

Oct 1, 2012, 7:09:46 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Darren Hart, Dave Jones, Thomas Gleixner, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Darren Hart <dvh...@linux.intel.com>

commit f27071cb7fe3e1d37a9dbe6c0dfc5395cd40fa43 upstream.

The WARN_ON in futex_wait_requeue_pi() for a NULL q.pi_state was testing
the address (&q.pi_state) of the pointer instead of the value
(q.pi_state) of the pointer. Correct it accordingly.

Signed-off-by: Darren Hart <dvh...@linux.intel.com>
Cc: Dave Jones <da...@redhat.com>
Link: http://lkml.kernel.org/r/1c85d97f6e5f79ec389a4ead3e36736...@linux.intel.com
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/futex.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 6f745b6..7fd3dac 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2318,7 +2318,7 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
* signal. futex_unlock_pi() will not destroy the lock_ptr nor
* the pi_state.
*/
- WARN_ON(!&q.pi_state);
+ WARN_ON(!q.pi_state);
pi_mutex = &q.pi_state->pi_mutex;
ret = rt_mutex_finish_proxy_lock(pi_mutex, to, &rt_waiter, 1);
debug_rt_mutex_free_waiter(&rt_waiter);

Willy Tarreau

unread,

Oct 1, 2012, 7:09:52 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Linus Torvalds, Greg KH, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <ty...@mit.edu>

commit b04b3156a20d395a7faa8eed98698d1e17a36000 upstream.

Send the USB device's serial, product, and manufacturer strings to the
/dev/random driver to help seed its pools.

Cc: Linus Torvalds <torv...@linux-foundation.org>
Acked-by: Greg KH <gr...@kroah.com>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/core/hub.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 069de19..02aad50 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -23,6 +23,7 @@
#include <linux/mutex.h>
#include <linux/freezer.h>
#include <linux/usb/quirks.h>
+#include <linux/random.h>

#include <asm/uaccess.h>
#include <asm/byteorder.h>
@@ -1831,6 +1832,14 @@ int usb_new_device(struct usb_device *udev)
/* Tell the world! */
announce_device(udev);

+ if (udev->serial)
+ add_device_randomness(udev->serial, strlen(udev->serial));
+ if (udev->product)
+ add_device_randomness(udev->product, strlen(udev->product));
+ if (udev->manufacturer)
+ add_device_randomness(udev->manufacturer,
+ strlen(udev->manufacturer));
+
/* Register the device. The device driver is responsible
* for configuring the device and invoking the add-device
* notifier chain (used by usbfs and possibly others).

Willy Tarreau

unread,

Oct 1, 2012, 7:09:52 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Dave Jones, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Dave Jones <da...@redhat.com>

commit 80de7c3138ee9fd86a98696fd2cf7ad89b995d0a upstream.

Trivially triggerable, found by trinity:

kernel BUG at mm/mempolicy.c:2546!
Process trinity-child2 (pid: 23988, threadinfo ffff88010197e000, task ffff88007821a670)
Call Trace:
show_numa_map+0xd5/0x450
show_pid_numa_map+0x13/0x20
traverse+0xf2/0x230
seq_read+0x34b/0x3e0
vfs_read+0xac/0x180
sys_pread64+0xa2/0xc0
system_call_fastpath+0x1a/0x1f
RIP: mpol_to_str+0x156/0x360

Signed-off-by: Dave Jones <da...@redhat.com>

Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

mm/mempolicy.c | 2 +-

1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 3c6e3e2..a6563fb 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -2259,7 +2259,7 @@ int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context)
break;

default:
- BUG();
+ return -EINVAL;
}

l = strlen(policy_types[mode]);

Willy Tarreau

unread,

Oct 1, 2012, 7:10:29 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Thomas Gleixner, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Thomas Gleixner <tg...@linutronix.de>

commit a7f4255f906f60f72e00aad2fb000939449ff32e upstream.

Commit f0fbf0abc093 ("x86: integrate delay functions") converted
delay_tsc() into a random delay generator for 64 bit. The reason is
that it merged the mostly identical versions of delay_32.c and
delay_64.c. Though the subtle difference of the result was:

static void delay_tsc(unsigned long loops)
{
- unsigned bclock, now;
+ unsigned long bclock, now;

Now the function uses rdtscl() which returns the lower 32bit of the
TSC. On 32bit that's not problematic as unsigned long is 32bit. On 64
bit this fails when the lower 32bit are close to wrap around when
bclock is read, because the following check

if ((now - bclock) >= loops)
break;

evaluated to true on 64bit for e.g. bclock = 0xffffffff and now = 0
because the unsigned long (now - bclock) of these values results in
0xffffffff00000001 which is definitely larger than the loops
value. That explains Tvortkos observation:

"Because I am seeing udelay(500) (_occasionally_) being short, and
that by delaying for some duration between 0us (yep) and 491us."

Make those variables explicitely u32 again, so this works for both 32
and 64 bit.

Reported-by: Tvrtko Ursulin <tvrtko....@onelan.co.uk>
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>

Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/lib/delay.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/lib/delay.c b/arch/x86/lib/delay.c
index ff485d3..b6372ce 100644
--- a/arch/x86/lib/delay.c
+++ b/arch/x86/lib/delay.c
@@ -48,9 +48,9 @@ static void delay_loop(unsigned long loops)
}

/* TSC based delay: */
-static void delay_tsc(unsigned long loops)
+static void delay_tsc(unsigned long __loops)
{
- unsigned long bclock, now;
+ u32 bclock, now, loops = __loops;
int cpu;

preempt_disable();

Willy Tarreau

unread,

Oct 1, 2012, 7:10:52 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Stuart Hayes, Shyam Iyer, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Stuart Hayes <stuart...@dell.com>

This was fixed upstream by commit e22bee782b3b00bd4534ae9b1c5fb2e8e6573c5c
('workqueue: implement concurrency managed dynamic worker pool'), but
that is far too large a change for stable.

When Dell iDRAC is reset, the iDRAC's USB keyboard/mouse device stops
responding but is not actually disconnected. This causes usbhid to
hid hid_io_error(), and you get a chain of calls like...

hid_reset()
usb_reset_device()
usb_reset_and_verify_device()
usb_ep0_reinit()
usb_disble_endpoint()
usb_hcd_disable_endpoint()
ehci_endpoint_disable()

Along the way, as a result of an error/timeout with a USB transaction,
ehci_clear_tt_buffer() calls usb_hub_clear_tt_buffer() (to clear a failed
transaction out of the transaction translator in the hub), which schedules
hub_tt_work() to be run (using keventd), and then sets qh->clearing_tt=1 so
that nobody will mess with that qh until the TT is cleared.

But run_workqueue() never happens for the keventd workqueue on that CPU, so
hub_tt_work() never gets run. And qh->clearing_tt never gets changed back to
0.

This causes ehci_endpoint_disable() to get stuck in a loop waiting for
qh->clearing_tt to go to 0.

Part of the problem is hid_reset() is itself running on keventd. So
when that thread gets a timeout trying to talk to the HID device, it
schedules clear_work (to run hub_tt_work) to run, and then gets stuck
in ehci_endpoint_disable waiting for it to run.

However, clear_work never gets run because the workqueue for that CPU
is still waiting for hid_reset to finish.

A much less invasive patch for earlier kernels is to just schedule
clear_work on khubd if the usb code needs to clear the TT and it sees
that it is already running on keventd. Khubd isn't used by default
because it can get blocked by device enumeration sometimes, but I
think it should be ok for a backup for unusual cases like this just to
prevent deadlock.

Signed-off-by: Stuart Hayes <stuart...@dell.com>
Signed-off-by: Shyam Iyer <shyam...@dell.com>
[bwh: Use current_is_keventd() rather than checking current->{flags,comm}]
Signed-off-by: Ben Hutchings <b...@decadent.org.uk>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/core/hub.c | 31 +++++++++++++++++++++++++++----
kernel/workqueue.c | 1 +
2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 2b428fc..069de19 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -458,10 +458,8 @@ hub_clear_tt_buffer (struct usb_device *hdev, u16 devinfo, u16 tt)
* talking to TTs must queue control transfers (not just bulk and iso), so
* both can talk to the same hub concurrently.
*/
-static void hub_tt_work(struct work_struct *work)
+void _hub_tt_work(struct usb_hub *hub)
{
- struct usb_hub *hub =
- container_of(work, struct usb_hub, tt.clear_work);
unsigned long flags;
int limit = 100;

@@ -496,6 +494,14 @@ static void hub_tt_work(struct work_struct *work)
spin_unlock_irqrestore (&hub->tt.lock, flags);
}

+void hub_tt_work(struct work_struct *work)
+{
+ struct usb_hub *hub =
+ container_of(work, struct usb_hub, tt.clear_work);
+
+ _hub_tt_work(hub);
+}
+
/**
* usb_hub_clear_tt_buffer - clear control/bulk TT state in high speed hub
* @urb: an URB associated with the failed or incomplete split transaction
@@ -543,7 +549,20 @@ int usb_hub_clear_tt_buffer(struct urb *urb)
/* tell keventd to clear state for this TT */
spin_lock_irqsave (&tt->lock, flags);
list_add_tail (&clear->clear_list, &tt->clear_list);
- schedule_work(&tt->clear_work);
+ /* don't schedule on kevent if we're running on keventd (e.g.,
+ * in hid_reset we can get here on kevent) unless on >=2.6.36
+ */
+ if (!current_is_keventd())
+ /* put it on keventd */
+ schedule_work(&tt->clear_work);
+ else {
+ /* let khubd do it */
+ struct usb_hub *hub =
+ container_of(&tt->clear_work, struct usb_hub,
+ tt.clear_work);
+ kick_khubd(hub);
+ }
+
spin_unlock_irqrestore (&tt->lock, flags);
return 0;
}
@@ -3274,6 +3293,10 @@ static void hub_events(void)
if (hub->quiescing)
goto loop_autopm;

+ /* _hub_tt_work usually run on keventd */
+ if (!list_empty(&hub->tt.clear_list))
+ _hub_tt_work(hub);
+
if (hub->error) {
dev_dbg (hub_dev, "resetting for error %d\n",
hub->error);
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 67e526b..b617e0c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -772,6 +772,7 @@ int current_is_keventd(void)
return ret;

}
+EXPORT_SYMBOL_GPL(current_is_keventd);

static struct cpu_workqueue_struct *
init_cpu_workqueue(struct workqueue_struct *wq, int cpu)

Willy Tarreau

unread,

Oct 1, 2012, 7:11:11 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Matt Mackall, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <ty...@mit.edu>

commit 330e0a01d54c2b8606c56816f99af6ebc58ec92c upstream.

Matt Mackall stepped down as the /dev/random driver maintainer last
year, so Theodore Ts'o is taking back the /dev/random driver.

Cc: Matt Mackall <m...@selenic.com>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

MAINTAINERS | 2 +-

1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 613da5d..334258c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -4379,7 +4379,7 @@ F: Documentation/blockdev/ramdisk.txt
F: drivers/block/brd.c

RANDOM NUMBER DRIVER
-M: Matt Mackall <m...@selenic.com>
+M: Theodore Ts'o" <ty...@mit.edu>
S: Maintained
F: drivers/char/random.c

Willy Tarreau

unread,

Oct 1, 2012, 7:11:19 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Stephen M. Cameron, Jens Axboe, Andrew Morton, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Stephen M. Cameron <scam...@beardog.cce.hp.com>

commit b0cf0b118c90477d1a6811f2cd2307f6a5578362 upstream.

Delete code which sets SCSI status incorrectly as it's already been set
correctly above this incorrect code. The bug was introduced in 2009 by
commit b0e15f6db111 ("cciss: fix typo that causes scsi status to be
lost.")

Signed-off-by: Stephen M. Cameron <scam...@beardog.cce.hp.com>
Reported-by: Roel van Meer <roel.v...@bokxing.nl>
Tested-by: Roel van Meer <roel.v...@bokxing.nl>
Cc: Jens Axboe <ax...@kernel.dk>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>

Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/block/cciss_scsi.c | 12 +-----------
1 files changed, 1 insertions(+), 11 deletions(-)

diff --git a/drivers/block/cciss_scsi.c b/drivers/block/cciss_scsi.c
index 3315268..ad8e592 100644
--- a/drivers/block/cciss_scsi.c
+++ b/drivers/block/cciss_scsi.c
@@ -747,17 +747,7 @@ complete_scsi_command( CommandList_struct *cp, int timeout, __u32 tag)
{
case CMD_TARGET_STATUS:
/* Pass it up to the upper layers... */
- if( ei->ScsiStatus)
- {
-#if 0
- printk(KERN_WARNING "cciss: cmd %p "
- "has SCSI Status = %x\n",
- cp,
- ei->ScsiStatus);
-#endif
- cmd->result |= (ei->ScsiStatus < 1);
- }
- else { /* scsi status is zero??? How??? */
+ if (!ei->ScsiStatus) {

/* Ordinarily, this case should never happen, but there is a bug
in some released firmware revisions that allows it to happen

Willy Tarreau

unread,

Oct 1, 2012, 7:11:49 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Stephan Baerwolf, Marcelo Tosatti, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: =?latin1?q?Stephan=20B=E4rwolf?= <stephan....@tu-ilmenau.de>

commit 0769c5de24621141c953fbe1f943582d37cb4244 upstream

In order to be able to proceed checks on CPU-specific properties
within the emulator, function "get_cpuid" is introduced.
With "get_cpuid" it is possible to virtually call the guests
"cpuid"-opcode without changing the VM's context.

[mtosatti: cleanup/beautify code]

[bwh: Backport to 2.6.32:
- Don't use emul_to_vcpu
- Adjust context]

Signed-off-by: Stephan Baerwolf <stephan....@tu-ilmenau.de>
Signed-off-by: Marcelo Tosatti <mtos...@redhat.com>
Signed-off-by: Ben Hutchings <b...@decadent.org.uk>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/include/asm/kvm_emulate.h | 2 ++
arch/x86/kvm/x86.c | 23 +++++++++++++++++++++++
2 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 5ed59ec..61bf2eb 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -109,6 +109,8 @@ struct x86_emulate_ops {
unsigned int bytes,
struct kvm_vcpu *vcpu);

+ bool (*get_cpuid)(struct x86_emulate_ctxt *ctxt,
+ u32 *eax, u32 *ebx, u32 *ecx, u32 *edx);
};

/* Type, address-of, and value of an instruction's operand. */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index df1cefb..23b5a71 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2871,12 +2871,35 @@ void kvm_report_emulation_failure(struct kvm_vcpu *vcpu, const char *context)
}
EXPORT_SYMBOL_GPL(kvm_report_emulation_failure);

+static bool emulator_get_cpuid(struct x86_emulate_ctxt *ctxt,
+ u32 *eax, u32 *ebx, u32 *ecx, u32 *edx)
+{
+ struct kvm_cpuid_entry2 *cpuid = NULL;
+
+ if (eax && ecx)
+ cpuid = kvm_find_cpuid_entry(ctxt->vcpu,
+ *eax, *ecx);
+
+ if (cpuid) {
+ *eax = cpuid->eax;
+ *ecx = cpuid->ecx;
+ if (ebx)
+ *ebx = cpuid->ebx;
+ if (edx)
+ *edx = cpuid->edx;

+ return true;
+ }
+

+ return false;
+}
+
static struct x86_emulate_ops emulate_ops = {
.read_std = kvm_read_guest_virt_system,
.fetch = kvm_fetch_guest_virt,
.read_emulated = emulator_read_emulated,
.write_emulated = emulator_write_emulated,
.cmpxchg_emulated = emulator_cmpxchg_emulated,
+ .get_cpuid = emulator_get_cpuid,
};

static void cache_all_regs(struct kvm_vcpu *vcpu)

Willy Tarreau

unread,

Oct 1, 2012, 7:12:03 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Oleg Nesterov, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Oleg Nesterov <ol...@redhat.com>

commit 971316f0503a5c50633d07b83b6db2f15a3a5b00 upstream.

signalfd_cleanup() ensures that ->signalfd_wqh is not used, but
this is not enough. eppoll_entry->whead still points to the memory
we are going to free, ep_unregister_pollwait()->remove_wait_queue()
is obviously unsafe.

Change ep_poll_callback(POLLFREE) to set eppoll_entry->whead = NULL,
change ep_unregister_pollwait() to check pwq->whead != NULL under
rcu_read_lock() before remove_wait_queue(). We add the new helper,
ep_remove_wait_queue(), for this.

This works because sighand_cachep is SLAB_DESTROY_BY_RCU and because
->signalfd_wqh is initialized in sighand_ctor(), not in copy_sighand.
ep_unregister_pollwait()->remove_wait_queue() can play with already
freed and potentially reused ->sighand, but this is fine. This memory
must have the valid ->signalfd_wqh until rcu_read_unlock().

Reported-by: Maxime Bizon <mbi...@freebox.fr>
Signed-off-by: Oleg Nesterov <ol...@redhat.com>

Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/eventpoll.c | 30 +++++++++++++++++++++++++++---
fs/signalfd.c | 6 +++++-
2 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 15a7ef3..42f2c12 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -300,6 +300,11 @@ static inline int ep_is_linked(struct list_head *p)
return !list_empty(p);
}

+static inline struct eppoll_entry *ep_pwq_from_wait(wait_queue_t *p)
+{
+ return container_of(p, struct eppoll_entry, wait);
+}
+
/* Get the "struct epitem" from a wait queue pointer */
static inline struct epitem *ep_item_from_wait(wait_queue_t *p)
{
@@ -434,6 +439,18 @@ static void ep_poll_safewake(wait_queue_head_t *wq)
put_cpu();
}

+static void ep_remove_wait_queue(struct eppoll_entry *pwq)
+{
+ wait_queue_head_t *whead;
+
+ rcu_read_lock();
+ /* If it is cleared by POLLFREE, it should be rcu-safe */
+ whead = rcu_dereference(pwq->whead);
+ if (whead)
+ remove_wait_queue(whead, &pwq->wait);
+ rcu_read_unlock();
+}
+
/*
* This function unregisters poll callbacks from the associated file
* descriptor. Must be called with "mtx" held (or "epmutex" if called from
@@ -448,7 +465,7 @@ static void ep_unregister_pollwait(struct eventpoll *ep, struct epitem *epi)
pwq = list_first_entry(lsthead, struct eppoll_entry, llink);

list_del(&pwq->llink);
- remove_wait_queue(pwq->whead, &pwq->wait);
+ ep_remove_wait_queue(pwq);
kmem_cache_free(pwq_cache, pwq);
}
}
@@ -814,9 +831,16 @@ static int ep_poll_callback(wait_queue_t *wait, unsigned mode, int sync, void *k
struct epitem *epi = ep_item_from_wait(wait);
struct eventpoll *ep = epi->ep;

- /* the caller holds eppoll_entry->whead->lock */
- if ((unsigned long)key & POLLFREE)
+ if ((unsigned long)key & POLLFREE) {
+ ep_pwq_from_wait(wait)->whead = NULL;
+ /*
+ * whead = NULL above can race with ep_remove_wait_queue()
+ * which can do another remove_wait_queue() after us, so we
+ * can't use __remove_wait_queue(). whead->lock is held by
+ * the caller.
+ */
list_del_init(&wait->task_list);
+ }

spin_lock_irqsave(&ep->lock, flags);

diff --git a/fs/signalfd.c b/fs/signalfd.c
index 6339cb4..02c25d7 100644
--- a/fs/signalfd.c
+++ b/fs/signalfd.c
@@ -32,7 +32,11 @@
void signalfd_cleanup(struct sighand_struct *sighand)
{
wait_queue_head_t *wqh = &sighand->signalfd_wqh;
-
+ /*
+ * The lockless check can race with remove_wait_queue() in progress,
+ * but in this case its caller should run under rcu_read_lock() and
+ * sighand_cachep is SLAB_DESTROY_BY_RCU, we can safely return.
+ */
if (likely(!waitqueue_active(wqh)))
return;

Willy Tarreau

unread,

Oct 1, 2012, 7:12:31 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Mike Galbraith, Peter Zijlstra, Ingo Molnar, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mike Galbraith <efa...@gmx.de>

commit d7d8294415f0ce4254827d4a2a5ee88b00be52a8 upstream.

Signed unsigned comparison may lead to superfluous resched if leftmost
is right of the current task, wasting a few cycles, and inadvertently
_lengthening_ the current task's slice.

Reported-by: Venkatesh Pallipadi <ve...@google.com>
Signed-off-by: Mike Galbraith <efa...@gmx.de>
Signed-off-by: Peter Zijlstra <a.p.zi...@chello.nl>
LKML-Reference: <1294202477....@marge.simson.net>
Signed-off-by: Ingo Molnar <mi...@elte.hu>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/sched_fair.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/kernel/sched_fair.c b/kernel/sched_fair.c
index cd9a40b..fd6e5f2 100644
--- a/kernel/sched_fair.c
+++ b/kernel/sched_fair.c
@@ -862,6 +862,9 @@ check_preempt_tick(struct cfs_rq *cfs_rq, struct sched_entity *curr)
struct sched_entity *se = __pick_next_entity(cfs_rq);
s64 delta = curr->vruntime - se->vruntime;

+ if (delta < 0)
+ return;
+
if (delta > ideal_runtime)
resched_task(rq_of(cfs_rq)->curr);

Willy Tarreau

unread,

Oct 1, 2012, 7:12:55 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Kees Cook, H. Peter Anvin, Fenghua Yu, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Kees Cook <kees...@canonical.com>

commit 7ccafc5f75c87853f3c49845d5a884f2376e03ce upstream.

The Intel manual changed the name of the CPUID bit to match the
instruction name. We should follow suit for sanity's sake. (See Intel SDM
Volume 2, Table 3-20 "Feature Information Returned in the ECX Register".)

[ hpa: we can only do this at this time because there are currently no CPUs
with this feature on the market, hence this is pre-hardware enabling.
However, Cc:'ing stable so that stable can present a consistent ABI. ]

Signed-off-by: Kees Cook <kees...@canonical.com>
Link: http://lkml.kernel.org/r/20110524232...@outflux.net

Signed-off-by: H. Peter Anvin <h...@linux.intel.com>
Cc: Fenghua Yu <fengh...@intel.com>

Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/include/asm/cpufeature.h | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 84d5337..27929b8 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -125,7 +125,7 @@
#define X86_FEATURE_OSXSAVE (4*32+27) /* "" XSAVE enabled in the OS */
#define X86_FEATURE_AVX (4*32+28) /* Advanced Vector Extensions */
#define X86_FEATURE_F16C (4*32+29) /* 16-bit fp conversions */
-#define X86_FEATURE_RDRND (4*32+30) /* The RDRAND instruction */
+#define X86_FEATURE_RDRAND (4*32+30) /* The RDRAND instruction */
#define X86_FEATURE_HYPERVISOR (4*32+31) /* Running on a hypervisor */

/* VIA/Cyrix/Centaur-defined CPU features, CPUID level 0xC0000001, word 5 */

Willy Tarreau

unread,

Oct 1, 2012, 7:13:01 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <ja...@suse.cz>

commit 1415dd8705394399d59a3df1ab48d149e1e41e77 upstream.

When insert_inode_locked() fails in ext3_new_inode() it most likely
means inode bitmap got corrupted and we allocated again inode which
is already in use. Also doing unlock_new_inode() during error recovery
is wrong since inode does not have I_NEW set. Fix the problem by jumping
to fail: (instead of fail_drop:) which declares filesystem error and
does not call unlock_new_inode().

Reviewed-by: Eric Sandeen <san...@redhat.com>

Signed-off-by: Jan Kara <ja...@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ext3/ialloc.c | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/fs/ext3/ialloc.c b/fs/ext3/ialloc.c
index b399912..108f4fc 100644
--- a/fs/ext3/ialloc.c
+++ b/fs/ext3/ialloc.c
@@ -575,8 +575,12 @@ got:
if (IS_DIRSYNC(inode))
handle->h_sync = 1;
if (insert_inode_locked(inode) < 0) {
- err = -EINVAL;
- goto fail_drop;
+ /*
+ * Likely a bitmap corruption causing inode to be allocated
+ * twice.
+ */
+ err = -EIO;
+ goto fail;
}
spin_lock(&sbi->s_next_gen_lock);
inode->i_generation = sbi->s_next_generation++;

Willy Tarreau

unread,

Oct 1, 2012, 7:13:19 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Dan Carpenter, Artem Bityutskiy, David Woodhouse, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Dan Carpenter <dan.ca...@oracle.com>

commit 48f8b641297df49021093763a3271119a84990a2 upstream.

The intent here was clearly to set result to true if the 0x40000000 flag
was set. But instead there was a | vs & typo and we always set result
to true.

Artem: check the spec at
wiki.laptop.org/images/5/5c/88ALP01_Datasheet_July_2007.pdf
and this fix looks correct.

Signed-off-by: Dan Carpenter <dan.ca...@oracle.com>
Signed-off-by: Artem Bityutskiy <artem.bi...@linux.intel.com>
Signed-off-by: David Woodhouse <David.W...@intel.com>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/mtd/nand/cafe_nand.c | 2 +-

1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/mtd/nand/cafe_nand.c b/drivers/mtd/nand/cafe_nand.c
index c828d9a..97b9c7b 100644
--- a/drivers/mtd/nand/cafe_nand.c
+++ b/drivers/mtd/nand/cafe_nand.c
@@ -103,7 +103,7 @@ static const char *part_probes[] = { "cmdlinepart", "RedBoot", NULL };
static int cafe_device_ready(struct mtd_info *mtd)
{
struct cafe_priv *cafe = mtd->priv;
- int result = !!(cafe_readl(cafe, NAND_STATUS) | 0x40000000);
+ int result = !!(cafe_readl(cafe, NAND_STATUS) & 0x40000000);
uint32_t irqs = cafe_readl(cafe, NAND_IRQ);

cafe_writel(cafe, irqs, NAND_IRQ);

Willy Tarreau

unread,

Oct 1, 2012, 7:13:36 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jiri Kosina, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jiri Kosina <jko...@suse.cz>

[ Upstream commit 59ea33a68a9083ac98515e4861c00e71efdc49a1 ]

Back in 2006, commit 1a2449a87b ("[I/OAT]: TCP recv offload to I/OAT")
added support for receive offloading to IOAT dma engine if available.

The code in tcp_rcv_established() tries to perform early DMA copy if
applicable. It however does so without checking whether the userspace
task is actually expecting the data in the buffer.

This is not a problem under normal circumstances, but there is a corner
case where this doesn't work -- and that's when MSG_TRUNC flag to
recvmsg() is used.

If the IOAT dma engine is not used, the code properly checks whether
there is a valid ucopy.task and the socket is owned by userspace, but
misses the check in the dmaengine case.

This problem can be observed in real trivially -- for example 'tbench' is a
good reproducer, as it makes a heavy use of MSG_TRUNC. On systems utilizing
IOAT, you will soon find tbench waiting indefinitely in sk_wait_data(), as they
have been already early-copied in tcp_rcv_established() using dma engine.

This patch introduces the same check we are performing in the simple
iovec copy case to the IOAT case as well. It fixes the indefinite
recvmsg(MSG_TRUNC) hangs.

Signed-off-by: Jiri Kosina <jko...@suse.cz>

Signed-off-by: David S. Miller <da...@davemloft.net>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv4/tcp_input.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index ce1ce82..4b148e5 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5239,7 +5239,9 @@ int tcp_rcv_established(struct sock *sk, struct sk_buff *skb,
if (tp->copied_seq == tp->rcv_nxt &&
len - tcp_header_len <= tp->ucopy.len) {
#ifdef CONFIG_NET_DMA
- if (tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
+ if (tp->ucopy.task == current &&
+ sock_owned_by_user(sk) &&
+ tcp_dma_try_early_copy(sk, skb, tcp_header_len)) {
copied_early = 1;
eaten = 1;

Willy Tarreau

unread,

Oct 1, 2012, 7:13:49 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Linus Torvalds, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Linus Torvalds <torv...@linux-foundation.org>

commit a2080a67abe9e314f9e9c2cc3a4a176e8a8f8793 upstream.

Add a new interface, add_device_randomness() for adding data to the
random pool that is likely to differ between two devices (or possibly
even per boot). This would be things like MAC addresses or serial
numbers, or the read-out of the RTC. This does *not* add any actual
entropy to the pool, but it initializes the pool to different values
for devices that might otherwise be identical and have very little
entropy available to them (particularly common in the embedded world).

[ Modified by tytso to mix in a timestamp, since there may be some
variability caused by the time needed to detect/configure the hardware
in question. ]

Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 28 ++++++++++++++++++++++++++++
include/linux/random.h | 1 +
2 files changed, 29 insertions(+), 0 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d7135b9..cfcb31d 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -125,11 +125,20 @@
* The current exported interfaces for gathering environmental noise
* from the devices are:
*
+ * void add_device_randomness(const void *buf, unsigned int size);
* void add_input_randomness(unsigned int type, unsigned int code,
* unsigned int value);
* void add_interrupt_randomness(int irq, int irq_flags);
* void add_disk_randomness(struct gendisk *disk);
*
+ * add_device_randomness() is for adding data to the random pool that
+ * is likely to differ between two devices (or possibly even per boot).
+ * This would be things like MAC addresses or serial numbers, or the
+ * read-out of the RTC. This does *not* add any actual entropy to the
+ * pool, but it initializes the pool to different values for devices
+ * that might otherwise be identical and have very little entropy
+ * available to them (particularly common in the embedded world).
+ *
* add_input_randomness() uses the input layer interrupt timing, as well as
* the event type information from the hardware.
*
@@ -646,6 +655,25 @@ static void set_timer_rand_state(unsigned int irq,
}
#endif

+/*
+ * Add device- or boot-specific data to the input and nonblocking
+ * pools to help initialize them to unique values.
+ *
+ * None of this adds any entropy, it is meant to avoid the
+ * problem of the nonblocking pool having similar initial state
+ * across largely identical devices.
+ */
+void add_device_randomness(const void *buf, unsigned int size)
+{
+ unsigned long time = get_cycles() ^ jiffies;
+
+ mix_pool_bytes(&input_pool, buf, size, NULL);
+ mix_pool_bytes(&input_pool, &time, sizeof(time), NULL);
+ mix_pool_bytes(&nonblocking_pool, buf, size, NULL);
+ mix_pool_bytes(&nonblocking_pool, &time, sizeof(time), NULL);
+}
+EXPORT_SYMBOL(add_device_randomness);
+
static struct timer_rand_state input_timer_state;

/*
diff --git a/include/linux/random.h b/include/linux/random.h
index 8a85602..7451093 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -46,6 +46,7 @@ struct rand_pool_info {

extern void rand_initialize_irq(int irq);

+extern void add_device_randomness(const void *, unsigned int);
extern void add_input_randomness(unsigned int type, unsigned int code,
unsigned int value);
extern void add_interrupt_randomness(int irq, int irq_flags);

Willy Tarreau

unread,

Oct 1, 2012, 7:14:11 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Dan Williams, James Bottomley, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.w...@intel.com>

commit b17caa174a7e1fd2e17b26e210d4ee91c4c28b37 upstream.

commit 198439e4 [SCSI] libsas: do not set res = 0 in sas_ex_discover_dev()
commit 19252de6 [SCSI] libsas: fix wide port hotplug issues

The above commits seem to have confused the return value of
sas_ex_discover_dev which is non-zero on failure and
sas_ex_join_wide_port which just indicates short circuiting discovery on
already established ports. The result is random discovery failures
depending on configuration.

Calls to sas_ex_join_wide_port are the source of the trouble as its
return value is errantly assigned to 'res'. Convert it to bool and stop
returning its result up the stack.

Tested-by: Dan Melnic <dan.m...@amd.com>
Reported-by: Dan Melnic <dan.m...@amd.com>
Signed-off-by: Dan Williams <dan.j.w...@intel.com>
Reviewed-by: Jack Wang <jack...@usish.com>
Signed-off-by: James Bottomley <JBott...@Parallels.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/scsi/libsas/sas_expander.c | 39 +++++++++++------------------------
1 files changed, 12 insertions(+), 27 deletions(-)

diff --git a/drivers/scsi/libsas/sas_expander.c b/drivers/scsi/libsas/sas_expander.c
index bac091d..1bdfde1 100644
--- a/drivers/scsi/libsas/sas_expander.c
+++ b/drivers/scsi/libsas/sas_expander.c
@@ -754,7 +754,7 @@ static struct domain_device *sas_ex_discover_end_dev(
}

/* See if this phy is part of a wide port */
-static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
+static bool sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
{
struct ex_phy *phy = &parent->ex_dev.ex_phy[phy_id];
int i;
@@ -770,11 +770,11 @@ static int sas_ex_join_wide_port(struct domain_device *parent, int phy_id)
sas_port_add_phy(ephy->port, phy->phy);
phy->port = ephy->port;
phy->phy_state = PHY_DEVICE_DISCOVERED;
- return 0;
+ return true;
}
}

- return -ENODEV;
+ return false;
}

static struct domain_device *sas_ex_discover_expander(
@@ -912,8 +912,7 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
return res;
}

- res = sas_ex_join_wide_port(dev, phy_id);
- if (!res) {
+ if (sas_ex_join_wide_port(dev, phy_id)) {
SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n",
phy_id, SAS_ADDR(ex_phy->attached_sas_addr));
return res;
@@ -958,8 +957,7 @@ static int sas_ex_discover_dev(struct domain_device *dev, int phy_id)
if (SAS_ADDR(ex->ex_phy[i].attached_sas_addr) ==
SAS_ADDR(child->sas_addr)) {
ex->ex_phy[i].phy_state= PHY_DEVICE_DISCOVERED;
- res = sas_ex_join_wide_port(dev, i);
- if (!res)
+ if (sas_ex_join_wide_port(dev, i))
SAS_DPRINTK("Attaching ex phy%d to wide port %016llx\n",
i, SAS_ADDR(ex->ex_phy[i].attached_sas_addr));

@@ -1812,32 +1810,20 @@ static int sas_discover_new(struct domain_device *dev, int phy_id)
{
struct ex_phy *ex_phy = &dev->ex_dev.ex_phy[phy_id];
struct domain_device *child;
- bool found = false;
- int res, i;
+ int res;

SAS_DPRINTK("ex %016llx phy%d new device attached\n",
SAS_ADDR(dev->sas_addr), phy_id);
res = sas_ex_phy_discover(dev, phy_id);
if (res)
- goto out;
- /* to support the wide port inserted */
- for (i = 0; i < dev->ex_dev.num_phys; i++) {
- struct ex_phy *ex_phy_temp = &dev->ex_dev.ex_phy[i];
- if (i == phy_id)
- continue;
- if (SAS_ADDR(ex_phy_temp->attached_sas_addr) ==
- SAS_ADDR(ex_phy->attached_sas_addr)) {
- found = true;
- break;
- }
- }
- if (found) {
- sas_ex_join_wide_port(dev, phy_id);
+ return res;
+
+ if (sas_ex_join_wide_port(dev, phy_id))
return 0;
- }
+
res = sas_ex_discover_devices(dev, phy_id);
- if (!res)
- goto out;
+ if (res)
+ return res;
list_for_each_entry(child, &dev->ex_dev.children, siblings) {
if (SAS_ADDR(child->sas_addr) ==
SAS_ADDR(ex_phy->attached_sas_addr)) {
@@ -1847,7 +1833,6 @@ static int sas_discover_new(struct domain_device *dev, int phy_id)
break;
}
}
-out:
return res;

Willy Tarreau

unread,

Oct 1, 2012, 7:14:24 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Darren Hart, Dave Jones, Thomas Gleixner, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Darren Hart <dvh...@linux.intel.com>

commit 6f7b0a2a5c0fb03be7c25bd1745baa50582348ef upstream.

If uaddr == uaddr2, then we have broken the rule of only requeueing
from a non-pi futex to a pi futex with this call. If we attempt this,
as the trinity test suite manages to do, we miss early wakeups as
q.key is equal to key2 (because they are the same uaddr). We will then
attempt to dereference the pi_mutex (which would exist had the futex_q
been properly requeued to a pi futex) and trigger a NULL pointer
dereference.

Signed-off-by: Darren Hart <dvh...@linux.intel.com>
Cc: Dave Jones <da...@redhat.com>

Link: http://lkml.kernel.org/r/ad82bfe7f7d130247fbe2b5b4275654...@linux.intel.com
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/futex.c | 13 ++++++++-----
1 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/kernel/futex.c b/kernel/futex.c
index 7fd3dac..9c5ffe1 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -2204,11 +2204,11 @@ int handle_early_requeue_pi_wakeup(struct futex_hash_bucket *hb,
* @uaddr2: the pi futex we will take prior to returning to user-space
*
* The caller will wait on uaddr and will be requeued by futex_requeue() to
- * uaddr2 which must be PI aware. Normal wakeup will wake on uaddr2 and
- * complete the acquisition of the rt_mutex prior to returning to userspace.
- * This ensures the rt_mutex maintains an owner when it has waiters; without
- * one, the pi logic wouldn't know which task to boost/deboost, if there was a
- * need to.
+ * uaddr2 which must be PI aware and unique from uaddr. Normal wakeup will wake
+ * on uaddr2 and complete the acquisition of the rt_mutex prior to returning to
+ * userspace. This ensures the rt_mutex maintains an owner when it has waiters;
+ * without one, the pi logic would not know which task to boost/deboost, if
+ * there was a need to.
*
* We call schedule in futex_wait_queue_me() when we enqueue and return there
* via the following:
@@ -2245,6 +2245,9 @@ static int futex_wait_requeue_pi(u32 __user *uaddr, int fshared,
struct futex_q q;
int res, ret;

+ if (uaddr == uaddr2)
+ return -EINVAL;
+
if (!bitset)
return -EINVAL;

Willy Tarreau

unread,

Oct 1, 2012, 7:14:44 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <ty...@mit.edu>

commit c2557a303ab6712bb6e09447df828c557c710ac9 upstream.

Create a new function, get_random_bytes_arch() which will use the
architecture-specific hardware random number generator if it is
present. Change get_random_bytes() to not use the HW RNG, even if it
is avaiable.

The reason for this is that the hw random number generator is fast (if
it is present), but it requires that we trust the hardware
manufacturer to have not put in a back door. (For example, an
increasing counter encrypted by an AES key known to the NSA.)

It's unlikely that Intel (for example) was paid off by the US
Government to do this, but it's impossible for them to prove otherwise
--- especially since Bull Mountain is documented to use AES as a
whitener. Hence, the output of an evil, trojan-horse version of
RDRAND is statistically indistinguishable from an RDRAND implemented
to the specifications claimed by Intel. Short of using a tunnelling
electronic microscope to reverse engineer an Ivy Bridge chip and
disassembling and analyzing the CPU microcode, there's no way for us
to tell for sure.

Since users of get_random_bytes() in the Linux kernel need to be able
to support hardware systems where the HW RNG is not present, most
time-sensitive users of this interface have already created their own
cryptographic RNG interface which uses get_random_bytes() as a seed.
So it's much better to use the HW RNG to improve the existing random
number generator, by mixing in any entropy returned by the HW RNG into
/dev/random's entropy pool, but to always _use_ /dev/random's entropy
pool.

This way we get almost of the benefits of the HW RNG without any
potential liabilities. The only benefits we forgo is the
speed/performance enhancements --- and generic kernel code can't
depend on depend on get_random_bytes() having the speed of a HW RNG
anyway.

For those places that really want access to the arch-specific HW RNG,
if it is available, we provide get_random_bytes_arch().

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 27 +++++++++++++++++++++++----
include/linux/random.h | 1 +
2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index b5b16f1..b038751 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1038,11 +1038,28 @@ static ssize_t extract_entropy_user(struct entropy_store *r, void __user *buf,

/*
* This function is the exported kernel interface. It returns some
- * number of good random numbers, suitable for seeding TCP sequence
- * numbers, etc.
+ * number of good random numbers, suitable for key generation, seeding
+ * TCP sequence numbers, etc. It does not use the hw random number
+ * generator, if available; use get_random_bytes_arch() for that.

*/
void get_random_bytes(void *buf, int nbytes)
{

+ extract_entropy(&nonblocking_pool, buf, nbytes, 0, 0);
+}
+EXPORT_SYMBOL(get_random_bytes);
+
+/*
+ * This function will use the architecture-specific hardware random
+ * number generator if it is available. The arch-specific hw RNG will
+ * almost certainly be faster than what we can do in software, but it
+ * is impossible to verify that it is implemented securely (as
+ * opposed, to, say, the AES encryption of a sequence number using a
+ * key known by the NSA). So it's useful if we need the speed, but
+ * only if we're willing to trust the hardware manufacturer not to
+ * have put in a back door.
+ */
+void get_random_bytes_arch(void *buf, int nbytes)

+{
char *p = buf;

while (nbytes) {
@@ -1057,9 +1074,11 @@ void get_random_bytes(void *buf, int nbytes)
nbytes -= chunk;
}

- extract_entropy(&nonblocking_pool, p, nbytes, 0, 0);
+ if (nbytes)

+ extract_entropy(&nonblocking_pool, p, nbytes, 0, 0);
}

-EXPORT_SYMBOL(get_random_bytes);
+EXPORT_SYMBOL(get_random_bytes_arch);
+

/*
* init_std_data - initialize pool with system data
diff --git a/include/linux/random.h b/include/linux/random.h
index 7451093..5a376bc 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -52,6 +52,7 @@ extern void add_input_randomness(unsigned int type, unsigned int code,

extern void add_interrupt_randomness(int irq, int irq_flags);

extern void get_random_bytes(void *buf, int nbytes);
+extern void get_random_bytes_arch(void *buf, int nbytes);
void generate_random_uuid(unsigned char uuid_out[16]);

#ifndef MODULE

Willy Tarreau

unread,

Oct 1, 2012, 7:14:53 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, John Stultz, Thomas Gleixner, Prarit Bhargava, John Stultz, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: John Stultz <john....@linaro.org>

This is a backport of fad0c66c4bb836d57a5f125ecd38bed653ca863a
which resolves a bug the previous commit.

Commit 6b43ae8a61 (ntp: Fix leap-second hrtimer livelock) broke the
leapsecond update of CLOCK_MONOTONIC. The missing leapsecond update to
wall_to_monotonic causes discontinuities in CLOCK_MONOTONIC.

Adjust wall_to_monotonic when NTP inserted a leapsecond.

Reported-by: Richard Cochran <richard...@gmail.com>
Signed-off-by: John Stultz <john....@linaro.org>
Tested-by: Richard Cochran <richard...@gmail.com>
Link: http://lkml.kernel.org/r/1338400497-12420-1-git...@linaro.org
Signed-off-by: Thomas Gleixner <tg...@linutronix.de>

Cc: Prarit Bhargava <pra...@redhat.com>
Cc: Thomas Gleixner <tg...@linutronix.de>
Cc: Linux Kernel <linux-...@vger.kernel.org>
Signed-off-by: John Stultz <john...@us.ibm.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

kernel/time/timekeeping.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 00e2fae..6d19a00 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -780,6 +780,7 @@ void update_wall_time(void)
xtime.tv_sec++;
leap = second_overflow(xtime.tv_sec);
xtime.tv_sec += leap;
+ wall_to_monotonic.tv_sec -= leap;
}

raw_time.tv_nsec += timekeeper.raw_interval;

Willy Tarreau

unread,

Oct 1, 2012, 7:14:53 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Matt Mackall, Herbert Xu, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Matt Mackall <m...@selenic.com>

commit e954bc91bdd4bb08b8325478c5004b24a23a3522 upstream.

Rather than dynamically allocate 10 bytes, move it to static allocation.
This saves space and avoids the need for error checking.

Signed-off-by: Matt Mackall <m...@selenic.com>
Signed-off-by: Herbert Xu <her...@gondor.apana.org.au>
[PG: adding this simplifies required updates to random for .34 stable]
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 10 +++-------
1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index de325792..0adb8c8 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -264,6 +264,7 @@
#define INPUT_POOL_WORDS 128
#define OUTPUT_POOL_WORDS 32
#define SEC_XFER_SIZE 512
+#define EXTRACT_SIZE 10

/*
* The minimum number of bits of entropy before we wake up a read on
@@ -421,7 +422,7 @@ struct entropy_store {
unsigned add_ptr;
int entropy_count;
int input_rotate;
- __u8 *last_data;
+ __u8 last_data[EXTRACT_SIZE];
};

static __u32 input_pool_data[INPUT_POOL_WORDS];
@@ -721,8 +722,6 @@ void add_disk_randomness(struct gendisk *disk)
}
#endif

-#define EXTRACT_SIZE 10
-
/*********************************************************************
*
* Entropy extraction routines
@@ -869,7 +868,7 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
while (nbytes) {
extract_buf(r, tmp);

- if (r->last_data) {
+ if (fips_enabled) {
spin_lock_irqsave(&r->lock, flags);
if (!memcmp(tmp, r->last_data, EXTRACT_SIZE))
panic("Hardware RNG duplicated output!\n");
@@ -958,9 +957,6 @@ static void init_std_data(struct entropy_store *r)
now = ktime_get_real();
mix_pool_bytes(r, &now, sizeof(now));
mix_pool_bytes(r, utsname(), sizeof(*(utsname())));
- /* Enable continuous test in fips mode */
- if (fips_enabled)
- r->last_data = kmalloc(EXTRACT_SIZE, GFP_KERNEL);
}

static int rand_initialize(void)

Willy Tarreau

unread,

Oct 1, 2012, 7:15:15 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Alan Cox, David S. Miller, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Alan Cox <al...@linux.intel.com>

[ Upstream commit 8b72ff6484fe303e01498b58621810a114f3cf09 ]

gcc really should warn about these !

Signed-off-by: Alan Cox <al...@linux.intel.com>

Signed-off-by: David S. Miller <da...@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/wanrouter/wanmain.c | 51 +++++++++++++++++++++-------------------------
1 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/net/wanrouter/wanmain.c b/net/wanrouter/wanmain.c
index 258daa8..0d8380a 100644
--- a/net/wanrouter/wanmain.c
+++ b/net/wanrouter/wanmain.c
@@ -603,36 +603,31 @@ static int wanrouter_device_new_if(struct wan_device *wandev,
* successfully, add it to the interface list.
*/

- if (dev->name == NULL) {
- err = -EINVAL;
- } else {
+#ifdef WANDEBUG
+ printk(KERN_INFO "%s: registering interface %s...\n",
+ wanrouter_modname, dev->name);
+#endif

- #ifdef WANDEBUG
- printk(KERN_INFO "%s: registering interface %s...\n",
- wanrouter_modname, dev->name);
- #endif
-
- err = register_netdev(dev);
- if (!err) {
- struct net_device *slave = NULL;
- unsigned long smp_flags=0;
-
- lock_adapter_irq(&wandev->lock, &smp_flags);
-
- if (wandev->dev == NULL) {
- wandev->dev = dev;
- } else {
- for (slave=wandev->dev;
- DEV_TO_SLAVE(slave);
- slave = DEV_TO_SLAVE(slave))
- DEV_TO_SLAVE(slave) = dev;
- }
- ++wandev->ndev;
-
- unlock_adapter_irq(&wandev->lock, &smp_flags);
- err = 0; /* done !!! */
- goto out;
+ err = register_netdev(dev);
+ if (!err) {
+ struct net_device *slave = NULL;
+ unsigned long smp_flags=0;
+
+ lock_adapter_irq(&wandev->lock, &smp_flags);
+
+ if (wandev->dev == NULL) {
+ wandev->dev = dev;
+ } else {
+ for (slave=wandev->dev;
+ DEV_TO_SLAVE(slave);
+ slave = DEV_TO_SLAVE(slave))
+ DEV_TO_SLAVE(slave) = dev;
}
+ ++wandev->ndev;
+
+ unlock_adapter_irq(&wandev->lock, &smp_flags);
+ err = 0; /* done !!! */
+ goto out;
}
if (wandev->del_if)
wandev->del_if(wandev, dev);

Willy Tarreau

unread,

Oct 1, 2012, 7:15:36 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Tyler Hicks, Colin Ian King, Stefan Bader, Tim Gardner, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Tyler Hicks <tyh...@canonical.com>

commit 4a26620df451ad46151ad21d711ed43e963c004e upstream.

BugLink: http://bugs.launchpad.net/bugs/885744

statfs() calls on eCryptfs files returned the wrong filesystem type and,
when using filename encryption, the wrong maximum filename length.

If mount-wide filename encryption is enabled, the cipher block size and
the lower filesystem's max filename length will determine the max
eCryptfs filename length. Pre-tested, known good lengths are used when
the lower filesystem's namelen is 255 and a cipher with 8 or 16 byte
block sizes is used. In other, less common cases, we fall back to a safe
rounded-down estimate when determining the eCryptfs namelen.

https://launchpad.net/bugs/885744

Signed-off-by: Tyler Hicks <tyh...@canonical.com>
Reported-by: Kees Cook <kees...@chromium.org>
Reviewed-by: Kees Cook <kees...@chromium.org>
Reviewed-by: John Johansen <john.j...@canonical.com>
Signed-off-by: Colin Ian King <colin...@canonical.com>
Acked-by: Stefan Bader <stefan...@canonical.com>
Signed-off-by: Tim Gardner <tim.g...@canonical.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/ecryptfs/crypto.c | 68 ++++++++++++++++++++++++++++++++++++----
fs/ecryptfs/ecryptfs_kernel.h | 11 ++++++
fs/ecryptfs/keystore.c | 9 ++---
fs/ecryptfs/super.c | 18 ++++++++++-
4 files changed, 92 insertions(+), 14 deletions(-)

diff --git a/fs/ecryptfs/crypto.c b/fs/ecryptfs/crypto.c
index 7e164bb..7786bf6 100644
--- a/fs/ecryptfs/crypto.c
+++ b/fs/ecryptfs/crypto.c
@@ -2039,6 +2039,17 @@ out:
return;
}

+static size_t ecryptfs_max_decoded_size(size_t encoded_size)
+{
+ /* Not exact; conservatively long. Every block of 4
+ * encoded characters decodes into a block of 3
+ * decoded characters. This segment of code provides
+ * the caller with the maximum amount of allocated
+ * space that @dst will need to point to in a
+ * subsequent call. */
+ return ((encoded_size + 1) * 3) / 4;
+}
+
/**
* ecryptfs_decode_from_filename
* @dst: If NULL, this function only sets @dst_size and returns. If
@@ -2057,13 +2068,7 @@ ecryptfs_decode_from_filename(unsigned char *dst, size_t *dst_size,
size_t dst_byte_offset = 0;

if (dst == NULL) {
- /* Not exact; conservatively long. Every block of 4
- * encoded characters decodes into a block of 3
- * decoded characters. This segment of code provides
- * the caller with the maximum amount of allocated
- * space that @dst will need to point to in a
- * subsequent call. */
- (*dst_size) = (((src_size + 1) * 3) / 4);
+ (*dst_size) = ecryptfs_max_decoded_size(src_size);
goto out;
}
while (src_byte_offset < src_size) {
@@ -2289,3 +2294,52 @@ out_free:
out:
return rc;
}
+
+#define ENC_NAME_MAX_BLOCKLEN_8_OR_16 143
+
+int ecryptfs_set_f_namelen(long *namelen, long lower_namelen,
+ struct ecryptfs_mount_crypt_stat *mount_crypt_stat)
+{
+ struct blkcipher_desc desc;
+ struct mutex *tfm_mutex;
+ size_t cipher_blocksize;
+ int rc;
+
+ if (!(mount_crypt_stat->flags & ECRYPTFS_GLOBAL_ENCRYPT_FILENAMES)) {
+ (*namelen) = lower_namelen;

+ return 0;
+ }
+

+ rc = ecryptfs_get_tfm_and_mutex_for_cipher_name(&desc.tfm, &tfm_mutex,
+ mount_crypt_stat->global_default_fn_cipher_name);
+ if (unlikely(rc)) {
+ (*namelen) = 0;
+ return rc;
+ }
+
+ mutex_lock(tfm_mutex);
+ cipher_blocksize = crypto_blkcipher_blocksize(desc.tfm);
+ mutex_unlock(tfm_mutex);
+
+ /* Return an exact amount for the common cases */
+ if (lower_namelen == NAME_MAX
+ && (cipher_blocksize == 8 || cipher_blocksize == 16)) {
+ (*namelen) = ENC_NAME_MAX_BLOCKLEN_8_OR_16;

+ return 0;
+ }
+

+ /* Return a safe estimate for the uncommon cases */
+ (*namelen) = lower_namelen;
+ (*namelen) -= ECRYPTFS_FNEK_ENCRYPTED_FILENAME_PREFIX_SIZE;
+ /* Since this is the max decoded size, subtract 1 "decoded block" len */
+ (*namelen) = ecryptfs_max_decoded_size(*namelen) - 3;
+ (*namelen) -= ECRYPTFS_TAG_70_MAX_METADATA_SIZE;
+ (*namelen) -= ECRYPTFS_FILENAME_MIN_RANDOM_PREPEND_BYTES;
+ /* Worst case is that the filename is padded nearly a full block size */
+ (*namelen) -= cipher_blocksize - 1;
+
+ if ((*namelen) < 0)
+ (*namelen) = 0;
+
+ return 0;
+}
diff --git a/fs/ecryptfs/ecryptfs_kernel.h b/fs/ecryptfs/ecryptfs_kernel.h
index 9685315..4181136 100644
--- a/fs/ecryptfs/ecryptfs_kernel.h
+++ b/fs/ecryptfs/ecryptfs_kernel.h
@@ -219,12 +219,21 @@ ecryptfs_get_key_payload_data(struct key *key)
* dentry name */
#define ECRYPTFS_TAG_73_PACKET_TYPE 0x49 /* FEK-encrypted filename as
* metadata */
+#define ECRYPTFS_MIN_PKT_LEN_SIZE 1 /* Min size to specify packet length */
+#define ECRYPTFS_MAX_PKT_LEN_SIZE 2 /* Pass at least this many bytes to
+ * ecryptfs_parse_packet_length() and
+ * ecryptfs_write_packet_length()
+ */
/* Constraint: ECRYPTFS_FILENAME_MIN_RANDOM_PREPEND_BYTES >=
* ECRYPTFS_MAX_IV_BYTES */
#define ECRYPTFS_FILENAME_MIN_RANDOM_PREPEND_BYTES 16
#define ECRYPTFS_NON_NULL 0x42 /* A reasonable substitute for NULL */
#define MD5_DIGEST_SIZE 16
#define ECRYPTFS_TAG_70_DIGEST_SIZE MD5_DIGEST_SIZE
+#define ECRYPTFS_TAG_70_MIN_METADATA_SIZE (1 + ECRYPTFS_MIN_PKT_LEN_SIZE \
+ + ECRYPTFS_SIG_SIZE + 1 + 1)
+#define ECRYPTFS_TAG_70_MAX_METADATA_SIZE (1 + ECRYPTFS_MAX_PKT_LEN_SIZE \
+ + ECRYPTFS_SIG_SIZE + 1 + 1)
#define ECRYPTFS_FEK_ENCRYPTED_FILENAME_PREFIX "ECRYPTFS_FEK_ENCRYPTED."
#define ECRYPTFS_FEK_ENCRYPTED_FILENAME_PREFIX_SIZE 23
#define ECRYPTFS_FNEK_ENCRYPTED_FILENAME_PREFIX "ECRYPTFS_FNEK_ENCRYPTED."
@@ -762,6 +771,8 @@ ecryptfs_parse_tag_70_packet(char **filename, size_t *filename_size,
size_t *packet_size,
struct ecryptfs_mount_crypt_stat *mount_crypt_stat,
char *data, size_t max_packet_size);
+int ecryptfs_set_f_namelen(long *namelen, long lower_namelen,
+ struct ecryptfs_mount_crypt_stat *mount_crypt_stat);
int ecryptfs_derive_iv(char *iv, struct ecryptfs_crypt_stat *crypt_stat,
loff_t offset);

diff --git a/fs/ecryptfs/keystore.c b/fs/ecryptfs/keystore.c
index 8f1a525..4f1feeb 100644
--- a/fs/ecryptfs/keystore.c
+++ b/fs/ecryptfs/keystore.c
@@ -548,10 +548,7 @@ ecryptfs_write_tag_70_packet(char *dest, size_t *remaining_bytes,
* Octets N3-N4: Block-aligned encrypted filename
* - Consists of a minimum number of random characters, a \0
* separator, and then the filename */
- s->max_packet_size = (1 /* Tag 70 identifier */
- + 3 /* Max Tag 70 packet size */
- + ECRYPTFS_SIG_SIZE /* FNEK sig */
- + 1 /* Cipher identifier */
+ s->max_packet_size = (ECRYPTFS_TAG_70_MAX_METADATA_SIZE
+ s->block_aligned_filename_size);
if (dest == NULL) {
(*packet_size) = s->max_packet_size;
@@ -806,10 +803,10 @@ ecryptfs_parse_tag_70_packet(char **filename, size_t *filename_size,
goto out;
}
s->desc.flags = CRYPTO_TFM_REQ_MAY_SLEEP;
- if (max_packet_size < (1 + 1 + ECRYPTFS_SIG_SIZE + 1 + 1)) {
+ if (max_packet_size < ECRYPTFS_TAG_70_MIN_METADATA_SIZE) {
printk(KERN_WARNING "%s: max_packet_size is [%zd]; it must be "
"at least [%d]\n", __func__, max_packet_size,
- (1 + 1 + ECRYPTFS_SIG_SIZE + 1 + 1));
+ ECRYPTFS_TAG_70_MIN_METADATA_SIZE);
rc = -EINVAL;
goto out;
}
diff --git a/fs/ecryptfs/super.c b/fs/ecryptfs/super.c
index 1a037f7..557469a 100644
--- a/fs/ecryptfs/super.c
+++ b/fs/ecryptfs/super.c
@@ -30,6 +30,8 @@
#include <linux/smp_lock.h>
#include <linux/file.h>
#include <linux/crypto.h>
+#include <linux/statfs.h>
+#include <linux/magic.h>
#include "ecryptfs_kernel.h"

struct kmem_cache *ecryptfs_inode_info_cache;
@@ -137,7 +139,21 @@ static void ecryptfs_put_super(struct super_block *sb)
*/
static int ecryptfs_statfs(struct dentry *dentry, struct kstatfs *buf)
{
- return vfs_statfs(ecryptfs_dentry_to_lower(dentry), buf);
+ struct dentry *lower_dentry = ecryptfs_dentry_to_lower(dentry);
+ int rc;
+
+ if (!lower_dentry->d_sb->s_op->statfs)
+ return -ENOSYS;
+
+ rc = lower_dentry->d_sb->s_op->statfs(lower_dentry, buf);
+ if (rc)
+ return rc;
+
+ buf->f_type = ECRYPTFS_SUPER_MAGIC;
+ rc = ecryptfs_set_f_namelen(&buf->f_namelen, buf->f_namelen,
+ &ecryptfs_superblock_to_private(dentry->d_sb)->mount_crypt_stat);
+
+ return rc;
}

/**

Willy Tarreau

unread,

Oct 1, 2012, 7:16:02 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Mark Brown, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Mark Brown <bro...@opensource.wolfsonmicro.com>

commit 27130f0cc3ab97560384da437e4621fc4e94f21c upstream.

wm831x devices contain a unique ID value. Feed this into the newly added
device_add_randomness() to add some per device seed data to the pool.

Signed-off-by: Mark Brown <bro...@opensource.wolfsonmicro.com>

Signed-off-by: Theodore Ts'o <ty...@mit.edu>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/mfd/wm831x-otp.c | 8 ++++++++
1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/drivers/mfd/wm831x-otp.c b/drivers/mfd/wm831x-otp.c
index f742745..b90f3e0 100644
--- a/drivers/mfd/wm831x-otp.c
+++ b/drivers/mfd/wm831x-otp.c
@@ -18,6 +18,7 @@
#include <linux/bcd.h>
#include <linux/delay.h>
#include <linux/mfd/core.h>
+#include <linux/random.h>

#include <linux/mfd/wm831x/core.h>
#include <linux/mfd/wm831x/otp.h>
@@ -66,6 +67,7 @@ static DEVICE_ATTR(unique_id, 0444, wm831x_unique_id_show, NULL);

int wm831x_otp_init(struct wm831x *wm831x)
{
+ char uuid[WM831X_UNIQUE_ID_LEN];
int ret;

ret = device_create_file(wm831x->dev, &dev_attr_unique_id);
@@ -73,6 +75,12 @@ int wm831x_otp_init(struct wm831x *wm831x)
dev_err(wm831x->dev, "Unique ID attribute not created: %d\n",
ret);

+ ret = wm831x_unique_id_read(wm831x, uuid);
+ if (ret == 0)
+ add_device_randomness(uuid, sizeof(uuid));
+ else
+ dev_err(wm831x->dev, "Failed to read UUID: %d\n", ret);
+
return ret;

Willy Tarreau

unread,

Oct 1, 2012, 7:16:09 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jarod Wilson, Matt Mackall, Herbert Xu, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jarod Wilson <ja...@redhat.com>

commit 442a4fffffa26fc3080350b4d50172f7589c3ac2 upstream.

At present, the comment header in random.c makes no mention of
add_disk_randomness, and instead, suggests that disk activity adds to the
random pool by way of add_interrupt_randomness, which appears to not have
been the case since sometime prior to the existence of git, and even prior
to bitkeeper. Didn't look any further back. At least, as far as I can
tell, there are no storage drivers setting IRQF_SAMPLE_RANDOM, which is a
requirement for add_interrupt_randomness to trigger, so the only way for a
disk to contribute entropy is by way of add_disk_randomness. Update
comments accordingly, complete with special mention about solid state
drives being a crappy source of entropy (see e2e1a148bc for reference).

Signed-off-by: Jarod Wilson <ja...@redhat.com>
Acked-by: Matt Mackall <m...@selenic.com>
Signed-off-by: Herbert Xu <her...@gondor.apana.org.au>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 13 ++++++++++---
1 files changed, 10 insertions(+), 3 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a6e258b..de325792 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -128,6 +128,7 @@

* void add_input_randomness(unsigned int type, unsigned int code,
* unsigned int value);

* void add_interrupt_randomness(int irq);
+ * void add_disk_randomness(struct gendisk *disk);
*

* add_input_randomness() uses the input layer interrupt timing, as well as
* the event type information from the hardware.

@@ -136,9 +137,15 @@
* inputs to the entropy pool. Note that not all interrupts are good
* sources of randomness! For example, the timer interrupts is not a
* good choice, because the periodicity of the interrupts is too
- * regular, and hence predictable to an attacker. Disk interrupts are
- * a better measure, since the timing of the disk interrupts are more
- * unpredictable.
+ * regular, and hence predictable to an attacker. Network Interface
+ * Controller interrupts are a better measure, since the timing of the
+ * NIC interrupts are more unpredictable.
+ *
+ * add_disk_randomness() uses what amounts to the seek time of block
+ * layer request events, on a per-disk_devt basis, as input to the
+ * entropy pool. Note that high-speed solid state drives with very low
+ * seek times do not make for good sources of entropy, as their seek
+ * times are usually fairly consistent.
*
* All of these routines try to estimate how many bits of randomness a
* particular randomness source. They do this by keeping track of the

Willy Tarreau

unread,

Oct 1, 2012, 7:16:32 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Salman Qazi, John Stultz, Peter Zijlstra, Paul Turner, john stultz, Ingo Molnar, Mike Galbraith, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Salman Qazi <sq...@google.com>

commit 9993bc635d01a6ee7f6b833b4ee65ce7c06350b1 upstream.

When a machine boots up, the TSC generally gets reset. However,
when kexec is used to boot into a kernel, the TSC value would be
carried over from the previous kernel. The computation of
cycns_offset in set_cyc2ns_scale is prone to an overflow, if the
machine has been up more than 208 days prior to the kexec. The
overflow happens when we multiply *scale, even though there is
enough room to store the final answer.

We fix this issue by decomposing tsc_now into the quotient and
remainder of division by CYC2NS_SCALE_FACTOR and then performing
the multiplication separately on the two components.

Refactor code to share the calculation with the previous
fix in __cycles_2_ns().

Signed-off-by: Salman Qazi <sq...@google.com>
Acked-by: John Stultz <john....@linaro.org>
Acked-by: Peter Zijlstra <a.p.zi...@chello.nl>
Cc: Paul Turner <p...@google.com>
Cc: john stultz <john...@us.ibm.com>
Link: http://lkml.kernel.org/r/20120310004027.1...@dungbeetle.mtv.corp.google.com
Signed-off-by: Ingo Molnar <mi...@elte.hu>
Cc: Mike Galbraith <efa...@gmx.de>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/include/asm/timer.h | 8 ++------
arch/x86/kernel/tsc.c | 3 ++-
include/linux/kernel.h | 13 +++++++++++++
3 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index b93a9aa..18e1ca7 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -63,14 +63,10 @@ DECLARE_PER_CPU(unsigned long long, cyc2ns_offset);

static inline unsigned long long __cycles_2_ns(unsigned long long cyc)
{
- unsigned long long quot;
- unsigned long long rem;
int cpu = smp_processor_id();
unsigned long long ns = per_cpu(cyc2ns_offset, cpu);
- quot = (cyc >> CYC2NS_SCALE_FACTOR);
- rem = cyc & ((1ULL << CYC2NS_SCALE_FACTOR) - 1);
- ns += quot * per_cpu(cyc2ns, cpu) +
- ((rem * per_cpu(cyc2ns, cpu)) >> CYC2NS_SCALE_FACTOR);
+ ns += mult_frac(cyc, per_cpu(cyc2ns, cpu),
+ (1UL << CYC2NS_SCALE_FACTOR));
return ns;
}

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index bc07543..9972276 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -623,7 +623,8 @@ static void set_cyc2ns_scale(unsigned long cpu_khz, int cpu)

if (cpu_khz) {
*scale = (NSEC_PER_MSEC << CYC2NS_SCALE_FACTOR)/cpu_khz;
- *offset = ns_now - (tsc_now * *scale >> CYC2NS_SCALE_FACTOR);
+ *offset = ns_now - mult_frac(tsc_now, *scale,
+ (1UL << CYC2NS_SCALE_FACTOR));
}

sched_clock_idle_wakeup_event(0);
diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 9acb92d..3526cd4 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -55,6 +55,19 @@ extern const char linux_proc_banner[];
} \
)

+/*
+ * Multiplies an integer by a fraction, while avoiding unnecessary
+ * overflow or loss of precision.
+ */
+#define mult_frac(x, numer, denom)( \
+{ \
+ typeof(x) quot = (x) / (denom); \
+ typeof(x) rem = (x) % (denom); \
+ (quot * (numer)) + ((rem * (numer)) / (denom)); \
+} \
+)
+
+
#define _RET_IP_ (unsigned long)__builtin_return_address(0)
#define _THIS_IP_ ({ __label__ __here; __here: (unsigned long)&&__here; })

Willy Tarreau

unread,

Oct 1, 2012, 7:17:03 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Sven Schnelle, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Sven Schnelle <sv...@stackframe.org>

commit 99f347caa4568cb803862730b3b1f1942639523f upstream.

If a device specifies zero endpoints in its interface descriptor,
the kernel oopses in acm_probe(). Even though that's clearly an
invalid descriptor, we should test wether we have all endpoints.
This is especially bad as this oops can be triggered by just
plugging a USB device in.

Signed-off-by: Sven Schnelle <sv...@stackframe.org>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/class/cdc-acm.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/class/cdc-acm.c b/drivers/usb/class/cdc-acm.c
index 653f853..8ad9dfb 100644
--- a/drivers/usb/class/cdc-acm.c
+++ b/drivers/usb/class/cdc-acm.c
@@ -1120,7 +1120,8 @@ skip_normal_probe:
}

- if (data_interface->cur_altsetting->desc.bNumEndpoints < 2)
+ if (data_interface->cur_altsetting->desc.bNumEndpoints < 2 ||
+ control_interface->cur_altsetting->desc.bNumEndpoints == 0)
return -EINVAL;

epctrl = &control_interface->cur_altsetting->endpoint[0].desc;

Willy Tarreau

unread,

Oct 1, 2012, 7:17:43 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Bjørn Mork, Oliver Neukum, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: =?latin1?q?Bj=F8rn=20Mork?= <bj...@mork.no>

commit b086b6b10d9f182cd8d2f0dcfd7fd11edba93fc9 upstream.

Clear the WDM_READ flag on empty reads to avoid running
forever in an infinite tight loop, causing lockups:

Jul 1 21:58:11 nemi kernel: [ 3658.898647] qmi_wwan 2-1:1.2: Unexpected error -71
Jul 1 21:58:36 nemi kernel: [ 3684.072021] BUG: soft lockup - CPU#0 stuck for 23s! [qmi.pl:12235]
Jul 1 21:58:36 nemi kernel: [ 3684.072212] CPU 0
Jul 1 21:58:36 nemi kernel: [ 3684.072355]
Jul 1 21:58:36 nemi kernel: [ 3684.072367] Pid: 12235, comm: qmi.pl Tainted: P O 3.5.0-rc2+ #13 LENOVO 2776LEG/2776LEG
Jul 1 21:58:36 nemi kernel: [ 3684.072383] RIP: 0010:[<ffffffffa0635008>] [<ffffffffa0635008>] spin_unlock_irq+0x8/0xc [cdc_wdm]
Jul 1 21:58:36 nemi kernel: [ 3684.072388] RSP: 0018:ffff88022dca1e70 EFLAGS: 00000282
Jul 1 21:58:36 nemi kernel: [ 3684.072393] RAX: ffff88022fc3f650 RBX: ffffffff811c56f7 RCX: 00000001000ce8c1
Jul 1 21:58:36 nemi kernel: [ 3684.072398] RDX: 0000000000000010 RSI: 000000000267d810 RDI: ffff88022fc3f650
Jul 1 21:58:36 nemi kernel: [ 3684.072403] RBP: ffff88022dca1eb0 R08: ffffffffa063578e R09: 0000000000000000
Jul 1 21:58:36 nemi kernel: [ 3684.072407] R10: 0000000000000008 R11: 0000000000000246 R12: 0000000000000002
Jul 1 21:58:36 nemi kernel: [ 3684.072412] R13: 0000000000000246 R14: ffffffff00000002 R15: ffff8802281d8c88
Jul 1 21:58:36 nemi kernel: [ 3684.072418] FS: 00007f666a260700(0000) GS:ffff88023bc00000(0000) knlGS:0000000000000000
Jul 1 21:58:36 nemi kernel: [ 3684.072423] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 1 21:58:36 nemi kernel: [ 3684.072428] CR2: 000000000270d9d8 CR3: 000000022e865000 CR4: 00000000000007f0
Jul 1 21:58:36 nemi kernel: [ 3684.072433] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jul 1 21:58:36 nemi kernel: [ 3684.072438] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jul 1 21:58:36 nemi kernel: [ 3684.072444] Process qmi.pl (pid: 12235, threadinfo ffff88022dca0000, task ffff88022ff76380)
Jul 1 21:58:36 nemi kernel: [ 3684.072448] Stack:
Jul 1 21:58:36 nemi kernel: [ 3684.072458] ffffffffa063592e 0000000100020000 ffff88022fc3f650 ffff88022fc3f6a8
Jul 1 21:58:36 nemi kernel: [ 3684.072466] 0000000000000200 0000000100000000 000000000267d810 0000000000000000
Jul 1 21:58:36 nemi kernel: [ 3684.072475] 0000000000000000 ffff880212cfb6d0 0000000000000200 ffff880212cfb6c0
Jul 1 21:58:36 nemi kernel: [ 3684.072479] Call Trace:
Jul 1 21:58:36 nemi kernel: [ 3684.072489] [<ffffffffa063592e>] ? wdm_read+0x1a0/0x263 [cdc_wdm]
Jul 1 21:58:36 nemi kernel: [ 3684.072500] [<ffffffff8110adb7>] ? vfs_read+0xa1/0xfb
Jul 1 21:58:36 nemi kernel: [ 3684.072509] [<ffffffff81040589>] ? alarm_setitimer+0x35/0x64
Jul 1 21:58:36 nemi kernel: [ 3684.072517] [<ffffffff8110aec7>] ? sys_read+0x45/0x6e
Jul 1 21:58:36 nemi kernel: [ 3684.072525] [<ffffffff813725f9>] ? system_call_fastpath+0x16/0x1b
Jul 1 21:58:36 nemi kernel: [ 3684.072557] Code: <66> 66 90 c3 83 ff ed 89 f8 74 16 7f 06 83 ff a1 75 0a c3 83 ff f4

The WDM_READ flag is normally cleared by wdm_int_callback
before resubmitting the read urb, and set by wdm_in_callback
when this urb returns with data or an error. But a crashing
device may cause both a read error and cancelling all urbs.
Make sure that the flag is cleared by wdm_read if the buffer
is empty.

We don't clear the flag on errors, as there may be pending
data in the buffer which should be processed. The flag will
instead be cleared on the next wdm_read call.

Signed-off-by: Bjørn Mork <bj...@mork.no>
Acked-by: Oliver Neukum <one...@suse.de>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/usb/class/cdc-wdm.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/drivers/usb/class/cdc-wdm.c b/drivers/usb/class/cdc-wdm.c
index d71514b..37f2899 100644
--- a/drivers/usb/class/cdc-wdm.c
+++ b/drivers/usb/class/cdc-wdm.c
@@ -441,6 +441,8 @@ retry:
goto retry;
}
if (!desc->reslength) { /* zero length read */
+ dev_dbg(&desc->intf->dev, "%s: zero length - clearing WDM_READ\n", __func__);
+ clear_bit(WDM_READ, &desc->flags);
spin_unlock_irq(&desc->iuspin);
goto retry;

Willy Tarreau

unread,

Oct 1, 2012, 7:17:52 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Tony Luck, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Tony Luck <tony...@intel.com>

commit cbc96b7594b5691d61eba2db8b2ea723645be9ca upstream.

Many platforms have per-machine instance data (serial numbers,
asset tags, etc.) squirreled away in areas that are accessed
during early system bringup. Mixing this data into the random
pools has a very high value in providing better random data,
so we should allow (and even encourage) architecture code to
call add_device_randomness() from the setup_arch() paths.

However, this limits our options for internal structure of
the random driver since random_initialize() is not called
until long after setup_arch().

Add a big fat comment to rand_initialize() spelling out
this requirement.

Suggested-by: Theodore Ts'o <ty...@mit.edu>
Signed-off-by: Tony Luck <tony...@intel.com>

Signed-off-by: Theodore Ts'o <ty...@mit.edu>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 10 ++++++++++
1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index cbb63b0..446b20a 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1077,6 +1077,16 @@ static void init_std_data(struct entropy_store *r)
mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
}

+/*
+ * Note that setup_arch() may call add_device_randomness()
+ * long before we get here. This allows seeding of the pools
+ * with some platform dependent data very early in the boot
+ * process. But it limits our options here. We must use
+ * statically allocated structures that already have all
+ * initializations complete at compile time. We should also
+ * take care not to overwrite the precious per platform data
+ * we were given.
+ */
static int rand_initialize(void)
{
init_std_data(&input_pool);

Willy Tarreau

unread,

Oct 1, 2012, 7:18:06 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Hugh Dickins, Miklos Szeredi, Badari Pulavarty, Nick Piggin, Ben Hutchings, Andy Lutomirski, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Andy Lutomirski <lu...@amacapital.net>

commit 9ab4233dd08036fe34a89c7dc6f47a8bf2eb29eb upstream.

Otherwise the code races with munmap (causing a use-after-free
of the vma) or with close (causing a use-after-free of the struct
file).

The bug was introduced by commit 90ed52ebe481 ("[PATCH] holepunch: fix
mmap_sem i_mutex deadlock")

[bwh: Backported to 3.2:
- Adjust context
- madvise_remove() calls vmtruncate_range(), not do_fallocate()]
[luto: Backported to 3.0: Adjust context]

Cc: Hugh Dickins <hu...@veritas.com>
Cc: Miklos Szeredi <msze...@suse.cz>
Cc: Badari Pulavarty <pba...@us.ibm.com>
Cc: Nick Piggin <npi...@suse.de>
Signed-off-by: Ben Hutchings <b...@decadent.org.uk>
Signed-off-by: Andy Lutomirski <lu...@amacapital.net>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

mm/madvise.c | 16 +++++++++++++---
1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 35b1479..e405c5f 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -12,6 +12,7 @@
#include <linux/hugetlb.h>
#include <linux/sched.h>
#include <linux/ksm.h>
+#include <linux/file.h>

/*
* Any behaviour which results in changes to the vma->vm_flags needs to
@@ -190,14 +191,16 @@ static long madvise_remove(struct vm_area_struct *vma,
struct address_space *mapping;
loff_t offset, endoff;
int error;
+ struct file *f;

*prev = NULL; /* tell sys_madvise we drop mmap_sem */

if (vma->vm_flags & (VM_LOCKED|VM_NONLINEAR|VM_HUGETLB))
return -EINVAL;

- if (!vma->vm_file || !vma->vm_file->f_mapping
- || !vma->vm_file->f_mapping->host) {
+ f = vma->vm_file;
+
+ if (!f || !f->f_mapping || !f->f_mapping->host) {
return -EINVAL;
}

@@ -211,9 +214,16 @@ static long madvise_remove(struct vm_area_struct *vma,
endoff = (loff_t)(end - vma->vm_start - 1)
+ ((loff_t)vma->vm_pgoff << PAGE_SHIFT);

- /* vmtruncate_range needs to take i_mutex and i_alloc_sem */
+ /*
+ * vmtruncate_range may need to take i_mutex and i_alloc_sem.
+ * We need to explicitly grab a reference because the vma (and
+ * hence the vma's reference to the file) can go away as soon as
+ * we drop mmap_sem.
+ */
+ get_file(f);
up_read(&current->mm->mmap_sem);
error = vmtruncate_range(mapping->host, offset, endoff);
+ fput(f);
down_read(&current->mm->mmap_sem);
return error;

Willy Tarreau

unread,

Oct 1, 2012, 7:18:21 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Roger Blofeld, Benjamin Herrenschmidt, Paul Gortmaker, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: roger blofeld <blof...@yahoo.com>

commit fd5a42980e1cf327b7240adf5e7b51ea41c23437 upstream.

Just like the module loader, ftrace needs to be updated to use r12
instead of r11 with newer gcc's.

Signed-off-by: Roger Blofeld <blof...@yahoo.com>
Signed-off-by: Benjamin Herrenschmidt <be...@kernel.crashing.org>
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

commit a8ed5765b5a8bf44a86284d80afd24f37a23e369 upstream.

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/powerpc/kernel/ftrace.c | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/ftrace.c b/arch/powerpc/kernel/ftrace.c
index ce1f3e4..eda40d2 100644
--- a/arch/powerpc/kernel/ftrace.c
+++ b/arch/powerpc/kernel/ftrace.c
@@ -244,9 +244,9 @@ __ftrace_make_nop(struct module *mod,

/*
* On PPC32 the trampoline looks like:
- * 0x3d, 0x60, 0x00, 0x00 lis r11,sym@ha
- * 0x39, 0x6b, 0x00, 0x00 addi r11,r11,sym@l
- * 0x7d, 0x69, 0x03, 0xa6 mtctr r11
+ * 0x3d, 0x80, 0x00, 0x00 lis r12,sym@ha
+ * 0x39, 0x8c, 0x00, 0x00 addi r12,r12,sym@l
+ * 0x7d, 0x89, 0x03, 0xa6 mtctr r12
* 0x4e, 0x80, 0x04, 0x20 bctr
*/

@@ -261,9 +261,9 @@ __ftrace_make_nop(struct module *mod,
pr_devel(" %08x %08x ", jmp[0], jmp[1]);

/* verify that this is what we expect it to be */
- if (((jmp[0] & 0xffff0000) != 0x3d600000) ||
- ((jmp[1] & 0xffff0000) != 0x396b0000) ||
- (jmp[2] != 0x7d6903a6) ||
+ if (((jmp[0] & 0xffff0000) != 0x3d800000) ||
+ ((jmp[1] & 0xffff0000) != 0x398c0000) ||
+ (jmp[2] != 0x7d8903a6) ||
(jmp[3] != 0x4e800420)) {
printk(KERN_ERR "Not a trampoline\n");
return -EINVAL;

Willy Tarreau

unread,

Oct 1, 2012, 7:18:44 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jan Kara, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jan Kara <ja...@suse.cz>

commit 1df2ae31c724e57be9d7ac00d78db8a5dabdd050 upstream.

Add sanity checks when loading sparing table from disk to avoid accessing
unallocated memory or writing to it.

Signed-off-by: Jan Kara <ja...@suse.cz>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/udf/super.c | 86 ++++++++++++++++++++++++++++++++++---------------------
1 files changed, 53 insertions(+), 33 deletions(-)

diff --git a/fs/udf/super.c b/fs/udf/super.c
index 0388d43..f0314da 100644
--- a/fs/udf/super.c
+++ b/fs/udf/super.c
@@ -57,6 +57,7 @@
#include <linux/seq_file.h>
#include <linux/bitmap.h>
#include <linux/crc-itu-t.h>
+#include <linux/log2.h>
#include <asm/byteorder.h>

#include "udf_sb.h"
@@ -1239,11 +1240,59 @@ out_bh:
return ret;
}

+static int udf_load_sparable_map(struct super_block *sb,
+ struct udf_part_map *map,
+ struct sparablePartitionMap *spm)
+{
+ uint32_t loc;
+ uint16_t ident;
+ struct sparingTable *st;
+ struct udf_sparing_data *sdata = &map->s_type_specific.s_sparing;
+ int i;
+ struct buffer_head *bh;
+
+ map->s_partition_type = UDF_SPARABLE_MAP15;
+ sdata->s_packet_len = le16_to_cpu(spm->packetLength);
+ if (!is_power_of_2(sdata->s_packet_len)) {
+ udf_error(sb, __func__, "error loading logical volume descriptor: "
+ "Invalid packet length %u\n",
+ (unsigned)sdata->s_packet_len);
+ return -EIO;
+ }
+ if (spm->numSparingTables > 4) {
+ udf_error(sb, __func__, "error loading logical volume descriptor: "
+ "Too many sparing tables (%d)\n",
+ (int)spm->numSparingTables);
+ return -EIO;
+ }
+
+ for (i = 0; i < spm->numSparingTables; i++) {
+ loc = le32_to_cpu(spm->locSparingTable[i]);
+ bh = udf_read_tagged(sb, loc, loc, &ident);
+ if (!bh)
+ continue;
+
+ st = (struct sparingTable *)bh->b_data;
+ if (ident != 0 ||
+ strncmp(st->sparingIdent.ident, UDF_ID_SPARING,
+ strlen(UDF_ID_SPARING)) ||
+ sizeof(*st) + le16_to_cpu(st->reallocationTableLen) >
+ sb->s_blocksize) {
+ brelse(bh);
+ continue;
+ }
+
+ sdata->s_spar_map[i] = bh;
+ }
+ map->s_partition_func = udf_get_pblock_spar15;

+ return 0;
+}
+

static int udf_load_logicalvol(struct super_block *sb, sector_t block,
struct kernel_lb_addr *fileset)
{
struct logicalVolDesc *lvd;
- int i, j, offset;
+ int i, offset;
uint8_t type;
struct udf_sb_info *sbi = UDF_SB(sb);
struct genericPartitionMap *gpm;
@@ -1308,38 +1357,9 @@ static int udf_load_logicalvol(struct super_block *sb, sector_t block,
} else if (!strncmp(upm2->partIdent.ident,
UDF_ID_SPARABLE,
strlen(UDF_ID_SPARABLE))) {
- uint32_t loc;
- struct sparingTable *st;
- struct sparablePartitionMap *spm =
- (struct sparablePartitionMap *)gpm;
-
- map->s_partition_type = UDF_SPARABLE_MAP15;
- map->s_type_specific.s_sparing.s_packet_len =
- le16_to_cpu(spm->packetLength);
- for (j = 0; j < spm->numSparingTables; j++) {
- struct buffer_head *bh2;
-
- loc = le32_to_cpu(
- spm->locSparingTable[j]);
- bh2 = udf_read_tagged(sb, loc, loc,
- &ident);
- map->s_type_specific.s_sparing.
- s_spar_map[j] = bh2;
-
- if (bh2 == NULL)
- continue;
-
- st = (struct sparingTable *)bh2->b_data;
- if (ident != 0 || strncmp(
- st->sparingIdent.ident,
- UDF_ID_SPARING,
- strlen(UDF_ID_SPARING))) {
- brelse(bh2);
- map->s_type_specific.s_sparing.
- s_spar_map[j] = NULL;
- }
- }
- map->s_partition_func = udf_get_pblock_spar15;
+ if (udf_load_sparable_map(sb, map,
+ (struct sparablePartitionMap *)gpm) < 0)
+ goto out_bh;
} else if (!strncmp(upm2->partIdent.ident,
UDF_ID_METADATA,
strlen(UDF_ID_METADATA))) {

Willy Tarreau

unread,

Oct 1, 2012, 7:18:54 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Theodore Tso, Sedat Dilek, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <ty...@mit.edu>

commit c5857ccf293968348e5eb4ebedc68074de3dcda6 upstream.

With the new interrupt sampling system, we are no longer using the
timer_rand_state structure in the irq descriptor, so we can stop
initializing it now.

[ Merged in fixes from Sedat to find some last missing references to
rand_initialize_irq() ]

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

Signed-off-by: Sedat Dilek <sedat...@gmail.com>
[PG: in .34 the irqdesc.h content is in irq.h instead.]
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/ia64/kernel/irq_ia64.c | 1 -
drivers/char/random.c | 55 -------------------------------------------
include/linux/irq.h | 1 -
include/linux/random.h | 2 -
kernel/irq/manage.c | 17 -------------
5 files changed, 0 insertions(+), 76 deletions(-)

diff --git a/arch/ia64/kernel/irq_ia64.c b/arch/ia64/kernel/irq_ia64.c
index dd9d7b5..463b8a7 100644
--- a/arch/ia64/kernel/irq_ia64.c
+++ b/arch/ia64/kernel/irq_ia64.c
@@ -24,7 +24,6 @@
#include <linux/kernel_stat.h>
#include <linux/slab.h>
#include <linux/ptrace.h>
-#include <linux/random.h> /* for rand_initialize_irq() */
#include <linux/signal.h>
#include <linux/smp.h>
#include <linux/threads.h>
diff --git a/drivers/char/random.c b/drivers/char/random.c
index 3ea1ddb..cbb63b0 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -621,43 +621,6 @@ struct timer_rand_state {
unsigned dont_count_entropy:1;
};

-#ifndef CONFIG_GENERIC_HARDIRQS
-
-static struct timer_rand_state *irq_timer_state[NR_IRQS];
-
-static struct timer_rand_state *get_timer_rand_state(unsigned int irq)
-{
- return irq_timer_state[irq];
-}
-
-static void set_timer_rand_state(unsigned int irq,
- struct timer_rand_state *state)
-{
- irq_timer_state[irq] = state;
-}
-
-#else
-
-static struct timer_rand_state *get_timer_rand_state(unsigned int irq)
-{
- struct irq_desc *desc;
-
- desc = irq_to_desc(irq);
-
- return desc->timer_rand_state;
-}
-
-static void set_timer_rand_state(unsigned int irq,
- struct timer_rand_state *state)
-{
- struct irq_desc *desc;
-
- desc = irq_to_desc(irq);
-
- desc->timer_rand_state = state;
-}
-#endif
-
/*

* Add device- or boot-specific data to the input and nonblocking

* pools to help initialize them to unique values.

@@ -1123,24 +1086,6 @@ static int rand_initialize(void)
}
module_init(rand_initialize);

-void rand_initialize_irq(int irq)
-{
- struct timer_rand_state *state;
-
- state = get_timer_rand_state(irq);
-
- if (state)
- return;
-
- /*
- * If kzalloc returns null, we just won't use that entropy
- * source.
- */
- state = kzalloc(sizeof(struct timer_rand_state), GFP_KERNEL);
- if (state)
- set_timer_rand_state(irq, state);
-}
-
#ifdef CONFIG_BLOCK
void rand_initialize_disk(struct gendisk *disk)
{
diff --git a/include/linux/irq.h b/include/linux/irq.h
index 9e5f45a..2333710 100644
--- a/include/linux/irq.h
+++ b/include/linux/irq.h
@@ -174,7 +174,6 @@ struct irq_2_iommu;
*/
struct irq_desc {
unsigned int irq;
- struct timer_rand_state *timer_rand_state;
unsigned int *kstat_irqs;
#ifdef CONFIG_INTR_REMAP
struct irq_2_iommu *irq_2_iommu;
diff --git a/include/linux/random.h b/include/linux/random.h
index 5a376bc..1864957 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -44,8 +44,6 @@ struct rand_pool_info {

#ifdef __KERNEL__

-extern void rand_initialize_irq(int irq);
-

extern void add_device_randomness(const void *, unsigned int);

extern void add_input_randomness(unsigned int type, unsigned int code,
unsigned int value);
diff --git a/kernel/irq/manage.c b/kernel/irq/manage.c
index 315705c..5dd29f3 100644
--- a/kernel/irq/manage.c
+++ b/kernel/irq/manage.c
@@ -633,22 +633,6 @@ __setup_irq(unsigned int irq, struct irq_desc *desc, struct irqaction *new)

if (desc->chip == &no_irq_chip)
return -ENOSYS;
- /*
- * Some drivers like serial.c use request_irq() heavily,
- * so we have to be careful not to interfere with a
- * running system.
- */
- if (new->flags & IRQF_SAMPLE_RANDOM) {
- /*
- * This function might sleep, we want to call it first,
- * outside of the atomic block.
- * Yes, this might clear the entropy pool if the wrong
- * driver is attempted to be loaded, without actually
- * installing a new handler, but is this really a problem,
- * only the sysadmin is able to do this.
- */
- rand_initialize_irq(irq);
- }

/* Oneshot interrupts are not allowed with shared */
if ((new->flags & IRQF_ONESHOT) && (new->flags & IRQF_SHARED))
@@ -1021,7 +1005,6 @@ EXPORT_SYMBOL(free_irq);
*
* IRQF_SHARED Interrupt is shared
* IRQF_DISABLED Disable local interrupts while processing
- * IRQF_SAMPLE_RANDOM The interrupt can be used for entropy
* IRQF_TRIGGER_* Specify active edge(s) or level
*
*/

Willy Tarreau

unread,

Oct 1, 2012, 7:19:14 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Lan Tianyu, Len Brown, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Lan Tianyu <tiany...@intel.com>

commit f197ac13f6eeb351b31250b9ab7d0da17434ea36 upstream.

In the ac.c, power_supply_register()'s return value is not checked.

As a result, the driver's add() ops may return success
even though the device failed to initialize.

For example, some BIOS may describe two ACADs in the same DSDT.
The second ACAD device will fail to register,
but ACPI driver's add() ops returns sucessfully.
The ACPI device will receive ACPI notification and cause OOPS.

https://bugzilla.redhat.com/show_bug.cgi?id=772730

Signed-off-by: Lan Tianyu <tiany...@intel.com>
Signed-off-by: Len Brown <len....@intel.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/acpi/ac.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/drivers/acpi/ac.c b/drivers/acpi/ac.c
index b6ed60b..bc3f918 100644
--- a/drivers/acpi/ac.c
+++ b/drivers/acpi/ac.c
@@ -287,7 +287,9 @@ static int acpi_ac_add(struct acpi_device *device)
ac->charger.properties = ac_props;
ac->charger.num_properties = ARRAY_SIZE(ac_props);
ac->charger.get_property = get_ac_property;
- power_supply_register(&ac->device->dev, &ac->charger);
+ result = power_supply_register(&ac->device->dev, &ac->charger);
+ if (result)
+ goto end;
#endif

printk(KERN_INFO PREFIX "%s [%s] (%s)\n",

Willy Tarreau

unread,

Oct 1, 2012, 7:19:25 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Eric Dumazet, David S. Miller, Ben Hutchings, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Eric Dumazet <eric.d...@gmail.com>

commit fdf5af0daf8019cec2396cdef8fb042d80fe71fa upstream.

Denys Fedoryshchenko reported that SYN+FIN attacks were bringing his
linux machines to their limits.

Dont call conn_request() if the TCP flags includes SYN flag

Reported-by: Denys Fedoryshchenko <de...@visp.net.lb>
Signed-off-by: Eric Dumazet <eric.d...@gmail.com>

Signed-off-by: David S. Miller <da...@davemloft.net>

Cc: Ben Hutchings <b...@decadent.org.uk>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/ipv4/tcp_input.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 4b148e5..db755c4 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -5634,6 +5634,8 @@ int tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb,
goto discard;

if (th->syn) {
+ if (th->fin)
+ goto discard;
if (icsk->icsk_af_ops->conn_request(sk, skb) < 0)
return 1;

Willy Tarreau

unread,

Oct 1, 2012, 7:19:38 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Stephan Baerwolf, Marcelo Tosatti, Ben Hutchings, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: =?latin1?q?Stephan=20B=E4rwolf?= <stephan....@tu-ilmenau.de>

commit bdb42f5afebe208eae90406959383856ae2caf2b upstream

On hosts without this patch, 32bit guests will crash (and 64bit guests
may behave in a wrong way) for example by simply executing following
nasm-demo-application:

[bits 32]
global _start
SECTION .text
_start: syscall

(I tested it with winxp and linux - both always crashed)

Disassembly of section .text:

00000000 <_start>:
0: 0f 05 syscall

The reason seems a missing "invalid opcode"-trap (int6) for the
syscall opcode "0f05", which is not available on Intel CPUs
within non-longmodes, as also on some AMD CPUs within legacy-mode.
(depending on CPU vendor, MSR_EFER and cpuid)

Because previous mentioned OSs may not engage corresponding
syscall target-registers (STAR, LSTAR, CSTAR), they remain
NULL and (non trapping) syscalls are leading to multiple
faults and finally crashs.

Depending on the architecture (AMD or Intel) pretended by
guests, various checks according to vendor's documentation
are implemented to overcome the current issue and behave
like the CPUs physical counterparts.

[mtosatti: cleanup/beautify code]

[bwh: Backport to 2.6.32:
- Add the prerequisite read of EFER
- Return -1 in the error cases rather than invoking emulate_ud()
directly
- Adjust context]
[dannf: fix build by passing x86_emulate_ops through each call]

Signed-off-by: Stephan Baerwolf <stephan....@tu-ilmenau.de>
Signed-off-by: Marcelo Tosatti <mtos...@redhat.com>
Signed-off-by: Ben Hutchings <b...@decadent.org.uk>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

arch/x86/include/asm/kvm_emulate.h | 13 ++++++++
arch/x86/kvm/emulate.c | 57 ++++++++++++++++++++++++++++++++++-
2 files changed, 68 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
index 61bf2eb..cc44e3d 100644
--- a/arch/x86/include/asm/kvm_emulate.h
+++ b/arch/x86/include/asm/kvm_emulate.h
@@ -192,6 +192,19 @@ struct x86_emulate_ctxt {
#define X86EMUL_MODE_HOST X86EMUL_MODE_PROT64
#endif

+/* CPUID vendors */
+#define X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx 0x68747541
+#define X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx 0x444d4163
+#define X86EMUL_CPUID_VENDOR_AuthenticAMD_edx 0x69746e65
+
+#define X86EMUL_CPUID_VENDOR_AMDisbetterI_ebx 0x69444d41
+#define X86EMUL_CPUID_VENDOR_AMDisbetterI_ecx 0x21726574
+#define X86EMUL_CPUID_VENDOR_AMDisbetterI_edx 0x74656273
+
+#define X86EMUL_CPUID_VENDOR_GenuineIntel_ebx 0x756e6547
+#define X86EMUL_CPUID_VENDOR_GenuineIntel_ecx 0x6c65746e
+#define X86EMUL_CPUID_VENDOR_GenuineIntel_edx 0x49656e69
+
int x86_decode_insn(struct x86_emulate_ctxt *ctxt,
struct x86_emulate_ops *ops);
int x86_emulate_insn(struct x86_emulate_ctxt *ctxt,
diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index 1350e43..aa2d905 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -1495,20 +1495,73 @@ setup_syscalls_segments(struct x86_emulate_ctxt *ctxt,
ss->present = 1;
}

+static bool em_syscall_is_enabled(struct x86_emulate_ctxt *ctxt,
+ struct x86_emulate_ops *ops)
+{
+ u32 eax, ebx, ecx, edx;
+
+ /*
+ * syscall should always be enabled in longmode - so only become
+ * vendor specific (cpuid) if other modes are active...
+ */
+ if (ctxt->mode == X86EMUL_MODE_PROT64)
+ return true;
+
+ eax = 0x00000000;
+ ecx = 0x00000000;
+ if (ops->get_cpuid(ctxt, &eax, &ebx, &ecx, &edx)) {
+ /*
+ * Intel ("GenuineIntel")
+ * remark: Intel CPUs only support "syscall" in 64bit
+ * longmode. Also an 64bit guest with a
+ * 32bit compat-app running will #UD !! While this
+ * behaviour can be fixed (by emulating) into AMD
+ * response - CPUs of AMD can't behave like Intel.
+ */
+ if (ebx == X86EMUL_CPUID_VENDOR_GenuineIntel_ebx &&
+ ecx == X86EMUL_CPUID_VENDOR_GenuineIntel_ecx &&
+ edx == X86EMUL_CPUID_VENDOR_GenuineIntel_edx)
+ return false;
+
+ /* AMD ("AuthenticAMD") */
+ if (ebx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ebx &&
+ ecx == X86EMUL_CPUID_VENDOR_AuthenticAMD_ecx &&
+ edx == X86EMUL_CPUID_VENDOR_AuthenticAMD_edx)
+ return true;
+
+ /* AMD ("AMDisbetter!") */
+ if (ebx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ebx &&
+ ecx == X86EMUL_CPUID_VENDOR_AMDisbetterI_ecx &&
+ edx == X86EMUL_CPUID_VENDOR_AMDisbetterI_edx)
+ return true;
+ }
+
+ /* default: (not Intel, not AMD), apply Intel's stricter rules... */
+ return false;
+}
+
static int
-emulate_syscall(struct x86_emulate_ctxt *ctxt)
+emulate_syscall(struct x86_emulate_ctxt *ctxt, struct x86_emulate_ops *ops)
{
struct decode_cache *c = &ctxt->decode;
struct kvm_segment cs, ss;
u64 msr_data;
+ u64 efer = 0;

/* syscall is not available in real mode */
if (c->lock_prefix || ctxt->mode == X86EMUL_MODE_REAL
|| ctxt->mode == X86EMUL_MODE_VM86)
return -1;

+ if (!(em_syscall_is_enabled(ctxt, ops)))
+ return -1;
+
+ kvm_x86_ops->get_msr(ctxt->vcpu, MSR_EFER, &efer);
setup_syscalls_segments(ctxt, &cs, &ss);

+ if (!(efer & EFER_SCE))
+ return -1;
+
kvm_x86_ops->get_msr(ctxt->vcpu, MSR_STAR, &msr_data);
msr_data >>= 32;
cs.selector = (u16)(msr_data & 0xfffc);
@@ -2342,7 +2395,7 @@ twobyte_insn:
}
break;
case 0x05: /* syscall */
- if (emulate_syscall(ctxt) == -1)
+ if (emulate_syscall(ctxt, ops) == -1)
goto cannot_emulate;
else
goto writeback;

Willy Tarreau

unread,

Oct 1, 2012, 7:19:48 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, J. Bruce Fields, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: J. Bruce Fields <bfi...@redhat.com>

commit f06f00a24d76e168ecb38d352126fd203937b601 upstream.

svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately. Meanwhile other
threads could send further replies before the socket is really shut
down. This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket

Reported-by: Malahal Naineni <mal...@us.ibm.com>
Tested-by: Malahal Naineni <mal...@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfi...@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

net/sunrpc/svc_xprt.c | 3 ++-
1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/svc_xprt.c b/net/sunrpc/svc_xprt.c
index 0ab2fed..8d72660 100644
--- a/net/sunrpc/svc_xprt.c
+++ b/net/sunrpc/svc_xprt.c
@@ -808,7 +808,8 @@ int svc_send(struct svc_rqst *rqstp)

/* Grab mutex to serialize outgoing data. */
mutex_lock(&xprt->xpt_mutex);
- if (test_bit(XPT_DEAD, &xprt->xpt_flags))
+ if (test_bit(XPT_DEAD, &xprt->xpt_flags)
+ || test_bit(XPT_CLOSE, &xprt->xpt_flags))
len = -ENOTCONN;
else
len = xprt->xpt_ops->xpo_sendto(rqstp);

Willy Tarreau

unread,

Oct 1, 2012, 7:19:58 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Theodore Ts'o <ty...@mit.edu>

commit 902c098a3663de3fa18639efbb71b6080f0bcd3c upstream.

The real-time Linux folks don't like add_interrupt_randomness() taking
a spinlock since it is called in the low-level interrupt routine.
This also allows us to reduce the overhead in the fast path, for the
random driver, which is the interrupt collection path.

Signed-off-by: "Theodore Ts'o" <ty...@mit.edu>

Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 78 ++++++++++++++++++++++++------------------------
1 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 30eae12..d7135b9 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -418,9 +418,9 @@ struct entropy_store {
/* read-write data: */
spinlock_t lock;
unsigned add_ptr;
+ unsigned input_rotate;
int entropy_count;
int entropy_total;
- int input_rotate;
unsigned int initialized:1;
__u8 last_data[EXTRACT_SIZE];
};
@@ -468,26 +468,24 @@ static __u32 const twist_table[8] = {
* it's cheap to do so and helps slightly in the expected case where
* the entropy is concentrated in the low-order bits.
*/
-static void mix_pool_bytes_extract(struct entropy_store *r, const void *in,
- int nbytes, __u8 out[64])
+static void __mix_pool_bytes(struct entropy_store *r, const void *in,
+ int nbytes, __u8 out[64])
{
unsigned long i, j, tap1, tap2, tap3, tap4, tap5;
int input_rotate;
int wordmask = r->poolinfo->poolwords - 1;
const char *bytes = in;
__u32 w;
- unsigned long flags;

- /* Taps are constant, so we can load them without holding r->lock. */
tap1 = r->poolinfo->tap1;
tap2 = r->poolinfo->tap2;
tap3 = r->poolinfo->tap3;
tap4 = r->poolinfo->tap4;
tap5 = r->poolinfo->tap5;

- spin_lock_irqsave(&r->lock, flags);
- input_rotate = r->input_rotate;
- i = r->add_ptr;
+ smp_rmb();
+ input_rotate = ACCESS_ONCE(r->input_rotate);
+ i = ACCESS_ONCE(r->add_ptr);

/* mix one byte at a time to simplify size handling and churn faster */
while (nbytes--) {
@@ -514,19 +512,23 @@ static void mix_pool_bytes_extract(struct entropy_store *r, const void *in,
input_rotate += i ? 7 : 14;
}

- r->input_rotate = input_rotate;
- r->add_ptr = i;
+ ACCESS_ONCE(r->input_rotate) = input_rotate;
+ ACCESS_ONCE(r->add_ptr) = i;
+ smp_wmb();

if (out)
for (j = 0; j < 16; j++)
((__u32 *)out)[j] = r->pool[(i - j) & wordmask];
-
- spin_unlock_irqrestore(&r->lock, flags);
}

-static void mix_pool_bytes(struct entropy_store *r, const void *in, int bytes)
+static void mix_pool_bytes(struct entropy_store *r, const void *in,
+ int nbytes, __u8 out[64])
{
- mix_pool_bytes_extract(r, in, bytes, NULL);
+ unsigned long flags;
+
+ spin_lock_irqsave(&r->lock, flags);
+ __mix_pool_bytes(r, in, nbytes, out);
+ spin_unlock_irqrestore(&r->lock, flags);
}

struct fast_pool {
@@ -564,23 +566,22 @@ static void fast_mix(struct fast_pool *f, const void *in, int nbytes)
*/
static void credit_entropy_bits(struct entropy_store *r, int nbits)
{
- unsigned long flags;
- int entropy_count;
+ int entropy_count, orig;

if (!nbits)
return;

- spin_lock_irqsave(&r->lock, flags);
-
DEBUG_ENT("added %d entropy credits to %s\n", nbits, r->name);
- entropy_count = r->entropy_count;
+retry:
+ entropy_count = orig = ACCESS_ONCE(r->entropy_count);
entropy_count += nbits;
if (entropy_count < 0) {
DEBUG_ENT("negative entropy/overflow\n");
entropy_count = 0;
} else if (entropy_count > r->poolinfo->POOLBITS)
entropy_count = r->poolinfo->POOLBITS;
- r->entropy_count = entropy_count;
+ if (cmpxchg(&r->entropy_count, orig, entropy_count) != orig)
+ goto retry;

if (!r->initialized && nbits > 0) {
r->entropy_total += nbits;
@@ -593,7 +594,6 @@ static void credit_entropy_bits(struct entropy_store *r, int nbits)
wake_up_interruptible(&random_read_wait);
kill_fasync(&fasync, SIGIO, POLL_IN);
}
- spin_unlock_irqrestore(&r->lock, flags);
}

/*********************************************************************
@@ -680,7 +680,7 @@ static void add_timer_randomness(struct timer_rand_state *state, unsigned num)
sample.cycles = get_cycles();

sample.num = num;
- mix_pool_bytes(&input_pool, &sample, sizeof(sample));
+ mix_pool_bytes(&input_pool, &sample, sizeof(sample), NULL);

/*
* Calculate number of bits of randomness we probably added.
@@ -764,7 +764,7 @@ void add_interrupt_randomness(int irq, int irq_flags)
fast_pool->last = now;

r = nonblocking_pool.initialized ? &input_pool : &nonblocking_pool;
- mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool));
+ __mix_pool_bytes(r, &fast_pool->pool, sizeof(fast_pool->pool), NULL);
/*
* If we don't have a valid cycle counter, and we see
* back-to-back timer interrupts, then skip giving credit for
@@ -829,7 +829,7 @@ static void xfer_secondary_pool(struct entropy_store *r, size_t nbytes)

bytes = extract_entropy(r->pull, tmp, bytes,
random_read_wakeup_thresh / 8, rsvd);
- mix_pool_bytes(r, tmp, bytes);
+ mix_pool_bytes(r, tmp, bytes, NULL);
credit_entropy_bits(r, bytes*8);
}
}
@@ -890,9 +890,11 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
int i;
__u32 hash[5], workspace[SHA_WORKSPACE_WORDS];
__u8 extract[64];
+ unsigned long flags;

/* Generate a hash across the pool, 16 words (512 bits) at a time */
sha_init(hash);
+ spin_lock_irqsave(&r->lock, flags);
for (i = 0; i < r->poolinfo->poolwords; i += 16)
sha_transform(hash, (__u8 *)(r->pool + i), workspace);

@@ -905,7 +907,8 @@ static void extract_buf(struct entropy_store *r, __u8 *out)
* brute-forcing the feedback as hard as brute-forcing the
* hash.
*/
- mix_pool_bytes_extract(r, hash, sizeof(hash), extract);
+ __mix_pool_bytes(r, hash, sizeof(hash), extract);
+ spin_unlock_irqrestore(&r->lock, flags);

/*
* To avoid duplicates, we atomically extract a portion of the
@@ -928,11 +931,10 @@ static void extract_buf(struct entropy_store *r, __u8 *out)

}

static ssize_t extract_entropy(struct entropy_store *r, void *buf,

- size_t nbytes, int min, int reserved)
+ size_t nbytes, int min, int reserved)
{
ssize_t ret = 0, i;
__u8 tmp[EXTRACT_SIZE];
- unsigned long flags;

xfer_secondary_pool(r, nbytes);
nbytes = account(r, nbytes, min, reserved);
@@ -941,6 +943,8 @@ static ssize_t extract_entropy(struct entropy_store *r, void *buf,
extract_buf(r, tmp);

if (fips_enabled) {
+ unsigned long flags;
+

spin_lock_irqsave(&r->lock, flags);
if (!memcmp(tmp, r->last_data, EXTRACT_SIZE))
panic("Hardware RNG duplicated output!\n");

@@ -1034,22 +1038,18 @@ EXPORT_SYMBOL(get_random_bytes);

static void init_std_data(struct entropy_store *r)

{
int i;
- ktime_t now;
- unsigned long flags;
+ ktime_t now = ktime_get_real();
+ unsigned long rv;

- spin_lock_irqsave(&r->lock, flags);
r->entropy_count = 0;
r->entropy_total = 0;
- spin_unlock_irqrestore(&r->lock, flags);
-
- now = ktime_get_real();
- mix_pool_bytes(r, &now, sizeof(now));
- for (i = r->poolinfo->POOLBYTES; i > 0; i -= sizeof flags) {
- if (!arch_get_random_long(&flags))
+ mix_pool_bytes(r, &now, sizeof(now), NULL);
+ for (i = r->poolinfo->POOLBYTES; i > 0; i -= sizeof(rv)) {
+ if (!arch_get_random_long(&rv))
break;
- mix_pool_bytes(r, &flags, sizeof(flags));
+ mix_pool_bytes(r, &rv, sizeof(rv), NULL);
}
- mix_pool_bytes(r, utsname(), sizeof(*(utsname())));
+ mix_pool_bytes(r, utsname(), sizeof(*(utsname())), NULL);
}

static int rand_initialize(void)
@@ -1186,7 +1186,7 @@ write_pool(struct entropy_store *r, const char __user *buffer, size_t count)
count -= bytes;
p += bytes;

- mix_pool_bytes(r, buf, bytes);
+ mix_pool_bytes(r, buf, bytes, NULL);
cond_resched();

Willy Tarreau

unread,

Oct 1, 2012, 7:20:17 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Dan Williams, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.w...@intel.com>

commit 281befa5592b0c5f9a3856b5666c62ac66d3d9ee upstream.

The pending == 2 case no longer exists in the driver so, we can use
ioat2_ring_pending() outside the lock to determine if there might be any
descriptors in the ring that the hardware has not seen.

Signed-off-by: Dan Williams <dan.j.w...@intel.com>
Backported-by: Mike Galbraith <efa...@gmx.de>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/dma/ioat/dma_v2.c | 34 ++++++++++++----------------------
drivers/dma/ioat/dma_v2.h | 2 --
2 files changed, 12 insertions(+), 24 deletions(-)

diff --git a/drivers/dma/ioat/dma_v2.c b/drivers/dma/ioat/dma_v2.c
index 5cc37af..d1be371 100644
--- a/drivers/dma/ioat/dma_v2.c
+++ b/drivers/dma/ioat/dma_v2.c
@@ -51,48 +51,40 @@ MODULE_PARM_DESC(ioat_ring_max_alloc_order,

void __ioat2_issue_pending(struct ioat2_dma_chan *ioat)
{
- void * __iomem reg_base = ioat->base.reg_base;
+ struct ioat_chan_common *chan = &ioat->base;

- ioat->pending = 0;
ioat->dmacount += ioat2_ring_pending(ioat);
ioat->issued = ioat->head;
/* make descriptor updates globally visible before notifying channel */
wmb();
- writew(ioat->dmacount, reg_base + IOAT_CHAN_DMACOUNT_OFFSET);
- dev_dbg(to_dev(&ioat->base),
+ writew(ioat->dmacount, chan->reg_base + IOAT_CHAN_DMACOUNT_OFFSET);
+ dev_dbg(to_dev(chan),
"%s: head: %#x tail: %#x issued: %#x count: %#x\n",
__func__, ioat->head, ioat->tail, ioat->issued, ioat->dmacount);
}

-void ioat2_issue_pending(struct dma_chan *chan)
+void ioat2_issue_pending(struct dma_chan *c)
{
- struct ioat2_dma_chan *ioat = to_ioat2_chan(chan);
+ struct ioat2_dma_chan *ioat = to_ioat2_chan(c);

- spin_lock_bh(&ioat->ring_lock);
- if (ioat->pending == 1)
+ if (ioat2_ring_pending(ioat)) {
+ spin_lock_bh(&ioat->ring_lock);
__ioat2_issue_pending(ioat);
- spin_unlock_bh(&ioat->ring_lock);
+ spin_unlock_bh(&ioat->ring_lock);
+ }
}

/**
* ioat2_update_pending - log pending descriptors
* @ioat: ioat2+ channel
*
- * set pending to '1' unless pending is already set to '2', pending == 2
- * indicates that submission is temporarily blocked due to an in-flight
- * reset. If we are already above the ioat_pending_level threshold then
- * just issue pending.
- *
- * called with ring_lock held
+ * Check if the number of unsubmitted descriptors has exceeded the
+ * watermark. Called with ring_lock held
*/
static void ioat2_update_pending(struct ioat2_dma_chan *ioat)
{
- if (unlikely(ioat->pending == 2))
- return;
- else if (ioat2_ring_pending(ioat) > ioat_pending_level)
+ if (ioat2_ring_pending(ioat) > ioat_pending_level)
__ioat2_issue_pending(ioat);
- else
- ioat->pending = 1;
}

static void __ioat2_start_null_desc(struct ioat2_dma_chan *ioat)
@@ -546,7 +538,6 @@ int ioat2_alloc_chan_resources(struct dma_chan *c)
ioat->head = 0;
ioat->issued = 0;
ioat->tail = 0;
- ioat->pending = 0;
ioat->alloc_order = order;
spin_unlock_bh(&ioat->ring_lock);

@@ -815,7 +806,6 @@ void ioat2_free_chan_resources(struct dma_chan *c)

chan->last_completion = 0;
chan->completion_dma = 0;
- ioat->pending = 0;
ioat->dmacount = 0;
}

diff --git a/drivers/dma/ioat/dma_v2.h b/drivers/dma/ioat/dma_v2.h
index 3afad8d..d211335 100644
--- a/drivers/dma/ioat/dma_v2.h
+++ b/drivers/dma/ioat/dma_v2.h
@@ -47,7 +47,6 @@ extern int ioat_ring_alloc_order;
* @head: allocated index
* @issued: hardware notification point
* @tail: cleanup index
- * @pending: lock free indicator for issued != head
* @dmacount: identical to 'head' except for occasionally resetting to zero
* @alloc_order: log2 of the number of allocated descriptors
* @ring: software ring buffer implementation of hardware ring
@@ -61,7 +60,6 @@ struct ioat2_dma_chan {
u16 tail;
u16 dmacount;
u16 alloc_order;
- int pending;
struct ioat_ring_ent **ring;
spinlock_t ring_lock;

Willy Tarreau

unread,

Oct 1, 2012, 7:20:37 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, H. Peter Anvin, Theodore Tso, Paul Gortmaker, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: H. Peter Anvin <h...@linux.intel.com>

commit 2dac8e54f988ab58525505d7ef982493374433c3 upstream.

When we are initializing using arch_get_random_long() we only need to
loop enough times to touch all the bytes in the buffer; using
poolwords for that does twice the number of operations necessary on a
64-bit machine, since in the random number generator code "word" means
32 bits.

Signed-off-by: H. Peter Anvin <h...@linux.intel.com>
Cc: "Theodore Ts'o" <ty...@mit.edu>
Link: http://lkml.kernel.org/r/1324589281-31931-1-...@mit.edu
Signed-off-by: Paul Gortmaker <paul.go...@windriver.com>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

drivers/char/random.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index d67890f..5e106b1 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -975,7 +975,7 @@ static void init_std_data(struct entropy_store *r)

now = ktime_get_real();
mix_pool_bytes(r, &now, sizeof(now));
- for (i = r->poolinfo->poolwords; i; i--) {
+ for (i = r->poolinfo->POOLBYTES; i > 0; i -= sizeof flags) {
if (!arch_get_random_long(&flags))
break;
mix_pool_bytes(r, &flags, sizeof(flags));

Willy Tarreau

unread,

Oct 1, 2012, 7:20:57 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Richard Kennedy, Matt Mackall, Herbert Xu, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Richard Kennedy <ric...@rsk.demon.co.uk>

commit 4015d9a865e3bcc42d88bedc8ce1551000bab664 upstream.

Re-order structure entropy_store to remove 8 bytes of padding on
64 bit builds, so shrinking this structure from 72 to 64 bytes
and allowing it to fit into one cache line.

Signed-off-by: Richard Kennedy <ric...@rsk.demon.co.uk>
Signed-off-by: Matt Mackall <m...@selenic.com>
Signed-off-by: Herbert Xu <her...@gondor.apana.org.au>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---
drivers/char/random.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c

index 3a19e2d..a6e258b 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -406,8 +406,8 @@ struct entropy_store {
struct poolinfo *poolinfo;
__u32 *pool;
const char *name;
- int limit;
struct entropy_store *pull;
+ int limit;

/* read-write data: */
spinlock_t lock;

Willy Tarreau

unread,

Oct 1, 2012, 7:21:10 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Trond Myklebust, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Trond Myklebust <Trond.M...@netapp.com>

commit 086600430493e04b802bee6e5b3ce0458e4eb77f upstream.

If the rpc call to NFS3PROC_FSINFO fails, then we need to report that
error so that the mount fails. Otherwise we can end up with a
superblock with completely unusable values for block sizes, maxfilesize,
etc.

Reported-by: Yuanming Chen <hikvisi...@163.com>
Signed-off-by: Trond Myklebust <Trond.M...@netapp.com>
Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>

Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/nfs/nfs3proc.c | 2 +-

1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/fs/nfs/nfs3proc.c b/fs/nfs/nfs3proc.c
index 3f8881d..59d9304 100644
--- a/fs/nfs/nfs3proc.c
+++ b/fs/nfs/nfs3proc.c
@@ -66,7 +66,7 @@ do_proc_get_root(struct rpc_clnt *client, struct nfs_fh *fhandle,
nfs_fattr_init(info->fattr);
status = rpc_call_sync(client, &msg, 0);
dprintk("%s: reply fsinfo: %d\n", __func__, status);
- if (!(info->fattr->valid & NFS_ATTR_FATTR)) {
+ if (status == 0 && !(info->fattr->valid & NFS_ATTR_FATTR)) {
msg.rpc_proc = &nfs3_procedures[NFS3PROC_GETATTR];
msg.rpc_resp = info->fattr;
status = rpc_call_sync(client, &msg, 0);

Willy Tarreau

unread,

Oct 1, 2012, 7:21:38 PM10/1/12

to linux-...@vger.kernel.org, sta...@vger.kernel.org, Jason Baron, Andrew Morton, Linus Torvalds, Greg Kroah-Hartman, Willy Tarreau

2.6.32-longterm review patch. If anyone has any objections, please let me know.

------------------

From: Jason Baron <jba...@redhat.com>

commit 93dc6107a76daed81c07f50215fa6ae77691634f upstream.

Commit 28d82dc1c4ed ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to longer work (dovecot for one).

The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1. My current patch limits
it at 1000. And enforce the limits on paths that have a greater depth.

This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578

Signed-off-by: Jason Baron <jba...@redhat.com>
Cc: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Linus Torvalds <torv...@linux-foundation.org>

Signed-off-by: Greg Kroah-Hartman <gre...@linuxfoundation.org>
Signed-off-by: Willy Tarreau <w...@1wt.eu>
---

fs/eventpoll.c | 4 ++++
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/fs/eventpoll.c b/fs/eventpoll.c
index 8da83d8..802b28d 100644
--- a/fs/eventpoll.c
+++ b/fs/eventpoll.c
@@ -975,6 +975,10 @@ static int path_count[PATH_ARR_SIZE];

static int path_count_inc(int nests)
{
+ /* Allow an arbitrary number of depth 1 paths */
+ if (nests == 0)
+ return 0;
+
if (++path_count[nests] > path_limits[nests])
return -1;
return 0;

[ 007/180] futex: Fix uninterruptible loop due to gate_area

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau

Willy Tarreau